Deepfakes easily bypass watermark protections, researchers warn

As deepfakes proliferate and attempts to stop them evolve, those efforts may prove futile: the newest research has shown that any artificial intelligence (AI) image watermark can be removed.

Specifically, researchers from the University of Waterloo’s Cybersecurity and Privacy Institute have demonstrated that AI image watermarks can be removed without the attacker needing to know the design of the watermark, or even whether an image has one at all, per a report on July 23.

According to Andre Kassis, a PhD candidate in computer science and the lead author of the study, an efficient method of seeing through deepfakes is needed, as “people want a way to verify what’s real and what’s not because the damages will be huge if we can’t.” He warned that “this technology could have terrible and wide-reaching consequences, (…) from political smear campaigns to non-consensual pornography.”

To address this problem, AI behemoths like OpenAI, Meta, and Google have all offered invisible encoded ‘watermarks,’ with the aim of creating publicly available tools that accurately distinguish AI-generated content from real photos or videos without revealing the nature of the watermarks.

UnMarker easily bypasses watermark defenses

The Waterloo team, however, has created UnMarker, a universal tool that successfully eliminates watermarks in real-world settings without needing to know the specifics of their encoding. It requires no knowledge of the watermarking algorithm, no access to internal parameters, and no interaction with the detector, and it removes both traditional and semantic watermarks without any customization.

In the words of Dr. Urs Hengartner, associate professor at the David R. Cheriton School of Computer Science at the University of Waterloo:

“While watermarking schemes are typically kept secret by AI companies, they must satisfy two essential properties: they need to be invisible to human users to preserve image quality, and they must be robust, that is, resistant to manipulation of an image like cropping or reducing resolution.”

However, these two requirements substantially limit the possible designs for watermarks. To satisfy both, watermarks need to operate in the image’s spectral domain, which Hengartner explained means they “subtly manipulate how pixel intensities vary across the image.”

Notably, UnMarker uses a statistical attack to identify places in the image where the frequency of pixel-intensity variation is unusual. It then distorts that frequency so the watermark-identifying tool no longer recognizes the image as watermarked, while the change remains imperceptible to the naked eye.
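To make the idea of a spectral-domain watermark more concrete, here is a minimal toy sketch in Python with NumPy. It is not UnMarker's algorithm and not any vendor's watermark: the keyed mid-frequency pattern, the correlation detector, the detection threshold, and the crude attenuation attack (`key_pattern`, `embed`, `detect`, `blind_spectral_attack`) are all hypothetical stand-ins chosen for illustration. It only shows that a mark hidden in how pixel intensities vary can be weakened by someone who has neither the key nor the detector.

```python
import numpy as np

def radial_frequency(shape):
    """Distance of every FFT bin from the zero frequency, in cycles per pixel."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def key_pattern(shape, key, band=(0.15, 0.35)):
    """Hypothetical watermark signal: keyed noise confined to a mid-frequency band."""
    rng = np.random.default_rng(key)
    noise = rng.standard_normal(shape)
    radius = radial_frequency(shape)
    in_band = (radius >= band[0]) & (radius < band[1])
    pattern = np.real(np.fft.ifft2(np.fft.fft2(noise) * in_band))
    return pattern / pattern.std()  # unit standard deviation, zero mean

def embed(image, key, strength=2.0):
    """Add the keyed pattern at low amplitude (about 2 grey levels out of 255)."""
    return image + strength * key_pattern(image.shape, key)

def detect(image, key):
    """Toy detector: estimated amplitude of the keyed pattern inside the image."""
    pattern = key_pattern(image.shape, key)
    centered = image - image.mean()
    return float(np.mean(centered * pattern) / np.mean(pattern ** 2))

def blind_spectral_attack(image, cutoff=0.1, factor=0.1):
    """Keyless attack (an assumption, far cruder than UnMarker): shrink every
    spectral coefficient above a cutoff frequency, leaving coarse structure alone."""
    radius = radial_frequency(image.shape)
    gain = np.where(radius < cutoff, 1.0, factor)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

SECRET_KEY = 1234   # known only to the watermarking party in this toy setup
THRESHOLD = 1.0     # assumed decision threshold: half the embedding strength

# Smooth synthetic "photo" on a 0-255 scale, built from two low-frequency waves.
coords = np.arange(64)
cover = (128.0
         + 60.0 * np.sin(2 * np.pi * 2 * coords / 64)[None, :]
         + 40.0 * np.cos(2 * np.pi * 3 * coords / 64)[:, None])

marked = embed(cover, SECRET_KEY)
attacked = blind_spectral_attack(marked)  # the attacker never touches SECRET_KEY
rms_change = float(np.sqrt(np.mean((attacked - marked) ** 2)))

print("score, clean image:      ", round(detect(cover, SECRET_KEY), 3))
print("score, watermarked image:", round(detect(marked, SECRET_KEY), 3))
print("score, after attack:     ", round(detect(attacked, SECRET_KEY), 3))
print("RMS pixel change:        ", round(rms_change, 3))
```

In this toy setup the detector's score for the attacked image falls below the assumed threshold while the average pixel change stays under two grey levels. Real watermarks are designed to survive simple filtering like this, which is why the researchers' attack is considerably more sophisticated than the sketch.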

This method succeeded in over 50% of the tested cases across various AI watermarking schemes, including Google’s SynthID and Meta’s Stable Signature, even without any prior knowledge of the target’s origins or watermarking methods. Finally, Kassis concludes:

“If we can figure this out, so can malicious actors. Watermarking is being promoted as this perfect solution, but we’ve shown that this technology is breakable. Deepfakes are still a huge threat. We live in an era where you can’t really trust what you see anymore.”

As a reminder, 98% of online deepfakes are sexual in nature (some violent), and women actors and musicians make up 88% of posted videos. At least MrDeepFakes, the internet’s biggest non-consensual deepfake porn site, which had thousands of daily visits, permanently shut down earlier this year, a small win for the victims.