It’s easy to tamper with watermarks from AI-generated text

AI language models work by predicting the next likely word in a sentence, generating one word at a time on the basis of those predictions. Watermarking algorithms for text divide the model’s vocabulary into words on a “green list” and a “red list,” and then steer the AI model toward choosing words from the green list. The more words in a sentence that come from the green list, the more likely it is that the text was generated by a computer. Humans tend to write sentences that contain a more random mix of words.
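To make the green-list idea concrete, here is a minimal sketch of how a detector could score a piece of text. It is an illustration only, not the scheme used by any particular watermark: the hash-based `is_green` rule, the secret `key`, and the 50% green fraction `GAMMA` are all assumptions chosen for the example.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list


def is_green(prev_word: str, word: str, key: str = "demo-key") -> bool:
    """Hypothetical rule: hash (key, previous word, word) to decide list membership."""
    digest = hashlib.sha256(f"{key}|{prev_word}|{word}".encode()).digest()
    return digest[0] / 256 < GAMMA


def green_z_score(text: str, key: str = "demo-key") -> float:
    """Return how many standard deviations the green-word count sits above chance."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))  # each word is judged given the word before it
    if not pairs:
        return 0.0
    green = sum(is_green(prev, word, key) for prev, word in pairs)
    n = len(pairs)
    # Under the "no watermark" hypothesis the green count is roughly Binomial(n, GAMMA).
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Text written without the watermark should score near zero, while text whose words were consistently drawn from the green list scores several standard deviations higher, which is what lets a detector flag it.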

The researchers attacked five different watermarks that work in this way. They were able to reverse-engineer the watermarks by using an API to access the AI model with the watermark applied and prompting it many times, Staab says. The responses allow the attacker to “steal” the watermark by building an approximate model of the watermarking rules. They do this by analyzing the AI outputs and comparing them with normal text.
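The comparison step can be sketched in a few lines. This is a deliberately simplified illustration, not the researchers’ method: it ignores word context, and the function name `estimate_green_words`, the `min_ratio` threshold, and the add-one smoothing are assumptions made for the example. The idea is simply that words appearing far more often in watermarked output than in ordinary text are likely to be on the green list.

```python
from collections import Counter


def estimate_green_words(watermarked_texts, reference_texts, min_ratio=2.0):
    """Guess which words the watermark favours by comparing relative frequencies."""
    wm = Counter(w for t in watermarked_texts for w in t.lower().split())
    ref = Counter(w for t in reference_texts for w in t.lower().split())
    wm_total = sum(wm.values()) or 1
    ref_total = sum(ref.values()) or 1
    vocab_size = len(set(wm) | set(ref)) or 1
    suspected = set()
    for word, count in wm.items():
        # Add-one smoothing so words absent from the reference do not divide by zero.
        wm_freq = (count + 1) / (wm_total + vocab_size)
        ref_freq = (ref[word] + 1) / (ref_total + vocab_size)
        if wm_freq / ref_freq >= min_ratio:
            suspected.add(word)
    return suspected
```

With enough sampled responses, a table like this gives an attacker a rough picture of the watermarking rules without ever seeing the secret key.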

Once they have an approximate idea of what the watermarked words might be, the researchers can carry out two kinds of attacks. The first, called a spoofing attack, allows malicious actors to use what they learned from stealing the watermark to produce text that can be passed off as watermarked. The second allows attackers to scrub AI-generated text of its watermark, so that the text can be passed off as human-written.

The team had a roughly 80% success rate in spoofing watermarks, and an 85% success rate in stripping AI-generated text of its watermark.

Researchers not affiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have also found watermarks to be unreliable and vulnerable to spoofing attacks.

The findings from ETH Zürich confirm that these issues with watermarks persist and extend to the most advanced types of chatbots and large language models in use today, says Feizi.

The study “underscores the need to be cautious about deploying automated systems at scale,” he says.

Despite these findings, watermarks remain the most promising way to detect AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the study.

But more research is needed before watermarks are ready for large-scale deployment, he adds. Until then, we should manage our expectations about how reliable and useful these tools are. “If it’s better than nothing, it’s still useful,” he says.

Update: This research will be presented at the International Conference on Learning Representations. This article has been updated to reflect that.