Easy to Beat AI Platform
Wednesday, March 8, 2017 @ 02:03 PM gHale
Google’s new machine learning-based system for identifying toxic comments in online discussion forums can be bypassed simply by misspelling abusive words, such as “idiot” or “moron,” or by adding unnecessary punctuation to them, researchers said.
Perspective is a project by Google’s technology incubator Jigsaw, which uses artificial intelligence to combat Internet trolls and promote more civil online discussion by automatically detecting online insults, harassment and abusive speech. The company launched a demonstration website Feb. 23 that allows anyone to type in a phrase and see its “toxicity score” — a measure of how rude, disrespectful or unreasonable a particular comment is, said researchers at the University of Washington.
The early-stage system can be deceived with common adversarial tactics, said the UW electrical engineers and security experts, who posted a paper on the e-print repository arXiv.
They showed that a phrase with a high toxicity score can be subtly modified so that it contains the same abusive language but receives a low toxicity score.
UW researchers evaluated Perspective in adversarial settings and found the system is vulnerable to missing incendiary language and falsely blocking non-abusive phrases.
“Machine learning systems are generally designed to yield the best performance in benign settings. But in real-world applications, these systems are susceptible to intelligent subversion or attacks,” said senior author Radha Poovendran, chair of the UW electrical engineering department and director of the Network Security Lab. “We wanted to demonstrate the importance of designing these machine learning tools in adversarial environments. Designing a system with a benign operating environment in mind and deploying it in adversarial environments can have devastating consequences.”
To solicit feedback and invite other researchers to explore the strengths and weaknesses of using machine learning as a tool to improve online discussions, Perspective developers made their experiments, models and data publicly available along with the tool itself.
In one case, the UW team misspelled the offending words or added extraneous punctuation or spaces to them, which yielded much lower toxicity scores. For example, simply changing “idiot” to “idiiot” reduced the toxicity score of an otherwise identical phrase from 84 percent to 20 percent.
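To make the three kinds of edits concrete (a repeated letter, inserted punctuation, an inserted space), here is a minimal Python sketch that generates such variants of a single word. The function names are hypothetical, the choice of editing at the middle of the word is an assumption, and the code does not query Perspective or reproduce the researchers' scores; it only shows how cheap these string-level changes are.

```python
def repeat_letter(word: str, i: int) -> str:
    """Duplicate the character at index i, e.g. 'idiot' -> 'idiiot' for i=2."""
    return word[: i + 1] + word[i] + word[i + 1 :]


def insert_char(word: str, i: int, ch: str) -> str:
    """Insert ch (a punctuation mark or a space) before index i."""
    return word[:i] + ch + word[i:]


def perturb(word: str) -> list[str]:
    """Return a few illustrative misspellings of one word (hypothetical helper)."""
    mid = len(word) // 2  # assumption: apply each edit near the middle of the word
    return [
        repeat_letter(word, mid),     # extra letter
        insert_char(word, mid, "."),  # extraneous punctuation
        insert_char(word, mid, " "),  # extraneous space
    ]


print(perturb("idiot"))  # ['idiiot', 'id.iot', 'id iot']
```

A human reader still sees the insult in every variant, which is exactly why a filter that relies on the learned surface form of the word can be fooled.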
Researchers also showed the system does not assign a low toxicity score to a negated version of an abusive phrase, so a sentence that negates an insult can still be flagged as toxic.
Researchers also observed that the adversarial changes often transfer among different phrases: once an intentionally misspelled word earned a low toxicity score in one phrase, it would also get a low score in another phrase. That means an adversary could build a “dictionary” of modified spellings for every word and significantly simplify the attack.
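The "dictionary" idea can be sketched in a few lines of Python: once a substitution is found, it is reused mechanically in any phrase. The `SUBSTITUTIONS` table and `apply_dictionary` function are hypothetical; only the “idiiot” spelling comes from the researchers' example, and “mor.on” is an invented entry for illustration.

```python
import re

# Hypothetical attacker dictionary: each abusive word maps to a variant that
# was (in this illustration) found to lower the score in some earlier phrase.
SUBSTITUTIONS = {
    "idiot": "idiiot",   # example spelling reported by the researchers
    "moron": "mor.on",   # invented entry, for illustration only
}


def apply_dictionary(phrase: str, subs: dict[str, str]) -> str:
    """Replace every alphabetic word found in subs (case-insensitive lookup)."""
    def swap(match) -> str:
        word = match.group(0)
        return subs.get(word.lower(), word)  # leave unknown words unchanged

    return re.sub(r"[A-Za-z]+", swap, phrase)


print(apply_dictionary("You are an idiot", SUBSTITUTIONS))   # You are an idiiot
print(apply_dictionary("What a moron he is", SUBSTITUTIONS)) # What a mor.on he is
```

Because the lookup is a plain table, the attacker does no per-phrase search at all, which is why transferability makes the attack so much cheaper.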
“There are two metrics for evaluating the performance of a filtering system like a spam blocker or toxic speech detector; one is the missed detection rate, and the other is the false alarm rate,” said lead author and UW electrical engineering doctoral student Hossein Hosseini. “Of course scoring the semantic toxicity of a phrase is challenging, but deploying defensive mechanisms both in algorithmic and system levels can help the usability of the system in real-world settings.”
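The two metrics Hosseini describes can be computed from a detector's scores and ground-truth labels. The sketch below assumes a comment is flagged when its score is at or above a threshold of 0.5 (an assumption; the article does not say how Perspective scores are thresholded). The missed detection rate is the share of truly toxic comments that are not flagged, and the false alarm rate is the share of non-toxic comments that are flagged.

```python
def detection_rates(scores, labels, threshold=0.5):
    """Return (missed_detection_rate, false_alarm_rate).

    scores: toxicity scores in [0, 1]
    labels: True if the comment is actually toxic
    threshold: assumed cutoff at or above which a comment is flagged
    """
    flagged = [s >= threshold for s in scores]
    toxic = [f for f, y in zip(flagged, labels) if y]        # flags on toxic items
    benign = [f for f, y in zip(flagged, labels) if not y]   # flags on benign items
    missed = sum(not f for f in toxic) / len(toxic) if toxic else 0.0
    false_alarm = sum(benign) / len(benign) if benign else 0.0
    return missed, false_alarm


# Illustrative values only: 0.84 and 0.20 echo the "idiot"/"idiiot" example,
# while 0.10 and 0.70 are invented scores for two non-toxic comments.
print(detection_rates([0.84, 0.20, 0.10, 0.70], [True, True, False, False]))
# (0.5, 0.5)
```

The misspelling attack raises the missed detection rate, while the negation problem raises the false alarm rate, so a robust detector has to keep both low.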
The research team suggests several techniques to improve the robustness of toxic speech detectors, including applying a spellchecking filter prior to the detection system, training the machine learning algorithm with adversarial examples and blocking suspicious users for a period of time.
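A spellchecking filter placed before the detector might look roughly like the Python sketch below. It strips non-letters from each token and snaps the result to the closest word in a vocabulary using the standard library's `difflib.get_close_matches`. The tiny `VOCAB` list, the 0.8 similarity cutoff, and the function names are all assumptions for illustration; the researchers' paper is not claimed to use this exact method.

```python
import difflib
import re

# Hypothetical vocabulary; a real filter would use a full dictionary.
VOCAB = ["you", "are", "an", "idiot", "moron", "what", "a", "he", "is"]


def normalize_token(token: str, vocab: list[str], cutoff: float = 0.8) -> str:
    """Strip non-letters, then snap to the closest vocabulary word if close enough."""
    letters = re.sub(r"[^a-z]", "", token.lower())
    if not letters or letters in vocab:
        return letters or token  # keep pure-punctuation tokens as they were
    match = difflib.get_close_matches(letters, vocab, n=1, cutoff=cutoff)
    return match[0] if match else letters


def spellcheck_filter(phrase: str, vocab: list[str]) -> str:
    """Normalize each whitespace-separated token before the phrase is scored."""
    return " ".join(normalize_token(t, vocab) for t in phrase.split())


print(spellcheck_filter("You are an idiiot", VOCAB))  # you are an idiot
print(spellcheck_filter("What a mor.on", VOCAB))      # what a moron
```

This sketch undoes repeated letters and inserted punctuation, but not inserted spaces such as “id iot”, which split one word into two tokens; an attacker could also target the spellchecker itself, which is one reason the researchers pair it with adversarial training and temporary user blocking.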
“Our Network Security Lab research is typically focused on the foundations and science of cybersecurity,” said Poovendran, the lead principal investigator of a recently awarded MURI grant, of which adversarial machine learning is a significant component. “But our expanded focus includes developing robust and resilient systems for machine learning and reasoning systems that need to operate in adversarial environments for a wide range of applications.”