Researchers at the machine learning lab OpenAI have discovered that their state-of-the-art computer vision system can be fooled by tools no more sophisticated than a pen and a pad of paper. As the image above shows, simply writing the name of one object and sticking that label on another can trick the software into misidentifying what it sees.
“We refer to these attacks as typographic attacks,” the OpenAI researchers write in a blog post. “By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.” They note that such attacks are similar to “adversarial images” that can fool commercial machine vision systems, but are far easier to create.
Adversarial images pose a real danger to systems that rely on machine vision. Researchers have shown, for example, that they can trick the software in Tesla’s self-driving cars into changing lanes without warning simply by placing certain stickers on the road. Such attacks are a serious threat to a range of AI applications, from medical to military.
However, the danger posed by this particular attack is nothing to worry about, at least for now. The OpenAI software in question is an experimental system named CLIP that has not been deployed in any commercial product. Indeed, it is the unusual nature of CLIP’s machine learning architecture that creates the weakness these attacks exploit.
CLIP is intended to explore how AI systems might learn to identify objects without close supervision, by training on huge databases of paired images and text. In this case, OpenAI used some 400 million image–text pairs scraped from the internet to train CLIP, which was released in January.
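The article doesn’t spell out the mechanics, but the basic idea behind CLIP-style classification can be sketched in a few lines: the model embeds an image and a set of candidate captions into the same vector space, then picks the caption whose embedding is closest to the image’s. The toy 4-dimensional vectors and the `zero_shot_classify` helper below are hypothetical stand-ins (real CLIP embeddings are learned and much higher-dimensional), meant only to illustrate the matching step.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: how closely two embedding vectors point the same way.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def zero_shot_classify(image_emb, text_embs, labels):
    # Score the image against every candidate caption; the best match wins.
    scores = [cosine(image_emb, t) for t in text_embs]
    return labels[int(np.argmax(scores))], scores

# Toy 4-d embeddings standing in for CLIP's learned vectors (hypothetical).
labels = ["a photo of an apple", "a photo of an iPod"]
text_embs = [np.array([1.0, 0.0, 0.2, 0.0]),   # "apple" concept direction
             np.array([0.0, 1.0, 0.0, 0.2])]   # "iPod" concept direction
image_emb = np.array([0.9, 0.1, 0.2, 0.0])     # an apple photo

label, scores = zero_shot_classify(image_emb, text_embs, labels)
print(label)  # the apple caption wins
```

Because the candidate captions are just strings, the same trained model can be pointed at entirely new categories without retraining, which is what “without close supervision” buys you.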
This month, OpenAI researchers published a new paper describing how they opened up CLIP to see how it works. They discovered what they call “multimodal neurons”: individual components in the machine learning network that respond not only to images of objects but also to sketches, cartoons, and associated text. One reason this is exciting is that it seems to mirror how the human brain reacts to stimuli, where single brain cells have been observed responding to abstract concepts rather than specific instances. OpenAI’s research suggests it may be possible for AI systems to internalize such knowledge the same way humans do.
That could lead to more sophisticated vision systems in the future, but for now the approach is in its infancy. While any human can tell the difference between an apple and a piece of paper with the word “apple” written on it, software like CLIP can’t. The same ability that lets the program link words and images at an abstract level creates this unique weakness, which OpenAI describes as the “fallacy of abstraction.”
Another example the lab gives is the neuron in CLIP that identifies piggy banks. This component responds not only to pictures of piggy banks but also to strings of dollar signs. As in the example above, that means you can fool CLIP into identifying a chainsaw as a piggy bank by overlaying it with “$$$” strings, as if it were half-price at your local hardware store.
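In the toy embedding picture, the chainsaw-to-piggy-bank trick can be sketched as follows: because the model reads text in the image and folds it into the same shared embedding, pasting “$$$” onto a photo adds a money-flavored component that drags the image embedding toward the “piggy bank” caption. All vectors here, including the `dollar_sign_text` direction, are hypothetical illustrations, not anything measured from CLIP.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

labels = ["a chainsaw", "a piggy bank"]
text_embs = [np.array([1.0, 0.0, 0.0]),   # "chainsaw" concept direction
             np.array([0.0, 1.0, 0.0])]   # "piggy bank" concept direction

def classify(image_emb):
    # Pick the caption whose embedding best matches the image's.
    return labels[int(np.argmax([cosine(image_emb, t) for t in text_embs]))]

# A clean chainsaw photo embeds close to the chainsaw concept.
chainsaw_img = np.array([0.9, 0.1, 0.1])

# Pasting "$$$" adds a money-associated text signal that the model reads
# and mixes into the image embedding (hypothetical direction and weight).
dollar_sign_text = np.array([0.0, 0.9, 0.3])
attacked_img = chainsaw_img + 2.0 * dollar_sign_text

print(classify(chainsaw_img))   # "a chainsaw"
print(classify(attacked_img))   # "a piggy bank" — the typographic attack wins
```

The attack works precisely because reading text is a strength of the model: the same multimodal pathway that lets a neuron respond to both piggy-bank photos and dollar signs lets written text overwhelm the visual evidence.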
The researchers also found that CLIP’s multimodal neurons encode exactly the sort of bias you might expect to find when sourcing data from the internet. They note that a “Middle East” neuron is also associated with terrorism, and that they found “neurons that fire for both dark-skinned people and gorillas.” This replicates the notorious error in Google’s image recognition system, which tagged Black people as gorillas. It’s yet another example of how different machine intelligence is from our own, and of why dissecting these systems to understand how they work is necessary before we trust our lives to AI.