When Google released its stand-alone Photos app in May 2015, people were wowed by what it could do: analyze images to label the people, places and things in them, an astounding consumer offering at the time. But a couple of months after the release, a software developer, Jacky Alciné, discovered that Google had labeled photos of him and a friend, who are both Black, as “gorillas,” a term that is particularly offensive because it echoes centuries of racist tropes.
In the ensuing controversy, Google prevented its software from categorizing anything in Photos as gorillas, and it vowed to fix the problem. Eight years later, with significant advances in artificial intelligence, we tested whether Google had resolved the issue, and we looked at comparable tools from its competitors: Apple, Amazon and Microsoft.
There was one member of the primate family that Google and Apple were able to recognize – lemurs, the permanently startled-looking, long-tailed animals that share opposable thumbs with humans, but are more distantly related than are apes.
Google’s and Apple’s tools were clearly the most sophisticated when it came to image analysis.
Yet Google, whose Android software underpins most of the world’s smartphones, has made the decision to turn off the ability to visually search for primates for fear of making an offensive mistake and labeling a person as an animal. And Apple, with technology that performed similarly to Google’s in our test, appeared to disable the ability to look for monkeys and apes as well.
Consumers may not need to frequently perform such a search – though in 2019, an iPhone user complained on Apple’s customer support forum that the software “can’t find monkeys in photos on my device.” But the issue raises larger questions about other unfixed, or unfixable, flaws lurking in services that rely on computer vision – a technology that interprets visual images – as well as other products powered by AI.
Alciné was dismayed to learn that Google has still not fully solved the problem and said society puts too much trust in technology.
“I’m going to forever have no faith in this AI,” he said.
Computer vision products are now used for tasks as mundane as sending an alert when there is a package on the doorstep, and as weighty as navigating cars and finding perpetrators in law enforcement investigations.
Errors can reflect racist attitudes among those encoding the data. In the gorilla incident, two former Google employees who worked on this technology said the problem was that the company had not put enough photos of Black people in the image collection that it used to train its AI system. As a result, the technology was not familiar enough with darker-skinned people and mistook them for gorillas.
As AI becomes more embedded in our lives, it is eliciting fears of unintended consequences. Although computer vision products and AI chatbots like ChatGPT are different, both depend on underlying reams of data that train the software, and both can misfire because of flaws in the data or biases incorporated into their code.
Microsoft recently limited users’ ability to interact with a chatbot built into its search engine, Bing, after it instigated inappropriate conversations.
Microsoft’s decision, like Google’s choice to prevent its algorithm from identifying gorillas altogether, illustrates a common industry approach – to wall off technology features that malfunction rather than fixing them.
“Solving these issues is important,” said Vicente Ordóñez, a professor at Rice University who studies computer vision. “How can we trust this software for other scenarios?”
Michael Marconi, a Google spokesperson, said Google had prevented its photo app from labeling anything as a monkey or ape because it decided the benefit “does not outweigh the risk of harm.”
Apple declined to comment on users’ inability to search for most primates on its app.
Representatives from Amazon and Microsoft said the companies were always seeking to improve their products.
Bad vision
When Google was developing its photo app, it collected a large number of images to train the AI system to identify people, animals and objects.
Its significant oversight – that there were not enough photos of Black people in its training data – caused the app to later malfunction, two former Google employees said. The company failed to uncover the “gorilla” problem back then because it had not asked enough employees to test the feature before its public debut, the former employees said.
Google profusely apologized for the gorilla incident, but it was one of a number of episodes in the wider tech industry that have led to accusations of bias.
Other products that have been criticized include HP’s facial-tracking webcams, which could not detect some people with dark skin, and the Apple Watch, which, according to a lawsuit, failed to accurately read blood oxygen levels across skin colors. The lapses suggested that tech products were not being designed for people with darker skin. (Apple pointed to a paper from 2022 that detailed its efforts to test its blood oxygen app on a “wide range of skin types and tones.”)
Years after the Google Photos error, the company encountered a similar problem with its Nest home-security camera during internal testing, according to a person familiar with the incident who worked at Google at the time. The Nest camera, which used AI to determine whether someone on a property was familiar or unfamiliar, mistook some Black people for animals. Google rushed to fix the problem before users had access to the product, the person said.
However, Nest customers continue to complain on the company’s forums about other flaws. In 2021, a customer received alerts that his mother was ringing the doorbell but found his mother-in-law instead on the other side of the door. When users complained that the system was mixing up faces they had marked as “familiar,” a customer support representative in the forum advised them to delete all of their labels and start over.
Marconi, the Google spokesperson, said that “our goal is to prevent these types of mistakes from ever happening.” He added that the company had improved its technology “by partnering with experts and diversifying our image data sets.”
In 2019, Google tried to improve a facial-recognition feature for Android smartphones by increasing the number of people with dark skin in its data set. But the contractors whom Google had hired to collect facial scans reportedly resorted to a troubling tactic to compensate for that dearth of diverse data: They targeted homeless people and students. Google executives called the incident “very disturbing” at the time.
The fix?
While Google worked behind the scenes to improve the technology, it never allowed users to judge those efforts.
Margaret Mitchell, a researcher and co-founder of Google’s Ethical AI group, joined the company after the gorilla incident and collaborated with the Photos team. She said in a recent interview that she was a proponent of Google’s decision to remove “the gorillas label, at least for a while.”
“You have to think about how often someone needs to label a gorilla versus perpetuating harmful stereotypes,” Mitchell said. “The benefits don’t outweigh the potential harms of doing it wrong.”
Ordóñez, the professor, speculated that Google and Apple could now be capable of distinguishing primates from humans, but that they didn’t want to enable the feature given the possible reputational risk if it misfired again.
Google has since released a more powerful image analysis product, Google Lens, a tool to search the web with photos rather than text. Wired discovered in 2018 that the tool was also unable to identify a gorilla.
These systems are never foolproof, said Mitchell, who is no longer working at Google. Because billions of people use Google’s services, even rare glitches that happen to only one person out of a billion users will surface.
“It only takes one mistake to have massive social ramifications,” she said, referring to it as “the poisoned needle in a haystack.”
This article originally appeared in The New York Times.