Most users cannot identify AI racial bias—even in training data

When recognizing faces and emotions, artificial intelligence (AI) can be biased, for example by classifying white people as happier than people from other racial backgrounds. This can happen when the data used to train the AI contains a disproportionate number of happy white faces, leading the system to correlate race with emotional expression.
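To make that mechanism concrete, the sketch below (a hypothetical toy example in Python, not the system used in the study) trains a simple classifier on data in which race and emotion are confounded; the feature encoding, proportions and numbers are invented for illustration only.

```python
# Minimal illustrative sketch: a toy emotion classifier trained on data where
# race and emotion are confounded. All values here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Features: [race_indicator, smile_intensity]
# race_indicator: 1 = white, 0 = Black (toy encoding for illustration)
# In the skewed training set, most "happy" examples are white faces and most
# "sad" examples are Black faces, so race correlates with the label.
happy = rng.random(n) < 0.5                       # true emotion label
race = np.where(happy, rng.random(n) < 0.9,       # 90% of happy faces white
                        rng.random(n) < 0.1)      # 10% of sad faces white
smile = happy + rng.normal(0, 1.0, n)             # noisy "real" emotion signal
X = np.column_stack([race.astype(float), smile])

clf = LogisticRegression().fit(X, happy)

# The model puts substantial weight on the race feature: it has learned race
# as a shortcut for emotion, the kind of unintended correlation described here.
print("weight on race feature: ", clf.coef_[0][0])
print("weight on smile feature:", clf.coef_[0][1])
```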
In a recent study, published in the journal Media Psychology, researchers asked users to assess such skewed training data, but most users didn't notice the bias unless they were in the negatively portrayed group.
The study was designed to examine whether laypersons understand that unrepresentative data used to train AI systems can result in biased performance. The scholars, who have been studying this issue for five years, said AI systems should be trained so they "work for everyone," and produce outcomes that are diverse and representative for all groups, not just one majority group. According to the researchers, that includes understanding what AI is learning from unanticipated correlations in the training data—or the datasets fed into the system to teach it how it is expected to perform in the future.
"In the case of this study, AI seems to have learned that race is an important criterion for determining whether a face is happy or sad," said senior author S. Shyam Sundar, Evan Pugh University Professor and director of the Center for Socially Responsible Artificial Intelligence at Penn State. "Even though we don't mean for it to learn that."
The question is whether humans can recognize this bias in the training data. According to the researchers, most participants in their experiments only began to notice bias when the AI showed biased performance, such as misclassifying the emotions of Black individuals while accurately classifying the emotions expressed by white individuals. Black participants were more likely to suspect that something was wrong, especially when the training data over-represented their own group in the negative emotion (sadness) category.
"In one of the experiment scenarios—which featured racially biased AI performance—the system failed to accurately classify the facial expression of the images from minority groups," said lead author Cheng "Chris" Chen, an assistant professor of emerging media and technology at Oregon State University who earned her doctorate in mass communications from the Donald P. Bellisario College of Communications at Penn State. "That is what we mean by biased performance in an AI system where the system favors the dominant group in its classification."
Chen, Sundar and co-author Eunchae Jang, a doctoral student in mass communications at the Bellisario College, created 12 versions of a prototype AI system designed to detect users' facial expressions. With 769 participants across three experiments, the researchers tested how users might detect bias in different scenarios. The first two experiments included participants from a variety of racial backgrounds with white participants making up most of the sample. In the third experiment, the researchers intentionally recruited an equal number of Black and white participants.
Images used in the studies were of Black and white individuals. The first experiment showed participants training data in which race was unevenly distributed across the classification categories: happy faces were mostly white, and sad faces were mostly Black.
The second experiment showed bias stemming from inadequate representation of certain racial groups in the training data; for example, participants saw only images of white subjects in both the happy and sad categories.
In the third experiment, the researchers presented the stimuli from the first two experiments alongside their counterexamples, resulting in five conditions: happy Black/sad white; happy white/sad Black; all white; all Black; and no racial confound, meaning there was no potential mixing of emotion and race.
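For reference, the five conditions can be summarized as a breakdown of training images by emotion category and race. In this sketch the image counts are hypothetical placeholders; only the five condition labels come from the study.

```python
# Hypothetical summary of the five training-data conditions in experiment 3.
# Image counts are invented placeholders, not figures reported in the study.
CONDITIONS = {
    "happy_Black_sad_white": {"happy": {"Black": 90,  "white": 10},
                              "sad":   {"Black": 10,  "white": 90}},
    "happy_white_sad_Black": {"happy": {"Black": 10,  "white": 90},
                              "sad":   {"Black": 90,  "white": 10}},
    "all_white":             {"happy": {"Black": 0,   "white": 100},
                              "sad":   {"Black": 0,   "white": 100}},
    "all_Black":             {"happy": {"Black": 100, "white": 0},
                              "sad":   {"Black": 100, "white": 0}},
    "no_racial_confound":    {"happy": {"Black": 50,  "white": 50},
                              "sad":   {"Black": 50,  "white": 50}},
}
```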
For each experiment, the researchers asked participants whether they perceived that the AI system treated every racial group equally. Across the three experiments, most participants indicated that they did not notice any bias. In the final experiment, Black participants were more likely than their white counterparts to identify the racial bias, and often only when it involved unhappy images of Black people.
"We were surprised that people failed to recognize that race and emotion were confounded, that one race was more likely than others to represent a given emotion in the training data—even when it was staring them in the face," Sundar said. "For me, that's the most important discovery of the study."
Sundar added that the research was more about human psychology than technology. He said people often "trust AI to be neutral, even when it isn't."
Chen said that people's inability to detect the racial confound in the training data leads them to rely on the AI system's performance when evaluating it.
"Bias in performance is very, very persuasive," Chen said. "When people see racially biased performance by an AI system, they ignore the training data characteristics and form their perceptions based on the biased outcome."
Plans for future research include developing and testing better ways to communicate bias inherent in AI to users, developers and policymakers. The researchers said they hope to continue studying how people perceive and understand algorithmic bias by focusing on improving media and AI literacy.
More information: Cheng Chen et al, Racial Bias in AI Training Data: Do Laypersons Notice?, Media Psychology (2025).
Provided by Pennsylvania State University