The technology that powers the nation’s leading automated speech recognition systems makes twice as many errors when interpreting words spoken by African Americans as when interpreting the same words spoken by Whites, according to a new study led by engineering researchers at Stanford University.
Researchers tested automated speech recognition systems developed by five companies: Amazon, IBM, Google, Microsoft, and Apple. All five speech recognition technologies had error rates that were almost twice as high for Blacks as for Whites – even when the speakers were matched by gender and age and when they spoke the same words. On average, the systems misunderstood 35 percent of the words spoken by Blacks but only 19 percent of those spoken by Whites. Error rates were highest for African American men, and the disparity was greater among speakers who made heavier use of African American Vernacular English.
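For readers unfamiliar with how such error rates are measured, speech recognition accuracy is conventionally reported as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. The sketch below is illustrative only, not the study's code, and simply shows the standard WER calculation and the arithmetic behind the "almost twice as high" comparison.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard dynamic-programming edit distance, counted over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution plus one deletion in a three-word reference.
print(wer("hello world today", "hello word"))  # 0.666...

# The study's headline figures are average WERs of 0.35 vs. 0.19,
# a ratio of roughly 1.8 -- "almost twice as high."
print(round(0.35 / 0.19, 2))
```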
Why is this important? The researchers note that many companies now screen job applicants with automated online interviews that employ speech recognition. Courts use the technology to help transcribe hearings. In the COVID-19 environment, where telemedicine has become increasingly important, speech recognition software may impact the accuracy of information healthcare providers receive from patients.
“One should expect that U.S.-based companies would build products that serve all Americans,” said study lead author Allison Koenecke, a doctoral candidate in computational and mathematical engineering who teamed up with linguists and computer scientists on the work. “Right now, it seems that they’re not doing that for a whole segment of the population.”
The full study, “Racial Disparities in Automated Speech Recognition,” was published in the Proceedings of the National Academy of Sciences.
Note: The tests were conducted last spring, and the speech technologies may have been updated since then.