Jan 12, 2023 | Read time 4 min

Speech Technology in a Global World: Reducing Inequalities Across Languages

Speechmatics’ Data Science Engineer, Benedetta Cevoli, looks at why speech technology still has some distance to travel in reducing inequality across different languages.
Benedetta Cevoli, Senior Machine Learning Engineer

Speech technology already plays a huge part in our everyday lives, from common applications on our phones and computers to unseen uses in customer services and advertising. As artificial intelligence continues to make huge leaps in everything from structuring materials to content creation, it won’t be long before voice technology plays an even bigger role in our lives, including in areas like law and healthcare. And while it might not be a matter of life and death if Alexa plays you the wrong song, in a health setting, it very well could be.

Speech Technology Today

Recent advancements in speech technology have been hugely impressive. AI-led speech-to-text is now the only real choice for transcription at scale, with human-led transcription being prohibitively expensive and time-consuming. But are we really at the stage where we can put our hands up and say, “The problem is solved”?

Speech recognition is an incredible technology that we’ll rely on more and more in the future. Yet it can also be a barrier, and as a non-native speaker, it’s one I’m all too aware of. I’m originally from Italy but have been living in the UK for several years. A few years ago, my partner and I bought our first smart speaker. We usually speak Italian at home, so we excitedly started to interact with it in Italian, our native language. We quickly switched to English. We didn’t switch because we were more comfortable with it; we switched because the speaker didn’t work for us in Italian. With our accents, it didn’t work well for us in English either, but it was the better of the two options.

For the Few, Not the Many?

In the past few years, research has shown that language, accent, race, gender and age are the main factors that influence the accuracy of speech recognition. Researchers at Stanford found that speech-to-text systems made errors for Black speakers roughly twice as often as for White speakers. Another study reported robust differences in accuracy across both gender and dialect, with lower accuracy for women and for speakers from Scotland.

It’s worth noting at this stage that results for accuracy are complicated. After all, our voices are extremely rich and unique; no voice is like any other. But any sort of barrier, any digital divide with unequal access to digital technologies, deserves dissection. As Halcyon Lawrence, an assistant professor of technical communication and information design at Towson University, told Claudia Lopez Lloreda in a piece for Scientific American: “I don’t get to negotiate with these devices unless I adapt my identity”.

This is simply not inclusive. Why should some people have to adapt their own voices and others not? Why should some get inferior results and others not?

A Deprivation of Data

It’s an issue that reaches beyond the speech recognition world. English, along with a handful of other languages, is generally the focus of today’s language technologies. Despite there being over 6,500 languages in the world today, only a small number are systematically represented in academia and industry. As a result, the near-human results reported for language translation and understanding usually apply to just a few languages. The vast majority of languages fall far below such standards.

Modern deep learning systems are data-hungry: they rely on enormous amounts of data for accuracy. This is problematic for languages for which only a limited amount of data is available. Without the data to drive accuracy, well-resourced languages will continue to improve while others won’t. The inequality gap will grow.

Exclusive vs Inclusive

There’s a vast difference between speech-to-text working for some people, most of the time, and for all people, all of the time. At Speechmatics, we’re battling hard to make the latter a reality. We strongly believe speech technology must help us interact with the digital world fairly. Until it works for everyone all the time, true fairness is a target, not an accomplishment.

We currently support 50 languages, covering over half of the world’s population with leading, consistent accuracy that doesn’t depend on the language spoken. But we’re not stopping here. As we continue to move forward, expand our coverage, and improve our technology, we’ll keep pushing the limits of what inclusivity means for commercially ready speech recognition.

Benedetta Cevoli, Data Science Engineer, Speechmatics
