Dec 8, 2020 | Read time 4 min

Solving the problem of accents for speech recognition languages

Global Spanish solves the accent problem for speech recognition languages by supporting all major Spanish accents and dialects for use in transcription.
Speechmatics Editorial team

The challenge when it comes to global languages

With approximately 500 million speakers globally, Spanish is the second most natively spoken language in the world – and the fourth most spoken language overall. But its global reach and diversity of accents and dialects mean Spanish poses a significant challenge for consistent and accurate speech-to-text transcription.

To get accurate transcripts from Spanish speakers, speech-to-text technology providers usually create multiple Spanish language packs, each specializing in a specific region or speaker profile. But, in the real world, audio files often include more than one speaker from multiple regions, all with different accents, dialects and idiosyncrasies.

Deploying accent-specific language packs requires organizations to make a best guess as to the appropriate language pack to use for each audio file. It also requires them to host and store multiple language packs for one language – adding to operational complexities and costs for what should be an efficient, automated process and workflow.

There is also the costly and time-consuming problem of running a transcription multiple times through accent-specific language packs when a single audio file contains speakers with different Spanish accents. In an interview between a Mexican speaker and a speaker from Spain, for example, two transcriptions would need to be run to get the best accuracy for each speaker – one using a Mexican-Spanish language model and one using a European-Spanish model. If only the Mexican-Spanish model were used, the speaker from Spain might not be recognized well.
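The cost difference can be made concrete with a small sketch. It assumes one transcription pass per accent-specific pack; the pack labels (`es-MX`, `es-ES`) and both helper functions are hypothetical illustrations, not Speechmatics identifiers.

```python
def passes_with_accent_packs(speaker_accents):
    """Assumed workflow: one transcription pass per distinct accent in the file."""
    return len(set(speaker_accents))


def passes_with_global_pack(speaker_accents):
    """A single pack covers every speaker, so one pass whenever there is audio."""
    return 1 if speaker_accents else 0


# Hypothetical interview: one Mexican speaker and one speaker from Spain.
interview = ["es-MX", "es-ES"]

print(passes_with_accent_packs(interview))  # 2
print(passes_with_global_pack(interview))   # 1
```

Under these assumptions, the number of passes grows with the number of accents in the file for accent-specific packs, while it stays at one for a single global pack.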

A new machine learning approach to dealing with accent and dialect variations

The Speechmatics approach is different – we are the first company to do away with creating multiple language packs for different accents and dialects. We build our language packs with our Automatic Linguist (AL) machine learning framework. AL was a winner in the Innovation category of the 2019 Queen's Awards for Enterprise.

We started by creating a pioneering Global English language pack encompassing all major English accents and dialects. We then turned our attention to Spanish and created a Global Spanish language pack.

The benefits of the Speechmatics Global Spanish language pack

Our unique approach involves using machine learning to create a single, comprehensive language pack, accurately encompassing as many variations of Spanish as possible. For most real-world applications, this gives the most reliable, accurate and efficient performance for our customers and partners.

Our single language pack solution means users do not need to identify which Spanish variant is being spoken. When audio files feature multiple speakers with different accents – or where speaker accents are not known in advance – Global Spanish provides reliable results over a broader range of speakers.
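For an integrator, this means the transcription request carries a language but no accent choice. The sketch below builds such a request body; the field names (`type`, `transcription_config`, `language`), the `es` code and the `build_job_config` helper are assumptions made for illustration and should be checked against the current Speechmatics API documentation.

```python
import json


def build_job_config(language="es"):
    """Build a request body with a language only; no accent or region field (assumed shape)."""
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
    }


config = build_job_config()
print(json.dumps(config, indent=2))
```

The same body would be reused for every Spanish audio file, whether or not the speakers' accents are known in advance.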

In addition, by focusing resources on maintaining and updating fewer language packs, Speechmatics can increase quality, improve accuracy and ensure reliability for our customers and partners.

Global Spanish in the real world

A survey we conducted in 2019 found that Spanish and English are the most important languages for the contact center industry specifically.

As brands look to grow their reach, they also have to meet customer expectations and optimize the customer experience to drive loyalty and reduce churn. This means delivering localized and personalized services to those customers. The ability to use any-context speech recognition technology to transcribe Spanish accurately enables contact centers to use voice data to improve customer experiences and empower agents.

A 2016 survey by ICMI discovered that 57% of customers expect the service from their contact center to be in their native language – as opposed to the primary language of the contact center.

How we are innovating to deliver better performance across more speech recognition languages

Speech-to-text technology has advanced hugely in recent years, delivering step-change improvements in a field used to marginal gains. In particular, modern deep neural network architectures, which stack multiple layers between input and output, are capable of generalizing across variations in speech. That generalization effectively gives us the performance of a variety of specialized models, all in one comprehensive language pack.

A single modern server is more powerful than the room-filling supercomputers of the past. This rise in compute power, coupled with the repurposing of GPUs for machine learning, allows Speechmatics to train models on more data, capable of supporting more variations in a single language pack.

By investing more time in gathering data from a wide range of sources, we have created a huge and diverse training corpus. This allows us to train models with a much wider range of applications than ever before. Speechmatics is already delivering leading levels of accuracy in speech recognition.

We are also investing in research and development to find new ways of solving problems to help our customers and partners innovate with voice. These approaches will deliver even better levels of accuracy across more speech recognition languages while making it easier to operate the Speechmatics solution.
