May 21, 2021 | Read time 4 min

Speech recognition challenges and how to overcome them

Find out about the challenges in speech recognition and how the Speechmatics team has been able to overcome them to provide better voice tech.
Speechmatics Editorial team

Accuracy has been one of the main speech recognition challenges for many years – and a barrier to entry for many businesses. Historically, the technology hasn’t been considered good enough to adopt as an integral part of a workflow and technology stack. But that is simply not true anymore. Voice technology has now improved to a point at which the output for the most spoken languages in the world – such as English, French, Spanish and German – is highly accurate in terms of word error rate (WER).

So, what other challenges are affecting the future of speech recognition? And why is accuracy still a problem? These are the barriers highlighted by respondents to a survey as part of the Speechmatics report on Trends and Predictions for Voice Technology in 2021:

1. Accuracy

These days, accuracy refers to more than just the accuracy of the word output – the WER. Many other factors affect the level of accuracy on a case-by-case basis. These factors are often unique to a use case or a particular business need and include:

  • Background noise

  • Punctuation placement

  • Capitalization

  • Correct formatting

  • Timing of words

  • Domain-specific terminology

  • Speaker identification
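As a rough illustration of why punctuation and capitalization matter, the sketch below computes WER as the word-level edit distance divided by the number of reference words, once on raw text and once after a simple normalization. The function names `word_error_rate` and `normalize`, and the sample sentences, are illustrative assumptions and not part of any Speechmatics tooling.

```python
import string


def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation (a deliberately simple rule)."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference transcript and recognizer output.
    reference = "Hello, Dr. Smith. How are you?"
    hypothesis = "hello doctor smith how are you"

    raw = word_error_rate(reference, hypothesis)
    cleaned = word_error_rate(normalize(reference), normalize(hypothesis))
    print(f"WER on raw text:        {raw:.2f}")
    print(f"WER on normalized text: {cleaned:.2f}")
```

In this example the raw comparison counts five of the six reference words as errors, while the normalized comparison counts only one ("Dr." against "doctor"). Whether punctuation and casing should count toward the score depends on the use case, which is one reason a single WER figure rarely tells the whole story.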

2. Data security and privacy

The past year has seen a huge increase in concerns about data security and privacy – from 5% to 42% in the Speechmatics survey. This could be due to mistrust following media portrayal of ‘data-hungry’ tech giants. It could also be a result of more day-to-day conversations happening online when the coronavirus pandemic led to an explosion in remote working.

3. Deployment

Deploying and integrating voice technology – or any software, for that matter – needs to be simple. Whether a business requires deployment on-premises, in the cloud, or embedded, integration needs to be easy to do and secure. Without the appropriate support or documentation, integrating software can be time-consuming and expensive. It is, therefore, important for technology providers to make their deployments and integrations as frictionless as possible to avoid this barrier to adoption.

4. Language coverage

Many of the leading voice technology providers have a gap when it comes to language coverage. Most providers cover English, but for global businesses that want to use voice technology, limited coverage of other languages is a barrier to adoption. When providers do offer more languages, accuracy is often still an issue when it comes to accent or dialect recognition. What happens when an American is speaking with a British person, for example? Which accent variation should be used? Global language packs, which encompass a variety of accents within a single language, address the problem.

What are the likely speech recognition challenges in the next 5-10 years?

[Figure: Risks for speech recognition technology in the next 10 years]

Overcoming the speech recognition challenges around data privacy

Data privacy will continue to be a concern in the future of speech recognition, according to 95% of survey respondents. But there will be ways to overcome data security issues:

1. On-premises deployment

On-premises deployment of voice technology enables users to keep their data secure within their own environments – with no need for data to go into the cloud. The software is often delivered as virtual appliances or containers that can be deployed into existing technology stacks with little effort. This is particularly important for industries such as banking, financial services and insurance, where compliance and regulatory issues mean customer data and voice data cannot leave their premises.

2. Dark site environments

Typically, when deploying an on-premises solution for voice technology, businesses are required to connect to the public internet for licensing. Dark site deployments support offline licensing instead, meaning all work is completed within an organization’s private environment. This delivers a more robust solution for compliance and data privacy needs.

3. Cloud deployment

Private cloud deployments are secure enough to keep data safe for many applications. If cloud deployment security is good enough for the business and use case needs, cloud deployment is often the preferred option due to lower operational cost and reduced complexity.

Want to know more about how to overcome speech recognition challenges? For more information – and the full survey results – download Trends and Predictions for Voice Technology in 2021.
