What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 55+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150 ms latency, making it ideal for real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, in a hybrid setup, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, legal, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

The Speechmatics approach to accent-independent speech recognition

Speechmatics has created an accent-independent speech recognition language pack enabling the accurate transcription of all major English accents.

A global approach to transcription

The days of having to adjust your accent so that speech recognition systems can understand you are well and truly over. You don't even have to choose between American English and British English in a drop-down menu of speech recognition languages. Our Global English language pack for speech-to-text transcription encompasses all major English accents and dialects.

Trained on thousands of hours of spoken data from more than 40 countries – and tens of billions of words drawn from global sources – Global English is the result of our unique accent-independent approach to creating world-class speech-to-text technology. We use The Automatic Linguist, our machine learning framework that can learn new languages quickly. The technology was a winner in the Innovation category of the 2019 Queen's Awards for Enterprise.

How we harnessed machine learning to create the Global English language pack

As an industry pioneer, Speechmatics has taken advantage of recent advances in machine learning and applied proprietary language training techniques – allowing a more universal, accent-independent and any-context approach to speech recognition languages than has been possible until now.

Speech recognition has advanced hugely in recent years, giving step-change improvements in a field used to marginal gains. In particular, modern neural network architectures are capable of generalizing across variations in speech by using representation learning. Deep neural networks feature multiple layers between input and output, allowing Speechmatics to filter out everything but the phonetics. This effectively gives us the performance of a variety of specialized models, all in one comprehensive language pack.

In addition, a single modern server is more powerful than the room-filling supercomputers of old. This astonishing rise in compute power, coupled with the repurposing of GPUs from playthings of gamers into serious computing machines, allows us to train models on more data, capable of supporting more variations.

Investing in new ways of solving problems with high levels of speech-to-text accuracy

By investing more time in gathering data from a wide range of sources, we have created a huge and diverse training corpus – allowing us to train models with a much wider range of applications than ever before.

Traditionally, speech recognition, like other machine learning and big data products, has relied on huge amounts of labeled data to achieve real-world results. This approach works well in the short to medium term, but as results keep getting better, it will become harder to drive further improvements from training that relies on big data alone. Small data becomes increasingly important for supporting precise and focused use cases, because data also needs to be more representative to deliver exceptional levels of accuracy.

Reaching the next stage of improvements will require investing significant amounts of time in collecting and labeling data for machine learning. While Speechmatics is already delivering leading levels of accuracy, we are also investing in new ways of solving problems to enable high levels of accuracy without an ever-growing, unsustainable commitment to labeling data.

One accent-independent speech recognition model to rule them all

We have compared our Global English model with those of other providers of speech-to-text technology for the most common English accents. Test sets comprised diverse audio and transcribed text, with accented test files including variations in gender, age and region. In every case, our Global English language pack produced a more accurate transcription than our competitors' variant-specific language packs.
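
As a rough illustration of how a per-accent comparison like this could be scored, the sketch below computes word error rate (WER) for each accent group in a test set. The article does not say which metric or tooling was used, so WER is an assumption here, and the function names, accent labels and sample sentences are all hypothetical.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_accent(samples):
    """samples: iterable of (accent, reference, hypothesis) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for accent, reference, hypothesis in samples:
        err, n_words = word_errors(reference, hypothesis)
        errors[accent] += err
        words[accent] += n_words
    # Pool errors over all files of an accent before dividing.
    return {a: errors[a] / words[a] for a in errors if words[a] > 0}


# Hypothetical test files: (accent label, human transcript, engine output).
samples = [
    ("scottish", "turn the lights off", "turn the light off"),
    ("scottish", "what time is it", "what time is it"),
    ("indian", "book a table for two", "book table for two"),
]
result = wer_by_accent(samples)
for accent, wer in sorted(result.items()):
    print(f"{accent}: WER {wer:.1%}")
```

Running the same scoring over each provider's output for the same audio gives directly comparable per-accent figures; a lower WER means a more accurate transcription.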

Speechmatics is committed to undertaking regular comparisons against other providers, with frequent testing and benchmarking to ensure we provide the best automatic speech recognition on the market. By moving from multiple specialist speech recognition language packs to a more comprehensive, single language pack, we have streamlined our portfolio and maximized the resources available for Global English.

Fast, accurate, reliable and now more flexible, convenient and inclusive, Global English offers users speech recognition for the future.

Oct 30, 2020 | Read time 4 min
