The Speechmatics approach to accent-independent speech recognition

Speechmatics has created an accent-independent speech recognition language pack enabling the accurate transcription of all major English accents.

A global approach to transcription

The days of having to adjust your accent so that speech recognition systems can understand you are well and truly over. You don't even have to choose between American English and British English in a drop-down menu of speech recognition languages. Our Global English language pack for speech-to-text transcription encompasses all major English accents and dialects.

Trained on thousands of hours of spoken data from more than 40 countries, and on tens of billions of words drawn from global sources, Global English is the result of our accent-independent approach to creating speech-to-text technology. It is built with The Automatic Linguist, our machine learning framework, which can learn new languages quickly. The technology was a winner in the Innovation category of the 2019 Queen's Awards for Enterprise.

How we harnessed machine learning to create the Global English language pack

As an industry pioneer, Speechmatics has taken advantage of recent advances in machine learning and applied proprietary language training techniques – allowing a more universal, accent-independent and any-context approach to speech recognition languages than has been possible until now.

Speech recognition has advanced hugely in recent years, bringing step-change improvements to a field used to marginal gains. In particular, modern neural network architectures can generalize across variations in speech by using representation learning. Deep neural networks feature multiple layers between input and output, allowing Speechmatics to filter out everything but the phonetics. This effectively gives us the performance of a variety of specialized models, all in one comprehensive language pack.

In addition, a single modern server is more powerful than the room-filling supercomputers of old. This rise in compute power, coupled with the repurposing of GPUs from playthings of gamers into serious computing machines, allows us to train models on more data, capable of supporting more variations.

Investing in new ways of solving problems with high levels of speech-to-text accuracy

By investing more time in gathering data from a wide range of sources, we have created a huge and diverse training corpus – allowing us to train models with a much wider range of applications than ever before.

Traditionally, speech recognition, like other machine learning and big data products, has relied on huge amounts of labeled data to achieve real-world results. This approach works well in the short to medium term, but as results keep getting better, it will become harder to drive further improvements from training that relies on big data alone. Small data becomes increasingly important to support precise and focused use cases, because data also needs to be more representative to deliver exceptional levels of accuracy.

To reach the next stage of improvements, significant amounts of time will need to be invested in collecting and labeling data for machine learning. While Speechmatics is already delivering leading levels of accuracy, we are also investing in new ways of solving problems, so that high levels of accuracy do not depend on an ever-growing and unsustainable commitment to labeling data.

One accent-independent speech recognition model to rule them all

We have compared our Global English model with those of other providers of speech-to-text technology for the most common English accents. Test sets comprised diverse audio and transcribed text, with accented test files including variations in gender, age and region. In every case, our Global English language pack produced a more accurate transcription than our competitors' variant-specific language packs.
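The comparison above does not say which accuracy metric was used. As an illustration only, word error rate (WER) is a common way to score a transcript against a reference text: the number of word substitutions, deletions and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below assumes that metric; the function name `word_error_rate` is hypothetical and is not part of any Speechmatics API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words.

    Assumption: text is compared case-insensitively and split on whitespace.
    Real benchmarks usually apply more careful normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the transcript "the cat sat on a mat" against the reference "the cat sat on the mat" gives one substitution over six reference words, a WER of 1/6. A lower WER means a more accurate transcription.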

Speechmatics is committed to undertaking regular comparisons against other providers, with frequent testing and benchmarking to ensure we provide the best automatic speech recognition on the market. By moving from multiple specialist speech recognition language packs to a more comprehensive, single language pack, we have streamlined our portfolio and maximized the resources available for Global English.

Fast, accurate, reliable and now more flexible, convenient and inclusive, Global English offers users speech recognition for the future.

Oct 30, 2020 | Read time 4 min
