Feb 20, 2018 | Read time 3 min

Speechmatics launches Global English, an accent-agnostic language pack for speech-to-text transcription

When tested against providers of similar solutions, Global English (GE) consistently produced more accurate transcriptions. In direct comparisons, GE was between 3% and 55% more accurate than each of Google’s Cloud Speech API accent-specific language packs, and between 5% and 23% more accurate than IBM’s Cloud US English language pack*.

It supports every major English accent, providing a consistent and cost-effective solution.

Today, Speechmatics is announcing the launch of Global English, a single English language pack supporting all major English accents for use in speech-to-text transcription. Global English (GE) was trained on thousands of hours of spoken data from over 40 countries and tens of billions of words drawn from global sources, making it one of the most comprehensive and accurate accent-agnostic transcription solutions on the market.

Traditionally, speech recognition has dealt with variation in a language by producing a separate language pack for each distinct accent or region, each requiring a whole new set of models trained on data from that particular subset of speakers. With the launch of GE, Speechmatics aims to democratise speech-to-text transcription and overcome an industry-wide problem: recordings that contain multiple English accents. The result is a far more accurate, consistent and cost-effective solution.

Recent advances in speech recognition have made GE possible. The team has been gathering data from a wide range of sources and taking advantage of the rapid rise in computing power, allowing them to train bigger models, based on more data, capable of supporting more variation. Speechmatics has now built 72 unique languages, more than any other provider on the market, including Amazon, Google, Nuance, Microsoft and IBM. Because modern neural network architectures can generalise across variations in speech through representation learning, Speechmatics was able to deliver the accuracy of multiple specialised models in a single language pack.

Benedikt von Thüngen, CEO at Speechmatics, explained:

“At Speechmatics, we have historically produced North American, British and Australian versions of the English language packs, as well as domain-specific language packs. Applications include broadcast, compliance, speech analytics, call recording and meeting transcription, among others. While a traditional British language pack does indeed perform better on British-accented speech than, say, a traditional North American language pack would, there are still tens of distinct British accents to address. And so, we realised we needed to come up with what we like to call ‘One Model to Rule Them All’ – an accent-agnostic language pack that is just as accurate at transcribing an Australian accent as it is a Scottish one.”

Tom Ash, Speech Recognition Director at Speechmatics and winner of the ‘Speech Luminary’ award, commented:

“In the UK alone, there are about 56 main ‘accent types’, and the concept of having one language pack per accent or region is very outdated. Bearing in mind that we live in an increasingly connected and mobile world, we need our tech to reflect that. We’ve all heard stories about people being misunderstood by their personal voice assistants or closed captioning getting something awkwardly wrong. While a lot of these stories are humorous, they ultimately highlight a big issue. We’re hoping that Global English will inspire others to become more flexible and fair when it comes to people’s accents.”

*Test sets comprised approximately 4 hours of diverse audio and transcribed text. Accented test files included variations in gender, age and region. We know accuracy results are always dependent on the test set used; if you would like further details about our test set, please get in touch.
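For readers curious how relative-accuracy figures like those above are typically calculated: speech-to-text accuracy is usually measured by word error rate (WER), and one system's improvement over another is often reported as relative error reduction. The sketch below is illustrative only, with made-up transcripts, and is not Speechmatics' published benchmarking methodology.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Computed as word-level Levenshtein distance via dynamic programming.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


def relative_error_reduction(wer_a: float, wer_b: float) -> float:
    """Fraction by which system A's error rate is lower than system B's."""
    return (wer_b - wer_a) / wer_b


# Hypothetical example transcripts:
ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"  # two word substitutions
print(f"WER: {wer(ref, hyp):.3f}")  # 2 errors / 9 reference words ≈ 0.222

# A system at 10% WER vs. one at 20% WER is "50% more accurate" in
# relative terms:
print(f"Relative improvement: {relative_error_reduction(0.10, 0.20):.0%}")
```

Note that, as the footnote above says, any such figure depends heavily on the test set: accent mix, audio quality and domain all shift WER considerably.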
