Nov 3, 2022 | Read time 3 min

Product Release November 2022: Including Language Coverage for over Half the World’s Population

Join Speechmatics’ Product Marketing Manager, Paul Gordon, as he looks at our latest product release, including the introduction of new languages, the improvement of others and the release of our Real-Time SaaS.
Paul Gordon, Product Marketing Manager

This November, we're thrilled to bring you our latest updates and innovations. It’s been an exciting time for us, particularly in our language packs.

Our aim is to understand every voice, making speech-to-text more accessible and breaking down the barriers created by language. In line with this effort, we've added 14 new languages. Now, with 50 languages, we cover over half the world’s population.

Elsewhere, we've seen accuracy levels jump on 20 of our existing language packs, added new information to the JSON output to help customers generate text transcripts based on language properties, and we’ve launched our Real-Time SaaS.

Speechmatics Goes Even More Global: 14 New Languages

In our pursuit to understand every voice, we need to understand more languages. Until recently, our speech-to-text was available in 35 languages. Now we’ve taken a huge leap from 35 to 50, with an eye to adding even more in the future. Here are the additions: Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh.

With the inclusion of lesser-spoken languages such as Welsh (883,300 speakers) and Basque (900,000 speakers), we’re working to help preserve these cultures with our technology. Right now, this means Speechmatics’ speech-to-text covers 50% of the global population. It’s our biggest addition ever, and we’re aiming to reach 70% coverage in the next three years.

20 Languages Receive Improvements

While increasing the number of languages makes our technology more accessible, a focus on accuracy is essential as a differentiator in a competitive industry. To that end, we’ve updated 20 languages, as follows: Latvian, Swedish, Hungarian, Portuguese, Polish, Mandarin Chinese, Arabic, Dutch, Slovak, Bulgarian, Romanian, Slovenian, Lithuanian, Croatian, Malay, Catalan, Czech, Danish, Greek, and Turkish.

Our team’s efforts produced some huge upticks in accuracy including a 6.9% relative decrease in word error rate (WER) for Latvian, 6.6% for Swedish, 6.2% for Portuguese, and the same for Hungarian.
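A note on reading these figures: they are relative decreases in word error rate, not absolute percentage-point drops. A minimal sketch, using hypothetical baseline WERs for illustration:

```python
def relative_wer_decrease(old_wer: float, new_wer: float) -> float:
    """Relative (not absolute) reduction in word error rate."""
    return (old_wer - new_wer) / old_wer

# Hypothetical example: a language pack whose WER drops from 10.0% to 9.31%
# shows a 6.9% *relative* decrease, though the absolute drop is 0.69 points.
print(round(relative_wer_decrease(10.0, 9.31), 3))  # → 0.069
```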

We’ve also seen improved formatting of numeric entities such as dates, currencies, and large numbers for Swedish, Norwegian, and Dutch.

New Language Formatting

Another update this quarter is to our JSON output – now at version 2.8. It offers more detailed information about properties of the language being transcribed, such as writing direction and word delimiter.

Different properties of language – such as Arabic text (written right-to-left), or Mandarin Chinese, which has no spaces between words – have historically proven a challenge for speech-to-text. With this latest update, our engine now exposes this information about the language pack in the JSON output, helping customers handle the nuances that come with different writing systems around the world.
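To illustrate why exposing these properties matters, here is a sketch of consuming them from a transcript. The field names below mirror the language-property information described above (writing direction, word delimiter), but the exact schema shown is an illustrative assumption – consult the v2.8 release notes for the real output format:

```python
import json

# Illustrative transcript fragment, not the exact v2.8 schema.
raw = """
{
  "format": "2.8",
  "language_pack_info": {
    "language_description": "Mandarin",
    "word_delimiter": "",
    "writing_direction": "left-to-right"
  },
  "results": [
    {"alternatives": [{"content": "\\u4f60"}]},
    {"alternatives": [{"content": "\\u597d"}]}
  ]
}
"""

doc = json.loads(raw)
delimiter = doc["language_pack_info"]["word_delimiter"]
words = [r["alternatives"][0]["content"] for r in doc["results"]]

# Joining on the pack's own delimiter avoids hard-coding a space,
# which would produce wrong text for languages like Mandarin.
print(delimiter.join(words))  # → 你好
```

Because the delimiter travels with the transcript, the same rendering code works unchanged for space-delimited and non-space-delimited languages.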

At the recent IBC show in Amsterdam, we spoke to a number of customers about formatting issues. These improvements around delimiters will help them in a variety of areas, including compliance, reducing the manual work of checking spacing, and finer control over punctuation.

The Release of our Real-Time SaaS

After years of offering best-in-class on-premises Real-Time transcription, we’re exceptionally proud to launch our Real-Time SaaS offering. This low-risk, high-reward approach to speech-to-text offers a perfect balance of fast results and highly accurate output, deployed within a secure public cloud environment.

This release opens up our technology to businesses of every size, giving more users access to the fastest and most accurate speech-to-text currently available. You can see for yourself how easy it is to use and integrate by accessing our portal.
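To give a flavour of what integration looks like, here is a sketch of the kind of session-opening message a real-time speech-to-text client sends over a WebSocket before streaming audio. The message name and fields below are illustrative assumptions rather than the exact Speechmatics protocol – see the portal documentation for the real API:

```python
import json

def start_recognition_message(language: str, sample_rate: int = 16000) -> str:
    """Build an illustrative session-opening message for a real-time session."""
    return json.dumps({
        "message": "StartRecognition",
        "audio_format": {
            "type": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": sample_rate,
        },
        "transcription_config": {"language": language},
    })

# Hypothetical usage: open a Welsh ("cy") session, then stream audio chunks.
msg = json.loads(start_recognition_message("cy"))
print(msg["transcription_config"]["language"])  # → cy
```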

And Finally

Last but not least, we’ve sped up our Batch SaaS, reducing turnaround time by up to 75%. Transcribing 60 minutes of audio now typically takes under 5 minutes with our Standard model and under 10 minutes with Enhanced.

For more details on all of our updates, you can find release notes here. If you need any additional support on these or any of the above, please contact our Support team.

Paul Gordon, Product Marketing Manager, Speechmatics
