Mar 14, 2023 | Read time 4 min

Product Release March 2023: Uplifts in Accuracy, Translation, and Automatic Language Identification

Senior Product Manager, Owen O’Loan, introduces the world to Speechmatics’ latest product release, including the first look at our new Translation offering.
Owen O'Loan, Director of Engineering Operations

Our first release of the year is a big one – with a host of new features, updates, and improvements to our best-in-class speech API. This year, we’ll continue our mission to unlock human potential, increase the inclusivity of speech recognition engines, and lower bias through natural interaction with intelligent machines.

At the top of this release sit our accuracy uplifts, delivered by Ursa, our latest generation of speech recognition models.

With that in mind, join us as we introduce our GPU support, showcase how our product continues its journey to understand every voice, and explain more about our new Translation offering.

Ursa Generation Models

We have greatly improved the accuracy of our English language transcription. Using a broad range of test sets, we see an average 22% relative improvement for the Enhanced model and 35% relative improvement for the Standard model.
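By "relative improvement" we mean the reduction in word error rate (WER) compared with our previous generation of models. The short sketch below works through the calculation with hypothetical WER figures (illustrative only, not our benchmark numbers).

```python
# Illustrative only: how a relative WER improvement is calculated.
# The figures below are hypothetical, not Speechmatics benchmark results.
previous_wer = 10.0  # % word error rate of the previous model generation
new_wer = 7.8        # % word error rate of the new model generation

relative_improvement = (previous_wer - new_wer) / previous_wer * 100
print(f"Relative WER improvement: {relative_improvement:.0f}%")  # -> 22%
```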

Speechmatics' Ursa generation models have achieved this breakthrough in performance by shifting execution to GPUs, enabling significantly larger machine learning models to be used in production.

Introducing Translation

With Translation integrated into our single Speech API, users can now access Speechmatics' market-leading speech-to-text and Translation in one place. As of now, Speechmatics offers translated text to and from English across 34 supported languages, with start and end timings for sentences as well as speaker labeling.
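As a rough illustration of what that looks like in practice, here is a minimal sketch of a batch job that requests both transcription and translation. It assumes the Batch API's config object accepts a `translation_config` block with `target_languages` alongside `transcription_config`, and uses a placeholder API key and audio file; check the API reference for the exact fields available on your account.

```python
import json
import requests

API_KEY = "YOUR_API_KEY"  # placeholder

# Assumed config shape: transcription plus a translation_config block.
config = {
    "type": "transcription",
    "transcription_config": {"language": "en", "operating_point": "enhanced"},
    "translation_config": {"target_languages": ["es", "de"]},
}

# Submit the audio and config as a multipart batch job.
with open("meeting.wav", "rb") as audio:
    response = requests.post(
        "https://asr.api.speechmatics.com/v2/jobs",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"data_file": audio},
        data={"config": json.dumps(config)},
    )

print(response.json())  # returns a job id; poll the job until the transcript and translations are ready
```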

Accurate Translation is crucial for opening up global markets that businesses haven't previously been able to tap, in use cases such as Media Captioning, Meeting Platforms, and Contact Centers.

You can try our new Translation for free today in our portal. All Batch SaaS customers will have immediate, free access until 31st March 2023 through the existing Speech API.

Automatic Language Identification

Following swiftly on from the release of Language Identification last year, we've now introduced Automatic Language Identification as part of the Transcription API. Designed for customers working with audio where the language may not be known, it lets users transcribe audio in a single workflow without specifying the language in the configuration. With coverage for 44 languages, you won't need to tell us what the language is; we'll tell you.
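In configuration terms, this means the language field can be left to the engine rather than hard-coded. The sketch below shows what that might look like; the "auto" value and the optional `language_identification_config` block are assumptions drawn from this feature description, so confirm the exact field names against the API reference.

```python
import json

# Assumed config: let the engine identify the language instead of fixing it.
config = {
    "type": "transcription",
    "transcription_config": {"language": "auto"},
    # Optionally narrow the search to languages you expect (assumed field name).
    "language_identification_config": {"expected_languages": ["en", "es", "de"]},
}

print(json.dumps(config, indent=2))
```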

Numeral Formatting

This release brings a range of improvements to numeral formatting for English, including new measurement and telephone entity classes, and support for domain and email formatting.
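If you want the richer entity metadata, for example the spoken and written forms of a measurement or telephone number, the transcription config exposes an entities flag. The sketch below assumes that flag is named `enable_entities`; treat it as illustrative and check the release notes for the exact option.

```python
import json

# Assumed config: request entity metadata alongside the formatted transcript.
config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "enable_entities": True,  # entity classes such as measurements and telephone numbers
    },
}

print(json.dumps(config, indent=2))
```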

Numeral formatting in speech recognition is essential for improving the readability of a transcript for any business, but it is critical for the financial, medical, broadcasting, and education sectors. Consistent numeral output saves time and prevents human error: the less time spent picking through edits in the post-processing phase, the better.

Speaker Diarization

We have improved Speaker Diarization accuracy for English in both our Standard and Enhanced models.

For our Real-Time ASR, we have achieved a step-change in diarization accuracy, with an average relative improvement of 14% for Enhanced and 30% for Standard.
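Enabling Speaker Diarization remains a single setting in the transcription config. Below is a minimal batch-style sketch, assuming the `diarization` field takes the value "speaker"; check the documentation for the real-time equivalent.

```python
import json

# Assumed config: label each word with the speaker who said it.
config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "operating_point": "enhanced",
        "diarization": "speaker",
    },
}

print(json.dumps(config, indent=2))
```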

To learn more about our latest product release, watch the recording of our recent webinar.

For more details on all of our updates, you can find release notes here. If you need any additional support on these or any of the above, please contact our Support team.

Owen O’Loan, Senior Product Manager, Speechmatics

Ready to Understand Every Voice?

Sign up to our free speech-to-text SaaS Portal and we’ll guide you through the integration of our API.
