Mar 14, 2023 | Read time 4 min

Product Release March 2023: Uplifts in Accuracy, Translation, and Automatic Language Identification

Senior Product Manager, Owen O’Loan, introduces the world to Speechmatics’ latest product release, including the first look at our new Translation offering.
Owen O'Loan, Director of Engineering Operations

Our first release of the year is a big one – with a host of new features, updates, and improvements to our best-in-class speech API. This year, we’ll continue our mission to unlock human potential, increase the inclusivity of speech recognition engines, and lower bias through natural interaction with intelligent machines.

At the top of our release sit our accuracy uplifts, delivered by Ursa, our latest generation of speech recognition models.

With that in mind, join us as we introduce our GPU support, showcase how our product continues its journey to understand every voice, and explain more about our new Translation offering.

Ursa Generation Models

We have greatly improved the accuracy of our English language transcription. Using a broad range of test sets, we see an average 22% relative improvement for the Enhanced model and 35% relative improvement for the Standard model.
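To make "relative improvement" concrete, here is a small worked sketch. It assumes the improvement is measured as a relative reduction in word error rate (WER), which this post does not state explicitly. The 10% baseline WER is a made-up figure for illustration, not a Speechmatics result.

```python
def apply_relative_improvement(baseline_wer: float, relative_improvement: float) -> float:
    """Return the error rate left after a relative reduction.

    A 22% relative improvement removes 22% of the existing errors;
    it does not subtract 22 percentage points.
    """
    return baseline_wer * (1.0 - relative_improvement)


# Hypothetical baseline of 10% WER (illustrative only, not a published figure).
baseline = 0.10
enhanced = apply_relative_improvement(baseline, 0.22)  # Enhanced model figure from this post
standard = apply_relative_improvement(baseline, 0.35)  # Standard model figure from this post

print(f"Enhanced: {enhanced:.3f}")  # Enhanced: 0.078
print(f"Standard: {standard:.3f}")  # Standard: 0.065
```

Under these assumptions, a model that previously got 10 words in 100 wrong would now get roughly 8 wrong (Enhanced) or 6 to 7 wrong (Standard).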

Speechmatics' Ursa generation models have achieved this breakthrough in performance by shifting execution to GPUs, enabling significantly larger machine learning models to be used in production.

Introducing Translation

By integrating Translation into our single Speech API, users can now use Speechmatics' market-leading speech-to-text and Translation all in one place. As of now, Speechmatics offers translation to and from English for 34 supported languages, start and end timings for sentences, and speaker labeling.
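As a rough sketch of what a combined transcription and translation request could look like, the helper below builds a job configuration as a plain Python dictionary. The field names (`transcription_config`, `translation_config`, `target_languages`) are assumptions based on the general shape of the Speechmatics batch job configuration and should be checked against the current API reference. The helper name itself is hypothetical. No network call is made.

```python
import json


def build_translation_job_config(source_language: str, target_languages: list[str]) -> dict:
    """Build a hypothetical job config that requests transcription plus translation.

    Per this post, translation is offered to and from English, so either the
    source language or every target language must be English ("en").
    """
    if not target_languages:
        raise ValueError("At least one target language is required")
    from_english = source_language == "en"
    to_english = all(lang == "en" for lang in target_languages)
    if not (from_english or to_english):
        raise ValueError("Translation must be either from English or to English")
    return {
        "type": "transcription",
        # Assumed field names; verify against the API reference before use.
        "transcription_config": {"language": source_language},
        "translation_config": {"target_languages": target_languages},
    }


# English audio, translated into Spanish and German.
config = build_translation_job_config("en", ["es", "de"])
print(json.dumps(config, indent=2))
```

Keeping the "to or from English" rule in the helper means an unsupported language pair fails locally, before a job is ever submitted.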

Accurate Translation is crucial for businesses reaching global markets they haven’t previously tapped, in use cases such as Media Captioning, Meeting Platforms, and Contact Centers.

You can try our new Translation for free today in our portal. All Batch SaaS customers will have immediate, free access until 31st March 2023 through the existing Speech API.

Automatic Language Identification

Following swiftly on from the release of Language Identification last year, we’ve now introduced Automatic Language Identification as part of the Transcription API. It is designed for customers working with audio where the language may not be known: users can now transcribe audio in a single workflow, without specifying the language in the configuration. With coverage for 44 languages, you won’t need to tell us what the language is; we’ll tell you.
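The single-workflow idea can be sketched as a configuration in which the language is left for the service to detect. The `"auto"` language value and the optional `language_identification_config` / `expected_languages` fields are assumptions about the job configuration format and should be confirmed in the API reference. The helper name is hypothetical.

```python
import json
from typing import Optional


def build_auto_language_config(expected_languages: Optional[list[str]] = None) -> dict:
    """Build a hypothetical job config that lets the service identify the language.

    `expected_languages` is an optional hint narrowing the candidates;
    when omitted, no hint block is added to the config.
    """
    config: dict = {
        "type": "transcription",
        # Assumed value: "auto" requests automatic language identification.
        "transcription_config": {"language": "auto"},
    }
    if expected_languages:
        # Assumed field names; verify against the API reference before use.
        config["language_identification_config"] = {
            "expected_languages": list(expected_languages)
        }
    return config


# No language specified at all: the service is expected to report it back.
print(json.dumps(build_auto_language_config(), indent=2))

# Optionally hint that the audio is probably English or Spanish.
print(json.dumps(build_auto_language_config(["en", "es"]), indent=2))
```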

Numeral Formatting

This release brings a range of improvements to numeral formatting for English, including new measurements & telephone entity classes, and support for domain & email formatting.

Numeral formatting in speech recognition improves the readability of a transcript for all businesses, but it is critical for the financial, medical, broadcasting, and education sectors. A consistent output of numerals can save time and prevent human errors. The less time spent picking through edits in the post-processing phase, the better.

Speaker Diarization

We have improved Speaker Diarization accuracy for English in both our Standard and Enhanced models.

For our Real-Time ASR, we have achieved a step-change in diarization accuracy, with an average 14% relative improvement for Enhanced and 30% relative improvement for Standard.

To learn more about our latest product release, watch the recording of our recent webinar.

For more details on all of our updates, see the release notes. If you need additional support with any of the above, please contact our Support team.

Owen O’Loan, Senior Product Manager, Speechmatics
