Mar 14, 2023 | Read time 4 min

Product Release March 2023: Uplifts in Accuracy, Translation, and Automatic Language Identification

Senior Product Manager, Owen O’Loan, introduces the world to Speechmatics’ latest product release, including the first look at our new Translation offering.
Owen O'Loan, Director of Engineering Operations

Our first release of the year is a big one – with a host of new features, updates, and improvements to our best-in-class speech API. This year, we’ll continue our mission to unlock human potential, increase the inclusivity of speech recognition engines, and lower bias through natural interaction with intelligent machines.

At the top of this release sit our accuracy uplifts, delivered by Ursa, our latest generation of speech recognition models.

With that in mind, join us as we introduce our GPU support, showcase how our product continues its journey to understand every voice, and explain more about our new Translation offering.

Ursa Generation Models

We have greatly improved the accuracy of our English language transcription. Using a broad range of test sets, we see an average 22% relative improvement for the Enhanced model and 35% relative improvement for the Standard model.
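By "relative improvement" we mean the reduction in word error rate (WER) compared with our previous generation of models. The short sketch below works through the calculation with hypothetical WER figures (illustrative only, not our benchmark numbers).

```python
# Illustrative only: how a relative WER improvement is calculated.
# The figures below are hypothetical, not Speechmatics benchmark results.
previous_wer = 10.0  # % word error rate of the previous model generation
new_wer = 7.8        # % word error rate of the new model generation

relative_improvement = (previous_wer - new_wer) / previous_wer * 100
print(f"Relative WER improvement: {relative_improvement:.0f}%")  # -> 22%
```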

Speechmatics' Ursa generation models have achieved this breakthrough in performance by shifting execution to GPUs, enabling significantly larger machine learning models to be used in production.

Introducing Translation

With Translation integrated into our single Speech API, users can now access Speechmatics' market-leading speech-to-text and Translation in one place. As of now, Speechmatics offers translated text to and from English across 34 supported languages, with start and end timings for sentences as well as speaker labeling.
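As a rough illustration of what that looks like in practice, here is a minimal sketch of a batch job that requests both transcription and translation. It assumes the Batch API's config object accepts a `translation_config` block with `target_languages` alongside `transcription_config`, and uses a placeholder API key and audio file; check the API reference for the exact fields available on your account.

```python
import json
import requests

API_KEY = "YOUR_API_KEY"  # placeholder

# Assumed config shape: transcription plus a translation_config block.
config = {
    "type": "transcription",
    "transcription_config": {"language": "en", "operating_point": "enhanced"},
    "translation_config": {"target_languages": ["es", "de"]},
}

# Submit the audio and config as a multipart batch job.
with open("meeting.wav", "rb") as audio:
    response = requests.post(
        "https://asr.api.speechmatics.com/v2/jobs",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"data_file": audio},
        data={"config": json.dumps(config)},
    )

print(response.json())  # returns a job id; poll the job until the transcript and translations are ready
```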

Accurate Translation is crucial for opening up global markets that businesses haven't previously been able to tap, in use cases such as Media Captioning, Meeting Platforms, and Contact Centers.

You can try our new Translation for free today in our portal. All Batch SaaS customers will have immediate, free access until 31st March 2023 through the existing Speech API.

Automatic Language Identification

Following swiftly on from the release of Language Identification last year, we've now introduced Automatic Language Identification as part of the Transcription API. Designed for customers working with audio where the language may not be known, it lets users transcribe audio in a single workflow without specifying the language in the configuration. With coverage for 44 languages, you won't need to tell us what the language is; we'll tell you.
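In configuration terms, this means the language field can be left to the engine rather than hard-coded. The sketch below shows what that might look like; the "auto" value and the optional `language_identification_config` block are assumptions drawn from this feature description, so confirm the exact field names against the API reference.

```python
import json

# Assumed config: let the engine identify the language instead of fixing it.
config = {
    "type": "transcription",
    "transcription_config": {"language": "auto"},
    # Optionally narrow the search to languages you expect (assumed field name).
    "language_identification_config": {"expected_languages": ["en", "es", "de"]},
}

print(json.dumps(config, indent=2))
```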

Numeral Formatting

This release brings a range of improvements to numeral formatting for English, including new measurement and telephone entity classes, and support for domain and email formatting.
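If you want the richer entity metadata, for example the spoken and written forms of a measurement or telephone number, the transcription config exposes an entities flag. The sketch below assumes that flag is named `enable_entities`; treat it as illustrative and check the release notes for the exact option.

```python
import json

# Assumed config: request entity metadata alongside the formatted transcript.
config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "enable_entities": True,  # entity classes such as measurements and telephone numbers
    },
}

print(json.dumps(config, indent=2))
```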

Numeral formatting in speech recognition is essential for improving the readability of a transcript for any business, but it is critical for the financial, medical, broadcasting, and education sectors. Consistent numeral output saves time and prevents human error: the less time spent picking through edits in the post-processing phase, the better.

Speaker Diarization

We have improved Speaker Diarization accuracy for English in both our Standard and Enhanced models.

For our Real-Time ASR, we have achieved a step-change in diarization accuracy, with an average relative improvement of 14% for Enhanced and 30% for Standard.
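Enabling Speaker Diarization remains a single setting in the transcription config. Below is a minimal batch-style sketch, assuming the `diarization` field takes the value "speaker"; check the documentation for the real-time equivalent.

```python
import json

# Assumed config: label each word with the speaker who said it.
config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "operating_point": "enhanced",
        "diarization": "speaker",
    },
}

print(json.dumps(config, indent=2))
```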

To learn more about our latest product release, watch the recording of our recent webinar.

For more details on all of our updates, you can find release notes here. If you need any additional support on these or any of the above, please contact our Support team.

Owen O’Loan, Senior Product Manager, Speechmatics

Ready to Understand Every Voice?

Sign up to our free speech-to-text SaaS Portal and we’ll guide you through the integration of our API.
