The world is now your oyster. With Speechmatics’ transcription and translation, you can bring your product or service to the largest audience possible, without the hassle of multiple different language APIs and lengthy setup times.
Speechmatics' AI-driven speech technology powers transcription, translation and understanding in 45+ languages. Our industry-leading language coverage ensures our technology can handle your business needs, regardless of where you're from.
Speechmatics delivers high-accuracy transcription, even in languages that other vendors such as Google struggle with.
Downstream processes can only be relied upon if the underlying transcription is as accurate as possible.
All-in-one speech API
Translate in real-time
69 language pairs (and counting)
Translate into five languages at once
Reach a global audience, with precision and speed
Make your content work harder, reach a larger audience, increase customer satisfaction. Whatever your industry, it's win after win after win after...
Our ASR's language coverage supports over half the world's population. Let your customers' media reach as wide an audience as possible, regardless of the spoken language.
Speech translation encourages diversity and inclusion in education, as well as ensuring your services remain compliant with international laws and standards.
We've done the heavy lifting, so you don't have to. Our API is packed with features that give you access to global customer bases and audiences, without the headaches of lengthy setup and configuration.
We offer both transcription and translation with a single API call, drastically reducing complexity and maintenance. If you're looking for an expert partner to help you:
Service a customer base spread over a large range of languages and geographies
With a simple-to-use, unified API
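To make the "single API call" idea concrete, here is a minimal sketch of what a combined transcription + translation job configuration could look like. The field names and payload shape below are illustrative assumptions, not Speechmatics' documented schema — consult the official API reference for the real format.

```python
import json

def build_job_config(language: str, target_languages: list[str]) -> dict:
    """Build one job config covering both transcription and translation.

    Hypothetical schema for illustration: a single payload asks for a
    transcript in the source language plus translations into several
    target languages, so no second API or follow-up call is needed.
    """
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "translation_config": {"target_languages": target_languages},
    }

# One request, five target languages at once.
config = build_job_config("de", ["en", "fr", "es", "ja", "sv"])
print(json.dumps(config, indent=2))
```

The point of the sketch is the shape, not the field names: transcription and translation ride on the same request, so there is only one integration to build and maintain.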
Lower word error rate than Google (on German-to-English).
Better-quality French-to-English translation than Google (as measured by COMET score).
Better quality score for ASR + translation in German, Swedish, and Japanese (as measured by COMET score).
At Speechmatics, we’ve pioneered a self-supervised learning approach to building our speech model.
Speech models live and die by the data used to train them. Typically, ASR models are trained only on labeled data – audio that has been transcribed by a human to create a 'ground truth'. Human transcription is slow and costly, which drastically limits how much labeled data exists. For speech and voice data, that is far from ideal: high accuracy can only be achieved by training models on as wide a variety of voices as possible, so the more data, the better.
Unlabeled speech data, on the other hand, is abundant, but carries no transcripts, so most consider it much harder to use for training speech models. Speechmatics is not most companies, though, and we've achieved several breakthroughs that let us train on unlabeled data, giving us access to a far greater training dataset – over 1 million hours of multilingual audio. That means more voices, more accents, more dialects, and more noisy audio.
When trained with self-supervised learning, our models can autonomously learn to spot salient patterns in the large quantities of unlabeled data.
Just as children learn to speak a language by being exposed to a world of voices and conversation, our models also learn by the power of exposure with self-supervised learning. By finding hidden patterns present in unlabeled data, self-supervised learning allows us to efficiently learn to understand speech without needing massive amounts of time-consuming, costly and limited labeled data.
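The core trick behind this kind of self-supervised objective is that the audio supervises itself: hide parts of an unlabeled feature sequence and score the model on reconstructing them, with the ground truth coming from the recording rather than from a human transcript. The toy sketch below illustrates only that objective — the "model" here is a stand-in (neighbour interpolation), not anything resembling Speechmatics' actual architecture.

```python
import random

def mask_spans(frames, mask_prob=0.3, rng=None):
    """Hide random frames; the hidden values become the training targets."""
    rng = rng or random.Random(0)
    masked = list(frames)
    targets = {}
    for i in range(1, len(frames) - 1):  # keep the edges as visible context
        if rng.random() < mask_prob:
            targets[i] = frames[i]       # ground truth taken from the audio itself
            masked[i] = None             # hidden from the model
    return masked, targets

def predict(masked, i):
    """Stand-in 'model': interpolate from the nearest visible neighbours."""
    left = next(masked[j] for j in range(i - 1, -1, -1) if masked[j] is not None)
    right = next(masked[j] for j in range(i + 1, len(masked)) if masked[j] is not None)
    return (left + right) / 2

frames = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # pretend acoustic features
masked, targets = mask_spans(frames)

# Mean squared error on the masked positions only — no labels needed anywhere.
loss = sum((predict(masked, i) - t) ** 2 for i, t in targets.items()) / max(len(targets), 1)
```

In a real system the predictor is a large neural network and the frames are learned acoustic representations, but the loop is the same: mask, predict, compare against the audio itself, repeat over vast amounts of unlabeled speech.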
This breadth of training data ensures we achieve a high level of accuracy across all of our languages, which in turn increases the accuracy of any translation built on them.