The foundation of a great product that uses speech is accurate AI transcription.
We’ve pioneered technological breakthroughs in transcribing the human voice, and we have always taken an inclusive approach to our product. Our aim is to understand every voice, and we provide transcription coverage for over half the world’s population, regardless of accent or local dialect.
We pride ourselves on our accuracy, outperforming some of the biggest companies in the world across the languages we support. This holds true even for media with strong accents, noisy environments, low-quality audio, or a lot of technical words and phrases.
As we continue to add speech capabilities to our offering, we recognize that their value is only as high as the quality of the transcription that powers them.
Delivering for multilingual, multicultural, and multinational businesses
We support transcription in 49 languages (including local dialects and accents) with automatic language detection, all with unparalleled accuracy. This means that over half the world’s population are your potential customers.
Correctly formatted numbers, dates and currencies, as well as language-specific capitalization, make your transcripts easy to parse and make sense of. Blocks of tricky-to-read text are ancient history.
Boost accuracy for proper nouns, acronyms or industry-specific terms by providing a list of custom words. Use a unique word in your business? No problem with Speechmatics.
Batch transcripts for the media that can wait. Real-time for the stuff that can’t. We power captions for live sporting events, so if you need our service in a hurry, no sweat.
No need to make multiple API calls. With Speechmatics, a single API call returns everything you need, including our growing suite of speech capabilities such as summarization and sentiment analysis.
Our API can be deployed in the cloud, on-prem or on-device, meeting every security, privacy and data sovereignty requirement you might have.
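To make the custom-word and single-call ideas concrete, here is a minimal sketch that builds one job configuration requesting transcription, custom vocabulary, summarization and sentiment analysis together. It only assembles and prints a JSON config; it does not call any service. The field names (`transcription_config`, `additional_vocab`, `summarization_config`, `sentiment_analysis_config`, and `"auto"` for language detection) are our assumptions about the shape of a batch job config, and `build_job_config` is a hypothetical helper, so check the current Speechmatics API reference before relying on any of it.

```python
import json


def build_job_config(language, custom_words, summarize=True, sentiment=True):
    """Hypothetical helper: assemble one job config covering several capabilities.

    Assumption: field names mirror a Speechmatics-style batch job config;
    verify them against the official API reference.
    """
    config = {
        "type": "transcription",
        "transcription_config": {
            # Assumption: "auto" requests automatic language detection.
            "language": language,
            # Custom words to boost recognition of proper nouns, acronyms and jargon.
            "additional_vocab": [{"content": word} for word in custom_words],
        },
    }
    # Extra capabilities ride along in the same request instead of separate calls.
    if summarize:
        config["summarization_config"] = {}
    if sentiment:
        config["sentiment_analysis_config"] = {}
    return config


if __name__ == "__main__":
    cfg = build_job_config("en", ["Speechmatics", "CORAAL"])
    print(json.dumps(cfg, indent=2))
```

The design point is that every capability is a section of one config rather than a separate request, so adding sentiment analysis means adding a key, not another round trip.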
We offer two proprietary transcription models available to all customers:
We compared the relative accuracy of major speech-to-text providers across almost 4 million words, so you don’t have to. Speechmatics outperforms all the major cloud providers, as well as Whisper, on large publicly available data sets (see the full breakdown here).
Whilst everyone loves a great graph, it’s probably better to show you what we can do. Here’s a live demo that pulls an audio stream from international radio stations and transcribes it in real time. It translates the stream too, if you’re interested.
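Accuracy comparisons between speech-to-text providers are commonly expressed as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn a system's transcript into a reference transcript, divided by the number of reference words. The sketch below is a generic illustration of that metric, not our benchmarking code. It assumes whitespace tokenization with no text normalization, and `relative_wer_reduction` is a hypothetical helper showing one way a "lead" between two systems could be expressed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum word edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_wer_reduction(wer_a: float, wer_b: float) -> float:
    """Hypothetical helper: fraction by which system A's WER is lower than B's."""
    if wer_b == 0:
        raise ValueError("baseline WER must be non-zero")
    return (wer_b - wer_a) / wer_b


if __name__ == "__main__":
    ref = "the quick brown fox jumps"
    print(word_error_rate(ref, "the quick brown fox jumps"))
    print(word_error_rate(ref, "the quick brown box"))
    print(relative_wer_reduction(0.08, 0.10))
```

Real benchmarks usually normalize case and punctuation before scoring and compute corpus-level WER by summing errors over all files, so treat this as an illustration of the metric only.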
The best way to confirm that Speechmatics delivers the accuracy you need is to see for yourself, on your own media.
Head to the portal and get a free account today. Then upload your files and assess the output.
We promise you won't be disappointed.
Tom Wootton
Head of Product Area for Broadcast Services, Red Bee Media
Achieving these levels of accuracy has not been easy. But if it were easy, everyone would be doing it. They’re not. We are. And we’re proud of what we’ve been able to achieve.
By pioneering a Self-Supervised Learning (SSL) approach to speech-to-text, Speechmatics provides great accuracy, even for languages that lack vast amounts of training data. These are the headlines:
Our models are trained using over 1 million hours of speech audio to achieve maximum accuracy.
We take a global-first approach to language support, covering 45+ languages. From Arabic to Welsh, we've got you covered.
We scaled our SSL model to 2 billion parameters, enabling us to better understand every voice.
We're using cutting-edge GPUs for inference to achieve trustworthy transcription across all the languages we offer.
Our SSL models give us rich acoustic representations of speech, which we then use for acoustic modelling on labeled data.
Speechmatics achieves consistent, reliable and inclusive transcription, regardless of dialect.
We're accurate across all languages, even dialects. Don't believe us? This graph shows a 22% lead over the next best competitor on African-American Vernacular English, calculated on the CORAAL dataset. Taken together, these advances let us deliver consistent, reliable, inclusive and trustworthy transcription across all of the languages we offer, even when the speaker is in a noisy environment and whatever accent(s) they use.
This approach is inherently inclusive too, performing well for speakers from different socio-economic backgrounds and across genders and ethnicities. Our SSL approach reduces our dependence on scarce, well-curated labeled data and brings us a step closer to mitigating AI bias.
Book a meeting with our specialists to learn how you can unlock the value within speech by generating AI transcriptions.