Apr 10, 2018

Custom Dictionary – the technical edit

Accurate transcription of voice data using any-context speech recognition enables enterprise businesses to extract insights automatically.

I am sure we can all remember a time when we misheard something and asked for it to be repeated, or perhaps just guessed what was said and carried on. Well, computers are no different, except that in many cases there is no opportunity to ask for something to be repeated.

When you guessed what was said, your brain used all the surrounding context and information available to help you make the best attempt at interpreting what was actually said. Speechmatics’ Custom Dictionary allows the Speech Recognition engine to do the same thing, using whatever context is available.

Sounds cool, but isn't that difficult to do?

Traditionally, this might have involved complex model training, or a change that affected every transcript across your deployment. Like all Speechmatics features, we have made it as easy to use as possible. We do as much of the difficult work as we can, quickly and flexibly. All you have to do is provide the words you think are relevant to the audio you are transcribing, and the speech engine immediately does the rest for you.

As an example, suppose you know you are transcribing a conversation between two people: you can add their names as context so the engine spells them correctly. Or perhaps you are transcribing a video about a company: you can add its brand or product names to make sure they are correctly recognised.

How could I do that?

To keep it really simple, you just provide the context as a set of words in plain text. Often the context is already available from existing sources, such as the attendee names for a meeting in Outlook, company names from a CRM system, or the brief for the video being transcribed, so you can pull the words directly from those locations. Each transcription session can use a different set of context words, so one deployment is flexible enough for all your use cases without needing to 'optimise' for a common set across multiple transcriptions.
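To make that concrete, here is a minimal sketch of pulling context words together from such sources. The attendee and product names below are made-up examples for illustration; in practice they might come from an Outlook invite, a CRM export, or a video brief.

attendees = ["Ian Firth", "Priya Nair"]                # e.g. names from a meeting invite (illustrative)
product_names = ["Speechmatics", "Custom Dictionary"]  # e.g. names from a CRM or video brief (illustrative)

# The Custom Dictionary only needs the words themselves, as plain text.
context_words = sorted({word for name in attendees + product_names for word in name.split()})
print(" ".join(context_words))
# Custom Dictionary Firth Ian Nair Priya Speechmatics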

It really is this simple:

1. Connect to the speech server via the interface
2. Start the session with the context words
3. Provide the audio
4. Get the transcript
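As a rough sketch of those four steps, the snippet below assumes a hypothetical REST-style interface; the URL, field names, and response layout are illustrative assumptions rather than the actual Speechmatics API for your deployment.

import requests

SPEECH_SERVER = "https://speech.example.com/v1/jobs"  # 1. connect to the speech server via the interface (hypothetical URL)
context_words = ["Ian", "Firth", "Speechmatics"]

with open("meeting.wav", "rb") as audio:
    job = requests.post(                               # 2. start the session with the context words
        SPEECH_SERVER,
        files={"audio": audio},                        # 3. provide the audio
        data={"language": "en", "context": " ".join(context_words)},
    ).json()

result = requests.get(f"{SPEECH_SERVER}/{job['id']}/transcript").json()  # 4. get the transcript
print(result["text"])

Because the context words travel with each request, every job can carry its own set without any change to the deployment itself.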

What’s the catch?

There is no catch. With the addition of a few words and practically no extra overhead, you can produce a more accurate transcript first time, with less editing, less embarrassment, and faster turnaround.

One last thing…

As always, it works in all the languages we offer, too.

Ian Firth, Speechmatics
