I am sure we can all remember a time when we misheard something and asked for it to be repeated, or perhaps just guessed what was said and carried on. Computers are no different, except that in many cases they cannot ask for anything to be repeated. When you guessed, your brain used all the surrounding context to make its best attempt at interpreting what was actually said. Speechmatics’ Custom Dictionary allows the speech recognition engine to do the same thing, using whatever context is available.
Sounds cool, but isn’t that difficult to do?
Traditionally, this might have involved complex model training, or a change that affected every transcript across your deployment. Like all Speechmatics features, however, we have made it as easy to use as possible. We do as much of the difficult work as we can, quickly and flexibly. All you have to do is provide the words you think are relevant to the audio you are transcribing, and the speech engine immediately does the rest.
For example, if you know you are transcribing a conversation between two people, you can add their names as context so that the engine spells them correctly. Or, if you are transcribing a video about a company, you can add its brand or product names to make sure they are recognised correctly.
How could I do that?
To keep it simple, you provide the context as a set of plain-text words. This context is often already available from existing sources, such as attendee names from a meeting invitation or Outlook contacts, company names from a CRM system, or the brief for the video being transcribed, so you can take the words directly from those sources. Each transcription session can use a different set of context words, so a single deployment can serve all your use cases without needing to ‘optimise’ for one common set across multiple transcriptions.
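The post does not prescribe how to gather context words. As a minimal sketch, assuming the names are already available as Python lists (the attendee and CRM data below is made up for illustration), you might merge them into one per-session list like this. `build_context` is a hypothetical helper, not part of any Speechmatics product.

```python
from typing import Iterable, List


def build_context(*sources: Iterable[str]) -> List[str]:
    """Hypothetical helper: merge context words from several sources,
    dropping blanks and case-insensitive duplicates, keeping first-seen order."""
    seen = set()
    words: List[str] = []
    for source in sources:
        for item in source:
            word = item.strip()
            key = word.casefold()  # compare case-insensitively
            if word and key not in seen:
                seen.add(key)
                words.append(word)
    return words


# Made-up example data: one meeting's attendees and the companies linked to it in a CRM.
attendees = ["Ian Firth", "Jane Doe", "ian firth"]
crm_companies = ["Speechmatics", "  ", "Acme Ltd"]
context = build_context(attendees, crm_companies)
print(context)  # ['Ian Firth', 'Jane Doe', 'Speechmatics', 'Acme Ltd']
```

Building the list fresh for each session keeps one meeting's names from leaking into another transcription.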
It really is this simple:
1. Connect to the speech server via the interface
2. Start the session with the context words
3. Provide the audio
4. Get the transcript
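The four steps above can be sketched in Python. This is an illustration under stated assumptions, not the Speechmatics API: the message layout (`StartRecognition`, `transcription_config`, `additional_vocab`) is modelled on a JSON session-start message and should be checked against the Speechmatics API reference, and `FakeSpeechServer` is a hypothetical in-memory stand-in for a real connection, so the order of the steps can be run without a network.

```python
import json
from typing import Iterable, List


def make_start_message(language: str, context_words: Iterable[str]) -> str:
    # Hypothetical wire format: the field names are assumptions for illustration.
    config = {
        "language": language,
        # One entry per context word or phrase, applied to this session only.
        "additional_vocab": [{"content": word} for word in context_words],
    }
    return json.dumps({"message": "StartRecognition", "transcription_config": config})


class FakeSpeechServer:
    """Hypothetical stand-in for a speech server connection (step 1: connect)."""

    def __init__(self) -> None:
        self.context: List[str] = []
        self.audio = b""

    def start(self, start_message: str) -> None:
        # Step 2: record the context words sent at the start of the session.
        msg = json.loads(start_message)
        vocab = msg["transcription_config"]["additional_vocab"]
        self.context = [entry["content"] for entry in vocab]

    def send_audio(self, chunk: bytes) -> None:
        # Step 3: accumulate the audio; a real server would decode it.
        self.audio += chunk

    def transcript(self) -> str:
        # Step 4: placeholder result; no actual speech recognition happens here.
        return f"<{len(self.audio)} bytes transcribed with {len(self.context)} context words>"


def transcribe(server: FakeSpeechServer, language: str,
               context_words: Iterable[str], audio_chunks: Iterable[bytes]) -> str:
    server.start(make_start_message(language, context_words))
    for chunk in audio_chunks:
        server.send_audio(chunk)
    return server.transcript()


server = FakeSpeechServer()  # step 1: "connect"
print(transcribe(server, "en", ["Ian Firth", "Speechmatics"], [b"\x00\x01", b"\x02\x03"]))
```

Because the context words travel in the session-start message, each session carries its own list without affecting any other transcription.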
What’s the catch?
There is no catch. By adding a few words, with practically no extra overhead, you can produce a more accurate transcript first time, faster, with less editing and less embarrassment.
One last thing…
As always, it works in all the languages we support.
Ian Firth, Speechmatics