Apr 10, 2018

Custom Dictionary – the technical edit

Accurate transcription of voice data using any-context speech recognition enables enterprise businesses to extract insights automatically.

I am sure we can all remember a time when we misheard something and asked for it to be repeated, or perhaps just guessed what was said and carried on. Well, computers are no different, except that in many cases there is no opportunity to ask for something to be repeated.

When you guessed what was said, your brain used all the surrounding context and information available to help you make the best attempt at interpreting what was actually said. Speechmatics’ Custom Dictionary allows the Speech Recognition engine to do the same thing, using whatever context is available.

Sounds cool, but isn't that difficult to do?

Traditionally, this might have involved complex model training, or a change that affected every transcript across your deployment. Like all Speechmatics features, we have made it as easy to use as possible. We do as much of the difficult work as we can, quickly and flexibly. All you have to do is provide the words you think are relevant to the audio you are transcribing, and the speech engine immediately does the rest for you.

As an example, suppose you know you are transcribing a conversation between two people: you can add their names as context so the engine spells them correctly. Or perhaps you are transcribing a video about a company: you can add its brand or product names to make sure they are correctly recognised.

How could I do that?

To keep it really simple, you just provide the context as a set of words in plain text. Often the context is already available from existing sources, such as the attendee names for a meeting in Outlook, company names from a CRM system, or the brief for the video being transcribed, so you can pull the words directly from those locations. Each transcription session can use a different set of context words, so one deployment is flexible enough for all your use cases without needing to 'optimise' for a common set across multiple transcriptions.
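To make that concrete, here is a minimal sketch of pulling context words together from such sources. The attendee and product names below are made-up examples for illustration; in practice they might come from an Outlook invite, a CRM export, or a video brief.

attendees = ["Ian Firth", "Priya Nair"]                # e.g. names from a meeting invite (illustrative)
product_names = ["Speechmatics", "Custom Dictionary"]  # e.g. names from a CRM or video brief (illustrative)

# The Custom Dictionary only needs the words themselves, as plain text.
context_words = sorted({word for name in attendees + product_names for word in name.split()})
print(" ".join(context_words))
# Custom Dictionary Firth Ian Nair Priya Speechmatics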

It really is this simple:

1. Connect to the speech server via the interface
2. Start the session with the context words
3. Provide the audio
4. Get the transcript
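As a rough sketch of those four steps, the snippet below assumes a hypothetical REST-style interface; the URL, field names, and response layout are illustrative assumptions rather than the actual Speechmatics API for your deployment.

import requests

SPEECH_SERVER = "https://speech.example.com/v1/jobs"  # 1. connect to the speech server via the interface (hypothetical URL)
context_words = ["Ian", "Firth", "Speechmatics"]

with open("meeting.wav", "rb") as audio:
    job = requests.post(                               # 2. start the session with the context words
        SPEECH_SERVER,
        files={"audio": audio},                        # 3. provide the audio
        data={"language": "en", "context": " ".join(context_words)},
    ).json()

result = requests.get(f"{SPEECH_SERVER}/{job['id']}/transcript").json()  # 4. get the transcript
print(result["text"])

Because the context words travel with each request, every job can carry its own set without any change to the deployment itself.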

What’s the catch?

There is no catch. With the addition of a few words and practically no extra overhead, you can produce a more accurate transcript first time, with less editing, less embarrassment, and faster turnaround.

One last thing…

As always, it works in all the languages we offer, too.

Ian Firth, Speechmatics
