Product - Features and Deployments

There's speech-to-text. Then there's Speechmatics.

An API with a comprehensive range of features, unmatched accuracy, flexible deployment and AI-powered capabilities.

Everything you need to build brilliant voice features and products.


Our models are built to deliver for your needs

Get the very best performance and fast transcription whether you choose real-time or batch modes - deployed however suits you.


File transcription

Process thousands of hours of pre-recorded files, whenever you need them, and fast.


Live transcription

Transcribe media as it happens. Get initial transcriptions in milliseconds, with context-driven accuracy improvements over time.



Meet architecture, security and compliance needs by hosting our API in your own environment. Deploy using Docker containers or preconfigured virtual appliances, and combine with our cloud deployment where needed.



Get secure and scalable access to our API through our cloud deployment and get instant access to all our new features, languages and updates.

Transcription Features

Everything you need to hit the highest accuracy possible

Our customization options let you fine-tune your setup to achieve high accuracy, even with the most unusual words and phrases.


Custom Dictionary

Boost accuracy for proper nouns, acronyms or industry-specific terms by providing a list of custom words.


Language Model Adaptation

Increase accuracy for a use-case or domain by using a relevant corpus of textual content to customize default models.


Speaker & Channel Diarization

Track who said what and when with speaker labelling for each word, available for both batch and real-time transcription.
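As a rough illustration of what per-word speaker labels make possible, the sketch below groups consecutive words from the same speaker into turns. The `(speaker, text)` tuple shape and the `S1`/`S2` labels are assumptions made for this example, not the documented response format.

```python
from itertools import groupby
from typing import List, Tuple

# (speaker_label, word_text): a hypothetical shape chosen for illustration only.
LabelledWord = Tuple[str, str]


def speaker_turns(words: List[LabelledWord]) -> List[Tuple[str, str]]:
    """Merge consecutive words that share a speaker label into one turn."""
    return [
        (speaker, " ".join(text for _, text in group))
        for speaker, group in groupby(words, key=lambda w: w[0])
    ]


sample_words = [
    ("S1", "Hello"),
    ("S1", "there"),
    ("S2", "Hi"),
    ("S1", "Welcome"),
]

for speaker, utterance in speaker_turns(sample_words):
    print(f"{speaker}: {utterance}")
```

Grouping only adjacent words keeps the order of the conversation, so a speaker who talks twice appears as two separate turns.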


Numeral Formatting

Identify and correctly format numbers, dates and currencies automatically to improve transcript readability and enable effective post-processing.


Profanity & Disfluency Detection

Aid comprehensibility and compliance by detecting and optionally removing words that are considered profanities or hesitations.


File Formats

Minimize the resource needed to prepare audio or video files with support for all major audio and video formats along with automatic sample rate detection.

Advanced Features

Easily push a variety of media formats to the API

Easily push a variety of media formats to the API and get a rich set of metadata to support your post-processing needs.


Confidence Scores

Collect confidence scores for every word in the transcript to enable efficient human review and editing.
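One way per-word confidence scores can support human review is to flag only the words that fall below a chosen threshold. The sketch below assumes a simple list of dictionaries with `text` and `confidence` keys; those field names and the 0.8 threshold are illustrative assumptions, not the documented output schema.

```python
from typing import Dict, List

# Hypothetical word records; the keys are assumptions made for this example.
Word = Dict[str, object]


def words_needing_review(words: List[Word], threshold: float = 0.8) -> List[Word]:
    """Return the words whose confidence is below the threshold."""
    return [w for w in words if float(w["confidence"]) < threshold]


sample = [
    {"text": "Speechmatics", "confidence": 0.98},
    {"text": "diarization", "confidence": 0.62},
    {"text": "works", "confidence": 0.91},
]

flagged = words_needing_review(sample)
print([w["text"] for w in flagged])
```

A reviewer could then jump straight to the flagged words instead of reading the whole transcript.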


Industry Language Packs

We're developing English language packs optimized for specific industries, each with sector-specific terminology. Finance is available now, with more to follow soon.


Word Timings

Get accurate timestamps for every word in the transcript to allow for post-processing and improved end user experience.
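To show the kind of post-processing word timings allow, the sketch below turns timed words into SRT-style caption cues. The `(text, start, end)` tuple shape (times in seconds) and the seven-words-per-cue limit are assumptions for illustration, not part of the API.

```python
from typing import List, Tuple

# (text, start_seconds, end_seconds): a hypothetical shape for timed words.
TimedWord = Tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[TimedWord], max_words: int = 7) -> str:
    """Group words into fixed-size cues and render them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)


timed = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9)]
print(words_to_srt(timed))
```

Fixed-size cues keep the example short; a production captioner would more likely break cues at punctuation or pauses.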


Advanced Punctuation & Casing

Improve readability with language-specific capitalization and punctuation including commas, question marks and exclamation marks.


Audio Events

Improve accessibility and fully automate tedious captioning by using AI to identify and label non-speech sounds in media.


Partner with Speechmatics to maximize your total addressable market

We deliver for multilingual, multicultural and multinational businesses, with coverage of languages spoken by nearly half the world’s population across a range of dialects and accents.

Language Coverage

We support 50 languages, covering most native languages with unmatched accuracy.

Accents and dialects

Whether you need Brazilian Portuguese or Canadian French, we have you covered with a single language model that supports all associated accents and dialects.


Transcribe and translate audio to and from English for over 30 languages using a single API call.

Language Identification

Simplify integration and ensure accurate transcription with automatic detection of the language spoken.

AI Powered Capabilities

Accurate transcription combined with advanced speech capabilities, offered as solution bundles for customers, makes Speechmatics truly unique.


With automatic translation in a single API call, you can translate media and provide captions for over half the world’s population.


Instantly generate summaries for social and video platforms, so viewers know what to expect, without you having to write them manually.


Don’t just rely on reviews. See how customers are feeling about every aspect of your service by identifying sentiment throughout calls.


Your audience doesn’t always want to watch long media. Give them the topics discussed and the timestamps so they can engage with what interests them most.


As well as being divided up and summarized, each chapter is given a heading, making it super easy to find the most engaging content.

Ready to Understand Every Voice?

Sign up to our free speech-to-text SaaS Portal and we’ll guide you through the integration of our API.