Aug 3, 2022 | Read time 4 min

5 Easy Steps to Get Started with Speechmatics

In a continually innovating industry, it’s important to keep things simple. Here is our latest guide on how to get started with the Speechmatics speech recognition API.
Paul Gordon, Product Marketing Manager

Staying at the summit of the speech-to-text mountain means recognizing that not everyone will immediately know how to use our award-winning Autonomous Speech Recognition (ASR) engine.

Moreover, our mission to understand every voice heavily centers around accessibility within media – for example, open and closed captions are becoming less of an accessory and more of a necessity. So, naturally, we want to keep our highly advanced AI accessible and easy to use. 

In pursuit of that, we've created a helpful guide with five easy steps for getting started with Speechmatics. 

Step 1: Choose Your Deployment Options

First, choose your deployment option. You have four choices: virtual appliance, containers, SaaS, or hybrid. Here's a short summary of each:

Virtual Appliance

A pre-configured virtual machine capable of doing Real-Time or Batch processing. It can be deployed directly in your on-premises environment.

Containers

Our Docker containers let you build scalable transcription services within your own infrastructure, for either Real-Time or Batch processing.

SaaS

Get all the benefits of Speechmatics ASR without the complexity of deploying it in your own environment. Choose between the public cloud (hosted by Speechmatics) and your own cloud.

Hybrid

Hybrid deployment suits those whose mix of data requirements calls for both cloud and on-premises processing.

First choice down, four to go. 

Step 2: Choose Your Offering

Next, choose your offering. Here you have two options: Batch and Real-Time.

First, there's Batch, where our ASR transcribes pre-recorded media files at your convenience. You can schedule a transcription for a time that suits you.

With the Real-Time offering, you get your transcript as the audio is spoken, so you can gather actionable data as soon as you need it. If you're worried that accuracy would be compromised, fear not: our proprietary technology delivers best-in-class accuracy even at low latencies, as shown in recent research against our competitors.

Step 3: Choose Your Features

At this point, your options open up further. Our features range from channel diarization to flexible endpointing. Here's the complete list:

  • Entity Formatting

  • Notifications

  • Speaker Diarization

  • Partials

  • Channel Diarization

  • Transcript Finalization

  • Custom Dictionary and Sounds Feature

  • Flexible Endpointing

  • Speaker Change

Speech-to-text is not a solved problem yet, so we're always looking to innovate our ASR. Expect to see more features in the future.
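As a rough illustration of how these feature choices might be expressed in practice, the sketch below assembles a batch job configuration as a plain Python dictionary. The helper name `build_job_config` is hypothetical, and the field names (`transcription_config`, `diarization`, `enable_entities`, `additional_vocab`, `sounds_like`) are assumptions that should be checked against the current Speechmatics API reference before use. Nothing here contacts the service; it only builds and prints the configuration.

```python
import json


def build_job_config(language, speaker_diarization=False, channel_diarization=False,
                     entities=False, custom_words=None):
    """Assemble a job config dict. All field names are assumptions, not confirmed API."""
    if speaker_diarization and channel_diarization:
        # Assumption: diarization is a single setting, so only one mode can be chosen.
        raise ValueError("choose speaker or channel diarization, not both")

    transcription = {"language": language}
    if speaker_diarization:
        transcription["diarization"] = "speaker"
    elif channel_diarization:
        transcription["diarization"] = "channel"
    if entities:
        transcription["enable_entities"] = True
    if custom_words:
        # custom_words maps a word to a (possibly empty) list of "sounds like" hints.
        transcription["additional_vocab"] = [
            {"content": word, "sounds_like": list(sounds)} if sounds else {"content": word}
            for word, sounds in custom_words.items()
        ]
    return {"type": "transcription", "transcription_config": transcription}


config = build_job_config(
    "en",
    speaker_diarization=True,
    entities=True,
    custom_words={"Speechmatics": ["speech matics"], "ASR": []},
)
print(json.dumps(config, indent=2))
```

Keeping the configuration as ordinary data makes it easy to review which features are switched on before any audio is submitted.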

Step 4: Choose Your Format

Next up, choose your format. Happily, we support all major audio and video formats, which reduces the time users spend preparing files. After all, we're all about speech-to-text accessibility.

The default output format is JSON. Users can also choose SRT or TXT instead.
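To make the output choice concrete, here is a minimal sketch of how a client might pick one of the three output formats when asking for a finished transcript. The function name `transcript_url`, the placeholder base URL, the `/jobs/<id>/transcript` path, and the `format` query values (including `json-v2` for JSON) are all assumptions for illustration; consult the API reference for the real endpoint and values.

```python
from urllib.parse import urlencode

# The three output formats named above, mapped to assumed query-string values.
FORMAT_QUERY_VALUES = {"json": "json-v2", "txt": "txt", "srt": "srt"}


def transcript_url(base_url, job_id, output_format="json"):
    """Build a transcript URL for a job. The URL layout is an assumption."""
    if output_format not in FORMAT_QUERY_VALUES:
        raise ValueError(f"unsupported output format: {output_format!r}")
    query = urlencode({"format": FORMAT_QUERY_VALUES[output_format]})
    return f"{base_url.rstrip('/')}/jobs/{job_id}/transcript?{query}"


# Placeholder host and job id, for illustration only.
print(transcript_url("https://asr.example.com/v2", "abc123", "srt"))
```

JSON is the default here because it is the default output format; SRT is the natural pick for captions, and TXT for plain reading.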

Step 5: Choose Your Language

Despite competition from the most prominent names in speech-to-text and the surrounding AI industry (Google, Microsoft, etc.), we are proud that our ASR covers 50 languages.

These are Arabic, Bashkir, Basque, Belarusian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Interlingua, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Mandarin (Traditional and Simplified), Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Uyghur, Vietnamese, and Welsh.

Getting Started Is Only a Few Choices Away

There you have it. This guide was designed to be simple, so here's a short summary:

  1. Choose your deployment option: virtual appliance, containers, SaaS, or hybrid.

  2. Choose your offering: batch or real-time.

  3. Choose your features: entity formatting, notifications, speaker diarization, partials, channel diarization, transcript finalization, custom dictionary, flexible endpointing, and speaker change. 

  4. Choose your format: all major audio and video formats.

  5. Choose your language: 50 languages at your disposal.
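The five choices above can also be written down as a small checklist in code. The sketch below is purely illustrative: the class name `GettingStartedPlan` and its fields are hypothetical and not part of any Speechmatics SDK. It simply checks a set of choices against the options listed in this guide.

```python
from dataclasses import dataclass, field

# Options exactly as listed in this guide.
DEPLOYMENTS = {"virtual appliance", "containers", "saas", "hybrid"}
OFFERINGS = {"batch", "real-time"}
FEATURES = {
    "entity formatting", "notifications", "speaker diarization", "partials",
    "channel diarization", "transcript finalization", "custom dictionary",
    "flexible endpointing", "speaker change",
}
OUTPUT_FORMATS = {"json", "srt", "txt"}


@dataclass
class GettingStartedPlan:
    """Hypothetical record of the five choices; not a real SDK type."""
    deployment: str
    offering: str
    language: str
    features: list = field(default_factory=list)
    output_format: str = "json"  # JSON is the default output format

    def problems(self):
        """Return a list of human-readable issues; empty means the plan is complete."""
        issues = []
        if self.deployment.lower() not in DEPLOYMENTS:
            issues.append(f"unknown deployment: {self.deployment}")
        if self.offering.lower() not in OFFERINGS:
            issues.append(f"unknown offering: {self.offering}")
        unknown = [f for f in self.features if f.lower() not in FEATURES]
        if unknown:
            issues.append(f"unknown features: {', '.join(unknown)}")
        if self.output_format.lower() not in OUTPUT_FORMATS:
            issues.append(f"unknown output format: {self.output_format}")
        if not self.language.strip():
            issues.append("no language chosen")
        return issues


plan = GettingStartedPlan("SaaS", "batch", "English",
                          ["speaker diarization", "custom dictionary"], "srt")
print(plan.problems() or "All five choices made")
```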

