What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 56+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150 ms latency, which makes it well suited to real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, in a hybrid setup, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, legal, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

Introducing Melia, our new multilingual speech-to-text model

Today we're launching Melia, a multilingual model that handles code-switching across all 56+ languages we support. It's in your Speechmatics account now, in the Portal, Batch API, and SDKs.

Speechmatics has always been driven by one mission: to understand every voice. For over two decades, that has meant the most accurate real-world speech-to-text available, especially for the accents and dialects other systems treat as edge cases. But until now, it has largely meant one language at a time.

Over half the world speaks more than one language, and people move between them mid-conversation. For a growing number of our customers, understanding every voice now means transcribing several languages in a single file, quickly and affordably. Melia is where that begins, in batch today, with more to come.

What Melia does differently and where we are heading

Melia covers all the languages we support, so a recording that moves between languages comes back as one continuous transcript, with no language to select in advance and no language packs to manage. That keeps your workflow, and the orchestration behind it, simple.

Handling multiple languages is one thing. Handling how multilingual people actually sound is another: they carry an accent from one language into the next, and that accented speech is where most models struggle. It's where Speechmatics' two decades of accuracy work shows up, and where Melia is strongest.

Our goal for this lineage is to build the world's most accurate code-switching speech-to-text model. Melia 1 is the first step toward that, and each release from here makes its code-switching more accurate, with further improvements landing regularly over the coming weeks.

How it compares with other providers

On FLEURS, an open benchmark of read speech across many languages, Melia produces fewer errors than Deepgram, Microsoft, and AssemblyAI on most languages. Measured against each vendor's strongest model, here's the share of languages where Melia wins:

Deepgram: 91% (best of Nova-3 and Nova-2)
Microsoft: 91% (best of Enhanced and Standard)
AssemblyAI: 77% (best of Universal-3-Pro, Universal-2, and Universal)
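To make the "share of languages where Melia wins" figure concrete, here is a minimal sketch of how such a share could be computed: for each language, take the lowest error rate among a vendor's models and count the language as a win if our error rate is lower. The function name, the dictionary layout, and every number below are placeholders for illustration; they are not FLEURS results and not necessarily the exact procedure behind the percentages above.

```python
from typing import Dict


def win_share(ours: Dict[str, float], theirs: Dict[str, Dict[str, float]]) -> float:
    """Fraction of languages where our WER beats the vendor's best model.

    ours:   language code -> WER
    theirs: model name -> (language code -> WER); a model may lack some languages
    Languages that no vendor model covers are skipped, not counted as wins.
    """
    wins = 0
    compared = 0
    for lang, our_wer in ours.items():
        candidates = [scores[lang] for scores in theirs.values() if lang in scores]
        if not candidates:
            continue  # nothing to compare against for this language
        compared += 1
        if our_wer < min(candidates):  # min() = the vendor's strongest model here
            wins += 1
    if compared == 0:
        raise ValueError("no overlapping languages to compare")
    return wins / compared


# Placeholder WER values (percent), invented purely for this example.
our_wer = {"en": 5.0, "es": 4.0, "hi": 12.0, "ar": 11.0}
vendor_wer = {
    "model_a": {"en": 6.0, "es": 3.5, "hi": 15.0, "ar": 13.0},
    "model_b": {"en": 5.5, "es": 4.5, "hi": 14.0},
}

print(f"{win_share(our_wer, vendor_wer):.0%}")
```

Comparing against the per-language minimum across a vendor's models is the stricter choice: the vendor gets credit for whichever of its models does best on each language.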

Where Melia fits with our current models

Melia sits alongside Standard and Enhanced, not in place of them: it's the multilingual addition to the lineup. When the lowest possible word error rate is what matters, Enhanced is still the model to choose.

For multilingual audio, Melia is the obvious option. It's our lowest-priced model, and turnaround is blisteringly fast and getting faster.

But it's not only for multilingual work. In our internal benchmarks on challenging, noisy monolingual audio, Melia averages a 5% lower word error rate than Standard across the languages we tested. For many single-language workloads, including ones running on Standard today, it's worth testing Melia in its place, at a lower price.
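If you want to run that comparison on your own audio, word error rate is straightforward to compute from a reference transcript and a model's output. The sketch below uses a plain word-level edit distance with whitespace tokenization and no text normalization, which is a simplification. It also shows a relative reduction; the figure above does not say whether the 5% is relative or absolute, so treating it as relative here is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """How much lower the candidate WER is, as a fraction of the baseline."""
    return (baseline_wer - candidate_wer) / baseline_wer


reference = "the cat sat on the mat"
print(word_error_rate(reference, "the cat sat on mat"))  # one deletion in six words
print(relative_reduction(0.20, 0.19))                    # made-up WERs, for illustration
```

In practice you would lowercase, strip punctuation, and normalize numbers in both transcripts before scoring, and average over many files per language.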

Standard and Enhanced still offer more features, including real-time transcription. See the documentation to compare all three.

Ready for production workloads: Melia is part of the same Batch API as Standard and Enhanced, and runs on the same production infrastructure. You get the same cloud regions, plus on-prem for teams that run Speechmatics in their own environment. The SLAs and reliability you depend on apply to Melia from launch. And because it covers every language in one model, there's just one to integrate, manage, and run instead of one per language.

Melia carries a preview label because it's improving quickly, with more to come over the coming weeks and months, and we welcome feedback from production users.

The three examples below are a starting point: places where Melia already makes a real difference today, and far from the only ones.

Contact center analytics: Multilingual call recordings that monolingual models couldn't process are now transcribable at scale, across European, Gulf, Southeast Asian, and US Hispanic operations. Melia returns language metadata with every transcript, so you can break calls down by language for routing, reporting, and quality work.

Multilingual broadcast captioning: Spanish-English content for US Latino audiences, Arabic-English broadcasts across the Gulf, multi-language news from across Southeast Asia: one model, not one per language.

Compliance monitoring: Regulated teams record customer calls across dozens of markets and have to review all of them. Melia transcribes the full archive, including the language switches that leave monolingual transcripts with gaps, so reviews aren't built on partial records. It also labels the languages in each call, so every one can reach a reviewer who speaks them.
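As an illustration of how per-transcript language metadata could feed reporting and reviewer routing, here is a minimal sketch. The segment layout (`language`, `start_time`, `end_time`) and both helper names are hypothetical; the actual shape of Melia's language metadata is defined in the Speechmatics documentation and may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical segment shape: {"language": "es", "start_time": 0.0, "end_time": 4.0}
Segment = Dict[str, object]


def seconds_by_language(segments: Iterable[Segment]) -> Dict[str, float]:
    """Total speech duration per language, for reporting and call breakdowns."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[str(seg["language"])] += float(seg["end_time"]) - float(seg["start_time"])
    return dict(totals)


def can_review(reviewer_languages: Set[str], segments: Iterable[Segment]) -> bool:
    """True if the reviewer speaks every language that occurs in the call."""
    call_languages = {str(seg["language"]) for seg in segments}
    return call_languages <= reviewer_languages


# Invented example call that switches Spanish -> English -> Spanish.
call: List[Segment] = [
    {"language": "es", "start_time": 0.0, "end_time": 4.0},
    {"language": "en", "start_time": 4.0, "end_time": 6.5},
    {"language": "es", "start_time": 6.5, "end_time": 10.0},
]

print(seconds_by_language(call))
print(can_review({"es", "en"}, call), can_review({"en"}, call))
```

Requiring a reviewer to cover every language in the call is one possible policy; a team could equally route on the dominant language by duration.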

What's next?

Melia is early in its life and will keep improving quickly. Most of that work is on accuracy: building on the gains you see here, making code-switching sharper, and expanding features and functionality. Real-time is on the way too, first in preview and then in production, so the same model can handle live audio as well as files. Expect regular improvements rather than big releases.


Try it

Melia is live in your Speechmatics account today. Select Melia 1 in the Portal, or set melia-1 in your Batch API config. It's our lowest-priced model: as low as $0.129 per hour, with 10 hours per month free. Volume and enterprise pricing are available; talk to your account team.
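For orientation, here is a rough sketch of what building a Batch job config with `melia-1` might look like. The `type` and `transcription_config` keys follow the general shape of existing Batch API job configs, but the key that carries `melia-1` (shown here as `model`) is an assumption, as is omitting a language field. Check docs.speechmatics.com for the actual field names before using this.

```python
import json


def build_batch_config(model: str = "melia-1", model_key: str = "model") -> str:
    """Return a JSON job config string selecting the given model.

    ASSUMPTION: `model_key` is a placeholder for whichever config field
    selects Melia; the real field name is defined in the official docs.
    """
    config = {
        "type": "transcription",
        "transcription_config": {
            model_key: model,  # hypothetical placement of "melia-1"
        },
    }
    return json.dumps(config, indent=2)


print(build_batch_config())
```

The helper only builds the JSON; submitting it, along with the audio file and your API key, is left to the SDK or HTTP client you already use.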

Documentation, supported languages, and SDK code examples: docs.speechmatics.com.

Jun 17, 2026 | Read time 4 min
