Jan 7, 2026 | Read time 5 min

Speechmatics in 2025: The numbers that shaped Voice AI's breakthrough year (+ what’s to come in 2026)

The metrics, milestones, and real-world outcomes that defined Voice AI’s breakout year.
Lauren King, Chief Marketing Officer

Voice AI won in 2025. Not "showed promise" or "gained traction." Won. 

The market saw 22% of Y Combinator's latest cohort building voice-first companies, voice AI funding surging eightfold to $2.1 billion, and contact centers preparing for call volumes to hit 39 billion by 2029. With voice-enabled devices hitting 8.4 billion globally, voice shifted from "interesting capability" to operational infrastructure. 

That scale created real outcomes: our customers returned over 30 million minutes to the healthcare workforce, hit 21x ROI on autonomous documentation workflows, and scaled revenue 10x, then 10x again.

The difference between companies shipping reliable voice products and those stuck in demo mode? The foundational speech layer.

When accuracy failures cascade into compliance risks, when latency breaks conversation flow, when models choke on accented speech or domain terminology, the entire application collapses.

Speechmatics models powered the speech infrastructure behind these wins. Our mission – to understand every voice – took a significant step forward in 2025 as we scaled real-time accuracy, multilingual capability, and specialist models that work in the messiest, most demanding production environments.

Here's what that looked like in 2025.

4x real-time acceleration YoY: From broadcast origins to voice agent explosion

Real-time usage grew 4x from 2024 to 2025. Our infrastructure, which powers live captioning for the world's largest broadcasters – handling cross-talk, accents, and zero tolerance for on-screen errors – became the foundation for voice agents at production scale. AI-Media alone processed over 79 million caption minutes in FY25, a 49% increase year-over-year. The technology that passed broadcasting's stress test now powers voice agents handling customer service, healthcare documentation, and complex multilingual workflows.

9x growth and <250ms latency: Voice agents that work, in increasingly regulated use cases 

Voice agent usage grew 9x in 2025. That growth came from production deployments that worked in tricky, real-world environments – systems where speed and accuracy work together, not against each other.

Our real-time STT returns partial transcripts in under 250ms and detects end-of-speech in 400ms. Production voice agents hit 1 to 1.5s total response time. Sub-second speech recognition keeps conversations natural, but accuracy determines whether the agent completes the workflow or breaks trust.
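To make those numbers concrete, here's a minimal sketch of a voice agent's response-time budget after the user stops speaking. Only the 400ms end-of-speech figure comes from the text above; the LLM and TTS stage values are illustrative assumptions, not measured pipeline figures.

```python
# Hypothetical per-turn latency budget for a voice agent.
# Only end_of_speech (400ms) is a figure from this article; the
# LLM and TTS numbers are assumed for illustration.
BUDGET_MS = 1500  # upper end of the 1-1.5s target response time

stages_ms = {
    "end_of_speech": 400,     # end-of-turn detection (from the article)
    "llm_response": 700,      # assumed LLM time-to-first-token + generation
    "tts_first_audio": 250,   # assumed TTS time-to-first-byte
}

total = sum(stages_ms.values())
headroom = BUDGET_MS - total
print(f"total: {total}ms, headroom: {headroom}ms")
```

Note that partial transcripts (under 250ms) arrive while the user is still speaking, so they overlap with the utterance rather than adding to the post-utterance budget.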

Regulated sectors emerged as high-growth verticals. Financial services, debt collection, and insurance share one thing: regulatory pressure driving demand for accurate call documentation and interaction transparency. At this scale, consistency matters as much as peak performance.

Healthcare AI reaches clinical scale: 15x usage growth following focused medical training

One of the biggest wins this year was watching healthcare AI use cases scale. Usage of our medical models grew 15x year-over-year as developers moved beyond demos into live clinical workflows with ambient scribe and agentic use cases.

Over 90% of that growth was real-time, reflecting how healthcare AI shifted from batch processing to live workflows. Autonomous agents and ambient scribes moved from single-doctor pilots to enterprise scale, handling patient intake, clinical documentation, and insurance verification in real time with impressive ROI, returning millions of hours to the medical workforce.

Models trained on 16 billion words of clinical conversations delivered keyword error rates 70% lower than alternatives, with medical keyword recall hitting 96% in production. When one misheard medication name can create patient harm, medical-grade accuracy becomes a requirement, not a feature. That reliability enabled new healthcare AI companies to scale rapidly, building applications that finally worked in real medical contexts.
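Medical keyword recall can be read as a simple overlap metric: the fraction of clinically important terms in the reference that survive into the transcript. The sketch below is illustrative only – the function, term list, and transcript are made-up examples, and production scoring would use proper tokenization and normalization rather than substring matching.

```python
# Illustrative sketch of medical keyword recall: the fraction of
# reference medical terms that appear in the ASR output.
# Term list and transcript are invented examples.
def keyword_recall(reference_terms, hypothesis_text):
    hyp = hypothesis_text.lower()
    hits = sum(1 for term in reference_terms if term.lower() in hyp)
    return hits / len(reference_terms)

terms = ["metformin", "atorvastatin", "lisinopril", "warfarin"]
asr_output = "patient takes metformin and lisinopril daily, started warfarin"

print(keyword_recall(terms, asr_output))  # 3 of 4 terms found
```

A 96% recall on this metric means roughly one in twenty-five critical terms is missed, which is why it is tracked separately from overall word error rate.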

10x growth in Nordic RT and 6x growth in Arabic RT: Global innovators chase US AI trailblazers

Voice AI stopped being a US-only story in 2025. Innovation hotspots emerged in the Nordics and Middle East, pushing speech technology to accommodate how the rest of the world actually speaks.

Nine out of ten top Norwegian banks are deploying voice AI, and 118 municipalities share the same platform partner, Boost.ai – all requiring systems that work accurately across Finnish, Swedish, Norwegian, and Danish.

This complexity also showed up in a massive spike in Speechmatics-powered Arabic agent deployments. Platforms had to handle conversations in Gulf, Levantine, Egyptian, and Maghrebi dialects. Systems trained only on Modern Standard Arabic simply couldn't handle real-world conversations.

2 million+ laptop users: Speechmatics on-device deployment hits production scale

Millions of users now run Speechmatics locally. This milestone reflects our ability to deliver world-beating accuracy in a laptop-sized model, making a range of production use cases viable – from media editing through note-taking to medical scribes.

Shifting processing to the end user's laptop removes network latency, connectivity issues, and hosting costs.

Our new On Device model achieves accuracy within 10% of our server-grade models while running locally (and comfortably!) on a low-to-mid-spec laptop. We are not aware of any alternative that comes close.

Speech innovation in 2026

Our customers are pushing hard to scale impact in 2026: healthcare partners deploying across new regions with English-Arabic bilingual models, live media entering new markets with 55+ languages, contact centers automating complex interactions. 

The common thread: deeper integration into core workflows.

The innovation roadmap for 2026:

Continued latency reduction without sacrificing accuracy. Every millisecond compounds across thousands of daily interactions.

Multilingual as baseline, not feature. Code-switching mid-sentence is becoming the norm. Production systems must handle how humans actually speak.

Voice agents in regulated spaces. Compliance-heavy sectors adopting agents as essential infrastructure for documentation and audit requirements across financial services, insurance, and healthcare.

Flexible deployment. Edge, cloud, hybrid: seamless movement between modes for regulated industries and privacy-conscious workflows.

Voice embedded in workflows. Moving from interface to orchestration layer. Systems that listen becoming systems that act: triggering workflows, routing decisions, populating records, verifying identity.

The companies winning in voice AI are those building on reliable speech infrastructure.

2026 is going to be exciting...
