Jan 7, 2026 | Read time 5 min

Speechmatics in 2025: The numbers that shaped Voice AI's breakthrough year (+ what’s to come in 2026)

The metrics, milestones, and real-world outcomes that defined Voice AI’s breakout year.
Lauren King, Chief Marketing Officer

Voice AI won in 2025. Not "showed promise" or "gained traction." Won. 

The market saw 22% of Y Combinator's latest cohort building voice-first companies, voice AI funding surging eightfold to $2.1 billion, and contact centers preparing for call volumes to hit 39 billion by 2029. With voice-enabled devices hitting 8.4 billion globally, voice shifted from "interesting capability" to operational infrastructure. 

That scale created real outcomes: our customers returned over 30 million minutes to the healthcare workforce, hit 21x ROI on autonomous documentation workflows, and scaled revenue 10x, then 10x again.

The difference between companies shipping reliable voice products and those stuck in demo mode? The foundational speech layer.

When accuracy failures cascade into compliance risks, when latency breaks conversation flow, when models choke on accented speech or domain terminology, the entire application collapses.

Speechmatics models powered the speech infrastructure behind these wins. Our mission – to understand every voice – took a significant step forward in 2025 as we scaled real-time accuracy, multilingual capability, and specialist models that work in the messiest, most demanding production environments.

Here's what that looked like in 2025.

4x real-time acceleration YoY: From broadcast origins to voice agent explosion

Real-time usage grew 4x from 2024 to 2025. Our infrastructure, which powers live captioning for the world's largest broadcasters – handling cross-talk, accents, and zero tolerance for on-screen errors – became the foundation for voice agents at production scale. AI-Media alone processed over 79 million caption minutes in FY25, a 49% increase year-over-year. The technology that passed broadcasting's stress test now powers voice agents handling customer service, healthcare documentation, and complex multilingual workflows.

9x growth and <250ms latency: Voice agents that work, in increasingly regulated use cases 

Voice agent usage grew 9x in 2025. That growth came from production deployments that worked in tricky, real-world environments – systems where speed and accuracy work together, not against each other.

Our real-time STT returns partial transcripts in under 250ms and detects end-of-speech in 400ms. Production voice agents hit 1 to 1.5s total response time. Sub-second speech recognition keeps conversations natural, but accuracy determines whether the agent completes the workflow or breaks trust.
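To make those numbers concrete, here's a minimal sketch of a voice agent's response-time budget after the user stops speaking. Only the 400ms end-of-speech figure comes from the text above; the LLM and TTS stage values are illustrative assumptions, not measured pipeline figures.

```python
# Hypothetical per-turn latency budget for a voice agent.
# Only end_of_speech (400ms) is a figure from this article; the
# LLM and TTS numbers are assumed for illustration.
BUDGET_MS = 1500  # upper end of the 1-1.5s target response time

stages_ms = {
    "end_of_speech": 400,     # end-of-turn detection (from the article)
    "llm_response": 700,      # assumed LLM time-to-first-token + generation
    "tts_first_audio": 250,   # assumed TTS time-to-first-byte
}

total = sum(stages_ms.values())
headroom = BUDGET_MS - total
print(f"total: {total}ms, headroom: {headroom}ms")
```

Note that partial transcripts (under 250ms) arrive while the user is still speaking, so they overlap with the utterance rather than adding to the post-utterance budget.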

Regulated sectors emerged as high-growth verticals. Financial services, debt collection, and insurance share one thing: regulatory pressure driving demand for accurate call documentation and interaction transparency. At this scale, consistency matters as much as peak performance.

Healthcare AI reaches clinical scale: 15x usage growth following focused medical training

One of the biggest wins this year was watching healthcare AI use cases scale. Usage of our medical models grew 15x year-over-year as developers moved beyond demos into live clinical workflows with ambient scribe and agentic use cases.

Over 90% of that growth was real-time, reflecting how healthcare AI shifted from batch processing to live workflows. Autonomous agents and ambient scribes moved from single-doctor pilots to enterprise scale, handling patient intake, clinical documentation, and insurance verification in real time with impressive ROI, returning millions of hours to the medical workforce.

Models trained on 16 billion words of clinical conversations delivered keyword error rates 70% lower than alternatives, with medical keyword recall hitting 96% in production. When one misheard medication name can create patient harm, medical-grade accuracy becomes a requirement, not a feature. That reliability enabled new healthcare AI companies to scale rapidly, building applications that finally worked in real medical contexts.
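Medical keyword recall can be read as a simple overlap metric: the fraction of clinically important terms in the reference that survive into the transcript. The sketch below is illustrative only – the function, term list, and transcript are made-up examples, and production scoring would use proper tokenization and normalization rather than substring matching.

```python
# Illustrative sketch of medical keyword recall: the fraction of
# reference medical terms that appear in the ASR output.
# Term list and transcript are invented examples.
def keyword_recall(reference_terms, hypothesis_text):
    hyp = hypothesis_text.lower()
    hits = sum(1 for term in reference_terms if term.lower() in hyp)
    return hits / len(reference_terms)

terms = ["metformin", "atorvastatin", "lisinopril", "warfarin"]
asr_output = "patient takes metformin and lisinopril daily, started warfarin"

print(keyword_recall(terms, asr_output))  # 3 of 4 terms found
```

A 96% recall on this metric means roughly one in twenty-five critical terms is missed, which is why it is tracked separately from overall word error rate.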

10x growth in Nordic RT and 6x growth in Arabic RT: Global innovators chase US AI trailblazers

Voice AI stopped being a US-only story in 2025. Innovation hotspots emerged in the Nordics and Middle East, pushing speech technology to accommodate how the rest of the world actually speaks.

Nine out of ten top Norwegian banks are deploying voice AI, and 118 municipalities share the same platform partner, Boost.ai – all requiring systems that work accurately across Finnish, Swedish, Norwegian, and Danish.

This complexity also showed up in a massive spike in Speechmatics-powered Arabic agent deployments. Platforms had to handle conversations in Gulf, Levantine, Egyptian, and Maghrebi dialects. Systems trained only on Modern Standard Arabic simply couldn't handle real-world conversations.

2 million+ laptop users: Speechmatics on-device deployment hits production scale

Millions of users now run Speechmatics locally. This milestone reflects our ability to deliver world-beating accuracy in a laptop-sized model, making a range of production use cases viable – from media editing through note-taking to medical scribes.

Shifting processing to the end user's laptop removes network latency, connectivity issues, and hosting costs.

Our new On Device model achieves accuracy within 10% of our server-grade models while running locally (and comfortably!) on a low-to-mid-spec laptop. We are not aware of any alternative that comes close.

Speech innovation in 2026

Our customers are pushing hard to scale impact in 2026: healthcare partners deploying across new regions with English-Arabic bilingual models, live media entering new markets with 55+ languages, contact centers automating complex interactions. 

The common thread: deeper integration into core workflows.

The innovation roadmap for 2026:

Continued latency reduction without sacrificing accuracy. Every millisecond compounds across thousands of daily interactions.

Multilingual as baseline, not feature. Code-switching mid-sentence is becoming the norm. Production systems must handle how humans actually speak.

Voice agents in regulated spaces. Compliance-heavy sectors adopting agents as essential infrastructure for documentation and audit requirements across financial services, insurance, and healthcare.

Flexible deployment. Edge, cloud, hybrid: seamless movement between modes for regulated industries and privacy-conscious workflows.

Voice embedded in workflows. Moving from interface to orchestration layer. Systems that listen becoming systems that act: triggering workflows, routing decisions, populating records, verifying identity.

The companies winning in voice AI are those building on reliable speech infrastructure.

2026 is going to be exciting...
