Jan 7, 2026 | Read time 5 min

Speechmatics in 2025: The numbers that shaped Voice AI's breakthrough year (+ what’s to come in 2026)

The metrics, milestones, and real-world outcomes that defined Voice AI’s breakout year.
Lauren King, Chief Marketing Officer

Voice AI won in 2025. Not "showed promise" or "gained traction." Won. 

The market saw 22% of Y Combinator's latest cohort building voice-first companies, voice AI funding surging eightfold to $2.1 billion, and contact centers preparing for call volumes to hit 39 billion by 2029. With voice-enabled devices hitting 8.4 billion globally, voice shifted from "interesting capability" to operational infrastructure. 

That scale created real outcomes: our customers returned over 30 million minutes to the healthcare workforce, hit 21x ROI on autonomous documentation workflows, and scaled revenue 10x, then 10x again.

The difference between companies shipping reliable voice products and those stuck in demo mode? The foundational speech layer.

When accuracy failures cascade into compliance risks, when latency breaks conversation flow, when models choke on accented speech or domain terminology, the entire application collapses.

Speechmatics models powered the speech infrastructure behind these wins. Our mission – to understand every voice – took a significant step forward in 2025 as we scaled real-time accuracy, multilingual capability, and specialist models that work in the messiest, most demanding production environments.

Here's what that looked like in 2025.

4x real-time acceleration YoY: From broadcast origins to voice agent explosion

Real-time usage grew 4x from 2024 to 2025. The infrastructure that powers live captioning for the world's largest broadcasters – handling cross-talk, accents, and zero tolerance for on-screen errors – became the foundation for voice agents at production scale. Our customer AI-Media alone processed over 79 million caption minutes in FY25, a 49% increase year-over-year. The technology that passed broadcasting's stress test now powers voice agents handling customer service, healthcare documentation, and complex multilingual workflows.

9x growth and <250ms latency: Voice agents that work in increasingly regulated use cases

Voice agent usage grew 9x in 2025. That growth came from production deployments that worked in tricky, real-world environments: systems where speed and accuracy work together, not against each other.

Our real-time STT returns partial transcripts in under 250ms and detects end-of-speech in 400ms, letting production voice agents hit 1 to 1.5s total response time. That sub-second speech processing keeps conversations natural, but accuracy determines whether the agent completes the workflow or breaks trust.
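To make those figures concrete, here is a minimal sketch of a voice agent's turn-latency budget. Only the ~250ms partial-transcript and 400ms end-of-speech numbers come from the measurements above; the language model and text-to-speech figures are illustrative assumptions, not measured values.

```python
# Minimal sketch of a voice agent latency budget.
# Only the 400ms end-of-speech and ~250ms partial-transcript figures are
# quoted above; the LLM and TTS numbers are assumptions for illustration.

END_OF_SPEECH_MS = 400     # silence needed before the agent decides the user has finished
LLM_FIRST_TOKEN_MS = 500   # assumed: time for the language model to start a reply
TTS_FIRST_AUDIO_MS = 300   # assumed: time to synthesize the first audio chunk

# Partial transcripts arriving in ~250ms let the agent build context while the
# user is still speaking, so the serial path after they stop is roughly:
turn_latency_ms = END_OF_SPEECH_MS + LLM_FIRST_TOKEN_MS + TTS_FIRST_AUDIO_MS
print(f"Estimated response time: ~{turn_latency_ms} ms")  # ~1200 ms, inside the 1-1.5s range
```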

Regulated sectors emerged as high-growth verticals. Financial services, debt collection, and insurance share one thing: regulatory pressure driving demand for accurate call documentation and interaction transparency. At this scale, consistency matters as much as peak performance.

Healthcare AI reaches clinical scale: 15x usage growth following focused medical training

One of the biggest wins this year was the scale, growth, and innovation in healthcare AI use cases. Usage of our medical models grew 15x year-over-year as developers moved beyond demos into live clinical workflows with ambient scribe and agentic use cases.

Over 90% of that growth was real-time, reflecting how healthcare AI shifted from batch processing to live workflows. Autonomous agents and ambient scribes moved from single-doctor pilots to enterprise scale, handling patient intake, clinical documentation, and insurance verification in real time, with impressive ROI and millions of hours returned to the medical workforce.

Models trained on 16 billion words of clinical conversations delivered keyword error rates 70% lower than alternatives, with medical keyword recall hitting 96% in production. When one misheard medication name can cause patient harm, medical accuracy becomes a requirement, not a feature. That reliability enabled new healthcare AI companies to scale rapidly, building applications that finally worked in real medical contexts.
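For readers unfamiliar with the metric, keyword recall is simply the share of critical terms in the reference transcript that the ASR output also contains. A toy sketch follows – the keyword list and transcripts are made up for illustration, and this is not our evaluation code:

```python
# Toy illustration of keyword recall: the fraction of clinically important
# terms in the reference transcript that also appear in the ASR hypothesis.

def keyword_recall(reference: str, hypothesis: str, keywords: set[str]) -> float:
    ref_terms = [w for w in reference.lower().split() if w in keywords]
    hyp_words = set(hypothesis.lower().split())
    if not ref_terms:
        return 1.0  # nothing critical to recall
    hits = sum(1 for w in ref_terms if w in hyp_words)
    return hits / len(ref_terms)

keywords = {"metformin", "lisinopril", "hypertension"}        # assumed keyword list
ref = "patient takes metformin and lisinopril for hypertension"
hyp = "patient takes metformin and lisinopril for high blood pressure"
print(f"keyword recall: {keyword_recall(ref, hyp, keywords):.0%}")  # 67% in this toy example
```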

10x growth in Nordic RT and 6x growth in Arabic RT: Global innovators chase US AI trailblazers

Voice AI stopped being a US-only story in 2025. Innovation hotspots emerged in the Nordics and Middle East, pushing speech technology to accommodate how the rest of the world actually speaks.

Nine out of ten top Norwegian banks are deploying voice AI, and 118 municipalities share the same platform partner, Boost.ai – all requiring systems that work accurately across Finnish, Swedish, Norwegian, and Danish.

This complexity was also seen in a massive spike in Speechmatics-powered Arabic agent deployments. Platforms had to handle conversations in Gulf, Levantine, Egyptian, and Maghrebi dialects. Systems trained only on Modern Standard Arabic simply couldn't handle real-world conversations.

2 million+ laptop users: Speechmatics on-device deployment hits production scale

Millions of users now run Speechmatics locally. This milestone reflects how we've been able to offer world-beating accuracy in a laptop-sized model, which has made a range of production use cases viable – from media editing through to note-taking and medical scribes.

Shifting processing to the end user's laptop removes network latency, connectivity issues, and hosting costs.

Our new On Device model achieves accuracy within 10% of our server-grade models while running locally (and comfortably!) on a low-to-mid spec laptop. We are not aware of any alternative that comes close.

Speech innovation in 2026

Our customers are pushing hard to scale impact in 2026: healthcare partners deploying across new regions with English-Arabic bilingual models, live media entering new markets with 55+ languages, contact centers automating complex interactions. 

The common thread: deeper integration into core workflows.

The innovation roadmap for 2026:

Continued latency reduction without sacrificing accuracy. Every millisecond compounds across thousands of daily interactions.

Multilingual as baseline, not feature. Code-switching mid-sentence is becoming the norm. Production systems must handle how humans actually speak.

Voice agents in regulated spaces. Compliance-heavy sectors adopting agents as essential infrastructure for documentation and audit requirements across financial services, insurance, and healthcare.

Flexible deployment. Edge, cloud, hybrid: seamless movement between modes for regulated industries and privacy-conscious workflows.

Voice embedded in workflows. Moving from interface to orchestration layer. Systems that listen becoming systems that act: triggering workflows, routing decisions, populating records, verifying identity.

The companies winning in voice AI are those building on reliable speech infrastructure.

2026 is going to be exciting...