Sep 9, 2025 | Read time 4 min

How we built real-time concurrency for Voice AI at scale

Why supporting thousands of parallel sessions is more than a backend problem.

Owen O'Loan, Director of Engineering Operations

TL;DR

  • Session-based concurrency ≠ request throughput. It’s about supporting thousands of persistent audio streams in parallel, often for hours at a time.

  • It’s a product-critical experience factor, not just an engineering concern. One missed session is one failed product moment.

  • At Speechmatics, we’ve built real-time concurrency into our architecture from the start. From 100 millisecond start times to multi-day persistent sessions, we’ve learned how to scale for bursty, real-world usage.

Each week, my team at Speechmatics processes millions of hours of conversation. That includes meetings, customer service calls, medical consultations and voice assistant interactions. Much of that happens in real-time, and that share is growing. Whether live or post-recorded, all of it depends on systems that can handle pressure without breaking.

To make this work, the speed and accuracy of our world-leading speech recognition models are fundamental. It's also critical to ensure that our models are available instantly when our customers need them: hence concurrency.

Concurrency means handling many live speech sessions at the same time, with each one starting immediately and continuing smoothly, often for hours. 

It’s the difference between a demo that runs smoothly and a product that delivers at scale. Without it, even high-performance models fall short when demand spikes.

In this post, we’ll unpack how we’ve built session-based concurrency into Speechmatics' architecture—what it is, why it matters, and what we’ve learned from supporting thousands of live voice sessions in parallel, every single day.

Why concurrency in Voice AI is a different beast

In traditional web services, concurrency usually means handling more requests per second. But in real-time voice AI, concurrency means something else entirely: supporting thousands of long-running, persistent audio streams, and doing it reliably.

When a live meeting starts, or a customer calls a helpline, the transcription needs to start instantly. There’s no loading screen, no buffer. And it needs to keep running for hours, sometimes even days, without breaking.

| Request-Based Concurrency | Session-Based Concurrency |
| --- | --- |
| Short bursts (e.g. API calls) | Long, persistent streams |
| Easy to load-balance | Requires session state |
| Retry on failure is feasible | Must remain uninterrupted |
| Scaling by request volume | Scaling by number of active sessions |
| Stateless | Stateful |
| Low individual request duration | Sessions can last minutes to days |
| Traditional backend logic applies | Needs concurrency-aware orchestration |

This is what we mean when we say concurrency is non-negotiable in voice AI.
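The distinction above can be made concrete with a toy model. The sketch below (hypothetical names, not Speechmatics' actual code) shows why capacity in a session-based system is measured in simultaneously held slots rather than requests per second: a live stream occupies its slot for its entire lifetime.

```python
import time

class SessionPool:
    """Toy model of session-based concurrency: capacity is counted in
    simultaneously active streams, not requests per second."""

    def __init__(self, max_sessions: int):
        self.max_sessions = max_sessions
        self.active = {}  # session_id -> start timestamp

    def start(self, session_id: str) -> bool:
        # A new live stream is admitted only if a slot is free; unlike a
        # short request, it then holds that slot until the stream ends.
        if len(self.active) >= self.max_sessions:
            return False
        self.active[session_id] = time.monotonic()
        return True

    def end(self, session_id: str) -> float:
        # Releasing the slot returns how long the stream was held.
        started = self.active.pop(session_id)
        return time.monotonic() - started

pool = SessionPool(max_sessions=2)
assert pool.start("meeting-1")
assert pool.start("call-1")
assert not pool.start("call-2")   # at capacity: third stream is refused
pool.end("meeting-1")
assert pool.start("call-2")       # slot freed, new stream admitted
```

Retrying a failed request is cheap; "retrying" a dropped hour-long stream is a failed product moment, which is why the slots above must stay held and healthy.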

Concurrency is a product experience, not just infrastructure

At Speechmatics, many of the conversations we process are real-time sessions that span anywhere from 30 seconds to 48 hours. In fact, one of our longest sessions ran unbroken for over 100 days.

That kind of scale demands an entirely different engineering mindset. Where traditional applications expect short bursts of user activity, our workloads are long-lived, persistent, and unpredictable. We designed our infrastructure to match.

[Chart: Real-time session length vs total hours processed]

💡Session-based concurrency is fundamentally different from request-based concurrency.

💡Scaling for live speech isn’t just about handling volume but about handling time.

Scaling pains and lessons learned

The biggest lessons came from our customers.

In healthcare, session demand is tied to real-world rhythms: clinics open at 9am, emergency care spikes after hours. In media and meetings, concurrency can double in seconds, based on breaking news or a global webinar launch.

That meant we had to move beyond conservative limits. We evolved our quotas, tuned our autoscaling, and adjusted orchestration logic so customers never feel like they’re outgrowing us. Our system is designed to flex before the customer even notices a surge.

💡If you wait for demand to hit you, you’re already too late.
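One way to read "flex before the customer notices" is a headroom-based scaling policy: add capacity whenever free session slots fall below a threshold, rather than waiting for saturation. The sketch below is illustrative only; the thresholds and step size are assumptions, not Speechmatics' actual autoscaling parameters.

```python
def target_capacity(active_sessions: int, capacity: int,
                    headroom: float = 0.3, step: int = 10) -> int:
    """Scale up *before* demand hits the ceiling: whenever free slots
    drop below the headroom fraction of capacity, add a fixed step.
    (Hypothetical policy for illustration.)"""
    free = capacity - active_sessions
    if free < capacity * headroom:
        return capacity + step
    return capacity

# Plenty of headroom (50 free slots >= 30): hold capacity steady.
assert target_capacity(active_sessions=50, capacity=100) == 100
# Headroom breached (25 free slots < 30): scale up before saturation.
assert target_capacity(active_sessions=75, capacity=100) == 110
```

Because sessions are long-lived, active-session counts decay slowly; a burst that would be invisible in a request-rate metric can pin capacity for hours, which is why the trigger here is free slots, not request volume.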

How we built it

To support persistent sessions at scale, our architecture is built around a 'real-time first' principle.

From day one, we optimized our system for:

  • Ultra-low latency models and infrastructure

  • Fast, flexible session starts (often under 100 milliseconds)

  • High-availability orchestration that handles bursty demand

  • Session persistence across hours, days, or however long it takes


Here’s how it works:

  1. A user requests a new session

  2. Our load balancer routes it based on current demand

  3. A concurrency-aware orchestration layer allocates the right resources

  4. A persistent STT engine instance takes over, managing the stream from start to finish
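The four steps above can be sketched as a minimal asyncio flow. All names here (SttEngine, Orchestrator) are hypothetical stand-ins for illustration, not Speechmatics' real components; the "recognition" step is simulated.

```python
import asyncio

class SttEngine:
    """Step 4: a persistent engine instance owns one stream start to finish."""
    def __init__(self, name: str):
        self.name = name
        self.busy = False

    async def run_session(self, chunks):
        self.busy = True
        transcript = []
        async for chunk in chunks:
            transcript.append(chunk.upper())  # stand-in for real recognition
        self.busy = False
        return " ".join(transcript)

class Orchestrator:
    """Step 3: concurrency-aware allocation picks an idle engine, or rejects."""
    def __init__(self, engines):
        self.engines = engines

    def allocate(self):
        for engine in self.engines:
            if not engine.busy:
                return engine
        return None

async def audio(*chunks):
    # Simulated live audio stream.
    for c in chunks:
        yield c

async def main():
    orch = Orchestrator([SttEngine("engine-a"), SttEngine("engine-b")])
    engine = orch.allocate()  # steps 1-3: session requested, routed, allocated
    return await engine.run_session(audio("hello", "world"))  # step 4

print(asyncio.run(main()))  # HELLO WORLD
```

The key property the sketch preserves is ownership: once allocated, one engine instance holds the stream for its full lifetime, so load balancing happens at admission time, not per chunk.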

💡Real-world concurrency means treating every live session like a critical connection, not just a spike in throughput.

What this means for builders

If you're building a voice-powered product (anything from transcription tooling to live subtitling or agent support) concurrency should be a day-one conversation.

Here's why:

  • You can’t fake real-time.

  • Your users won’t wait.

  • Your architecture needs to scale before you scale.

Pick the wrong Speech-to-Text provider, and you’ll hit the ceiling before you hit product-market fit.
