When we talk about breakthroughs in healthcare AI, the headlines tend to focus on diagnostics like Microsoft's super-accurate AI, drug discovery like AI-designed antibiotics, or LLMs like "Dr. ChatGPT".
But often, the most meaningful gains happen further down the stack, where documentation meets reality.
That is the layer our new Medical Model upgrades.
Short on time? Here is the snapshot. Speechmatics' new Medical Model:
- 93% accuracy in clinical transcription (7% WER).
- 50% fewer errors on medical terms vs. the next best system.
- 96% keyword recall and 4% keyword error rate.
- Real-time-first design and consistent performance across batch and live workflows.
- Built for clinical messiness, with accent-independent recognition and speaker diarization.
- Expanded medical vocabulary with numeric and temporal formatting.
- Available now - try it in our Portal preview or via API (real-time and batch supported).
Clinical AI is moving from pilot to production, and the bottleneck is documentation that keeps pace with real conversations.
Physicians speak fast, patients interrupt, and acronyms collide with drug names. When the transcript wobbles, downstream tools misfire.
A recent study in NEJM Catalyst tracked over 7,000 physicians using AI scribes across 2.6 million clinical encounters in a single year. The results are telling:
- 15,700 hours of documentation time saved, equivalent to almost 1,800 workdays
- 84% of doctors said patient interactions improved
- 82% reported better job satisfaction

High‑volume users, particularly in emergency medicine, primary care, and mental health, saw the biggest gains. Even low‑frequency users reported measurable time savings, and not a single patient in the study reported a drop in care quality.
Transcription that understands clinical nuance is not a luxury but a multiplier, and that is the gap our Medical Model is built to close.
We ran side-by-side tests across multiple clinical datasets to measure what matters in practice.
Headline results:
- 93% general accuracy, measured as 7.27% WER.
- 96.0% medical keyword recall, so critical terms land in the transcript.
- 4.0% keyword error rate, which translates to fewer mistakes on diagnoses, drug names, and timelines.
- ~50% fewer keyword errors on clinical terms, and ~17% lower overall word error rate than the next best system.
| Model | KER | WER | Accuracy |
|---|---|---|---|
| Speechmatics Medical | 4.01% | 7.27% | 93% |
| ElevenLabs Scribe | 8.51% | 8.78% | 91% |
| Deepgram Nova‑3 Medical | 9.74% | 8.88% | 91% |
| AssemblyAI Standard | 11.42% | 9.21% | 91% |
| OpenAI Whisper‑1 | 12.46% | 11.10% | 89% |
| Microsoft Enhanced | 13.98% | 12.25% | 88% |
| Amazon Medical Dictation | 12.47% | 14.15% | 86% |
| Google Medical Dictation | 16.50% | 17.10% | 83% |
Across our test sets, Speechmatics leads both on general accuracy and clinical term handling.
Why KER matters: Clinical documentation rides on keywords. A missed allergy, an incorrect dosage, or a wrong laterality can derail care. Tracking keyword error rate alongside WER gives a clearer view of clinical safety, not just raw accuracy.
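To make the two metrics concrete, here is a minimal sketch of how WER and a keyword error rate can be computed. This uses one plausible occurrence-based definition of KER for illustration; the exact definitions behind the benchmark numbers above may differ.

```python
from collections import Counter

def edit_distance(ref_tokens, hyp_tokens):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, r in enumerate(ref_tokens, 1):
        cur = [i]
        for j, h in enumerate(hyp_tokens, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (or match)
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    return edit_distance(ref, hyp) / len(ref)

def keyword_error_rate(reference, hypothesis, keywords):
    """Fraction of keyword occurrences in the reference that the hypothesis
    failed to reproduce (a simple occurrence-count sketch, not the exact metric)."""
    kw = {k.lower() for k in keywords}
    ref_counts = Counter(t for t in reference.lower().split() if t in kw)
    hyp_counts = Counter(t for t in hypothesis.lower().split() if t in kw)
    total = sum(ref_counts.values())
    matched = sum(min(c, hyp_counts[k]) for k, c in ref_counts.items())
    return 1 - matched / total
```

The split matters: a transcript can have a respectable WER while still dropping the one token that changes clinical meaning, which is exactly what KER surfaces.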
Why are we pulling ahead? There are four main changes that matter most for clinical use:
- Vocabulary that speaks healthcare. Coverage for drug names, procedures, and clinical shorthand now lands reliably, including correct formatting for numbers, dosages, dates, and times. See our healthcare transcription support for more info.
- Real-time diarization that keeps up. The system distinguishes clinicians, patients, and family members in the room, even with background noise or rapid turn-taking. Notes are easier to attribute, and handovers are cleaner.
- Accent-independent by design. Healthcare is global. The model understands diverse accents and overlapping speech without forcing users to slow down or over-enunciate.
- Real-time first. You get consistent accuracy whether you are streaming live audio or processing files, so teams do not trade precision for speed.
Together these updates reduce cognitive load and help keep the record clean.
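As an illustration of what diarized output enables downstream, here is a small sketch that collapses word-level results into attributed speaker turns. The `words` structure and its field names are assumptions chosen for the example, not the exact API response schema.

```python
from itertools import groupby

# Illustrative word-level output with speaker labels and timings.
# Field names here are assumptions, not the actual response schema.
words = [
    {"word": "Any",         "speaker": "S1", "start": 0.0, "end": 0.2},
    {"word": "allergies?",  "speaker": "S1", "start": 0.2, "end": 0.7},
    {"word": "Just",        "speaker": "S2", "start": 1.1, "end": 1.3},
    {"word": "penicillin.", "speaker": "S2", "start": 1.3, "end": 1.9},
]

def to_turns(words):
    """Collapse consecutive words from the same speaker into labelled turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"] for w in group),
        })
    return turns
```

Turn-level attribution like this is what makes notes traceable: a scribe tool can show who asked about allergies and who answered, rather than a single undifferentiated stream.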
The new medical model is engineered for low latency and high throughput. It handles live dictation, in-room capture, and telemedicine sessions without choking on domain-specific language.
Batch workloads run at scale for backlogs and historical records. Developers get predictable performance and operational simplicity across deployment environments.
Our models are also built real-time first, so moving from file-based transcription to streaming does not mean an accuracy trade-off.
Here is what those gains mean in day to day work.
- For developers - real-time transcription that stands up to domain pressure. Clean timestamps and entity handling simplify downstream NLP.
- For clinicians - less screen time and more face time. Notes that reflect what was said rather than what the model guessed.
- For patients - a calmer room. The computer listens, records, and stays out of the way.
When the transcript is right, everything built on top works better.
Hands-on use is the best proof.
Test the Medical Model in the Speechmatics Portal preview, or integrate it directly via the API. Both real-time and batch are supported.
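For a feel of the integration surface, here is a hedged sketch of submitting a batch transcription job over HTTP. The endpoint URL, config keys, and response field below are assumptions modelled on typical REST transcription APIs; check the official API documentation for the real schema before using this.

```python
import json

# Assumed endpoint; verify against the current API docs.
API_URL = "https://asr.api.speechmatics.com/v2/jobs"

def build_config(language="en", diarization="speaker"):
    """Assemble a transcription job config as a plain dict.
    Key names are assumptions for illustration."""
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "diarization": diarization,
        },
    }

def submit_job(api_key, audio_path):
    """POST the audio file plus config and return the job id.
    Requires the third-party `requests` package (pip install requests)."""
    import requests
    with open(audio_path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"data_file": f},
            data={"config": json.dumps(build_config())},
        )
    resp.raise_for_status()
    return resp.json()["id"]  # assumed response field
```

The same config dict pattern extends naturally to a real-time session: the transport changes (WebSocket instead of a file POST), but the transcription settings stay the same shape.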
You can also see our healthcare language coverage via our docs.
If you are heading to HLTH USA in Las Vegas this October, come see it in action.
Bring your toughest audio. Tell us what success looks like and we will help you measure it.
Which languages are supported today? English is available now. Support aligns with our broader language coverage, including healthcare transcription. Contact us for roadmap details.
Does it handle speaker changes in busy rooms? Yes. Real-time speaker diarization separates speakers for cleaner attribution in clinical settings.
Where does the model fit in my stack? Use it for ambient scribing, clinician dictation, telemedicine, and call-based triage. Feed transcripts into EHRs, analytics, and LLM-powered assistants.
How does it deploy? Use our managed service or talk to us about enterprise options that meet your security and governance needs.
If you have a question not covered here, reach out and we will get you an answer.