Sep 15, 2025 | Read time 4 min

Speechmatics sets record in medical Speech-to-Text with 93% accuracy, powered by NVIDIA

Next-gen medical transcription model cuts errors, boosts accuracy, and brings real-time reliability to clinical documentation.
Speechmatics Editorial Team

Cambridge, UK – September 15, 2025 – Speechmatics today launched a next-generation Medical Speech-to-Text (STT) model for clinical transcription, reaching 93% general real-world accuracy and outperforming peers with 50% fewer errors on medical terms. The model is now also available in Spanish, French, Dutch and Finnish.

Engineered for the speed and complexity of care, the model extends coverage of medical care and pharmaceutical terms and is optimized for rapid, multi-speaker dialogue. The result: cleaner notes, fewer corrections, and a clearer record of each encounter. 

“Our goal is simple: build speech tech clinicians can trust in the messiness of real-world practice. When every voice is understood, the experience feels human again.

This is why we build, so teams can focus on what matters.”

Katy Wigdahl, CEO, Speechmatics

Available in both batch and real-time modes, this release brings consistently high performance to AI scribe and dictation-driven workflows.

Healthcare is rapidly shifting toward ambient documentation at scale. A recent NEJM Catalyst study noted widespread use of AI scribe technology, reporting over 15,700 hours of documentation time saved across 2.6 million patient visits.  

Speechmatics' new medical model is designed to accelerate this shift, providing higher transcription accuracy with robust performance across clinical settings.

Clinical-grade recognition for real-world messiness 

Healthcare speech is complex and chaotic. This new model is built for it. It's accent-independent by design, recognizing diverse voices, fast turn-taking, and shifting context. 

Real-time speaker diarization distinguishes clinicians, patients, and family members, ensuring clear attribution even with background noise or interruptions. 

Expanded medical vocabulary includes drug names, dosages, procedures, plus numerical and temporal formatting for structured outputs. This reduces manual correction and improves clarity at handover. 

“Out of the box, it was extremely accurate on medication terms and dosages, separated speakers cleanly, and held up in noisy rooms. That reliability is why we use it.

Accent handling and speaker differentiation have also made a huge difference to our teams.”

Karan Wallia, CEO, Nordhealth Therapy

How it stacks up: benchmark-beating accuracy 

Speechmatics’ new enhanced medical model cuts errors where they matter most: 50% fewer keyword errors on clinical terms and 17% fewer overall word errors than the next best system.

In the latest benchmarking, Speechmatics achieved:

  • 93% general accuracy (7% WER, 17% lower than the next best vendor) 

  • 96% medical keyword recall

  • 4% keyword error rate (half the number of keyword errors on medical terminology vs. the next best vendor) 

These results place Speechmatics significantly ahead of other leading vendors (peer accuracy range: 74-91%; next best: 91%).

Unlike other providers, Speechmatics builds its models real-time first, so switching from file-based transcription to real-time no longer has to mean an accuracy trade-off.

Keyword Error Rate (KER) is a crucial metric in clinical domains, measuring the system’s ability to recognize key medical terms correctly.

With a KER of just 4%, Speechmatics outperformed all evaluated systems, helping ensure that critical information such as diagnoses, dosages, and timelines is captured accurately.
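To make the two metrics concrete, here is a minimal Python sketch of how WER and a keyword error rate could be computed for a single transcript. The exact scoring rules behind the published figures are not stated in this release, so the definitions below are assumptions: WER is taken as word-level edit distance divided by reference length, and KER as the share of reference keyword occurrences missing from the transcript (the complement of keyword recall, which matches the 96% recall and 4% KER quoted above). The function names, the example sentence, and the keyword list are all illustrative.

```python
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple normalizer)."""
    return text.lower().split()


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion (reference word dropped)
                            curr[j - 1] + 1,      # insertion (extra hypothesis word)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref = tokenize(reference)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, tokenize(hypothesis)) / len(ref)


def keyword_error_rate(reference: str, hypothesis: str, keywords: Set[str]) -> float:
    """Assumed definition: fraction of reference keyword occurrences
    that do not appear in the hypothesis (1 - keyword recall)."""
    ref_keywords = [w for w in tokenize(reference) if w in keywords]
    if not ref_keywords:
        raise ValueError("reference contains none of the keywords")
    remaining = tokenize(hypothesis)
    missed = 0
    for word in ref_keywords:
        if word in remaining:
            remaining.remove(word)  # each hypothesis word can match only once
        else:
            missed += 1
    return missed / len(ref_keywords)


# Illustrative example: the drug name is split into two wrong words.
reference = "patient takes metformin 500 mg twice daily"
hypothesis = "patient takes met forming 500 mg twice daily"
medical_terms = {"metformin", "mg"}  # hypothetical keyword list

print(f"WER: {wer(reference, hypothesis):.1%}")
print(f"KER: {keyword_error_rate(reference, hypothesis, medical_terms):.1%}")
```

In this example a single misrecognized drug name costs two word errors out of seven words, but one of only two keywords, which is why a keyword-focused metric can diverge sharply from overall WER on clinical audio.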

Built on NVIDIA, ready for scale

The model is deployed using NVIDIA GPUs, optimized through NVIDIA Triton Inference Server and CUDA acceleration. This enables high-throughput, low-latency processing at scale, with deployment flexibility across data center, private cloud, and edge environments.

This architecture supports rapid customization and horizontal scaling for diverse healthcare deployments—from telehealth platforms and contact centers to EHR-connected scribes and bedside tools.

Availability: The upgraded Medical Model for real-time is available now in preview via the Speechmatics Portal and API. For batch trials, please contact the Speechmatics team, referencing Batch Medical Model. English is supported at launch, with additional languages rolling out across our 55-language portfolio. The model will be demonstrated live at HLTH US in Las Vegas, October 19-22, 2025.

About Speechmatics 

Speechmatics provides enterprise-grade speech recognition technology trusted by global brands to power voice experiences that work in the real world. Its Speech-to-Text (STT) API turns messy, multilingual, multi-speaker audio into structured, accurate text, in real time and with minimal latency.

Built for developers and scaled for enterprises, Speechmatics technology integrates seamlessly into existing workflows and platforms, with flexible deployment across cloud, on-prem, or edge environments. Its STT models understand 55+ languages, adapt to accents, and handle overlapping speech with precision.

Speechmatics powers leading technology providers such as AI Media, Content Guru, and Nordhealth Therapy across industries including healthcare, media, contact centers, voice agents and AI-driven workflows. It also partners with Voice AI platforms like LiveKit and Pipecat to help application builders scale. Speechmatics is headquartered in Cambridge and London.

Contact: events@speechmatics.com
6th Floor, Classic House, 174-180 Martha's Buildings, Old St, London, EC1V 9BP
