Cambridge, UK – September 15, 2025 — Speechmatics today launched a next-generation Medical Speech-to-Text (STT) model for clinical transcription, reaching 93% general real-world accuracy and outperforming peers with 50% fewer errors on medical terms.
Engineered for the speed and complexity of care, the model extends coverage of medical care and pharmaceutical terms and is optimized for rapid, multi-speaker dialogue. The result: cleaner notes, fewer corrections, and a clearer record of each encounter.
“Our goal is simple: build speech tech clinicians can trust in the messiness of real-world practice. When every voice is understood, the experience feels human again.
This is why we build, so teams can focus on what matters.”
Katy Wigdahl, CEO, Speechmatics
Available in both batch and real-time modes, this release brings consistent high performance to AI scribe and dictation-driven workflows.
Healthcare is rapidly shifting toward ambient documentation at scale. A recent NEJM Catalyst study noted widespread use of AI scribe technology, reporting over 15,700 hours of documentation time saved across 2.6 million patient visits.
Speechmatics' new medical model is designed to accelerate this shift, providing higher transcription accuracy with robust performance across clinical settings.
Healthcare speech is complex and chaotic. This new model is built for it. It's accent-independent by design, recognizing diverse voices, fast turn-taking, and shifting context.
Real-time speaker diarization distinguishes clinicians, patients, and family members, ensuring clear attribution even with background noise or interruptions.
Expanded medical vocabulary includes drug names, dosages, procedures, plus numerical and temporal formatting for structured outputs. This reduces manual correction and improves clarity at handover.
“Out of the box, it was extremely accurate on medication terms and dosages, separated speakers cleanly, and held up in noisy rooms. That reliability is why we use it.
Accent handling and speaker differentiation have also made a huge difference to our teams.”
Karan Wallia, CEO, Nordhealth Therapy
Speechmatics’ new enhanced medical model cuts errors where they matter most: 50% fewer keyword errors on clinical terms and 17% lower overall word errors than the next best system.
In the latest benchmarking, Speechmatics achieved:
93% general accuracy (7% WER, 17% lower than the next best vendor)
96% medical keyword recall
4% keyword error rate (half the number of keyword errors on medical terminology vs. the next best vendor)
These results place Speechmatics significantly ahead of other leading vendors (peer accuracy range: 74-91%; next best: 91%).
Unlike other providers, Speechmatics models are built real-time first, meaning that switching from file-based transcription to real-time no longer requires an accuracy trade-off.
Keyword Error Rate (KER) is a crucial metric in clinical domains, measuring the system’s ability to recognize key medical terms correctly.
With a KER of just 4%, Speechmatics outperformed all evaluated systems, helping ensure that critical information such as diagnoses, dosages, and timelines is captured accurately.
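For readers who want to see how these two metrics relate, the following is a minimal illustrative sketch in Python, not the benchmarking methodology behind the figures above. It assumes that word error rate (WER) is the word-level edit distance divided by the number of reference words, and that KER is the share of reference keyword occurrences missing from the transcript (that is, one minus keyword recall, which matches the 96% recall and 4% KER reported here). The function names, the sample sentences, and the single-word keyword matching are all assumptions made for illustration.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def keyword_error_rate(reference: str, hypothesis: str, keywords: set) -> float:
    """KER = missed keyword occurrences / keyword occurrences in the reference.

    Assumption: keywords are single lowercase words, and a keyword counts as
    recognized if it appears in the hypothesis at least as often as in the
    reference. Word position is ignored.
    """
    ref_counts = Counter(w for w in reference.lower().split() if w in keywords)
    hyp_counts = Counter(w for w in hypothesis.lower().split() if w in keywords)
    total = sum(ref_counts.values())
    if total == 0:
        raise ValueError("reference contains none of the keywords")
    missed = sum(max(n - hyp_counts[w], 0) for w, n in ref_counts.items())
    return missed / total


# Hypothetical example: one drug name is misrecognized as two ordinary words.
reference = "patient takes metformin 500 mg twice daily and lisinopril 10 mg"
hypothesis = "patient takes metformin 500 mg twice daily and listen pro 10 mg"
keywords = {"metformin", "lisinopril"}

print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
print(f"KER: {keyword_error_rate(reference, hypothesis, keywords):.2%}")
```

In this made-up example the transcript contains two word errors against eleven reference words, yet it misses one of the two drug names. The gap between the two numbers is why a keyword-focused metric is reported alongside overall accuracy in clinical settings.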
The model is deployed using NVIDIA GPUs, optimized through NVIDIA Triton Inference Server and CUDA acceleration. This enables high-throughput, low-latency processing at scale, with deployment flexibility across data center, private cloud, and edge environments.
This architecture supports rapid customization and horizontal scaling for diverse healthcare deployments—from telehealth platforms and contact centers to EHR-connected scribes and bedside tools.
Available now: The upgraded Medical Model for Real-time is available in preview via the Speechmatics Portal and API. For batch trials, please contact the Speechmatics team referencing Batch Medical Model. English is supported at launch, with additional languages rolling out across our 55-language portfolio. The model will be demonstrated live at HLTH US in Las Vegas, October 19-22, 2025.
Speechmatics provides enterprise-grade speech recognition technology trusted by global brands to power voice experiences that work in the real world. Its Speech-to-Text (STT) API turns messy, multilingual, multi-speaker audio into structured, accurate text in real time and with minimal latency.
Built for developers and scaled for enterprises, Speechmatics technology integrates seamlessly into existing workflows and platforms, with flexible deployment across cloud, on-prem, or edge environments. Its STT models understand 55+ languages, adapt to accents, and handle overlapping speech with precision.
Speechmatics powers leading technology providers such as AI Media, Content Guru, and Nordhealth Therapy across industries including healthcare, media, contact centers, voice agents, and AI-driven workflows. It also partners with Voice AI platforms like LiveKit and Pipecat to help application builders scale. Speechmatics is headquartered in Cambridge and London.
Contact: events@speechmatics.com 6th Floor, Classic House, 174-180 Martha's Buildings, Old St London, EC1V 9BP