When we talk about breakthroughs in healthcare AI, the headlines tend to focus on diagnostics like Microsoft's super-accurate AI, drug discovery like AI-designed antibiotics, or LLMs like "Dr. ChatGPT".
But often, the most meaningful gains happen further down the stack, where documentation meets reality.
That is the layer our new Medical Model upgrades.
Short on time? Here is the snapshot. Speechmatics' new Medical Model:
- 93% accuracy in clinical transcription (7% WER).
- 50% fewer errors on medical terms vs. the next best system.
- 96% keyword recall and 4% keyword error rate.
- Real-time-first design and consistent performance across batch and live workflows.
- Built for clinical messiness, with accent-independent recognition and speaker diarization.
- Expanded medical vocabulary with numeric and temporal formatting.
- Available now - try it in our Portal preview or via API (real-time and batch supported).
Clinical AI is moving from pilot to production, and the bottleneck is documentation that keeps pace with real conversations.
Physicians speak fast, patients interrupt, and acronyms collide with drug names. When the transcript wobbles, downstream tools misfire.
A recent study in NEJM Catalyst tracked over 7,000 physicians using AI scribes across 2.6 million clinical encounters in a single year. The results are telling:
- 15,700 hours of documentation time saved, equivalent to almost 1,800 workdays
- 84% of doctors said patient interactions improved
- 82% reported better job satisfaction

High‑volume users, particularly in emergency medicine, primary care, and mental health, saw the biggest gains. Even low‑frequency users reported measurable time savings, and not a single patient in the study reported a drop in care quality.
Transcription that understands clinical nuance is not a luxury but a multiplier, and that is the gap our Medical Model is built to close.
We ran side-by-side tests across multiple clinical datasets to measure what matters in practice.
Headline results:
- 93% general accuracy, measured as 7.27% WER.
- 96.0% medical keyword recall, so critical terms land in the transcript.
- 4.0% keyword error rate, which translates to fewer mistakes on diagnoses, drug names, and timelines.
- ~50% fewer keyword errors on clinical terms, and ~17% lower overall word error rate than the next best system.
| Model | KER | WER | Accuracy |
|---|---|---|---|
| Speechmatics Medical | 4.01% | 7.27% | 93% |
| ElevenLabs Scribe | 8.51% | 8.78% | 91% |
| Deepgram Nova‑3 Medical | 9.74% | 8.88% | 91% |
| AssemblyAI Standard | 11.42% | 9.21% | 91% |
| OpenAI Whisper‑1 | 12.46% | 11.10% | 89% |
| Microsoft Enhanced | 13.98% | 12.25% | 88% |
| Amazon Medical Dictation | 12.47% | 14.15% | 86% |
| Google Medical Dictation | 16.50% | 17.10% | 83% |
Across our test sets, Speechmatics leads both on general accuracy and clinical term handling.
Why KER matters: Clinical documentation rides on keywords. A missed allergy, an incorrect dosage, or a wrong laterality can derail care. Tracking keyword error rate alongside WER gives a clearer view of clinical safety, not just raw accuracy.
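To make the two metrics concrete, here is a minimal sketch of how WER and a keyword error rate can be computed. This uses one plausible occurrence-based definition of KER for illustration; the exact definitions behind the benchmark numbers above may differ.

```python
from collections import Counter

def edit_distance(ref_tokens, hyp_tokens):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, r in enumerate(ref_tokens, 1):
        cur = [i]
        for j, h in enumerate(hyp_tokens, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (or match)
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    return edit_distance(ref, hyp) / len(ref)

def keyword_error_rate(reference, hypothesis, keywords):
    """Fraction of keyword occurrences in the reference that the hypothesis
    failed to reproduce (a simple occurrence-count sketch, not the exact metric)."""
    kw = {k.lower() for k in keywords}
    ref_counts = Counter(t for t in reference.lower().split() if t in kw)
    hyp_counts = Counter(t for t in hypothesis.lower().split() if t in kw)
    total = sum(ref_counts.values())
    matched = sum(min(c, hyp_counts[k]) for k, c in ref_counts.items())
    return 1 - matched / total
```

The split matters: a transcript can have a respectable WER while still dropping the one token that changes clinical meaning, which is exactly what KER surfaces.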
Why are we pulling ahead? There are four main changes that matter most for clinical use:
- Vocabulary that speaks healthcare. Coverage for drug names, procedures, and clinical shorthand now lands reliably, including correct formatting for numbers, dosages, dates, and times. See our healthcare transcription support for more info.
- Real-time diarization that keeps up. The system distinguishes clinicians, patients, and family members in the room, even with background noise or rapid turn-taking. Notes are easier to attribute, and handovers are cleaner.
- Accent-independent by design. Healthcare is global. The model understands diverse accents and overlapping speech without forcing users to slow down or over-enunciate.
- Real-time first. You get consistent accuracy whether you are streaming live audio or processing files, so teams do not trade precision for speed.
Together these updates reduce cognitive load and help keep the record clean.
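As an illustration of what diarized output enables downstream, here is a small sketch that collapses word-level results into attributed speaker turns. The `words` structure and its field names are assumptions chosen for the example, not the exact API response schema.

```python
from itertools import groupby

# Illustrative word-level output with speaker labels and timings.
# Field names here are assumptions, not the actual response schema.
words = [
    {"word": "Any",         "speaker": "S1", "start": 0.0, "end": 0.2},
    {"word": "allergies?",  "speaker": "S1", "start": 0.2, "end": 0.7},
    {"word": "Just",        "speaker": "S2", "start": 1.1, "end": 1.3},
    {"word": "penicillin.", "speaker": "S2", "start": 1.3, "end": 1.9},
]

def to_turns(words):
    """Collapse consecutive words from the same speaker into labelled turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"] for w in group),
        })
    return turns
```

Turn-level attribution like this is what makes notes traceable: a scribe tool can show who asked about allergies and who answered, rather than a single undifferentiated stream.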
The new medical model is engineered for low latency and high throughput. It handles live dictation, in-room capture, and telemedicine sessions without choking on domain-specific language.
Batch workloads run at scale for backlogs and historical records. Developers get predictable performance and operational simplicity across deployment environments.
Our models are also built real-time first, so moving from file-based transcription to streaming does not mean an accuracy trade-off.
Here is what those gains mean in day to day work.
- For developers - real-time transcription that stands up to domain pressure. Clean timestamps and entity handling simplify downstream NLP.
- For clinicians - less screen time and more face time. Notes that reflect what was said rather than what the model guessed.
- For patients - a calmer room. The computer listens, records, and stays out of the way.
When the transcript is right, everything built on top works better.
Hands-on use is the best proof.
Test the Medical Model in the Speechmatics Portal preview, or integrate it directly via the API. Both real-time and batch are supported.
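For a feel of the integration surface, here is a hedged sketch of submitting a batch transcription job over HTTP. The endpoint URL, config keys, and response field below are assumptions modelled on typical REST transcription APIs; check the official API documentation for the real schema before using this.

```python
import json

# Assumed endpoint; verify against the current API docs.
API_URL = "https://asr.api.speechmatics.com/v2/jobs"

def build_config(language="en", diarization="speaker"):
    """Assemble a transcription job config as a plain dict.
    Key names are assumptions for illustration."""
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "diarization": diarization,
        },
    }

def submit_job(api_key, audio_path):
    """POST the audio file plus config and return the job id.
    Requires the third-party `requests` package (pip install requests)."""
    import requests
    with open(audio_path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"data_file": f},
            data={"config": json.dumps(build_config())},
        )
    resp.raise_for_status()
    return resp.json()["id"]  # assumed response field
```

The same config dict pattern extends naturally to a real-time session: the transport changes (WebSocket instead of a file POST), but the transcription settings stay the same shape.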
You can also see our healthcare language coverage via our docs.
If you are heading to HLTH USA in Las Vegas this October, come see it in action.
Bring your toughest audio. Tell us what success looks like and we will help you measure it.
Which languages are supported today? English is available now. Support aligns with our broader language coverage, including healthcare transcription. Contact us for roadmap details.
Does it handle speaker changes in busy rooms? Yes. Real-time speaker diarization separates speakers for cleaner attribution in clinical settings.
Where does the model fit in my stack? Use it for ambient scribing, clinician dictation, telemedicine, and call-based triage. Feed transcripts into EHRs, analytics, and LLM-powered assistants.
How does it deploy? Use our managed service or talk to us about enterprise options that meet your security and governance needs.
If you have a question not covered here, reach out and we will get you an answer.