What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 56+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle diverse accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150 ms latency, making it well suited to real-time conversation. Developers can stream natural speech in multiple voices and deploy in cloud, hybrid, or on-prem environments for privacy and control.

Can I build real-time voice agents with Speechmatics?

Yes. Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Integrate quickly through a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, legal, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

Speechmatics sets record in medical Speech-to-Text with 93% accuracy, powered by NVIDIA

Cambridge, UK – September 15, 2025 – Speechmatics today launched a next-generation Medical Speech-to-Text (STT) model for clinical transcription, reaching 93% general real-world accuracy and making 50% fewer errors on medical terms than the next best system. The model is now also available in Spanish, French, Dutch and Finnish.

Engineered for the speed and complexity of care, the model extends coverage of clinical and pharmaceutical terminology and is optimized for rapid, multi-speaker dialogue. The result: cleaner notes, fewer corrections, and a clearer record of each encounter.

“Our goal is simple: build speech tech clinicians can trust in the messiness of real-world practice. When every voice is understood, the experience feels human again. This is why we build, so teams can focus on what matters.”
Katy Wigdahl, CEO, Speechmatics

Available in both batch and real-time modes, this release brings consistently high performance to AI scribe and dictation-driven workflows.

Healthcare is rapidly shifting toward ambient documentation at scale. A recent NEJM Catalyst study noted widespread use of AI scribe technology, reporting over 15,700 hours of documentation time saved across 2.6 million patient visits.

Speechmatics' new medical model is designed to accelerate this shift, providing higher transcription accuracy with robust performance across clinical settings.

Clinical-grade recognition for real-world messiness

Healthcare speech is complex and chaotic, and this new model is built for it. It is accent-independent by design and handles diverse voices, fast turn-taking, and shifting context.

Real-time speaker diarization distinguishes clinicians, patients, and family members, ensuring clear attribution even with background noise or interruptions.
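To make the attribution point concrete, here is a minimal Python sketch that collapses word-level results into speaker turns. It assumes a results list shaped like Speechmatics' JSON transcript output, where each item has a `type` (`word` or `punctuation`) and an `alternatives` list whose entries carry `content` and a diarization `speaker` label such as `S1`. The field names and the `UU` unknown-speaker default are assumptions to check against the API reference, and `group_into_turns` is a hypothetical helper.

```python
from typing import Any


def group_into_turns(results: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Collapse word-level results into (speaker, text) turns.

    Assumed input shape (verify against the Speechmatics docs):
    {"type": "word" | "punctuation",
     "alternatives": [{"content": "...", "speaker": "S1"}]}
    """
    turns: list[tuple[str, list[str]]] = []
    for result in results:
        best = result["alternatives"][0]     # top-ranked alternative
        speaker = best.get("speaker", "UU")  # "UU" assumed to mean unknown speaker
        token = best["content"]
        if result["type"] == "punctuation" and turns:
            # Attach punctuation to the previous token, without a space.
            turns[-1][1][-1] += token
            continue
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(token)       # same speaker: extend the turn
        else:
            turns.append((speaker, [token]))  # speaker changed: start a new turn
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]
```

For example, a clinician question followed by a patient answer becomes `[("S1", "Any allergies?"), ("S2", "Penicillin.")]`, which is the clean attribution a scribe workflow needs at handover.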

The expanded medical vocabulary covers drug names, dosages, and procedures, plus numerical and temporal formatting for structured outputs. This reduces manual correction and improves clarity at handover.
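Formatted numbers are what make downstream structuring cheap. As a hypothetical illustration, assuming the transcript already renders dosages as digits plus units (for example `500 mg` rather than "five hundred milligrams"), a simple regular expression can lift drug and dose pairs out of the text. `DOSE_PATTERN` and `extract_doses` are invented for this sketch, and the naive "word before the number is the drug" rule stands in for the proper medication lexicon a production pipeline would need.

```python
import re

# Hypothetical pattern: a drug-like word, a number, then a dosage unit.
# Relies on the transcript using digits and unit abbreviations ("500 mg").
DOSE_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z][A-Za-z-]+)\s+"
    r"(?P<amount>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mg|mcg|g|ml|units)\b",
    re.IGNORECASE,
)


def extract_doses(transcript: str) -> list[dict[str, str]]:
    """Return {"drug", "amount", "unit"} dicts for each dosage mention found."""
    return [match.groupdict() for match in DOSE_PATTERN.finditer(transcript)]
```

On "Start amoxicillin 500 mg three times daily and paracetamol 1 g as needed." this yields two records, one for amoxicillin (500 mg) and one for paracetamol (1 g), with no number-word parsing required.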

“Out of the box, it was extremely accurate on medication terms and dosages, separated speakers cleanly, and held up in noisy rooms. That reliability is why we use it.
Accent handling and speaker differentiation have also made a huge difference to our teams.”
Karan Wallia, CEO, Nordhealth Therapy

How it stacks up: benchmark-beating accuracy

Speechmatics’ new enhanced medical model cuts errors where they matter most: 50% fewer keyword errors on clinical terms and 17% lower overall word errors than the next best system.

In the latest benchmarking, Speechmatics achieved:

93% general accuracy (7% WER, 17% lower than the next best vendor)
96% medical keyword recall
4% keyword error rate (half the number of keyword errors on medical terminology vs. the next best vendor)
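These figures are related by simple arithmetic: accuracy is 1 − WER (so 93% accuracy corresponds to a 7% WER), and claims such as "17% lower" or "half the keyword errors" are relative reductions in error rate. The sketch below computes WER with a standard word-level edit distance, plus the relative reduction between two systems. It is a textbook WER implementation, not Speechmatics' benchmarking code, and it skips the text normalization real evaluations apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(ours: float, theirs: float) -> float:
    """Fractional drop in error rate relative to a competing system."""
    return (theirs - ours) / theirs
```

Accuracy is then `1 - word_error_rate(ref, hyp)`. For instance, if the next best system's keyword error rate were about 8%, `relative_reduction(0.04, 0.08)` gives 0.5, matching the "half the number of keyword errors" comparison.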

These results place Speechmatics significantly ahead of other leading vendors (peer accuracy range: 74-91%; next best: 91%).

Unlike other providers, Speechmatics models are built real-time first, so moving from file-based (batch) transcription to real-time no longer has to mean trading away accuracy.

Keyword Error Rate (KER) is a crucial metric in clinical domains, measuring the system’s ability to recognize key medical terms correctly.

With a KER of just 4%, Speechmatics outperformed all evaluated systems, helping ensure that critical information such as diagnoses, dosages, and timelines is captured accurately.
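Keyword recall and KER can be approximated by counting each occurrence of a reference keyword, checking how many the hypothesis reproduces, and taking KER as 1 minus that recall (consistent with the 96% recall and 4% KER quoted above). This is a simplified, bag-of-words proxy that ignores word positions and multi-word terms; the exact KER definition used in the benchmark is not given here, and the helper names are hypothetical.

```python
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_recall(reference: str, hypothesis: str, keywords: set[str]) -> float:
    """Fraction of reference keyword occurrences reproduced in the hypothesis.

    Simplified proxy: single-token, lowercase keywords; positions are ignored.
    """
    ref_counts = Counter(t for t in _tokens(reference) if t in keywords)
    hyp_counts = Counter(t for t in _tokens(hypothesis) if t in keywords)
    total = sum(ref_counts.values())
    if total == 0:
        raise ValueError("reference contains no keywords")
    # Each reference occurrence can be matched at most once.
    hits = sum(min(n, hyp_counts[t]) for t, n in ref_counts.items())
    return hits / total


def keyword_error_rate(reference: str, hypothesis: str, keywords: set[str]) -> float:
    """KER taken as 1 - keyword recall (assumed definition for this sketch)."""
    return 1.0 - keyword_recall(reference, hypothesis, keywords)
```

A hypothesis that renders "metformin" as "met for men" but gets "warfarin" right scores a recall of 0.5 and a KER of 0.5 on a two-keyword reference, even though most of the surrounding words are correct, which is why KER is reported separately from WER.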

Built on NVIDIA, ready for scale

The model is deployed using NVIDIA GPUs, optimized through NVIDIA Triton Inference Server and CUDA acceleration. This enables high-throughput, low-latency processing at scale, with deployment flexibility across data center, private cloud, and edge environments.

This architecture supports rapid customization and horizontal scaling for diverse healthcare deployments—from telehealth platforms and contact centers to EHR-connected scribes and bedside tools.

Available now: The upgraded Medical Model for Real-time is in preview via the Speechmatics Portal and API. For batch trials, contact the Speechmatics team and reference “Batch Medical Model”. English is supported at launch, with additional languages rolling out across our 56+ language portfolio. The model will be demonstrated live at HLTH US in Las Vegas, October 19–22, 2025.
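For orientation, a real-time session is driven over a WebSocket connection that opens with a `StartRecognition` message. The Python sketch below only builds that JSON message. The field names (`audio_format`, `transcription_config`, `operating_point`, `diarization`, `enable_partials`) are based on the public Speechmatics real-time API documentation and should be checked against it, while `"domain": "medical"` is an assumed way of selecting the medical model. Confirm the exact parameter for the preview before use.

```python
import json


def build_start_recognition(language: str = "en", sample_rate: int = 16_000) -> str:
    """Build a StartRecognition message for a real-time session (sketch only)."""
    message = {
        "message": "StartRecognition",
        "audio_format": {
            "type": "raw",
            "encoding": "pcm_s16le",  # 16-bit little-endian PCM audio
            "sample_rate": sample_rate,
        },
        "transcription_config": {
            "language": language,
            "operating_point": "enhanced",
            "diarization": "speaker",  # label clinician / patient / family turns
            "enable_partials": True,   # low-latency interim results
            "domain": "medical",       # ASSUMPTION: medical model selector
        },
    }
    return json.dumps(message)
```

The message would be sent as the first text frame on the connection, before any binary audio; authentication and the endpoint URL are deliberately left out.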

About Speechmatics

Speechmatics provides enterprise-grade speech recognition technology trusted by global brands to power voice experiences that work in the real world. Its Speech-to-Text (STT) API turns messy, multilingual, multi-speaker audio into structured, accurate text, in real time and with minimal latency.

Built for developers and scaled for enterprises, Speechmatics technology integrates seamlessly into existing workflows and platforms, with flexible deployment across cloud, on-prem, or edge environments. Its STT models understand 56+ languages, adapt to accents, and handle overlapping speech with precision.

Speechmatics powers leading technology providers such as AI Media, Content Guru, and Nordhealth Therapy across industries including healthcare, media, contact centers, voice agents, and AI-driven workflows. It also partners with Voice AI platforms like LiveKit and Pipecat to help application builders scale. Speechmatics is headquartered in Cambridge and London.

Contact: events@speechmatics.com, 6th Floor, Classic House, 174-180 Martha's Buildings, Old St, London, EC1V 9BP

Sep 15, 2025 | Read time 4 min

