Jan 12, 2026 | Read time 4 min

Speechmatics and Sully.ai partner to scale healthcare AI infrastructure globally

Built on NVIDIA AI infrastructure, the collaboration combines medical-grade speech models and agent workflows to deliver measurable ROI, cutting admin time and scaling healthcare capacity worldwide.
Speechmatics Editorial Team

Cambridge, UK - 12 January 2026 - Speechmatics, the voice AI company on a mission to understand every voice, today announced a strategic partnership with Sully.ai to power the next generation of autonomous healthcare agents and scribes.

Built on NVIDIA AI infrastructure, the collaboration combines best-in-class, medical-grade speech models with autonomous agent workflows to deliver AI receptionists and clinical scribes that handle real operational tasks and deliver tangible ROI across clinical settings.

The partnership arrives as global healthcare faces acute staffing shortages and mounting administrative costs. Sully, which scaled from single-doctor clinics to enterprise customers with 500+ providers in under a year, is already driving significant efficiency back into healthcare.

Sully's north-star metric, Minutes Added to Workforce (MAW), measures how agentic AI drives efficiency within healthcare use cases. As of December 2025, Sully has added more than 30 million minutes back to the healthcare workforce, and its current customers include Oshi Health, Tebra and Midi.

21x ROI: Autonomous agents delivering measurable workforce impact, soon to benefit more patients around the globe

Sully's partners are seeing 21x ROI in early case studies, driven by an autonomous operating system in which a suite of multiple agents works in coordination.

Typical results include a 5%+ increase in patient retention, 2.4+ hours saved per physician per day, and an 18.5% increase in capacity for patient appointments.

Building on these early positive signals, Sully is expanding rapidly across large, multi-site provider networks, hence the requirement for Speechmatics: highly accurate, medical-grade speech models that scale across different clinical environments.

Sully.ai selected Speechmatics after extensive internal testing across multiple speech model providers. Evaluation focused on clinical accuracy, handling ambiguous medical pronunciations, and real-time performance in noisy environments.

Speechmatics' English Medical Model set benchmark results in 2025, delivering 93% general real-time accuracy (7% WER) and 96% medical keyword recall, with a medical keyword error rate 50% lower than the nearest competitor. These gains reduce corrections, improve downstream documentation quality, and support more reliable automation in patient access and clinical workflows.
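For readers unfamiliar with the metrics quoted above, the following is a minimal Python sketch of how word error rate (WER) and keyword recall are conventionally computed. It uses the standard textbook definitions (word-level edit distance divided by reference length, and the share of reference keywords that appear in the transcript). It is not Speechmatics' evaluation code, the function names and the example sentence are hypothetical, and real benchmarks usually apply text normalization that is omitted here.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def keyword_recall(reference: str, hypothesis: str, keywords: set[str]) -> float:
    """Fraction of keyword occurrences in the reference that the hypothesis contains."""
    wanted = {k.lower() for k in keywords}
    hyp_counts = Counter(hypothesis.lower().split())
    total = found = 0
    for word in reference.lower().split():
        if word in wanted:
            total += 1
            if hyp_counts[word] > 0:
                found += 1
                hyp_counts[word] -= 1  # each hypothesis word can match only once
    if total == 0:
        raise ValueError("reference contains none of the keywords")
    return found / total


if __name__ == "__main__":
    # Hypothetical example: one medical term is misrecognized.
    ref = "patient has hypertension and takes lisinopril daily"
    hyp = "patient has hypotension and takes lisinopril daily"
    wer = word_error_rate(ref, hyp)
    print(f"WER: {wer:.3f}")            # WER: 0.143
    print(f"Accuracy: {1 - wer:.3f}")   # Accuracy: 0.857
    print(f"Keyword recall: {keyword_recall(ref, hyp, {'hypertension', 'lisinopril'}):.2f}")  # 0.50
```

Under these definitions, accuracy is simply 1 minus WER, which is how a 7% WER corresponds to 93% accuracy. The example also shows why keyword recall is reported separately: a single substituted drug or condition name barely moves overall WER but halves recall on the terms that matter clinically.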

"We needed best-in-class speech models that work in real clinical environments: complex medical terminology, fast overlapping dialogue, accents, imperfect audio, not just clean test clips. Speechmatics has been the most responsive provider with solutioning for us, and we've seen them handle medications better on our troublesome audio than any competitor." — Ahmed Omar, Founder and CEO, Sully.ai

Speechmatics and Sully.ai plan to expand into new global markets including the Middle East following the launch of an English-Arabic bilingual model in early 2026. Bilingual, code-switching conversations are expected to be a defining requirement for voice automation in care delivery and patient access within this region.

Speechmatics' Arabic capabilities are designed to perform across Modern Standard Arabic as well as Egyptian, Gulf, and Levantine dialects, supporting consistent performance across varied speakers and accents.

Trained on 16 billion words of real medical conversations

Trained on over 16 billion words of medical conversations, clinical documentation, and healthcare interactions, the models deliver keyword error rates 5-20% lower than evaluated competitors on medical test sets across most languages.

This training scale enables the models to distinguish between "hypertension" and "hypotension" in noisy emergency rooms, understand pharmaceutical names with regional accents, handle overlapping clinician-patient speech, and parse medical abbreviations, drug dosages, and ICD-10 codes, all while maintaining near-batch accuracy at sub-second latency.

"High-accuracy, low-latency speech recognition is a core enabler for autonomous agents that 'actually listen' and operate safely in mission-critical environments. Together with Sully.ai, we're enabling healthcare organizations to deploy ambient scribes and AI agents across multiple languages and global markets without compromising on quality, security, or speed."

— Katy Wigdahl, CEO, Speechmatics

Built on NVIDIA, deployed on healthcare's terms

The model is deployed using NVIDIA AI infrastructure, optimized through NVIDIA Triton Inference Server and NVIDIA CUDA libraries. This enables high-throughput, low-latency processing at scale, with deployment flexibility across data center, private cloud, and edge environments. This architecture supports rapid customization and horizontal scaling for diverse healthcare deployments: from telehealth platforms and contact centers to EHR-connected scribes and bedside tools.

Unlike cloud-only competitors, Speechmatics supports on-premises, private cloud, and SaaS deployment, critical for organizations navigating data residency requirements, HIPAA compliance, and regulatory frameworks. This flexibility allows enterprises to keep sensitive patient data within their own infrastructure while accessing state-of-the-art speech technology.

About Speechmatics

Speechmatics is the voice AI company on a mission to understand every voice. The company's speech-to-text technology delivers industry-leading accuracy across 55+ languages, with specialized medical models trained on over 16 billion words of clinical data. Speechmatics' real-time and batch transcription APIs power applications for healthcare, media, contact centers, and voice agent organizations worldwide, with customers including AI Media, Content Guru and boost.ai. Founded in Cambridge, UK, Speechmatics offers deployment options spanning on-premises, private cloud, SaaS, and on-device. Learn more at www.speechmatics.com.

About Sully.ai

Sully.ai is transforming healthcare operations with autonomous AI agents that handle mission-critical workflows across multi-doctor practices and large provider networks. The company's full suite (from voice AI receptionists to clinical scribes) is built on an agentic operating system designed for the complexity of real healthcare environments. Founded by a team bringing together medical expertise, technical founders, and entrepreneurial leadership, Sully scaled from single-doctor clinics to enterprise customers with 500+ providers in under a year. Learn more at www.sully.ai.

Media Contact

Mieke Smith
Content Lead, Speechmatics
mieke.smith@speechmatics.com
+44 7713 014319
