Jan 28, 2026 | Read time 3 min

Speechmatics launches new Swedish medical model, cutting transcription errors by 40%

Expanding its Nordic medical lineup with a 3.91% KWER Swedish model, delivering sub-second latency across Swedish, Finnish, Danish, and Norwegian clinical workflows.
Yahia Abaza, Senior Product Manager

Speechmatics today launched a medical-grade Swedish speech-to-text model that achieves a 3.91% Keyword Error Rate (KWER) on medical terms. That is 40% lower than the closest competitor, with sub-second latency in real-time use.

The model handles complex Swedish medical terminology, rapid multi-speaker dialogue, and diverse Nordic accents in noisy clinical environments, delivering accuracy that enables reliable automation in patient documentation, ambient scribes, and voice-driven workflows.

The Swedish release expands Speechmatics' Nordic medical lineup alongside Finnish, Danish, and Norwegian models. 

This expansion arrives as healthcare organizations increasingly adopt ambient documentation and autonomous AI agents, where transcription accuracy is non-negotiable.

Why Swedish medical speech is hard

Swedish presents distinct challenges for speech recognition: compound words that combine multiple terms into single units, regional dialectal variation, and pitch accents that change meaning. 

Layer in medical domain complexity (pharmaceutical names, dosages, procedures, ICD-10 codes) and the difficulty compounds. Clinicians speak fast, often with overlapping dialogue between patient and provider, in rooms with background noise and interruptions.

Speechmatics approaches these challenges the same way it tackled languages such as Norwegian: collect region-specific training data, model acoustic variation across dialects, and build language models that understand compound word formation rather than memorizing every possible combination. 

This philosophy (target the hard cases, not clean demos) enables the model to parse pharmaceutical names spoken with regional accents and handle overlapping speech without attribution errors.

What we built

The Swedish medical model was trained on billions of words of medical conversations, clinical documentation, and healthcare interactions. Unlike competitors, Speechmatics builds real-time models first, meaning switching from batch transcription to live ambient scribes doesn't force an accuracy trade-off.

The Swedish medical model delivers:

  • 3.91% KWER on medical test sets: 40% lower error rate than the nearest competitor

  • Sub-second real-time latency: maintains near-batch accuracy at streaming speeds

  • Expanded medical vocabulary: drugs, dosages, procedures, abbreviations, ICD-10 codes

  • Accent-independent recognition: handles dialectal variation across Swedish regions

  • Real-time speaker diarization: distinguishes clinicians, patients, family members in overlapping dialogue

  • Compound word support: understands Swedish word formation without requiring exhaustive word lists

Proof: Swedish medical model vs. competitors

Results on the medical test set:

| Provider | Model | KWER (lower is better) |
| --- | --- | --- |
| Speechmatics | Medical | 3.91% 🏆 |
| Google | Chirp_2 | 5.72% |
| AssemblyAI | Universal | 6.05% |
| Amazon | Standard | 6.53% |
| OpenAI | Whisper-1 | 6.81% |
| Deepgram | Nova-3 | 7.87% |
| Microsoft | Enhanced | 10.56% |
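The relative gap between two KWER figures can be worked out directly from the table. The following is a minimal Python sketch of that arithmetic; the dictionary keys and the `relative_reduction` helper are illustrative names, and the values are copied from the table above.

```python
# KWER values in percent, copied from the comparison table.
kwer = {
    "Speechmatics Medical": 3.91,
    "Google Chirp_2": 5.72,
    "AssemblyAI Universal": 6.05,
    "Amazon Standard": 6.53,
    "OpenAI Whisper-1": 6.81,
    "Deepgram Nova-3": 7.87,
    "Microsoft Enhanced": 10.56,
}


def relative_reduction(ours: float, theirs: float) -> float:
    """Return how much lower `ours` is than `theirs`, as a percentage of `theirs`."""
    if theirs <= 0:
        raise ValueError("comparison error rate must be positive")
    return (theirs - ours) / theirs * 100


baseline = kwer["Speechmatics Medical"]
for name, value in kwer.items():
    if name != "Speechmatics Medical":
        print(f"{name}: {relative_reduction(baseline, value):.1f}% lower KWER")
```

On these table values the reduction ranges from roughly 32% (against Google Chirp_2) to roughly 63% (against Microsoft Enhanced). The article does not state which test set or comparison point underlies the headline 40% figure.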

The 3.91% KWER translates to approximately 1,800 more words transcribed correctly per hour of audio compared to a 6% baseline. 

That means:

  • fewer corrections,

  • cleaner EHR integration,

  • and reduced friction in patient interactions. 

For clinical documentation workflows, this level of accuracy makes the difference between transcripts that require heavy manual editing and those that can be reviewed and approved with minimal changes.
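The words-per-hour comparison is a product of word volume and the gap between two error rates. Below is a minimal Python sketch of that calculation. It assumes a hypothetical speaking volume (the article does not give one) and, for simplicity, that the error rate applies uniformly to every word; `extra_correct_words` is an illustrative helper name.

```python
def extra_correct_words(words_per_hour: float, baseline_error: float, model_error: float) -> float:
    """Expected additional correctly transcribed words per hour of audio.

    Error rates are fractions, e.g. 0.0391 for 3.91%.
    """
    if not (0 <= baseline_error <= 1 and 0 <= model_error <= 1):
        raise ValueError("error rates must be fractions between 0 and 1")
    return words_per_hour * (baseline_error - model_error)


# Hypothetical volume: 150 words per minute of continuous speech.
# This is an assumption for illustration, not a figure from the article.
assumed_words_per_hour = 150 * 60
print(round(extra_correct_words(assumed_words_per_hour, 0.06, 0.0391)))
```

The result scales linearly with the assumed volume. The article does not state the word volume or test conditions behind its own figure, so this sketch is not expected to reproduce it.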

Medical language coverage across the Nordics

Speechmatics now offers dedicated medical models in seven languages, including four Nordic languages:

| Language | Medical KWER | General WER |
| --- | --- | --- |
| Swedish | 3.91% | 7.76% |
| Finnish | 5.41% | 6.59% |
| Danish | 6.15% | 9.59% |
| Norwegian | 7.25% | 7.13% |

This roster enables Nordic healthcare providers to standardize on a single Voice AI platform while supporting native-language workflows across Swedish, Finnish, Danish, and Norwegian operations. It also positions Speechmatics for expansion into emerging multilingual use cases, including code-switching conversations in bilingual clinical environments and cross-border telehealth platforms.
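The Nordic table reports two different metrics, and the article does not spell out how KWER is scored. The sketch below illustrates the difference under stated assumptions. WER is the standard word-level edit distance divided by the reference length. `keyword_error_rate` is a simplified bag-of-words approximation (a hypothetical helper, not Speechmatics' scoring method) that counts reference keyword occurrences missing from the hypothesis. The Swedish sentences and keyword list are invented for illustration.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def keyword_error_rate(reference: str, hypothesis: str, keywords: set[str]) -> float:
    """Share of reference keyword occurrences missing from the hypothesis.

    Assumption: keywords are lowercase single words and word order is ignored.
    """
    ref_counts = Counter(w for w in reference.lower().split() if w in keywords)
    hyp_counts = Counter(w for w in hypothesis.lower().split() if w in keywords)
    total = sum(ref_counts.values())
    if total == 0:
        raise ValueError("reference contains none of the keywords")
    missed = sum(max(n - hyp_counts[w], 0) for w, n in ref_counts.items())
    return missed / total


# Invented example: one drug name is misrecognized.
reference = "patienten fick 5 mg metoprolol och 10 mg atorvastatin"
hypothesis = "patienten fick 5 mg metropol och 10 mg atorvastatin"
medical_keywords = {"metoprolol", "atorvastatin"}

print(f"WER:  {word_error_rate(reference, hypothesis):.1%}")
print(f"KWER: {keyword_error_rate(reference, hypothesis, medical_keywords):.1%}")
```

In this example a single misrecognized drug name costs one word in nine for WER but one keyword in two for KWER, which is why the two columns can diverge for the same model.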

Enabling autonomous medical AI workflows

Medical-grade speech recognition is becoming foundational infrastructure for autonomous healthcare agents. 

Speechmatics' recent partnership with Sully.ai demonstrates this shift in practice. Sully scaled from single-doctor clinics to enterprise customers with 500+ providers in under a year, deploying AI receptionists and clinical scribes that handle real operational tasks. Their north star metric, Minutes Added to Workforce (MAW), measures how agentic AI drives efficiency within healthcare. As of December 2025, Sully has added more than 30 million minutes back to the healthcare workforce, with customers seeing 21x ROI in early case studies.

"We needed speech models that work in real clinical environments: complex medical terminology, fast overlapping dialogue, accents, imperfect audio. We've seen Speechmatics handle medications better on our troublesome audio than any competitor."

Ahmed Omar, Founder & CEO, Sully.ai

The Swedish launch extends this capability across the Nordics, enabling ambient scribes, AI receptionists, and documentation assistants to operate in native languages without sacrificing the accuracy that makes automation practical.

Production-ready for regulated environments

Healthcare organizations need speech technology that works within their compliance frameworks and operational infrastructure. Speechmatics' Swedish medical model supports on-premises deployment for data residency requirements, on-device processing for edge use cases, and hybrid architectures that balance cloud scalability with regulatory constraints. 

This flexibility allows enterprises to adopt Voice AI without compromising on performance, security, or speed.

"High-accuracy, low-latency speech recognition is a core requirement for clinical workflows that operate safely at scale. With Swedish, we're enabling Nordic healthcare organizations to deploy ambient scribes and AI agents without compromising on quality, compliance, or real-time performance."

Yahia Abaza, Product Manager, Speechmatics

The English breakthrough that launched a portfolio

The Swedish medical model builds on Speechmatics' September 2025 breakthrough: an English medical model that set industry benchmarks at 93% accuracy (7% WER), 96% medical keyword recall, and a keyword error rate 50% lower than the nearest competitor. 

That release, powered by NVIDIA infrastructure and trained on 14 billion words of medical data, established the architecture and training methodology now applied across the Nordic medical lineup.

Each new language undergoes rigorous testing and optimization before release, ensuring the models handle the demands of real healthcare environments: rapid multi-speaker dialogue, medical abbreviations, drug dosages, and diverse accents. 

The result is consistent high performance across languages, with deployment flexibility that supports ambient scribes, telehealth platforms, clinical contact centers, and EHR-connected documentation tools.

What's next?

Speechmatics continues expanding its medical language portfolio, with additional languages rolling out on request. 

The company is also investing in emerging medical AI workflows, including autonomous agents that handle patient access, appointment scheduling, and care coordination. In these use cases, speech accuracy directly impacts operational efficiency and patient experience.

Nordic healthcare organizations can begin testing the Swedish medical model today through the Speechmatics Portal and API, with support for both real-time and batch transcription workflows.
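For readers testing through the API, a transcription job is typically described by a JSON configuration. The sketch below only builds such a configuration and makes no network call. The field names (`language`, `domain`, `diarization`) and the values `"sv"`, `"medical"`, and `"speaker"` are assumptions about the general shape of a Speechmatics job config, not details stated in this article. Check them against the official API reference before use.

```python
import json


def build_transcription_config(language: str = "sv",
                               domain: str = "medical",
                               diarization: str = "speaker") -> dict:
    """Build a job configuration dictionary (field names are assumptions)."""
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,        # assumed language code for Swedish
            "domain": domain,            # assumed selector for the medical model
            "diarization": diarization,  # assumed setting for speaker labels
        },
    }


config = build_transcription_config()
print(json.dumps(config, indent=2))
```

Keeping the configuration in one small function makes it easy to switch the language code when testing the Finnish, Danish, or Norwegian models.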

Speak to the team: Schedule a technical demo and discuss deployment options for your clinical workflows.

Try it yourself: Access the Swedish medical model through the Speechmatics Portal for immediate testing.
