What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 56+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150ms latency, which makes it well suited to real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, hybrid, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, legal, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

Adobe and Speechmatics deliver cloud-grade speech recognition on-device for Premiere

Dialogue is the centerpiece of modern content. Whether it's a podcast, a DIY instructional video, or a documentary, what people say drives the story. Accurately understanding speech and giving creators control over how it’s used has become essential to producing compelling, high-quality content.

Now, as LLM-centric workflows take hold and natural language becomes the interface for shaping stories, that speech-to-text foundation matters more than ever. Accurate transcription isn't just a feature—it's the layer that optimizes content workflows, enables faster content creation and makes agentic AI work.

Speechmatics has been Adobe's partner since 2021, when Adobe became the first non-linear editing platform to include speech-to-text (STT) in Premiere. Today, that partnership deepens with a new on-device STT model in Premiere that delivers near-cloud accuracy while keeping all audio local to the device.

On-device from the start, evolved for today

When Adobe launched STT for Premiere, large enterprises couldn't always use cloud-based services due to privacy concerns. Speechmatics was one of the few providers with on-device models—a key reason for the partnership.

Five years later, those privacy requirements haven't changed. With the rise of LLMs and data sovereignty concerns, the need for secure deployments has, in fact, increased. What has changed is the performance gap: Speechmatics' new on-device model brings local transcription nearly on par with cloud accuracy, and it is optimized to run efficiently on local hardware.

Studios, agencies, and production companies handling content before it goes public can now work seamlessly from anywhere: on a film set, between client meetings, on a flight—at full accuracy, with no dependency on a connection and no interruption to the work.

Editing video and audio with text, creating captions quickly, and labeling speakers with industry-leading speaker diarization—all local, all private, all accurate.

Voice AI that works for everyone

For voice to be useful for creative work, it has to understand how people actually speak. The new Speechmatics on-device model has been trained on millions of hours of speech to deliver high accuracy for accented speech, non-native speakers, and noisy environments like field reporting or film sets.

The benchmark results reflect that. The new on-device model in Premiere:

Is within 5% (relative) of cloud accuracy, evaluated across nearly 10 million words of diverse real-world data
Processes 1 hour of audio in about 55 seconds
Outperforms the closest competitor, with a 12-16% improvement over Whisper-powered creative solutions
Runs on Windows and Mac, using the latest AI acceleration techniques to process efficiently across a broad range of hardware, from the latest Mac M5, NVIDIA RTX, and AMD GPUs to older machines such as Intel Macs
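
To put two of these figures in context, here is a minimal Python sketch of the arithmetic behind them. The helper names are illustrative, and the accuracy scores are hypothetical values chosen only to show what a relative gap means. They are not published Speechmatics results, and the release does not state which accuracy metric the 5% figure refers to.

```python
def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than real time the audio was processed."""
    return audio_seconds / processing_seconds


def relative_gap(baseline: float, candidate: float) -> float:
    """Difference between two scores, expressed as a fraction of the baseline."""
    return abs(candidate - baseline) / baseline


# Figure from the list above: 1 hour of audio in about 55 seconds.
speedup = speedup_over_real_time(3600, 55)
print(f"about {speedup:.0f}x faster than real time")

# Hypothetical scores, for illustration only (not published figures).
cloud_accuracy = 0.95
on_device_accuracy = 0.91
gap = relative_gap(cloud_accuracy, on_device_accuracy)
print(f"relative gap: {gap:.1%}")
```

Under these assumed scores, a four-point absolute difference is a relative gap of a little over 4%, which is the sense in which a model can be "within 5% relative" of another.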

“Adobe's global creator community speaks hundreds of languages and dialects. Since 2021, our partnership has focused on making sure speech technology works for everyone, whether you're editing in Scottish English, Mexican Spanish, or Cantonese. Today, millions of users can benefit from accurate transcription that works anywhere, on-device for privacy and in the cloud for scale, without compromising performance.

“As Adobe builds toward LLM-powered creative workflows, having a speech foundation that truly understands diverse voices becomes even more critical. We're proud to be part of that future.”

Katy Wigdahl, CEO, Speechmatics

Availability

Speechmatics on-device joins Speechmatics cloud and Speechmatics on-prem as a purpose-built option for ISVs and OEMs where data residency, offline capability, or predictable costs make local execution the right architectural call. It integrates as a C/C++ library on macOS and Windows.

About Speechmatics

Speechmatics is the Voice AI company on a mission to understand every voice. Its speech-to-text technology delivers industry-leading accuracy across 56+ languages, with specialized models for healthcare, media, contact centers, and enterprise organizations worldwide. Speechmatics powers leading technology providers including Adobe, AI Media, Content Guru, and Nordhealth, and offers deployment across cloud, on-prem, and on-device. The company is headquartered in Cambridge, UK.

Learn more at www.speechmatics.com.