Dialogue is the centerpiece of modern content. Whether it's a podcast, a DIY instructional video, or a documentary, what people say drives the story. Accurately understanding speech and giving creators control over how it’s used has become essential to producing compelling, high-quality content.
Now, as LLM-centric workflows take hold and natural language becomes the interface for shaping stories, that speech-to-text foundation matters more than ever. Accurate transcription isn't just a feature; it's the layer that streamlines content workflows, speeds up content creation, and makes agentic AI work.
Speechmatics has been Adobe's partner since 2021, when Adobe became the first non-linear editing platform to include speech-to-text (STT) in Premiere. Today, that partnership deepens with a new on-device STT model in Premiere that delivers near-cloud accuracy while keeping all audio local to the device.
On-device from the start, evolved for today
When Adobe launched STT for Premiere, large enterprises couldn't always use cloud-based services due to privacy concerns. Speechmatics was one of the few providers with on-device models—a key reason for the partnership.
Five years later, those privacy requirements haven't gone away. With the rise of LLMs and data sovereignty concerns, the need for secure deployments has, in fact, increased. What has changed is the performance gap: Speechmatics' new on-device model brings local transcription close to cloud accuracy, with optimizations that keep it efficient on local hardware.
Studios, agencies, and production companies handling content before it goes public can now work seamlessly from anywhere: on a film set, between client meetings, on a flight—at full accuracy, with no dependency on a connection and no interruption to the work.
Editing video and audio with text, creating captions quickly, and labeling speakers with industry-leading speaker diarization—all local, all private, all accurate.
Voice AI that works for everyone
For voice to be useful for creative work, it has to understand how people actually speak. The new Speechmatics on-device model has been trained on millions of hours of speech to deliver high accuracy for accented speech, non-native speakers, and noisy environments like field reporting or film sets.
The benchmark results reflect that. The new on-device model in Premiere:
Is within 5% (relative) of cloud accuracy, evaluated across nearly 10 million words of diverse real-world data
Processes 1 hour of audio in about 55 seconds
Outperforms the closest competitor, with a 12-16% improvement over Whisper-powered creative solutions
Runs on Windows and Mac, using the latest AI acceleration techniques for efficient processing across a broad range of hardware, from the latest Mac M5, NVIDIA RTX, and AMD GPUs to older machines such as Intel Macs
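To put those figures in perspective, the short Python sketch below works through the arithmetic. The speed calculation uses the numbers quoted above; the two accuracy values are hypothetical, chosen only to illustrate one plausible reading of "within 5% relative," and are not Speechmatics benchmark results.

```python
def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than real time the audio is processed."""
    return audio_seconds / processing_seconds


def relative_gap(reference: float, candidate: float) -> float:
    """Gap between candidate and reference, as a fraction of the reference."""
    return (reference - candidate) / reference


# Figure quoted above: 1 hour of audio processed in about 55 seconds.
speedup = speedup_over_real_time(3600, 55)
print(f"about {speedup:.0f}x faster than real time")

# Hypothetical accuracies (assumptions for illustration only).
cloud_accuracy = 0.95
on_device_accuracy = 0.91
gap = relative_gap(cloud_accuracy, on_device_accuracy)
print(f"relative gap: {gap:.1%}")
```

At the quoted speed, an hour of audio is transcribed roughly 65 times faster than real time. A relative gap is measured as a fraction of the cloud figure rather than in absolute percentage points, which is why the four-point difference in this made-up example comes out a little above 4%.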
“Adobe's global creator community speaks hundreds of languages and dialects. Since 2021, our partnership has focused on making sure speech technology works for everyone, whether you're editing in Scottish English, Mexican Spanish, or Cantonese. Today, millions of users can benefit from accurate transcription that works anywhere, on-device for privacy and in the cloud for scale, without compromising performance.
“As Adobe builds toward LLM-powered creative workflows, having a speech foundation that truly understands diverse voices becomes even more critical. We're proud to be part of that future.”
Speechmatics on-device joins Speechmatics cloud and Speechmatics on-prem as a purpose-built option for ISVs and OEMs where data residency, offline capability, or predictable costs make local execution the right architectural call. It integrates as a C/C++ library on macOS and Windows.
About Speechmatics
Speechmatics is the Voice AI company on a mission to understand every voice. Its speech-to-text technology delivers industry-leading accuracy across 55+ languages, with specialized models for healthcare, media, contact centers, and enterprise organizations worldwide. Speechmatics powers leading technology providers including Adobe, AI Media, Content Guru, and Nordhealth, and offers deployment across cloud, on-prem, and on-device. Speechmatics is headquartered in Cambridge, UK.