What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 56+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150ms latency that is ideal for real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, hybrid, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, legal, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

Transforming live broadcasts: Speechmatics' real-time ASR now on NVIDIA Holoscan for Media

TL;DR

Speechmatics is the first speech-to-text provider integrated with NVIDIA Holoscan for Media, bringing real-time, low-latency captioning to live broadcasts. This integration enables media companies to deliver highly accurate, AI-powered live captions across 56+ languages on NVIDIA’s edge-to-cloud infrastructure, improving accessibility, engagement, and immediacy for global audiences.

Speechmatics Becomes First Speech-to-Text Provider on NVIDIA Holoscan for Media

This milestone integration showcases the power of combining Speechmatics’ state-of-the-art speech recognition with NVIDIA’s next-generation live media infrastructure, as Speechmatics joins over 10,000 AI startups in the prestigious NVIDIA Inception Program.

David Agmen-Smith, Director of Product at Speechmatics, shared: “Speechmatics is delighted to extend our collaboration with NVIDIA and become the first speech-to-text provider on the software-defined NVIDIA Holoscan for Media platform. Our years of foundational research into speech AI have allowed us to lead the automatic speech recognition field in terms of accuracy, even at very low latency. The combination of Speechmatics with Holoscan for Media allows lightning-quick and highly accurate captions to be broadcast, enhancing viewer experiences. This integration leverages high performance computing to deliver exceptional performance and real time processing, enabling efficient, low-latency captioning for demanding live media applications.”

Why Holoscan for Media Changes Live Broadcasts

Holoscan for Media is a software-defined platform that enables live video pipelines to run on the same infrastructure as AI with unprecedented flexibility and efficiency, leveraging an IP-based, cloud-native architecture. The Holoscan Sensor Bridge is a critical component for integrating and processing high-bandwidth sensor data over Ethernet, enabling low-latency AI workflows.

It combines hardware systems (including sensors, networking, NVIDIA IGX, and GPUs) with optimized libraries and core microservices to deliver real-time, low-latency data processing. High-speed Ethernet connectivity supports low-latency sensor integration. The platform’s modularity comes from its operators and core microservices, which let users build flexible sensor processing pipelines and integrate with diverse hardware components. Developers can use Graph Composer to create and manage data processing pipelines visually, and can build streaming AI pipelines with the NVIDIA Holoscan SDK. The SDK is a key part of the software stack for workflow development, supporting rapid prototyping and the creation of streaming applications.
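The idea of composing small operators into a streaming pipeline can be sketched in plain Python. This is an illustrative sketch only: it does not use the Holoscan SDK or any Speechmatics API, and every name in it (`AudioChunk`, `Caption`, `chunk_source`, `mock_asr`, `format_captions`, `build_pipeline`, `run_demo`) is hypothetical. It assumes each operator is a generator that consumes one stream and yields another, so chunks flow through one at a time rather than being batched.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class AudioChunk:
    """A slice of live audio (hypothetical type, not from any SDK)."""
    start_ms: int
    samples: bytes


@dataclass
class Caption:
    """A timed caption produced for one audio chunk."""
    start_ms: int
    text: str


def chunk_source(total_ms: int, chunk_ms: int) -> Iterator[AudioChunk]:
    """Source operator: emits silent placeholder chunks of fixed duration."""
    for start in range(0, total_ms, chunk_ms):
        yield AudioChunk(start_ms=start, samples=b"\x00" * chunk_ms)


def mock_asr(chunks: Iterable[AudioChunk]) -> Iterator[Caption]:
    """Stand-in for a speech-to-text operator; it only reports chunk size."""
    for chunk in chunks:
        yield Caption(chunk.start_ms, f"placeholder for {len(chunk.samples)} bytes")


def format_captions(captions: Iterable[Caption]) -> Iterator[str]:
    """Sink-side operator: renders each caption as a display line."""
    for caption in captions:
        yield f"{caption.start_ms} ms: {caption.text}"


def build_pipeline(source: Iterable, operators: List[Callable]) -> Iterable:
    """Chain operators so each one consumes the previous one's output."""
    stream = source
    for operator in operators:
        stream = operator(stream)
    return stream


def run_demo() -> List[str]:
    """Run a three-chunk stream through the mock ASR and the formatter."""
    source = chunk_source(total_ms=600, chunk_ms=200)
    return list(build_pipeline(source, [mock_asr, format_captions]))


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Because each stage is a generator, swapping `mock_asr` for a different operator leaves the rest of the chain untouched, which is the kind of modularity the paragraph above describes.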

Holoscan for Media is designed to meet strict latency requirements and support real-time performance, making it suitable for applications that require an immediate response. The broader Holoscan platform is a domain-agnostic solution for data processing and AI across industries, including medical devices, medical imaging, industrial automation, and edge AI applications. It supports interoperability with other applications and can scale from edge to data center environments. Users have access to resources such as Holoscan Reference Applications and benchmarks to support development and troubleshooting.

“With Holoscan for Media, NVIDIA is transforming live media production by enabling live AI processing to run on the same platform as traditional broadcast applications,” said Guillaume Polaillon, product line manager, live media solutions at NVIDIA. “Speechmatics’ integration is the first exciting example, providing customers with advanced speech-to-text with very low latency in their live production.”

Speechmatics + NVIDIA: Real-Time Captions Across 56+ Languages

The integration of Speechmatics’ industry-leading automatic speech recognition (ASR) with Holoscan for Media sets the stage for transformational change in live captioning. Real-time transcription is critical in high-stakes environments like live sports, global news, and streaming services, where accuracy and immediacy directly shape viewer experience.

By supporting 56+ languages, Speechmatics ensures captions are not only fast and accurate, but also accessible to audiences worldwide. This commitment to inclusivity makes live broadcasts more engaging, diverse, and representative.

Who’s Already Using Holoscan for Media?

Early adopters of Holoscan for Media include ASG, Beamr, Comprimato, Lawo, Monks, Pebble, RED Digital Cinema, Sony Corporation, and Telestream. These companies can now combine their advanced workflows with Speechmatics’ enterprise-grade ASR to deliver captions at the lowest latency possible.

For broadcasters and production teams, this means they can run real-time data pipelines alongside capture cards and hardware platforms, rapidly prototype new workflows, and scale from edge to cloud with confidence. Users can draw on Holoscan Reference Applications and benchmarks to evaluate, customize, and optimize their workflows. Modular operators enable flexible workflow customization, and developers can use them to create new sensor data processing pipelines and end-to-end applications.

Accessibility and Engagement for Global Audiences

The Speechmatics–NVIDIA partnership exemplifies a shared mission: to make live content accessible, engaging, and inclusive for everyone. Whether for accessibility teams, broadcasters, or global streaming platforms, the integration enhances live media by ensuring conversations, sports commentary, and news reach audiences in their own language—instantly and accurately.

This collaboration reinforces Speechmatics’ mission to Understand Every Voice and empowers industries to leverage cutting-edge AI technology in the latest evolution of live media.

FAQs: NVIDIA Holoscan

What is NVIDIA Holoscan?

NVIDIA Holoscan is an AI sensor processing platform and software-defined system designed to run real-time AI applications at the edge, particularly for processing sensor data such as video or audio in live media workflows. It is a domain-agnostic solution for data processing and AI, enabling streaming AI pipelines across various industries.

What does “Holoscan for Media” mean?

Holoscan for Media is a specialized branch of the broader Holoscan platform, which is a comprehensive hardware and software ecosystem designed to facilitate sensor data processing, AI application development, and deployment across edge, embedded, and cloud environments. Holoscan for Media focuses specifically on live video production, allowing AI applications (like Speechmatics’ ASR) to run alongside traditional broadcast workflows on unified infrastructure, reducing latency and boosting scalability. Holoscan for Media leverages advanced network connectivity and high-speed Ethernet to integrate seamlessly with diverse hardware and software components, enabling efficient, low-latency data transmission and real-time performance.

Why is Speechmatics integrated with NVIDIA Holoscan?

Speechmatics provides real-time speech recognition with low latency and high accuracy. Integrating with Holoscan for Media ensures that live captions can be generated and delivered with minimal delay—vital for live broadcasting, sports, and global news.

Who benefits from this Holoscan integration?

Broadcasters, sports networks, media production houses, and streaming platforms seeking accurate, instant captions in multiple languages. It also benefits accessibility teams aiming to serve global audiences in real time.

How does Holoscan improve real-time transcription?

Holoscan leverages NVIDIA GPUs and optimized libraries to reduce processing time for AI workloads and lower CPU load. This enables real-time ASR systems like Speechmatics to deliver near-instantaneous transcription, even under heavy video or audio data loads.

Is Holoscan cloud-only?

No. Holoscan supports flexible deployment from edge to cloud, and can also be deployed in data center environments for large-scale processing. That means media companies can run real-time AI on-site (at the venue), in the cloud, or within a data center, depending on latency and bandwidth requirements.

Updated: Sep 10, 2025 | Read time 2 min
