Sep 13, 2024 | Read time 2 min
Last updated Sep 10, 2025

Transforming live broadcasts: Speechmatics' real-time ASR now on NVIDIA Holoscan for Media

Discover how Speechmatics and NVIDIA Holoscan bring real-time captions to live media with speed, accuracy, and global reach.
Speechmatics Editorial team

TL;DR

Speechmatics is the first speech-to-text provider integrated with NVIDIA Holoscan for Media, bringing real-time, low-latency captioning to live broadcasts. The integration lets media companies deliver highly accurate, AI-powered live captions in 55+ languages on NVIDIA’s edge-to-cloud infrastructure, improving accessibility, engagement, and immediacy for global audiences.

Speechmatics Becomes First Speech-to-Text Provider on NVIDIA Holoscan for Media

This milestone integration showcases the power of combining Speechmatics’ state-of-the-art speech recognition with NVIDIA’s next-generation live media infrastructure, as Speechmatics joins over 10,000 AI startups in the prestigious NVIDIA Inception Program.

David Agmen-Smith, Director of Product at Speechmatics, shared: “Speechmatics is delighted to extend our collaboration with NVIDIA and become the first speech-to-text provider on the software-defined NVIDIA Holoscan for Media platform. Our years of foundational research into speech AI have allowed us to lead the automatic speech recognition field in terms of accuracy, even at very low latency. The combination of Speechmatics with Holoscan for Media allows lightning-quick and highly accurate captions to be broadcast, enhancing viewer experiences. This integration leverages high performance computing to deliver exceptional performance and real time processing, enabling efficient, low-latency captioning for demanding live media applications.”

Why Holoscan for Media Changes Live Broadcasts

Holoscan for Media is a software-defined platform that enables live video pipelines to run on the same infrastructure as AI with unprecedented flexibility and efficiency, leveraging an IP-based, cloud-native architecture. The Holoscan Sensor Bridge is a critical component for integrating and processing high-bandwidth sensor data over Ethernet, enabling low-latency AI workflows.

It combines hardware systems (sensors, networking, NVIDIA IGX, and GPUs) with optimized libraries and core microservices to deliver real-time, low-latency data processing. High-speed Ethernet connectivity carries sensor data into the platform with low latency. The platform is modular: operators and core microservices let users build flexible sensor processing pipelines and integrate with diverse hardware components. Developers can use Graph Composer to create and manage data processing pipelines visually, and the NVIDIA Holoscan SDK to build and run streaming AI applications, which supports rapid prototyping.

Holoscan for Media is designed to meet strict latency requirements and sustain real-time performance, making it suitable for applications that require an immediate response. The broader Holoscan platform is domain-agnostic, serving data processing and AI workloads across industries, including medical devices, medical imaging, industrial automation, and edge AI applications. It interoperates with other applications and scales from edge to data center environments. Holoscan Reference Applications and benchmarks are available to support development and troubleshooting.

“With Holoscan for Media, NVIDIA is transforming live media production by enabling live AI processing to run on the same platform as traditional broadcast applications,” said Guillaume Polaillon, product line manager, live media solutions at NVIDIA. “Speechmatics’ integration is the first exciting example, providing customers with advanced speech-to-text with very low latency in their live production.”

Speechmatics + NVIDIA: Real-Time Captions Across 55+ Languages

The integration of Speechmatics’ industry-leading automatic speech recognition (ASR) with Holoscan for Media sets the stage for transformational change in live captioning. Real-time transcription is critical in high-stakes environments like live sports, global news, and streaming services, where accuracy and immediacy directly shape viewer experience.

By supporting more than 55 languages, Speechmatics ensures captions are not only fast and accurate, but also accessible to audiences worldwide. This commitment to inclusivity makes live broadcasts more engaging, diverse, and representative.

Who’s Already Using Holoscan for Media?

Early adopters of Holoscan for Media include ASG, Beamr, Comprimato, Lawo, Monks, Pebble, RED Digital Cinema, Sony Corporation, and Telestream. These companies can now combine their advanced workflows with Speechmatics’ enterprise-grade ASR to deliver captions at the lowest latency possible.

For broadcasters and production teams, this means they can run real-time data pipelines alongside capture cards and hardware platforms, rapidly prototype new workflows, and scale from edge to cloud with confidence. Holoscan Reference Applications and benchmarks help them evaluate, customize, and optimize those workflows, while modular operators let developers build new sensor data processing pipelines and end-to-end applications.

Accessibility and Engagement for Global Audiences

The Speechmatics–NVIDIA partnership exemplifies a shared mission: to make live content accessible, engaging, and inclusive for everyone. Whether for accessibility teams, broadcasters, or global streaming platforms, the integration enhances live media by ensuring conversations, sports commentary, and news reach audiences in their own language—instantly and accurately.

This collaboration reinforces Speechmatics’ mission to Understand Every Voice and empowers industries to leverage cutting-edge AI technology in the latest evolution of live media.

FAQs: NVIDIA Holoscan

What is NVIDIA Holoscan?

NVIDIA Holoscan is an AI sensor processing platform and software-defined system designed to run real-time AI applications at the edge, particularly for processing sensor data such as video or audio in live media workflows. It is a domain-agnostic solution for data processing and AI, enabling streaming AI pipelines across various industries.

What does “Holoscan for Media” mean?

Holoscan for Media is a specialized branch of the broader Holoscan platform, which is a comprehensive hardware and software ecosystem designed to facilitate sensor data processing, AI application development, and deployment across edge, embedded, and cloud environments. Holoscan for Media focuses specifically on live video production, allowing AI applications (like Speechmatics’ ASR) to run alongside traditional broadcast workflows on unified infrastructure, reducing latency and boosting scalability. Holoscan for Media leverages advanced network connectivity and high-speed Ethernet to integrate seamlessly with diverse hardware and software components, enabling efficient, low-latency data transmission and real-time performance.

Why is Speechmatics integrated with NVIDIA Holoscan?

Speechmatics provides real-time speech recognition with low latency and high accuracy. Integrating with Holoscan for Media ensures that live captions can be generated and delivered with minimal delay—vital for live broadcasting, sports, and global news.

Who benefits from this Holoscan integration?

Broadcasters, sports networks, media production houses, and streaming platforms that need accurate, instant captions in multiple languages all benefit, as do accessibility teams aiming to serve global audiences in real time.

How does Holoscan improve real-time transcription?

Holoscan uses NVIDIA GPUs and optimized libraries to reduce processing time for AI workloads and to lower CPU load. This lets real-time ASR systems like Speechmatics deliver near-instantaneous transcription, even under heavy video or audio data loads.

Is Holoscan cloud-only?

No. Holoscan supports flexible deployment from edge to cloud, and can also be deployed in data center environments for large-scale processing. That means media companies can run real-time AI on-site (at the venue), in the cloud, or within a data center, depending on latency and bandwidth requirements.
