What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 55+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150ms latency that is ideal for real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, hybrid, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact center, finance, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

Caption Chaos | Hilarious Times Captions Got It Wrong

We took a look at some of the captioning fails the internet has to offer. But why do these happen, you may be asking. We are going to take you through what the more serious impact can be as well as steps you can take to avoid them. But first let's get a kick out of some of the funniest caption fails.

2 Benedicts = Double beer batch?

How to spell Benedict Cumberbatch according to BBC subtitles pic.twitter.com/MFGd18lqk4
— Кати (@farmfeatures) March 19, 2014

The halftime snacks clearly weren't sufficient

Footballing cannibals! "Went on to eat their opponents..." #subtitlefail pic.twitter.com/twr356T9Xo
— Wendy Bradley (@wendybradley) December 1, 2013

Enjoy more in our Twitter thread.

[2/12] Big Smiles...and bad smells? 👀

Subtitles creating rumors! Great find, thanks for sharing @misslyssakitten! 👏 pic.twitter.com/SPP6SSNVSM
— Speechmatics (@Speechmatics) July 25, 2023

Human Errors

Nobody is perfect, and that includes human stenographers who listen to audio and type out captions in real time. This manual process can lead to human errors, where the spoken word is misheard or misinterpreted. In live broadcasts or events, where captions are generated in real time, there is limited time for corrections or revisions. Stenographers may struggle to keep up with the pace of speech, leading to captioning errors. This, coupled with accents, background noise, or unclear audio, can contribute to inaccuracies in the captions.

ASR can help, but often it isn't perfect

Automatic speech recognition (ASR) technology converts spoken language into written text without the assistance of a human. ASR brings down overall captioning costs and enables captioning of media content at scale. ASR also significantly reduces the time it takes to caption content. While the manual aspect of captioning is removed with ASR, errors can still occur and hinder what is intended to be conveyed.

Despite significant advancements in ASR technology, it still faces challenges. Accent and dialect variation is only one of many factors that influence speech recognition performance. ASR systems have also been shown to exhibit systematic inaccuracies or biases towards groups of speakers with varying age, gender, and other demographic factors.

As with human stenographers, background noise, overlapping speech, or low-quality audio can also impact the performance of ASR systems.

What are the risks when captions go rogue?

Your brand becomes the butt of the joke!

Today, anyone can become a viral sensation overnight – or a viral nightmare. As humorous as these blunders can be, no brand wants this to become a reality. Inaccurate captions that are poorly translated, misinterpreted, or offensive can harm a brand’s reputation and jeopardize its credibility. It is essential for brands to ensure they invest in accurate captions.

Accessibility Failed.

Inaccurate captions that are poorly timed or incomplete can hinder understanding and exclude those who rely on them. Around 48 million people in America experience some form of hearing loss. Captions play a crucial role in making content accessible to a wider audience, including individuals who are hearing impaired, non-native speakers, and those who prefer to watch without sound. Audiences now expect to consume captioned content. A joint study from Verizon and Publicis Media found that over 60% of young people watch ALL videos with captions. Failing to provide accurate captions limits the reach of content and can alienate people who require captioned content.

The Importance of Precise Captioning.

Accessibility & Inclusion

Language barriers hinder understanding in live interactions, and inaccurate captions do the same for content, rendering it inaccessible. Not being able to follow and understand spoken dialogue, sound effects, and other audio information in films, TV shows, and other online video content can be extremely frustrating and can lead to less engagement with content.

Accuracy in captions is crucial for accessibility and inclusion. Accurate captions help to break down communication barriers and enable all individuals – regardless of their hearing ability – to participate in discussions and social interactions and feel involved in audio content.

Viewer Experience

57% of Americans say they watch videos in public, and therefore rely on the accuracy of the subtitles for the context of the video. This shows that captions are now part of how people consume media and are often an expectation. Poor captioning can drive audiences away by spoiling their preferred way to view content.

Educational Value

Companies like Udemy improve lives through learning. Captions for them are vital in instilling trust in the organization and ensuring that they meet accessibility requirements. Captioning enhances the educational value of videos and gives students the option to read along with the spoken words, reinforcing their reading and language skills.

A paper published by the University of Wisconsin illustrates that watching videos with audio and captions leads to significantly better reading skills. Children who watch captioned videos can better define words that were heard in the videos, pronounce novel words, recognize vocabulary items (which may or may not have been heard in the videos), and draw inferences about what happened in the videos.

Legal & Regulatory Compliance

As Ellie Good from Udemy mentions, accurate captions are needed to comply with certain legislation. In numerous countries, broadcasters and online video platforms are legally required to provide captions for certain types of content. Compliance with these regulations ensures equal access to information for all individuals. Captioning content is widely recognized not only for enhancing user-friendliness but, more importantly, for promoting digital inclusion among viewers.

Searchability

Every company wants to be visible online. So why hinder the discoverability of your video content with poor captions? Search engines utilize timestamps, video transcripts, and visual analysis technology to extract relevant information from videos. Keyword depth is increased via the text used in closed captions, making it easier for your content to be found based on keywords or phrases mentioned in the captions.
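To illustrate why caption accuracy matters for discoverability, here is a minimal Python sketch of a keyword lookup over timestamped captions. It is a toy illustration, not how any search engine actually indexes video. The `Caption` type, the `find_keyword` helper, and the sample captions (including the misspelled name) are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds into the video when the caption begins
    text: str


def find_keyword(captions, keyword):
    """Return the start times of captions that contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [c.start for c in captions if needle in c.text.lower()]


# Hypothetical accurate captions.
accurate = [
    Caption(0.0, "Welcome to the show"),
    Caption(4.2, "Tonight we talk to Benedict Cumberbatch"),
    Caption(9.8, "Cumberbatch joins us from London"),
]

# The same captions with an invented mis-transcription of the name.
inaccurate = [
    Caption(0.0, "Welcome to the show"),
    Caption(4.2, "Tonight we talk to Benedict Cumberpatch"),
    Caption(9.8, "Cumberpatch joins us from London"),
]

hits_accurate = find_keyword(accurate, "cumberbatch")
hits_inaccurate = find_keyword(inaccurate, "cumberbatch")
```

With the accurate captions, the search returns the two moments where the name is spoken. With the mis-transcribed captions it returns nothing, so a viewer searching for that name would never find the video through its captions.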

Great technology can help reduce captioning fails

Technology – when deployed in the right way – can be a huge help in reducing errors and mitigating the negative impacts outlined above.

Real-time

Real-time captioning is required for live events and broadcasts, amongst other use cases. Ensuring these are accurate and timely is essential. Harnessing AI and machine learning to give you fast, accurate transcription in multiple languages can elevate your brand to the next level. Companies that integrate real-time ASR into their workflows are enhancing their accessibility and inclusivity to viewers and customers, helping to reach a broader audience all whilst complying with accessibility standards.

Why does latency matter in broadcasting?

When broadcasting, it is vital that captions are synchronized with the audio and video content in real time. Latency in broadcasting refers to the time it takes for a caption to appear on the screen after the audio has been played. If captions lag behind the spoken word, it is very difficult to follow along in real time. Lower latency allows viewers to follow along as dialogue unfolds. Reducing latency ensures a more immediate transmission of the content being broadcast, keeping viewers engaged in real time without any live stream delay and making content more accessible to all individuals.
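To make the definition of caption latency concrete, the sketch below computes it as the gap between when words are spoken and when their caption appears on screen. This is a simplified illustration under assumed data. The `CaptionEvent` type, the helper functions, and the sample timings are hypothetical and do not come from any real broadcast or from a Speechmatics API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaptionEvent:
    text: str
    spoken_at: float     # seconds into the broadcast when the words were spoken
    displayed_at: float  # seconds into the broadcast when the caption appeared


def caption_latency(event):
    """Latency of one caption: display time minus spoken time, in seconds."""
    return event.displayed_at - event.spoken_at


def summarize_latency(events):
    """Mean and worst-case latency across a list of caption events."""
    latencies = [caption_latency(e) for e in events]
    return {"mean": mean(latencies), "max": max(latencies)}


# Made-up timings for three captions in a live broadcast.
events = [
    CaptionEvent("Good evening", 0.0, 1.5),
    CaptionEvent("and welcome", 1.0, 3.0),
    CaptionEvent("to the show", 2.0, 4.5),
]

summary = summarize_latency(events)
```

In this made-up example the captions trail the speech by 2 seconds on average and by 2.5 seconds at worst. Tracking the worst case as well as the mean is a reasonable choice, since a single badly delayed caption can be enough to lose a viewer.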

The combination of all these factors provides the end user with an enhanced experience and is the answer to reducing captioning chaos! If you're looking to avoid any future captioning chaos, head to our media and caption section to find out more.

Jul 25, 2023 | Read time 6 min