
Advances in computing power, graphics processing and cloud computing have made artificial intelligence (AI) and machine learning (ML) powerful enough to build innovative applications.
With that rise in power has come increased hype, to the extent that AI and ML have become buzzwords. Many media companies understand the potential value that these systems bring, but don’t understand what they can actually do with them or how they can be applied.
With increased maturity comes a surge in adoption of ML and AI applications that can be used in the real world. Speech recognition technology is one of those applications, and it allows companies to realize the true potential of AI and ML. In this article, we’ll take a look at the role of mission-critical, accurate speech recognition in enabling media companies to innovate with voice and some of the challenges that these companies face when adopting the technology.
Media companies face a range of challenges when adopting speech technology, including the need for secure deployments, cost and the know-how to apply it. Before we address these challenges, we’ll first look at why media companies decide to adopt speech recognition technology in the first place.
Speechmatics conducted market research and found that 53% of media organizations involved in the study have already integrated speech technology within their applications. A further 20% of media companies said that speech recognition technology will be a priority for their business in the next five years, and 20% said that they are currently considering the adoption of speech recognition technology. As for the key reasons why media companies adopt speech recognition technology, 80% said that it offers operational efficiencies through reduced turnaround time, lower costs and improved productivity. 60% believe that deploying an any-context speech recognition engine generates significant competitive advantages through building innovative applications, and 33% pointed to an improved customer experience and the ability to analyze big data sources of video and audio files. That all sounds great, so what are the challenges?
Integrating any technology comes with challenges and speech is no different. 53% of media companies find that the complexity of deploying speech technology into production is a key challenge. Similarly, 53% indicated that speech recognition technology is not yet at a suitable accuracy level for their use case, meaning that a combination of human intervention and machine automation is required. 47% said that it was too time-consuming and resource-heavy to integrate speech technology into their workflows, and 27% of media companies indicated that cost was a challenge. Let’s dive into these challenges in a little more detail.
53% of media companies said that accuracy was a key challenge when adopting speech recognition technology. Accuracy is a key measure when choosing an automatic speech recognition (ASR) provider, and the media market has high expectations of it. Strict accessibility laws are being enforced on media companies as more and more online content is created. Government agencies like the FCC in the US lead this and put significant pressure on media companies. Captioning and subtitling for broadcast requires extremely high accuracy levels, especially when done in real-time. It is therefore important for media companies to find an ASR provider with close to 100% accuracy for broadcast scenarios. This will enable them to use technology and limit their reliance on human transcribers. But accuracy isn’t enough on its own anymore. Speed of transcription, language coverage and flexible deployment options are also considerations.
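The research above does not say how respondents measured accuracy. One widely used measure for ASR output is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the machine transcript into a human reference transcript, divided by the number of words in the reference. Accuracy is then often quoted as 1 minus WER. The sketch below is a minimal illustration under that assumption; the function name and the sample sentences are made up for the example, and real evaluations usually normalize punctuation and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Illustrative helper only; normalization here is limited to
    lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up caption example: one wrong word out of six.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on a mat"
    wer = word_error_rate(reference, hypothesis)
    print(f"WER: {wer:.1%}")           # WER: 16.7%
    print(f"Accuracy: {1 - wer:.1%}")  # Accuracy: 83.3%
```

Because insertions count as errors, WER can exceed 100% on very poor output, which is one reason to check whether a vendor's accuracy figure is based on WER or on something else.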
The complexity of deploying voice technology is a challenge for media companies because it requires resources to be allocated to ensure successful integration. It also requires the ASR provider to have processes, procedures, documents, support and training in place so that the deployment process is as easy as possible for their customers. The availability of resources to assist with deployment isn’t the only challenge. Data security also complicates the deployment process.
Unsurprisingly, 79% of the media and broadcast industry consider data privacy to matter a lot. One of the main concerns for media and broadcast companies, therefore, is trusting third parties with vast quantities of data that shouldn’t be public facing. With popular media like the final series of HBO’s Game of Thrones (watched by 17.8 million people, according to CNN Business), the requirement on media companies to keep this kind of media secure until the broadcast date is paramount. A data leak or breach could cost a production company millions of dollars. Media companies therefore typically prefer an on-premises deployment option to ensure that their content and transcripts can remain within a secure environment.
“Pre-broadcast media security is very important for broadcasters/content producers and a few insist on us [Red Bee Media] keeping all media within our network to help guarantee its security. The ability to deploy speech technology solutions locally has been key for that reason.” 
Hewson Maxwell, Head of Technology Development, Access Services at Red Bee Media.
From our research into deployment preferences, we found that 44% of the media industry would prefer nothing to leave their own networks. An on-premises deployment option is an appealing proposition for the media and broadcast industry. Not only does it help to provide lower latency when transcribing in real-time, but it also ensures the security of data.
The time and resources associated with integrating speech technology prove to be a challenge for media companies. Media companies use speech recognition technology as a component of their innovative applications, helping to create a competitive advantage. The media industry respondents we spoke to indicated that ease of integration was one of the most important considerations when adopting speech recognition technology. A product that is easy to integrate into an existing application is essential, especially for cases where the solution consists of many different products layering functionality to deliver the final application.
The majority of the media and broadcast industry can integrate products into their applications using their own teams. However, the ability to provide a simplified integration or even managed services is valuable for build efficiency and time to market.
Some benefits of easy integration for media companies include:
Accelerate deployment timeline
Speed up time to market
Reduce engineering times
Leverage new capabilities
Increase revenue
Just 27% of businesses said that cost was a key challenge. This figure suggests that the media market is more concerned about accuracy and transcription output than price. Concerns around cost are minimal because media companies understand that deploying speech technology brings operational efficiencies and lower long-term costs, so a return on investment is both realistic and achievable.