Blog 6th Nov 2019

Speech recognition technology – an overview

Speech recognition technology has been around for some time, with research dating back to the 1950s and 60s. It wasn't until the 1980s that modern approaches such as recurrent neural networks and Hidden Markov Models (HMMs) came into use. In fact, Speechmatics' Founder, Dr Tony Robinson, pioneered the application of neural networks to automatic speech recognition in the 80s, demonstrating that they greatly outperformed traditional systems. Back then, however, computing power was insufficient. Now, with the rise of computing power, graphics processing and cloud computing, this approach is a reality and has unlocked the true value in voice.

Artificial intelligence and machine learning have enhanced the power of voice to the extent that its application is both broad and far-reaching. Speech recognition technology is now being adopted by many companies from a range of industries. It helps to enhance business processes, consumer experiences and ultimately improve the bottom line. We’re going to look at how the technology is being used within the media and broadcast industry to deliver value.

 

How are media companies using speech recognition technology?

Media and broadcasting companies are under continuous pressure to improve their offering and get content in front of more viewers. From media monitoring and media asset management to editing and caption creation, speech recognition technology is providing great value to media companies.


 

  1. Reduce editing time

The editing market has grown significantly with the growth in video content creation and consumption. Media companies use speech recognition technology to streamline the editing process. Previously, media companies needed large teams of editors to correct inaccurate transcripts. This was time-consuming, especially where large numbers of files required checking and editing in parallel. Automatic speech recognition has been adopted to significantly reduce editing time, enabling editing teams to be more efficient and use their specialist skills to add value where machines cannot.

 

  2. Media asset management

With more people using the Internet each day, more content is being consumed and generated. Media companies are harnessing the power of artificial intelligence and machine learning to transform how they manage their digital assets. It is now easier than ever to collate and manage vast quantities of media content, identify specific elements and automatically tag these assets.

The media industry is utilising speech recognition technology to capture the audio within media content, making it easy to categorise, index and discover digital assets. Once stored, companies can search for keywords, names, people, events, dates, places, genres or other desired categories. By adopting automatic speech recognition for media asset management, companies can significantly improve organisational productivity: it reduces the time taken to search for media clips and considerably cuts costs as a result.
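As a simple illustration of the idea (not Speechmatics' actual API), the Python sketch below builds a keyword index over machine-generated transcripts so that clips can be found by search term. The clip names and transcript text are invented for the example:

```python
from collections import defaultdict

# Invented example data: clip filenames mapped to their transcripts,
# as a speech-to-text engine might produce them.
transcripts = {
    "clip_001.mp4": "the prime minister spoke about the election results",
    "clip_002.mp4": "highlights from the championship final in london",
    "clip_003.mp4": "election night coverage live from london",
}

# Build an inverted index: each word maps to the set of clips containing it.
index = defaultdict(set)
for clip, text in transcripts.items():
    for word in text.lower().split():
        index[word].add(clip)

def search(*keywords):
    """Return the clips whose transcript contains every keyword."""
    results = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*results) if results else set()

print(sorted(search("election")))
print(sorted(search("election", "london")))
```

A production system would of course use richer metadata (speakers, timestamps, genres) and a proper search engine, but the principle is the same: once speech becomes text, media assets become searchable.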

 

  3. Media monitoring

With media coverage being broadcast across more channels than ever before, it has become increasingly important to track and monitor that output. Across TV, radio, social media and many other channels, it's essential for organisations to capture what is being said about a person, brand, situation or event. This helps commercial businesses, political campaigns, scientists and others analyse what is being said about a subject and draw better insights from it.

Media monitoring companies are using speech recognition technology to monitor media coverage through TV, radio, social media and other spoken forms and to convert that spoken content into text. Monitoring companies can listen for specific keywords or terms in real-time or from pre-recorded files. These can then be categorised and indexed for future use.

 

  4. Captioning and subtitling

Captioning and subtitling comprise the encoding, editing and repurposing of video captions and subtitles for delivery platforms such as web, mobile and television. The key driver behind captioning and subtitling is to support accessibility in all forms of communication. A recent report conducted by Speechmatics revealed that 29% of the media market is using human-only processing as its captioning solution. However, this is costly and requires a great deal of human resource to transcribe, align and position captions. Media companies are turning to speech recognition technology to find operational efficiencies and reduce costs. Automated captioning helps broadcasting and web media organisations caption huge quantities of audio and video content quickly and at relatively low cost.
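To make the automated side of this workflow concrete, the short Python sketch below converts timed transcript segments (of the kind a speech-to-text engine might return; the segments here are invented) into the widely used SRT caption format:

```python
# Invented example: (start seconds, end seconds, caption text) tuples,
# as might come back from an automatic transcription service.
segments = [
    (0.0, 2.5, "Good evening and welcome to the programme."),
    (2.5, 5.0, "Tonight we look at speech recognition."),
]

def to_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments):
    """Render timed segments as SRT: numbered blocks separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"

print(to_srt(segments))
```

Real captioning pipelines also handle line lengths, reading speed and on-screen positioning, which is where human editors still add value on top of the machine output.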

The ability of speech recognition solutions to deliver highly accurate transcripts provides significant advantages: for example, they are much faster than human transcription for both pre-recorded and real-time content. In some instances, machine transcription cannot be used in isolation. However, advances in artificial intelligence and machine learning mean that speech recognition technology is on the rise and will increasingly be used in conjunction with traditional methods.

 

Download our report on the state of speech recognition technology for media companies!

So, there you have it, four ways that media companies are using speech recognition technology to improve both their business and customer workflows. To get more information, and read key insights from media professionals, download our report!

 

James Page, Speechmatics

 


Want to see more content like this?

Sign up for our newsletter!
