
Dec 9, 2020 | Read time 4 min

Improve your digital asset management workflows with speech-to-text technology

Media companies are accelerating content production and improving their digital asset management workflows with speech recognition technology.
Speechmatics Team

Why demand is growing for digital and media asset management solutions

Companies large and small are spending more than ever on their websites, digital commerce and digital advertising, shifting away from traditional advertising as consumers spend more time online. According to a report from PwC Advisory Services, US digital ad revenue increased by 16.9% from 2018 to 2019, driven by Connected TV (CTV) advertising. Online advertising also returns real-time data, which makes campaign performance easier to track and iterate on.

Figures from HubSpot revealed that 80% of users recall a video ad they viewed in the past 30 days – and, after watching a video, 64% of users are more likely to buy a product online. The growing need for media and broadcasting solutions and services for advertisement and data management is now driving demand for digital asset management solutions.

The rise of audio and video assets in the digital age

Digital and media asset management covers a range of media types – from images to video and audio assets. Image management is an important part of the asset management process. However, videos are fast becoming the most significant media assets for organizations.

Video assets have a significant part to play when it comes to social media, lead generation and revenue – and this is only growing in the current economic climate, with most businesses around the world shifting to a digital-first strategy.

How speech-to-text technology improves speed and accuracy of content production

For media and digital asset management, speed and accuracy go hand in hand. Content creators in markets such as sports broadcast need to download, search, edit and curate their content for analysis, promotion or publication on social media and other online channels immediately after an event has happened. As the volume of broadcast content increases, so too does the demand for curated content. Accurate speech-to-text transcription is essential for quickly searching and captioning online video content, whether that is highlight edits, clips or full programs.

Using any-context speech recognition technology to automate the process of turning audio or video into a text-based format significantly speeds up the production workflow for the content curation team. Automating a laborious task such as sifting through hours’ worth of broadcast content for online highlights means editors and producers can focus on the creativity and quality of their content production rather than manually searching for the specific clips they need.

In the media and broadcast market, 100% accuracy is required for most use cases – otherwise, the asset remains difficult to search or the broadcast presents embarrassing blunders. Human transcription is necessary to obtain this level of accuracy, but it is time-consuming and expensive – often taking up to four hours to transcribe one hour of audio.

This is where human transcriptionists and voice technology work hand in hand to deliver the necessary levels of perfection. Voice technology can do the heavy lifting when it comes to transcribing voice within media assets. Where humans take up to four times the length of the audio file to transcribe it, speech-to-text providers do the job in half the length of the audio. Once the file is transcribed, human editors can concentrate on the value-adding editing pass that brings the transcript up to 100% accuracy.
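To put rough numbers on that comparison, here is a minimal sketch of the arithmetic. The 4x and 0.5x factors come from the figures above; the `review_factor` for the human editing pass is purely an assumption for illustration, as is the function name, and real review times will vary with audio quality and content.

```python
def transcription_hours(audio_hours, human_factor=4.0, asr_factor=0.5, review_factor=1.0):
    """Estimate turnaround for manual vs. machine-assisted transcription.

    human_factor and asr_factor follow the article's figures (up to 4x and
    0.5x the audio length). review_factor is a hypothetical placeholder for
    the human editing pass and is not taken from the article.
    """
    manual = audio_hours * human_factor
    assisted = audio_hours * (asr_factor + review_factor)
    return manual, assisted


# Example: a 10-hour batch of broadcast audio.
manual, assisted = transcription_hours(10)
print(f"Manual: {manual} h, machine-assisted with review: {assisted} h")
```

Changing `review_factor` shows how sensitive the saving is to the time editors spend correcting the machine transcript.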

Time and cost optimization for improved digital asset management

Using speech-to-text technology to transcribe voice from audio and video files helps to reduce the time spent searching for content from live broadcasts or archived content. With the transcript stored as searchable metadata, content creators can simply search for keywords, themes or other elements within the file, which delivers significant cost savings during content production.
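As an illustration of what keyword search over a transcribed asset can look like, here is a minimal sketch. It assumes the transcript is available as a list of timestamped segments; the `Segment` structure, the function names and the sample sentences are hypothetical and do not represent any particular provider's output format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the file
    end: float    # seconds from the start of the file
    text: str     # transcribed speech for this span


def find_keyword(segments, keyword):
    """Return the segments whose text contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS for an editor to jump to."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical transcript of a sports broadcast.
segments = [
    Segment(0.0, 4.2, "Welcome back to the second half."),
    Segment(754.0, 759.5, "What a goal from the edge of the box!"),
    Segment(3605.0, 3610.0, "That late goal decides the match."),
]

for hit in find_keyword(segments, "goal"):
    print(format_timestamp(hit.start), hit.text)
```

An editor looking for highlights could run a search like this across a whole archive and jump straight to the matching timestamps instead of scrubbing through the footage.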

Archiving media files is notoriously expensive because of the storage costs involved and the time spent searching for old files. Once transcribed, media assets are easier to search, which makes the content production workflow more efficient. With one fully transcribed and audited repository and the right tools available, individuals working from multiple locations can access and use all available content with ease.
