Over the last year, more companies have adopted voice as part of their long-term strategies: 65% of respondents to a survey for the Speechmatics report Trends and Predictions for Voice Technology in 2021 said they would consider voice in their five-year strategy. Voice technology is maturing and becoming more widespread and accessible. Companies are recognizing the value it brings to their businesses, not only by improving efficiency and streamlining workflows and processes, but also by improving customer experiences and, ultimately, increasing revenue.
Emerging technologies such as natural language processing (NLP) and voice analytics tools are providing value to businesses across many industries. However, many of these tools require text as their input. Without transcribing video, audio and voice into text, much of this data remains unusable to them. By combining and triaging text and voice data, businesses can improve their products and use NLP and analytics tools to gain a 360-degree understanding of business processes and customer interactions, extracting value from data that was previously out of reach.
More than a third of survey respondents (34%) predicted that the Asia Pacific region will see the largest growth in the adoption of speech recognition technology. This is largely due to the region's growing economy and population, which will shape business and consumer trends.

[Figure: Regions that will adopt voice technology first]

North, Central and South America came next in the poll, with Europe also predicted to adopt voice technology at a rapid rate. Again, this is largely due to economic, social and technological factors, as the value of voice becomes more widely recognized. It is also worth noting that North, Central and South America, as well as Europe, are already further ahead in voice technology adoption than Asia Pacific. By contrast, survey respondents thought the Middle East and Africa were unlikely to adopt speech recognition technology rapidly, as there is not yet widespread demand for it in many cases.
Looking at the future of speech recognition over the next three years, survey respondents said they expected voice technology to cover the following additional languages:

[Figure: Voice technology languages in the next three years]

As well as specific languages, respondents acknowledged the need for any-accent language packs, for example for Spanish. Spanish is spoken in many countries across the world, with varying accents and dialects, so an any-accent Spanish language model will become crucial to serving that market over the next three years.
Perceptions of accuracy in voice technology are highly specific to each use case and business. Measures that survey respondents said were important when assessing accuracy include word error rate, speaker diarization accuracy and language identification, among others.
Nearly three-quarters of survey respondents (73%) said the future of speech recognition lies in improved word error rate (WER). As machine learning algorithms continue to evolve, word accuracy is likely to exceed 95% (that is, a WER below 5%), especially for widely spoken languages such as English.

[Figure: Improvements to speech to text in the future]

It is no surprise that the ability to deal with noisy environments is another key factor in the post-COVID-19 future of speech recognition. Noise is a major factor affecting speech recognition accuracy, so the ability to reduce interference and deliver high-quality recognition, even in challenging environments, remains a top priority.

The future of speech recognition technology is also likely to see a greater focus on accent-independent language packs. Providers will need to weigh the benefits and costs of deploying and operating a growing number of language packs for a single language, and decide which to use for each use case and for each file.

In the future, transcription, the core function of speech-to-text technology, is also expected to become just one component of a wider enterprise-grade automatic speech recognition (ASR) offering. The addition of punctuation, speaker diarization and language identification will enhance the speech-to-text component of the overall solution.
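Because WER is the headline accuracy measure in the survey, it helps to see how it is usually computed. The sketch below assumes the common definition: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a recognizer's output, divided by the number of words in the reference. It uses plain whitespace tokenization with no casing or punctuation normalization, and the function name word_error_rate is hypothetical, not part of any Speechmatics API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes simple whitespace tokenization; real scoring tools usually
    normalize case, punctuation and numbers before aligning.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.3f}")
```

Here two of nine reference words are substituted, so WER is 2/9 (about 22%). Under this definition, "95% accuracy" corresponds to roughly one error per twenty reference words. Because insertions are counted, WER can exceed 1.0, so treating accuracy as 1 − WER is a convenient approximation rather than a strict percentage.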
For more information, and the full survey results, download Trends and Predictions for Voice Technology in 2021.

Bonney O’Hanlon