Description

Number recognition is a notoriously difficult problem in automatic speech recognition (ASR). Unlike most words, which can be written only one way in a transcript, numbers can be expressed as either digits or words. This creates inconsistencies that affect both readability for human consumers and compatibility with machine tools that expect a certain output format. When numbers are a crucial part of an interaction, as in credit card and phone number use cases, unpredictable output is a problem wherever numbers need to be transcribed. Speechmatics’ ASR delivers a standardized and consistent format by transcribing numbers less than 10 as words.

Number recognition

ASR has evolved significantly in recent years. So too have the expectations of users. Battles are no longer fought over word error rates. Top providers consistently deliver accuracy in the mid to high 90s (percent), especially for English. The battlegrounds have shifted, with providers looking beyond word error rate in the pursuit of capturing more of the intricacies of voice and speech. Last year, for example, Speechmatics rolled out the most advanced punctuation in the market to its top languages. Work is in progress to add Advanced Punctuation to even more languages.

ASR has many applications and capabilities that add value to businesses. It enables businesses to innovate with the voice data in their organization, from the voice of their employees to the voice of the customers they serve. Organizations are looking to integrate ASR solutions with other third-party solutions to build workflows around voice. These use cases include straightforward transcription, captioning, media monitoring, call interaction capture, call routing, call center agent assist solutions, compliance monitoring and analytics.
In these situations, a consistent and accurate representation of common entities such as numbers is not only necessary but expected. ASR solutions are highly effective and accurate at transcribing speech. However, when it comes to numbers, the output format can be mixed. In some cases, transcribed numbers are unpredictable because of how models are trained. For example, a transcript might mix words and digits because the transcription product cannot tell that the entity it has recognized is a number rather than a word.
Speechmatics’ enhanced number recognition and consistent formatting

Accurate number recognition enhances the quality of the Speechmatics Global English language pack. It delivers accurate recognition of numbers within speech and provides a consistent output format of words for numbers less than 10.

Previous output
“Yes, please call me back. The best number to get me on is 0 7 seven 2 3 four 5 six 7 eight nine”

New output
“Yes, please call me back. The best number to get me on is zero seven seven two three four five six seven eight nine”

Numbers less than 10 are now always output as words. This standardized transcription output delivers predictability. If the customer requires a format other than words, this standard approach enables a simple mapping so that numbers can be normalized (or converted) to suit the customer’s specific needs.

The benefits

The focus on number recognition and a consistent format raises the quality of the Speechmatics output. The demand on customers to review and edit transcripts can be significantly reduced. This shortens the time needed to produce finished transcripts for applications like closed captioning, especially in real time. The predictable output of numbers less than 10 means that transcripts require less triage from human editors, freeing up the workforce and its efforts.

Another example of the benefits of this feature is within the contact center. Speechmatics can significantly streamline agent tasks like interaction and call note capture. This can be done automatically and accurately through Speechmatics’ ASR. Number recognition can also improve the capabilities of automated customer-facing tools such as interactive voice response (IVR) and support privacy and compliance scenarios.
These use cases rely on the recognition of numbers in the voice of the speaker and also require a specific format from the ASR solution to work seamlessly with additional products within the workflow.

Where and how can you use Speechmatics’ enhanced number recognition?

Great news: to use this, you don’t have to do anything! It comes as standard in Speechmatics’ Global English language pack in all deployment options (SaaS, Batch and Real-Time Virtual Appliances, and Batch and Real-Time Containers). Other numbers (including 10 and larger) will be addressed in subsequent releases.
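
Because numbers less than 10 always arrive as words, the simplified mapping mentioned earlier can be quite small. The following is a minimal sketch in Python, not part of any Speechmatics product or API: the names `WORD_TO_DIGIT` and `digits_from_words`, and the `min_run` parameter, are hypothetical and chosen only for illustration. It assumes the customer wants runs of consecutive number words, such as a phone number, collapsed into a string of digits.

```python
import re

# Hypothetical lookup table (an assumption for this sketch): the words a
# transcript uses for numbers less than 10, mapped to their digits.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Match one or more number words separated by whitespace, on word boundaries
# so that longer words such as "seventeen" or "someone" are left untouched.
_WORD = "|".join(WORD_TO_DIGIT)
_RUN = re.compile(rf"\b(?:{_WORD})(?:\s+(?:{_WORD}))*\b", re.IGNORECASE)


def digits_from_words(transcript: str, min_run: int = 1) -> str:
    """Replace each run of at least `min_run` number words with digits."""

    def convert(match: re.Match) -> str:
        words = match.group(0).split()
        if len(words) < min_run:
            # Leave short runs alone, e.g. the "one" in "no one".
            return match.group(0)
        return "".join(WORD_TO_DIGIT[word.lower()] for word in words)

    return _RUN.sub(convert, transcript)


if __name__ == "__main__":
    new_output = (
        "Yes, please call me back. The best number to get me on is "
        "zero seven seven two three four five six seven eight nine"
    )
    print(digits_from_words(new_output))
```

Raising `min_run` (for example to 3) is one way to avoid converting isolated words such as the "one" in "no one", at the cost of leaving genuine single-digit numbers as words. Which trade-off is right depends on the downstream workflow.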