Description
Number recognition is a notoriously difficult problem in automatic speech recognition (ASR). Most words have only one written form in a transcript, but numbers can be written as either digits or words. This creates inconsistencies in transcripts that affect both readability for human consumers and compatibility with machine tools that expect a particular output format.
When numbers are a crucial part of an interaction, as in credit card and phone number use cases, unpredictable output is a serious problem. Speechmatics’ ASR delivers a standardized and consistent format for numbers, transcribing those less than 10 as words.
Number recognition
ASR has evolved significantly in recent years, and so have the expectations of users. Battles are no longer fought over word error rates. Top providers consistently deliver accuracy in the mid to high 90 percent range, especially for English. The battleground has shifted: providers now look beyond word error rate in pursuit of capturing more of the intricacies of voice and speech. Last year, for example, Speechmatics rolled out the most advanced punctuation in the market to its top languages. Work is in progress to add Advanced Punctuation to even more languages.
ASR has many applications and capabilities that add value to businesses. It enables them to innovate with the voice data in their organization, from the voice of their employees to the voice of the customers they serve. Organizations are looking to integrate ASR alongside other third-party solutions to build workflows using voice. Use cases include straightforward transcription, captioning, media monitoring, call interaction capture, call routing, call center agent assist, compliance monitoring and analytics. In these situations, a consistent and accurate representation of common entities such as numbers is not only necessary but expected.
ASR solutions are highly effective and accurate at transcribing speech. When it comes to numbers, however, the output format can be mixed. In some cases, transcribed numbers are unpredictable because of how models are trained. For example, a transcript might contain a mix of words and digits because the transcription product cannot tell that the entity it has recognized is a number and not a word.
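To make the formatting problem concrete, here is a minimal Python sketch of how a downstream consumer might apply the "numbers less than 10 as words" convention to a transcript that mixes digits and words. It is an illustration only and not Speechmatics' implementation: the function name `spell_small_numbers`, the lookup table, and the regular expression are all assumptions made for this example, and the pattern deliberately leaves decimals, times, and alphanumeric codes untouched.

```python
import re

# Hypothetical lookup table for this sketch; it covers 0-9 only.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# A standalone single digit: not glued to letters or other digits, and not
# part of a decimal (3.5), a time (9:30), a date (1/2), or a range (1-2).
_SINGLE_DIGIT = re.compile(r"(?<![\w.,:/-])[0-9](?![\w:/-]|[.,][0-9])")


def spell_small_numbers(transcript: str) -> str:
    """Rewrite standalone digits 0-9 as words; leave everything else as-is."""
    return _SINGLE_DIGIT.sub(lambda m: DIGIT_WORDS[int(m.group())], transcript)


if __name__ == "__main__":
    print(spell_small_numbers("I have 3 cats and 12 dogs"))
    # I have three cats and 12 dogs
```

A rule like this only fixes the presentation of numbers that were already recognized as digits. It does not address the harder case described above, where the recognizer cannot tell that a spoken entity is a number in the first place.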