Language is constantly evolving, with new words and phrases added every day. This adaptive nature presents a challenge for automatic speech recognition (ASR) providers, who must ensure their language models keep pace with the modern world and contemporary vocabulary.
It is standard practice for ASR providers to continually train and improve their language models to ensure they deliver the highest levels of recognition accuracy. New features are added regularly, and models are trained on diverse data sets to improve recognition accuracy and ensure it holds up in real-world scenarios. For example, an ASR system should remain robust in noisy environments.
However, there will always be a delay between the discovery of new words in spoken language and the ability to add them to training data, retrain language models and make these newly trained models available to customers.
Some words enter the vocabulary gradually, with limited impact on recognition accuracy. Words such as ‘selfie’ and ‘flexitarian’ are added to speech recognition language models naturally as they appear in the data used to retrain and improve those models. They are then released in a regular product update cadence.
These words slowly become part of the language and often grow in popularity but are still used relatively infrequently. The impact of such newly coined words and phrases not being immediately added to an ASR language model is minimal, as the risk of them significantly increasing word error rates is low.
For example, if the word ‘selfie’ appeared once within a 1,000-word transcript and was transcribed incorrectly, it would contribute a word error rate of just 0.1% (assuming every other word was transcribed correctly). This is negligible compared with the cost and time of updating the language model to include this word (or even a number of words) when that effort could instead be spent on new feature capability or the development of a new language.
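The arithmetic above can be sketched with a minimal word error rate (WER) calculation: the word-level edit distance between the reference and the hypothesis, divided by the length of the reference.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution
            )
    return dp[-1][-1] / len(ref)

# One mis-transcribed word in a 1,000-word transcript:
reference = " ".join(["word"] * 999 + ["selfie"])
hypothesis = " ".join(["word"] * 999 + ["selfy"])
print(f"{wer(reference, hypothesis):.1%}")  # prints "0.1%"
```

A single rare word therefore barely moves the headline accuracy figure, which is the point being made above.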
Language models are trained using speech data, but keeping them up to date requires constant evaluation of that training data. It is also vital that the data used is of the right quality.
Throwing data at the problem might solve the issue of vocabulary and word diversity but it’s not as easy as that. This big data approach presents a risk of introducing bias or adding uncleaned, low-quality data that would negatively affect the transcription output for customers.
ASR providers are also challenged with obtaining quality data. While some providers rely on using customer data from their cloud services, savvy consumers and businesses alike are increasingly apprehensive about their personal data being used in this way. This goes back to the issue of data quality and whether end-user audio files are the right data to train on, or whether training on this data could actually skew transcription accuracy.
The old adage ‘rubbish in, rubbish out’ rings true when it comes to training data. For this reason, ASR providers are very selective about the data used to train a language model and must conform to the highest levels of data compliance to ensure the data they use is not only high quality and clean but also ethically sourced.
For more information on training data, read our eBook: ‘How to make the most of a data surplus’.
There are cases where newly coined words or phrases become instantly popular, on the lips of millions of people overnight. Prime examples are ‘Brexit’ and, more recently, ‘coronavirus’ and ‘COVID-19’. Given the significance of these words, the rate at which they are used grows exponentially, even over a very short period of time.
These words are often far-reaching and relevant across a large and diverse range of channels. From broadcast and media to social media and contact centers, conversations become dominated by these terms. Not only do these words appear in key positions within a piece of content, such as headlines, but they also appear frequently. Unlike rarely used words, the impact on word error rate when one of these terms is transcribed incorrectly is much more apparent, and the impact on end-users far greater.
Speechmatics’ Custom Dictionary feature was designed for this use case – to optimize the accuracy of Speechmatics’ language models by enabling customers to add their own words to the dictionary simply via text. Not only does this enable the rapid adaptation of Speechmatics’ language models, it also helps with formatting.
The Custom Dictionary Sounds feature enables users to not only add words but differentiate between how a word sounds and how it is written. For example, the engine can be instructed to express COVID-19 in capitals with a hyphen and transcribe ‘19’ in digits to ensure the transcription aligns with the accepted media formatting.
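As a rough sketch of how such a configuration might look, the snippet below pairs each added word with the way it may sound when spoken. The field names (`additional_vocab`, `content`, `sounds_like`) reflect the Speechmatics transcription config at the time of writing, but should be checked against the current documentation before use.

```python
# A hedged sketch of a transcription config using Custom Dictionary Sounds.
# Field names are assumptions based on the Speechmatics transcription config;
# verify against current documentation before relying on them.
transcription_config = {
    "language": "en",
    "additional_vocab": [
        {
            # How the word should appear in the transcript...
            "content": "COVID-19",
            # ...and how it may sound when spoken.
            "sounds_like": ["covid nineteen"],
        },
        # Words that are spelled as they sound need no sounds_like entry.
        {"content": "Brexit"},
    ],
}
```

This separation of written form and spoken form is what lets the engine output ‘COVID-19’ in capitals, with a hyphen and digits, even though the speaker says “covid nineteen”.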
For the media and broadcast industry, where world news is fast-paced and ever-changing, captioning is vital for accessibility. Adding these words and terms through Custom Dictionary avoids awkward blunders and reduces the work required of human transcribers, increasing the speed at which media content can be aired in a compliant format.
Within the contact center it is crucial for speech recognition language models to be adapted to ensure they contain the relevant vocabulary for the product/service they are representing. Here, accuracy can directly affect the bottom line. The right tools powered by accurate transcription – even with evolving vocabularies – can resolve customer issues more efficiently, provide targeted information more quickly, and ultimately reduce costs.
Accurate transcription of these viral words enables keyword tracking and automated call flow generation solutions to be effective. Such solutions are most in demand precisely when contact center staff are at their most stretched, as during the COVID-19 pandemic. It is therefore important that these automated systems work well, supporting agents with accurate transcription to help resolve customer issues as quickly as possible.
Custom Dictionary puts this speed and accuracy directly in the hands of the user, ensuring their transcription remains up to date even in the rapidly evolving landscape of language.
Want more information on the Custom Dictionary and Custom Dictionary Sounds features? Read more on our website.
Your government will have all the information you need with regards to COVID-19. Please familiarize yourself with that guidance, and stay safe, from everyone at Speechmatics.
Alex Fleming, Speechmatics