Historically, specialist language packs were key to getting the most accurate results from speech recognition technology, but the packs available are still extremely limited. As The Economist goes on to explain, “the makers of these systems are aware of the problem. They are trying to offer more options: you can set Apple’s Siri or the Echo to Australian English. But they can still reach only so many accents, with a bias towards standard rather than regional ones. India, with its wide variety of English accents, presents the firms with both a tempting market and a huge technical challenge.” This challenge has forced many users to modify their speech patterns in order to make themselves understood, adapting their own voice to the technology rather than the other way around.
Here at Speechmatics, we also adopted this approach. When we built our language packs two years ago, we created specific variants for Australian, North American and British English to ensure the most reliable results for our users.
Building specific variants has helped the industry move a step closer to transcribing multiple accents more accurately, but it hasn’t solved the problem. Language pack variants still haven’t been trained on accents from across the whole of a given country, so they don’t provide a well-covered model. The approach also highlights a new challenge: “what do I do if I have multiple English accents from various locations, not just a single country, in one audio or video file, or multiple accents in one household using the same VPA?”
With recent advances in algorithms and neural network architectures, as well as increasing compute power, a solution to the problem of voice technology for English accent variants could be closer than you think…
Read the full Economist article here.
Georgina Robertson, Speechmatics