With the voice revolution upon us, there is more controversy than ever around voice and speech recognition technologies. Virtual Personal Assistants (VPAs) are the most widely adopted consumer products using short-form voice recognition technology, but they have been criticised lately for several reasons.
I want to focus on one key issue that was covered in a recent article by The Economist on the challenges of accents in VPAs and speech technology in general. As explained in the article “More and more smartphones and computers (including countertop ones such as the Echo) can be operated by voice commands. These systems are getting ever better at knowing what users tell them to do—but not all users equally. They struggle with accents that differ from standard British or American.”
Historically, specialist language packs were key to getting the most accurate results from speech recognition technology, but the packs available are still extremely limited. As The Economist goes on to explain: “the makers of these systems are aware of the problem. They are trying to offer more options: you can set Apple’s Siri or the Echo to Australian English. But they can still reach only so many accents, with a bias towards standard rather than regional ones. India, with its wide variety of English accents, presents the firms with both a tempting market and a huge technical challenge.” This challenge has forced many users to modify their speech patterns to make themselves understood, adapting their own voice to the technology rather than the other way around.
Here at Speechmatics, we adopted the same approach two years ago, building specific language pack variants for Australian, North American and British English to ensure the most reliable results for our users.
Building specific variants has helped the industry move a step closer to transcribing multiple accents more accurately, but it hasn’t solved the problem. The progress is great, but a variant still falls short when it hasn’t been trained on accents from across its whole country and so lacks well-rounded coverage. It also highlights a new challenge: “What do I do if I have multiple English accents from various locations, not just a single country, in one audio or video file, or multiple accents in one household using the same VPA?”
With recent advances in algorithms and neural network architectures, as well as increasing compute power, solving the problem of voice technology for English accent variants may be closer than you think…
Read the full Economist article here.
Georgina Robertson, Speechmatics