Blog 22nd Mar 2018

It is clear that we’re living through a speech revolution. Virtual Personal Assistants (VPAs, more commonly recognised as Siri and Alexa) let us communicate with our computers and phones more efficiently using our own voices. But will talking ever overtake typing? Some experts think it will, and sooner than you think.

An article published by the BBC suggests that voice will soon take over from typing and clicking as the main way of interacting online. Given the need for accessibility for people who are blind, partially sighted or unable to read, a spoken web would be revolutionary for the future of communication.

The article gives a poignant example of how speech can benefit society: farmers in Ghana with low literacy rates who do not have access to the vital online weather reports needed for sowing seeds, irrigating crops and grazing their animals. The solution the farmers proposed was to convert online weather reports into speech recordings that they could access on their basic phones. The technology was cheap and easy to run, working on a Raspberry Pi 2 computer with a GSM dongle, and it gave farmers the forecasts they needed to anticipate weather patterns, helping to secure the future of their trade.

So, if speech can bring such rewards to the Ghanaian farmers, surely it can benefit the one in five adults in Europe and the US with poor reading skills? The simple answer is yes, but Automatic Speech Recognition (ASR) has proven difficult to master. Sure, we’ve come a long way in teaching machines to respond to simple commands such as finding the nearest coffee shop or turning the music up on our smart speakers. As the article suggests, though, it is much harder when the speech spans multiple domains, covering a variety of topics and contexts.

So why is ASR so difficult? Well, because we are difficult to understand. There are so many different ways of saying and pronouncing words and sentences due to local dialects and diverse accents. So how do we get around it? Clever innovation.

Here at Speechmatics, we’ve developed an accent-agnostic pack for speech-to-text transcription called Global English, which was trained on thousands of hours of spoken data from over 40 countries and tens of billions of words drawn from global sources. Global English takes us a step closer to solving the problem of understanding speech across accents and dialects, and can help improve business efficiency, increase transcription accuracy for English and reduce costly errors. Although some contexts will always need text on the web, the article states that by 2020, “up to half of all searches could be voice.”

It’s certainly an exciting time to be in speech, and who knows, maybe talking will overtake typing sooner than we think…

Read the full BBC article here.

Ben Leaman, Speechmatics
