Machine learning framework to build and adapt speech-to-text languages

Speechmatics has developed The Automatic Linguist, a machine learning framework to build and adapt speech-to-text languages.

The importance of clear communication

In our increasingly connected, digital world, clear communication is crucial. Speech recognition has made the latest technology accessible to more users than ever before. But that's not enough on its own. Businesses operating in a global marketplace need any-context speech recognition support in a wide range of languages.

Until recently, extensive bespoke work was required to support just a single language. But at Speechmatics, we've used our expertise in machine learning and neural networks to create a breakthrough machine learning framework for training speech recognition language models.

Our breakthrough framework is called The Automatic Linguist, and in 2019 it won us a Queen’s Award for Enterprise in the innovation category. The Automatic Linguist (AL) has helped us build and update our 30+ speech-to-text languages, including the prestigious Global English language pack, which encompasses all major English accents and dialects.

The challenges of building and adapting languages the traditional way

Before we created the Automatic Linguist in 2017, our speech recognition teams had to spend a lot of time listening to, filtering and cleaning up data to make sure they were training on only good, clean data. This was a huge overhead, making new languages both slow and expensive to build.

Each of the many aspects of a language pack – from the acoustics of the language to vocabulary and grammar – required a separate expert who focused solely on that aspect of the build. Separating these aspects of the build into isolated teams meant that individual components did not always work together as well as expected. It was also very difficult to balance resources, so bottlenecks would occur with no obvious way of relieving the pressure. This was an ongoing problem that needed solving across the industry.

Each new language we tackled needed a different linguistic expert to plan and manage the build as a new challenge, with specific measures put in place for that particular language. These experts were only human, and the high level of manual intervention limited how far each new language pack could be scaled up and generalized to other languages.

How cutting-edge machine learning techniques have transformed speech recognition

AL was designed to allow us to explore possibilities and then rapidly integrate the best techniques by using machine learning. With a standard framework for the development of all our language packs, we can easily compare what difference each new technique makes and establish which ones will keep us moving forward. This provides our partners and customers with highly accurate speech-to-text languages for use in their mission-critical applications.

We do not rely on linguistic expertise for every language. Instead, when we come across a linguistic problem, we devise novel machine learning approaches to make the solution as generic as possible, so that when we come across a similar problem in another language, we don’t need to solve it all over again. The Automatic Linguist learns from its previous builds, while the Speechmatics experts extract, streamline and improve on any new learnings to make future builds easier, faster and more accurate.

The Automatic Linguist harnesses data like never before

Data is important for the success of any machine learning project and the Automatic Linguist is no different. However, our training algorithms allow us to use less data and, thanks to our filtering methods, the Automatic Linguist can also work with noisy data.

Traditionally, more data makes for better quality language packs. Every speech sample, even from the same speaker, is slightly different, and the more variations the system has been trained to expect, the better. However, there are diminishing returns and eventually, a plateau is reached. High-quality data can help overcome the plateau effect – but high-quality data is not always available. This is where our automated methods to filter and clean data come into their own. They improve results and reduce build times by allowing us to train on less data – as well as reducing development time when it comes to improving and adapting the speech-to-text languages we offer.
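To make the idea of automated filtering concrete, here is a minimal sketch of how noisy training utterances might be screened before a build. It is an illustration only, not a description of how the Automatic Linguist works: the `Utterance` fields, the `confidence` score and every threshold below are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One training sample. All fields are hypothetical, for illustration."""
    audio_id: str
    transcript: str
    duration_s: float   # length of the audio clip in seconds
    confidence: float   # assumed quality score in [0, 1], e.g. from an alignment step


def is_clean(utt, min_confidence=0.8, min_duration_s=0.5, max_chars_per_s=30.0):
    """Return True if the utterance passes some simple, assumed quality checks."""
    text = utt.transcript.strip()
    if not text:
        return False                      # no transcript to train on
    if utt.duration_s < min_duration_s:
        return False                      # clip too short to be useful
    if utt.confidence < min_confidence:
        return False                      # audio and transcript probably disagree
    if len(text) / utt.duration_s > max_chars_per_s:
        return False                      # implausibly fast speech: likely a bad transcript
    return True


def filter_utterances(utterances, **thresholds):
    """Keep only the utterances that pass is_clean with the given thresholds."""
    return [u for u in utterances if is_clean(u, **thresholds)]


samples = [
    Utterance("a", "hello world", 1.2, 0.95),
    Utterance("b", "", 2.0, 0.99),
    Utterance("c", "mumbled speech", 1.5, 0.35),
    Utterance("d", "x" * 200, 1.0, 0.90),
    Utterance("e", "hi", 0.2, 0.90),
]

kept = filter_utterances(samples)
print([u.audio_id for u in kept])
```

Rules like these replace the manual listening and cleaning described earlier with checks that run the same way for every language, which is the property that matters for a generic framework. In practice the thresholds would need tuning for each data source.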

World-class speech-to-text languages – around the globe and into the future

The speech recognition market is evolving rapidly, and users have increasing expectations when it comes to quality and speed. By adopting a flexible framework and combining it with our experience and expertise in machine learning, we can rapidly assimilate the best of the new advances in our field to provide high-quality speech-to-text languages for our customers and partners, no matter where in the world they operate. The Automatic Linguist is designed to ensure we stay ahead of the game – in anyone's language.

Nov 18, 2020