Research & Development

AI innovation in speech technology and beyond

At Speechmatics, our relentless pursuit of innovation ensures unparalleled accuracy and broad language inclusivity, redefining industry standards.

Expanding accuracy through innovation

Enabling autonomous learning to enhance our understanding

At Speechmatics, self-supervised learning (SSL) serves as a transformative approach in training speech models, harnessing unlabeled data to enhance our speech recognition systems.

This technique allows us to autonomously identify patterns in vast amounts of data, significantly expanding the diversity of speech variations our models can learn from and improving accuracy across multiple languages.

Tirelessly pushing speech technology forward...

Throughout the years, Speechmatics has remained at the forefront of speech recognition research and innovation. We are consistently pushing the boundaries of what's possible in speech-to-text technology.

2024

Improving accessibility by identifying and labelling non-speech sounds in media, helping users understand more than speech. This innovation enhances the context and usability of audio data, allowing businesses to gain deeper insights and automate responses to specific audio triggers.

Our published research

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin. March 8, 2024.

While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior--for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features.

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez. February 9, 2024.

Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models?

Hierarchical Quantized Autoencoders

Will Williams, Sam Ringer, Tom Ash, John Hughes, David MacLeod, Jamie Dougherty. February 19, 2020.

This Speechmatics paper was submitted and accepted to NeurIPS, the most prestigious ML conference. It describes a lossy image compression algorithm based on discrete representation learning, yielding a system that can reconstruct images of high perceptual quality and retain semantically meaningful features despite very high compression rates.

Texture Bias Of CNNs Limits Few-Shot Classification Performance

Sam Ringer, Will Williams, Tom Ash, Remi Francis, David MacLeod. October 18, 2019.

Speechmatics published this paper at NeurIPS 2019 and presented it in the meta-learning workshop.

Discriminative training of RNNLMs with the average word error criterion

Remi Francis, Tom Ash, Will Williams. November 8, 2020.

In this paper, Speechmatics demonstrates how recurrent neural network language models can be improved by optimizing directly for downstream speech recognition accuracy, rather than using the usual generative approach, which models the probability of the next word in a sequence.

The Speechmatics Parallel Corpus Filtering System for WMT18

Tom Ash, Remi Francis, Will Williams. Workshop on Statistical Machine Translation (WMT), October 31 – November 1, 2018.

Speechmatics published this paper at the Workshop on Statistical Machine Translation (WMT) 2018 and presented a translation proof of concept.

A Framework for Speech Recognition Benchmarking

Franck Dernoncourt, Trung Bui, Walter Chang. Adobe Research. Interspeech 2018

At Interspeech 2018 in Hyderabad, Speechmatics was referred to as one of the most accurate providers of ASR following evaluations such as the one conducted by Adobe Research. We demonstrated that our continued focus on innovation and on driving new R&D maintains our position in a growing and increasingly challenging field.

Scaling Recurrent Neural Network Language Models

W. Williams, N. Prasad, D. Mrva, T. Ash, A.J. Robinson. ICASSP 2015. February 2, 2015.

This is the first paper to show that recurrent net language models scale to give very significant gains in speech recognition. It describes the most powerful models to date and some of the special methods needed to train them.

One billion word benchmark for measuring progress in statistical language modeling

C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, A.J. Robinson. Interspeech 2014. December 10, 2013.

This paper, written with Google, presents a standard large benchmark so that progress in language modeling may be measured. Prior to this paper, there was no open, freely available corpus that was large enough to be representative for modern language modeling tasks.

Connectionist Speech Recognition of Broadcast News

A. J. Robinson, G. D. Cook, D. P. W. Ellis, E. Fosler-Lussier, S. J. Renals, and D. A. G. Williams. Speech Communication, 37(1), 2002.

This paper provides an overview of the state-of-the-art methods, as of 2002, for performing speech recognition using neural networks.

Recognition, indexing and retrieval of British broadcast news with the THISL system

A.J. Robinson, D. Abberley, D. Kirby, and S. Renals. Proceedings of the European Conference on Speech Technology. volume 3, pages 1267–1270, September 1999.

Here we show that speech recognition can be used to find information in audio in much the same way that web pages can be found with a search engine.

Time-First Search for Large Vocabulary Speech Recognition

A.J. Robinson and J. Christie. ICASSP, pages 829–832, 1998.

Here we fundamentally change the main mechanism in speech recognition to make it both faster and more memory efficient (also US patent 5983180).

Forward-Backward Retraining of Recurrent Neural Networks

A. Senior and A.J. Robinson. Advances in Neural Information Processing Systems 8, 1996.

This is the first paper to present “end-to-end” training for tasks such as speech recognition.

The Use of Recurrent Networks in Continuous Speech Recognition

A.J. Robinson. Automatic Speech and Speaker Recognition: Advanced Topics, chapter 10.

Recurrent nets applied to large vocabulary speech recognition for the first time.

The Application of Recurrent Nets to Phone Probability Estimation

A.J. Robinson. IEEE Transactions on Neural Networks, 5(2), March 1994.

Recurrent nets are demonstrated to give the best performing system on a well-established phoneme recognition task.

A Recurrent Error Propagation Network Speech Recognition System

A.J. Robinson and F. Fallside. Computer Speech and Language, 5(3):259–274, July 1991.

The first application of recurrent nets to speech recognition.

Dynamic Error Propagation Networks

A. J. Robinson. PhD thesis, Cambridge University Engineering Department, February 1989.

This PhD thesis introduces several key concepts of recurrent networks, several different novel architectures, the algorithms needed to train them and applications to speech recognition, coding, and reinforcement learning/game playing.

Technical spotlight

Technical

Sparse All-Reduce in PyTorch

The All-Reduce collective is ubiquitous in distributed training, but is currently not supported for sparse CUDA tensors in PyTorch.

In the first part of this blog we contrast the existing alternatives available in the Gloo/NCCL backends.

David MacLeod, Machine Learning Architect

Speech Intelligence

The.Shed: Speechmatics Capabilities in real-time

Imagine being able to understand and interpret spoken language not only retrospectively, but as it happens. This isn't just a pipe dream — it's a reality we're crafting at Speechmatics.

Our mission is to deliver Speech Intelligence for the AI era, leveraging foundational speech technology and cutting-edge AI.

Aaron Ng, Machine Learning Engineer

Technical

An Almost Pointless Exercise in GPU Optimization

Not everyone is able to write funky fused operators to make ML models run faster on GPUs using clever quantization tricks. However, lots of developers work with algorithms that feel like they should be able to leverage the thousands of cores in a GPU to run faster than using the dozens of cores on a server CPU.

To see what is possible and what is involved, I revisited the first problem I ever considered trying to accelerate with a GPU.

Andrew Innes, Chief Architect

Be part of our mission

We are actively seeking talented individuals to join our collective team of ambitious problem solvers and thought leaders, paving the way for inclusion in speech recognition technology.