Expertise Built on Research

Find out how discovery is embedded in our business.

Our innovative thinking and unparalleled research keep us at the top of our game and our industry.

Experience has taught us that to really excel at something, you need world-class expertise at your fingertips.

That's why our core team of researchers is encouraged to iterate across AI, neural networks, machine learning, and language models and apply their learnings to our technology.

We're always looking for opportunities to prototype, expand our offering and provide industry-leading accuracy.


Speechmatics Research Papers


01.

Hierarchical Quantized Autoencoders

“Hierarchical Quantized Autoencoders”, Will Williams, Sam Ringer, Tom Ash, John Hughes, David MacLeod, Jamie Dougherty. February 19, 2020.

Speechmatics’ paper was submitted to and accepted at NeurIPS, one of the most prestigious ML conferences. The paper describes a lossy image compression algorithm based on discrete representation learning. The resulting system can reconstruct images of high perceptual quality and retain semantically meaningful features despite very high compression rates.

02.

Texture Bias Of CNNs Limits Few-Shot Classification Performance

“Texture Bias Of CNNs Limits Few-Shot Classification Performance”, Sam Ringer, Will Williams, Tom Ash, Remi Francis, David MacLeod. October 18, 2019.

Speechmatics presented this paper at the meta-learning workshop at NeurIPS 2019.

03.

Discriminative training of RNNLMs with the average word error criterion

“Discriminative training of RNNLMs with the average word error criterion”, Remi Francis, Tom Ash, Will Williams. November 8, 2020.

In this paper, Speechmatics demonstrates how recurrent neural network language models can be improved by optimizing directly for downstream speech recognition accuracy, rather than taking the usual generative approach of modeling the probability of the next word in a sequence.

04.

The Speechmatics Parallel Corpus Filtering System for WMT18

“The Speechmatics Parallel Corpus Filtering System for WMT18”, Tom Ash, Remi Francis, Will Williams. Conference on Machine Translation (WMT), October 31 – November 1, 2018.

Speechmatics published the paper at the Conference on Machine Translation (WMT) 2018 and presented a translation proof of concept.

05.

A Framework for Speech Recognition Benchmarking

“A Framework for Speech Recognition Benchmarking”, Franck Dernoncourt, Trung Bui, Walter Chang. Adobe Research. Interspeech 2018.

At Interspeech 2018 in Hyderabad, Speechmatics was referred to as one of the most accurate providers of ASR following evaluations such as the one carried out by Adobe Research. We demonstrated that our continued focus on innovation and on driving new R&D maintains our position in a growing and increasingly challenging field.

06.

Scaling Recurrent Neural Network Language Models

“Scaling Recurrent Neural Network Language Models”, W. Williams, N. Prasad, D. Mrva, T. Ash, A.J. Robinson. ICASSP 2015.

This is the first paper to show that recurrent net language models scale to give very significant gains in speech recognition. It describes the most powerful models to date and some of the special methods needed to train them.

07.

One billion word benchmark for measuring progress in statistical language modeling

“One billion word benchmark for measuring progress in statistical language modeling”, C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, A.J. Robinson. INTERSPEECH 2014.

This paper, written with Google, presents a standard large benchmark against which progress in language modeling can be measured. Prior to this paper, there was no open, freely available corpus that was large enough to be representative for modern language modeling tasks.

08.

Connectionist Speech Recognition of Broadcast News

“Connectionist speech recognition of broadcast news”, A. J. Robinson, G. D. Cook, D. P. W. Ellis, E. Fosler-Lussier, S. J. Renals, and D. A. G. Williams. Speech Communication, 37(1), 2002.

This paper provides an overview of the state-of-the-art methods, as of 2002, for performing speech recognition using neural networks.

09.

Recognition, indexing and retrieval of British broadcast news with the THISL system

“Recognition, indexing and retrieval of British broadcast news with the THISL system”, A.J. Robinson, D. Abberley, D. Kirby, and S. Renals. Proceedings of the European Conference on Speech Technology, volume 3, pages 1267–1270, September 1999.

Here we show that speech recognition can be used to find information in audio in much the same way that web pages can be found with a search engine.

10.

Time-First Search for Large Vocabulary Speech Recognition

“Time-first search for large vocabulary speech recognition”, A.J. Robinson and J. Christie. ICASSP, pages 829–832, 1998.

Here we fundamentally change the main search mechanism in speech recognition, making it both faster and more memory efficient (also US patent 5983180).

11.

Forward-Backward Retraining of Recurrent Neural Networks

“Forward-backward retraining of recurrent neural networks”, A. Senior and A.J. Robinson. Advances in Neural Information Processing Systems 8, 1996.

This is the first paper on “end-to-end” training for tasks such as speech recognition.

12.

The Use of Recurrent Networks in Continuous Speech Recognition

“The use of recurrent networks in continuous speech recognition”, A.J. Robinson. Automatic Speech and Speaker Recognition: Advanced Topics, chapter 10.

This chapter applies recurrent nets to large vocabulary speech recognition for the first time.

13.

The Application of Recurrent Nets to Phone Probability Estimation

“The application of recurrent nets to phone probability estimation”, A.J. Robinson. IEEE Transactions on Neural Networks, 5(2), March 1994.

Recurrent nets are demonstrated to give the best performing system on a well-established phoneme recognition task.

14.

A Recurrent Error Propagation Network Speech Recognition System

“A recurrent error propagation network speech recognition system”, A.J. Robinson and F. Fallside. Computer Speech and Language, 5(3):259–274, July 1991.

This paper describes the first application of recurrent nets to speech recognition.

15.

Dynamic Error Propagation Networks

“Dynamic Error Propagation Networks”, A. J. Robinson. PhD thesis, Cambridge University Engineering Department, February 1989.

This PhD thesis introduces several key concepts of recurrent networks, several different novel architectures, the algorithms needed to train them and applications to speech recognition, coding, and reinforcement learning/game playing.