Oct 12, 2021 | Read time 6 min

A Career in Machine Learning: Q&A with Will Williams

Read the blog to learn from VP of Machine Learning Will Williams about the ins and outs of how machine learning is driving innovation at Speechmatics.
Speechmatics Editorial Team

If you want to build the world’s most powerful and accurate speech-to-text technology, there’s one thing you really need – and that’s the right team in place. So what does that team look like, what sort of projects are they working on, and what traits do they need to succeed?

To find out, we sat down with Will Williams, our VP of Machine Learning. From the ins and outs of how machine learning is driving innovation at Speechmatics, to the key trait of a machine learning expert, here’s what he told us.

Will, how long have you been at Speechmatics and what does your work entail?

I’ve been here for over eight years now – in fact, as an intern, I was one of the company’s first employees! During that time I’ve worked on pretty much every part of our tech stack, so I’ve had great exposure to all aspects of our product.

Right now, I’m heading up our machine learning efforts – whether that’s building our capabilities to ensure we can support more speech recognition languages, working on models to improve our accuracy, or helping us become more deployable across a whole range of different devices.

How did you get into machine learning as a field, and speech-to-text more specifically?

I encountered this burgeoning field when I was quite young and just found it super exciting. I started teaching myself, following courses online. I actually moved to Cambridge to do a Master's degree in machine learning.

The name Tony Robinson (our founder) cropped up in this lecture series I was watching by Geoffrey Hinton, one of the godfathers of machine learning. He was talking about Tony’s heroic efforts in the eighties to make speech recognition work, which really piqued my interest.

So you can imagine my surprise when I put out a message on Google+ asking if anyone had any summer internships going, and Tony Robinson himself replied. I jumped at the chance to work with him, started learning on the job, and ended up forgetting about the Master’s altogether. I’ve been here ever since – and it’s been a brilliant place to learn and develop.

Speechmatics is heavily driven by its research and development arm. Could you talk about some of your team’s past focus areas? What sort of projects have you worked on?

The ultimate goal with speech recognition is to make it as accurate as possible. In other words: how do you get it to a point where you can rely on it in any scenario (and then add in those valuable extras, like metadata and sentiment analysis)? It’s an elusive problem, but one we enjoy trying to tackle.

A lot of my initial research was around the ‘language model’, which essentially tells you the probability of a given sentence. That’s really useful because it allows you to differentiate between close calls or constructions that sound similar. So if I say 'recognize speech', am I talking about 'recognizing speech', or am I talking about 'wrecking a nice beach'? That sort of thing.
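The 'recognize speech' vs. 'wrecking a nice beach' distinction can be illustrated with a toy bigram language model. This is a minimal sketch, not Speechmatics code, and the probability values below are invented purely for illustration:

```python
import math

# Toy bigram language model: P(sentence) is approximated as the product of
# P(word | previous word). All probabilities here are made up for the example.
bigram_prob = {
    ("<s>", "recognize"): 0.02, ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.001, ("wreck", "a"): 0.20,
    ("a", "nice"): 0.05, ("nice", "beach"): 0.01,
}

def sentence_log_prob(words):
    """Sum log-probabilities of each bigram; unseen bigrams get a small floor."""
    logp = 0.0
    prev = "<s>"
    for w in words:
        logp += math.log(bigram_prob.get((prev, w), 1e-6))
        prev = w
    return logp

a = sentence_log_prob(["recognize", "speech"])
b = sentence_log_prob(["wreck", "a", "nice", "beach"])
assert a > b  # the language model prefers "recognize speech"
```

Real systems use far richer models than bigrams, but the principle is the same: score candidate word sequences and prefer the more probable one.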

What about your current and future goals? How are you and the team using machine learning to drive advancements in Speechmatics’ tech?

Another big focus recently has been the acoustic model, which takes a snippet of audio, and tells you: in that snippet, what was the probability I said ‘buh’ or ‘phuh’ or ‘guh’? It goes across all the phonemes, and you have a probability distribution for each of these slices, which is obviously very useful information for a speech recognition system to have.
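The per-slice probability distribution described above can be sketched in a few lines. The random "scores" stand in for the output of a real neural network, and the phoneme labels are illustrative, not an actual phone set:

```python
import numpy as np

# Sketch: an acoustic model maps each short audio slice (frame) to a
# probability distribution over phonemes. Fake logits stand in for the
# output of a trained network.
phonemes = ["b", "p", "g", "ah", "s"]
rng = np.random.default_rng(0)
frame_scores = rng.normal(size=(10, len(phonemes)))  # 10 frames of fake logits

def softmax(x, axis=-1):
    """Numerically stable softmax: turn scores into probabilities."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

frame_probs = softmax(frame_scores)               # shape (10, 5)
assert np.allclose(frame_probs.sum(axis=1), 1.0)  # one distribution per frame
best = phonemes[int(frame_probs[0].argmax())]     # most likely phoneme, frame 0
```

The decoder then combines these per-frame distributions with the language model to pick the most plausible word sequence.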

We’ve been focusing on that for the last couple of years: building out giant neural networks that learn on unlabeled data, to build good representations that make the acoustic model work well. It’s called representation learning, and the idea is: let's take a slice of audio and produce some representations inside a neural network that make all kinds of downstream tasks easy.

Transcription is the main application, but the same representations would also tell you which language the speaker is talking in, whether they were angry, and so on. We find these networks train really well: the representations get increasingly rich, and we effectively have to do nothing in terms of downstream training to build our actual speech recognition system on top of them.
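The "rich representations, cheap downstream tasks" idea can be sketched as a frozen encoder with a tiny task head on top. Everything here is a stand-in under stated assumptions (random weights, a hypothetical three-way language-ID task), not Speechmatics' actual architecture:

```python
import numpy as np

# Sketch of representation learning: a (pretend) pretrained encoder turns raw
# audio frames into rich vectors, and a downstream task needs only a small
# model on top of the frozen representations.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(160, 64))   # frozen "encoder" weights (illustrative)

def encode(audio_frames):
    """Map raw frames of shape (n, 160 samples) to representations (n, 64)."""
    return np.tanh(audio_frames @ W_enc)

# A downstream task (e.g. language ID) becomes a small linear classifier on
# the frozen representations -- shown here at inference time only.
W_task = rng.normal(size=(64, 3))    # 3 hypothetical language classes

def language_id(audio_frames):
    reps = encode(audio_frames).mean(axis=0)   # pool representations over time
    return int((reps @ W_task).argmax())

pred = language_id(rng.normal(size=(50, 160)))
assert pred in (0, 1, 2)
```

The design point is that the expensive learning happens once, in the encoder; each new task then only needs a lightweight head.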

Is there such a thing as a ‘standard day’ in your role, when projects and research are always evolving?

Maybe not a standard day, but there are definitely some commonalities. I try to read maybe two papers every day, and I also like to attend stand-ups with different teams to get a sense of what’s going on. Often, I spend some time writing architecture docs for an upcoming research project, or even working with the marketing and sales teams and giving interviews.

Right now, I’m spending a lot of time hiring. We need great people to keep leading in our field, so finding those people is a big priority.

What sort of person do you look for during the hiring process – are there any particular traits that you think make someone well suited for this line of work?

In my experience, a lot of people at Speechmatics are exceptional superstars: they’re smart, they’re focused, and they have a really good sense for where their research should go next. They can go heads-down and work alone when needed, but they’re also collaborative, with a clear ability to pull together to solve really complex problems.

One of the biggest things I look for in a hire is a desire to learn; someone who wants to take control of their learning. I’m also always looking out for what I can only describe as grit – the tenacity to chase down a problem until it’s fixed.

That’s really important, in this work, because the thing that makes machine learning so difficult is that when problems occur, it’s not just like fixing a bug in code. You could have a mathematical problem in your system that’s incredibly difficult to track down, so you have to be tenacious.

Final question – what are your hopes for the future of machine learning?

I think a lot about the nature of machine intelligence. And what it really boils down to – without getting into the world of business aphorisms – is being able to do more with less.

Think of it like this: I could prime somebody walking into an advanced physics lecture with all the questions the lecturer might ask. So they might watch the lecture and look like they have all the answers, and it would probably seem very impressive. But that’s not real intelligence.

Now, if you had a 12-year-old in the room who’d never studied physics before, but who could take the minimal input of that lecture and somehow grasp all of the concepts the lecturer was covering... that would be true intelligence. And that’s what we’re striving to achieve at Speechmatics. A situation where our models can encounter something they’ve never seen before, and still do something really smart with it.

It’s a pretty lofty ambition – but the team at Speechmatics has never been stronger, so it’s a challenge I look forward to tackling every single day.
