Feb 12, 2024 | Read time 4 min

Unaligned with Robert Scoble: Discussing the power of speech technology

CPO Trevor Back talks AI, speech recognition advancements, and making the jump to seamless Samantha with the legendary Robert Scoble.
Tom Young, Digital Specialist

"Integrating AI seamlessly into our lives rather than it taking over our lives."

Robert Scoble's podcast, Unaligned, is renowned for its captivating discussions with industry leaders in AI and innovation. He sat down with Speechmatics CPO Trevor Back to discuss how speech will play a core part in the future AGI stack and where voice technology is heading.

They talk about the complexities of cutting-edge speech recognition technology and envision its impact on our daily lives.

Here is a snippet of their conversation:

Robert: How can speech recognition technology seamlessly integrate into our lives without disrupting our experiences?

Trevor: We want to use speech recognition to push technology into the background, so that it's doing things for us without interrupting our experience of the world.

Voice technology can really push those interfaces into the background so that these systems can work without interrupting the lovely conversation that we're having.

Robert: How do you envision the future of AI enhancing everyday tasks, such as ordering food or interacting with our devices?

Trevor: The goal is for AI to work for us without us even noticing... Whether it's ordering food or interacting with devices, speech recognition can push technology into the background, enabling smoother, more natural interactions.

Watch the full podcast here.

Want to skip to the best bits?

We've broken down the pod into digestible chapters using our own tech, so you can find the specific sections that you're interested in.

(00:00:00) Introduction of Trevor Back and Speechmatics: Trevor introduces himself as the chief product officer at Speechmatics. He explains that Speechmatics is a speech technology company spun out of the University of Cambridge around 10 years ago under the leadership of founder Tony Robinson.

(00:00:26) Comparing Speechmatics to other voice assistants: Trevor contrasts Speechmatics with well-known voice assistants like Siri, Alexa and Google Assistant. He characterizes those as more brittle, limited AI systems compared with Speechmatics' more advanced speech recognition. He explains that Speechmatics aims to understand every voice and to handle a wider range of accents, languages and localizations than those systems currently can.

(00:00:59) Speechmatics' focus on accuracy: Trevor emphasizes Speechmatics' heritage in speech technology and its focus on accuracy. He explains that the company uses proprietary methods that require less data to achieve high accuracy across languages, accents and dialects. This lets it serve underrepresented groups and handle challenges like speech impediments better than large generic models.

(00:10:11) The future of AI agents: Trevor discusses the potential of AI agents that interact through speech interfaces. He talks about the challenges of making speech interactions as seamless and human-like as text chatbot interfaces, and envisions a future in which AI agents converse fluidly with humans to perform tasks by voice.

(00:22:59) Business model and customers: Trevor describes Speechmatics' business model of positioning itself as a premium, high-accuracy offering. He explains that the company targets customers who care about accuracy and about deriving value from transcripts, rather than those who just want low-cost transcription. As an example of a use case that needs high accuracy, he cites passing transcripts to language models.

(00:25:38) Deployment options: Trevor discusses the platforms and deployment options Speechmatics supports. These include cloud offerings as well as on-premises deployment for customers who want more control, privacy or security. He also highlights the company's ability to offer models optimized to run on smaller local devices.

(00:27:19) Privacy and local models: Trevor talks about privacy considerations and the option of running models locally rather than in the cloud. He gives the example of an always-listening AI that stores everything on the device instead of sending data to the cloud, and indicates that Speechmatics is moving towards supporting local deployment on iOS.

(00:28:53) The exponential pace of progress in AI: Trevor reflects on the rapid, exponential rate of recent progress in AI. He notes that humans struggle to grasp the implications of exponential growth, and expects exponential improvements in areas like model size and capabilities to keep producing surprises.

(00:32:25) Opportunities for focused AI companies: Trevor discusses opportunities for companies focused on specific AI capabilities, even as large tech companies race to build huge general-purpose models. He argues that areas like speech recognition still hold challenges that specialized companies can target and build expertise in.

(00:41:35) Conclusion and final thoughts: Trevor shares some final thoughts and a call to action, encouraging listeners to learn more about Speechmatics by visiting the company website, trying the demo and giving feedback.

Astonishingly accurate ASR is here, in real time.

What are you waiting for?
