Kane Simms’ podcast, VUX World, is renowned for its captivating discussions with industry leaders in Conversational AI and Customer Experience.
Voice AI is transforming interactions across industries, enabling more natural, efficient, and intelligent communication. To explore where it is heading, Kane sat down with Ricardo Herreros-Symons, VP of Corporate Development at Speechmatics. Conversational interfaces and virtual assistants, such as Google Assistant, are already changing customer support: they handle customer calls, answer questions, and manage queries in sectors like healthcare, retail, and financial services.
The two discussed voice-driven tech, conversational agents, live captioning, and the future of AI, as well as how accurate speech recognition is transforming business operations and customer experiences.
Here is a snippet of their conversation:
Kane: How do you approach solving the generalized speech recognition problem and what is the importance of incredibly accurate speech technology?
Ricardo: Accurate speech recognition is crucial because it ensures that we can understand every voice, regardless of accent or background noise, which significantly improves user interactions and accessibility. Natural-sounding speech and the ability to understand regional dialects are also essential if AI voice agents are to answer FAQs and customer questions effectively and raise customer satisfaction.
By improving the quality and efficiency of speech-to-text conversion, we can open up new use cases and make interactions with technology more seamless and natural. Combined with advanced natural language understanding, accurate transcription lets agents answer questions with responses that are both accurate and immediate, further improving the user experience. AI voice agents can handle inbound phone calls, respond to claim-status and order-status inquiries, and schedule appointments, boosting efficiency and improving patient care and satisfaction in healthcare and other industries.
The demo featured in the podcast showcased how our AI solutions help manage operations and streamline business processes, making customer interactions more efficient. Many voice agents still face challenges with regional dialects, but advances in natural language processing and other technologies are helping overcome these pain points.
When it comes to data management and privacy, the conversation also highlighted the importance of protecting voice data and sensitive information in AI voice systems to ensure compliance and maintain user trust. Data collection and customer feedback are essential for refining AI voice solutions and addressing customer pain points.
Two standout moments from the podcast include:
How Speechmatics can understand even a lightning-fast rap about legal drugs in the UK (5:00-5:30 mins into the video) 💊
An incredibly impressive Rafael Nadal impression following his early departure from the French Open (6:30-7:30mins into the video) 🎾
We've also broken down the pod into digestible chapters using our own tech, so you can find the specific sections that you're interested in...
(00:00:00) Unparsed Conference Introduction: The host greets the audience and introduces the upcoming Unparsed conference, which is three weeks away. The event is expected to be larger than the previous year, with a new track for developers and a focus on conversational and generative AI. The host encourages attendance, mentions available promo codes, and describes the event's aim of uniting AI communities to share best practices and insights.
(00:02:32) Demonstrating Speechmatics' Technology: Ricardo Herreros-Symons from Speechmatics is introduced and demonstrates the company's speech recognition technology. The demonstration shows the system transcribing speech accurately in real time, handling complex vocabulary, and understanding different accents and languages. Ricardo emphasizes Speechmatics' focus on accuracy and low latency.
(00:03:11) The Role of Speech Recognition in AI: The host asks about Speechmatics and its role in AI. Ricardo explains the company's focus on turning audio into text and how high accuracy and broad vocabulary set it apart in the market. He discusses the company's approach to building efficient models that understand diverse voices and accents, and why speech recognition is essential for enabling a wide range of AI applications.
(00:08:45) Challenges and Future of Speech Recognition: The conversation explores future challenges in speech recognition, such as end-of-speech detection, and the potential of audio language modeling. The host and Ricardo discuss creating natural, seamless interactions with AI agents, the need for efficient models that can run on devices, and the impact of speech recognition on emerging use cases like smart glasses and robotics.
(00:52:06) Data Management and Privacy in AI: The host raises the critical issue of data management and privacy in AI, particularly in speech recognition. Ricardo describes how Speechmatics sources training data, keeps customer data private, and prefers on-device processing to mitigate privacy concerns. He highlights the importance of building efficient models that respect user privacy and reduce environmental impact.
Voice technology and artificial intelligence are enabling new use cases, improving operational efficiency, and enhancing customer interaction across sectors.
We recently rejoined VUX World, this time with Paolina White, Senior Director at Speechmatics, and Martin Taylor, Co-founder of Content Guru.
The discussion? Straight talk on where Voice AI is really headed in customer experience.
You can watch the full conversation here:
You can also find our top five takeaways in the blog post “5 lessons on the future of voice in CX”, with highlights below:
1. Metadata is being ignored. Taylor makes it clear: businesses are sitting on gold. Every call contains valuable context—sentiment, pace, urgency—but most of it never leaves the conversation.
2. Real-time matters more than ever. “If you're only analyzing after the fact, you’ve missed your moment,” says White. Voice AI is becoming the lens through which teams act—not just reflect.
3. Customers want to be understood. Not just heard. Not just recorded. Voice AI should fuel smoother routing, stronger personalization, and sharper summaries across every channel.
4. Regulation is forcing the issue. Compliance is no longer a nice-to-have. Accurate, searchable transcripts—paired with diarization and redaction—are becoming standard, not premium.
5. Accuracy isn't a feature. It’s the foundation. “Get transcription wrong, and nothing else works,” says White. Speechmatics is solving for real-world noise, cross-talk, and fast-paced dialogue—where most models stumble.
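Transcription accuracy is commonly measured as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. The sketch below is a minimal, illustrative implementation using word-level edit distance; the function name `word_error_rate` is hypothetical, and it is not how Speechmatics scores its models. Real evaluations also normalise punctuation, numbers, and casing before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the dynamic-programming table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "please reschedule my appointment for friday"
    hyp = "please schedule my appointment friday"
    # One substitution (reschedule -> schedule) and one deletion (for): 2 / 6
    print(round(word_error_rate(ref, hyp), 3))
```

A noisy line or cross-talk that turns "reschedule" into "schedule" and drops a word already costs a third of this short utterance, which is why small accuracy differences matter so much downstream.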
If you’re looking to learn more about Voice AI, check out our Ultimate Guide to Voice AI and learn more about foundational speech technology for the AI era.
1) What is Voice AI? Voice AI combines speech‑to‑text (STT), natural language understanding, and voice generation (TTS) to listen, comprehend, and respond in real time. It’s the technology behind hands‑free interactions from voice assistants to advanced enterprise voice agents, powering conversational interfaces and virtual assistants used in customer support and other applications.
2) How does Voice AI work? Voice AI captures spoken words, converts them to text, interprets intent, and then crafts a spoken response—often in milliseconds—to keep conversations fluid and natural. Voice AI can also be used to answer FAQs and customer questions automatically, improving efficiency across industries.
3) What distinguishes a voice assistant from a voice agent? A voice assistant handles simple, one‑off commands like “Play music.” A voice agent carries out multi‑step, context‑rich tasks—such as “Reschedule my appointment and notify attendees”—maintaining conversational awareness and memory. A virtual assistant, meanwhile, automates tasks and supports users across various industries.
4) What is AI‑driven voice generation? AI voice generation creates human-sounding synthetic speech. It’s used for narrating audiobooks, dubbing videos, or providing voices for digital characters—either generic or custom-trained.
5) Which Voice AI tools work best? It depends on the goal. Speechmatics excels at real-world transcription accuracy. Other platforms like ElevenLabs lead in natural-sounding voice generation, while OpenAI and Anthropic stand out in conversational reasoning. The best solutions pair specialists for comprehension, transcription, and natural dialogue.
6) How do enterprises deploy domain‑specific Voice AI? Start with a clear use case—like captioning, transcription, or meeting summaries. Evaluate providers based on accuracy, latency, language support, robustness in noisy environments, privacy controls, integration ease, and scalability. Multilingual support is also crucial for serving diverse customer bases and ensuring accessibility across languages. Success depends on choosing the right use case and ensuring robust data collection for continuous improvement.
7) How is data privacy handled in Voice AI systems? Privacy-conscious providers support GDPR, HIPAA, and other standards. Look for features such as data minimisation, encryption, configurable retention, real‑time PII redaction, audit logs, and on‑premise or hybrid deployment. Secure data collection and management are essential for maintaining privacy and compliance.
8) Why choose Speechmatics for Voice AI? Speechmatics offers high‑accuracy transcription across diverse accents and noisy settings, supports multi‑speaker diarization, delivers flexible deployment (cloud, on‑premise or on-device), and integrates with APIs and services like LiveKit and Pipecat.
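The four stages described in the FAQ above (capture speech, convert it to text, interpret intent, respond with generated speech) can be sketched as a single conversational turn. This is a minimal illustration under stated assumptions: `run_turn`, `fake_stt`, `keyword_nlu`, `REPLIES`, and `fake_tts` are hypothetical stand-ins, not a Speechmatics or any other vendor API. In a real system each stage would call a speech-to-text service, a language model, and a TTS engine.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    transcript: str
    intent: str
    reply_text: str
    reply_audio: bytes
    latency_ms: float


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    nlu: Callable[[str], str],
    respond: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Turn:
    """Run one listen -> understand -> respond cycle and time it end to end."""
    start = time.perf_counter()
    transcript = stt(audio)        # 1. speech-to-text
    intent = nlu(transcript)       # 2. interpret intent
    reply_text = respond(intent)   # 3. decide what to say
    reply_audio = tts(reply_text)  # 4. voice generation
    latency_ms = (time.perf_counter() - start) * 1000
    return Turn(transcript, intent, reply_text, reply_audio, latency_ms)


# Hypothetical stand-ins so the sketch runs without external services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already text


def keyword_nlu(text: str) -> str:
    lowered = text.lower()
    if "reschedule" in lowered:
        return "reschedule_appointment"
    if "order" in lowered:
        return "order_status"
    return "unknown"


REPLIES = {
    "reschedule_appointment": "Sure, which day works better for you?",
    "order_status": "Let me check your order for you.",
    "unknown": "Sorry, could you rephrase that?",
}


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are synthesized audio


if __name__ == "__main__":
    turn = run_turn(b"I need to reschedule my appointment",
                    fake_stt, keyword_nlu, REPLIES.__getitem__, fake_tts)
    print(turn.intent, "->", turn.reply_text)
```

Passing each stage in as a function mirrors the FAQ's point that the best solutions pair specialists: the transcription, reasoning, and voice components can be swapped independently, while the end-to-end latency is measured across the whole turn.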
One area where everything we’ve discussed comes together, from real-time accuracy to customer satisfaction, is the growing use of Voice AI agents.
Voice AI agents are becoming essential in sectors where speed, volume, and consistency matter. From managing high call volumes to handling repetitive tasks, they help reduce operational costs while improving service quality.
Voice AI agents can handle inbound phone calls and manage customer queries efficiently, improving customer support.
These agents can route calls, answer common questions, and support live agents by taking care of the basics. In this way they supplement human agents rather than replace them, freeing human teams to focus on complex or sensitive issues.
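Routing calls and handing complex or sensitive issues to human agents can be sketched as a simple rule-based decision. This is a hypothetical illustration, not a description of any specific product: the intent labels, the `sensitive` flag, and the 0.8 confidence threshold are all assumptions that a real deployment would take from its own language-understanding model and business rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallTurn:
    intent: str              # intent label from the language-understanding step
    confidence: float        # model confidence in [0, 1]
    sensitive: bool = False  # e.g. complaint, bereavement, fraud report


# Hypothetical routine intents the voice agent may resolve on its own.
SELF_SERVICE_INTENTS = {"order_status", "opening_hours", "reschedule_appointment"}
CONFIDENCE_THRESHOLD = 0.8  # illustrative value; tune per deployment


def route(turn: CallTurn) -> str:
    """Return 'voice_agent' for routine, confidently understood requests, else 'human_agent'."""
    if turn.sensitive:
        return "human_agent"  # sensitive issues always go to a person
    if turn.intent in SELF_SERVICE_INTENTS and turn.confidence >= CONFIDENCE_THRESHOLD:
        return "voice_agent"
    return "human_agent"      # unfamiliar or low-confidence requests escalate


if __name__ == "__main__":
    print(route(CallTurn("order_status", 0.93)))     # voice_agent
    print(route(CallTurn("order_status", 0.55)))     # human_agent
    print(route(CallTurn("billing_dispute", 0.97)))  # human_agent
    print(route(CallTurn("reschedule_appointment", 0.9, sensitive=True)))  # human_agent
```

Escalating on low confidence ties routing quality directly to recognition accuracy: a misheard request is more likely to reach a person than to be handled wrongly by the agent.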
They integrate with existing systems and natural language processing tools, making it possible to enhance conversations, surface insights, and streamline processes at scale. Working alongside these technologies, they also enable efficient data collection and analysis of customer feedback.
In healthcare, they help with patient scheduling and follow-ups, which can improve patient care and satisfaction. In financial services, they can assist with claim-status or compliance queries. In e-commerce, they respond instantly to order-status questions or returns.
Many voice agents still struggle with regional dialects, although advances in speech recognition and natural language processing are helping overcome these barriers. And while accuracy is improving, success depends on understanding real-world speech. Dialects, background noise, and fast-paced exchanges remain challenges, and this is exactly where speech recognition quality becomes the differentiator.
As accuracy improves, so does the potential for Voice AI agents to deliver fast, reliable, and human-sounding experiences, without needing a human at the other end.