Aug 22, 2025 | Read time 3 min

5 lessons on the future of voice in CX - according to the experts shaping it

Leading CX experts share lessons on the future of voice, from real-time transcription to compliance, trust and smarter conversations.
Maria Anastasiou, Events & Customer Marketing Lead

Why Voice Matters in Customer Experience

As automation accelerates and customer expectations climb, voice remains the most powerful yet underutilized channel in the contact center.

This was the central thread of a recent VUX World podcast hosted by Kane Simms, featuring guests Martin Taylor and Paolina White.

Together, they unpacked where voice fits in a world shaped by chatbots, LLMs, and real-time analytics — and what’s really at stake for CX leaders.

Lesson 1: Every Conversation Holds Untapped Metadata

“There’s a huge amount of metadata in a conversation, but most companies aren’t surfacing it.” – Martin Taylor

Every customer interaction contains layers of context: emotion, urgency, and sentiment shifts that go far beyond the words exchanged. Calls are recorded, but rarely analyzed in ways that can be searched, segmented, or acted upon.
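To make the idea concrete, here is a toy sketch of the kind of metadata a platform might surface from a call transcript. Real systems use trained models rather than keyword lists; the trigger words, function name, and sample call below are all illustrative, not any vendor's actual pipeline.

```python
# Toy illustration: surfacing simple metadata from a call transcript.
# Production analytics use ML models; this keyword approach is only a sketch.

URGENT_WORDS = {"urgent", "immediately", "now", "asap"}
NEGATIVE_WORDS = {"frustrated", "angry", "cancel", "disappointed"}

def surface_metadata(turns):
    """turns: list of (speaker, utterance) tuples from a diarized transcript."""
    words = [w.strip(".,!?'").lower() for _, u in turns for w in u.split()]
    talk_time = {}
    for speaker, utterance in turns:
        talk_time[speaker] = talk_time.get(speaker, 0) + len(utterance.split())
    return {
        "urgency": any(w in URGENT_WORDS for w in words),
        "negative_sentiment": sum(w in NEGATIVE_WORDS for w in words),
        "talk_time_words": talk_time,
    }

call = [
    ("agent", "Thanks for calling, how can I help?"),
    ("customer", "I need this fixed immediately, I'm really frustrated."),
]
print(surface_metadata(call))
```

Even this crude pass turns a recording into something searchable and segmentable, which is the point of the lesson: the signal is already in the conversation.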

Lesson 2: Real-Time Beats Retrospective

“If you’re only analyzing after the fact, you’ve already lost the moment.” – Paolina White

Recording isn’t enough. Real-time transcription transforms voice from archive to asset. It enables live agent support, flags issues before they escalate, and allows supervisors to intervene before a customer walks away.
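The difference can be sketched in a few lines: instead of batch-analyzing a finished recording, a live monitor reacts to each partial transcript as it arrives. The simulated stream and trigger words below are hypothetical stand-ins for output from a streaming speech-to-text API.

```python
# Sketch of the real-time idea: act on partial transcripts as they arrive
# rather than after the call ends. The stream here is simulated; a real
# system would consume partials from a streaming STT connection.

ESCALATION_TRIGGERS = {"cancel", "complaint", "supervisor"}

def monitor(partial_transcripts):
    """Yield an alert the moment a trigger word appears in a partial."""
    for partial in partial_transcripts:
        hits = ESCALATION_TRIGGERS & set(partial.lower().split())
        if hits:
            yield f"ALERT: customer mentioned {sorted(hits)[0]!r}"

stream = [
    "hi i'm calling about",
    "hi i'm calling about my bill",
    "i want to speak to a supervisor",
]
for alert in monitor(stream):
    print(alert)
```

The alert fires while the customer is still on the line, which is what makes supervisor intervention possible at all.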

Lesson 3: Customers Just Want to Be Understood

“What [customers] care about is being understood.” – Paolina White

Customers don’t care about channels; they care about clarity. Voice, text, and intent should be treated as one continuous conversation, which makes intelligent routing, smarter summarization, and cross-channel continuity possible.

Lesson 4: Regulation Is Driving Voice Forward

“The demands of regulation are finally making companies look at what’s inside their calls.” – Martin Taylor

In regulated industries, transcription is no longer back-office admin. It’s a frontline requirement for compliance, proof of statements, and auditability. That’s pushing demand for diarization, redaction, and summarization.
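As a rough illustration of redaction, here is a minimal pass that masks card-like numbers and email addresses in transcript text. Production redaction relies on trained entity-recognition models rather than regexes; the patterns and sample line below are purely illustrative.

```python
import re

# Toy redaction pass over transcript text. Masks 13-16 digit card-like
# numbers and email addresses. A real compliance pipeline would use
# entity-recognition models, not hand-written patterns.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def redact(text: str) -> str:
    text = CARD_RE.sub("[CARD REDACTED]", text)
    return EMAIL_RE.sub("[EMAIL REDACTED]", text)

line = "My card is 4111 1111 1111 1111 and my email is jo@example.com"
print(redact(line))
```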

Lesson 5: Accurate Transcription Is the Foundation

“You have to get transcription right before you can do anything else.” – Paolina White

Overlapping speech, strong accents, and noisy environments aren’t edge cases; they’re the everyday reality. Accuracy under real-world conditions is the key that unlocks everything else: better coaching, smarter automation, and useful AI.
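Transcription accuracy is usually scored as word error rate (WER): substitutions, deletions, and insertions divided by the length of the reference transcript. A minimal implementation using word-level Levenshtein alignment:

```python
# Word error rate (WER), the standard transcription-accuracy metric:
# (substitutions + deletions + insertions) / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

ref = "please cancel my order number nine"
hyp = "please cancel my order nine"
print(round(wer(ref, hyp), 3))  # one deletion over six reference words
```

A single dropped word in a six-word utterance already costs over 16% WER, which is why robustness to accents, noise, and crosstalk matters so much downstream.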

The Bigger Picture

From the risks of poor transcription to the rise of language as infrastructure, this conversation mapped out what the future of voice in the enterprise really looks like.

👉 Watch the full podcast at the top of this page to dive deeper.
