Aug 22, 2025 | Read time 3 min

5 lessons on the future of voice in CX - according to the experts shaping it

Leading CX experts share lessons on the future of voice, from real-time transcription to compliance, trust and smarter conversations.
Maria Anastasiou, Events & Customer Marketing Lead

Why Voice Matters in Customer Experience

As automation accelerates and customer expectations climb, voice remains the most powerful yet underutilized channel in the contact center.

This was the central thread of a recent VUX World podcast hosted by Kane Simms, with guests including Martin Taylor and Paolina White.

Together, they unpacked where voice fits in a world shaped by chatbots, LLMs, and real-time analytics — and what’s really at stake for CX leaders.

Lesson 1: Every Conversation Holds Untapped Metadata

“There’s a huge amount of metadata in a conversation, but most companies aren’t surfacing it.” – Martin Taylor

Every customer interaction contains layers of context, such as emotion, urgency, and sentiment shifts, that go far beyond the words exchanged. Calls are recorded, but they are rarely analyzed in ways that let them be searched, segmented, or acted upon.

Lesson 2: Real-Time Beats Retrospective

“If you’re only analyzing after the fact, you’ve already lost the moment.” – Paolina White

Recording isn’t enough. Real-time transcription transforms voice from archive to asset. It enables live agent support, flags issues before they escalate, and allows supervisors to intervene before a customer walks away.

Lesson 3: Customers Just Want to Be Understood

“What [customers] care about is being understood.” – Paolina White

Customers don’t care about channels; they care about clarity. Voice, text, and intent should be treated as one continuous conversation. This makes intelligent routing, smarter summarization, and cross-channel continuity possible.

Lesson 4: Regulation Is Driving Voice Forward

“The demands of regulation are finally making companies look at what’s inside their calls.” – Martin Taylor

In regulated industries, transcription is no longer back-office admin. It’s a frontline requirement for compliance, proof of statements, and auditability. That’s pushing demand for diarization, redaction, and summarization.

Lesson 5: Accurate Transcription Is the Foundation

“You have to get transcription right before you can do anything else.” – Paolina White

Overlapping speech, strong accents, and noisy environments aren’t edge cases — they’re everyday. Accuracy under real-world conditions is the key that unlocks everything else: better coaching, smarter automation, and useful AI.

The Bigger Picture

From the risks of poor transcription to the rise of language as infrastructure, this conversation mapped out what the future of voice in the enterprise really looks like.

👉 Watch the full podcast at the top of this page to dive deeper.
