May 1, 2025 | Read time 3 min

The future of conversations: Why voice AI will reshape digital advertising

Voice AI is redefining how brands connect with audiences. Discover why transcription accuracy and context will shape the future of digital advertising.
Tamara Nelson, Cofounder and CEO, Barometer

As voice interfaces shift from novelty to necessity, we’re entering a new era in how brands, platforms, and users communicate.  

In this guest post, Dr. Tamara Zubatiy Nelson, cofounder and CEO of Barometer and an expert in contextual AI, explores why conversations will define the next decade of digital interaction, and how transcription accuracy, contextual understanding, and responsible AI will underpin the future of advertising.

Over to Tamara.

The future of conversations 

Podcasting is just one type of conversation. That’s where I started — but when I think about how people interact with the internet today versus how they’ll interact ten years from now, everything changes. 

We’re heading away from search. Away from typing. Away from websites as the central place people go to find what they need. Instead, we’ll engage through voice-first interfaces — natural, fluent, and increasingly personalized. 

“Will people even be interacting on a typing interface, or will they just be speaking?” 

In this future, advertising won't just need to appear in the right place — it will need to appear in the right moment, tone, and context.  

That’s a fundamental shift in how we think about digital media. And it starts with understanding conversations properly.

Context is king in the age of voice AI 

One of the biggest misconceptions in digital advertising is that keywords are enough. But especially in audio, that approach has failed.

Podcasting, for instance, is one of the most decentralized — and often uncensored — media channels we have. It's powerful. It's messy. And it's full of nuance. 

“Even one word being wrong could lead to a miscategorization of the context.” 

That’s where transcription accuracy becomes mission critical. 

If you're trying to understand the suitability of a conversation — for a brand, for a platform, for compliance — that transcription has to be correct. You can't get the wrong "but" and flag something as adult content. You can’t misread “shot” and mistake a sports podcast for crime news. 

And it’s not just about the literal words. Tone matters. Sentiment matters. The relationship between words matters. If the transcription is wrong, the meaning is wrong. And if the meaning is wrong, your data, your targeting, your entire strategy could be off. 
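
To make the point concrete, here is a minimal Python sketch of how a keyword-only check misreads "shot", and how a single mistranscribed word can flip even a context-aware check. The keyword lists, function names, and sample sentences are all hypothetical, invented for illustration; this is not Barometer's method or any real classifier.

```python
import re

# Hypothetical keyword lists, invented for this illustration only.
CRIME_KEYWORDS = {"shot", "gun", "victim"}
SPORTS_CONTEXT = {"goal", "keeper", "striker", "match", "penalty"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_flag(transcript: str) -> bool:
    """Keyword-only approach: flag as crime if any crime keyword appears."""
    return bool(tokenize(transcript) & CRIME_KEYWORDS)


def context_flag(transcript: str) -> bool:
    """Toy context check: ignore crime keywords when sports terms co-occur."""
    words = tokenize(transcript)
    return bool(words & CRIME_KEYWORDS) and not (words & SPORTS_CONTEXT)


sports = "The striker took a shot from distance"
garbled = "The strike took a shot from distance"  # one word mistranscribed

print(keyword_flag(sports))   # True: the keyword alone misfires
print(context_flag(sports))   # False: "striker" supplies sports context
print(context_flag(garbled))  # True: one wrong word removes that context
```

Even in this toy setup, the context-aware check is only as good as the transcript it reads: losing "striker" to a transcription error is enough to turn a sports clip into a false crime flag.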

Conversations will become the new ad inventory 

One of my core convictions is that the future of advertising is conversations. Not just podcast episodes — but smart speaker dialogues, generative search, voice chats with assistants, or even spontaneous AI interactions. 

“What could it mean to understand the suitability of a conversation a user is having with their chatbot for the placement of an ad?” 

That’s the question I’m exploring. Will we reach a point where brands advertise not on static webpages, but inside live, flowing conversations? What will the auditing process for that even look like? Will it be okay? Will it be ethical? 

We’re not quite there yet. But the foundations are already being built — and the ability to accurately capture and interpret spoken language will be the enabler for all of it. 

We need accuracy at scale 

To make this future real, we need systems that can scale without sacrificing accuracy.  

At Barometer, we analyze thousands of hours of media, from podcasts to visual content. Every word matters. 

Technologies like Speechmatics have been an integral part of our approach, giving us the precision and multilingual capabilities we need to grow — especially as our coverage has doubled in the last year. 

“We used to only offer Spanish. Now, we can support any language Speechmatics can and share in your mission to understand every voice.” 

That evolution is essential as audio becomes a global format. In some countries, podcasting is the only uncensored medium available. In others, shows might disguise their language just to secure monetization. We need tools that can detect and decode that — fast and reliably.

The role of voice AI in grounding the future 

I'm excited about voice AI — but I’m not naive. I have a PhD in this space. I don’t think we’re “one summer away from AGI.” Real progress takes time. 

But I do believe the interface shift happening now is profound. We’re moving into a world where people are overwhelmed by traditional digital formats. One might even call it the enshittification of the internet, to borrow a phrase from Cory Doctorow.

Voice offers relief — a simpler, more human way to engage with technology. And with that shift comes an opportunity: to rebuild trust, design better systems, and ensure that advertising doesn’t just reach people, but actually resonates with them. 

“Systems like ours could help ground the hallucinations or inferences of larger models.” 

That’s where I see the future going: a blend of open generative systems and expert guardrails. Contextual AI that doesn’t just react, but understands. 

What’s really at stake in the voice-first era 

Advertising is rarely seen as life or death — but the stakes in this space are real. In our world, being down for even a minute could endanger hundreds of millions in ad spend. That’s how high the bar is. And that’s why accuracy, speed, and reliability in transcription tech aren’t optional — they’re foundational. 

We're entering an era where voice becomes the dominant interface, and conversations become the content that powers everything — discovery, connection, monetization.  

The question now is not if this transformation is coming. It’s how fast we can build the infrastructure to support it, and how carefully we listen along the way.