
Jan 10, 2023 | Read time 7 min

Q&A with Prosodica’s Mariano Tan: A First-Hand Account of Speech-to-Text in Contact Centers

Speechmatics' Ricardo Herreros-Symons sits down with Mariano Tan, founder of Prosodica, to talk about speech-to-text in the contact center industry in a lively Q&A.

Ricardo Herreros-Symons

VP Sales

Late last year, Ricardo Herreros-Symons from Speechmatics sat down with Mariano Tan, founder of Prosodica (part of Vail Systems), to talk about speech-to-text in the contact center industry. In a wide-ranging discussion, they drilled down on Speechmatics’ breakthrough in self-supervised learning, what it means for the contact center industry and how AI-led speech-to-text helps Prosodica optimize their business.

RH-S:

At Speechmatics, understanding every voice is our goal. We pride ourselves on being the most accurate transcription engine out there, with a genuine and continual commitment to improving and to working with as many partners as possible. One such partner is Prosodica, founded by Mariano Tan. Mariano, can you tell us a little about yourself and Prosodica?

MT:

My name is Mariano Tan. I'm the CEO and founder of Prosodica. We're a contact center analytics, or conversational analytics, platform used in the contact center industry. We started our business in 2011 and began working with Speechmatics around 2018, 2019.

RH-S:

Would you be able to give us a quick description of what Prosodica does, how it works and how it utilizes speech-to-text?

MT:

Prosodica is, as we said, a conversational analytics platform. It's designed to provide contact centers, or enterprises, with insights that we derive from the conversations they have with their customers. And at the core of this are both voice analytics and speech analytics capabilities.

But the speech analytics capability – or the speech transcription capability – is provided by Speechmatics, which gives us the highest quality transcripts to feed the machine learning classifiers we use to distill volumes of conversational data into more useful scores and predictions, which our customers can use to optimize their business.

RH-S:

And what are the typical use cases for that?

MT:

The architecture of Prosodica is fundamentally a data pipeline that takes inbound conversational data and turns it into scores and predictions, and it can support various use cases. Prosodica can create a satisfaction score for every contact, so you don't have to survey your customers to see how satisfied they are. We can do things that are a little bit more complex, like model the conversational behavior exhibited by either party in the conversation to assess performance, particularly soft skills performance. Or we can do things that are more sophisticated, like make predictions about the risk or the opportunity presented within a conversation.

RH-S:

You were saying it can be used for live coaching as well?

MT:

We presently practice post-call analysis. That's a very intentional choice: we are not analyzing any of the live data that the system is producing. It is producing it; we just don't share it with anybody at present.

RH-S:

Is there a reason why it's important to have a particularly accurate transcription as part of that?

MT:

That's a good question. The classic use case for transcription is what we might call text mining. You sort of think, if I've transcribed a call I might as well mine the text and see what patterns occur. And that's an interesting use case. But what we find is that to get more sophisticated predictions out of the data, you don't actually want to search for discrete word patterns. You want to apply machine learning. And once you get outside the practice of text mining, incremental improvements in transcription accuracy have significant impacts on machine learning accuracy.

I'd say, in the most general sense, the more complex the prediction, the more sensitive that prediction is to transcription error. Machine learning algorithms are attracted to the unusual or the surprising, and mistranscription errors (substitutions in particular) tend to be surprising to the algorithm, but they are also wrong. The algorithm thinks something interesting is going on when there isn't.
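
To make the "surprising but wrong" point concrete, here is a minimal sketch – not Prosodica's actual pipeline, and with invented call snippets – showing how a standard bag-of-words weighting (TF-IDF) treats a rare mistranscribed token as more informative than the common word it replaced:

```python
# A minimal, hypothetical sketch of why substitution errors look "interesting"
# to a text model: rare, mistranscribed tokens get high inverse-document-frequency
# weights, so the model treats them as strong signals even though they are noise.
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented call-center snippets; the last one contains a substitution
# error ("bill" mistranscribed as "bell").
clean_calls = [
    "i would like to dispute a charge on my bill",
    "can you explain the charge on my bill",
    "my bill is higher than last month",
]
noisy_call = ["my bell is higher than last month"]

vectorizer = TfidfVectorizer()
vectorizer.fit(clean_calls + noisy_call)

weights = dict(zip(vectorizer.get_feature_names_out(), vectorizer.idf_))
print(f"idf('bill') = {weights['bill']:.2f}")  # common, correct word: low weight
print(f"idf('bell') = {weights['bell']:.2f}")  # rare substitution error: high weight
```

The mistranscription ends up with the higher weight, which is exactly the "algorithm thinks something interesting is going on when there isn't" failure mode described above.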

RH-S:

You say machine learning is far more sensitive to that accuracy, but do you have any visibility into what that relationship looks like?

MT:

It's not a linear relationship; it's a function of complexity. A small incremental improvement in transcription accuracy will give you a large incremental improvement in machine learning detection accuracy. Again, depending on the sophistication of the model, that gives you more or less leverage.

The short answer is... a small gain in transcription accuracy has an outsized effect on machine learning accuracy.
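
One hedged way to see why the relationship is non-linear (a back-of-the-envelope illustration, not Prosodica's model): if a prediction depends on catching an n-word cue, the chance the whole cue survives transcription intact is roughly (1 - WER)^n under a crude independence assumption, so the same improvement in word error rate buys progressively more as the cues, and the predictions built on them, get more complex.

```python
# Illustrative only: assumes word errors are independent and uniform,
# which real ASR output is not, but it shows the compounding effect.
for wer in (0.15, 0.10, 0.05):     # word error rates to compare
    for n in (1, 3, 5):            # cue length: a crude proxy for prediction complexity
        p_intact = (1 - wer) ** n  # probability an n-word cue is transcribed intact
        print(f"WER {wer:.0%}, {n}-word cue intact: {p_intact:5.1%}")
    print()
```

In this toy model, going from 10% to 5% WER lifts the survival rate of a 5-word cue from about 59% to about 77%, a much bigger jump than the single-word case.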

RH-S:

We have a huge breadth and depth of voices that make up the training corpora that go into our models. Have you seen, in the last five or ten years, the importance of having a more diverse and inclusive voice model grow?

MT:

We support multinational corporations that send voice traffic to call centers throughout the world, so we've always had to deal with a range of voices, accents and even languages. But lately, as speech recognition becomes more mainstream and we see it applied in different places, more and more businesses are worried about the potential disparate impacts of transcription accuracy differences when analyzing what we'll call less “typical” voices.

RH-S:

In terms of the future of speech and speech recognition within the contact center, what does that look like? Where do you see it evolving over the next few years?

MT:

Speech analytics technology is being used more and more in the contact center space. This was probably triggered by what's referred to as The Great Resignation, which created a heightened focus on employees and on the ability of an employee to leave your business and leave you in a bad place. And I find that customers are taking one of two quite opposite paths in dealing with that.

One path is: use speech analytics to measure and improve the employee experience. Don't just focus on what the customer is doing, but also focus on what the employee is doing, and use those conversational analytics tools to understand when your employee might be struggling. Understand how you might coach them. Try to figure out how to make their lives better so they'll want to stay. That's the side of the business that we work in. That's the approach that we take.

The other approach is quite the opposite: assuming your employees are a volatile asset that can leave, automate them out of the business. So the two approaches are to use speech recognition to fully automate interactions and stop depending on humans, or to measure what's going on and try to support the humans.

RH-S:

What does a more remote workforce mean for customers? Should contact centers be remote ready?

MT:

COVID-19 drove an almost instantaneous transition to the remote contact center. To answer your question literally: yes, contact centers should be remote-ready. That is the new way of the world, the hybrid workforce, and you have to be able to support both.

But what readiness means is not what people originally anticipated, which was being able to land a call in somebody's home. What readiness really means is being able to support a call in an agent's home. And by support, I mean measure it, manage it, coach it, and do all the things that you could have done face to face.

RH-S:

Thank you so much, Mariano, for taking the time. I know it's an incredibly busy time at the moment, so we’re really grateful.

MT:

Thanks, Ricardo. Thanks, all.

Book a meeting today with a Contact Center Solutions specialist and we'll give you the tools you need to differentiate your contact center solutions in the market and help you deliver on constantly evolving customer expectations.

This article is an edited and abridged version of a LinkedIn Live conversation. You can watch the full video here.

Ready to Try Speechmatics?

Sign up for your free trial and we'll guide you through the implementation of our API. We pride ourselves on offering the best support for your business needs. If you have any questions, just ask.
