Jan 10, 2023 | Read time 5 min

Q&A with Prosodica’s Mariano Tan: A First-Hand Account of Speech-to-Text in Contact Centers

Speechmatics' Ricardo Herreros-Symons sits down with Mariano Tan, President and CEO of Prosodica, to talk about speech-to-text in the contact center industry in a lively Q&A.
Ricardo Herreros-Symons, VP Corporate Development & Partnerships
Mariano Tan, President & CEO, Prosodica

Speechmatics & Prosodica Partnership

At Speechmatics, understanding every voice is our goal. We take pride in being the most accurate transcription engine and are committed to continuous improvement and to collaboration with partners such as Prosodica, a company that provides contact centers and other enterprises with insights derived from the conversations they have with their customers. Speechmatics has provided the speech transcription capability behind those insights since 2018. Its high-quality transcripts feed Prosodica's machine learning classifiers, which distill large volumes of conversational data into useful scores and predictions that help Prosodica customers optimize their business.

Ricardo Herreros-Symons from Speechmatics chats with Mariano Tan, President and CEO of Prosodica (part of Vail Systems), about what speech-to-text means for the contact center industry and how AI-led Speechmatics speech-to-text helps Prosodica optimize its business.

Q: What are the typical use cases for refining speech transcription into useful scores and predictions?  

A: The architecture of Prosodica is fundamentally a data pipeline that takes inbound conversational data and turns it into scores and predictions. It can support various use cases. For example, Prosodica eliminates the need for customer surveys by generating a satisfaction score for each contact, which provides valuable insight into customer satisfaction. Two further outcomes we can derive from the data are:

  • Complex Option: model the conversational behavior exhibited by either party in the conversation to assess performance, particularly soft-skills performance.

  • Sophisticated Option: make predictions about the risk or the opportunity presented within a conversation. 

Q: How can the inbound conversational data be used for live coaching?

A: At present, we focus on post-call analysis. That is a deliberate choice: we are not currently analyzing any of the live data the system produces.

Q: Why is it important to have high-accuracy transcription as part of the conversational data? 

A: That's a good question. The classic use case for transcription is text mining. If a call is transcribed, mining it will show whether particular patterns occurred. But to get more sophisticated predictions, you want to move beyond searching for discrete word patterns, and that is where you apply machine learning. Once you move beyond text mining, incremental improvements in transcription accuracy have a significant impact on machine learning accuracy.

In a general sense, the more complex the prediction, the more sensitive it is to transcription errors. Machine learning algorithms are drawn to unusual, unexpected occurrences, and mistranscriptions, especially substitutions, tend to be both surprising and incorrect. The algorithm mistakenly treats such instances as interesting even though they do not reflect any real pattern.
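To make the point about substitutions concrete, here is a minimal Python sketch. It is not Speechmatics' or Prosodica's implementation; it simply aligns a reference transcript with a hypothetical mistranscription using word-level edit distance and counts substitutions, deletions, and insertions (the components of the standard word error rate). The helper names `wer_counts` and `keyword_hit` and the example sentences are illustrative assumptions. The sketch also shows how a single substituted word makes a discrete keyword search miss entirely.

```python
def wer_counts(reference: str, hypothesis: str) -> dict:
    """Hypothetical helper: word-level Levenshtein alignment that counts
    substitutions, deletions and insertions, plus the word error rate."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match: no extra cost
                continue
            c, s, d, k = dp[i - 1][j - 1]
            sub = (c + 1, s + 1, d, k)
            c, s, d, k = dp[i - 1][j]
            dele = (c + 1, s, d + 1, k)
            c, s, d, k = dp[i][j - 1]
            ins = (c + 1, s, d, k + 1)
            dp[i][j] = min(sub, dele, ins)  # lowest total cost wins
    cost, subs, dels, ins = dp[n][m]
    return {"substitutions": subs, "deletions": dels,
            "insertions": ins, "wer": cost / n}


def keyword_hit(transcript: str, keyword: str) -> bool:
    """Hypothetical text-mining rule: does the exact word appear?"""
    return keyword in transcript.split()


reference = "i want to cancel my account"
hypothesis = "i want to counsel my account"  # hypothetical mistranscription
print(wer_counts(reference, hypothesis))
print(keyword_hit(reference, "cancel"), keyword_hit(hypothesis, "cancel"))
```

A real pipeline would normalize case and punctuation before alignment; this sketch only splits on whitespace. Even one substitution in six words leaves the word error rate low while removing the signal a keyword rule depends on, and it injects an unexpected token that a learned model may latch onto.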

Q: You say machine learning is far more sensitive to that accuracy, but do you have any visibility on what that relationship looks like?  

A: It's not a linear relationship. It's a function of complexity. A small incremental improvement in transcription accuracy can give you a large incremental improvement in machine learning detection accuracy. Depending on the sophistication of the model, that leverage will be greater or smaller.

Q: Speechmatics has a wide range of diverse voices within the training corpora that go into our models. Over the last five to ten years, have you seen the importance of a more diverse and inclusive voice model grow?

A: We support multinational corporations that send voice traffic to call centers all over the world, so we've always had to deal with a range of voices, accents, and languages. As speech recognition becomes more mainstream and is applied in more places, more and more businesses are concerned about the disparate impact that differences in transcription accuracy could have when analyzing what we'll call less “typical” voices.

Q: In terms of the future of speech and speech recognition within the contact center, what does that look like? Where do you see it evolving over the next few years?  

A: Speech analytics technology is becoming more and more prevalent in the contact center space.

This was probably triggered by what's been called ‘The Great Resignation’, which heightened the focus on employees and on how easily an employee can leave, putting the business in a difficult position. I find that contact center customers are taking one of two quite opposite paths in response.

1. Using Speech Analytics as a Benefit 

This path uses speech analytics to improve the employee experience by looking at both the customer and the employee side of each interaction. Conversational analytics identify employees who may be struggling so they can be supported with coaching and other improvements that increase job satisfaction and retention. This is the approach Prosodica takes: measuring what is happening and working out how best to support people.

2. Automating Employees

This path uses speech recognition to fully automate interactions. It treats employees as a volatile asset and automates them out of the business.

Q: What does a more remote workforce mean for customers? Should contact centers be remote-ready?

A: COVID-19 drove an almost instantaneous transition to remote contact centers. To answer your question literally: yes, contact centers should be remote-ready. The hybrid workforce is the new way of the world, and you have to be able to support both in-office and remote agents.

However, readiness does not mean what people originally anticipated, which was simply being able to land a call in somebody's home.

Readiness is being able to support a call in an agent's home. “Support” in that phrase means being able to measure it, manage it, and coach it, and to do everything you could have done face to face.


This article is an edited and abridged version of a LinkedIn Live conversation. You can watch the full video below.
