
Interactions with contact centres can be massively frustrating, but they shouldn’t have to be. Recently, I had a negative experience with a contact centre which could have easily been avoided.
Something as simple as having my call captured and used in an interaction history dashboard could have significantly improved my customer experience and ultimately stopped me from churning.
“This call is being recorded for training and compliance reasons.” I’m sure you’ve heard this phrase at some point in the last month or so, but do we as consumers really understand how it can benefit us? Do we really understand how call centres’ adoption of AI solutions like speech-to-text services can significantly improve our experience as customers, reduce the time spent resolving issues and optimise agent responses?
Recently, I had a dispute with a contact centre when I tried to update my insurance policy for a new motorcycle. After the call, I was happy with the outcome and as far as the conversation was concerned, the policy had been updated seamlessly. A few days later, however, I received a letter with a large bill attached, much to my confusion – this hadn’t been communicated on the phone.
I opened a dispute via email, stating the time of the call and my policy number so that the recording could be reviewed. Even though I had the exact date and time of the call, contact centres record millions of hours of calls a year, so finding mine was slow and far from easy.
I received my response a week later, once the contact centre had listened to and reviewed my call. I knew that this case had taken a long time to resolve because it had to go through several dispute teams to find the call, review it and then feed back the outcome. It is frustrating because I know that there is a better way for contact centres to deal with this. A way that is simpler, that accelerates the dispute resolution timeline, that any agent can use (not just specialist QA agents), and that can be integrated effortlessly within any contact centre regardless of their data production requirements.
As a Product Marketing Manager at Speechmatics, I’ve been researching the application of automatic speech recognition within contact centres. I’ve found that improving customer experience is a key focus for most contact centres. Integrating speech recognition technology gives contact centres the capabilities to achieve this goal.
Speech recognition transforms call recordings into text which can then be easily accessed. Having calls in a text format not only turns voice data into actionable insights that can be used for analysis, but also makes it possible to inject conversation history into the agent’s dashboard in real time. The agent has all the information they require at their fingertips, with no need to take notes throughout the call, enabling them to focus on finding a solution to the customer’s problem.
What’s the impact on the contact centre?
It reduces the time spent searching for a single call among millions of hours of recordings
It removes the need to pass dispute tracking to a specialist team; instead, agents can access the call history, review previous interactions and clarify details
Call transcriptions are easily searchable. Agents can find specific words or skip to relevant sections of the call with minimal effort, and much more quickly than by listening to an entire call
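To make the searchability point concrete, here is a minimal sketch in Python. It assumes a transcript is available as a list of words with start times in seconds; the `Word` class, the `find_keyword` helper and the sample data are all hypothetical and do not represent any particular provider's output format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # the recognised word (hypothetical field name)
    start: float  # seconds from the start of the call (hypothetical field name)


def find_keyword(transcript: list[Word], keyword: str) -> list[float]:
    """Return the start time of every occurrence of keyword, ignoring case
    and any punctuation attached to the word."""
    target = keyword.lower()
    return [
        w.start
        for w in transcript
        if w.text.lower().strip(".,?!") == target
    ]


# Made-up fragment of a call transcript, for illustration only.
call = [
    Word("Your", 312.4),
    Word("premium", 312.7),
    Word("will", 313.1),
    Word("change.", 313.3),
    Word("Premium,", 340.1),
    Word("yes.", 340.6),
]

# Timestamps an agent could jump to instead of replaying the whole call.
print(find_keyword(call, "premium"))
```

An agent-facing tool could use timestamps like these to jump straight to the part of the recording where a charge was, or was not, mentioned.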
Having said that, not just any speech-to-text service will do. Before choosing your ASR provider it’s crucial to consider the elements that are important to your business. For example, if you operate globally then it’s worth considering a provider with broad language coverage. Finding a provider that suits your particular use case can be challenging, but the rewards of finding the right one are massive.
In my case, the ability for an agent to quickly review my interaction history would have reduced my dispute resolution to a single exchange, saving the contact centre a lot of time and money in the process. Instead, this experience took 7 weeks and was enough for me to move over to a different provider.
By integrating speech-to-text technology into your contact centre, you will have a huge opportunity to optimise the customer experience in more ways than you might think possible. Your agents are at the heart of your business, so giving them the tools that they require is essential to providing a world-leading customer experience. On top of this, contact centres will save money by accelerating dispute resolution.
If you want to know more about how we can work together to integrate the best speech recognition technology on the market into your business’s solution, get in touch.
Alex Fleming, Speechmatics