What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 55+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150ms latency that is ideal for real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, hybrid, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

Contact Centers rely on the best ASR - why are you settling for less?

tl;dr:

Hyperscalers, like Microsoft, offer convenient ASR solutions, but their generic models have limitations for Contact Center as a Service (CCaaS) vendors.
Speechmatics, a specialist ASR provider, offers high accuracy for every speaker, supporting multiple languages and downstream features.
CCaaS vendors need accurate ASR for a superior customer experience and growth in non-English markets.
Hyperscalers may limit competition, while specialized ASR providers offer flexibility, accuracy, and support.
Partnering with Speechmatics allows for customized solutions and continuous innovation in contact center operations.

Exploring the Limitations of Hyperscalers

Hyperscalers offer a one-stop shop: a bundle of products that play well with existing cloud solutions and make life easier for Product teams. So much so that many teams don’t even bother to look for specialist alternatives, particularly when it comes to automatic speech recognition (ASR). Microsoft is a great example: if you’re already using Azure, then it makes sense to use Microsoft speech-to-text as well. It has a huge range of services with decent accuracy for English. If you don’t need international coverage, or to grapple with the complexities of deploying on-prem, then there’s no reason to branch out of your ecosystem. After all, your hyperscaler’s speech-to-text is good enough. Or is it?

Accurate enough is not good enough for Contact Center as a Service (CCaaS) vendors. In the US, phone interactions are still the preferred customer service channel for 76% of consumers, and nearly a third of consumers say they’d consider leaving a brand they love after just one poor experience. As such, customer experience is an overriding priority for contact centers. Inaccurate speech-to-text outputs can bias conversational data, cause customers to be routed to the wrong representative, result in churn, and trigger expensive compliance fines. For CCaaS vendors looking to deliver an AI-first, omnichannel solution for their customers’ teams, ASR is too important to rely on a generic solution.

There's more to language than English

English is the most spoken language worldwide. In 2022, 1.5bn of us spoke it as either a first or second language. Thanks to its prevalence and the corpus of data available for training models on the English language, ASR technology in general handles English better than any other language.

But ASR with good accuracy for the English language is more restrictive than you might think. While 1.5bn sounds like a sizable market opportunity (just under 20% of the total global population), only around 24% of those, roughly 360m people, are native speakers. By this measure, Spanish is a more common natively spoken language than English.

And that matters, because non-native speakers can be problematic for speech-to-text accuracy, particularly for the more generic models developed by hyperscalers. Google’s accuracy in English is pretty serviceable, provided you’re a native male speaker in your late 20s or early 30s who is a university graduate. For users who come from minority backgrounds, are aged over 60, female, of a lower socioeconomic status, or non-native speakers, Google’s word error rates (WER) are sky high (check out our detailed analysis in this article). Hyperscalers’ ASR solutions struggle to deliver a consistent level of accuracy across the variety of English speakers.

By contrast, the specialist speech-to-text engine from Speechmatics, built using Ursa generation models, has been developed by a team with a long heritage in ASR. It delivers ground-breaking accuracy for every speaker, regardless of their accent, dialect, or other demographic factors. It has a relative accuracy lead of up to 30% compared to Amazon, Google, IBM, and Microsoft.

Example comparison (from the interactive audio demo)

Reference: They were known as seers and they were held in fear by women and the elderly.

Recognized output (the reference word follows each substituted word in parentheses): People (They) have (were) noticed (known) seals (as) seers and they were held in fear by women and the elderly.
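To make a comparison like the one above concrete, here is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. This is not the scoring code behind the demo. The function names and the simple normalization (lowercasing, stripping punctuation) are assumptions made for illustration, and the hypothesis string is the recognized output shown above with the parenthesized reference words removed.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation (keeping apostrophes), split into words."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


reference = (
    "They were known as seers and they were held in fear "
    "by women and the elderly."
)
hypothesis = (
    "People have noticed seals seers and they were held in fear "
    "by women and the elderly."
)

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")  # 4 substitutions over 16 reference words -> 25.00%
```

Four wrong words in a sixteen-word sentence already gives a WER of 25%, which is why small-looking differences between providers matter at contact center volumes.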

At IBM, for example, languages are currently deemed to be of low importance and are not seeing much development. For product teams, this presents a problem. Marketplaces are increasingly international, and vendors need to be able to service more than just native English speakers. They also need to be able to respond confidently to potential customers in new markets and encourage existing customers to add more users or leverage more features to increase consumption. ASR that doesn’t support additional languages or downstream features will inevitably hinder growth.

This is doubly true for CCaaS vendors. If your platform lacks high speech-to-text accuracy and can’t support international users from a diverse range of demographics, then the rest of your workflows will be severely compromised, in turn hampering usability. It also limits growth and revenue potential, because you won’t be able to penetrate non-English-speaking markets.

So, what options do you have?

You could build your own ASR engine with multiple languages. However, this will be time-consuming and costly and may not lead to the desired outcome. You could use a hyperscaler’s ASR. However, if increasing total addressable market (TAM) is a priority for your business, you should be looking at ASR solutions that can service your language requirements at the accuracy levels needed to keep your existing customers happy while helping you to expand into new markets.

As a bonus, we’ve integrated translation capabilities into our single speech API, so your users can translate to and from English in over 30 languages. And, because we’re just as focused on delivering an exceptional user experience, our API supports speaker labeling, sentence start and end times, and much more. Translation can help you increase accessibility and reduce language barriers, which presents a key opportunity for contact center operators with an international presence and centralized compliance teams. Speechmatics can help you transform the customer experience by equipping CCaaS providers with real-time transcription and translation that speed up dispute resolution and shorten call times.

What's in a word?

Speech-to-text accuracy has a huge impact on the quality of downstream tasks. We believe in providing our customers with the tools to quickly and effectively unlock value from audio. Our new Summarization release uses open-source large language models (LLMs) to enable this, providing an efficient way to extract, condense, and edit information. The foundation of good summaries, and of good LLM output in general, is the quality and accuracy of your transcript. The unique combination of world-beating accuracy and experience in both language and machine learning allows us to deliver the highest-quality stream of outputs, so you can be confident of delivering a superior experience. Read more about our accuracy improvements and how this is key when using LLMs to power downstream tasks.

The contact center is increasingly being modernized with the adoption of AI. By building a strong foundation in ASR, NLU, and NLP, CCaaS providers can:

1) Boost contact center performance by giving agent supervisors valuable insights they can use to improve their agents’ performance.

2) Uncover valuable customer insights related to satisfaction, risks, brand strength, customer churn, and competitor intelligence.

3) Enhance quality monitoring by enabling the efficient review of large samples within a shorter timeframe.

Speechmatics' highly accurate ASR improves the quality of downstream tasks

However, without accurate transcripts, your NLU and NLP will struggle to perform downstream tasks. For example, Sentiment Analysis enables valuable strategic insights for users of CCaaS platforms. By analyzing and precisely categorizing speech segments within a transcription as positive, negative, or neutral, contact centers can enable users to monitor customer sentiments and feedback regarding products, specific agents, events, or competitors. Moreover, Sentiment Analysis can be deployed to assess agent behavior and its impact on customer responses, showing whether certain sentiments elicit desired actions.
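As a rough illustration of why transcript accuracy matters for this kind of analysis, the sketch below labels transcript segments as positive, negative, or neutral using a tiny keyword lexicon. It is a toy under stated assumptions, not how Speechmatics' Sentiment Analysis works: the word lists, the `label_segment` function, and the sample segments are all hypothetical, and a production system would use a trained model rather than keyword counts.

```python
# Hypothetical keyword lexicons, invented purely for this illustration.
POSITIVE = {"great", "happy", "thanks", "resolved", "helpful"}
NEGATIVE = {"cancel", "unhappy", "frustrated", "broken", "complaint"}


def label_segment(text: str) -> str:
    """Label one transcript segment as positive, negative, or neutral."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical segments from a contact center call.
segments = [
    "Thanks, that was really helpful.",
    "I want to cancel, I am frustrated.",
    "My account number is four two seven.",
]
labels = [label_segment(s) for s in segments]
for segment, label in zip(segments, labels):
    print(f"{label:>8}: {segment}")

# A single misrecognized word is enough to flip the label.
spoken = "I am unhappy with the service"
misrecognized = "I am happy with the service"
print(label_segment(spoken), "->", label_segment(misrecognized))
```

The last two lines show the failure mode the article describes: whatever classifier sits downstream, it can only label the words the ASR engine actually produced.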

Optimize total cost of ownership for your infrastructure when choosing your speech partner by adopting a unified Speech API to transcribe, understand, and unlock value for your customers.

Noisy? Hard to understand? No problem. We enable you to understand every voice, in every condition.

CCaaS providers need ASR that maintains high accuracy across a range of audio conditions. ASR engines must accurately transcribe speech in a variety of settings, such as noisy contact centers, busy offices, trains, and sports stadiums. ASR that can’t be relied on in a real-world setting can’t be relied on at all. While most hyperscalers see a reduction in accuracy on audio with background noise, we at Speechmatics use self-supervised learning, which enables us to train on much more unlabeled data, so we perform much better on audio containing background noise (check out page 6 to learn more).

Mediocre won't win your market

For busy Product teams, integrating a solution that's part of your existing ecosystem might initially seem like a welcome time saver, but it quickly becomes restrictive. Like all Product teams, hyperscalers are motivated to increase consumption and so limit deployment options.

In a 2023 report, the UK communications regulator Ofcom cited concerns that hyperscalers are intentionally limiting competition with “market features that make it more difficult for customers to switch and use multiple suppliers.” Google’s speech-to-text is available in the cloud and offers on-prem only as a private feature, while Azure attempts to limit speech-to-text deployment to its own ecosystem, meaning your customers may later be forced to pay for or install features they don’t need in order to access it. Similarly, deployment to locally hosted containers is often only available for a subset of speech-to-text features. And speech-to-text is treated more as an add-on than a priority offering.

By contrast, specialist ASR providers are built to integrate and fit far more easily into these ecosystems – offering more flexibility alongside better accuracy, additional features, and more personalized support. Speechmatics' Standard model, for example, is highly versatile; our customers don’t need to pay to commission extra training or custom-built models, and we provide flexibility with cloud or on-prem deployments, so you can cater to every security, privacy and data sovereignty requirement your customers might have.

The inconvenient truth

In its 2023 report, Ofcom also called out Microsoft and AWS specifically for anti-competitive practices, citing concerns around egress fees, technical restrictions to interoperability, and committed spend discounts. The regulator expressed unease that these practices could leave some customers wholly reliant on one hyperscaler and unable to leave it, and about the negative impact on independent software vendors (ISVs), since hyperscalers are less incentivized to support ISVs whose offerings compete with their own products in the long term. In the US, Microsoft has also come under fire for anti-competitive practices, albeit from accusations leveled by AWS.

Ofcom’s concerns shouldn’t be overlooked. Product teams that increasingly rely on hyperscaler solutions may not only find themselves ‘locked in’ to the ecosystems they choose to build on, but may ultimately find their TAMs limited to others within that ecosystem as well.

Partnerships of equals help you focus on innovating

We believe in partnering with our customers to foster continuous innovation in the spaces they operate in. That way, our customers can remain competitive and respond quickly to market needs. Partnering with us gives you access to dedicated go-to-market, product, and technical support. Our teams prioritize customer feature requests to make sure we are helping CCaaS providers stay competitive and continue to launch new products and features that further differentiate them.

ASR that works for you

Hyperscalers serve huge volumes of customers and have little incentive to change their ASR offering to make it more useful for specific customers or use cases.

At Speechmatics, we value offering exceptional experiences to our customers. Our Standard ASR model is highly versatile, so we’re able to work with our customers to understand requirements and pain points, and tailor our solutions to solving them. It’s a level of customization that results in optimized performance, cost savings, and improved efficiency for our customers’ contact center solutions.

To find out how we can help you take your ASR for CCaaS capabilities to the next level, speak to our team today.

Jul 4, 2023 | Read time 9 min