Innovating with Speech APIs

Speechmatics’ VP Products, Harish Kumar, shares his insights on what to consider when innovating with Speech APIs

One of the joys of working for a tech API company is that you get to collaborate with a diverse mix of product leaders seeking to leverage your tech to power their innovation. The delight stems from sparking their imagination and in turn igniting your own.

Over the course of my career, I’ve worked for companies that delivered APIs across a range of technology domains, including search engines, knowledge graphs, natural language processing (NLP), and currently speech-to-text (Speech). When helping product folks pick the right API and bring a new feature or product to life, our conversations have typically revolved around three key goals:

How do we deliver the best user experience?
How do we target the widest possible market?
How do we manage the cost of delivery?

In this blog post, I want to dive into what you need to look for in a Speech API in order to optimize for these goals.

This piece is part of our new product leader series.

Optimizing for User Experience

As a product leader, the most important lever you have for establishing market differentiation is to deliver a compelling user experience. Now, any chef will tell you that the secret to creating a great dish is to start with great ingredients. Your cooking wizardry will not compensate for starting with poor ingredients.

When it comes to applications that leverage speech transcription, the key “ingredient” that drives the quality of the user experience is the accuracy of the transcript and its associated metadata, such as speaker diarization (who spoke when), word timings, and more.

Accuracy is central to the entire spectrum of use cases that speech transcription supports, from captioning videos and improving accessibility to creating reference transcripts of meetings and analyzing large volumes of sales conversations for insights.

Optimizing for Market Size

To sustain revenue growth, the foundations of your product should allow you to target the largest possible Total Addressable Market (TAM). You might start out targeting a very specific market, but you need to build in the ability to go big eventually.

In the context of transcription, perhaps the most important dimension to consider here is inclusive geographical reach. As a rule of thumb when evaluating a Speech API, make sure it covers the widest range of languages, accents, and dialects, and that its accuracy does not degrade for speakers of a particular gender, age, or race.

But, depending on your product, there could be other things to consider, for instance:

Privacy and data storage: Supporting a wide range of privacy needs is a common, vital requirement. While some of your customers might be comfortable sending their content to a public cloud, others might prefer to have the content processed in a private cloud or custom installation. If your product can support both, you can increase your target market size as well as command a significant premium.
Custom, sector-specific vocabulary: Supporting different sectors, such as finance, healthcare, law, and news, means accounting for widely varying vocabularies. The performance of a Speech API can differ considerably across these domains. Specific support for your target domains, or the ability to customize the API for them, will help you expand your target market.

Optimizing for Cost of Delivery

Alongside ensuring you can target a large market with a great user experience, you also want to make sure your approach is cost-effective. Managing your Total Cost of Ownership (TCO) gives you more pricing leverage and helps improve your margins. In estimating TCO, you need to consider both the direct and indirect costs of using a Speech API.

If you’re using software as a service (SaaS), working out direct TCO is straightforward. However, if you are running the speech engines on your own infrastructure, you will need to factor in cloud compute costs, operational overheads, and licensing; that topic alone would fill its own blog post.

Indirect TCO can come in different flavours. For instance, if your solution requires a human-in-the-loop to review transcripts, higher-accuracy output means fewer corrections and therefore significantly lower review costs. The cost of integrating the API is another factor to consider.
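To make the direct/indirect split concrete, here is a minimal sketch of a TCO comparison for the SaaS case. Every parameter name and number in it is a hypothetical assumption chosen for illustration, not real pricing or measured accuracy. The model assumes that human review time scales linearly with the fraction of words a reviewer must correct, and it treats integration as a one-off cost charged to the period.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical inputs for one billing period (all values are assumptions)."""
    audio_hours: float             # hours of audio transcribed in the period
    api_price_per_hour: float      # direct SaaS price per audio hour
    word_error_rate: float         # fraction of words a reviewer must correct
    words_per_hour: float          # assumed speech density (words per audio hour)
    seconds_per_correction: float  # assumed reviewer time per corrected word
    reviewer_hourly_rate: float    # assumed cost of one reviewer hour
    integration_cost: float = 0.0  # one-off integration effort charged to this period


def estimate_tco(c: CostInputs) -> dict[str, float]:
    """Split total cost into direct (API fees) and indirect (review + integration)."""
    direct = c.audio_hours * c.api_price_per_hour
    # Indirect cost 1: human-in-the-loop review, assumed linear in the error rate.
    corrections = c.audio_hours * c.words_per_hour * c.word_error_rate
    review = corrections * c.seconds_per_correction / 3600 * c.reviewer_hourly_rate
    # Indirect cost 2: integration effort.
    indirect = review + c.integration_cost
    return {"direct": direct, "review": review, "indirect": indirect,
            "total": direct + indirect}


if __name__ == "__main__":
    # Two hypothetical APIs: a cheaper, less accurate one and a pricier, more accurate one.
    shared = dict(audio_hours=1000, words_per_hour=9000,
                  seconds_per_correction=6, reviewer_hourly_rate=30)
    cheaper = estimate_tco(CostInputs(api_price_per_hour=0.5, word_error_rate=0.10, **shared))
    accurate = estimate_tco(CostInputs(api_price_per_hour=1.0, word_error_rate=0.05, **shared))
    for name, tco in [("cheaper", cheaper), ("accurate", accurate)]:
        print(f"{name}: direct={tco['direct']:.0f} "
              f"indirect={tco['indirect']:.0f} total={tco['total']:.0f}")
```

With these made-up inputs, the API that costs twice as much per hour comes out at 23,500 in total versus 45,500 for the cheaper one, because review effort dominates. Real figures will differ; the point is only that the indirect term can outweigh the headline price.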

Optimizing for Success

Achieving differentiation through user experience, assembling a large target market, and managing the cost of delivery represent key pillars when it comes to product innovation.

Product leadership requires you to develop a deep understanding of these elements and to make complex trade-off decisions to find the sweet spot. To make the right decisions, you will need technology partners who are keen to join you on this journey of innovation and can match your pace.

If you’re on this journey right now, do reach out to us. We’d love to talk to you.

Feb 9, 2023
