
Jul 24, 2023 | Read time 6 min

TL;DWOL (too long; didn't watch or listen): Speechmatics Have Launched Summarization

Speechmatics has introduced Summarization to their Speech-to-Text API, allowing businesses to extract more value from audio & video content.
Speechmatics Team
Did you catch that 3-hour podcast last week? You’ll need to add it to your listen list, along with the 5 others you’re behind on.
You missed that 2-hour product launch meeting? You’ll need to watch it back this evening. 
This customer call was great! You should listen to it! It was only 45 minutes.

Content is everywhere. And not just written content. Video and audio too. And not just on social media and in our leisure time. Our work is filled with it: internal meetings, external meetings, customer calls, the media we create as companies. Show us someone who keeps on top of it all, and we’ll show you a liar. Or at least someone who doesn’t sleep enough.

Speech-to-text technology was a great first step in alleviating some of these headaches. Accurately transcribing the audio in these various formats at least provides a searchable, scannable record, but reading through that record would still take an enormous amount of time.

The answer? Summarization.

Introducing Summarization from Speechmatics

We’ve recently launched Summarization into our unified API, our first step in delivering a comprehensive suite of speech understanding features for our customers.  

Our Summarization uses abstractive summarization, a powerful technique in AI and natural language processing. With abstractive summarization, we analyze the input, extract the key points, and produce a summary that captures the essence of the content. Unlike extractive summarization, which rearranges (and reduces) existing text, our approach involves comprehending the content and generating new language for a more effective summary.

We offer flexibility in our Summarization output by letting users choose the content type, the summary length, and the summary type.

Let’s talk through each of these options:

Content type 

  • Auto – The Auto output option detects the style of content and chooses the best summary style to match the audio input.

  • Conversational – Our conversational summary format is designed to perform well for unstructured dialogues & conversations with multiple people, ideal for meetings, sales calls, and contact centers.

  • Informative – Our informative summary format is designed to work best for podcasts, news, or other media content. The output is more structured, to suit information being delivered by one or more people.

Summary Length

  • Brief – provides a succinct summary, condensing the content into just a few sentences. 

  • Detailed – provides a longer, structured summary. For conversational content, it includes key topics and a summary of the entire conversation. For informative content, it logically divides the audio into sections and provides a summary for each. 

Summary Type

  • Pretty self-explanatory this one – you can choose either full sentences and prose, or have the output arranged into bullet points.

This gives users a huge amount of flexibility in the type of summaries they want (and find most useful). You can read our examples and tutorials in our full documentation.
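To make these options a little more concrete, here is a minimal Python sketch of how a job configuration requesting a summary might be assembled. The option values come from the lists above; the field names (summarization_config, content_type, summary_length, summary_type) and the helper build_job_config are assumptions made for illustration, so please check the full documentation for the exact request format.

```python
import json

# Allowed values mirror the options described in this post.
# The JSON field names used below are assumptions for illustration.
CONTENT_TYPES = {"auto", "conversational", "informative"}
SUMMARY_LENGTHS = {"brief", "detailed"}
SUMMARY_TYPES = {"paragraphs", "bullets"}


def build_job_config(content_type="auto", summary_length="brief",
                     summary_type="bullets", language="en"):
    """Return a job config dict asking for a transcript plus a summary.

    Hypothetical helper: it only validates the three options and
    assembles a dictionary; it does not call any API.
    """
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    if summary_length not in SUMMARY_LENGTHS:
        raise ValueError(f"unknown summary length: {summary_length!r}")
    if summary_type not in SUMMARY_TYPES:
        raise ValueError(f"unknown summary type: {summary_type!r}")
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "summarization_config": {
            "content_type": content_type,
            "summary_length": summary_length,
            "summary_type": summary_type,
        },
    }


if __name__ == "__main__":
    # A detailed, bullet-point summary of a sales call.
    config = build_job_config("conversational", "detailed", "bullets")
    print(json.dumps(config, indent=2))
```

Validating the options up front means a typo such as "bullet" fails immediately on your side, rather than after the audio has been uploaded.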

So, what's powering all of this?

Under the Hood - Large Language Models

If you haven’t heard of ChatGPT, where have you been? It’s hard to now think of a world before the endless LinkedIn articles outlining ‘Ten Ways to Start A Side Hustle Using ChatGPT’. Now ChatGPT, and the Large Language Models (LLMs) that power it, seem to be everywhere.

So, what exactly are Large Language Models?

Well, let’s ask ChatGPT (very meta of us): 

LLMs have a wide range of applications, including language translation, content creation, text summarization, chatbots & conversational AI, speech understanding, sentiment analysis, and more.

They also exhibit a couple of key characteristics that make them particularly useful when thinking about summarization:

  • Contextual Understanding: this means they can comprehend the meaning of a word or phrase based on the surrounding context.

  • Creative Text Generation: LLMs can generate coherent and contextually relevant text, making them capable of tasks like writing articles, stories, code, and even engaging in conversations. 

This makes them a great fit for taking a long transcript and creating a summary based on it. Rather than simply removing words until you’re left with a short (but probably stilted) summary, they can generate new sentences based on their understanding of what was said, given the context.

While LLMs are clearly powerful, they do have certain limitations. One key challenge is the amount of text you can pass into them at a given time – sometimes this limit can be as low as 3,000 words. To put this into perspective, a 1-hour transcript might contain as many as 9,000 words, three times that limit. This looks like a big blocker, since the aim of summarization is to take extremely long transcripts and make them more digestible.

Well, fear not.

Speechmatics’ Summarization enables you to summarize files of any duration. This is particularly useful in scenarios where you’d like to summarize day-long meetings or workshops. So, whether you were on a beach, on a yoga retreat, or simply binge-watching Ted Lasso, you can still catch up on what you missed.
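To see why the input limit matters, it helps to run the numbers from above: a 9,000-word transcript against a 3,000-word limit needs at least three separate passes. One common, generic way around a limit like this is to summarize chunks and then summarize the summaries. The Python sketch below shows that idea only; it is not a description of how Speechmatics implements Summarization. The functions split_into_chunks, summarize_long, and first_words are hypothetical names, and first_words is a stand-in for a real model call.

```python
def split_into_chunks(words, max_words):
    """Split a list of words into consecutive chunks of at most max_words."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def summarize_long(text, summarize, max_words=3000):
    """Summarize text of any length with a summarizer limited to max_words.

    `summarize` is any callable taking and returning a string. Chunks are
    summarized one by one, then the joined partial summaries are summarized
    again until the text fits within a single call.
    """
    words = text.split()
    if len(words) <= max_words:
        return summarize(text)
    partials = [summarize(" ".join(chunk))
                for chunk in split_into_chunks(words, max_words)]
    combined = " ".join(partials)
    # Guard against a summarizer that never shrinks its input.
    if len(combined.split()) >= len(words):
        raise ValueError("summarize() must return text shorter than its input")
    return summarize_long(combined, summarize, max_words)


def first_words(text, n=50):
    """Placeholder 'summarizer': keeps the first n words. Not a real summary."""
    return " ".join(text.split()[:n])


if __name__ == "__main__":
    transcript = " ".join(["word"] * 9000)  # roughly a 1-hour transcript
    chunks = split_into_chunks(transcript.split(), 3000)
    print(len(chunks))
    print(len(summarize_long(transcript, first_words).split()))
```

The trade-off with this kind of approach is that each chunk is summarized without seeing the others, so a real system needs to take care that context spanning chunk boundaries is not lost.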

Our team have worked hard to incorporate the latest advancements as they emerge into our speech APIs, whilst adding functionality and removing limitations to make it as valuable as possible to our users.

Contact Centers 

Summarization can give every agent a summary of customer interactions. This both reduces admin time and allows other agents to review previous conversations quickly, focusing on dispute resolution and customer experience rather than retreading old conversations (and adding to frustration). Summaries can also be used to automate tasks, as well as by supervisors and sales enablement teams.

Virtual Meetings

Enhance team communication, whilst also saving time, with accurate notes presented in an easy-to-digest and shareable way. Everyone interested can stay informed, even if they didn’t attend the meeting in question.

Media & Podcasts 

Provide key takeaways and highlights for all content created, which can be used for descriptions, recaps, and to make content searchable. Viewers who missed content can stay in the loop and engage with content even when they are tight on time.

The TL;DR of all this? Try Summarization now! 

Summarization has already launched in our Portal – you can create a free Portal account right now and start generating useful summaries to your heart’s content (well, up to 8 hours per month for free).

The amount of content being created isn’t going down, and won’t. LLMs and APIs like Speechmatics give you a powerful new tool to help improve productivity, increase collaboration, and make the most of the time you have. You might even be able to convince your friends that you’re an expert coffee grinder, even if you didn’t make it through those 90 minutes.