Jul 24, 2023 | Read time 6 min

TL;DWOL (too long; didn't watch or listen): Speechmatics Have Launched Summarization

Speechmatics has introduced Summarization to their Speech-to-Text API, allowing businesses to extract more value from audio & video content.
Speechmatics Editorial team
Did you catch that 3-hour podcast last week? You’ll need to add it to your listen list, along with the 5 others you’re behind on.
You missed that 2-hour product launch meeting? You’ll need to watch it back this evening. 
This customer call was great! You should listen to it! It was only 45 minutes.

Content is everywhere. And not just written content. Video and audio too. And not just across social media and in our leisure time. Our work is filled with it: internal meetings, external meetings, customer calls, the media we create as companies. Show us someone who keeps on top of it all, and we’ll show you a liar. Or at least someone who doesn’t sleep enough.

Speech-to-text technology was a great first step in alleviating some of these headaches. Accurately transcribing the audio in these various formats at least provides a searchable, scannable record, but that record still takes an enormous amount of time to read through.

The answer? Summarization.

Introducing Summarization from Speechmatics

We’ve recently launched Summarization into our unified API, our first step in delivering a comprehensive suite of speech understanding features for our customers.  

Our Summarization uses abstractive summarization, a powerful technique in AI and natural language processing. With abstractive summarization, we analyze the input, extract the key points, and produce a summary that captures the essence of the content. Unlike extractive summarization, which rearranges (and reduces) existing text, our advanced approach involves comprehending the content and generating new language for more effective summarization.

We offer flexibility in our Summarization output by letting users choose the content type, the summary type, and the summary length, as depicted below.

Let’s talk through some of the things you can see above: 

Content type 

  • Auto – The Auto output option detects the style of content and chooses the best summary style to match the audio input.

  • Conversational – Our conversational summary format is designed to perform well for unstructured dialogues & conversations with multiple people, ideal for meetings, sales calls, and contact centers.

  • Informative – Our informative summary format is designed to work best for podcasts, news, or other media content. The output is more structured, to suit information that is being delivered by one or more people.

Summary Length

  • Brief – provides a succinct summary, condensing the content into just a few sentences. 

  • Detailed – provides a longer, structured summary. For conversational content, it includes key topics and a summary of the entire conversation. For informative content, it logically divides the audio into sections and provides a summary for each. 

Summary Type

  • Pretty self-explanatory this one – you can either choose full sentences and prose or have the output arranged into bullet points.

This gives users a huge amount of flexibility in the type of summaries they want (and find most useful). You can read our examples and tutorials in our full documentation.
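To make the three options above concrete, here is a minimal Python sketch of how a job configuration requesting a summary might be assembled. The article does not spell out the request format, so the key names (`summarization_config`, `content_type`, `summary_length`, `summary_type`), the value strings, and the helper `build_summarization_config` are assumptions for illustration; check the full documentation for the exact schema.

```python
import json

# Options described in this article. The exact strings are assumptions.
CONTENT_TYPES = {"auto", "conversational", "informative"}
SUMMARY_LENGTHS = {"brief", "detailed"}
SUMMARY_TYPES = {"paragraphs", "bullets"}


def build_summarization_config(content_type="auto",
                               summary_length="brief",
                               summary_type="bullets",
                               language="en"):
    """Return a job config dict that asks for a summary with the transcript.

    Hypothetical helper: the key names below are assumed, not confirmed
    by the article.
    """
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    if summary_length not in SUMMARY_LENGTHS:
        raise ValueError(f"unknown summary length: {summary_length!r}")
    if summary_type not in SUMMARY_TYPES:
        raise ValueError(f"unknown summary type: {summary_type!r}")
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "summarization_config": {
            "content_type": content_type,
            "summary_length": summary_length,
            "summary_type": summary_type,
        },
    }


# Example: a detailed, bullet-point summary of a sales call.
config = build_summarization_config("conversational", "detailed", "bullets")
print(json.dumps(config, indent=2))
```

Validating the options before sending anything keeps typos such as "breif" from reaching the API, where they would only surface as a rejected job.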

So, what's powering all of this?

Under the Hood - Large Language Models

If you haven’t heard of ChatGPT, where have you been? It’s hard to now think of a world before the endless LinkedIn articles outlining ‘Ten Ways to Start A Side Hustle Using ChatGPT’. Now ChatGPT, and the Large Language Models (LLMs) that power it, seem to be everywhere.

So, what exactly are Large Language Models?

Well, let’s ask ChatGPT (very meta of us): 

LLMs have a wide range of applications, including language translation, content creation, text summarization, chatbots & conversational AI, speech understanding, sentiment analysis, and more.

They also exhibit a couple of key characteristics that make them particularly useful when thinking about summarization:

  • Contextual Understanding: this means they can comprehend the meaning of a word or phrase based on the surrounding context.

  • Creative Text Generation: LLMs can generate coherent and contextually relevant text, making them capable of tasks like writing articles, stories, code, and even engaging in conversations. 

This makes them a great fit for taking a long transcript and creating a summary based on it. Rather than simply removing words until you’re left with a short (but probably stilted) summary, they can generate new sentences based on their understanding of what was said given the context.

While LLMs are clearly powerful, they do have certain limitations. One key challenge is the amount of text you can pass to them at a given time – sometimes this limit can be as low as 3,000 words. To put this into perspective, a 1-hour transcript might contain as many as 9,000 words. This seems to be a big blocker, since the aim of summarization is to take extremely long transcripts and make them more digestible.

Well, fear not.

Speechmatics’ Summarization enables you to summarize files of any duration. This is particularly useful in scenarios where you’d like to summarize day-long meetings or workshops. So, if you were on a beach, or on a yoga retreat, or simply binge-watching Ted Lasso, you will still be able to catch up on what you missed.

Our team have worked hard to incorporate the latest advancements as they emerge into our speech APIs, whilst adding functionality and removing limitations to make it as valuable as possible to our users.
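The article does not say how the input limit is lifted, so the following is not a description of Speechmatics' method. It is a minimal sketch of one common workaround: split the transcript into chunks that fit the limit, summarize each chunk, then summarize the combined partial summaries. The function names, the `summarize` callback, and the toy `first_words` summarizer are all hypothetical stand-ins for a real LLM call.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], max_words: int) -> List[List[str]]:
    """Split a list of words into consecutive chunks of at most max_words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def summarize_long_transcript(transcript: str,
                              summarize: Callable[[str], str],
                              max_words: int = 3000) -> str:
    """Summarize text of any length with a summarizer that has an input limit."""
    words = transcript.split()
    if len(words) <= max_words:
        return summarize(transcript)
    # Summarize each chunk on its own, then summarize the joined results.
    partial = [summarize(" ".join(chunk))
               for chunk in split_into_chunks(words, max_words)]
    combined = " ".join(partial)
    if len(combined.split()) >= len(words):
        # Guard against a summarizer that never shrinks its input.
        raise ValueError("summarize() must shorten its input")
    return summarize_long_transcript(combined, summarize, max_words)


def first_words(text: str, n: int = 50) -> str:
    """Toy summarizer for the demo: keeps only the first n words."""
    return " ".join(text.split()[:n])


# A 9,000-word transcript against a 3,000-word limit needs three chunks.
transcript = " ".join(f"word{i}" for i in range(9000))
print(len(split_into_chunks(transcript.split(), 3000)))  # 3
summary = summarize_long_transcript(transcript, first_words)
print(len(summary.split()))  # 50
```

Splitting on a fixed word count is the simplest choice; a real system would more likely break at speaker turns or topic changes so that no chunk starts mid-sentence.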

Contact Centers 

Summarization can give every agent a summary of customer interactions. This not only reduces admin time but also allows other agents to review previous conversations quickly, focusing on dispute resolution and customer experience rather than retreading old conversations (and adding to frustration). Summaries can also be used to automate tasks, and by supervisors and sales enablement teams.

Virtual Meetings

Enhance team communication, whilst also saving time, with accurate notes presented in an easy-to-digest and shareable way. Everyone interested can stay informed, even if they didn’t attend the meeting in question.

Media & Podcasts 

Provide key takeaways and highlights for all content created, which can be used for descriptions, recaps, and to make content searchable. Viewers who missed content can stay in the loop and engage with content even when they are tight on time.

The TL;DR of all this? Try Summarization now! 

Summarization has already launched in our Portal – you can create a Portal account for free, right now, and start generating useful summaries to your heart’s content (well, up to 8 hours per month for free).

The amount of content being created isn’t going down, and won’t. LLMs and APIs like Speechmatics give you a powerful new tool to help improve productivity, increase collaboration, and make the most of the time you have. You might even be able to convince your friends that you’re an expert coffee grinder, even if you didn’t make it through those 90 minutes.
