Nov 17, 2023 | Read time 5 min

Transforming the spoken word into written chapters

Speechmatics can now automatically detect natural transition points in spoken content to divide files into digestible, summarized chapters.
Rohan Sarin, Product Manager (ML)

Chapter 1: What are Chapters, anyway?

“Mabel! Mabel! Mabel! Mabel! Mabel! Mabel! Mabel! Mabel!” 

The above is a chapter (yes, an entire chapter) from Ronald Firbank's novel "Inclinations". It might well be one of the strangest and shortest chapters of any novel ever written. At the other end of the spectrum, Chapter 5 of 'The Passenger' by Cormac McCarthy is 62 pages long.

So, chapters don't have a set length. What exactly are they then, and why are they useful?

"The chapter has become a way of looking at the world, a way of dividing time and, therefore, of dividing experience. Its origins date back to long before the printing press or even the bound codex, back to the emergence of prose in antiquity as both an expressive and an informational medium."

Chapters are deeply ingrained in the way we consume the written word, and they have a fascinating history (the quote above is taken from this New Yorker article, which is much more engrossing than a history of 'chapters' implies).

They can group and organize information in factual books, or mark a point of pause in fictional writing. 'Good' chapters organize content around a topic or narrative section and make the content itself easier to remember. Imagine your favorite novel written as a single, unbroken chapter. Not only would it be physically exhausting to get through, but parts of it would also be harder to place and remember. This desire for writing to be split into 'chunks' (both paragraphs and chapters) far predates the modern era of fast, consumable media, though now more than ever folks wish to be able to scan and imbibe content in digestible nuggets.

But what about the spoken word? Speeches, conversations, podcasts, lectures, interviews. Given their live, flowing nature, these are not formally organized into chapters with headings. They do have ebbs and flows, though, and move from one topic to the next, so in a sense they exist as 'chapters' in your mind when you think back on what you heard.

If you were reading through transcripts of these and turning them into books, you might naturally be inclined to break them up into the familiar chapters of the written word, giving you all the benefits of a convention that has been around longer than the printing press. Your transcript would formalize and reflect how the conversation already existed when you listened to it, and this structure would help you parse and understand the essence of the audio.

That's what we've built 😊

Chapter 2: Digestible, summarized chapters, automatically

We use best-in-class machine learning models to identify optimal places for chapter markers based on topic changes in media files. We call this Chapters, the latest Speech Intelligence capability from Speechmatics.

Speechmatics can now automatically detect natural transition points in spoken content to divide files into distinct chapters and then summarize the content within each chapter. This makes it effortless to create navigable chapters for videos, podcast episodes, audiobooks, lectures, and other long-form media. Instead of a single unbroken stream of sentences, transcripts can now be split into chapters at points where the media naturally moves from one topic to the next.

We even give those chapters headings, automatically. This removes the need to write headings manually, satisfies that natural urge to make content easier to parse, and gives you grouped information to use in other workflows.
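To make the idea of splitting "a single unbroken stream of sentences" concrete, here is a minimal Python sketch. It assumes you already have timestamped sentences and a sorted list of chapter start times; the function name, the data layout, and the sample sentences are all hypothetical and are not the Speechmatics API or its output format.

```python
from bisect import bisect_right


def group_by_chapter(sentences, chapter_starts):
    """Group (start_seconds, text) sentences under the chapter they fall in.

    chapter_starts must be non-empty and sorted ascending. A sentence belongs
    to the last chapter whose start time is <= the sentence's start time.
    """
    groups = [[] for _ in chapter_starts]
    for start, text in sentences:
        index = bisect_right(chapter_starts, start) - 1
        # Speech before the first marker is attached to the first chapter.
        index = max(index, 0)
        groups[index].append(text)
    return groups


# Hypothetical, made-up sentences purely for illustration.
sentences = [
    (0.0, "Welcome back to the show."),
    (6.5, "Today we have three stories."),
    (14.0, "First up, graphics cards."),
    (58.2, "Pricing is still unclear."),
    (100.0, "Next, laptop chips."),
]
chapter_starts = [0.0, 14.0, 100.0]

grouped = group_by_chapter(sentences, chapter_starts)
for start, texts in zip(chapter_starts, grouped):
    print(start, texts)
```

The binary search keeps the lookup cheap even for long recordings with many chapter markers.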

Take the first 5 minutes of this video from TechLinked.

As a full transcript, this could be hard to read 😴:

But, with Chapters, here’s what we get... 

(00:00:00) Apple's M3 Chip Event 

The speaker mentions Apple's event where they were expected to unveil new Macs with the M3 chip. However, the event started after they filmed the video so the speaker does not know for sure what was announced.

(00:00:14) Nvidia RTX 4080 Super GPU Leaks 

There were leaks about Nvidia's upcoming RTX 4080 Super GPU having similar specs to the regular RTX 4080. The speaker is skeptical about the value of a new 'Super' GPU at the same price as the original.

(00:01:40) Qualcomm's Snapdragon Performance Claims 

Qualcomm made big performance claims about their new Snapdragon laptop chips at an event. Detailed test results showed the new chips outperforming Intel and AMD chips in benchmarks, though it remains to be seen if they work well with Windows.

(00:03:06) Meta's Ad-Free Subscription Service 

Meta launched a paid subscription without ads for Facebook and Instagram in the EU, intended to satisfy regulators concerned about data collection. It costs €10 on desktop or €13 on mobile to cover fees charged by Google and Apple.

(00:04:21) Quick News Bits 

This section covers multiple short news topics including: Volcanic Coffee sponsor ad; ChatGPT gaining data analysis features; Microsoft partnership with iFixit for Surface device repairs; White House AI transparency executive order; Android 14 bug locking users out; Google using earbuds as heart rate monitors; revelations about common animal sounds used in media.

(00:04:55) ChatGPT's New Data Analysis Capabilities 

ChatGPT gained a new feature to analyze uploaded documents like PDFs by chatting with them. The speaker jokingly compares this to bringing an inanimate object to life.

Lovely, digestible chunks, along with timestamps (which are useful both for viewers of the content and for building workflows around the media).
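As a rough illustration of one such workflow, the Python sketch below renders chapter records in the same "(HH:MM:SS) Title" layout as the listing above. The `Chapter` class and its field names are assumptions made for this example, not the shape of the Speechmatics response; the titles and summaries are copied from two of the chapters shown above.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    # Hypothetical record layout, chosen for this sketch only.
    start_seconds: int
    title: str
    summary: str


def format_timestamp(seconds: int) -> str:
    """Turn a whole number of seconds into '(HH:MM:SS)'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"({hours:02d}:{minutes:02d}:{secs:02d})"


def render(chapters) -> str:
    """Render each chapter as a timestamped heading followed by its summary."""
    lines = []
    for chapter in chapters:
        lines.append(f"{format_timestamp(chapter.start_seconds)} {chapter.title}")
        lines.append(chapter.summary)
    return "\n".join(lines)


chapters = [
    Chapter(
        14,
        "Nvidia RTX 4080 Super GPU Leaks",
        "There were leaks about Nvidia's upcoming RTX 4080 Super GPU "
        "having similar specs to the regular RTX 4080.",
    ),
    Chapter(
        100,
        "Qualcomm's Snapdragon Performance Claims",
        "Qualcomm made big performance claims about their new Snapdragon "
        "laptop chips at an event.",
    ),
]

print(render(chapters))
```

A listing in this shape can be pasted into a video description or show notes, where many players turn the timestamps into clickable jump points.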

With this, you can split an interview into multiple themes, divide a long news highlight into different headlines, help students revise a particular section from a lecture, or share just a segment of a podcast with your users.
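Sharing a single segment comes down to knowing where each chapter starts and ends. The Python sketch below derives those ranges from the chapter start times in the example above; the helper name is hypothetical, and the 300-second total is an assumption based on "the first 5 minutes" of the video.

```python
def chapter_ranges(starts, total_duration):
    """Pair each chapter start with its end: the next start, or the file end."""
    if not starts:
        return []
    ends = list(starts[1:]) + [total_duration]
    return list(zip(starts, ends))


# Start times (in seconds) taken from the example chapters above.
starts = [0, 14, 100, 186, 261, 295]
TOTAL_SECONDS = 300  # assumption: the clip is the first 5 minutes

ranges = chapter_ranges(starts, TOTAL_SECONDS)

# The fourth chapter is "Meta's Ad-Free Subscription Service".
meta_start, meta_end = ranges[3]
print(f"Share from {meta_start}s to {meta_end}s ({meta_end - meta_start}s long)")
```

The resulting start and end offsets can be handed to whatever clipping or deep-linking tool your product already uses.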

We've designed it so that you spend less time processing and iterating on the transcript and can focus on what you do best: making that transcript useful and actionable in your product.

Besides the intuitive reasons for wanting to make content digestible, there are engagement benefits to doing so too. For EdTech companies, longer videos have higher dropout rates among students. For media distribution companies and content creators, longer videos lose viewers.

Chapters is now available through the Speechmatics API, and existing Speechmatics customers can sign up for Early Bird usage at a time-limited reduced price.

We're excited to launch Chapters alongside our other existing capabilities as part of our move into Speech Intelligence. Our goal is to make speech as useful and actionable as possible.

To end, we want to leave you with a final, profound chapter of our own:

Chapter 3

Chapters! Chapters! Chapters! Chapters! Chapters! Chapters!
