Sep 24, 2019 | Read time 3 min

Subtitling, Captioning and Artificial Intelligence

The production of captions and subtitles is a creative activity with a clear relationship to speech and linguistics.

The use of automated speech recognition (ASR) systems based upon ‘artificial intelligence’ (AI) has created possibilities for producing captions and subtitles that did not previously exist.

General considerations when using ASR

ASR will have a significant and positive impact on production techniques, for example by reducing costs and increasing speed. However, some negative aspects cannot be overlooked. Most importantly, ASR solutions have no true comprehension of speech, which inevitably leads to ‘non-human’ errors in their output: mistakes that a human captioner who understood the content would not make. Additionally, producing captions and subtitles involves subjective editorial decisions about the text equivalent of the speech. For example, removing repetition and redundant speech requires a comprehension of language that fully automated solutions cannot currently achieve.
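A deliberately naive rule shows why this editing step needs comprehension. The Python sketch below (`remove_immediate_repeats` is a hypothetical helper, not part of any product) collapses immediately repeated words. It fixes a simple stutter, but it also damages valid English such as "had had", and it misses a restarted phrase that is separated by punctuation:

```python
import re

# Matches a word followed by one or more immediate copies of itself,
# e.g. "to to" or "the the the". Case-insensitive, so "The the" also matches.
_REPEAT = re.compile(r"\b(\w+)(?:\s+\1\b)+", flags=re.IGNORECASE)


def remove_immediate_repeats(text: str) -> str:
    """Collapse immediately repeated words, keeping the first occurrence.

    A purely mechanical rule with no notion of meaning.
    """
    return _REPEAT.sub(r"\1", text)


print(remove_immediate_repeats("we we need to to leave"))      # we need to leave
print(remove_immediate_repeats("she had had enough"))          # she had enough (wrong: "had had" is valid)
print(remove_immediate_repeats("I think, I think we should"))  # unchanged: the restart is not caught
```

Deciding which repetitions are redundant, and which carry meaning, is exactly the judgement a rule like this cannot make.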

Despite these limitations, ASR certainly has an increasingly relevant role in caption and subtitle production, particularly where cost or production time rules out manual processes. This is arguably the case for high-volume, low-value or ephemeral content, and for live broadcasts, where ASR technology may make captioning viable that would previously have been uneconomic.

ASR systems that are based upon artificial intelligence can ‘learn’ or improve their performance based on feedback. Since ASR systems tend to make predictable and repeatable errors, it is often possible to ‘train’ an ASR system to avoid similar errors in the future, leading to an improvement in performance over time.
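At its simplest, exploiting repeatable errors can be illustrated as a correction memory: record the word substitutions editors make, and once the same fix has been seen often enough, apply it automatically. This is a hypothetical sketch (the `CorrectionMemory` class, its `min_count` threshold and the "scream" for "Screen" error are assumptions for illustration), and it is a post-processing layer rather than retraining of the underlying recognition model:

```python
from __future__ import annotations

from collections import Counter


class CorrectionMemory:
    """Hypothetical correction layer that learns from editors' fixes.

    It does not retrain the ASR model; it remembers word substitutions
    that editors make repeatedly and applies them to new transcripts.
    """

    def __init__(self, min_count: int = 3) -> None:
        self.min_count = min_count  # assumed: corrections needed before a fix is trusted
        self.counts: Counter[tuple[str, str]] = Counter()

    def record(self, asr_word: str, corrected_word: str) -> None:
        """Log one editor correction (ASR output -> corrected text)."""
        if asr_word != corrected_word:
            self.counts[(asr_word.lower(), corrected_word)] += 1

    def rules(self) -> dict[str, str]:
        """Return the substitutions seen at least min_count times."""
        learned: dict[str, str] = {}
        # most_common() lists the most frequent fix first, so if a word has
        # been corrected in different ways, the most frequent fix is kept.
        for (wrong, right), n in self.counts.most_common():
            if n >= self.min_count and wrong not in learned:
                learned[wrong] = right
        return learned

    def apply(self, words: list[str]) -> list[str]:
        """Apply learned substitutions to a new ASR transcript, word by word."""
        learned = self.rules()
        return [learned.get(w.lower(), w) for w in words]


memory = CorrectionMemory(min_count=2)
memory.record("scream", "Screen")
memory.record("scream", "Screen")
memory.record("to", "too")  # one-off edit: not trusted yet
print(memory.apply(["welcome", "to", "scream"]))  # ['welcome', 'to', 'Screen']
```

Requiring several identical corrections before a rule is trusted relies on errors being repeatable, while keeping one-off, context-specific edits out of the rule set.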

Speech recognition is only one part of caption and subtitle production. Other, less obvious aspects are also candidates for artificial intelligence techniques. For example, using AI to automatically identify who is speaking, or the topic of the speech, could directly improve ASR output. This information could also be used when generating the resulting subtitles or captions, for example to influence the style or the presentation speed of the text. Automated systems could also check whether the captions and subtitles in a broadcast correctly match the speech in the video and are correctly timed.
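The last of these checks can be sketched as a comparison between each caption and time-stamped ASR words for the same programme. In this hypothetical Python sketch, `Timed`, `check_caption` and all thresholds are illustrative assumptions rather than broadcast standards: a caption fails if too few of its words were heard nearby, or if it starts too far from the earliest matching spoken word.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Timed:
    """Text with start/end times in seconds: a caption or a single ASR word."""
    text: str
    start: float
    end: float


def _tokens(text: str) -> list[str]:
    # Lower-case and turn punctuation into spaces so "Hello," matches "hello".
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def check_caption(caption: Timed, asr_words: list[Timed],
                  min_match: float = 0.6, max_offset: float = 0.5,
                  search_margin: float = 3.0) -> list[str]:
    """Return the problems found for one caption; an empty list means it passed.

    All thresholds are illustrative assumptions, not broadcast standards.
    """
    caption_tokens = _tokens(caption.text)
    if not caption_tokens:
        return ["empty caption"]

    # Only consider ASR words spoken near the caption's display window.
    nearby = [w for w in asr_words
              if w.end >= caption.start - search_margin
              and w.start <= caption.end + search_margin]

    # Text check: what fraction of the caption's words did ASR hear nearby?
    heard = {t for w in nearby for t in _tokens(w.text)}
    match = sum(t in heard for t in caption_tokens) / len(caption_tokens)
    if match < min_match:
        return [f"text mismatch: {match:.0%} of words heard"]

    # Timing check: compare the caption start with the earliest matching word.
    wanted = set(caption_tokens)
    matched = [w for w in nearby if any(t in wanted for t in _tokens(w.text))]
    if not matched:
        return []  # nothing to time against
    offset = caption.start - min(w.start for w in matched)
    if abs(offset) > max_offset:
        return [f"timing offset: caption starts {offset:+.1f}s from speech"]
    return []


words = [Timed("hello", 10.0, 10.4), Timed("world", 10.5, 10.9)]
print(check_caption(Timed("Hello, world!", 10.1, 12.0), words))  # []
print(check_caption(Timed("Hello, world!", 11.5, 13.0), words))  # ['timing offset: caption starts +1.5s from speech']
print(check_caption(Timed("Good morning", 10.1, 12.0), words))   # ['text mismatch: 0% of words heard']
```

Matching on shared words is a crude heuristic: a common word such as "the" spoken just before the caption can pull the timing estimate earlier, so a real checker would want a proper sequence alignment between caption text and ASR output.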

As automated systems inevitably improve, it is clear that they will have increased utility in the caption and subtitle creation process. There are also other applications for ASR and AI in the quality control, monitoring and archiving workflows, where cost is a significant factor. As a company, Screen Subtitling Systems are actively embracing artificial intelligence-based solutions to support and enable a wider range of workflows and to improve the quality and quantity of subtitle and caption provision in the future.

John Birch, Strategy and Business Development Manager, Screen Subtitling Systems

About Screen Systems

Screen was founded as Screen Electronics by Laurie Atkin in 1976, and pioneered the first ever electronic subtitling system, providing the first digital character generator to the BBC. Throughout the 1970s and 80s, Screen continued to lead the market, developing a number of new subtitling technologies including fully automated transmission using timecode, the first PC-based subtitle preparation system and the first multi-channel, multi-language subtitling systems.

In 2001 Screen took subtitling technologies into the 21st Century with the Polistream transmission and Poliscript preparation products. In 2011 it diversified by acquiring SysMedia Ltd, a leader in subtitle preparation and in teletext content production and publishing systems. Then in 2018, the company itself was acquired by BroadStream Holdings Ltd (BHL), adding Integrated Playout to its capabilities via parent company BroadStream Solutions.

Screen is now the world's number 1 provider of subtitling production and delivery systems. With its broader product portfolio, it now builds on that success with products that enhance broadcast content with value-added information services across multiple platforms and devices.
