May 2, 2016 | Read time 2 min

Speechmatics unveils Universal Time Alignment system


Speechmatics has released Universal Time Alignment, a language-independent forced-alignment service that automatically and accurately matches the words in a text file to their counterparts in an audio file, improving content discoverability in any language.

The R&D team at Speechmatics have used their deep learning expertise to create a highly accurate and automated system for aligning audio to text.

By synchronising audio to text, Universal Time Alignment can be used to create closed captions and subtitles, index archives, and enrich human-generated transcripts with metadata that would otherwise be added laboriously by hand. In an industry where metadata and searchability are becoming increasingly crucial, time alignment offers a simple and cost-effective way of making audio, video and text searchable in any language.

To create Universal Time Alignment we extracted elements from our modular speech recognition technology, re-engineered them for the purpose, and added alignment-specific technology built on our machine learning expertise. The result is a system that is not only robust and accurate but, crucially, able to cope with any language in the world.
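To illustrate one of the uses above, here is a minimal sketch of turning forced-alignment output into SubRip (SRT) subtitles. The `AlignedWord` structure (word, start time, end time in seconds) and the grouping thresholds are illustrative assumptions, not Speechmatics' actual API or output format:

```python
from dataclasses import dataclass

@dataclass
class AlignedWord:
    """One word from a hypothetical forced-alignment result."""
    word: str
    start: float  # seconds from the start of the audio
    end: float

def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_gap=0.8, max_len=42):
    """Group aligned words into SRT cues, starting a new cue after a
    long pause or when the cue text would exceed max_len characters."""
    cues, current = [], []
    for w in words:
        if current and (w.start - current[-1].end > max_gap
                        or len(" ".join(x.word for x in current + [w])) > max_len):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    blocks = []
    for i, cue in enumerate(cues, 1):
        text = " ".join(w.word for w in cue)
        span = f"{to_srt_time(cue[0].start)} --> {to_srt_time(cue[-1].end)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)

words = [AlignedWord("Hello", 0.32, 0.61), AlignedWord("world", 0.66, 1.02)]
srt = words_to_srt(words)
print(srt)
```

The same word-level timestamps could equally drive archive search (jump to the moment a phrase is spoken) rather than caption generation.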

“Any language in the world” proved to be no exaggeration, as Dr Tom Ash (Director of Speech Recognition at Speechmatics) discovered: “when we told the commercial team it would work with foreign languages, we were confident that we would cope with the French and German broadcast and telephony they were intending to throw at the system. However, when they discovered that we had successfully time-aligned 14th century Italian epics and the works of Chinese poet Xu Zhimo, even they were surprised.”

Speechmatics’ Universal Time Alignment system is a game changer in an industry that has long suffered from over-promise and under-delivery. This technology is a big step towards bringing the efficiency savings of automated speech technology to more difficult audio.

There are still many cases where audio quality is too low for traditional ASR to add value or save time in the workflow. But where a human transcript has to be created, time alignment can now be applied to that transcript to add further value, reduce the cost of manual time-stamping and aid discoverability. We encourage everyone to visit www.speechmatics.com/register to see for themselves how we can help content owners and transcribers extract the most value from their audio and video inventory.
