May 2, 2016 | Read time 2 min

Speechmatics unveils Universal Time Alignment system

Speechmatics has released Universal Time Alignment, our language-independent forced-alignment service that matches the words in a text file to their counterparts in an audio file. It works accurately and automatically, and it improves content discoverability in any language.

The R&D team at Speechmatics have used their deep learning expertise to create a highly accurate and automated system for aligning audio to text.

By synchronising audio to text, Universal Time Alignment can be used to create closed captions and subtitles, index archives and enrich human-generated transcripts with extra metadata. These are tasks that would usually be carried out laboriously by hand. In an industry where metadata and searchability are becoming increasingly crucial, time alignment offers a simple and very cost-effective way of making audio, video and text searchable in any language.

To create Universal Time Alignment we extracted elements from our modular speech recognition technology, re-engineered them for the purpose and added alignment-specific technology based on our machine learning expertise and experience. As a result we have created a system that is not only robust and accurate but, crucially, able to cope with any language in the world.
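To illustrate the captioning use case, here is a minimal Python sketch of how word-level alignment output could be turned into SubRip (SRT) captions. It does not show the Speechmatics API or its output format: the AlignedWord type, the words_to_srt function and the max_words and max_gap thresholds are hypothetical names and values chosen for the example. It assumes only that an aligner yields each word with a start and end time in seconds.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """Hypothetical container for one aligned word (not a Speechmatics type)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.8):
    """Group aligned words into caption cues and render them as SRT text.

    A new cue starts when the current cue already holds max_words words,
    or when the pause before the next word exceeds max_gap seconds.
    Both thresholds are illustrative assumptions.
    """
    cues = []
    current = []
    for word in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or long_pause):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        AlignedWord("Hello", 0.0, 0.4),
        AlignedWord("world", 0.5, 0.9),
        AlignedWord("again", 2.5, 3.0),
    ]
    print(words_to_srt(sample))
```

Splitting on pauses as well as on word count keeps each cue close to a natural phrase. A production captioning workflow would normally also limit line length and reading speed.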

“Any language in the world” was not quite specific enough, as Dr Tom Ash (Director of Speech Recognition at Speechmatics) found out: “When we told the commercial team it would work with foreign languages, we were confident that we would cope with the French and German broadcast and telephony they were intending to throw at the system. However, when they discovered that we had successfully time-aligned 14th century Italian epics and the works of Chinese poet Xu Zhimo, even they were surprised.”

Speechmatics’ Universal Time Alignment system is a game changer in an industry that has long suffered from over-promise and under-delivery. It is a big step towards bringing the efficiency savings of speech technology to more difficult audio.

There are still many cases where audio quality is too low for traditional ASR to add value or save time in the workflow. Where a human transcript has to be created, however, time alignment can now be applied to that transcript to add further value, reduce the cost of human timestamping and aid discoverability. We encourage everyone to visit our site at www.speechmatics.com/register to see for themselves how we can help content owners and transcribers extract the most value from their audio and video inventory.
