Product - Real-Time

Astonishingly accurate ASR is here, in real-time.

What are you waiting for?

Precise, low-latency transcription, translation and speech capabilities, all delivered before your media even ends.

Why wait, when you can deliver incredible features now?

Live. Instant. Real-time.

Whatever you call it, you can have it. Build a new world of features and insight on top of live speech, and blow your customers' minds.

Hit the perfect balance between speed and accuracy

Receive your final transcript with a latency of between two and ten seconds.

At 10s, real-time transcripts are just 0.05% less accurate than our batch service. At 2s, your transcriptions will still be 92.6% as accurate as our batch service.

Outperforming the competition, even at low latency

Receive partial transcripts within a few hundred milliseconds.

We compared our fastest real-time transcription to our competitors’ most accurate. Our real-time transcription was 45% better than Amazon, 28% better than Google, and 14% better than Microsoft.

Low-quality, noisy audio? We hear you

We put our real-time models through rigorous testing that reflects real-world, noisy environments.

In our tests, designed to mimic a train station, a pub, a football match, and a contact center, our word error rate (WER) is uniformly low, and uniformly lower than competitor providers.

Real-time translation - in 35 languages

You’re not limited to English with Speechmatics. Transcribe and translate the languages spoken by over half the world’s population, in real-time.

From Bulgarian to German to Vietnamese, we break down language barriers so you can bring your product to the largest possible audience.

What are you waiting for?

Take a step towards seamless interaction with technology.

Build Speech Intelligence with live speech data.


How much does real-time transcription cost?

We charge per second for transcription - for a more detailed breakdown please visit our dedicated pricing page.

How quickly will I get my transcription back?

Through our partial transcription, you can start receiving transcripts within a few hundred milliseconds of the words being spoken.

As more words are spoken, we use the context to correct ambiguous words until we give our final best transcription. These finals can be returned in as little as 2 seconds, depending on your accuracy and latency requirements.
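
To make the partial-then-final flow concrete, here is a minimal Python sketch of how a client might keep a live display up to date. It is an illustration only: the message shape (a dict with hypothetical "kind" and "text" fields) is an assumption for this example and is not the actual Speechmatics message format, and clearing the partial whenever a final arrives is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class LiveTranscript:
    """Keeps committed finals plus the single most recent partial."""

    finals: list[str] = field(default_factory=list)
    partial: str = ""

    def handle(self, message: dict) -> None:
        # "kind" and "text" are hypothetical field names for this sketch.
        text = message["text"].strip()
        if message["kind"] == "partial":
            # A partial is provisional: it replaces the previous partial.
            self.partial = text
        elif message["kind"] == "final":
            # A final is committed; the partial it supersedes is dropped.
            if text:
                self.finals.append(text)
            self.partial = ""

    def display(self) -> str:
        # Committed text first, then the still-provisional tail.
        tail = [self.partial] if self.partial else []
        return " ".join(self.finals + tail)


transcript = LiveTranscript()
for msg in [
    {"kind": "partial", "text": "I scream"},
    {"kind": "partial", "text": "ice cream is"},
    {"kind": "final", "text": "Ice cream is"},
    {"kind": "partial", "text": "my favourite"},
]:
    transcript.handle(msg)

print(transcript.display())
```

The design choice here is that partials are never appended, only overwritten, so an early guess that later context corrects never lingers in the displayed text.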

How can you get real-time accuracy so close to the accuracy of transcribing a file?

At the core, the machine learning models we use are identical for batch file transcription and real-time.

This means that you get our best transcription accuracy in both modes. The small accuracy impact at lower latencies in real-time (<4s) comes from having less context from the speaker.

How many speakers can be identified in real-time?

By default, we can identify up to 50 speakers in a real-time stream. This can be increased to 100 speakers.

What’s the longest stream time you support?

We can support streams over 24 hours long!

Stream duration is effectively unlimited - in fact we've had customers with streams running for over a month.

How many concurrent real-time connections can I have?

If you sign up through our portal you get two concurrent real-time connections on the free usage tier.

On our 'Pay As You Grow' tier you can use ten concurrent streams, and for our 'Enterprise' customers we support as many connections as you require.
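
If your application opens streams on demand, it can help to enforce the connection limit on the client side so that extra streams queue rather than fail. The following is a rough Python sketch using `asyncio.Semaphore`; the `run_stream` body is a placeholder (a short sleep) standing in for a real streaming session, and all function names are hypothetical.

```python
import asyncio

# Free-tier figure quoted above; use 10 for the 'Pay As You Grow' tier.
MAX_CONCURRENT_STREAMS = 2


async def run_stream(limiter: asyncio.Semaphore, stats: dict) -> None:
    # Wait here until one of the allowed connection slots is free.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for a real streaming session
        stats["active"] -= 1


async def run_all(n_streams: int, limit: int = MAX_CONCURRENT_STREAMS) -> int:
    """Run n_streams sessions and return the peak number open at once."""
    limiter = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(run_stream(limiter, stats) for _ in range(n_streams)))
    return stats["peak"]


peak = asyncio.run(run_all(5))
print(f"peak concurrent streams: {peak}")
```

With five requested streams and a limit of two, the remaining three simply wait their turn until a slot frees up.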

Astonishingly accurate ASR is here, in real-time.

What are you waiting for?