Hear Speechmatics' Text to Speech in action
Why choose Speechmatics' Text to Speech?
Scalable pricing → $0.011 per 1k characters keeps costs low.
Sub-150ms latency → ensures smooth, natural conversations.
Enterprise-ready deployment → cloud, on-prem, or hybrid.
Unified stack → best-in-class STT and TTS from one trusted provider.
Proven expertise → decades of leadership in enterprise speech.
Built for developers, trusted by enterprise
Crystal clear, human-like English voices today, with more languages, accents, and styles coming soon.
Consistent audio generation optimized for real-time voice agents, assistants, and other interactive apps.
Built with security and compliance in mind: sensitive data stays protected across industries such as finance and healthcare.
Just $0.011 per 1k characters, making Speechmatics one of the most affordable TTS options available.
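To make the pricing concrete, here is a minimal sketch of the arithmetic in Python. It assumes billing is strictly proportional to character count at the stated $0.011 per 1k characters, with no minimums, tiers, or rounding rules; those are assumptions, and the function name `estimate_cost` is illustrative only.

```python
from decimal import Decimal

# Price stated on this page: $0.011 per 1,000 characters (USD).
PRICE_PER_1K_CHARS = Decimal("0.011")


def estimate_cost(char_count: int) -> Decimal:
    """Estimate synthesis cost in USD for a given number of characters.

    Assumption: cost scales linearly with characters; any minimum charge,
    volume tier, or rounding rule is not modelled here.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return Decimal(char_count) / Decimal(1000) * PRICE_PER_1K_CHARS


if __name__ == "__main__":
    # One million characters at the stated rate.
    print(f"${estimate_cost(1_000_000):.2f}")  # prints $11.00
```

Under these assumptions, one million characters of text comes to $11.00.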
Text to Speech FAQs
What languages do you support?
We support English (both British and American), and we plan to launch additional languages in 2025.
Can I control voice speed, pitch, or emphasis?
Not yet. The API outputs natural speech with prosody driven by the text content. Fine-grained voice control features may be added in future releases.
How much latency should I expect?
The initial audio chunk typically returns in less than 200ms, with subsequent audio chunks returning faster than real time.
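If you want to check these figures in your own setup, a rough sketch like the one below measures time to first chunk and whether generation keeps ahead of playback. It works on any iterable of audio byte chunks. The audio format (16 kHz, 16-bit mono PCM), the function name `measure_stream`, and the `fake_stream` stand-in are all assumptions for illustration; they are not part of the Speechmatics API.

```python
import time
from typing import Dict, Iterable, Iterator, Optional


def measure_stream(
    chunks: Iterable[bytes],
    sample_rate: int = 16000,   # assumed sample rate in Hz
    bytes_per_sample: int = 2,  # assumed 16-bit mono PCM
) -> Dict[str, Optional[float]]:
    """Time a chunked audio stream as it is consumed."""
    start = time.perf_counter()
    first_chunk_s: Optional[float] = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_s is None:
            # Latency until the first audio bytes are available.
            first_chunk_s = time.perf_counter() - start
        total_bytes += len(chunk)
    elapsed_s = time.perf_counter() - start
    audio_s = total_bytes / (sample_rate * bytes_per_sample)
    return {
        "first_chunk_s": first_chunk_s,
        "elapsed_s": elapsed_s,
        "audio_s": audio_s,
        # Below 1.0 means audio arrives faster than it plays back.
        "real_time_factor": (elapsed_s / audio_s) if audio_s else None,
    }


def fake_stream(n_chunks: int = 5, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a real response: 0.1 s of silence per chunk."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield bytes(3200)  # 3200 bytes = 0.1 s at 16 kHz, 16-bit mono


if __name__ == "__main__":
    print(measure_stream(fake_stream()))
```

To measure a real request, replace `fake_stream()` with the chunk iterator from your HTTP client and start the timer just before the request is sent, so that network round-trip time is included.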
Is there a streaming API for real-time generation?
The API supports streaming audio output (you can play audio as it arrives), but not full bidirectional streaming. We plan to add support for this in the future.
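Playing audio as it arrives comes down to reading the response in small chunks and handing each one to your player immediately, rather than waiting for the full body. The sketch below shows that pattern against any binary file-like object. The names `play_as_it_arrives` and `on_chunk` are illustrative, and the request itself (URL, authentication, payload) is deliberately left out; take those from the official API documentation.

```python
import io
from typing import BinaryIO, Callable


def play_as_it_arrives(
    source: BinaryIO,
    on_chunk: Callable[[bytes], None],
    chunk_size: int = 4096,
) -> int:
    """Forward audio to a player callback as soon as each chunk is read.

    `source` is any binary file-like object; `on_chunk` is whatever feeds
    your audio output (a hypothetical player, a file, a websocket, ...).
    Returns the total number of bytes forwarded.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read signals end of stream
            break
        on_chunk(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Simulated response body standing in for a real streamed reply.
    simulated = io.BytesIO(bytes(10_000))
    received = []
    n = play_as_it_arrives(simulated, received.append)
    print(n, len(received))
```

The response object returned by Python's `urllib.request.urlopen` exposes the same `read()` interface, so it can be passed as `source` once you have built the request.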
Can I deploy this in my own environment?
Yes. The Text to Speech API can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, please contact our sales team.
Resources for Text-to-Speech
Why we built our low-latency Text-to-Speech
Most TTS sounds great in demos but breaks in real conversations. We built ours for sub-150ms latency, natural voices, and global scale.
Non-English TTS still sounds like a Dalek
Why most voices sound natural in English but still robotic in other languages, and how to fix it.
Introducing real-time, speaker-aware Voice Agents with LiveKit + Speechmatics
Speechmatics brings speaker diarization to LiveKit agents, enabling them to understand not just what was said, but who said it.