Light estimate. Last updated 2026-06-22

Lowest-latency text-to-speech API

In Coval's independent latency benchmark, Cartesia Sonic 3.5 has the lowest time-to-first-audio (TTFA) at about 188 ms (P50), ahead of ElevenLabs (around 264 ms) and Deepgram Aura-2 (around 313 ms). These independent P50 figures run materially slower than vendors' own optimized marketing claims, so trust the measured figures over the spec sheets. This is a light estimate captured in May 2026; providers absent from the benchmark are left blank.
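To make the metric concrete, here is a minimal sketch of how a TTFA sample and its P50 could be computed. It is not Coval's methodology, which this page does not describe. The names `time_to_first_audio_ms`, `p50_ms` and `fake_stream` are hypothetical, and the stream is a stand-in for any provider's streaming response rather than a real SDK call.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio_ms(
    start_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Milliseconds from issuing the request to the first non-empty chunk."""
    started = clock()
    for chunk in start_stream():
        if chunk:  # ignore empty keep-alive chunks
            return (clock() - started) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def p50_ms(samples: list[float]) -> float:
    """P50 (median) of a list of latency samples in milliseconds."""
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    # Hypothetical stand-in for a provider's streaming TTS response.
    time.sleep(0.01)  # simulated network and synthesis delay
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    samples = [time_to_first_audio_ms(fake_stream) for _ in range(5)]
    print(f"P50 TTFA: {p50_ms(samples):.1f} ms")
```

Reporting the median rather than the mean keeps a single slow request from dominating the figure, which is one reason P50 is a common headline number for latency.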

Default: Cartesia Sonic 3 / Sonic 3.5
Fastest: Cartesia Sonic 3 / Sonic 3.5

Provider offerings

Provider offerings compared on Price, ELO, TTFA, Langs and capabilities
Offering	Provider	Price ($/1M chars)	ELO	TTFA (P50)	Langs	Capabilities
ElevenLabs Multilingual v2 / Eleven v3	ElevenLabs	100	—	264 ms	32	streaming, cloning
Cartesia Sonic 3 / Sonic 3.5	Cartesia	39*	1203	188 ms	42	streaming, cloning
Google Gemini 3.1 Flash TTS	Google	12*	1214	—	—	streaming
OpenAI gpt-4o-mini-tts	OpenAI	15*	—	—	50	streaming
MiniMax Speech 2.5 Turbo	MiniMax	78	—	—	—	streaming, cloning
Deepgram Aura-2	Deepgram	30	—	313 ms	7	streaming

* Token- or credit-priced: the headline price understates the real per-unit cost, so the offering is excluded from the cheapest ranking.
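The selection rules implied by the table and its footnote can be sketched as follows. This is an illustration under stated assumptions, not the page's actual ranking code: the `Offering` class and the `fastest` and `cheapest` helpers are hypothetical names, and the values are copied from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1m_chars: float
    token_priced: bool       # True where the table marks the price with *
    ttfa_ms: Optional[int]   # None where the benchmark has no figure


# Values copied from the table; blank cells become None.
OFFERINGS = [
    Offering("ElevenLabs Multilingual v2 / Eleven v3", 100, False, 264),
    Offering("Cartesia Sonic 3 / Sonic 3.5", 39, True, 188),
    Offering("Google Gemini 3.1 Flash TTS", 12, True, None),
    Offering("OpenAI gpt-4o-mini-tts", 15, True, None),
    Offering("MiniMax Speech 2.5 Turbo", 78, False, None),
    Offering("Deepgram Aura-2", 30, False, 313),
]


def fastest(offerings: list[Offering]) -> Offering:
    """Lowest TTFA among offerings that have a measured figure."""
    measured = [o for o in offerings if o.ttfa_ms is not None]
    return min(measured, key=lambda o: o.ttfa_ms)


def cheapest(offerings: list[Offering]) -> Offering:
    """Lowest headline price, skipping token- or credit-priced entries."""
    eligible = [o for o in offerings if not o.token_priced]
    return min(eligible, key=lambda o: o.price_per_1m_chars)


if __name__ == "__main__":
    print("Fastest:", fastest(OFFERINGS).name)
    print("Cheapest (per-character pricing only):", cheapest(OFFERINGS).name)
```

Treating a missing TTFA as "not ranked" rather than as zero keeps unbenchmarked providers from appearing fastest by accident, and the footnote's exclusion keeps token-priced headline figures out of the price comparison.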

Sources

Artificial Analysis — Text to Speech Leaderboard (Speech Arena, blind-vote ELO), 2026-06-22
Coval — Best Text-to-Speech Providers in 2026 (independent TTFA/TTFB benchmark, captured 2026-05-04), 2026-06-01
ElevenLabs — API Pricing, 2026-06-22
Cartesia — Pricing, 2026-06-22
Google — Gemini Developer API Pricing, 2026-06-22
OpenAI — API Pricing, 2026-06-22
MiniMax — Product Pricing (API docs), 2026-06-22
Deepgram — Pricing (Aura-2 TTS), 2026-06-22