Lowest-latency text-to-speech API
In Coval's independent latency benchmark, Cartesia Sonic 3.5 has the lowest time-to-first-audio (TTFA) at about 188 ms (P50), ahead of ElevenLabs (about 264 ms) and Deepgram Aura-2 (about 313 ms). These independent P50 figures are materially slower than vendors' own optimized marketing claims, so trust the measured figures over the spec sheets. The numbers are a rough snapshot captured in May 2026; providers absent from the benchmark are left blank.
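To make the metric concrete, here is a minimal sketch of how a TTFA sample and its P50 could be measured against any streaming response. It is not Coval's methodology: `open_stream`, `fake_stream`, and the injectable `clock` are hypothetical names used for illustration, and the fake stream stands in for a real provider call.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return milliseconds from request start to the first non-empty chunk."""
    start = clock()
    for chunk in open_stream():
        if chunk:  # ignore empty keep-alive chunks
            return (clock() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def p50(samples_ms: list[float]) -> float:
    """Median (P50) of the collected latency samples."""
    return statistics.median(samples_ms)


def fake_stream() -> Iterator[bytes]:
    # Hypothetical stand-in for a provider's streaming response (assumption).
    time.sleep(0.01)
    yield b""
    yield b"\x00\x01"


samples = [time_to_first_audio(fake_stream) for _ in range(5)]
print(f"P50 TTFA: {p50(samples):.1f} ms")
```

Reporting the median rather than the mean keeps a single slow request from dominating the figure, which is one reason measured P50s and vendor best-case claims can differ.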
Default: Cartesia Sonic 3 / Sonic 3.5
Fastest: Cartesia Sonic 3 / Sonic 3.5
Provider offerings
| Offering | Provider | Price ($/1M chars) | ELO | TTFA (P50) | Languages | Capabilities |
|---|---|---|---|---|---|---|
| ElevenLabs Multilingual v2 / Eleven v3 | ElevenLabs | 100 | — | 264 ms | 32 | streaming, cloning |
| Cartesia Sonic 3 / Sonic 3.5 | Cartesia | 39* | 1203 | 188 ms | 42 | streaming, cloning |
| Google Gemini 3.1 Flash TTS | Google | 12* | 1214 | — | — | streaming |
| OpenAI gpt-4o-mini-tts | OpenAI | 15* | — | — | 50 | streaming |
| MiniMax Speech 2.5 Turbo | MiniMax | 78 | — | — | — | streaming, cloning |
| Deepgram Aura-2 | Deepgram | 30 | — | 313 ms | 7 | streaming |
* Token- or credit-priced: the headline figure understates the real per-unit cost, so these offerings are excluded from the cheapest ranking.
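The two rankings implied by the table can be sketched as follows. The figures are copied from the table above; the `Offering` type and the `fastest` and `cheapest_ranking` helpers are hypothetical names for illustration, and the actual ranking logic behind this page is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1m_chars: float
    token_priced: bool        # True for rows marked with * in the table
    ttfa_ms: Optional[float]  # None where the benchmark has no measurement


OFFERINGS = [
    Offering("ElevenLabs Multilingual v2 / Eleven v3", 100, False, 264),
    Offering("Cartesia Sonic 3 / Sonic 3.5", 39, True, 188),
    Offering("Google Gemini 3.1 Flash TTS", 12, True, None),
    Offering("OpenAI gpt-4o-mini-tts", 15, True, None),
    Offering("MiniMax Speech 2.5 Turbo", 78, False, None),
    Offering("Deepgram Aura-2", 30, False, 313),
]


def fastest(offerings: list[Offering]) -> Offering:
    """Lowest measured TTFA; unmeasured offerings are skipped, not treated as 0."""
    measured = [o for o in offerings if o.ttfa_ms is not None]
    return min(measured, key=lambda o: o.ttfa_ms)


def cheapest_ranking(offerings: list[Offering]) -> list[Offering]:
    """Ascending headline price, excluding token- or credit-priced rows."""
    comparable = [o for o in offerings if not o.token_priced]
    return sorted(comparable, key=lambda o: o.price_per_1m_chars)


print(fastest(OFFERINGS).name)
print([o.name for o in cheapest_ranking(OFFERINGS)])
```

Modelling a missing measurement as `None` rather than `0` matters here: a blank cell must never win the latency ranking by accident.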
Sources
- Artificial Analysis — Text to Speech Leaderboard (Speech Arena, blind-vote ELO), 2026-06-22
- Coval — Best Text-to-Speech Providers in 2026 (independent TTFA/TTFB benchmark, captured 2026-05-04), 2026-06-01
- ElevenLabs — API Pricing, 2026-06-22
- Cartesia — Pricing, 2026-06-22
- Google — Gemini Developer API Pricing, 2026-06-22
- OpenAI — API Pricing, 2026-06-22
- MiniMax — Product Pricing (API docs), 2026-06-22
- Deepgram — Pricing (Aura-2 TTS), 2026-06-22