Light estimate · Last updated 2026-06-22

Lowest latency text-to-speech API

In Coval's independent latency benchmark, Cartesia Sonic 3.5 has the lowest time-to-first-audio (TTFA) at about 188 ms (P50), ahead of ElevenLabs (around 264 ms) and Deepgram Aura-2 (around 313 ms). These independent P50s are materially slower than vendors' own optimized marketing claims, so trust the measured figures over the spec sheets. The figures are a light estimate captured in May 2026; providers absent from the benchmark have their TTFA left blank.
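To make the metric concrete, here is a minimal sketch of how a P50 time-to-first-audio could be measured: time each request from its start until the first non-empty audio chunk arrives, then take the median over repeated runs. The `fake_stream` generator and its delay are hypothetical stand-ins for a real streaming TTS client, and nothing here is claimed to match Coval's methodology.

```python
import statistics
import time
from typing import Callable, Iterator


def time_to_first_audio(stream_factory: Callable[[], Iterator[bytes]]) -> float:
    """Return milliseconds from request start to the first non-empty audio chunk."""
    start = time.perf_counter()
    for chunk in stream_factory():
        if chunk:  # skip empty keep-alive chunks
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def p50_ttfa(stream_factory: Callable[[], Iterator[bytes]], runs: int = 20) -> float:
    """Median (P50) TTFA in milliseconds over `runs` independent requests."""
    samples = [time_to_first_audio(stream_factory) for _ in range(runs)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call; swap in a real client."""
    time.sleep(0.02)  # simulated network and synthesis delay
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    print(f"P50 TTFA: {p50_ttfa(fake_stream, runs=5):.1f} ms")
```

The median is used rather than the mean so that a few slow outlier requests do not dominate the reported figure.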

Default: Cartesia Sonic 3 / Sonic 3.5
Fastest: Cartesia Sonic 3 / Sonic 3.5
Provider offerings compared on Price, ELO, TTFA, Langs and capabilities
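| Offering | Provider | Price ($/1M chars) | ELO | TTFA | Langs | Capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| ElevenLabs Multilingual v2 / Eleven v3 | ElevenLabs | 100 | | 264 ms | 32 | streaming, cloning |
| Cartesia Sonic 3 / Sonic 3.5 | Cartesia | 39* | 1203 | 188 ms | 42 | streaming, cloning |
| Google Gemini 3.1 Flash TTS | Google | 12* | 1214 | | | streaming |
| OpenAI gpt-4o-mini-tts | OpenAI | 15* | | | 50 | streaming |
| MiniMax Speech 2.5 Turbo | MiniMax | 78 | | | | streaming, cloning |
| Deepgram Aura-2 | Deepgram | 30 | | 313 ms | 7 | streaming |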

* Token- or credit-priced: the headline price understates the real per-unit cost, so the offering is excluded from the cheapest ranking.
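The two ranking rules described on this page (blank TTFA means "not benchmarked", and starred prices are left out of the cheapest ranking) can be sketched as follows. This is an illustrative sketch only: the `Offering` type and the `fastest` and `cheapest` helpers are hypothetical names, and the values are transcribed from the table as read here.

```python
from typing import List, NamedTuple, Optional


class Offering(NamedTuple):
    name: str
    price_usd_per_1m_chars: Optional[float]
    token_priced: bool        # True when the price carries a * in the table
    ttfa_ms: Optional[float]  # None when absent from the benchmark


# Values transcribed from the table; None marks a blank cell.
OFFERINGS = [
    Offering("ElevenLabs Multilingual v2 / Eleven v3", 100.0, False, 264.0),
    Offering("Cartesia Sonic 3 / Sonic 3.5", 39.0, True, 188.0),
    Offering("Google Gemini 3.1 Flash TTS", 12.0, True, None),
    Offering("OpenAI gpt-4o-mini-tts", 15.0, True, None),
    Offering("MiniMax Speech 2.5 Turbo", 78.0, False, None),
    Offering("Deepgram Aura-2", 30.0, False, 313.0),
]


def fastest(offerings: List[Offering]) -> Offering:
    """Lowest measured TTFA; offerings without a measurement are skipped."""
    measured = [o for o in offerings if o.ttfa_ms is not None]
    return min(measured, key=lambda o: o.ttfa_ms)


def cheapest(offerings: List[Offering]) -> Offering:
    """Lowest per-character price, excluding token- or credit-priced rows."""
    comparable = [
        o for o in offerings
        if not o.token_priced and o.price_usd_per_1m_chars is not None
    ]
    return min(comparable, key=lambda o: o.price_usd_per_1m_chars)


if __name__ == "__main__":
    print("Fastest:", fastest(OFFERINGS).name)
    print("Cheapest (comparable pricing):", cheapest(OFFERINGS).name)
```

Filtering before ranking keeps a missing measurement from being mistaken for a zero, and keeps a starred headline price from winning a comparison it cannot fairly enter.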