Light estimate. Last updated 2026-06-22

Most natural text-to-speech voice

On the Artificial Analysis Speech Arena, a blind-vote naturalness ranking, Google Gemini 3.1 Flash TTS scores highest in this set at around 1214 ELO, narrowly ahead of Cartesia Sonic 3.5 (about 1203). Most other flagships had no Arena-specific score in this snapshot. The Arena re-ranks weekly, and two fetches differed by more than ten points, so read this as a rank order rather than an exact gap. Treat these figures as a light estimate.
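To put the roughly 11-point gap in perspective, here is a minimal sketch that converts a rating difference into an expected share of pairwise votes. It assumes the standard Elo logistic curve with a 400-point scale; the Arena's exact rating method is not stated here, so treat the function `elo_expected_win` and its scale as illustrative assumptions.

```python
def elo_expected_win(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head votes for A over B.

    Assumption: standard Elo logistic curve with a 400-point scale.
    The Arena may use a different rating model.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Approximate scores quoted in the text above.
gemini_flash_tts = 1214
cartesia_sonic = 1203

p = elo_expected_win(gemini_flash_tts, cartesia_sonic)
print(f"Expected vote share for the leader: {p:.3f}")  # about 0.516
```

Under that assumption, an 11-point lead corresponds to winning only about 52 of every 100 blind comparisons, which is why a ten-point swing between fetches matters more than the headline gap.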

Default: Google Gemini 3.1 Flash TTS
Most natural: Google Gemini 3.1 Flash TTS
Provider offerings compared on price, ELO, TTFA (time to first audio), supported languages (Langs), and capabilities
| Offering | Price ($/1M chars) | ELO | TTFA | Langs | Capabilities |
|---|---|---|---|---|---|
| ElevenLabs Multilingual v2 / Eleven v3 (ElevenLabs) | 100 | | 264 ms | 32 | streaming, cloning |
| Cartesia Sonic 3 / Sonic 3.5 (Cartesia) | 39* | 1203 | 188 ms | 42 | streaming, cloning |
| Google Gemini 3.1 Flash TTS (Google) | 12* | 1214 | | | streaming |
| OpenAI gpt-4o-mini-tts (OpenAI) | 15* | | | 50 | streaming |
| MiniMax Speech 2.5 Turbo (MiniMax) | 78 | | | | streaming, cloning |
| Deepgram Aura-2 (Deepgram) | 30 | | 313 ms | 7 | streaming |

* Token- or credit-priced: the headline price understates the real per-unit cost, so these offerings are excluded from the cheapest ranking.
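The exclusion rule in the footnote can be sketched as a filter followed by a sort. This is an illustration only: the `Offering` class and `cheapest_ranking` function are hypothetical names, and the prices are transcribed from the table as read in this snapshot, with the asterisk recorded as a `token_priced` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1m_chars: float  # headline price in USD per 1M characters
    token_priced: bool         # True where the table marks the price with *


# Values transcribed from the comparison table (assumed column reading).
OFFERINGS = [
    Offering("ElevenLabs Multilingual v2 / Eleven v3", 100, False),
    Offering("Cartesia Sonic 3 / Sonic 3.5", 39, True),
    Offering("Google Gemini 3.1 Flash TTS", 12, True),
    Offering("OpenAI gpt-4o-mini-tts", 15, True),
    Offering("MiniMax Speech 2.5 Turbo", 78, False),
    Offering("Deepgram Aura-2", 30, False),
]


def cheapest_ranking(offerings: list[Offering]) -> list[Offering]:
    """Rank by headline price, skipping token- or credit-priced entries."""
    eligible = [o for o in offerings if not o.token_priced]
    return sorted(eligible, key=lambda o: o.price_per_1m_chars)


ranking = cheapest_ranking(OFFERINGS)
for position, offering in enumerate(ranking, start=1):
    print(f"{position}. {offering.name}: ${offering.price_per_1m_chars}/1M chars")
```

Filtering before sorting keeps the asterisked headline prices from appearing cheaper than they would be once converted to a real per-character cost.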