Most natural text-to-speech voice
On the Artificial Analysis Speech Arena, a blind-vote naturalness ranking, Google Gemini 3.1 Flash TTS scores highest in this set at around 1214 ELO, narrowly ahead of Cartesia Sonic 3.5 (about 1203). Most of the other flagships had no Arena score in this snapshot. The Arena re-ranks weekly, and two fetches of the leaderboard differed by more than ten points, so read this as a rank order rather than an exact gap. Treat it as a light estimate.
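To put an 11-point gap in perspective, the sketch below converts a rating difference into an implied head-to-head preference. It assumes the Arena ratings behave like standard Elo with the usual logistic curve and a 400-point scale. That is an assumption: Artificial Analysis may fit its ratings differently. The helper `elo_expected_score` is hypothetical and only illustrates the standard formula.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard
    logistic Elo model (assumed here, not confirmed for the Arena)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Approximate Arena figures quoted above; they move week to week.
gemini_flash_tts = 1214
cartesia_sonic = 1203

p = elo_expected_score(gemini_flash_tts, cartesia_sonic)
print(f"Implied preference for Gemini 3.1 Flash TTS: {p:.1%}")

# How the implied preference shifts if the 11-point gap drifts by ten points either way.
for gap in (1, 11, 21):
    print(f"gap {gap:>2}: {elo_expected_score(1200 + gap, 1200):.1%}")
```

Under this assumption an 11-point lead means listeners would prefer Gemini in about 51.6% of blind pairings, and a ten-point drift moves that only between roughly 50% and 53%. This is why the page reads the result as a rank order rather than a decisive gap.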
Most natural: Google Gemini 3.1 Flash TTS
Provider offerings
| Offering | Price ($/1M chars) | Arena ELO | TTFA (time to first audio) | Languages | Capabilities |
|---|---|---|---|---|---|
| ElevenLabs Multilingual v2 / Eleven v3 | 100 | n/a | 264 ms | 32 | streaming, cloning |
| Cartesia Sonic 3 / Sonic 3.5 | 39* | 1203 | 188 ms | 42 | streaming, cloning |
| Google Gemini 3.1 Flash TTS | 12* | 1214 | n/a | n/a | streaming |
| OpenAI gpt-4o-mini-tts | 15* | n/a | n/a | 50 | streaming |
| MiniMax Speech 2.5 Turbo | 78 | n/a | n/a | n/a | streaming, cloning |
| Deepgram Aura-2 | 30 | n/a | 313 ms | 7 | streaming |
\* Token- or credit-priced: the headline figure understates the real per-unit cost, so these offerings are excluded from the cheapest ranking.
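The exclusion rule in the footnote can be sketched as a filter followed by a sort on headline price. This is an assumption about how the cheapest ranking is built (a plain ascending sort on $/1M chars), not a description of the page's actual code; `Offering`, `cheapest_ranking`, and `estimated_cost_usd` are hypothetical names, and the prices are the headline figures from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    usd_per_million_chars: float
    token_or_credit_priced: bool  # True for rows marked "*" in the table


OFFERINGS = [
    Offering("ElevenLabs Multilingual v2 / Eleven v3", 100, False),
    Offering("Cartesia Sonic 3 / Sonic 3.5", 39, True),
    Offering("Google Gemini 3.1 Flash TTS", 12, True),
    Offering("OpenAI gpt-4o-mini-tts", 15, True),
    Offering("MiniMax Speech 2.5 Turbo", 78, False),
    Offering("Deepgram Aura-2", 30, False),
]


def cheapest_ranking(offerings: list[Offering]) -> list[Offering]:
    """Assumed rule: drop token-/credit-priced rows, then sort by headline price."""
    eligible = [o for o in offerings if not o.token_or_credit_priced]
    return sorted(eligible, key=lambda o: o.usd_per_million_chars)


def estimated_cost_usd(offering: Offering, characters: int) -> float:
    """Headline cost for a character count: price * characters / 1,000,000."""
    return offering.usd_per_million_chars * characters / 1_000_000


for o in cheapest_ranking(OFFERINGS):
    print(f"{o.name}: ${estimated_cost_usd(o, 250_000):.2f} per 250k chars")
```

Under these assumptions the ranking is Deepgram Aura-2 ($7.50 per 250k characters), then MiniMax Speech 2.5 Turbo ($19.50), then ElevenLabs ($25.00). The starred rows are filtered out rather than sorted last, because their headline numbers are not comparable per-character prices.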
Sources
- Artificial Analysis: Text to Speech Leaderboard (Speech Arena, blind-vote ELO), 2026-06-22
- Coval: Best Text-to-Speech Providers in 2026 (independent TTFA/TTFB benchmark, captured 2026-05-04), 2026-06-01
- ElevenLabs: API Pricing, 2026-06-22
- Cartesia: Pricing, 2026-06-22
- Google: Gemini Developer API Pricing, 2026-06-22
- OpenAI: API Pricing, 2026-06-22
- MiniMax: Product Pricing (API docs), 2026-06-22
- Deepgram: Pricing (Aura-2 TTS), 2026-06-22