Text-to-speech API with voice cloning
For voice cloning, ElevenLabs is the most established option. It has the largest voice library and the most mature instant and professional cloning workflows, and it supports streaming. Cartesia and MiniMax also offer cloning. Google Gemini, OpenAI's gpt-4o-mini-tts, and Deepgram Aura-2 do not. Pricing and quality differ by tier, so check each provider's row in the table. All figures are rough estimates compiled from provider documentation and public benchmarks as of June 2026.
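The capability split above amounts to a simple filter over the offerings. The following is a minimal sketch that assumes the capability flags transcribed from the table are current. `TtsOffering` and `supporting` are hypothetical names, and the sketch makes no provider SDK calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsOffering:
    """One row of the comparison table (hypothetical container type)."""
    provider: str
    offering: str
    capabilities: frozenset[str]


# Capability flags transcribed from the "Provider offerings" table (June 2026 snapshot).
OFFERINGS = [
    TtsOffering("ElevenLabs", "Multilingual v2 / Eleven v3", frozenset({"streaming", "cloning"})),
    TtsOffering("Cartesia", "Sonic 3 / Sonic 3.5", frozenset({"streaming", "cloning"})),
    TtsOffering("Google", "Gemini 3.1 Flash TTS", frozenset({"streaming"})),
    TtsOffering("OpenAI", "gpt-4o-mini-tts", frozenset({"streaming"})),
    TtsOffering("MiniMax", "Speech 2.5 Turbo", frozenset({"streaming", "cloning"})),
    TtsOffering("Deepgram", "Aura-2", frozenset({"streaming"})),
]


def supporting(required: set[str]) -> list[str]:
    """Return providers whose offering lists every required capability, in table order."""
    return [o.provider for o in OFFERINGS if required <= o.capabilities]


print(supporting({"cloning", "streaming"}))  # ['ElevenLabs', 'Cartesia', 'MiniMax']
```

Storing capabilities as sets makes "supports all of these" a subset check. As a result, the same helper also answers other queries, such as streaming-only.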
Recommended pick (default and for voice cloning): ElevenLabs Multilingual v2 / Eleven v3
Provider offerings
| Offering | Provider | Price ($/1M chars) | Arena Elo | TTFA (time to first audio, ms) | Languages | Capabilities |
|---|---|---|---|---|---|---|
| ElevenLabs Multilingual v2 / Eleven v3 | ElevenLabs | 100 | — | 264 | 32 | streaming, cloning |
| Cartesia Sonic 3 / Sonic 3.5 | Cartesia | 39* | 1203 | 188 | 42 | streaming, cloning |
| Google Gemini 3.1 Flash TTS | Google | 12* | 1214 | — | — | streaming |
| OpenAI gpt-4o-mini-tts | OpenAI | 15* | — | — | 50 | streaming |
| MiniMax Speech 2.5 Turbo | MiniMax | 78 | — | — | — | streaming, cloning |
| Deepgram Aura-2 | Deepgram | 30 | — | 313 | 7 | streaming |
Prices marked * are token- or credit-priced. For these offerings the headline figure understates the real per-character cost, so they are excluded from the cheapest-price ranking.
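The footnote's ranking rule can be sketched in a few lines, together with a per-job cost estimate. The sketch assumes flat linear pricing at the headline rate, with no tier discounts, minimums, or plan quotas, even though the intro notes that pricing varies by tier. `PRICES`, `estimated_cost`, and `cheapest_ranking` are hypothetical names.

```python
# Headline prices in USD per 1M characters, transcribed from the table.
# The boolean marks the asterisked token-/credit-priced rows.
PRICES = {
    "ElevenLabs": (100.0, False),
    "Cartesia": (39.0, True),
    "Google": (12.0, True),
    "OpenAI": (15.0, True),
    "MiniMax": (78.0, False),
    "Deepgram": (30.0, False),
}


def estimated_cost(provider: str, characters: int) -> float:
    """Linear estimate in USD. Assumption: flat per-character pricing at the headline rate."""
    price_per_million, _token_priced = PRICES[provider]
    return characters / 1_000_000 * price_per_million


def cheapest_ranking() -> list[str]:
    """Rank only character-priced offerings, cheapest first, skipping asterisked rows."""
    eligible = [(price, name) for name, (price, token_priced) in PRICES.items() if not token_priced]
    return [name for _, name in sorted(eligible)]


print(cheapest_ranking())                            # ['Deepgram', 'MiniMax', 'ElevenLabs']
print(f"{estimated_cost('ElevenLabs', 250_000):.2f}")  # 25.00
```

With these numbers, 250,000 characters on ElevenLabs comes to about $25 at the headline rate. Among the character-priced offerings that support cloning, MiniMax ($78) is cheaper than ElevenLabs ($100). Cartesia is left out of the ranking because its price is credit-based.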
Sources
- Artificial Analysis: Text to Speech Leaderboard (Speech Arena, blind-vote Elo), 2026-06-22
- Coval: Best Text-to-Speech Providers in 2026 (independent TTFA/TTFB benchmark, captured 2026-05-04), 2026-06-01
- ElevenLabs: API Pricing, 2026-06-22
- Cartesia: Pricing, 2026-06-22
- Google: Gemini Developer API Pricing, 2026-06-22
- OpenAI: API Pricing, 2026-06-22
- MiniMax: Product Pricing (API docs), 2026-06-22
- Deepgram: Pricing (Aura-2 TTS), 2026-06-22