Light estimate. Last updated 2026-06-22

Text-to-speech API with voice cloning

For voice cloning, ElevenLabs is the most established option: it has the largest voice library, the most mature instant and professional cloning workflows, and streaming support. Cartesia and MiniMax also offer cloning; Google Gemini and OpenAI's gpt-4o-mini-tts do not. Pricing and quality differ by tier, so check each provider's row. These figures are a light estimate drawn from provider documentation and public benchmarks as of June 2026.

Default: ElevenLabs Multilingual v2 / Eleven v3
Voice cloning: ElevenLabs Multilingual v2 / Eleven v3
Provider offerings compared on Price, ELO, TTFA, Langs and capabilities
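| Offering | Provider | Price ($/1M chars) | ELO | TTFA | Langs | Capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| ElevenLabs Multilingual v2 / Eleven v3 | ElevenLabs | 100 | | 264 ms | 32 | streaming, cloning |
| Cartesia Sonic 3 / Sonic 3.5 | Cartesia | 39* | 1203 | 188 ms | 42 | streaming, cloning |
| Google Gemini 3.1 Flash TTS | Google | 12* | 1214 | | | streaming |
| OpenAI gpt-4o-mini-tts | OpenAI | 15* | | | 50 | streaming |
| MiniMax Speech 2.5 Turbo | MiniMax | 78 | | | | streaming, cloning |
| Deepgram Aura-2 | Deepgram | 30 | | 313 ms | 7 | streaming |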

* Token- or credit-priced: the headline price understates the real per-unit cost, so these offerings are excluded from the cheapest ranking.
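
The ranking rule in the footnote and the $/1M chars unit can be illustrated with a short sketch. It is only a rough illustration, not how this page computes its ranking: the `Offering` class, the `headline_cost` and `cheapest_ranking` helpers, and the field names are all hypothetical, and the prices are the headline figures as read from a few rows of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_million: float   # headline price in $ per 1M characters
    headline_understated: bool  # True for rows marked * in the table
    cloning: bool               # True if the row lists "cloning"


# A few rows copied from the table above (headline prices only).
OFFERINGS = [
    Offering("ElevenLabs Multilingual v2 / Eleven v3", 100, False, True),
    Offering("Cartesia Sonic 3 / Sonic 3.5", 39, True, True),
    Offering("Google Gemini 3.1 Flash TTS", 12, True, False),
    Offering("OpenAI gpt-4o-mini-tts", 15, True, False),
    Offering("Deepgram Aura-2", 30, False, False),
]


def headline_cost(chars: int, offering: Offering) -> float:
    """Headline cost in dollars for synthesizing `chars` characters."""
    return chars / 1_000_000 * offering.price_per_million


def cheapest_ranking(offerings, require_cloning=False):
    """Sort by headline price, skipping rows whose headline understates cost."""
    eligible = [
        o for o in offerings
        if not o.headline_understated and (o.cloning or not require_cloning)
    ]
    return sorted(eligible, key=lambda o: o.price_per_million)


if __name__ == "__main__":
    for o in cheapest_ranking(OFFERINGS):
        # Cost of a 250,000-character job at the headline price.
        print(f"{o.name}: ${headline_cost(250_000, o):.2f}")
```

With `require_cloning=True`, only offerings that list cloning and carry no asterisk remain in the ranking. Asterisked rows are dropped before sorting rather than sorted last, which mirrors the footnote's wording that they are excluded.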