Light estimate. Last updated 2026-06-19

Most accurate transcription API for English

For English audio, ElevenLabs Scribe v2 leads the Artificial Analysis leaderboard at 2.2% WER (word error rate). Because that benchmark is English-leaning, the score is most representative for English, which makes this a firmer call than for other languages. Google Gemini 3 Flash (2.9%) is close but is a general multimodal model; AssemblyAI Universal-3 Pro (3.1%) trails slightly while costing less ($3.50 versus $3.67 per 1000 minutes). This remains a light estimate aggregated from public benchmarks, not a first-hand test.
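For context on the metric, WER counts the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. The sketch below is a minimal Python illustration, assuming plain whitespace tokenization. The function name `word_error_rate` is hypothetical, and the leaderboard's own text normalization (casing, punctuation, number formatting) is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word and one dropped word against a six-word reference.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{score:.1%}")
```

On this toy pair the two errors over six reference words give a WER of 2/6. A 2.2% score corresponds to roughly 22 such errors per 1000 reference words.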

Default: ElevenLabs Scribe v2
Highest accuracy: ElevenLabs Scribe v2
Provider offerings compared on price, WER, languages, latency, and capabilities

| Offering | Provider | Price ($/1000 min) | WER | Langs | Latency | Capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| ElevenLabs Scribe v2 | ElevenLabs | 3.67 | 2.2% | 90 | 150 ms | diarization, timestamps, vocab |
| AssemblyAI Universal-3 Pro | AssemblyAI | 3.5 | 3.1% | 99 | 150 ms | diarization, timestamps, vocab |
| Deepgram Nova-3 | Deepgram | 4.3 | 5.2% | 50 | 300 ms | diarization, timestamps, vocab |
| Speechmatics Enhanced | Speechmatics | 6.7 | 4% | 70 | 500 ms | diarization, timestamps, vocab |
| OpenAI gpt-4o-transcribe | OpenAI | 6 | 4% | | | |
| Google Gemini 3 Flash | Google | 1.92* | 2.9% | | | |

* Token- or credit-priced: the headline price understates the real per-unit cost, so this offering is excluded from the cheapest ranking.
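The selection logic implied by the table and its footnote can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the page's actual ranking code: the `Offering` class, its field names, and both helper functions are hypothetical, and the numbers are transcribed from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1000_min: float  # USD headline price, as listed in the table
    wer_percent: float         # word error rate, as listed in the table
    token_priced: bool = False # True for the row marked with "*"


# Values transcribed from the comparison table (hypothetical data structure).
OFFERINGS = [
    Offering("ElevenLabs Scribe v2", 3.67, 2.2),
    Offering("AssemblyAI Universal-3 Pro", 3.5, 3.1),
    Offering("Deepgram Nova-3", 4.3, 5.2),
    Offering("Speechmatics Enhanced", 6.7, 4.0),
    Offering("OpenAI gpt-4o-transcribe", 6.0, 4.0),
    Offering("Google Gemini 3 Flash", 1.92, 2.9, token_priced=True),
]


def most_accurate(offerings: list[Offering]) -> Offering:
    """Pick the offering with the lowest WER."""
    return min(offerings, key=lambda o: o.wer_percent)


def cheapest(offerings: list[Offering]) -> Offering:
    """Pick the lowest headline price, skipping token- or credit-priced rows
    because their headline figure understates the real cost."""
    comparable = [o for o in offerings if not o.token_priced]
    return min(comparable, key=lambda o: o.price_per_1000_min)


print(most_accurate(OFFERINGS).name)
print(cheapest(OFFERINGS).name)
```

With these figures the accuracy pick is ElevenLabs Scribe v2, matching the default above. Filtering out the token-priced row before taking the minimum price is what keeps Google Gemini 3 Flash, despite its 1.92 headline, from winning the cheapest slot.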