Light estimate. Last updated 2026-06-19

Best transcription API with speaker diarization

Need to know who spoke when in meetings, interviews, or multi-speaker calls? ElevenLabs Scribe v2 is the top-ranked offering with built-in diarization, and it also supports word timestamps and custom vocabulary. AssemblyAI Universal-3 Pro and Deepgram Nova-3 also return native speaker labels; OpenAI's base transcribe model and Google Gemini do not, so they need extra tooling. Diarization quality is not scored separately here, so the ordering is led by transcription accuracy rather than by diarization quality. All figures are a light estimate drawn from public docs.

Default pick: ElevenLabs Scribe v2. With diarization: ElevenLabs Scribe v2.

Provider offerings

Provider offerings compared on Price, WER, Langs, Latency and capabilities
Offering	Provider	Price ($/1000 min)	WER	Languages	Latency	Capabilities
ElevenLabs Scribe v2	ElevenLabs	3.67	2.2%	90	150 ms	diarization, timestamps, vocab
AssemblyAI Universal-3 Pro	AssemblyAI	3.5	3.1%	99	150 ms	diarization, timestamps, vocab
Deepgram Nova-3	Deepgram	4.3	5.2%	50	300 ms	diarization, timestamps, vocab
Speechmatics Enhanced	Speechmatics	6.7	4%	70	500 ms	diarization, timestamps, vocab
OpenAI gpt-4o-transcribe	OpenAI	6	4%	—	—	—
Google Gemini 3 Flash	Google	1.92*	2.9%	—	—	—

* token-/credit-priced — the headline understates real per-unit cost, so it is excluded from the cheapest ranking.
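
The filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Offering` class, its field names, and the two helper functions are hypothetical, the figures are copied from the table, and sorting purely by WER is a simplification. The page's own ordering may weigh other factors, so the sketch is not guaranteed to reproduce it row for row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    # Hypothetical record type; values are copied from the table above.
    name: str
    price_per_1000_min: float  # USD per 1000 audio minutes
    wer_pct: float             # word error rate, in percent
    diarization: bool          # native speaker labels available
    token_priced: bool = False # headline price marked with * in the table


OFFERINGS = [
    Offering("ElevenLabs Scribe v2", 3.67, 2.2, True),
    Offering("AssemblyAI Universal-3 Pro", 3.5, 3.1, True),
    Offering("Deepgram Nova-3", 4.3, 5.2, True),
    Offering("Speechmatics Enhanced", 6.7, 4.0, True),
    Offering("OpenAI gpt-4o-transcribe", 6.0, 4.0, False),
    Offering("Google Gemini 3 Flash", 1.92, 2.9, False, token_priced=True),
]


def rank_with_diarization(offerings):
    """Keep offerings with native diarization, lowest WER first (assumed rule)."""
    return sorted((o for o in offerings if o.diarization), key=lambda o: o.wer_pct)


def cheapest(offerings):
    """Lowest headline price, skipping token-/credit-priced entries per the footnote."""
    eligible = [o for o in offerings if not o.token_priced]
    return min(eligible, key=lambda o: o.price_per_1000_min)


if __name__ == "__main__":
    for o in rank_with_diarization(OFFERINGS):
        print(f"{o.name}: WER {o.wer_pct}%, ${o.price_per_1000_min}/1000 min")
    print("Cheapest (excluding token-priced):", cheapest(OFFERINGS).name)
```

Keeping the token-priced flag on the record, rather than dropping the row, lets the same data feed both the accuracy ranking and the price ranking while still honoring the footnote's exclusion.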

Sources

Artificial Analysis: Speech to Text (2026-06-19)
Open ASR Leaderboard (2026-06-19)