Best transcription API with custom vocabulary
When transcripts are full of product names, medical or legal jargon, or unusual spellings, custom vocabulary lifts accuracy. Among the offerings that support phrase lists or keyterm boosting, ElevenLabs Scribe v2 leads, with AssemblyAI Universal-3 Pro, Deepgram Nova-3 and Speechmatics Enhanced close behind. OpenAI's base model only allows prompt-biasing, not a real phrase list, and Gemini has no dedicated vocabulary control. Vocabulary handling itself has not been benchmarked, so measured accuracy (WER) sets the order. Treat the ranking as a light estimate drawn from documentation.
Default: ElevenLabs Scribe v2 (with custom vocabulary)
Provider offerings
| Offering | Price ($/1000 min) | WER | Langs | Latency | Capabilities |
|---|---|---|---|---|---|
| ElevenLabs Scribe v2 | 3.67 | 2.2% | 90 | 150 ms | diarization, timestamps, vocab |
| AssemblyAI Universal-3 Pro | 3.5 | 3.1% | 99 | 150 ms | diarization, timestamps, vocab |
| Deepgram Nova-3 | 4.3 | 5.2% | 50 | 300 ms | diarization, timestamps, vocab |
| Speechmatics Enhanced | 6.7 | 4% | 70 | 500 ms | diarization, timestamps, vocab |
| OpenAI gpt-4o-transcribe | 6 | 4% | — | — | — |
| Google Gemini 3 Flash | 1.92* | 2.9% | — | — | — |
* token-/credit-priced — the headline understates real per-unit cost, so it is excluded from the cheapest ranking.
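The selection logic described here (keep only offerings with a vocabulary control, order them by WER, and leave token-priced rows out of the cheapest ranking) can be sketched in a few lines of Python. This is only an illustration over the figures in the table: the `Offering` class and both helper functions are hypothetical names, and the `supports_vocab` and `token_priced` flags are read off the Capabilities column and the asterisk, not taken from any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1000_min: float  # USD headline price, copied from the table
    wer_percent: float         # word error rate, copied from the table
    supports_vocab: bool       # "vocab" appears under Capabilities
    token_priced: bool         # row is marked with an asterisk


# Figures transcribed from the table; nothing here is measured by this script.
OFFERINGS = [
    Offering("ElevenLabs Scribe v2", 3.67, 2.2, True, False),
    Offering("AssemblyAI Universal-3 Pro", 3.5, 3.1, True, False),
    Offering("Deepgram Nova-3", 4.3, 5.2, True, False),
    Offering("Speechmatics Enhanced", 6.7, 4.0, True, False),
    Offering("OpenAI gpt-4o-transcribe", 6.0, 4.0, False, False),
    Offering("Google Gemini 3 Flash", 1.92, 2.9, False, True),
]


def vocab_shortlist(offerings):
    """Return offerings with a vocabulary control, lowest WER first."""
    with_vocab = (o for o in offerings if o.supports_vocab)
    return sorted(with_vocab, key=lambda o: o.wer_percent)


def cheapest(offerings):
    """Return the lowest headline price, skipping token-/credit-priced rows."""
    comparable = (o for o in offerings if not o.token_priced)
    return min(comparable, key=lambda o: o.price_per_1000_min)


if __name__ == "__main__":
    for o in vocab_shortlist(OFFERINGS):
        print(f"{o.name}: WER {o.wer_percent}%, ${o.price_per_1000_min}/1000 min")
    print("Cheapest comparable offering:", cheapest(OFFERINGS).name)
```

Note that a strict sort on WER places Speechmatics Enhanced (4%) ahead of Deepgram Nova-3 (5.2%), which differs from the row order in the table.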
Sources
- Artificial Analysis: Speech to Text (2026-06-19)
- Open ASR Leaderboard (2026-06-19)