Light estimate. Last updated 2026-06-19

Best transcription API with word-level timestamps

Subtitling, captioning and audio-text alignment all need precise word-level timing. Among the services that return per-word timestamps natively, ElevenLabs Scribe v2 ranks highest, and it also offers diarization and custom vocabulary. AssemblyAI Universal-3 Pro, Deepgram Nova-3 and Speechmatics Enhanced emit word-level timestamps too; OpenAI's base model (gpt-4o-transcribe) and Gemini return plain text only. Timestamp granularity is not independently benchmarked, so the ranking follows overall accuracy (WER). This is a light estimate aggregated from provider documentation, with attribution.
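As an illustration of why per-word timing matters for subtitling, here is a minimal sketch that groups word-level timestamps into SRT cues. The input shape (a list of dicts with `text`, `start` and `end` in seconds) is an assumption made for this example and is not any provider's actual response schema; the function names and the `max_words` and `max_gap` thresholds are likewise hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical word record: {"text": str, "start": seconds, "end": seconds}.
Word = Dict[str, Union[str, float]]


def format_srt_time(seconds: float) -> str:
    """Render a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7, max_gap: float = 0.8) -> str:
    """Group words into cues and return them as SRT text.

    A new cue starts when the current one already holds max_words words
    or when the pause since the previous word exceeds max_gap seconds.
    """
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            pause = float(word["start"]) - float(current[-1]["end"])
            if len(current) >= max_words or pause > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(float(cue[0]["start"]))
        end = format_srt_time(float(cue[-1]["end"]))
        text = " ".join(str(w["text"]) for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

With plain-text output only, the cue boundaries above would have to be recovered by a separate forced-alignment step, which is why native word timestamps are the filter applied on this page.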

Default pick: ElevenLabs Scribe v2. With timestamps: ElevenLabs Scribe v2.
Provider offerings compared on price, WER, languages, latency and capabilities

| Offering | Provider | Price ($/1000 min) | WER | Langs | Latency | Capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| ElevenLabs Scribe v2 | ElevenLabs | 3.67 | 2.2% | 90 | 150 ms | diarization, timestamps, vocab |
| AssemblyAI Universal-3 Pro | AssemblyAI | 3.5 | 3.1% | 99 | 150 ms | diarization, timestamps, vocab |
| Deepgram Nova-3 | Deepgram | 4.3 | 5.2% | 50 | 300 ms | diarization, timestamps, vocab |
| Speechmatics Enhanced | Speechmatics | 6.7 | 4% | 70 | 500 ms | diarization, timestamps, vocab |
| OpenAI gpt-4o-transcribe | OpenAI | 6 | 4% | | | |
| Google Gemini 3 Flash | Google | 1.92* | 2.9% | | | |

* Token- or credit-priced: the headline figure understates the real per-unit cost, so this offering is excluded from the cheapest ranking.
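The selection logic described on this page (filter on native timestamps, rank by WER, and leave token- or credit-priced offerings out of the cheapest comparison) can be sketched as follows. The `Offering` class and the function names are hypothetical, and the figures are the values as read from the comparison table above; treat them as a light estimate rather than authoritative pricing.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

FULL = frozenset({"diarization", "timestamps", "vocab"})


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1000_min: float  # headline price in USD per 1000 minutes
    wer_percent: float
    capabilities: FrozenSet[str] = frozenset()
    token_priced: bool = False  # marked with * in the table


OFFERINGS: List[Offering] = [
    Offering("ElevenLabs Scribe v2", 3.67, 2.2, FULL),
    Offering("AssemblyAI Universal-3 Pro", 3.5, 3.1, FULL),
    Offering("Deepgram Nova-3", 4.3, 5.2, FULL),
    Offering("Speechmatics Enhanced", 6.7, 4.0, FULL),
    Offering("OpenAI gpt-4o-transcribe", 6.0, 4.0),
    Offering("Google Gemini 3 Flash", 1.92, 2.9, token_priced=True),
]


def rank_by_accuracy(offerings: List[Offering], required: str = "timestamps") -> List[Offering]:
    """Keep offerings with the required capability, lowest WER first."""
    eligible = [o for o in offerings if required in o.capabilities]
    return sorted(eligible, key=lambda o: o.wer_percent)


def cheapest(offerings: List[Offering]) -> Offering:
    """Lowest headline price, skipping token- or credit-priced offerings."""
    comparable = [o for o in offerings if not o.token_priced]
    return min(comparable, key=lambda o: o.price_per_1000_min)


def price_per_hour(offering: Offering) -> float:
    """Convert the $/1000 min headline price to USD per audio hour."""
    return offering.price_per_1000_min / 1000 * 60
```

Under these assumptions, `rank_by_accuracy(OFFERINGS)[0]` is ElevenLabs Scribe v2, and Gemini's lower headline price never enters `cheapest` because of the footnote's exclusion.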