OCR API for large documents
For large documents, AWS Textract accepts the biggest asynchronous jobs: up to about 3000 pages (500 MB) per call, ahead of Azure (around 2000 pages) and Google (around 200 pages). Mistral caps at around 1000 pages (50 MB), below AWS and Azure but above Google. These are documented async limits, not throughput or accuracy measures, and structured extraction tiers carry separate costs. The figures are a rough estimate drawn from provider documentation, June 2026.
Provider offerings
| Offering | Provider | Price ($/1000 pages) | Score | Max pages (async) | Capabilities |
|---|---|---|---|---|---|
| Mistral OCR 3 | Mistral AI | 2 | 79.75 | 1000 | tables, handwriting, JSON |
| Google Document AI (Enterprise Document OCR) | Google Cloud | 1.5 | — | 200 | tables, handwriting, JSON |
| AWS Textract (DetectDocumentText) | Amazon Web Services | 1.5 | — | 3000 | tables, handwriting, JSON |
| Azure Document Intelligence (Read) | Microsoft Azure | 1.5 | — | 2000 | tables, handwriting, JSON |
| Reducto | Reducto | —* | — | — | tables, handwriting, JSON |
| LlamaParse | LlamaIndex (LlamaCloud) | 3* | — | — | tables, handwriting, JSON |
* Token- or credit-priced: the headline figure understates the real per-unit cost, so these offerings are excluded from the cheapest ranking.
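As a rough illustration of how the table can be used, the sketch below filters offerings by a document's page count, drops the token- or credit-priced entries from the price ranking as the footnote describes, and estimates a headline cost. The `Offering` class and the helper names are hypothetical, the values are copied from the table, and two choices are assumptions rather than anything the providers specify: an offering with no listed page limit is treated as not fitting, and price ties are broken in favor of the larger page limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offering:
    name: str
    price_per_1000: Optional[float]  # USD per 1000 pages; None if no headline price
    max_pages: Optional[int]         # documented async page limit; None if not listed
    token_priced: bool = False       # marked with * in the table


# Values copied from the table; not fetched from any provider API.
OFFERINGS = [
    Offering("Mistral OCR 3", 2.0, 1000),
    Offering("Google Document AI (Enterprise Document OCR)", 1.5, 200),
    Offering("AWS Textract (DetectDocumentText)", 1.5, 3000),
    Offering("Azure Document Intelligence (Read)", 1.5, 2000),
    Offering("Reducto", None, None, token_priced=True),
    Offering("LlamaParse", 3.0, None, token_priced=True),
]


def fits(offering: Offering, pages: int) -> bool:
    """Assumption: an unlisted limit counts as 'does not fit'."""
    return offering.max_pages is not None and pages <= offering.max_pages


def cheapest_for(pages: int) -> List[Offering]:
    """Rank per-page-priced offerings that can take the whole document in one async call."""
    candidates = [
        o for o in OFFERINGS
        if fits(o, pages) and not o.token_priced and o.price_per_1000 is not None
    ]
    # Assumption: ties on price go to the offering with the larger page limit.
    return sorted(candidates, key=lambda o: (o.price_per_1000, -o.max_pages))


def estimated_cost(offering: Offering, pages: int) -> float:
    """Headline cost in USD; ignores structured-extraction tiers, which cost extra."""
    return pages / 1000 * offering.price_per_1000


if __name__ == "__main__":
    for pages in (150, 1500, 2500):
        ranked = cheapest_for(pages)
        best = ranked[0]
        print(pages, best.name, round(estimated_cost(best, pages), 2))
```

For a 2500-page document only AWS Textract remains in the list, while a 150-page document fits all four per-page-priced offerings. A document above 3000 pages returns an empty list, which in practice means splitting it before submission.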
Sources
- Mistral Pricing, 2026-06-22
- Introducing Mistral OCR 3, 2026-06-22
- CodeSOTA - OmniDocBench Leaderboard, 2026-06-22
- OmniDocBench (CVPR 2025) - opendatalab, 2026-06-22
- Google Cloud Document AI Pricing, 2026-06-22
- AWS Textract Pricing, 2026-06-22
- Azure Document Intelligence Pricing, 2026-06-22
- Reducto Pricing, 2026-06-22
- LlamaParse / LlamaIndex Pricing, 2026-06-22