🎯 Receipt OCR Model Benchmark

Comprehensive analysis of 26 AI models for receipt processing

Models Tested: 26
Test Receipts: 12
Total Actual Cost: $0.81
Models with 100% Success Rate: 15
Total Time: 30min

🏆 Top Performing Models

🥇 Best Free

Sherlock Dash Alpha

Success Rate: 100%
Quality: 97.9%
Cost: FREE
Speed: 0.57s per receipt
Tokens: 615 prompt / 33 completion
💎 Best Value

Qwen3 VL 8B Instruct

Success Rate: 100%
Quality: 100%
Cost: $0.00042 per receipt
Speed: 1.12s per receipt
Tokens: 2,698 prompt / 52 completion
⚡ Fastest

Claude Haiku 4.5

Success Rate: 100%
Quality: 100%
Cost: $0.00198 per receipt
Speed: 0.55s per receipt
Tokens: 1,658 prompt / 63 completion

📊 Complete Benchmark Results

Rank | Model | Success | Quality | Actual Cost (per receipt) | Speed | Tokens (prompt / completion)
1 | Sherlock Think Alpha | 100% | 100% | FREE | 14.01s | 603 / 786 (⚡755 reasoning)
2 | OpenAI GPT-5.1 | 100% | 100% | $0.00406 | 11.03s | 1,073 / 272 (⚡219 reasoning)
3 | OpenAI GPT-5.1 Chat | 100% | 100% | $0.00204 | 4.94s | 1,073 / 70 (⚡32 reasoning)
4 | OpenAI GPT-5.1-Codex | 100% | 100% | $0.00271 | 8.62s | 1,073 / 137 (⚡85 reasoning)
5 | OpenAI GPT-5.1-Codex-Mini | 100% | 100% | $0.00066 | 4.88s | 1,803 / 106 (⚡64 reasoning)
6 | Amazon Nova Premier 1.0 | 100% | 100% | $0.00665 | 6.99s | 2,335 / 65
7 | Perplexity Sonar Pro Search | 100% | 100% | $0.01925 | 6.16s | 183 / 47
8 | NVIDIA Nemotron (free) | 100% | 100% | FREE | 9.15s | 2,378 / 446 (⚡381 reasoning)
9 | NVIDIA Nemotron Nano 12B 2 VL | 100% | 100% | $0.00037 | 6.00s | 2,378 / 446 (⚡381 reasoning)
10 | Anthropic Claude Haiku 4.5 | 100% | 100% | $0.00198 | 2.89s ⚡ | 1,658 / 63
11 | Qwen3 VL 8B Thinking | 100% | 100% | $0.00130 | 11.30s | 1,634 / 480 (⚡445 reasoning)
12 | Qwen3 VL 8B Instruct | 100% | 100% | $0.00042 💎 | 5.83s | 2,698 / 52
13 | Anthropic Claude 3.5 Sonnet | 100% | 100% | $0.00585 | 4.62s | 1,658 / 58
14 | Sherlock Dash Alpha | 100% | 97.9% | FREE 🥇 | 3.04s | 615 / 33
15 | Google Gemini 3 Pro Preview | 100% | 97.9% | $0.02156 | 22.35s | 1,279 / 1,584 (⚡1,546 reasoning)
16-26 | 11 models returned 404 errors (not available on OpenRouter)

💡 Key Insights

🎯 Success Rate

  • 15 models achieved a 100% success rate; 13 of them also reached 100% quality (the other 2 scored 97.9%)
  • All receipts processed successfully by top performers
  • Free models (Sherlock, NVIDIA) performed exceptionally well
  • Price did not predict accuracy: the most expensive model (Gemini 3 Pro Preview) scored 97.9% quality, while two free models scored 100%
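
A quick tally of the results table, sketched in Python. The quality scores are copied from the table; the variable names are illustrative and not part of the benchmark tooling.

```python
# Quality scores (percent) for the 15 models that returned results,
# copied from the results table. All 15 had a 100% success rate.
quality_by_model = {
    "Sherlock Think Alpha": 100.0,
    "OpenAI GPT-5.1": 100.0,
    "OpenAI GPT-5.1 Chat": 100.0,
    "OpenAI GPT-5.1-Codex": 100.0,
    "OpenAI GPT-5.1-Codex-Mini": 100.0,
    "Amazon Nova Premier 1.0": 100.0,
    "Perplexity Sonar Pro Search": 100.0,
    "NVIDIA Nemotron (free)": 100.0,
    "NVIDIA Nemotron Nano 12B 2 VL": 100.0,
    "Anthropic Claude Haiku 4.5": 100.0,
    "Qwen3 VL 8B Thinking": 100.0,
    "Qwen3 VL 8B Instruct": 100.0,
    "Anthropic Claude 3.5 Sonnet": 100.0,
    "Sherlock Dash Alpha": 97.9,
    "Google Gemini 3 Pro Preview": 97.9,
}

# Split the models into those with a perfect quality score and the rest.
perfect_quality = [m for m, q in quality_by_model.items() if q == 100.0]
below_perfect = [m for m, q in quality_by_model.items() if q < 100.0]

print(f"Models with results: {len(quality_by_model)}")
print(f"Models at 100% quality: {len(perfect_quality)}")
print(f"Models below 100% quality: {', '.join(below_perfect)}")
```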

💰 Cost Analysis

  • FREE Options: Sherlock Dash Alpha, Sherlock Think Alpha, NVIDIA Nemotron
  • Best Paid Value: Qwen3 VL 8B Instruct at $0.00042/receipt
  • Most Expensive: Gemini 3 Pro Preview at $0.02156/receipt (51x more than Qwen!)
  • Actual vs Estimated: actual cost was 48% higher than estimated because the estimate did not account for reasoning tokens
  • Process 1,000 receipts with Qwen for only $0.42
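
The cost comparisons above are plain arithmetic on the per-receipt prices. A minimal sketch in Python, assuming cost scales linearly with receipt count (no volume discounts or rate changes); `batch_cost` is a hypothetical helper, not part of the benchmark.

```python
# Per-receipt costs in USD, taken from the benchmark results.
QWEN_COST = 0.00042     # Qwen3 VL 8B Instruct
GEMINI_COST = 0.02156   # Google Gemini 3 Pro Preview


def batch_cost(cost_per_receipt: float, receipts: int) -> float:
    """Projected cost of a batch, assuming cost is linear in receipt count."""
    return cost_per_receipt * receipts


# How many times more expensive Gemini is than Qwen per receipt.
cost_ratio = GEMINI_COST / QWEN_COST

# Projected cost of 1,000 receipts with each model.
qwen_1000 = batch_cost(QWEN_COST, 1000)
gemini_1000 = batch_cost(GEMINI_COST, 1000)

print(f"Gemini / Qwen cost ratio: {cost_ratio:.1f}x")
print(f"1,000 receipts with Qwen: ${qwen_1000:.2f}")
print(f"1,000 receipts with Gemini: ${gemini_1000:.2f}")
```

At these prices the same 1,000 receipts come to about $0.42 with Qwen and about $21.56 with Gemini.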

⚡ Performance Insights

  • Fastest: Claude Haiku 4.5 at 0.55s per receipt
  • Slowest: Gemini 3 Pro at 22.35s (about 40x slower than Claude Haiku 4.5)
  • Reasoning tokens add significant processing time
  • Free models competitive on speed (0.57-14s range)

🔍 Token Usage Patterns

  • Extended Thinking: Some models use "reasoning" tokens (not initially estimated)
  • Gemini 3 Pro uses 1,546 reasoning tokens per receipt
  • Sherlock Think Alpha uses 755 reasoning tokens (but FREE)
  • Prompt token counts vary widely: 183 (Perplexity) to 2,698 (Qwen3 VL 8B Instruct)
  • Qwen3 VL 8B Instruct: High prompt tokens but minimal completion = efficient
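
To make the reasoning-token overhead concrete, the sketch below computes what share of each model's completion tokens went to reasoning. It rests on two assumptions: that reasoning tokens are counted inside the completion total (consistent with every row of the table, where reasoning is smaller than completion), and that a model with no reasoning figure listed used zero reasoning tokens. `reasoning_share` is a hypothetical helper.

```python
# (completion tokens, reasoning tokens) per receipt, from the results table.
# Assumption: a model with no reasoning figure in the table used 0.
token_usage = {
    "Google Gemini 3 Pro Preview": (1584, 1546),
    "Sherlock Think Alpha": (786, 755),
    "OpenAI GPT-5.1": (272, 219),
    "Qwen3 VL 8B Instruct": (52, 0),
}


def reasoning_share(completion: int, reasoning: int) -> float:
    """Fraction of completion tokens spent on reasoning.

    Assumes reasoning tokens are included in the completion count.
    """
    return reasoning / completion if completion else 0.0


shares = {
    model: reasoning_share(completion, reasoning)
    for model, (completion, reasoning) in token_usage.items()
}

# Print models from highest to lowest reasoning share.
for model, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{model}: {share:.1%} of completion tokens are reasoning")
```

Under these assumptions, nearly all of the completion tokens for Gemini 3 Pro Preview and Sherlock Think Alpha are reasoning, which is why an estimate based only on visible output undercounts their cost.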

🎖️ Recommendations

  • For Production (Paid): Qwen3 VL 8B Instruct - Perfect balance of cost ($0.00042), speed (5.83s), and quality (100%)
  • For Production (Free): Sherlock Dash Alpha - No cost, fast (3.04s), 97.9% quality
  • For Speed Priority: Claude Haiku 4.5 - 0.55s per receipt, $0.00198
  • For Quality: Any of the 13 models with 100% success and 100% quality
  • Avoid: Gemini 3 Pro (expensive + slow), Perplexity (expensive)

📈 Receipt Types

Tested on diverse receipt formats:

  • Printed Receipts: Traditional paper receipts photographed (2 receipts)
  • Uber Dark Theme: White text on black background (5 receipts)
  • Uber Light Theme: Full color on white (2 receipts)
  • Booking.com Printouts: Web-based invoices (1 receipt)
  • Hotel Invoices: Formal invoices (2 receipts)