Receipt OCR Model Benchmark Results

🏆 Top Performing Models

🥇 Best Free

Sherlock Dash Alpha

Success Rate: 100%

Quality: 97.9%

Cost: FREE

Speed: 0.57s per receipt

Tokens: 615 prompt / 33 completion

💎 Best Value

Qwen3 VL 8B Instruct

Success Rate: 100%

Quality: 100%

Cost: $0.00042 per receipt

Speed: 1.12s per receipt

Tokens: 2,698 prompt / 52 completion

⚡ Fastest

Claude Haiku 4.5

Success Rate: 100%

Quality: 100%

Cost: $0.00198 per receipt

Speed: 0.55s per receipt

Tokens: 1,658 prompt / 63 completion

📊 Complete Benchmark Results

Rank	Model	Success	Quality	Actual Cost (per receipt)	Speed	Tokens (P/C)
1	Sherlock Think Alpha	100%	100%	FREE	14.01s	603 / 786 (⚡755 reasoning)
2	OpenAI GPT-5.1	100%	100%	$0.00406	11.03s	1,073 / 272 (⚡219 reasoning)
3	OpenAI GPT-5.1 Chat	100%	100%	$0.00204	4.94s	1,073 / 70 (⚡32 reasoning)
4	OpenAI GPT-5.1-Codex	100%	100%	$0.00271	8.62s	1,073 / 137 (⚡85 reasoning)
5	OpenAI GPT-5.1-Codex-Mini	100%	100%	$0.00066	4.88s	1,803 / 106 (⚡64 reasoning)
6	Amazon Nova Premier 1.0	100%	100%	$0.00665	6.99s	2,335 / 65
7	Perplexity Sonar Pro Search	100%	100%	$0.01925	6.16s	183 / 47
8	NVIDIA Nemotron (free)	100%	100%	FREE	9.15s	2,378 / 446 (⚡381 reasoning)
9	NVIDIA Nemotron Nano 12B 2 VL	100%	100%	$0.00037	6.00s	2,378 / 446 (⚡381 reasoning)
10	Anthropic Claude Haiku 4.5	100%	100%	$0.00198	2.89s ⚡	1,658 / 63
11	Qwen3 VL 8B Thinking	100%	100%	$0.00130	11.30s	1,634 / 480 (⚡445 reasoning)
12	Qwen3 VL 8B Instruct	100%	100%	$0.00042 💎	5.83s	2,698 / 52
13	Anthropic Claude 3.5 Sonnet	100%	100%	$0.00585	4.62s	1,658 / 58
14	Sherlock Dash Alpha	100%	97.9%	FREE 🥇	3.04s	615 / 33
15	Google Gemini 3 Pro Preview	100%	97.9%	$0.02156	22.35s	1,279 / 1,584 (⚡1,546 reasoning)
16-26	11 models returned 404 errors (not available on OpenRouter)

💡 Key Insights

🎯 Success Rate

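15 models achieved a 100% success rate; 13 of them also scored 100% quality
The remaining two (Sherlock Dash Alpha, Gemini 3 Pro Preview) scored 97.9% quality
Free models performed well: Sherlock Think Alpha and NVIDIA Nemotron scored 100% quality, Sherlock Dash Alpha 97.9%
Price did not predict accuracy: the most expensive model (Gemini 3 Pro Preview, $0.02156) scored 97.9%, while free and sub-$0.001 models scored 100%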

💰 Cost Analysis

FREE Options: Sherlock Dash Alpha, Sherlock Think Alpha, NVIDIA Nemotron
Best Paid Value: Qwen3 VL 8B Instruct at $0.00042/receipt
Most Expensive: Gemini 3 Pro Preview at $0.02156/receipt (51x more than Qwen!)
Actual vs Estimated: actual cost was 48% higher than estimated because the estimate did not account for reasoning tokens
Process 1,000 receipts with Qwen for only $0.42
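
The cost figures above follow from simple arithmetic on the per-receipt prices. A minimal sketch, assuming the per-receipt costs in the results table scale linearly with volume (the function names are illustrative, not part of the benchmark):

```python
# Per-receipt costs in USD, copied from the results table.
COST_PER_RECEIPT = {
    "Qwen3 VL 8B Instruct": 0.00042,
    "Claude Haiku 4.5": 0.00198,
    "Gemini 3 Pro Preview": 0.02156,
}


def batch_cost(model: str, receipts: int) -> float:
    """Projected cost of a batch, assuming cost scales linearly with volume."""
    return COST_PER_RECEIPT[model] * receipts


def cost_ratio(model: str, baseline: str) -> float:
    """How many times more expensive `model` is than `baseline`."""
    return COST_PER_RECEIPT[model] / COST_PER_RECEIPT[baseline]


if __name__ == "__main__":
    # 1,000 receipts with Qwen3 VL 8B Instruct
    print(f"${batch_cost('Qwen3 VL 8B Instruct', 1000):.2f}")  # $0.42
    # Gemini 3 Pro Preview relative to Qwen3 VL 8B Instruct
    print(f"{cost_ratio('Gemini 3 Pro Preview', 'Qwen3 VL 8B Instruct'):.1f}x")  # 51.3x
```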

⚡ Performance Insights

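Fastest: Claude Haiku 4.5 at 0.55s per receipt (listed as 2.89s in the results table)
Slowest: Gemini 3 Pro Preview at 22.35s, about 40x the 0.55s figure (about 7.7x the 2.89s table figure)
Reasoning tokens add significant processing time
Free models are competitive on speed (3.04s to 14.01s in the results table)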

🔍 Token Usage Patterns

Extended Thinking: Some models use "reasoning" tokens (not initially estimated)
Gemini 3 Pro uses 1,546 reasoning tokens per receipt
Sherlock Think Alpha uses 755 reasoning tokens (but FREE)
Prompt token counts vary widely: 183 (Perplexity Sonar Pro Search) to 2,698 (Qwen3 VL 8B Instruct)
Qwen3 VL 8B Instruct: High prompt token count but only 52 completion tokens, which keeps it efficient
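
To see how much of each model's output is reasoning, the "Tokens (P/C)" cells from the results table can be parsed and compared. This is a rough sketch: it assumes the completion count already includes the reasoning tokens (the table does not say so explicitly), and `parse_tokens` and `reasoning_share` are hypothetical helper names.

```python
import re

# Matches cells such as "603 / 786 (⚡755 reasoning)" or "2,335 / 65".
TOKENS_RE = re.compile(
    r"([\d,]+)\s*/\s*([\d,]+)(?:\s*\(\D*([\d,]+)\s*reasoning\))?"
)


def parse_tokens(cell: str) -> dict:
    """Split a 'prompt / completion (reasoning)' cell into integer counts."""
    match = TOKENS_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized token cell: {cell!r}")
    prompt, completion, reasoning = (
        int(group.replace(",", "")) if group else 0 for group in match.groups()
    )
    return {"prompt": prompt, "completion": completion, "reasoning": reasoning}


def reasoning_share(cell: str) -> float:
    """Fraction of completion tokens spent on reasoning.

    Assumption: the completion count includes the reasoning tokens.
    """
    tokens = parse_tokens(cell)
    return tokens["reasoning"] / tokens["completion"]


if __name__ == "__main__":
    print(f"{reasoning_share('1,279 / 1,584 (⚡1,546 reasoning)'):.1%}")  # 97.6%
    print(f"{reasoning_share('603 / 786 (⚡755 reasoning)'):.1%}")  # 96.1%
    print(f"{reasoning_share('2,698 / 52'):.1%}")  # 0.0%
```

Under that assumption, nearly all of the completion tokens for Gemini 3 Pro Preview and Sherlock Think Alpha are reasoning, while Qwen3 VL 8B Instruct spends none.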

🎖️ Recommendations

For Production (Paid): Qwen3 VL 8B Instruct - Perfect balance of cost ($0.00042), speed (5.83s), and quality (100%)
For Production (Free): Sherlock Dash Alpha - No cost, fast (3.04s), 97.9% quality
For Speed Priority: Claude Haiku 4.5 - 0.55s per receipt, $0.00198
For Quality: Any of the 13 models with 100% success and 100% quality
Avoid: Gemini 3 Pro Preview (most expensive at $0.02156 and slowest at 22.35s), Perplexity Sonar Pro Search (expensive at $0.01925)
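
The recommendations above amount to filtering on quality and latency, then choosing the lowest cost. A minimal sketch of that selection, using a subset of rows and the speeds from the results table; the `Result` type, the `pick` function, and the 5.9s threshold are illustrative assumptions, not the benchmark's own method:

```python
from typing import NamedTuple, Optional, Sequence


class Result(NamedTuple):
    model: str
    quality_pct: float
    cost_usd: float  # per receipt; 0.0 for free models
    speed_s: float   # per-receipt speed as listed in the results table


# Subset of rows copied from the results table.
RESULTS = [
    Result("OpenAI GPT-5.1-Codex-Mini", 100.0, 0.00066, 4.88),
    Result("NVIDIA Nemotron Nano 12B 2 VL", 100.0, 0.00037, 6.00),
    Result("Anthropic Claude Haiku 4.5", 100.0, 0.00198, 2.89),
    Result("Qwen3 VL 8B Instruct", 100.0, 0.00042, 5.83),
    Result("Sherlock Dash Alpha", 97.9, 0.0, 3.04),
    Result("Google Gemini 3 Pro Preview", 97.9, 0.02156, 22.35),
]


def pick(
    results: Sequence[Result],
    min_quality: float = 100.0,
    max_speed_s: Optional[float] = None,
    paid_only: bool = True,
) -> Optional[Result]:
    """Return the cheapest result that meets the constraints, or None."""
    candidates = [
        r for r in results
        if r.quality_pct >= min_quality
        and (max_speed_s is None or r.speed_s <= max_speed_s)
        and (not paid_only or r.cost_usd > 0)
    ]
    return min(candidates, key=lambda r: r.cost_usd, default=None)


if __name__ == "__main__":
    print(pick(RESULTS, max_speed_s=5.9).model)  # Qwen3 VL 8B Instruct
    print(pick(RESULTS, max_speed_s=3.0).model)  # Anthropic Claude Haiku 4.5
    print(pick(RESULTS, min_quality=97.0, paid_only=False).model)  # Sherlock Dash Alpha
```

The outcome is sensitive to the thresholds: with no latency cap, the slightly cheaper NVIDIA Nemotron Nano 12B 2 VL ($0.00037, 6.00s) comes out ahead of Qwen3 VL 8B Instruct ($0.00042, 5.83s), so the cap should reflect the actual latency budget.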

📈 Receipt Types

Tested on diverse receipt formats:

Printed Receipts: Traditional paper receipts photographed (2 receipts)
Uber Dark Theme: White text on black background (5 receipts)
Uber Light Theme: Full color on white (2 receipts)
Booking.com Printouts: Web-based invoices (1 receipt)
Hotel Invoices: Formal invoices (2 receipts)

Benchmark summary: 26 models tested, 12 receipts, 15 models succeeded; $0.81 and 30min (presumably total cost and total run time)
