Jachimo · May 23, 2026 00:31
diff --git a/2026-05-22 LLM Model Value.csv b/2026-05-22 LLM Model Value.csv
diff --git a/venice-bargain-analysis.md b/venice-bargain-analysis.md
Model	Input $/M	Output $/M	Cache $/M	Total Cost	GPQA Diamond	Value Score	Verdict
DeepSeek V4 Flash	$0.17	$0.35	$0.03	$2.06	~88%	42.7	⭐ BEST VALUE
DeepSeek V4 Pro	$1.73	$3.80	$0.33	$22.25	90.1%	4.05	⭐ Excellent
Grok 4.20	$1.42	$2.83	$0.23	$16.94	~88%	5.19	⭐ Good
Grok 4.3	$1.42	$2.83	$0.23	$16.94	~88%	5.19	⭐ Good
Grok Build 0.1	$1.00	$2.00	$0.20	$13.00	~85%	6.54	⭐ Good
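
The Value Score column is never defined in these tables, but every row is consistent with dividing the GPQA Diamond percentage by Total Cost. Below is a minimal sketch under that assumption, using rows copied from the table above. The function name `value_score` is illustrative. How Total Cost was derived from the per-token prices is not stated, so it is taken as given.

```python
def value_score(gpqa_percent: float, total_cost_usd: float) -> float:
    """Assumed definition: GPQA Diamond percentage points per dollar of Total Cost."""
    if total_cost_usd <= 0:
        raise ValueError("total cost must be positive")
    return gpqa_percent / total_cost_usd


# (model, GPQA %, Total Cost $) copied from the table; "~" estimates are used as-is.
rows = [
    ("DeepSeek V4 Flash", 88.0, 2.06),
    ("DeepSeek V4 Pro", 90.1, 22.25),
    ("Grok 4.20", 88.0, 16.94),
    ("Grok Build 0.1", 85.0, 13.00),
]

for model, gpqa, cost in rows:
    print(f"{model}: {value_score(gpqa, cost):.2f}")
```

Because the score is a plain ratio, it rewards very cheap models heavily: halving the cost doubles the score, while a few GPQA points barely move it.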
Model	Input $/M	Output $/M	Cache $/M	Total Cost	GPQA Diamond	Value Score	Verdict
Claude Opus 4.7 Fast	$36.00	$180.00	$3.60	$414.00	94.2%	0.23	🔴 Terrible Value
Claude Opus 4.6 Fast	$36.00	$180.00	$3.60	$414.00	91.3%	0.22	🔴 Terrible Value
GPT-5.5 Pro	$37.50	$225.00	—	$187.50	93.6%	0.50	🔴 Overpriced
GPT-5.4 Pro	$37.50	$225.00	—	$187.50	92.0%	0.49	🔴 Overpriced
Model	Input $/M	Output $/M	Cache $/M	Total Cost	GPQA Diamond	Value Score	Verdict
Claude Opus 4.7	$6.00	$30.00	$0.60	$69.00	94.2%	1.37	🟡 Premium but Fair
Claude Opus 4.6	$6.00	$30.00	$0.60	$69.00	91.3%	1.32	🟡 Standard
Claude Opus 4.5	$6.00	$30.00	$0.60	$69.00	~90%	1.30	🟡 Standard
GPT-5.5	$6.25	$37.50	$0.63	$82.63	93.6%	1.13	🟡 Premium
GPT-5.4	$3.13	$18.80	$0.31	$41.03	92.0%	2.24	🟢 Good Value
GPT-5.4 Mini	$0.94	$5.63	$0.09	$11.44	~85%	7.43	🟢 Very Good
Gemini 3.1 Pro	$2.50	$15.00	$0.50	$32.50	94.3%	2.90	🟢 Good Value
Model	Total Cost	GPQA Diamond	Value Score	Verdict
Qwen 3.5 9B	$0.56	~75%	133.9	⭐ EXCEPTIONAL
Mistral Small 3.2	$0.56	~75%	133.9	⭐ EXCEPTIONAL
GLM 4.7 Flash	$0.65	~80%	123.1	⭐ EXCEPTIONAL
GLM 4.7 Flash Heretic	$0.70	~80%	114.3	⭐ EXCEPTIONAL
DeepSeek V3.2	$1.89	~75%	39.7	⭐ Very Good
MiniMax M2.5	$2.21	~80%	36.2	⭐ Very Good
Qwen 3.6 27B	$2.63	87.8%	33.4	⭐ Very Good
Kimi K2.5	$3.85	87.6%	22.8	⭐ Outstanding
Kimi K2.6	$5.53	~88%	15.9	⭐ Outstanding
Qwen 3.6 Plus	$5.57	~87%	15.6	⭐ Outstanding
Model	Total Cost	GPQA Diamond	Value Score	Verdict
GLM 5	$13.60	~84%	6.18	🟢 Good
GLM 5.1	$23.60	86.2%	3.65	🟡 Your Actual Experience
GPT-5.2	$16.16	~88%	5.45	🟡 Expensive
Claude Sonnet 4.6	$19.44	~85%	4.37	🟡 Expensive
Claude Sonnet 4.5	$20.25	~84%	4.15	🟡 Expensive
Rank	Model	Cost	GPQA	Value Score	Why It Stands Out
1	Qwen 3.5 9B	$0.56	~75%	133.9	Cheapest viable option
2	Mistral Small 3.2	$0.56	~75%	133.9	Same price, similar performance
3	GLM 4.7 Flash	$0.65	~80%	123.1	~80% GPQA for under $1
4	GLM 4.7 Flash Heretic	$0.70	~80%	114.3	Slightly pricier Flash variant
5	DeepSeek V4 Flash	$2.06	~88%	42.7	The Sweet Spot
6	DeepSeek V3.2	$1.89	~75%	39.7	Proven DeepSeek quality
7	MiniMax M2.5	$2.21	~80%	36.2	Strong agentic performance
8	Qwen 3.6 27B	$2.63	87.8%	33.4	Near-frontier, verified
9	Kimi K2.5	$3.85	87.6%	22.8	Open weights, excellent
10	Kimi K2.6	$5.53	~88%	15.9	Best open-weights upgrade
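
The ranking above can be reproduced by sorting on the ratio of GPQA to Total Cost. This is a sketch, assuming that ratio is how Value Score was computed. The candidate list is copied from the tables, and `rank_by_value` is a hypothetical helper name.

```python
# (model, GPQA %, Total Cost $) copied from the tables; "~" estimates are used as-is.
candidates = [
    ("Qwen 3.5 9B", 75.0, 0.56),
    ("Mistral Small 3.2", 75.0, 0.56),
    ("GLM 4.7 Flash", 80.0, 0.65),
    ("GLM 4.7 Flash Heretic", 80.0, 0.70),
    ("DeepSeek V3.2", 75.0, 1.89),
    ("DeepSeek V4 Flash", 88.0, 2.06),
    ("MiniMax M2.5", 80.0, 2.21),
    ("Qwen 3.6 27B", 87.8, 2.63),
    ("Kimi K2.5", 87.6, 3.85),
    ("Kimi K2.6", 88.0, 5.53),
    ("Qwen 3.6 Plus", 87.0, 5.57),
]


def rank_by_value(models, top_n=10):
    """Return (name, score) pairs sorted by GPQA-per-dollar, best first."""
    scored = [(name, gpqa / cost) for name, gpqa, cost in models]
    # list.sort is stable, so tied models keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


top10 = rank_by_value(candidates)
for rank, (name, score) in enumerate(top10, start=1):
    print(f"{rank:>2}  {name:<24}{score:6.1f}")
```

Qwen 3.5 9B and Mistral Small 3.2 tie exactly, so their relative order depends only on the input order.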
Model	Cost	GPQA	Value Score	Why Avoid
Claude Opus 4.7 Fast	$414.00	94.2%	0.23	6× cost for speed, same accuracy
Claude Opus 4.6 Fast	$414.00	91.3%	0.22	6× cost over Opus 4.6, same accuracy
GPT-5.5 Pro	$187.50	93.6%	0.50	2.3× cost of GPT-5.5, same GPQA score
GPT-5.4 Pro	$187.50	92.0%	0.49	4.6× cost of GPT-5.4, same GPQA score
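
The cost multiples quoted in this table can be checked against the Total Cost column. This is a rough sketch. Pairing each premium variant with a base model is an assumption based on the model names, and `cost_multiple` is an illustrative helper.

```python
# Total Cost figures ($) copied from the tables above.
base_cost = {
    "Claude Opus 4.7": 69.00,
    "Claude Opus 4.6": 69.00,
    "GPT-5.5": 82.63,
    "GPT-5.4": 41.03,
}

# (premium variant, assumed base model, premium Total Cost $)
premium_variants = [
    ("Claude Opus 4.7 Fast", "Claude Opus 4.7", 414.00),
    ("Claude Opus 4.6 Fast", "Claude Opus 4.6", 414.00),
    ("GPT-5.5 Pro", "GPT-5.5", 187.50),
    ("GPT-5.4 Pro", "GPT-5.4", 187.50),
]


def cost_multiple(premium_cost: float, base: float) -> float:
    """How many times more the premium variant costs than its base model."""
    return premium_cost / base


multiples = {
    name: cost_multiple(cost, base_cost[base])
    for name, base, cost in premium_variants
}

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.1f}x")
```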
Use Case	Recommended Model	Cost	Why
Ultra-budget, any capability	Qwen 3.5 9B	$0.56	Cheapest option that works
Best value under $1	GLM 4.7 Flash	$0.65	~80% GPQA at a throwaway price
Sweet spot (performance/price)	DeepSeek V4 Flash	$2.06	~88% GPQA (estimated), proven reliable
Best open-weights	Kimi K2.6	$5.53	Full control, strong performance
Verified high performance	Qwen 3.6 27B	$2.63	87.8% GPQA, published scores
Frontier quality, fair price	GPT-5.4	$41.03	92% GPQA, half the cost of 5.5
Top-tier reasoning	Claude Opus 4.7	$69.00	94.2% GPQA, justifies premium
Google ecosystem	Gemini 3.1 Pro	$32.50	94.3% GPQA (highest listed), competitive price
Never use	Any "Fast" or "Pro" variant	$187.50-$414.00	Same accuracy at 2.3× to 6× the price
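
One way to turn the recommendations above into a repeatable rule is to pick the cheapest model that clears a minimum GPQA score. This is a hedged sketch, not the method used to build the table. The model list is copied from the recommended rows, the "~" estimates are treated as exact, and `cheapest_meeting` is a hypothetical helper.

```python
from typing import Optional

# (model, GPQA %, Total Cost $) for the recommended models listed above.
MODELS = [
    ("Qwen 3.5 9B", 75.0, 0.56),
    ("GLM 4.7 Flash", 80.0, 0.65),
    ("DeepSeek V4 Flash", 88.0, 2.06),
    ("Qwen 3.6 27B", 87.8, 2.63),
    ("Kimi K2.6", 88.0, 5.53),
    ("Gemini 3.1 Pro", 94.3, 32.50),
    ("GPT-5.4", 92.0, 41.03),
    ("Claude Opus 4.7", 94.2, 69.00),
]


def cheapest_meeting(min_gpqa: float, models=MODELS) -> Optional[str]:
    """Return the lowest-cost model whose GPQA score is at least min_gpqa."""
    eligible = [(cost, name) for name, gpqa, cost in models if gpqa >= min_gpqa]
    if not eligible:
        return None  # no listed model clears the bar
    return min(eligible)[1]


print(cheapest_meeting(80.0))  # GLM 4.7 Flash
print(cheapest_meeting(87.0))  # DeepSeek V4 Flash
```

A score floor is only one possible constraint. Open weights, ecosystem fit, or latency would need extra fields that these tables do not provide.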