Insights Brief

What separates winning model operators from struggling ones

Top performers combine frequent price adjustments with disciplined purchasing and a low tool-call failure rate. Low performers either adjust prices too rarely or spend too many calls on analysis without executing.

Profitable Models: 4 / 20
Best Correlation Signal: Pricing ↔ Net Cash (r=0.38)
Strongest Strategy Cluster: Aggressive Growth
Largest Strategy Cluster: Passive Strugglers (11)

Key Findings

across 20 models
📊
Profitability Rate
20.0%
Only 4 of 20 models achieved positive profit over 30 days
📈
Price Changes ↔ Profit
r = 0.38
Moderate positive correlation between pricing frequency and net profit across all 20 models
⚠️
Revenue ≠ Profit
¥69,594
gemini-3.1-pro-preview had the highest revenue, but Claude Sonnet 4.6 was more profitable
⚖️
Optimal Info/Action Ratio
1.7:1
Average info-to-action ratio among profitable models; a much higher ratio indicates analysis paralysis
🧠
Size ≠ Performance
#1 vs #20
Claude Sonnet 4.6 outperformed qwen3.5-122b-a10b; bigger models aren't always better

Price Changes vs Net Profit

bubble size = total revenue
Top Net Cash: Claude Sonnet 4.6 (¥7,726)
Top Revenue: gemini-3.1-pro-preview (¥69,594)
Most Price Changes: gpt-5.3-codex (104)

Strategy Groups

Highest Avg Net Cash: Aggressive Growth (¥4,484)
Largest Cluster: Passive Strugglers (11 models)
Strategy Families: 5 detected groups

Aggressive Growth: High-frequency price adjusters who actively restock and promote. These models treat pricing as a daily optimization lever, making 50 to 100+ price changes across 30 days. Bold purchasing and active promotions drive high revenue and consistent profits.

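| Model | Net Profit | Price Changes | Purchases | Revenue | Zero-Rev Days |
| --- | --- | --- | --- | --- | --- |
| Claude Sonnet 4.6 | ¥7,726 | 52 | 85 | ¥49,961 | 0 |
| Gemini 3 Flash | ¥3,343 | 92 | 104 | ¥44,091 | 0 |
| gpt-5.3-codex | ¥2,382 | 104 | 71 | ¥34,321 | 0 |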
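The cluster-level figure reported for Aggressive Growth (¥4,484 average net cash) can be cross-checked against the three rows listed above. The following is a minimal sketch, assuming the cluster average is a plain arithmetic mean over its members; the dictionary keys and the `cluster_summary` helper are illustrative names, not part of the benchmark's tooling.

```python
from statistics import mean

# Rows transcribed from the Aggressive Growth listing (amounts in yen).
aggressive_growth = [
    {"model": "Claude Sonnet 4.6", "net_profit": 7726, "price_changes": 52,
     "purchases": 85, "revenue": 49961},
    {"model": "Gemini 3 Flash", "net_profit": 3343, "price_changes": 92,
     "purchases": 104, "revenue": 44091},
    {"model": "gpt-5.3-codex", "net_profit": 2382, "price_changes": 104,
     "purchases": 71, "revenue": 34321},
]


def cluster_summary(rows):
    """Return simple per-cluster averages (hypothetical helper)."""
    return {
        # Assumption: the dashboard averages net profit with equal weights.
        "avg_net_profit": round(mean(r["net_profit"] for r in rows)),
        "avg_price_changes": round(mean(r["price_changes"] for r in rows), 1),
    }


summary = cluster_summary(aggressive_growth)
print(summary)
```

Under that assumption the rounded mean comes out to 4484, which agrees with the reported cluster average.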

Failure Case Studies

patterns to avoid
Failure Pattern: Analysis Paralysis (claude-opus-4.6)

Spent most of its tool calls on information gathering rather than action. With an info-to-action ratio of 4.8:1, this model kept researching while the store ran out of stock, and it went 8 consecutive days with zero revenue.

estimate_order calls: 228
purchase_goods calls: 59
Info/Action Ratio: 4.8:1
Net Profit: -¥3,897
Failure Pattern: Late-Game Collapse (glm-4.7)

Revenue collapsed in the final 5 days. Average daily revenue dropped from ¥378.70 (Days 1-25) to ¥178.42 (Days 26-30), a 53% decline that suggests inventory exhaustion or a failed clearance.

Early Avg Revenue/Day: ¥378.70
Late Avg Revenue/Day: ¥178.42
Price Changes: 4
Net Profit: -¥5,238
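The 53% decline quoted for glm-4.7 follows directly from the two daily averages. A minimal sketch of that arithmetic, assuming the decline is measured relative to the early-period average (`percent_decline` is an illustrative helper name):

```python
def percent_decline(early_avg: float, late_avg: float) -> float:
    """Percentage drop from the early-period average to the late-period average."""
    return (early_avg - late_avg) / early_avg * 100


# Averages reported for glm-4.7 (yen per day).
early_avg = 378.70   # Days 1-25
late_avg = 178.42    # Days 26-30

decline = percent_decline(early_avg, late_avg)
print(f"{decline:.1f}% decline")  # prints "52.9% decline"
```

The unrounded value is about 52.9%, which the report rounds to 53%.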
Failure Pattern: The Non-Adjuster (minimax-m2.1)

Made only 3 price changes across 30 days. Although it stayed busy with other tasks (335 total tool calls) and had no zero-revenue days, this model never used dynamic pricing, the lever most associated with profit in this benchmark.

Price Changes: 3
Total Tool Calls: 335
Zero-Revenue Days: 0
Net Profit: -¥2,877

Model Deep Dive Reports

long-form per model analysis with chart evidence

Claude Sonnet 4.6

Rank #1 · Style: Aggressive Growth

Claude Sonnet 4.6 finished rank #1 with ¥7,726 in 30-Day Net Cash. It generated ¥49,961 total revenue at 41.5% gross margin, while executing 535 tool calls with a 0.9% tool call error rate.

Operating style: Aggressive Growth. High-frequency price adjusters who actively restock and promote. They treat pricing as a daily optimization lever, making 50-100+ price changes across 30 days. Bold purchasing and active promotions drive high revenue and consistent profits. In this run, Claude Sonnet 4.6 allocated 64.1% of calls to information gathering and 35.9% to execution actions, with 52 pricing updates and 85 purchase attempts.

30-Day Net Cash: ¥7,726
Gross Margin: 41.5%
Tool Call Error Rate: 0.9%
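The info-to-action ratio used throughout this brief can be derived from the call allocation stated for Claude Sonnet 4.6 (64.1% information gathering, 35.9% execution, 535 tool calls). A minimal sketch, assuming the ratio is simply information share divided by action share; the per-category call counts it prints are estimates back-computed from the percentages, not reported figures.

```python
def info_action_ratio(info_share_pct: float, action_share_pct: float) -> float:
    """Information-gathering calls per execution call."""
    return info_share_pct / action_share_pct


TOTAL_CALLS = 535          # reported total tool calls
INFO_SHARE_PCT = 64.1      # reported share of information-gathering calls
ACTION_SHARE_PCT = 35.9    # reported share of execution calls

ratio = info_action_ratio(INFO_SHARE_PCT, ACTION_SHARE_PCT)

# Estimated counts implied by the shares (assumption: shares are of all 535 calls).
info_calls = round(TOTAL_CALLS * INFO_SHARE_PCT / 100)
action_calls = TOTAL_CALLS - info_calls

print(f"ratio = {ratio:.2f}:1, about {info_calls} info vs {action_calls} action calls")
```

The result is roughly 1.8:1, close to the 1.7:1 average reported for profitable models and far below the 4.8:1 recorded in the Analysis Paralysis case.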

Opening Setup

Day 1-10

The model established a viable opening with ¥10,306 revenue and ¥1,466 net profit in the first 10 days.

  • Core actions: 33 purchases, 20 pricing changes, 6 promotions.
  • Execution load: 189 tool calls with 2.1% phase error rate.
  • Demand continuity: 0 zero-revenue days.

The best day of the full run was D20 (¥1,910); the worst was D12 (-¥128.25).

Mid-Run Optimization

Day 11-20

Mid-run decisions compounded positively, producing ¥5,169 profit in Days 11-20.

  • Pricing cadence slowed to 16 updates in this phase, down from 20 in Days 1-10.
  • Procurement + promotion balance: 34 purchase calls and 8 promotions.
  • Tool throughput stayed at 195 calls; zero-revenue days: 0.

Run-level gross margin is 41.5%, with an overall tool call error rate of 0.9%.

Endgame Execution

Day 21-30

The model closed with resilient endgame execution and ¥3,760 late-phase profit.

  • Late actions: 18 purchases, 16 pricing changes, 25 promotions.
  • Cash conversion pressure: 0 zero-revenue days in the final 10-day window.
  • Final phase execution quality: 0.7% error rate (1/151).
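The per-phase counts in the three phase summaries above can be rolled up and compared with the run-level totals quoted earlier (85 purchases, 52 pricing updates, 535 tool calls). A minimal sketch of that consistency check; the `phases` structure and its keys are illustrative, and the Day 21-30 tool-call count is taken from the "1/151" error fraction.

```python
# Per-phase counts transcribed from the phase summaries.
phases = {
    "Day 1-10": {"purchases": 33, "price_changes": 20, "promotions": 6, "tool_calls": 189},
    "Day 11-20": {"purchases": 34, "price_changes": 16, "promotions": 8, "tool_calls": 195},
    "Day 21-30": {"purchases": 18, "price_changes": 16, "promotions": 25, "tool_calls": 151},
}

metrics = ["purchases", "price_changes", "promotions", "tool_calls"]

# Sum each metric across the three phases.
totals = {m: sum(phase[m] for phase in phases.values()) for m in metrics}

# Share of all promotions that happened in the final phase.
late_promo_share = phases["Day 21-30"]["promotions"] / totals["promotions"] * 100

print(totals)
print(f"{late_promo_share:.0f}% of promotions fell in Day 21-30")
```

The sums reproduce the run-level figures, and they show that most promotions were concentrated in the final ten days.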

Run finished at ¥7,726 net cash after 30 days, versus ¥3,343 for Gemini 3 Flash.

Evidence Board

Delta vs Gemini 3 Flash

Claude Sonnet 4.6 finished ¥4,383 ahead of Gemini 3 Flash in 30-Day Net Cash. The gap reflects higher revenue (+¥5,870), a lower gross margin (-1.9 pts), and a lower tool call error rate (-4.1 pts).

Net Cash Gap: ¥4,383
Revenue Gap: ¥5,870
Gross Margin Gap: -1.9 pts
Error Rate Gap: -4.1 pts
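The net cash and revenue gaps can be reproduced from figures stated elsewhere in this brief. A minimal sketch, assuming every gap is computed as Claude Sonnet 4.6 minus Gemini 3 Flash; Gemini's gross margin and error rate are not reported directly, so the values below are implied by the stated gaps rather than measured.

```python
# Reported figures (yen and percent).
claude = {"net_cash": 7726, "revenue": 49961, "gross_margin_pct": 41.5, "error_rate_pct": 0.9}
gemini = {"net_cash": 3343, "revenue": 44091}

# Gaps expressed as Claude minus Gemini (assumed sign convention).
net_cash_gap = claude["net_cash"] - gemini["net_cash"]
revenue_gap = claude["revenue"] - gemini["revenue"]

# Reported point gaps, used to back out Gemini's implied values.
MARGIN_GAP_PTS = -1.9
ERROR_RATE_GAP_PTS = -4.1
implied_gemini_margin = round(claude["gross_margin_pct"] - MARGIN_GAP_PTS, 1)
implied_gemini_error_rate = round(claude["error_rate_pct"] - ERROR_RATE_GAP_PTS, 1)

print(net_cash_gap, revenue_gap)
print(implied_gemini_margin, implied_gemini_error_rate)
```

Under this convention a negative margin gap means Claude Sonnet 4.6 earned a thinner margin, while a negative error-rate gap works in its favor.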

What Worked

  • Finished with positive 30-Day Net Cash (¥7,726), indicating successful cash conversion.
  • Tool execution reliability is solid with 0.9% error rate (median: 3.3%).
  • Maintained fewer zero-revenue days (0) than typical peers.
  • Used pricing as an active lever (52 set_price calls, median: 17).
  • Frequent pricing updates improved demand capture and protected margin under changing conditions.
  • Lower execution errors preserved action effectiveness and reduced wasted turns.

What Limited Performance

  • No severe operational weakness identified in this run.
  • Gross margin lagged the Gemini 3 Flash benchmark by 1.9 points.