The Ultra-Bullet Evolution
A master class analysis of 24 engines competing under extreme time pressure — 9s+0.1s increment, single thread, 50 rounds
Intel Core i5-6500 · 16 GB RAM · 346 / 1120 games played (30.8%) · Blitz 9s + 0.1s increment · GUI: Arena Chess
Tournament leader
SF 18dev
Elo 2909.6 · 66.2% score
+29.6 Elo
Biggest surprise
Reckless 0.10.0d
Rank #3 · above SF 17 & 17.1
59.4% score
Biggest collapse
RubiChess
Rank #24 · 35.8% score
-35.8 Elo
Most consistent
Caissa 1.25
Perfectly neutral across all blocks
+0.3 Elo
Full standings — all 24 engines
# Engine Points Games Elo ± Elo Score % Elo bar Trend
[Chart: Score % per 8-round block, top movers. Series: SF 18dev, Reckless 0.10.0d, SF 16.1 (recovery), RubiChess (collapse)]
Elo gain/loss per 8-round block — selected engines
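| Engine | R1–8 | R9–16 | R17–24 | R25–32 | R33–40 | R41–50 | Total |
|---|---|---|---|---|---|---|---|
| SF 18dev (SF) | +4.1 | +8.6 | +9.1 | +5.7 | +4.4 | +7.5 | +39.4 |
| SF 18 (SF) | +5.3 | +7.0 | +7.8 | +4.9 | +1.2 | +4.7 | +30.9 |
| Reckless 0.10.0d (RC) | +2.4 | +5.9 | +4.3 | +8.8 | +2.4 | +7.3 | +31.1 |
| SF 17 (SF) | +2.5 | -0.8 | +3.4 | +0.3 | +8.2 | +2.9 | +16.5 |
| PlentyChess 7.0.37 (PC) | +5.2 | +6.0 | +3.7 | +2.1 | +2.4 | +4.3 | +23.7 |
| SF 17.1 (SF) | +2.5 | +3.7 | +1.0 | -2.4 | +1.0 | +3.2 | +9.0 |
| Reckless 0.9.0 (RC) | +1.4 | +2.5 | -0.9 | +4.6 | +3.2 | -3.3 | +7.5 |
| SF 16.1 (SF) | -5.0 | +0.4 | +0.6 | +1.7 | +3.0 | +5.0 | +5.7 |
| Komodo Dragon 3.3 (KD) | -7.5 | -6.4 | -7.4 | -4.3 | -6.2 | -4.9 | -36.7 |
| RubiChess (RU) | -6.5 | -7.9 | -9.6 | -10.4 | -2.5 | -8.4 | -45.3 |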
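The Total column can be cross-checked by summing each engine's per-block values. The Python sketch below does this for three rows copied from the table and also picks out each engine's weakest block. The names `BLOCKS`, `DELTAS`, `block_total`, and `worst_block` are illustrative only and do not come from any tournament tooling.

```python
# Per-block Elo changes copied from the table above (three rows only).
BLOCKS = ["R1-8", "R9-16", "R17-24", "R25-32", "R33-40", "R41-50"]

DELTAS = {
    "SF 18dev": [4.1, 8.6, 9.1, 5.7, 4.4, 7.5],
    "SF 16.1": [-5.0, 0.4, 0.6, 1.7, 3.0, 5.0],
    "RubiChess": [-6.5, -7.9, -9.6, -10.4, -2.5, -8.4],
}


def block_total(engine: str) -> float:
    """Sum the per-block changes, rounded to one decimal like the table."""
    return round(sum(DELTAS[engine]), 1)


def worst_block(engine: str) -> tuple[str, float]:
    """Return the (block label, change) pair with the lowest value."""
    values = DELTAS[engine]
    idx = min(range(len(values)), key=values.__getitem__)
    return BLOCKS[idx], values[idx]


if __name__ == "__main__":
    for name in DELTAS:
        label, change = worst_block(name)
        print(f"{name}: total {block_total(name):+.1f}, worst block {label} ({change:+.1f})")
```

The same check extends to the remaining rows by adding them to `DELTAS`.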
Key findings
Reckless in ultra-bullet
Reckless 0.10.0d sits only 1.5 points behind SF 18dev, although it has played 100 more games, so the raw-point gap understates the real distance. Even so, it is the highest-ranked non-Stockfish engine, and a 59.4% score at 9 seconds against the full field suggests that its evaluation function is unusually time-efficient, getting more value from very short searches than the other non-Stockfish engines here.
SF 16.1 — the comeback story
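SF 16.1 started badly at -5.0 Elo in R1–8, then improved in every subsequent block, reaching +5.0 in R41–50. That 10.0 Elo swing between first and last block is the most dramatic recovery arc in the tournament. Since an engine plays the same way in every round, the apparent "warm-up" is better read as its rating settling against the field as games accumulate: a calibration effect rather than a change in playing strength.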
Single-thread penalty
Komodo Dragon (-29.6) and RubiChess (-35.8) lose ground in every block. One plausible reading is that both depend more than the rest of the field on parallel search and longer thinking time, so a single thread at 9 seconds leaves their search too shallow to compete. The results are consistent with that dependency but do not establish it on their own.
Points vs Elo — the illusion
PlentyChess 7.0.37 holds 617.0 raw points, more than any other engine, yet ranks #5 by Elo. It has played 1,050 games against 900 for the Stockfish versions, so its points work out to a score of about 58.8% (617.0 / 1,050). Raw points are not comparable across unequal game counts; Score % and Elo are the metrics to compare.
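As a rough illustration of that normalization, the sketch below converts raw points to a score fraction and then to an Elo difference using the standard logistic Elo relation. Whether the tournament's own ratings were computed this way is an assumption: rating tools often use other estimators, so the output will not necessarily match the Elo column. The helper names are illustrative.

```python
import math


def score_fraction(points: float, games: int) -> float:
    """Points per game, so engines with different game counts are comparable."""
    if games <= 0:
        raise ValueError("games must be positive")
    return points / games


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model.

    This is the textbook relation, not necessarily the method used to
    produce this tournament's ratings.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # PlentyChess 7.0.37: 617.0 points from 1,050 games (figures from the text).
    plenty = score_fraction(617.0, 1050)
    print(f"PlentyChess score: {plenty:.1%}")
    print(f"Implied edge over its average opponent: {elo_diff_from_score(plenty):+.0f} Elo")
    # SF 18dev's stated 66.2% score, for comparison.
    print(f"SF 18dev implied edge: {elo_diff_from_score(0.662):+.0f} Elo")
```

Under this model a 50% score maps to a difference of zero, and the implied edge grows non-linearly as the score moves away from 50%.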
Anomaly: Alexandria 9.0.0 ranks below 8.1.12
In a well-functioning version series, a newer version is expected to outperform the older one. Here, Alexandria 9.0.0 (Elo 2816.8, -9.2) ranks below Alexandria 8.1.12 (Elo 2822.3, -16.6), a gap of 5.5 Elo. Under single-thread ultra-bullet conditions, the changes in 9.0.0 appear counter-productive, possibly a speed/quality trade-off that does not suit 9-second time controls. The gap is small, however, so it may also reflect noise.
Final standings projection
Tournament winner
Stockfish 18dev
Positive Elo gain in every single block, with no sign of slowing. Barring a structural change, this engine is projected to finish first.
Will hold rank #3
Reckless 0.10.0d
Consistently positive across all blocks, with a strong +7.3 in the last period. Could threaten rank #2 if SF 18 loses momentum.
Watch closely
Stockfish 16.1
The recovery has strengthened in each of the last three blocks (+1.7, +3.0, then +5.0 in R41–50). Likely to overtake SF 17.1 in the final standings if the trend holds.
Bottom two locked
Komodo Dragon & RubiChess
No recovery signals in any block. RubiChess holds the tournament record for worst single block (-10.4 in R25–32). Both are projected to finish last.
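The SF 16.1 projection above rests on the direction of its per-block results. One rough way to quantify that is a least-squares slope over the six block values from the table, sketched below. With only six points this is plain arithmetic rather than a significance test, and `trend_slope` is an illustrative helper name.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against their block index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two blocks")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    # Per-block Elo changes copied from the table (R1-8 through R41-50).
    sf_16_1 = [-5.0, 0.4, 0.6, 1.7, 3.0, 5.0]
    sf_17_1 = [2.5, 3.7, 1.0, -2.4, 1.0, 3.2]
    print(f"SF 16.1 slope: {trend_slope(sf_16_1):+.2f} Elo per block")
    print(f"SF 17.1 slope: {trend_slope(sf_17_1):+.2f} Elo per block")
```

With the table's values, SF 16.1's slope comes out positive (about +1.7 Elo per block) and SF 17.1's slightly negative (about -0.2), which is consistent with the projection but does not guarantee that the trend continues.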