Chess Engine Tournament - The Ultra-Bullet Evolution

A masterclass analysis of 24 engines competing under extreme time pressure: 9s + 0.1s increment, single thread, 50 rounds

Intel Core i5-6500 · 16 GB RAM · 346 / 1120 games played (30.8%) · Blitz 9s + 0.1s increment · GUI: Arena Chess

Tournament leader

SF 18dev

Elo 2909.6 · 66.2% score

+29.6 Elo

Biggest surprise

Reckless 0.10.0d

Rank #3 · above SF 17 & 17.1

59.4% score

Biggest collapse

RubiChess

Rank #24 · 35.8% score

-35.8 Elo

Most consistent

Caissa 1.25

Near-neutral across all blocks

+0.3 Elo

Elo change by round block
Engine	Tag	R 1–8	R 9–16	R 17–24	R 25–32	R 33–40	R 41–50	Total
SF 18dev	SF	+4.1	+8.6	+9.1	+5.7	+4.4	+7.5	+39.4
SF 18	SF	+5.3	+7.0	+7.8	+4.9	+1.2	+4.7	+30.9
Reckless 0.10.0d	RC	+2.4	+5.9	+4.3	+8.8	+2.4	+7.3	+31.1
SF 17	SF	+2.5	-0.8	+3.4	+0.3	+8.2	+2.9	+16.5
PlentyChess 7.0.37	PC	+5.2	+6.0	+3.7	+2.1	+2.4	+4.3	+23.7
SF 17.1	SF	+2.5	+3.7	+1.0	-2.4	+1.0	+3.2	+9.0
Reckless 0.9.0	RC	+1.4	+2.5	-0.9	+4.6	+3.2	-3.3	+7.5
SF 16.1	SF	-5.0	+0.4	+0.6	+1.7	+3.0	+5.0	+5.7
Komodo Dragon 3.3	KD	-7.5	-6.4	-7.4	-4.3	-6.2	-4.9	-36.7
RubiChess	RU	-6.5	-7.9	-9.6	-10.4	-2.5	-8.4	-45.3
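
The Total column is the sum of the six block values, so the table can be checked mechanically. The following is a minimal Python sketch, assuming the figures are transcribed correctly from the table above. The names `BLOCKS`, `BLOCK_LABELS`, `total`, `positive_in_every_block`, and `worst_block` are illustrative and not part of any tournament tooling.

```python
# Per-block Elo changes transcribed from the table above, in block order.
# Labels use plain hyphens; they correspond to the table's R 1-8 ... R 41-50 columns.
BLOCK_LABELS = ["R 1-8", "R 9-16", "R 17-24", "R 25-32", "R 33-40", "R 41-50"]

BLOCKS = {
    "SF 18dev": [4.1, 8.6, 9.1, 5.7, 4.4, 7.5],
    "SF 18": [5.3, 7.0, 7.8, 4.9, 1.2, 4.7],
    "Reckless 0.10.0d": [2.4, 5.9, 4.3, 8.8, 2.4, 7.3],
    "SF 17": [2.5, -0.8, 3.4, 0.3, 8.2, 2.9],
    "PlentyChess 7.0.37": [5.2, 6.0, 3.7, 2.1, 2.4, 4.3],
    "SF 17.1": [2.5, 3.7, 1.0, -2.4, 1.0, 3.2],
    "Reckless 0.9.0": [1.4, 2.5, -0.9, 4.6, 3.2, -3.3],
    "SF 16.1": [-5.0, 0.4, 0.6, 1.7, 3.0, 5.0],
    "Komodo Dragon 3.3": [-7.5, -6.4, -7.4, -4.3, -6.2, -4.9],
    "RubiChess": [-6.5, -7.9, -9.6, -10.4, -2.5, -8.4],
}


def total(engine):
    """Sum of the six block values, rounded to one decimal like the table."""
    return round(sum(BLOCKS[engine]), 1)


def positive_in_every_block():
    """Engines whose Elo change is strictly positive in all six blocks."""
    return [name for name, deltas in BLOCKS.items() if all(d > 0 for d in deltas)]


def worst_block():
    """Engine, block label, and value of the single lowest entry in the table."""
    name, index = min(
        ((n, i) for n, deltas in BLOCKS.items() for i in range(len(deltas))),
        key=lambda pair: BLOCKS[pair[0]][pair[1]],
    )
    return name, BLOCK_LABELS[index], BLOCKS[name][index]


if __name__ == "__main__":
    for name in BLOCKS:
        print(f"{name:20s} {total(name):+.1f}")
    print("Positive in every block:", positive_in_every_block())
    print("Worst single block:", worst_block())
```

With these figures, every computed total matches the Total column, and the lowest single entry is RubiChess at -10.4 in R 25–32.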

Reckless in ultra-bullet

Reckless 0.10.0d sits only 1.5 points behind SF 18dev while playing 100 more games. A 59.4% score at 9 seconds against this full field suggests its evaluation function is unusually time-efficient, extracting more value per node than any other non-Stockfish engine here.

SF 16.1 — the comeback story

Started poorly at -5.0 Elo in R 1–8, then recovered block by block, reaching +5.0 in R 41–50. The most dramatic recovery arc in the tournament. An engine does not literally "warm up" between games; the pattern more likely reflects a calibration effect as its Elo estimate settles against the field.

Single-thread penalty

Komodo Dragon (-29.6) and RubiChess (-35.8) appear to depend heavily on multi-threaded search. Restricted to a single thread at 9 seconds, their search cannot reach the depth it seems to have been tuned for, and this tournament exposes that dependency brutally.

Points vs Elo — the illusion

PlentyChess 7.0.37 holds 617.0 raw points, more than any other engine, yet ranks #5 by Elo. It has played 1,050 games versus 900 for the Stockfish versions. Raw points are not comparable across unequal game counts; score percentage and Elo are the only honest metrics.
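
The comparison becomes concrete once points are normalised by games played. The following is a minimal Python sketch. It assumes the standard logistic Elo relation between score fraction and rating difference, since the page does not state which rating model the tournament uses. The function names are illustrative.

```python
import math


def score_fraction(points, games):
    """Points scored divided by games played (1 per win, 0.5 per draw)."""
    if games <= 0:
        raise ValueError("games must be positive")
    return points / games


def elo_diff_from_score(score):
    """Rating difference implied by a score fraction under the logistic Elo model.

    Assumption: the standard 400-point logistic curve. The result is relative
    to the average opposition faced, not an absolute rating.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    plenty = score_fraction(617.0, 1050)  # figures quoted for PlentyChess 7.0.37
    print(f"PlentyChess score: {plenty:.1%}")
    print(f"Implied Elo edge over its opposition: {elo_diff_from_score(plenty):+.1f}")
```

On the quoted figures, 617.0 points over 1,050 games is about 58.8%, below the 59.4% shown for Reckless 0.10.0d and the 66.2% shown for SF 18dev, which is consistent with the lower Elo rank despite the larger raw total.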

Anomaly: Alexandria 9.0.0 ranks below 8.1.12

In a well-functioning version series, a newer version should outperform the older one. Here, Alexandria 9.0.0 (Elo 2816.8, -9.2) ranks below Alexandria 8.1.12 (Elo 2822.3, -16.6), a gap of 5.5 Elo. Under single-thread ultra-bullet conditions, the architectural changes in 9.0.0 appear counter-productive, probably a speed/quality trade-off that does not suit 9-second time controls.

Tournament winner

Stockfish 18dev

Positive Elo gain in every single block, with no signs of slowing. Barring a structural change, this engine is on course to finish first.

Will hold rank #3

Reckless 0.10.0d

Consistently positive across all blocks, with a strong +7.3 in the last period. Could threaten rank #2 if SF 18 loses momentum.

Watch closely

Stockfish 16.1

Recovery arc is real and accelerating (+5.0 in R41–50). Likely to overtake SF 17.1 in final standings if the trend holds.
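
One rough way to quantify the recovery is a least-squares slope through the six block values. The following is a minimal Python sketch. It treats the blocks as equally spaced, which is a simplifying assumption because the final block covers ten rounds rather than eight. The function name `least_squares_slope` is illustrative.

```python
def least_squares_slope(values):
    """Ordinary least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# SF 16.1 block values from the table, R 1-8 through R 41-50.
SF_16_1 = [-5.0, 0.4, 0.6, 1.7, 3.0, 5.0]

if __name__ == "__main__":
    print(f"Trend: {least_squares_slope(SF_16_1):+.2f} Elo per block")
```

For SF 16.1 the slope comes out at roughly +1.7 Elo per block. Because the inputs are per-block changes, a positive slope means the gains themselves are growing from block to block. Six points are a small sample, so this supports the direction of the trend more than any specific projection.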

Bottom two locked

Komodo Dragon & RubiChess

No sustained recovery in any block. RubiChess holds the tournament record for worst single block (-10.4 in R 25–32). Both are on course to finish last.