overall gap — top 3 vs bottom 3
avg top-3 score vs bottom
75.5%
across 216 matchup games
avg Elo gap (top vs bottom)
~105 pts
live Elo difference
best single matchup
83.3%
Stockfish 18dev vs Obsidian 15.0 (20–4)
closest bottom-3 result
70.8%
PlentyChess 7.0.37 vs Obsidian 15.0 (17–7)
head-to-head score matrix (top 3 rows, bottom 3 cols — % won by top engine)
legend: 80–100% (dominant), 70–79% (strong), 60–69% (comfortable)
Engine | vs Obsidian 15.0 | vs RubiChess | vs Komodo Dragon 3.3 | Avg vs bottom
Stockfish 18dev | 20–4 (83%) | 17.5–6.5 (73%) | 18–6 (75%) | 77.1%
Stockfish 18 | 18–6 (75%) | 19.5–4.5 (81%) | 17.5–6.5 (73%) | 76.4%
PlentyChess 7.0.37 | 17–7 (71%) | 17.5–6.5 (73%) | 18–6 (75%) | 72.9%
Bottom engine avg (% won by bottom engine) | 23.6% | 24.3% | 25.7% |
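The row and column averages can be recomputed directly from the match scores. The sketch below pools points over all games rather than averaging rounded percentages. The function names and the dictionary layout are illustrative choices, not part of the original dashboard; the scores are copied from the matrix.

```python
# Points scored by each top engine in its 24-game matches (from the matrix).
GAMES_PER_MATCH = 24

SCORES = {
    "Stockfish 18dev": {"Obsidian 15.0": 20.0, "RubiChess": 17.5, "Komodo Dragon 3.3": 18.0},
    "Stockfish 18": {"Obsidian 15.0": 18.0, "RubiChess": 19.5, "Komodo Dragon 3.3": 17.5},
    "PlentyChess 7.0.37": {"Obsidian 15.0": 17.0, "RubiChess": 17.5, "Komodo Dragon 3.3": 18.0},
}


def top_engine_average(engine: str) -> float:
    """Percentage of points a top engine scored across its bottom-3 matches."""
    row = SCORES[engine]
    return 100.0 * sum(row.values()) / (GAMES_PER_MATCH * len(row))


def bottom_engine_average(opponent: str) -> float:
    """Percentage of points a bottom engine scored against all top engines."""
    total = GAMES_PER_MATCH * len(SCORES)
    conceded = sum(row[opponent] for row in SCORES.values())
    return 100.0 * (total - conceded) / total


def overall_average() -> float:
    """Percentage of points scored by the top 3 over every matchup game."""
    points = sum(sum(row.values()) for row in SCORES.values())
    games = GAMES_PER_MATCH * sum(len(row) for row in SCORES.values())
    return 100.0 * points / games


if __name__ == "__main__":
    for engine in SCORES:
        print(f"{engine}: {top_engine_average(engine):.1f}%")
    for opponent in ("Obsidian 15.0", "RubiChess", "Komodo Dragon 3.3"):
        print(f"{opponent}: {bottom_engine_average(opponent):.1f}%")
    print(f"overall: {overall_average():.1f}%")
```

Pooling points avoids the small drift that appears when already-rounded percentages are averaged.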
win rate bars — each top engine vs all opponents
vs Obsidian 15.0: 83.3%
vs Komodo Dragon 3.3: 75.0%
vs RubiChess: 72.9%
vs Pawnocchio: 72.9%
vs Caissa 1.22: 72.9%
vs Alexandria 8.0: 72.9%
vs Berserk 13.0: 66.7%
vs Caissa 1.25: 68.8%
vs Obsidian 16.0: 66.7%
vs Alexandria 9.0: 77.1%
vs Alexandria 8.1.12: 62.5%
vs PlentyChess 6.0: 66.7%
vs Stockfish 15.1: 62.5%
vs PlentyChess 7.0.0: 66.7%
vs Stockfish 16.1: 54.2%
bottom-3 resistance — how well each held up
points scored out of 24 (vs SF18dev / vs SF18 / vs PlentyChess)
Obsidian 15.0: 4 / 6 / 7
RubiChess: 6.5 / 4.5 / 6.5
Komodo Dragon 3.3: 6 / 6.5 / 6
key insights
anomaly: SF18 beats SF18dev vs RubiChess
Stockfish 18 scored 19.5–4.5 (81%) against RubiChess, while 18dev managed only 17.5–6.5 (73%). This is the only one of the three bottom-engine matchups where the "older" version outscores the dev build. A 2-point swing over 24 games is small, so it is more likely the result of ordinary match variance, opening selection included, than of a genuine strength difference.
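To gauge how small a 2-point difference over 24 games is, one can express it in standard errors. This is a rough sketch under loudly stated assumptions: games are treated as independent, and the per-game score standard deviation is set to 0.5, the theoretical maximum for a score between 0 and 1. Neither assumption comes from the tournament data, and the function names are hypothetical.

```python
import math


def score_std_error(games: int, per_game_std: float = 0.5) -> float:
    """Standard error of a match score fraction over independent games.

    per_game_std=0.5 is an assumed upper bound; frequent draws lower it.
    """
    return per_game_std / math.sqrt(games)


def gap_in_std_errors(points_a: float, points_b: float, games: int,
                      per_game_std: float = 0.5) -> float:
    """Difference between two match scores, in units of its standard error."""
    diff = (points_a - points_b) / games
    # Two independent matches: variances add, so the SE grows by sqrt(2).
    se_diff = math.sqrt(2.0) * score_std_error(games, per_game_std)
    return diff / se_diff


if __name__ == "__main__":
    # Stockfish 18 (19.5) vs Stockfish 18dev (17.5) against RubiChess.
    print(f"{gap_in_std_errors(19.5, 17.5, 24):.2f} standard errors")
```

With the default upper bound, the gap comes out at well under one standard error. A lower per-game deviation, as expected when many games are drawn, makes the gap look somewhat larger, so the `per_game_std` parameter is worth varying before drawing conclusions.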
surprise: Komodo Dragon is the most resistant bottom engine
Across all three top engines, Komodo Dragon 3.3 scored the most on average: 25.7% (18.5 of 72 points), versus 23.6% for Obsidian 15.0 and 24.3% for RubiChess. Despite ranking last overall, it holds up slightly better against elite opponents, though the margin is only 1 to 1.5 points over 72 games. Its Elo of 2767 still reflects decades of development depth.
gap: PlentyChess's one weak spot is Berserk 13.0
PlentyChess 7.0.37 dominates nearly everyone in the bottom half but scores only 47.9% (11.5–12.5) against Berserk 13.0, a narrow loss to an engine ranked 18th. This is PlentyChess's clearest vulnerability in the tournament. It may indicate that Berserk's style creates specific problems for it, although a single 24-game match cannot confirm that.
scale: The 20–4 result far exceeds the Elo expectation
SF18dev vs Obsidian 15.0 (20–4) means Obsidian scored just 4 of 24 points, exactly one in six. An Elo gap of ~115 points predicts roughly 66% for the stronger engine; the actual 83% is far above that and corresponds to a gap of about 280 points. This may mean Obsidian 15 is particularly poorly suited to SF18dev's style, though 24 games is a small sample.
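The comparison between an Elo gap and a match score rests on the standard logistic Elo formula. The sketch below shows both directions: the score expected from a given gap, and the gap implied by an observed score. The function names are illustrative; the formula itself is the conventional one with a 400-point scale.

```python
import math


def expected_score(elo_gap: float) -> float:
    """Expected score fraction for the stronger side, given its Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


def implied_elo_gap(score: float) -> float:
    """Elo advantage that would make `score` the expected result (0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    print(f"expected at +115 Elo: {expected_score(115):.3f}")      # about 0.66
    print(f"gap implied by 20/24: {implied_elo_gap(20 / 24):.0f}")  # about 280
```

The same inverse function applies to any row of the matrix, for example to turn PlentyChess's 11.5–12.5 against Berserk 13.0 into an implied rating difference.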