Top-3 vs Bottom-3 Deep Analysis

overall gap — top 3 vs bottom 3

avg top-3 score vs bottom

75.5%

across 216 matchup games

avg Elo gap (top vs bottom)

~105 pts

live Elo difference

best single matchup

83.3%

SF18dev vs Obsidian 15

closest bottom-3 result

70.8%

PlentyChess vs Obsidian 15.0

head-to-head score matrix (top 3 as rows, bottom 3 as columns; each cell is the top engine's score over 24 games)

color bands: 80–100% dominant, 70–79% strong, 60–69% comfortable

Engine	vs Obsidian 15.0	vs RubiChess	vs Komodo Dragon 3.3	Avg vs bottom
Stockfish 18dev	20–4 (83%)	17.5–6.5 (73%)	18–6 (75%)	77.1%
Stockfish 18	18–6 (75%)	19.5–4.5 (81%)	17.5–6.5 (73%)	76.4%
PlentyChess 7.0.37	17–7 (71%)	17.5–6.5 (73%)	18–6 (75%)	72.9%
Bottom engine avg score	23.6%	24.3%	25.7%
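
The row and column averages can be recomputed from the raw points in the matrix. The sketch below is one way to do that, assuming every matchup is 24 games and pooling points rather than averaging the rounded percentages (which can shift a figure by a tenth of a point or so). The variable and function names are illustrative only.

```python
# Top engine's points per 24-game matchup, copied from the matrix above.
GAMES_PER_MATCH = 24
scores = {
    "Stockfish 18dev":    {"Obsidian 15.0": 20.0, "RubiChess": 17.5, "Komodo Dragon 3.3": 18.0},
    "Stockfish 18":       {"Obsidian 15.0": 18.0, "RubiChess": 19.5, "Komodo Dragon 3.3": 17.5},
    "PlentyChess 7.0.37": {"Obsidian 15.0": 17.0, "RubiChess": 17.5, "Komodo Dragon 3.3": 18.0},
}

def pct(points, games):
    """Score as a percentage of available points, rounded to one decimal."""
    return round(100.0 * points / games, 1)

# Row averages: each top engine's pooled score against the three bottom engines.
row_avg = {
    top: pct(sum(row.values()), GAMES_PER_MATCH * len(row))
    for top, row in scores.items()
}

# Column averages: share of points each bottom engine took from the three top engines.
bottom_engines = list(next(iter(scores.values())))
col_avg = {
    bottom: pct(
        sum(GAMES_PER_MATCH - row[bottom] for row in scores.values()),
        GAMES_PER_MATCH * len(scores),
    )
    for bottom in bottom_engines
}

# Overall top-3 score across all nine matchups.
total_games = GAMES_PER_MATCH * len(scores) * len(bottom_engines)
overall = pct(sum(sum(row.values()) for row in scores.values()), total_games)

print(row_avg)
print(col_avg)
print(overall, "over", total_games, "games")
```

Pooling points and averaging per-matchup percentages give the same answer here only because every matchup has the same number of games.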

win rate bars — each top engine vs all opponents

vs Obsidian 15.0	83.3%
vs Komodo Dragon 3.3	75.0%
vs RubiChess	72.9%
vs Pawnocchio	72.9%
vs Caissa 1.22	72.9%
vs Alexandria 8.0	72.9%
vs Berserk 13.0	66.7%
vs Caissa 1.25	68.8%
vs Obsidian 16.0	66.7%
vs Alexandria 9.0	77.1%
vs Alexandria 8.1.12	62.5%
vs PlentyChess 6.0	66.7%
vs Stockfish 15.1	62.5%
vs PlentyChess 7.0.0	66.7%
vs Stockfish 16.1	54.2%

bottom-3 resistance — how well each held up

[chart: each bottom-3 engine's score vs SF18dev, vs SF18, and vs PlentyChess]

key insights

anomaly: SF18 beats SF18dev vs RubiChess

Stockfish 18 scored 19.5–4.5 (81%) against RubiChess, while 18dev managed only 17.5–6.5 (73%). This is the only bottom-3 matchup where the "older" version outperforms the dev build. A 2-point swing over 24 games is small, so it more likely reflects opening and sampling variance than a genuine strength difference.
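
As a rough sanity check on the variance argument, the sketch below bounds the standard error of a 24-game score. It assumes independent games and uses the worst-case per-game variance p(1 - p), which holds when there are no draws; draws lower the true variance, so the bound is conservative. The function name is hypothetical, and engine matches that reuse paired openings are not strictly independent.

```python
import math

def score_std_error(score_fraction, games):
    """Upper bound on the standard error of a match score fraction.

    Per-game results lie in [0, 1], so their variance is at most p * (1 - p),
    where p is the mean score. Draws make the real variance smaller.
    """
    return math.sqrt(score_fraction * (1.0 - score_fraction) / games)

GAMES = 24
sf18 = 19.5 / GAMES      # Stockfish 18 vs RubiChess
sf18dev = 17.5 / GAMES   # Stockfish 18dev vs RubiChess

diff = sf18 - sf18dev
# Standard error of the difference between two independent match scores.
se_diff = math.sqrt(score_std_error(sf18, GAMES) ** 2 + score_std_error(sf18dev, GAMES) ** 2)
z = diff / se_diff

print(f"difference = {diff:.3f}, std error <= {se_diff:.3f}, z >= {z:.2f}")
```

Under this worst-case assumption the 8-point gap is about 0.7 standard errors. With many draws the true standard error is smaller, so this is only a lower bound on how unusual the gap is, not proof that it is noise.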

surprise: Komodo Dragon is the most resistant bottom engine

Across all three top engines, Komodo Dragon took the most points of any bottom-3 engine: 18.5 of 72 (25.7%), versus 17.5 (24.3%) for RubiChess and 17 (23.6%) for Obsidian 15.0. Despite ranking last overall, it holds up comparatively well against elite opponents. Its Elo of 2767 still reflects well over a decade of development.

gap: PlentyChess's one weak spot is Berserk 13.0

PlentyChess 7.0.37 dominates nearly everyone in the bottom half but scores only 47.9% (11.5–12.5) against Berserk 13.0, a narrowly losing result against an engine ranked 18th. This is PlentyChess's clearest vulnerability in the tournament and may indicate that Berserk's style creates specific problems for it, although a single 24-game match is limited evidence.

scale: The 20–4 result is far more lopsided than the ratings predict

SF18dev vs Obsidian 15.0 (20–4) means Obsidian took only 4 of 24 points, one point in every six games. In a 24-game series, this reflects not just superiority but near-total positional and tactical dominance. Under the standard Elo model, a gap of ~115 points predicts roughly 66% for the stronger side; the actual 83% is well above that expectation, which may mean Obsidian 15 is particularly poorly suited to SF18dev's style, though 24 games is a small sample.
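
The comparison between predicted and actual score can be reproduced with the standard logistic Elo formula. The sketch below assumes that formula with the usual 400-point scale, takes the ~115-point gap quoted above as given, and uses illustrative function names.

```python
import math

def expected_score(elo_gap):
    """Expected score fraction for the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

def implied_elo_gap(score_fraction):
    """Elo gap at which score_fraction would be the expected result (inverse of the above)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

predicted = expected_score(115)          # gap quoted in the text
performance = implied_elo_gap(20 / 24)   # 20-4 over 24 games

print(f"expected score at +115 Elo: {predicted:.1%}")
print(f"Elo gap implied by 20-4: {performance:.0f}")
```

The inverse function shows the same result from the other side: a 20–4 score corresponds to a performance gap of roughly 280 Elo, well over twice the quoted rating difference.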