head-to-head score matrix (top 3 rows, bottom 3 cols — % won by top engine)
legend: 80–100% (dominant), 70–79% (strong), 60–69% (comfortable)
| Engine | vs Obsidian 15.0 | vs RubiChess | vs Komodo Dragon 3.3 | Avg vs bottom |
| --- | --- | --- | --- | --- |
| Stockfish 18dev | 20–4 (83%) | 17.5–6.5 (73%) | 18–6 (75%) | 77.1% |
| Stockfish 18 | 18–6 (75%) | 19.5–4.5 (81%) | 17.5–6.5 (73%) | 76.4% |
| PlentyChess 7.0.37 | 17–7 (71%) | 17.5–6.5 (73%) | 18–6 (75%) | 72.9% |
| Bottom engine avg share | 23.6% won | 24.3% won | 25.7% won | |
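The percentages in the matrix follow from the game scores alone. As a minimal sketch of that arithmetic, the snippet below pools each engine's points over its 72 games (3 pairings of 24). The names `SCORES`, `share`, `top_average`, and `bottom_share` are illustrative and not part of any tournament tooling, and pooling points (rather than averaging rounded percentages) is an assumption about how the averages are meant to be computed.

```python
# Scores copied from the matrix: (points for the top engine, points for the bottom engine).
SCORES = {
    "Stockfish 18dev": {
        "Obsidian 15.0": (20, 4),
        "RubiChess": (17.5, 6.5),
        "Komodo Dragon 3.3": (18, 6),
    },
    "Stockfish 18": {
        "Obsidian 15.0": (18, 6),
        "RubiChess": (19.5, 4.5),
        "Komodo Dragon 3.3": (17.5, 6.5),
    },
    "PlentyChess 7.0.37": {
        "Obsidian 15.0": (17, 7),
        "RubiChess": (17.5, 6.5),
        "Komodo Dragon 3.3": (18, 6),
    },
}


def share(points, conceded):
    """Percentage of the available points that were won."""
    return 100 * points / (points + conceded)


def top_average(engine):
    """Pooled share of a top engine across all three bottom opponents."""
    won = sum(w for w, _ in SCORES[engine].values())
    lost = sum(l for _, l in SCORES[engine].values())
    return share(won, lost)


def bottom_share(opponent):
    """Pooled share a bottom engine took from all three top engines."""
    won = sum(row[opponent][1] for row in SCORES.values())
    lost = sum(row[opponent][0] for row in SCORES.values())
    return share(won, lost)


for engine in SCORES:
    print(f"{engine}: {top_average(engine):.1f}% vs bottom three")
for opponent in SCORES["Stockfish 18dev"]:
    print(f"{opponent}: {bottom_share(opponent):.1f}% won vs top three")
```

Because every pairing has the same number of games, the pooled share equals the plain mean of the three unrounded percentages.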
bottom-3 resistance: how well each held up (chart; series: vs SF18dev, vs SF18, vs PlentyChess)
key insights
anomaly: SF18 beats SF18dev vs RubiChess
Stockfish 18 scored 19.5–4.5 (81%) against RubiChess, while 18dev only managed 17.5–6.5 (73%). This is the only matchup where the "older" version outperforms the dev build — likely a result of opening book variance over 24 games rather than a genuine strength difference.
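One rough way to check the variance explanation is to compare the gap between the two results with its standard error. The sketch below assumes the 24 games in each match are independent, which paired openings may violate. The win/draw/loss split is not given, so it brackets the per-game variance between the no-draw case and the wins-and-draws-only case. The function names are illustrative.

```python
import math

GAMES = 24  # games per pairing, as stated in the text


def variance_bounds(p):
    """(min, max) per-game variance for a mean score p >= 0.5 with outcomes 0, 0.5, 1.

    The maximum, p * (1 - p), is the no-draw case; the minimum,
    (p - 0.5) * (1 - p), is the case with only wins and draws.
    """
    return (p - 0.5) * (1 - p), p * (1 - p)


def gap_in_standard_errors(p_a, p_b, games=GAMES):
    """Range of gap / SE(gap) for two independent matches of equal length."""
    gap = abs(p_a - p_b)
    min_a, max_a = variance_bounds(p_a)
    min_b, max_b = variance_bounds(p_b)
    se_small = math.sqrt((min_a + min_b) / games)
    se_large = math.sqrt((max_a + max_b) / games)
    # Dividing by the larger SE gives the lower end of the range.
    return gap / se_large, gap / se_small


# SF18 scored 19.5/24 and SF18dev 17.5/24 against RubiChess.
low, high = gap_in_standard_errors(19.5 / 24, 17.5 / 24)
print(f"gap is between {low:.2f} and {high:.2f} standard errors")
```

Under these assumptions the gap stays well under two standard errors whatever the draw rate, which is consistent with reading it as noise.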
surprise: Komodo Dragon is the most resistant bottom engine
Across all three top engines, Komodo Dragon conceded the fewest points of the bottom three: it took 18.5 of 72 points (25.7%), against 17.5 (24.3%) for RubiChess and 17 (23.6%) for Obsidian. Despite ranking last overall, it holds up comparatively well against elite opponents. Its Elo of 2767 still reflects well over a decade of development.
gap: PlentyChess's one weak spot: Berserk 13.0
PlentyChess 7.0.37 dominates nearly everyone in the bottom half, but scores only 47.9% (11.5–12.5) against Berserk 13.0 — a near-even result against an engine ranked 18th. This is PlentyChess's clearest vulnerability in the tournament and suggests Berserk's style creates specific problems for it.
scale: The 20–4 result is historically brutal
SF18dev vs Obsidian 15.0 (20–4) means Obsidian scored just one point in every six. Over a 24-game series, this reflects not just superiority but near-total dominance. An Elo gap of ~115 points predicts roughly 66% for the stronger side; the actual 83% corresponds to a gap of about 280 points, far above expectation. That may indicate Obsidian 15 is particularly poorly suited to SF18dev's style, although 24 games is a small sample.
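The expectation quoted above comes from the standard logistic Elo formula, E = 1 / (1 + 10^(-d/400)), where d is the rating gap. The sketch below applies it in both directions: from a gap to an expected score, and from an observed score back to the gap it would imply. The function names are illustrative, and the ~115-point gap is taken from the text rather than from a rating list.

```python
import math


def expected_score(elo_gap):
    """Expected score fraction for the stronger side, given its rating lead."""
    return 1 / (1 + 10 ** (-elo_gap / 400))


def implied_elo_gap(score_fraction):
    """Rating gap that would make `score_fraction` the expected score.

    Only defined for fractions strictly between 0 and 1.
    """
    return 400 * math.log10(score_fraction / (1 - score_fraction))


predicted = expected_score(115)       # expectation for the stated ~115-point gap
observed = 20 / 24                    # SF18dev's actual share against Obsidian 15.0
implied = implied_elo_gap(observed)   # gap that an 83% score would correspond to

print(f"predicted {predicted:.1%}, observed {observed:.1%}, implied gap {implied:.0f} Elo")
```

The implied gap is a point estimate from 24 games, so it carries wide uncertainty and should not be read as a rating measurement.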