1 · The model in one sentence
Every World Cup 2026 match probability on Onside is the output of an opponent-rating logistic model: each team gets a deterministic score from four signals — FIFA world ranking, Premier League squad footprint, host advantage and confederation strength — and the win/draw/loss split for a fixture comes from the difference between the two ratings, calibrated so the draw rate matches the historical group-stage baseline (~24%).
2 · The four input signals
FIFA world ranking (weight 2.8) is the dominant signal. We map the rank to a 0–1 score with a linear transform, where rank #1 = 1.0 and rank #100 ≈ 0.0; ranks are clamped to the 1–100 range, so any team ranked below 100 scores the same as rank #100. Premier League footprint (weight 0.10) counts the confirmed Premier League players in each tournament squad, normalised against England's 22, the dataset maximum. This term is intentionally small: it functions as a tiebreaker, not a primary signal. Host advantage (weight 0.30) is binary: the three hosts (Mexico, USA, Canada) get +0.30 on home soil and every other team gets 0. Confederation strength (weight 0.30) is a 0–1 score drawn from historical World Cup performance: UEFA 0.95, CONMEBOL 0.92, CAF 0.55, CONCACAF 0.52, AFC 0.48, OFC 0.30. We reduced this weight from 0.6 to 0.3 on 2026-06-03 because FIFA rank already encodes most confederation strength, and double-counting was inflating UEFA-vs-rest matchups.
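The four signals can be combined in a few lines. The following is a minimal sketch, not our production code: the weights and confederation scores come from the text above, while the Team class, the function names and the exact rank transform (101 − rank) / 100 are illustrative assumptions chosen so that rank #1 gives 1.0 and rank #100 gives roughly 0.0.

```python
from dataclasses import dataclass

# Weights and confederation scores as stated in the text.
W_RANK, W_PL, W_HOST, W_CONFED = 2.8, 0.10, 0.30, 0.30
CONFED_SCORE = {
    "UEFA": 0.95, "CONMEBOL": 0.92, "CAF": 0.55,
    "CONCACAF": 0.52, "AFC": 0.48, "OFC": 0.30,
}
PL_MAX = 22  # England's Premier League player count, the dataset maximum


@dataclass(frozen=True)
class Team:  # hypothetical container, for illustration only
    name: str
    fifa_rank: int
    pl_players: int
    confederation: str
    host_at_home: bool = False


def rank_score(rank: int) -> float:
    """Linear 0-1 score. ASSUMPTION: (101 - rank) / 100 after clamping to [1, 100]."""
    clamped = max(1, min(100, rank))
    return (101 - clamped) / 100


def rating(team: Team) -> float:
    """R = 2.8*rs + 0.10*pl + 0.30*host + 0.30*confed."""
    pl = min(team.pl_players / PL_MAX, 1.0)
    host = 1.0 if team.host_at_home else 0.0
    return (W_RANK * rank_score(team.fifa_rank)
            + W_PL * pl
            + W_HOST * host
            + W_CONFED * CONFED_SCORE[team.confederation])
```

For a made-up UEFA side ranked 10th with 11 Premier League players and no host bonus, rating(Team("Team A", 10, 11, "UEFA")) works out to 2.8·0.91 + 0.10·0.5 + 0 + 0.30·0.95 = 2.883. The rank term contributes about 88% of that total, which is why the other three signals behave as adjustments rather than drivers.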
3 · From rating to probability
Once each team has a rating R, we compute the rating delta Δ = R_home − R_away. Two sigmoids, one per side, are each applied to Δ taken from that side's perspective and shifted by +0.55; each returns that side's "wins or draws" probability. Because the two probabilities add up to more than 1, the excess (the overlap) is interpreted as the draw probability, and each side's outright win probability is its "wins or draws" value minus the draw. The +0.55 shift is calibrated so a perfectly even matchup yields roughly 38% home / 24% draw / 38% away, matching long-run historical group-stage outcome rates. The final probability vector is renormalised to sum to exactly 100% and rounded to whole percentages for display.
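The two-sigmoid step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain logistic function with unit scale, the function names are ours for this example, and display rounding is left out. With these assumptions an even matchup comes out near 37% / 27% / 37%, close to but not identical to the figures quoted above, so the production calibration presumably includes a scaling detail not described here.

```python
import math

SHIFT = 0.55  # shift stated in the text


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def match_probabilities(r_home: float, r_away: float,
                        shift: float = SHIFT) -> tuple[float, float, float]:
    """Return (home win, draw, away win) as fractions that sum to 1."""
    delta = r_home - r_away
    home_or_draw = sigmoid(delta + shift)    # home side wins or draws
    away_or_draw = sigmoid(-delta + shift)   # away side wins or draws
    draw = home_or_draw + away_or_draw - 1.0  # overlap of the two events
    home = home_or_draw - draw
    away = away_or_draw - draw
    total = home + draw + away               # guard against float drift
    return home / total, draw / total, away / total
```

Calling match_probabilities(3.1, 2.6) gives a three-way split that favours the higher-rated side. The draw share is largest when Δ = 0 and shrinks as the ratings move apart, which is the behaviour the overlap construction is meant to produce.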
4 · The Monte Carlo simulator
For the tournament-wide champion probabilities, we run 10,000 simulations of the full bracket. Each simulation samples every fixture independently from the match-probability model, except for matches that have already been played, which use the real result with certainty. Group standings tiebreak on points → goal difference → goals scored → a seeded pseudo-random tiebreaker, so a rerun with the same seed reproduces the same standings. The top two from each group plus the eight best third-placed teams advance to the round of 32 (R32). Knockout pairings re-randomise per round (a "bracket-fair" abstraction until FIFA's exact seeding plan is fixed). For each team, the number of runs in which it reaches a given stage, divided by 10,000, gives its R16/QF/SF/Final/Champion probability.
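To make the sampling loop concrete, here is a minimal sketch for a single group rather than the full bracket. Everything in it is illustrative: the function names, the probs_for callback and the seed are assumptions, and the goal-difference and goals-scored tiebreakers are omitted because this page does not describe how scorelines are generated. Ties on points go straight to the seeded pseudo-random tiebreaker.

```python
import itertools
import random
from collections import Counter
from typing import Callable

Probs = tuple[float, float, float]  # (home win, draw, away win)


def simulate_group(teams: list[str],
                   probs_for: Callable[[str, str], Probs],
                   rng: random.Random) -> list[str]:
    """Play one round-robin and return teams ordered best to worst."""
    points = {t: 0 for t in teams}
    for home, away in itertools.combinations(teams, 2):
        p_home, p_draw, _ = probs_for(home, away)
        u = rng.random()
        if u < p_home:
            points[home] += 3
        elif u < p_home + p_draw:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
    # Seeded tiebreaker: reproducible for a given seed (GD / goals omitted).
    tiebreak = {t: rng.random() for t in teams}
    return sorted(teams, key=lambda t: (points[t], tiebreak[t]), reverse=True)


def top_two_rates(teams: list[str],
                  probs_for: Callable[[str, str], Probs],
                  runs: int = 10_000, seed: int = 2026) -> dict[str, float]:
    """Share of runs in which each team finishes in the top two."""
    rng = random.Random(seed)
    counts: Counter[str] = Counter()
    for _ in range(runs):
        for team in simulate_group(teams, probs_for, rng)[:2]:
            counts[team] += 1
    return {t: counts[t] / runs for t in teams}
```

The same count-and-divide pattern extends to later rounds: record the furthest stage each team reaches in every run, then divide the per-stage counts by the number of runs.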
5 · Live result integration
The simulator and the predictions index both pull live results from football-data.org through a client with a 4-second timeout, wrapped in Redis caching. Every completed group fixture is locked in as a certain prior; remaining group fixtures and all knockouts are sampled from the model. As the tournament progresses, fewer fixtures remain to be sampled and the simulator sharpens: by the semi-final stage only the semi-finals and the final are sampled, so the champion probabilities are spread over just four teams. The simulator page is cached with hourly ISR, which balances freshness against compute cost.
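Locking in real results amounts to a lookup before sampling. The sketch below shows the idea under simple assumptions: fixtures are keyed by a string ID, outcomes are coded "H", "D" or "A", and known_results stands in for whatever the cached results feed returns. None of these names come from our codebase or from the football-data.org API.

```python
import random
from typing import Mapping, Sequence

OUTCOMES = ["H", "D", "A"]  # home win, draw, away win


def resolve_fixture(fixture_id: str,
                    probs: Sequence[float],
                    known_results: Mapping[str, str],
                    rng: random.Random) -> str:
    """Use the real result if the match has been played, else sample one."""
    locked = known_results.get(fixture_id)
    if locked is not None:
        return locked  # completed fixture: certain, never resampled
    return rng.choices(OUTCOMES, weights=probs, k=1)[0]


def outcome_rates(fixture_id: str,
                  probs: Sequence[float],
                  known_results: Mapping[str, str],
                  runs: int = 10_000, seed: int = 2026) -> dict[str, float]:
    """Empirical outcome frequencies for one fixture across many runs."""
    rng = random.Random(seed)
    counts = {o: 0 for o in OUTCOMES}
    for _ in range(runs):
        counts[resolve_fixture(fixture_id, probs, known_results, rng)] += 1
    return {o: counts[o] / runs for o in OUTCOMES}
```

A played fixture returns the same outcome in every run, so it contributes no variance. This is the mechanism behind the sharpening described above: each locked result removes one source of randomness from every simulated tournament.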
6 · What the model deliberately ignores
We do not currently model injury news after squad submission, individual key-player absences, managers' tactical changes, or in-tournament form. We do not adjust for late substitutions, weather, or stadium effects beyond the binary host bonus. The model is calibrated for the long run, not for any individual match: over the 72 group fixtures we expect the model's favourite to be the right call roughly 60–70% of the time (our favourite-correct target), but variance on any single game is high. Treat the percentages as probabilities, not predictions.
7 · How to verify our numbers
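Every per-fixture page shows the exact inputs we used (FIFA rank, PL stars, i.e. the Premier League player count, confederation, host status). Plug them into the rating function R = 2.8·rs + 0.10·pl + 0.30·host + 0.30·confed, where rs is the rank score, pl the normalised Premier League footprint, host the 0/1 host flag and confed the confederation score. Then run the two teams' ratings through the two-sigmoid step in section 3 and you'll arrive at the same probability vector. The /predictions page surfaces a live "model accuracy" chip once matches start playing: a running score of how often the model's favourite actually won, shown against the same score for our prior model. We will publish a full post-tournament reconciliation comparing predicted vs actual outcomes.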
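If you want to reproduce the accuracy figure yourself, the favourite-correct rate can be computed along these lines. This is a sketch of one reasonable definition, not necessarily the exact one behind the chip: the favourite is taken to be the side with the higher win probability, a draw counts as a miss, and fixtures with no favourite are skipped. The function name and input format are assumptions for this example.

```python
from typing import Iterable, Optional

# Each item: ((home %, draw %, away %), actual result "H" / "D" / "A")
Fixture = tuple[tuple[float, float, float], str]


def favourite_correct_rate(fixtures: Iterable[Fixture]) -> Optional[float]:
    """Share of fixtures in which the model's favourite actually won."""
    hits = 0
    total = 0
    for (p_home, _p_draw, p_away), result in fixtures:
        if p_home == p_away:
            continue  # no favourite to score (ASSUMPTION: skipped)
        favourite = "H" if p_home > p_away else "A"
        total += 1
        if result == favourite:
            hits += 1  # a draw or an upset counts as a miss
    return hits / total if total else None
```

Feeding in the displayed percentages and final results for every completed fixture gives a number directly comparable with the 60–70% target in section 6.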