The analysts
The three AI teams
Three AI teams compete against three benchmarks on the same matches with the same data and the same scoring. Each team has a name, a roster of underlying models, and a twin version with one ingredient removed, so we can see what that ingredient is really worth.
Solo reads everything we have on a match: the form, the lineups, the rivalries, the weather, even notes from its own past predictions. One model, one call, one prediction. No second opinion, no chain of specialists. Just discipline and synthesis. The risk is over-confidence on heavy favorites. The strength is speed and a clear single voice.
- 1. Claude Opus 4.7: single call · the full match dossier
Pipeline runs a sequence of dedicated steps. A statistician reads the numbers. An operations reader weighs what the numbers miss. A voice editor reconciles both views into one prediction. It is methodical and slower, and it keeps notes on its own misses and reads them before the next match. When its confidence drops below 60 percent, it hands the decision to Council, and every hand-off is logged.
- 1. Claude Opus 4.7: statistician · quantitative read
- 2. Claude Opus 4.7: operations reader · what the numbers miss
- 3. Claude Opus 4.7: voice editor · final synthesis
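The hand-off rule can be sketched in a few lines. This is a minimal illustration, not the project's actual code: only the 60 percent threshold comes from the text. The `Prediction` type, the `decide` function, the log format, and the idea that "confidence" means the probability of the most likely outcome are all assumptions made for the example.

```python
from dataclasses import dataclass

HANDOFF_THRESHOLD = 0.60  # from the text: below 60 percent, Pipeline defers to Council


@dataclass
class Prediction:
    """Hypothetical three-way match prediction (probabilities sum to 1)."""
    home: float
    draw: float
    away: float

    @property
    def confidence(self) -> float:
        # Assumption: confidence is the probability of the most likely outcome.
        return max(self.home, self.draw, self.away)


def decide(match_id, pipeline_pred, council, handoff_log):
    """Return Pipeline's prediction, or Council's if Pipeline is not confident enough.

    `council` is any callable that takes a match id and returns a Prediction.
    Every hand-off is appended to `handoff_log`.
    """
    if pipeline_pred.confidence < HANDOFF_THRESHOLD:
        handoff_log.append(
            {"match": match_id, "pipeline_confidence": pipeline_pred.confidence}
        )
        return council(match_id)
    return pipeline_pred
```

Keeping the log as a plain list of records makes it easy to count afterwards how often Pipeline deferred and on which matches.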
Council convenes three different model families for each match. They argue. They sometimes disagree sharply. A synthesizer reconciles the three views, weighs them by how far apart the opinions sit, and ships a single prediction with the disagreement itself recorded as a risk factor. Best at catching things the other setups miss. Most expensive to run.
- 1. Claude Opus 4.7: council member · the structural reader
- 2. GPT-5.4: council member · the contrarian
- 3. Gemini 3.5 Flash: council member · the historian
- 4. Claude Opus 4.7: synthesizer · final call
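One way to picture the synthesis step is as a weighted blend plus a disagreement score. The text does not say how the weighting works, so everything below is an assumption chosen for illustration: each view is a (home, draw, away) probability triple, distance is total variation distance, members far from the group average get less weight, and the risk factor is the average pairwise distance. In the real system the synthesizer is a model, not a formula.

```python
from itertools import combinations


def tv_distance(p, q):
    """Total variation distance between two probability vectors (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def synthesize(views):
    """Blend council views into one prediction and score their disagreement.

    views: list of (home, draw, away) probability triples, one per council member.
    Returns (blended_triple, disagreement), where disagreement is in [0, 1].
    """
    n = len(views)
    mean = [sum(v[i] for v in views) / n for i in range(3)]

    # Assumed weighting: the further a view sits from the average, the less it counts.
    weights = [1.0 / (1.0 + tv_distance(v, mean)) for v in views]
    total = sum(weights)
    blended = [sum(w * v[i] for w, v in zip(weights, views)) / total for i in range(3)]

    # Assumed risk factor: mean distance over all pairs of council members.
    pairs = list(combinations(views, 2))
    disagreement = sum(tv_distance(p, q) for p, q in pairs) / len(pairs)
    return blended, disagreement
```

Because the weights are normalized, the blended triple still sums to 1, and a council in full agreement yields a disagreement score of zero.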
The three benchmarks
The benchmarks are not competing teams. They are the measuring sticks we hold the AI teams against.
The first benchmark is a pure math rating built from four years of results, like a chess ranking, plus a home-field bump. No AI involved.
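For readers who have not met chess-style ratings, here is a minimal sketch of the standard Elo formulas with a home-field bump added to the home side. The text only says the rating works "like a chess ranking", so the Elo form itself is an assumption, and the constants `HOME_BUMP` and `K` are hypothetical values picked for the example, not the ones this benchmark uses. Elo also gives an expected score rather than separate win, draw, and loss percentages; how the benchmark handles draws is not stated.

```python
HOME_BUMP = 100.0  # hypothetical home-field bonus, in rating points
K = 20.0           # hypothetical update speed


def expected_home_score(home_rating, away_rating, home_bump=HOME_BUMP):
    """Standard Elo expected score for the home side (between 0 and 1)."""
    diff = (home_rating + home_bump) - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def update(home_rating, away_rating, home_result, k=K, home_bump=HOME_BUMP):
    """Return new (home, away) ratings after a match.

    home_result: 1.0 for a home win, 0.5 for a draw, 0.0 for a home loss.
    """
    expected = expected_home_score(home_rating, away_rating, home_bump)
    delta = k * (home_result - expected)
    # Whatever the home side gains, the away side loses.
    return home_rating + delta, away_rating - delta
```

Two evenly rated teams with no home bump each get an expected score of 0.5, and a win then moves the winner up by half of `K`.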
The second benchmark is the betting odds at kickoff, turned into percentages. The AIs never see them. The hardest score to beat.
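Turning odds into percentages is simple arithmetic. The sketch below assumes decimal odds (a price of 2.50 pays 2.50 per unit staked) and assumes the bookmaker's margin is removed by rescaling so the three outcomes sum to 100 percent. The text does not say which odds format or which margin adjustment is used, so treat both as illustrative choices.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for (home, draw, away) into probabilities summing to 1.

    The raw inverse odds usually add up to a little more than 1 because of the
    bookmaker's margin; dividing by their sum removes that excess proportionally.
    """
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)
    return [r / overround for r in raw]
```

For example, odds of 2.0, 4.0, and 4.0 carry no margin and map directly to 50, 25, and 25 percent.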
The third benchmark is the human participant, Mo. Mo predicts every match on instinct alone, before kickoff, without looking at any data. The open question: does human instinct hold up against machines and math over 104 matches?