About this project

What this is, in plain words.

The short version

We built three AI systems that predict football matches. During the 2026 World Cup, they forecast every single game, in public, before kickoff. They compete against the betting market, a statistical rating, and one human. Every prediction, every miss, and every dollar spent is published.

Why football, when we are an operations company

Football is the test track, not the point. A World Cup gives us 104 decisions in five weeks, each with a clear right answer and a hard deadline. That is exactly what a business decision looks like. The question we actually care about: when an AI tells you it is 70 percent sure, can you trust that number? Because companies are starting to run real decisions on numbers like that one.

The three AI teams

Solo is one model doing everything alone in a single pass. Pipeline is a chain of specialists where each step builds on the last, and it keeps a memory of its own past mistakes. Council is three different AI models that each give an opinion, which a fourth then reconciles into one call. Same data, same matches, same rules. The architecture is the only difference. That is the experiment.

The human in the race

Mo, the founder, predicts every match too. On instinct alone. No statistics, no odds, no data, just a pick and a confidence number before kickoff. Over one match, instinct can beat anything. Over 104 matches, does instinct survive against data discipline? That answer gets published whichever way it lands.

What we expect to find

Honestly: we expect nobody to beat the betting market on raw accuracy. That is not the prize. We measure whether each forecaster knows what it knows. Saying 70 percent and being right 70 percent of the time is called calibration, and for anyone putting AI into real operations, calibration is the number that decides whether you can delegate a decision or not.
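To make the idea concrete, here is a minimal sketch of a calibration check. It is an illustration only, not the project's published scoring rules: the function name `calibration_table`, the ten-bin grouping, and the sample forecasts are all assumptions made for this example.

```python
from collections import defaultdict


def calibration_table(forecasts, n_bins=10):
    """Compare stated confidence with the actual hit rate, bin by bin.

    `forecasts` is a list of (confidence, was_correct) pairs, where
    confidence is a number between 0 and 1. The binning scheme is an
    assumption for illustration, not the benchmark's locked method.
    """
    bins = defaultdict(list)
    for confidence, was_correct in forecasts:
        # Clamp so that a confidence of exactly 1.0 lands in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, was_correct))

    rows = []
    for index in sorted(bins):
        items = bins[index]
        mean_confidence = sum(c for c, _ in items) / len(items)
        hit_rate = sum(1 for _, ok in items if ok) / len(items)
        rows.append((mean_confidence, hit_rate, len(items)))
    return rows


# Made-up data: ten picks at 70 percent confidence, seven of them right.
sample = [(0.7, True)] * 7 + [(0.7, False)] * 3
for mean_confidence, hit_rate, count in calibration_table(sample):
    print(f"said {mean_confidence:.0%}, right {hit_rate:.0%}, n={count}")
```

A well-calibrated forecaster shows stated confidence and hit rate close together in every bin. A forecaster who says 70 percent but is right only half the time is overconfident, however many matches it happens to call correctly.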

Why you can trust the numbers

The rules were locked and published ahead of scoring, so we cannot move the goalposts after the fact. The models are frozen for the whole tournament. Misses are published with the same prominence as hits. Costs are shown per prediction, to the cent. And the full dataset ships with the final paper on August 1.

Who is behind it

inocta.io, an operations boutique from Toronto and Montréal that puts AI into real businesses. This benchmark is our method shown in public: measure before you trust, understand before you automate. You can't automate what you don't understand.

The eight forecasters, in one table

Character	Real name	In plain words	What we learn from it
The Soloist	Solo	One AI reads the full match dossier and makes the call alone, in a single pass.	Whether one strong model with good data is all you need.
The Purist	Solo-Zero	The same AI with the dossier taken away. It predicts from memory alone.	The gap between Solo and Solo-Zero shows what the match data is actually worth.
The Assembly Line	Pipeline	A chain of specialists: one reads the numbers, one reads everything else, one writes the final call. It keeps notes on its own past misses.	Whether splitting the work into steps, plus learning from mistakes, beats one model working alone.
The Creature of Habit	Pipeline-Static	The exact same chain with the memory of past misses switched off.	The gap between Pipeline and Pipeline-Static shows whether the learning is real or just a story.
The Council	Council	Three different AIs each give an opinion, sometimes disagreeing sharply, and a fourth merges them into one call.	Whether a debate between different AIs beats any single one of them, and whether it is worth the extra cost.
The Statistician	ELO	A pure math rating built from four years of results, like a chess ranking, plus a home-field bump. No AI.	If the AI teams cannot beat simple math, the AI is not adding anything.
The Market	Market	The betting odds at kickoff, turned into percentages. The AIs never see them.	The toughest benchmark in sports forecasting. How close anyone gets to the market is the real measure.
The Human	Mo	The founder picks every match on pure gut feel, no data, before kickoff.	Whether human instinct survives 104 matches against machines and math.
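
For readers curious how betting odds become percentages, as described in the Market row, here is one common approach, sketched under assumptions. It assumes decimal odds and strips the bookmaker's margin by simple proportional scaling. The function name `implied_probabilities` and the sample odds are hypothetical, and the benchmark's own conversion method may differ.

```python
def implied_probabilities(decimal_odds):
    """Turn decimal odds for each outcome into probabilities that sum to 1.

    Assumption for illustration: the bookmaker's margin is removed by
    scaling every outcome proportionally. Other methods exist.
    """
    if any(odds <= 1.0 for odds in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    # 1 / odds is the raw implied probability, margin still included.
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)  # slightly above 1.0 when a margin is present
    return [value / total for value in raw]


# Made-up odds for home win, draw, away win.
home, draw, away = implied_probabilities([1.8, 3.6, 4.5])
print(f"home {home:.1%}, draw {draw:.1%}, away {away:.1%}")
```

The raw values add up to a little more than 100 percent because the bookmaker builds in a margin. Dividing by the total removes that excess so the three outcomes can be compared directly with a forecaster's percentages.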
