Wednesday, March 22, 2006

Probability Analysis of March Madness Upsets - or what I should have picked.

The fun of participating in a sports pool isn't just the chance of winning; it is the thrill of the mathematical analysis. It may be apocryphal that all statisticians are gamblers, but gambling does give them plenty of good examples for their lectures.

It is March, so the sports pool in question must be March Madness, the NCAA men's basketball tournament pool. The one I am participating in has an extra wrinkle. Normally a pool gives 1 point for each correct pick in the first round (round of 64), 2 points in the round of 32, and so on up to 32 points for picking the winner of the championship. The pool I am in is different because it rewards upsets. The pool points are the round points times the seed of the winner of the matchup. Thus a #1 seed yields 1 point but a #8 seed yields 8 points, each multiplied by the points for the round (1, 2, 4, 8, 16 or 32). This setup really changes how you pick. Without a reward for upsets, you wouldn't bother picking one. Here the seed multiplier tempts you to pick the upset because it could be worth many more points (for instance, #8 beating #1 or #15 beating #2).
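The scoring rule is simple enough to write down directly. Here is a minimal sketch in Python, assuming rounds are numbered 0 (round of 64) through 5 (championship); the names ROUND_POINTS and pick_points are mine, not anything the pool uses.

```python
# Points for each round, from the round of 64 through the championship.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]


def pick_points(round_index: int, winner_seed: int) -> int:
    """Points for one correct pick: round points times the winner's seed.

    round_index is an assumed numbering: 0 = round of 64, ..., 5 = championship.
    """
    if not 0 <= round_index < len(ROUND_POINTS):
        raise ValueError("round_index must be between 0 and 5")
    if not 1 <= winner_seed <= 16:
        raise ValueError("winner_seed must be between 1 and 16")
    return ROUND_POINTS[round_index] * winner_seed


if __name__ == "__main__":
    print(pick_points(0, 1))   # a #1 seed winning in the round of 64
    print(pick_points(0, 8))   # a #8 seed winning in the round of 64
    print(pick_points(1, 15))  # a #15 seed winning in the round of 32
```

A correctly picked #15 seed in the round of 32 is worth 2 x 15 = 30 points, nearly as much as a #1 seed winning the championship (32 x 1 = 32), which is why the upsets are so tempting.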

So in a pool that rewards upsets, how should you pick your winners? Like any good statistician, I resort to the historical data. 1985 was the first year in which 64 teams played in the tournament, so we will restrict our analysis to tournaments from 1985 onward. The data come from Wikipedia and shrpsports.

Just how often do upsets occur in the first round of 64 teams? The chart below shows the upset probability for each of the matchups, compiled from 672 first-round matchups over 21 years (since 1985). An upset is a game won by the worse-seeded team, that is, the one with the higher seed number. The chart shows that a #16 team has never beaten a #1 team, while upsets happen more than half the time in #8 vs #9 matchups, so pick those #9's. Even #12 beats #5 almost one out of every three tries. If you want the points, you had better pick some upsets in the first round.
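How high does an upset probability need to be before the underdog is the better pick? A rough sketch, under the assumption that each game is considered in isolation (ignoring whether your pick survives into later rounds): if the upset happens with probability p, picking the underdog is worth p times the underdog's seed on average, and picking the favorite is worth (1 - p) times the favorite's seed. The round multiplier is the same either way and cancels. The function name below is my own.

```python
from fractions import Fraction


def break_even_upset_probability(favorite_seed: int, underdog_seed: int) -> Fraction:
    """Upset probability at which both picks have equal expected points.

    Solves p * underdog_seed == (1 - p) * favorite_seed for p, which gives
    p = favorite_seed / (favorite_seed + underdog_seed).
    Assumes a single game in isolation; later rounds are ignored.
    """
    return Fraction(favorite_seed, favorite_seed + underdog_seed)


if __name__ == "__main__":
    for favorite, underdog in [(1, 16), (2, 15), (5, 12), (8, 9)]:
        p = break_even_upset_probability(favorite, underdog)
        print(f"#{underdog} over #{favorite}: "
              f"pick the upset if p > {p} (about {float(p):.3f})")
```

For #12 vs #5 the break-even is 5/17, a little under 30%, and for #9 vs #8 it is 8/17, a little under one half. Compare those thresholds with the historical upset rates before deciding.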

The round of 32 is slightly more difficult to analyze because there is correspondingly less data, only 336 matchups over 21 years, and because which matchups occur depends on the results of the first round and the way the brackets are structured. Still, we can draw some conclusions. Surprisingly, #10's beat #2's almost half the time, and #12's are almost as successful against #4's. The data reveal that a significant number of upsets occur in the round of 32 as well.
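The upset rates for both rounds come down to a simple tally per seed matchup. Here is a sketch of that tally, assuming each game is recorded as a (winner seed, loser seed) pair; that record format and the function name are my assumptions, and the sample list is made up purely to exercise the code. It is not real tournament data.

```python
from collections import defaultdict


def upset_rates(games):
    """Return {(favorite_seed, underdog_seed): fraction won by the underdog}.

    games is an iterable of (winner_seed, loser_seed) pairs. A larger seed
    number is the worse seed, so an upset is a game whose winner has the
    larger seed number.
    """
    played = defaultdict(int)
    upsets = defaultdict(int)
    for winner_seed, loser_seed in games:
        matchup = (min(winner_seed, loser_seed), max(winner_seed, loser_seed))
        played[matchup] += 1
        if winner_seed > loser_seed:
            upsets[matchup] += 1
    return {matchup: upsets[matchup] / played[matchup] for matchup in played}


# Invented placeholder results for illustration only, NOT historical data.
sample_games = [(1, 16), (1, 16), (9, 8), (8, 9),
                (12, 5), (5, 12), (5, 12), (2, 10), (10, 2)]

if __name__ == "__main__":
    for matchup, rate in sorted(upset_rates(sample_games).items()):
        print(f"#{matchup[1]} over #{matchup[0]}: {rate:.2f}")
```

Keeping one count per matchup also makes it obvious how thin the round of 32 data is: some matchups occur only a handful of times, so their rates deserve less trust.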

Other information you might use in your bracket: the worst seed to win the tournament was a #8, and this happened once. The worst seed to make it to the Final Four was a #11, which also happened only once. Since these are highly improbable events, your bracket should avoid them.

There is one final issue with the analysis. The data certainly show that upsets will happen in the rounds of 64 and 32. On any given day any team can beat any other team. The key to winning is picking which upsets will happen, and that takes some knowledge of the teams. Right now, at the Sweet 16 round, I am not in last place in the pool, but I'm close. I also don't have much hope for further points compared to the rest of the pool. Please take that into account when taking any advice from here. Trust the data; what you do with it is up to you.
