Wednesday, March 25, 2009

March Madness statistical analysis does NOT guarantee good pool performance

After all of the statistical analysis of 24 years of the NCAA Basketball tournament, why is my pool doing so poorly? The pool type favors upsets, and the analysis says to pick upsets, but it doesn't say which ones to pick. To win, the bracket must contain the correct upsets. Mine, however, does not.

Potential reasons for this:
  • I don't know anything about college basketball. I admit to this as a first principle.
  • My technique picked every upset beyond a certain threshold of difference in the Sagarin ratings. This is probably too aggressive: it put two 6 seeds in the Final Four, which I should have corrected because that outcome is too unlikely.
  • The average number of upsets says nothing about how much that number fluctuates from year to year. Picking the average number means betting on one particular outcome out of the many different outcomes over the years. In a histogram, the average value is not necessarily the most likely value.
For fluctuations vs. average it is instructive to look at the outcome of the round of 64. Over 24 years of tournaments the graph below (reproduced from an earlier post) shows the fraction of times a given matchup resulted in either the expected outcome or an upset where the lower seed wins.
As an example, 55 out of 96 historical matchups between 8 and 9 seed teams resulted in upsets where the 9 seed won. That is more than half the time. It does not mean that there are two upsets every year; it means that, averaged over all the years, roughly half of the outcomes are upsets. This has implications for what we can expect each year.

Each year in the round of 64 a given matchup occurs 4 times, once for each region. A different presentation of the round of 64 data from the past 24 years shows that, for each given matchup, different years have different numbers of upsets. The possible range is from no upsets to four upsets. The chart tallies the number of years with each particular number of upsets for each matchup. Comparing this variability data to the outcome chart shows that while 9 seeds beat 8 seeds more than half the time, every possibility from no upsets to four upsets has occurred in some year. In fact, in only 9 of the 24 years (~38%) have there been exactly two upsets. It is the most common value, but still more likely to be wrong than right.
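One rough way to check the 8-versus-9 numbers is to treat the four games in a year as independent coin flips that each come up "upset" with probability 55/96. That independence is my assumption, not something the data proves, and the function name below is just something I made up for this sketch. Under that assumption the number of upsets in a year follows a binomial distribution:

```python
from math import comb

def upset_count_distribution(p_upset, games=4):
    """Probability of k upsets in `games` games, for k = 0..games.

    Assumes (hypothetically) that the games are independent and that
    each one is an upset with the same probability p_upset.
    """
    return [
        comb(games, k) * p_upset**k * (1 - p_upset) ** (games - k)
        for k in range(games + 1)
    ]

# Historical 8-versus-9 upset rate quoted in the post: 55 upsets in 96 games.
p = 55 / 96
dist = upset_count_distribution(p)

# Average number of upsets per year versus the single most likely count.
mean_upsets = sum(k * pk for k, pk in enumerate(dist))
most_likely = max(range(len(dist)), key=dist.__getitem__)

for k, pk in enumerate(dist):
    print(f"{k} upsets: {pk:.1%}")
print(f"average upsets per year: {mean_upsets:.2f}")
print(f"most likely count: {most_likely}")
```

With these inputs the most likely count is two upsets, but it comes up only about 36% of the time, which is in the same neighborhood as the 9 years out of 24 seen historically. The average works out to about 2.3 upsets per year, a number of upsets that can never actually happen in a single year.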

I am not sure how to present a similar analysis for the later rounds, since the number of opportunities is determined by the outcome of the round before. It is just as important in those rounds to realize that the average outcome over 24 years is not the same as the most likely outcome from those 24 years.

Perhaps a more systematic process that seeks to maximize the number of points even in the face of these uncertainties is needed.
