Sunday, March 18, 2007

Can't get enough statistical analysis of March Madness upset picks

I decided to go back through the data I have accumulated from NCAA tournaments from 1985 to 2005 in order to see just how many points one could achieve by picking the winners with 100% accuracy. The point total scheme to be used is based on the sum of the seeds multiplied by a Round factor:

points = sum of winning seeds of 1st round * 1
+ sum of winning seeds of 2nd round * 2
+ sum of winning seeds of Sweet 16 Round * 4
+ sum of winning seeds of Elite 8 Round * 8
+ sum of winning seeds of Final Four Round * 16
+ seed of the Champion * 32
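
As a rough sketch, the scheme above can be written as a short Python function. The names `ROUND_MULTIPLIERS`, `bracket_points` and `chalk` are made up for illustration, and the example bracket is a hypothetical one in which the better seed wins every game; it is not data from any real tournament.

```python
# Round factors from the scheme above: Round 1, Round 2, Sweet 16,
# Elite 8, Final Four, Championship.
ROUND_MULTIPLIERS = [1, 2, 4, 8, 16, 32]


def bracket_points(winning_seeds_by_round):
    """Return the point total for a bracket.

    winning_seeds_by_round holds six lists, one per round, each listing
    the seeds of the teams that won a game in that round.
    """
    if len(winning_seeds_by_round) != len(ROUND_MULTIPLIERS):
        raise ValueError("expected one list of winning seeds per round (6 rounds)")
    return sum(
        factor * sum(seeds)
        for factor, seeds in zip(ROUND_MULTIPLIERS, winning_seeds_by_round)
    )


# Hypothetical "all favorites" bracket: the better seed wins every game
# in all four regions, and #1 seeds win out from the Elite 8 onward.
chalk = [
    [1, 2, 3, 4, 5, 6, 7, 8] * 4,  # Round 1 winners
    [1, 2, 3, 4] * 4,              # Round 2 winners
    [1, 2] * 4,                    # Sweet 16 winners
    [1] * 4,                       # Elite 8 winners
    [1] * 2,                       # Final Four winners
    [1],                           # Champion
]

print(bracket_points(chalk))  # 368
```

Under this scheme a team's seed is counted once per win, so a champion's seed is multiplied by 1 + 2 + 4 + 8 + 16 + 32 = 63 in total.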

As mentioned in earlier analyses, this scheme encourages picking upsets, since a correctly picked upset earns correspondingly more points. The big problem remains: which upsets to pick, and did I pick the right number of upsets? The following chart shows 21 years of data and the point total achievable in each year.

This chart lets us answer some questions about which round is the most important for points and whether the tournament is the same from year to year or very different. Most years have around 185 points for Round 1 and 150 points for Round 2, with the variability in points increasing in later rounds.

The highest-scoring year was 1985 (909 points), when #8 seed Villanova won the entire tournament and its seed of 8 was multiplied through every round. 1988 (799 points) had #6 seed Kansas win the entire tournament, with similar effect. The lowest-scoring year, 1993 (only 486 points), showed the opposite: almost no upsets in the entire tournament, with three #1 seeds and a #2 seed in the Final Four.

The point total is high in 2000 for a different reason. In 2000 (789 points), two #8 seeds, North Carolina and Wisconsin, made it to the Final Four. It is apparent, and maybe obvious, that years with upsets have correspondingly higher potential for points under the upset scoring format, but these points show up more in later rounds than in Rounds 1 and 2.

The next chart shows the distribution of the points for the past 21 years. Half of the totals are less than 632 and 90% are less than 789. Once you have filled out a bracket for a particular year, you can calculate its potential points, assuming that all of the picks are right. It seems to me that this potential point total should reflect the distribution of past tournaments.

Thus, potential point totals of 600 to 650 reflect the most common point totals from the past. A bracket whose potential total falls outside that range is statistically less likely to match how the tournament actually plays out. This type of analysis should allow you to check a bracket once it is complete to ensure that you haven't chosen too wildly or too conservatively. My potential point total this year was around 450, so I now feel that it was too conservative.
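
Here is one way such a check could look in Python. It is only a sketch: `fraction_below` is a made-up helper, and `quoted_totals` contains just the four yearly totals quoted in this post (1993, 2000, 1988 and 1985), not the full 21-year data set, so the fractions it returns are illustrative only.

```python
def fraction_below(total, past_totals):
    """Return the fraction of past tournament totals strictly below `total`."""
    if not past_totals:
        raise ValueError("need at least one past total to compare against")
    return sum(1 for t in past_totals if t < total) / len(past_totals)


# Only the yearly totals mentioned in the post; the real check would use
# all 21 years of data.
quoted_totals = [486, 789, 799, 909]

my_potential_total = 450  # approximate potential total of this year's bracket

print(fraction_below(my_potential_total, quoted_totals))  # 0.0
```

A result of 0.0 says the bracket's potential total is below every past total in the list, which matches the observation that 450 is lower than even the 486 points of 1993.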

The next analysis needs to get at the difficult question of exactly which upsets to pick, and how to perform the above analysis while taking into account that no one gets all of the picks correct. What is the right way to make picks that ensures I get the bounce in points in later rounds?
