Showing posts with label probability. Show all posts

Sunday, January 02, 2011

Probability of winning an NFL game - recalculated after thirty years

It's NFL playoff time again. I am in the process of redoing my playoff football model to more accurately reflect the probability of a given team to win a game based on the spread or the Sagarin rating difference.

Stern wrote a paper called "On the Probability of Winning a Football Game" (1991) in which he collected the final scores and the spreads from the 1981, 1983, and 1984 seasons to determine the relationship between the two. He found that the favorite's margin of victory minus the spread could be modeled with a normal distribution with a standard deviation of 13.86. The average was 0.07, which is effectively zero for the purposes of the analysis. The probability that a team will win a given game is then the cumulative normal distribution around the spread with a standard deviation of 13.86, or normsdist(spread/13.86) using Excel functions.
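For readers without Excel, the same calculation can be sketched in Python. This is only an illustration of the formula above: the names `normal_cdf` and `win_probability` are made up for this example, and the standard normal CDF is built from `math.erf` to match what normsdist computes.

```python
import math

STERN_SD = 13.86  # standard deviation of (margin - spread) quoted in the post


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the quantity Excel's NORMSDIST returns."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(spread: float, sd: float = STERN_SD) -> float:
    """Probability that a team favored by `spread` points wins the game.

    A negative spread means the team is the underdog.
    """
    return normal_cdf(spread / sd)


if __name__ == "__main__":
    for spread in (0, 3, 7, 14):
        print(f"favored by {spread:>2} points: {win_probability(spread):.3f}")
```

A spread of zero gives exactly 0.5, and the probabilities for a favorite and for the matching underdog (spread and -spread) add up to 1, as they should for a symmetric distribution.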

I wondered if the analysis had changed in 30 years so I pulled the data for this year through week 16. The plot is below:

The standard deviation is 13.92 with an average of -0.17. That is hardly any difference from the earlier analysis, for a lot of work to extract the data and get it into a usable format, but at least we now know it hasn't changed. The game simulation for the playoff fantasy football will use the normsdist function with the spread replaced by the Sagarin difference (home + home advantage - away) divided by 13.92.
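A rough sketch of how the Sagarin version could feed a simple simulation is shown here. The ratings and the home advantage in the example are invented placeholders, not actual Sagarin numbers, and the function names are hypothetical. Only the 13.92 standard deviation and the (home + home advantage - away) difference come from the post.

```python
import math
import random

SD_2010 = 13.92  # standard deviation measured in the post for this season


def home_win_probability(home_rating: float, away_rating: float,
                         home_advantage: float, sd: float = SD_2010) -> float:
    """Normal-CDF win probability for the home team from a rating difference."""
    diff = home_rating + home_advantage - away_rating
    return 0.5 * (1.0 + math.erf(diff / (sd * math.sqrt(2.0))))


def simulate_home_win_rate(home_rating: float, away_rating: float,
                           home_advantage: float, n_games: int = 10_000,
                           seed: int = 0) -> float:
    """Fraction of simulated games won by the home team."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    p = home_win_probability(home_rating, away_rating, home_advantage)
    wins = sum(rng.random() < p for _ in range(n_games))
    return wins / n_games


if __name__ == "__main__":
    # Placeholder ratings and home advantage, for illustration only.
    p = home_win_probability(27.0, 21.0, 2.5)
    rate = simulate_home_win_rate(27.0, 21.0, 2.5)
    print(f"model probability {p:.3f}, simulated win rate {rate:.3f}")
```

Each simulated game is a single weighted coin flip, so the simulated win rate should settle near the model probability as the number of games grows.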

Tuesday, June 15, 2010

How much of a coincidence is it? The Walt Disney World couple

Boing Boing points out the coincidence of a woman spotting her husband's father and her husband as a toddler in her old family pictures, which record the fact that they were at Walt Disney World at the same time as children, long before they ever met and then married. The future wife is with the Mr. Smee character in the foreground, and the future husband's dad is clearly visible in the center of the background, pushing a stroller with the future husband in it.

It's a wonderful synchronicity that foreshadows that they were meant to be from long before they ever met, or is it? What is the probability of such an occurrence?

I think it breaks into two problems:

Problem one: Given that you were at Disney World in your youth on a particular day, and you took a picture, what is the probability that someone you know was there on the same day and in the picture?

Problem two: On the other hand, we heard about this on the news, so we could state it another way and ask: how often would we expect to hear about someone having a picture from their youth at Walt Disney World that captured someone they know now but didn't know then?

Here are two comments on BoingBoing trying to figure out this probability (one, two). I think both possibilities above are more common than you might think, and that the second one is very likely. I have given up trying to formulate the problem clearly, though, because I can't quite wrap my brain around the probabilities: some of them are dependent. What do you think? I have provided some data below to help:


Some data that might be helpful:

According to Park World the attendance at the Magic Kingdom was 17 million in 2007. Other estimates of milestones reached by Walt Disney World are at Disney By The Numbers. For instance the 600 millionth visitor entered on June 24, 1998. The Orlando convention and visitors bureau estimated about 3 million international visitors in 2009 (from here). Assume they all go to Disney World and are part of the 17 million total mentioned.


Kodak estimates that approximately 4 percent of all the amateur photographs taken in the United States are snapped at Walt Disney World Resort or Disneyland.

Looking at the picture itself shows about 15 people in it.

Photos printed per year (from here).

Rolls developed per year. Within the amateur market, 710 million rolls of film were developed in 1995. Total rolls were down slightly from the 1994 figure of 716 million, but higher than the 1993 total of 694 million. (from here). Assuming 24 photos per roll, that comes to about 17 billion pictures. I also assume these are US figures, which the report implies.

"In fact, in 1960 newborn babies and young children were the object of 55 percent of the 2.2 billion photos taken that year." (from here)

There are over 2700 photographs taken every second around the world, adding up to well over 80 billion new images a year taken on over 3 billion rolls of film, according to estimates published by the United States Department of Commerce. (from here via there) See also this chart of the number of photos taken/year.
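As a quick sanity check on the figures above, here is a small sketch of the arithmetic. It only multiplies out numbers quoted in this post. Applying the 4 percent Kodak share to the 1995 roll count is an assumption made for illustration, since the two figures come from different sources and the share covers Disneyland as well as Walt Disney World.

```python
ROLLS_1995 = 710_000_000      # amateur rolls developed in 1995 (quoted above)
PHOTOS_PER_ROLL = 24          # assumption stated in the post
DISNEY_SHARE = 0.04           # Kodak estimate, both Disney resorts combined
PHOTOS_PER_SECOND = 2700      # worldwide estimate quoted above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def us_photos_per_year() -> int:
    """Photos implied by the roll count at 24 exposures per roll."""
    return ROLLS_1995 * PHOTOS_PER_ROLL


def disney_photos_per_year() -> float:
    """Photos per year at Disney parks if the 4% share applies (assumption)."""
    return DISNEY_SHARE * us_photos_per_year()


def world_photos_per_year() -> int:
    """Photos per year implied by the per-second worldwide rate."""
    return PHOTOS_PER_SECOND * SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"US photos per year:     {us_photos_per_year():,}")
    print(f"Disney photos per year: {disney_photos_per_year():,.0f}")
    print(f"World photos per year:  {world_photos_per_year():,}")
```

The roll count gives about 17 billion US photos, the 4 percent share would put several hundred million of those at the Disney parks, and the per-second rate works out to roughly 85 billion photos a year worldwide, in line with the "well over 80 billion" quoted above.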

(via BoingBoing, via The Disney Blog, via WXII TV, photo from WXII TV)

Wednesday, March 19, 2008

March Madness for 2008 - reviewing the links

This week I have gotten a lot of search traffic based on search terms like "march madness statistics" and "march madness stats". Welcome to you all. In the past I have analyzed earlier NCAA college basketball tournaments to try to improve my March Madness picks. I still haven't won in the past several years, but at least I know that I had statistically good picks. To help you make your picks this year I suggest the following past posts:


  • For the Final Four and the Championship, only certain seeds have ever made it that far. These frequency charts might help you ensure that your picks are not outrageously different from past history.

  • Our March Madness pool gives points based on the seed of the team. If the team you picked wins, then the points you get are the seed multiplied by the factor for the round (1, 2, 4, 8, 16, 32 for the Round of 64, Round of 32, Sweet 16, Elite 8, Final Four and the Championship). I analyzed the data from 1985 to 2005 to find the maximum points a perfect picker could have earned each year, to help see whether my picks were optimistic or pessimistic.

  • The final useful detail is that though there will always be upsets, you can win a pool by picking which teams will pull them off. I use the Sagarin ratings to get some idea of which teams are seeded correctly and which are over- or underestimated. It is surprising how often the seeds don't follow the ranks or the ratings. Also remember that on any given day any team can beat any other team, no matter the rating.

All of the posts above are chock-full of statistical analysis, charts and data. I have even offered some tentative advice. Good luck with your pool, but hurry: the tournament starts tomorrow. Perhaps this year I will add two more years of data and finally analyze the Sweet 16 and Elite 8. The data is in a spreadsheet just calling to me.


Sunday, March 18, 2007

Can't get enough statistical analysis of March Madness upset picks

I decided to go back through the data I have accumulated from NCAA tournaments from 1985 to 2005 in order to see just how many points one could achieve by picking the winners with 100% accuracy. The point total scheme to be used is based on the sum of the seeds multiplied by a Round factor:

points = sum of winning seeds of 1st round * 1
+ sum of winning seeds of 2nd round * 2
+ sum of winning seeds of Sweet 16 Round * 4
+ sum of winning seeds of Elite 8 Round * 8
+ sum of winning seeds of Final Four Round * 16
+ seed of the Champion * 32
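The scoring formula above can be sketched as a short Python function. The names `ROUND_FACTORS` and `bracket_points` are made up for this illustration, and the example bracket is an all-favorites ("chalk") bracket in which the better seed wins every game, not data from any real tournament.

```python
ROUND_FACTORS = (1, 2, 4, 8, 16, 32)  # Round 1 through the Championship


def bracket_points(winning_seeds_by_round):
    """Total points: the sum of winning seeds in each round times its factor.

    `winning_seeds_by_round` holds six lists of seeds, one per round,
    ordered from Round 1 to the Championship.
    """
    if len(winning_seeds_by_round) != len(ROUND_FACTORS):
        raise ValueError("expected one list of winning seeds per round")
    return sum(factor * sum(seeds)
               for factor, seeds in zip(ROUND_FACTORS, winning_seeds_by_round))


REGIONS = 4

# Hypothetical chalk bracket: the better seed wins every single game.
CHALK = [
    list(range(1, 9)) * REGIONS,  # Round 1 winners: seeds 1-8 in each region
    list(range(1, 5)) * REGIONS,  # Round 2 winners: seeds 1-4 in each region
    [1, 2] * REGIONS,             # Sweet 16 winners: seeds 1-2 in each region
    [1] * REGIONS,                # Elite 8 winners: the four #1 seeds
    [1, 1],                       # Final Four winners: two #1 seeds
    [1],                          # Champion: a #1 seed
]

if __name__ == "__main__":
    print("chalk bracket points:", bracket_points(CHALK))  # prints 368
```

The chalk bracket is the floor for a perfect bracket under this scheme. Every upset replaces a low seed number with a higher one, so it can only raise the total, and it raises it most in the heavily weighted later rounds.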

As mentioned in earlier analyses, this scheme encourages picking upsets since you get correspondingly more points. The big problem remains: which upsets to pick, and did I pick the right number of them? The following chart shows 21 years of data and the point totals expected for each.

This chart lets us answer some questions about which round is the most important for points and whether the tournament is the same from year to year or very different. Most years have around 185 points for Round 1 and 150 points for Round 2, with the variability in points increasing in later rounds.

The highest point total year was 1985 (909 points), when #8 seed Villanova won the entire tournament and the #8 seed multiplied through every round. 1988 (799 points) had #6 seed Kansas win the entire tournament, with a similar effect. The lowest point year, 1993 (only 486 points), had the opposite effect, with almost no upsets in the entire tournament and three #1 seeds and a #2 seed in the Final Four.

The point totals are high in 2000 for a different reason. In 2000 (789 points) two #8 seeds, North Carolina and Wisconsin, made it to the Final Four. It is apparent, and maybe obvious, that years with upsets have correspondingly higher potential for points under the upset scoring format, but these points show up more in later rounds than in Rounds 1 and 2.

The next chart shows the distribution of the points for the past 21 years. Half of the totals are less than 632 and 90% are less than 789. Once a bracket is filled out for a particular year, you can calculate its potential points assuming that all of the picks are right. It seems to me that this potential point total should reflect the distribution of past tournaments.

Thus, potential point totals of 600 to 650 reflect the most common point totals from the past. A bracket that falls outside the ranges above is statistically less likely. This type of analysis should allow you to check a bracket once it is complete to ensure that you haven't chosen too wildly or too conservatively. My potential point total this year was around 450, so I now feel that it was too conservative.
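The bracket check described above could be sketched as follows. The four thresholds are the figures quoted in this post (lowest year 486, median 632, 90th percentile 789, highest year 909). The function name and the wording of the labels are invented for this example.

```python
LOWEST_TOTAL = 486   # 1993, the lowest year in the 1985-2005 data
MEDIAN_TOTAL = 632   # half of the yearly totals fall below this
P90_TOTAL = 789      # 90% of the yearly totals fall below this
HIGHEST_TOTAL = 909  # 1985, the highest year in the data


def assess_potential_total(points: int) -> str:
    """Place a bracket's potential point total within the historical range."""
    if points < LOWEST_TOTAL:
        return "below every year in the data"
    if points < MEDIAN_TOTAL:
        return "below the median year"
    if points < P90_TOTAL:
        return "between the median and the 90th percentile"
    if points <= HIGHEST_TOTAL:
        return "in the top 10% of years"
    return "above every year in the data"


if __name__ == "__main__":
    for total in (450, 620, 700, 850, 1000):
        print(total, "->", assess_potential_total(total))
```

A total of around 450 lands below even the lowest year on record, which matches the feeling that such a bracket was too conservative.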

The next analysis needs to get at the difficult question of exactly which upsets to pick, and how to perform the above analysis given that no one gets all of the picks correct. What is the right way to make picks so that I get the bounce in points in the later rounds?