Having developed a model last year, I simply had to enter this year's teams and Sagarin ratings, and the simulation was ready to run in last year's format. Having already built the model for simulating the playoffs and playoff rosters made this year's modifications easy, and I was able to finish them in time to influence the rosters I chose for this year's Playoff fantasy football pool. My process for (hopefully) picking the winning roster has several steps.

1.) Data collection: Collect the Sagarin data on team ranking and performance, and the data (I used Yahoo) on player and defense touchdowns, field goals, and turnovers.
2.) Playoff game simulation: Simulate the playoff games to determine which teams are likely to play the most games. Wild card teams that play four games are best; teams that play three games are also good (and likely played in the Superbowl). Determine which teams are most likely to make it to the Superbowl. The model currently runs 1000 simulations at a time.
3.) Initial Roster Selection: The model automatically calculates the points for a given roster using the information from the games played and the stats of the players selected. Rank the players and defenses by their point stats and build some rosters based on that. Also build rosters using the information on how many games a player's team plays, and randomly choose some rosters with high point values. These are the initial seeds for the "genetic" algorithm below.
4.) Optimize and find the highest point rosters: Using the rosters generated above, the spreadsheet makes combinations (or cross breeds) of the rosters (the current model pool is 33 rosters), and I keep the highest point value rosters that are generated. Sometimes I let the model use a random player in a given roster spot to ensure that I have explored all of the possibilities (random mutation); usually the selection is from the current list of high value rosters. This year the search did find higher value rosters than the initial seeds. I would sorely love to automate this step; perhaps in the next version of the model.
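The cross breed / random mutation / keep-the-best loop above can be sketched in a few lines of Python. This is a minimal illustration, not my actual spreadsheet: the roster representation, the mutation rate, and the `score` function are all hypothetical stand-ins.

```python
import random

def crossbreed(roster_a, roster_b):
    """Build a child roster by taking each slot from one of the two parents."""
    return tuple(random.choice(pair) for pair in zip(roster_a, roster_b))

def mutate(roster, candidate_pool, rate=0.1):
    """Occasionally swap a slot for a random eligible player (random mutation).
    candidate_pool[i] lists the players eligible for slot i."""
    return tuple(random.choice(candidate_pool[i]) if random.random() < rate else p
                 for i, p in enumerate(roster))

def evolve(seeds, score, candidate_pool, pool_size=33, generations=200):
    """Keep only the highest-scoring rosters, breeding new candidates
    from the current pool each generation."""
    pool = sorted(seeds, key=score, reverse=True)[:pool_size]
    for _ in range(generations):
        parent_a, parent_b = random.sample(pool, 2)
        child = mutate(crossbreed(parent_a, parent_b), candidate_pool)
        pool = sorted(pool + [child], key=score, reverse=True)[:pool_size]
    return pool[0]
```

Because the pool always retains the current best roster, the result can never score worse than the best initial seed; any improvement comes from the cross breeding and mutation steps.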
The new scheme is simple. If two teams have the same Sagarin rating, either team should be expected to win with a probability of 50%, since the ratings represent the number of points a team is expected to score in the game. If a team has no points, then it is expected to lose all of the time. Thus I propose that the probability that team1 will win is...
team1 rating / (team1 rating + team2 rating). To include the Sagarin home advantage this becomes...
(home team rating + home advantage) / (home team rating + home advantage + away team rating). For the Monte Carlo simulation, a random number between 0 and 1 that falls below the above probability means that the home team has won.
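As a sketch, the probability formula and the Monte Carlo draw look like this in Python. The home advantage value here is an assumed placeholder, not Sagarin's actual published number for this season.

```python
import random

HOME_ADVANTAGE = 3.0  # assumed value; Sagarin publishes his own each season

def home_win_probability(home_rating, away_rating, home_adv=HOME_ADVANTAGE):
    """P(home win) = (home rating + home advantage) /
                     (home rating + home advantage + away rating)."""
    home = home_rating + home_adv
    return home / (home + away_rating)

def simulate_game(home_rating, away_rating):
    """One Monte Carlo draw: the home team wins if a uniform [0, 1)
    random number falls below its win probability."""
    return random.random() < home_win_probability(home_rating, away_rating)
```

Note that this reproduces both boundary assumptions: with no home advantage, equal ratings give exactly 50%, and a zero-rated opponent loses every time.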
This preserves our earlier assumptions about evenly matched teams and a team with no points, and all of the arguments about its appropriateness come down to what happens in between, and to the validity of the ratings themselves. Sagarin suggests using the pure points ratings for predicting the outcome of games rather than his overall ratings, so that is what I used.
Recall that we are trying to determine how many games each team will play so that we can pick players or defenses that not only score points, but also have three or four games, rather than one or two, in which to score them. The best player who plays in only one game may not be the best choice. (The record-breaking, once in the history of the playoffs, Green Bay and Arizona game notwithstanding.)
I generated 100,000 simulations of the playoffs using the model above and tabulated the team matchups in the Superbowl. Click on the chart below for a larger version.
Chart of the likely matchups using the Sagarin pure points sorted by the probability of the outcome. Circles represent the median of 100 trials of 1000 simulations, diamonds bracket the 25th to 75th percentiles, crossbars the 10th and 90th, and the lines extend to the maximum and minimum variation in the results. Those matchups that are already eliminated by the wild card week of games are shaded out.
The top eight outcomes in the chart are matchups with either Indianapolis (IND) or San Diego (SD) playing Minnesota (MIN) or New Orleans (NO) in the Superbowl. The top outcome is the obvious NO beating IND in the Superbowl, while the second has them beating SD. Close examination of the first eight outcomes, out of 72 possible, shows them to be set well apart from the rest of the pack, representing almost one third of the total probability. Thus I chose rosters with players representing these matchups, by setting the model to force each particular matchup (giving high ratings to the teams in question) and then searching the rosters using the genetic algorithm method described above.
The next matchups on the chart are New England (NE) vs. New Orleans (NO), which I did have rosters supporting, but which are now eliminated because NE lost on Wild Card weekend. There are other chances for the harsh light of reality to burn away my optimistic modeling: one of the teams I didn't pick due to low probability, Dallas or Arizona, for instance, could go all the way and destroy my rosters' chances of winning.
We can check some of the predicted outcomes of the model by looking to other sources of odds or probability for teams in the Superbowl. I took the Yahoo Odds Futures sheet, collected the teams in the playoffs, and normalized the probabilities so that the total of the outcomes equals 100%, to get a list of the teams in the playoffs and the chance that each one would win the Superbowl. I did the same with my simulations.
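The normalization step is simple: the implied probabilities from a futures sheet sum to more than 100% (the bookmaker's cut), so each one is divided by the total. The numbers below are hypothetical stand-ins, not the actual Yahoo figures.

```python
# Hypothetical implied Superbowl win probabilities from a futures sheet;
# the raw values sum to more than 100% because of the bookmaker's cut.
raw = {"IND": 0.30, "NO": 0.28, "SD": 0.20, "MIN": 0.18, "DAL": 0.12, "ARZ": 0.08}

total = sum(raw.values())  # 1.16 for these made-up numbers
normalized = {team: p / total for team, p in raw.items()}
```

After normalization, the relative ordering of the teams is unchanged; only the scale is adjusted so the outcomes are directly comparable with the simulation's tabulated frequencies.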
Above is a chart (click the chart for a larger version) of the winners predicted from the Yahoo odds futures and from a simulation of 100,000 outcomes using the Sagarin pure points ratings and my new scheme for randomly simulating the winner of each matchup. Those teams that are already eliminated are shaded out. The error bars on the Yahoo odds are plus or minus one standard deviation of the six betting ratings, and the ones on the Sagarin simulation reflect the standard deviation of 100 trials of 1000 simulations.
The good news is that the top four teams are the same for the Yahoo odds futures and my simulations. The Yahoo odds favor IND as the top outcome by probability, whereas my simulations show New Orleans on top. We can redo the chart, now taking into account the results of the Wild Card week's games.
For this chart (click for a larger version) I used today's Yahoo odds futures, and I set my simulations to ensure that CIN, GB, NE and PHI lost their games (by setting their ratings to 0 if they were the away team, or to minus the home advantage if they were the home team). The Yahoo odds still favor IND, but now DAL and MIN are rising in the odds. My Sagarin-based simulation shows more changes: NO is slightly favored over the others, but the evenness of the probabilities among IND, MIN, SD and BAL, DAL, and NYJ is disconcerting, since I have rosters built on players and matchups from the first group, and not from the second. Wins by BAL, DAL, or NYJ next week are bad news for my picks. Alternatively, if ARZ keeps doing better than expected, as they already have, I will also lose the Fantasy Playoff football pool.
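The trick of forcing a Wild Card loser out of the bracket can be sketched directly from the win-probability formula: with P(home win) = (home rating + advantage) / (home rating + advantage + away rating), a zero effective rating guarantees a loss. The home advantage value below is an assumed placeholder.

```python
HOME_ADVANTAGE = 3.0  # assumed value, not Sagarin's actual figure

def force_loss(ratings, team, was_home, home_adv=HOME_ADVANTAGE):
    """Return a copy of the ratings with `team` guaranteed to lose:
    set its rating to 0 if it was the away team, or to minus the home
    advantage if it was the home team (zero effective rating either way)."""
    adjusted = dict(ratings)
    adjusted[team] = -home_adv if was_home else 0.0
    return adjusted
```

Either adjustment drives the losing team's side of the probability ratio to zero, so the rest of the bracket is simulated as usual but conditioned on the known Wild Card results.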
(Potentially next post: some analysis of the actual rosters from this year's Fantasy Playoff football pool.)