Developing the simulation required the following steps and assumptions.
1.) Collect the data for each player or teams games for this season. Turnovers and touchdowns for DEF (and special teams); field goals and extra points for the kickers; passing and rushing touchdowns for the quarterback, wide receivers, tight ends; rushing touchdowns for the running backs. I will randomly select from this history to generate simulations of the Superbowl.
2.) Assume a player's performance in the Superbowl will be identical to their performance in one of the games they played in this season. If a player didn't play they get a zero for that game, except for the kickers for which I only have partial season data. This may decrease the points slightly, and is potentially a bad assumption.
3.) Kickers get their own field goals, but only get extra points equal to the touchdowns their team scores (actually all the rushing and defense touchdowns, but only the QB passing TD's to avoid double counting). Typically a game with field goals has less touchdowns, so decoupling the game history so that a game with a lot of touchdowns for the QB could be paired with a game with a lot of field goals for the kicker could result in a higher than expected points. Possible another poor assumption.
4.) Everybody (RB and QB) gets their rushing touchdowns, but passing touchdowns are awarded only if the quarterback throws at least one. There are instances in the game history of the QB's not throwing any. I really should assign each passing touchdown to a WR, TE, or RB or player not on the list but that is to complicated to program in excel. This may result in excess points, and is an expedient assumption.
5.) The score of the game is the field goals, rushing RD's, and only the quarterback's passing TD's to avoid double counting, and the defense/special teams touchdowns. This is slightly inaccurate since the passing touchdowns for the receivers are not all counted or double counted. The simulation still generates widely varying scores.
6.) Simulate many games by bootstrapping (selecting TD's or outcomes from each particular player's history this season. Add the points for each player to the rosters that have the players on them.
7.) Used the RANK() function to determine the places. Ties get the same rank using this function and the next ranks down are eliminated. For instance 3 first places get rank 1 and the next rank is 4. Rank is important to determine who is "in the money".
8.) As to the money, it is a fraction of the total collected from all of the rosters: 70% for first place, and 30% for second place. However, a tie for first divides the total money (100%) and there is no second, a tie for second with only one first divides the second place money, 30%, among the second place tied rosters. To be in the money RDK1 needs to be alone in first, tie first, or be alone or tied for second with only one first place roster ahead.
The rosters above show RDK1 roster in fifth place, but with enough similarities to other rosters both ahead and behind it that winning money in the pool will require some fine threading of the outcomes.
Remember that the focus of this simulation is to answer the question "With only the Superbowl to go, will the RDK1 roster be in the money at the end of the playoffs?" and "What combination of player results does RDK1 need to be in the money and what is the chance that such an outcome will occur?"
The histogram above (click for larger) shows the rank of the two top RDK rosters, RDK1 and RDK6 after the outcome of 20,000 simulations. The first red bar highlights the fraction of simulations with RDK1 roster in first place and in the money (alone or tied) at 1.4% of 5000 simulations. The green bar highlights the fraction of simulations with RDK1 roster in second place (alone or tied, with no first place tie) and in the money at 16.3% of 5000 simulations. RDK1 is in the money in about 18% of the 5000 simulations. RDK1 starts in fifth place before the Superbowl and can climb to first or slip to 13th place according to the simulations. There was some hope that RDK6 might have the potential to be in the money but from its starting point at 13th place, it never rises above 3rd place and can slip to 30th in the simulations.
Another way to look at this data is go ahead and calculate the winnings for each outcome.
This chart shows that the most likely outcome, 80%, is that RDK1 has no winnings, but the rest of the bars which add up to about 20% are various outcomes with winnings for the RDK1 roster.
This chart expands the Y axis to zoom in on the lower probability outcomes. There is a 10% chance of being alone in second place, a 4% chance of tieing second. There is even a less than 0.2% chance of being alone in first place. The less likely outcomes include situations in which I am tied with several others, up to 5 others, for first or, up to 7 others, for second. I need about 10% of the total collected to break even for the six rosters I entered.
Of course simulation generates outcomes for all of the rosters, otherwise I couldn't perform the comparisons needed to determine what place I am in or whether RDK1 roster will earn money. A less self-centered data reporting approach yields information about all of the outcomes.
The chart above (definitely click for larger) shows the histogram of the frequencies of the final rank after the Superbowl (simulated) of the top twenty rosters as they stand now(actual) before the Superbowl. The top twenty was chosen as a cutoff because it contains the lowest ranked roster that could win money in the simulations. The legend has the roster in their current ranking order (Cara H in 1st through RDK3 in 20th place). Bruschi Drink 3 ends most of the 1000 simulations in first with Tim G5 ending most of the 1000 simulations in second. there is a small but significant fraction of RDK 1 results in second place as we showed earlier. The chart will reward closer examination for the interested.
The information above can be used to determine the fraction of simulations (in this case, 5000) in which any given roster will be "in the money". The chart above shows that Bruschi Drink 3 is more than 80% likely to win some money followed by Tim G 5a at 42%. Almost a third of the time, Cara H in first place is likely to end up with some money. More annoying is that Bruschi Drink 4, a roster currently tied for 20th place, has a small but finite chance of being in the money. The results above do not total to 100% because more than one roster can be in the money (not just 1 and 2 but multiple rosters tieing for first, or one first place with multiple 2nds).
A compilation of the actual outcomes of each of 20,000 simulations can show the most likely particular outcome instead of the probabilistic compilations further above. The outcomes above compile the rosters in first or second place. Recall that in the case of a first place tie there is no second place.
As suggested by the charts further above, but shown directly in this one, the most likely first and second outcome at 22% is that Bruschi Drink 3 will be first with Tim G 5 second. The next most likely is heartening because it has Bruschi Drink 3 in first with RDK1 in second. Even so, these top twenty outcomes represent only 82% of the outcomes generated in 20,000 simulations. There are highly unlikely but predicted outcomes of all sorts, including some interesting ones with 6 tied in first place, or a first place with 8 tied for second, both only 1 time out of 20,000.
The RDK1 roster appears in these outcomes usually as a second place winner in the 2nd, 14th, 15th ,and 18th most likely outcome. You need to go down to the 17th most likely outcome to see RDK1 in first place, though it is tied with the ever successful Bruschi Drink 3.
Thus my final prediction is that Brschi Drink 3 will be in first place with Tim G 5 in second, though I am hoping for the 18% chance of RDK1, my own roster, being "in the money".
No comments:
Post a Comment