2019 Algorithm Postmortem
Sept 1, 2019 21:40:23 GMT -8
Post by Rockies GM (Dan) on Sept 1, 2019 21:40:23 GMT -8
This was a crazy year, one in which, as many are rightfully pointing out, the program missed its mark on quite a few teams.
So, my question becomes - is it a matter of the simulation being faulty or a matter of the stat accumulation from projections being less than accurate?
I think I have a definitive answer here.
Without unveiling the project, I have been working on a way to simulate the season by throwing even more chaos and randomness into the equation. Up until this point, I have been simulating games and seasons using historical correlations between a stat's Z-Score relative to the league and the winning percentage in that statistic. Before midway through this year, in fact, it had merely been the ranking and/or numerical value of a stat. Adding the Z-Score increased accuracy, but still left things a bit TOO rigid, as I'll show below. That being said, it still produced a good correlation between its predictions and outcomes.
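For anyone who wants to see what "Z-Score of a stat in relation to the league" means in practice, here is a minimal sketch. It is not the actual program: the function name `z_scores` and every team total other than the Rockies' 215 HRs are made up for illustration, and it uses the population standard deviation, which is an assumption on my part.

```python
from statistics import mean, pstdev

def z_scores(league_values):
    """Return each team's Z-score for one stat relative to the league.

    league_values maps team name -> season total for a single stat.
    """
    values = list(league_values.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # Every team is identical in this stat, so nobody is above or below average.
        return {team: 0.0 for team in league_values}
    return {team: (v - mu) / sigma for team, v in league_values.items()}

# Hypothetical home-run totals; only the Rockies' 215 comes from the post.
hr_totals = {"Rockies": 215, "Team B": 190, "Team C": 240, "Team D": 175}
hr_z = z_scores(hr_totals)

for team, z in sorted(hr_z.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {z:+.2f}")
```

A positive Z-score means the team is above the league average in that stat, and the size tells you by how many standard deviations.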
Now, I have a way of actually simulating statistics in each game. I've found historical correlations between a stat's season average and its standard deviation over the season. It's a way to account for the fact that, while the Rockies have 215 HRs, some weeks they'll only hit 5 and others 10. While doing that, I found a multi-variable equation that predicts OPS from Batting Average and HRs with a strong correlation. WHIP is trickier, but I found a good correlation between WHIP and the Z-Scores (compared to the season average for the team in question) of Strikeouts and ERA. It stands to reason that, if you have a better than average week in both strikeouts and ERA, you will probably have a better than average WHIP too. It's definitely one of the weakest correlations I found, but it's there.
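To illustrate the week-to-week idea, here is a rough sketch of drawing weekly HR totals around a season average. The post does not say which distribution the real program uses, so the normal draw is an assumption, as are the function name `simulate_weekly_totals`, the 22-week season length, and the standard deviation of 2.5. Only the 215 HR total comes from above.

```python
import random

def simulate_weekly_totals(season_total, weeks, weekly_sd, rng):
    """Draw one simulated total per week around the season's weekly average.

    Assumes weekly totals are roughly normal; draws are rounded to whole
    numbers and floored at zero, since a team cannot hit negative HRs.
    """
    weekly_mean = season_total / weeks
    return [max(0, round(rng.gauss(weekly_mean, weekly_sd))) for _ in range(weeks)]

# Seeded generator so the same "season" can be replayed.
rng = random.Random(2019)

# 215 HRs is from the post; 22 weeks and an SD of 2.5 are hypothetical.
weekly_hr = simulate_weekly_totals(215, 22, 2.5, rng)
print(weekly_hr)
print("simulated season total:", sum(weekly_hr))
```

Run it with different seeds and the same team produces a different spread of good and bad weeks, which is the chaos the rigid Z-Score version was missing.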
So, I input the final results of this year's accumulated statistics and ran the entire season, starting from scratch. From there, you can simply look at the R-squared value to see how well the simulated results correlate with the actual ones. Finally, in a perfect simulation, the slope of the trendline would be 1, with a Y-Intercept of 0.
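If you want to check numbers like these yourself, the trendline and R-squared come from an ordinary least-squares fit of actual results against predicted results. This is a generic sketch, not the program's own code; the function name `fit_line` and the winning percentages below are made up for illustration.

```python
from statistics import mean

def fit_line(predicted, actual):
    """Least-squares fit of actual = slope * predicted + intercept.

    Returns (slope, intercept, r_squared). A perfect simulation would
    give slope 1, intercept 0 and R-squared 1.
    """
    x_bar, y_bar = mean(predicted), mean(actual)
    sxx = sum((x - x_bar) ** 2 for x in predicted)
    syy = sum((y - y_bar) ** 2 for y in actual)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predicted, actual))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Hypothetical predicted vs. actual winning percentages.
predicted = [0.40, 0.45, 0.50, 0.55, 0.60]
actual = [0.41, 0.44, 0.52, 0.54, 0.61]

slope, intercept, r_squared = fit_line(predicted, actual)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

The closer the slope is to 1 and the intercept to 0, the less the predictions systematically overshoot or undershoot, while R-squared tells you how tightly the points hug the line.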
Let's take a look:
Extremely promising, but I can do better:
Look at that beauty. R-Squared, Slope and Intercept about as good as it gets in a predictive program like this.
----------------------------------
So, what does this prove?
While the Z-Score predictions were actually still fairly accurate, needless to say, I'll be going with the full simulation algorithm from here on out.
Moreover, what it proves is that the fault does not lie in the simulation itself, but in the accumulation of statistics, especially preseason.
There will never be such a thing as a perfect preseason prediction. There will always be teams overrated and underrated at the beginning of the year. It's the beauty of not knowing how a baseball season will unfold. Guys you thought would have amazing years flop, and bargain basement FAs no one expects to do anything can win you a league. All I have to go off of are the projections published for public consumption. But I'll do my best to examine my methodology and try to improve the accuracy of assessing statistical accumulation.
-----------------------------------
Tl;dr
The algorithm is actually very accurate for a predictive program. Statistical accumulation and the accuracy of projections were the downfall this year (and of those, I only have control of one).