2019 Algorithm Postmortem

Rockies GM (Dan)
Trade Panel Member

Posts: 2,333

Post by Rockies GM (Dan) on Sept 1, 2019 21:40:23 GMT -8

This was a crazy year. One in which, as many are rightfully pointing out, the program missed its mark on quite a few teams.

So, my question becomes - is it a matter of the simulation being faulty or a matter of the stat accumulation from projections being less than accurate?

I think I have a definitive answer here.

Without unveiling the project, I have been working on a way to simulate the season by throwing even more chaos and randomness into the equation. Up until this point, I have been simulating games and seasons using historical correlations between the Z-Score of a stat relative to the league and the winning percentage in that statistic. Until midway through this year, in fact, it had merely used the ranking and/or numerical value of a stat. Adding the Z-Score increased accuracy, but it still left things a bit TOO rigid, as I'll show below. That being said, it still produced a good correlation between its predictions and outcomes.
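For anyone curious what the Z-Score step looks like in practice, here is a minimal sketch. It only covers scoring a stat against the league; the historical mapping from Z-Score to winning percentage is not described in this post, so it is left out. The function name `league_z_scores` and the home run totals are made up for illustration.

```python
from statistics import mean, pstdev


def league_z_scores(stat_by_team):
    """Return each team's Z-score for one stat, relative to the league.

    Hypothetical helper: Z = (team value - league mean) / league std dev.
    """
    values = list(stat_by_team.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std dev across the league
    if sigma == 0:
        # Every team is identical in this stat, so nobody stands out.
        return {team: 0.0 for team in stat_by_team}
    return {team: (value - mu) / sigma for team, value in stat_by_team.items()}


# Made-up home run totals, for illustration only.
home_runs = {"Team A": 215, "Team B": 180, "Team C": 250, "Team D": 195}
for team, z in league_z_scores(home_runs).items():
    print(f"{team}: {z:+.2f}")
```

A positive value means the team is above the league average in that stat, a negative value means below.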

Now, I have a way of actually simulating statistics in each game. I've found historical correlations between average statistics accumulated throughout the season and their standard deviation during a season. It's a way to account for the fact that, while the Rockies have 215 HRs, some weeks they'll only get 5 and others 10. While doing that, I found a multi-variable equation that estimates OPS from Batting Average and HRs with a strong correlation. WHIP is trickier, but I found a good correlation between WHIP and the Z-Scores (compared to the season average for the team in question) of Strikeouts and ERA. It stands to reason that, if you have a better than average week in both strikeouts and ERA, you will probably have a better than average WHIP. It's definitely one of the weakest correlations I found, but it's there.
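To make the week-to-week idea concrete, here is a rough sketch of drawing weekly totals around a season average. It assumes a normal distribution and takes the weekly standard deviation as an input; the actual historical relationship between a stat's average and its standard deviation is not given in this post, so `weekly_sd` and the 25-week season length are placeholder assumptions, and `simulate_weekly_totals` is a hypothetical name.

```python
import random


def simulate_weekly_totals(season_total, weeks, weekly_sd, seed=None):
    """Draw one simulated weekly total per week of the season.

    Assumption: weekly totals are roughly normal around the season's
    weekly average. Results are rounded and floored at zero because a
    counting stat like HRs cannot be negative or fractional.
    """
    rng = random.Random(seed)
    weekly_mean = season_total / weeks
    return [max(0, round(rng.gauss(weekly_mean, weekly_sd))) for _ in range(weeks)]


# 215 HRs over an assumed 25 scoring weeks, with an assumed spread of 2.5.
weekly_hrs = simulate_weekly_totals(season_total=215, weeks=25, weekly_sd=2.5, seed=2019)
print(weekly_hrs)
print("lowest week:", min(weekly_hrs), "highest week:", max(weekly_hrs))
```

Passing a `seed` makes a run repeatable; leaving it out gives a different season each time, which is the point of the extra randomness.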

So, I input the final results of this year's accumulated statistics and ran the entire season, starting from scratch. From there, you can simply look at the R-squared value to see correlation. Finally, in a perfect simulation, the slope of the trendline would be 1, with a Y-Intercept of 0.
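As a sketch of how that check can be done, the snippet below fits an ordinary least-squares trendline of actual results against predicted results and reports the slope, intercept, and R-squared. The function name `fit_line` and the win totals are made up for illustration; they are not the league's real numbers.

```python
from statistics import mean


def fit_line(predicted, actual):
    """Least-squares fit of actual = slope * predicted + intercept.

    Returns (slope, intercept, r_squared). A perfect simulation would
    give slope 1, intercept 0, and R-squared 1.
    """
    x_bar, y_bar = mean(predicted), mean(actual)
    sxx = sum((x - x_bar) ** 2 for x in predicted)
    syy = sum((y - y_bar) ** 2 for y in actual)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predicted, actual))
    if sxx == 0 or syy == 0:
        raise ValueError("need some spread in both predicted and actual values")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up win totals, for illustration only.
predicted_wins = [150, 140, 128, 120, 111, 98]
actual_wins = [147, 143, 125, 123, 108, 101]
slope, intercept, r_squared = fit_line(predicted_wins, actual_wins)
print(f"slope={slope:.3f} intercept={intercept:.2f} R^2={r_squared:.3f}")
```

A high R-squared on its own is not enough: a slope well away from 1 or a large intercept would mean the predictions track the results but are systematically stretched or shifted.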

Let's take a look:

Extremely promising, but I can do better:

Look at that beauty. R-Squared, Slope and Intercept about as good as it gets in a predictive program like this.

----------------------------------

So, what does this prove?

While the Z-Score predictions were actually still fairly accurate, I think it's needless to say I'll be going with the full simulation algorithm from here on out.

Moreover, what it proves is that the fault lies not in the simulation itself but in the accumulation of statistics, especially in the preseason.

There will never be such a thing as a perfect preseason prediction. There will always be teams overrated and underrated at the beginning of the year. It's the beauty of not knowing how a baseball season will unfold. Guys you thought would have amazing years flop, and bargain basement FAs no one expects to do anything can win you a league. All I have to go off of are the projections published for public consumption. But I'll do my best to examine my methodology and try to improve the accuracy of assessing statistical accumulation.

-----------------------------------

Tl;dr

The algorithm is actually very accurate for a predictive tool. Statistical accumulation and the accuracy of projections were the downfall this year (and of those, I only have control of one).

Last Edit: Sept 1, 2019 22:04:18 GMT -8 by Rockies GM (Dan)

2020 World Series Champions

Rays GM (Donavan)
Moderator

Happily re-married (got it right this time) with 6 kids

Posts: 1,765

Post by Rays GM (Donavan) on Sept 1, 2019 22:45:58 GMT -8

For what it is worth (probably not much), I think these predictions are amazing.

I struggle to determine which starting pitchers to start in my rotation on a day to day basis, based on match-ups/form, and likewise with starting bats that may be running hot or not. So given that I don't know my starting line-up, any prediction that had my team finish within a win of where I finished is nothing short of magic.

What's more, with the number of free agents joining teams each week, and releases and trades occurring in this active league, some teams don't look anything like what they did in the pre-season. I know my team doesn't.

Well done Dan. Any chance you can predict which of my currently rostered players will still be on my roster at this time next year?

Last Edit: Sept 1, 2019 22:47:10 GMT -8 by Rays GM (Donavan)

Former Twins GM (Robin)
Hall of Famer

How about those Twins?

Posts: 1,665

Post by Former Twins GM (Robin) on Sept 2, 2019 6:48:11 GMT -8

Absolutely top notch Dan. I love seeing this stuff. You're really spoiling us with the stats. Thank you!

2014 AL Champions 2015 AL Central Champions 2019 AL Central Champions

Rockies GM (Dan)
Trade Panel Member

Posts: 2,333

Post by Rockies GM (Dan) on Sept 2, 2019 8:32:46 GMT -8

What I find kind of astounding is that the accuracy actually improves with more noise and chaos. The more variability week-to-week, the closer the results

2020 World Series Champions
