March 5, 2021
Spring Training is here and that means so are the MLB prediction models! Here is how to build one in just a few hours based on the Bradley-Terry model framework from Analyzing Baseball Data with R.
At a high level, the model looks at the matchups for all 2,430 games in a season, then flips a coin to determine the winner of each one. The coin flips are weighted by the probability each team has of winning a given game, which is determined using the Bradley-Terry model. In a game between teams A and B, the probability of team A winning is given by:

$$P(\text{A wins}) = \frac{e^{T_A}}{e^{T_A} + e^{T_B}}$$

where $T_i$ is the talent level of team $i$.
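As a rough illustration of the weighted coin flip, here is a minimal Python sketch. It assumes the usual Bradley-Terry form, where team A beats team B with probability exp(T_A) / (exp(T_A) + exp(T_B)). The function names `win_probability` and `simulate_game` and the example talent values are made up for this sketch and are not taken from the linked code.

```python
import math
import random


def win_probability(talent_a: float, talent_b: float) -> float:
    """Bradley-Terry probability that team A beats team B."""
    strength_a = math.exp(talent_a)
    strength_b = math.exp(talent_b)
    return strength_a / (strength_a + strength_b)


def simulate_game(talent_a: float, talent_b: float, rng: random.Random) -> bool:
    """Flip a coin weighted by the win probability; True means team A wins."""
    return rng.random() < win_probability(talent_a, talent_b)


rng = random.Random(2021)  # fixed seed so the sketch is repeatable
p_a = win_probability(0.3, -0.1)  # illustrative talent levels, not real teams
a_won = simulate_game(0.3, -0.1, rng)
print(f"P(A wins) = {p_a:.3f}, simulated result: {'A' if a_won else 'B'} wins")
```

Two evenly matched teams get a fair coin, and the two teams' probabilities always sum to one.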
To calculate talent, I average the ZiPS and Steamer WAR projections from FanGraphs for each player and sum them by team. A team's talent level is the fraction by which its total WAR sits above or below the median team total. For example, Atlanta's 39.5 WAR against a median of 30.85 gives 39.5 / 30.85 - 1 ≈ 0.28. Given a median WAR of 30.85, the table below shows each team's talent level going into the 2021 season (as of March 5th, 2021).
Team | Total WAR | Talent |
---|---|---|
ARI | 29.25 | -0.0518639 |
ATL | 39.5 | 0.280389 |
BAL | 17.05 | -0.447326 |
BOS | 36.8 | 0.192869 |
CHC | 29.45 | -0.0453809 |
CHW | 38.3 | 0.241491 |
CIN | 21.4 | -0.306321 |
CLE | 29.1 | -0.0567261 |
COL | 8.6 | -0.721232 |
DET | 20.3 | -0.341977 |
HOU | 38.95 | 0.262561 |
KCR | 20.75 | -0.327391 |
LAA | 34.05 | 0.103728 |
LAD | 51.5 | 0.669368 |
MIA | 12.35 | -0.599676 |
MIL | 33.15 | 0.0745543 |
MIN | 43.5 | 0.410049 |
NYM | 39.9 | 0.293355 |
NYY | 49.1 | 0.591572 |
OAK | 30.45 | -0.012966 |
PHI | 30.3 | -0.0178282 |
PIT | 15.1 | -0.510535 |
SDP | 44.2 | 0.432739 |
SEA | 27.15 | -0.119935 |
SFG | 20.85 | -0.324149 |
STL | 31.6 | 0.0243112 |
TBR | 42.2 | 0.367909 |
TEX | 13.95 | -0.547812 |
TOR | 45.45 | 0.473258 |
WSN | 31.25 | 0.012966 |
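The Talent column can be reproduced from the Total WAR column with a few lines of Python. This is only a sketch: the helper name `talent_levels` is hypothetical, and the dictionary holds just a handful of teams from the table, so the median of 30.85 is passed in explicitly instead of being recomputed from the partial list.

```python
from statistics import median

# A few total-WAR values copied from the table above; a full run would list all 30 teams.
total_war = {"ATL": 39.5, "COL": 8.6, "LAD": 51.5, "OAK": 30.45, "WSN": 31.25}


def talent_levels(war_by_team, median_war=None):
    """Talent = fraction above or below the median total team WAR."""
    if median_war is None:
        # With all 30 teams present, the median can be computed directly.
        median_war = median(war_by_team.values())
    return {team: war / median_war - 1.0 for team, war in war_by_team.items()}


talent = talent_levels(total_war, median_war=30.85)
for team, value in sorted(talent.items()):
    print(f"{team}: {value:+.6f}")
```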
As an example of how the probabilities are calculated, the Houston Astros open their season against the Oakland A's. Based on each team's talent, the probability the Astros win is

$$P(\text{HOU wins}) = \frac{e^{0.262561}}{e^{0.262561} + e^{-0.012966}} \approx 0.568$$
Using the same talent level for each simulated season would be boring, so in the actual simulations I add some noise by randomly increasing or decreasing each team's talent in each simulated season. You can think of this as teams over- or underperforming relative to their projections for the year.
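To make the noise step concrete, here is a sketch of one simulated season in Python. Several pieces are assumptions for illustration: the noise is drawn from a normal distribution with a made-up standard deviation (`noise_sd=0.1`), the schedule is a toy list of matchups and not the real 2,430-game slate, and the names `perturb_talent` and `simulate_season` are hypothetical.

```python
import math
import random
from collections import Counter


def perturb_talent(talent, noise_sd, rng):
    """Return a noisy copy of each team's talent for one simulated season."""
    return {team: t + rng.gauss(0.0, noise_sd) for team, t in talent.items()}


def simulate_season(schedule, talent, rng):
    """Play every (team_a, team_b) matchup once and count wins per team."""
    wins = Counter({team: 0 for team in talent})
    for team_a, team_b in schedule:
        # Same value as exp(T_a) / (exp(T_a) + exp(T_b)).
        p_a = 1.0 / (1.0 + math.exp(talent[team_b] - talent[team_a]))
        winner = team_a if rng.random() < p_a else team_b
        wins[winner] += 1
    return wins


talent = {"HOU": 0.262561, "OAK": -0.012966, "SEA": -0.119935}
# Toy schedule: every pair meets 10 times.
schedule = [("HOU", "OAK"), ("HOU", "SEA"), ("OAK", "SEA")] * 10

rng = random.Random(42)
season_talent = perturb_talent(talent, noise_sd=0.1, rng=rng)
wins = simulate_season(schedule, season_talent, rng)
print(dict(wins))
```

Drawing the noise once per season, not once per game, is what lets a team run hot or cold for a whole simulated year.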
Finally, we simulate the postseason using the same method as the regular season for each playoff series.
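A playoff series can reuse the same weighted coin flip, repeated until one team has enough wins. The sketch below assumes a plain best-of-N format with no home-field adjustment. The function name `simulate_series` is hypothetical, and the series length is a parameter because the post does not spell out the format of each round.

```python
import math
import random


def simulate_series(talent_a, talent_b, best_of, rng):
    """Return True if team A wins a best-of-`best_of` series."""
    wins_needed = best_of // 2 + 1
    p_a = math.exp(talent_a) / (math.exp(talent_a) + math.exp(talent_b))
    wins_a = wins_b = 0
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p_a:
            wins_a += 1
        else:
            wins_b += 1
    return wins_a == wins_needed


# Estimate how often LAD would beat SDP in a best-of-seven, using the table's talents.
rng = random.Random(7)
trials = 10_000
lad_series_wins = sum(simulate_series(0.669368, 0.432739, 7, rng) for _ in range(trials))
print(f"LAD wins the series in {100 * lad_series_wins / trials:.1f}% of trials")
```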
After 10,000 simulations, here are the results of the model by division (AL West, AL Central, AL East, NL West, NL Central, and NL East, in that order), along with the FanGraphs projections for comparison:
Team | Mean Wins | Mean Losses | Max Wins | Min Wins | Win Division (%) | Make Wild Card (%) | Make Playoffs (%) | Win League (%) | Win WS (%) | FanGraphs Wins |
---|---|---|---|---|---|---|---|---|---|---|
HOU | 91.888 | 70.112 | 116 | 69 | 71.65 | 2.82 | 74.47 | 24.85 | 10.83 | 88 |
LAA | 84.4918 | 77.5082 | 108 | 61 | 19.29 | 4.06 | 23.35 | 9.74 | 5.41 | 84 |
OAK | 80.17 | 81.83 | 102 | 57 | 7.12 | 1.5 | 8.62 | 4.23 | 2.45 | 83 |
SEA | 75.4513 | 86.5487 | 99 | 42 | 1.94 | 0.38 | 2.32 | 1.27 | 0.79 | 74 |
TEX | 57.7197 | 104.28 | 80 | 38 | 0 | 0 | 0 | 0 | 0 | 72 |
Team | Mean Wins | Mean Losses | Max Wins | Min Wins | Win Division (%) | Make Wild Card (%) | Make Playoffs (%) | Win League (%) | Win WS (%) | FanGraphs Wins |
---|---|---|---|---|---|---|---|---|---|---|
MIN | 98.2415 | 63.7585 | 122 | 75 | 77.66 | 11.5 | 89.16 | 17.91 | 6.69 | 87 |
CHW | 91.4366 | 70.5634 | 113 | 68 | 21.69 | 30.41 | 52.1 | 11.96 | 5.77 | 87 |
CLE | 78.2207 | 83.7793 | 105 | 56 | 0.65 | 1.62 | 2.27 | 0.73 | 0.46 | 80 |
KCR | 66.3395 | 95.6605 | 90 | 43 | 0 | 0.03 | 0.03 | 0.01 | 0.01 | 77 |
DET | 66.1495 | 95.8505 | 90 | 43 | 0 | 0.02 | 0.02 | 0.01 | 0.01 | 72 |
Team | Mean Wins | Mean Losses | Max Wins | Min Wins | Win Division (%) | Make Wild Card (%) | Make Playoffs (%) | Win League (%) | Win WS (%) | FanGraphs Wins |
---|---|---|---|---|---|---|---|---|---|---|
NYY | 101.465 | 60.5349 | 124 | 78 | 63.51 | 31.08 | 94.59 | 9.44 | 2.59 | 96 |
TOR | 96.5399 | 65.4601 | 120 | 73 | 24.79 | 55.48 | 80.27 | 8.71 | 3.16 | 88 |
TBR | 92.7891 | 69.2109 | 119 | 67 | 10.64 | 48.95 | 59.59 | 9.03 | 3.79 | 83 |
BOS | 84.4039 | 77.5961 | 111 | 63 | 1.06 | 12.15 | 13.21 | 2.11 | 1.14 | 85 |
BAL | 57.3638 | 104.636 | 80 | 35 | 0 | 0 | 0 | 0 | 0 | 66 |
Team | Mean Wins | Mean Losses | Max Wins | Min Wins | Win Division (%) | Make Wild Card (%) | Make Playoffs (%) | Win League (%) | Win WS (%) | FanGraphs Wins |
---|---|---|---|---|---|---|---|---|---|---|
LAD | 110.268 | 51.732 | 132 | 88 | 84.38 | 15.57 | 99.95 | 6.82 | 2.07 | 98 |
SDP | 100.945 | 61.0548 | 123 | 80 | 15.59 | 81.8 | 97.39 | 7.68 | 3.43 | 95 |
ARI | 80.1856 | 81.8144 | 103 | 54 | 0.03 | 6.97 | 7 | 1.68 | 1.2 | 74 |
SFG | 68.8702 | 93.1298 | 94 | 48 | 0 | 0.12 | 0.12 | 0.07 | 0.05 | 77 |
COL | 52.3235 | 109.677 | 74 | 33 | 0 | 0 | 0 | 0 | 0 | 66 |
Team | Mean Wins | Mean Losses | Max Wins | Min Wins | Win Division (%) | Make Wild Card (%) | Make Playoffs (%) | Win League (%) | Win WS (%) | FanGraphs Wins |
---|---|---|---|---|---|---|---|---|---|---|
MIL | 87.7649 | 74.2351 | 111 | 66 | 46.97 | 6.07 | 53.04 | 19.96 | 12.39 | 79 |
STL | 85.9834 | 76.0166 | 109 | 64 | 36.37 | 6.9 | 43.27 | 16.49 | 10.9 | 80 |
CHC | 82.0682 | 79.9318 | 103 | 59 | 15.95 | 4.1 | 20.05 | 8.48 | 5.84 | 78 |
CIN | 71.2318 | 90.7682 | 95 | 48 | 0.69 | 0.12 | 0.81 | 0.52 | 0.42 | 77 |
PIT | 62.6362 | 99.3638 | 87 | 39 | 0.02 | 0 | 0.02 | 0.02 | 0.02 | 65 |
Team | Mean Wins | Mean Losses | Max Wins | Min Wins | Win Division (%) | Make Wild Card (%) | Make Playoffs (%) | Win League (%) | Win WS (%) | FanGraphs Wins |
---|---|---|---|---|---|---|---|---|---|---|
NYM | 93.7153 | 68.2847 | 119 | 65 | 49.24 | 30.41 | 79.65 | 16.33 | 8.77 | 92 |
ATL | 93.3949 | 68.6051 | 117 | 68 | 44.9 | 33.56 | 78.46 | 16.1 | 8.2 | 89 |
WSN | 81.9628 | 80.0372 | 108 | 57 | 3.33 | 8.43 | 11.76 | 3.51 | 2.15 | 83 |
PHI | 80.4196 | 81.5804 | 106 | 59 | 2.53 | 5.95 | 8.48 | 2.34 | 1.46 | 81 |
MIA | 55.56 | 106.44 | 79 | 33 | 0 | 0 | 0 | 0 | 0 | 73 |
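For anyone curious how columns like Mean Wins, Max Wins, Min Wins, and Win Division (%) fall out of the simulations, here is a small Python sketch. The win totals are invented for illustration, the helper name `summarize_division` is hypothetical, and ties for first place are simply given to the team listed first, which is a simplification and not a real tiebreaker rule.

```python
from statistics import mean


def summarize_division(seasons, division):
    """Summarize per-team results across simulated seasons for one division."""
    n_sims = len(seasons)
    # max() returns the first team with the highest win total, so ties go to list order.
    winners = [max(division, key=lambda team: season[team]) for season in seasons]
    return {
        team: {
            "mean_wins": mean(season[team] for season in seasons),
            "max_wins": max(season[team] for season in seasons),
            "min_wins": min(season[team] for season in seasons),
            "win_division_pct": 100.0 * winners.count(team) / n_sims,
        }
        for team in division
    }


# Four made-up simulated seasons for three teams.
seasons = [
    {"HOU": 95, "LAA": 84, "OAK": 80},
    {"HOU": 88, "LAA": 90, "OAK": 79},
    {"HOU": 92, "LAA": 85, "OAK": 83},
    {"HOU": 90, "LAA": 81, "OAK": 86},
]
summary = summarize_division(seasons, ["HOU", "LAA", "OAK"])
for team, stats in summary.items():
    print(team, stats)
```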
This model is very simple and I have a long list of improvements I'd like to make, but for only a few hours of work it does a fairly solid job. I'll post an update to these projections the day before Opening Day with a more detailed look at the results as well as a retrospective at the end of the season to see how it did. In the meantime, all of the code can be found here.
Last Updated: March 5, 2021