March 5, 2021

Building a Baseball Model in an Evening


Spring Training is here and that means so are the MLB prediction models! Here is how to build one in just a few hours based on the Bradley-Terry model framework from Analyzing Baseball Data with R.

From a high level, the model looks at the matchups for all 2,430 games in a season, then flips a weighted coin to determine the winner of each one. The weight is the probability each team has of winning a given game, which is determined using the Bradley-Terry model. In a game between teams A and B, the probability of team A winning is given by:

P(A Wins) = exp(T_A) ÷ [exp(T_A) + exp(T_B)]

where T_i is the talent level of team i.
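
As a minimal sketch, the formula translates to a few lines of Python. The function name `win_probability` is illustrative and is not taken from the code linked at the end of this post.

```python
import math


def win_probability(talent_a: float, talent_b: float) -> float:
    """Bradley-Terry probability that team A beats team B."""
    strength_a = math.exp(talent_a)
    strength_b = math.exp(talent_b)
    return strength_a / (strength_a + strength_b)


# Two evenly matched teams each win half the time.
print(win_probability(0.0, 0.0))  # 0.5

# Only the talent gap matters: shifting both talents by the same
# amount leaves the probability (essentially) unchanged.
print(win_probability(0.3, 0.1), win_probability(0.5, 0.3))
```

Because the two probabilities in any matchup share the same denominator, they always sum to 1, so a single weighted coin flip is enough to decide each game.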

To calculate talent, I average the ZiPS and Steamer WAR projections from FanGraphs for each player and sum the averages by team. A team's talent level is the fraction by which its total WAR sits above or below the league median: (team WAR - median WAR) ÷ median WAR. Given a median WAR of 30.85, the table below shows each team's talent level going into the 2021 season (as of March 5, 2021).

Team  Total WAR  Talent
ARI   29.25      -0.0518639
ATL   39.5        0.280389
BAL   17.05      -0.447326
BOS   36.8        0.192869
CHC   29.45      -0.0453809
CHW   38.3        0.241491
CIN   21.4       -0.306321
CLE   29.1       -0.0567261
COL   8.6        -0.721232
DET   20.3       -0.341977
HOU   38.95       0.262561
KCR   20.75      -0.327391
LAA   34.05       0.103728
LAD   51.5        0.669368
MIA   12.35      -0.599676
MIL   33.15       0.0745543
MIN   43.5        0.410049
NYM   39.9        0.293355
NYY   49.1        0.591572
OAK   30.45      -0.012966
PHI   30.3       -0.0178282
PIT   15.1       -0.510535
SDP   44.2        0.432739
SEA   27.15      -0.119935
SFG   20.85      -0.324149
STL   31.6        0.0243112
TBR   42.2        0.367909
TEX   13.95      -0.547812
TOR   45.45       0.473258
WSN   31.25       0.012966
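
The talent column can be reproduced from the Total WAR column alone. The sketch below assumes talent is computed as (team WAR - median WAR) ÷ median WAR, which matches the published values; the names `TEAM_WAR` and `talent_levels` are illustrative and not taken from the linked code.

```python
from statistics import median

# Total projected WAR per team, copied from the table above.
TEAM_WAR = {
    "ARI": 29.25, "ATL": 39.5, "BAL": 17.05, "BOS": 36.8, "CHC": 29.45,
    "CHW": 38.3, "CIN": 21.4, "CLE": 29.1, "COL": 8.6, "DET": 20.3,
    "HOU": 38.95, "KCR": 20.75, "LAA": 34.05, "LAD": 51.5, "MIA": 12.35,
    "MIL": 33.15, "MIN": 43.5, "NYM": 39.9, "NYY": 49.1, "OAK": 30.45,
    "PHI": 30.3, "PIT": 15.1, "SDP": 44.2, "SEA": 27.15, "SFG": 20.85,
    "STL": 31.6, "TBR": 42.2, "TEX": 13.95, "TOR": 45.45, "WSN": 31.25,
}


def talent_levels(team_war):
    """Return each team's fractional distance from the league-median WAR."""
    league_median = median(team_war.values())
    return {
        team: (war - league_median) / league_median
        for team, war in team_war.items()
    }


talent = talent_levels(TEAM_WAR)
print(round(median(TEAM_WAR.values()), 2))  # 30.85
print(round(talent["LAD"], 6))              # 0.669368
```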


As an example of how the probabilities are calculated, the Houston Astros open their season against the Oakland A's. Based on each team's talent, the probability the Astros win is

P(Astros Win) = exp(0.262561) ÷ [exp(0.262561) + exp(-0.012966)] = 0.568.

Using the same talent level for each simulated season would be boring, so in the actual simulations I add some noise by randomly increasing or decreasing each team's talent in each simulated season. You can think of this as teams over- or underperforming relative to their projections for the year.
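
A rough sketch of one simulated season follows. Several details here are assumptions made for illustration rather than the post's actual choices: the noise is drawn from a normal distribution, the `noise_sd` default of 0.1 is a placeholder, and the schedule is represented as a list of (team A, team B) pairs.

```python
import math
import random


def win_probability(talent_a, talent_b):
    """Bradley-Terry probability that team A beats team B."""
    return math.exp(talent_a) / (math.exp(talent_a) + math.exp(talent_b))


def simulate_season(talent, schedule, noise_sd=0.1, rng=None):
    """Simulate one season and return a dict of team -> wins.

    talent:   dict of team -> projected talent level
    schedule: list of (team_a, team_b) matchups, one entry per game
    noise_sd: standard deviation of the per-season talent noise (assumed)
    """
    rng = rng or random.Random()
    # Draw the noise once per team so it persists for the whole season.
    season_talent = {
        team: level + rng.gauss(0.0, noise_sd)
        for team, level in talent.items()
    }
    wins = {team: 0 for team in talent}
    for team_a, team_b in schedule:
        p_a = win_probability(season_talent[team_a], season_talent[team_b])
        # The weighted coin flip: team A wins with probability p_a.
        winner = team_a if rng.random() < p_a else team_b
        wins[winner] += 1
    return wins


# Toy example: Houston and Oakland play each other 19 times.
toy_talent = {"HOU": 0.262561, "OAK": -0.012966}
toy_schedule = [("HOU", "OAK")] * 19
print(simulate_season(toy_talent, toy_schedule, rng=random.Random(2021)))
```

Repeating this over the full 2,430-game schedule many times and collecting the win totals gives the distribution of outcomes for each team.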

Finally, we simulate the postseason using the same method as the regular season for each playoff series.
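
One way to sketch a single playoff series is to keep flipping the same weighted coin until one team has won a majority of the games. The function name, the `best_of` parameter, and its default of 7 are illustrative assumptions; the post does not say how series lengths are handled.

```python
import math
import random


def win_probability(talent_a, talent_b):
    """Bradley-Terry probability that team A beats team B."""
    return math.exp(talent_a) / (math.exp(talent_a) + math.exp(talent_b))


def team_a_wins_series(talent_a, talent_b, best_of=7, rng=None):
    """Return True if team A wins a best-of-N series against team B."""
    if best_of % 2 == 0:
        raise ValueError("best_of must be odd")
    rng = rng or random.Random()
    wins_needed = best_of // 2 + 1
    p_a = win_probability(talent_a, talent_b)
    wins_a = wins_b = 0
    # Play games until one side clinches the series.
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p_a:
            wins_a += 1
        else:
            wins_b += 1
    return wins_a == wins_needed


# Estimate how often the more talented team wins a best-of-seven.
rng = random.Random(0)
trials = 10_000
series_wins = sum(
    team_a_wins_series(0.262561, -0.012966, best_of=7, rng=rng)
    for _ in range(trials)
)
print(series_wins / trials)
```

Setting `best_of=1` reduces this to a single game, which would cover a one-game wild card round.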

After 10,000 simulations, here are the results of the model along with the FanGraphs projections for comparison:

AL West

Team  Mean Wins  Mean Losses  Max Wins  Min Wins  Win Division (%)  Make Wild Card (%)  Make Playoffs (%)  Win League (%)  Win WS (%)  FanGraphs Wins
HOU   91.888     70.112       116       69        71.65             2.82                74.47              24.85           10.83       88
LAA   84.4918    77.5082      108       61        19.29             4.06                23.35              9.74            5.41        84
OAK   80.17      81.83        102       57        7.12              1.5                 8.62               4.23            2.45        83
SEA   75.4513    86.5487      99        42        1.94              0.38                2.32               1.27            0.79        74
TEX   57.7197    104.28       80        38        0                 0                   0                  0               0           72


AL Central

Team  Mean Wins  Mean Losses  Max Wins  Min Wins  Win Division (%)  Make Wild Card (%)  Make Playoffs (%)  Win League (%)  Win WS (%)  FanGraphs Wins
MIN   98.2415    63.7585      122       75        77.66             11.5                89.16              17.91           6.69        87
CHW   91.4366    70.5634      113       68        21.69             30.41               52.1               11.96           5.77        87
CLE   78.2207    83.7793      105       56        0.65              1.62                2.27               0.73            0.46        80
KCR   66.3395    95.6605      90        43        0                 0.03                0.03               0.01            0.01        77
DET   66.1495    95.8505      90        43        0                 0.02                0.02               0.01            0.01        72


AL East

Team  Mean Wins  Mean Losses  Max Wins  Min Wins  Win Division (%)  Make Wild Card (%)  Make Playoffs (%)  Win League (%)  Win WS (%)  FanGraphs Wins
NYY   101.465    60.5349      124       78        63.51             31.08               94.59              9.44            2.59        96
TOR   96.5399    65.4601      120       73        24.79             55.48               80.27              8.71            3.16        88
TBR   92.7891    69.2109      119       67        10.64             48.95               59.59              9.03            3.79        83
BOS   84.4039    77.5961      111       63        1.06              12.15               13.21              2.11            1.14        85
BAL   57.3638    104.636      80        35        0                 0                   0                  0               0           66


NL West

Team  Mean Wins  Mean Losses  Max Wins  Min Wins  Win Division (%)  Make Wild Card (%)  Make Playoffs (%)  Win League (%)  Win WS (%)  FanGraphs Wins
LAD   110.268    51.732       132       88        84.38             15.57               99.95              6.82            2.07        98
SDP   100.945    61.0548      123       80        15.59             81.8                97.39              7.68            3.43        95
ARI   80.1856    81.8144      103       54        0.03              6.97                7                  1.68            1.2         74
SFG   68.8702    93.1298      94        48        0                 0.12                0.12               0.07            0.05        77
COL   52.3235    109.677      74        33        0                 0                   0                  0               0           66


NL Central

Team  Mean Wins  Mean Losses  Max Wins  Min Wins  Win Division (%)  Make Wild Card (%)  Make Playoffs (%)  Win League (%)  Win WS (%)  FanGraphs Wins
MIL   87.7649    74.2351      111       66        46.97             6.07                53.04              19.96           12.39       79
STL   85.9834    76.0166      109       64        36.37             6.9                 43.27              16.49           10.9        80
CHC   82.0682    79.9318      103       59        15.95             4.1                 20.05              8.48            5.84        78
CIN   71.2318    90.7682      95        48        0.69              0.12                0.81               0.52            0.42        77
PIT   62.6362    99.3638      87        39        0.02              0                   0.02               0.02            0.02        65


NL East

Team  Mean Wins  Mean Losses  Max Wins  Min Wins  Win Division (%)  Make Wild Card (%)  Make Playoffs (%)  Win League (%)  Win WS (%)  FanGraphs Wins
NYM   93.7153    68.2847      119       65        49.24             30.41               79.65              16.33           8.77        92
ATL   93.3949    68.6051      117       68        44.9              33.56               78.46              16.1            8.2         89
WSN   81.9628    80.0372      108       57        3.33              8.43                11.76              3.51            2.15        83
PHI   80.4196    81.5804      106       59        2.53              5.95                8.48               2.34            1.46        81
MIA   55.56      106.44       79        33        0                 0                   0                  0               0           73


This model is very simple, and I have a long list of improvements I'd like to make, but for only a few hours of work it does a fairly solid job. I'll post an update to these projections the day before Opening Day with a more detailed look at the results, as well as a retrospective at the end of the season to see how the model did. In the meantime, all of the code can be found here.

Last Updated: March 5, 2021

Follow @andersonfrailey