One of the most intriguing debates in baseball is whether Ted Williams would have become the consensus greatest player of all time if not for his three years of military service during World War II. I decided to do some analysis based on players with similar career paths at the age Williams went off to war. One thing led to another, and I ended up calculating probabilities for his career totals in Home Runs (originally 20th All-Time), Runs Batted In (14th All-Time), Hits (75th All-Time) and Wins Above Replacement (14th All-Time) using a Poisson Cumulative Distribution Function. A Poisson Distribution is very useful when calculating sports probabilities because it is defined entirely by a mean, or expected value. As long as you can reasonably estimate an expected value for the years that Teddy Ballgame missed, you can theoretically find the probabilities that he would have ended up as the best player who ever lived (at least from a statistics standpoint).
Finding The Expected Value
To find Ted Williams’s expected values for Home Runs, Hits, Runs Batted In and Wins Above Replacement, I used the similarity score metric developed by Bill James, statistician extraordinaire. Bill James created many of the advanced metrics that we know today and is also widely known for his analytics work with the Boston Red Sox. The methodology for calculating similarity scores can be found on Baseball Reference. For Ted Williams, I used the similarity scores for his age 23 season, which was the last full season he played before entering the military. There were ten players with similar statistics through their own age 23 seasons (we’ll call them ‘similarity scorers’): Jimmie Foxx, Albert Pujols, Joe DiMaggio, Mel Ott, Mike Trout, Hank Aaron, Mickey Mantle, Frank Robinson, Orlando Cepeda and Miguel Cabrera.
Next, I compiled the statistics for the ten similarity scorers for their age 23 through age 27 seasons. Ted Williams missed three seasons, his age 24 season through his age 26 season, but I chose to compile the bookend seasons as well in order to see how close the projections were to the actual numbers Williams produced. Lastly, to generate the expected values needed to calculate a Poisson Cumulative Distribution Function, or Poisson CDF, I linearly weighted and averaged the statistics individually for HRs, H, RBIs and WAR based on the sample from the similarity scorers.
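The weighting and averaging step can be sketched roughly as follows. This is only an illustration: the article does not spell out the weighting scheme, so the weights here assume "linearly weighted" means weights that fall off linearly with similarity rank, and the stat values are placeholder numbers rather than real player data. The function name `weighted_mean` is likewise made up for this sketch.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(w * v) / sum(w)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# PLACEHOLDER inputs, not real statistics: one stat (say, home runs at one age)
# for three hypothetical comparable players, most similar first.
placeholder_values = [30, 40, 35]
# ASSUMPTION: weights decrease linearly with similarity rank (3, 2, 1).
placeholder_weights = [3, 2, 1]

expected_for_age = weighted_mean(placeholder_values, placeholder_weights)
print(round(expected_for_age, 3))
```

Repeating this for each missed age and summing the results would give one expected value per category. A player with no data at a given age (Mike Trout at the later ages, for example) would simply be left out of that age's lists.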
Now, as you can see from the visuals above, Ted Williams’s projected stats over the course of his three-year absence would have been, on average, 111.911 Home Runs, 553.522 Hits and 358.2 RBIs. You can’t exactly tell what the expected value for Wins Above Replacement is from the chart, so I’ll just tell you that it is 25.43 WAR. So now we have expected values to test using the Poisson CDF for those four categories. Based on the expected values alone, Ted Williams’s career totals would rise to 633 HR (6th All-Time, +14 rank difference), 3,208 H (14th All-Time, +61 rank difference), 2,197 RBI (3rd All-Time, +11 rank difference) and 148.53 WAR (7th All-Time, +7 rank difference). The expected values add quite a boost to Williams’s career accomplishments when compared to the all-time leaders, but we can also calculate the probability that Williams would have climbed the rankings even further, possibly leading a statistical category that he doesn’t already (his career On-Base Percentage of .482 is the All-Time best).
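The career totals quoted above are just the actual totals plus the expected values for the missed seasons. A small sketch of that arithmetic follows. The actual career figures (521 HR, 2,654 H, 1,839 RBI, 123.1 WAR) are not stated in the article and are assumed here, as the numbers implied by its projected totals.

```python
# ASSUMPTION: actual career totals, inferred from the article's projected totals.
ACTUAL_CAREER = {"HR": 521, "H": 2654, "RBI": 1839, "WAR": 123.1}

# Expected values for the three missed seasons, as given in the article.
EXPECTED_MISSED = {"HR": 111.911, "H": 553.522, "RBI": 358.2, "WAR": 25.43}


def projected_totals(actual, expected):
    """Add the expected missed-season production to the actual career totals."""
    return {stat: actual[stat] + expected[stat] for stat in actual}


totals = projected_totals(ACTUAL_CAREER, EXPECTED_MISSED)
for stat, value in totals.items():
    # WAR keeps two decimals; counting stats are rounded to whole numbers.
    shown = f"{value:.2f}" if stat == "WAR" else f"{round(value):,}"
    print(f"{stat}: {shown}")
```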
Where Would He Rank on the All-Time Lists?
Next, I ran the Poisson CDF for each of the statistical categories in order to find the probability that Ted Williams actually could have been the best player of all time. I foolishly calculated the probability for Teddy Ballgame to lead each of the categories, but as you can probably tell intuitively, some of the leaders can’t be caught by Williams with just three extra seasons. Nonetheless, I started with Home Runs. For Williams to pass Babe Ruth, who was the leader at the time of Williams’s retirement, he would have needed to hit 65 homers per year, which is extremely improbable. The probability of Williams passing Ruth is basically zero (2.4 x 10^-10%), and the probability of Williams owning the Crown over Barry Bonds is effectively 0%.
I learned my lesson a little bit. Just on the surface, it was clear that Ted Williams could not have piled up more hits than the all-time hits leader Pete Rose, or than Ty Cobb, the all-time hits leader at the time of Williams’s retirement. I’m pretty sure Williams would have needed about 500 hits per season to be in the same ballpark, but I never actually did that calculation, so don’t take my word for it. However, an important milestone for hitters is the 3,000 hit benchmark, and for Williams to reach that plateau he would have needed 116 hits per season. Just by common sense, Williams could do that in his sleep, and the Poisson CDF shows exactly that: the probability that Williams gets at least 3,000 hits is 100%.
Going back to the expected values for Ted Williams and Runs Batted In, this one actually looked possible, unlike the Home Run and Hit Crowns. Williams would only need to surpass his expected value in RBIs by 18 to pass Babe Ruth, who was the all-time leader at the time of Williams’s retirement. However, he would need an extra 101 RBIs to currently hold the RBI Crown over Hank Aaron. Based on the Poisson Cumulative Distribution Function, Williams would have about a 19.4% chance of surpassing Babe Ruth’s RBI total, as seen below. However, the chance of catching Hammerin’ Hank Aaron is an extremely slim 1.83 x 10^-5% (the chance of winning the Powerball Jackpot is about 3.42 x 10^-7%).
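For readers who want to try the home run calculation themselves, here is a minimal sketch of a Poisson upper-tail probability in plain Python. It is not the Wolfram Alpha computation used for the article. The helper names are made up for illustration, and the career totals (521 for Williams, 714 for Ruth, 762 for Bonds) are assumed from the public record rather than quoted from the text. The exact output depends on whether the threshold means tying or passing, so it may not match the figures above digit for digit.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_tail(k_min: int, lam: float) -> float:
    """P(X >= k_min), summing the upper tail directly so tiny values keep precision."""
    if k_min <= 0:
        return 1.0
    total = 0.0
    k = k_min
    while True:
        term = poisson_pmf(k, lam)
        total += term
        # Past the mean the terms shrink; stop once they no longer change the sum.
        if k > lam and term <= total * 1e-17:
            break
        k += 1
    return min(total, 1.0)


EXPECTED_HR = 111.911        # expected HR over the three missed seasons (from the article)
CAREER_HR = 521              # ASSUMPTION: Williams's actual career total
RUTH_HR, BONDS_HR = 714, 762  # ASSUMPTION: career totals for Ruth and Bonds

# To pass a leader, Williams needs at least (leader - career + 1) extra home runs.
p_pass_ruth = poisson_tail(RUTH_HR - CAREER_HR + 1, EXPECTED_HR)
p_pass_bonds = poisson_tail(BONDS_HR - CAREER_HR + 1, EXPECTED_HR)
print(f"P(pass Ruth)  = {p_pass_ruth:.3e}")
print(f"P(pass Bonds) = {p_pass_bonds:.3e}")
```

Summing the tail term by term, instead of computing one minus the CDF, is a deliberate choice here: probabilities this small would otherwise be lost to floating-point rounding.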
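The per-season paces mentioned above (65 homers to pass Ruth, 116 hits to reach 3,000) are simple to reproduce. The sketch below assumes three missed seasons and uses career figures from the public record that are not quoted in the article (521 HR and 2,654 H for Williams, 714 HR for Ruth, 4,256 H for Rose). The function name `per_season_pace` is invented for this example.

```python
import math

MISSED_SEASONS = 3  # ages 24 through 26


def per_season_pace(career_actual, target, must_exceed=False, seasons=MISSED_SEASONS):
    """Whole-number production needed per missed season to reach (or pass) a target."""
    needed = target - career_actual + (1 if must_exceed else 0)
    return math.ceil(needed / seasons)


# ASSUMPTION: career totals taken from the public record, not from the article.
hits_to_3000 = per_season_pace(2654, 3000)                    # reach the milestone
hr_to_pass_ruth = per_season_pace(521, 714, must_exceed=True)
hits_to_pass_rose = per_season_pace(2654, 4256, must_exceed=True)

print(hits_to_3000, hr_to_pass_ruth, hits_to_pass_rose)
```

Under these assumptions the first two results come out to 116 and 65, matching the figures in the text, while passing Rose would take 535 hits per season, a bit more than the rough guess of 500.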
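For anyone who wants to check the RBI numbers, the quantity being computed can be written out as follows. This is a sketch of the setup, not necessarily the exact expression entered into Wolfram Alpha. It assumes X is the number of RBIs over the three missed seasons, with the article's expected value as the Poisson mean, and it assumes career totals of 1,839 RBIs for Williams, 2,214 for Ruth and 2,297 for Aaron, which are consistent with the margins of 18 and 101 quoted above.

```latex
% X ~ Poisson(\lambda), with \lambda = 358.2 expected RBIs over the missed seasons
P(X \ge n) \;=\; 1 - \sum_{k=0}^{n-1} \frac{e^{-\lambda}\,\lambda^{k}}{k!},
\qquad \lambda = 358.2

% Thresholds (assumed career totals: Williams 1839, Ruth 2214, Aaron 2297)
n_{\text{Ruth}}  = 2214 - 1839 + 1 = 376,
\qquad
n_{\text{Aaron}} = 2297 - 1839 + 1 = 459
```

Whether the threshold is set at tying or passing the leader shifts the result slightly, so a recomputation may differ a little from the percentages quoted above.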
How Probable Is ‘Greatest Player Ever’?
I would say that the general consensus within the baseball statistics community is that WAR, or Wins Above Replacement, is the most encompassing metric that we have today. So in order to determine whether Ted Williams is the greatest player of all time, it only makes sense to determine the probability that his career WAR would be the all-time leader. Instead of going rank by rank and equation by equation through the all-time leaderboard, for the sake of the article I skipped a few steps and filled it all out in a table. The table below shows, for each player, the probability that Ted Williams would have surpassed that player in the all-time WAR rankings.
Bottom line, a 0.04% chance at becoming the greatest player of all time isn’t the best percentage, but crazier things have happened. Last year, Leicester City FC won the Premier League after entering the season at 5000-1 odds (which is 0.02%). The projections, at least in my opinion, are a bit conservative when predicting Ted Williams in his prime. The projection system using the similarity scorers should account for missed time due to injury or down years because it is an average score. But if Williams had put together three consecutive seasons like his 1941, 1942 or 1946 campaigns, then the projections would of course be a little off, because there is no way to project outliers in the form of an average or expected value (that is what makes it an outlier: it is not close enough to the average), at least not one that I am aware of. It is also worth noting that Mike Trout, who was one of the similarity scorers, is only 25 years old, so there obviously wasn’t data to collect on him, and that dropped our sample size from ten players to nine for ages 25 and 26. If I were to do this study again, I think I would look to add in projections for MVP, other individual awards or even World Series titles, but perhaps that is an article for another day.
Be sure to let me know what you all thought. Was the Ted Williams projection methodology interesting, or was it too much mathematics and statistics? Clearly we can’t go back in time and force Ted Williams to play in those seasons, so we will never truly know how he would have fared. Either way, I hope you enjoyed the article as much as I enjoyed collecting all the information needed for it. I’ll always say that Ted Williams was the greatest player that ever lived, so I guess I’ll be counting on that 0.04% chance to pay off.
Cover Photo via Sports Illustrated
Statistics via Baseball Reference
Visualized Poisson Summations via Wolfram Alpha