Wednesday, August 09, 2006

Predicting Wins Using Bill James' "Favorite Toy"

Several weeks ago, a member of the discussion forum at http://www.baseballtruth.com/ by the name of Ray Anselmo completed a fair amount of research using the "Favorite Toy", a Bill James creation that uses recent performance to roughly predict career totals in a specific category. Ray put together lists of the major offensive categories (homeruns, hits, runs batted in, etc.) in an attempt to determine chances that active players might approach milestones and records such as 3,000 hits and 600 homeruns. It was fun to read his findings, especially since I had heard about the Favorite Toy before but had never known how to use it.

One area not included in Ray’s study was pitching wins. This statistic has come under much fire in recent years as a poor measure of a pitcher's actual worth to his team, to the point that in some circles, wins are considered worthless and no longer even mentioned as a barometer of pitching success. This flies in the face of public opinion, largely propagated by the sports writing community, which for at least 130 years has held wins as one of the most important statistics by which to judge a pitcher. The two schools of thought met head on over the topic last year, when Houston’s Roger Clemens dominated the National League but did not win the Cy Young award because his total of thirteen wins was considered too low.

Despite the differing points of view regarding the importance of how many games a single pitcher wins, there is one career milestone for wins that, when approached, remains an important event. It doesn't happen very often any more - experts have for years proclaimed that the sky is falling in this regard - but 300 wins is still considered one of the benchmark achievements for a great pitching career, a number matched only by 3,000 strikeouts. Everyone who has reached that total and is eligible is enshrined in the Hall of Fame, while the two active pitchers with more than 300 (Clemens and Greg Maddux) have seen and heard their names accompanied by the phrase "future Hall of Famer" for several years.

While 300 wins is still the brass ring for which all great pitchers strive, much has been made of the impending closure of the club to new members. Even before Clemens and Maddux zoomed past the milestone, there was talk that we might have seen the last of the big winners. That idle gossip has not proven entirely accurate over the past twenty-five years; of the twenty-two club members, eight have joined since 1982. Given that number, it seems that the 300 win club will safely continue to add members for the foreseeable future. There might be sizeable gaps between entrants (a nineteen year break between Early Wynn and Gaylord Perry, another thirteen between Nolan Ryan and Clemens) but Clemens and Maddux proved that even pitchers who played more than half their careers during the biggest offensive explosion in history can still work long enough and consistently enough to enjoy positive results.

Given the assumption that there will continue to be 300 game winners in baseball, I decided to use the Favorite Toy to determine which active pitchers, if any, might have a shot at reaching this particular plateau. (Also included but not heavily discussed are findings on the Future 200 Win Club, an area that seemed important given the youth of so many good current pitchers.) The methodology is by no means perfect; James' formula predicts a set number of remaining career seasons based on a player's age and relies exclusively on the performances from the previous three years, but it is still a fun method that is bound to start conversations and arguments whenever it is put into use.

The formula I used for calculating the Favorite Toy came from Jesse Frey's 2003 article on the Baseball Think Factory web site in which he broke the process down into the individual parts. This differentiation helped me better understand where the final results were coming from. Another useful tool came into play midway through the process: an online Favorite Toy calculator at ESPN.com. Though I followed through on my goal to calculate all formulas in an Excel spreadsheet so as to see how certain numbers affected the process, the ESPN simulator came in handy as a means of double checking my final results to make sure they were correct.

The following is a brief description of the formula for the Favorite Toy:

Part 1: EYW = (3*Y0 + 2*Y1 + Y2)/6 – Y0 is the most recent season’s total (here, the projected 2006 total), Y1 is the previous year’s total and Y2 is the total from the year before that. The result will be what I called “Expected Yearly Wins” (EYW), which is the number of wins the pitcher will be expected to average per season for the remainder of his career.
Example: Curt Schilling won 21 games in 2004 (Y2), 8 games in 2005 (Y1) and projects to win 21 games in 2006 (Y0). (All 2006 numbers were projected over a full season using statistics as of August 5.) Plugging those numbers into the formula gives (3*21 + 2*8 + 21)/6 = 16.67, so Schilling is expected to average about 17 wins a year for the rest of his career, which is listed at 1.5 seasons (see Part 2).

Part 2: YR = max(0.6*(40-age),1.5) – Age is, of course, the player’s age (all ages were based on those found at Baseball Reference). This number gives the expected years remaining in the player’s career. If the pitcher is 38 or older, as several on this list were, 0.6*(40-age) will fall below 1.5 when calculated; in this case, 1.5 is used as the minimum.
Example: Curt Schilling is 39 years old. When his YR is calculated, the resulting number is 0.6*(40-39) = 0.6. Because the number is below 1.5, however, it is automatically raised to that minimum number.

Part 3: EXP = (EYW*YR) + Career Wins – Multiplying EYW times YR gives the number of projected wins the player has remaining in his career. This number is added to the career wins up to this year to reach the final projected total.
Example: The product of Schilling's EYW (16.67 before rounding to 17) times his 1.5 YR is 25 wins; when added to the 213 he is projected to have accumulated by the end of the 2006 season, it works out to a career total of 238 wins.

Part 4: P = (EYW*YR)/(X - Career Wins) - 0.5 expresses the pitcher’s chances of reaching an established mark X (the P stands for Probability) – in this case, 300 career wins – and is multiplied by 100 to give a percentage. In other words, the projected remaining wins are divided by the wins still needed to reach the mark, and 0.5 is subtracted from the result. If a number is negative, the chances of reaching the milestone given the information supplied are zero; if it is higher than 100, it is a virtual certainty that the milestone will be passed. Because nothing is guaranteed, however, such results are rounded down to 97%.
Example: Schilling needs 37 more wins to reach 250 (250 - 213), so his probability of reaching 250 wins works out to 25/37 - 0.5, or 17.57%; he has a zero percent chance at 300.
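
For readers who would rather not build a spreadsheet, the four parts above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used for this article: the function name favorite_toy and its argument names are made up for the example, and the probability is computed as projected remaining wins divided by wins still needed, minus 0.5, which is the reading that reproduces the Schilling figures. Returning 1.0 for a milestone that has already been passed is an assumption of the sketch.

```python
def favorite_toy(y0, y1, y2, age, career_wins, target):
    """Hypothetical helper: returns (eyw, yr, exp, p), with p as a fraction.

    y0 is the most recent season's wins, y1 and y2 the two seasons before it.
    """
    eyw = (3 * y0 + 2 * y1 + y2) / 6      # Part 1: expected yearly wins
    yr = max(0.6 * (40 - age), 1.5)       # Part 2: years remaining, minimum 1.5
    remaining = eyw * yr                  # projected wins still to come
    exp = career_wins + remaining         # Part 3: projected career total
    needed = target - career_wins         # wins still needed for the milestone
    if needed <= 0:
        return eyw, yr, exp, 1.0          # assumption: milestone already passed
    p = remaining / needed - 0.5          # Part 4
    if p < 0:
        p = 0.0                           # negative results mean no chance
    elif p > 1:
        p = 0.97                          # results above 100% are listed as 97%
    return eyw, yr, exp, p


# Curt Schilling: 21 wins projected for 2006, 8 in 2005, 21 in 2004, age 39,
# 213 career wins projected through the end of 2006.
eyw, yr, exp, p250 = favorite_toy(21, 8, 21, age=39, career_wins=213, target=250)
p300 = favorite_toy(21, 8, 21, age=39, career_wins=213, target=300)[3]
print(f"EYW {eyw:.2f}, YR {yr}, projected total {exp:.0f}")
print(f"P(250) = {p250:.2%}, P(300) = {p300:.2%}")
```

Run as written, this should give an EYW of 16.67, a projected total of 238 and a 17.57% chance at 250 wins, matching the Schilling examples above.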

As mentioned above, the Favorite Toy isn't perfect, but it is a lot of fun to play around with. Before starting, it should be noted that projected results can fluctuate wildly, often due to a particularly good or bad year at an advanced age. In 2003, for instance, Houston's Andy Pettitte had an 18.9% chance at reaching 300 career wins (higher than any pitcher on the list below and at a more advanced age). But after he suffered through a subpar season in 2004, when at the age of 32 he could ill afford such a slip, his odds of attaining that mark plummeted below zero.

There are ten players with career win totals below 200 who conceivably have a shot at 300 wins (not including Pettitte or John Smoltz, both of whom are close enough to that mark to be included in a separate discussion). The chart below shows these pitchers in descending order, with the highest probability listed at the top (the percentage for 300 career wins is listed on the far right side of the chart, in the "P300" column).

(In order to be considered, pitchers had to have accumulated over 50 career wins as of August 5, 2006 and have a reasonable chance of adding at least 150 wins to that total, which effectively excluded all relievers, young pitchers who did not have at least one standout season and many older pitchers.)

Name          TM   AG  Y0  Y1  Y2  PC1  EYW  YR   EFW  PCW  P200    P300
Garland, J.   CHW  26  18  18  12   82   17  8.4  143  225  71.02%  15.50%
Zambrano, C.  CHC  25  18  14  16   66   16  9    147  213  59.70%  12.82%
Santana, J.   MIN  27  18  16  20   77   18  7.8  138  215  62.03%  11.79%
Willis, D.    FLA  24  12  22  10   58   15  9.6  144  202  51.41%   9.50%
Buehrle, M.   CHW  27  14  16  16   99   15  7.8  117  216  65.84%   8.21%
Zito, B.      OAK  28  18  14  11  104   16  7.2  112  216  66.25%   6.94%
Oswalt, R.    HOU  28  12  20  20   95   16  7.2  115  210  59.71%   6.20%
Beckett, J.   BOS  26  19  15   9   60   16  8.4  134  194  46.00%   6.00%
Sabathia, C.  CLE  25  12  15  11   81   13  9    116  197  47.06%   2.74%
Marquis, J.   STL  27  18  13  15   60   16  7.8  124  184  38.21%   1.46%
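
Anyone who wants to check the chart can recompute the P200 and P300 columns from the age, Y0, Y1, Y2 and PC1 columns. The snippet below is a rough sketch rather than the original spreadsheet; milestone_chance is a made-up name, and it assumes the probability is the expected future wins (EYW times YR, using the unrounded EYW) divided by the wins still needed, minus 0.5. Four rows are copied from the chart as a spot check.

```python
def milestone_chance(y0, y1, y2, age, career_wins, target):
    """Hypothetical helper: chance (as a fraction) of reaching `target` wins."""
    eyw = (3 * y0 + 2 * y1 + y2) / 6        # unrounded expected yearly wins
    yr = max(0.6 * (40 - age), 1.5)         # expected years remaining
    p = eyw * yr / (target - career_wins) - 0.5
    if p < 0:
        return 0.0
    return 0.97 if p > 1 else p


# (name, age, Y0, Y1, Y2, PC1) copied from the chart above
rows = [
    ("Zambrano", 25, 18, 14, 16, 66),
    ("Santana", 27, 18, 16, 20, 77),
    ("Willis", 24, 12, 22, 10, 58),
    ("Marquis", 27, 18, 13, 15, 60),
]

chances = {}
for name, age, y0, y1, y2, pc1 in rows:
    p200 = milestone_chance(y0, y1, y2, age, pc1, 200)
    p300 = milestone_chance(y0, y1, y2, age, pc1, 300)
    chances[name] = (p200, p300)
    print(f"{name:<10}{p200:>8.2%}{p300:>8.2%}")
```

Because the unrounded EYW is used, the results line up with the chart to two decimal places even though the EYW and EFW columns are shown as whole numbers.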


The first thing that stands out about every pitcher on this list is his age: Oswalt is the oldest, and he doesn't turn 29 until the end of August. This is important for two reasons: 1) It means that all ten are conservatively projected to play between 7.2 and 9.6 more seasons, which at their current EYWs will allow them to add between 112 (Zito) and 147 (Zambrano) wins to their present career totals. As the Projected Career Wins (PCW) column shows, only Beckett, Sabathia and Marquis (whose inclusion here surprised me) are currently on track to finish below 200 wins.

2) The low ages of everyone on the list also means that these pitchers are approaching their prime years, when their win totals should spike for several seasons. Increasing age over this time will of course decrease the final probability, but that can be counterbalanced with good numbers, which should allow each individual to roughly maintain his current position. Mark Buehrle, for instance, is 27 years old and has won 14, 16 and 16 games the past three seasons. If he posts similar totals for the next three seasons, he will continue toward 300 wins at about the same pace; if he raises his win totals, however, to 17, 17, and 18, which is an understandable projection at this point in his career, his probability of reaching 300 (P300) climbs close to 20%.
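
The Buehrle scenario can be worked through the same way. The sketch below makes a few assumptions of its own: chance_at is a made-up helper, the 17, 17 and 18 win seasons are taken in that order (so 18 is the most recent), he is 30 when the numbers are rerun, and his career total has grown from 99 to 151.

```python
def chance_at(target, y0, y1, y2, age, career_wins):
    """Hypothetical helper: Favorite Toy chance (as a fraction) at `target` wins."""
    eyw = (3 * y0 + 2 * y1 + y2) / 6
    yr = max(0.6 * (40 - age), 1.5)
    p = eyw * yr / (target - career_wins) - 0.5
    if p < 0:
        return 0.0
    return 0.97 if p > 1 else p


# Today: age 27, seasons of 14, 16 and 16 wins (most recent first), 99 career wins.
now = chance_at(300, 14, 16, 16, age=27, career_wins=99)

# Three years on: seasons of 17, 17 and 18 wins added, most recent first below.
later = chance_at(300, 18, 17, 17, age=30, career_wins=99 + 17 + 17 + 18)

print(f"P300 today: {now:.2%}, P300 after three stronger seasons: {later:.2%}")
```

Under these assumptions the later figure comes out a little above 20%, in line with the estimate above, even though three of his expected remaining seasons have been used up.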

As surprising as it was to see Jason Marquis on the list (logic and his 5.82 ERA in 2006 suggest that his slim 1.46% window will soon close), the guy at the top provided just as much of a shock. Jon Garland had rolled along as a .500 pitcher for several major league seasons before posting a career best eighteen wins and 3.50 ERA last year for the World Series champion Chicago White Sox. This year, he is projected to match that win total even though his ERA has climbed all the way to 5.16. Still, at only 26 years old, Garland is in a good situation for the time being, with a great lineup providing 6.29 runs of support per game. The continuance of that support might determine his final career win total (currently projected at 225; his chances of reaching 200 are 71.02%), as his low strikeout rates, mid-level number of homeruns allowed and unimpressive Batting Average on Balls in Play (BABIP) could translate to a consistently high ERA in the coming years. He'll need the support of the big bats to maintain and improve upon his 17 EYW.

Of all the names on the list, including those who missed the cut this time around but are still in range to bounce back with a few good seasons (more on them in a minute), the most likely to win 300 games might be Minnesota’s Johan Santana. At 27 years old, the native Venezuelan is currently third on the list with an 11.79% probability of reaching that milestone, a level he has reached by posting four consecutive double digit win seasons and emerging as the most dominant left-handed pitcher in the game. He strikes out a high number of batters (9.47 per nine innings for his career), holds opponents to a low batting average (.222 career) and doesn’t allow many baserunners (a 1.06 WHIP this season). It’s obvious that if he wants to win 300 games, it won’t be his left arm holding him back (unless, of course, the injury bug strikes). The great unknown, however, is the amount of run support he will receive. When Santana won twenty games and the AL Cy Young award in 2004, he finished 17th in the run support category among all qualifying pitchers; in 2005, he slipped to 30th (which might have cost him a second consecutive Cy) and this year, his team is only averaging 4.84 runs when he’s on the mound. Santana is the anchor of a youth movement in Minnesota, where the Twins were the second youngest team in the American League in 2005, but he will need to see better results from that future All-Star lineup in order to post the impressive numbers that will get him to 300 wins.

After Garland, Santana and Marquis, the other seven names are all ones you would expect to see. Zito is another former Cy Young award winner. Oswalt has won twenty games in each of the past two seasons. Buehrle has anchored the White Sox staff and spearheaded the team’s World Series run last season. Willis won the Rookie of the Year award in 2003 and finished second in the Cy Young voting in 2005. Beckett was the 2003 World Series MVP. Sabathia just turned 26 and already has 77 career wins. Zambrano has supplied the Cubs with a bright spot in an otherwise lost season. All are currently 28 or younger and, in the era of free agency, should at some point have the opportunity to play for a contending team.

There were at least a handful of other names that might have been expected to make an appearance here but fell just shy of meeting the minimum requirements; their absences help demonstrate the volatility of the Favorite Toy method. Mark Mulder, for instance, came into 2006 with back to back seasons of 17 and 16 victories before being derailed by injuries (his projected wins for this year: 8). Had he maintained that pace this season, his chance at 300 wins would have been a few shades under ten percent. Instead, it's well below zero (although he’s still about 42% to reach 200 wins).

Roy Halladay's projected chances at 300 were also reduced by the time he has missed due to injury in the past two years, although he was only left off by about 1.5% (almost exactly the same percentage by which Jason Marquis made it). He proved his dominance in 2003, the first half of 2005, and 2006, but still needs to show that he can remain healthy in order to approach any significant pitching milestone. If he does avoid serious injury for the next few years, other factors are in his favor, including age (29) and the quality of his team’s lineup, as the Blue Jays have provided him with 6.62 runs of support per game this season.

Other pitchers we might have expected to see on this list include Tim Hudson (injury problems and increasing age), John Lackey (surprising but true; he’s only 27 with great consistency that could make him this generation’s Don Sutton), Jake Peavy (who is having a rough 2006 after two good seasons) and Bartolo Colon (followed up 18 and 21 win seasons with just one victory in an injury plagued 2006).

Time will tell what will happen to the ten pitchers on this list; as we saw with the Pettitte example above, things can change in a hurry. For comparison’s sake, we’ll run the numbers again at the end of the season (when the actual 2006 win totals are in the books) and see what has changed in that short period of time.

Up next: Eight pitchers at the tail end of great careers and their chances of reaching 300 wins.
