- Travis Sawchik
Big Data Baseball Page 4
3
PROVING GROUNDS
The experiment began far from the spotlight in late February 2010, on the back fields of the team’s spring training home in Bradenton, Florida. The fields are remote, behind the dorms, batting cages, and office space at the Pirate City complex, and surrounded by chain-link fences and windscreens erected in vain to try to tamp down the relentless Gulf breeze. The complex is completely private, with the parking lot gated and the perimeter of the property obscured by palms, wax myrtles, gum trees, and undergrowth.
Here, the director of player development, Kyle Stark, behaving curiously, held a map of sorts, a diagram of a baseball field with X’s marked at various positions. He walked the infield holding a can of white spray paint and marked an X roughly equidistant between second and third base deep on the infield skin. The tall, brown-haired, and blue-eyed Stark paced some more, then marked another white X just behind second base, then set off into shallow right field between first base and second base and made another mark. What the hell was he doing? wondered the gray-bearded minor league coaching staff as they watched. Was he searching for buried treasure? In a way, yes.
Stark had traded e-mails all winter and early spring with the creator and director of the Pirates analytics department, Dan Fox. Fox, a little-known Pirates employee, is a former Baseball Prospectus writer and Chevron data architect who is never interviewed before or after games, and about whom the public and players know little. He has the last bio in the 2013 media guide, at the bottom of page 14, which includes a mug shot and a brief paragraph that explains in a hundred words that Fox is responsible for the “architecture, development and dissemination of information systems and quantitative analytics” within baseball operations. Most of the coaches on the field that day watching Stark had never even heard of Fox. But Stark was intrigued by Fox’s research. Months of dialogue between the two men had led Stark to this field, at this moment, to these markers, which were a guide to hidden value. Stark had in his hands something resembling a treasure map.
A baseball team places nine defenders on a baseball field. Two of those players, the pitcher and the catcher, are in fixed positions. But the other seven are free, in theory, to position themselves at any location on the field. Since the game’s beginnings players have taken familiar defensive positions. Infielders and outfielders position themselves not based upon where balls are most often hit but, rather, equidistant from other fielders. Patches of outfield grass are worn down from players’ being stationed there so often. It was counterintuitive to leave large swaths of the field unoccupied, considering that the average baseball field covers just under three acres.
This traditional defensive positioning has for more than a hundred years been based on anecdotal evidence. From the nineteenth century through the twentieth, players and managers were often armed with only personal history and observations to make decisions on defensive positioning.
But what if since the game’s origins, everyone—players, coaches, executives—everyone, had gotten defense wrong? Throughout baseball history some brief, isolated deviations from tradition have been tried. The first recorded defensive shift—generally defined as when three or more infielders are positioned to one side of second base—occurred in the nineteenth century. On May 9, 1877, the Louisville Courier-Journal reported a curious defensive strategy by Hartford manager Bob Ferguson against the Louisville Grays. Not only did Ferguson shift infielders, he occasionally moved all three of his outfielders to one-half of the outfield.
But shifts largely vanished until the 1920s, when several National League managers shifted three infielders to the right of second base to defend against pull-heavy Cy Williams, according to the Society for American Baseball Research. The left-handed Williams was power and pull conscious, meaning that he often hit the ball to right field, his pull-side. He was the first National League player to hit 200 career home runs and was one of only three players, including Babe Ruth and Rogers Hornsby, born before 1900 to hit 200 home runs in his career. But Williams and the effectiveness of the shift against him were forgotten over time.
The other Williams Shift is often credited as baseball’s first radical departure from traditional defensive alignment. Though some teams had shifted on the great Ted Williams as early as 1941, the Ted Williams Shift is generally said to have been born in the second game of a doubleheader on July 14, 1946, at Fenway Park. In the first game, Williams had gone 4-for-5, homered 3 times, and driven in 8 runs against Cleveland. So when Williams came to bat in the second game, Indians player-manager Lou Boudreau moved from his usual shortstop position to the traditional second-base position. Cleveland’s second baseman was deployed to shallow right field, and its third baseman moved to the right of second. The alignment was so radical that a photo of the shift appeared in The Sporting News later that month. Against one of the first documented shifts of the postwar, modern era, Williams went 1-for-2 with a double and 2 walks. Maybe Williams’s successful work in a single game against a radically aligned defense helped delay the proliferation of shifts by another sixty-five years.
“Somebody was telling me a story about Ted Williams,” Hurdle once told the Pittsburgh Tribune-Review. “It was an umpire relaying a story from another umpire. [Williams] stepped to the plate at Fenway, and they had the tremendous shift for him. First time it had happened. He stepped back and looked at it. The umpire goes, ‘That’s interesting,’ [and Williams] said, ‘Not really. They can’t play me high enough.’”
But as before, shifts mostly vanished from the game after that. Why? Because there was no hard evidence that teams should shift, no data proving that shifts were effective. Any successful moves away from traditional alignment would be based upon anecdotal evidence since no one was tracking batted balls statistically. And even if somebody was, another barrier remained: fear. Going against conventional thought requires courage and conviction because, when such an unorthodox attempt fails, public criticism is intense. For most of the game’s history and for the entire twentieth century, teams played defense the same way they always had. Then came John Dewan.
* * *
On a bright Saturday afternoon in 1984, Dewan was eating lunch in the kitchen of his Chicago home and enjoying the latest Bill James Baseball Abstract, a book of in-depth and original statistical research of baseball published each year from 1977 to 1988. James wrote about baseball in a way no one had before and measured things that no one else had. For instance, in the 1985 Baseball Abstract he introduced a system to translate minor league batting performance of prospects into future major league production. He wrote about how ballparks shape statistics, how career length varies at different positions, and, of course, the importance of defensive play and how poorly it was understood. Like Dewan, Bill James was a baseball outsider. He began writing while working as a night-shift security guard at Stokely–Van Camp’s pork-and-beans cannery in Lawrence, Kansas, in the 1970s, and his work is often credited as being most responsible for bringing objective and scientific thought to baseball. In the early 1980s, James was attracting a niche following of people with a similar interest in advancing baseball thought. While James was poring over box scores and counting things that had not been counted before, it was not big data, which Wikipedia defines as an information set so large and complex it is impossible to process using traditional tools. What James and Dewan understood was that to advance the understanding of the game more data was needed. In the 1984 Abstract, James proposed beginning a grassroots effort called Project Scoresheet, which would employ a network of fans to score every game in more detail, with the information then entered into a computer database.
Wrote James in the 1984 Abstract: “When Project Scoresheet is in place, all previous measures of performance in baseball will become obsolete and an entire universe of research options will fall in front of us … there is no need for the next generation of fans to be ignorant as we are.” Dewan stopped eating his lunch when he read that James was looking for volunteers to help out with his Scoresheet stringer network. “I remember going, ‘Oh, my gosh, this is what I’ve always wanted to do,’” Dewan said. This was similar to Dewan’s dream to computerize the play-by-play information in all sports. Personal computers were becoming more practical, powerful, and affordable, making the project possible. Dewan had graduated with two degrees from Loyola University in mathematics and computer science, and this was right up his alley.
Dewan left his kitchen table in search of a telephone directory to look up James. Three weeks later, Dewan was entering and collecting data for Project Scoresheet, and a year after that he was the project manager, writing the software and organizing people all over the country to input data. He gave his stringers scoring templates that broke the field into zones and educated his network on how the plays should be coded. By 1994, the project, under various caretakers, had collected ten years’ worth of games (1984–94), which covered 23,000 games and 1.7 million plays.
In 1987 Dewan decided his hobby had become too engrossing. His wife, Sue, had even quit her job to put more time into the data-gathering. He had to either lessen his commitment or make it a career. He was so committed to Project Scoresheet and statistical analysis of baseball that he left his successful career as an insurance actuary. He invested in a small company called STATS—an acronym for Sports Team Analysis and Tracking Systems—and became its president. His first headquarters was a spare bedroom in his Chicago home. He then moved his company into a basement office and later rented a proper office space as the company expanded.
Dewan still often works out of that spare bedroom, though his hair has turned white and his thick, black eyebrows have grown fuller. Back in 1987, STATS supplied research to NBC for its postseason-baseball coverage and was doing the same for ESPN regular-season broadcasts in 1989. James and Dewan helped dramatically increase the game’s data.
In 2000, Dewan sold STATS to News Corp. and two years later formed another company, Baseball Info Solutions (BIS), which recorded batted-ball and pitch-by-pitch data at a more detailed level.
At BIS, Dewan hired a group of video scouts to review every play of every major league game, all 2,430 per year. Consider BIS’s plus-minus statistic, which is an important metric in understanding how the evaluation of individual defensive performance has improved. Plus-minus measures how many balls individual defenders reach compared to a league average at their respective positions. BIS video scouts record the exact location on the field where every batted ball lands or is caught, then convert those locations into coordinates, storing the information in a computer database. By contrast, James’s Project Scoresheet simply noted what zone a ball landed in, using a uniform grid overlaying the field. In 2009, BIS began more accurately measuring how hard balls were hit. For fly balls and line drives, BIS times, stopwatch-style, the batted ball’s hang time until it lands or is caught. For ground balls in the infield, BIS times the interval from when a ball is hit to when a fielder first intercepts it. Based on that data, BIS records how many balls a fielder reaches compared to the league average at his position. The fielders are debited or credited points. The plus-minus points are then converted into a run value to quantify how many runs an individual player was saving above or below league average, a statistic called defensive runs saved.
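The bookkeeping behind a plus-minus style metric can be sketched in a few lines of Python. This is a simplified illustration, not BIS's actual method: the `bucket` label (standing in for a batted ball's location and timing), the record layout, and the function name are all assumptions made for the example. Each play is credited or debited against the league-average out rate for comparable balls.

```python
from collections import defaultdict


def plus_minus(plays, fielder):
    """Credit or debit a fielder against the league-average out rate.

    `plays` is a list of dicts with hypothetical keys:
      "bucket"   - label grouping comparable batted balls (location, timing)
      "fielder"  - name of the responsible defender
      "out_made" - True if the play was converted into an out
    """
    # League-wide outs and chances for each bucket of comparable balls.
    outs = defaultdict(int)
    chances = defaultdict(int)
    for play in plays:
        chances[play["bucket"]] += 1
        if play["out_made"]:
            outs[play["bucket"]] += 1

    score = 0.0
    for play in plays:
        if play["fielder"] != fielder:
            continue
        league_rate = outs[play["bucket"]] / chances[play["bucket"]]
        if play["out_made"]:
            score += 1.0 - league_rate  # harder plays earn more credit
        else:
            score -= league_rate        # missing routine plays costs more
    return score
```

The conversion from plus-minus points to runs is left out here because the text does not give the factor BIS uses.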
Up until that time defenders had been judged largely on the subjective statistic of errors. An error is a judgment by the official scorer. Beyond anecdotal evidence, no one was accounting for more important factors such as how much ground a defender could cover after the ball was hit.
When Dewan was a softball-league infielder, he loved playing defense, which is why he focused so much on it. He prided himself on his defensive play, choosing to play shortstop and third base, the two positions on the more difficult left side of the infield. He was also an enthusiastic player of the baseball-simulation board game Strat-O-Matic, which, like APBA baseball, assigned probabilities to each individual player card that reflected the actual player’s skills and used dice to create random numbers and outcomes. While baseball had simplistic methods of measuring defensive value, via errors and fielding percentage, Strat-O-Matic gave each player a defensive rating, and it was important to field a strong defensive team. This all helped Dewan appreciate defensive value, while most in the baseball world were concerned with batting average and home run totals for position players.
“It made me want to appreciate what the best players are really worth, because the eye can be deceiving as with anything. Anything you do, anything you perceive, is not always the reality,” Dewan said.
Baseball’s perceptions on where to place defenders in the field were wrong.
Dewan and his team unearthed interesting data about the nature of balls in play. For instance, BIS found that major league hitters hit ground balls to their pull-side 73 percent of the time, meaning a left-handed batter hits toward the right-side of the field an overwhelming amount. Batters also hit line drives to their pull-side 55 percent of the time. The only type of balls major league hitters did not pull the majority of the time were fly-balls, which went to their pull-side only 40 percent of the time. Those numbers changed little year to year over a decade of study.
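As a rough illustration of how a pull-side rate like those above could be tallied, here is a minimal Python sketch. It assumes each batted ball is recorded as a pair of batter handedness and the half of the field the ball went to; that record layout, the two-halves simplification, and the function name are hypothetical and are not drawn from BIS.

```python
def pull_rate(batted_balls):
    """Fraction of batted balls hit to the batter's pull side.

    `batted_balls` is a list of (bats, side) pairs, where `bats` is "L" or
    "R" and `side` is "left" or "right" (the half of the field the ball
    went to). A left-handed batter pulls to right field, and a right-handed
    batter pulls to left field.
    """
    pull_side = {"L": "right", "R": "left"}
    pulled = sum(1 for bats, side in batted_balls if pull_side[bats] == side)
    return pulled / len(batted_balls)


sample = [("L", "right"), ("L", "left"), ("R", "left"), ("R", "left")]
print(pull_rate(sample))  # 0.75
```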
“When we got really in-depth data from Baseball Info Solutions,” Dewan said, “that’s when I also started looking very closely at shifting.”
Dewan’s database taught him some valuable lessons, which he began sharing with baseball after the 2011 season. In March of 2012 at the Society for American Baseball Research (SABR) Analytics Conference, Dewan demonstrated the value of shifting to officials from twenty major league teams. Dewan revealed that the combined batting average of the eight most shifted-upon hitters in 2011 declined by 51 points when they were shifted upon. All eight of the players were power-hitting, left-handed batters, the only type of batters that the majority of baseball was employing generic shifts against. While teams were shifting on some power-hitting lefties such as Jim Thome and Adam Dunn based upon anecdotal evidence, Dewan found defenses should be shifting for a hundred major league hitters, or 25 percent of the league.
Pull-heavy, right-handed hitters should also have seen shifts, but rarely did. According to BIS’s database, the first shift employed against a right-handed hitter in the modern era didn’t occur until June 11, 2009, when the Phillies shifted left against Gary Sheffield. In 2010 and 2011, despite mountains of batted-ball data available from scouting services such as BIS and Inside Edge, teams were only shifting 0.8 times per game, and those shifts were almost exclusively against left-handed power hitters.
In a March 30, 2012, article for BillJamesOnline.com, a Web site of statistical analysis run by James, Dewan noted the Tampa Bay Rays were the top defensive team in baseball in 2011, amassing 85 defensive runs saved. The Rays won 91 games in 2011, and Dewan noted that if the Rays had had an average defense, meaning zero defensive runs saved, they would have won 8 or 9 fewer games, using the standard that 10 runs equals 1 win. The Rays were the most aggressive shifting team in baseball in 2011, shifting their defense 216 times. The next closest team was the Milwaukee Brewers, with 170 shifts. Only two other teams—the Cleveland Indians and the Toronto Blue Jays—shifted more than 100 times in 2011.
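The arithmetic behind that estimate is simple enough to check directly. The short Python sketch below applies the 10-runs-per-win rule of thumb quoted above to the Rays' 2011 figures; the function and variable names are illustrative only.

```python
RUNS_PER_WIN = 10  # rule of thumb cited in the text


def wins_from_runs(runs_saved, runs_per_win=RUNS_PER_WIN):
    """Convert runs saved (or cost) into an approximate number of wins."""
    return runs_saved / runs_per_win


rays_wins_2011 = 91
rays_runs_saved_2011 = 85

extra_wins = wins_from_runs(rays_runs_saved_2011)
print(extra_wins)                   # 8.5, i.e. "8 or 9" wins
print(rays_wins_2011 - extra_wins)  # 82.5 wins with an average defense
```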
Dewan wondered, Was it a fluke that the team that shifted the most in 2011 had the best defense?
He looked at an interesting case study: the 2010 Brewers versus the 2011 Brewers. The individual defenders on the Brewers were not as talented as those on the Rays. In fact, the Brewers had the worst collection of defensive infielders in baseball. Milwaukee first baseman Prince Fielder ranked as the worst defensive first baseman in baseball in 2010 according to BIS, with negative 17 defensive runs saved. Rickie Weeks ranked 34th out of 35 second basemen, costing the Brewers 16 runs. Shortstop Yuniesky Betancourt, who was acquired from the Kansas City Royals, cost the Royals a whopping 27 runs as he ranked last among shortstops in 2010. Third baseman Casey McGehee was their top defensive infielder, ranking 31st at his position, worth negative 14 defensive runs saved. It was difficult to dream up a worse defensive infield.
So what did new Brewers manager Ron Roenicke do in 2011? Dewan noted the Brewers went from being one of the least aggressive shifting teams, with just 22 shifts in 2010, to the second-most-aggressive, with 170 shifts in 2011.
BIS categorizes shifts into two types: the Ted Williams Shift, when three infielders are positioned to one side of second base, and “Other Shifts,” when players are out of traditional infield alignment but not quite in a Williams Shift position. Dewan noted the Brewers employed the Ted Williams Shift 45 times in 2011 against the obvious pull-heavy, lefty power hitters, the handful of hitters other teams were beginning to shift against by employing the Williams Shift. But the Brewers shifted 125 times against hitters that no other team in the National League was shifting against, using a variety of more sophisticated, nuanced, data-based alignments, making their defense dramatically better.
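The two-category scheme can be expressed as a small decision rule. The Python sketch below is a hypothetical illustration: the coordinate convention (x = 0 on the line from home plate through second base, positive toward the first-base side) and the `nontraditional` flag, which stands in for a scout's judgment that players are out of their usual spots, are assumptions and not BIS's actual encoding.

```python
def classify_alignment(infielder_x, nontraditional=False):
    """Label an infield alignment using the two BIS shift categories.

    `infielder_x` holds one x-coordinate per infielder (hypothetical
    convention: negative is the third-base side of second base, positive
    is the first-base side).
    """
    left = sum(1 for x in infielder_x if x < 0)
    right = sum(1 for x in infielder_x if x > 0)

    # Three or more infielders on one side of second base.
    if max(left, right) >= 3:
        return "Ted Williams Shift"
    # Out of traditional alignment, but not a full Williams Shift.
    if nontraditional:
        return "Other Shift"
    return "Traditional"


print(classify_alignment([-40, 10, 35, 60]))  # Ted Williams Shift
```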
According to defensive runs saved, Fielder had saved 8 more runs in 2011 than the season before. Weeks had a 9-run improvement, McGehee had a 17-run turnaround, and Betancourt saved 20 more runs than during his previous season with the Royals.
In 2011, the Brewers infielders made a defensive runs saved improvement of 56 runs and added 5 or 6 wins simply through shifting more often. They went from being a 77-win team in 2010 to a 96-win team in 2011. BIS has found since 2010, when they started measuring batting performance against shifts, that batting average on ground balls and short line drives (BAGSL), the types of batted balls designed to be gobbled up by the shift, declined by 30 to 40 points per season with shifts on, compared to batting average against conventional defense.
Still, while shifting jumped slightly to 1.9 shifts per game in baseball in 2012, most of the shifts were utilized by only a handful of teams—with the Rays and the Brewers as the most devout believers and becoming ever more aggressive.