Tag: baseball


Sunday Data/Statistics Link Roundup (10/7/12)

  1. Jack Welch got a little conspiracy-theory crazy with the job numbers. Thomas Lumley over at StatsChat makes a pretty good case for debunking the theory. I think the real take-home message of Thomas’ post, and one worth celebrating/highlighting, is that the agencies that produce the jobs report do so based on a fixed and well-defined study design. Careful efforts by government statistics agencies make it hard to fudge/change the numbers. This is an underrated and hugely important component of a well-run democracy.
  2. On a similar note, Dan Gardner at the Ottawa Citizen points out that evidence-based policy making is actually not enough. He points out the critical problem with evidence: in the era of data, what is a fact? “Facts” can come from flawed or biased studies just as easily as from strong studies. He suggests that a true “evidence-based” administration would invest more money in research/statistical agencies. I think this is a great idea.
  3. An interesting article by Ben Bernanke suggesting that an optimal approach (in baseball and in policy) is one based on statistical analysis, coupled with careful thinking about long-term versus short-term strategy. I think one of his arguments, about allowing players to play even when they are struggling in the short term, is actually a case for letting the weak law of large numbers play out. If you have a player with skill/talent, their numbers will eventually converge to their “true” values. It’s also good for their confidence… (via David Santiago).
  4. Here is another interesting peer review dust-up. It explains why some journals “reject” papers when they really mean major/minor revision: doing so pushes down their review times. I think this highlights yet another problem with pre-publication peer review. The evidence is mounting, but I hear we may get a defense of the current system from one of the editors of this blog, so stay tuned…
  5. Several people (Sherri R., Alex N., many folks on Twitter) have pointed me to this article about gender bias in science. I was initially a bit skeptical of such a strong effect across a broad range of demographic variables. After reading the supplemental material carefully, it is clear I was wrong. It is a very well designed/executed study and suggests that there is still a strong gender bias in science, across ages and disciplines. Interestingly, both men and women were biased against the female candidates. This is clearly a non-trivial problem to solve and needs a lot more work; maybe one step is to make recruitment packages more flexible (see the comment by Allison T. especially).
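
A quick aside on the law-of-large-numbers point in item 3. Here is a minimal sketch, assuming a hypothetical hitter whose at-bats are independent and whose "true" hit rate is fixed at .300 (both the rate and the independence are my assumptions, not anything from the Bernanke article). It just tracks the running batting average as at-bats accumulate.

```python
import random

def running_average(true_rate=0.300, n_at_bats=5000, seed=1):
    """Running batting average of a simulated hitter.

    true_rate, n_at_bats and seed are illustrative assumptions.
    Each at-bat is an independent hit/no-hit draw with probability true_rate.
    """
    rng = random.Random(seed)
    hits = 0
    averages = []
    for at_bat in range(1, n_at_bats + 1):
        hits += rng.random() < true_rate   # True counts as 1 hit
        averages.append(hits / at_bat)     # average after this at-bat
    return averages

averages = running_average()
# Early averages bounce around; later ones settle near true_rate.
for n in (10, 100, 1000, 5000):
    print(n, round(averages[n - 1], 3))
```

The early values can sit well away from .300, which is the "struggling short term" part; the later ones settle down, which is the argument for patience.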

Once in a lifetime collapse

Baseball Prospectus uses Monte Carlo simulation to predict which teams will make the postseason. According to this page, on Sept 1st, the probability of the Red Sox making the playoffs was 99.5%. They were ahead of the Tampa Bay Rays by 9 games. Before last night’s game, the Red Sox had lost 19 of their 26 September games and were tied with the Rays for the wild card (the last spot for the playoffs). To make this event even more improbable, the Red Sox were up by one in the ninth with two outs and no one on for the last-place Orioles. In this situation the team that’s winning wins more than 95% of the time. The Rays were in exactly the same situation as the Orioles, losing to the first-place Yankees (well, their subs). So guess what happened? The Red Sox lost, the Rays won. But perhaps the most amazing event is that these two games, both lasting much longer than usual (one due to rain, the other to extra innings), ended within seconds of each other.
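
To get a feel for how a simulation can produce a number like 99.5%, here is a rough sketch. It is not Baseball Prospectus' model. I am assuming that both teams have 27 games left, that every game is an independent coin flip with a fixed win probability, and that only these two teams matter. The function name and all defaults are hypothetical.

```python
import random

def collapse_probability(lead=9, games_left=27, p_leader=0.5,
                         p_chaser=0.5, n_sims=20_000, seed=0):
    """Estimate the chance that a lead (in games) is erased by season's end.

    All parameters are illustrative assumptions. Each remaining game is an
    independent win/loss draw. "Erased" means the chaser ends tied or ahead.
    """
    rng = random.Random(seed)
    erased = 0
    for _ in range(n_sims):
        leader_wins = sum(rng.random() < p_leader for _ in range(games_left))
        chaser_wins = sum(rng.random() < p_chaser for _ in range(games_left))
        # With equal games remaining, the lead changes by the win difference.
        final_lead = lead + leader_wins - chaser_wins
        if final_lead <= 0:
            erased += 1
    return erased / n_sims

print(collapse_probability())
```

A real forecast would use team strength estimates and the actual schedule; the point here is only that you simulate the rest of the season many times and count how often the unlikely thing happens.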

Update: Nate Silver beat me to it. And has much more!


Small ball is a bad strategy

Bill James pointed this out a long time ago. If you don’t know Bill James, you should look him up. I consider him to be one of the most influential statisticians of all time. This post relates to one of his first conjectures: sacrificing outs for runs, referred to as small ball, is a bad strategy.

ESPN’s Gamecast, a web tool that gives you pitch-by-pitch updates of baseball games, also gives you a pitch-by-pitch “probability” of winning. Gamecast confirms the conjecture with data. How do they calculate this “probability”? I am pretty sure it is based only on historical data. No modeling. For example, if the away team is up 4-2 in the bottom of the 7th with no outs and runners on 1st and 2nd, they look at all the instances exactly like this one that have ever happened in the digitally recorded history of baseball and report the proportion of times the home team wins. Well, in this situation this proportion is 45%. If the next batter successfully bunts, moving the runners over, this proportion drops to 41%. Furthermore, if after the successful bunt the runner from third scores on a sacrifice fly, the proportion drops again, from 41% to 39%. The extra out hurts you more than the extra run helps you. That was Bill James’ intuition: you only have three outs in an inning, so the last thing you want to do is give 33% of them away.
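
If my guess about the method is right, the calculation is just a lookup of empirical proportions. Here is a minimal sketch of that idea. The state encoding, the function name, and the toy history are all made up for illustration; I do not know how Gamecast actually stores or computes this.

```python
from collections import defaultdict

def win_proportions(records):
    """Map each game state to the fraction of past games the home team won.

    records is an iterable of (state, home_won) pairs, where state is any
    hashable description of the situation (a hypothetical encoding).
    """
    tallies = defaultdict(lambda: [0, 0])  # state -> [home wins, games seen]
    for state, home_won in records:
        tallies[state][0] += int(home_won)
        tallies[state][1] += 1
    return {state: wins / games for state, (wins, games) in tallies.items()}

# Invented toy history: (half-inning, outs, runners, home score minus away score)
before_bunt = ("bottom 7", 0, "1st and 2nd", -2)
after_bunt = ("bottom 7", 1, "2nd and 3rd", -2)
toy_history = (
    [(before_bunt, True)] * 9 + [(before_bunt, False)] * 11
    + [(after_bunt, True)] * 8 + [(after_bunt, False)] * 12
)

proportions = win_proportions(toy_history)
print(proportions[before_bunt], proportions[after_bunt])
```

With real play-by-play data the only hard part is deciding what counts as "exactly like this one"; the finer the state, the fewer past games you have to average over.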


Video: http://www.youtube.com/watch?v=_tvh5edD22c

“Any other team wins the World Series, good for them…if we win, with this team … we’ll have changed the game.”

Moneyball! Maybe the start of the era of data. Plus it is a feel-good baseball movie where a statistician is the hero. I haven’t been this stoked for a movie in a long time.