Sunday data/statistics link roundup (2/3/2013)

Jeff Leek
  1. My student, Hilary, wrote a post about how her name is the most poisoned in history. A poisoned name is a name that quickly loses popularity year over year. The post is awesome for the following reasons: (1) she is a good/funny writer and has lots of great links in the post, (2) she very clearly explains concepts that are widely used in biostatistics like relative risk, and (3) she took the time to try to really figure out all the trends she saw in the name popularity. I’m not the only one who thinks it is a good post: it was reprinted in New York Magazine and went viral this past week.
  2. In honor of it being Super Bowl Sunday (go Ravens!) here is a post about why it often doesn’t make sense to consider the odds of an event retrospectively, due to the Wyatt Earp effect. Another way to think about it: if you have a big tournament with tons of teams, someone will win. But at the very beginning, any given team had a pretty small chance of winning all its games and taking the championship. If we wait until some team wins and then calculate its pre-tournament chance of winning, that chance will probably be small, even though it was certain that some team would end up as champion. (via David S.)
  3. A new article by Ben Goldacre in the NYT about unreported clinical trials. This is a major issue and Ben is all over it with his All Trials project. This is another reason we need a deterministic statistical machine. Don’t worry, we are working on building it.
  4. Even though it is Super Bowl Sunday, I’m still eagerly looking forward to spring and the real sport of baseball. Rafa sends along this link analyzing the effectiveness of patient hitters when they swing at a first strike. It looks like it is only a big advantage if you are an elite hitter.
  5. An article in Wired on the importance of long data. The article talks about how, in addition to cross-sectional big data, we might also want to be looking at data over time, possibly over very long periods. I think the title is maybe a little over the top, but the point is well taken. It turns out this is something a bunch of my colleagues in imaging and environmental health have been working on/talking about for a while. Longitudinal/time series big data seems like an important and wide-open field (via Nick R.).
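
As a quick illustration of the tournament argument in item 2, here is a minimal sketch in Python. It assumes a hypothetical 64-team single-elimination bracket in which every game is a coin flip; the team count, the equal-strength assumption, and the function names are mine, not taken from the linked post. Every simulated tournament produces a champion, yet each champion's pre-tournament chance of winning was only 1 in 64.

```python
import math
import random


def pre_tournament_win_probability(num_teams):
    """Chance that one specific team wins a coin-flip knockout bracket."""
    rounds = math.log2(num_teams)
    if not rounds.is_integer():
        raise ValueError("num_teams must be a power of two")
    # The team must win every round, each with probability 1/2.
    return 0.5 ** int(rounds)


def simulate_tournament(num_teams, rng):
    """Play one bracket with evenly matched teams and return the champion."""
    teams = list(range(num_teams))
    while len(teams) > 1:
        # Pair up adjacent teams; the winner of each game is a fair coin flip.
        teams = [rng.choice(pair) for pair in zip(teams[0::2], teams[1::2])]
    return teams[0]


num_teams = 64  # assumed bracket size, for illustration only
rng = random.Random(0)
winners = [simulate_tournament(num_teams, rng) for _ in range(1000)]
p = pre_tournament_win_probability(num_teams)

print(f"Tournaments that produced a champion: {len(winners)} of 1000")
print(f"Pre-tournament win probability of any one team: {p:.4f}")
```

Looking back at the champion and marveling at how unlikely its run was misses the point: some team had to win, and whichever one did would have looked like a long shot at the start.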