Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (1/6/2013)

  1. Not really statistics, but this is an interesting article about how rational optimization by individual actors does not always lead to an optimal solutiohn. Related, ere is the coolest street sign I think I’ve ever seen, with a heatmap of traffic density to try to influence commuters.
  2. An interesting paper that talks about how clustering is only a really hard problem when there aren’t obvious clusters. I was a little disappointed in the paper, because it defines the “obviousness” of clusters only theoretically by a distance metric. There is very little discussion of the practical distance/visual distance metrics people use when looking at clustering dendograms, etc.
  3. A post about the two cultures of statistical learning and a related post on how data-driven science is a failure of imagination. I think in both cases, it is worth pointing out that the only good data science is good science - i.e. it seeks to answer a real, specific question through the scientific method. However, I think for many modern scientific problems it is pretty naive to think we will be able to come to a full, mechanistic understanding complete with tidy theorems that describe all the properties of the system. I think the real failure of imagination is to think that science/statistics/mathematics won’t change to tackle the realistic challenges posed in solving modern scientific problems.
  4. A graph that shows the incredibly strong correlation ( > 0.99!) between the growth of autism diagnoses and organic food sales. Another example where even really strong correlation does not imply causation.
  5. The Buffalo Bills are going to start an advanced analytics department (via Rafa and Chris V.), maybe they can take advantage of all this free play-by-play data from years of NFL games.
  6. A prescient interview with Isaac Asimov on learning, predicting the Kahn Academy, MOOCs and other developments in online learning (via Rafa and Marginal Revolution).
  7. The statistical software signal - what your choice of software says about you. Just another reason we need a deterministic statistical machine.