Theory

Sunday data/statistics link roundup (1/6/2013)

Not really statistics, but this is an interesting article about how rational optimization by individual actors does not always lead to an optimal solution. Related, here is the coolest street sign I think I’ve ever seen, with a heatmap of traffic density to try to influence commuters. An interesting paper that talks about how clustering is a really hard problem only when there aren’t obvious clusters. I was a little disappointed in the paper, because it defines the “obviousness” of clusters only theoretically, by a distance metric.

Cleveland's (?) 2001 plan for redefining statistics as "data science"

This plan has been making the rounds on Twitter and is being attributed to William Cleveland in 2001 (thanks to Kasper for the link). I’m not sure of the provenance of the document, but it has some really interesting ideas and is worth reading in its entirety. I actually think that many Biostatistics departments follow the proposed distribution of effort pretty closely. One of the most interesting sections is the discussion of computing (emphasis mine): Data analysis projects today rely on databases, computer and network hardware, and computer and network software.