Sunday data/statistics link roundup (8/26/12)


First off, a quick apology for missing last week, and thanks to Augusto for noticing! On to the links:

  1. Unbelievably, the BRCA gene patents were upheld by the lower court, despite the Supreme Court coming down pretty unequivocally against patenting correlations between metabolites and health outcomes. I wonder if this one will be overturned if it makes it back up to the Supreme Court.
  2. A really nice interview with David Spiegelhalter on statistics and risk. David runs the Understanding Uncertainty blog and recently published a paper on visualizing uncertainty. My favorite line from the interview might be: “There is a nice quote from Joel Best that ‘all statistics are social products, the results of people’s efforts’. He says you should always ask, ‘Why was this statistic created?’ Certainly statistics are constructed from things that people have chosen to measure and define, and the numbers that come out of those studies often take on a life of their own.”
  3. For those of you who use Tumblr like we do, here is a cool post on how to put technical content into your blog. My favorite thing I learned about is the GitHub Gist, which can be used to embed syntax-highlighted code.
  4. A few interesting and relatively simple stats for projecting the success of NFL teams. One thing I love about sports statistics is that the people doing them are totally willing to be super ad hoc and super simple. Sometimes this is all you need to be highly predictive (see, for example, the results of Football’s Pythagorean Theorem). I’m sure there are tons of more sophisticated analyses out there, but if it ain’t broke… (via Rafa).
  5. My student Hilary has a new blog that’s worth checking out. Here is a nice review of ProjectTemplate she did. I think the idea of having an organizing principle behind your code is a great one. Hilary likes ProjectTemplate, but I think there are a few others out there that might be useful. If you know about them, you should leave a comment on her blog!
  6. This is ridiculously cool. Man City has opened up their data/statistics to the data analytics community. After registering, you have access to many of the statistics the club uses to analyze their players. This is yet another example of open data taking over the world. It’s clear that data generators can create way more value for themselves by releasing cool data rather than holding it all in-house.
  7. The Portland Public Library has created a website called Book Psychic, basically a recommender system for books. I love this idea. It would be great to have a recommender system for scientific papers.
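
For anyone curious how simple the Pythagorean idea from item 4 really is, here is a minimal sketch in Python. It assumes the commonly quoted form of the formula, expected win fraction = PF^k / (PF^k + PA^k), where PF and PA are points scored and allowed. The exponent of 2.37 is the value usually cited for the NFL, not something taken from the linked post, and the function names and the 16-game season length are my own choices for illustration.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected win fraction from points scored and points allowed.

    The default exponent of 2.37 is an assumption: it is the value
    commonly quoted for the NFL, not one taken from the linked post.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals must be non-negative")
    scored = points_for ** exponent
    allowed = points_against ** exponent
    if scored + allowed == 0:
        raise ValueError("at least one point total must be positive")
    return scored / (scored + allowed)


def projected_wins(points_for, points_against, games=16, exponent=2.37):
    """Scale the expected win fraction to a season of `games` games.

    The 16-game default is an assumption about the season length.
    """
    return games * pythagorean_win_pct(points_for, points_against, exponent)


if __name__ == "__main__":
    # Hypothetical team: 450 points scored, 350 allowed.
    print(round(projected_wins(450, 350), 1))
```

A team that scores exactly as many points as it allows comes out at a .500 record, and the projection only moves away from that as the gap between points scored and allowed grows. That is the whole model, which is sort of the point.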