Sunday data/statistics link roundup (7/1)

  1. A really nice explanation of the elements of Obamacare. Also, Rafa’s post on the new inHealth initiative Scott is leading got a lot of comments on Reddit. Some of them are funny (Rafa’s spelling got rocked), and if you get past the usual level of internet-commentary politeness, some of them are really relevant, especially the comments about generalizability and the economics of health care. 
  2. From Andrew J., a cool visualization of the human genome: it shows every base of the human genome over the course of a year. That works out to about 100 bases per second. I think this is a great way to show how much information is in just one human genome. It also puts the sequencing data deluge in perspective. We are now sequencing thousands of these genomes a year, and it’s only going to get faster. 
  3. Cosma Shalizi has a nice list of unsolved problems in statistics on his blog (via Edo A.). Most of these fall into what I called Category 1 problems in my post on motivating statistical projects. Still, he has some really nice insights, and solving some of these problems would be a big deal.
  4. A really provocative talk on why consumers are the job creators. The question of who the job creators are seems ripe for a thorough statistical analysis. There are a thousand confounders here, and my guess is that most of the work so far has been Category 2 (let’s use convenient data to take a stab at this). But a thorough and legitimate data analysis would be hugely impactful. 
  5. Your eReader is collecting data about you.
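The "about 100 bases per second" figure in item 2 is easy to sanity-check. This is a rough sketch that assumes a haploid human genome of roughly 3.1 billion bases and a 365-day year; the visualization's exact genome length isn't stated, so treat the constant as an assumption.

```python
# Rough check of the "about 100 bases per second" figure.
# Assumption: a haploid human genome of roughly 3.1 billion bases.
GENOME_BASES = 3.1e9

# Seconds in a non-leap year: 365 days * 24 hours * 3600 seconds.
SECONDS_PER_YEAR = 365 * 24 * 3600

bases_per_second = GENOME_BASES / SECONDS_PER_YEAR
print(f"{bases_per_second:.0f} bases per second")
```

This prints "98 bases per second", which is close enough to 100 for the point being made.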