Sunday data/statistics link roundup (1/5/14)

Jeff Leek
  1. If you haven’t seen lolmythesis, it is pretty incredible: 1-2 line descriptions of thesis projects. I think every student should be required to make one of these up before they defend. The best I could come up with for mine is, “We built a machine sensitive enough to measure the abundance of every gene in your body at once; turns out it measures other stuff too.”
  2. An interesting article about how different direct-to-consumer genetic tests give different results. It doesn’t say, but it would be interesting if the raw data were highly replicable and only the interpretations were different. If the genotype calls themselves didn’t match up, that would be much worse on some level. I agree people have a right to their genetic data. On the other hand, I think it is important to remember that even people with Ph.D.s and 15 years of experience have trouble interpreting the results of a GWAS. To assume the average individual will understand their genetic risk is seriously optimistic (via Rafa).
  3. The 10 commandments of egoless programming. These are so important on big collaborative projects like the one my group has been working on for the last year or so. Fortunately my students and postdocs are much better at being egoless than I am (I am an academic with a blog so it isn’t like you couldn’t see the ego coming :-)).
  4. This is a neat post on parsing and analyzing data from a Garmin. The analysis even produces an automated report! I love it when people do cool things like this with their own data in R.
  5. Super interesting advice page for potential graduate students from a faculty member at Duke Biology. This is particularly interesting in light of the ongoing debate about the viability of the graduate education pipeline highlighted in this recent article. I think it is important for graduate students in Ph.D. programs to know that not every student goes to an academic position. This has been true for a long time in Biostatistics, where many people end up in industry positions. That also means it is the obligation of Ph.D. programs to prepare students for a variety of jobs. Fortunately, most Ph.D.s in Biostatistics have experience processing data, working with collaborators, and developing data products, so they are usually also really well prepared for industry.
  6. This old video of Tukey and Friedman is awesome and mind-blowing (via Mike L.).
  7. Cool site that lets you try to balance Baltimore’s budget. This type of thing would be even cooler if there were GitHub-like pull requests where you could make new suggestions as well.
  8. My student Alyssa has a very interesting post on teaching R to a non-programmer in one hour. Take the Frazee Challenge and list what you would teach.