Sunday data/statistics link roundup (4/8)

  1. This is a great article about the illusion of progress in machine learning. In part, I think it explains why the Leekasso (just using the top 10 predictors) isn’t a totally silly idea. I also love how the author talks about sources of uncertainty in real prediction problems that aren’t part of the classical models used when developing prediction algorithms. I think that a hugely underrated component of building an accurate classifier is just finding the quirks particular to a type of data. Via @chlalanne.
  2. An interesting post from Michael Eisen on a serious abuse of statistical ideas in the New York Times. The professor of genetics quoted in the story apparently wasn’t aware of the birthday problem. Lack of statistical literacy, even among scientists, is becoming a critical problem. I would love it if the Khan Academy (or some enterprising students) would come up with a set of videos that just explained a bunch of basic statistical concepts, skipping all the hard math and focusing on the ideas.
  3. TechCrunch finally caught up to our Mayo vs. Prometheus coverage. This decision is going to affect more than just personalized medicine. Speaking of the decision, stay tuned for more on that topic from the folks over here at Simply Statistics.
  4. How much is a megabyte? I love this question. They asked people on the street how much data was in a megabyte. The answers were pretty wide-ranging, it looks like. This question is hyper-critical for scientists in the new era, but the better question might be, “How much is a terabyte?”
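
For anyone who hasn't run into the birthday problem mentioned in item 2, here is a minimal sketch of the classic version: the chance that at least two people in a group share a birthday. It assumes 365 equally likely birthdays and ignores leap years, and the function name is just something made up for illustration. It is not a reconstruction of the calculation in the Times story or in Eisen's post.

```python
def birthday_collision_probability(n_people: int, n_days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes every day is equally likely and birthdays are independent
    (a simplifying assumption; real birth dates are not perfectly uniform).
    """
    if n_people > n_days:
        # Pigeonhole principle: more people than days guarantees a match.
        return 1.0
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (n_days - k) / n_days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(birthday_collision_probability(n), 3))
```

With only 23 people the probability already passes one half, which is why "what are the odds of a coincidence like that?" is so often a misleading question: once there are many pairs to compare, some match becomes likely.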