Sunday data/statistics link roundup (4/15)

  1. Incredibly cook, dynamic real-time maps of wind patterns in the United States. (Via Flowing Data)
  2. A d3.js coding tool that updates automatically as you update the code. This is going to be really useful for beginners trying to learn about D3. Real time coding (Via Flowing Data)
  3. An interesting blog post describing why the winning algorithm in the Netflix prize hasn’t actually been implemented! It looks like it was too much of an engineering hassle. I wonder if this will make others think twice before offering big sums for prizes like this. Unless the real value is advertising…(via Chris V.)
  4. An article about a group at USC that plans to collect all the information from apps that measure heart beats. Their project is called everyheartbeat. I think this is a little bit pre-mature, given the technology, but certainly the quantified self field is heating up. I wonder how long until the target audience for these sorts of projects isn’t just wealthy young technofiles? 
  5. A really good deconstruction of a recent paper suggesting that the mood on Twitter could be used to game the stock market. The author illustrates several major statistical flaws, including not correcting for multiple testing, an implausible statistical model, and not using a big enough training set. The scary thing is apparently a hedge fund is teaming up with this group of academics to try to implement their approach. I wouldn’t put my money anywhere they can get their hands on it. This is just one more in the accelerating line of results that illustrate the critical need for statistical literacy both among scientists and in the general public.