Sunday data/statistics link roundup (12/2/12)

  1. An interview with Anthony Goldbloom, CEO of Kaggle. I'm not sure I'd agree with the characterization that all data scientists are creative, curious, and competitive, and those characteristics certainly aren't unique to data scientists. And I didn't know this: "We have 65,000 data scientists signed up to Kaggle, and just like with golf tournaments, we have them all ranked from 1 to 65,000."
  2. Check it out, art with R! It's actually pretty interesting to see how they use statistical algorithms to generate different artistic styles. Here are some more. 
  3. Now that Ethan Perlstein's crowdfunding experiment was successful, other people are getting on the bandwagon. If you want to find out what kind of bacteria you have in your gut, for example, you could check out this
  4. I thought I had it rough, but apparently some data analysts spend all their time developing algorithms to detect penis drawings!
  5. Roger was on Anderson Cooper 360 as part of the Building America segment. We can't find the video, but here is the transcript. 
  6. An interesting article on the half-life of facts. I think the analogy is a good one, and there is certainly research to be done there. But I think it jumps the shark a bit when they start talking about how the moon landing was predictable, etc. I completely believe in the retrospective analysis of knowledge, but predicting things is pretty hard, especially when it is the future.

Sunday data/statistics link roundup (6/10)

  1. Yelp put a data set online for people to play with, including reviews, star ratings, etc. This could be a really neat data set for a student project. The data they have made available focuses on the area around 30 universities. My alma mater is one of them.
  2. A sort of goofy talk about how to choose the optimal marriage partner when viewing the problem as an optimal stopping problem. The author suggests that you need to date around 196,132 partners to make sure you have made the optimal decision. Fortunately for the Simply Statistics authors, it took many fewer for us all to end up with our optimal matches. Via @fhuszar.
  3. An interesting article on the recent Kaggle contest that sought to identify statistical algorithms that could accurately match human scoring of written essays. Several students in my advanced biostatistics course entered this competition and did quite well. I understand the need for these kinds of algorithms, since it takes a huge amount of human labor to score these essays well. But it also makes me a bit sad, since it still seems even the best algorithms will have a hard time scoring creativity. For example, this phrase from my favorite president doesn’t use big words, but it sure is clever: “I think there is only one quality worse than hardness of heart and that is softness of head.”
  4. A really good article by friend of the blog, Steven, on the perils of gene patents. This part sums it up perfectly: “Genes are not inventions. This simple fact, which no serious scientist would dispute, should be enough to rule them out as the subject of patents.” Simply Statistics has weighed in on this issue a couple of times before, but I think, in light of 23andMe’s recent Parkinson’s patent, it bears repeating. Here is an awesome summary of the issue from Genomics Lawyer.
  5. A proposal for a really fast statistics journal I wrote about a month or two ago. Expect more on this topic from me this week.