Sunday data/statistics link roundup (6/10)

  1. Yelp put a data set online for people to play with, including reviews, star ratings, etc. This could be a really neat data set for a student project. The data they have made available focus on the areas around 30 universities. My alma mater is one of them.
  2. A sort of goofy talk about how to choose the optimal marriage partner by treating the choice as an optimal stopping problem. The author suggests that you need to date around 196,132 partners to make sure you have made the optimal decision. Fortunately for the Simply Statistics authors, it took far fewer for each of us to end up with our optimal match. Via @fhuszar.
  3. An interesting article on the recent Kaggle contest that sought to identify statistical algorithms that could accurately match human scoring of written essays. Several students in my advanced biostatistics course entered this competition and did quite well. I understand the need for these kinds of algorithms, since it takes a huge amount of human labor to score these essays well. But it also makes me a bit sad, since it still seems that even the best algorithms will have a hard time scoring creativity. For example, this phrase from my favorite president doesn’t use big words, but it sure is clever: “I think there is only one quality worse than hardness of heart and that is softness of head.”
  4. A really good article by friend of the blog, Steven, on the perils of gene patents. This part sums it up perfectly: “Genes are not inventions. This simple fact, which no serious scientist would dispute, should be enough to rule them out as the subject of patents.” Simply Statistics has weighed in on this issue a couple of times before, but in light of 23andMe’s recent Parkinson’s patent, I think it bears repeating. Here is an awesome summary of the issue from Genomics Lawyer.
  5. A proposal for a really fast statistics journal I wrote about a month or two ago. Expect more on this topic from me this week.