An important article about anti-science sentiment in the U.S. (via David S.). The politicization of scientific issues such as global warming, evolution, and healthcare (think vaccination) makes the U.S. less competitive. I think the lack of statistical literacy and training in the U.S. is one of the sources of the problem. People use/skew/mangle statistical analyses and experiments to support their view and without a statistically well trained public, it all looks “reasonable and scientific”.
A fascinating article about the debate on whether to regulate sugary beverages. One of the protagonists is David Allison, a statistical geneticist, among other things. It is fascinating to see the interplay of statistical analysis and public policy. Yet another example of how statistics/data will drive some of the most important policy decisions going forward. A related article is this one on the way risk is reported in the media.
A really nice list of journals software/data release policies from Titus’ blog. Interesting that he couldn’t find a data/release policy for the New England Journal of Medicine. I wonder if that is because it publishes mostly clinical studies, where the data are often protected for privacy reasons? It seems like there is going to eventually be a big discussion of the relative importance of privacy and open data in the clinical world.
Yelp put a data set online for people to play with, including reviews, star ratings, etc. This could be a really neat data set for a student project. The data they have made available focuses on the area around 30 universities. My alma mater is one of them. A sort of goofy talk about how to choose the optimal marriage partner when viewing the problem as an optimal stopping problem.
Amanda Cox on the process they went through to come up with this graphic about the Facebook IPO. So cool to see how R is used in the development process. A favorite quote of mine, “But rather than bringing clarity, it just sort of looked chaotic, even to the seasoned chart freaks of 620 8th Avenue.” One of the more interesting things about posts like this is you get to see how statistics versus a deadline works.
Nature genetics has an editorial on the Mayo and Myriad cases. I agree with this bit: “In our opinion, it is not new judgments or legislation that are needed but more innovation. In the era of whole-genome sequencing of highly variable genomes, it is increasingly hard to justify exclusive ownership of particularly useful parts of the genome, and method claims must be more carefully described.” Via Andrew J. One of Tech Review’s 10 emerging technologies from a February 2003 article?
This is a great article about the illusion of progress in machine learning. In part, I think it explains why the Leekasso (just using the top 10) isn’t a totally silly idea. I also love how he talks about sources of uncertainty in real prediction problems that aren’t part of the classical models when developing prediction algorithms. I think that this is a hugely underrated component of building an accurate classifier - just finding the quirks particular to a type of data.
The psychologist whose experiment didn’t replicate then went off on the scientists who did the replication experiment is at it again. I don’t see a clear argument about the facts of the matter in his post, just more name calling. This seems to be a case study in what not to do when your study doesn’t replicate. More on “conceptual replication” in there too. Berkeley is running a data science course with instructors Jeff Hammerbacher and Mike Franklin, I looked through the notes and it looks pretty amazing.
A really interesting proposal by Rafa (in Spanish - we’ll get on him to write a translation) for the University of Puerto Rico. The post concerns changing the focus from simply teaching to creating knowledge and the potential benefits to both the university and to Puerto Rico. It also has a really nice summary of the benefits that the university system in the United States has produced. Definitely worth a read.
This is the big one. ESPN has opened up access to their API! It looks like there may only be access to some of the data for the general public though, does anyone know more? Looks like ESPN isn’t the only sports-related organization in the API mood, Nike plans to open up an API too. It would be great if they had better access to individual, downloadable data. Via Leonid K.
A couple of links: figshare is a site where scientists can share data sets/figures/code. One of the goals is to encourage researchers to share negative results as well. I think this is a great idea - I often find negative results and this could be a place to put them. It also uses a tagging system, like Flickr. I think this is a great idea for scientific research discovery. They give you unlimited public space and 1GB of private space.
Statistics help for journalists (don’t forget to keep rating stories!) This is the kind of thing that could grow into a statisteracy page. The author also has a really nice plug for public schools. An interactive graphic to determine if you are in the 1% from the New York Times (I’m not…). Mike Bostock’s d3.js presentation, this is some really impressive visualization software. You have to change the slide numbers manually but it is totally worth it.
A few data/statistics related links of interest: Eric Lander Profile The math of lego (should be “The statistics of lego”) Where people are looking for homes. Hans Rosling’s Ted Talk on the Developing world (an oldie but a goodie) Elsevier is trying to make open-access illegal (not strictly statistics related, but a hugely important issue for academics who believe government funded research should be freely accessible), more here.