Moneyball for Academic Institutes

One way universities grow in research fields for which they have no department is by creating institutes. Millions of dollars are invested to promote collaboration among existing faculty interested in the new field. But do they work? Does the university get its investment back? Over the years I have noticed that many institutes are nothing more than a webpage, while others are so successful they practically become self-sustaining entities. This paper (published in STM), led by John Hogenesch, uses data from papers and grants to evaluate an institute at Penn.

Benford's law

Am I the only one who didn’t know about Benford’s law? It says that for many datasets, the probability that the first digit of a random element is d (for d = 1, ..., 9) is given by P(d) = log_10(1 + 1/d). This post by Jialan Wang explores financial report data and, using Benford’s law, notices that something fishy is going on… Hat tip to David Santiago. Update: A link has been fixed.
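
To make the formula concrete, here is a minimal Python sketch that compares the Benford probabilities with the observed first digits of a small dataset. The dataset (the powers of 2 from 2^1 to 2^1000) and the helper names `benford_probability` and `first_digit` are my own choices for illustration; they are not taken from Jialan Wang's post, which works with financial report data.

```python
import math
from collections import Counter


def benford_probability(d):
    """Benford's law: P(d) = log10(1 + 1/d) for a first digit d in 1..9."""
    return math.log10(1 + 1 / d)


def first_digit(n):
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


# Illustrative dataset (an assumption, not from the post): powers of 2.
values = [2 ** k for k in range(1, 1001)]
counts = Counter(first_digit(v) for v in values)

# Print the Benford probability next to the observed share for each digit.
for d in range(1, 10):
    observed = counts[d] / len(values)
    print(f"{d}: benford={benford_probability(d):.3f} observed={observed:.3f}")
```

The nine probabilities sum to 1 because the logarithms telescope to log_10(10). A dataset whose first digits stray far from these shares is not necessarily wrong, but it is the kind of thing that invites a closer look.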

Errors in Biomedical Computing

Biomedical Computation Review has a nice summary (in which I am quoted briefly) by Kristin Sainani about the many different types of errors in computational research, including the infamous Duke incident and some other recent examples. The reproducible research policy at _Biostatistics_ is described as an example of how the publication process might need to change to prevent errors from persisting (or occurring).

A Really Cool Paper on the "Hot Hand" in Sports

I just found this really cool paper on the phenomenon of the “hot hand” in sports. The idea behind the “hot hand” (also called the “clustering illusion”) is that success breeds success. In other words, when you are successful (you win games, you make free throws, you get hits) you will continue to be successful. In sports, it has frequently been observed that events are close to independent, meaning that the “hot hand” is just an illusion.
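
As a rough illustration of what "close to independent" means, the following Python sketch simulates independent shots and compares the success rate right after a hit with the rate right after a miss. Everything here is an assumption made for illustration: the 50% hit probability, the number of shots, the seed, and the helper name `conditional_hit_rates`. It is not the method used in the paper.

```python
import random


def conditional_hit_rates(shots):
    """Return (hit rate after a hit, hit rate after a miss) for a 0/1 sequence."""
    pairs = list(zip(shots, shots[1:]))
    after_hit = [nxt for prev, nxt in pairs if prev == 1]
    after_miss = [nxt for prev, nxt in pairs if prev == 0]
    return sum(after_hit) / len(after_hit), sum(after_miss) / len(after_miss)


# Hypothetical shooter: every shot is an independent coin flip with p = 0.5.
rng = random.Random(0)
p = 0.5
shots = [1 if rng.random() < p else 0 for _ in range(100_000)]

rate_after_hit, rate_after_miss = conditional_hit_rates(shots)
print(f"P(hit | previous hit)  = {rate_after_hit:.3f}")
print(f"P(hit | previous miss) = {rate_after_miss:.3f}")
```

With independent shots the two rates come out nearly equal, even though the sequence still contains long streaks of hits. A real "hot hand" would show up as a clearly higher rate after a hit than after a miss.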

Cool papers

Here is a paper where they scraped Twitter data over a year and showed how the tweets corresponded with sleep patterns and diurnal rhythms. The coolest part of this paper is that these two guys just went out and collected the data for free. I wish they had focused on more interesting questions, though; it seems like you could do a lot with data like this. Since flu season is upon us, here is an interesting paper where the authors used data on friendship networks and class structure in a high school to study flu transmission.

Defining data science

Rebranding of statistics as a field seems to be a popular topic these days and “data science” is one of the potential rebranding options. This article over at Revolutions is a nice summary of where the term comes from and what it means. This quote seems pretty accurate: “My own take is that Data Science is a valuable rebranding of computer science and applied statistics skills.”

Innovation and overconfidence

I posted a while ago on how overconfidence may be a good thing. I just read this fascinating article by Neal Stephenson about innovation starvation. The article focuses a lot on how science fiction inspires people to work on big/hard/impossible problems in science. It’s a great read for the nerds in the audience. But one quote stuck out for me: “Most people who work in corporations or academia have witnessed something like the following: A number of engineers are sitting together in a room, bouncing ideas off each other.”

Some cool papers

A cool article on the regulator’s dilemma. It turns out that the risk profile that best prevents one bank from failing is not the one that best prevents all banks from failing. Persistence of web resources for computational biology. I think this one is particularly relevant for academic statisticians, since a lot of academic software/packages are developed by graduate students. Once they move on, a large chunk of “institutional knowledge” is lost.