

Moneyball for Academic Institutes

One way that universities grow into research fields for which they have no department is by creating institutes. Millions of dollars are invested to promote collaboration among existing faculty interested in the new field. But do these institutes work? Does the university get its investment back? Over the years I have noticed that many institutes are nothing more than a webpage, while others are so successful they practically become self-sustaining entities. This paper (published in STM), led by John Hogenesch, uses data from papers and grants to evaluate an institute at Penn. Among other things, the authors present a method that uses network analysis to objectively evaluate the effect of the institute on collaboration. The findings are fascinating. 

The use of data to evaluate academics is becoming more and more popular, especially among administrators. Is this a good thing? I am not sure yet, but statisticians had better get involved before a biased analysis gets some of us fired.


Benford's law

Am I the only one who didn’t know about Benford’s law? It says that for many datasets, the probability that the first digit of a randomly chosen element is d is given by P(d) = log_10(1 + 1/d). This post by Jialan Wang explores financial report data and, using Benford’s law, notices that something fishy is going on… 
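To make the law concrete, here is a minimal sketch (not from Wang's post; the function names and the powers-of-2 example dataset are my own choices) that computes the Benford probabilities and compares them to the observed leading digits of a sequence:

```python
import math
from collections import Counter

def benford_prob(d):
    """Expected probability that the leading digit is d under Benford's law."""
    return math.log10(1 + 1 / d)

def leading_digit(x):
    """First significant digit of a positive number.

    Scientific notation puts the leading digit first, e.g. 0.025 -> '2.5e-02'.
    """
    return int(f"{abs(x):.10e}"[0])

# Illustrative dataset: powers of 2 are a classic Benford-conforming sequence
data = [2 ** k for k in range(1, 200)]
counts = Counter(leading_digit(x) for x in data)
n = len(data)

for d in range(1, 10):
    print(f"d={d}: observed {counts[d] / n:.3f}  expected {benford_prob(d):.3f}")
```

Note that the nine expected probabilities sum to 1, and digit 1 alone accounts for about 30% of leading digits, which is why uniform-looking first digits in financial reports are suspicious.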

Hat tip to David Santiago.

Update: A link has been fixed. 


Errors in Biomedical Computing

Biomedical Computation Review has a nice summary (in which I am quoted briefly) by Kristin Sainani about the many different types of errors in computational research, including the infamous Duke incident and some other recent examples. The reproducible research policy at Biostatistics is described as an example of how the publication process might need to change to prevent errors from persisting (or occurring).


A Really Cool Paper on the "Hot Hand" in Sports

I just found this really cool paper on the phenomenon of the “hot hand” in sports. The idea behind the “hot hand” (also called the “clustering illusion”) is that success breeds success. In other words, when you are successful (you win games, you make free throws, you get hits) you will continue to be successful. In sports, analyses have frequently found that successive events are close to independent, suggesting that the “hot hand” is just an illusion. 

In the paper, the authors downloaded data on all NBA free throws for the 2005/2006 through 2009/2010 seasons. They cleaned up the data, then analyzed changes in conditional probability. Their analysis suggested that free throw successes were not independent events. They go on to explain: 

However, while statistical traces of this phenomenon are observed in the data, an open question still remains: are these non random patterns a result of “success breeds success” and “failure breeds failure” mechanisms or simply “better” and “worse” periods? Although free throws data is not adequate to answer this question in a definite way, we speculate based on it, that the latter is the dominant cause behind the appearance of the “hot hand” phenomenon in the data.

The things I like about the paper are that they explain things very simply, use a lot of real data they obtained themselves, and are very careful in their conclusions. 
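The core conditional-probability comparison can be sketched in a few lines. This is my own illustration, not the authors' code: it estimates P(make | made previous) versus P(make | missed previous) from a 0/1 outcome sequence, using a simulated shooter with no hot hand so the two estimates come out nearly equal.

```python
import random

def conditional_success_rates(shots):
    """Estimate P(make | previous make) and P(make | previous miss)
    from a 0/1 sequence of free-throw outcomes."""
    after_make = [s for prev, s in zip(shots, shots[1:]) if prev == 1]
    after_miss = [s for prev, s in zip(shots, shots[1:]) if prev == 0]
    p_after_make = sum(after_make) / len(after_make) if after_make else float("nan")
    p_after_miss = sum(after_miss) / len(after_miss) if after_miss else float("nan")
    return p_after_make, p_after_miss

# Simulated shooter with no hot hand: i.i.d. makes at 75%
random.seed(42)
shots = [1 if random.random() < 0.75 else 0 for _ in range(100_000)]
p1, p0 = conditional_success_rates(shots)
print(f"P(make | made previous) = {p1:.3f}, P(make | missed previous) = {p0:.3f}")
```

A real hot hand (or "better and worse periods", per the quote above) would show up as p1 noticeably exceeding p0 on the actual free-throw data.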


Cool papers

  1. Here is a paper where they scraped Twitter data over a year and showed how the tweets corresponded with sleep patterns and diurnal rhythms. The coolest part of this paper is that these two guys just went out and collected the data for free. I wish they had focused on more interesting questions, though; it seems like you could do a lot with data like this. 
  2. Since flu season is upon us, here is an interesting paper where the authors used data on friendship networks and class structure in a high school to study flu transmission. They show that targeted treatment isn’t as effective as random mixing models had led people to believe. 
  3. This one is a little less statistical. Over the last few years there were some pretty high profile papers that suggested that over-expressing just one protein could double or triple the lifetime of flies or worms. Obviously, that is a pretty crazy/interesting result. But in this paper some of those results are called into question. 

Defining data science

Rebranding of statistics as a field seems to be a popular topic these days and “data science” is one of the potential rebranding options. This article over at Revolutions is a nice summary of where the term comes from and what it means. This quote seems pretty accurate:

My own take is that Data Science is a valuable rebranding of computer science and applied statistics skills.


Innovation and overconfidence

I posted a while ago on how overconfidence may be a good thing. I just read this fascinating article by Neal Stephenson (via aldaily.com) about innovation starvation. The article focuses a lot on how science fiction inspires people to work on big/hard/impossible problems in science. It’s a great read for the nerds in the audience. But one quote stuck out for me:

Most people who work in corporations or academia have witnessed something like the following: A number of engineers are sitting together in a room, bouncing ideas off each other. Out of the discussion emerges a new concept that seems promising. Then some laptop-wielding person in the corner, having performed a quick Google search, announces that this “new” idea is, in fact, an old one—or at least vaguely similar—and has already been tried. Either it failed, or it succeeded. If it failed, then no manager who wants to keep his or her job will approve spending money trying to revive it. If it succeeded, then it’s patented and entry to the market is presumed to be unattainable, since the first people who thought of it will have “first-mover advantage” and will have created “barriers to entry.” The number of seemingly promising ideas that have been crushed in this way must number in the millions.

This has to be the single biggest killer of ideas for me. I come up with an idea, google it, find something that is close, and think, well, it has already been done, so I will skip it. I wonder how many of those ideas would have actually turned into something interesting if I had just had a little more overconfidence and skipped the googling? 


Some cool papers

  1. A cool article on the regulator’s dilemma. It turns out that the risk profile that best prevents one bank from failing is not the one that best prevents all banks from failing. 
  2. Persistence of web resources for computational biology. I think this one is particularly relevant for academic statisticians since a lot of academic software/packages are developed by graduate students. Once they move on, a large chunk of “institutional knowledge” is lost. 
  3. Are private schools better than public schools? A quote from the paper: “Indeed when comparing the average score in the two types of schools after adjusting for the enrollment effects, we find quite surprisingly that public schools perform better on average.”