Simply Statistics


The researcher degrees of freedom - recipe tradeoff in data analysis

An important concept that is only recently gaining the attention it deserves is researcher degrees of freedom. From Simmons et al.:

The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?

So far, researcher degrees of freedom has primarily been used with negative connotations. This probably stems from the original definition of the idea which focused on how analysts could "manufacture" statistical significance by changing the way the data was processed without disclosing those changes. Reproducible research and distributed code would of course address these issues to some extent. But it is still relatively easy to obfuscate dubious analysis by dressing it up in technical language.

One interesting point that I think sometimes gets lost in all of this is the researcher degrees of freedom - recipe tradeoff. You could think of this as the bias-variance tradeoff for big data.

At one end of the scale you can allow the data analyst full freedom, in which case researcher degrees of freedom may lead to overfitting and open you up to the manufacture of statistical results (optimistic significance or point estimates or confidence intervals). Or you can require a recipe for every data analysis, which means that it isn't possible to adapt to the unanticipated quirks (missing data mechanism, outliers, etc.) that may be present in an individual data set.

As with the bias-variance tradeoff, the optimal approach probably depends on your optimality criteria. You could imagine fitting a linear model by minimizing the mean squared error where you do not constrain the degrees of freedom in any way (that might represent an analysis where the researcher tries all possible models, including all types of data munging, choices of which observations to drop, how to handle outliers, etc.) to get the absolute best fit. Of course, this would likely be a strongly overfit/biased model. Alternatively you could penalize the flexibility allowed to the analyst. For example, you minimize a weighted criterion like:

 \sum_{i=1}^n (y_i - b_0 x_{i1} - b_1 x_{i2})^2 + Researcher \; Penalty(\vec{y},\vec{x})

Some examples of the penalties could be:

  • \lambda \times \sum_{i=1}^n 1_{\{researcher \; dropped \; y_i, x_i \; from \; analysis\}}
  • \lambda \times \#\{of \; transforms \; tried\}
  • \lambda \times \#\{Outliers \; removed \; ad-hoc\}

You could also combine all of the penalties together into an "elastic researcher net" type approach. Then as the collective penalty \lambda \rightarrow \infty you get the DSM, like you have in a clinical trial for example. As \lambda \rightarrow 0 you get fully flexible data analysis, which you might want for discovery.

Of course if you allow researchers to choose the penalty you are right back to a scenario where you have degrees of freedom in the analysis (the problem you always get with any penalized approach). On the other hand it would make it easier to disclose how those degrees of freedom were applied.
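To make the tradeoff concrete, here is a minimal Python sketch of the penalized criterion above. It is only an illustration under stated assumptions: the function names, the toy data, and the choice to add the three penalty counts together under a single \lambda (called `lam` in the code) are hypothetical and not part of any established method.

```python
def fit_two_predictor_ols(y, x1, x2):
    """Least squares fit of y ~ b0*x1 + b1*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("x1 and x2 are collinear")
    b0 = (s22 * s1y - s12 * s2y) / det
    b1 = (s11 * s2y - s12 * s1y) / det
    return b0, b1


def penalized_criterion(y, x1, x2, lam,
                        n_dropped=0, n_transforms=0, n_outliers_removed=0):
    """Residual sum of squares plus a researcher penalty (hypothetical form)."""
    b0, b1 = fit_two_predictor_ols(y, x1, x2)
    rss = sum((yi - b0 * a - b1 * b) ** 2 for yi, a, b in zip(y, x1, x2))
    # Assumption: the three example penalties are simply summed.
    penalty = lam * (n_dropped + n_transforms + n_outliers_removed)
    return rss + penalty


# Hypothetical toy data: x1 is a column of ones and the last y is an outlier.
x1 = [1, 1, 1, 1, 1]
x2 = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 20]


def compare(lam):
    """Score a 'recipe' analysis (all data) against a 'flexible' one (outlier removed)."""
    recipe = penalized_criterion(y, x1, x2, lam)
    flexible = penalized_criterion(y[:-1], x1[:-1], x2[:-1], lam,
                                   n_outliers_removed=1)
    return recipe, flexible


recipe_small, flexible_small = compare(lam=1.0)
recipe_large, flexible_large = compare(lam=100.0)
print("lam = 1:  ", recipe_small, flexible_small)
print("lam = 100:", recipe_large, flexible_large)
```

In this toy setup the flexible analysis scores better when `lam` is small and the recipe analysis scores better when `lam` is large, which is the tradeoff in miniature. The counts passed to `penalized_criterion` are self-reported by the analyst, so the sketch only works as a disclosure device, not as protection against undisclosed choices.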


Sunday data/statistics link roundup (7/28/13)

  1. An article in the Huffpo about a report claiming there is no gender bias in the hiring of physics faculty. I didn't read the paper carefully but I definitely agree with the quote from Prof. Dame Athene Donald that the comparison should be made to the number of faculty candidates on the market. I'd also be a little careful about touting my record of gender equality if only 13% of faculty in my discipline were women (via Alex N.).
  2. If you are the only person who hasn't seen the upwardly mobile by geography article yet, here it is (via Rafa). Also covered over at the great "charts n things" blog.
  3. Finally some good news on the science funding front; a Senate panel raises NSF's budget by 8% (the link worked for me earlier but I was having a little trouble today). I think that this is of course a positive development. I think that article pairs very well with this provocative piece suggesting Detroit might have done better if they had a private research school.
  4. I'm probably going to talk about this more later in the week because it gets my blood pressure up, but I thought I'd just say again that hyperbolic takedowns of the statistical methods in specific papers in the popular press lead in only one direction.

Statistics takes center stage in the Independent

Check out this really good piece over at the Independent. It talks about the rise of statisticians as rockstars, naming Hans Rosling, Nate Silver, and Chris Volinsky among others. I think that those guys are great and deserve all the attention they get.

I only hope that more of the superstars that fly under the radar of the general public but have made huge contributions to science/medicine (like Ross Prentice, Terry Speed, Scott Zeger, or others that were highlighted in the comments here) get the same kind of attention (although I suspect they might not want it).

I think one of the best parts of the article (which you should read in its entirety) is Marie Davidian's quote:

There are rock stars, and then there are rock bands: statisticians frequently work in teams


What are the 5 most influential statistics papers of 2000-2010?

A few folks here at Hopkins were just reading the comments on our post on awesome young/senior statisticians. It was cool to see the diversity of opinions and all the impressive people working in our field. We realized that another question we didn't have a great answer to was:

What are the 5 most influential statistics papers of the aughts (2000-2010)?

Now that the auggies or aughts or whatever are a few years behind us, we have the benefit of a little hindsight and can get a reasonable measure of retrospective impact.

Since this is a pretty broad question, I thought I'd lay down some generic ground rules for nominations:

  1. Papers must have been published in 2000-2010.
  2. Papers must primarily report a statistical method or analysis (the impact shouldn't be only because of the scientific result).
  3. Papers may be published in either statistical or applied journals.

For extra credit, along with your list give your definition of impact. Mine would be something like:

  • Has been cited at a high rate in scientific papers (in other words, it is used by science, not just cited by statisticians trying to beat it)
  • Has corresponding software that has been used
  • Made simpler/changed the way we did a specific type of analysis

I don't have my list yet (I know, a cop-out) but I'm working on it.


Sunday data/statistics link roundup (7/21/2013)

  1. Let's shake up the social sciences is a piece in the New York Times by Nicholas Christakis who rose to fame by claiming that obesity is contagious. Gelman responds that he thinks maybe Christakis got a little ahead of himself. I'm going to stay out of this one as it is all pretty far outside my realm - but I will say that I think quantitative social sciences is a hot area and all hot areas bring both interesting new results and hype. You just have to figure out which is which (via Rafa).
  2. This is both creepy and proves my point about the ubiquity of data. Basically police departments are storing tons of information about where we drive because, well, it is easy to do so why not?
  3. I mean, I'm not an actuary and I don't run cities, but this strikes me as a little insane. How do you not just keep track of all the pensions you owe people and add them up to know your total obligation? Why predict it when you could actually just collect the data? Maybe an economist can explain this one to me (via Andrew J.).
  4. The Times reverse scoops our clinical trials post! In all seriousness, there are a lot of nice responses there to the original article.
  5. JH Hospital back to #1. Order is restored. Read our analysis of Hopkins' ignominious drop to #2 last year (via Sherri R.).

The "failure" of MOOCs and the ecological fallacy

At first blush the news out of San Jose State that the partnership with Udacity is being temporarily suspended is bad news for MOOCs. It is particularly bad news since the main reason for the suspension is poor student performance on exams. I think in the PR game there is certainly some reason to be disappointed in the failure of this first big experiment, but as someone who loves the idea of high-throughput education, I think that this is primarily a good learning experience.

The money quote in my mind is:

Officials say the data suggests many of the students had little college experience or held jobs while attending classes. Both populations traditionally struggle with college courses.

"We had this element that we picked, student populations who were not likely to succeed," Thrun said.

I think it was a really nice idea to try to expand educational opportunities to students who traditionally don't have time for college or have struggled with college. But this represents a pretty major confounder in the analysis comparing educational outcomes between students in the online and in-person classes. There is a lot of room for the ecological fallacy to make it look like online classes are failing. They could very easily address this problem by using a subset of students randomized in the right way. There are even really good papers - like this one by Glynn - on the optimal way to do this.
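A small worked example shows how this kind of confounding can manufacture a gap. The Python sketch below uses entirely hypothetical pass rates and enrollment shares (none of these numbers come from the San Jose State data), and it assumes by construction that course format has no effect at all.

```python
# Hypothetical pass rates: identical across formats within each student group,
# so the course format has no effect by construction.
PASS_RATE = {
    ("in_person", "prepared"): 0.8,
    ("online", "prepared"): 0.8,
    ("in_person", "less_prepared"): 0.4,
    ("online", "less_prepared"): 0.4,
}

# Hypothetical enrollment mix: the online course enrolls far more students
# with little college experience.
SHARE_PREPARED = {"in_person": 0.8, "online": 0.2}


def overall_pass_rate(course_format):
    """Pass rate for a format, averaged over its own mix of students."""
    p = SHARE_PREPARED[course_format]
    return (p * PASS_RATE[(course_format, "prepared")]
            + (1 - p) * PASS_RATE[(course_format, "less_prepared")])


# Naive comparison: ignores who enrolled in which format.
naive_gap = overall_pass_rate("in_person") - overall_pass_rate("online")

# Fair comparison: like-for-like within each student group.
within_group_gaps = {
    group: PASS_RATE[("in_person", group)] - PASS_RATE[("online", group)]
    for group in ("prepared", "less_prepared")
}

print("naive gap:", round(naive_gap, 2))
print("within-group gaps:", within_group_gaps)
```

The naive comparison shows the online class trailing by a wide margin even though, within each group of students, the two formats perform identically. Randomizing students to formats would equalize the enrollment mix and remove the artificial gap.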

I think there are some potential lessons learned here from this PR problem:

  1. We need good study design in high-throughput education. I don't know how rigorous the study design was in the case of the San Jose State experiment, but if the comparison is just whoever signed up in class versus whoever signed up online we have a long way to go in evaluating these classes.
  2. We need coherent programs online. It looks like they offered a scattered collection of mostly lower level courses online (elementary statistics, college algebra, entry level math, introduction to programming and introduction to psychology). These courses are obvious ones for picking off with MOOCs since they are usually large lecture-style courses in person as well. But they are also hard classes to "get motivated for" if there isn't a clear end goal in mind. If you are learning college algebra online but don't have a clear path to using that education it might make more sense to start with the Khan Academy.
  3. We need to parse variation in educational attainment. It makes sense to evaluate in-class and online students with similar instruments. But I wonder if there is a way to estimate the components of variation: motivation, prior skill, time dedicated to the course, learning from course materials, learning from course discussion, and learning for different types of knowledge (e.g. vocational versus theoretical) using statistical models. I think that kind of modeling would offer a much clearer picture of whether these programs are "working".

Defending clinical trials

The New York Times has published some letters to the Editor in response to the piece by Clifton Leaf on clinical trials. You can also see our response here.


Name 5 statisticians, now name 5 young statisticians

I have been thinking for a while how hard it is to find statisticians to interview for the blog. When I started the interview series, it was targeted at interviewing statisticians at the early stages of their careers. It is relatively easy, if you work in academic statistics, to name 5 famous statisticians. If you asked me to do that, I'd probably say something like: Efron, Tibshirani, Irizarry, Prentice, and Storey. I could also name 5 famous statisticians in industry with relative ease: Mason, Volinsky, Heineike, Patil, Conway.

Most of that is because of where I went to school (Storey/Prentice), the area I work in (Tibshirani/Irizarry/Storey), my advisor (Storey), or the bootstrap (Efron) and the people I see on Twitter (all the industry folks). I could, of course, name a lot of other famous statisticians, but almost all of them would be biased by my education or the books I read.

But almost surely I will miss people who work outside my area or didn't go to school where I did. This is particularly true in applied statistics, where people might not even spend most of their time in statistics departments. It is doubly true of people who are young and just getting started, as I haven't had a chance to hear about them.

So if you have a few minutes, name five statisticians you admire in the comments. Then name five junior statisticians you think will be awesome. They don't have to be famous (in fact it is better if they are good but not famous so I can learn something). Plus it will be interesting to see the responses.


Yes, Clinical Trials Work

This Saturday the New York Times published an opinion piece wondering "do clinical trials work?". The answer, of course, is: absolutely. For those that don't know the history, randomized controlled trials (RCTs) are one of the reasons why life spans skyrocketed in the 20th century. Before RCTs, wishful thinking and arrogance led numerous well-meaning scientists and doctors to incorrectly believe their treatments worked. RCTs are so successful that they have been adopted with much fanfare in far-flung arenas like poverty alleviation (see, e.g., this discussion by Esther Duflo), where wishful thinking also led many to incorrectly believe their interventions helped.

The first chapter of this book contains several examples and this is a really nice introduction to clinical studies. A very common problem was that the developers of the treatment would create treatment groups that were healthier to start with. Randomization takes care of this. To understand the importance of controls I quote the opinion piece to demonstrate a common mistake we humans make: "Some patients did do better on the drug, and indeed, doctors and patients insist that some who take Avastin significantly beat the average." The problem is that the fact that Avastin did not do better on average means that the exact same statement can be made about the control group! It also means that some patients did worse than average. The use of a control points to the possibility that Avastin has nothing to do with the observed improvements.
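The point about the control group is easy to check with a quick simulation. The Python sketch below is only an illustration under assumptions I am making up: outcomes are drawn from the same standard normal distribution in both arms (so the drug has no effect by construction), and "significantly beat the average" is arbitrarily taken to mean more than one unit above the overall mean.

```python
import random
import statistics


def simulate_arm(n, rng):
    """Draw n hypothetical patient outcomes from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def share_beating(outcomes, threshold):
    """Fraction of patients whose outcome exceeds the threshold."""
    return sum(o > threshold for o in outcomes) / len(outcomes)


rng = random.Random(2013)  # fixed seed so the sketch is repeatable

# Both arms come from the same distribution: no treatment effect by construction.
control = simulate_arm(500, rng)
treated = simulate_arm(500, rng)

# Arbitrary cutoff for "significantly beat the average" (an assumption).
threshold = statistics.mean(control + treated) + 1.0

share_control = share_beating(control, threshold)
share_treated = share_beating(treated, threshold)

print("share beating the average, control:", share_control)
print("share beating the average, treated:", share_treated)
```

Both arms contain a similar share of patients who comfortably beat the average, even though the simulated drug does nothing. Stories about individual patients who did well on the drug therefore cannot distinguish a working treatment from ordinary patient-to-patient variation; only the comparison with the control arm can.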

The opinion piece is very critical of current clinical trials work and complains about the "dismal success rate for drug development". But what is the author comparing to? Dismal compared to what? We are talking about developing complicated compounds that must be both safe and efficacious in often critically ill populations. It would be surprising if our success rate was incredibly high. Or is the author comparing the current state of affairs to the pre-clinical-trials days when procedures such as bloodletting were popular?

A better question might be, "how can we make clinical trials more efficient?" There is definitely a lively and ongoing research area devoted to answering this question. In some cases trials can definitely be made better by adapting to new developments such as biomarkers and the advent of personalized medicine. This is why there are dozens of statisticians working in this area.

The article says that

"[p]art of the novelty lies in a statistical technique called Bayesian analysis that lets doctors quickly glean information about which therapies are working best. "

As Jeff pointed out, this is a pretty major oversimplification of all of the hard work that it takes to maintain scientific integrity and patient safety when studying new compounds. The fact that the analysis is Bayesian is ancillary to other issues like adaptive trials (as Julian pointed out in the comments), dynamic treatment regimes, or even more established ideas like group sequential trials. The basic principle underlying these ideas is the same: can we run a trial more efficiently while achieving reasonable estimates of effect sizes and uncertainties? You could imagine doing this by focusing on treatments that seem to work well for subpopulations with specific biomarkers, or by stopping trials early if drugs are strongly (in)effective, or by picking optimal paths through multiple treatments. Whether the statistical methodology is Bayesian or Frequentist has little to do with the ways that clinical trials are adapting to be more efficient.

This is a wide open area and deserves a much more informed conversation. I'm providing here a list of resources that would be a good place to start:

  1. An introduction to clinical trials
  2. Michael Rosenblum's adaptive trial design page. 
  3. - registry of clinical trials
  4. Test, learn adapt - a white paper on using clinical trials for public policy
  5. Alltrials - an initiative to make all clinical trial data public
  6. ASCO clinical trials resources - on clinical trials ethics and standards
  7. Don Berry's paper on adaptive design.
  8. Fundamentals of clinical trials - a good general book (via David H.)
  9. Clinical trials, a methodological perspective - a more regulatory take (via David H.)

This post is by Rafa and Jeff. 


Sunday data/statistics link roundup (7/14/2013)

  1. Question: Do clinical trials work? Answer: Yes. Clinical trials are one of the defining success stories in the process of scientific inquiry. Do they work as fast/efficiently as a pharma company with potentially billions on the line would like? That is definitely much more up for debate. Most of the article is a good summary of how drug development works - although I think the statistics reporting is a little prone to hyperbole. I also think this sentence is misleading, wrong, and way over the top: "Part of the novelty lies in a statistical technique called Bayesian analysis that lets doctors quickly glean information about which therapies are working best. There’s no certainty in the assessment, but doctors get to learn during the process and then incorporate that knowledge into the ongoing trial."
  2. The fun begins in the grim world of patenting genes. Two companies are being sued by Myriad even though they just lost the case on their main patent. Myriad is claiming violation of one of their 500 or so other patents. Can someone with legal expertise give me an idea - is Myriad now a patent troll?
  3. R spells for data wizards from Thomas Levine. I also like the pink on grey look.
  4. Larry W. takes on non-informative priors. Worth the read, particularly the discussion of how non-informative priors can be informative in different parameterizations. The problem Larry points out here is one I think that is critical - in big data applications where the goal is often discovery, we rarely have enough prior information to make reasonable informative priors either. Not to say some regularization can't be helpful, but I think there is danger in putting an even weakly informative prior on a poorly understood, high dimensional space and then claiming victory when we discover something.
  5. Statistics and actuarial science are jumping into a politically fraught situation by raising the insurance on schools that allow teachers to carry guns. Fiscally, this is clearly going to be the right move. I wonder what the political fallout will be for the insurance company and for the governments that passed these laws (via Rafa via Marginal Revolution).
  6. Timmy!! Tim Lincecum throws his first no hitter. I know this isn't strictly data/stats but he went to UW like me!