Treading a New Path for Reproducible Research: Part 1


Discussions about reproducibility in scientific research have been on the rise lately, including on this blog. There are many underlying trends that have produced this increased interest in reproducibility: larger and larger studies being harder to replicate independently, cheaper data collection technologies/methods producing larger datasets, cheaper computing power allowing for more sophisticated analyses (even for small datasets), and the rise of general computational science (for every “X” we now have “Computational X”).

For those that haven’t been following, here’s a brief review of what I mean when I say “reproducibility”. For the most part in science, we focus on what I and some others call “replication”. The purpose of replication is to address the validity of a scientific claim. If I conduct a study and conclude that “X is related to Y”, then others may be encouraged to replicate my study--with independent investigators, data collection, instruments, methods, and analysis--in order to determine whether my claim of “X is related to Y” is in fact true. If many scientists replicate the study and come to the same conclusion, then there’s evidence in favor of the claim’s validity. If other scientists cannot replicate the same finding, then one might conclude that the original claim was false. In either case, this is how science has always worked and how it will continue to work.

Reproducibility, on the other hand, focuses on the validity of the data analysis. In the past, when datasets were small and the analyses were fairly straightforward, the idea of being able to reproduce a data analysis was perhaps not that interesting. But now, with computational science, where data analyses can be extraordinarily complicated, there’s great interest in whether certain data analyses can in fact be reproduced. By this I mean: is it possible to take someone’s dataset and come to the same numerical/graphical/whatever output that they came to? While this seems theoretically trivial, in practice it’s very complicated because a given data analysis, which typically involves a long pipeline of analytic operations, may be difficult to keep track of without proper organization, training, or software.

What Problem Does Reproducibility Solve?

In my opinion, reproducibility cannot really address the validity of a scientific claim as well as replication. Of course, if a given analysis is not reproducible, that may call into question any conclusions drawn from the analysis. However, if an analysis is reproducible, that says practically nothing about the validity of the conclusion or of the analysis itself.

In fact, there are numerous examples in the literature of analyses that were reproducible but just wrong. Perhaps the most nefarious recent example is the Potti scandal at Duke. Given the amount of effort (somewhere close to 2000 hours) Keith Baggerly and his colleagues had to put into figuring out what Potti and others did, I think it’s reasonable to say that their work was not reproducible. But in the end, Baggerly was able to reproduce some of the results--this was how he was able to figure out that the analyses were incorrect. If the Potti analysis had not been reproducible from the start, it would have been impossible for Baggerly to come up with the laundry list of errors that Potti and colleagues made.

The Reinhart-Rogoff kerfuffle is another example of an analysis that ultimately was reproducible but nevertheless questionable. While Herndon did have to do a little reverse engineering to figure out the original analysis, it was nowhere near the years-long effort of Baggerly and colleagues. However, it was Reinhart-Rogoff’s unconventional weighting scheme (fully reproducible, mind you) that drew all of the attention and strongly influenced the analysis.

I think the key question we want to answer when seeing the results of any data analysis is “Can I trust this analysis?” It’s not possible to go into every data analysis and check everything, even if all the data and code were available. In most cases, we want to have a sense that the analysis was done appropriately (if not optimally). I would argue that requiring that analyses be reproducible does not address this key question.

With reproducibility you get a number of important benefits: transparency, data and code for others to analyze, and an increased rate of transfer of knowledge. These are all very important things. Data sharing in particular may be important independent of the need to reproduce a study if others want to aggregate datasets or do meta-analyses. But reproducibility does not guarantee validity or correctness of the analysis.

Prevention vs. Medication

One key problem with the notion of reproducibility is the point in the research process at which we can apply it as an intervention. Reproducibility plays a role only in the most downstream aspect of the research process--post-publication. Only after a paper is published (and after any questionable analyses have been conducted) can we check to see if an analysis was reproducible or conducted in error.


At this point it may be difficult to correct any mistakes if they are identified. Grad students have graduated, postdocs have left, people have moved on. In the Potti case, letters to the journal editors were ignored. While it may be better to check the research process at the end rather than to never check it, intervening at the post-publication phase is arguably the most expensive place to do it. At this phase of the research process, you are merely “medicating” the problem, to draw an analogy with chronic diseases. But fundamental data analytic damage may have already been done.

This medication aspect of reproducibility reminds me of a famous quotation from R. A. Fisher:

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

Reproducibility allows for the statistician to conduct the post mortem of a data analysis. But wouldn’t it have been better to have prevented the analysis from dying in the first place?

Moving Upstream

There has already been much discussion of changing the role of reproducibility in the publication/dissemination process. What if a paper had to be deemed reproducible before it was published? The question here is who will reproduce the analysis? We can't trust the authors to do it so we have to get an independent third party. What about peer reviewers? I would argue that this is a pretty big burden to place on a peer reviewer who is already working for free. How about one of the Editors? Well, at the journal Biostatistics, that’s exactly what we do. However, our policy is voluntary and only plays a role after a paper has been accepted through the usual peer review process. At any rate, from a business perspective, most journal owners will be reluctant to implement any policy that might reduce the number of submissions to the journal.

What Then?

To summarize, I believe reproducibility of computational research is very important, primarily to increase transparency and to improve knowledge sharing. However, I don’t think reproducibility in and of itself addresses the fundamental question of “Can I trust this analysis?”. Furthermore, reproducibility plays a role at the most downstream part of the research process (post-publication) where it is costliest to fix any mistakes that may be discovered. Ultimately, we need to think beyond reproducibility and to consider developing ways to ensure the quality of data analysis from the start.

How can we address the key problem concerning the validity of a data analysis? I’ll talk about what I think we should do in Part 2 of this post.

  • Stuart Buck

    In organic chemistry, the journal Organic Syntheses has an editor re-run each procedure before publication to make sure that it's replicable. See http://www.orgsyn.org/

  • kent37

    I think you are too hard on reproducibility. Though it is no guarantee of correctness of the analysis, it does make it possible to verify correctness. Your two examples show this - if the analyses had not been (ultimately) reproducible, Baggerly and Herndon would not have been able to show that they were incorrect. It's kind of like open-source software. If the source is available, you can verify it. If the source is closed, you have to take the developer's word for it.

  • Gray

    I don't think that reproducibility can stop bad faith errors, but as a procedure it can help researchers find their own errors. It's easier to be sloppy when I don't expect anyone else to read my code.

  • Stephen Henderson

    If you are working in a data analytical science then ideally - reproducibility should be integrated into your workflow. The parameters, normalisations, weights etc, should all be contained within your Rscripts, make config files, or similar which you use to create the Figs and Tables. So - ideally - it should come throughout not just at the end.

    In my own field I often look at the methods section uncomprehendingly and think to myself... I know all these open-source code/tools if you just put the script in supplementary then I could follow this... because algorithms shouldn't be expressed in English (or not only).

    However...(there's always a however) most scientists aren't working in such an ideal setting. Usually they are using other people's software that they may not fully understand, that may be executable code only, that may have hidden parameters and so on. At this point you just have to shrug and let it pass.
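The kind of self-contained analysis script described in this comment can be sketched roughly as follows. All names, parameters, and the normalisation choice are hypothetical stand-ins; the point is only that every analytic choice lives in the script itself, so re-running one file regenerates the outputs:

```python
# Sketch of a reproducible analysis script: every parameter and
# normalisation choice is recorded in the script, not in someone's head.
# All names and values here are hypothetical.
import statistics

# Analysis parameters, kept in one visible place.
PARAMS = {
    "normalisation": "z-score",
    "weight": 0.5,
}

def normalise(values, method):
    """Apply the normalisation named in PARAMS so the choice is on record."""
    if method == "z-score":
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return [(v - mean) / sd for v in values]
    raise ValueError(f"unknown normalisation: {method}")

def run_analysis(values, params=PARAMS):
    """Single entry point: the same inputs and params give the same outputs."""
    normalised = normalise(values, params["normalisation"])
    return [v * params["weight"] for v in normalised]
```

Anyone re-running the script sees exactly which weighting and normalisation produced the figures and tables, which is the transparency the comment is asking for.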

    • John Minter

      I work in an analytical lab (microscopy/image analysis) and must use several closed source vendor packages as well as my preferred open source packages. I'd like to comment on your last paragraph. One simply makes the workflow as documented and reproducible as possible. This includes scripting (under version control, of course) where possible and documenting the methods and input choices (saving files where possible) and documenting the output. This requires conviction and self-discipline from the analyst - because most others are more concerned with volume, turn-around time, and cost per analysis.

      I prefer keeping two folder trees for each project: one is the project folder which is under version control (git) and contains everything required to reproduce the final report, including a script to do so with a single mouse click. This folder tree gets compressed into a 7zip archive for storage.

      I keep a second folder tree which contains all the binary data to reproduce the results. At the end of the project this is compressed to a 7zip archive which becomes the project compendium. These two archives are stored on network shares and then DVD or off-line disks. This makes it easy to jump back into a project when a client comes back months later and wants to extend the work with new samples.
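The archiving step of the two-tree workflow described in this comment might be sketched as below. The paths are hypothetical, and the standard library's zip format stands in for the 7zip archives mentioned above:

```python
# Sketch of archiving a two-tree project layout: one version-controlled
# project tree and one binary-data tree, each compressed into its own
# archive (the project "compendium"). Paths are hypothetical.
import pathlib
import shutil

def archive_project(project_dir, data_dir, out_dir):
    """Compress the project tree and the data tree into two archives."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    for tree in (project_dir, data_dir):
        tree = pathlib.Path(tree)
        # One archive per tree, named after the folder it captures.
        archives.append(
            shutil.make_archive(str(out / tree.name), "zip", root_dir=str(tree))
        )
    return archives
```

The resulting archives can then be copied to network shares or offline media, so the project can be unpacked and re-run months later without the original analyst.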

  • Diego Pereira

    I don't see reproducibility the way you see it, that is, as an answer to the question "Can I trust this analysis?". Reproducibility is more ancillary, more fundamental to science than that. Reproducibility has to do with the way scientific knowledge is built, as opposed to pseudoscientific claims. Reproducibility is a necessary but not sufficient condition for calling a discovery scientific. Note that this has nothing to do with the ideas derived from that discovery.

    The weird bands in Andy Fire's gels were initially thought to be errors, probably resulting from the lack of experience of his postdoc. I don't remember whether the experiments were well performed or not (I just remember the ugly original gels), but they could reproduce those results again and again, and boom... he got an idea with sufficient explanatory power, further confirmed by other groups, that explained their observations.

    Reproducibility is attached to all scientific labor, not just to the final, published results of a study. That's the reason you annotate your code or keep a lab notebook. If your experiment went wrong you can go back and double-check what happened, perform some tests, and fix it. If your experiments were not reproducible that wouldn't be possible.

    As new techniques are developed, old techniques are reviewed and conclusions re-evaluated. If a study is well documented, the data it generated can be further analyzed with newer techniques and its conclusions challenged. That's expected and in many cases desirable. An important part of science is the process of building and rebuilding concepts, and reproducibility is fundamental for that.

    • Keith O’Rourke

      Nicely put, science is a habit of being reproducible.
      Also not at all new - when I was hired to do free trade policy analysis in 1985 by http://en.wikipedia.org/wiki/James_Fleck , I was told that my first priority was to come up with analysis programs that were fully documented and could be re-run without any direct contact with me. When the new guy took over next summer, they were forbidden to talk to me until they had redone all the calculations. One of the most important lessons in my career.

  • PolSci Replication

    In political science the distinction between reproducing work and replicating work is not as strict as described above (http://tinyurl.com/cxwt7eh). We often speak of replication when we reproduce and check results, and then maybe add some data and variables to refine the analysis.

    In fact, often re-analyzing/reproducing a published research article is already so difficult (http://tinyurl.com/kubj6b7), that the next step of collecting a completely new data set to come to the same correlations is obsolete. Why try to validate results in a new analysis if the old results could not even be reproduced?

    This is not to say that replication in the sense of collecting new data is not more illuminating than mere re-analysis - it's just an even more difficult aim to achieve and always a second step in my opinion.

