Replication, psychology, and big science


Reproducibility has been a hot topic among computational scientists for the last several years. A study is reproducible if there is a specific set of computational functions/analyses (usually specified in terms of code) that exactly reproduces all of the numbers in a published paper from the raw data. It is now widely recognized that being able to reproduce data analyses is a critical component of the scientific process. This point has been driven home particularly for personalized medicine applications, where irreproducible results can delay the evaluation of new procedures that affect patients’ health.

But just because a study is reproducible does not mean that it is replicable. Replicability is a stronger requirement than reproducibility. A study is replicable only if you perform the exact same experiment (at least) twice, collect data in the same way both times, perform the same data analysis, and arrive at the same conclusions. The difference from reproducibility is that to achieve replicability, you have to perform the experiment and collect the data again. This of course introduces all sorts of new potential sources of error into your experiment (new scientists, new materials, a new lab, new thinking, different settings on the machines, etc.).

Replicability has been getting a lot of attention recently in psychology because of some high-profile studies that did not replicate. First, there was a highly cited experiment that failed to replicate, leading to a showdown between the author of the original experiment and the replicators. Now there is a psychology project that allows researchers to post the results of replications of experiments, whether they succeeded or failed. Finally, the Reproducibility Project, probably better termed the Replicability Project, seeks to replicate the results of every experiment published in 2008 in the journals _Psychological Science_, the _Journal of Personality and Social Psychology_, and the _Journal of Experimental Psychology: Learning, Memory, and Cognition_.

Replicability raises important issues for “big science” projects, ranging from genomics (the Thousand Genomes Project) to physics (the Large Hadron Collider). These experiments are too big and costly to actually replicate. So how do we know that the results of these experiments aren’t just errors that would not show up again upon replication (if we could do it)? Maybe smaller-scale replications of sub-projects could be used to help convince us of discoveries in these big projects?

In the meantime, I love the idea that replication is getting the credit it deserves (at least in psychology). The incentives in science often credit only the first person to have an idea, not the long tail of folks who replicate the results. For example, replications of experiments are often not considered interesting enough to publish. Maybe these new projects will start to change some of these perverse academic incentives.