As you know, we are [big fans of reproducible research](http://simplystatistics.org/?s=reproducible+research) here at Simply Statistics. The saga around the lack of reproducibility in the analyses performed by Anil Potti, and the subsequent fallout, drove the importance of this topic home.
So when I started teaching a course on Data Analysis for Coursera, of course I wanted to focus on reproducible research. The students in the class will be performing two data analyses during the course. They will be peer-evaluated using a rubric specifically designed for evaluating data analyses at scale. One component of the rubric is whether the code people submit with their assignments reproduces all the numbers in the assignment.
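To make that component concrete, here is a minimal sketch of what an automated check might look like. This is not the actual grading code; the file names, the tolerance, and the assumption that the submission is a runnable Python script that writes its results to a CSV are all hypothetical.

```python
import csv
import subprocess
import sys


def load_numbers(path):
    """Read a CSV with 'name' and 'value' columns into a dict of floats."""
    with open(path, newline="") as f:
        return {row["name"]: float(row["value"]) for row in csv.DictReader(f)}


def check_submission(script="analysis.py", reported="reported_values.csv",
                     produced="produced_values.csv", tol=1e-6):
    """Re-run the submitted script and compare its output to the reported numbers."""
    subprocess.run([sys.executable, script], check=True)
    reported_vals = load_numbers(reported)
    produced_vals = load_numbers(produced)
    # Flag any reported number that is missing or differs beyond the tolerance.
    return {
        name: (val, produced_vals.get(name))
        for name, val in reported_vals.items()
        if name not in produced_vals or abs(produced_vals[name] - val) > tol
    }


if __name__ == "__main__":
    mismatches = check_submission()
    print("Reproduced all numbers." if not mismatches else f"Mismatches: {mismatches}")
```

Even a simple check like this assumes every submission runs on the grader's machine with the grader's software versions, which is exactly where things fall apart at scale.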
Unfortunately, I just had to cancel the reproducibility component of the first data analysis assignment. Here are the things I realized while trying to set up the process; they may seem obvious, but they weren't to me when I was designing the rubric:
Overall, I think the solution is to run some kind of EC2 instance with a standardized set of software. That is the only thing I can think of that would scale to a class this size. On the other hand, that would be expensive, a pain to maintain, and would require everyone to run their code on EC2.
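For what it's worth, spinning up such an instance programmatically is the easy part; the cost, maintenance, and getting every student onto it are the hard parts. A minimal sketch using boto3, assuming a pre-built machine image that already carries the standardized software stack (the AMI ID, key pair, region, and instance type below are placeholders):

```python
import boto3

# Placeholder values: a pre-built AMI with the standardized course software,
# plus a key pair and instance type chosen for the class.
AMI_ID = "ami-0123456789abcdef0"
KEY_NAME = "course-key"
INSTANCE_TYPE = "t2.micro"

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance from the standardized image.
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType=INSTANCE_TYPE,
    KeyName=KEY_NAME,
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched standardized grading instance: {instance_id}")
```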
Regardless, it is a super interesting question. How do you do reproducibility at scale?