What can we learn from data analysis failures?

Roger Peng
2018-04-23

Back in February, I gave a talk at the Walter and Eliza Hall Institute of Medical Research in Melbourne titled “Lessons in Disaster: What Can We Learn from Data Analysis Failures?” This talk was quite different from the talks I usually give on computing or environmental health, and I’m guessing it probably showed. It was nevertheless a fun talk and I got to talk about space-related stuff. If you want to hear some discussion of the development of this talk, you can listen to Episode 53 of Not So Standard Deviations.

It’s difficult to have a discussion about data analysis without some mention of John Tukey. In particular, his paper “The Future of Data Analysis”, published in the Annals of Mathematical Statistics in 1962, weighs heavily. In fact, it weighs so heavily that the paper required its own table of contents! One paragraph near the end of Tukey’s massive paper has always struck me, in that his description of how we should teach data analysis is relatively simple, but we seem unable to implement it.

We would teach [data analysis] like biochemistry, with emphasis on what we have learned…with relegation of all question of detailed methods to the “laboratory work”. All study of detailed proofs…or comparisons of ways of presentation would belong in “the laboratory” rather than “in class”.

My interest is in taking this statement rather broadly and asking how often we actually do this when it comes to data analysis.

Another statement that has fascinated me comes from Daryl Pregibon, who wrote in a 1991 National Research Council report titled The Future of Statistical Software,

Throughout American or even global industry, there is much advocacy of statistical process control and of understanding processes. Statisticians have a process they espouse but do not know anything about. It is the process of putting together many tiny pieces, the process called data analysis, and is not really understood.

The “putting together many tiny pieces” aspect of data analysis is really key. My guess is that Pregibon was referring to putting together many statistical tools and making all the little decisions about data that one always makes. However, often those little “pieces” are in fact people, and getting all of those people to fit together can be an equally challenging and equally critical aspect of success.

Learning about how data analyses succeed or fail (but more importantly, fail) is extremely challenging without actually going through the process yourself. I don’t think I ever learned about it except through first-hand experience, which took place over the course of years. There are a few reasons for this that I have observed over time.

I want to use one case study to think about what kinds of generalizable knowledge we can obtain from data analysis failures. The one I describe below is special because it had serious implications and large parts of it played out in public. While we likely will never know all of the details, we know enough to have a meaningful discussion.

The Duke Saga

The “Duke Saga” has been a tough nut for me to crack for many years now. While it’s fascinating because of the sheer number of problems that occurred, I’ve always struggled to identify exactly what went wrong; in other words, given what I know now, what intervention would I have taken to prevent a similar episode from happening in the future? I’ve long felt that the lessons people take away from this saga are not the correct ones, in the sense that applying them to future work would not prevent a similar failure.

First some background. Note that this is a highly abbreviated timeline:

- 2006: Anil Potti and colleagues at Duke publish work in Nature Medicine claiming that gene expression signatures derived from cell lines can predict which chemotherapies a patient’s tumor will respond to.
- 2007: Keith Baggerly and Kevin Coombes at MD Anderson try to reproduce the results and uncover serious problems, including mislabeled samples and off-by-one indexing errors. Duke begins clinical trials that assign patients to chemotherapy based on the signatures.
- 2009: Baggerly and Coombes publish their forensic reanalysis in the Annals of Applied Statistics. Duke suspends the trials and conducts an internal review, after which the trials are restarted.
- 2010: The Cancer Letter reports that Potti falsely claimed to have been a Rhodes Scholar. The trials are terminated and Potti resigns from Duke.
- 2011–2015: Multiple papers are retracted, the Institute of Medicine issues a report on omics-based tests, and lawsuits brought by patients and their families are settled.

I’ve obviously left out a lot of detail; if you want to hear more about this, you can hear it from Keith Baggerly himself in this nice lecture. However, I just wanted to give a sketch of what happened over what is now a more than 10-year period.

In my opinion, the details of the Duke Saga were salacious, but it was difficult to draw any conclusions about what actually went wrong and what approach should be taken to prevent something like this from happening again. Most people were just speculating about what could have happened, and the people who really would know the details weren’t talking very much. Here’s how I would summarize the basic points that most people seemed to take away from the publicly available information about the saga:

- Genomic data analyses are complicated and “hard to do”, so some mistakes are inevitable.
- The problems were largely honest data management errors (swapped labels, off-by-one indexing mistakes) compounded by a lack of statistical training in the lab.
- If the analyses had been reproducible, with data and code openly available, the errors would have been caught much sooner or prevented entirely.

In January 2015, The Cancer Letter published a memo written by Bradford Perez, who in 2008 was a medical student trainee in the Potti lab. He saw what was going on in the lab and recognized its shoddiness. Problems that Baggerly and Coombes had to essentially reverse engineer from the outside, Perez saw first hand and immediately recognized as serious. In fact, in 2008 he wrote a memo to the leadership of his institution describing some of those problems:

“Fifty-nine cell line samples with mRNA expression data…were split in half to designate sensitive and resistant phenotypes. Then in developing the model, only those samples which fit the model best in cross validation were included. Over half of the original samples were removed…. This was an incredibly biased approach which does little more than give the appearance of a successful cross validation.” [emphasis added]

He further wrote,

“At this point, I believe that the situation is serious enough that all further analysis should be stopped to evaluate what is known about each predictor and it should be reconsidered which are appropriate to continue using and under what circumstances…. I would argue that at this point nothing…should be taken for granted. All claims of predictor validations should be independently and blindly performed.”

The memo was ignored by the leadership. Nothing was stopped and nothing was changed at the time. Perez eventually took his name off a series of papers and left the lab.
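
To see concretely why the selection procedure Perez describes “does little more than give the appearance of a successful cross validation,” here is a minimal simulation sketch in Python. It is emphatically not the Duke analysis or the Duke data: the data below are pure noise, and the sample size, classifier, and feature count are hypothetical stand-ins. The point is only the mechanism: if you repeatedly discard the samples that cross-validation misclassifies, the apparent cross-validated accuracy on the retained subset climbs well above chance even though there is no signal to find.

```python
# Toy illustration of sample selection bias in cross-validation
# (hypothetical data and numbers, not the Duke analysis).
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)

n_samples, n_features = 60, 50                 # stand-ins for ~59 cell lines with expression features
X = rng.normal(size=(n_samples, n_features))   # random "expression" data: no real signal
y = np.repeat([0, 1], n_samples // 2)          # arbitrary "resistant" / "sensitive" labels

model = NearestCentroid()

# Honest cross-validation using all samples: accuracy hovers near chance (~0.5).
pred = cross_val_predict(model, X, y, cv=5)
print(f"All {n_samples} samples: CV accuracy = {(pred == y).mean():.2f}")

# Biased procedure: discard the samples that cross-validation misclassified,
# re-run cross-validation on the retained subset, and repeat.
X_kept, y_kept = X, y
for round_number in range(1, 6):
    keep = pred == y_kept
    X_kept, y_kept = X_kept[keep], y_kept[keep]
    if np.bincount(y_kept, minlength=2).min() < 3:
        break  # too few samples left in one of the groups to cross-validate
    pred = cross_val_predict(model, X_kept, y_kept, cv=3)
    print(f"Round {round_number}: kept {len(y_kept)} samples, "
          f"CV accuracy = {(pred == y_kept).mean():.2f}")
```

The improving numbers in this sketch are driven entirely by the sample selection; the data contain no signal at all. That is exactly why a “validation” produced this way can look impressive while saying nothing about whether a predictor would work on new patients.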

Lessons Learned

This memo is critical in my opinion because it fundamentally changes the narrative about what went wrong in this entire saga. Yes, genomic analyses are “hard to do”, but clearly there was expertise in the lab to recognize that difficulty and to recognize when statistical methods were being incorrectly applied. The problem was not a lack of training, nor was it simply the result of a few honest data management mistakes here and there. The problem was a breakdown in communication and a total lack of trust between investigators and members of the data analytic team. Perez clearly felt uncomfortable raising these issues in the lab and wrote the memo knowing that he had “much to lose”. He thought the problem was that statistical methods were being misapplied, but the real problem was that he did not feel comfortable discussing it openly. A breakdown in the relationship between an analyst and an investigator is a serious data analytic problem.

It’s possible for me to imagine an alternate scenario where a data analyst like Perez sees a problem with the way models are being developed or applied, mentions this to the principal investigator and has a detailed discussion, perhaps seeks outside expertise (e.g. from a statistician), and then modifies the procedure to fix the problem. It’s easy for me to imagine this because it happens pretty much every day. No data analysis is perfect from start to finish. Changes and course corrections are constantly made along the way. When I analyze data and run into problems that can be traced to data collection, I will raise this with the PI. When I give results to other investigators, sometimes the results don’t seem right to them and they come to me and seek clarification. If it’s a mistake on my part, I’ll fix it and send them updated results.

When the relationships between an analyst and various members of the investigator team are strong and there is substantial trust between them, honest mistakes are just minor bumps in the road that can be uncovered, discussed, and fixed. When there is a breakdown in those relationships, the exact same mistakes are covered up, denied, and buried. A breakdown in the relationships between analysts and other investigators on the team generally cannot be fixed with a better statistical method, or a reproducible workflow, or open source software. Recognizing that this is the problem is difficult because often there is no easy solution.

I think the data analytic lesson learned from the Duke Saga is that data analysts need to be allowed to say “stop”. But also, the ability to do so depends critically on the relationships between the analyst and members of the investigator team. If an analyst feels uncomfortable raising analytic issues with other members, then arguably all analyses done by the team are at risk. No amount of statistical expertise or tooling can fix this fundamental human problem.