The difference between data hype and data hope

Jeff Leek

I was reading one of my favorite stats blogs, StatsChat, where Thomas points to this article in The Atlantic and highlights this quote:

Dassault Systèmes is focusing on that level of granularity now, trying to simulate propagation of cholesterol in human cells and building oncological cell models. “It’s data science and modeling,” Charlès told me. “Coupling the two creates a new environment in medicine.”

I think that is a perfect example of data hype. This is a cool idea and, if it worked, it would be completely revolutionary. But the reality is that we are not even close. In very simple model organisms, whole-cell modeling can predict very high-level phenotypes some of the time. We aren’t anywhere near the resolution we’d need to model the behavior of human cells, let alone the complex genetic, epigenetic, genomic, and environmental components that likely contribute to complex diseases. It is awesome that people are thinking about the future, and the fastest way to the scientific future is usually through science fiction, but this way overstates the power of current or even currently achievable data science.

So does that mean data science for improving clinical trials right now should be abandoned?


No. There is a ton of currently applicable, real-world data science being done in sequential analysis, adaptive clinical trials, and dynamic treatment regimes. These are important contributions that are affecting clinical trials _right now_, and advances in them can reduce costs, reduce patient harm, and speed the implementation of clinical trials. I think that is the hope of data science: using statistics and data to make steady, realizable improvements in the way we treat patients.