The fact that data analysts base their conclusions on data does not mean they ignore experts

Rafael Irizarry
2014-03-24

Paul Krugman recently joined the new FiveThirtyEight hating bandwagon. I am not crazy about the new website either (although I’ll wait more than one weeks before judging) but in a recent post Krugman creates a false dichotomy that is important to correct. Krugmanam states that “[w]hat [Nate Silver] seems to have concluded is that there are no experts anywhere, that a smart data analyst can and should ignore all that.” I don’t think that is what Nate Silver, nor any other smart data scientist or applied statistician has concluded. Note that to build his election prediction model, Nate had to understand how the electoral college works, how polls work, how different polls are different, the relationship between primaries and presidential election, among many other details specific to polls and US presidential elections. He learned all of this by reading and talking to experts. Same is true for PECOTA where data analysts who know quite a bit about baseball collect data to create meaningful and predictive summary statistics. As Jeff said before, the key word in “Data Science” is not Data, it is Science.

The one example Krugman points too as ignoring experts appears to be written by someone who, according to the article that Krugman links to, was biased by his own opinions, not by data analysis that ignored experts. However, in Nate’s analysis of polls and baseball data it is hard to argue that he let his bias affect his analysis. Furthermore, it is important to point out that he did not simply stick data into a black box prediction algorithm. Instead he did what most of us applied statisticians do: we build empirically inspired models but guided by expert knowledge.

ps - Krugman links to a [Paul Krugman recently joined the new FiveThirtyEight hating bandwagon. I am not crazy about the new website either (although I’ll wait more than one weeks before judging) but in a recent post Krugman creates a false dichotomy that is important to correct. Krugmanam states that “[w]hat [Nate Silver] seems to have concluded is that there are no experts anywhere, that a smart data analyst can and should ignore all that.” I don’t think that is what Nate Silver, nor any other smart data scientist or applied statistician has concluded. Note that to build his election prediction model, Nate had to understand how the electoral college works, how polls work, how different polls are different, the relationship between primaries and presidential election, among many other details specific to polls and US presidential elections. He learned all of this by reading and talking to experts. Same is true for PECOTA where data analysts who know quite a bit about baseball collect data to create meaningful and predictive summary statistics. As Jeff said before, the key word in “Data Science” is not Data, it is Science.

The one example Krugman points too as ignoring experts appears to be written by someone who, according to the article that Krugman links to, was biased by his own opinions, not by data analysis that ignored experts. However, in Nate’s analysis of polls and baseball data it is hard to argue that he let his bias affect his analysis. Furthermore, it is important to point out that he did not simply stick data into a black box prediction algorithm. Instead he did what most of us applied statisticians do: we build empirically inspired models but guided by expert knowledge.

ps - Krugman links to a](http://www.nytimes.com/2014/03/22/opinion/egan-creativity-vs-quants.html?src=me&ref=general) piece which has another false dichotomy as the title: “Creativity vs. Quants”. He should try doing it before assuming there is no creativity involved in extracting information from data.