# In the era of data what is a fact?

13 Jan 2012

The Twitter universe is abuzz about this article in the New York Times. Arthur Brisbane, who responds to readers’ comments, asks:

I’m looking for reader input on whether and when New York Times news reporters should challenge “facts” that are asserted by newsmakers they write about.

He goes on to give a couple of examples of qualitative facts that reporters have used in stories without questioning the veracity of the claims. As many people pointed out in the comments, the question itself is absurd. Of course reporters should check facts and report when the facts in their stories, or stated by candidates, are not correct. That is the purpose of news reporting.

But I think the question is a little more subtle when it comes to quantitative facts and statistics. Depending on what subsets of data you look at, what summary statistics you pick, and the way you present information, you can say a lot of different things with the same data. As long as you report what you calculated, you are technically reporting a fact, but it may be deceptive. The classic example is calculating median vs. mean home prices. If Bill Gates lives in your neighborhood, no matter what the other houses cost, the mean price is going to be pretty high!
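To make the mean vs. median point concrete, here is a minimal sketch in Python. Every price in it is invented for illustration, and the `summarize` helper is hypothetical; the only library used is Python's built-in `statistics` module.

```python
from statistics import mean, median

# Hypothetical home prices in dollars; every number here is invented.
neighborhood = [250_000, 275_000, 290_000, 300_000, 320_000]
outlier = 100_000_000  # one very expensive house moves in


def summarize(prices):
    """Return (mean, median) for a list of prices."""
    return mean(prices), median(prices)


before = summarize(neighborhood)
after = summarize(neighborhood + [outlier])

print(f"without outlier: mean={before[0]:,.0f}  median={before[1]:,.0f}")
print(f"with outlier:    mean={after[0]:,.0f}  median={after[1]:,.0f}")
```

With these made-up numbers, the single expensive house pushes the mean into the millions while the median barely moves. Both figures are "facts" about the same six houses, which is exactly why the choice of summary needs to be stated.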

Two concrete things can be done to deal with the malleability of facts in the data age.

First, we need to require that our reporters, policy makers, politicians, and decision makers report the context of numbers they state. It is tempting to use statistics as blunt instruments, punctuating claims. Instead, we should demand that people using statistics to make a point embed them in the broader context. For example, in the case of housing prices, if a politician reports the mean home price in a neighborhood, they should be required to state that potential outliers may be driving that number up. How do we make this demand? By refusing to believe isolated statistics: a statistic should be believed only when its source is quoted and the statistic is described.

But this isn’t enough, since the context and statistics will be meaningless without raising overall statisteracy (statistical literacy, not to be confused with numeracy). In the U.S., literacy campaigns have been promoted by library systems. Statisteracy is becoming just as critical; the same level of social pressure and assistance should be applied to individuals who don’t know basic statistics as to those who don’t have basic reading skills. Statistical organizations, academic departments, and companies interested in analytics/data science/statistics all have a vested interest in raising the population’s statisteracy. Maybe a website dedicated to understanding the consequences of basic statistical concepts, rather than the concepts themselves?

And don’t forget to keep rating health news stories!