There’s a nice article by Nick Bilton in the New York Times Bits blog about the need for context when looking at Big Data. Actually, the article starts off by describing how Google’s Flu Trends model overestimated the number of people infected with flue in the U.S. this season, but then veers off into a more general discussion about Big Data.
My favorite quote comes from Mark Hansen:
“Data inherently has all of the foibles of being human,” said Mark Hansen, director of the David and Helen Gurley Brown Institute for Media Innovation at Columbia University. “Data is not a magic force in society; it’s an extension of us.”
Bilton also talks about a course he taught where students built sensors to install in elevators and stairwells at NYU to see how often they were used. The idea was to explore how often and when the NYU students used the stairs versus the elevator.
As I left campus that evening, one of the N.Y.U. security guards who had seen students setting up the computers in the elevators asked how our experiment had gone. I explained that we had found that students seemed to use the elevators in the morning, perhaps because they were tired from staying up late, and switch to the stairs at night, when they became energized.
“Oh, no, they don’t,” the security guard told me, laughing as he assured me that lazy college students used the elevators whenever possible. “One of the elevators broke down a few evenings last week, so they had no choice but to use the stairs.”
I can see at least three problems here, not necessarily mutually exclusive:
I don’t mean to be unduly critical of some students in a class who were just trying to collect some data. I think there should be more of that going on. But my point is that it’s not as easy as it looks. Even trying to answer a seemingly innocuous question of how students use elevators and stairs requires some forethought, study design, and careful analysis.