In the past couple of years several non-statisticians have asked me “What is Big Data exactly?” or “How big is Big Data?” My answer has been “I think Big Data is much more about ‘data’ than ‘big’.” I explain below. Since 2011 Big Data has been all over the news. The New York Times, The Economist, Science, Nature, etc. have told us that the Big Data Revolution is upon us (see Google Trends figure above).
This scientific variant on the #whatshouldwecallme meme isn’t exclusive to statistics, but it is hilarious. This is a really interesting post that is a follow-up to the XKCD password security comic. The thing I find most interesting about this is that researchers realized the key problem with passwords was that we were looking at them purely from a computer science perspective. But _people_ use passwords, so we need a person-focused approach to maximize security.
Just got back from IBC 2012 in Kobe, Japan. I was in an awesome session (organized by the inimitable Lieven Clement) with great talks by Matt McCall, Djork-Arne Clevert, Adetayo Kasim, and Willem Talloen. Willem’s talk nicely tied in our work and how it plays into the pharmaceutical development process and the bigger theme of big data. On the way home through SFO I saw this hanging in the airport.
Statisticians have not always been great self-promoters. I think in part this comes from our tendency to be arbiters rather than being involved in the scientific process. In some ways, I think this is a good thing. Self-promotion can quickly become really annoying. On the other hand, I think our advertising shortcomings are hurting our field in a number of different ways. Here are a few: As Rafa points out even though statisticians are ridiculously employable right now it seems like statistics M.
It seems like half of the battle in statistics is identifying an important/unsolved problem. In math, this is easy: they have a list. So why is it harder for statistics? Since I have to think up projects to work on for my research group, for classes I teach, and for exams we give, I have spent some time thinking about ways that research problems in statistics arise. I borrowed a page out of Roger’s book and made a little diagram to illustrate my ideas (actually I can’t even claim credit, it was Roger’s idea to make the diagram).
There’s lots of talk about “big data” these days and I think that’s great. I think it’s bringing statistics out into the mainstream (even if they don’t call it statistics) and it’s creating lots of opportunities for people with statistics training. It’s one of the reasons we created this blog. One thing that I think gets missed in much of the mainstream reporting is that, in my opinion, the biggest problems aren’t with the truly massive datasets out there that need to be mined for important information.
People who read the news should be aware by now that we are in the midst of a big data era. The New York Times, for example, has been writing about this frequently. One of their most recent articles describes how UC Berkeley is getting $60 million for a new computer science center. Meanwhile, at the University of Florida the administration seems to be oblivious to all this and about a month ago announced it was dropping its computer science department to save $.
Larry Ellison, the CEO of Oracle, like most technology CEOs, has a tendency toward the over-the-top sales pitch. But it’s fun to keep track of what these companies are up to just to see what they think the trends are. It seems clear that companies like IBM, Oracle, and HP, which focus substantially on the enterprise (or try to), think the future is data data data. One piece of evidence is the list of companies that they’ve acquired recently.
Bits: Big Data: Sorting Reality From the Hype