Computer scientists discover statistics and find it useful

This article in the New York Times today describes some of the advances that computer scientists have made in recent years. From the article:

The technology, called deep learning, has already been put to use in services like Apple’s Siri virtual personal assistant, which is based on Nuance Communications’ speech recognition service, and in Google’s Street View, which uses machine vision to identify specific addresses.

But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just “neural nets” for their resemblance to the neural connections in the brain.

Deep learning? Really?

Okay, names aside, there are a few things to say here. First, the advances described in the article are real–I think that’s clear. There’s a lot of pretty cool stuff out there (including Siri, in my opinion) coming from the likes of Google, Microsoft, Apple, and many others and, frankly, I appreciate all of it. I hope to have my own self-driving car one day.

The question is, how did we get here? What worries me about this article and many others is that you can get the impression that there were tremendous advances in the technology/methods used. But I find that hard to believe, given that the methods often discussed alongside these advances have been around for quite a while (neural networks, anyone?). The real advance has been in the incorporation of data into these technologies and the use of statistical models. The interesting thing is not that the data are big, it’s that we’re using data at all.

Did Nate Silver produce a better prediction of the election than the pundits because he had better models or better technology? No, it’s because he bothered to use data at all. This is not to downplay the sophistication of Silver’s or others’ approaches, but many others did what he did (presumably using different methods–I don’t think there was collaboration) and got more or less the same results. So the variation across different models is small, but the variation between using data vs. not using data is, well, big. Peter Norvig notes this in his talk about how Google uses data for translation. An area that computational linguists had been working on for decades was advanced dramatically by a ton of data and (a variation of) Bayes’ Theorem. I may be going out on a limb here, but I don’t think it was Bayes’ Theorem that did the trick. But there will probably be an article in the New York Times soon about how Bayes’ Theorem is revolutionizing artificial intelligence. Oh wait, there already was one.

It may sound like I’m trying to bash the computer scientists here, but I’m not. It would be all too easy for me to write a post complaining about how the computer scientists have stolen the ideas that statisticians have been using for decades and are claiming to have discovered new approaches to everything. But that’s exactly what is happening, and good for them.

I don’t like to frame everything as an us-versus-them scenario, but the truth is the computer scientists are winning and the statisticians are losing. The reason is that they’ve taken our best ideas and used them to solve problems that matter to people. Meanwhile, we should have been stealing the computer scientists’ best ideas and using them to solve problems that matter to people. But we didn’t. And now we’re playing catch-up, and not doing a particularly good job of it.

That said, I believe there’s still time for statistics to play a big role in “big data”. We just have to choose to do it. Borrowing ideas from other fields is good–that’s why it’s called “re”search, right? Statisticians shouldn’t be shy about it. Otherwise, all we’ll have left to do is complain about how all those people took what we’d been working on for decades and…made it useful.