Computer scientists discover statistics and find it useful

This article in the New York Times today describes some of the advances that computer scientists have made in recent years:

The technology, called deep learning, has already been put to use in services like Apple’s Siri virtual personal assistant, which is based on Nuance Communications’ speech recognition service, and in Google’s Street View, which uses machine vision to identify specific addresses.

But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just “neural nets” for their resemblance to the neural connections in the brain.

Deep learning? Really?

Okay, names aside, there are a few things to say here. First, the advances described in the article are real--I think that's clear. There's a lot of pretty cool stuff out there (including Siri, in my opinion) coming from the likes of Google, Microsoft, Apple, and many others and, frankly, I appreciate all of it. I hope to have my own self-driving car one day.

The question is how did we get here? What worries me about this article and many others is that you can get the impression that there were tremendous advances in the technology/methods used. But I find that hard to believe given that the methods that are often discussed in these advances are methods that have been around for quite a while (neural networks, anyone?). The real advance has been in the incorporation of data into these technologies and the use of statistical models. The interesting thing is not that the data are big, it's that we're using data at all.

Did Nate Silver produce a better prediction of the election than the pundits because he had better models or better technology? No, it's because he bothered to use data at all. This is not to downplay the sophistication of Silver's or others' approach, but many others did what he did (presumably using different methods--I don't think there was collaboration) and more or less got the same results. So the variation across different models is small, but the variation between using data vs. not using data is, well, big. Peter Norvig notes this in his talk about how Google uses data for translation. An area that computational linguists had been working on for decades was advanced dramatically by a ton of data and (a variation of) Bayes' Theorem. I may be going out on a limb here, but I don't think it was Bayes' Theorem that did the trick. But there will probably be an article in the New York Times soon about how Bayes' Theorem is revolutionizing artificial intelligence. Oh wait, there already was one.

It may sound like I'm trying to bash the computer scientists here, but I'm not. It would be too too easy for me to write a post complaining about how the computer scientists have stolen the ideas that statisticians have been using for decades and are claiming to have discovered new approaches to everything. But that's exactly what is happening and good for them.

I don't like to frame everything as an us-versus-them scenario, but the truth is the computer scientists are winning and the statisticians are losing. The reason is that they've taken our best ideas and used them to solve problems that matter to people. Meanwhile, we should have been stealing the computer scientists' best ideas and using them to solve problems that matter to people. But we didn't. And now we're playing catch-up, and not doing a particularly good job of it.

That said, I believe there's still time for statistics to play a big role in "big data". We just have to choose to do it. Borrowing ideas from other fields is good--that's why it's called "re"search, right? Statisticians shouldn't be shy about it. Otherwise, all we'll have left to do is complain about how all those people took what we'd been working on for decades and...made it useful.

  • Marc

    The same kind of discussion went on 10 years ago when there was the fuss about KDD. "Hey, you're doing data mining? Looks cool. I've been doing something called pattern recognition for 10 years." And so on.

  • David J. Harris

    This post seems to have very little to do with my understanding of what's going on with these advances.

    First, I don't think that the advantages deep learning has shown in the last 7 or 8 years have much to do with being statistical. Depending on the field, deep learning is usually competing with things like hidden Markov models, Gaussian processes, mixture models, or various fancy regression approaches like boosting and sparse regularizers. And it's not like Hinton just "discovered" statistics--the guy trained Radford Neal, for example. If you really wanted to, you *could* make an argument that modern neural nets outperform the old ones because of a renewed statistical influence (RBMs are much more probabilistic in nature than a typical feed-forward neural net), but Andrew Ng and others have had a lot of success with deep learning in non-probabilistic contexts too.

    Nor is the difference that they're suddenly using data, as you seem to be asserting--all the competitors are using data too. Top-down, purely logic-based, "good-old-fashioned AI" has been mostly dead for a long time now too. Again, it's not like LeCun et al. did their work on handwritten character recognition in the '90s without looking at huge handwritten corpora.

    This is definitely not a case like election forecasting, where simply *having* a data-based model puts you head-and-shoulders above the rank-and-file. Deep learning really is making big advances in speech recognition, image recognition, document classification, and other fields that have been using models and data for a very long time.

    My understanding is that most of the difference is exactly what you're making fun of--architectural depth. Yoshua Bengio's group has done a lot of work exploring this question, and it seems like they're correct: deeper circuits can have exponentially more capacity than shallow ones of comparable size, and they have much more efficient and useful representations than shallower methods of equal capacity. Another way of saying this is that deep learning methods find better features for feeding to your favorite classifier than shallower ones do. That's been known for a long time, but the difference is that in the last decade, we can finally train deep models. Or perhaps more accurately, people that don't speak French and people using non-convolutional architectures can now fit these models for the first time.

    Some of the improvement has to do with pre-training (which has interesting connections to statistics), some of it has to do with better optimization routines (which basically don't), but I think that giving statistics credit for the recent advances is a very strange thing to do.

  • Will

    Err, not quite. The slightly silly 'deep learning' term does seem to be Hinton's. But if you actually read his stuff (e.g. this http://www.cs.toronto.edu/~hinton/absps/tics.pdf) you'll see that he's not really doing 'neural networks' in the 1980s sense at all. It's actually about posterior inference in Bayesian networks with a layered structure. Unlike many of his colleagues (say, Jordan and Bishop), he just hasn't dumped the 'neural' vocabulary. In part I'd guess that's because journalists understand that sort of thing better than, say, Gibbs sampling. The bottom line here is that one should not get one's news about what computer scientists are up to from the New York Times or inferences from other people's choice of label.

    • David J. Harris

      +1 Good point.

      I'd add that Andrew Ng and Yann LeCun have some "deep" methods that don't really look like Bayesian networks or any other probabilistic model. But that's a good point.

    • David Warde-Farley

      You are partly mistaken. Many of the pre-training results in the literature have relied on training criteria based on probabilistic models (namely, greedily stacked restricted Boltzmann machines), but most often the resulting parameters are used to initialize a deep, deterministic feed-forward system that is very much an artificial neural network in the 1980s sense. Correct probabilistic inference is almost never done in a "deep belief network" learned by greedy stacking of RBMs, but rather very crude, approximate inference that corresponds exactly to a feed-forward pass in a logistic-activation neural network. Many non-probabilistic criteria in the same spirit have since been proposed (e.g., those based on autoencoders from the Bengio and LeCun labs). The key insight that has led to most of the progress seems to be the greedy layerwise strategy, not anything relying strictly on probabilistic models.

      Even when probabilistic models are employed, the focus is different. Unlike Mike Jordan's work, nearly no independence structure is built into a typical feature learning module (of the sort that are stacked in deep learning) except for perhaps a bipartite relationship between observed quantities and latent quantities. All of that said, there is much probabilistic model and MCMC research ongoing within the deep learning community.

  • paul

    The big difference between the two fields is that statisticians seem to stop being useful once they run out of rows in their Excel spreadsheet.

  • Megan

    I think one of the main differences is that computer scientists doing data mining and big data don't seem to be concentrating as much on sampling, which is pretty much a statistician's wheelhouse. Rather, the bias in CS people who do data mining is on having "all" the data. So why would you need to sample? Instead, we say "thanks" to the stats folks for those cool principles (like Bayes), and then we concentrate on gaining optimum efficiency of computational algorithms under massive amounts of data, not samples.

  • Keith O’Rourke

    I do remember Geoff Hinton in a seminar at the University of Toronto saying that there were only two choices, learn statistics or make friends with a statistician (the punch line was that he refused to comment on which he thought would be easier.) Not sure which choice he made, but Radford Neal clearly chose to learn statistics.

    Not sure if marketing is the right term but it should be obvious that acknowledging statistical theory would have a cost or at least be distracting when promoting one's own not-statistics discipline.

    But as Marc pointed out, this happens repeatedly, and as Aristotle once roughly said, "you are what you repeatedly do," so it is important to reflect on. My guess is that it has to do with a loss of steam/interest by those in statistics when things start to seem less mathematical or less of a logic of inference challenge. What seems to save the statistics discipline (not to say any discipline should be saved) is that others never seem to quite get a logic of inference (Megan’s comment perhaps being somewhat supportive).

  • Link

    I enjoyed this blog and thought it was quite thought provoking. Please keep it up.

  • Darin

    I think you are being a bit harsh on the NYT. They write those articles for a wide audience. Also, I don't really think that computer scientists (I am one) are winning and statisticians are losing. Computer scientists are using the work that statisticians produce (and giving proper credit I hope). Isn't that really the ultimate flattery?

  • Tao Shi

    Trained as a statistician and currently working on things related to data mining, I do find the evolving relationship between computer science and statistics interesting. In my experience, computer scientists, especially those in data mining, machine learning, and AI, tend to touch far more data, and to do so more often, than most statisticians.

    I admit that I'm still trying to learn more deeply about "deep learning", but I do feel that the greedy layerwise strategy, rather than being Bayesian or not, holds the key to the current success of "deep learning". The Bayesian formulation happened to be a handy tool for the current implementation of so-called "deep learning". Someone correct me if I'm wrong here.

    BTW, I do agree with Megan that data collection or sampling seems to be a big difference. Purely based on the comments on this post, one may say "most people who read the article do not agree with the post". Given enough comments (presumably many more than 10), data mining techniques MIGHT be able to predict fairly accurately whether a person is going to write a positive or negative comment. However, to find out whether people who read the post agree with it or not, we need to take into account that most comments on this post are from computer scientists. We should find out how consistent the "readers" are with those "who made comments". Somehow, sampling (and statistics) becomes relevant again in addressing this type of question.
