Editor’s note: Martin Wainwright is the winner of the 2014 COPSS Award. This award is the most prestigious award in statistics, sometimes referred to as the Nobel Prize in Statistics. Martin received the award “For fundamental and groundbreaking contributions to high-dimensional statistics, graphical modeling, machine learning, optimization and algorithms, covering deep and elegant mathematical analysis as well as new methodology with wide-ranging implications for numerous applications.” He kindly agreed to be interviewed by Simply Statistics.
SS: How did you find out you had received the COPSS prize?
It was pretty informal: I received an email in February from
Raymond Carroll, who chaired the committee. But it had explicit
instructions to keep the information private until the award ceremony
in August.
SS: You are in Electrical Engineering & Computer Science (EECS) and
Statistics at Berkeley: why that mix of departments?
Just to give a little bit of history, I did my undergraduate degree in
math at the University of Waterloo in Canada, and then my Ph.D. in
EECS at MIT, before coming to Berkeley to work as a postdoc in
Statistics. So when it came time to look at faculty positions,
having a joint position between these two departments made a lot of
sense. Berkeley has always been at the forefront of having effective
joint appointments of the “Statistics plus X” variety, whether X is
EECS, Mathematics, Political Science, Computational Biology and so on.
For me personally, the EECS plus Statistics combination is terrific,
as a lot of my interests lie at the boundary between these two areas,
whether it is investigating tradeoffs between computational and
statistical efficiency, connections between information theory and
statistics, and so on. I hope that it is also good for my students!
In any case, whether they enter via EECS or Statistics, they graduate
with a strong background in both statistical theory and methods, as
well as optimization, algorithms and so on. I think that this kind of
mix is becoming increasingly relevant to the practice of modern
statistics, and one can certainly see that Berkeley consistently
produces students, whether from my own group or other people at
Berkeley, with this kind of hybrid background.
SS: What do you see as the relationship between statistics and machine
learning?
This is an interesting question, but tricky to answer, as it can
really depend on the person. In my own view, statistics is a very
broad and encompassing field, and in this context, machine learning
can be viewed as a particular subset of it, one especially focused on
algorithmic and computational aspects of statistics. But on the other
hand, as things stand, machine learning has rather different cultural
roots from statistics, being strongly influenced by computer
science. In general, I think that both groups have lessons to learn
from each other. For instance, in my opinion, anyone who wants to do
serious machine learning needs to have a solid background in
statistics. Statisticians have been thinking about data and
inferential issues for a very long time now, and these fundamental
issues remain just as important now, even though the application
domains and data types may be changing. On the other hand, in certain
ways, statistics is still a conservative field, perhaps not as quick
as machine learning to move into new application domains, experiment
with new methods, and so on. So I think that
statisticians can benefit from the playful creativity and unorthodox
experimentation that one sees in some machine learning work, as well
as the algorithmic and programming expertise that is standard in
computer science.
SS: What sorts of things is your group working on these days?
I have fairly eclectic interests, so we are working on a range of
topics. A number of projects concern the interface between
computation and statistics. For instance, we have a recent pre-print
(with postdoc Sivaraman Balakrishnan and colleague Bin Yu) that tries
to address the gap between statistical and computational guarantees in
applications of the expectation-maximization (EM) algorithm for latent
variable models. In theory, we know that the global optimum of the
(nonconvex) likelihood has good properties, but in practice, the
EM algorithm only returns local optima. How can this gap between
existing theory and actual practice be resolved? In this paper, we show
that under pretty reasonable conditions (conditions that hold for various
types of latent variable models), the EM fixed points are as good as the
global optimum from the statistical perspective. This explains what is
observed a lot in practice, namely that when the EM algorithm is given
a reasonable initialization, it often returns a very good answer.
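To make the point concrete, here is a minimal sketch, not the construction from the pre-print, of EM for a two-component Gaussian mixture with known unit variances and equal weights, estimating only the component means. With a reasonable initialization, the fixed point it reaches lands close to the true parameters.

```python
# A minimal sketch (not the paper's construction): EM for the mixture
# 0.5 * N(mu0, 1) + 0.5 * N(mu1, 1), estimating only the two means.
import numpy as np

rng = np.random.default_rng(0)

# Simulate n draws from 0.5 * N(-2, 1) + 0.5 * N(+2, 1).
n = 2000
labels = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(labels == 0, -2.0, 2.0), scale=1.0)

def em_two_gaussians(x, mu0, mu1, n_iters=100):
    """EM updates for the two component means (unit variances, equal weights)."""
    for _ in range(n_iters):
        # E-step: posterior probability that each point came from component 1.
        log_lik0 = -0.5 * (x - mu0) ** 2
        log_lik1 = -0.5 * (x - mu1) ** 2
        resp1 = 1.0 / (1.0 + np.exp(log_lik0 - log_lik1))
        # M-step: responsibility-weighted means.
        mu0 = np.sum((1.0 - resp1) * x) / np.sum(1.0 - resp1)
        mu1 = np.sum(resp1 * x) / np.sum(resp1)
    return mu0, mu1

# With a reasonable initialization, the EM fixed point is close to the
# true component means (-2, +2).
print(em_two_gaussians(x, mu0=-0.5, mu1=0.5))
```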
There are lots of other interesting questions at this
computation/statistics interface. For instance, a lot of modern data
sets (e.g., Netflix) are so large that they cannot be stored on a
single machine, but must be split up into separate pieces. Any
statistical task must then be carried out in a distributed way, with
each processor performing local operations on a subset of the data,
and then passing messages to other processors that summarize the
results of its local computations. This leads to a lot of fascinating
questions. What can be said about the statistical performance of such
distributed methods for estimation or inference? How many bits do the
machines need to exchange in order for the distributed performance to
match that of the centralized “oracle method” that has access to all
the data at once? We have addressed some of these questions in a
recent line of work (with student Yuchen Zhang, former student John
Duchi and colleague Michael Jordan).
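As a toy illustration of the setup, and not of the algorithms analyzed in that line of work, the sketch below splits a dataset across machines, has each machine send back a single local estimate, and compares the aggregated answer to the centralized oracle that sees all the data.

```python
# A toy sketch of distributed estimation: each of m machines sees only its
# shard of the data, computes a local estimate, and communicates one number.
import numpy as np

rng = np.random.default_rng(1)

n_total, m = 100_000, 10           # total sample size, number of machines
theta = 3.0                        # true mean to be estimated
data = rng.normal(loc=theta, scale=1.0, size=n_total)

shards = np.array_split(data, m)                        # data split across machines
local_estimates = [shard.mean() for shard in shards]    # one message per machine
distributed_estimate = np.mean(local_estimates)         # aggregation step

centralized_estimate = data.mean()                      # oracle with all the data
print(distributed_estimate, centralized_estimate)
# For mean estimation with equal-sized shards, the two agree exactly; the
# interesting questions arise for more complex estimands, and concern how
# few bits per message suffice to match the centralized oracle.
```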
So my students and postdocs are keeping me busy, and in addition, I am
also busy writing a couple of books, one jointly with Trevor Hastie
and Rob Tibshirani at Stanford University on the Lasso and related
methods, and a second solo-authored effort, more theoretical in focus,
on high-dimensional and non-asymptotic statistics.
SS: What role do you see statistics playing in the relationship
between Big Data and Privacy?
Another very topical question: privacy considerations are certainly
becoming more and more relevant as the scale and richness of data
collection grow. Witness the recent controversies with the NSA, data
manipulation on social media sites, etc. I think that statistics
should have a lot to say about data and privacy. There has been a long
line of statistical work on privacy, dating back at least to Warner’s
work on survey sampling in the 1960s, but I anticipate seeing more of
it over the coming years. Privacy constraints bring a lot of
interesting statistical questions: how to design experiments, how to
perform inference, how data should be aggregated, what should be
released, and so on. I think that statisticians should be at the
forefront of this discussion.
In fact, in some joint work with former student John Duchi and
colleague Michael Jordan, we have examined some tradeoffs between
privacy constraints and statistical utility. We adopt the framework
of local differential privacy that has been put forth in the computer
science community, and study how statistical utility (in the form of
estimation accuracy) varies as a function of the privacy level.
Obviously, preserving privacy means obscuring something, so that
estimation accuracy goes down, but what is the quantitative form of
this tradeoff? An interesting consequence of our analysis is that in
certain settings, it identifies optimal mechanisms for preserving a
certain level of privacy in data.
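For readers unfamiliar with the idea, here is a minimal sketch of Warner-style randomized response, one of the simplest locally differentially private mechanisms. It is illustrative only; the analysis in the paper covers much more general mechanisms and estimands. It shows how a debiased estimate of a population proportion degrades as the privacy parameter eps is tightened.

```python
# Warner-style randomized response: each respondent reports their true bit
# with probability p = e^eps / (1 + e^eps) and flips it otherwise, which
# satisfies eps-local differential privacy. Debiasing the noisy reports
# shows estimation accuracy degrading as eps (the privacy budget) shrinks.
import numpy as np

rng = np.random.default_rng(2)

n = 50_000
true_prop = 0.3                        # proportion with the sensitive attribute
truth = rng.random(n) < true_prop      # private bits, one per respondent

for eps in [0.25, 0.5, 1.0, 2.0, 8.0]:
    p = np.exp(eps) / (1.0 + np.exp(eps))      # probability of telling the truth
    keep = rng.random(n) < p
    reports = np.where(keep, truth, ~truth)    # privatized reports

    # Unbiased estimator of the true proportion from the noisy reports.
    estimate = (reports.mean() - (1 - p)) / (2 * p - 1)
    print(f"eps={eps:>4}: estimate={estimate:.3f} (truth={true_prop})")
```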
SS: What advice would you give young statisticians getting into the
discipline right now?
It is certainly an exciting time to be getting into the discipline.
For undergraduates thinking of going to graduate school in statistics,
I would encourage them to build a strong background in basic
mathematics (linear algebra, analysis, probability theory, and so on),
all of which are important for a deep understanding of statistical methods
and theory. I would also suggest “getting their hands dirty”, that is,
doing some applied work involving statistical modeling, data analysis
and so on. Even for a person who ultimately wants to do more
theoretical work, having some exposure to real-world problems is
essential. As part of this, I would suggest acquiring some knowledge
of algorithms, optimization, and so on, all of which are essential in
dealing with large, real-world data sets.