NSF should understand that Statistics is not Mathematics

NSF has realized that the role of Statistics is growing in all areas of science and engineering and has formed a subcommittee to examine the current structure of support of the statistical sciences. As Roger explained in August, the NSF is divided into directorates composed of divisions. Statistics is in the Division of Mathematical Sciences (DMS) within the Directorate for Mathematical and Physical Sciences. Within this division it is a Disciplinary Research Program along with Topology, Geometric Analysis, etc. To statisticians this does not make much sense, and my first thought when asked for recommendations was that we need a proper division. But the committee is seeking out recommendations that

[do] not include renaming of the Division of Mathematical Sciences. Particularly desired are recommendations that can be implemented within the current divisional and directorate structure of NSF; Foundation (NSF) and to provide recommendations for NSF to consider.

This clarification is there because former director Sastry Pantula suggested that DMS change its name to "Division of Mathematical and Statistical Sciences". The NSF shot down this idea and listed this as one of the reasons:

If the name change attracts more proposals to the Division from the statistics community, this could draw funding away from other subfields

So NSF does not want to take away from the other math programs, and this is understandable given the current levels of research funding for Mathematics. But this being the case, I can't really think of a recommendation other than giving Statistics its own division or giving data-related sciences their own directorate. Increasing support for the statistical sciences means increasing funding. You secure the necessary funding either by asking Congress for a bigger budget (good luck with that) or by cutting from other divisions, not just Mathematics. A new division makes sense not only in practice but also in principle because Statistics is not Mathematics.

Statistics is analogous to other disciplines that use mathematics as a fundamental language, like Physics, Engineering, and Computer Science. But like those disciplines, Statistics contributes separate and fundamental scientific knowledge. While the field of applied mathematics tries to explain the world with deterministic equations, Statistics takes a dramatically different approach. In highly complex systems, such as the weather, Mathematicians battle Laplace's demon and struggle to explain nature using mathematics derived from first principles. Statisticians accept that deterministic approaches are not always useful and instead develop and rely on random models. These two approaches are both important, as demonstrated by the improvements in meteorological predictions achieved once data-driven statistical models were used to complement deterministic mathematical models.

Although Statisticians rely heavily on theoretical/mathematical thinking, another important distinction from Mathematics is that advances in our field are almost exclusively driven by empirical work. Statistics always starts with a specific, concrete real-world problem: we thrive in Pasteur's quadrant. Important theoretical work that generalizes our solutions always follows. This approach, built mostly by basic researchers, has yielded some of the most useful concepts relied upon by modern science: the p-value, randomization, analysis of variance, regression, the proportional hazards model, causal inference, Bayesian methods, and the Bootstrap, just to name a few examples. These ideas were instrumental in the most important genetic discoveries, improving agriculture, the inception of the empirical social sciences, and revolutionizing medicine via randomized clinical trials. They have also fundamentally changed the way we abstract quantitative problems from real data.

The 21st century brings the era of big data, and distinguishing Statistics from Mathematics becomes more important than ever. Many areas of science are now being driven by new measurement technologies. Insights are being made by discovery-driven, as opposed to hypothesis-driven, experiments. Although testing hypotheses developed theoretically will of course remain important to science, it is not inconceivable that, just as Leeuwenhoek became the father of microbiology by looking through the microscope without theoretical predictions, the era of big data will enable discoveries that we have not yet even imagined. However, it is naive to think that these new datasets will be free of noise and unwanted variability. Deterministic models alone will almost certainly fail at extracting useful information from these data, just as they have failed at predicting complex systems like the weather. The advancement in science during the era of big data that the NSF wants to see will only happen if the field that specializes in making sense of data is properly defined as a separate field from Mathematics and appropriately supported.

Addendum: On a related note, NIH just announced that they plan to recruit for a new senior scientific position: the Associate Director for Data Science.

  • http://www.facebook.com/LoveandHumanity Francisco Javier Arceo

    Misspelled modern, homie (it was a typo).

    Ironically, I misspelled "misspelled" as I wrote that. :(

  • http://www.facebook.com/LoveandHumanity Francisco Javier Arceo

    Fantastic article, entirely agree. Stochastic methods allow us to really make inferences about something that we cannot deterministically solve; the implications of stochastics are huge. There are millions of examples, so I won't quote just one, but I think as statistics continues to emerge as one of the most business-applicable sciences, we may see some support from the NSF in setting it as its own category.

  • Keith O’Rourke

    I agree that statistics is not math and will even admit to hoping that in the next 5-10 or 20 years most statistics can be done computationally (avoiding mathematical representations and manipulations) so its basis will simply be logic (the same basis of math?) but ...

    I can't agree with the history and specifically the examples.

    Bootstrap - it was Efron's mathematization that made it _important_

    Bayesian methods - anything prior to MCMC was mostly just math (except Francis Galton's quincunx) as was numerical integration variance reduction and early MCMC development

    the proportional hazards model - David Cox often said he preferred doing math (and had one of the best math toolkits of any statistician)

    The other problem is that 80%+ of stats faculty are currently in math depts

    • Rafael Irizarry

      Here is how I see it:

      The Bootstrap was motivated by the need to estimate the distribution of a statistic when figuring it out analytically was too complicated. This serves a very practical need, e.g. reporting confidence intervals for a median.
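As a rough illustration of the median example, here is a minimal sketch of a percentile bootstrap interval. It is only one of several bootstrap interval constructions, and the function name `bootstrap_median_ci`, the toy data, and the default settings are hypothetical choices made for this sketch, not anything taken from the post.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and record the median each time.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval endpoints off the empirical quantiles of the medians.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


# Toy data, made up for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 7.1, 3.9, 4.4]
low, high = bootstrap_median_ci(sample)
print(f"sample median = {statistics.median(sample)}, 95% CI = ({low}, {high})")
```

No formula for the sampling distribution of the median is needed here; the resampling loop stands in for the analytic derivation.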

      Even before MCMC made it generally practical, the Bayesian approach provided a way to summarize evidence after observing data, taking into account prior belief. How is this Mathematics? If anything it is Philosophy.

      The proportional hazards model: the title of the paper is "Regression models for Life Tables". Without Life Tables as motivation what's left? I doubt many people would care for the relatively simple math in the paper without the practical implication.

  • Ricardo Toledano

    Probability theory is Mathematics, pure and simple; beyond the very basics, it is worthless if you try to remove the first-order-logic theorem-proving aspect from it. The same goes for Inference theory.

    What you described is, for me, basically what I call Statistical modeling, and I agree it is very distant from, say, Functional Analysis, Set Theory, the study of Differential Equations or Mathematics in general. It is also very spread out: Biostatistics, Econometrics and applied modeling in other areas generally sit in other departments rather than in a Statistics department, even when one exists. This is what I think should be dealt with.

    Only to add, Numerical analysis also has almost all of its problems arising from empirical work. Much of it was developed in the Manhattan Project, as were Monte Carlo methods, and it mainly deals with problems in industry; there's a reason its theoreticians are considered Applied mathematicians. I do not see why theoretical statistics, for example, should be considered a different category.

    • Ricardo Toledano

      Just to add, Statistical analysis is also distant from Mathematics. I forgot to say that.

      • Malarkey

        By this reductio ad absurdum chemistry or engineering are just a branch of physics - or even biology is just a really esoteric niche.

        I think just because one discipline evolved from or has foundations in another does not mean it is a sub-genre of that discipline. Once its practice and application become distinct, we put it in a separate department and give it a tea room (speaking from the UK here).

        I think if computing has its own funding stream so should statistics - it's big enough and distinct enough.

  • Troy Mc.

    The idea that applied mathematics doesn't include stochastic models is utter baloney. Of course it does. Browse through the SIAM journals and you'll find tons of stochastic models.

    Everyone wants more funding, but recategorization of statistics is not the way to go about getting it.

    • Rafael Irizarry

      Of course applied mathematicians write down stochastic models. Probabilists (whom I consider mathematicians), Economists, Physicists and many others do as well. The fundamental difference I am trying to explain relates to how models are motivated and used. When attacking a real-world problem, applied mathematics tries to describe nature from first principles (differential equations are very popular) while statisticians look at data and are content with random models that fit the observations or make good predictions, even if we don't quite understand why. Both approaches have their strengths, but they are fundamentally different and lead to a completely different body of work.

      I actually don't think Statistics needs more funding in the short term. Right now demand for our services is way higher than we can handle. What we need is more people gaining expertise in Applied Statistics and I am pretty sure this is happening (the stat undergrad at Harvard has quadrupled in the last 10 years). But the NSF has explicitly said they want to better support Statistics. My post described why I think they need a division to do this properly.
