Simply Statistics: On the relative importance of mathematical abstraction in graduate statistical education

_Editor’s Note: This is the counterpoint in our series of posts on the value of abstraction in graduate education. See Brian’s defense of abstraction on Monday and the comments on his post, as well as the comments on our original teaser post for more. See below for a full description of the T-bone inside joke*._**

Brian did a good job at defining abstraction. In a cagey debater’s move, he provided an incredibly broad definition of abstraction that includes the reason we call a a smiley face, the reason why we can apply least squares to a variety of data types, and the reason we write functions when programming. At this very broad level, it is clear that abstract thinking is necessary for graduate students or any other data professional.

But our debate was inspired by a discussion of whether measure-theoretic probability was a key component of our graduate program. There was some agreement that for many biostatistics Ph.D. students, this exact topic may not be necessary for their research or careers. Brian suggested that measure-theoretic probability was a surrogate marker for something more important - abstract thinking and the ability to generalize ideas. This is a very specific form of generalization and abstraction that is used most commonly by statisticians: the ability that permits one to prove theorems and develop statistical models that can be applied to a variety of data types. I will therefore refocus the debate on the original topic. I have three main points:
**
**

There is an over emphasis in statistical graduate programs on abstraction defined as the ability to prove mathematical theorems and develop general statistical methods.
It is possible to create incredible statistical value without developing generalizable statistical methods
While abstraction as defined generally is good, overemphasis on this specific type of abstraction limits our ability to include computing and real data analysis in our curriculum. It also takes away from the most important learning experience of graduate school: performing independent research.

There is an over emphasis in statistical graduate programs on abstraction defined as the ability to prove mathematical theorems and develop general statistical methods.

At a top program, you can expect to take courses in very theoretical statistics, measure theoretic probability, and an applied (or methods) sequence. The first two courses are exclusively mathematical. The third (at the programs I have visited, graduated from, taught in), despite its name, is most generally focused on mathematical details underlying statistical methods. The result is that most Ph.D. students are heavily trained in the mathematical theory behind statistics.

At the same time, there are a long list of skills necessary to develop a successful Ph.D. statistician. These include creativity in applications, statistical programming skills, grit to power through the boring/hard parts of research, interpretation of statistical results on real data, ability to identify the most important scientific problems, and a deep understanding of the scientific problems you are working on. Abstraction is on that list, but it is just one of many skills on that list. Graduate education is a zero-sum game over a finite period of time. Our strong focus on mathematical abstraction means there is less time for everything else.

Any hard quantitative course will measure the ability of a student to abstract in the general sense Brian defined. One of these courses would be very useful for our students. But it is not clear that we should focus on mathematical abstraction to the exclusion of other important characteristics of graduate students.

It is possible to create incredible statistical value without developing generalizable statistical methods

A major standard for success in academia is the ability to generate solutions to problems that are widely read, cited, and used. A graduate student who produces these types of solutions is likely to have a high-impact and well-respected career. In general, it is not necessary to be able to prove theorems, understand measure theory, or develop generalizable statistical models to have this type of success.

One example is one of the co-authors of our blog, best known for his work in genomics. In this field, data is noisy and full of systematic errors, and for several technologies, he invented methods to correct them. For example, he developed the most popular method for making measurements from different experiments comparable, for removing the dependence of measurements on the letters in a gene, and for reducing variability due to operators who run the machine or the ozone levels. Each of these discoveries involved: (1) deep understanding of the specific technology used, (2) a good intuition of what signals were due to biology and which were due to technology, (3) application/development of specific, somewhat ad-hoc, statistical procedures to correct the mistakes, and (4) the development and distribution of good software. His work has been hugely influential on genomics, has been cited thousands of times, and has substantially improved the quality of both biological and statistical results.

But the work did not result in knowledge that was generalizable to other areas of application, it deals with problems that are highly specialized to genomics. If these were his only contributions (they are not), he’d be a hugely successful Ph.D. statistician. But had he focused on general solutions he would have never solved the problems at hand, since the problems were highly specific to a single application. And this is just one example I know well because I work in the area. There are a ton more just like it.

While abstraction as defined generally is good, overemphasis on a specific type of abstraction limits our ability to include computing and real data analysis in our curriculum. It also takes away from the most important learning experience of graduate school: performing independent research.

One could argue that the choice of statistical techniques during data analysis is abstraction, or that one needs to abstract to develop efficient software. But the ability to abstract needed for these tasks can be measured by a wide range of classes, not just measure theoretic probability. Some of these classes might teach practically applicable skills like writing fast and efficient algorithms. Many results of high statistical value do not require mathematical proofs, abstract inductive reasoning, or asymptotic theory. It is a good idea to have a some people who can abstract away the science behind statistical methods to the core mathematical philosophy. But our current curriculum is too heavily weighted in this direction. In some cases, statisticians are even being left behind because they do not have sufficient time in their curriculum to develop the computational skills and amass the necessary subject matter knowledge needed to compete with the increasingly diverse set of engineers, computer scientists, data scientists, and computational biologists tackling the same scientific problems. We need to reserve a larger portion of graduate education for diving deeply into specific scientific problems, even if it means they spend less time developing generalizable/abstract statistical ideas. * Inside joke explanation: Two years ago at JSM I ran a footrace with <a href="http://www.biostat.jhsph.edu/~jgoldsmi/" target="_blank">this guy</a> for the rights to the name “Jeff” in the department of Biostatistics at Hopkins for the rest of 2011. Unfortunately, we did not pro-rate for age and he nipped me by about a half-yard. True to my word, I went by Tullis (my middle name) for a few months, including on the <a href="http://biostat.jhsph.edu/~jleek/jsm-2011-title-slide.pdf" target="_blank">title slide</a> of my JSM talk. This was, of course, immediately subjected to all sorts of nicknaming and B-Caffo loves to use “T-bone”. I apologize on behalf of those that brought it up.