An experimental foundation for statistics


In a recent conversation with Brian (of abstraction fame) about the relationship between mathematics and statistics. Statistics, for historical reasons, has been treated as a mathematical sub-discipline (this is the NSF’s view).

One reason statistics is viewed as a sub-discipline of math is because the foundations of statistics are built on the basis of deductive reasoning, where you start with a few general propositions or foundations that you assume to be true and then systematically prove more specific results. A similar approach is taken in most mathematical disciplines. 

In contrast, scientific disciplines like biology are largely built on the basis of inductive reasoning and the scientific method. Specific individual discoveries are described and used as a framework for building up more general theories and principles. 

So the question Brian and I had was: what if you started over and built statistics from the ground up on the basis of inductive reasoning and experimentation? Instead of making mathematical assumptions and then proving statistical results, you would use experiments to identify core principals. This actually isn’t without precedent in the statistics community. Bill Cleveland and Robert McGill studied how people perceive graphical information and produced some general recommendations about the use of area/linear contrasts, common axes, etc. There has also been a lot of work on experimental understanding of how humans understand uncertainty

So what if we put statistics on an experimental, rather than on a mathematical foundation. We performed experiments to see what kind of regression models people were able to interpret most clearly, what were the best ways to evaluate confounding/outliers, or what measure of statistical significance people understood best? Basically, what if the “quality” of a statistical method did not rest on the mathematics behind the method, but on the basis of experimental results demonstrating how people used the methods? So, instead of justifying lowess mathematically, we justified it on the basis of its practical usefulness through specific, controlled experiments. Some of this is already happening when people do surveys of the most successful methods in Kaggle contests or with the MAQC.

I wonder what methods would survive the change in paradigm?