21
Oct

## Sunday Data/Statistics Link Roundup (10/21/12)

1. This scientific variant on the #whatshouldwecallme meme isn’t exclusive to statistics, but it is hilarious.
2. This is a really interesting post that is a follow-up to the XKCD password security comic. The thing I find most interesting about this is that researchers realized the key problem with passwords was that we were looking at them purely from a computer science perspective. But people use passwords, so we need a person-focused approach to maximize security. This is a very similar idea to our previous post on an experimental foundation for statistics. Looks like Di Cook and others are already way ahead of us on this idea. It would be interesting to redefine optimality incorporating the knowledge that most of the time it is a person running the statistics.
3. This is another fascinating article about the math education wars. It starts off as the typical dueling schools issue in academia - two different schools of thought that routinely go after each other. But the interesting thing here is that it sounds like one side of this math debate is being argued by a person collecting data and the other by a side that isn’t. It is interesting how many areas are being touched by data - including what kind of math we should teach.
4. I’m going to visit Minnesota in a couple of weeks. I was so pumped up to be an outlaw. Looks like I’m just a regular law-abiding citizen though….
5. Here are outstanding summaries of what went on at the Carl Morris Big Data conference this past week. Tons of interesting stuff there. See parts one, two, and three.
13
Sep

## An experimental foundation for statistics

I recently had a conversation with Brian (of abstraction fame) about the relationship between mathematics and statistics. Statistics, for historical reasons, has been treated as a mathematical sub-discipline (this is the NSF’s view).

One reason statistics is viewed as a sub-discipline of math is that the foundations of statistics rest on deductive reasoning, where you start with a few general propositions that you assume to be true and then systematically prove more specific results. A similar approach is taken in most mathematical disciplines.

In contrast, scientific disciplines like biology are largely built on the basis of inductive reasoning and the scientific method. Specific individual discoveries are described and used as a framework for building up more general theories and principles.

So the question Brian and I had was: what if you started over and built statistics from the ground up on the basis of inductive reasoning and experimentation? Instead of making mathematical assumptions and then proving statistical results, you would use experiments to identify core principles. This actually isn’t without precedent in the statistics community. Bill Cleveland and Robert McGill studied how people perceive graphical information and produced some general recommendations about the use of area/linear contrasts, common axes, etc. There has also been a lot of experimental work on how humans understand uncertainty.

So what if we put statistics on an experimental, rather than a mathematical, foundation? What if we performed experiments to see what kinds of regression models people were able to interpret most clearly, what the best ways to evaluate confounding/outliers were, or what measure of statistical significance people understood best? Basically, what if the “quality” of a statistical method rested not on the mathematics behind the method, but on experimental results demonstrating how people used the method? So, instead of justifying lowess mathematically, we would justify it on the basis of its practical usefulness through specific, controlled experiments. Some of this is already happening when people do surveys of the most successful methods in Kaggle contests or with the MAQC.
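To make the idea a little more concrete, here is a minimal sketch of how one such controlled experiment might be analyzed. Everything in it is an assumption for illustration: the two "methods", the made-up 0/1 responses (1 meaning a participant interpreted the result correctly), and the choice of a simple permutation test to compare the groups. It is not a description of any real study.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean outcome."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign participants to the two methods.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data: 1 = participant interpreted the output correctly.
method_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # made-up responses, method A
method_b = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # made-up responses, method B

diff, p = permutation_test(method_a, method_b)
print(f"difference in proportion correct: {diff:.2f}, permutation p-value: {p:.3f}")
```

The point of the sketch is only that the "quality" score here comes from how people performed with each method, not from a theorem about the method itself. A real experiment would need randomization, many more participants, and a carefully defined notion of "interpreted correctly".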

I wonder what methods would survive the change in paradigm?