R.A. Fisher is the most influential scientist ever

You can now see profiles of famous scientists on Google Scholar citations. Here are links to a few of them (via Ben L.): Von Neumann, Einstein, Newton, Feynman.

But their impact on science (with the possible exception of Newton's) pales in comparison to the impact of one statistician: R.A. Fisher. Many of the concepts he developed are so common, and considered so standard, that he is never cited or credited for them. Here are some examples of things he invented, along with a conservative estimate of the citations each would have received, calculated via Google Scholar*.

  1. P-values - 3 million citations
  2. Analysis of variance (ANOVA) - 1.57 million citations
  3. Maximum likelihood estimation - 1.54 million citations
  4. Fisher’s linear discriminant - 62,400 citations
  5. Randomization/permutation tests - 37,940 citations
  6. Genetic linkage analysis - 298,000 citations
  7. Fisher information - 57,000 citations
  8. Fisher’s exact test - 237,000 citations

A couple of notes:

  1. These are seriously conservative estimates, since I only searched for a few variants of some key words.
  2. These numbers are BIG; there isn’t another scientist in the ballpark. The guy who wrote the “most highly cited paper” got 228,441 citations on GS. His next most cited paper? 3,000 citations. Fisher has at least 5 entries in the list above that would each have more citations than that top paper.
  3. This page says Bert Vogelstein has the most citations of any person over the last 30 years. If you add up the number of citations to his top 8 papers on GS, you get 57,418. That is about as many as the Fisher information matrix alone.

I think this really speaks to a couple of things. One is that Fisher invented some of the most critical concepts in statistics. The other is the breadth of impact of statistical ideas across a range of disciplines. In any case, I would be hard-pressed to think of another scientist whose work has influenced a broader range of scientists, or influenced them more deeply.

* Calculations of citations

  1. As described in a previous post
  2. # of GS results for “Analysis of Variance” + # for “ANOVA” but not “Analysis of Variance”
  3. # of GS results for “maximum likelihood”
  4. # of GS results for “linear discriminant”
  5. # of GS results for “permutation test” + # for “permutation tests” but not “permutation test”
  6. # of GS results for “linkage analysis”
  7. # of GS results for “fisher information” + # for “information matrix” but not “fisher information”
  8. # of GS results for “fisher’s exact test” + # for “fisher exact test” but not “fisher’s exact test”

P-values and hypothesis testing get a bad rap - but we sometimes find them useful.

This post was written by Jeff Leek and Rafa Irizarry.

The p-value is the most widely known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he were cited every time a p-value was reported, his paper would have, at the very least, 3 million citations* - making it the most highly cited paper of all time.

However, the p-value has a large number of very vocal critics. The criticisms of p-values, and of hypothesis testing more generally, range from philosophical to practical. There are even entire websites dedicated to “debunking” p-values! Statisticians raise several issues: p-values are easily misinterpreted; they are not calibrated by sample size; they ignore existing information or knowledge about the parameter in question; and very significant (small) p-values may result even when the value of the parameter of interest is scientifically uninteresting.

We agree with all these criticisms. Yet, in practice, we find p-values useful and, if used correctly, a powerful tool for the advancement of science. The fact that many misinterpret the p-value is not the p-value’s fault. If the statement “under the null the chance of observing something this convincing is 0.65” is correct, then why not use it? Why not explain to our collaborator that the observation they thought was so convincing can easily happen by chance in a setting that is uninteresting? In cases where p-values are small enough, the substantive experts can help decide whether the parameter of interest is scientifically interesting. In general, we find p-values to be superior to our collaborators’ intuition about which patterns are statistically interesting and which are not.

We also find that p-values provide a simple way to construct decision algorithms. For example, a government agency can define general rules based on p-values that are applied equally to products needing a specific seal of approval. If the rule proves to be too lenient or too restrictive, we change the p-value cut-off appropriately. In this situation we view the p-value as part of a practical protocol, not a tool for statistical inference.
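
As a rough illustration of this protocol view, the whole rule can be one comparison against a cut-off. The sketch below is in Python; the function name `meets_standard`, the product names, and the p-values are all hypothetical and exist only to show how changing the cut-off changes the decisions.

```python
def meets_standard(p_value, cutoff=0.05):
    """Apply the same fixed rule to every product: pass if p_value <= cutoff."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return p_value <= cutoff


# Hypothetical submissions; these p-values are made up for illustration.
submissions = {"product_a": 0.003, "product_b": 0.04, "product_c": 0.20}

# Making the rule stricter or more lenient only means changing the cut-off.
for cutoff in (0.05, 0.01):
    passed = [name for name, p in submissions.items() if meets_standard(p, cutoff)]
    print(cutoff, passed)
```

The point of the design is that the rule is applied identically to every submission, so any adjustment happens in one place.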

Moreover the p-value has the following useful properties for applied statisticians:

  1. p-values are easy to calculate, even for complicated statistics. Many statistics do not lend themselves to easy analytic calculation, but with permutation and bootstrap procedures p-values can be calculated even for very complicated statistics.
  2. p-values are relatively easy to understand. The statistical interpretation of the p-value remains roughly the same no matter how complicated the underlying statistic, and p-values are always bounded between 0 and 1. This also means that p-values are easy to misinterpret - they are not posterior probabilities. But this is a difficulty with education, not a difficulty with the statistic itself.
  3. p-values have simple, universal properties. Correct p-values are uniformly distributed under the null, regardless of how complicated the underlying statistic.
  4. p-values are calibrated to error rates scientists care about. Regardless of the underlying statistic, calling all p-values less than 0.05 significant leads, on average, to about 5% false positives even when the null hypothesis is always true. If this property is ignored, things like publication bias can result, but again this is a problem with education and the scientific process, not with p-values.
  5. p-values are useful for multiple testing correction. The advent of new measurement technology has shifted much of science from hypothesis-driven to discovery-driven, making the existing multiple testing machinery useful. Using the simple, universal properties of p-values, it is possible to easily calculate estimates of quantities like the false discovery rate - the rate at which discovered associations are false.
  6. p-values are reproducible. All statistics are reproducible with enough information. Given the simplicity of calculating p-values, it is relatively easy to communicate sufficient information to reproduce them. 
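
Several of these properties can be checked with a small simulation. What follows is a minimal sketch in Python using only the standard library, not a definitive implementation. It computes a permutation p-value for a difference in group means (property 1) and estimates how often such p-values fall below 0.05 when the null is true (properties 3 and 4). It also adjusts a set of p-values with the Benjamini-Hochberg procedure, which is one standard route to false discovery rate control and is our choice for illustration; the post does not name a specific method (property 5). All function names, sample sizes, and simulation settings are assumptions.

```python
import random


def permutation_p_value(x, y, n_perm=999, rng=None):
    """Two-sided permutation p-value for a difference in group means."""
    rng = rng or random.Random()
    n_x = len(x)
    pooled = list(x) + list(y)

    def abs_mean_diff(values):
        first, second = values[:n_x], values[n_x:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        # Small tolerance guards against floating-point rounding on ties
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # Counting the observed labelling itself keeps the p-value above zero
    return (hits + 1) / (n_perm + 1)


def null_rejection_rate(n_sims=200, n_per_group=10, alpha=0.05, seed=0):
    """Fraction of p-values <= alpha when both groups share one distribution."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n_per_group)]
        y = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if permutation_p_value(x, y, n_perm=199, rng=rng) <= alpha:
            rejections += 1
    return rejections / n_sims


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Under the null this should land near or below 0.05, up to simulation noise
    print(null_rejection_rate())
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Calling a result a discovery when its adjusted p-value is at most 0.05 aims to keep the expected share of false discoveries at or below 5%, under the usual assumptions of the procedure.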

We agree there are flaws with p-values, just like there are with any statistic one might choose to calculate. In particular, we do think that confidence intervals should be reported with p-values when possible. But we believe that any other decision-making statistic would lead to other problems. One thing we are sure about is that p-values beat scientists’ intuition about chance any day. So before bashing p-values too much we should be careful because, like democracy to government, p-values may be the worst form of statistical significance calculation except all those other forms that have been tried from time to time. 


* Calculated from Google Scholar using the formula:

Number of P-value Citations = # of papers with exact phrase “P < 0.05” + (# of papers with exact phrase “P < 0.01” and not exact phrase “P < 0.05”) + (# of papers with exact phrase “P < 0.001” and neither exact phrase “P < 0.05” nor “P < 0.01”)

= 1,320,000 + 1,030,000 + 662,500

= 3,012,500

This is obviously an extremely conservative estimate.
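
For readers who want to check the arithmetic, here is a short Python sketch that adds the three Google Scholar counts quoted above. The variable names are ours; the counts are the ones given in the formula.

```python
# Google Scholar hit counts quoted in the formula above
hits_p_05 = 1_320_000       # papers with exact phrase "P < 0.05"
hits_p_01_only = 1_030_000  # "P < 0.01" but not "P < 0.05"
hits_p_001_only = 662_500   # "P < 0.001" but neither of the other two phrases

# The three groups are disjoint by construction, so a plain sum avoids double counting
total_citations = hits_p_05 + hits_p_01_only + hits_p_001_only
print(f"{total_citations:,}")  # prints 3,012,500
```

Excluding the earlier phrases from each later search is what keeps a paper that reports several thresholds from being counted more than once.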