Tag: Citations

04
Jan

Does NIH fund innovative work? Does Nature care about publishing accurate articles?

Editor's Note: In a recent post we disagreed with a Nature article claiming that NIH doesn't support innovation. Our colleague Steven Salzberg actually looked at the data and wrote the guest post below. 

Nature published an article last month with the provocative title "Research grants: Conform and be funded."  The authors looked at papers with over 1000 citations to find out whether scientists "who do the most influential scientific work get funded by the NIH."  Their dramatic conclusion, widely reported, was that only 40% of such influential scientists get funding.

Dramatic, but wrong.  I re-analyzed the authors' data and wrote a letter to Nature, which was published today along with the authors' response, which more or less ignored my points.  Unfortunately, Nature cut my already-short letter in half, so what readers see in the journal omits half my argument.  My entire letter is published here, thanks to my colleagues at Simply Statistics.  I titled it "NIH funds the overwhelming majority of highly influential original science results," because that's what the original study should have concluded from its very own data.  Here goes:

To the Editors:

In their recent commentary, "Conform and be funded," Joshua Nicholson and John Ioannidis claim that "too many US authors of the most innovative and influential papers in the life sciences do not receive NIH funding."  They support their thesis with an analysis of 200 papers sampled from 700 life science papers with over 1,000 citations.  Their main finding was that only 40% of "primary authors" on these papers are PIs on NIH grants, from which they argue that the peer review system "encourage[s] conformity if not mediocrity."

While this makes for an appealing headline, the authors' own data does not support their conclusion.  I downloaded the full text for a random sample of 125 of the 700 highly cited papers [data available upon request].  A majority of these papers were either reviews (63), which do not report original findings, or not in the life sciences (17) despite being included in the authors' database.  For the remaining 45 papers, I looked at each paper to see if the work was supported by NIH.  In a few cases where the paper did not include this information, I used the NIH grants database to determine if the corresponding author has current NIH support.  34 out of 45 (75%) of these highly-cited papers were supported by NIH.  The 11 papers not supported included papers published by other branches of the U.S. government, including the CDC and the U.S. Army, for which NIH support would not be appropriate.  Thus, using the authors' own data, one would have to conclude that NIH has supported a large majority of highly influential life sciences discoveries in the past twelve years.

The authors – and the editors at Nature, who contributed to the article – suffer from the same biases that Ioannidis himself has often criticized.  Their inclusion of inappropriate articles and especially the choice to require that both the first and last author be PIs on an NIH grant, even when the first author was a student, produced an artificially low number that misrepresents the degree to which NIH supports innovative original research.

It seems pretty clear that Nature wanted a headline about how NIH doesn't support innovation, and Ioannidis was happy to give it to them.  Now, I'd love it if NIH had the funds to support more scientists, and I'd also be in favor of funding at least some work retrospectively - based on recent major achievements, for example, rather than proposed future work.  But the evidence doesn't support the "Conform and be funded" headline, however much Nature might want it to be true.

30
Dec

Sunday data/statistics link roundup (12/30/12)

  1. An interesting new app called 100plus, which looks like it uses public data to help determine how small decisions (walking more, one more glass of wine, etc.) lead to better or worse health. Here's a post describing it on the healthdata.gov blog. As far as I can tell, the app is still in beta, so only the folks who have a code can download it.
  2. Data on mass shootings from the Mother Jones investigation.
  3. A post by Hilary M. on "Getting Started with Data Science". I really like the suggestion of just picking a project and doing something, getting it out there. One thing I'd add to the list is that I would spend a little time learning about an area you are interested in. With all the free data out there, it is easy to just "do something", without putting in the requisite work to know why what you are doing is good/bad. So when you are doing something, make sure you take the time to "know something".
  4. An analysis of various measures of citation impact (also via Hilary M.). I'm not sure I follow the reasoning behind all of the analyses performed (seems a little like throwing everything at the problem and hoping something sticks), but one interesting point is how citation/usage are far apart from each other on the PCA plot. This is likely just because the measures cluster into two big categories, but it makes me wonder: is it better to have a lot of people read your paper (broad impact) or cite your paper (deep impact)?
  5. An interesting conversation on Twitter about how big data does not mean you can ignore the scientific method. We have talked a little bit about this before, in terms of how one should motivate statistical projects.
08
Mar

A plot of my citations in Google Scholar vs. Web of Science

There has been some discussion about whether citation counts from Google Scholar or from one of the proprietary databases like Web of Science are better. I personally think Google Scholar is better for a number of reasons:

  1. Higher numbers, but consistently/adjustably higher :-)
  2. It’s free and the data are openly available. 
  3. It covers more ground (patents, theses, etc.) to give a better idea of global impact
  4. It’s easier to use

I haven’t seen a plot yet relating Web of Science citations to Google Scholar citations, so I made one for my papers.

GS has about 41% more citations per paper than Web of Science. That is consistent with what other people have found. It also looks reasonably linearish. I wonder what other people’s plots would look like? 

Here is the R code I used to generate the plot (the names are Pubmed IDs for the papers):

library(ggplot2)

# Pubmed IDs for the papers
names = c(16141318,16357033,16928955,17597765,17907809,19033188,19276151,19924215,20560929,20701754,20838408,21186247,21685414,21747377,21931541,22031444,22087737,22096506,22257669)

# y = Google Scholar citation counts, x = Web of Science citation counts
y = c(287,122,84,39,120,53,4,52,6,33,57,0,0,4,1,5,0,2,0)

x = c(200,92,48,31,79,29,4,51,2,18,44,0,0,1,0,2,0,1,0)

Year = c(2005,2006,2007,2007,2007,2008,2009,2009,2011,2010,2010,2011,2012,2011,2011,2011,2011,2011,2012)

# Scatterplot of the two counts, with the identity line y = x for reference
q <- qplot(x, y, xlim=c(-20,300), ylim=c(-20,300), xlab="Web of Knowledge", ylab="Google Scholar") + geom_point(aes(colour=Year), size=5) + geom_abline(intercept=0, slope=1, size=2)
print(q)
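The "about 41% more" figure can be sanity-checked from the same vectors. Here is a minimal sketch using the ratio of total citations; the post doesn't say exactly how the per-paper figure was averaged, and this aggregate version comes out a bit higher, around 1.44:

```r
# Citation counts copied from the post above:
# y = Google Scholar, x = Web of Science
y <- c(287,122,84,39,120,53,4,52,6,33,57,0,0,4,1,5,0,2,0)
x <- c(200,92,48,31,79,29,4,51,2,18,44,0,0,1,0,2,0,1,0)

# Ratio of total citations across all 19 papers.
# Averaging per-paper ratios or fitting a regression slope would give
# somewhat different numbers; the post doesn't specify which was used.
sum(y) / sum(x)
```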

07
Mar

R.A. Fisher is the most influential scientist ever

You can now see profiles of famous scientists on Google Scholar citations. Here are links to a few of them (via Ben L.). Von Neumann, Einstein, Newton, Feynman

But their impact on science pales in comparison (with the possible exception of Newton) to the impact of one statistician: R.A. Fisher. Many of the concepts he developed are so common and considered so standard that he is never cited/credited. Here are some examples of things he invented, along with a conservative number of citations they would have received, calculated via Google Scholar*.

  1. P-values - 3 million citations
  2. Analysis of variance (ANOVA) - 1.57 million citations
  3. Maximum likelihood estimation - 1.54 million citations
  4. Fisher’s linear discriminant - 62,400 citations
  5. Randomization/permutation tests - 37,940 citations
  6. Genetic linkage analysis - 298,000 citations
  7. Fisher information - 57,000 citations
  8. Fisher’s exact test - 237,000 citations

A couple of notes:

  1. These are seriously conservative estimates, since I only searched for a few variants on some key words
  2. These numbers are BIG; there isn’t another scientist in the ballpark. The guy who wrote the “most highly cited paper” got 228,441 citations on GS. His next most cited paper? 3,000 citations. Fisher has at least 5 entries on the list above with more citations than that most cited paper.
  3. This page says Bert Vogelstein has the most citations of any person over the last 30 years. If you add up the number of citations to his top 8 papers on GS, you get 57,418. About as many as to the Fisher information matrix. 

I think this really speaks to a couple of things. One is that Fisher invented some of the most critical concepts in statistics. The other is the breadth of impact of statistical ideas across a range of disciplines. In any case, I would be hard pressed to think of another scientist who has influenced a greater range or depth of scientists with their work. 

* Calculations of citation counts

  1. As described in a previous post
  2. # of GS results for “Analysis of Variance” + # for “ANOVA” - “Analysis of Variance”
  3. # of GS results for “maximum likelihood”
  4. # of GS results for “linear discriminant”
  5. # of GS results for “permutation test” + # for “permutation tests” - “permutation test”
  6. # of GS results for “linkage analysis”
  7. # of GS results for “fisher information” + # for “information matrix” - “fisher information”
  8. # of GS results for “fisher’s exact test” + # for “fisher exact test” - “fisher’s exact test”