Simply Statistics


Simply Statistics Podcast #4: Interview with Rebecca Nugent

Interview with Rebecca Nugent of Carnegie Mellon University.

In this episode Jeff and I talk with Rebecca Nugent, Associate Teaching Professor in the Department of Statistics at Carnegie Mellon University. We talk with her about her work with the Census and the growing interest in statistics among undergraduates.


Statistics isn't math but statistics can produce math

Mathgen, the web site that produces randomly generated mathematics papers, has apparently gotten a paper accepted in a peer-reviewed journal (although perhaps not the most reputable one). I am not at all surprised this happened, but it’s fun to read both the paper and the reviewer’s comments.

(Thanks to Kasper H. for the pointer.)


Comparing Hospitals

There was a story a few weeks ago on NPR about how Medicare will begin fining hospitals that have 30-day readmission rates that are too high. This process was introduced in the Affordable Care Act and

Under the health care law, the penalties gradually will rise until 3 percent of Medicare payments to hospitals are at risk. Medicare is considering holding hospitals accountable on four more measures: joint replacements, stenting, heart bypass and treatment of stroke.

Those of you taking my computing course on Coursera have already seen some of the data used for this assessment, which can be obtained at the hospital compare web site. It’s also worth noting that the analysis underlying this was a detailed and thoughtful report published by the Committee of Presidents of Statistical Societies (COPSS), which was chaired by Tom Louis, a Professor here at Johns Hopkins.

The report, titled “Statistical Issues in Assessing Hospital Performance”, covers much of the current methodology and its criticisms and makes a number of recommendations. Of particular concern for hospitals is the issue of shrinkage targets: in a hierarchical model, the estimate of the readmission rate for a hospital is shrunk toward the mean. But which mean? Hospitals with higher-risk or sicker patient populations will look quite a bit worse than hospitals sitting amongst a healthy population if they are both compared to the same mean.

The report is worth reading even if you’re just interested in the practical application of hierarchical models. And the web site is fun to explore if you want to know how the hospitals around you are faring.
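To make the shrinkage-target issue concrete, here is a stylized sketch (not Medicare’s actual model; the rates, sample size, and prior strength below are all invented for illustration) of how the same small hospital’s estimate changes depending on which mean it is shrunk toward:

```python
# Hypothetical illustration of the shrinkage-target issue: the same hospital's
# readmission estimate depends on which mean it is shrunk toward.

def shrink(rate, n, target_mean, prior_strength=100):
    """Empirical-Bayes-style shrinkage of an observed rate toward a target mean.
    prior_strength acts as an effective prior sample size (arbitrary here)."""
    w = n / (n + prior_strength)
    return w * rate + (1 - w) * target_mean

observed_rate = 0.25  # small hospital serving a high-risk population
n_patients = 50

national_mean = 0.18         # mean across all hospitals
high_risk_group_mean = 0.24  # mean among hospitals with a similar case mix

est_national = shrink(observed_rate, n_patients, national_mean)
est_group = shrink(observed_rate, n_patients, high_risk_group_mean)

print(round(est_national, 3))  # pulled well below what the case mix suggests
print(round(est_group, 3))     # stays close to the risk-adjusted expectation
```

Shrinking toward the national mean makes the high-risk hospital’s estimate look artificially good here, but the same mechanism can make it look artificially bad relative to peers; either way, the choice of target changes the answer.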


[vimeo 43305640 w=500 h=281]

Johns Hopkins grad Anthony Damico shows how to make coffee with R (except not really). The BLS mug is what makes it for me.


A statistician loves the #insurancepoll…now how do we analyze it?

Amanda Palmer broke Twitter yesterday with her insurance poll. She started off just talking about how hard it is for musicians who rarely have health insurance, but then wandered into polling territory. She sent out a request for people to respond with the following information:

quick twitter poll. 1) COUNTRY?! 2) profession? 3) insured? 4) if not, why not, if so, at what cost per month (or covered by job)?

This quick little poll struck a nerve with people and her Twitter feed blew up. Long story short, tons of interesting information was gathered from folks. This information is usually semi-obscured, particularly the cost of health insurance for people in different places. It isn’t the sort of info that insurance companies necessarily publicize widely, and it isn’t the sort of thing people talk about.

The results were really fascinating and it’s worth reading the above blog post or checking out the hashtag #insurancepoll. But the most fascinating thing for me as a statistician was thinking about how to analyze these data. @aubreyjaubrey is apparently collecting the data someplace; hopefully she’ll make it public.

At least two key issues spring to mind:

  1. This is a massive convenience sample. 
  2. It is being collected through a social network.

Although I’m sure there are more. If a student is looking for an amazingly interesting/rich data set and some seriously hard stats problems, they should get in touch with Aubrey and see if they can make something of it!


Sunday Data/Statistics Link Roundup (10/14/12)

  1. A fascinating article about the debate on whether to regulate sugary beverages. One of the protagonists is David Allison, a statistical geneticist, among other things. It is fascinating to see the interplay of statistical analysis and public policy. Yet another example of how statistics/data will drive some of the most important policy decisions going forward. 
  2. A related article is this one on the way risk is reported in the media. It is becoming more and more clear that to be an educated member of society now means that you absolutely have to have a basic understanding of the concepts of statistics. Both leaders and the general public are responsible for the danger that lies in misinterpreting/misleading with risk. 
  3. A press release from the Census Bureau about how the choice of college major can have a major impact on career earnings. More data breaking the results down by employment characteristics and major are here and here. These data update some of the data we have talked about before in calculating expected salaries by major. (via Scott Z.)
  4. An interesting article about Recorded Future that describes how they are using social media data etc. to try to predict events that will happen. I think this isn’t an entirely crazy idea, but the thing that always strikes me about these sorts of projects is how hard it is to measure success. It is highly unlikely you will ever exactly predict a future event, so how do you define how close you were? For instance, if you predicted an uprising in Egypt, but missed by a month, is that a good or a bad prediction? 
  5. Seriously guys, this is getting embarrassing. An article appears in the New England Journal of Medicine “finding” an association between chocolate consumption and Nobel prize winners. This is, of course, a horrible statistical analysis, and unless it was a joke, it is irresponsible of the NEJM to publish it. I’ll bet any student in Stat 101 could find the huge flaws in this analysis. If the editors of the major scientific journals want to continue publishing statistical papers, they should get serious about statistical editing.
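The Stat 101 flaw is an ecological correlation driven by a lurking variable: countries that are wealthier both consume more chocolate and produce more Nobel laureates. A toy simulation (all numbers invented) shows how a strong correlation appears with no causal link at all:

```python
# Two country-level quantities that are both driven by a lurking variable
# (say, national wealth) correlate strongly even with zero causal effect
# between them. All data here are simulated.
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation, computed from scratch."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(0)
wealth = [random.gauss(0, 1) for _ in range(50)]        # lurking variable
chocolate = [w + random.gauss(0, 0.5) for w in wealth]  # driven by wealth
nobels = [w + random.gauss(0, 0.5) for w in wealth]     # also driven by wealth

r = pearson(chocolate, nobels)
print(round(r, 2))  # a large positive correlation, despite no causal effect
```

Regressing out the lurking variable (or simply comparing within wealth strata) would make the association largely disappear, which is exactly the check the published analysis skipped.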

What's wrong with the predicting h-index paper.

Editor’s Note: I recently posted about a paper in Nature that purported to predict the H-index. The authors contacted me to get my criticisms, then responded to those criticisms. They have requested the opportunity to respond publicly, and I think it is a totally reasonable request. Until there is a better comment generating mechanism at the journal level, this seems like as good a forum as any to discuss statistical papers. I will post an extended version of my criticisms here and give them the opportunity to respond publicly in the comments. 

The paper in question is clearly a clever idea, and the kind that gets people fired up. Quantifying researchers’ output is all the rage, and being able to predict this quantity in the future would obviously make a lot of evaluators happy. I think it was, in that sense, a really good idea to chase down these data, since it was clear that if they found anything at all it would be very widely covered in the scientific and popular press. 

My original post was inspired by my frustration with Nature, which has a history of publishing somewhat suspect statistical papers, such as this one. I posted the prediction contest after reading another paper I consider flawed, for both statistical and scientific reasons. I originally commented on the statistics in my post. The authors, being good sports, contacted me for my criticisms. I sent them the following criticisms, which I think are sufficiently major that a statistical referee or statistical journal would likely have rejected the paper:
  1. Lack of reproducibility. The code/data are not made available either through Nature or on your website. This is a critical component of papers based on computation and has led to serious problems before. It is also easily addressable. 
  2. No training/test set. You mention cross-validation (and maybe the R^2 is the R^2 using the held out scientists?) but if you use the cross-validation step to optimize the model parameters and to estimate the error rate, you could see some major overfitting. 
  3. The R^2 values are pretty low. An R^2 of 0.67 is obviously superior to the h-index alone, but (a) there is concern about overfitting, and (b) even without overfitting, an R^2 that low could lead to substantial variance in predictions. 
  4. The prediction error is not reported in the paper (or in the online calculator). How far off could you be at 5 years, at 10? Would the results still be impressive with those errors reported?
  5. You use model selection and show only the optimal model (as described in the last paragraph of the supplementary), but no indication of the potential difficulties with this model selection are made in the text. 
  6. You use a single regression model without any time variation in the coefficients and without any potential non-linearity. Clearly when predicting several years into the future there will be variation with time and non-linearity. There is also likely heavy variance in the types of individuals/career trajectories, and outliers may be important, etc. 
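The concern in points 2 and 5 can be sketched with a small simulation (a hypothetical illustration, not the authors’ data or analysis): if you select the best of many candidate predictors using the same data you report the fit on, the apparent R^2 is inflated even when every predictor is pure noise.

```python
# Select the best of 200 pure-noise predictors on the training data, then
# compare the apparent fit to the fit on fresh data. Everything is simulated;
# there is no real signal anywhere.
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation, computed from scratch."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(1)
n = 50
y_train = [random.gauss(0, 1) for _ in range(n)]
y_test = [random.gauss(0, 1) for _ in range(n)]

best_r2_train, best_x_test = -1.0, None
for _ in range(200):
    x_train = [random.gauss(0, 1) for _ in range(n)]  # pure-noise predictor
    x_test = [random.gauss(0, 1) for _ in range(n)]
    r2 = pearson(x_train, y_train) ** 2
    if r2 > best_r2_train:
        best_r2_train, best_x_test = r2, x_test

r2_test = pearson(best_x_test, y_test) ** 2
print(round(best_r2_train, 3))  # inflated by the selection step
print(round(r2_test, 3))        # typically far smaller on held-out data
```

This is why a genuinely held-out test set (or nested cross-validation, with selection inside the inner loop) is needed to get an honest error estimate.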
They carefully responded to these criticisms and hopefully they will post their responses in the comments. My impression based on their responses is that the statistics were not as flawed as I originally thought, but that the data aren’t sufficient to form a useful prediction.

However, I think the much bigger flaw is the basic scientific premise. The h-index has been identified as having major flaws and biases (including gender bias), and as being a generally poor summary of a scientist’s contribution. See here, the list of criticisms here, and the discussion here for starters. The authors of the Nature paper propose a highly inaccurate predictor of this deeply flawed index. While that alone is sufficient to call into question the results in the paper, the authors also make bold claims about their prediction tool:

Our formula is particularly useful for funding agencies, peer reviewers and hiring committees who have to deal with vast numbers of applications and can give each only a cursory examination. Statistical techniques have the advantage of returning results instantaneously and in an unbiased way.

Suggesting that this type of prediction should be used to make important decisions on hiring, promotion, and funding is highly scientifically flawed. Coupled with the online calculator the authors handily provide (which produces no measure of uncertainty), it makes it all too easy for people to miss the real value of scientific publications: the science contained in them. 

Why we should continue publishing peer-reviewed papers

Several bloggers are calling for the end of peer-reviewed journals as we know them. Jeff suggests that we replace them with a system in which everyone posts their papers on their blog, PubMed aggregates the feeds, and peer review happens post-publication via, for example, counting up like and dislike votes. In my view, many of these critiques conflate problems from different aspects of the process. Here I try to break down the current system into its key components and defend the one aspect I think we should preserve (at least for now): pre-publication peer review.

To avoid confusion let me start by enumerating some of the components for which I agree change is needed.

  • There is no need to produce paper copies of our publications. Indulging our preference for reading hard copies does not justify keeping the price of disseminating our work twice as high as it should be. 
  • There is no reason to be sending the same manuscript (adapted to fit guidelines) to several journals, until it gets accepted. This frustrating and time-consuming process adds very little value (we previously described Nick Jewell’s solution). 
  • There is no reason for publications to be static. As Jeff and many others suggest, readers should be able to systematically comment on and rate published papers, and authors should be able to update them.

However, all these changes can be implemented without doing away with pre-publication peer-review.

A key reason American and British universities consistently lead the pack of research institutions is their strict adherence to a peer-review system that minimizes cronyism and tolerance for mediocrity. At the center of this system is a promotion process in which outside experts evaluate a candidate’s ability to produce high-quality ideas. Peer-reviewed journal articles are the backbone of this evaluation. When reviewing a candidate I familiarize myself with his or her work by reading 5-10 key papers. It’s true that I read these disregarding the journal, and blog posts would serve the same purpose. But I also use the publication section of the CV, not only because reading all the papers is logistically impossible, but because each has already been evaluated by ~three referees plus an editor, providing an assessment independent of mine. I also use the journal’s prestige: although it is a highly noisy measure of quality, the law of large numbers starts kicking in after 10 papers or so. 
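The law-of-large-numbers point can be made concrete with a toy simulation (the quality and noise scales below are invented): if each paper’s journal placement is an unbiased but noisy read on a candidate’s true quality, the average over a 10-paper CV is far less noisy than any single placement.

```python
# Journal prestige modeled as true quality plus independent noise; averaging
# over ~10 papers cuts the noise roughly by sqrt(10). All scales are invented.
import random
import statistics

random.seed(2)
true_quality = 1.0
noise_sd = 1.0  # a single paper's journal placement is very noisy

# Prestige of one paper, across many simulated candidates.
single = [true_quality + random.gauss(0, noise_sd) for _ in range(5000)]

# Average prestige across a 10-paper CV, across many simulated candidates.
averaged = [
    statistics.mean([true_quality + random.gauss(0, noise_sd) for _ in range(10)])
    for _ in range(5000)
]

print(round(statistics.stdev(single), 2))    # close to noise_sd = 1.0
print(round(statistics.stdev(averaged), 2))  # close to 1.0 / sqrt(10), ~0.32
```

The caveat, of course, is that averaging reduces variance but not systematic bias, which is exactly the distinction drawn in the next paragraph.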

So are three reviewers better than the entire Internet? Can a Reddit-like system provide just as much signal as the current peer-reviewed journal? You can think of the current system as a cooperative in which we all agree to read each other’s papers thoroughly (we evaluate 2-3 for each one we publish), with journals taking care of the logistics. The result of a review is an estimate of quality ranging from highest (Nature, Science) to 0 (not published). This estimate is certainly noisy given the bias and quality variance of referees and editors. But across all the papers on a CV, variance is reduced and bias averages out (I note that we complain vociferously when the bias keeps us from publishing in a good journal, but we rarely say a word when the bias helps us get into a better journal than we deserved). Jeff’s argument is that post-publication review will result in many more evaluations and therefore a stronger signal-to-noise ratio. I need to see evidence of this before being convinced. In the current system, ~three referees commit to thoroughly reviewing the paper. If they do a sloppy job, they will embarrass themselves with an editor or an AE (not a good thing). With a post-publication review system nobody is forced to review. I fear most papers will go without comments or votes, including really good ones. My feeling is that marketing and PR will matter even more than they do now, and that’s not a good thing.

Dissemination of ideas is another important role of the literature. Jeff describes a couple of anecdotes to argue that dissemination can be sped up by just posting papers on your blog:

I posted a quick idea called the Leekasso, which led to some discussion on the blog, has nearly 2,000 page views

But the typical junior investigator does not have a blog with hundreds of followers. Will their papers ever be read if even more papers are added to the already bloated scientific literature? The current peer-review system provides an important filter. There is an inherent trade-off between speed of dissemination and quality, and it’s not clear to me that we should swing the balance all the way over to the speed side. There are other ways to speed up dissemination that we should try first. Also, there is nothing stopping us from posting our papers online before publication and promoting them via Twitter or an aggregator. In fact, as pointed out by Jan Jensen on Jeff’s post, arXiv papers are indexed on Google Scholar within a week, which also keeps track of arXiv citations.

The Internet is bringing many changes that will improve our peer-review system. But the current pre-publication peer-review process does a decent job of  

  1. providing signal for the promotion process and
  2. reducing noise in the literature to make dissemination possible. 

Any alternative systems should be evaluated carefully before dismantling a system that has helped keep our Universities at the top of the world rankings.


Sunday Data/Statistics Link Roundup (10/7/12)

  1. Jack Welch got a little conspiracy-theory crazy with the job numbers. Thomas Lumley over at StatsChat makes a pretty good case for debunking the theory. I think the real take home message of Thomas’ post and one worth celebrating/highlighting is that agencies that produce the jobs report do so based on a fixed and well-defined study design. Careful efforts by government statistics agencies make it hard to fudge/change the numbers. This is an underrated and hugely important component of a well-run democracy. 
  2. On a similar note, Dan Gardner at the Ottawa Citizen points out that evidence-based policy making is actually not enough. He points out the critical problem with evidence: in the era of data, what is a fact? “Facts” can come from flawed or biased studies just as easily as from strong ones. He suggests that a true “evidence based” administration would invest more money in research/statistical agencies. I think this is a great idea. 
  3. An interesting article by Ben Bernanke suggesting that an optimal approach (in baseball and in policy) is one based on statistical analysis, coupled with careful thinking about long-term versus short-term strategy. I think one of his arguments, about letting players play even when they are struggling short term, is actually a case for letting the weak law of large numbers play out. If you have a player with skill/talent, they will eventually converge to their “true” numbers. It’s also good for their confidence… (via David Santiago).
  4. Here is another interesting peer review dust-up. It explains why some journals “reject” papers when they really mean major/minor revision to be able to push down their review times. I think this highlights yet another problem with pre-publication peer review. The evidence is mounting, but I hear we may get a defense of the current system from one of the editors of this blog, so stay tuned…
  5. Several people (Sherri R., Alex N., many folks on Twitter) have pointed me to this article about gender bias in science. I initially was a bit skeptical of such a strong effect across a broad range of demographic variables. After reading the supplemental material carefully, it is clear I was wrong. It is a very well designed/executed study and suggests that there is still a strong gender bias in science, across ages and disciplines. Interestingly both men and women were biased against the female candidates. This is clearly a non-trivial problem to solve and needs a lot more work, maybe one step is to make recruitment packages more flexible (see the comment by Allison T. especially).