Marie Curie says stop hating on quilt plots already.

"There are sadistic scientists who hurry to hunt down error instead of establishing the truth." -Marie Curie (http://en.wikiquote.org/wiki/Marie_Curie)

Thanks to Kasper H. for that quote. I think it is a perfect fit for today's culture of treating the academic put-down as an academic contribution. One perfect example is the explosion of hate against the quilt plot. A quilt plot is a heatmap with several parameters selected in advance; that's it. This simplification of R's heatmap function appeared in the journal PLoS One. They say (though not up front and not clearly enough for my personal taste) that they know it is just a heatmap.
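
To make "a heatmap with several parameters selected in advance" concrete, here is a minimal sketch of that kind of wrapper around base R's heatmap function. It is not the code from the paper: the name simple_heatmap is made up for illustration, and the preset arguments are only my assumption about what most people want (no clustering, no row scaling).

```r
# Hypothetical wrapper (NOT the paper's quilt function): base R heatmap()
# with a few arguments fixed in advance.
simple_heatmap <- function(m, ...) {
  stopifnot(is.matrix(m), is.numeric(m))
  heatmap(
    m,
    Rowv = NA,       # no row dendrogram, rows keep their original order
    Colv = NA,       # no column dendrogram, columns keep their original order
    scale = "none",  # colour the raw values instead of row-scaled values
    ...              # anything else is passed straight through to heatmap()
  )
}

# Usage on a small table of proportions
set.seed(12345)
pt <- prop.table(matrix(rbinom(25, size = 4, prob = 0.5), nrow = 5))
simple_heatmap(pt)
```

Unlike the function in the paper, this sketch does not add a colour legend; it only shows how little is involved in fixing a few defaults.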

Over the course of the next several weeks quilt plots went viral on Twitter. They were also widely trashed on people's blogs and even in The Scientist. So I did an experiment. I built a table of frequencies in R like this and applied the heatmap function in R, then the quilt.plot function in the fields package, then the function written by the authors of the paper, with as little tweaking as possible.

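set.seed(12345)
library(fields)  # provides quilt.plot
# 5x5 table of counts, converted to proportions of the grand total
x = matrix(rbinom(25,size=4,prob=0.5),nrow=5)
pt = prop.table(x)
# base R heatmap with default arguments
heatmap(pt)
# quilt.plot from fields: x is the row index and y the column index of each cell of pt
quilt.plot(x=rep(1:5,times=5),y=rep(1:5,each=5),z=c(pt))
# quilt is the function from the paper, retyped by hand; it must be defined before this call
quilt(pt,1:5,1:5,zlabel="Proportion")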

Here are the results:

[Figures: heatmap, quilt.plot, quilt]

It is clear that out of the box and with no tinkering, the new plot makes something nicer/more interpretable. The columns/rows are where I expect and the scale is there and nicely labeled. Everyone who has ever made heatmaps in R has some bit of code that looks like this:


image(t(bdat)[,nrow(bdat):1],col=colsb(9),breaks=quantile(as.vector(as.matrix(dat)),probs=seq(0,1,length=10)),xaxt="n",yaxt="n",xlab="",ylab="")

just to hack together a heatmap in R that looks the way you expect. It is a total pain. Obviously the quilt plot paper has a few flaws:

  1. It tries to introduce the quilt plot as a new idea.
  2. It doesn't just come out and say it is a hack of the heatmap function, but tries to dance around it.
  3. It produces code, but only as images in word files. I had to retype the code to make my plot.

That being said here are a couple of other true things about the paper:

  1. The code works if you type it out and apply it.
  2. They produced code.
  3. The paper is open access.
  4. The paper is correct technically.
  5. The hack is useful for users with few R skills.

So why exactly isn't it a paper? It smacks of academic elitism to claim that this isn't good enough because it isn't a "new idea". Not every paper discovers radium. Some papers are better than others and that is ok. I don't think the quilt plot being published is a problem. Maybe I don't like the way it is written exactly, but they do acknowledge the heat map, they do produce correct, relevant code, and it does solve a problem people actually have. That is better than a lot of papers that appear in more prestigious journals. Arsenic life, anyone?

I think it is useful to have a forum where people can post correct, useful, but not necessarily groundbreaking results and get credit for them, even if the credit is modest. Otherwise we might miss out on useful bits of code. Frank Harrell has a bunch of functions that tons of people use, but he doesn't get citations for them; you have probably heard of the Hmisc package if you use R.

But did you know Karl Broman has a bunch of really useful functions in his personal R package? qqline2 is great. I know Rafa has a bunch of functions he has never published because they seem "too trivial", but I use them all the time. Every scientist who touches code has a personal library like this. I'm not saying the quilt plot is in that category. But I am saying that it is stupid not to have a public forum for making these functions available to other scientists. And that won't happen if the "quilt plot backlash" is what people see when they try to get published credit for simple code that solves real problems.

Hacks like the quilt plot can help people who aren't comfortable with R write reproducible scripts without having to figure out every plotting parameter. Keeping in mind that the vast majority of data analysis is not done by statisticians, it seems like these little hacks are an important part of science. If you believe in figshare, github, open science, and shareable code, you shouldn't be making fun of the quilt plotters.

Marie Curie says so.

  • klmr

    I don’t think this is a fair comparison. The code is using heatmap with default arguments but supplies further arguments to quilt – wasn’t the whole point that this is unnecessary? Supplying just a few additional arguments to heatmap produces a very similar plot (modulo the legend). In fact, heatmap.2(pt, Rowv=NA, Colv=NA, trace='none') does pretty much the same as the quilt call – with the same number of arguments.

    In summary, I don’t agree with the “out of the box, with no tinkering” assessment at all. Even more so as “out of the box” the quilt plot code doesn’t work at all: R complains about an invalid file format when I try to source the Word document. Facetious? I don’t think so. This paper doesn’t “solve” a problem in any real sense – it’s certainly not easier or faster to type out all the code from the paper than it is to google for an appropriate solution.

    “The paper is correct technically” only if we ignore numerous small inaccuracies which e.g. Neil Saunders has listed.

    The gist of this blog post seems to be that we should be less negative (agreed) and that a publication doesn’t have to be amazing to be valid and worthwhile (agreed). But the quilt plot publication wasn’t a-okay. It was truly terrible, because it obviously operated on a complete lack of understanding of the field – of what a heatmap is. Publishing is hard and cumbersome, mainly because there’s a justified desire to maintain as high a quality of research as possible.

    Seeing a paper just ignore all these hurdles and cheat its way through is understandably galling. The standard of publishing was completely subverted here. The arsenic life paper was terrible for the same reason – it failed quality control and thus consisted of entirely spurious results.

    The list of benefits that you attribute to this paper reads a bit like a validation of cargo cult: the authors attempted to do all the right things (albeit without understanding them) and produced an approximation of the real thing. Which is nevertheless not the real thing.

  • verse

    These have existed for over 5 years. Check out the levelplot function from the lattice package. So in essence, there truly is nothing novel about their work.

  • David J. Harris

    If the quilt plot paper had been presented as a quick little hack for getting a heat map with fewer lines of code, that would be one thing (although PLOS One doesn't really seem like the correct forum for it).

    But the paper was *deeply* confused. The authors seemed not to know that anyone else had ever suggested plotting heatmaps without dendrograms attached. The editorial process failed on that front.

  • leslie

    Providing useful code is a worthy contribution, and it would not have been wrong of PLOS ONE to accept a software release type article. But in presenting this as methodology, not software, the authors are attaching their byline to the heatmap. I agree with the haters: how did THIS paper get published with these statements in the abstract, "We propose a simple tool for visualization of data, known as a “quilt plot”, that provides an alternative to presenting large volumes of data as frequency tables." and in the conclusion, "We recommend that, where possible, “quilt plots” be used along with traditional quantitative assessments of the data as an explanatory data analysis tool."?

  • http://nsaunders.wordpress.com neilfws

    I saw very little, if any, discussion of quilt plots that I would characterise as "hating" or making fun. Twitter's 140 character limit leaves little room for subtlety or reflection, so criticism is often harsh and fierce, but I would not mistake that for hate.

    The issue here is not with the authors or even their work, but with what level of contribution merits publication in a scholarly journal. I think almost everyone agrees that the answer is "substantially more than this one." There's nothing elitist about that.

    As others have said: had they posted the code to e.g. Github and written a blog post, there would be no problem.

  • John C. Earls

    I don't get the hate here. Looks like a nice, simple heatmap tool that was published in PLOS ONE. My understanding is that PLOS ONE is the place to put things when novelty and impact are not being considered. I think they should have put the software up on CRAN, but I looked through the paper and saw no outlandish claims.

    Jeff mentioned all of the little tools he and other statisticians have built that might have some utility to others, but instead sit in some folder on their hard drive. It would be great if PLOS ONE became a place where those sorts of tools could be shared with other researchers, especially if that could be done without 4 months of back and forth with editors.

    To me, this paper simply says here is an easy way to make heatmaps in R. Enjoy. The field could use more "papers" like that.