Statistical illiteracy may lead to parents panicking about Autism.

I was just doing my morning reading of a few news sources and stumbled across this Huffington Post article talking about research correlating babies’ cries to autism. It suggests that the sound of a baby’s cries may predict his or her future risk for autism. As the parent of a young son, this obviously caught my attention in a very lizard-brain, caveman sort of way. I couldn’t find a link to the research paper in the article, so I did some searching and found that this result is also being covered by Time, Science Daily, Medical Daily, and a bunch of other news outlets.

The pebbles of academia

I have just been awarded a certificate for successful completion of the Conflict of Interest Commitment training (I barely passed). Lately, I have been totally swamped by administrative duties and have had little time for actual research. The experience reminded me of something I read in this NYTimes article by Tyler Cowen. In it, Michael Mandel, an economist with the Progressive Policy Institute, compares government regulation of innovation to the accumulation of pebbles in a stream.

Sunday Data/Statistics Link Roundup (7/22/12)

This is the paper in which Uri Simonsohn describes how he identified academic misconduct using statistical analyses. The approach has received a huge amount of press in the scientific community. The basic idea is that he calculates how variable the reported means and standard deviations are across the groups being compared. He then simulates data from a Normal distribution and shows that, under the Normal model, it is unlikely for the means and standard deviations to be as similar as the ones reported.
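To make the logic concrete, here is a minimal sketch of that style of check, not Simonsohn's actual procedure. The reported standard deviations, group sizes, and simulation settings below are all made up for illustration: we compare the spread of the reported group SDs to the spread we would see under an honest Normal model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reported summary statistics: standard deviations from
# three conditions (n = 15 each) that look suspiciously alike.
reported_sds = np.array([25.11, 25.09, 25.14])
n_per_group = 15
pooled_sd = reported_sds.mean()

# Observed test statistic: how much do the reported SDs vary across groups?
observed_spread = reported_sds.std(ddof=1)

# Simulate the same experiment many times under an honest Normal model
# and record the spread of the group SDs each time.
n_sim = 10_000
sim_spreads = np.empty(n_sim)
for i in range(n_sim):
    sim_sds = [rng.normal(0, pooled_sd, n_per_group).std(ddof=1)
               for _ in range(len(reported_sds))]
    sim_spreads[i] = np.std(sim_sds, ddof=1)

# Fraction of honest replications whose SDs agree at least as closely as
# the reported ones; a tiny value flags implausibly similar summaries.
p_value = (sim_spreads <= observed_spread).mean()
print(f"P(spread <= observed | Normal model) = {p_value:.4f}")
```

With honest data, sample standard deviations bounce around quite a bit from group to group, so reported SDs that agree to two decimal places across groups produce a very small probability under the simulation.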

My worst (recent) experience with peer review

My colleagues and I just published a paper on validation of genomic results in BMC Bioinformatics. It is “highly accessed” and we are really happy with how it turned out. But it was brutal getting it published. Here is the line-up of places I sent the paper.  Science: Submitted 10/6/10, rejected 10/18/10 without review. I know this seems like a long shot, but this paper on validation was published in Science not too long after.

What is a major revision?

I posted a little while ago on a proposal for a fast statistics journal. It generated a bunch of comments and even a really nice follow up post with some great ideas. Since then I’ve gotten reviews back on a couple of papers and I think I realized one of the key issues that is driving me nuts about the current publishing model. It boils down to one simple question: What is a major revision?

Sunday data/statistics link roundup (3/18)

A really interesting proposal by Rafa (in Spanish - we’ll get on him to write a translation) for the University of Puerto Rico. The post concerns changing the focus from simply teaching to creating knowledge and the potential benefits to both the university and to Puerto Rico. It also has a really nice summary of the benefits that the university system in the United States has produced. Definitely worth a read.

An example of how sending a paper to a statistics journal can get you scooped

In a previous post I complained about statistics journals taking way too long to reject papers. Today I am complaining because even when everything goes right (a better-than-average review time for statistics, useful and insightful comments from reviewers) we can still come out losing. In May 2011 we submitted a paper on removing GC bias from RNAseq data to Biostatistics. It was published on December 27. However, we were scooped by this BMC Bioinformatics paper published ten days earlier despite being submitted three months later and accepted 11 days after ours.

Where do you get your data?

Here’s a question I get fairly frequently from various types of people: Where do you get your data? This is sometimes followed up quickly with “Can we use some of your data?” My contention is that if someone asks you these questions, start looking for the exits. There are of course legitimate reasons why someone might ask you this question. For example, they might be interested in the source of the data to verify its quality.

P-values and hypothesis testing get a bad rap - but we sometimes find them useful.

This post was written by Jeff Leek and Rafa Irizarry. The p-value is the most widely known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he were cited every time a p-value was reported, his paper would have, at the very least, 3 million citations* - making it the most highly cited paper of all time.
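For readers who have only ever seen p-values come out of software, the definition is simple: the probability, under a null model, of data at least as extreme as what was observed. A minimal illustration with entirely made-up measurements, using a two-sided permutation test on the difference in group means:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up measurements from two groups (purely illustrative).
treatment = np.array([5.1, 6.2, 5.8, 6.5, 5.9, 6.1])
control = np.array([4.8, 5.0, 5.3, 4.9, 5.2, 5.1])
observed_diff = treatment.mean() - control.mean()

# Under the null hypothesis the group labels are exchangeable, so we ask:
# how often does a random relabeling produce a difference at least as
# extreme as the one we observed?
pooled = np.concatenate([treatment, control])
n_perm = 20_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:len(treatment)].mean() - pooled[len(treatment):].mean()
    if abs(diff) >= abs(observed_diff):
        count += 1
p_value = count / n_perm
print(f"two-sided permutation p-value = {p_value:.4f}")
```

A small p-value here says only that the observed separation between groups would rarely arise from relabeling alone; it says nothing about the size or importance of the effect, which is part of why p-values get the bad rap discussed above.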

Dear editors/associate editors/referees, Please reject my papers quickly

The review times for most journals in our field are ridiculous. Check out Figure 1 here. A careful review takes time, but not six months. Let’s be honest: those papers are sitting on desks for the great majority of those six months. But here is what really kills me: waiting six months only to get a review basically saying the paper is not of sufficient interest to the readership of the journal. That decision you can come to in half a day.

Reverse scooping

I would like to define a new term: reverse scooping is when someone publishes your idea after you, and doesn’t cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most of the time I don’t think it is malicious, though. In fact, I almost reverse scooped a colleague recently.

Submitting scientific papers is too time consuming

As an academic who does a lot of research for a living, I spend a lot of my time writing and submitting papers. Before my time, this process involved sending multiple physical copies of a paper by snail mail to the editorial office. New technology has changed this process. Now to submit a paper you generally have to: (1) find a Microsoft Word or LaTeX template for the journal and use it for your paper and (2) upload the manuscript and figures (usually separately).

25 minute seminars

Most Statistics and Biostatistics departments have weekly seminars. We usually invite outside speakers to share their knowledge via a 50-minute PowerPoint (or beamer) presentation. This gives us the opportunity to meet colleagues from other universities and pick their brains in small group meetings. This is all great. But giving a good one-hour seminar is hard. Really hard. Few people can pull it off. I propose to the statistical community that we cut seminars to 25 minutes, with 35 minutes for questions and further discussion.


In this TED talk Jason Fried explains why work doesn’t happen at work. He describes the evils of meetings. Meetings are particularly disruptive for applied statisticians, especially for those of us who hack data files, explore data for systematic errors, get inspiration from visual inspection, and thoroughly test our code. Why? Before I become productive I go through a ramp-up/boot-up stage. Scripts need to be found, data loaded into memory, and, most importantly, my brain needs to re-familiarize itself with the data and the essence of the problem at hand.

Where are the Case Studies?

Many case studies I find interesting don’t appear in JASA Applications and Case Studies or other applied statistics journals for that matter. Some because the technical skill needed to satisfy reviewers is not sufficiently impressive, others because they lack mathematical rigor. But perhaps the main reason for this disconnect is that many interesting case studies are developed by people outside our field or outside academia. In this blog we will try to introduce readers to some of these case studies.