The #rOpenSci hackathon #ropenhack

Editor's note: This is a guest post by Alyssa Frazee, a graduate student in the Biostatistics department at Johns Hopkins and a participant in the recent rOpenSci hackathon.  Last week, I took a break from my normal PhD student schedule

A non-comprehensive comparison of prominent data science programs on cost and frequency.

We did a really brief comparison of a few notable data science programs for a grant submission we were working on. I thought it was pretty fascinating, so I'm posting it here. A couple of notes about the table. 1. Our

The 80/20 rule of statistical methods development

Developing statistical methods is hard and often frustrating work. One of the underappreciated rules in statistical methods development is what I call the 80/20 rule (maybe it could even be the 90/10 rule). The basic idea is that the first

The time traveler's challenge.

Editor's note: This has nothing to do with statistics.  I do a lot of statistics for a living and would claim to know a relatively large amount about it. I also know a little bit about a bunch of other scientific

Oh no, the Leekasso....

An astute reader (Niels Hansen, who is visiting our department today) caught a bug in my code on GitHub for the Leekasso. I had: lm1 = lm(y ~ leekX) predict.lm(lm1, Unfortunately, this meant that I was getting predictions for the

PLoS One, I have an idea for what to do with all your profits: buy hard drives

I've been closely following the fallout from PLoS One's new policy for data sharing. The policy says, basically, that if you publish a paper, all data and code to go with that paper should be made publicly available at the

Repost: Ronald Fisher is one of the few scientists with a legit claim to most influential scientist ever

Editor's Note: This is a repost of the post "R.A. Fisher is the most influential scientist ever" with a picture of my pilgrimage to his gravesite in Adelaide, Australia. You can now see profiles of famous scientists on Google Scholar citations.

On the scalability of statistical procedures: why the p-value bashers just don't get it.

Executive Summary: The problem is not p-values; it is a fundamental shortage of data analytic skill. In general it makes sense to reduce researcher degrees of freedom for non-experts, but any choice of statistic, when used by many untrained people,

Monday data/statistics link roundup (2/10/14)

I'm going to try Mondays for the links. Let me know what you think. The Guardian is reading our blog. A week after Rafa posts that everyone should learn to code for career preparedness, the Guardian gets on the bandwagon.

Just a thought on peer reviewing - I can't help myself.

Today I was thinking about reviewing, probably because I was handling a couple of papers as AE and doing tasks associated with reviewing several other papers. I know that this is idle thinking, but suppose peer review was just a

