Author Archives: Jeff Leek

Picking a (bio)statistics thesis topic for real world impact and transferable skills

One of the things that was hardest for me in graduate school was starting to think about my own research projects and not just the ideas my advisor fed me. I remember that it was stressful because I didn't quite … Continue reading

Posted in Uncategorized

The #rOpenSci hackathon #ropenhack

Editor's note: This is a guest post by Alyssa Frazee, a graduate student in the Biostatistics department at Johns Hopkins and a participant in the recent rOpenSci hackathon.  Last week, I took a break from my normal PhD student schedule … Continue reading

Posted in Uncategorized

A non-comprehensive comparison of prominent data science programs on cost and frequency.

We did a really brief comparison of a few notable data science programs for a grant submission we were working on. I thought it was pretty fascinating, so I'm posting it here. A couple of notes about the table. 1. Our … Continue reading

Posted in Uncategorized | 9 Comments

The 80/20 rule of statistical methods development

Developing statistical methods is hard and often frustrating work. One of the underappreciated rules in statistical methods development is what I call the 80/20 rule (maybe it could even be the 90/10 rule). The basic idea is that the first … Continue reading

Posted in Uncategorized | 1 Comment

The time traveler's challenge.

Editor's note: This has nothing to do with statistics.  I do a lot of statistics for a living and would claim to know a relatively large amount about it. I also know a little bit about a bunch of other scientific … Continue reading

Posted in Uncategorized | 27 Comments

Oh no, the Leekasso....

An astute reader (Niels Hansen, who is visiting our department today) caught a bug in my code on Github for the Leekasso. I had: lm1 = lm(y ~ leekX) predict.lm(lm1, Unfortunately, this meant that I was getting predictions for the … Continue reading

Posted in Uncategorized | 9 Comments

PLoS One, I have an idea for what to do with all your profits: buy hard drives

I've been closely following the fallout from PLoS One's new policy for data sharing. The policy says, basically, that if you publish a paper, all data and code to go with that paper should be made publicly available at the … Continue reading

Posted in Uncategorized | 7 Comments

Repost: Ronald Fisher is one of the few scientists with a legit claim to most influential scientist ever

Editor's Note: This is a repost of the post "R.A. Fisher is the most influential scientist ever" with a picture of my pilgrimage to his gravesite in Adelaide, Australia. You can now see profiles of famous scientists on Google Scholar citations. … Continue reading

Posted in Uncategorized | 6 Comments

On the scalability of statistical procedures: why the p-value bashers just don't get it.

Executive Summary: The problem is not p-values; it is a fundamental shortage of data analytic skill. In general it makes sense to reduce researcher degrees of freedom for non-experts, but any choice of statistic, when used by many untrained people, … Continue reading

Posted in Uncategorized | 19 Comments

Monday data/statistics link roundup (2/10/14)

I'm going to try Mondays for the links. Let me know what you think. The Guardian is reading our blog. A week after Rafa posts that everyone should learn to code for career preparedness, the Guardian gets on the bandwagon. … Continue reading

Posted in Uncategorized | 1 Comment