Simply Statistics


25 minute seminars

Most Statistics and Biostatistics departments have weekly seminars. We usually invite outside speakers to share their knowledge via a 50-minute PowerPoint (or Beamer) presentation. This gives us the opportunity to meet colleagues from other universities and pick their brains in small group meetings. This is all great. But giving a good one-hour seminar is hard. Really hard. Few people can pull it off. I propose to the statistical community that we cut the seminars to 25 minutes, with 35 minutes for questions and further discussion. We can make exceptions, of course. But in general, I think we would all benefit from shorter seminars.


How do you spend your day?

I’ve seen visualizations of how people spend their time in a couple of places. Here is a good one over at Flowing Data.


Getting email responses from busy people

I’ve had the good fortune of working with some really smart and successful people during my career. When you are young, one problem with working with really successful people is that they get a ton of email. Some only see the subject lines on their phone before deleting them.

I’ve picked up a few tricks for getting email responses from important/successful people:  

The SI Rules

  1. Try to send no more than one email a day. 
  2. Emails should be 3 sentences or fewer. Better if you can get the whole email in the subject line. 
  3. If you need information, ask yes or no questions whenever possible. Never ask a question that requires a full sentence response.
  4. When something is time sensitive, state the action you will take if you don’t get a response by a time you specify. 
  5. Be as specific as you can while conforming to the length requirements. 
  6. Bonus: include obvious keywords people can use to search for your email. 

Anecdotally, SI emails have a 10-fold higher response probability. The rules are designed around the fact that busy people who get lots of email love checking things off their list. SI emails are easy to check off! That will make them happy and get you a response. 

It takes more work on your end when writing an SI email. You often need to think more carefully about what to ask, how to phrase it succinctly, and how to minimize the number of emails you write. A surprising side effect of applying SI principles is that I often figure out answers to my questions on my own. I have to decide which questions to include in my SI emails and they have to be yes/no answers, so I end up taking care of simple questions on my own. 

Here are examples of SI emails just to get you started: 

Example 1

Subject: Is my response to reviewer 2 ok with you?

Body: I’ve attached the paper/responses to referees.

Example 2

Subject: Can you send my letter of recommendation to


Keywords = recommendation, Jeff, John Doe.

Example 3

Subject: I revised the draft to include your suggestions about simulations and language

Revisions attached. Let me know if you have any problems, otherwise I’ll submit Monday at 2pm. 


Dongle communism

If you have a Mac and give talks or teach, chances are you have embarrassed yourself by forgetting your dongle. Our lab meetings and classes were constantly delayed due to missing dongles. Communism solved this problem. We bought 10 dongles, sprinkled them around the department, and declared all dongles public property. All dongles, not just the 10. No longer do we have to ask to borrow dongles because they have no owner. Please join the revolution. P.S. I think this should apply to pens too!


The Killer App for Peer Review

A little while ago, over at Genomes Unzipped, Joe Pickrell asked, “Why publish science in peer reviewed journals?” He points out the flaws with the current peer review system and suggests how we can do better. What he suggests is missing is the killer app for peer review. 

Well, PLoS has now developed an API where you can easily access tons of data on the papers published in those journals, including downloads, citations, number of social bookmarks, and mentions in major science blogs. Along with Mendeley, a free reference manager, they have launched a competition to build cool apps with their free data.

It seems like, with the right statistical analysis and some cool features, a recommender system for, say, PLoS One could have most of the features suggested by Joe in his article. One idea would be an RSS feed based on an idea like the Pandora music sharing service. You input a couple of papers you like from the journal, then it creates an RSS feed with papers similar to those papers.
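Here is a minimal sketch of the "papers similar to the ones you like" step, just to make the idea concrete. It does not touch the PLoS API at all: the paper IDs and texts are made up, and bag-of-words cosine similarity is only one assumed way to define "similar". The function names (`word_counts`, `cosine`, `recommend`) are hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercased bag-of-words counts for one piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(papers, liked_ids, top_n=2):
    """Rank the papers the reader did not pick by similarity to the ones they liked."""
    # Pool the word counts of all liked papers into one reader profile.
    profile = Counter()
    for pid in liked_ids:
        profile.update(word_counts(papers[pid]))
    scores = {pid: cosine(profile, word_counts(text))
              for pid, text in papers.items() if pid not in liked_ids}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Made-up titles standing in for real abstracts (assumption, not PLoS data).
papers = {
    "p1": "gene expression microarray normalization",
    "p2": "microarray gene expression batch effects",
    "p3": "bird migration patterns in europe",
    "p4": "normalization of gene expression sequencing data",
}
feed = recommend(papers, ["p1"], top_n=2)
print(feed)
```

A real version would presumably also use the download, citation, and bookmark counts, which is where the statistics gets interesting.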



I think our field would attract more students if we changed the name to something ending with X or K. I’ve joked about this for years, but someone has actually done it (kind of):


Small ball is a bad strategy

Bill James pointed this out a long time ago. If you don’t know Bill James, you should look him up. I consider him to be one of the most influential statisticians of all time. This post relates to one of his first conjectures: sacrificing outs for runs, referred to as small ball, is a bad strategy.

ESPN’s Gamecast, a web tool that gives you pitch-by-pitch updates of baseball games, also gives you a pitch-by-pitch “probability” of winning. Gamecast confirms the conjecture with data. How do they calculate this “probability”? I am pretty sure it is based only on historical data. No modeling. For example, if the away team is up 4-2 in the bottom of the 7th with no outs and runners on 1st and 2nd, they look at all the instances exactly like this one that have ever happened in the digitally recorded history of baseball and report the proportion of times the home team wins. Well, in this situation this proportion is 45%. If the next batter successfully bunts, moving the runners over, this proportion drops to 41%. Furthermore, if after the successful bunt, the runner from third scores on a sacrifice fly, the proportion drops again from 41% to 39%. The extra out hurts you more than the extra run helps you. That was Bill James’ intuition: you only have three outs, so the last thing you want to do is give 33% away.
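To be concrete about what "no modeling" means, here is a rough sketch of that kind of lookup. This is my guess at the calculation, not ESPN's actual code: the `GameState` record, its fields, and the toy history are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical record of one historical situation and how the game ended.
# bases is a string such as "12-" (runners on 1st and 2nd);
# home_lead is home score minus away score.
GameState = namedtuple("GameState", "inning half outs bases home_lead home_won")

def empirical_win_probability(history, inning, half, outs, bases, home_lead):
    """Proportion of exactly matching historical situations won by the home team."""
    situation = (inning, half, outs, bases, home_lead)
    matches = [g for g in history
               if (g.inning, g.half, g.outs, g.bases, g.home_lead) == situation]
    if not matches:
        return None  # this exact situation never appears in the history
    return sum(g.home_won for g in matches) / len(matches)

# Toy data, NOT real baseball records.
history = [
    GameState(7, "bottom", 0, "12-", -2, True),
    GameState(7, "bottom", 0, "12-", -2, False),
    GameState(7, "bottom", 0, "12-", -2, False),
    GameState(7, "bottom", 0, "12-", -2, True),
    GameState(7, "bottom", 1, "-23", -2, False),
]

before_bunt = empirical_win_probability(history, 7, "bottom", 0, "12-", -2)
after_bunt = empirical_win_probability(history, 7, "bottom", 1, "-23", -2)
print(before_bunt, after_bunt)
```

With enough recorded games, comparing the proportion before and after a sacrifice is all it takes to check the conjecture. Rare situations will have few matches, so those proportions will be noisy.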


MacArthur Fellow Shwetak Patel

The new MacArthur Fellows list is out and, as usual, they are an interesting bunch. One person who I thought was worth pointing out is Shwetak Patel. I had the privilege of meeting Shwetak at a National Research Council meeting on sustainability and computer science. Basically, he’s working on devices that you can install in your home to monitor your resource usage. He’s already spun off a startup company to make and sell some of these devices.

In the writeup for the award, they mention:

“When coupled with a machine learning algorithm that analyzes patterns of activity and the signature noise produced by each appliance, the sensors enable users to measure and disaggregate their energy and water consumption and to detect inefficiencies more effectively.”

Now that’s statistics at work!