Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Not So Standard Deviations Episode 27 - Special Guest Amelia McNamara

I had the pleasure of sitting down with Amelia McNamara, Visiting Assistant Professor of Statistical and Data Sciences at Smith College, to talk about data science, data journalism, visualization, the problems with R, and adult coloring books.

If you have questions you’d like Hilary and me to answer, you can send them to nssdeviations @ or tweet us at @NSSDeviations.

Show notes:

Download the audio for this episode


Help choose the Leek group color palette

My research group just recently finished a paper where several different teams within the group worked on different analyses. If you are interested, the paper describes the recount resource, which includes processed versions of thousands of human RNA-seq data sets.

As part of this project, each group had to contribute some plots to the paper. One thing I noticed is that each person used their own color palette and theme when building the plots. When we wrote the paper, this made it a little harder for the figures to fit together, especially when different group members worked on panels of the same multi-panel plot.

So I started thinking about setting up a Leek group theme for both base R and ggplot2 graphics. One of the first problems was that every group member had their own opinion about what the best color palette would be. So we are running a little competition to determine what the official Leek group color palette for plots will be in the future.

As part of that process, one of my awesome postdocs, Shannon Ellis, decided to collect some data on how people perceive different color palettes. The survey is here:

If you have a few minutes and have an opinion about colors (I know you do!) please consider participating in our little poll and helping to determine the future of Leek group plots!

Open letter to my lab: I am not "moving to Canada"

Dear Lab Members,

I know that the results of Tuesday’s election have many of you concerned about your future. You are not alone. I am concerned about my future as well. But I want you to know that I have no plans of going anywhere and I intend to dedicate as much time to our projects as I always have. Meeting, discussing ideas and putting them into practice with you is, by far, the best part of my job.

We are all concerned that if certain campaign promises are kept, many of our fellow citizens may need our help. If this happens, then we will pause to do whatever we can to help. But I am currently cautiously optimistic that we will be able to continue focusing on helping society in the best way we know how: by doing scientific research.

This week Dr. Francis Collins assured us that there is strong bipartisan support for scientific research. As an example, consider this op-ed in which Newt Gingrich advocates for doubling the NIH budget. There also seems to be wide consensus in this country that scientific research is highly beneficial to society, and an understanding that to do the best research we need the best of the best, no matter their gender, race, religion or country of origin. Nothing good comes from creative, intelligent, dedicated people leaving science.

I know there is much uncertainty but, as of now, there is nothing stopping us from continuing to work hard. My plan is to do just that and I hope you join me.

Not all forecasters got it wrong: Nate Silver does it again (again)

Four years ago we posted on Nate Silver’s, and other forecasters’, triumph over pundits. In contrast, after yesterday’s presidential election, whose results contradicted most polls and data-driven forecasters, several news articles came out wondering how this happened. It is important to point out that not all forecasters got it wrong. Statistically speaking, Nate Silver, once again, got it right.

To show this, below I include a plot showing the expected margin of victory for Clinton versus the actual results for the most competitive states, as provided by 538. It includes the uncertainty bands provided by 538 on this site (I eyeballed the band sizes to make the plot in R, so they are not exactly like 538’s).
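Deciding whether a forecast "got it wrong" in a plot like this comes down to counting how many actual results land outside their uncertainty bands. The plot itself was made in R; the sketch below uses Python, and the function name `count_misses` and every number in it are hypothetical, made up for illustration rather than taken from 538.

```python
def count_misses(expected, half_widths, actual):
    """Count states whose actual margin falls outside expected +/- half_width."""
    if not (len(expected) == len(half_widths) == len(actual)):
        raise ValueError("all inputs must have the same length")
    misses = 0
    for e, h, a in zip(expected, half_widths, actual):
        # A result lands outside its band when it is more than h points from the forecast.
        if abs(a - e) > h:
            misses += 1
    return misses


# Hypothetical margins (Clinton minus Trump, in points) for four made-up states;
# these are NOT 538's forecasts or the real 2016 results.
expected = [3.0, 5.0, -1.0, 2.0]
half_widths = [6.0, 6.0, 5.0, 5.0]
actual = [-1.0, -0.5, -4.0, 8.0]

misses = count_misses(expected, half_widths, actual)
print(f"{misses} of {len(actual)} outside their bands")  # prints "1 of 4 outside their bands"
```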


Note that if these are 95% confidence/credible intervals, 538 got 1 wrong. This is exactly what we expect, since 15/16 is about 95%. Furthermore, judging by the plot here, 538 estimated the popular vote margin to be 3.6% with a confidence/credible interval of about 5%. This too was an accurate prediction, since Clinton is going to win the popular vote by about 1% 0.5% (note this final result is within the margin of error of several traditional polls as well). Finally, when other forecasters were giving Trump between 14% and 0.1% chances of winning, 538 gave him about a 30% chance, which is slightly more than what a team has when down 3-2 in the World Series. In contrast, in 2012, 538 gave Romney only a 9% chance of winning. Also, remember: if in ten election cycles you call it for someone with a 70% chance, you should expect to get it wrong about 3 times. If you get it right every time, then your 70% statement was wrong.
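The arithmetic in this paragraph can be checked in a few lines. This is a small Python sketch, not 538's method: it treats the 16 state intervals as independent (real state errors are correlated, as discussed in the next paragraph), and it treats each remaining World Series game as a 50/50 coin flip, which is a simplifying assumption. `binom_pmf` is a helper written only for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_states = 16

# If each state's interval is a 95% interval, each one misses with probability 0.05.
expected_misses_95 = n_states * 0.05            # 0.8 misses on average
p_one_miss_95 = binom_pmf(1, n_states, 0.05)    # about 0.37

# The update at the end of the post notes 538 reports 80% intervals,
# in which case each interval misses with probability 0.20.
expected_misses_80 = n_states * 0.20            # 3.2 misses on average

# Ten calls, each made with 70% confidence: about 3 should be wrong,
# and getting all ten right has probability 0.7**10.
expected_wrong = 10 * (1 - 0.7)                 # about 3
p_all_right = binom_pmf(10, 10, 0.7)            # about 0.028

# Down 3-2 in a best-of-seven series, the trailing team must win both
# remaining games; treating each game as a coin flip gives 0.5 * 0.5.
p_comeback = 0.5 * 0.5                          # 0.25
```

Under the 95% reading, zero or one miss out of 16 together have a probability of roughly 0.81; under the 80% reading, about three misses would be expected.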

So how did 538 outperform all other forecasters? First, as far as I can tell, they account for the possibility of an overall bias, modeled as a random effect, that affects every state. This bias can be introduced by systematic lying to pollsters or by undersampling some group. Note that this bias can’t be estimated from a single election cycle’s data, but its variability can be estimated from historical data. 538 appears to estimate the standard error of this term to be about 2%. More details on this are included here. In 2016 we saw this bias, and you can see it in the plot above (more points are above the line than below). The confidence bands account for this source of variability, and furthermore their simulations account for the strong correlation you will see across states: the chance of seeing an upset in Pennsylvania, Wisconsin, and Michigan is not the product of the chances of an upset in each. In fact, it’s much higher. Another advantage 538 had is that they somehow were able to predict a systematic, not random, bias against Trump. You can see this by comparing their adjusted data to the raw data (the adjustment favored Trump by about 1.5 on average). We can clearly see this when comparing the 538 estimates to The Upshot’s:


The fact that 538 did so much better than other forecasters should remind us how hard it is to do data analysis in real life. Knowing math, statistics and programming is not enough. It requires experience and a deep understanding of the nuances related to the specific problem at hand. Nate Silver and the 538 team seem to understand this more than others.
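The point about correlated upsets in Pennsylvania, Wisconsin, and Michigan can be illustrated with a toy simulation. This is a minimal Python sketch, not 538's model: the three forecast margins and the 3-point state-level error are hypothetical numbers chosen for illustration, and only the roughly 2-point standard deviation of the shared bias comes from the discussion above. Each simulated election draws one normally distributed bias (the random effect) that shifts every state together, plus independent noise for each state.

```python
import random
from math import prod


def simulate_upsets(margins, bias_sd, state_sd, n_sims=100_000, seed=2016):
    """Simulate elections where each state's result is
    forecast margin + shared bias + independent state error.

    Returns (per-state upset probabilities, probability that every state flips).
    An "upset" means the favored candidate's margin drops below zero.
    """
    rng = random.Random(seed)
    upset_counts = [0] * len(margins)
    all_flipped = 0
    for _ in range(n_sims):
        bias = rng.gauss(0.0, bias_sd)  # one draw shared by every state in this election
        flipped = [m + bias + rng.gauss(0.0, state_sd) < 0 for m in margins]
        for i, f in enumerate(flipped):
            upset_counts[i] += f
        all_flipped += all(flipped)
    per_state = [c / n_sims for c in upset_counts]
    return per_state, all_flipped / n_sims


if __name__ == "__main__":
    margins = [3.0, 5.0, 4.0]  # hypothetical forecast margins (points), not 538's numbers
    per_state, joint = simulate_upsets(margins, bias_sd=2.0, state_sd=3.0)
    print("per-state upset probabilities:", per_state)
    print("all three flip (simulated):   ", joint)
    print("product of per-state chances: ", prod(per_state))
```

Setting `bias_sd=0.0` makes the states independent, and the simulated joint probability then roughly matches the product of the per-state chances; with the shared bias turned on, the chance that all three flip comes out several times larger than that product.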

Update: Jason Merkin points out (via Twitter) that 538 provides 80% credible intervals.

Data scientist on a Chromebook, take two

My friend Fernando showed me his collection of old Apple dongles that no longer work with the latest generation of Apple devices. This, coupled with the announcement of the MacBook Pro, which promises way more dongles and mostly the same computing power, had me freaking out about my computing platform for the future. I’ve been using cloud tools for more and more of what I do, so it had me wondering if it was time to go back and try my Chromebook experiment again. Basically, the question is whether I can do everything I need to do comfortably on a Chromebook.

So to run the experiment, I got a brand-new ASUS Chromebook Flip and the connector I need to plug it into HDMI monitors (there is no escaping at least one dongle, I guess :( ). Here is what that bad boy looks like in my home office with Apple superfanboy Roger on the screen.


In terms of software, there have been some major improvements since I last tried this experiment. I talk about some of these in my book How to be a modern scientist. As of this writing, this is my current setup:

That handles the vast majority of my workload so far (it’s only been a day :)). But I would welcome suggestions, and I’ll report back either when I give up or, if things are still going strong, in a little while.