30 Nov 2016
I had the pleasure of sitting down with Amelia McNamara, Visiting Assistant Professor of Statistical and Data Sciences at Smith College, to talk about data science, data journalism, visualization, the problems with R, and adult coloring books.
If you have questions you’d like Hilary and me to answer, you can send them to nssdeviations @ gmail.com or tweet us at @NSSDeviations.
Download the audio for this episode
17 Nov 2016
My research group just recently finished a paper where several different teams within the group worked on different analyses. If you are interested, the paper describes the recount resource, which includes processed versions of thousands of human RNA-seq data sets.
As part of this project each group had to contribute some plots to the paper. One thing that I noticed is that each person used their own color palette and theme when building the plots. When we wrote the paper this made it a little harder for the figures to all fit together - especially when different group members worked on a single panel of a multi-panel plot.
So I started thinking about setting up a Leek group theme for both base R and ggplot2 graphics. One of the first problems was that every group member had their own opinion about what the best color palette would be. So we are running a little competition to determine what the official Leek group color palette for plots will be in the future.
As part of that process, one of my awesome postdocs, Shannon Ellis, decided to collect some data on how people perceive different color palettes. The survey is here:
If you have a few minutes and have an opinion about colors (I know you do!) please consider participating in our little poll and helping to determine the future of Leek group plots!
11 Nov 2016
Dear Lab Members,
I know that the results of Tuesday’s election have many of you
concerned about your future. You are not alone. I am concerned
about my future as well. But I want you to know that I have no plans
of going anywhere and I intend to dedicate as much time to our
projects as I always have. Meeting, discussing ideas and putting them
into practice with you is, by far, the best part of my job.
We are all concerned that if certain campaign promises are kept many
of our fellow citizens may need our help. If this happens, then we
will pause to do whatever we can to help. But I am currently
cautiously optimistic that we will be able to continue focusing on
helping society in the best way we know how: by doing scientific research.
This week Dr. Francis Collins assured us that there is strong
bipartisan support for scientific research. As an example, consider
the op-ed in which Newt Gingrich advocates for doubling the NIH budget. There
also seems to be wide consensus in this country that scientific
research is highly beneficial to society and an understanding that to
do the best research we need the best of the best no matter their
gender, race, religion or country of origin. Nothing good comes from
creative, intelligent, dedicated people leaving science.
I know there is much uncertainty but, as of now, there is nothing stopping us
from continuing to work hard. My plan is to do just that and I hope
you join me.
09 Nov 2016
Four years ago we posted
on Nate Silver’s, and other forecasters’, triumph over pundits. In
contrast, after yesterday’s presidential election results contradicted
most polls and data-driven forecasters, several news articles came out
wondering how this happened. It is important to point
out that not all forecasters got it wrong. Statistically
speaking, Nate Silver, once again, got it right.
To show this, below I include a plot showing the expected margin of
victory for Clinton versus the actual results for the most competitive states provided by 538. It includes the uncertainty bands provided by 538
(I eyeballed the band sizes to make the plot in R, so they are not
exactly like 538’s).
Note that if these are 95% confidence/credible intervals, 538 got 1
wrong. This is about what we expect since 15/16 is about
94%, close to the nominal 95%. Furthermore, judging by the plot here, 538 estimated the popular vote margin to be 3.6%
with a confidence/credible interval of about 5%.
This too was an accurate
prediction since Clinton is going to win the popular vote by
0.5% (note this final result is in the margin of error of
several traditional polls as well). Finally, when other forecasters were
giving Trump between 14% and 0.1% chances of winning, 538 gave
him about a
30% chance which is slightly more than what a team has when down 3-2
in the World Series. In contrast, in 2012 538 gave Romney only a 9%
chance of winning. Also, remember, if in ten election cycles you
call it for someone with a 70% chance, you should expect to get it wrong about 3
times. If you get it right every time, then your 70% statement was wrong.
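As a rough check on the calibration argument above, here is a minimal Python sketch. It assumes the ten calls are independent and that each is correct with probability 0.7; the variable names are illustrative and not anything 538 publishes.

```python
# Sketch: what a well-calibrated 70% forecaster should expect over ten calls.
# Assumption: the ten calls are independent, each correct with probability 0.7.
n_calls = 10
p_right = 0.7

# Expected number of wrong calls: n * (1 - p).
expected_wrong = n_calls * (1 - p_right)

# Probability of getting every single call right: p ** n.
p_all_right = p_right ** n_calls

print(f"expected wrong calls: {expected_wrong:.1f}")
print(f"P(all {n_calls} right): {p_all_right:.3f}")
```

Under these assumptions a perfect ten-for-ten record happens less than 3% of the time, so a forecaster who never misses was most likely understating their confidence.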
So how did 538 outperform all other forecasters? First, as far as I
can tell, they model the possibility of an overall bias, as a
random effect, that affects
every state. This bias can be introduced by systematic
lying to pollsters or by undersampling some group. Note that this bias
can’t be estimated from data from
one election cycle, but its variability can be estimated from
historical data. 538 appears
to estimate the standard error of this term to be
about 2%. More details on this are included here. In 2016 we saw this bias and you can see it in
the plot above (more points are above the line than below). The
confidence bands account for this source of variability and, furthermore,
their simulations account for the strong correlation you will see
across states: the chance of seeing an upset in Pennsylvania, Wisconsin,
and Michigan is not the product of the chances of an upset in each. In
fact it’s much higher. Another advantage 538 had is that they somehow
were able to predict a systematic, not random, bias against
Trump. You can see this by
comparing their adjusted data to the raw data (the adjustment favored
Trump by about 1.5 on average). We can clearly see this when comparing the 538
estimates to The Upshot’s:
The fact that 538 did so much better than other forecasters should
remind us how hard it is to do data analysis in real life. Knowing
math, statistics and programming is not enough. It requires experience
and a deep understanding of the nuances related to the specific
problem at hand. Nate Silver and the 538 team seem to understand this
more than others.
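To illustrate the earlier point that correlated state errors make a simultaneous upset far more likely than the product of the individual chances, here is a small simulation sketch in Python. It is not 538's model. The polling leads and the state-level standard deviation are made-up numbers, the function name `upset_probabilities` is hypothetical, and only the shared-bias standard deviation of 2 points follows the rough figure mentioned above.

```python
import random


def upset_probabilities(leads, shared_sd=2.0, state_sd=3.0,
                        n_sims=20_000, seed=0):
    """Simulate upsets in several states that share one polling bias.

    leads: hypothetical polling leads (in points) for the favorite.
    shared_sd: sd of the bias common to all states (random effect).
    state_sd: sd of the independent state-level error (assumed value).
    Returns (per-state upset rates, joint upset rate, product of rates).
    """
    rng = random.Random(seed)
    single_counts = [0] * len(leads)
    joint_count = 0
    for _ in range(n_sims):
        bias = rng.gauss(0.0, shared_sd)  # same draw for every state
        flips = [lead + bias + rng.gauss(0.0, state_sd) < 0
                 for lead in leads]
        for i, flipped in enumerate(flips):
            single_counts[i] += flipped
        joint_count += all(flips)
    p_single = [count / n_sims for count in single_counts]
    p_joint = joint_count / n_sims
    p_product = 1.0
    for p in p_single:
        p_product *= p  # what independence would predict
    return p_single, p_joint, p_product


# Made-up leads for three states, for illustration only.
p_single, p_joint, p_product = upset_probabilities([4.0, 5.0, 4.0])
print("per-state upset chances:", p_single)
print("simulated chance of all three upsets:", p_joint)
print("product of per-state chances:", p_product)
```

Setting `shared_sd=0.0` removes the common bias, and the simulated joint chance then falls back toward the product of the per-state chances.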
Update: Jason Merkin points out (via Twitter) that 538 provides 80% credible intervals.
08 Nov 2016
My friend Fernando showed me his collection of old Apple dongles that no longer work with the latest generation of Apple devices. This, coupled with the announcement of the MacBook Pro that promises way more dongles and mostly the same computing, had me freaking out about my computing platform for the future. I’ve been using cloudy tools for more and more of what I do, so it had me wondering if it was time to go back and try my Chromebook experiment again. Basically the question is whether I can do everything I need to do comfortably on a Chromebook.
So to run the experiment I got a brand new ASUS Chromebook Flip and the connector I need to plug it into HDMI monitors (there is no escaping at least one dongle I guess :(). Here is what that badboy looks like in my home office with Apple superfanboy Roger on the screen.
In terms of software there have been some major improvements since I last tried this experiment out. Some of these I talk about in my book How to be a modern scientist. As of this writing this is my current setup:
That handles the vast majority of my workload so far (it’s only been a day :)). But I would welcome suggestions and I’ll report back either when I give up or if things are still going strong in a little while….