Simply Statistics: The Dangers of Weighting Up a Sample

There’s a great story by Nate Cohn over at the New York Times’ Upshot about the dangers of “weighting up” a sample from a survey. The case in point is a U.S.C./LA Times poll asking people who they plan to vote for in the presidential election:

The U.S.C./LAT poll weights for many tiny categories: like 18-to-21-year-old men, which U.S.C./LAT estimates make up around 3.3 percent of the adult citizen population. Weighting simply for 18-to-21-year-olds would be pretty bold for a political survey; 18-to-21-year-old men is really unusual.

The U.S.C./LA Times poll apparently goes even further:

When you start considering the competing demands across multiple categories, it can quickly become necessary to give an astonishing amount of extra weight to particularly underrepresented voters — like 18-to-21-year-old black men. This wouldn’t be a problem with broader categories, like those 18 to 29, and there aren’t very many national polls that are weighting respondents up by more than eight or 10-fold. The extreme weights for the 19-year-old black Trump voter in Illinois are not normal.
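To make the arithmetic concrete, here is a minimal sketch of how a simple cell weight is computed: the cell's share of the population divided by its share of the sample. The 3.3 percent figure comes from the quote above; the sample counts are made-up numbers for illustration only, and `cell_weight` is a hypothetical helper, not the poll's actual procedure.

```python
def cell_weight(population_share, n_in_cell, n_total):
    """Weight given to each respondent in a cell so that the cell's
    weighted share of the sample matches its population share."""
    sample_share = n_in_cell / n_total
    return population_share / sample_share

# Hypothetical sample: 3,000 respondents, only 10 of them in a cell
# that makes up 3.3 percent of the population.
w = cell_weight(0.033, n_in_cell=10, n_total=3000)
print(round(w, 2))
```

In this made-up case each of the 10 respondents counts almost ten times over. Narrow the cell further, so that only one or two respondents fall in it, and the weight grows accordingly.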

To its credit, the U.S.C./LA Times poll makes its data completely open, which is what allowed the NYT to carry out this entire analysis.

I haven’t done much survey analysis, but I’ve done some inverse probability weighting, and in my experience it can be a tricky procedure in ways that are not always immediately obvious. The article discusses weight trimming, but also notes the dangers of that procedure. Overall, a good treatment of a complex issue.
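As a rough illustration of why a single large weight is a problem and what trimming does about it, here is a small sketch using Kish's approximate effective sample size, (sum of weights)^2 / (sum of squared weights). The weights and the cap are invented for illustration, and `trim_weights` is a hypothetical helper showing one simple cap-and-rescale variant, not the method used by the poll or described in the article.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def trim_weights(weights, cap):
    """Cap each weight at `cap`, then rescale so the total weight is
    unchanged. After rescaling, the largest weight can sit slightly
    above the cap."""
    capped = [min(w, cap) for w in weights]
    scale = sum(weights) / sum(capped)
    return [w * scale for w in capped]

# Hypothetical sample: 99 respondents with weight 1, one with weight 30.
weights = [1.0] * 99 + [30.0]
trimmed = trim_weights(weights, cap=5.0)

print(effective_sample_size(weights))  # far below the 100 respondents
print(effective_sample_size(trimmed))  # much closer to 100
```

With these made-up numbers the untrimmed sample behaves like roughly 17 equally weighted respondents, and the trimmed one like roughly 87. The catch is that the heavily weighted respondent was weighted up for a reason: trimming pulls the weighted sample away from the population targets, so it buys lower variance at the cost of some bias.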