Simply Statistics: Google's brainteasers (that don't work) and Johns Hopkins Biostatistics Data Analysis

This article is getting some attention because Google’s VP for people operations has made public a few insights that the Google HR team has come to over the last several years. The most surprising might be:

They don’t collect GPAs except for new candidates
Test scores are worthless
Interview scores weren’t correlated with success
The brainteasers that Google is so famous for are worthless
Behavioral interviews are the most effective

The reason the article is getting so much attention is how surprising these facts may be to people who have little experience hiring/managing in technical fields. But I thought this quote was really telling:

One of my own frustrations when I was in college and grad school is that you knew the professor was looking for a specific answer. You could figure that out, but it’s much more interesting to solve problems where there isn’t an obvious answer.

Interestingly, that is the whole point of my data analysis course here at Hopkins. Over my relatively limited time as a faculty member I realized there were two key qualities that made students in biostatistics stand out: (1) they were hustlers, willing to just work until the problem was solved even if it was frustrating, and (2) they were willing/able to try new approaches or techniques they weren’t comfortable with. I don’t have the quantitative data that Google does, but I would venture to guess those two traits explain 80%+ of the variation in success rates for graduate students in statistics/computing/data analysis.

Once that realization is made, it becomes clear pretty quickly that textbook problems or re-analysis of well known data sets measure something orthogonal to traits (1) and (2). So I went about redesigning the types of problems our students had to tackle. Instead of assigning problems out of a book I redesigned the questions to have the following characteristics:

They were based on live data sets. I define a “live” data set as a data set that has not been used to answer the question of interest previously.
The questions were problem forward, not solution backward. I would have an idea of what would likely work and what would likely not work, but I defined the question without thinking about what methods the students might use.
The answer was open ended (and often not known to me in advance).
The problems often had to do with unique scenarios not encountered frequently in statistics (e.g. you have a data census instead of just a sample).
The problems involved methods application/development, coding, and writing/communication.

I have found that problems with these characteristics more precisely measure hustle and flexibility, the traits Google is looking for in its hiring practices. Of course, there are some downsides to this approach. I think it can be more frustrating for students, who don’t have as clearly defined a path through the homework. It also means dramatically more work for the instructor in terms of analyzing the data to find the quirks, creating personalized feedback for students, and properly estimating the amount of work a project will take.

We have started thinking about how to do this same thing at scale on Coursera. In the meantime, Google will just have to send their recruiters to Hopkins Biostats to find students who meet the characteristics they are looking for :-).
