20 Jun

Google's brainteasers (that don't work) and Johns Hopkins Biostatistics Data Analysis

This article is getting some attention because Google's VP of people operations has made public a few insights that the Google HR team has come to over the last several years. The most surprising might be:

  1. They don't collect GPAs except for new candidates
  2. Test scores are worthless
  3. Interview scores weren't correlated with success.
  4. Brainteasers that Google is so famous for are worthless
  5. Behavioral interviews are the most effective

The reason the article is getting so much attention is how surprising these facts may be to people who have little experience hiring/managing in technical fields. But I thought this quote was really telling:

 One of my own frustrations when I was in college and grad school is that you knew the professor was looking for a specific answer. You could figure that out, but it’s much more interesting to solve problems where there isn’t an obvious answer.

Interestingly, that is the whole point of my data analysis course here at Hopkins. Over my relatively limited time as a faculty member I realized there were two key qualities that made students in biostatistics stand out: (1) they were hustlers, willing to keep working until the problem was solved even when it was frustrating, and (2) they were willing and able to try new approaches or techniques they weren't comfortable with. I don't have the quantitative data that Google does, but I would venture to guess those two traits explain 80%+ of the variation in success rates for graduate students in statistics/computing/data analysis.

Once that realization is made, it becomes clear pretty quickly that textbook problems or re-analyses of well-known data sets measure something orthogonal to traits (1) and (2). So I went about redesigning the types of problems our students had to tackle. Instead of assigning problems out of a book, I designed questions with the following characteristics:

  1. They were based on live data sets. I define a "live" data set as a data set that has not been used to answer the question of interest previously.
  2. The questions are problem forward, not solution backward. I would have an idea of what would likely work and what would likely not work. But I defined the question without thinking about what methods the students might use.
  3. The answer was open ended (and often not known to me in advance).
  4. The problems often had to do with unique scenarios not encountered frequently in statistics (e.g. you have a complete census instead of just a sample).
  5. The problems involved methods application/development, coding, and writing/communication.

I have found that problems with these characteristics measure hustle and flexibility much more precisely, which is exactly what Google is looking for in its hiring practices. Of course, there are some downsides to this approach. I think it can be more frustrating for students, who don't have as clearly defined a path through the homework. It also means dramatically more work for the instructor in terms of analyzing the data to find the quirks, creating personalized feedback for students, and properly estimating the amount of work a project will take.

We have started thinking about how to do this same thing at scale on Coursera. In the meantime, Google will just have to send their recruiters to Hopkins Biostats to find students who meet the characteristics they are looking for :-).

  • DCE

    When I saw these:

    2. Test scores are worthless
    3. Interview scores weren't correlated with success.
    4. Brainteasers that Google is so famous for are worthless

    I was kind of sad.

    Am I wrong in my understanding that Google used these metrics to turn down far more applicants than it hired? And wouldn't that suggest that to make these statements, you would need to know how well all those people would have done?

    Take that third point: within the sample that interviewed well, interview scores were not significantly related to ultimate success. But if you hypothesize that the entire balance of the population -- i.e., all those who interviewed poorly -- would have failed, then the interview score becomes much more useful.

    Clearly Google is saying that these criteria are not sufficient to ensure success. But that isn't to say they aren't necessary. And I think that undermines these categorical statements -- without knowing how poorly (or well) the rejected candidates would have done, we can't assess the operating characteristics of these metrics.
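
    As a toy illustration of that point (purely hypothetical numbers, not Google's data): if hiring truncates on the interview score, the score can look nearly uncorrelated with success among the people hired even when it predicts success well in the full applicant pool. A minimal simulation sketch in Python:

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000
        ability = rng.normal(size=n)                         # latent aptitude
        interview = ability + rng.normal(scale=1.0, size=n)  # noisy interview score
        success = ability + rng.normal(scale=1.0, size=n)    # noisy on-the-job outcome

        # The selection step: only the top 5% of interview scores get hired.
        hired = interview > np.quantile(interview, 0.95)

        print("corr(interview, success), all applicants:",
              round(np.corrcoef(interview, success)[0, 1], 2))
        print("corr(interview, success), hires only:",
              round(np.corrcoef(interview[hired], success[hired])[0, 1], 2))

    The correlation among hires comes out well below the full-pool value -- classic range restriction -- so the fact that a score doesn't separate successes from failures among hires says little about its value as a screen.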

  • hngjohnson

    To say that test scores are worthless -- isn't it more accurate to say that these measures are invalid, suffering from either construct irrelevance (measuring the wrong constructs) or construct underrepresentation? I would find it more informative to say something like: off-the-shelf tests of IQ don't predict success at Google. And don't all 5 methods measure some form of aptitude construct? I find the interesting question to be: what construct is important, and how can it be measured? I agree that the constructs you mentioned may be important -- call them a high level of motivation and persistence in the face of ambiguity -- but other constructs specific to Google and their management processes may also be valid.

  • dnlbrky

    Prof. Leek, I took your excellent Data Analysis course and have also just finished up the Machine Learning course from Andrew Ng. I really appreciated your teaching style, as you always started by presenting a real-world problem to be solved, and then guided us through it with actual data and helpful visualizations.

    With no disrespect to Prof. Ng (and I can't complain about getting free lessons from a world-class scientist/institution!), I felt that most of his lectures started out in the weeds ("here is a cost function equation...") and may or may not have gotten around to explaining what problem was being solved or providing examples with real data. I suppose the assignments were meant to provide that, but I felt they were too paint-by-number and didn't require any struggle/creativity. (Data Science is an art, after all...)

    In contrast, the two assignments you gave us as part of the Coursera class were much closer to the real world and really made me think. As a result I learned and remembered the material better. I wasn't sure where to start on the second one, but successfully implementing a Random Forest model for the first time was a real aha moment. Yes, the Data Analysis assignments were much more challenging to grade fairly in the MOOC environment than the ML code assignments. But I think the "messiness" was definitely worth it.
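
    For anyone curious what that first random forest might look like, here is a minimal sketch, shown in Python with scikit-learn on a built-in toy data set purely for illustration (the course itself worked in R, on much messier data):

        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        # A built-in toy data set standing in for a real assignment's data.
        X, y = load_breast_cancer(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # Fit an ensemble of decision trees and check held-out accuracy.
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        model.fit(X_train, y_train)
        print("held-out accuracy:", round(model.score(X_test, y_test), 3))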

    All of this (and your emphasis on reproducibility) reminds me of my undergrad days at Rose-Hulman Institute of Technology. They stressed the importance of really understanding the root problem first, clearly stating what information and analysis tools (or principles of physics/engineering) you had at your disposal, identifying any unknowns, systematically solving the problem, and articulating the answer at the end. They made us write this all out on green engineering paper for every assignment. I hated it then, but having that problem-solving framework embedded in my thinking has been invaluable in the years since.

    Thanks for your contributions to Coursera, and for writing your blog.