Rose/Hatfield © Savannah Bergquist
Laura Hatfield and Sherri Rose are Assistant Professors specializing in biostatistics at Harvard Medical School in the Department of Health Care Policy. Laura received her PhD in Biostatistics from the University of Minnesota and Sherri completed her PhD in Biostatistics at UC Berkeley. They are developing novel statistical methods for health policy problems.
SimplyStats: Do you consider yourselves statisticians, data scientists, machine learners, or something else?
Rose: I’d definitely say a statistician. Even when I'm working on things that fall into the categories of data science or machine learning, there's underlying statistical theory guiding that process, be it for methods development or applications. Basically, there's a statistical foundation to everything I do.
Hatfield: When people ask what I do, I start by saying that I do research in health policy. Then I say I’m a statistician by training and I work with economists and physicians. People have mistaken ideas about what a statistician or professor does, so describing my context and work seems more informative. If I’m at a party, I usually wrap it up in a bow as, “I crunch numbers to study how Obamacare is working.” [laughs]
SimplyStats: What is the Health Policy Data Science Lab? How did you decide to start that?
Hatfield: We wanted to give our trainees a venue to promote their work and get feedback from their peers. And it helps me keep up on the cool projects Sherri and her students are working on.
Rose: This grew out of us starting to jointly mentor trainees. It's been a great way for us to make intellectual contributions to each other’s work through Lab meetings. Laura and I approach statistics from completely different frameworks, but work on related applications, so that's a unique structure for a lab.
SimplyStats: What kinds of problems are your groups working on these days? Are they mostly focused on health policy?
Rose: One of the fun things about working in health policy is that it is quite expansive. Statisticians can have an even bigger impact on science and public health if we take that next step: thinking about the policy implications of our research. And then, who needs to see the work in order to influence relevant policies. A couple projects I’m working on that demonstrate this breadth include a machine learning framework for risk adjustment in insurance plan payment and a new estimator for causal effects in a complex epidemiologic study of chronic disease. The first might be considered more obviously health policy, but the second will have important policy implications as well.
Hatfield: When I start an applied collaboration, I’m also thinking, “Where is the methods paper?” Most of my projects use messy observational data, so there is almost always a methods paper. For example, many studies here need to find a control group from an administrative data source. I’ve been keeping track of challenges in this process. One of our Lab students is working with me on a pathological case of a seemingly benign control group selection method gone bad. I love the creativity required in this work; my first 10 analysis ideas may turn out to be infeasible given the data, but that’s what makes this fun!
SimplyStats: What are some particular challenges of working with large health data?
Hatfield: When I first heard about the huge sample sizes, I was excited! Then I learned that data not collected for research purposes...
Rose: This was going to be my answer!
Hatfield: ...are very hard to use for research! In a recent project, I’ve been studying how giving people a tool to look up prices for medical services changes their health care spending. But the data set we have leaves out [painful pause] a lot of variables we’d like to use for control group selection and... a lot of the prices. But as I said, these gaps in the data are begging to be filled by new methods.
Rose: I think the fact that we have similar answers is important. I’ve repeatedly seen “big data” not have a strong signal for the research question, since they weren’t collected for that purpose. It’s easy to get excited about thousands of covariates in an electronic health record, but so much of it is noise, and then you end up with an R2 of 10%. It can be difficult enough to generate an effective prediction function, even with innovative tools, let alone try to address causal inference questions. It goes back to basics: what’s the research question and how can we translate that into a statistical problem we can answer given the limitations of the data.
SimplyStats: You both have very strong data science skills but are in academic positions. Do you have any advice for students considering the tradeoff between academia and industry?
Hatfield: I think there is more variance within academia and within industry than between the two.
Rose: Really? That’s surprising to me...
Hatfield: I had stereotypes about academic jobs, but my current job defies those.
Rose: What if a larger component of your research platform included programming tools and R packages? My immediate thought was about computing and its role in academia. Statisticians in genomics have navigated this better than some other areas. It can surely be done, but there are still challenges folding that into an academic career.
Hatfield: I think academia imposes few restrictions on what you can disseminate compared to industry, where there may be more privacy and intellectual property concerns. But I take your point that R packages do not impress most tenure and promotion committees.
Rose: You want to find a good match between how you like spending your time and what’s rewarded. Not all academic jobs are the same and not all industry jobs are alike either. I wrote a more detailed guest post on this topic for Simply Statistics.
Hatfield: I totally agree you should think about how you’d actually spend your time in any job you’re considering, rather than relying on broad ideas about industry versus academia. Do you love writing? Do you love coding? etc.
SimplyStats: You are both adopters of social media as a mechanism of disseminating your work and interacting with the community. What do you think of social media as a scientific communication tool? Do you find it is enhancing your careers?
Hatfield: Sherri is my social media mentor!
Rose: I think social media can be a useful tool for networking, finding and sharing neat articles and news, and putting your research out there to a broader audience. I’ve definitely received speaking invitations and started collaborations because people initially “knew me from Twitter.” It’s become a way to recruit students as well. Prospective students are more likely to “know me” from a guest post or Twitter than traditional academic products, like journal articles.
Hatfield: I’m grateful for our Lab’s new Twitter because it’s a purely academic account. My personal account has been awkwardly transitioning to include professional content; I still tweet silly things there.
Rose: My timeline might have a cat picture or two.
Hatfield: My very favorite thing about academic Twitter is discovering things I wouldn’t have even known to search for, especially packages and tricks in R. For example, that’s how I got converted to tidy data and dplyr.
Rose: I agree. I think it’s a fantastic place to become exposed to work that’s incredibly related to your own but in another field, and you wouldn’t otherwise find it preparing a typical statistics literature review.
SimplyStats: What would you change in the statistics community?
Rose: Mentoring. I was tremendously lucky to receive incredible mentoring as a graduate student and now as a new faculty member. Not everyone gets this, and trainees don’t know where to find guidance. I’ve actively reached out to trainees during conferences and university visits, erring on the side of offering too much unsolicited help, because I feel there’s a need for that. I also have a resources page on my website that I continue to update. I wish I had a more global solution beyond encouraging statisticians to take an active role in mentoring not just your own trainees. We shouldn’t lose good people because they didn’t get the support they needed.
Hatfield: I think we could make conferences much better! Being in the same physical space at the same time is very precious. I would like to take better advantage of that at big meetings to do work that requires face time. Talks are not an example of this. Workshops and hackathons and panels and working groups -- these all make better use of face-to-face time. And are a lot more fun!