Simply Statistics: The Care and Feeding of Your Scientist Collaborator

Editor’s Note: This post written by Roger Peng is part of a two-part series on Scientist-Statistician interactions. The first post was written by Elizabeth C. Matsui, an Associate Professor in the Division of Allergy and Immunology at the Johns Hopkins School of Medicine.

This post is a followup to Elizabeth Matsui’s previous post for scientists/clinicians on collaborating with biostatisticians. Elizabeth and I have been working together for over half a decade, and I think the story of how we started is perhaps a brief lesson on collaboration in and of itself. Basically, she emailed someone who didn’t have time, so that person emailed someone else who didn’t have time, so that person emailed someone else who didn’t have time, so that person emailed me, who as a mere assistant professor had plenty of time! A few people I’ve talked to are irked by this process because it feels like you’re someone’s fourth choice. But personally, I don’t care. I’d say almost all my good collaborations have come about this way. To me, it either works or it doesn’t work, regardless of where on the list you were when you were contacted.

I’ve written before about how to find good collaborators (although I neglected to mention the process described above), but this post tries to answer the question, “Now that I’ve found this good collaborator, what do I do with her/him?” Here are some thoughts I’ve accumulated over the years.

1. Understand that a scientist is not a fountain from which “the numbers” flow. Most statisticians like to work with data, and some even need it to demonstrate the usefulness of their methods or theory. So there’s a temptation to go “find a scientist” to “give you some data”. This is starting off on the wrong foot. If you picture your collaborator as a person who hands over the data and then you never talk to that person again (because who needs a clinician for a JASA paper?), then things will probably not end up so great. And I think there are two ways in which the experience will be sub-optimal. First, your scientist collaborator may feel miffed that you basically went off and did your own thing, making her/him less inclined to work with you in the future. Second, the product you end up with (paper, software, etc.) might not have the same impact on science as it would have had if you’d worked together more closely. This is the bigger problem: see #5 below.
2. All good collaborations involve some teaching: Be patient, not patronizing. Statisticians are often annoyed that “So-and-so didn’t even know this” or “they tried to do this with a sample size of 3!” True, there are egregious cases of scientists with a lack of basic statistical knowledge, but in my experience, all good collaborations involve some teaching. Otherwise, why would you collaborate with someone who knows exactly the same things that you know? Just like it’s important to take some time to learn the discipline that you’re applying statistical methods to, it’s important to take some time to describe to your collaborator how those statistical methods you’re using really work. Where does the information in the data come from? What aspects are important; what aspects are not important? What do parameter estimates mean in the context of this problem? If you find you can’t actually explain these concepts, or become very impatient when they don’t understand, that may be an indication that there’s a problem with the method itself that may need rethinking. Or maybe you just need a simpler method.
3. Go to where they are. This bit of advice I got from Scott Zeger when I was just starting out at Johns Hopkins. His bottom line was that if you understand where the data come from (as in literally, the data come from this organ in this person’s body), then you might not be so flippant about asking for an extra 100 subjects to have a sufficient sample size. In biomedical science, the data usually come from people. Real people. And the job of collecting that data, the scientist’s job, is usually not easy. So if you have a chance, go see how the data are collected and what needs to be done. Even just going to their office or lab for a meeting rather than having them come to you can be helpful in understanding the environment in which they work. I know it can feel nice (and convenient) to have everyone coming to you, but that’s crap. Take the time and go to where they are.
4. Their business is your business, so pitch in. A lot of research (and actually most jobs) involves doing things that are not specifically relevant to your primary goal (a paper in a good journal). But sometimes you do those things to achieve broader goals, like building better relationships and networks of contacts. This may involve, say, doing a sample size calculation once in a while for a new grant that’s going in. That may not be pertinent to your current project, but it’s not that hard to do, and it’ll help your collaborator a lot. You’re part of a team here, so everyone has to pitch in. In a restaurant kitchen, even the Chef works the line once in a while. Another way to think of this is as an investment. Particularly in the early stages there’s going to be a lot of ambiguity about what should be done and what is the best way to proceed. Sometimes the ideal solution won’t show itself until much later (the so-called “j-shaped curve” of investment). In the meantime, pitch in and keep things going.
5. Your job is to advance the science. In a good collaboration, everyone should be focused on the same goal. In my area, that goal is improving public health. If I have to prove a theorem or develop a new method to do that, then I will (or at least try). But if I’m collaborating with a biomedical scientist, there has to be an alignment of long-term goals. Otherwise, if the goals are scattered, the science tends to be scattered, and ultimately sub-optimal with respect to impact. I actually think that if you think of your job in this way (to advance the science), then you end up with better collaborations. Why? Because you start looking for people who are similarly advancing the science and having an impact, rather than looking for people who have “good data”, whatever that means, for applying your methods.

In the end, I think statisticians need to focus on two things: Go out and find the best people to work with and then help them advance the science.