Being at the Center

Roger Peng
2018-09-07

Hilary Parker and I just released part 2 of our book club discussion of Nigel Cross’s book Design Thinking. This part centers on a profile of designer Gordon Murray, who spent his career designing Formula One race cars. One aspect of his job as a designer is taking a “systems approach” to solving problems. Coupled with that approach is his role in balancing the various priorities of the members of his team. He describes himself as both dictator and diplomat in this part of the job.

When designing a complex object like a race car, there will be many contributors with specific expertise. It is their job to focus on what they think is the highest priority, but it is the designer’s job to put the whole car together and, along the way, raise some priorities and lower others. The designer is at the center of activity and must have good relationships with every member of the team in order for everything to come together on time and on budget.

Data Analyst at the Center

A mentor once told me that in any large-ish coordinated scientific collaboration there will usually be regular meetings to discuss the data collection, the data analysis, or both. Basically, a meeting to discuss data. These meetings, my mentor said, tend over time to become the most important and influential meetings of the entire collaboration. It makes sense: science is ultimately about the data, and any productivity that results from the collaboration will be a function of the data collected. My mentor’s implication was that, for a statistician directing the group’s analyses, these data meetings were an important place to be.

I have been in a few collaborations of this nature (both small and large) and can echo the advice that I got. The data-related meetings tend to be the most interesting and often are where people get most animated and excited. For scientific collaborations, that is in fact where the “action” occurs. As a result, it’s important that the data analyst running the analyses know what their job is.

If these meetings are about data analysis, then it’s important to realize that the product the group is developing is the data analysis itself. As such, the data analyst should play the role of designer. Too often, I see analysts playing a minor role in these kinds of meetings because it’s their job to “just run the models”. Usually, this is not their fault. Meetings like this tend to be populated with large egos, high-level professors, principal investigators, and the like. The data analyst is often a staff member on the team or a junior faculty member, and so comparatively “low ranked”. It can be difficult to even speak up in these meetings, much less direct them.

However, I think it’s essential that the data analyst be at the center of a meeting about data analysis. The reason is simply that they are in the best position to balance the priorities of the collaboration. Because they are closest to the data, they have the best sense of what information and evidence exists in the data and, perhaps more importantly, what is not available in the data. Investigators will often have assumptions about what might be possible and perhaps what they would like to achieve, but these things may or may not be supported by the data.

It’s common for different investigators to have very different priorities. One investigator wants to publish a paper as quickly as possible (perhaps they are a junior faculty member who needs to publish papers, or they know a competitor is doing the same research). Another wants to run lots of models and explore the data more. Yet another thinks there’s nothing worth publishing here, and still another wants to wait and collect more data. And there’s always one investigator who wants to “rethink the entire scientific question”. There’s no single right move here, but the analyst is often the only one who can mediate all these conflicts.

What happens in these situations is a kind of “statistical horse trading”. You want a paper published quickly? Then we’ll have to use this really fast method that requires stronger assumptions and therefore weakens the conclusions. If you want to collect more data, maybe we design the analytic pipeline so that we can analyze what we have now and then easily incorporate the new data when it arrives. If there’s no time or money for getting more data, we can use this other model that attempts to use a proxy for that data (again, more assumptions, but maybe reasonable ones).

Managing these types of negotiations can be difficult because people naturally want to have “all the things”. The data analyst has to figure out the relative ordering of priorities from the various parties involved. There’s no magical one-liner that you can say to convince people of what to do. It’s an iterative process with lots of discussion and trust-building. Frankly, it doesn’t always work.

As the designer of the ultimate product (the data analysis), the analyst must think of solutions that balance all the priorities in different ways. There isn’t always a solution that threads the needle and makes everyone happy or satisfied. But a well-functioning team can recognize that, move forward with an analysis, and produce something useful.