Biostatistics: It’s not what you think it is

Rafael Irizarry

My department recently sent me on a recruitment trip for our graduate program. I had the opportunity to chat with undergrads interested in pursuing a career related to data analysis. I found that several did not know that Departments of Biostatistics existed, and most of the rest thought Biostatistics was the study of clinical trials. We have posted on the need for better marketing for Statistics, but Biostatistics needs it even more. So this post is for students who are considering a career as applied statisticians or data scientists and are looking into PhD programs.

There are dozens of Biostatistics departments and most run PhD programs. As an undergraduate, you may never have heard of them because they are usually housed in schools that undergrads rarely frequent: Public Health and Medicine. However, they are very active in research and in teaching graduate students. In fact, the 2014 US News & World Report ranking of Statistics Departments includes three Biostat departments in the top five spots. Although clinical trials are a popular area of interest in these departments, there are now many other areas of research. With so many fields of science shifting to data-intensive research, Biostatistics has adapted to work in these areas. Today pretty much any Biostat department will have people working on projects related to genetics, genomics, computational biology, electronic medical records, neuroscience, environmental sciences, epidemiology, health-risk analysis, and clinical decision making. Through collaborations, academic biostatisticians have early access to the cutting-edge datasets produced by public health scientists and biomedical researchers. Our research usually revolves around either developing statistical methods used by researchers working in these fields or working directly with collaborators on data-driven discovery.

How is it different from Statistics? In the grand scheme of things, the two are not very different. As implied by the name, Biostatisticians focus on data related to biology, while statisticians tend to be more general. However, the underlying theory and skills we learn are similar. In my view, the major difference is that Biostatisticians, in general, tend to be more interested in the data and the subject matter, while Statistics Departments give more emphasis to mathematical theory.

What type of job can I get with a PhD in Biostatistics? A well-paying one. And you will have many options to choose from. Our graduates tend to go to academia, industry, or government. Also, the Bio in the name does not keep our graduates from landing non-bio-related jobs, such as in high tech. The reason is that the training our students receive and what they learn from research experiences can be widely applied to data analysis challenges.

How should I prepare if I want to apply to a PhD program? First, you need to decide whether you are going to like it. One way to do this is to participate in one of the summer programs where you get a glimpse of what we do. My department runs one of these as well. However, as an undergrad I would mainly focus on courses. Undergraduate research experiences are a good way to get an idea of what research is like, but it is difficult to do real research unless you can set aside several hours a week for several consecutive months. This is hard to do as an undergrad because you have to make sure to do well in your courses, prepare for the GRE, and build a solid mathematical and computing foundation in order to conduct research later. This is why these programs are usually held in the summer. If you decide to apply to a PhD program, I recommend you take advanced math courses such as Real Analysis and Matrix Algebra. If you plan to develop software for complex datasets, I recommend CS courses that cover algorithms and optimization. Note that programming skills are not the same thing as the theory taught in these CS courses. Programming skills in R will serve you well if you plan to analyze data, regardless of which academic route you follow. Python and a low-level language such as C++ are more powerful languages that many biostatisticians use these days.

I think the demand for well-trained researchers who can make sense of data will continue to rise. If you want a fulfilling job where you analyze data for a living, you should consider a PhD in Biostatistics.