A Short Guide for Students Interested in a Statistics PhD Program

Rafael Irizarry
2016-09-06

This summer I had several conversations with undergraduate students seeking career advice. All were interested in data analysis and were considering graduate school. I also frequently receive requests for advice via email. We have posted on this topic before, for example here and here, but I thought it would be useful to share this short guide I put together based on my recent interactions.

It’s OK to be confused

When I was a college senior I didn’t really understand what Applied Statistics was, nor did I understand what one does as a researcher in academia. Now I love being an academic doing research in applied statistics. But it is hard to understand what being a researcher is like until you do it for a while. Things become clearer as you gain more experience. One important piece of advice is to carefully consider advice from those with more experience than you. It might not make sense at first, but I can tell you today that I knew much less than I thought I did when I was 22.

Should I even go to graduate school?

Yes. An undergraduate degree in mathematics, statistics, engineering, or computer science provides a great background, but some more training greatly increases your career options. You may be able to learn on the job, but note that a masters can be as short as a year.

A masters or a PhD?

If you want a career in academia or as a researcher in industry or government, you need a PhD. In general, a PhD will give you more career options. If you want to become a data analyst or research assistant, a masters may be enough. A masters is also a good way to test whether this career is a good match for you. Many people do a masters before applying to PhD programs. The rest of this guide focuses on those interested in a PhD.

What discipline?

There are many disciplines that can lead you to a career in data science: Statistics, Biostatistics, Astronomy, Economics, Machine Learning, Computational Biology, and Ecology are examples that come to mind. I did my PhD in Statistics and got a job in a Department of Biostatistics. So this guide focuses on Statistics/Biostatistics.

Note that once you finish your PhD you have a chance to become a postdoctoral fellow and further focus your training. By then you will have a much better idea of what you want to do and will have the opportunity to choose a lab that closely matches your interests.

What is the difference between Statistics and Biostatistics?

Short answer: very little. I treat them as the same in this guide. Long answer: read this.

How should I prepare during my senior year?

Math

Good grades in math and statistics classes are almost a requirement. Good GRE scores help, and you need a near-perfect score on the Quantitative Reasoning part of the GRE. Get yourself a practice book and start preparing. Note that to survive the first two years of a statistics PhD program you need to prove theorems and derive relatively complicated mathematical results. If you can’t easily handle the math part of the GRE, this will be quite challenging.

When choosing classes note that the area of math most related to your stat PhD courses is Real Analysis. The area of math most used in applied work is Linear Algebra, specifically matrix theory including understanding eigenvalues and eigenvectors. You might not make the connection between what you learn in class and what you use in practice until much later. This is totally normal.

If you don’t feel ready, consider doing a masters first. But also, get a second opinion. You might be being too hard on yourself.

Programming

You will be using a computer to analyze data, so knowing some programming is a must these days. At a minimum, take a basic programming class. Other computer science classes will help, especially if you go into an area dealing with large datasets. In hindsight, I wish I had taken classes on optimization and algorithm design.

Know that learning to program and learning a computer language are different things. You need to learn to program. The choice of language is up for debate. If you only learn one, learn R. If you learn three, learn R, Python and C++.

Knowing Linux/Unix is an advantage. If you have a Mac, try to use the terminal as much as possible. On Windows, get an emulator.

Writing and Communicating

My biggest educational regret is that, as a college student, I underestimated the importance of writing. To this day I am correcting that mistake.

Your success as a researcher greatly depends on how well you write and communicate. Your thesis, papers, grant proposals, and even emails have to be well written. So practice as much as possible: take classes, read works by good writers, and write often. Consider starting a blog even if you don’t make it public. Also note that in academia, job interviews will involve a 50-minute talk as well as several conversations about your work and future plans. So communication skills are also a big plus.

But wait, why so much math?

The PhD curriculum is indeed math heavy. Faculty often debate the possibility of changing the curriculum. But regardless of differing opinions on what is the right amount, math is the foundation of our discipline. Although it is true that you will not directly use much of what you learn, I don’t regret learning so much abstract math because I believe it positively shaped the way I think and attack problems.

Note that after the first two years you are pretty much done with courses and you start on your research. If you work with an applied statistician you will learn data analysis via the apprenticeship model. You will learn the most, by far, during this stage. So be patient. Watch these two Karate Kid scenes for some inspiration.

What department should I apply to?

The top 20-30 departments are practically interchangeable in my opinion. If you are interested in applied statistics, make sure you pick a department with faculty doing applied research. Note that some professors focus their research on the mathematical aspects of statistics. By reading some of their recent papers you will be able to tell. An applied paper usually shows data (not simulated) and motivates a subject-area challenge in the abstract or introduction. A theory paper shows no data at all or uses it only as an example.

Can I take a year off?

Absolutely, especially if it’s to work in a data-related job. In general, maturity and life experiences are an advantage in grad school.

What should I expect when I finish?

You will have many, many options. The demand for your expertise is great and growing. As a result there are many high-paying options. If you want to become an academic I recommend doing a postdoc. Here is why. But there are many other options as we describe here and here.