Note: This is the first in a series of posts where we will be interviewing junior, up-and-coming statisticians/data scientists. Our goal is to build visibility for people who are at the early stages of their careers.
Daniela Witten
Daniela is an assistant professor of Biostatistics at the University of Washington in Seattle. She moved to Seattle after getting her Ph.D. at Stanford. Daniela has been developing exciting new statistical methods for analyzing high-dimensional data and is a recipient of the NIH Director’s Early Independence Award.
Which term applies to you: data scientist/statistician/data analyst?
Statistician! We have to own the term. Some of us have a tendency to try to sugarcoat what we do. But I say that I’m a statistician with pride! It means that I have been rigorously trained, that I have a broadly applicable skill set, and that I’m always open to new and interesting problems. Also, I sometimes get surprised reactions from people at cocktail parties, which is funny.
To the extent that there is a stigma associated with being a statistician, we statisticians need to face the problem and overcome it. The future of our field depends on it.
How did you get into statistics/data science?
I definitely did not set out to become a statistician. Before I got to college, I was planning to study foreign languages. Like most undergrads, I changed my mind, and eventually I majored in biology and math. I spent a summer in college doing experimental biology, but quickly discovered that I had neither the hand-eye coordination nor the patience for lab work. When I was nearing the end of college, I wasn’t sure what was next. I wanted to go to grad school, but I didn’t want to commit to one particular area of study for the next five years and potentially for my entire career.
I was lucky to be at Stanford and to stumble upon the Stat department there. Initially, statistics appealed to me because it was a good way to combine my interests in math and biology from the safety of a computer terminal instead of a lab bench. After spending more time in the department, I realized that if I studied statistics, I could develop a broad skill set that could be applied to a variety of areas, from cancer research to movie recommendations to the stock market.
What is the problem currently driving you?
My research involves the development of statistical methods for the analysis of very large data sets. Recently, I’ve been interested in better understanding networks and their applications to biology. In the past few years there has been a lot of work in the statistical community on network estimation, or graphical modeling. In parallel, biologists have been interested in taking network-based approaches to understanding large-scale biological data sets. There is a real need for these two areas of research to be brought closer together, so that statisticians can develop useful tools for rigorous network-based analysis of biological data sets.
For example, the standard approach for analyzing a gene expression data set with samples from two classes (like cancer and normal tissue) involves testing each gene for differential expression between the two classes, for instance using a two-sample t-statistic. But we know that an individual gene does not drive the differences between cancer and normal tissue; rather, sets of genes work together in pathways in order to have an effect on the phenotype. Instead of testing individual genes for differential expression, can we develop an approach to identify aspects of the gene network that are perturbed in cancer?
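To make the gene-by-gene baseline concrete, here is a minimal sketch of computing a two-sample t-statistic for every gene. It is an illustration only, not code from Daniela's work: the function name `per_gene_t_statistics`, the simulated data, and the choice of the unequal-variance (Welch) form of the statistic are all assumptions.

```python
import numpy as np

def per_gene_t_statistics(expr, is_cancer):
    """Two-sample t-statistic (Welch form, an assumption) for each gene.

    expr: array of shape (n_genes, n_samples), one row per gene.
    is_cancer: boolean array of length n_samples marking the cancer samples.
    """
    expr = np.asarray(expr, dtype=float)
    is_cancer = np.asarray(is_cancer, dtype=bool)
    cancer = expr[:, is_cancer]
    normal = expr[:, ~is_cancer]
    # Difference in class means, gene by gene.
    mean_diff = cancer.mean(axis=1) - normal.mean(axis=1)
    # Standard error from the per-class sample variances (ddof=1).
    se = np.sqrt(
        cancer.var(axis=1, ddof=1) / cancer.shape[1]
        + normal.var(axis=1, ddof=1) / normal.shape[1]
    )
    return mean_diff / se

# Simulated (hypothetical) data: 5 genes, 10 cancer and 10 normal samples.
rng = np.random.default_rng(0)
n_genes, n_per_class = 5, 10
expr = rng.normal(size=(n_genes, 2 * n_per_class))
is_cancer = np.array([True] * n_per_class + [False] * n_per_class)
expr[0, is_cancer] += 5.0  # make gene 0 differentially expressed

t_stats = per_gene_t_statistics(expr, is_cancer)
print(np.round(t_stats, 2))
```

Each gene is scored in isolation here, which is exactly the limitation described above: nothing in this calculation uses the relationships between genes.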
What are the top 3 skills you look for in a student who works with you?
I look for a student who is intellectually curious, self-motivated, and a good personality fit. Intellectual curiosity is a prerequisite for grad school. Self-motivation is needed to make it through the 2 years of PhD-level coursework and 3 years of research that make up a typical Stat/Biostat PhD. And a good personality fit is needed because grad school is long and sometimes frustrating (but ultimately very rewarding), and it’s important to have an advisor who can be a friend along the way!
Who were really good mentors to you? What were the qualities that really helped you?
My PhD advisor, Rob Tibshirani, has been a great mentor. In addition to being a top statistician, he is also an enthusiastic advisor, a tireless advocate for his students, and a loyal friend. I learned from him the value of good collaborations and of simple solutions to complicated problems. I also learned that it is important to maintain a relaxed attitude and to occasionally play pranks on students.
For more information:
Check out her website, or read her really nice papers on penalized classification and penalized matrix decompositions.