Pro-tips for graduate students (Part 3)

This is part of the ongoing series of pro tips for graduate students; check out parts one and two for the original installments.

  1. Learn how to write papers in a clear and simple style. Whenever you can, write in plain English, skip jargon as much as possible, and make the approach you are using clear and understandable. This can (sometimes) make it harder to get your papers into journals, but simple, clear language leads to much higher use and citation of your work. Examples of great writers include John Storey, Rob Tibshirani, Robert May, and Martin Nowak.
  2. It is a great idea to start reviewing papers as a graduate student. Don't do too many (you should focus on your research), but doing a few will give you a lot of insight into how the peer-review system works. Ask your advisor or research mentor; they will generally have a review or two they could use help with. When doing reviews, keep in mind that a person spent a large chunk of time working on the paper you are reviewing. Also, don't forget to use Google.
  3. Try to write your first paper as soon as you possibly can, and do as much of it on your own as you can. You don't have to wait for faculty to give you ideas: read papers and think about what could have been done better (though you might check with a faculty member first so you don't repeat work that's already been done). You will learn more writing your first paper than in almost any class.

When dealing with poop, it's best to just get your hands dirty

I’m a relatively new dad. Before the kid we affectionately call the “tiny tornado” (TT) came into my life, I had relatively little experience dealing with babies and all the fluids they emit. So admittedly, I was a little squeamish dealing with the poopy explosions the TT would create. Inevitably, things would get much more messy than they had to be while I was being too delicate with the issue. It took me an embarrassingly long time for an educated man, but I finally realized you just have to get in there and change the thing even if it is messy, then wash your hands after. It comes off. 

It is a similar situation in my professional life, but I'm having a harder time learning the lesson. There are frequently things that I'm not really excited to do: review a lot of papers, go to long meetings, revise a draft of that paper that has been sitting around forever. Inevitably, once I get going, they usually aren't as difficult or arduous as I thought. Even better, once they are done I feel a huge sense of accomplishment and relief. I used to have a metaphor for this: I'd tell myself, "Jeff, just rip off the band-aid". Now, I think "Jeff, just get your hands dirty".


Pro Tips for Grad Students in Statistics/Biostatistics (Part 1)

I just finished teaching a Ph.D. level applied statistical methods course here at Hopkins. As part of the course, I gave one "pro-tip" a day: something I wish I had learned in graduate school that has helped me become a practicing applied statistician. Here are the first three; more to come soon.
  1. A major component of being a researcher is knowing what’s going on in the research community. Set up an RSS feed with journal articles. Google Reader is a good one, but there are others. Here are some good applied stat journals: Biostatistics, Biometrics, Annals of Applied Statistics…
  2. Reproducible research is a hot topic, in part because of a couple of high-profile papers that were disastrously non-reproducible (see "Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology"). When you write code for statistical analysis, try to make sure that: (a) it is neat and well-commented; liberal and specific comments are your friend; and (b) it can be run by someone other than you to produce the same results that you report.
  3. In data analysis - particularly for complex high-dimensional data - it is frequently better to choose simple models for clearly defined parameters. With a lot of data, there is a strong temptation to go overboard with statistically complicated models; the danger of overfitting and over-interpreting is extreme. The most reproducible results are often produced by sensible and statistically "simple" analyses. (Note: being sensible and simple does not always lead to higher-profile results.)
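Tips 2 and 3 above can be made concrete with a short script. Below is a minimal sketch in Python (the course examples would more likely be in R; the file name, function name, and simulated numbers here are purely illustrative): it fixes the random seed, comments liberally, and estimates a single, clearly defined parameter, so anyone who reruns it gets exactly the numbers reported.

```python
# reproducible_analysis.py
# A toy end-to-end analysis: anyone who runs this script
# should get exactly the same numbers it reports.
import numpy as np

def run_analysis(seed=2012):
    # Fix the random seed so the "data collection" step is reproducible.
    rng = np.random.default_rng(seed)
    n = 100
    x = rng.normal(size=n)             # simulated covariate
    y = 2.0 * x + rng.normal(size=n)   # outcome with known slope of 2
    # Estimate the slope by ordinary least squares: a simple model
    # with a clearly defined parameter, in the spirit of tip 3.
    slope = np.polyfit(x, y, deg=1)[0]
    return slope

if __name__ == "__main__":
    print(f"estimated slope: {run_analysis():.3f}")
```

Because the seed is fixed and the estimand is simple, the script satisfies both criteria in tip 2: it is commented, and a stranger running it reproduces the reported result exactly.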

"How do we evaluate statisticians working in genomics? Why don't they publish in stats journals?" Here is my answer

During the past couple of years I have been asked these questions by several department chairs and other senior statisticians interested in hiring or promoting faculty working in genomics. The main difficulty stems from the fact that we (statisticians working in genomics) publish in journals outside the mainstream statistical journals. This can be a problem during evaluation because a quick-and-dirty approach to evaluating an academic statistician is to count papers in the Annals of Statistics, JASA, JRSS, and Biometrics. The evaluators feel safe counting these papers because they trust the fellow-statistician editors of these journals. However, statisticians working in genomics tend to publish in journals like Nature Genetics, Genome Research, PNAS, Nature Methods, Nucleic Acids Research, Genome Biology, and Bioinformatics. In general, these journals do not recruit statistical referees, and a considerable number of papers with questionable statistics do get published in them. However, when a paper's main topic is a statistical method, or when it relies heavily on statistical methods, statistical referees are used. So, if the statistician is the corresponding or last author and it's a stats paper, it is OK to assume the statistics are fine, and you should go ahead and be impressed by the impact factor of the journal: it's not easy getting statistics papers into these journals.

But we really should not be counting papers blindly. Instead, we should be reading at least some of them. But here again the evaluators get stuck, as we tend to publish papers with application- and technology-specific jargon and show off by presenting results that are of interest to our potential users (biologists) and not necessarily to our fellow statisticians. Here all I can recommend is that you seek help. There are now a handful of us that are full professors, and most of us are more than willing to help out with, for example, promotion letters.

So why don’t we publish in statistical journals? The fear of getting scooped due to the slow turnaround of stats journals is only one reason. New technologies that quickly became widely used (microarrays in 2000 and nextgen sequencing today) created a need for data analysis methods among large groups of biologists. Journals with large readerships and high impact factors, typically not interested in straight statistical methodology work, suddenly became amenable to publishing our papers, especially if they solved a data analytic problem faced by many biologists. The possibility of publishing in widely read journals is certainly seductive. 

While in several other fields, data analysis methodology development is restricted to the statistics discipline, in genomics we compete with other quantitative scientists capable of developing useful solutions: computer scientists, physicists, and engineers were also seduced by the possibility of gaining notoriety with publications in high impact journals. Thus, in genomics, the competition for funding, citation and publication in the top scientific journals is fierce. 

Then there is funding. Note that while most biostatistics methodology NIH proposals go to the Biostatistical Methods and Research Design (BMRD) study section, many of the genomics-related grants get sent to other sections, such as the Genomics, Computational Biology and Technology (GCAT) and Biodata Management and Analysis (BDMA) study sections. BDMA and GCAT are much more impressed by Nature Genetics and Genome Research than by JASA and Biometrics. They also look for citations and software downloads.

To be considered successful by our peers in genomics, those who referee our papers and review our grant applications, our statistical methods need to be delivered as software and garner a user base. Publications in statistical journals, especially those not appearing in PubMed, are not rewarded. This lack of incentive, combined with how time-consuming it is to produce and maintain usable software, has led many statisticians working in genomics to focus solely on the development of practical methods rather than generalizable mathematical theory. As a result, statisticians working in genomics do not publish much in the traditional statistical journals. You should not hold this against them, especially if they are developers and maintainers of widely used software.


Reverse scooping

I would like to define a new term: reverse scooping is when someone publishes your idea after you, and doesn't cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most of the time I don't think it is malicious, though. In fact, I almost reverse scooped a colleague recently. People arrive at the same idea a few months (or years) later, and there is just too much literature to keep track of. And remember, the culprit authors were not the only ones who missed your paper; the referees and associate editor missed it as well. One thing I have learned is that if you want to claim an idea, try to include it in the title or abstract, as very few papers get read cover-to-cover.


Show 'em the data!

In a previous post I argued that students entering college should be shown data on job prospects by major. This week I found out the American Bar Association might make it a requirement for law school accreditation.

Hat tip to Willmai Rivera.


Preparing for tenure track job interviews

If you are in the job market, you will soon be receiving (or have already received) an invitation for an interview. So how should you prepare? You have two goals. The first is to make a good impression. Here are some tips:

1) During your talk, do NOT go over your allotted time. Practice your talk at least twice, both times in front of a live audience that asks questions.

2) Know your audience. If it's a "math-y" department, give a more "math-y" talk. If it's an applied department, give a more applied talk. But (sorry for the cliché) be yourself. Don't pretend to be interested in something you are not. I remember one candidate who pretended to be interested in applications, and it backfired badly during the talk.

3) Learn about the faculty’s research interests. This will help during the one-on-one interviews.

4) Be ready to answer the questions "what do you want to teach?" and "where do you see yourself in five years?"

5) I can’t think of any department where it is necessary to wear a suit (correct me if I’m wrong in the comments). In some places you might feel uncomfortable wearing a suit while those interviewing you are in shorts and t-shirts. But do dress up; show them you care.

Second, and just as important, you want to figure out if you like the department you are visiting. Do you want to spend the next 5, 10, or 50 years there? Make sure to find out as much as you can to answer this question. Some questions are more appropriate for junior faculty; the more sensitive ones, for the chair. Here are some example questions I would ask:

1) What are the expectations for promotion? Would you promote someone publishing exclusively in Nature? Somebody publishing exclusively in Annals of Statistics? Is being a PI on an R01 a requirement for tenure? 

2) What are the expectations for teaching/service/collaboration? How are teaching and committee service assignments made?   

3) How did you connect with your collaborators? How are these connections made?

4) What percent of my salary am I expected to cover? Is it possible to do this by being a co-investigator?

5) Where do you live? How are the schools? How is the commute?  

6) How many graduate students does the department have? How are graduate students funded? If I want someone to work with me, do I have to cover their stipend/tuition?

Specific questions for the junior faculty:

Are the expectations for promotion made clear to you? Do you get feedback on your progress? Do the senior faculty mentor you? Do the senior faculty get along? What do you like most about the department? What could be improved? In the last 10 years, what percent of junior faculty have been promoted?

Questions for the chair:

What percent of my salary am I expected to cover? How soon? Is there bridge funding? What is a standard startup package? Can you describe the promotion process in detail? What space is available for postdocs? (For hard-money places:) I love teaching, but can I buy out of teaching with grants?

I am sure I missed stuff, so please comment away….

Update: I can’t believe I forgot computing! Make sure to ask about computing support; this varies a lot from place to place. Some departments share amazing systems; in others you might have to buy your own hardware. Ask how costs are shared. How is the IT staff? Is R supported? Get all the details.


Expected Salary by Major

In this recent editorial about the Occupy Wall Street movement, Richard Kim profiles a protestor who, despite having a master’s degree, can’t find a job. This particular protestor quit his job as a school teacher three years ago and took out a $35K student loan to obtain a master’s degree in puppetry from the University of Connecticut. I wonder if, before taking his money, UConn showed this person data on job prospects for its puppetry graduates. More generally, I wonder if any university shows its idealistic 18-year-old freshmen such data.

Georgetown’s Center for Education and the Workforce has an informative interactive webpage that students can use to find out by-major salary information. I scraped data from this Wall Street Journal webpage, which also provides, for each major, unemployment rates, salary quartiles, and its rank in popularity. I used these data to compute expected salaries by multiplying median salary by the percent employed. The graph above shows expected salary versus popularity rank (1 = most popular) for the 50 most popular majors (go here for a complete table, and here are the raw data and code). I also included Physics (the 70th). I used different colors to represent four categories: engineering, math/stat/computers, physical sciences, and the rest. As a baseline I added a horizontal line representing the average salary for a truck driver: $65K, a job that currently has plenty of openings. Different font sizes are used only to make names fit. A couple of observations stand out. First, only one of the top 10 most popular majors, Computer Science, has a higher expected salary than truck drivers. Second, Psychology, the fifth most popular major, has an expected salary of $40K and, as seen in the table, an unemployment rate of 6.1%, almost three times worse than nursing.
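The expected-salary calculation described above is just the median salary discounted by the chance of being employed. Here is a minimal sketch in Python; the majors and figures below are hypothetical placeholders, not the actual Wall Street Journal numbers used for the graph.

```python
# Expected salary = median salary x proportion employed,
# i.e., median salary x (1 - unemployment rate).

def expected_salary(median_salary, unemployment_rate):
    """Median salary discounted by the probability of being unemployed."""
    return median_salary * (1 - unemployment_rate)

# Illustrative placeholder data (NOT the real WSJ figures).
majors = {
    "Hypothetical Major A": (60000, 0.050),  # (median salary, unemployment)
    "Hypothetical Major B": (43000, 0.061),
}

for major, (median, unemp) in majors.items():
    print(f"{major}: ${expected_salary(median, unemp):,.0f}")
```

Ranking majors by this quantity, rather than by raw median salary, is what lets a high-unemployment major fall below the truck-driver baseline even when its median salary looks respectable.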

A few editorial remarks: 1) I understand that being a truck driver is very hard and that there is little room for career development. 2) I am not advocating that people pick majors based on future salaries. 3) I think college freshmen deserve to know the data given how much money they fork over to us. 4) The graph is for bachelor’s degrees, not graduate education. The CEW website includes data for graduate degrees. Note that Biology shoots way up with a graduate degree. 5) For those interested in a PhD in Statistics I recommend you major in Math with a minor in a liberal arts subject, such as English, while taking as many programming classes as you can. We all know Math is the base for everything statisticians do, but why English? Students interested in academia tend to underestimate the importance of writing and communicating.

Related articles: This NY Times article describes how and why students are leaving the sciences. Here, Alex Tabarrok describes big changes in the balance of majors between 1985 and today, and here he shares his thoughts on Richard Kim’s editorial. Matt Yglesias explains that unemployment is rising across the board. Finally, Peter Orszag shares his views on how a changing world is changing the value of a college degree.

Hat tip to David Santiago for sending several of these links, and to Harris Jaffee for help with scraping.


The self-assessment trap

Several months ago I was sitting next to my colleague Ben Langmead at the Genome Informatics meeting. Various talks were presented on short read alignment, and every single performance table showed the speaker’s method as #1 and Ben’s Bowtie as #2 among a crowded field of lesser methods. It was fun to make fun of Ben for getting beat every time, but in reality all I could conclude was that Bowtie was best and the speakers were falling into the self-assessment trap: each speaker had tweaked the assessment to make their own method look best. This practice is pervasive in Statistics, where easy-to-tweak Monte Carlo simulations are commonly used to assess performance. In a recent paper, a team at IBM described how pervasive the problem is in the systems biology literature as well. Co-author Gustavo Stolovitzky is a co-developer of the DREAM challenge, in which the assessments are fixed and developers are asked to submit. About 7 years ago we developed affycomp, a comparison webtool for microarray preprocessing methods. I encourage others involved in fields where methods are constantly being compared to develop such tools. It’s a lot of work, but journals are usually friendly to papers describing the results of such competitions.

Related Posts:  Roger on colors in R, Jeff on battling bad science


Finding good collaborators

The job of the statistician is almost entirely about collaboration. Sure, there’s theoretical work that we can do by ourselves, but most of the impact that we have on science comes from our work with scientists in other fields. Collaboration is also what makes the field of statistics so much fun.

So one question I get a lot from people is "How do you find good collaborations?" Or, put another way, how do you find good collaborators? It turns out this distinction is more important than it might seem.

My approach to developing collaborations has evolved over time and I consider myself fairly lucky to have developed a few very productive and very enjoyable collaborations. These days my strategy for finding good collaborations is to look for good collaborators. I personally find it important to work with people that I like as well as respect as scientists, because a good collaboration is going to involve a lot of personal interaction. A place like Johns Hopkins has no shortage of very intelligent and very productive researchers that are doing interesting things, but that doesn’t mean you want to work with all of them.

Here’s what I’ve been telling people lately about finding collaborations, which is a mish-mash of a lot of advice I’ve gotten over the years.

  1. Find people you can work with. I sometimes see situations where a statistician will want to work with someone because he/she is working on an important problem. Of course, you want to be working on a problem that interests you, but it’s only partly about the specific project. It’s very much about the person. If you can’t develop a strong working relationship with a collaborator, both sides will suffer. If you don’t feel comfortable asking (stupid) questions, pointing out problems, or making suggestions, then chances are the science won’t be as good as it could be. 
  2. It’s going to take some time. I sometimes half-jokingly tell people that good collaborations are what you’re left with after getting rid of all your bad ones. Part of the reasoning here is that you actually may not know what kinds of people you are most comfortable working with. So it takes time and a series of interactions to learn these things about yourself and to see what works and doesn’t work. Of course, you can’t take forever, particularly in academic settings where the tenure clock might be ticking, but you also can’t rush things either. One rule I heard once was that a collaboration is worth doing if it will likely end up with a published paper. That’s a decent rule of thumb, but see my next comment.
  3. It’s going to take some time. Developing good collaborations will usually take some time, even if you’ve found the right person. You might need to learn the science, get up to speed on the latest methods/techniques, learn the jargon, etc. So it might be a while before you can start having intelligent conversations about the subject matter. Then it takes time to understand how the key scientific questions translate to statistical problems. Then it takes time to figure out how to develop new methods to address these statistical problems. So a good collaboration is a serious long-term investment which has some risk of not working out.  There may not be a lot of papers initially, but the idea is to make the early investment so that truly excellent papers can be published later.
  4. Work with people who are getting things done. Nothing is more frustrating than collaborating on a project with someone who isn’t that interested in bringing it to a close (e.g., a published paper or a completed software package). Sometimes there isn’t a strong incentive for the collaborator to finish (e.g., he/she is already tenured) and other times things just fall by the wayside. So finding a collaborator who is continuously getting things done is key. One way to determine this is to check out their CV. Is there a steady stream of productivity? Papers in good journals? Software used by lots of other people? Grants? Web site that’s not in total disrepair?
  5. You’re not like everyone else. One thing that surprised me was discovering that just because someone you know works well with a specific person doesn’t mean that you will work well with that person. This sounds obvious in retrospect, but there were a few situations where a collaborator was recommended to me by a source that I trusted completely, and yet the collaboration didn’t work out. The bottom line is to trust your mentors and friends, but realize that differences in personality and scientific interests may determine a different set of collaborators with whom you work well.

These are just a few of my thoughts on finding good collaborators. I’d be interested in hearing others’ thoughts and experiences along these lines.

Related Posts: Rafa on authorship conventions, finish and publish