Simply Statistics



The History of Nonlinear Principal Components Analysis, a lecture given by Jan de Leeuw. For those who have ~45 minutes to spare, it’s a very nice talk given in Jan’s characteristic style.


Preparing for tenure track job interviews

If you are on the job market you will soon receive (or have already received) an invitation for an interview. So how should you prepare? You have two goals. The first is to make a good impression. Here are some tips:

1) During your talk, do NOT go over your allotted time. Practice your talk at least twice, both times in front of a live audience that asks questions.

2) Know your audience. If it’s a “math-y” department, give a more “math-y” talk. If it’s an applied department, give a more applied talk. But (sorry for the cliché) be yourself. Don’t pretend to be interested in something you are not. I remember one candidate who pretended to be interested in applications, and it backfired badly during the talk.

3) Learn about the faculty’s research interests. This will help during the one-on-one interviews.

4) Be ready to answer the questions “what do you want to teach?” and “where do you see yourself in five years?”

5) I can’t think of any department where it is necessary to wear a suit (correct me if I’m wrong in the comments). In some places you might feel uncomfortable wearing a suit while those interviewing you are in shorts and a t-shirt. But do dress up. Show them you care.

Second, and just as important, you want to figure out if you like the department you are visiting. Do you want to spend the next 5, 10, 50 years there?  Make sure to find out as much as you can to answer this question. Some questions are more appropriate for junior faculty, the more sensitive ones for the chair. Here are some example questions I would ask:

1) What are the expectations for promotion? Would you promote someone publishing exclusively in Nature? Somebody publishing exclusively in Annals of Statistics? Is being a PI on an R01 a requirement for tenure? 

2) What are the expectations for teaching/service/collaboration? How are teaching and committee service assignments made?   

3) How did you connect with your collaborators? How are these connections made?

4) What percent of my salary am I expected to cover? Is it possible to do this by being a co-investigator?

5) Where do you live? How are the schools? How is the commute?  

6) How many graduate students does the department have? How are graduate students funded? If I want someone to work with me, do I have to cover their stipend/tuition?

Specific questions for the junior faculty:

Are the expectations for promotion made clear to you? Do you get feedback on your progress? Do the senior faculty mentor you? Do the senior faculty get along? What do you like most about the department? What can be improved? In the last 10 years, what percent of junior faculty were promoted?

Questions for the chair:

What percent of my salary am I expected to cover? How soon? Is there bridge funding? What is a standard startup package? Can you describe the promotion process in detail? What space is available for postdocs? (For a hard-money place) I love teaching, but can I buy out teaching with grants?

I am sure I missed stuff, so please comment away….

Update: I can’t believe I forgot computing! Make sure to ask about computing support. This varies a lot from place to place. Some departments share amazing systems; ask how costs are shared. How is the IT staff? Is R supported? In other departments you might have to buy your own hardware. Get all the details.


OK Cupid data on Infochimps - anybody got $1k for data?

OK Cupid is an online dating site that has grown its visibility in part through a pretty awesome blog called OK Trends, where they have analyzed their online dating data to, for example, show you what kind of profile picture works best. Now they have compiled data from their personality survey and made it available online through Infochimps. We have talked about Infochimps before; it is basically a site for distributing and selling data. Unfortunately, the OK Cupid data costs $1000. I can think of some cool analyses we could do with this data, but the price is a little steep for me. Anybody got a grand they want to give me to buy some data?

Related Posts: Jeff on APIs, Jeff on Data sources, Roger on Private health insurers to release data


The Cost of a U.S. College Education

As a follow-up to my previous post on expected salaries by major, I want to share the following graph:

So why is the cost of higher education going up at a faster rate than most everything else? Economists, please correct me if I’m wrong, but it must be that demand grew, right? Universities are non-profits, so they didn’t necessarily have to respond by increasing offers. But apparently they did. So if the proportion of the population going to college grew, why is there a shortage of STEM majors? I think it’s because the proportion of the population that can complete such a degree has not changed since 1985, and most of those people were already going to college. If this is right, then it implies that to make more offers, the universities had to grow majors with higher graduation rates. The graph below (taken from here) seems to confirm this:


Unfortunately, in 1985 there was no dearth of psychologists, visual and performing artists, and journalists. So we should not be surprised that the increase in their numbers resulted in graduates from these fields having a harder time finding employment (see bottom of this table). Meanwhile, the US has 2 million job openings that can’t be filled, many in vocational careers. So why aren’t more students opting for technical training with good job prospects? In this NYTimes article, Motoko Rich explains that

In European countries like Germany, Denmark and Switzerland, vocational programs have long been viable choices for a significant portion of teenagers. Yet in the United States, technical courses have often been viewed as the ugly stepchildren of education, backwaters for underachieving or difficult students.

It’s hard not to think that universities have benefited from the social stigma associated with vocational degrees. In any case, as I said in my previous post, I am not interested in telling people what to study, but universities should show students the data.


Cooperation between Referees and Authors Increases Peer Review Accuracy

Jeff Leek and colleagues just published an article in PLoS ONE on the differences between anonymous (closed) and non-anonymous (open) peer review of research articles. They developed a “peer review game” as a model system to track authors’ and reviewers’ behavior over time under open and closed systems.

Under the open system, it was possible for authors to see who was reviewing their work. They found that under the open system authors and reviewers tended to cooperate by reviewing each other’s work. Interestingly, they say

It was not immediately clear that cooperation between referees and authors would increase reviewing accuracy. Intuitively, one might expect that players who cooperate would always accept each others solutions - regardless of whether they were correct. However, we observed that when a submitter and reviewer acted cooperatively, reviewing accuracy actually increased by 11%.


Expected Salary by Major

In this recent editorial about the Occupy Wall Street movement, Richard Kim profiles a protestor who, despite having a master’s degree, can’t find a job. This particular protestor quit his job as a school teacher three years ago and took out a $35K student loan to obtain a master’s degree in puppetry from the University of Connecticut. I wonder if, before taking his money, UConn showed this person data on job prospects for their puppetry graduates. More generally, I wonder if any university shows its idealistic 18-year-old freshmen such data.

Georgetown’s Center for Education and the Workforce has an informative interactive webpage that students can use to find out by-major salary information. I scraped data from this Wall Street Journal webpage, which also provides, for each major, unemployment rates, salary quartiles, and its rank in popularity. I used these data to compute expected salaries by multiplying median salary by the percent employed. The graph above shows expected salary versus popularity rank (1=most popular) for the 50 most popular majors (go here for a complete table; the raw data and code are here). I also included Physics (the 70th). I used different colors to represent four categories: engineering, math/stat/computers, physical sciences, and the rest. As a baseline I added a horizontal line representing the average salary for a truck driver: $65K, a job that currently has plenty of openings. Different font sizes are used only to make names fit. A couple of observations stand out. First, only one of the top 10 most popular majors, Computer Science, has a higher expected salary than truck drivers. Second, Psychology, the fifth most popular major, has an expected salary of $40K and, as seen in the table, an unemployment rate of 6.1%, almost three times as high as that of nursing.
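For readers who want to redo the calculation, here is a minimal Python sketch of the expected-salary computation described above. It is not the code linked in the post. It assumes that "percent employed" means 100% minus the unemployment rate, and the two example rows ("Major A", "Major B") are made-up placeholders, not values from the scraped table. The function name and variable names are also just illustrative.

```python
# Truck driver baseline quoted in the post.
TRUCK_DRIVER_BASELINE = 65_000


def expected_salary(median_salary, unemployment_rate_pct):
    """Median salary weighted by the fraction employed.

    Assumption: fraction employed = 1 - unemployment rate.
    """
    employment_fraction = 1.0 - unemployment_rate_pct / 100.0
    return median_salary * employment_fraction


# Placeholder rows (name, median salary in dollars, unemployment rate in %).
# These numbers are invented for illustration only.
majors = [
    ("Major A", 50_000, 10.0),
    ("Major B", 80_000, 5.0),
]

for name, median, unemployment in majors:
    value = expected_salary(median, unemployment)
    side = "above" if value > TRUCK_DRIVER_BASELINE else "at or below"
    print(f"{name}: expected salary ${value:,.0f} ({side} the truck driver baseline)")
```

Swapping the placeholder rows for the real per-major medians and unemployment rates should give the kind of numbers plotted in the graph, under the stated assumption about how employment was defined.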

A few editorial remarks: 1) I understand that being a truck driver is very hard and that there is little room for career development. 2) I am not advocating that people pick majors based on future salaries. 3) I think college freshmen deserve to know the data given how much money they fork over to us. 4) The graph is for bachelor’s degrees, not graduate education. The CEW website includes data for graduate degrees. Note that Biology shoots way up with a graduate degree. 5) For those interested in a PhD in Statistics I recommend you major in Math with a minor in a liberal arts subject, such as English, while taking as many programming classes as you can. We all know Math is the base for everything statisticians do, but why English? Students interested in academia tend to underestimate the importance of writing and communicating.

Related articles: This NY Times article describes how/why students are leaving the sciences. Here, Alex Tabarrok describes big changes in the balance of majors between 1985 and today and here he shares his thoughts on Richard Kim’s editorial. Matt Yglesias explains that unemployment is rising across the board. Finally, Peter Orszag shares his views on how a changing world is changing the value of a college degree.

Hat tip to David Santiago for sending several of these links and Harris Jaffee for help with scraping.


Statisticians on Twitter: help me find more!

In honor of our blog finally dragging itself into the 21st century and jumping onto Twitter/Facebook, I have been compiling a list of statistical people on Twitter. I couldn’t figure out an easy way to find statisticians in one go (which could be because I don’t have Twitter skills). 

So here is my very informal list of statisticians I found in a half hour of searching. I know I missed a ton of people; let me know who I missed so I can update!

@leekgroup - Jeff Leek (What, you thought I’d list someone else first?)

@rdpeng - Roger Peng

@rafalab - Rafael Irizarry

@storeylab - John Storey

@bcaffo - Brian Caffo

@sherrirose - Sherri Rose

@raphg - Raphael Gottardo

@airoldilab - Edo Airoldi

@stat110 - Joe Blitzstein

@tylermccormick - Tyler McCormick

@statpumpkin - Chris Volinsky

@fivethirtyeight - Nate Silver

@flowingdata - Nathan Yau

@kinggary - Gary King

@StatModeling - Andrew Gelman

@AmstatNews - Amstat News

@hadleywickham - Hadley Wickham