It is very hard to evaluate people's productivity or work in any meaningful way. This problem is the source of:
Roger and I were just talking about this problem in the context of evaluating the impact of software as a faculty member and Roger suggested the problem is that:
Evaluating people requires real work and so people are always looking for shortcuts
To evaluate a person's work or their productivity requires three things:
These three fundamental things are at the heart of why it is so hard to get good evaluations of people and why peer review and other systems are under such fire. The main source of the problem is the conflict between 1 and 2. The group of people in any organization, or at any scale, that is truly world-class at a given topic, from software engineering to history, is small. It has to be by definition. This group of people inevitably has some reason to care about the success of the other people in that same group. Either they work with the other world-class people and want them to succeed, or they are competing with them, intentionally or unintentionally.
The conflict between being an expert and having no stake wouldn't be such a problem if it weren't for issue number 3: the time it takes to evaluate people. To truly get good evaluations, you need someone who isn't an expert in a field - and so has no stake - to take the time to become an expert and then evaluate the person or software. But this requires a huge amount of effort on the part of a reviewer, who has to become expert in a new field. Reviewing is often treated as the least important task in people's workflow - witness how little value we put on people acting as peer reviewers for journals, or the value people get for doing a good job on evaluations for promotion in companies - so it is no wonder people don't take the time to become experts.
I actually think that tenure review committees at forward-thinking places may be the best at this (Rafa said the same thing about NIH study sections). They at least attempt to get outside reviews from people who are unbiased about the work a faculty member is doing before they are promoted. This system, of course, has large and well-documented problems, but I think it is better than having a person's direct supervisor - who clearly has a stake - be the only person evaluating them. It is also better than relying only on quantifiable metrics like the number of papers and the impact factor of the corresponding journals. I also think that most senior faculty who evaluate people take the job very seriously, despite the only incentive being good citizenship.
Since real evaluation requires hard work and expertise, most of the time people are looking for a shortcut. These shortcuts typically take the form of quantifiable metrics. In the academic world these shortcuts are things like:
I think all of these things are associated with quality, but none of them defines quality. You could try to model the relationship, but it is very hard to come up with a universal definition for the outcome you are trying to model. In academics, some people have suggested that open review or post-publication review solves the problem. But this is only true for a very small subset of cases that violate rule number 2. The only papers that get serious post-publication review are those where people have an incentive for the paper to go one way or the other. This means that papers in Science will be post-pub reviewed far more often than equally important papers in discipline-specific journals - just because people care more about Science. This leaves the vast majority of papers unreviewed, as evidenced by the relatively modest number of papers reviewed on PubPeer or PubMed Commons.
I'm beginning to think that the only way to do evaluation well is to hire people whose only job is to evaluate something well. In other words, peer reviewers who are paid to review papers full time and are only measured by how often those papers are retracted or proved false. Or tenure reviewers who are paid exclusively to evaluate tenure cases and are measured by how well the post-tenure process goes for the people they evaluate and whether there is any measurable bias in their reviews.
The trouble with evaluating anything is that it is hard work and right now we aren't paying anyone to do it.
Yesterday I posted about atomizing statistical education into knowledge units. You can try out the first knowledge unit here: https://jtleek.typeform.com/to/jMPZQe. The early data is in and it is consistent with many of our hypotheses about the future of online education.
Editor's note: This idea is Brian's idea and based on conversations with him and Roger, but I just executed it.
The length of academic courses has traditionally ranged between a few days for a short course to a few months for a semester-long course. Lectures are typically either 30 minutes or one hour. Term and lecture lengths have been dictated by tradition and the relative inconvenience of coordinating schedules of the instructors and students for shorter periods of time. As classes have moved online, the barrier of inconvenience to varying the length of an academic course has been removed. Despite this flexibility, most academic online courses adhere to the traditional semester-long format. For example, the first massive online open courses were simply semester-long courses directly recorded and offered online.
Data collected from massive online open courses suggest that shrinking both the length of recorded lectures and the length of courses leads to higher student retention. These results line up with data on other online activities such as YouTube video watching or form completion, which also show that shorter activities lead to higher completion rates.
We have some of the earliest and most highly subscribed massive online open courses through the Coursera platform: Data Analysis, Computing for Data Analysis, and Mathematical Biostatistics Bootcamp. Our original courses were translated from courses we offered locally and were therefore closer to semester-long, with longer lectures ranging from 15 to 30 minutes. Based on feedback from our students and the data we observed about completion rates, we made the decision to break our courses down into smaller, one-month courses with no more than two hours of lecture material per week. Since then, we have enrolled more than a million students in our MOOCs.
The data suggest that the shorter you can make an academic unit online, the higher the completion percentage. The question then becomes “How short can you make an online course?” To answer this question requires a definition of a course. For our purposes we will define a course as an educational unit consisting of the following three components:
A typical university class delivers 36 hours of content knowledge (12 weeks x 3 hours/week), evaluates that knowledge with on the order of 10 homework assignments and 2 tests, and results in a certification equivalent to 3 university credits. With this definition, what is the smallest possible unit that satisfies all three components of a course? We will call this smallest possible unit one knowledge unit. The smallest knowledge unit that satisfies all three components is a course that:
An example of a knowledge unit appears here: https://jtleek.typeform.com/to/jMPZQe. The knowledge unit consists of a short (less than 2 minute) video and 3 quiz questions. When completed, the unit sends the completer an email verifying that the quiz has been completed. Just as an atom is the smallest unit of mass that defines a chemical element, the knowledge unit is the smallest unit of education that defines a course.
Shrinking the units down to this scale opens up some ideas about how you can connect them together into courses and credentials. I'll leave that for a future post.
Editor's note: This post was originally titled: Personalized medicine is primarily a population health intervention. It has been updated with the graph of odds ratios/betas from GWAS studies.
There has been a lot of discussion of personalized medicine, individualized health, and precision medicine in the news and in the medical research community, and President Obama just announced a brand new initiative in precision medicine. Despite this recent attention, it is clear that healthcare has always been personalized to some extent. For example, men are rarely pregnant and heart attacks occur more often among older patients. In these cases, easily collected variables such as sex and age can be used to predict health outcomes and therefore used to "personalize" healthcare for those individuals.
So why the recent excitement around personalized medicine? The reason is that it is increasingly cheap and easy to collect more precise measurements about patients that might be able to predict their health outcomes. An example that has recently been in the news is the measurement of mutations in the BRCA genes. Angelina Jolie made the decision to undergo a prophylactic double mastectomy based on her family history of breast cancer and measurements of mutations in her BRCA genes. Based on these measurements, previous studies had suggested she might have a lifetime risk as high as 80% of developing breast cancer.
This kind of scenario will become increasingly common as newer and more accurate genomic screening and predictive tests are used in medical practice. When I read these stories there are two points I think of that sometimes get obscured by the obviously fraught emotional, physical, and economic considerations involved with making decisions on the basis of new measurement technologies:
The first point bears serious consideration in light of President Obama's new proposal. We have already collected a massive amount of genetic data about a large number of common diseases. In almost all cases, the amount of predictive information we can glean from genetic studies is modest. One paper pointed this issue out in a rather snarky way by comparing two approaches to predicting people's heights: (1) averaging their parents' heights, an approach from the Victorian era, and (2) combining the latest information on the best genetic markers at the time. It turns out all the genetic information we gathered isn't as good as averaging parents' heights. Another way to see this is to download data on all genetic variants associated with disease from the GWAS catalog that have a P-value less than 1 x 10^-8. If you do that and look at the distribution of effect sizes, you see that 95% have an odds ratio or beta coefficient less than about 4. Here is a histogram of the effect sizes:
This means that nearly all identified genetic effects are small. The ones that are really large (effect size greater than 100) are not for common disease outcomes, they are for Birdshot chorioretinopathy and hippocampal volume. You can really see this if you look at the bulk of the distribution of effect sizes, which are mostly less than 2 by zooming the plot on the x-axis:
These effect sizes translate into very limited predictive capacity for most identified genetic biomarkers. The implication is that personalized medicine, at least for common diseases, is highly likely to be inaccurate for any individual person. But if we can take advantage of the population-level improvements in health from precision medicine by increasing risk literacy, improving our use of uncertain markers, and understanding that precision medicine isn't precise for any one person, it could be a really big deal.
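The filter-and-summarize exercise described above can be sketched in a few lines. This is a minimal illustration, not the original analysis: the column names ("P-VALUE", "OR or BETA") are assumptions about the GWAS catalog export, and the tiny table below is made up to stand in for the real download.

```python
# Minimal sketch of the GWAS-catalog exercise: keep genome-wide
# significant hits, then look at the distribution of effect sizes.
# Column names are assumptions about the catalog export; the small
# table below is illustrative, not real catalog data.
import pandas as pd

def significant_effect_sizes(df, p_cutoff=1e-8):
    """Keep hits below the significance cutoff and return their effect sizes."""
    hits = df[df["P-VALUE"] < p_cutoff]
    return hits["OR or BETA"].dropna()

# Made-up stand-in for the downloaded catalog:
catalog = pd.DataFrame({
    "P-VALUE":    [1e-12, 3e-9, 2e-6, 5e-10],
    "OR or BETA": [1.2,   2.5,  8.0,  1.1],
})

effects = significant_effect_sizes(catalog)
print(effects.tolist())        # the 2e-6 row fails the cutoff and is dropped
print((effects < 4).mean())    # fraction of significant hits with effect size < 4
# effects.plot.hist() would then draw the histogram of effect sizes
```

On the real catalog download, the last line before the comment is the "95% less than about 4" calculation from the post.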
People often talk about academic superstars as people who have written highly cited papers. Some of that has to do with people's genius, or ability, or whatever. But one factor that I think sometimes gets lost is luck and timing. So I wrote a little script to get the first 30 papers that appear when you search Google Scholar for the terms:
Google Scholar sorts by relevance, but that relevance is driven to a large degree by citations. For example, look at the first 10 papers you get when searching for false discovery rate.
People who work in this area will recognize that many of these papers are the most important/most cited in the field.
Now we can make a plot that shows for each term when these 30 highest ranked papers appear. There are some missing values, because of the way the data are scraped, but this plot gives you some idea of when the most cited papers on these topics were published:
You can see from the plot that the median publication year of the top 30 hits for "empirical processes" was 1990 and for "RNA-seq statistics" was 2010. The medians for the other topics were:
I think this pretty much matches up with the intuition most people have about the relative timing of fields, with a few exceptions (GEE in particular seems a bit late). There are a bunch of reasons this analysis isn't perfect, but it does suggest that luck and timing in choosing a problem can play a major role in the "success" of academic work as measured by citations. It also suggests a reason for success in science other than individual brilliance. Given the potentially negative consequences the expectation of brilliance has on certain subgroups, it is important to recognize the importance of timing and luck. The median most cited "false discovery rate" paper was published in 2002, but almost none of the top 30 hits were published after about 2008.
The code for my analysis is here. It is super hacky so have mercy.
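The summary step of the analysis, once the publication years are scraped, is just a median per search term. Here is a minimal sketch, not the actual hacky script; the year lists are made up for illustration, chosen so their medians match the values quoted above.

```python
# Minimal sketch of the median-year summary: for each search term,
# take the median publication year of its top Google Scholar hits.
# The year lists below are illustrative stand-ins for the scraped
# data (chosen so the medians match the post), not real results.
from statistics import median

top_hit_years = {
    "empirical processes": [1978, 1990, 1996],
    "false discovery rate": [1995, 2002, 2010],
    "RNA-seq statistics":   [2008, 2010, 2013],
}

median_years = {term: median(years) for term, years in top_hit_years.items()}

# Print the fields from oldest to newest median publication year
for term, year in sorted(median_years.items(), key=lambda kv: kv[1]):
    print(f"{term}: {year}")
```

With the real scraped data there would also be missing values to drop before taking the median.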
I just saw a pretty wild statistic on Twitter that less than 60% of university news releases link to the papers they are describing.
— Justin Wolfers (@JustinWolfers) January 15, 2015
Before you believe anything you read about science in the news, you need to go and find the original article. When the article isn't linked in the press release, sometimes you need to do a bit of sleuthing. Here is an example of how I do it for a news article. In general the press-release approach is very similar, but you skip the first step I describe below.
Here is the news article (link):
Step 1: Look for a link to the article
Usually it will be linked near the top or the bottom of the article. In this case, the article links to the press release about the paper. This is not the original research article. If you don't get to a scientific journal you aren't finished. In this case, the press release actually gives the full title of the article, but that will happen less than 60% of the time according to the statistic above.
Step 2: Look for names of the authors, scientific key words and journal name if available
And some key words:
Step 3: Use Google Scholar
You could just google those words and sometimes you get the real paper, but often you just end up back at the press release/news article. So instead the best way to find the article is to go to Google Scholar then click on the little triangle next to the search box.
Fill in information while you can. Fill in the same year as the press release, information about the journal, university and key words.
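If you do this kind of sleuthing often, the same advanced search can be scripted. Here is a minimal sketch that builds a Google Scholar search URL from the details pulled out of a press release; the query-string parameter names (as_q, as_sauthors, as_ylo, as_yhi) are assumptions about Scholar's search form rather than a documented API, and the author name in the example is hypothetical.

```python
# Minimal sketch: build a Google Scholar advanced-search URL from
# press-release details. The parameter names (as_q, as_sauthors,
# as_ylo, as_yhi) are assumptions about the search form, not a
# documented API.
from urllib.parse import urlencode

def scholar_search_url(keywords, author=None, year=None):
    params = {"as_q": keywords}            # key words from step 2
    if author:
        params["as_sauthors"] = author     # author name, if you found one
    if year:
        params["as_ylo"] = year            # restrict to the press-release year
        params["as_yhi"] = year
    return "https://scholar.google.com/scholar?" + urlencode(params)

# "Smith" is a hypothetical author name for illustration
url = scholar_search_url("BRCA breast cancer risk", author="Smith", year=2015)
print(url)
```

This just reproduces the form fill-in; you still click through the results to find the journal article itself.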
Step 4: Victory
Often this will come up with the article you are looking for:
Unfortunately, the article may be paywalled, so if you don't work at a university or institute with a subscription, you can always tweet the article name with the hashtag #icanhazpdf and your contact info. Then you just have to hope that someone will send it to you (they often do).
This weekend was one of those hardcore parenting weekends that any parent of little kids will understand. We were up and actively taking care of kids for a huge fraction of the weekend. (Un)fortunately I was wearing my Fitbit, so I can quantify exactly how little we were sleeping over the weekend.
Here is Saturday:
There you can see that I rocked about midnight-4am without running around chasing a kid or bouncing one to sleep. But Sunday was the real winner:
Check that out. I was totally asleep from like 4am-6am there. Nice.
Stay tuned for much more from my Fitbit data over the next few weeks.
In my last Sunday Links roundup I mentioned we were going to be really close to 1 million page views this year. Chris V. tried to rally the troops:
— Chris Volinsky (@statpumpkin) December 22, 2014
but alas we are probably not going to make it (unless by some miracle one of our posts goes viral in the next 12 hours):
Stay tuned for a bunch of cool new stuff from Simply Stats in 2015, including a new podcasting idea, more interviews, another unconference, and a new plotting theme!