17 May

## When does replication reveal fraud?
Here's a little thought experiment for your weekend pleasure. Consider the following:

Joe Scientist decides to conduct a study (call it Study A) to test the hypothesis that a parameter D > 0 vs. the null hypothesis that D = 0. He designs a study, collects some data, conducts an appropriate statistical analysis and concludes that D > 0. This result is published in the Journal of Awesome Results along with all the details of how the study was done.

Jane Scientist finds Joe's study very interesting and tries to replicate his findings. She conducts a study (call it Study B) that is similar to Study A but completely independent of it (and does not communicate with Joe). In her analysis she does not find strong evidence that D > 0 and concludes that she cannot rule out the possibility that D = 0. She publishes her findings in the Journal of Null Results along with all the details.

From these two studies, which of the following conclusions can we make?

1. Study A is obviously a fraud. If the truth were that D > 0, then Jane should have concluded that D > 0 in her independent replication.
2. Study B is obviously a fraud. If Study A were conducted properly, then Jane should have reached the same conclusion.
3. Neither Study A nor Study B was a fraud, but the result for Study A was a Type I error, i.e. a false positive.
4. Neither Study A nor Study B was a fraud, but the result for Study B was a Type II error, i.e. a false negative.

I realize that there are a number of subtle details concerning why things might have turned out this way, but I've purposely left them out. My question is: based on the information you actually have about the two studies, which would you consider the most likely case? And what further information would you like to know beyond what was given here?
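One way to get intuition for the question is a quick simulation. The sketch below (the effect size 0.5, sample size 25, unit variance, and one-sided 5% test are all illustrative assumptions, not numbers from the post) estimates how often the observed pattern — Study A rejects, Study B doesn't — occurs when D is truly 0 versus truly positive:

```python
import numpy as np

rng = np.random.default_rng(42)
z_crit = 1.645                 # one-sided 5% critical value
n, n_sim = 25, 100_000

def rejects(d_true):
    """Simulate n_sim studies of n unit-variance observations; return
    a boolean array saying whether each study rejected D = 0."""
    x = rng.normal(d_true, 1.0, size=(n_sim, n))
    return x.mean(axis=1) * np.sqrt(n) > z_crit

# If D is really 0, the pattern (A rejects, B doesn't) needs a Type I error in A.
p_null = np.mean(rejects(0.0) & ~rejects(0.0))

# If D is really 0.5, the pattern needs a Type II error in B.
p_alt = np.mean(rejects(0.5) & ~rejects(0.5))

print(f"P(A rejects, B doesn't | D = 0):   {p_null:.3f}")
print(f"P(A rejects, B doesn't | D = 0.5): {p_alt:.3f}")
```

Under these particular assumptions the pattern is a fairly routine outcome in both worlds, which is why the answer hinges on information the post deliberately withholds.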

• Eric Yount

I'd say the _most_ likely, given a standard alpha=.05 and beta=.2, would be 4. I'd immediately rule out fraud unless there was more evidence than given above. That leaves #3 with a 1/20 chance (alpha) of being the case if D is truly 0, or #4 with a 1/5 chance (beta) if D is truly different from 0.

• Matthew

I agree with Eric. We conventionally allow a smaller Type I error rate than Type II error rate, so my first instinct would be that Study B committed a Type II error.

• Ken

The Type II error rate is conditional on a given effect size and sample size, so you don't actually know this. The power to find any reasonable difference may not be much above 0.05.

Given the way most science works, I would expect that Study A was a Type I error. What I would want to know is the power of Study B given an effect size consistent with Study A; then you could determine whether Study B was likely a Type II error. You would need a lot more evidence to accuse anyone of fraud.
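Ken's point can be made concrete with a small power calculation. The numbers below are illustrative assumptions (unit variance, one-sided 5% z-test, and a hypothetical Study A estimate of d = 0.5), not anything stated in the post; the point is only that Study B's power swings wildly with its sample size:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_one_sided(d, n, sigma=1.0, z_crit=1.645):
    """Power of a one-sided z-test of D > 0 when the true mean is d."""
    se = sigma / sqrt(n)
    return 1.0 - phi(z_crit - d / se)

# Suppose Study A estimated d = 0.5 (an assumption). Study B's power
# then depends heavily on how many observations Jane collected:
for n in (10, 25, 100):
    print(f"n = {n:3d}: power = {power_one_sided(0.5, n):.2f}")
```

With a small sample, Study B's failure to reject tells you very little; with a large one, it weighs much more heavily against Study A.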

• Dan Scharfstein

why don't you put this on the qualifying exam?

• Roger Peng

Dan, I'm not in charge of that anymore, remember?

• Daniel Scharfstein

Nope.

My initial guess is choice 3. I think it is more likely that the first study produced a false positive. Maybe there were subtle differences between the two experiments that Jane can use to shed light on unspoken assumptions made by Joe.

• Henrik Kristensen

The description doesn't mention anything about assumptions, methods, etc. It's hard to judge the end results if you don't know what they're based on. That aside, graphs of the results (e.g., showing actual distributions) might help figure out what was really going on here?

A thought experiment helps a bit. Assume three things: (1) the p-value in Study A was exactly .05; (2) the estimated value of D happened to be exactly accurate--that is, the estimate produced in Study A was the exact value of D in real life; and (3) samples of D are normally distributed--that is, there's no bias in how it's measured, only error. If that's the case, then half of the replication attempts would fail to reject the null of D=0 at the .05 significance level.

A single replication failure is not evidence of fraud or even of Type I Error.
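Henrik's 50% figure checks out in simulation. The sketch below assumes a one-sided z-test with unit variance and n = 25 (illustrative choices); the key step is placing the true D exactly at the critical value, which is what "p-value exactly .05 and the estimate exactly accurate" implies:

```python
import random
from math import sqrt

random.seed(1)
n, n_sim = 25, 50_000
d_true = 1.645 / sqrt(n)   # true D sits exactly at the one-sided 5% cutoff

successes = 0
for _ in range(n_sim):
    # One replication study: n unit-variance observations around d_true.
    xbar = sum(random.gauss(d_true, 1.0) for _ in range(n)) / n
    if xbar * sqrt(n) > 1.645:   # same test Jane would run
        successes += 1

rate = successes / n_sim
print(f"replication rejection rate: {rate:.3f}")
```

The rejection rate hovers around one half: the replication's test statistic is centered exactly on the critical value, so it lands on either side with equal probability.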

• Howard Johnson

I would look first at a power estimate for Study A and the related estimate of the variance explained. Would that not provide some evidence about the probability of #3? Beyond that, I would expect to find something in the methodology; there are so many possibilities to be found there. My favorite cynical reference on methodology is Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244.

• Chuck Rohde

You first need to define what evidence is!

• Steven Salzberg

My conclusion is that Study A appears in Science or Nature. Study B appears in ArXiv and after 3 rejections finally gets published a year later in PLoS ONE. Investigators learn once again that there's no point in trying to publish negative results.

Ergo: for every negative result that gets published, another 5-10 negative results get tossed in the bin. Thus a negative result carries more weight.

• Desmond

I'd first like to know the group sizes and effect sizes of both studies before drawing any conclusion as to the big F.
• Keith O’Rourke

Maybe I am missing something here, but:

1. Differences in statistical significance aren't themselves significant (Andrew Gelman's phrase).

2. From the Wikipedia entry on meta-analysis (http://en.wikipedia.org/wiki/Meta-analysis):

"In statistics, a meta-analysis refers to methods focused on _contrasting_ and combining results from different studies, in the hope of identifying patterns among study results, _sources of disagreement_ among those results, or other interesting relationships that may come to light in the context of multiple studies."

So the question, to me, sounds like asking for the sound of one hand clapping...
