Instead of research on reproducibility, just do reproducible research

Right now reproducibility, replicability, false positive rates, biases in methods, and other problems with science are hot topics. As I mentioned in a previous post, pointing out a flaw in a scientific study is far easier to do correctly than generating a new scientific study. Some folks have noticed that there is currently a huge market for papers pointing out how science is flawed. The combination of the relative ease of pointing out flaws and the huge payout for writing these papers is helping to generate the hype around the "reproducibility crisis".

I gave a talk a little while ago at an NAS workshop where I stated that all the tools for reproducible research exist (the caveat being really large analyses - although that is changing as well). To make a paper completely reproducible, open, and available for post publication review you can use the following approach with no new tools/frameworks needed.

  1. Use GitHub for version control.
  2. Use rmarkdown or IPython notebooks for your analysis code.
  3. When your paper is done, post it to arXiv or bioRxiv.
  4. Post your data to an appropriate repository like SRA or a general purpose site like figshare.
  5. Send any software you develop to a controlled repository like CRAN or Bioconductor.
  6. Participate in the post-publication discussion on Twitter and on a blog.

This is also true of open science, open data sharing, reproducibility, replicability, post-publication peer review and all the other issues forming the "reproducibility crisis". A lot of attention and heat has focused on the "crisis" or on folks who make a point of taking a stand on reproducibility, open science, or post-publication review. But in the background, outside of the hype, there is a large group of people who are quietly executing solid, open, reproducible science.

I wish that this group would get more attention so I decided to point out a few of them. Next time somebody asks me about the research on reproducibility or open science I'll just point them here and tell them to just follow the lead of people doing it.

This list was made completely haphazardly as all my lists are, but just to indicate there are a ton of people out there doing this. One thing that is clear too is that grad students and postdocs are adopting the approach I described at a very high rate.

Moreover, there are people who have been doing parts of this for a long time (like the physics or biostatistics communities with preprints, or the people who have used Sweave for years). I purposely left people like Titus and Ethan off the list, even though they have gone all in and even post their grants online. I did this because they are very loud advocates of open science, and I wanted to highlight quieter contributors and point out that while there is a lot of noise going on over in one corner, many people are quietly doing really good science in another.


By opposing tracking well-meaning educators are hurting disadvantaged kids

An unfortunate fact about the US K-12 system is that the education gap between poor and rich is growing. One manifestation of this trend is that we rarely see US kids from disadvantaged backgrounds become tenure track faculty, especially in the STEM fields. In my experience, the ones that do make it, when asked how they overcame the suboptimal math education their school district provided, often respond "I was tracked" or "I went to a magnet school". Magnet schools filter students with admission tests and then teach at a higher level than an average school, so essentially the entire school is an advanced track.

Twenty years of classroom instruction experience has taught me that classes with diverse academic abilities present one of the most difficult teaching challenges. Typically, one is forced to focus on only a sub-group of students, usually the second quartile. As a consequence the lower and higher quartiles are not properly served. At the university level, we minimize this problem by offering different levels: remedial math versus math for engineers, probability for the Masters program versus probability for PhD students, co-ed intramural sports versus the varsity basketball team, intro to World Music versus a spot in the orchestra, etc. In K-12, tracking seems like the obvious solution to teaching to an array of student levels.

Unfortunately, there has been a recent trend to move away from tracking, and several school districts now forbid it. The motivation seems to be a series of observational studies that note that "low-track classes tend to be primarily composed of low-income students, usually minorities, while upper-track classes are usually dominated by students from socioeconomically successful groups." Tracking opponents infer that this unfortunate reality is due to bias (conscious or unconscious) in the informal referrals that are typically used to decide which students are advanced. However, this is a critique of the referral system, not of tracking itself. A simple fix is to administer an objective test or use the percentiles from state assessment tests. In fact, such exams have been developed and implemented. A recent study (summarized here) examined the data from a district that for a period of time implemented an objective assessment and found that

[t]he number of Hispanic students [in the advanced track increased] by 130 percent and the number of black students by 80 percent.

Unfortunately, instead of maintaining the placement criteria, which benefited underrepresented minorities without relaxing standards, these school districts reverted to the old, flawed system due to budget cuts.

Another argument against tracking is that students benefit more from being in classes with higher-achieving peers than from being in a class with students with similar subject mastery and a teacher focused on their level. However, a recent randomized controlled trial (the only one of which I am aware) finds that tracking helps all students:

We find that tracking students by prior achievement raised scores for all students, even those assigned to lower achieving peers. On average, after 18 months, test scores were 0.14 standard deviations higher in tracking schools than in non-tracking schools (0.18 standard deviations higher after controlling for baseline scores and other control variables). After controlling for the baseline scores, students in the top half of the pre-assignment distribution gained 0.19 standard deviations, and those in the bottom half gained 0.16 standard deviations. Students in all quantiles benefited from tracking. 

I believe that without tracking, the achievement gap between disadvantaged children and their affluent peers will continue to widen since involved parents will seek alternative educational opportunities, including private schools or subject specific extracurricular acceleration programs. With limited or no access to advanced classes in the public system, disadvantaged students will be less prepared to enter the very competitive STEM fields. Note that competition comes not only from within the US, but from other countries including many with educational systems that track.

To illustrate the extreme gap, the following exercises are from a 7th grade public school math class (in a high performing school district):

[Images: two screenshots of exercises from the 7th grade public school class]

There is no tracking, so all students must work on these problems. Meanwhile, in a 7th grade advanced, private math class, that same student can be working on problems like these:

[Image: screenshot of problems from the advanced private math class]

Let me stress that there is nothing wrong with the first example if it is the appropriate level for the student. However, a student who can work at the level of the second example should be provided with the opportunity to do so regardless of their family's ability to pay. Poorer kids in districts which do not offer advanced classes will not only be less equipped to compete with their richer peers, but many of the academically advanced ones may, I suspect, dismiss academics due to lack of challenge and boredom.

Educators need to consider evidence when making decisions regarding policy. Tracking can be applied unfairly, but that aspect can be remedied. Eliminating tracking altogether takes away a crucial tool for disadvantaged students to move into the STEM fields and, according to the empirical evidence, hurts all students.


Not So Standard Deviations: Episode 5 - IRL Roger is Totally With It

I just posted Episode 5 of Not So Standard Deviations so check your feeds! Sorry for the long delay since the last episode but we got a bit tripped up by the Thanksgiving holiday.

In this episode, Hilary and I open up the mailbag and go through some of the feedback we've gotten on the previous episodes. The rest of the time is spent talking about the importance of reproducibility in data analysis both in academic research and in industry settings.

If you haven't already, you can subscribe to the podcast through iTunes. Or you can use the SoundCloud RSS feed directly.


Download the audio file for this episode.



Thinking like a statistician: the importance of investigator-initiated grants

A substantial amount of scientific research is funded by investigator-initiated grants. A researcher has an idea, writes it up and sends a proposal to a funding agency. The agency then enlists a group of peers to evaluate competing proposals. Grants are awarded to the most highly ranked ideas. The percent awarded depends on how much funding gets allocated to these types of proposals. At the NIH, the largest funder of these types of grants, the success rate recently fell below 20% from a high above 35%. Part of the reason these percentages have fallen is to make room for large collaborative projects. Large projects seem to be increasing, and not just at the NIH. In Europe, for example, the Human Brain Project has an estimated cost of over US$1 billion over 10 years. To put this in perspective, 1 billion dollars can fund over 500 NIH R01s. The R01 is the NIH mechanism most appropriate for investigator-initiated proposals.

The merits of big science have been widely debated (for example here and here), and most agree that some big projects have been successful. However, in this post I present a statistical argument highlighting the importance of investigator-initiated awards. The idea is summarized in the graph below.


The two panels above represent two different funding strategies: fund many R01s (left) or reduce R01s to fund several large projects (right). The grey crosses represent investigators and the gold dots represent potential paradigm-shifting geniuses. Location on the Cartesian plane represents research area, with the blue circles denoting areas that are ripe for an important scientific advance. The largest scientific contributions occur when a gold dot falls in a blue circle. Large contributions also result from the accumulation of incremental work produced by grey crosses in the blue circles.

Although not perfect, the peer review approach implemented by most funding agencies appears to work quite well at weeding out unproductive researchers and unpromising ideas. Agencies also seem to do well at spreading funds across general areas. For example, NIH spreads funds across diseases and public health challenges (for example cancer, mental health, and heart and lung disease) as well as general medicine, genomics and information. However, precisely predicting who will be a gold dot or what specific area will be a blue circle seems like an impossible endeavor. Increasing the number of tested ideas and researchers therefore increases our chance of success. When a funding agency decides to invest big in a specific area (green dollar signs), it is predicting the location of a blue circle. As funding flows into these areas, so do investigators (note the clusters). The total number of funded lead investigators also drops. The risk here is that if the dollar sign lands far from a blue circle, we pull researchers away from potentially fruitful areas. If after 10 years of funding, the Human Brain Project doesn't "achieve a multi-level, integrated understanding of brain structure and function" we will have missed out on trying out 500 ideas by hundreds of different investigators. With a sample size this large, we expect at least a handful of these attempts to result in the type of impactful advance that justifies funding scientific research.

The simulation presented here (code below) is clearly an oversimplification, but it does depict the statistical reason why I favor investigator-initiated grants: funding many of them is key for the continued success of scientific research.

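## Packages: rafalib provides nullplot(), MASS provides mvrnorm()
library(rafalib)
library(MASS)
## Two panels side by side; gold is the color used for geniuses
## (the plotting calls in this listing are reconstructed from the
## figure description in the text)
par(mfrow=c(1,2))
thecol = "gold3"
## Start with the many R01s model
## generate location of 2,000 investigators
N = 2000
x = runif(N)
y = runif(N)
## 1% are geniuses (pch 16, a dot); the rest are crosses (pch 4)
Ng = N*0.01
g = rep(4,N);g[1:Ng]=16
## generate location of important areas of research
M0 = 10
x0 = runif(M0)
y0 = runif(M0)
r0 = rep(0.03,M0)
## Make the plot: blue circles first, then investigators on top
nullplot(xaxt="n",yaxt="n",main="Many R01s")
symbols(x0,y0,circles=r0,fg="black",bg="blue",add=TRUE,inches=FALSE)
points(x,y,pch=g,col=ifelse(g==4,"grey",thecol))
### Generate the location of 5 big projects
M1 = 5
x1 = runif(M1)
y1 = runif(M1)
## make initial plot
nullplot(xaxt="n",yaxt="n",main="A Few Big Projects")
symbols(x0,y0,circles=r0,fg="black",bg="blue",add=TRUE,inches=FALSE)
### Generate location of investigators attracted
### to location of big projects. There are 1000 total
### investigators (200 per project)
Sigma = diag(2)*0.005
N1 = 200
Ng1 = round(N1*0.01)
## one symbol per clustered investigator (length N1, not N)
g1 = rep(4,N1);g1[1:Ng1]=16
for(i in 1:M1){
  xy = mvrnorm(N1,c(x1[i],y1[i]),Sigma)
  points(xy[,1],xy[,2],pch=g1,col=ifelse(g1==4,"grey",thecol))
}
### generate location of investigators that ignore big projects
### note now 500 instead of 2,000. Note overall total
## is also less because large projects result in fewer
## lead investigators
N = 500
x = runif(N)
y = runif(N)
Ng = N*0.01
g = rep(4,N);g[1:Ng]=16
points(x,y,pch=g,col=ifelse(g==4,"grey",thecol))
## mark the big projects with green dollar signs
points(x1,y1,pch="$",col="darkgreen",cex=2)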
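
As a rough numeric companion to the picture, the sketch below repeats the simulation many times and counts how many investigators land inside a blue circle under each strategy. It is only a sketch under stated assumptions: the function names (count_hits, simulate_once), the seed, and the 200 repetitions are hypothetical and not part of the code above. The clustered investigators are drawn with rnorm rather than mvrnorm, which gives the same distribution here only because the covariance matrix is diagonal.

```r
## Count investigators that fall inside at least one circle.
## (count_hits is a hypothetical helper, not from the original code.)
count_hits <- function(x, y, x0, y0, r0) {
  inside <- vapply(seq_along(x0), function(j) {
    (x - x0[j])^2 + (y - y0[j])^2 <= r0[j]^2
  }, logical(length(x)))
  inside <- matrix(inside, nrow = length(x))
  sum(rowSums(inside) > 0)
}

## One run of both funding strategies against the same blue circles.
## Defaults mirror the parameters used in the plotting code.
simulate_once <- function(N_many = 2000, M0 = 10, r = 0.03,
                          M1 = 5, N1 = 200, N_rest = 500, s2 = 0.005) {
  x0 <- runif(M0); y0 <- runif(M0); r0 <- rep(r, M0)

  ## Strategy 1: many R01s, investigators spread uniformly
  many <- count_hits(runif(N_many), runif(N_many), x0, y0, r0)

  ## Strategy 2: investigators cluster around M1 big projects
  cx <- runif(M1); cy <- runif(M1)
  xc <- rnorm(M1 * N1, mean = rep(cx, each = N1), sd = sqrt(s2))
  yc <- rnorm(M1 * N1, mean = rep(cy, each = N1), sd = sqrt(s2))
  ## plus the investigators that ignore the big projects
  xb <- c(xc, runif(N_rest))
  yb <- c(yc, runif(N_rest))
  big <- count_hits(xb, yb, x0, y0, r0)

  c(many = many, big = big)
}

set.seed(1)  # arbitrary seed, for repeatability only
res <- replicate(200, simulate_once())
rowMeans(res)        # average investigators inside a blue circle
apply(res, 1, sd)    # run-to-run variability of each strategy
```

The numbers this prints depend entirely on the assumed parameters (number of circles, radius, cluster spread), so treat it as a way to explore the argument rather than as evidence for it.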


A thanksgiving dplyr Rubik's cube puzzle for you

Nick Carchedi is back visiting from DataCamp and for fun we came up with a dplyr Rubik's cube puzzle. Here is how it works. To solve the puzzle you have to make a 4 x 3 data frame that spells Thanksgiving like this:

To solve the puzzle you need to pipe this data frame in 

and pipe out the Thanksgiving data frame using only the dplyr commands arrange, mutate, slice, filter and select. For advanced users you can try our slightly more complicated puzzle:

See if you can do it this fast. Post your solutions in the comments and Happy Thanksgiving!


20 years of Data Science: from Music to Genomics

I finally got around to reading David Donoho's 50 Years of Data Science paper.  I highly recommend it. The following quote seems to summarize the sentiment that motivated the paper, as well as why it has resonated among academic statisticians:

The statistics profession is caught at a confusing moment: the activities which preoccupied it over centuries are now in the limelight, but those activities are claimed to be bright shiny new, and carried out by (although not actually invented by) upstarts and strangers.

The reason we started this blog over four years ago was because, as Jeff wrote in his inaugural post, we were "fired up about the new era where data is abundant and statisticians are scientists". It was clear that many disciplines were becoming data-driven and that interest in data analysis was growing rapidly. We were further motivated because, despite this newfound interest in our work, academic statisticians were, in general, more interested in the development of context-free methods than in leveraging applied statistics to take leadership roles in data-driven projects. Meanwhile, great and highly visible applied statistics work was occurring in other fields such as astronomy, computational biology, computer science, political science and economics. So it was not completely surprising that some (bio)statistics departments were being left out of larger university-wide data science initiatives. Some of our posts exhorted academic departments to embrace larger numbers of applied statisticians:

[M]any of the giants of our discipline were very much interested in solving specific problems in genetics, agriculture, and the social sciences. In fact, many of today’s most widely-applied methods were originally inspired by insights gained by answering very specific scientific questions. I worry that the balance between application and theory has shifted too far away from applications. An unfortunate consequence is that our flagship journals, including our applied journals, are publishing too many methods seeking to solve many problems but actually solving none.  By shifting some of our efforts to solving specific problems we will get closer to the essence of modern problems and will actually inspire more successful generalizable methods.

Donoho points out that John Tukey had a similar preoccupation 50 years ago:

For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. ... All in all I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data

Many applied statisticians do the things Tukey mentions above. In the blog we have encouraged them to teach the gory details of what they do, along with the general methodology we currently teach. With all this in mind, several months ago, when I was invited to speak at a department that was, at the time, deciphering its role in its university's data science initiative, I gave a talk titled 20 years of Data Science: from Music to Genomics. The goal was to explain why applied statistician is not considered synonymous with data scientist even when we focus on the same goal: extracting knowledge or insights from data.

The first example in the talk related to how academic applied statisticians tend to emphasize the parts that will be most appreciated by our math stat colleagues and ignore the aspects that are today being heralded as the linchpins of data science. I used my thesis papers as examples. My dissertation work was about finding meaningful parametrization of musical sound signals that my collaborators could use to manipulate sounds to create new ones. To do this, I prepared a database of sounds, wrote code to extract and import the digital representations from CDs into S-plus (yes, I'm that old), visualized the data to motivate models, wrote code in C (or was it Fortran?) to make the analysis go faster, and tested these models with residual analysis by ear (you can listen to them here). None of these data science aspects were highlighted in the papers I wrote about my thesis. Here is a screen shot from this paper:

[Image: screen shot of a page from the paper]

I am actually glad I wrote out and published all the technical details of this work.  It was great training. My point was simply that based on the focus of these papers, this work would not be considered data science.

The rest of my talk described some of the work I did once I transitioned into applications in Biology. I was fortunate to have a department chair that appreciated lead-author papers in the subject matter journals as much as statistical methodology papers. This opened the door for me to become a full fledged applied statistician/data scientist. In the talk I described how developing software packages, planning the gathering of data to aid method development, developing web tools to assess data analysis techniques in the wild, and facilitating data-driven discovery in biology has been very gratifying and, simultaneously, helped my career. However, at some point, early in my career, senior members of my department encouraged me to write and submit a methods paper to a statistical journal to go along with every paper I sent to the subject matter journals. Although I do write methods papers when I think the ideas add to the statistical literature, I did not follow the advice to simply write papers for the sake of publishing in statistics journals. Note that if (bio)statistics departments require applied statisticians to do this, then it becomes harder to have an impact as data scientists. Departments that are not producing widely used methodology or successful and visible applied statistics projects (or both), should not be surprised when they are not included in data science initiatives. So, applied statistician, read that Tukey quote again, listen to President Obama, and go do some great data science.




Some Links Related to Randomized Controlled Trials for Policymaking

In response to my previous post, Avi Feller sent me these links related to efforts promoting the use of RCTs  and evidence-based approaches for policymaking:

  •  The theme of this year's just-concluded APPAM conference (the national public policy research organization) was "evidence-based policymaking," with a headline panel on using experiments in policy (see here and here).
  • Jeff Liebman has written extensively about the use of randomized experiments in policy (see here for a recent interview).
  • The White House now has an entire office devoted to running randomized trials to improve government performance (the so-called "nudge unit"). Check out their recent annual report here.
  • JPAL North America just launched a major initiative to help state and local governments run randomized trials (see here).

Given the history of medicine, why are randomized trials not used for social policy?

Policy changes can have substantial societal effects. For example, clean water and  hygiene policies have saved millions, if not billions, of lives. But effects are not always positive. For example, prohibition, or the "noble experiment", boosted organized crime, slowed economic growth and increased deaths caused by tainted liquor. Good intentions do not guarantee desirable outcomes.

The medical establishment is well aware of the danger of basing decisions on the good intentions of doctors or biomedical researchers. For this reason, randomized controlled trials (RCTs) are the standard approach to determining if a new treatment is safe and effective. In these trials an objective assessment is achieved by assigning patients at random to a treatment or control group, and then comparing the outcomes in these two groups. Probability calculations are used to summarize the evidence in favor or against the new treatment. Modern RCTs are considered one of the greatest medical advances of the 20th century.
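
To make the mechanics concrete, here is a minimal sketch of the two steps just described: random assignment, then a probability calculation that summarizes the evidence. Everything in it is assumed for illustration: the patients are simulated, the effect size of 0.5 is made up, and a permutation test is just one simple choice of probability calculation, not the method any particular trial uses.

```r
set.seed(2015)  # arbitrary seed, for repeatability only
n <- 100        # hypothetical number of patients

## Random assignment: shuffle a balanced vector of labels
arm <- sample(rep(c("control", "treatment"), each = n / 2))

## Simulated outcomes with an assumed (made-up) treatment effect of 0.5
outcome <- rnorm(n) + 0.5 * (arm == "treatment")

## Compare the outcomes in the two groups
obs_diff <- mean(outcome[arm == "treatment"]) -
  mean(outcome[arm == "control"])

## Permutation test: reshuffle the labels to see how large a
## difference chance alone would produce
perm_diff <- replicate(2000, {
  shuffled <- sample(arm)
  mean(outcome[shuffled == "treatment"]) -
    mean(outcome[shuffled == "control"])
})

## Two-sided p-value: share of reshuffles at least as extreme
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
p_value
```

Because the labels are assigned at random, the only systematic difference between the two groups is the treatment itself, which is what makes the comparison objective.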

Despite their unprecedented success in medicine, RCTs have not been fully adopted outside of scientific fields. In this post, Ben Goldacre advocates for politicians to learn from scientists and base policy decisions on RCTs. He provides several examples in which results contradicted conventional wisdom. In this TED talk Esther Duflo convincingly argues that RCTs should be used to determine what interventions are best at fighting poverty. Although some RCTs are being conducted, they are still rare and oftentimes ignored by policymakers. For example, despite at least two RCTs finding that universal pre-K programs are not effective, policymakers in New York are implementing a $400 million a year program. Supporters of this noble endeavor defend their decision by pointing to observational studies and "expert" opinion that support their preconceived views. Before the 1950s, indifference to RCTs was common among medical doctors as well, and the outcomes were at times devastating.

Today, when we compare conclusions from non-RCT studies to RCTs, we note the unintended strong effects that preconceived notions can have. The first chapter in this book provides a summary and some examples. One example comes from a review of 51 studies on the effectiveness of the portacaval shunt. Here is a table summarizing the conclusions of the 51 studies:

Design                        Marked improvement   Moderate improvement   None
No control                    24                   7                      1
Controls, but not randomized  10                   3                      2
Randomized                    0                    1                      3

Compare the first and last rows to appreciate the importance of the randomized trials.
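
The comparison is easier to see as proportions within each design. The sketch below simply re-enters the counts from the table above; the object names (shunt, shunt_prop) are hypothetical.

```r
## Counts from the portacaval shunt table, one row per study design
shunt <- matrix(c(24, 7, 1,
                  10, 3, 2,
                   0, 1, 3),
                nrow = 3, byrow = TRUE,
                dimnames = list(
                  Design = c("No control",
                             "Controls, but not randomized",
                             "Randomized"),
                  Conclusion = c("Marked", "Moderate", "None")))

## Share of studies reaching each conclusion, within each design
shunt_prop <- prop.table(shunt, margin = 1)
round(shunt_prop, 2)
```

Of the 32 studies with no control, 24 (75%) reported marked improvement, while none of the 4 randomized studies did.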

A particularly troubling example relates to the studies on Diethylstilbestrol (DES). DES is a drug that was used to prevent spontaneous abortions. Five out of five studies using historical controls found the drug to be effective, yet all three randomized trials found the opposite. Before the randomized trials convinced doctors to stop using this drug, it was given to thousands of women. This turned out to be a tragedy, as later studies showed DES has terrible side effects. Although the doctors had the best intentions in mind, ignoring the randomized trials resulted in unintended consequences.

Well-meaning experts are regularly implementing policies without really testing their effects. Although randomized trials are not always possible, it seems that they are rarely considered, in particular when the intentions are noble. Just as well-meaning turn-of-the-20th-century doctors, convinced that they were doing good, put their patients at risk by providing ineffective treatments, well-intentioned policies may end up hurting society.

Update: A reader pointed me to these preprints which point out that the control group in one of the cited early education RCTs included children that receive care in a range of different settings, not just staying at home. This implies that the signal is attenuated if what we want to know is if the program is effective for children that would otherwise stay at home. In this preprint they use statistical methodology (principal stratification framework) to obtain separate estimates: the effect for children that would otherwise go to other center-based care and the effect for children that would otherwise stay at home. They find no effect for the former group but a significant effect for the latter. Note that in this analysis the effect being estimated is no longer based on groups assigned at random. Instead, model assumptions are used to infer the two effects. To avoid dependence on these assumptions we will have to perform an RCT with better defined controls. Also note that the RCT data facilitated the principal stratification framework analysis. I also want to restate what I've posted before, "I am not saying that observational studies are uninformative. If properly analyzed, observational data can be very valuable. For example, the data supporting smoking as a cause of lung cancer is all observational. Furthermore, there is an entire subfield within statistics (referred to as causal inference) that develops methodologies to deal with observational data. But unfortunately, observational data are commonly misinterpreted."


So you are getting crushed on the internet? The new normal for academics.

Roger and I were just talking about all the discussion around the Case and Deaton paper on death rates for middle-aged people. Andrew Gelman discussed it, as did many others. They noticed a potential bias in the analysis and did some re-analysis. Just yesterday an economist blogger wrote a piece about academics versus blogs and how many academics are taken by surprise when they see their paper being discussed so rapidly on the internet. Much of the debate comes down to the speed, tone, and ferocity of internet discussion of academic work, along with the fact that sometimes it isn't fully fleshed out.

I have been seeing this play out not just in the case of this specific paper, but many other times when folks have been confronted with blogs or the quick publication process of F1000Research. I think it is pretty scary for folks who aren't used to "internet speed" to see this play out and I thought it would be helpful to make a few points.

  1. Everyone is an internet scientist now. The internet has arrived as part of academics and if you publish a paper that is of interest (or if you are a Nobel prize winner, or if you dispute a claim, etc.) you will see discussion of that paper within a day or two on the blogs. This is now a fact of life.
  2. The internet loves a fight. The internet responds best to personal/angry blog posts or blog posts about controversial topics like p-values, errors, and bias. Almost certainly if someone writes a blog post about your work or an f1000 paper it will be about an error/bias/correction or something personal.
  3. Takedowns are easier than new research and happen faster. It is much, much easier to critique a paper than to design an experiment, collect data, figure out what question to ask, ask it quantitatively, analyze the data, and write it up. This doesn't mean the critique won't be good or right. It just means it will happen much, much faster than it took you to publish the paper, because it is easier to do. All it takes is noticing one little bug in the code or one error in the regression model. So be prepared for speed in the response.

In light of these three things, you have a few options for how to react if you write an interesting paper and people are discussing it, which they will certainly do (point 1), in a way that will likely make you uncomfortable (point 2), and faster than you'd expect (point 3). The first thing to keep in mind is that the internet wants you to "fight back" and wants to declare a "winner". Reading about amicable disagreements doesn't build audience. That is why there is reality TV. So there will be pressure for you to score points, be clever, be fast, and refute every point or be declared the loser. I have found from my own experience that this is what I feel like doing too. I think that resisting this urge is both (a) very, very hard and (b) the right thing to do. I find the best solution is to be proud of your work, but be humble, because no paper is perfect and that's OK. If you do the best you can, sensible people will acknowledge that.

I think these are the three ways to respond to rapid internet criticism of your work.

  • Option 1: Respond on internet time. This means if you publish a big paper that you think might be controversial you should block off a day or two to spend time on the internet responding. You should be ready to do new analysis quickly, be prepared to admit mistakes quickly if they exist, and be prepared to make it clear when they don't. You will need social media accounts and you should probably have a blog so you can post longer form responses. GitHub/Figshare accounts make it easier to quickly share quantitative/new analyses. Again, your goal is to avoid the personal and stick to facts, so I find that Twitter/Facebook are best for disseminating your longer form responses on blogs/GitHub/Figshare. If you are going to go this route you should try to respond to as many of the major criticisms as possible, but usually they cluster into one or two specific comments, which you can address all in one.
  • Option 2: Respond in academic time. You might have spent a year writing a paper only to have people respond to it essentially instantaneously. Sometimes they will have good points, but they will rarely have carefully thought out arguments given the internet-speed response (although remember point 3: good critiques can be faster than good papers). One approach is to collect all the feedback, ignore the pressure for an immediate response, and write a careful, scientific response which you can publish in a journal or in a fast outlet like F1000Research. I think this route can be the most scientific and productive if executed well. But this will be hard because people will treat that like "you didn't have a good answer so you didn't respond immediately". The internet wants a quick winner/loser and that is terrible for science. Even if you choose this route, though, you should make sure you have a way of publicizing your well thought out response through blogs, social media, etc. once it is done.
  • Option 3: Do not respond. This is what a lot of people do and I'm unsure if it is ok or not. Clearly internet facing commentary can have an impact on you/your work/how it is perceived for better or worse. So if you ignore it, you are ignoring those consequences. This may be ok, but depending on the severity of the criticism may be hard to deal with and it may mean that you have a lot of questions to answer later. Honestly, I think as time goes on if you write a big paper under a lot of scrutiny Option 3 is going to go away.

All of this only applies if you write a paper that a ton of people care about/is controversial. Many technical papers won't have this issue and if you keep your claims small, this also probably won't apply. But I thought it was useful to try to work out how to act under this "new normal".


Prediction Markets for Science: What Problem Do They Solve?

I've recently seen a bunch of press on this paper, which describes an experiment with developing a prediction market for scientific results. From FiveThirtyEight:

Although replication is essential for verifying results, the current scientific culture does little to encourage it in most fields. That’s a problem because it means that misleading scientific results, like those from the “shades of gray” study, could be common in the scientific literature. Indeed, a 2005 study claimed that most published research findings are false.


The researchers began by selecting some studies slated for replication in the Reproducibility Project: Psychology — a project that aimed to reproduce 100 studies published in three high-profile psychology journals in 2008. They then recruited psychology researchers to take part in two prediction markets. These are the same types of markets that people use to bet on who’s going to be president. In this case, though, researchers were betting on whether a study would replicate or not.

There are all kinds of prediction markets these days--for politics, general ideas--so having one for scientific ideas is not too controversial. But I'm not sure I see exactly what problem is solved by having a prediction market for science. In the paper, they claim that the market-based bets were better predictors than the general survey that was administered to the scientists. I'll admit that's an interesting result, but I'm not yet convinced.

First off, it's worth noting that this work comes out of the massive replication project conducted by the Center for Open Science, where I believe they have a fundamentally flawed definition of replication. So I'm not sure I can really agree with the idea of basing a prediction market on such a definition, but I'll let that go for now.

The purpose of most markets is some general notion of "price discovery". One popular market is the stock market and I think it's instructive to see how that works. Basically, people continuously bid on the shares of certain companies and markets keep track of all the bids/offers and the completed transactions. If you are interested in finding out what people are willing to pay for a share of Apple, Inc., then it's probably best to look at...what people are willing to pay. That's exactly what the stock market gives you. You only run into trouble when there's no liquidity, so no one shows up to bid/offer, but that would be a problem for any market.

Now, suppose you're interested in finding out what the "true fundamental value" of Apple, Inc. is. Some people think the stock market gives you that at every instant, while others think that the stock market can behave irrationally for long periods of time. Perhaps in the very long run, you get a sense of the fundamental value of a company, but that may not be useful information at that point.

What does the market for scientific hypotheses give you? Well, it would be one thing if granting agencies participated in the market. Then, we would never have to write grant applications. The granting agencies could then signal what they'd be willing to pay for different ideas. But that's not what we're talking about.

Here, we're trying to get at whether a given hypothesis is true or not. The only real way to get information about that is to conduct an experiment. How many people betting in the markets will have conducted an experiment? Likely the minority, given that the whole point is to save money by not having people conduct experiments investigating hypotheses that are likely false.

But if market participants aren't contributing real information about an hypothesis, what are they contributing? Well, they're contributing their opinion about an hypothesis. How is that related to science? I'm not sure. Of course, participants could be experts in the field (although not necessarily) and so their opinions will be informed by past results. And ultimately, it's consensus amongst scientists that determines, after repeated experiments, whether an hypothesis is true or not. But at the early stages of investigation, it's not clear how valuable people's opinions are.

In a way, this reminds me of a time a while back when the EPA was soliciting "expert opinion" about the health effects of outdoor air pollution, as if that were a reasonable substitute for collecting actual data on the topic. At least it cost less money--just the price of a conference call.

There's a version of this playing out in the health tech market right now. Companies like Theranos and 23andMe are selling health products that they claim are better than some current benchmark. In particular, Theranos claims its blood tests are accurate when only using a tiny sample of blood. Is this claim true or not? No one outside Theranos knows for sure, but we can look to the financial markets.

Theranos can point to the marketplace and show that people are willing to pay for its products. Indeed, the $9 billion valuation of the private company is another indicator that people...highly value the company. But ultimately, we still don't know if their blood tests are accurate because we don't have any data. If we were to go by the financial markets alone, we would necessarily conclude that their tests are good, because why else would anyone invest so much money in the company?

I think there may be roles for prediction markets to play in science, but I'm not sure discovering the truth about nature is one of them.