Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Given the history of medicine, why are randomized trials not used for social policy?

Policy changes can have substantial societal effects. For example, clean water and  hygiene policies have saved millions, if not billions, of lives. But effects are not always positive. For example, prohibition, or the “noble experiment”, boosted organized crime, slowed economic growth and increased deaths caused by tainted liquor. Good intentions do not guarantee desirable outcomes.

The medical establishment is well aware of the danger of basing decisions on the good intentions of doctors or biomedical researchers. For this reason, randomized controlled trials (RCTs) are the standard approach to determining if a new treatment is safe and effective. In these trials an objective assessment is achieved by assigning patients at random to a treatment or control group, and then comparing the outcomes in these two groups. Probability calculations are used to summarize the evidence in favor or against the new treatment. Modern RCTs are considered one of the greatest medical advances of the 20th century.

Despite their unprecedented success in medicine, RCTs have not been fully adopted outside of scientific fields. In this post, Ben Goldcare advocates for politicians to learn from scientists and base policy decisions on RCTs. He provides several examples in which results contradicted conventional wisdom. In this TED talk Esther Duflo convincingly argues that RCTs should be used to determine what interventions are best at fighting poverty. Although some RCTs  are being conducted, they are still rare and oftentimes ignored by policymakers. For example, despite at least two RCTs finding that universal pre-K programs are not effective, polymakers in New York are implementing a $400 million a year program. Supporters of this noble endeavor defend their decision by pointing to observational studies and “expert” opinion that support their preconceived views. Before the 1950s, indifference to RCTs was common among medical doctors as well, and the outcomes were at times devastating.

Today, when we compare conclusions from non-RCT studies to RCTs, we note the unintended strong effects that preconceived notions can have. The first chapter in this book provides a summary and some examples. One example comes from a study of 51 studies on the effectiveness of the portacaval shunt. Here is table summarizing the conclusions of the 51 studies:

Design Marked Improvement Moderate Improvement None
No control 24 7 1
Controls; but no randomized 10 3 2
Randomized 1 3

Compare the first and last column to appreciate the importance of the randomized trials.

A particularly troubling example relates to the studies on Diethylstilbestrol (DES). DES is a drug that was used to prevent spontaneous abortions. Five out of five studies using historical controls found the drug to be effective, yet all three randomized trials found the opposite. Before the randomized trials convinced doctors to stop using this drug , it was given to thousands of women. This turned out to be a tragedy as later studies showed DES has terrible side effects. Despite the doctors having the best intentions in mind, ignoring the randomized trials resulted in unintended consequences.

Well meaning experts are regularly implementing policies without really testing their effects. Although randomized trials are not always possible, it seems that they are rarely considered, in particular when the intentions are noble. Just like well-meaning turn-of-the-20th-century doctors, convinced that they were doing good, put their patients at risk by providing ineffective treatments, well intentioned policies may end up hurting society.

Update: A reader pointed me to these preprints which point out that the control group in one of the cited early education RCTs included children that receive care in a range of different settings, not just staying at home. This implies that the signal is attenuated if what we want to know is if the program is effective for children that would otherwise stay at home. In this preprint they use statistical methodology (principal stratification framework) to obtain separate estimates: the effect for children that would otherwise go to other center-based care and the effect for children that would otherwise stay at home. They find no effect for the former group but a significant effect for the latter. Note that in this analysis the effect being estimated is no longer based on groups assigned at random. Instead, model assumptions are used to infer the two effects. To avoid dependence on these assumptions we will have to perform an RCT with better defined controls. Also note that the RCT data facilitated the principal stratification framework analysis. I also want to restate what I’ve posted before, “I am not saying that observational studies are uninformative. If properly analyzed, observational data can be very valuable. For example, the data supporting smoking as a cause of lung cancer is all observational. Furthermore, there is an entire subfield within statistics (referred to as causal inference) that develops methodologies to deal with observational data. But unfortunately, observational data are commonly misinterpreted.”