In today’s Wall Street Journal, Amy Marcus has a piece on the Citizen Science movement, focusing on citizen science in health in particular. I am fully in support of this enthusiasm and a big fan of citizen science - if done properly. There have already been some pretty big success stories. As more companies like Fitbit and 23andMe spring up, it is really easy to collect data about yourself (right Chris?). At the same time organizations like Patients Like Me make it possible for people with specific diseases or experiences to self-organize.
But the thing that struck me the most in reading the article is the importance of statistical literacy for citizen scientists, reporters, and anyone reading these articles. For example, the article says:
The questions that most people have about their DNA—such as what health risks they face and how to prevent them—aren’t always in sync with the approach taken by pharmaceutical and academic researchers, who don’t usually share any potentially life-saving findings with the patients.
I think it’s pretty unlikely that any organization would hide life-saving findings from the public. My impression from reading the article is that this statement refers to keeping results blinded from patients/doctors during an experiment or clinical trial. Blinding is a critical component of clinical trials, which reduces many potential sources of bias in the results of a study. Obviously, once the trial/study has ended (or been stopped early because a treatment is effective), the results are quickly disseminated.
Several key statistical issues are then raised in bullet-point form without discussion:
Amateurs may not collect data rigorously, they say, and may draw conclusions from sample sizes that are too small to yield statistically reliable results.
Having individuals collect their own data poses other issues. Patients may enter data only when they are motivated, or feeling well, rendering the data useless. In traditional studies, both doctors and patients are typically kept blind as to who is getting a drug and who is taking a placebo, so as not to skew how either group perceives the patients’ progress.
The article goes on to describe an anecdotal example of citizen science - which suffers from a key statistical problem (small sample size):
Last year, Ms. Swan helped to run a small trial to test what type of vitamin B people with a certain gene should take to lower their levels of homocysteine, an amino acid connected to heart-disease risk. (The gene affects the body’s ability to metabolize B vitamins.)
Seven people—one in Japan and six, including herself, in her local area—paid around $300 each to buy two forms of vitamin B and Centrum, which they took in two-week periods followed by two-week “wash-out” periods with no vitamins at all.
The article points out the issue:
The scientists clapped politely at the end of Ms. Swan’s presentation, but during the question-and-answer session, one stood up and said that the data was not statistically significant—and it could be harmful if patients built their own regimens based on the results.
But it doesn’t carefully explain the importance of sample size, suggesting instead that the only reason you need more people is to “insure better accuracy”.
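To make the sample size point a bit more concrete, here is a rough simulation sketch. It is purely illustrative and is not based on the data from Ms. Swan’s trial: it assumes each person’s measured change is pure noise (a standard normal draw, so the treatment truly does nothing) and asks how often a study of a given size would still show an average change bigger than an arbitrary threshold of 0.5 standard deviations. The function name, the threshold, and the study sizes are all assumptions chosen for the illustration.

```python
import random
import statistics


def chance_of_spurious_effect(n_people, threshold=0.5, n_sims=5000, seed=0):
    """Fraction of simulated studies with NO true effect whose average
    change still exceeds `threshold` in absolute value.

    Assumption: each person's change is an independent standard normal
    draw, i.e. pure noise around zero.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_sims):
        # One simulated study: n_people noisy measurements, true effect = 0.
        changes = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
        if abs(statistics.fmean(changes)) > threshold:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical study sizes: 7 (as in the anecdote), then 4x and 16x larger.
    for n in (7, 28, 112):
        print(n, chance_of_spurious_effect(n))
```

Under these assumptions, the seven-person studies should show an apparently large average change by chance alone far more often than the bigger ones, because the noise in an average shrinks roughly with the square root of the number of people. That is the sense in which a small sample is not just “less accurate”: it can easily produce a convincing-looking effect when nothing is going on.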
It strikes me that statistical literacy is critical if the citizen science movement is going to go forward. Ideas like experimental design, randomization, blinding, placebos, and sample size need to be in the toolbox of any practicing citizen scientist.
One major drawback is that there are very few places where the general public can learn about statistics. Mostly statistics is taught in university courses. Resources like the Khan Academy and the Cartoon Guide to Statistics exist, but are only really useful if you are self-motivated and have some idea of math/statistics to begin with.
Since knowledge of basic statistical concepts is quickly becoming indispensable for citizen science or even basic life choices like deciding on healthcare options, do we need “adult statistical literacy courses”? These courses could focus on the basics of experimental design and how to understand results in stories about science in the popular press. It feels like it might be time to add a basic understanding of statistics and data to reading/writing/arithmetic as critical life skills. I’m not the only one who thinks so.