Simply Statistics: Are MOOCs fundamentally flawed? Or is it a problem with statistical literacy?

People know I have taught a MOOC on Data Analysis, so I frequently get emails about updates on the “state of MOOCs”. It definitely feels like the wild west of education is happening right now. If you make an analogy to air travel, I would say we are about here:

So of course I feel like it is a bit premature for quotes like this:

Two years after a Stanford professor drew 160,000 students from around the globe to a free online course on artificial intelligence, starting what was widely viewed as a revolution in higher education, early results for such large-scale courses are disappointing, forcing a rethinking of how college instruction can best use the Internet.

These headlines are being driven in large part by Sebastian Thrun, the founder of Udacity, which has had some trouble with its business model. One reason is that Udacity seems to have had the most trouble luring instructors from the top schools to its platform.

But the main reason that gets cited for the “failure” of MOOCs is this experiment performed at San Jose State. I previously pointed out one major flaw with the study design: that the students in the two comparison groups were not comparable.

Here are a few choice quotes from the study:

Poor response rate:

While a major effort was made to increase participation in the survey research within this population, the result was disappointing (response rates of 32% for Survey 1; 34% for Survey 2, and 32% for Survey 3).

Not a representative sample:

The research team compared the survey participants to the entire student population and found significant differences. Most importantly, students who succeeded are over-represented among the survey respondents.

Difficulties with data collection/processing:

While most of the data were provided by the end of the Spring 2013 semester, clarifications, corrections and data transformations had to be made for many weeks thereafter, including resolving accuracy questions that arose once the analysis of the Udacity platform data began
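
To see why the first two quotes matter together, here is a minimal sketch of non-response bias. All of the numbers are hypothetical and chosen only so that the overall response rate comes out at 32%; none of them come from the study. The function names are also just illustrative.

```python
def response_rate(n_pass, n_fail, resp_pass, resp_fail):
    """Overall fraction of students who answer the survey."""
    responders = n_pass * resp_pass + n_fail * resp_fail
    return responders / (n_pass + n_fail)


def pass_rate_among_respondents(n_pass, n_fail, resp_pass, resp_fail):
    """Expected pass rate you would see if you only looked at respondents."""
    responders_pass = n_pass * resp_pass
    responders_fail = n_fail * resp_fail
    return responders_pass / (responders_pass + responders_fail)


# Hypothetical class: 400 students pass, 600 do not.
# Assumption: students who pass are more likely to answer the survey.
n_pass, n_fail = 400, 600
resp_pass, resp_fail = 0.5, 0.2

true_pass_rate = n_pass / (n_pass + n_fail)
overall_response = response_rate(n_pass, n_fail, resp_pass, resp_fail)
survey_pass_rate = pass_rate_among_respondents(n_pass, n_fail, resp_pass, resp_fail)

print(f"true pass rate:             {true_pass_rate:.3f}")
print(f"overall response rate:      {overall_response:.3f}")
print(f"pass rate among responders: {survey_pass_rate:.3f}")
```

In this made-up class the true pass rate is 40%, but the respondents alone suggest 62.5%. A low response rate combined with successful students being over-represented is exactly the setup where survey summaries can be far from the population they are meant to describe.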

These issues alone point to an incredibly suspect study, although that is not the fault of the researchers in question. They were working with the data the best they could, but the study design and data are deeply flawed. The most egregious flaw, of course, is the difference in populations between the students who matriculated and those who didn’t (Tables 1-4 show the dramatic differences in population).
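
A small sketch of why non-comparable populations are such a problem: suppose, purely hypothetically, that course format has no effect at all and that pass rates depend only on how prepared a student is. The shares and pass probabilities below are invented for illustration and are not taken from Tables 1-4.

```python
def overall_pass_rate(share_prepared, pass_prepared, pass_underprepared):
    """Pass rate of a group as a mix of prepared and underprepared students."""
    return (share_prepared * pass_prepared
            + (1 - share_prepared) * pass_underprepared)


# Assumption: identical pass probabilities in both groups,
# i.e. the course format itself makes no difference.
PASS_PREPARED = 0.8
PASS_UNDERPREPARED = 0.3

# Assumption: the two groups differ only in who is enrolled.
group_a = overall_pass_rate(0.7, PASS_PREPARED, PASS_UNDERPREPARED)  # mostly prepared
group_b = overall_pass_rate(0.2, PASS_PREPARED, PASS_UNDERPREPARED)  # mostly underprepared

print(f"group A pass rate: {group_a:.2f}")
print(f"group B pass rate: {group_b:.2f}")
print(f"apparent gap:      {group_a - group_b:.2f}")
```

Here group A passes at 65% and group B at 40%, a 25-point gap that comes entirely from who is in each group. A naive comparison would attribute that gap to the course format.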

My take-home message is that if this study were submitted to a journal, it would be seriously questioned on both scientific and statistical grounds. Before we rush to claim that the whole idea of MOOCs is flawed, I think we should wait until more thorough, larger, and better-designed studies are performed.

Are MOOCs fundamentally flawed? Or is it a problem with statistical literacy?