An instructor's thoughts on peer-review for data analysis in Coursera


I used peer review for the data analysis course I just finished. As I mentioned in the post-mortem podcast, I knew in advance that it was likely to be the most controversial component of the class. So it wasn't surprising that, based on feedback in the discussion boards and on this blog, the peer review process is by far the thing students were most concerned about.

But to evaluate complete data analysis projects at scale, there is no alternative that is economically feasible. To give you an idea, I have our local students perform 3 data analyses in an 8-week term here at Johns Hopkins. There are generally 10-15 students in that class, and I estimate that I spend around an hour reading each analysis, digesting what was done, and writing up comments. That means I usually spend almost an entire weekend grading just 10-15 data analyses. If you extrapolate that out to the 5,000 or so people who turned in data analysis assignments, it is clearly not possible for me to do all the grading.
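To put rough numbers on that extrapolation, here is a back-of-the-envelope sketch. The one hour per analysis and the 5,000 submissions come from the paragraph above; the 40-hour work week is an assumption added purely for illustration.

```python
# Back-of-the-envelope grading workload for a single instructor.
HOURS_PER_ANALYSIS = 1.0    # from the post: about an hour per analysis
SUBMISSIONS = 5000          # from the post: roughly 5,000 submissions
HOURS_PER_WORK_WEEK = 40.0  # assumption for illustration, not from the post


def grading_hours(n_submissions, hours_each=HOURS_PER_ANALYSIS):
    """Total hours needed to read and comment on every submission."""
    return n_submissions * hours_each


total_hours = grading_hours(SUBMISSIONS)
work_weeks = total_hours / HOURS_PER_WORK_WEEK

print(f"{total_hours:.0f} hours, or about {work_weeks:.0f} full work weeks")
```

That is for one round of assignments only, which is why grading by a single instructor does not scale.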

Another alternative would be to pay trained data analysts to grade all the assignments. You couldn't farm it out to Mechanical Turk: to get a better, more consistent grading scheme than peer review you'd need to hire highly trained data analysts, and that would be very expensive. While Johns Hopkins has been incredibly supportive in terms of technical support and giving me the flexibility to pursue the class, it is definitely something I did on my own time and with a lot of my own resources. It isn't clear that it makes sense for Hopkins to pour huge resources into really high-quality grading. At the same time, I'm not sure Coursera could afford to do this for all of the classes where peer review is needed, as they are just a startup.

So I think that, at least for the moment, peer review is the best option for grading. This has big implications for the value of the Coursera statements of accomplishment in classes where peer review is necessary. I think that it would benefit Coursera hugely to do some research on how to ensure/maintain quality in peer review (Coursera - if you are reading this and you have some $$ you want to send my way to support some students/postdocs I have some ideas on how to do that). The good news is that the amazing Coursera platform collects so much data that it is possible to do that kind of research.


  • MOOC Student

    I was a student in your class who completed all the assignments. On the first of my two assignments, the peer review matched my self review almost exactly. On the second one there were some discrepancies but I still received an acceptable grade so I wasn't too unhappy with it.

    I think the peer assessment would have been more successful if the rubric was tweaked a bit to more closely follow the goals of each assignment. It would also be nice if there was some way to explain why a particular score was given.

    Seeing the distribution of peer grades would also be interesting.

    Thanks for the class. I've also been enjoying your blog.

  • Kirk

    I was also a student in the class and agree with the previous comment. In general, I got the score I expected & am happy with it. But, there are several discrepancies between my self-evaluation and those of my peers. Without comments from my peers, I do not know how I can improve & it is a lost opportunity for learning.

  • Phillip Johnson

    I actually did a review of the class on my blog and this is one of the issues I discussed. I'm wondering if maybe the better option for MOOCs is to avoid projects like a complete analysis that require another person to review. I listened to your post-mortem and I agree that being able to communicate findings is a critical component of data analysis. However, since we know that a MOOC cannot replace a true university course, do we need to stick to the conventional types of university course assignments? Especially since the class is so heterogeneous, one format for data analysis may not be appropriate. I can tell you if I turned in an academic analysis like I completed for this class to my manager, his eyes would glaze over.

    Even with the peer grading, I did enjoy the class and felt I learned a lot. I hope to see it offered again in the future with some changes so that people who left because they were discouraged have a second chance.

    • heather

      I like this idea on assessment. I teach online, but not in a MOOC (http://octette.cs.man.ac.uk/bioinformatics/). My courses are part of a Masters programme, so the assessments need to build towards a research project. Writing for a manager could be much more useful for some people.

  • Pavel

    Hi, I was also a student in your class. I have completed 8 courses since the beginning of 2012, and most of them had some programming tasks. In my opinion the most interesting tasks were in Stanford's NLP class (https://class.coursera.org/nlp/class/index), and the main reason is that you always had to balance between underfitting and overfitting your model. We were given a training set and a validation set, but the final score was based on a test set, which was not available for you to download. It was very interesting and challenging to improve your model. For Coursera it is not hard to create a server-based grading and testing system.

    Maybe you should look into that way of grading; for example, you may restrict students to some allowed libraries. As for me, it would be more interesting to work on that type of task. You may say that some students will not use exploratory analysis, graphs, confounders and so on. But as I understand it, if we are restricted to some R libraries, we will use them, because there will not be any other way to get a good grade.

    For example, you may ask students to create a model using only linear regression. The simplest way to build a prediction is then to train the model using all variables, but that will not give the best result. Sorry for my English -)

  • Fabrizio

    I was enrolled in the course, but I had to give up (I'm trying again next time though) because in the past two months I was too busy writing grant proposals (I am a PostDoc in Genomics). By the way, the course is awesome (like Prof. Peng's course, which I completed in October).

    I just wanted to say that I agree that peer review can be an unfair or non-optimal way to score assignments in an academic course, but it is also the method that is closest to the real world. Being a scientist, I had to get used to having my work (both papers and grant proposals) evaluated by anonymous reviewers. And, as everybody in science knows, the peer review process pretty often has its problems.
    It's not perfect, but I think it's the best system we have, so I think it is good to start facing it in academic courses and/or MOOCs, in order to get used to it, and maybe to get better at presenting our analyses.

    • Max Gordon

      I agree. I've gotten some terrible real peer reviews, but those are the rules of the game. I guess most people who take this class have some work-related experience and should be aware that life is not fair; if not, it's a lesson that is good to learn.

      I completed the class and was 1.5 points from the higher grade, but to be honest I could have put in a little more effort, and I took the class for my own education and not for the grade. I learned a lot, although I believe that the most valuable feedback was lost as there was no comment box for the assignments. In the second assignment I actually asked for feedback and provided my e-mail, but I guess people were afraid to e-mail me their thoughts.

  • dirkmjk

    I totally understand that peer review is used for practical reasons and luckily I didn’t have much to complain about when it regards how my assignments were rated. That said, I’d be interested in data on inter-rater agreement, just out of curiosity.
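One common way to quantify inter-rater agreement between two graders is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is only an illustration: the rubric scores are made up, and nothing here reflects data from the course.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater used each score.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-3) from two peers on six rubric items.
peer_1 = [3, 3, 2, 1, 3, 2]
peer_2 = [3, 2, 2, 1, 3, 3]
kappa = cohens_kappa(peer_1, peer_2)
print(round(kappa, 3))
```

A kappa near 1 means the two graders agree far more than chance would predict; a value near 0 means their agreement is about what random scoring would give.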

  • herchu

    I was a bit afraid when I saw how the course would be graded, but common sense says it's the only way to do it for a class like that. It actually ended a lot, lot better than I thought. Maybe I was just lucky, but I got to grade a handful of really brilliant analyses, which I'm happily keeping. MOOCs are different. Let's realize that, unlike traditional education, we won't be able to learn from our teacher's corrections but from our peers. We must be generous and give up some "fairness" in order to be able to participate in such a top-quality free course. We should evaluate the course on the quantity and quality of what we've learnt, and with those parameters in mind I don't know how much better it can be.

    PS. It's been an honor to be a student of yours. Thanks a lot.

  • Mary Howard

    It is clear from the discussion boards that there were many students in the class, even those that did all the assignments, who were unprepared for graduate level work in biostatistics. They were able to pass the quizzes due to a lot of comments closely resembling answers on the discussion boards, but had a great deal of trouble with the assignments. Yet these persons were required to grade in this course, a data analysis course.
    So we say there is peer-grading, but in reality there is such a range of students that students are by no means peers at all. These students have no way to grade rubric questions such as "were the methods correctly applied?"
    My suggestion, if the assignments are to stay the way they are, is this: It would take 500 graders grading 50 papers each to cover all 5000 students, but that might be possible to find as 500 volunteer graders (or call them community TA's if preferred). And then have those people grade the assignments. All you can give them is recognition, but that's not expensive, so then give special certificates to those who choose to be graders. Some inter-rater reliability checks might be needed, but each grader will have graded more papers so that should be possible, and I doubt those unqualified would sign up to be graders.

  • http://twitter.com/skiihne Suzanne Kiihne

    Thank you for all your work on the course and for your careful thinking about grading, MOOCs and the future of education. I did Prof. Peng's course in October, and also completed your Data Analysis course. I've also looked in on a couple of other Coursera courses, but not finished them. I have mixed feelings about the peer grading. On the one hand, my grades for the second assignment were quite a bit lower than I thought they should be. On the other hand, it makes me understand my own communication problems and blind spots a bit better. It is not peer review as in science -- that is imperfect, and sometimes you get a reviewer with insufficient background in the field, but mostly there is quite a bit of common ground. In the MOOC, there can be very little common ground, and many students don't have enough technical background to really make a judgement.

    A sample solution and 3x sample reviews with marking schemes might help improve some of the grading. The sample data analysis for earthquake data was very useful, and if you could get a grad student or post-doc or 2 to provide a short review of it, that might be a helpful guide for some students.

    However, there is a basic problem of motivation: As a student, you can spend a lot of time grading conscientiously, or you can just go through the motions and click some boxes at random. There is no reward in being careful about it, so the rational response is to do a minimal job. The Coursera statistics should show this -- how many reviews were there with the same mark for all the rubric questions?
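The check suggested in the last sentence can be sketched in a few lines. This is only an illustration: it assumes each review is available as a list of rubric scores, and the sample reviews are invented rather than taken from Coursera's data.

```python
def is_straight_lined(scores):
    """True when a review gives the identical mark to every rubric question."""
    return len(scores) > 1 and len(set(scores)) == 1


def straight_line_rate(reviews):
    """Share of reviews in which every rubric question got the same mark."""
    if not reviews:
        return 0.0
    flagged = sum(is_straight_lined(r) for r in reviews)
    return flagged / len(reviews)


# Hypothetical reviews: one list of rubric scores per submitted review.
sample_reviews = [
    [3, 3, 3, 3, 3],
    [2, 3, 1, 3, 2],
    [1, 1, 1, 1, 1],
    [3, 2, 3, 3, 2],
]
rate = straight_line_rate(sample_reviews)
print(f"{rate:.0%} of reviews used a single mark throughout")
```

A uniform review is a signal rather than proof of careless grading, since a genuinely excellent paper can earn top marks on every question.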

  • Simon

    Thanks, Jeff, for all that time generously given.

    I agree that getting the rubric tuned to the assignment is probably a bigger issue than applying the rubric (apart from 'were the methods appropriately applied?'). Is there any way you could try the assignments early on some people, e.g. Community TAs or students? I think that kind of feedback could make a huge difference.

    More radically, you could invite/sample some early submissions (within first couple of days of assignment) for grading by you or people you trust and not finalise rubric/grading til after that. Submissions are not put out for peer-review for 7 days, so there would be time. Better to have a tuned rubric that isn't available in full beforehand than a raw one that is.

  • David Hood

    As another student who did the course, though speaking as someone with a background in the area, I thought that the peer assignments were absolutely critical in getting students to confront the question "how do I approach an analysis/ think about the data in this analysis". I can also see why most other similar courses avoid the issue by guiding the students into a "in this case implement this technique" type of assessments for ease of marking.

    I'm not sure the platform at the moment permits the sort of workflow that would approximate the full peer review process (student does analysis, peers review analysis, student improves analysis with feedback, peers (with access to initial work, reviews, and revised work) mark how the work was improved in response to the feedback). But something where the student could see more immediate personal benefit would be good, because at the moment people aren't seeing why doing such open assignments is advantageous.

    I think there were a few rubric issues that you have to expect with a massively cross-cultural paper where the norms of the assignment aren't conveyed in person. What might be useful in future iterations is a rubric with more in the way of example material/explanations of why parts are important, and in week one posting it as a draft rubric so that people could draw out issues of interpretation (the many eyes approach). That then gets finalised after a couple of weeks, at about the time people are actually starting the assignment.

  • http://profiles.google.com/jrminter John Minter

    I, too, was a student in the class and appreciated the work that Jeff put into it. I agree that given the open-ended nature of the assignments and the size of the class that peer grading is the most workable option.

    I think there are some things that the instructor can do to make this work with fewer hiccups: 1) provide an exemplar answer that covers all the points as a template to aid the graders as we apply the rubric. 2) provide a text box for specific feedback on the top one or two improvements the student could have made to improve the analysis. In the end, the certificates are just mementos to remind us of the hard work and value we got from the course. The true value is the skills we picked up, and specific feedback to correct misconceptions or to help clarify a concept has great value to the students.

    • David Hood

      While I think example material illustrating rubric points would be a very good idea, a full exemplar project makes me nervous, inasmuch as it was pretty clear from the course discussions that among the very wide student community there were some students who had never experienced the concept of open assignments and were having trouble wrapping their heads around it. My concern would be that these students would treat an exemplar assignment as a marking guide rather than an example. I suppose this could be solved by having two exemplars that fulfil the grading criteria, but in different ways (so making it clear there are alternatives).

      • JGreenberg

        I found the example given before the first assignment quite helpful, though I had to keep reminding myself not to follow it and the rubric slavishly. I tried to focus less on the score and more on what I could learn from the exercise, and found both assignments useful from that perspective.

        I too would have liked to get free-text feedback from my graders, as an outsider's perspective on one's work is often helpful. My reviewers on average thought well of my analyses, but I had no opportunity to use constructive criticism to improve my thinking. You'd have to make sure such feedback didn't fuel a grade-revision frenzy, though.

  • Mary Howard

    Also I doubt that it is outrageously expensive to hire persons to do grading. Assume graders can be found at $20 per hour, a wage for which most PhD students and a lot of working professionals would be willing to put in 100 hours, which would make them $2000 per course. Each student needs how many hours of grading? 2 assignments at 1 hour each = 2 hours per student. So if you have 5000 students who each submit 2 assignments, you need 10,000 hours of grading. If you have each grader do 100 hours of grading, you need 100 graders, more if some graders do less grading. But per student, each student needs 2 hours of grading, so at $20.00 per hour each student needs $40 worth of grading. Yes, you would have to start charging for courses, but would Coursera get 5000 students if it charged $200 a student, to cover the $40 in grading plus administrative expenses? I think it could. Even to pay Dr. Leek at $500 an hour, it might still be able to make a profit, because the videos need be done only once and after that the cost would be in grading and the instructor participating in the course. But the grading itself is only around $40.00 a student. Even if you paid $100 an hour, that's only $200 a student, much less than the normal class costs today.
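The arithmetic in this comment can be checked with a short sketch. All figures (5000 students, 2 assignments, 1 hour per assignment, $20 per hour, 100 hours per grader) are the commenter's assumptions, not measured costs, and the function name is made up for illustration.

```python
import math


def paid_grading_cost(students, assignments_each, hours_per_assignment,
                      hourly_wage, hours_per_grader):
    """Total hours, total cost, cost per student, and graders needed."""
    total_hours = students * assignments_each * hours_per_assignment
    total_cost = total_hours * hourly_wage
    return {
        "total_hours": total_hours,
        "total_cost": total_cost,
        "cost_per_student": total_cost / students,
        "graders_needed": math.ceil(total_hours / hours_per_grader),
    }


# Figures taken from the comment above.
estimate = paid_grading_cost(
    students=5000,
    assignments_each=2,
    hours_per_assignment=1.0,
    hourly_wage=20.0,
    hours_per_grader=100,
)
print(estimate)
```

Changing `hourly_wage` to 100.0 reproduces the $200 per student figure from the end of the comment.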

    • Chrisfs

      The logistics in your example aren't practical. Did you take the course? You say it would take 100 graders 100 hours to grade the assignments. Say 50 hrs for the first and 50 hrs for the second. So working full time, it would take over a week to get an assignment returned. And I can't see any graduate student would work full time for a week on this as they have their own studies to attend to. So you are going to have to wait two weeks to see how the papers came back. After working hard on a paper, I want to see how I did much faster than two weeks by which time I have moved on from that. For the second assignment, I have to wait two weeks to see if I even get a certificate.
      After that, there's a question as to where to find 100 good graders. You would have to assess that these graders knew more than your average person taking the course. In order to find 100 good ones, you would have to attract perhaps 300 applicants and assess all of them. I can't think of an efficient way to do that.

      The process also denies students the opportunity to see other students' work, which is valuable in itself in the educational process.

      $200 for a non-credit course is prohibitively expensive for a lot of people. I could do it, but I wouldn't have, because I could have spent it much better elsewhere. Also, charging $200 goes against a core intention of the site, which is to offer quality college courses to everyone.

      • Mary Howard

        Yes, I did take the course, and also Dr. Peng's before it. I got reasonable grades, I think because graders were grading towards the rubric, which I didn't follow that closely on the first assignment and got graded down, and did follow on the second and got graded up. Still, there was dissatisfaction, since in terms of data analysis (I have 20+ years of experience) my first DA was better than the second, but just didn't hit the rubric as well. Yes, I do agree it is helpful for students to see other students' work, but how can we possibly be convinced that "methods were correctly applied", when in the last week we had at least 10 students who did not understand that assignments were to be scaled to 40 points, yet even some of these students actually graded?? Yes, I agree, the concept of a low-cost course matters, as I took 7 SAS Institute courses like this that cost $1500 each. But should they be free as an alternative? I think "free" is not sustainable, and certainly not appropriate if these courses are to be offered for credit; the problem with "free" is that it is very hard to get feedback, certainly not from Jeff who has 10,000 students, but not from anyone who knows anything either. I do agree that learning with a lot of other people has value, but there is no actual feedback. In programming courses, maybe that is not a huge problem, because a program works or it doesn't, but to send 5000 students off without any idea of whether they are actually any good at data analysis, aside from being able to adhere to a rubric, is bad.

  • festdaddy

    You could easily overcome the shortcomings of the peer review process (at least somewhat) by posting good examples of the proper analysis, and by creating a better grading rubric. One of the most frustrating parts of the peer review is not really knowing why or how a particular grade was arrived at. I took your online data analysis course and was very frustrated by the lack of a 'solution' for the assignments, and also by the lack of knowing how I did relative to the rest of the class. I also took Peng's course when it was first offered, and found the automated grading via submitting code to be far superior to the peer review. Perhaps there is a way to blend the two to make the peer review both easier for the reviewer, and more helpful to the reviewee.

  • http://www.facebook.com/agaved Adriano Gaved

    I did the class and enjoyed it very very much.
    I understand the problems that led to the choice of self assessment, and I accept the limitations. What wouldn't be that difficult to implement, though, is at least to be able to ask the reviewers the reasons behind each grade. I got some low grades in areas where I really would like to understand what I did wrong.
    By the way, I would have gladly paid some tens of dollars for the course, maybe shifting to a cheap but paid model could generate the resources needed for support.

  • Fred Fischer

    I was one of the 5K students that completed your class and turned in both assignments. On the whole, I thought the peer review grading worked well -- on both assignments, the score from my peers was close to the score I gave myself. I'm guessing that that was probably true for most (80%?) of the class (Coursera has the #'s to check this).

    The problem is that there were some folks that were "robbed" or got marks better than they deserved. In a traditional college course, you can talk to a TA or the professor to appeal egregious grading mistakes. Unfortunately, in a MOOC, you're stuck with whatever you get since there is no process to deal with individual cases. Instead of focusing on the 80% that worked OK, maybe creating a process to identify and deal with the 5-10% of people that have legitimate gripes would be possible.

    There was a lot of talk on the message boards about the "meaningless certificate", but it hurts when you invest a lot of time into the assignments and receive marks obviously much lower than you deserve.

    BTW, I enjoyed your class and learned a lot. Thanks for all your efforts.

  • http://twitter.com/KevinLDavenport Kevin Davenport

    I completed your course and I think the peer review grading would have been more accurate if the grading rubric was individualized to the assignment.

    • softweave

      The rubrics for the two assignments were somewhat individualized. The largest drawback I encountered as a grader was not being able to give feedback explaining grades given. This is something that can be added in future iterations of the course.

      • http://twitter.com/KevinLDavenport Kevin Davenport

        I agree, it would be nice to leave feedback. I enjoyed learning new approaches through grading fellow students.

  • Anonymous

    You could get Coursera to have a track with working professionals who volunteer to look at your work. Many students would pay to get that feedback, and it would make the certificates more valuable.

  • John Moehrke

    I too was in the class and fully enjoyed it. My first assignment was graded very closely to the same grade I gave myself. I learned some from grading others' work as well. I saw no problem at all with peer grading. I unfortunately ran out of time, and likely brain power, to finish the second assignment. I struggled with understanding the last few weeks, and found that the references simply gave me more statistics mumbo jumbo. It is not useful to explain a statistics term that someone doesn't understand with more statistics terms that one doesn't understand. Overall I am very happy with the class. Well done. I still want to understand, so I will go back to Coursera for another class on the topic.

  • Charles N. Steele

    I successfully completed the course & found it a very valuable experience. The data analysis projects were particularly valuable, IMO; far better learning experiences than quizzes or exams. I put a good deal of thought & effort into mine. Peer grading has a number of drawbacks, but so far as I know there are no reasonable alternatives, which makes peer review optimal. I was skeptical of some of the scores I received on some of the rubrics, but overall I think the process is fair enough, and I hope Prof. Leek sticks with it. The assignments were quite valuable, and it's something of a miracle that thousands of them were graded by a systematic process with a turnaround time of a week.

    How might it be improved in a future class?

    1. Sample assignments in advance of the project. The Earthquake Study was extremely useful in this regard. It would have been helpful to see a sample prediction study as well. Maybe making two or three samples for each project available would be a good idea; at this point Prof. Leek might even grab a couple of the highest scoring assignments, check them for quality, and use them as samples.

    2. Post some guidelines for how to score each particular rubric. The message boards now contain many discussions of areas where grading was problematic. Going through some of these might highlight problems & confusions that could be addressed by tweaking the rubrics or giving a few lines of instruction on how to grade them.

    3. Perhaps a short statement on how to grade, a few paragraphs long, would help. This could give an overall grading philosophy or basic guidelines to keep in mind when grading; many (most?) of the students have never graded before and guidance would be useful.

    Items 2 and 3 could be made available after the assignment deadline.

    I very much enjoyed the course, and am very grateful to Prof. Leek and also to my fellow students who posted on the boards.

  • Robin

    I was a student on the Data Analysis course and the preceding Computing for Data Analysis course. The automatic scoring worked well on the computing course but I don't think it is appropriate for a data analysis, where communicating the results is as important as deriving them with R.

    Peer reviewing is the only way to score 5000+ papers quickly to my mind. The issue is mainly with people who felt they were scored too low (the ones who were too generously scored seemed to keep quiet!)

    One suggestion would be for students to have to score a paper that had also been marked by a lecturer or TA which would be randomly mixed among the papers they have to mark. If there was a big discrepancy, the reviewer could be tagged as an over/under scorer and the marks they give out adjusted, depending how much they over/under score. You would need a small pool of accurately scored papers to be randomly picked and put into the group of papers the students have to mark. You probably would allow a few marks as a margin of error, as there is always a subjective element.

    Finally, thanks for the course - lots of really interesting content, and free too!
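The calibration idea in this comment could work roughly as sketched below. Everything specific here is an assumption for illustration: the 2-point tolerance, the simple additive correction, and the 40-point scale (which other commenters mention) are not part of the course's actual grading.

```python
def reviewer_bias(calibration_pairs):
    """Mean of (reviewer score - reference score) over pre-marked papers."""
    diffs = [given - reference for given, reference in calibration_pairs]
    return sum(diffs) / len(diffs)


def adjust_score(raw_score, bias, tolerance=2.0, max_score=40.0):
    """Correct a peer score for the reviewer's measured bias.

    Biases within the tolerance are treated as normal subjectivity and
    left alone; larger ones are subtracted, and the result is clamped
    to the valid score range.
    """
    if abs(bias) <= tolerance:
        return raw_score
    return min(max_score, max(0.0, raw_score - bias))


# Hypothetical reviewer who marked two TA-scored papers 5 points too low.
bias = reviewer_bias([(30.0, 35.0), (28.0, 33.0)])
print(bias)                      # negative bias means an under-scorer
print(adjust_score(30.0, bias))  # raised to compensate
```

With only one or two calibration papers per reviewer the bias estimate is noisy, which is one reason to keep a margin of error before adjusting anything.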

  • http://www.facebook.com/hauser.quaid.3 Hauser Quaid

    You could make a study, you take random sample of peer reviewed data analysis and you make your assessment and then compare the two, publish the results.

  • Ilya Kipnis

    Thank you for the class, Dr. Leek. It allowed me to improve my understanding of Random Forests and SVMs. That stated...I disliked the peer-graded assignments with a burning passion, namely because there was really little clue that one was going "in the right direction", in using the concepts. One could have an absolutely terrible model, but so long as one checked off the bullet points on the rubric, one would receive a high grade.

    While communication is important, to base half of the grade for the entire class not necessarily on achieving any sort of accuracy, but simply on taking 15 hours to meticulously follow a rubric and frame an essay just seems like a missed opportunity.

    For these involved assignments, why not go the Kaggle route, and include a training set, validation set, and a server-based test set, and demand an accuracy above a certain percentage?
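A server-based grader of the kind described here can be very small. The sketch below assumes a classification task, a hidden list of true labels held on the server, and an 80% pass mark; all of these are hypothetical choices for illustration, not how Kaggle or Coursera actually grade.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the hidden true labels."""
    if len(predictions) != len(truth):
        raise ValueError("submission must have one prediction per test row")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def grade_submission(predictions, hidden_labels, threshold=0.8):
    """Score a submission against labels the student never sees."""
    acc = accuracy(predictions, hidden_labels)
    return {"accuracy": acc, "passed": acc >= threshold}


# Hypothetical hidden test labels and one student's submitted predictions.
hidden_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
submitted = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
result = grade_submission(submitted, hidden_labels)
print(result)
```

Because students only ever see the training and validation sets, overfitting to those shows up as a lower score on the hidden test set.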

  • http://twitter.com/agtomlinson Alastair Tomlinson

    Jeff, I also took part in your class although in the end found I wasn't able to commit sufficient time to complete the assessments due to other pressures (I started both but never finished!). Very much enjoyed the studies though and hope to take the module again in the future with a better idea of the time commitment involved as I know I would learn even more by being able to fully complete the data analysis assignments.

    In my day job as a university lecturer I have also used peer assessment in a number of different contexts. Forgive me if I'm telling you what you already know but as you can probably imagine there's a pretty extensive literature on the effective use of self-, peer- and co-assessment. A good review paper on the subject is by Dochy, Segers & Sluijsmans (1999) - see http://dx.doi.org/10.1080/03075079912331379935 for the abstract - in particular there are some useful guidelines for practitioners in how to use these types of assessment successfully, key issues in supporting students. Perhaps this might be helpful for you and/or Coursera in running future versions of the module. Look forward to seeing how things develop.

  • Blaine Wishart

    I too completed the course. I did poorly on the first assignment and well on the second. In both cases my scores were more or less appropriate.
    The experience of reviewing the work of others was one of the most valuable parts of the course. I do think that comments explaining some of the scores might be useful however that would demand much more time on the reviews. I'd rather have people reviewing the work of more people.
    I think the idea of a 'correct' example misses the point of the course. There were several approaches which could have been appropriate and which used the statistical tools that were introduced appropriately. A few more examples of well-presented analysis would have been helpful, but not if they covered the assignments. For each of the assignments there were many useful ways the problem could be addressed. I believe the goal of the course was and should have been to make that reality clear through experience. The papers I graded had wildly different approaches. A few were quite good. Some were quite poor, but there was a lot to be learned from all of them.

    I think it would be interesting to have the TAs and/or instructors select and review a few example papers for discussion. Those chosen should be chosen for their instructional value and they definitely should not be presented as 'correct' approaches.

    I would like to see the approach to quizzes rethought.

    When one realizes that, in addition to the ~100,000 who were introduced to a set of timely and useful tools and ideas, ~5,000 actually completed work more or less like the normal class of 15 or so, this MOOC was an astounding event. Coursera and the instructor have provided a real service.

  • http://www.bestessayservices.com/ Bradley@ Essay Writing Service

    Considering the large number of students participating in MOOCs, it is true that grading is a big issue, and so the best method for grading would be peer review. And I think this will be the biggest challenge even in the future, when MOOCs are adopted by all institutions or when more people adopt this medium.

    However for this to be successful, there must be clear guidelines to direct the peers on the procedure to follow or how to conduct the review to avoid biases that might lead to discrepancies.