27 Sep

Announcing Statistics with Interactive R Learning Software Environment


Editor's note: This post was written by Nick Carchedi, a Master's degree student in the Department of Biostatistics at Johns Hopkins. He is working with us to develop software for interactive learning of R and statistics. 

Inspired by the relative lack of computer-based platforms for learning statistics and the R programming language, we at Johns Hopkins Biostatistics have created a new R package designed to teach both topics simultaneously and interactively. Accordingly, we’ve named the package swirl, which stands for “Statistics with Interactive R Learning”. We sought to model swirl after other highly successful interactive learning platforms such as Codecademy, Code School, and Khan Academy, but with a specific focus on teaching statistics and R. Additionally, we wanted users to learn these topics within the same environment in which they would be applying them, namely the R console.

If you’re reading this article, then you probably already have an appreciation for the R language and there’s no need to beat that drum any further. Staying true to the R culture, the swirl package is totally open-source and free for anyone to use, modify, or improve. Furthermore, anyone with something to teach can use the platform to create their own interactive content for the world to use.

In a typical swirl session, the user loads the package from the R console, chooses a course from a menu, and then works through a series of 10- to 15-minute interactive modules, each covering a particular topic. A module generally alternates between instructional text and prompts for the user to answer questions. One question may ask for the result of a simple numerical calculation, while another may require the user to enter an actual R command (which is parsed and executed, if correct) to perform a requested task. Multiple-choice, text-based, and approximate numerical answers are also fair game. Whenever the user answers a question incorrectly, swirl immediately offers a hint and then prompts the user to try again. Finally, plots, figures, and even videos may be incorporated into a module to reinforce the methods or concepts being taught.
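The post does not say how swirl checks answers internally, and swirl itself is written in R. Purely as an illustration of the behavior described above, here is a minimal Python sketch of one assumed approach: commands are compared token by token rather than character by character (so `mean( myMPG )` and `mean(myMPG)` count as the same answer, while `mean(mympg)` does not, since R is case-sensitive), numeric answers are accepted within a relative tolerance, and a hint is printed after each wrong attempt. The names `tokenize`, `commands_match`, `numbers_match`, and `run_question`, the tolerance, and the three-attempt limit are all hypothetical, and the tokenizer covers only a small subset of R syntax.

```python
import math
import re

# Hypothetical sketch, not swirl's actual implementation.
# Tokenizer for a small subset of R syntax. Whitespace between tokens is
# skipped, so answers are compared as token sequences, not raw strings.
TOKEN_RE = re.compile(
    r"""
      "(?:[^"\\]|\\.)*"                  # double-quoted string literal
    | '(?:[^'\\]|\\.)*'                  # single-quoted string literal
    | \d+(?:\.\d*)?(?:[eE][+-]?\d+)?     # numeric literal
    | [A-Za-z.][A-Za-z0-9._]*            # identifier such as myMPG or is.na
    | <-|->|<=|>=|==|!=|&&|\|\|          # multi-character operators
    | \S                                 # any other single character
    """,
    re.VERBOSE,
)


def tokenize(command: str) -> list[str]:
    """Split an R-like command into tokens, dropping the whitespace between them."""
    return TOKEN_RE.findall(command)


def commands_match(user: str, expected: str) -> bool:
    """True when both commands produce the same token sequence."""
    return tokenize(user) == tokenize(expected)


def numbers_match(user: str, expected: float, rel_tol: float = 1e-3) -> bool:
    """True when the answer parses as a number within rel_tol of expected."""
    try:
        value = float(user)
    except ValueError:
        return False
    return math.isclose(value, expected, rel_tol=rel_tol)


def run_question(answers, check, hint: str, max_tries: int = 3):
    """Try successive answers, printing the hint after each wrong one.

    Returns the 1-based number of the first accepted attempt, or None if
    none of the first max_tries answers is accepted.
    """
    for attempt, answer in enumerate(answers[:max_tries], start=1):
        if check(answer):
            return attempt
        print(f"Not quite. Hint: {hint}")
    return None


if __name__ == "__main__":
    expected = "mean(myMPG)"
    print(commands_match("mean( myMPG )", expected))  # True: spacing ignored
    print(commands_match("mean(mympg)", expected))    # False: R is case-sensitive
    print(numbers_match("3.1416", math.pi))           # True: within tolerance
    attempt = run_question(
        ["mean(mympg)", "mean( myMPG )"],
        lambda a: commands_match(a, expected),
        hint="Variable names in R are case-sensitive.",
    )
    print(attempt)  # 2
```

Comparing token sequences is a cheap middle ground between exact string matching and fully evaluating the command; a real implementation in R could instead parse both commands with R's own `parse()` and compare the results.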

We believe that this form of interactive learning, or learning by doing, is essential for true mastery of topics as challenging and complex as statistics and statistical computing. While we are aware of a handful of other platforms for learning R interactively, our goal was to focus on the teaching of R and statistics simultaneously. As far as we know, swirl is the only platform of its kind and almost certainly the only one that takes place within the R console.

When we developed the swirl package, we wanted from the start to allow other people to extend and customize it to their particular needs. The beauty of the swirl platform is that anyone can create their own content and have it included in the package for all users to access. We have designed pre-formatted templates (color-coded spreadsheets) that instructors can fill out with their own content according to a fairly simple set of instructions. Once instructors send us the completed templates, we then load the content into the package so that anyone with the most recent version of swirl on their computer can access the content. We’ve tried to make the process of content creation as simple and painless as possible so that the statistics and computing communities are encouraged to share their knowledge with the world through our platform.

The package currently includes only a few sample modules that we’ve created in-house, primarily serving as demonstrations of how the platform works and how a typical module may appear to users. In the future, we envision a vibrant and dynamic collection of full courses and short modules that users can vote up or down based on the quality of their experience with each. In such a scenario, the very best courses would naturally float to the top and the less effective courses would fall out of favor and perhaps be recommended for revision.

In addition to making more content available to future users, we hope to one day transition swirl from being an interactive learning environment to one that is truly adaptive to the individual needs of each user. Perhaps this future version of our software would support a more intricate web of content, intelligently navigating users among topics based on a dynamic, data-driven interpretation of their strengths, weaknesses, competencies, and knowledge gaps. With the right people on board, this could become a reality.

We’ve created this package with the hope that the statistics and computing communities find it to be a valuable educational tool. We’ve got the basic infrastructure in place, but we recognize that there is a great deal of room for improvement. The swirl package is still very much in development and we are actively seeking feedback on how we can make it better. Please visit the swirl website to download the package or for more information on the project. We’d love for you to give it a try and let us know what you think.

Go to the swirl website: http://swirlstats.com

  • Hilary Parker

    This is _awesome_

  • hadley

    You could make this considerably slicker through the use of makeActiveBinding and/or addTaskCallback

    • jtleek

      Nick is a new student and I'm sure he'd love help/feedback on how to make it slicker. We have him focusing on getting content in (the main limiting issue for almost all R learning platforms in existence is lack of content). But he would love help with the platform too!

      • hadley

        I'd be happy to provide some help. An approach I tried some time ago is at https://gist.github.com/hadley/6734404. Source it, then type hi() to get started (but there's only 1 step).

        It's also not a great idea to be writing files locally, but I can suggest some better approaches if you're interested. Feel free to email me.

        • Nick Carchedi

          Thanks for your suggestions, Hadley. I just sent you an email.

  • Dr David Martin

    It would be far better if it didn't drop you into "guess the word in the author's mind" type questions. These are frustrating and lead to rapid disillusionment.

    Mean, median, and mode are all measures of ____________.

    Which word am I supposed to guess here? Is it required to be letter perfect?

    Does what's expected depend on culture? I have no clue, so I'm giving up after not getting very far, because I can't progress.

    • http://NickCarchedi.com/ Nick Carchedi

      Thanks for your concern. Please see my response to your other comment below.

    • Cédric

      I think the answer is "average", but swirl doesn't accept it. I tried a few other words, like "qverqge" (I'm on an AZERTY keyboard, so who knows?), but I can't go any further because of this question either.

      Could we have the answer to this question in order to see what's next, please?

      I really appreciate this initiative. I had given up on R several times before, after trying to learn it on Coursera and from some PDFs, but you made me download R and RStudio again to test your package. So thank you!

      • http://NickCarchedi.com/ Nick Carchedi

        Spoiler Alert: The answer is "central tendency" -- a term we introduce in the module just above this question. I apologize for the confusion and will revise the question and answer to make it more clear.

        • Cédric

          Thank you for the answer, it worked :) I've been able to go to the median(myDrive) error which is already posted. I really like your package and will check for updates.

          • http://NickCarchedi.com/ Nick Carchedi

            Thanks for the kind words. We are a small group of busy people, but I hope to have these issues resolved soon! Good things are on the way, I promise.

            Nick

  • Joshua Weiner

    This is amazing. I can't wait to see it grow.

  • Dr David Martin

    OK, I've worked with this some more and I have some criticisms that would hopefully make it more usable and not have the students give up early. At present I can't possibly hope to give this to my students.

    1. Be accurate and precise. Some of the questions are vague, leaving students to guess. Some are plain inaccurate. There appears to be a big confusion between the normal distribution and the central limit theorem.

    2. Be consistent and flexible. R doesn't care about spaces so why should swirl? It should parse rather than match. I get a complaint doing mean( myMPG ) instead of mean(myMPG).

    3. Explain steps more clearly. 'using the dataset$variable notation' is insufficient for most who have never seen that formalism before.

    I think it is a great idea, but the execution requires some work before it can be released into the wild.

    • Nick Carchedi

      Thanks for your feedback, Dr. Martin. Any shortcomings of the software are attributable only to me. I agree with you that there are many opportunities for improvement, many of which you outlined above.

      Please understand that the content we have at this point is more for proof of concept than anything else. My hope was that the R/stats community would jump behind the project and help us make it more robust over time. We were afraid to wait until we felt it was more 'perfect', because often that time never comes and we might just be left with an idea, never implemented.

      Our goal with swirl was to create an interactive learning environment within the R console. We believe we've accomplished this. Now, our challenge is to make it a lot better and to generate useful content (with the help of others) so that we can provide the data community with a truly valuable educational tool.

      I will take all of your input into consideration and hope that you'll give swirl another shot down the road once it's more refined.

      Sincerely,
      Nick

  • http://yihui.name/ Yihui Xie

    You really should make a video (or at least screenshots) instead of writing pages and pages of description. You know, people nowadays are easily bored if they cannot see anything live. I read the Simply Statistics blog primarily because of Rafa's pictures and Jeff/Roger's videos. I do not bother reading other things at all. Just kidding :)

    Anyway, keep up the good work!

    • http://NickCarchedi.com/ Nick Carchedi

      Thanks for your suggestion, Yihui. I believe you are right about this. First priority = bug fixes. Then I'll get back to work on the website and documentation. By the way, thanks to you and Hadley for providing such great models for the rest of us to follow.

  • Mike

    This sounds like a great idea, unfortunately it doesn't seem to work with the latest version of R:

    Warning message:

    package ‘swirl’ is not available (for R version 3.0.2)

    • http://NickCarchedi.com/ Nick Carchedi

      Thanks, Mike. I was just made aware of this. We will fix soon. We have a running list of all bugs/concerns on our GitHub "Issues" page:

      https://github.com/ncarchedi/swirl/issues

      Sincerely,
      Nick

  • Tunga

    As someone with a stats background who has never really used R, I was prompted by this post to give it a try. It's a great start, but here is some feedback from Module 1:

    1) Installation instructions are missing one step. In Step 1 at http://ncarchedi.github.io/swirl/students.html you also need to tell students to install Rtools. Otherwise library("devtools") command fails.

    2) I agree with Dr. Martin that the instructions need to be a bit more explanatory. As someone using R for the first time, I stumbled at 'using the dataset$variable notation'. Later on, I stumbled again at "store the contents of the 'cars$mpgCity' in a new variable called 'myMPG'.", as I didn't know how to create a new variable or what the assignment operator is (which is the next feedback you get, but it doesn't explain the operator; it just tells you to use it).

    3) You also need a way for students to access the correct answer. If they try 2-3 times and can't figure it out, they need to be able to "give up" or ask for "help" and get the exact answer. That way they can learn and move forward even if they weren't able to figure it out on their own.

    4) Somewhere in there, it would also be useful to mention that R is case-sensitive. For most Windows users, case sensitivity is unfamiliar ground, and they might not realize it will be an issue. I struggled a bit with cars$mpgCity, as I kept writing cars$mpgcity and couldn't figure out why it wasn't working...

    5) Finally, the answer for the last mode question is incorrect. The mode for driveTrain is "front" and yet swirl expects the answer 43. That's the frequency of the mode, not the mode....

    • http://NickCarchedi.com/ Nick Carchedi

      Thanks for the thoughtful feedback. These are all good suggestions and I will take everything you've mentioned into account as we work to improve the software.

      Thanks,
      Nick

  • Nathan

    Is there a way to get past the "median(myDrive)" bug for now? I tried everything to skip the question, but that doesn't seem to be an option.

    • http://NickCarchedi.com/ Nick Carchedi

      Hi Nathan:

      Thanks for letting us know about this. I just revised the content of this module and you should no longer run into this problem. Please elect to update swirl when it prompts you to do so upon starting the program.

      I plan on adding a feature that allows you to skip over questions at your discretion, but I haven't had the time to work on it yet.

      Sincerely,
      Nick

  • MesiasRaul

    Hi!

    I have a problem with the swirl application.

    In Module 3 of the data analysis course, the application stops and there is no way to continue.

    What's the solution?

    Sincerely, Mesias Raul.

    • http://NickCarchedi.com/ Nick Carchedi

      Hi Mesias:

      I've responded to your comment on our Issues page on GitHub (https://github.com/ncarchedi/swirl/issues). We don't have any content for that course beyond the 3rd module, so that's why it stops. We are in the process of improving the software and hope to add additional content in the near future.

      Sincerely,
      Nick

  • http://NickCarchedi.com/ Nick Carchedi

    Hey Everyone:

    Here's a link to my most recent Simply Stats post on swirl: http://simplystatistics.org/2014/01/28/swirl-2/

    Also, check us out at http://swirlstats.com.

    Nick

  • Rxenny Perrho

    Mean, median, and mode are measures of central tendency.

  • Elizard

    Swirl has been super helpful! One request: consider creating a way to play() at any time during the module. I've often found myself wanting to try something out but had to wait until the lesson progressed past a series of "..." prompts