I just got back from the rOpenSci OzUnconf that was run in Melbourne last week. I’d like to give a big thanks to the organizers (Nick Tierney, Di Cook, Rob Hyndman and others) for putting on a great unconference. These events are always a great opportunity to meet people just getting started in the R community and to get them involved.
As is typical for these unconferences, topic ideas were pitched via issues on the OzUnconf GitHub repo. One issue that I filed was titled “How do you pitch R to new users?”. While this issue was not taken up during the unconference (for good reason, I think), Nick was kind enough to initiate a lunchtime discussion of the topic (with Daniel Falster kindly taking notes).
The topic came up in my mind because I’ve found that I’ve had to change the way that I “sell” R to new users. Through various discussions at the unconference and in many other venues, I’ve found that many R users, even today, are the “lone R user” in their group/institution/organization. While they may enjoy using R, it’s often made difficult by the fact that others in their group do not use R and therefore there is some negotiation over “how things get done”. Convincing others in the group to use R is one way to go (I suppose abandoning R is the other) and the question of how best to do this for different audiences is the question that arose in my mind. As a teacher, you can usually design the curriculum so that the students are forced to use R. But in most other environments, a different approach is needed.
As part of the introduction to the topic, I talked about how I used to convince others to use R. Bear in mind, this was almost 20 years ago and the majority of people I was talking to were using SAS, Stata, SPSS, Excel, or some other commercial package. These were largely interactive packages, perhaps with graphical user interfaces, designed to do more or less traditional statistical analyses. My pitch usually involved three things:
Free. R was both free as in cost and free as in free software. The free cost part made it a highly accessible package and the free software part allowed for anyone to tinker with the package, inspect its code, and make improvements.
Graphics. R was able to produce high quality “publication ready” graphics and it gave you detailed control over all the graphical elements. S-PLUS could also do this but S-PLUS didn’t come with the free part mentioned above.
Programming language. Unlike packages like SAS, Stata, or SPSS, R came with a robust and sophisticated Lisp-like programming language that was well-suited for data analysis applications. In addition, you could use it to build packages that could extend the core R system.
Much has changed since those early days and I’ve varied my pitch quite a bit to focus on a few different things (that didn’t exist back then). In particular, the audience has changed: I talk to many more people who are just getting started in data analysis and therefore are a bit more open-minded about which software to use. Also, Python has come on the scene as a viable alternative to R for data science and so there are even more arguments to consider. Some of the things I focus on now are:
Reproducibility and Reporting. With the development of knitr and its combination with R Markdown, the writing of reproducible reports was made infinitely easier. (Markdown itself probably deserves its own discussion, but it’s not specifically R-related.)
RStudio. The development of the RStudio IDE has made getting started with R much easier. Having a powerful IDE was important to me for learning other languages and I’m glad R finally has something solid for itself. RStudio has significantly simplified the development of R packages via devtools and roxygen2. While they’re not yet perfect, these tools have changed what used to be a labor-intensive and finicky process into a more manageable and easier-to-learn workflow.
Graphics. R still has the ability to make great data graphics and with the introduction of ggplot2, it has become easier to make good graphics.
R Packages and Community. With over 10,000 packages on CRAN alone, there’s pretty much a package to do anything. More importantly, the people contributing those packages and the greater R community have expanded tremendously over time, bringing in new users and pushing R to be useful in more applications. Every year now there are probably hundreds if not thousands of meetups, conferences, seminars, and workshops all around the world, all related to R.
At the unconference, a number of people had different approaches to how to convince others to use R. Here is just a summary:
The dplyr package was sometimes a good selling point. The idea here was that you could show people how much time could be saved by automating analyses and using dplyr to clean up data.
This is just a brief summary of our discussion at the unconference and I was heartened to see all of the enthusiasm for R there. Even with R’s incredible growth over the last 20 years, there will still come a time when a case needs to be made to use R over something else. I’m just glad that we have so many more reasons today than we used to.