30
Dec

Some things R can do you might not be aware of


There is a lot of noise around the "R versus Contender X" debate for data science. The two main competitors I hear about right now are Python and Julia. I'm not going to weigh in on the debates because I go by the motto: "Why not just use something that works?"

R offers a lot of benefits if you are interested in statistical or predictive modeling. It is basically unrivaled in terms of the breadth of packages for applied statistics. But I think sometimes it isn't obvious that R can handle some tasks that you used to have to do with other languages. This misconception is particularly common among people who regularly code in a different language and are moving to R. So I thought I'd point out a few cool things that R can do. Please add to the list in the comments if I've missed things R can do that people don't expect.

  1. R can do regular expressions/text processing: Check out stringr, tm, and a large number of other natural language processing packages.
  2. R can get data out of a database: Check out RMySQL, RMongoDB, rhdf5, ROracle, MonetDB.R (via Anthony D.).
  3. R can process nasty data: Check out plyr, reshape2, and Hmisc.
  4. R can process images: EBImage is a good general purpose tool, but there are also packages for various file types like jpeg.
  5. R can handle different data formats: XML and RJSONIO handle two common types, but you can also read from Excel files with xlsx or handle pretty much every common data storage type (you'll have to search "R + data type" to find the package).
  6. R can interact with APIs: Check out RCurl and httr for general purpose software, or you could try some specific examples like twitteR. You can create an API from R code using yhat.
  7. R can build apps/interactive graphics: Some pretty cool things have already been built with shiny, and rCharts interfaces with a ton of interactive graphics packages.
  8. R can create dynamic documents: Try out knitr or slidify.
  9. R can play with Hadoop: Check out the rhadoop wiki.
  10. R can create interactive teaching modules: You can do it in the console with swirl or on the web with Datamind.
  11. R interfaces very nicely with C if you need to be hardcore (also maybe? interfaces with Python): Rcpp, enough said. Also read the tutorial. I haven't tried the rPython library, but it looks like a great idea.
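
To make the first item a bit more concrete, here is a minimal sketch of regular expression handling using only base R, so nothing needs to be installed. The log lines are made-up sample data, and packages like stringr wrap the same ideas in a more consistent interface.

```r
# Hypothetical sample data, invented purely for illustration
log_lines <- c(
  "2013-12-30 ERROR disk full on /dev/sda1",
  "2013-12-30 INFO backup finished",
  "2013-12-31 ERROR timeout after 30s"
)

# Keep only the lines that contain the word ERROR
errors <- grep("ERROR", log_lines, value = TRUE)

# Pull the leading ISO-style date out of each error line
date_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}"
error_dates <- regmatches(errors, regexpr(date_pattern, errors))

# Strip the date and the level, leaving just the message text
messages <- sub("^[0-9-]+ [A-Z]+ ", "", errors)

print(error_dates)
print(messages)
```

The same pattern of filtering, extracting, and substituting covers a lot of everyday text cleanup before you need to reach for a dedicated text mining package.
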
  • josep2

    Great comments here.

  • Tim

    You want to use jsonlite. Read the vignette for "why". Or take my word. Either way.

    jvmr (http://dahl.byu.edu/software/jvmr/) is also worth a look, for many reasons.

    Doug Bates left for Julia because he couldn't stand Ripley's insistence on Oracle (tm) Brand Solaris as a CRAN prerequisite. I have started playing with Julia not because it's better (R is far more practical for most things at this point) but rather because it's less polished than R, has huge holes in its stats support, and forces me to implement certain things (e.g. Fisher scoring for survival comparisons expressed as logistic regressions, dropout training, etc.) close to the metal. Scala, similarly, has much to recommend it for working on ridiculously huge, fast, unstructured volumes of data.

    Most of the time, R is a good enough tool for the job, and the other times, we are blessed by an embarrassment of riches -- fast, concise, expressive, Lispy dialects... The "Big Data Science!!1" marketers won't admit it, but nearly any mature statistical programming language runs rings in usability around the "X" in "R vs. X". Certainly, watching Wes McKinney detail the difficulties of getting Python to play nicely across platforms and setups was an eye opener. Scala and Julia are wonderful tools, too, but it will be quite some time before they're "user" friendly (vs. "programmer" friendly).

    JMHO...

  • Justin

    R can also handle audio and signal analysis. See the tuneR and seewave packages.

  • rkostadi

    I completely agree. For big data and execution speed you have to get close to the metal with C / Python. One funny example would be: why don't you try to re-code Bowtie in R? I bet the result would be "An ultraslow memory-inefficient short read aligner".