# rafalib package now on CRAN

10 Aug 2015

For the last several years I have been collecting functions I routinely use during exploratory data analysis in a private R package. Mike Love and I used some of these in our HarvardX course and now, due to popular demand, I have created man pages and added the rafalib package to CRAN. Mike has made several improvements and added some functions of his own. Here are quick descriptions of the rafalib functions I use most:

mypar - Before making a plot in R I almost always type `mypar()`. This gets around the suboptimal defaults of `par`. For example, it makes the margins (`mar`, `mgp`) smaller and defines RColorBrewer colors as defaults. It is optimized for the RStudio window. Another advantage is that you can type `mypar(3,2)` instead of `par(mfrow=c(3,2))`. `bigpar()` is optimized for R presentations or PowerPoint slides.
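To make the comparison concrete, here is a rough base R sketch of the kind of setup `mypar(3,2)` saves you from typing. The margin values are illustrative assumptions, not necessarily the defaults `mypar` uses, and the simulated data are made up.

```r
# Base R only: a 3-by-2 grid of panels with tighter margins, set by hand.
# The mar/mgp values below are illustrative, not rafalib's documented defaults.
set.seed(1)
old <- par(mfrow = c(3, 2),
           mar = c(2.5, 2.5, 1.6, 1.1),  # smaller margins than the par defaults
           mgp = c(1.5, 0.5, 0))         # axis title and labels closer to the axis
for (i in 1:6) {
  hist(rnorm(100), main = paste("Panel", i))
}
par(old)  # restore the previous graphical settings

# With rafalib installed, the setup line shrinks to:
# library(rafalib)
# mypar(3, 2)
```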

as.fumeric - This function turns characters into factors and then into numerics. This is useful, for example, if you want to plot values `x,y` with colors defined by their corresponding categories saved in a character vector `labs`: `plot(x,y,col=as.fumeric(labs))`.
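The character-to-factor-to-numeric idea can be sketched in base R as follows. The data and group labels are hypothetical, and ordering the levels by first appearance is an assumption made here for illustration; check `?as.fumeric` for the function's actual behavior.

```r
# Hypothetical data: two numeric vectors and a character vector of group labels.
set.seed(1)
x <- rnorm(30)
y <- rnorm(30)
labs <- rep(c("control", "treated", "other"), each = 10)

# character -> factor -> numeric, with levels in order of first appearance
# (an assumption for this sketch).
cols <- as.numeric(factor(labs, levels = unique(labs)))

plot(x, y, col = cols, pch = 16)
legend("topright", legend = unique(labs),
       col = seq_along(unique(labs)), pch = 16)

# With rafalib loaded, the plotting call would be:
# plot(x, y, col = as.fumeric(labs), pch = 16)
```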

shist (smooth histogram, pronounced *shitz*) - I wrote this function because I have a hard time interpreting the y-axis of `density`. The height of the curve drawn by `shist` can be interpreted as the height of a histogram if you used the units shown on the plot. Also, it automatically draws a smooth histogram for each entry in a matrix on the same plot.
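One way to see what "interpreted as the height of a histogram" means is to rescale a kernel density estimate to the count scale. This is only a sketch of the idea under an assumed bin width (`unit`, a name chosen here), not a description of how `shist` is implemented.

```r
# Rescale a density curve so its height matches histogram counts.
set.seed(1)
z <- rnorm(1000)
unit <- 0.5  # assumed bin width for this illustration

# Histogram on the count scale with bins of width `unit`.
breaks <- seq(floor(min(z)), ceiling(max(z)), by = unit)
h <- hist(z, breaks = breaks, main = "Counts per bin with rescaled density")

# density() integrates to about 1; multiplying by n * unit puts it in
# "observations per bin of width unit", the same scale as the bars.
d <- density(z)
lines(d$x, d$y * length(z) * unit, lwd = 2)
```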

splot (subset plot) - The datasets I work with are typically large enough that `plot(x,y)` involves millions of points, which is a problem. Several solutions are available to avoid overplotting, such as alpha-blending, hexbinning and 2d kernel smoothing. For reasons I won’t explain here, I generally prefer subsampling over these solutions. `splot` automatically subsamples. You can also specify an index that defines the subset.
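The subsampling approach is easy to sketch in base R. The simulated data and the subsample size `N` are assumptions chosen for illustration; `splot` handles the sampling step for you.

```r
# Simulate a dataset too large to plot point by point.
set.seed(1)
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)

# Draw a random subset of indices and plot only those points.
N <- 10000  # illustrative subsample size
ind <- sample(n, N)
plot(x[ind], y[ind], pch = 16, cex = 0.5)

# With rafalib loaded, the subsampling happens inside the call:
# splot(x, y)
```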

sboxplot (smart boxplot) - This function draws points, boxplots or outlier-less boxplots depending on sample size. Coming soon is the kaboxplot (Karl Broman box-plots) for when you have too many boxplots.

install_bioc - For Bioconductor users, this function simply does the `source("http://www.bioconductor.org/biocLite.R")` for you and then uses `biocLite` to install.