rafalib package now on CRAN

Rafael Irizarry
2015-08-10

For the last several years I have been collecting functions I routinely use during exploratory data analysis in a private R package. Mike Love and I used some of these in our HarvardX course and now, due to popular demand, I have created man pages and added the rafalib package to CRAN. Mike has made several improvements and added some functions of his own. Here are quick descriptions of the rafalib functions I use most:

mypar - Before making a plot in R I almost always type mypar(). This basically gets around the suboptimal defaults of par. For example, it makes the margins (mar, mgp) smaller and defines RColorBrewer colors as defaults. It is optimized for the RStudio window. Another advantage is that you can type mypar(3,2) instead of par(mfrow=c(3,2)). bigpar() is optimized for R presentations or PowerPoint slides.
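To make the comparison concrete, here is a rough base-R sketch of the kind of par call that mypar(3,2) saves you from typing. The margin and mgp values are illustrative assumptions, not necessarily the defaults rafalib uses, and the RColorBrewer palette is left out.

```r
# Base-R sketch only: the mar and mgp values are assumed for illustration.
pdf(NULL)  # null graphics device, so this runs without opening a window

# 3 x 2 grid of panels with tighter margins than the par() defaults
old <- par(mfrow = c(3, 2), mar = c(2.5, 2.5, 1.6, 1.1), mgp = c(1.5, 0.5, 0))

for (i in 1:6) {
  plot(rnorm(20), rnorm(20), main = paste("Panel", i))
}

layout_used <- par("mfrow")  # record the grid that was in effect
par(old)                     # restore the previous settings
invisible(dev.off())
```

With rafalib loaded, the par line would shrink to mypar(3,2).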

as.fumeric - This function turns characters into factors and then into numerics. This is useful, for example, if you want to plot values x,y with colors defined by their corresponding categories saved in a character vector labs: plot(x,y,col=as.fumeric(labs)).
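The character-to-factor-to-numeric idea can be sketched in base R as follows. This is a minimal stand-in for what as.fumeric does, not the package's source; the labels and data are made up for the example.

```r
# Hypothetical category labels, one per point
labs <- c("control", "treated", "control", "mutant", "treated")

# Character -> factor -> numeric: each distinct label gets an integer code,
# assigned in the sorted order of the factor levels
codes <- as.numeric(as.factor(labs))

# Made-up coordinates, colored by category
x <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
pdf(NULL)  # null graphics device, so this runs without opening a window
plot(x, y, col = codes, pch = 16)
invisible(dev.off())
```

The numeric codes index into the current color palette, so points sharing a label share a color.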

shist (smooth histogram, pronounced shitz) - I wrote this function because I have a hard time interpreting the y-axis of density. The height of the curve drawn by shist can be interpreted as the height of a histogram if you used the units shown on the plot. Also, it automatically draws a smooth histogram for each entry in a matrix on the same plot.

splot (subset plot) - The datasets I work with are typically large enough that plot(x,y) involves millions of points, which is a problem. Several solutions are available to avoid overplotting, such as alpha-blending, hexbinning and 2d kernel smoothing. For reasons I won’t explain here, I generally prefer subsampling over these solutions. splot automatically subsamples. You can also specify an index that defines the subset.
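Subsampling before plotting can be sketched in base R like this. It illustrates the general idea rather than splot's actual implementation; the simulated data and the subset size n_show are assumptions made for the example.

```r
set.seed(1)

# Simulated stand-in for a large dataset
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)

# Plot a random subset instead of all n points
n_show <- 10000               # assumed subset size, chosen for illustration
ind <- sample(n, n_show)      # indices drawn without replacement

pdf(NULL)  # null graphics device, so this runs without opening a window
plot(x[ind], y[ind], pch = 16, cex = 0.5)
invisible(dev.off())
```

Passing your own index vector in place of ind corresponds to the option of specifying the subset yourself.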

sboxplot (smart boxplot) - This function draws points, boxplots or outlier-less boxplots depending on sample size. Coming soon is the kaboxplot (Karl Broman box-plots) for when you have too many boxplots.

install_bioc - For Bioconductor users, this function simply does the source("http://www.bioconductor.org/biocLite.R") for you and then uses biocLite to install.