An R function to determine if you are a data scientist


“Data scientist” is one of the buzzwords in the running for rebranding applied statistics mixed with some computing. David Champagne, over at Revolution Analytics, described the skills needed to be a data scientist with a Venn diagram. Just for fun, I wrote a little R function for determining where you land on the data science Venn diagram. Here is an example of a plot the function makes, using the Simply Statistics bloggers as examples.

The code can be found here. You will need the png and klaR R packages to run the script. You also need to either download the file datascience.png or be connected to the internet. 

Here is the function definition:

dataScientist(names=c("D. Scientist"), skills=matrix(rep(1/3,3),nrow=1), addSS=TRUE, just=NULL)
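
As a rough sketch of how the skills argument might be put together: the default above is a one-row matrix with three equal proportions, so the example below assumes one row per person and three columns, one per skill area, with each row summing to 1. The helper to_skill_row, the two made-up names, and the tallies are hypothetical and only for illustration; the column order the script expects is not stated here, so check the script before relying on it.

```r
# Hypothetical helper: turn three raw tallies (paper counts or percent of
# time) into one row of proportions that sums to 1.
to_skill_row <- function(tallies) {
  stopifnot(length(tallies) == 3, all(tallies >= 0), sum(tallies) > 0)
  tallies / sum(tallies)
}

# Two made-up people. The three columns stand for the three skill areas,
# in whatever order the script expects (an assumption in this sketch).
people <- c("A. Academic", "N. Academic")
skills <- rbind(
  to_skill_row(c(6, 3, 1)),     # e.g. paper counts per area
  to_skill_row(c(50, 30, 20))   # e.g. percent of time per area
)

# Each row sums to 1, matching the shape of the default argument.
stopifnot(all(abs(rowSums(skills) - 1) < 1e-12))

# With the script sourced, the call would then look something like:
# dataScientist(names = people, skills = skills, addSS = FALSE)
```

The call itself is left commented out because it depends on the script, the png and klaR packages, and the datascience.png file being available.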

So how do you define your skills? Here is how it works:

If you are an academic

You calculate your skills by adding papers in journals. The classification scheme is the following:

Some journals are general, like Nature, Science, the Nature sub-journals, PNAS, and PLoS One. For papers in those journals, decide which area the paper falls in by identifying its main contribution, using the non-academic classification below.

If you are a non-academic

Since papers aren’t involved, determine the percentage of your time you spend on the following things: