

Data supports claim that if Kobe stops ball hogging the Lakers will win more

The Lakers recently snapped a four-game losing streak. In that game Kobe, the league leader in field goal attempts and missed shots, had a season low of 14 points but a season high of 14 assists. This makes sense to me, since Kobe shooting less means more efficient players are shooting more. Kobe has a lower career true shooting % than Gasol, Howard and Nash (ranked 17, 3 and 2 respectively). Despite this, he takes more than 1/4 of the shots. Commentators usually praise top scorers no matter what, but recently they have started looking at the data and noticed that the Lakers are 6-22 when Kobe has more than 19 field goal attempts and 12-3 in the rest of the games.


This graph shows score differential versus % of shots taken by Kobe*. Linear regression suggests that an increase of 1% in the % of shots taken by Kobe results in a drop of 1.16 points (+/- 0.22) in score differential. It also suggests that when Kobe takes 15% of the shots, the Lakers win by an average of about 10 points, and when he takes 30% (not a rare occurrence) they lose by an average of about 5. Of course we should not take this regression analysis too seriously, but it's hard to ignore the fact that when Kobe takes less than 23.25% of the shots the Lakers are 13-1.
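For anyone who wants to try this kind of analysis, here is a minimal sketch of the fit and of a win-loss split at a cutoff. The column names `kobe_pct` and `score_diff` and both function names are my own placeholders, not the names in the data file, and the toy rows are synthetic, not the Lakers data.

```r
# Fit score differential on Kobe's shot percentage.
# `games` is assumed to have one row per game with the hypothetical
# columns `kobe_pct` (percent of team shots) and `score_diff`
# (Lakers points minus opponent points).
fit_shot_share <- function(games) {
  lm(score_diff ~ kobe_pct, data = games)
}

# Win-loss counts for games below and at-or-above a shot-share cutoff.
record_by_cutoff <- function(games, cutoff) {
  table(
    share  = ifelse(games$kobe_pct < cutoff, "below", "at_or_above"),
    result = ifelse(games$score_diff > 0, "win", "loss")
  )
}

# Synthetic rows, NOT the Lakers data, just to show the calls
toy <- data.frame(
  kobe_pct   = c(15, 20, 25, 30),
  score_diff = c(10,  5, -2, -6)
)
fit <- fit_shot_share(toy)
coef(fit)                      # intercept and slope (points per 1% of shots)
record_by_cutoff(toy, 23.25)   # 2 x 2 table of share by result
```

With the real data, `summary(fit)` would give the standard error of the slope.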

I suspect that this relationship is not unique to Kobe and the Lakers. In general, teams with a more balanced attack probably do better. Testing this could be a good project for Jeff's class.

* I approximated shots taken as field goal attempts + floor(0.5 x Free Throw Attempts).
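
The approximation in the footnote is easy to write down in R. This is a small sketch: the function and argument names are placeholders of mine, and applying the same formula to the team totals to get a percentage is an assumption about how the share was computed.

```r
# Approximate "shots taken" as field goal attempts plus half the free
# throw attempts, rounded down (the formula in the footnote).
approx_shots <- function(fga, fta) {
  fga + floor(0.5 * fta)
}

# Percent of team shots taken by one player in a game.
# Assumption: the team total uses the same approximation.
shot_share <- function(player_fga, player_fta, team_fga, team_fta) {
  100 * approx_shots(player_fga, player_fta) / approx_shots(team_fga, team_fta)
}

# Made-up box score numbers, for illustration only
approx_shots(fga = 20, fta = 7)   # 20 + floor(3.5) = 23
shot_share(20, 7, 80, 25)         # 100 * 23 / 92 = 25
```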

Data is here.

Update: Commentator Sidney fixed some entries in the data file. Data and plot updated.


An R function to determine if you are a data scientist

“Data scientist” is one of the buzzwords in the running for rebranding applied statistics mixed with some computing. David Champagne, over at Revolution Analytics, described the skills for being a data scientist with a Venn Diagram. Just for fun, I wrote a little R function for determining where you land on the data science Venn Diagram. Here is an example of a plot the function makes using the Simply Statistics bloggers as examples. 

The code can be found here. You will need the png and klaR R packages to run the script. You also need to either download the file datascience.png or be connected to the internet. 

Here is the function definition:

dataScientist(names=c("D. Scientist"), skills=matrix(rep(1/3,3),nrow=1), addSS=TRUE, just=NULL)

  • names = a character vector of the names of the people to plot
  • addSS = if TRUE will add the blog authors to the plot
  • just = whether to write the name on the right or the left of the point: just = "left" prints on the left and just = "right" prints on the right. If just=NULL, then all names will print to the right.
  • skills = a matrix with one row for each person you are plotting, the first column corresponds to “hacking”, the second column is “substantive expertise”, and the third column is “math and statistics knowledge”
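
As a rough sketch of how the arguments fit together, here is a call for two people. The names and numbers are hypothetical, and I am assuming each row of skills should sum to 1, as the default rep(1/3,3) suggests. The call only runs if the script from the post has already been sourced.

```r
# Two hypothetical people; columns follow the order given above:
# hacking, substantive expertise, math and statistics knowledge.
people <- c("A. Hacker", "B. Theorist")
skills <- matrix(
  c(0.6, 0.3, 0.1,
    0.1, 0.2, 0.7),
  nrow = 2, byrow = TRUE
)

# Basic shape checks before plotting: one row per name, three columns
stopifnot(nrow(skills) == length(people), ncol(skills) == 3)

# Only call the function if the script from the post has been sourced
if (exists("dataScientist")) {
  dataScientist(names = people, skills = skills, addSS = FALSE)
}
```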

So how do you define your skills? Here is how it works:

If you are an academic

You calculate your skills by adding papers in journals. The classification scheme is the following:

  • Hacking = sum of papers in journals that are primarily dedicated to software/computation/methods for very specific problems. Examples are: Bioinformatics, Journal of Statistical Software, IEEE Computing in Science and Engineering, or a software article in Genome Biology.
  • Substantive = sum of papers in journals that primarily publish scientific results, such as JAMA, New England Journal of Medicine, Cell, Sleep, Circulation.
  • Math and Statistics = sum of papers in primarily statistical journals including Biostatistics, Biometrics, JASA, JRSSB, Annals of Statistics

Some journals are general, like Nature, Science, the Nature sub-journals, PNAS, and PLoS One. For papers in those journals, assess which of the areas the paper falls in by determining the main contribution of the paper in terms of the non-academic classification below. 
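
The tally for academics could be sketched like this. The lookup covers only a few of the example journals above, the function name is mine, and scaling the counts to proportions is an assumption based on the default skills value. The post does not say whether raw counts are also accepted.

```r
# Journal-to-category lookup built from some of the examples in the post
journal_category <- c(
  "Bioinformatics"                  = "hacking",
  "Journal of Statistical Software" = "hacking",
  "JAMA"                            = "substantive",
  "Cell"                            = "substantive",
  "Biostatistics"                   = "math_stats",
  "JASA"                            = "math_stats"
)

# Count papers per category and return a 1 x 3 matrix of proportions
# in the column order hacking, substantive, math and statistics.
academic_skills <- function(journals) {
  cats <- factor(journal_category[journals],
                 levels = c("hacking", "substantive", "math_stats"))
  if (anyNA(cats)) stop("classify general or unlisted journals by hand first")
  counts <- as.numeric(table(cats))
  matrix(counts / sum(counts), nrow = 1)
}

# A hypothetical publication list: one software paper, one substantive
# paper, and two statistics papers
academic_skills(c("Bioinformatics", "JASA", "JASA", "Cell"))
```

Papers in general journals such as Nature or PNAS are deliberately not in the lookup, since they have to be classified one at a time.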

If you are a non-academic

Since papers aren’t involved, determine the percent of your time you spend on the following things:

  • Hacking = downloading/transferring data, cleaning data, writing software, combining previously used software
  • Substantive = time you spend learning about the scientific problem, discussing with scientists, working in the lab/field.
  • Math and Statistics = time you spend formalizing a problem in mathematical notation, time you spend developing new mathematical/statistical theory, time you spend developing general methods.
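
For non-academics, turning time spent into a skills row might look like the sketch below. The function name and the hours are hypothetical, and dividing by the total so the row sums to 1 is again my assumption.

```r
# Convert time spent on each activity (any unit) into a one-row
# skills matrix of proportions: hacking, substantive, math and statistics.
time_skills <- function(hacking, substantive, math_stats) {
  time <- c(hacking, substantive, math_stats)
  stopifnot(all(time >= 0), sum(time) > 0)
  matrix(time / sum(time), nrow = 1)
}

# Hours in a hypothetical 40-hour week
my_skills <- time_skills(hacking = 25, substantive = 10, math_stats = 5)
my_skills
```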