Data Science

My Online Course Development Workflow

One of the nice things about developing 9 new courses for the JHU Data Science Specialization in a short period of time is that you get to learn all kinds of cool and interesting tools. One of the ways that we were able to push out so much content in just a few months was that we did most of the work ourselves, rather than outsourcing things like video production and editing.

NIH is looking for an Associate Director for Data Science: Statisticians should consider applying

NIH understands the importance of data and several months ago they announced this new position. Here is an excerpt from the add: The ADDS will focus on the urgent need and increased opportunities for capitalizing on the expanding collections of biomedical data to advance NIH’s mission. In doing so, the incumbent will provide programmatic NIH-wide leadership for areas of data science that relate to data emanating from many areas of study (e.

Sunday data/statistics link roundup (1/6/2013)

Not really statistics, but this is an interesting article about how rational optimization by individual actors does not always lead to an optimal solutiohn. Related, ere is the coolest street sign I think I’ve ever seen, with a heatmap of traffic density to try to influence commuters. An interesting paper that talks about how clustering is only a really hard problem when there aren’t obvious clusters. I was a little disappointed in the paper, because it defines the “obviousness” of clusters only theoretically by a distance metric.

Sunday data/statistics link roundup (11/25/2012)

My wife used to teach at Grinnell College, so we were psyched to see that a Grinnell player set the NCAA record for most points in a game. We used to go to the games, which were amazing to watch, when we lived in Iowa. The system the coach has in place there is a ton of fun to watch and is based on statistics! Someone has to vet the science writers at the Huffpo.

How important is abstract thinking for graduate students in statistics?

A recent lunchtime discussion here at Hopkins brought up the somewhat-controversial topic of abstract thinking in our graduate program. We, like a lot of other biostatistics/statistics programs, require our students to take measure theoretic probability as part of the curriculum. The discussion started as a conversation about whether we should require measure theoretic probability for our students. It evolved into a discussion of the value of abstract thinking (and whether measure theoretic probability was a good tool to measure abstract thinking).

A disappointing response from @NatureMagazine about folks with statistical skills

Last week I linked to an ad for a Data Editor position at Nature Magazine. I was super excited that Nature was recognizing data as an important growth area. But the ad doesn’t mention anything about statistical analysis skills; it focuses exclusively on data management expertise. As I pointed out in the earlier post, managing data is only half the equation - figuring out what to do with the data is the other half.

Sunday data/statistics link roundup (4/29)

Nature genetics has an editorial on the Mayo and Myriad cases. I agree with this bit: “In our opinion, it is not new judgments or legislation that are needed but more innovation. In the era of whole-genome sequencing of highly variable genomes, it is increasingly hard to justify exclusive ownership of particularly useful parts of the genome, and method claims must be more carefully described.” Via Andrew J. One of Tech Review’s 10 emerging technologies from a February 2003 article?

Interview with Drew Conway - Author of "Machine Learning for Hackers"

Drew Conway Drew Conway is a Ph.D. student in Politics at New York University and the co-ordinator of the New York Open Statistical Programming Meetup. He is the creator of the famous (or infamous) data science Venn diagram, the basis for our R function to determine if your a data scientist. He is also the co-author of Machine Learning for Hackers, a book of case studies that illustrates data science from a hacker’s perspective.

Sunday data/statistics link roundup (3/25)

The psychologist whose experiment didn’t replicate then went off on the scientists who did the replication experiment is at it again. I don’t see a clear argument about the facts of the matter in his post, just more name calling. This seems to be a case study in what not to do when your study doesn’t replicate. More on “conceptual replication” in there too.  Berkeley is running a data science course with instructors Jeff Hammerbacher and Mike Franklin, I looked through the notes and it looks pretty amazing.

Cleveland's (?) 2001 plan for redefining statistics as "data science"

This plan has been making the rounds on Twitter and is being attributed to William Cleveland in 2001 (thanks to Kasper for the link). I’m not sure of the provenance of the document but it has some really interesting ideas and is worth reading in its entirety. I actually think that many Biostatistics departments follow the proposed distribution of effort pretty closely. One of the most interesting sections is the discussion of computing (emphasis mine):  Data analysis projects today rely on databases, computer and network hardware, and computer and network software.

Data Scientist vs. Statistician

There’s in interesting discussion over at reddit on the difference between a data scientist and a statistician. My crude summary of the discussion seems to be that by and large they are the same but the phrase “data scientist” is just the hip new name for statistician that will probably sound stupid 5 years from now. My question is why isn’t “statistician” hip? The comments don’t seem to address that much (although a few go in that direction).