## Podcast #6: Data Analysis MOOC Post-mortem

Jeff and I talk about Jeff’s recently completed MOOC on Data Analysis.

This might be short. I have a couple of classes starting on Monday. The first is our class. This is one of my favorite classes to teach; our Ph.D. students are pretty awesome, and they always amaze me with what they can do. The other is my Coursera debut in Data Analysis.

I’m not the first one to suggest that Biostatistics has been undervalued in the scientific community, and some of the shortcomings of epidemiology and biostatistics have been noted elsewhere. But this previous work focuses primarily on the contributions of statistics/biostatistics at the purely scientific level. The Cox Proportional Hazards model is one of the most widely used statistical models in the analysis of data from clinical trials and other medical studies.
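Since the Cox proportional hazards model comes up here, a minimal sketch of its core idea, the partial likelihood, may help. This is an illustrative pure-Python toy on invented data (the dataset, variable names, and the choice β = 0 are all made up for the example), not production survival-analysis code; real analyses would use a package such as R's `survival`.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times.

    At each observed event time, the event's risk score exp(beta * x_i)
    is compared against the scores of everyone still at risk.
    """
    ll = 0.0
    for i, (t, d) in enumerate(zip(times, events)):
        if not d:  # censored subjects contribute only through risk sets
            continue
        risk = [math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= t]
        ll += beta * x[i] - math.log(sum(risk))
    return ll

# Tiny made-up dataset: follow-up times, event indicators (1 = event), covariate.
times  = [2.0, 3.0, 5.0, 7.0]
events = [1, 1, 0, 1]
x      = [0.5, -0.2, 1.0, 0.3]

# At beta = 0 every risk score is 1, so the log-likelihood reduces to
# -sum(log |risk set|) over the three event times: -log(4 * 3 * 1).
print(cox_partial_loglik(0.0, times, events, x))
```

The point of the partial likelihood is visible in the loop: the baseline hazard never appears, because it cancels in the ratio of each event's risk score to the risk-set total.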

Jeff and I talk with Brian Caffo about teaching MOOCs on Coursera.

A recent lunchtime discussion here at Hopkins brought up the somewhat-controversial topic of abstract thinking in our graduate program. We, like a lot of other biostatistics/statistics programs, require our students to take measure theoretic probability as part of the curriculum. The discussion started as a conversation about whether we should require measure theoretic probability for our students. It evolved into a discussion of the value of abstract thinking (and whether measure theoretic probability was a good tool to measure abstract thinking).

Statistics depends on math, like a lot of other disciplines (physics, engineering, chemistry, computer science). But just like those other disciplines, statistics is not math; math is just a tool used to solve statistical problems. Unlike those other disciplines, statistics gets lumped in with math in headlines. Whenever people use statistical analysis to solve an interesting problem, the headline reads “Math can be used to solve amazing problem X” or something similar.

I posted a little while ago on a proposal for a fast statistics journal. It generated a bunch of comments and even a really nice follow-up post with some great ideas. Since then I’ve gotten reviews back on a couple of papers, and I think I’ve realized one of the key issues that is driving me nuts about the current publishing model. It boils down to one simple question: What is a major revision?

The tough economic times we live in, and the potential for big paydays, have made entrepreneurship cool. From the venture capitalist-in-chief, to the JavaScript-coding mayor of New York, everyone is on board. No surprise there: successful startups lead to job creation, which can have a major positive impact on the economy. The game has been dominated for a long time by the folks over in CS. But the value of many recent startups is either based on, or can be magnified by, good data analysis.

We here at Simply Statistics are big fans of science news reporting. We read newspapers, blogs, and the news sections of scientific journals to keep up with the coolest new research. But health science reporting, although exciting, can also be incredibly frustrating to read. Many articles have sensational titles, like “How using Facebook could raise your risk of cancer”. The articles go on to describe some research and interview a few scientists, then typically make fairly large claims about what the research means.

A few data/statistics related links of interest:

- Eric Lander Profile
- The math of lego (should be “The statistics of lego”)
- Where people are looking for homes
- Hans Rosling’s TED Talk on the developing world (an oldie but a goodie)
- Elsevier is trying to make open-access illegal (not strictly statistics related, but a hugely important issue for academics who believe government funded research should be freely accessible), more here

There’s an interesting discussion over at reddit on the difference between a data scientist and a statistician. My crude summary of the discussion is that by and large they are the same, but the phrase “data scientist” is just the hip new name for statistician that will probably sound stupid 5 years from now. My question is: why isn’t “statistician” hip? The comments don’t seem to address that much (although a few go in that direction).

[youtube http://www.youtube.com/watch?v=V-hFORcBj44] The History of Nonlinear Principal Components Analysis, a lecture given by Jan de Leeuw. For those who have ~45 minutes to spare, it’s a very nice talk given in Jan’s characteristic style. (Source: http://www.youtube.com/)

Howard Chang, a former PhD student of mine now at Emory, just published a paper on a measurement error model for estimating the health effects of coarse particulate matter (PM). This is a cool paper that deals with the problem that coarse PM tends to be very spatially heterogeneous. Coarse PM is a bit of a hot topic now because there is currently no national ambient air quality standard for coarse PM specifically.
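The measurement error problem behind the paper can be illustrated with a toy simulation (entirely invented here; this is not Howard's model, just the classical-error caricature of the general issue): when a true exposure is replaced by a spatially noisy surrogate, a naive regression slope is attenuated toward zero.

```python
import random

random.seed(42)

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

n = 5000
true_slope = 2.0
x = [random.gauss(0, 1) for _ in range(n)]            # true exposure, variance 1
y = [true_slope * xi + random.gauss(0, 1) for xi in x]
w = [xi + random.gauss(0, 1) for xi in x]             # noisy surrogate, error variance 1

b_true  = ols_slope(x, y)   # lands near the true slope of 2
b_noisy = ols_slope(w, y)   # attenuated toward 2 * var(x)/(var(x)+var(u)) = 1
print(b_true, b_noisy)
```

With equal signal and error variances the classical attenuation factor is var(x)/(var(x) + var(u)) = 0.5, so the surrogate-based slope lands near half the truth; correcting for this kind of bias is the job of a measurement error model.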

All statisticians in academia are constantly confronted with the question of where to publish their papers. Sometimes it’s obvious: A theoretical paper might go to the Annals of Statistics or JASA Theory & Methods or Biometrika. A more “methods-y” paper might go to JASA or JRSS-B or Biometrics or maybe even Biostatistics (where all three of us are or have been associate editors). But where should the applied papers go? I think this is an increasingly large category of papers being produced by statisticians.