Episode 4 of Not So Standard Deviations is hot off the audio editor. In this episode Hilary first explains to me what the heck DevOps is, and then we talk about the statistical challenges in detecting rare events in an enormous set of time series data. There's also some discussion of Ben and Jerry's and the t-test, so you'll want to hang on for that.
Theranos is a Silicon Valley diagnostic testing company that has been in the news recently. The story of Theranos has fascinated me because I think it represents a perfect collision of the tech startup culture and the health care culture, and shows how combining the two can generate unique problems.
I talked with Elizabeth Matsui, a Professor of Pediatrics in the Division of Allergy and Immunology here at Johns Hopkins, to discuss Theranos, the realities of diagnostic testing, and the unique challenges that a health-tech startup faces with respect to doing good science and building products people want to buy.
I just uploaded Episode 3 of Not So Standard Deviations so check your feeds. In this episode Hilary and I talk about our jobs and the life of the data scientist in both academia and the tech industry. It turns out that they're not as different as I would have thought.
Episode 2 of my podcast with Hilary Parker, Not So Standard Deviations, is out! In this episode, we talk about user testing for statistical methods, navigating the Hadleyverse, the crucial significance of rename(), and the secret reason for creating the podcast (hint: it rhymes with "bee"). Also, I erroneously claim that Bill Cleveland is way older than he actually is. Sorry Bill.
In other news, we are finally on iTunes so you can subscribe from there directly if you want (just search for "Not So Standard Deviations" or paste the link directly into your podcatcher).
I'm happy to announce that I've started a brand new podcast called Not So Standard Deviations with Hilary Parker at Etsy. Episode 1 "RCatLadies Origin Story" is available through SoundCloud. In this episode we talk about the origins of RCatLadies, evidence-based data analysis, my new book, and the Python vs. R debate.
You can subscribe to the podcast using the RSS feed from SoundCloud. We'll be getting it up on iTunes hopefully very soon.
Interview with Rebecca Nugent of Carnegie Mellon University.
In this episode Jeff and I talk with Rebecca Nugent, Associate Teaching Professor in the Department of Statistics at Carnegie Mellon University. We talk with her about her work with the Census and the growing interest in statistics among undergraduates.
Interview with Steven Salzberg about the ENCODE Project.
In this episode Jeff and I have a discussion with Steven Salzberg, Professor of Medicine and Biostatistics at Johns Hopkins University, about the recent findings from the ENCODE Project, where he helps us separate fact from fiction. You're going to want to watch this one to the end.
Here are some excerpts from the interview.
Regarding why the data should have been released immediately without restriction:
If this [ENCODE] were funded by a regular investigator-initiated grant, then I would say you have your own grant, you’ve got some hypotheses you’re pursuing, you’re collecting data, you’ve already demonstrated that…you have some special ability to do this work and you should get some time to look at your data that you just generated to publish it. This was not that kind of a project. These are not hypothesis-driven projects. They are data collection projects. The whole model is…they’re creating a resource and it’s more efficient to create the resource in one place…. So we all get this data that’s being made available for less money…. I think if you’re going to be funded that way, you should release the data right away, no restrictions, because you’re funded because you’re good at generating this data cheaply….But you may not be the best person to do the analysis.
Regarding the problem with large-scale top-down funding approaches versus the individual investigator approach:
Well, it’s inefficient because it’s anti-competitive. They have a huge amount of money going to a few centers, they’ll do tons of experiments of the same type—may not be the best place to do that. They could instead give that money to 20 times as many investigators who would be refining the techniques and developing better ones. And a few years from now, instead of having another set of ENCODE papers—which we’re probably going to have—we might have much better methods and I think we’d have just as much in terms of discovery, probably more.
Regarding the best way to make discoveries:
I think a problem I have with it…is that the top-down approach to science isn’t the way you make discoveries. And NIH has sort of said we’re going to fund these data generation and data analysis groups—they’re doing both…and by golly we’re going to discover some things. Well, it doesn’t always work if you do that. You can’t just say…so the Human Genome [Project], even though, of course there were lots of promises about curing cancer, we didn’t say we were going to discover how a particular gene works, we said we’re going to discover what the sequence is. And we did! Really well. With these [ENCODE] projects they said we’re going to figure out the function of all the elements, and they haven’t figured that out, at all.
In this episode of the Simply Statistics podcast Jeff and I discuss the deterministic statistical machine and increasing the cost of data analysis. We decided to eschew the studio setup this time and attempt a more guerrilla style of podcasting. Also, Rafa was nowhere to be found when we recorded, so you'll have to catch his melodious singing voice in the next episode.
And in case you’re wondering, Jeff’s office is in fact that clean.