Simply Statistics

A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

A non-comprehensive list of awesome things other people did in 2017

Editor’s note: For the last few years I have made a list of awesome things that other people did (2016, 2015, 2014, 2013). As in previous years, I’m making a list, again right off the top of my head. If you know of some, you should make your own list or add it to the comments! I have also avoided talking about stuff I worked on or that people here at Hopkins are doing because this post is supposed to be about other people’s awesome stuff.

Thoughts on David Donoho’s "Fifty Years of Data Science"

Note: This post was originally published as part of a collection of discussion pieces on David Donoho’s paper. The original paper and collection of discussions can be found at the JCGS web site. Professor Donoho’s commentary comes at a perfect time, given that, according to his own chronology, we are just about due for another push to “widen the tent” of statistics to include a broader array of activities. Looking back at the efforts of people like Tukey, Cleveland, and Chambers to broaden the meaning of statistics, I would argue that to some extent their efforts have failed.

How Do Machines Learn?

I like all of CGP Grey’s videos but most of them have to do with voting systems and so aren’t really relevant to this blog. But his latest video titled “How Do Machines Learn?” is highly relevant and I thought very well done. That said, although the animations of the robots were very cute and helped to tell the story, I found them a bit disconcerting in a way that I can’t quite explain.

Puerto Rico's governor wants recount of hurricane death toll

A quick followup to Rafa’s analysis of the death toll from Hurricane Maria, from Axios: “Puerto Rican Governor Ricardo Rosselló ordered a recount Monday of every death on the island since Hurricane Maria made landfall on September 20, as evidence continues to show that the official death toll grossly undercuts the true number, reports the New York Times.” There are at least two ways to do this. One way is inferential in nature, taking a look at what we might expect the mortality to be and looking at what was observed.

Data Analysis and Engagement - Does Caring About the Analysis Matter?

Sometimes, when you’re recording a podcast, it’s actually difficult to listen. That’s because while you’re recording you’re monitoring the network lag, the sound levels, the show notes, and the outline. On some episodes I’m just barely hanging on by a thread. While Hilary Parker and I were recording Episode 50 of Not So Standard Deviations we had a discussion about her experience doing A/B testing at Etsy and how one experiment, which involved showing customers their passwords as they typed them, resulted in an increase in the number of failed login attempts, which was not what they were expecting.

This is a brave post and everyone in statistics should read it

This post by Kristian Lum is incredibly brave. It points out some awful behavior by people in our field and should be required reading for everyone. It took a lot of courage for Kristian to post this but we believe her, think this is a serious and critical issue for our field, and will not tolerate this kind of behavior among our colleagues. Her post has already inspired important discussions among the faculty at Johns Hopkins Biostatistics and is an important contribution to making sure our field is welcoming for everyone.

Hurricane María official death count in conflict with mortality data

A recent preprint by Alexis R. Santos-Lozada and Jeffrey T. Howard concludes that “The mortality burden may [be] higher than official counts, and may exceed the current official death toll by a factor of 10.” The authors used monthly death records from the Puerto Rico Vital Statistics system from 2010 to 2016. Although data for 2017 was apparently not available, they extracted data from a statement made by Héctor Pesquera, the Secretary of Public Safety:

Some roadblocks to the broad adoption of machine learning and AI

I read two blog posts on AI over the Thanksgiving break. One was a nice post by Luke Oakden-Rayner discussing the challenges for AI in medicine, and the other, by Tim Harford, was about the need for increased focus on basic research in AI, motivated by AlphaGo. I’ve had a lot of interactions with people lately who want to take advantage of machine learning/AI in their research or business. Despite the excitement around AI and the exciting results we see from sophisticated research teams almost daily, the actual extent and application of AI is much smaller.

A few things that would reduce stress around reproducibility/replicability in science

I was listening to the Effort Report Episode on The Messy Execution of Reproducible Research where they were discussing the piece about Amy Cuddy in the New York Times. I think both the article and the podcast did a good job of discussing the nuances of the importance of reproducibility and the challenges of the social interactions around this topic. After listening to the podcast I realized that I see a lot of posts about reproducibility/replicability, but many of them are focused on the technical side.

Follow Up on Reasoning About Data

Sometimes, when I write a really long blog post, I forget what the point was at the end. I suppose I could just update the previous post…but that feels wrong for some reason. I meant to make one final point in my last post about how better data analyses help you reason about the data. In particular, I meant to tie the discussion about garbage collection to the section on data analysis.