Editor’s note: For the last few years I have made a list of awesome things that other people did (2016, 2015, 2014, 2013). As in previous years, I’m making a list, again right off the top of my head. If you know of some, you should make your own list or add it to the comments! I have also avoided talking about stuff I worked on or that people here at Hopkins are doing, because this post is supposed to be about other people’s awesome stuff. I write this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data.
- Amelia McNamara and Aran Lunzer made this crazy nice illustration of how histograms work.
- I am just continuously impressed by the work coming out of Rahul Satija’s group - this year it was this paper on aligning cell populations in single cell data.
- Karl Broman and Kara Woo wrote the definitive guide to data in spreadsheets. The last sentence of the abstract wins the award for the densest delivery of valuable information in a single sentence.
- The rest of the papers in the PeerJ data science collection are also awesome.
- Thomas Lin Pedersen came out with the patchwork package for composing plots. This is the plot organization software I’ve always wanted :).
- Getting Avericked became a word as Mara Averick somehow makes everything R related look so cool.
- The folks at Fast.AI do so many impressive things (they are MOOC geniuses) that it’s hard to pick just one, but this post on data ethics might be my favorite.
- Kristian Lum wrote the most important blog post in statistics and jump-started the discussion throughout our field on the standard of behavior we want to hold ourselves to.
- Hadley Wickham is the LeBron James of data science - so effortlessly great it is almost expected. The usethis package is yet another example of how he’s revolutionizing package development (again).
- I looooovveeddd this post from Julia Silge on how to do word2vec-like things with tidy data. Such a beautiful distillation of something complicated into something we all can do.
- I only bought one stats book all year. It was David Robinson’s Empirical Bayes book. An insta-classic.
- I’ve been following Nick Strayer’s dive into deep learning all year on Twitter; this post on LSTMs with a baseball analogy is a good example of the awesomeness.
- Jeff Kao had a tour de force post on identifying fake comments on the net neutrality repeal. Such an impressively timed and high impact piece of forensic data science.
- Mike Love released his amazing lecture notes for his intro to computational biology class. I want to take a class from Mike.
- The keras R package is sooooo nice. I love everything JJ Allaire does - whenever he writes software it always seems like he’s read my mind, and it works just the way I’d want it to.
- Jesse Meagan’s post on educational data wrangling was really close to my heart. Plus it is so awesome to read about her (impressive) transition from Excel to R, something I’m thinking about a ton these days.
- I’ve been gobbling up Christoph Molnar’s book on interpretable machine learning.
- The book on tidy text mining came out this year. What a tour de force from Julia Silge and David Robinson.
- I read a lot of FastForward blog posts this year - one of my favorites was this one on auditing machine learning models - such an important topic.
- I love this post on the ten fallacies of data science - my favorite one? “The data exist”.
- I basically read every one of Maëlle Salmon’s blog posts, but I found her analysis of the random seeds people use in R to be fascinating. I’m looking forward to more on the epidemiology of code :).
- I dug Emily Robinson’s post on making R code faster; this is something I’m adding to every class I teach in the future. So nice to see all the steps laid out so clearly!
- Natalie Telis did one of the most impressive crowdsourced studies of who is asking questions at a major genetics conference. I think solutions like this are so clever when they are low cost/low overhead but produce super interesting results.
- I liked Stephen Turner’s updated post on how to stay current in bioinformatics. Keeping up is super hard to do well, so this is a big service for the community.
- It’s hard to point to just one thing Karthik Ram does for the R community in any given year, but I was pumped to see this call for rOpenSci fellows. I know I’m going to hit peak FOMO whenever Karthik is organizing.
- My main goal as an advisor is to be like Marta Kutas: “I don’t care what they become,” she says of her students, “so long as it’s decent, thinking human beings!”
- I’m such a fan of Rob Hyndman’s forecasting work, and was pumped to see him on DataCamp. The DataCamp folks have been just vacuuming up awesome people to teach courses.
- Andrew Beam did an amazing, polite, and thorough job of setting me straight on deep learning.
- The UpSetR package was a heartwarming contribution for those of us who are terrified of out-of-control Venn diagrams.
- I think Florian Markowetz is consistently one of the most thoughtful computational scientists. I read his stuff relentlessly, but I especially loved his idea for a pre-registration of the Human Cell Atlas.
- It was the year of the Datasaurus! I am in the process of replacing all my Anscombe quartet lecture notes. What a cool idea by Justin Matejka and George Fitzmaurice.
- There are times when I wonder whether Emma Pierson has created multiple AI copies of herself. How can one person write so, many, awesome, things in one year?
- Ben Haibe-Kains and colleagues put together a tour-de-force in reproducibility with PharmacoDB.
- There is often talk of how academic statistics isn’t supportive enough of non-standard (read: non-math) contributions. There is definitely truth there, but I loved Lance Waller’s piece on navigating these non-standard paths in academia.
- I am so amped about the skimr package, especially the histograms as part of the summary statistics.
- Christie Aschwanden added another awesome piece to her collection of the most level-headed, responsible, and thoughtful discussions of the statistical issues in science.
That’s all I have for the moment. I’m sure I missed a ton, and I’m sorry if I missed you. Please add more in the comments; I always love to see cool stuff!