Applied Statisticians: people want to learn what we do. Let's teach them.

In this recent opinion piece, Hadley Wickham explains how data science goes beyond Statistics and that data science is not promoted in academia. He defines data science as follows: I think there are three main steps in a data science

Academic statisticians: there is no shame in developing statistical solutions that solve just one problem

I think that the main distinction between academic statisticians and those calling themselves data scientists is that the latter are very much willing to invest most of their time and energy into solving specific problems by analyzing specific data sets.

The Big in Big Data relates to importance not size

In the past couple of years, several non-statisticians have asked me "What is Big Data exactly?" or "How big is Big Data?". My answer has been: "I think Big Data is much more about 'data' than 'big'." I explain below.

Confession: I sometimes enjoy reading the fake journal/conference spam

I've spent a considerable amount of time setting up filters to avoid getting spam from fake journals and conferences. Unfortunately, they are exceptionally good at thwarting my defenses. This does not annoy me as much as I pretend because, secretly,

Correlation does not imply causation (parental involvement edition)

The New York Times recently published an article on education titled "Parental Involvement Is Overrated". Most research in this area supports the opposite view, but the authors claim that "evidence from our research suggests otherwise". Before you stop helping your children

Writing good software can have more impact than publishing in high impact journals for genomic statisticians

Every once in a while we see computational papers published in science journals with high impact factors. Genomics-related methods appear quite often in these journals. Several of my junior colleagues express frustration that all their papers get rejected from these journals.

Data Analysis for Genomics edX Course

Mike Love (@mikelove) and I have been working hard for the past couple of months preparing a free online edX course on data analysis for genomics. Our target audience is the postdocs, graduate students and research scientists that are tasked with

The fact that data analysts base their conclusions on data does not mean they ignore experts

Paul Krugman recently joined the new FiveThirtyEight-hating bandwagon. I am not crazy about the new website either (although I'll wait more than one week before judging) but in a recent post Krugman creates a false dichotomy that is important to

How to use Bioconductor to find empirical evidence in support of π being a normal number

Happy π day everybody! I wanted to write some simple code (included below) to test the parallelization capabilities of my new cluster. So, in honor of π day, I decided to check for evidence that π is a normal number. A

Per capita GDP versus years since women received right to vote

Below is a plot of per capita GDP (on a log scale) against years since women received the right to vote for 42 countries. Is this cause, effect, both or neither? We all know correlation does not imply causation, but I

