Tag: policy


Statisticians and computer scientists - if there is no code, there is no paper

I think it has been beaten to death that the incentives in academia lean heavily toward producing papers and less toward producing/maintaining software. There are people who are way, way more knowledgeable than me about building and maintaining software. For example, Titus Brown hit a lot of the key issues in his interview. The open source community is also filled with advocates and researchers who know way more about this than I do.

This post is more about my views on changing the perspective of code/software in the data analysis community. I have often been frustrated with statisticians and computer scientists who write papers where they develop new methods and seem to demonstrate that those methods blow away all their competitors. But then no software is available to actually test whether that is true. Even worse, sometimes I just want to use their method to solve a problem in our pipeline, but I have to code it from scratch!

I have also had several cases where I emailed the authors for their software and they said it "wasn't fit for distribution" or they "don't have code" or the "code can only be run on our machines". I totally understand the first and last. My code isn't always pretty (I have zero formal training in computer science, so messy code is actually the most likely scenario), but I always say, "I'll take whatever you got and I'm willing to hack it out to make it work". I am still often turned down.

So I have a new policy when evaluating CVs of candidates for jobs, or when I'm reading a paper as a referee. If the paper is about a new statistical method or machine learning algorithm and there is no software available for that method - I simply mentally cross it off the CV. If I'm reading a data analysis and there isn't code that reproduces the analysis - I mentally cross it off. In my mind, new methods/analyses without software are just vaporware. Now, you'd definitely have to cross a few papers off my CV, based on this principle. I do that. But I'm trying really hard going forward to make sure nothing gets crossed off.

In a future post I'll talk about the new issue I'm struggling with - maintaining all that software I'm creating.



Ozone rules

A recent article in the New York Times describes the backstory behind the decision to not revise the ozone national ambient air quality standard. This article highlights the reality of balancing the need to set air pollution regulation to protect public health and the desire to get re-elected. Not having ever served in politics (does being elected to the faculty senate count?) I can’t comment on the political aspect. But I wanted to highlight some of the scientific evidence that goes into developing these standards. 

A bit of background: the Clean Air Act of 1970 and its subsequent amendments require that national ambient air quality standards be set to protect public health with “an adequate margin of safety”. Ozone (usually referred to as smog in the press) is one of the pollutants for which standards are set, along with particulate matter, nitrogen oxides, sulfur dioxide, carbon monoxide, and airborne lead. Importantly, the Clean Air Act requires the EPA to set standards based on the best available scientific evidence.

The ozone standard was re-evaluated years ago under the (second) Bush administration. At the time, the EPA staff recommended a daily standard of between 60 and 70 ppb as providing an adequate margin of safety. Roughly speaking, if the standard is 70 ppb, this means that states cannot have levels of ozone higher than 70 ppb on any given day (that’s not exactly true but the real standard is a mouthful). Stephen Johnson, EPA administrator at the time, set the standard at 75 ppb, citing in part the lack of evidence showing a link between ozone and health at low levels.

We’ve conducted epidemiological analyses that show that ozone is associated with mortality even at levels far below 60 ppb (see Figure 2). Note that this paper was not published in time to make it into the previous EPA review. The study suggests that if a threshold exists below which ozone has no health effect, it is probably at a level lower than the current standard, possibly nearing natural background levels. Detecting thresholds at very low levels is challenging because you start running out of data quickly. But other studies that have attempted to do this have found results similar to ours.

The bottom line is that pollution levels below current air quality standards should not be misinterpreted as safe for human health.