Simply Statistics

24 Aug

Simply Statistics Podcast #1

To mark the occasion of our 1-year anniversary of starting the blog, Jeff, Rafa, and I have recorded our first podcast. You can tell that it’s our very first podcast because we don’t appear to have any idea what we’re doing. However, we decided to throw caution to the wind.

In this episode we talk about why we started the blog and discuss our thoughts on statistics and big data. Be sure to watch to the end as Rafa provides a special treat.

UPDATE: For those of you who can’t bear the sight of us, there is an audio only version.

UPDATE 2: I have set up an RSS feed for the audio-only version of the podcast.

UPDATE 3: Here is the RSS feed for the HD video version of the podcast.

23 Aug

Science Exchange starts Reproducibility Initiative

I’ve fallen behind and so haven’t had a chance to mention this, but Science Exchange has started its Reproducibility Initiative. The idea is that authors can submit their study to be reproduced and Science Exchange will match the study with a validator who will attempt to reproduce the results (for a fee).

Validated studies will receive a Certificate of Reproducibility acknowledging that their results have been independently reproduced as part of the Reproducibility Initiative. Researchers have the opportunity to publish the replicated results as an independent publication in the PLOS Reproducibility Collection, and can share their data via the figshare Reproducibility Collection repository. The original study will also be acknowledged as independently reproduced if published in a supporting journal.

This is a very interesting initiative and it’s one I and a number of others have been talking about doing. They have an excellent advisory board and seem to have all the right partners/infrastructure lined up. 

The obvious question to me is: if you’re going to submit your study to this service and get it reproduced, why would you ever want to submit it to a journal? The level of review you’d get here is quite a bit more rigorous than you’d receive at a journal, and the submission process essentially involves writing a paper without the Introduction and the Discussion (usually the hardest and most annoying parts). At the moment, it seems the service is set up to work in parallel with standard publication or perhaps after the fact. But I could see it eventually replacing standard publication altogether.

The timing, of course, could be an issue. It’s not clear how long one should expect it to take to reproduce a study. But it’s probably not much longer than a review you’d get at a statistics journal.

22 Aug

Data Startups from Y Combinator Demo Day

Y Combinator, the tech startup incubator, had its 15th demo day. Here are some of the data/statistics-related highlights (thanks to TechCrunch for doing the hard work):

  • EVERYDAY.ME — A PRIVATE, ONLINE RECORD OF YOUR LIFE. 

    This company seems to me like a meta-data company. It compiles your data from other sites.

  • MTH SENSE: IMPROVING MOBILE AD TARGETING

“Most [mobile] ads served are blind. Mth sense’s solution adds demographic data to ads through predictive modeling based on app and device usage. For example, if you have the Pinterest and Vogue apps, you’re more likely to be a soccer mom.” Hmm, I guess I’d better delete those apps from my phone….

  • SURVATA: REPLACING PAYWALLS WITH SURVEYWALLS

    Survata’s product replaces paywalls on premium content from online publishers with surveys that conduct market research.

  • RENT.IO — RENT PRICE PREDICTION

Rent.io says it wants to “optimize pricing of the single biggest recurring expense in lives of 100 million Americans.”

  • BIGCALC: FAST NUMBER-CRUNCHING FOR MAKING FINANCIAL TRADING DECISIONS

    BigCalc says its platform for financial modeling scales to enormous datasets, and purportedly does simulations that typically take 22 hours in 24 minutes.

  • DATANITRO — A BACKBONE FOR FINANCE-RELATED DATA

    DataNitro’s founders have both worked in finance, and they say they know from experience that financial industry software is basically “held together with duct tape.” A big problem with the status quo is how data is exported from Excel.

  • STATWING: EASY TO USE DATA ANALYSIS

    Most existing data analysis tools (in particular SPSS) are built for statisticians. Statwing has created tools that make it easier for marketers and analysts to interact with data without dealing with arcane technical terminology. Those users only need a few core functions, Statwing says, so that’s what the company provides. With just a few clicks, users can get the graphs that they want. And the data is summarized in a single sentence of conversational English.
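Statwing’s actual engine is proprietary, but the idea of rendering a statistic as a sentence of conversational English is easy to sketch. The function below is my own illustration, not Statwing’s API; the variable names and the strength thresholds are invented (they follow common rules of thumb for interpreting correlations):

```python
import math

def describe_correlation(name_x, name_y, xs, ys):
    """Summarize the linear relationship between two variables
    in a single sentence of conversational English."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    r = cov / (sd_x * sd_y)  # Pearson correlation coefficient
    # Conventional (and debatable) cutoffs for describing strength
    strength = ("strong" if abs(r) >= 0.7 else
                "moderate" if abs(r) >= 0.3 else "weak")
    direction = "rise" if r > 0 else "fall"
    return (f"As {name_x} increases, {name_y} tends to {direction} "
            f"({strength} relationship, r = {r:.2f}).")

print(describe_correlation("ad spend", "revenue",
                           [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8]))
```

The hard part of a real product, of course, is not computing r but deciding which analysis to run and which caveats (sample size, outliers, nonlinearity) the sentence should carry.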

21 Aug

NSF recognizes math and statistics are not the same thing...kind of

There’s controversy brewing over at the National Science Foundation over names. Back in October 2011, Sastry Pantula, the Director of the Division of Mathematical Sciences at NSF (formerly the Chair of NC State Statistics Department and President of the ASA), proposed that the name of the Division be changed to the “Division of Mathematical and Statistical Sciences”. Excerpting from his original proposal, Pantula says

Extracting useful knowledge from the deluge of data is critical to the scientific successes of the future. Data-intensive research will drive many of the major scientific breakthroughs in the coming decades. There is a long-term need for research and workforce development in computational and data-enabled sciences. Statistics is broadly recognized as a data-centric discipline, thus having it in the Division’s name as proposed would be advantageous whenever “Big Data” and data-sciences investments are discussed internally and externally.

This bureaucratic move by Pantula created quite a reaction. A sub-committee of the Math and Physical Sciences Advisory Committee (MPSAC) was formed to investigate the name change and to solicit feedback from the relevant communities. The sub-committee was chaired by Fred Roberts (Rutgers) and also included James Berger (Duke), Emery Brown (MIT), Kevin Corlette (U. of Chicago), Irene Fonseca (CMU), and Juan Meza (UC Merced). A number of organizations provided feedback to the sub-committee, including the American Statistical Association and the American Mathematical Society.

There was intense feedback both for and against the name change. Somewhat predictably, mathematicians were adamantly opposed to the name change and statisticians were for it. The final report of the sub-committee is both interesting and enlightening for those not familiar with the arguments involved.

First a little background for people (like me) who are not familiar with NSF’s organizational structure. NSF has a number of Directorates, of which Mathematical and Physical Sciences (MPS) is one, and within MPS is the Division of Mathematical Sciences (DMS). DMS includes 11 program areas ranging from algebra and number theory to topology. Statistics is one of those program areas. 

This should already give one pause. How exactly do statistics and topology end up in the same basket? I’m not exactly sure but I’m guessing it’s the result of bureaucratic inertia. Statistics came later and it had to be stuck somewhere. DMS is not the only place at NSF to get funding for statistics, but a quick search through the currently active grants shows that the vast majority of statistics-related grants go through DMS, with a smattering coming through other Divisions.

The primary issue here, and the only reason it’s an issue at all, is money. Statistics is one of 11 program areas in DMS, which means that it roughly gets 9% of the funding allocated to DMS. This is worth noting—the entire field of statistics gets roughly as much funding as, say, topology. For example, one of the arguments against the name change in the sub-committee’s report is

3). Statistics constitutes a small (although significant) proportion of the DMS portfolio in terms of number of programs, number of grant applications, number of grants funded.

Well, yes, but I would argue that the reason for this is the historically low prioritization of statistics in the Division. This is a choice, not a fact. I believe statistics could play a much bigger role in the Division and perhaps within NSF more generally if there were an agreement on its importance. A key argument comes next, which is

If the name change attracts more proposals to the Division from the statistics community, this could draw funding away from other subfields and it could also increase the workload of the Division’s program officers.

Okay, so money’s important too, but let’s get to the main attraction, which comes in comment number 5:

5). Statistics is funded throughout the federal government. The traditional funding of statistics by DMS is appropriate: fund fundamental research in statistics. Broadening the mission of DMS to include more applied statistics would not benefit the overall funding of the mathematical sciences.

The first sentence is a fact: Many government agencies fund statistics research. For example, the National Institutes of Health funds many statisticians who develop and apply methods to problems in the health sciences. The EPA will occasionally fund statisticians to develop methods for environmental health applications.

But who is charged with funding the development and application of statistical methods to every other scientific field? The problem now is that you essentially have a group of NIH-funded (bio)statisticians doing biomedical research and a group of NSF-funded statisticians doing “fundamental” research in statistics (note that “fundamental” equals “mathematical” here). But that hardly represents all of the statisticians out there. So for the rest of the statisticians who are not doing biomedical research and are not doing “fundamental” research, where do they go for funding?

These days, statistics is “applied” to everything. NSF itself has acknowledged that we are in an era of big data—it’s clear that statistics will play a big role whether we call it “statistics” or not. If NSF decided to fund research into the application of statistics to all areas, it would likely overwhelm the funding of every other program area in DMS. This is why the “solution” is to resort to what is informally understood as the mission of NSF, which is to fund “fundamental” research.

But it’s not clear to me that NSF should limit itself in this manner. In particular, if NSF got serious about funding the application of statistics to all scientific areas (either through DMS or some other Division), it would incentivize statisticians to build stronger and closer collaborations with scientists all over. I see this as a win-win for everyone involved. 

As a statistician, I’m willing to admit I’m biased, but I think NSF should play a much bigger role in advancing statistics as one of the critical tools of the future. Perhaps the solution is not to rename the Division, but to create a separate division for statistical sciences independent of mathematics, one of the suggestions in the sub-committee report. This separation would mirror what has occurred in many universities over the past 50 years or so with the creation of independent departments of statistics and biostatistics.  

Ultimately, the name of the Division was not changed. Here’s the release from last week:

NSF is committed to supporting the research necessary to maximize the benefits to be derived from the age of data, and to promoting and funding research related to data-centric scientific discovery and innovation, and in particular, the growing role of the statistical sciences in all research areas. Recognizing both the complex composition of the various communities and the support of statistical sciences throughout NSF, and taking into account the various community views described in the very thoughtful report of the MPSAC, I have decided to maintain the name “Division of Mathematical Sciences (DMS)” within MPS, but to affirm strong commitment to the statistical sciences.

To demonstrate this commitment, (a) whenever appropriate, we will specifically mention “statistics” alongside “mathematics” in budget requests and in solicitations in order to recognize the unique and pervasive role of statistical sciences, and to ensure that relevant solicitations reach the statistical sciences community….

Well, I feel better already. I suppose this is progress of some sort.

17 Aug

Interview with C. Titus Brown - Computational biologist and open access champion

C. Titus Brown is an assistant professor in the Department of Computer Science and Engineering at Michigan State University. He develops computational software for next-generation sequencing and is the author of the blog “Living in an Ivory Basement”. We talked to Titus about open access (he publishes his unfunded grants online!), improving the reputation of PLoS One, his research in computational software development, and work-life balance in academia.

Do you consider yourself a statistician, data scientist, computer scientist, or something else?

Good question.  Short answer: apparently somewhere along the way I
became a biologist, but with a heavy dose of “computational scientist”
in there.

The longer answer?  Well, it’s a really long answer…

My first research was on Avida, a bottom-up model for evolution that
Chris Adami, Charles Ofria and I wrote together at Caltech in 1993:
http://en.wikipedia.org/wiki/Avida.  (Fun fact: Chris, Charles and I
are now all faculty at Michigan State!  Chris and I have offices one
door apart, and Charles has an office one floor down.)  Avida got me
very interested in biology, but not in the undergrad “memorize stuff”
biology — more in research.  This was computational science: using
simple models to study biological phenomena.

While continuing evolution research, I did my undergrad in pure math at Reed
College, which was pretty intense; I worked in the Software Development
lab there, which connected me to a bunch of reasonably well known hackers
including Keith Packard, Mark Galassi, and Nelson Minar.

I also took a year off and worked on Earthshine:

http://en.wikipedia.org/wiki/Planetshine#Earthshine

and then rebooted the project as an RA in 1997, the summer after
graduation.  This was mostly data analysis, although it included a
fair amount of hanging off of telescopes adjusting things as the
freezing winter wind howled through the Big Bear Solar Observatory’s
observing room, aka “data acquisition”.

After Reed, I applied to a bunch of grad schools, including Princeton
and Caltech in bio, UW in Math, and UT Austin and Ohio State in
physics.  I ended up at Caltech, where I switched over to
developmental biology and (eventually) regulatory genomics and genome
biology in Eric Davidson’s lab.  My work there included quite a bit
of wet bench biology, which is not something many people associate with me,
but was nonetheless something I did!

Genomics was really starting to hit the fan in the early 2000s, and I
was appalled by how biologists were handling the data — as one
example, we had about $500k worth of sequences sitting on a shared
Windows server, with no metadata or anything — just the filenames.
As another example, I watched a postdoc manually BLAST a few thousand
ESTs against the NCBI nr database; he sat there and did them three by
three, having figured out that he could concatenate three sequences
together and then manually deconvolve the results.  As probably the
most computationally experienced person in the lab, I quickly got
involved in data analysis and Web site stuff, and ended up writing
some comparative sequence analysis software that was mildly popular
for a while.
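A short script makes that three-at-a-time workflow unnecessary. As a sketch of the batching idea (the minimal FASTA parser is my own, and `run_blast` is a hypothetical stand-in for whatever search tool you would actually call, such as a local BLAST binary or a web service), processing a few thousand ESTs programmatically might look like:

```python
def parse_fasta(text):
    """Minimal FASTA parser: yields (header, sequence) pairs."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_blast(batch):
    # Hypothetical stand-in: in practice this would invoke a local
    # BLAST binary or a web service once per batch of records.
    return [(header, "hit") for header, _ in batch]

fasta = ">est1\nACGT\n>est2\nGGCC\n>est3\nTTAA\n>est4\nCAGT\n"
results = []
for batch in batches(parse_fasta(fasta), size=3):
    results.extend(run_blast(batch))  # no manual deconvolution needed
print(len(results), "records processed")
```

The point is not the thirty lines of code but that each record keeps its identity through the pipeline, so nobody has to concatenate sequences and untangle the results by hand.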

As part of the sequence analysis Web site I wrote, I became aware that
maintaining software was a *really* hard problem.  So, towards the end
of my 9 year stint in grad school, I spent a few years getting into
testing, both Web testing and more generally automated software
testing.  This led to perhaps my most used piece of software, twill, a
scripting language for Web testing.  It also ended up being one of the
things that got me elected into the Python Software Foundation,
because I was doing everything in Python (which is a really great
language, incidentally).

I also did some microbial genome analysis, which led to my first
completely reproducible paper (Brown and Callan, 2004;
http://www.ncbi.nlm.nih.gov/pubmed/14983022), and collaborated with the
Orphan lab on some metagenomics:
http://www.ncbi.nlm.nih.gov/pubmed?term=18467493.  This led to a
fascination with the biological “dark matter” in nature that is the
subject of some of my current work on metagenomics.

I landed my faculty position at MSU right out of grad school, because
bioinformatics is sexy and CS departments are OK with hiring grad
students as faculty.  However, I deferred for two years to do a
postdoc in Marianne Bronner-Fraser’s lab because I wanted to switch to
the chick as a model organism, and so I ended up arriving at MSU in
2009.  I had planned to focus a lot on developmental gene regulatory
networks, but 2009 was when Illumina sequencing hit, and as one of the
few people around who wasn’t visibly frightened by the term “gigabyte”
I got inextricably involved in a lot of different sequence analysis
projects.  These all converged on assembly, and, well, that seems to
be what I work on now :).

The two strongest threads that run through my research are these:

1. “better science through superior software” — so much of science
depends on computational inference these days, and so little of the
underlying software is “good”.  Scientists *really* suck at software
development (for both good and bad reasons) and I worry that a lot of
our current science is on a really shaky foundation.  This is one
reason I’m invested in Software Carpentry
(http://software-carpentry.org), a training program that Greg Wilson
has been developing — he and I agree that science is our best hope
for a positive future, and good software skills are going to be
essential for a lot of that science.  More generally I hope to turn
good software development into a competitive advantage for my lab
and my students.

2. “better hypothesis generation is needed” — biologists, in
particular, tend to leap towards the first testable hypothesis they
find.  This is a cultural thing stemming (I think) from a lot of
really bad interactions with theory: the way physicists and
mathematicians think about the world simply doesn’t fit with the Rube
Goldberg-esque features of biology (see
http://ivory.idyll.org/blog/is-discovery-science-really-bogus.html).

So getting back to the question, uh, yeah, I think I’m a computational
scientist who is working on biology?  And if I need to write a little
(or a lot) of software to solve my problems, I’ll do that, and I’ll
try to do it with some attention to good software development
practice — not just out of ethical concern for correctness, but
because it makes our research move faster.

One thing I’m definitely *not* is a statistician.  I have friends who
are statisticians, though, and they seem like perfectly nice people.

You have a pretty radical approach to open access, can you tell us a little bit about that?

Ever since Mark Galassi introduced me to open source, I thought it
made sense.  So I’ve been an open source-nik since … 1988?

From there it’s just a short step to thinking that open science makes
a lot of sense, too.  When you’re a grad student or a postdoc, you
don’t get to make those decisions, though; it took until I was a PI
for me to start thinking about how to do it.  I’m still conflicted
about *how* open to be, but I’ve come to the conclusion that posting
preprints is obvious
(http://ivory.idyll.org/blog/blog-practicing-open-science.html).

The “radical” aspect that you’re referring to is probably my posting
of grants (http://ivory.idyll.org/blog/grants-posted.html).  There are
two reasons I ended up posting all of my single-PI grants.  Both have
their genesis in this past summer, when I spent about 5 months writing
6 different grants — 4 of which were written entirely by me.  Ugh.

First, I was really miserable one day and joked on Twitter that “all
this grant writing is really cutting into my blogging” — a mocking
reference to the fact that grant writing (to get $$) is considered
academically worthwhile, while blogging (which communicates with the
public and is objectively quite valuable) counts for naught with my
employer.  Jonathan Eisen responded by suggesting that I post all of
the grants and I thought, what a great idea!

Second, I’m sure it’s escaped most people (hah!), but grant funding
rates are in the toilet — I spent all summer writing grants while
expecting most of them to be rejected.  That’s just flat-out
depressing!  So it behooves me to figure out how to make them serve
multiple duties.  One way to do that is to attract collaborators;
another is to serve as google bait for my lab; a third is to provide
my grad students with well-laid-out PhD projects.  A fourth duty they
serve (and I swear this was unintentional) is to point out to people
that this is MY turf and I’m already solving these problems, so maybe
they should go play in less occupied territory.  I know, very passive
aggressive…

So I posted the grants, and unknowingly joined a really awesome cadre
of folk who had already done the same
(http://jabberwocky.weecology.org/2012/08/10/a-list-of-publicly-available-grant-proposals-in-the-biological-sciences/).
Most feedback I’ve gotten has been from grad students and undergrads
who really appreciate the chance to look at grants; some people told
me that they’d been refused the chance to look at grants from their
own PIs!

At the end of the day, I’d be lucky to be relevant enough that people
want to steal my grants or my software (which, by the way, is under a
BSD license — free for the taking, no “theft” required…).  My
observation over the years is that most people will do just about
anything to avoid using other people’s software.

In theoretical statistics, there is a tradition of publishing pre-prints while papers are submitted. Why do you think biology is lagging behind?

I wish I knew!  There’s clearly a tradition of secrecy in biology;
just look at the Cold Spring Harbor rules re tweeting and blogging
(http://meetings.cshl.edu/report.html) - this is a conference, for
chrissakes, where you go to present and communicate!  I think it’s
self-destructive and leads to an insider culture where only those who
attend meetings and chat informally get to be members of the club,
which frankly slows down research. Given the societal and medical
challenges we face, this seems like a really bad way to continue doing
research.

One of the things I’m proudest of is our effort on the cephalopod
genome consortium’s white paper,
http://ivory.idyll.org/blog/cephseq-cephalopod-genomics.html, where a
group of bioinformaticians at the meeting pushed really hard to walk
the line between secrecy and openness.  I came away from that effort
thinking two things: first, that biologists were erring on the side of
risk aversity; and second, that genome database folk were smoking
crack when they pushed for complete openness of data.  (I have a blog
post on that last statement coming up at some point.)

The bottom line is that the incentives in academic biology are aligned
against openness.  In particular, you are often rewarded for the first
observation, not for the most useful one; if your data is used to do
cool stuff, you don’t get much if any credit; and it’s all about
first/last authorship and who is PI on the grants.  All too often this
means that people sit on their data endlessly.

This is getting particularly bad with next-gen data sets, because
anyone can generate them but most people have no idea how to analyze
their data, and so they just sit on it forever…

Do you think the ArXiv model will catch on in biology or just within the bioinformatics community?

One of my favorite quotes is: “Making predictions is hard, especially
when they’re about the future.” I attribute it to Niels Bohr.

It’ll take a bunch of big, important scientists to lead the way. We
need key members of each subcommunity of biology to decide to do it on
a regular basis. (At this point I will take the obligatory cheap shot
and point out that Jonathan Eisen, noted open access fan, doesn’t post
his stuff to preprint servers very often.  What’s up with that?)  It’s
going to be a long road.

What is the reaction you most commonly get when you tell people you have posted your un-funded grants online?

“Ohmigod what if someone steals them?”

Nobody has come up with a really convincing model for why posting
grants is a bad thing.  They’re just worried that it *might* be.  I
get the vague concerns about theft, but I have a hard time figuring
out exactly how it would work out well for the thief — reputation is
a big deal in science, and gossip would inevitably happen.  And at
least in bioinformatics I’m aiming to be well enough known that
straight up ripping me off would be suicidal.  Plus, if reviewers
do/did google searches on key concepts then my grants would pop up,
right?  I just don’t see it being a path to fame and glory for anyone.

Revisiting the passive-aggressive nature of my grant posting, I’d like
to point out that most of my grants depend on preliminary results from
our own algorithms.  So even if they want to compete on my turf, it’ll
be on a foundation I laid.  I’m fine with that — more citations for
me, either way :).

More optimistically, I really hope that people read my grants and then
find new (and better!) ways of solving the problems posed in them.  My
goal is to enable better science, not to hunker down in a tenured job
and engage in irrelevant science; if someone else can use my grants as
a positive or negative signpost to make progress, then broadly
speaking, my job is done.

Or, to look at it another way: I don’t have a good model for either
the possible risks OR the possible rewards of posting the grants, and
my inclinations are towards openness, so I thought I’d see what
happens.

How can junior researchers correct misunderstandings about open access/journals like PLoS One that separate correctness from impact? Do you have any concrete ideas for changing minds of senior folks who aren’t convinced?

Render them irrelevant by becoming senior researchers who supplant them
when they retire.  It’s the academic tradition, after all!  And it’s
really the only way within the current academic system, which — for
better or for worse — isn’t going anywhere.

Honestly, we need fewer people yammering on about open access and more
people simply doing awesome science and submitting it to OA journals.
Conveniently, many of the high impact journals are shooting themselves
in the foot and encouraging this by rejecting good science that then
ends up in an OA journal; that wonderful ecology oped on PLoS One
citation rates shows this well
(http://library.queensu.ca/ojs/index.php/IEE/article/view/4351).

Do you have any advice on what computing skills/courses statistics students interested in next generation sequencing should take?

For courses, no — in my opinion 80% of what any good researcher
learns is self-motivated and often self-taught, and so it’s almost
silly to pretend that any particular course or set of skills is
sufficient or even useful enough to warrant a whole course.  I’m not a
big fan of our current undergrad educational system :)

For skills?  You need critical thinking coupled with an awareness that
a lot of smart people have worked in science, and odds are that there
are useful tricks and approaches that you can use.  So talk to other
people, a lot!  My lab has a mix of biologists, computer scientists,
graph theorists, bioinformaticians, and physicists; more labs should
be like that.

Good programming skills are going to serve you well no matter what, of
course.  But I know plenty of good programmers who aren’t very
knowledgeable about biology, and who run into problems doing actual
science.  So it’s not a panacea.

How does replicable or reproducible research fit into your interests?

I’ve wasted *so much time* reproducing other people’s work that when
the opportunity came up to put down a marker, I took it.

http://ivory.idyll.org/blog/replication-i.html

The digital normalization paper shouldn’t have been particularly
radical; that it is tells you all you need to know about replication
in computational biology.

This is actually something I first did a long time ago, with what was
perhaps my favorite pre-faculty-job paper: if you look at the methods
for Brown & Callan (2004) you’ll find a downloadable package that
contains all of the source code for the paper itself and the analysis
scripts.  But back then I didn’t blog :).
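In the same spirit as that downloadable package (this is a generic sketch of mine, not the actual Brown & Callan materials), a replicable computational analysis can pin its random seed and publish a checksum of its output, so a reader who reruns the script can confirm they got byte-identical results:

```python
import hashlib
import random

def run_analysis(seed=20040101):
    """Toy analysis: a seeded simulation whose output is fully
    determined by the script text plus the recorded seed."""
    rng = random.Random(seed)  # local RNG; global state untouched
    draws = [rng.random() for _ in range(1000)]
    mean = sum(draws) / len(draws)
    return f"mean of 1000 draws: {mean:.6f}\n"

output = run_analysis()
# Publishing this digest alongside the code lets anyone verify
# that rerunning the script reproduces the result exactly.
digest = hashlib.sha256(output.encode()).hexdigest()
print(output, end="")
print("sha256:", digest[:16])
```

Shipping the analysis scripts with the paper, as Brown & Callan did, is what makes this kind of check possible at all; without the code, a reader can only eyeball the published numbers.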

Lack of reproducibility and openness in methods has serious
consequences — how much of cancer research has been useless, for
example?  See this horrific report:
http://online.wsj.com/article/SB10001424052970203764804577059841672541590.html
Again, the incentives are all wrong: you get grant money for
publishing, not for being useful.  The two are not necessarily the
same…

Do you have a family, and how do you balance work life and home life?

Why, thank you for asking!  I do have a family — my wife, Tracy Teal,
is a bioinformatician and microbial ecologist, and we have two
wonderful daughters, Amarie (4) and Jessie (1).  It’s not easy being a
junior professor and a parent at the same time, and I keep on trying
to figure out how to balance the needs of travel with the need to be a
parent (hint: I’m not good at it).  I’m increasingly leaning towards
blogging as being a good way to have an impact while being around
more; we’ll see how that goes.

14 Aug

Statistics/statisticians need better marketing

Statisticians have not always been great self-promoters. I think in part this comes from our tendency to be arbiters rather than participants in the scientific process. In some ways, I think this is a good thing. Self-promotion can quickly become really annoying. On the other hand, I think our advertising shortcomings are hurting our field in a number of different ways. 

Here are a few:

  1. As Rafa points out, even though statisticians are ridiculously employable right now, it seems like statistics M.S. and Ph.D. programs are flying under the radar in all the hype about data/data science (here is an awesome one if you are looking). Computer Science and Engineering, even the social sciences, are cornering the market on “big data”. This potentially huge and influential source of students may pass us by if we don’t advertise better. 
  2. A corollary to this is lack of funding. When the Big Data event happened at the White House with all the major funders in attendance to announce $200 million in new funding for big data, none of the invited panelists were statisticians. 
  3. Our top awards don’t get the press that their counterparts in other fields do. The Nobel Prize announcements are an international event, with constant speculation and intense interest in who will win. There is similar interest around the Fields Medal in mathematics. But the top award in statistics, the COPSS Award, doesn’t get nearly the attention it should. Part of the reason is lack of funding (the Fields comes with $15k, the COPSS with $1k). But part of the reason is that we, as statisticians, don’t announce it, share it, speculate about it, or tell our friends about it. The prestige of these awards can have a big impact on the visibility of a field. 
  4.  A major component of visibility of a scientific discipline, for better or worse, is the popular press. The most recent article in a long list of articles at the New York Times about the data revolution does not mention statistics/statisticians. Neither do the other articles. We need to cultivate relationships with the media. 

We are all busy solving real/hard scientific and statistical problems, so we don’t have a lot of time to devote to publicity. But here are a couple of easy ways we could rapidly increase the visibility of our field, ordered roughly by the degree of time commitment. 

  1. All statisticians should have Twitter accounts and we should share/discuss our work and ideas online. The more we help each other share, the more visibility our ideas will get. 
  2. We should make sure we let the ASA know about cool things that are happening with data/statistics in our organizations and they should spread the word through their Twitter account and other social media. 
  3. We should start a conversation about who we think will win the next COPSS award in advance of the next JSM and try to get local media outlets to pick up our ideas and talk about the award. 
  4. We should be more “big tent” about statistics. ASA President Robert Rodriguez nailed this in his speech at JSM. Whenever someone does something with data, we should claim them as a statistician. Sometimes this will lead to claiming people we don’t necessarily agree with. But the big tent approach is what is allowing CS and other disciplines to overtake us in the data era. 
  5. We should consider setting up a place for statisticians to donate money to build up the award fund for the COPSS/other statistics prizes. 
  6. We should try to forge relationships with start-up companies and encourage our students to pursue industry/start-up opportunities if they have interest. The less we are insular within the academic community, the more high-profile we will be. 
  7. It would be awesome if we started a statistical literacy outreach program in communities around the U.S. We could offer free courses in community centers to teach people how to understand polling data/the census/weather reports/anything touching data. 

Those are just a few of my ideas, but I have a ton more. I’m sure other people do too and I’d love to hear them. Let’s raise the tide and lift all of our boats!
