

Sunday Data/Statistics Link Roundup (9/23/12)

  1. Harvard Business Review is getting in on the fun, calling data scientist the sexiest job of the 21st century. I am a little worried, though, that by the time an idea makes it into a Harvard Business document, the hype may be outstripping the real promise of the discipline. Still, good news for statisticians! (via Rafa via Francesca D.’s Facebook feed)
  2. The counterpoint is this article, which suggests that data scientists might be replaced by tools and software. I think this is also a bit too much hype for my tastes. Certain things will definitely be automated, and we may even end up with a deterministic statistical machine or two. But there will always be new problems to solve that require the expertise of people with data analysis skills and good intuition. (link via Samara K.)
  3. A bunch of websites are popping up where you can sign up and pay people to take your online courses for you. I’m not going to give them the benefit of a link, but they aren’t hard to find these days. The thing I don’t understand is this: if it is a free online course, why have someone else take it for you? It’s free, it’s in your spare time, and the bar for passing is pretty low. (links via Sherri R., redacted)
  4. This may be mostly useful to me, but for other people with Tumblr blogs, here is a way to insert LaTeX.
  5. Brian Caffo shares his impressions of the SAMSI massive data workshop. He raises an important issue that definitely deserves more discussion: should we be focusing on specific problems or general ones? Worth a read.
  6. For people into self-tracking, Chris V. points to an app created by Indiana University that lets people track their sexual activity. The most interesting thing about the app is how it highlights a key and, I suspect, often overlooked issue with analyzing self-tracking data: despite their size, these data sets are still biased samples. Only a brave few will tell Indiana University all about their sex life.
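The point in item 6, that a big self-selected sample is still a biased sample, is easy to see in a small simulation. This is a hypothetical sketch, not data from the app: it assumes each person has a trait value drawn uniformly from [0, 1] and that someone with value x chooses to report with probability x. Under those assumptions the population mean is 1/2, but the mean among reporters is E[x²]/E[x] = (1/3)/(1/2) = 2/3, and collecting more reports does not close the gap.

```python
import random
from statistics import mean


def simulate(population_size: int, seed: int = 0) -> tuple[float, float, int]:
    """Return (population mean, self-reported mean, number of reporters).

    Hypothetical model: each person's trait value is uniform on [0, 1],
    and a person with value x reports with probability x.
    """
    rng = random.Random(seed)
    # Trait value for every member of the (hypothetical) population.
    population = [rng.random() for _ in range(population_size)]
    # Self-selection: people with higher trait values are more likely to report.
    reported = [x for x in population if rng.random() < x]
    return mean(population), mean(reported), len(reported)


for n in (1_000, 10_000, 100_000):
    pop_mean, sample_mean, n_reported = simulate(n)
    print(
        f"N={n:>7,}  reporters={n_reported:>6,}  "
        f"population mean={pop_mean:.3f}  self-reported mean={sample_mean:.3f}"
    )
```

As N grows, the self-reported mean should settle near 0.667 while the population mean stays near 0.5: more data only makes the wrong answer more precise.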

A disappointing response from @NatureMagazine about folks with statistical skills

Last week I linked to an ad for a Data Editor position at Nature Magazine. I was super excited that Nature was recognizing data as an important growth area. But the ad doesn’t mention anything about statistical analysis skills; it focuses exclusively on data management expertise. As I pointed out in the earlier post, managing data is only half the equation; figuring out what to do with the data is the other half. That second half requires knowledge of statistics.

The folks over at Nature responded to our post on Twitter:

 it’s unrealistic to think this editor (or anyone) could do what you suggest. Curation & accessibility are key. ^ng

I disagree with this statement for the following reasons:

1. Is it really unrealistic to think someone could have both data management and statistical expertise? Pick your favorite data scientist and you will find someone with those skills. Most students coming out of computer science, computational biology, bioinformatics, or statistical genomics programs have a blend of the two in some proportion.

But maybe the problem is this:

Applicants must have a PhD in the biological sciences

It is possible that there are few PhDs in the biological sciences who know both statistics and data management (although that is probably changing). But most computational biologists have a pretty good knowledge of biology and a very good knowledge of data, both managing and analyzing it. If you are hiring a data editor, they might be the right target audience. I’d replace “PhD in the biological sciences” in the ad with “knowledge of biology, statistics, data analysis, and data visualization.” There would be plenty of folks with those qualifications.

2. The response mentions curation, which is a critical issue. But good curation requires knowledge of two things: (i) the biological or scientific problem and (ii) how the data will be analyzed and used by researchers. As the Duke scandal made clear, a statistician with technical and biological knowledge who works through a data analysis will identify many critical curation issues that would be missed by someone who doesn’t actually analyze data.

3. The response says that “Curation and accessibility” are key. I agree that they are part of the key. It is critical that researchers can properly access data to perform new analyses, verify results in papers, and discover new results. But if the goal is to ensure the quality of the science published in Nature (the role of an editor), curation and accessibility are not enough. The editor should be able to evaluate the statistical methods described in papers to identify potential flaws, and to rerun code to make sure it reproduces the reported analyses and that those analyses are sensible. A bad analysis that is reproducible will be discovered more quickly, but it is still a bad analysis.

To be fair, I don’t think Nature is the only organization missing the value of statistical skill when hiring for data positions. Many organizations still seem to be searching only for folks who can handle and process the massive data sets being generated. But if they want to make accurate and informed decisions, statistical knowledge needs to be at the top of their list of qualifications.