Humor

Batch effects are everywhere! Deflategate edition

In my opinion, batch effects are the biggest challenge faced by genomics research, especially in precision medicine. As we point out in this review, they are everywhere among high-throughput experiments. But batch effects are not specific to genomics technology. In fact, in this 1972 paper (paywalled), WJ Youden describes batch effects in the context of measurements made by physicists. Check out this plot of speed of light estimates with confidence intervals (red and green are from the same lab).

Confession: I sometimes enjoy reading the fake journal/conference spam

I've spent a considerable amount of time setting up filters to avoid getting spam from fake journals and conferences. Unfortunately, they are exceptionally good at thwarting my defenses. This does not annoy me as much as I pretend because, secretly, I enjoy reading some of these emails. Here are three of my favorites. 1) Over-the-top robot: It gives us immense pleasure to invite you and your research allies to submit a manuscript for the journal “REDACTED”.

paste0 is statistical computing's most influential contribution of the 21st century

The day I discovered paste0 I literally cried. No more paste(bla,bla, sep=""). While looking through code written by a student who did not know about paste0, I started pondering how many person hours it has saved humanity. Typing sep="" takes about 1 second. We R users use paste about 100 times a day and there are about 1,000,000 R users in the world. That’s over 3 person years a day!
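The arithmetic behind that estimate is easy to check. Here is a minimal sketch in Python that uses only the three figures quoted above; the 365-day year and the function name are my own assumptions, not part of the original post.

```python
# Back-of-the-envelope check of the paste0 time savings quoted above.
SECONDS_PER_CALL = 1           # time to type sep="" (figure from the post)
CALLS_PER_USER_PER_DAY = 100   # paste calls per R user per day (figure from the post)
R_USERS = 1_000_000            # R users worldwide (figure from the post)

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def person_years_saved_per_day(seconds_per_call, calls_per_day, users):
    """Convert the total seconds saved in one day into person-years."""
    total_seconds = seconds_per_call * calls_per_day * users
    return total_seconds / SECONDS_PER_YEAR


saved = person_years_saved_per_day(
    SECONDS_PER_CALL, CALLS_PER_USER_PER_DAY, R_USERS
)
print(f"{saved:.2f} person-years saved per day")  # prints 3.17
```

So 100,000,000 seconds a day works out to a bit more than 3 person years, which matches the claim.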

Sunday data/statistics link roundup (12/2/12)

An interview with Anthony Goldbloom, CEO of Kaggle. I’m not sure I’d agree with the characterization that all data scientists are creative, curious, and competitive, and those characteristics certainly aren’t unique to data scientists. And I didn’t know this: “We have 65,000 data scientists signed up to Kaggle, and just like with golf tournaments, we have them all ranked from 1 to 65,000.” Check it out, art with R!

I give up, I am embracing pie charts

Most statisticians know that pie charts are a terrible way to plot percentages. You can find explanations here, here, and here as well as the R help file for the pie function which states: Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

The pebbles of academia

I have just been awarded a certificate for successful completion of the Conflict of Interest Commitment training (I barely passed). Lately, I have been totally swamped by administrative duties and have had little time for actual research. The experience reminded me of something I read in this NYTimes article by Tyler Cowen: Michael Mandel, an economist with the Progressive Policy Institute, compares government regulation of innovation to the accumulation of pebbles in a stream.

When dealing with poop, it's best to just get your hands dirty

I’m a relatively new dad. Before the kid we affectionately call the “tiny tornado” (TT) came into my life, I had relatively little experience dealing with babies and all the fluids they emit. So admittedly, I was a little squeamish dealing with the poopy explosions the TT would create. Inevitably, things would get much more messy than they had to be while I was being too delicate with the issue. It took me an embarrassingly long time for an educated man, but I finally realized you just have to get in there and change the thing even if it is messy, then wash your hands after.

Sunday data/statistics link roundup (5/27)

Amanda Cox on the process they went through to come up with this graphic about the Facebook IPO. So cool to see how R is used in the development process. A favorite quote of mine, “But rather than bringing clarity, it just sort of looked chaotic, even to the seasoned chart freaks of 620 8th Avenue.” One of the more interesting things about posts like this is you get to see how statistics versus a deadline works.

Computational biologist blogger saves computer science department

People who read the news should be aware by now that we are in the midst of a big data era. The New York Times, for example, has been writing about this frequently. One of their most recent articles describes how UC Berkeley is getting $60 million for a new computer science center. Meanwhile, at the University of Florida, the administration seems to be oblivious to all this and about a month ago announced it was dropping its computer science department to save money.

Sunday data/statistics link roundup (1/29)

A really nice D3 tutorial. I’m 100% on board with D3; if they could figure out a way to export the graphics as PDFs, I think this would be the best visualization tool out there. A personalized calculator that tells you what number (of the 7 billion or so) you are based on your birthday. I’m person 4,590,743,884. Makes me feel so special…. An old post of ours, on dongle communism.

Fundamentals of Engineering Review Question Oops

The Fundamentals of Engineering Exam is the first licensing exam for engineers. You have to pass it on your way to becoming a professional engineer (PE). I was recently shown a problem from a review manual:  When it is operating properly, a chemical plant has a daily production rate that is normally distributed with a mean of 880 tons/day and a standard deviation of 21 tons/day. During an analysis period, the output is measured with random sampling on 50 consecutive days, and the mean output is found to be 871 tons/day.
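The excerpt stops before the actual question, so the sketch below does not try to answer it. It only works out, from the numbers given, the standard error of a 50-day mean and how many standard errors 871 lies from 880. It assumes the 50 daily outputs are independent draws from the stated normal distribution, which measurements on consecutive days may not satisfy. The variable names are mine.

```python
import math

# Figures taken from the problem statement above.
PROCESS_MEAN = 880.0   # tons/day when operating properly
PROCESS_SD = 21.0      # tons/day
N_DAYS = 50            # days sampled
SAMPLE_MEAN = 871.0    # observed mean output, tons/day

# Assumption: the 50 daily outputs are independent, so the standard
# error of their mean is sd / sqrt(n).
standard_error = PROCESS_SD / math.sqrt(N_DAYS)

# Distance of the observed mean from the nominal mean, in standard errors.
z = (SAMPLE_MEAN - PROCESS_MEAN) / standard_error

print(f"standard error = {standard_error:.2f} tons/day")  # 2.97
print(f"z = {z:.2f}")                                     # -3.03
```

Under that independence assumption, the observed mean sits about three standard errors below the nominal rate.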

Dear editors/associate editors/referees, Please reject my papers quickly

The review times for most journals in our field are ridiculous. Check out Figure 1 here. A careful review takes time, but not six months. Let’s be honest, those papers are sitting on desks for the great majority of those six months. But here is what really kills me: waiting six months for a review basically saying the paper is not of sufficient interest to the readership of the journal. That decision you can come to in half a day.

Getting email responses from busy people

I’ve had the good fortune of working with some really smart and successful people during my career. As a young person, one problem with working with really successful people is that they get a ton of email. Some only see the subject lines on their phone before deleting them. I’ve picked up a few tricks for getting email responses from important/successful people. The SI Rules: Try to send no more than one email a day.

Dongle communism

If you have a Mac and give talks or teach, chances are you have embarrassed yourself by forgetting your dongle. Our lab meetings and classes were constantly delayed due to missing dongles. Communism solved this problem. We bought 10 dongles, sprinkled them around the department, and declared all dongles public property. All dongles, not just the 10. No longer do we have to ask to borrow dongles because they have no owner.

StatistiX

I think our field would attract more students if we changed the name to something ending with X or K. I’ve joked about this for years, but someone has actually done it (kind of): http://www.bitlifesciences.com/AnalytiX2012/