Tag: humor


Batch effects are everywhere! Deflategate edition

In my opinion, batch effects are the biggest challenge faced by genomics research, especially in precision medicine. As we point out in this review, they are everywhere among high-throughput experiments. But batch effects are not specific to genomics technology. In fact, in this 1972 paper (paywalled), WJ Youden describes batch effects in the context of measurements made by physicists. Check out this plot of speed of light estimates, each with an estimate of spread (red and green are from the same lab).



Sometimes you find batch effects where you least expect them, for example in the Deflategate debate. Here is a quote from the New England Patriots' Deflategate rebuttal (written with help from Nobel Prize winner Roderick MacKinnon):

in other words, the Colts balls were measured after the Patriots balls and had warmed up more. For the above reasons, the Wells Report conclusion that physical law cannot explain the pressures is incorrect.

Here is another one:

In the pressure measurements physical conditions were not very well-defined and major uncertainties, such as which gauge was used in pre-game measurements, affect conclusions.

So NFL, please read our paper before you accuse a player of cheating.

Disclaimer: I live in New England but I am a Ravens fan.


Confession: I sometimes enjoy reading the fake journal/conference spam

I've spent a considerable amount of time setting up filters to avoid getting spam from fake journals and conferences. Unfortunately, they are exceptionally good at thwarting my defenses. This does not annoy me as much as I pretend because, secretly, I enjoy reading some of these emails. Here are three of my favorites.

1) Over-the-top robot:

It gives us immense pleasure to invite you and your research allies to submit a manuscript for the journal “REDACTED”. The expertise of you in the never ending field of Gene Technology is highly appreciable. The level of intricacy shown by you in your work makes us even more proud, and we believe that your works should be known to mankind of science.

2) Sarcastic robot?

First of all, congratulations on the publication of your highly cited original article < The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores > in the field of colon cancer, which has been cited more than 1 times and is in the world's top one percent of papers. Such high number of citations reflects the high quality and influence of your paper.

3) Intimidating robot:

This is Rocky.... Recently we have mailed you about the details of the conference. But we still have not received your response. So today we contact you again.

NB: Although I am joking in this post, I do think these fake journals and conferences are a very serious problem. The fact that they are still around means enough money (mostly taxpayer money) is being spent to keep them in business. If you want to learn more, this blog does a good job of reporting on them and includes a list of culprits.


paste0 is statistical computing's most influential contribution of the 21st century

The day I discovered paste0 I literally cried. No more paste(bla,bla, sep=""). While looking through code written by a student who did not know about paste0, I started pondering how many person hours it has saved humanity. Typing sep="" takes about 1 second. We R users use paste about 100 times a day and there are about 1,000,000 R users in the world. That's over 3 person years a day! Next up: read.table0 (who doesn't want as.is to be TRUE?).
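For the skeptics, here is a minimal sketch of both claims: that paste0 gives the same result as paste with sep="", and that the back-of-the-envelope arithmetic holds up. The variable names are mine, and the inputs are just the rough guesses from the paragraph above, not measured data.

```r
## paste0(...) is shorthand for paste(..., sep = "")
old_way <- paste("chr", 1:3, sep = "")
new_way <- paste0("chr", 1:3)
identical(old_way, new_way)

## Rough guesses taken from the text above (assumptions, not measurements)
seconds_per_sep  <- 1       # time spent typing sep=""
calls_per_day    <- 100     # paste calls per R user per day
r_users          <- 1e6     # R users in the world
seconds_per_year <- 365 * 24 * 60 * 60

## Person years saved per day
person_years_per_day <- seconds_per_sep * calls_per_day * r_users / seconds_per_year
person_years_per_day        # about 3.17
```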


R package meme

I just got this from a former student who is working on a project with me:





Sunday data/statistics link roundup (12/2/12)

  1. An interview with Anthony Goldbloom, CEO of Kaggle. I'm not sure I'd agree with the characterization that all data scientists are creative, curious, and competitive, and certainly those characteristics aren't unique to data scientists. And I didn't know this: "We have 65,000 data scientists signed up to Kaggle, and just like with golf tournaments, we have them all ranked from 1 to 65,000."
  2. Check it out, art with R! It's actually pretty interesting to see how they use statistical algorithms to generate different artistic styles. Here are some more. 
  3. Now that Ethan Perlstein's crowdfunding experiment was successful, other people are getting on the bandwagon. If you want to find out what kind of bacteria you have in your gut, for example, you could check out this
  4. I thought I had it rough, but apparently some data analysts spend all their time developing algorithms to detect penis drawings!
  5. Roger was on Anderson Cooper 360 as part of the Building America segment. We can't find the video, but here is the transcript. 
  6. An interesting article on the half-life of facts. I think the analogy is an interesting one and certainly there is research to be done there. But I think it jumps the shark a bit when they start talking about how the moon landing was predictable, etc. I completely believe in the retrospective analysis of knowledge, but predicting things is pretty hard, especially when it is the future.  

I give up, I am embracing pie charts

Most statisticians know that pie charts are a terrible way to plot percentages. You can find explanations here, here, and here, as well as in the R help file for the pie function, which states:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.
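If you want to see the help file's point for yourself, here is a minimal sketch that draws the same percentages both ways. The numbers in shares are made up for illustration and are deliberately close to one another, which is where judging by area tends to fail.

```r
## Made-up percentages, chosen to be hard to rank by eye in a pie chart
shares <- c(A = 22, B = 20, C = 19, D = 21, E = 18)

## Put the two displays side by side
op <- par(mfrow = c(1, 2))
pie(shares, main = "Pie chart")
barplot(shares, main = "Bar chart", ylab = "Percent")
par(op)                      # restore the previous plot layout
```

In the bar chart the ordering of the five groups should be easy to read off from the bar heights; in the pie chart it is much harder to tell which slice is largest.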


I have only used the pie R function once and it was to make this plot (R code below):

So why are they ubiquitous? The best explanation I've heard is that they are easy to make in Microsoft Excel. Regardless, after years of training, lay people are probably better at interpreting pie charts than any other graph. So I'm surrendering and embracing the pie chart. Jeff's recent post shows we have bigger fish to fry.

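## One frame per iteration (the plotting calls inside the loop are not shown here)
for(i in 0:(N-1)){
}
## Combine the saved frames into an animated GIF (needs ImageMagick's convert)
system("convert Rplot*.png pacman.gif")
##system("rm *.png") edited to save caffo's pngs (see comments)
system("rm Rplot*.png")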
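The plotting calls inside the loop are missing from the code above, so here is a rough sketch of how an animation like this could be produced with pie. It is not the original code: N, mouth_fraction, out_dir, and the colors are all assumptions made for illustration. It writes one PNG per frame into a temporary directory and only calls ImageMagick's convert if a program of that name is found on the PATH.

```r
## Sketch only: N, mouth_fraction(), out_dir and the colors are assumptions,
## not the original code.
N <- 20                                        # number of frames (assumed)

## Share of the circle taken by the mouth in frame i:
## 0 when closed (i = 0), 0.25 when fully open (i = N/2).
mouth_fraction <- function(i, N) 0.125 * (1 - cos(2 * pi * i / N))

out_dir <- file.path(tempdir(), "pacman")
dir.create(out_dir, showWarnings = FALSE)

if (capabilities("png")) {
  png(file.path(out_dir, "Rplot%03d.png"))     # one numbered PNG per frame
  for (i in 0:(N - 1)) {
    m <- max(mouth_fraction(i, N), 0.001)      # keep the mouth slice non-empty
    ## First slice is the mouth, centered on the horizontal axis
    pie(c(m, 1 - m), col = c("white", "yellow"),
        init.angle = -180 * m, border = NA, labels = NA)
  }
  dev.off()
}

## Stitch the frames together only if a convert program is on the PATH
if (nzchar(Sys.which("convert"))) {
  system(paste("convert", file.path(out_dir, "Rplot*.png"),
               file.path(out_dir, "pacman.gif")))
}
```

Writing the frames to a temporary directory means there is no need for the rm step, and nobody else's PNG files are at risk.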


The pebbles of academia

I have just been awarded a certificate for successful completion of the Conflict of Interest Commitment training (I barely passed). Lately, I have been totally swamped by administrative duties and have had little time for actual research. The experience reminded me of something I read in this NYTimes article by Tyler Cowen:

Michael Mandel, an economist with the Progressive Policy Institute, compares government regulation of innovation to the accumulation of pebbles in a stream. At some point too many pebbles block off the water flow, yet no single pebble is to blame for the slowdown. Right now the pebbles are limiting investment in future innovation.

Here are some of the pebbles of my academic career (past and present): financial conflict of interest training, human subjects training, HIPAA training, safety training, ethics training, submitting papers online, filling out copyright forms, faculty meetings, center grant quarterly meetings, 2 hour oral exams, 2 hour thesis committee meetings, big project conference calls, retreats, JSM, anything with “strategic” in the title, admissions committee, affirmative action committee, faculty senate meetings, brown bag lunches, orientations, effort reporting, conflict of interest reporting, progress reports (can’t I just point to pubmed?), dbgap progress reports, people who ramble at study section, rambling at study section, buying airplane tickets for invited talks, filling out travel expense sheets, and organizing and turning in travel receipts. I know that some of these are somewhat important or take minimal time, but read the quote again.

I also acknowledge that I actually have it real easy compared to others, so I am interested in hearing about other people’s pebbles.

Update: add changing my eRA commons password to list!


When dealing with poop, it's best to just get your hands dirty

I’m a relatively new dad. Before the kid we affectionately call the “tiny tornado” (TT) came into my life, I had relatively little experience dealing with babies and all the fluids they emit. So admittedly, I was a little squeamish dealing with the poopy explosions the TT would create. Inevitably, things would get much more messy than they had to be while I was being too delicate with the issue. It took me an embarrassingly long time for an educated man, but I finally realized you just have to get in there and change the thing even if it is messy, then wash your hands after. It comes off. 

It is a similar situation in my professional life, but I’m having a harder time learning the lesson. There are frequently things that I’m not really excited to do: review a lot of papers, go to long meetings, revise a draft of that paper that has just been sitting around forever. Inevitably, once I get going they aren’t as difficult or as arduous as I thought. Even better, once they are done I feel a huge sense of accomplishment and relief. I used to have a metaphor for this: I’d tell myself, “Jeff, just rip off the band-aid”. Now I think, “Jeff, just get your hands dirty”.


Sunday data/statistics link roundup (5/27)

  1. Amanda Cox on the process they went through to come up with this graphic about the Facebook IPO. So cool to see how R is used in the development process. A favorite quote of mine, “But rather than bringing clarity, it just sort of looked chaotic, even to the seasoned chart freaks of 620 8th Avenue.” One of the more interesting things about posts like this is you get to see how statistics versus a deadline works. This is typically the role of the analyst, since they come in late and there is usually a deadline looming…
  2. An interview with Steve Blank about Silicon Valley and how venture capitalists (VCs) are focused on social technologies since they can make a profit quickly. A depressing/fascinating quote from this one is, “If I have a choice of investing in a blockbuster cancer drug that will pay me nothing for ten years, at best, whereas social media will go big in two years, what do you think I’m going to pick? If you’re a VC firm, you’re tossing out your life science division.” He also goes on to say thank goodness for the NIH, NSF, and Google, who are funding interesting “real science” problems. This probably deserves its own post later in the week: the difference between analyzing data because it will make money and analyzing data to solve a hard science problem. The latter usually takes way more patience and the data take much longer to collect.
  3. An interesting post on how Obama’s analytics department ran an A/B test which improved the number of people who signed up for his mailing list. I don’t necessarily agree with their claim that they helped raise $60 million; there may be some confounding factors, meaning that the individuals who sign up with the best combination of image/button don’t necessarily donate as much. But still, an interesting look into why Obama needs statisticians.
  4. A cute statistics cartoon from @kristin_linn  via Chris V. Yes, we are now shamelessly reposting cute cartoons for retweets :-). 
  5. Rafa’s post inspired some interesting conversation both on our blog and on some statistics mailing lists. It seems to me that everyone is making an effort to understand the increasingly diverse field of statistics, but we still have a ways to go. I’m particularly interested in discussion on how we evaluate the contribution/effort behind making good and usable academic software. I think the strength of the Bioconductor community and the rise of Github among academics are a good start. For example, it is really useful that Bioconductor now tracks the number of package downloads.

Computational biologist blogger saves computer science department

People who read the news should be aware by now that we are in the midst of a big data era. The New York Times, for example, has been writing about this frequently. One of their most recent articles describes how UC Berkeley is getting $60 million for a new computer science center. Meanwhile, at the University of Florida the administration seems to be oblivious to all this and about a month ago announced it was dropping its computer science department to save $. Blogger Steven Salzberg, a computational biologist known for his work in genomics, wrote a post titled “University of Florida eliminates Computer Science Department. At least they still have football” ridiculing UF for its decision. Here are my favorite quotes:

 in the midst of a technology revolution, with a shortage of engineers and computer scientists, UF decides to cut computer science completely? 

Computer scientist Carl de Boor, a member of the National Academy of Sciences and winner of the 2003 National Medal of Science, asked the UF president “What were you thinking?”

Well, his post went viral and days later UF reversed its decision! So my point is this: statistics departments, be nice to bloggers that work in genomics… one of them might save your butt some day.

Disclaimer: Steven Salzberg has a joint appointment in my department and we have joint lab meetings.