The self-assessment trap21 Oct 2011
Several months ago I was sitting next to my colleague Ben Langmead at the Genome Informatics meeting. Various talks were presented on short read alignments and every single performance table showed the speaker’s method as #1 and Ben’s Bowtie as #2 among a crowded field of lesser methods. It was fun to make fun of Ben for getting beat every time, but the reality was that all I could conclude was that Bowtie was best and speakers were falling into the the self-assessment trap: each speaker had tweaked the assessment to make their method look best. This practice is pervasive in Statistics where easy-to-tweak Monte Carlo simulations are commonly used to assess performance. In a recent paper, a team at IBM described how the problem in the systems biology literature is pervasive as well. Co-author Gustavo Stolovitzky
Stolovitsky is a co-developer of the DREAM challenge in which the assessments are fixed and developers are asked to submit. About 7 years ago we developed affycomp, a comparison webtool for microarray preprocessing methods. I encourage others involved in fields where methods are constantly being compared to develop such tools. It’s a lot of work, but journals are usually friendly to papers describing the results of such competitions.