An astute reader (Niels Hansen, who is visiting our department today) caught a bug in my code on GitHub for the Leekasso. I had:
lm1 = lm(y ~ leekX)
predict.lm(lm1, as.data.frame(leekX2))  # leekX2: the test-set predictor matrix (name assumed)
Unfortunately, because the formula referred to the matrix leekX directly, predict.lm could not find that variable in the new data and silently fell back to the training matrix, so I was getting training-set predictions where I thought I was getting test-set predictions. Since I set up the test and training sets the same way, this meant I was actually reporting training-set error rates for the Leekasso. Niels Hansen noticed the bug and reran the analysis with this code instead:
lm1 = lm(y ~ ., data = as.data.frame(leekX))
predict.lm(lm1, as.data.frame(leekX2))
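For anyone curious about why the first version was wrong, here is a minimal, self-contained sketch of the R gotcha with made-up data (X, X2, and y are just stand-ins, not the variables from my simulation):

set.seed(1)
X  <- matrix(rnorm(100 * 10), 100, 10)   # stand-in training predictors
X2 <- matrix(rnorm(100 * 10), 100, 10)   # stand-in test predictors
y  <- rnorm(100)

# Buggy pattern: the formula refers to the matrix X itself, so predict()
# can't find a variable called "X" in newdata and quietly falls back to
# the training matrix sitting in the formula's environment.
fit_bad  <- lm(y ~ X)
pred_bad <- predict(fit_bad, newdata = as.data.frame(X2))
isTRUE(all.equal(unname(pred_bad), unname(fitted(fit_bad))))   # TRUE: these are the training predictions

# Fixed pattern: fit on a data frame so each column is a named variable;
# then a new data frame with the same column names is actually used.
fit_ok  <- lm(y ~ ., data = as.data.frame(X))
pred_ok <- predict(fit_ok, newdata = as.data.frame(X2))
isTRUE(all.equal(unname(pred_ok), unname(fitted(fit_ok))))     # FALSE: these really come from X2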
He created a heatmap of the difference in average accuracy between the Leekasso and the Lasso and showed that the two are essentially equivalent.
This is a bummer; the Leekasso isn't a world-crushing algorithm. On the other hand, I'm happy that just choosing the top 10 features is still competitive with the optimized lasso on average. More importantly, although I hate being wrong, I appreciate people taking the time to look through my code.
Just out of curiosity, I'm taking a survey. Do you think I should publish this top-10 predictor thing as a paper? Or do you think it is too trivial?