Tag: Rant


Reverse scooping

I would like to define a new term: reverse scooping is when someone publishes your idea after you and doesn’t cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most of the time I don’t think it is malicious, though. In fact, I almost reverse scooped a colleague recently. People arrive at the same idea a few months (or years) later, and there is just too much literature to keep track of. And remember, the culprit authors were not the only ones who missed your paper; the referees and the associate editor missed it as well. One thing I have learned is that if you want to claim an idea, try to include it in the title or abstract, as very few papers get read cover to cover.


Submitting scientific papers is too time consuming

As an academic who does a lot of research for a living, I spend a lot of my time writing and submitting papers. Before my time, this process involved sending multiple physical copies of a paper by snail mail to the editorial office. New technology has changed this process. Now, to submit a paper you generally have to: (1) find a Microsoft Word or LaTeX template for the journal and wrestle your paper into it, and (2) upload the manuscript and figures (usually separately). This is a big improvement over snail-mail submission! But it still takes a huge amount of time. Some simple changes would give academics back huge blocks of time to focus on teaching and research.

Just to give an idea of how complicated the current system is, here is an outline of what it takes to submit a paper.

To complete step (1), you go to the webpage of the journal you are submitting to, find their template files, and wrestle your content into the template. Sometimes this requires hunting down additional files that are not on the journal’s website. It always requires a large amount of tweaking of the text and content to fit the template.

To complete step (2), you have to go to the webpage of the journal and create an account with their content management system. There are frequently different requirements for usernames and passwords, leading to a proliferation of both. Then you have to upload the files and fill out between five and seven web forms with information about the authors, the paper, the funding, human subjects research, etc. If the files aren’t in the right format, you may have to reformat them before they will be accepted. Some journals even have editorial assistants who will go over your submission and find problems that have to be resolved before your paper can even be reviewed.

This whole process can take anywhere from one to ten hours, depending on the journal. If you have to revise your paper for that journal, you go through the process again. If your paper is rejected, you start all over with a new template and a new content management system at a new journal.

A much simpler system would let people submit their papers in PDF or Word format with all the figures embedded. If the paper is accepted by a journal, then of course you might need to reformat the submission to make life easier for the typesetters. But that would happen just once, after the paper is accepted.

This seems like a small thing. But suppose you submit a paper between 10 and 15 times a year (very common for academics in my field), and suppose each submission takes an average of 3 hours. That is 3 × 10 = 30 hours a year at the low end (45 at the high end), almost an entire workweek, just dealing with reformatting papers!
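This back-of-the-envelope arithmetic can be written out as a quick sketch (the submission counts and hours per submission are the rough figures quoted above, not measured data):

```python
# Rough yearly cost of submission logistics, using the figures in the text:
# 10-15 submissions per year, ~3 hours per submission, 40-hour workweeks.
low_subs, high_subs = 10, 15     # submissions per year (rough range)
hours_per_submission = 3         # average hours per submission (rough)

low_hours = low_subs * hours_per_submission    # 30 hours/year
high_hours = high_subs * hours_per_submission  # 45 hours/year

print(f"{low_hours}-{high_hours} hours per year, roughly "
      f"{low_hours / 40:.2f}-{high_hours / 40:.2f} standard 40-hour workweeks")
```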

In the comments, I’d love to hear about the best/worst experiences you have had submitting papers. Where is good? Where is bad? 


25 minute seminars

Most statistics and biostatistics departments have weekly seminars. We usually invite outside speakers to share their knowledge via a 50-minute PowerPoint (or Beamer) presentation. This gives us the opportunity to meet colleagues from other universities and pick their brains in small group meetings. This is all great. But giving a good one-hour seminar is hard. Really hard. Few people can pull it off. I propose to the statistical community that we cut seminars to 25 minutes, with 35 minutes for questions and further discussion. We can make exceptions, of course. But in general, I think we would all benefit from shorter seminars.



In this TED talk, Jason Fried explains why work doesn’t happen at work. He describes the evils of meetings. Meetings are particularly disruptive for applied statisticians, especially those of us who hack data files, explore data for systematic errors, get inspiration from visual inspection, and thoroughly test our code. Why? Before I become productive, I go through a ramp-up/boot-up stage. Scripts need to be found, data loaded into memory, and, most importantly, my brain needs to re-familiarize itself with the data and the essence of the problem at hand. I need a similar ramp-up for writing as well. It usually takes me between 15 and 60 minutes before I am in full-productivity mode. But once I am in “the zone”, I become very focused and can stay in this mode for hours. There is nothing worse than interrupting this state of mind to go to a meeting. I lose much more than the hour I spend at the meeting. Put simply: ten separate hours of work amount to basically nothing, while ten uninterrupted hours in the zone are when I get stuff done.

Of course, not all meetings are a waste of time. Academic leaders and administrators need to consult and get advice before making important decisions. I find lab meetings very stimulating and, generally, productive: we unstick the stuck and realign the derailed. But before you set up a standing meeting, consider this calculation: a weekly one-hour meeting with 20 people translates into 1 hour × 20 people × 52 weeks/year = 1,040 person-hours of potentially lost production per year. Assuming 40-hour weeks, that translates into six months of one person’s work. How many grants, papers, and lectures could we produce in six months? And this does not take into account the non-linear effect described above. Jason Fried suggests you cancel your next meeting, notice that nothing bad happens, and enjoy the extra hour of work.
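The meeting calculation can be checked in a few lines (the 20-person weekly one-hour meeting, the 52-week year, and the 40-hour workweek are the assumptions from the text):

```python
# Person-hours consumed per year by a weekly one-hour meeting of 20 people,
# using the assumptions in the text (52 weeks/year, 40-hour workweeks).
attendees = 20
hours_per_meeting = 1
weeks_per_year = 52

person_hours = hours_per_meeting * attendees * weeks_per_year  # 1040
workweeks = person_hours / 40        # 26 standard workweeks
months = workweeks * 12 / 52         # ~6 months of one person's time

print(f"{person_hours} person-hours = {workweeks:.0f} workweeks "
      f"= about {months:.0f} months")
```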

I know many others who are like me in this regard, and for you I have these recommendations: (1) avoid unnecessary meetings, especially if you are already in full-productivity mode, and don’t be afraid to use this as an excuse to cancel (if you are in a soft-money institution, remember who pays your salary); (2) try to bunch all the necessary meetings together into one day; (3) set aside at least one day a week to stay home and work for 10 hours straight. Jason Fried also recommends that every workplace declare a day on which no one talks. No meetings, no chit-chat, no friendly banter. No-talk Thursdays, anyone?


Where are the Case Studies?

Many case studies I find interesting don’t appear in JASA Applications and Case Studies, or in other applied statistics journals for that matter. Some because the technical skill needed to satisfy reviewers is not sufficiently impressive, others because they lack mathematical rigor. But perhaps the main reason for this disconnect is that many interesting case studies are developed by people outside our field or outside academia.

In this blog we will try to introduce readers to some of these case studies. I’ll start by pointing readers to Nate Silver’s FiveThirtyEight blog. Mr. Silver (yes, Mr., not Prof. nor Dr.) is one of my favorite statisticians. He first became famous for PECOTA, a system that uses data and statistics to predict the performance of baseball players. On FiveThirtyEight he uses a rather sophisticated meta-analysis approach to predict election outcomes.

For example, for the 2008 election he used data from the primaries to calibrate pollsters and then properly weighted these pollsters’ predictions to give a more precise estimate of the election results. He predicted Obama would win 349 to 189 in the Electoral College, with a 6.1% margin in the popular vote. The actual result was 365 to 173, with a margin of 7.2%. His website included graphs that very clearly illustrated the uncertainty of his predictions. These were updated daily, and I had a ton of fun visiting his blog at least once a day. I also learned quite a bit, used his data in class, and gained insights that I have used in my own projects.