Anil Dash asked people what their favorite file format was. David Robinson replied:
CSV is similar to Markdown. No one global standard (though there are attempts) but a damn good attempt at "Whatever humans think it is at a glance, they're probably right"— David Robinson (@drob) February 8, 2018
His tweet reminded me a lot of this tweet from Stephen Turner
In defense of Fahrenheit pic.twitter.com/qwDcBm0XVr— Stephen Turner (@strnr) February 20, 2015
There is a spectrum for tools from the theortically optimal to the most human usable. It is almost always a tradeoff between purity/completeness and ease of human use.
For data science tools/statistical methods development there is a ton of work that needs to be done to govern edge cases. That work can either be done by the human users of the software/tool/data or it can be done on the back end by the developers of tools. Once you start looking for this tradeoff you see it everywhere - one of my favorite recent examples is flexdashboard versus Shiny. Shiny is llows for a lot of flexibility, deals directly with edge cases, and asks the user to pay attention to those cases. Flexdashboard “just works” in the sense that for most of the stuff I’m doing it seems to almost intuit what a person with minimal training would do.
I think both kinds of approaches have their place. Most of the time I just want to use the thing that reads my mind and takes minimal intellectual overhead. Same thing for when training people new to data science (or anything else). But eventually I start running into too many edge cases and want to move to something more sophisticated.
One thing I have a really hard time with is the transition from the “just works for humans” to the “theoretically optimal”. Almost always that transition is hard/rough. I wish there was a way (both selfishly and as an instructor) to smooth the transition from the intuitive, easy tool to the more flexible but necessarily more complicated tool. That always feels like a cliff to me.comments powered by Disqus