Editor’s note: This is a repost of “R and the little data scientist’s predicament”. A brief idea for an update is presented at the end in italics.
I just read this fascinating post on _why, apparently a bit of a cult hero among enthusiasts of the Ruby programming language. One of the most interesting bits was The Little Coder’s Predicament, which, boiled down, essentially says that computer programming languages have grown too complex, so children and other newcomers can’t get instant gratification when they start programming. He suggested a simplified “gateway language” that would get kids fired up about programming, because with a line or two of code they could make the computer do things like play some music or make a video.
I feel like there is a similar ramp-up for data scientists. To be able to do anything cool or inspiring with data you need to know (a) a little statistics, (b) a little bit about a programming language, and (c) quite a bit about syntax.
Wouldn’t it be cool if there was an R package that solved the little data scientist’s predicament? The package would have to have at least some of these properties:
That alone would be a huge start. In just two lines, kids could load and visualize cool data in a pretty way that they could show their parents and friends.
Update: Now that Nick and co. have created swirl, the technology is absolutely in place to have people do something awesome quickly. You could imagine taking the airplane data and immediately having them make a plot of all the flights using ggplot, or taking any number of awesome government data sets and going straight to ggvis. Solving this problem is no longer a technical challenge; it is just a matter of someone coming up with an amazing swirl module that immediately sucks students in. This would be a really awesome project for a grad student or even an undergrad with an interest in teaching. If you do do it, you should absolutely send it our way and we’ll advertise the heck out of it!
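To give a rough sense of what that first "plot all the flights" step might look like, here is a minimal sketch. It is not taken from any existing swirl module. It assumes the ggplot2 and nycflights13 packages are installed, and it uses the `flights` table from nycflights13 as a stand-in for "the airplane data", which may not be the data set meant above.

```r
# Assumption: ggplot2 and nycflights13 are installed. nycflights13 is only a
# stand-in for "the airplane data"; the original data set is not specified.
library(ggplot2)
library(nycflights13)

# One point per flight: departure delay against arrival delay, in minutes.
# Flights with missing delays (e.g. cancellations) are dropped silently.
p <- ggplot(flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.1, na.rm = TRUE) +
  labs(x = "Departure delay (minutes)", y = "Arrival delay (minutes)")

print(p)
```

Once the packages are loaded, a single ggplot call already produces a picture of every flight, which is the kind of quick payoff a first lesson could be built around.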