Visualization

Introducing the healthvis R package - one line D3 graphics with R

We have been a little slow on the posting for the last couple of months here at Simply Stats. That’s bad news for the blog, but good news for our research programs! Today I’m announcing the new healthvis R package that is being developed by my student Prasad Patil (who needs a website like yesterday), Hector Corrada Bravo, and myself*. The basic idea is that I have loved D3 interactive graphics for a while.

Sunday data/statistics link roundup (1/27/2013)

Wisconsin is decoupling the education and degree granting components of education. This means if you take a MOOC like mine, Brian’s or Roger’s and there is an equivalent class to pass at Wisconsin, you can take the exam and get credit. This is big. (via Rafa) 1. Wisconsin is decoupling the education and degree granting components of education. This means if you take a MOOC like mine, Brian’s or Roger’s and there is an equivalent class to pass at Wisconsin, you can take the exam and get credit.

Sunday Data/Statistics Link Roundup (9/9/12)

Not necessarily statistics related, but pretty appropriate now that the school year is starting. Here is a little introduction to “how to google” (via Andrew J.). Being able to “just google it” and find answers for oneself without having to resort to asking folks is maybe the #1 most useful skill as a statistician.  A really nice presentation on interactive graphics with the googleVis package. I think one of the most interesting things about the presentation is that it was built with markdown/knitr/slidy (see slide 53).

Sunday Data/Statistics Link Roundup (9/2/2012)

Just got back from IBC 2012 in Kobe Japan. I was in an awesome session (organized by the inimitable Lieven Clement) with great talks by Matt McCall, Djork-Arne Clevert, Adetayo Kasim, and Willem Talloen. Willem’s talk nicely tied in our work and how it plays into the pharmaceutical development process and the bigger theme of big data. On the way home through SFO I saw this hanging in the airport.

How do I know if my figure is too complicated?

One of the key things every statistician needs to learn is how to create informative figures and graphs. Sometimes, it is easy to use off-the-shelf plots like barplots, histograms, or if one is truly desperate a pie-chart. But sometimes the information you are trying to communicate requires the development of a new graphic. I am currently working on a project with a graduate student where the standard illustration are Venn Diagrams - including complicated Venn Diagrams with 5 or 10 circles.

Sunday data/statistics link roundup (4/22)

Now we know who is to blame for the pie chart. I had no idea it had been around, straining our ability to compare relative areas, since 1801. However, the same guy (William Playfair) apparently also invented the bar chart. So he wouldn’t be totally shunned by statisticians. (via Leonid K.) A nice article in the Guardian about the current group of scientists that are boycotting Elsevier. I have to agree with the quote that leads the article, “All professions are conspiracies against the laity.

Sunday data/statistics link roundup (4/15)

Incredibly cook, dynamic real-time maps of wind patterns in the United States. (Via Flowing Data) A d3.js coding tool that updates automatically as you update the code. This is going to be really useful for beginners trying to learn about D3. Real time coding (Via Flowing Data) An interesting blog post describing why the winning algorithm in the Netflix prize hasn’t actually been implemented! It looks like it was too much of an engineering hassle.

R and the little data scientist's predicament

I just read this fascinating post on _why, apparently a bit of a cult hero among enthusiasts of the Ruby programming language. One of the most interesting bits was The Little Coder’s Predicament, which boiled down essentially says that computer programming languages have grown too complex - so children/newbies can’t get the instant gratification when they start programming. He suggested a simplified “gateway language” that would get kids fired up about programming, because with a simple line of code or two they could make the computer do things like play some music or make a video.

Sunday data/statistics link roundup (3/4)

A cool article on Github by the folks at Wired. I’m starting to think the fact that I’m not on Github is a serious dent in my nerd cred.  Datawrapper - a less intensive, but less flexible open source data visualization creator. I have seen a few of these types of services starting to pop up. I think that some statistics training should be mandatory before people use them.

A wordcloud comparison of the 2011 and 2012 #SOTU

I wrote a quick (and very dirty) R script for creating a comparison cloud and a commonality cloud for President Obama’s 2011 and 2012 State of the Union speeches. The cloud on the left shows words that have different frequencies between the two speeches and the cloud on the right shows the words in common between the two speeches. Here is a higher resolution version. The focus on jobs hasn’t changed much.

An R function to map your Twitter Followers

I wrote a little function to make a personalized map of who follows you or who you follow on Twitter. The idea for this function was inspired by some plots I discussed in a previous post. I also found a lot of really useful code over at flowing data here. The function uses the packages twitteR, maps, geosphere, and RColorBrewer. If you don’t have the packages installed, when you source the twitterMap code, it will try to install them for you.

Interview with Nathan Yau of FlowingData

Nathan Yau Nathan Yau is a graduate student in statistics at UCLA and the author of the extremely popular data visualization blog flowingdata.com. He recently published a book Visualize This-a really nice guide to modern data visualization using R, Illustrator and Javascript - which should be on the bookshelf of any statistician working on data visualization. Do you consider yourself a statistician/data scientist/or something else?

Visualizing Yahoo Email

Here is a cool page where yahoo shows you the email it is processing in real time. It includes a visualization of the most popular words in emails at a given time. A pretty neat tool and definitely good for procrastination, but I’m not sure what else it is good for…

Spectacular Plots Made Entirely in R

When doing data analysis, I often create a set of plots quickly just to explore the data and see what the general trends are. Later I go back and fiddle with the plots to make them look pretty for publication. But some people have taken this to the next level. Here are two plots made entirely in R: The descriptions of how they were created are here and here. Related: Check out Roger’s post on R colors and my post on APIs

Communicating uncertainty visually

From a cool review about communicating risk to people without statistical/probabilistic training. Despite the burgeoning interest in infographics, there is limited experimental evidence on how different types of visualizations are processed and understood, although the effectiveness of some graphics clearly depends on the relative numeracy of an audience. 

Data visualization and art

Mark Hansen is easily one of my favorite statisticians today. He is a Professor of Statistics at UCLA and his collaborations with artists have brought data visualization to a whole new place, one that is both informative and moving. Here is a video of his project with Ben Rubin called Listening Post. The installation grabs conversations from unrestricted chat rooms and processes them in real-time to create interesting “themes” or “movements”.