Tag: visualization

21
Dec

An R function to map your Twitter Followers

I wrote a little function to make a personalized map of who follows you or who you follow on Twitter. The idea for this function was inspired by some plots I discussed in a previous post. I also found a lot of really useful code over at FlowingData.

The function uses the packages twitteR, maps, geosphere, and RColorBrewer. If you don’t have the packages installed, when you source the twitterMap code, it will try to install them for you. The code also requires you to have a working internet connection. 
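If you would rather check for the packages yourself before sourcing the code, a minimal sketch along these lines should work. The helper name missingPackages is hypothetical and is not part of twitterMap.R; only the four package names come from the post.

```r
# Packages named in the post.
needed <- c("twitteR", "maps", "geosphere", "RColorBrewer")

# Hypothetical helper (not part of twitterMap.R): return the packages in
# `pkgs` that cannot be loaded from the current library paths.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

toInstall <- missingPackages(needed)
if (length(toInstall) > 0) {
  message("Not installed: ", paste(toInstall, collapse = ", "))
  # install.packages(toInstall)  # uncomment to install from CRAN
}
```

The install step is left commented out so that the check itself never needs an internet connection.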

One word of warning is that if you have a large number of followers or people you follow, you may be rate limited by Twitter and unable to make the plot.

To make your personalized twitter map, first source the function:

> source("http://biostat.jhsph.edu/~jleek/code/twitterMap.R")

The function has the following form: 

twitterMap <- function(userName, userLocation=NULL, fileName="twitterMap.pdf", nMax=1000, plotType=c("followers","both","following"))

with arguments:

  • userName - the twitter username you want to plot
  • userLocation - an optional argument giving the location of the user, necessary when the location information you have provided Twitter isn’t sufficient for us to find latitude/longitude data
  • fileName - the file where you want the plot to appear
  • nMax - the maximum number of followers/following to get from Twitter; this cap is there to avoid rate limiting for people with large numbers of followers
  • plotType - which connections to plot: "followers", "following", or "both"
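R functions that list several strings as a default, as plotType does in the signature above, commonly resolve the choice with match.arg(). Whether twitterMap does this internally is an assumption here. The helpers resolvePlotType and capIds below are hypothetical and only illustrate how such a default and a cap like nMax typically behave.

```r
# Hypothetical helper, not taken from twitterMap.R: resolve a vector-valued
# default such as plotType with match.arg().
resolvePlotType <- function(plotType = c("followers", "both", "following")) {
  match.arg(plotType)
}

resolvePlotType()        # no argument: match.arg() picks the first choice
resolvePlotType("both")  # explicit choice

# Hypothetical helper: keep at most nMax ids, the way a cap on the number
# of followers requested would.
capIds <- function(ids, nMax = 1000) {
  head(ids, nMax)
}

length(capIds(seq_len(5000)))  # at most 1000 ids are kept
```

Under that assumption, passing plotType = "both" explicitly is the unambiguous way to ask for both followers and following.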

Then you can create a plot with both followers/following like so: 

> twitterMap("simplystats")

Here is what the resulting plot looks like for our Twitter Account:

If your location can’t be found or its latitude/longitude can’t be calculated, you may have to choose a bigger city near you. The list of cities used by twitterMap can be found like so:

> library(maps)

> data(world.cities)

> grep("Baltimore", world.cities[,1])

If your city is in the database, this will return the row number of the world.cities data frame corresponding to your city. 
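Note that grep() matches substrings, so it can return several rows when other city names contain yours. The sketch below uses a small toy data frame with some of the column names found in world.cities (name, country.etc, lat, long). The rows and coordinates are illustrative rather than copied from the database, and lookupCity is a hypothetical helper.

```r
# Toy stand-in for maps::world.cities using a subset of its columns.
# The rows and numbers are illustrative, not copied from the database.
cities <- data.frame(
  name = c("Baltimore", "Baltimore Highlands", "Seattle"),
  country.etc = c("USA", "USA", "USA"),
  lat = c(39.3, 39.2, 47.6),
  long = c(-76.6, -76.6, -122.3),
  stringsAsFactors = FALSE
)

# grep() matches substrings, so both "Baltimore" rows are returned.
grep("Baltimore", cities$name)

# Hypothetical helper: exact name match, returning latitude and longitude.
lookupCity <- function(city, db) {
  db[db$name == city, c("name", "lat", "long")]
}

lookupCity("Baltimore", cities)
```

With the maps package installed, the same helper can be pointed at the real data by passing world.cities as the second argument after running data(world.cities).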

If you like this function you may also like our function to determine if you are a data scientist or to analyze your Google Scholar citations page.

Update: The bulk of the heavy lifting done by these functions is performed by Jeff Gentry’s very nice twitteR package and code put together by Nathan Yau over at FlowingData. This is really an example of standing on the shoulders of giants.

16
Dec

Interview with Nathan Yau of FlowingData

Nathan Yau

Nathan Yau is a graduate student in statistics at UCLA and the author of the extremely popular data visualization blog flowingdata.com. He recently published the book Visualize This, a really nice guide to modern data visualization using R, Illustrator and JavaScript, which should be on the bookshelf of any statistician working on data visualization.

Do you consider yourself a statistician/data scientist/or something else?

Statistician. I feel like statisticians can call themselves data scientists, but not the other way around. Although with data scientists there’s an implied knowledge of programming, which statisticians need to get better at.

Who have been good mentors to you and what qualities have been most helpful for you?

I’m visualization-focused, and I really got into the area during a summer internship at The New York Times. Before that, I mostly made graphs in R for reports. I learned a lot about telling stories with data and presenting data to a general audience, and that has stuck with me ever since.

Similarly, my adviser Mark Hansen has shown me how data is more free-flowing and intertwined with everything. It’s hard to describe. I mean coming into graduate school, I thought in terms of datasets and databases, but now I see it as something more organic. I think that helps me see what the data is about more clearly.

How did you get into statistics/data visualization?

In undergrad, an introduction to statistics (for engineering) actually pulled me in. The professor taught with so much energy, and the material sort of clicked with me. My friends who were also taking the course complained and had trouble with it, but I wanted more for some reason. I eventually switched from electrical engineering to statistics.

I got into visualization during my first year in grad school. My adviser gave a presentation on visualization, but from a media arts perspective rather than a charts-and-graphs-in-R-Tufte point of view. I went home after that class, googled visualization and that was that.

Why do you think there has been an explosion of interest in data visualization?

The Web is a really visual place, so it’s easy for good visualization to spread. It’s also easier for a general audience to read a graph than it is to understand statistical concepts. And from a more analytical point of view, there’s just a growing amount of data and visualization is a good way to poke around.

Other than R, what tools should students learn to improve their data visualizations?

For static graphics, I use Illustrator all the time to bring storytelling into the mix or to just provide some polish. For interactive graphics on the Web, it’s all about JavaScript nowadays. D3, Raphael.js, and Processing.js are all good libraries to get started.

Do you think the rise of infographics has led to a “watering down” of data visualization?

So I actually just wrote a post along these lines. It’s true that there are a lot of low-quality infographics, but I don’t think that takes away from visualization at all. It makes good work more obvious. I think the flood of infographics is a good indicator of people’s eagerness to read data.

How did you decide to write your book “Visualize This”?

Pretty simple. I get emails and comments all the time when I post graphics on FlowingData that ask how something was done. There aren’t many resources that show people how to do that. There are books that describe what makes good graphics but don’t say anything about how to actually go about doing it, and there are programming books for, say, R, but they are too technical for most and aren’t visualization-centric. I wanted to write a book that I wish I had in the early days.

Any final thoughts on statistics, data and visualization?

Keep an open mind. Oftentimes, statisticians seem to box themselves into positions of analysis and reports. Statistics is an applied field though, and now more than ever, there are opportunities to work anywhere there is data, which is practically everywhere.
26
Oct

Visualizing Yahoo Email

Here is a cool page where Yahoo shows you the email it is processing in real time. It includes a visualization of the most popular words in emails at a given time. A pretty neat tool and definitely good for procrastination, but I’m not sure what else it is good for…

18
Oct

Spectacular Plots Made Entirely in R

When doing data analysis, I often create a set of plots quickly just to explore the data and see what the general trends are. Later I go back and fiddle with the plots to make them look pretty for publication. But some people have taken this to the next level. Here are two plots made entirely in R:

The descriptions of how they were created are here and here.

Related: Check out Roger’s post on R colors and my post on APIs

15
Sep

Communicating uncertainty visually

From a cool review about communicating risk to people without statistical/probabilistic training.

Despite the burgeoning interest in infographics, there is limited experimental evidence on how different types of visualizations are processed and understood, although the effectiveness of some graphics clearly depends on the relative numeracy of an audience. 

09
Sep

Data visualization and art

Mark Hansen is easily one of my favorite statisticians today. He is a Professor of Statistics at UCLA and his collaborations with artists have brought data visualization to a whole new place, one that is both informative and moving. 

Here is a video of his project with Ben Rubin called Listening Post. The installation grabs conversations from unrestricted chat rooms and processes them in real-time to create interesting “themes” or “movements”. I believe this one is called “I am” and the video is taken from the Whitney Museum of American Art.

[youtube http://www.youtube.com/watch?v=dD36IajCz6A&w=420&h=345]

Here is some pretty cool time-lapse photography of the installation of Listening Post at the San Jose Museum of Art.

[youtube http://www.youtube.com/watch?v=cClHQU6Fqro]