Simply Statistics
A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek


The internet is the greatest source of publicly available data. One of the key skills for obtaining data from the web is “web scraping,” in which a piece of software runs through a website and collects information.
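The post doesn't say which tools to use. As a minimal sketch, assuming only Python's standard library (`urllib.request` to download pages and `html.parser` to read them), the loop below “runs through” a site breadth-first: it fetches a page, pulls out its links, and queues any unseen links on the same host. The names `LinkParser`, `fetch`, `crawl`, and `fetch_page` are hypothetical, and the `max_pages` cap is an assumption added to keep the crawl small.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page as text, assuming it is UTF-8 encoded."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=20):
    """Breadth-first walk over pages on the same host as start_url.

    Returns a dict mapping each visited URL to its raw HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" anchors.
            absolute = urljoin(url, href).split("#")[0]
            # Stay on the starting host and never queue a page twice.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

For example, `crawl("https://example.com/")` would return up to 20 pages from that host. Passing a different `fetch_page`, such as a dictionary lookup over saved HTML, lets you test the loop without touching the network.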

This technique can be used to collect data from online databases or data that is scattered across a website. Here is a very cool little exercise in web scraping that shows what is possible.
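Data pulled from a database-backed page often arrives as an HTML table. A minimal sketch, again assuming only Python's standard library, turns each `<tr>` row into a list of cell strings. `TableParser` and `parse_table` are hypothetical names, and the sketch ignores complications such as `colspan` or nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows in html as lists of stripped cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The first row usually holds the column headers, so `[dict(zip(rows[0], r)) for r in rows[1:]]` converts the result into one record per row.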

Related Posts: Jeff on APIs, Data Sources, Regex, and The Open Data Movement.