A little while ago I wrote a post on statistics projects ideas for students. In honor of the first Simply Statistics Coursera offering, Computing for Data Analysis, here is a new list of student projects for folks excited about trying out those new R programming skills. Again we have rated each project with my best guess difficulty and effort required. Happy computing!
Data Analysis
- Use city data to predict areas with the highest risk for parking tickets. Here is the data for Baltimore. (Difficulty: Moderate, Effort: Low/Moderate)
- If you have a Fitbit with a premium account, download the data into a spreadsheet (or get Chris’s data) Then build various predictors using the data: (a) are you running or walking, (b) are you having a good day or not, (c) did you eat well that day or not, (d) etc. For special bonus points create a blog with your new discoveries and share your data with the world. (Difficulty: Depends on what you are trying to predict, Effort: Moderate with Fitbit/Jawbone/etc.)
Data Collection/Synthesis
- Make a list of skills associated with each component of the Data Scientist Venn Diagram. Then update the data scientist R function described in this post to ask a set of questions, then plot people on the diagram. Hint, check out the readline() function. (_Difficulty: Moderately low, Effort:__Moderate)_
- HealthData.gov has a ton of data from various sources about public health, medicines, etc. Some of this data is super useful for projects/analysis and some of it is just data dumps. Create an R package that downloads data from healthdata.gov and gives some measures of how useful/interesting it is for projects (e.g. number of samples in the study, number of variables measured, is it summary data or raw data, etc.) (Difficulty: Moderately hard, Effort: High)
- Build an up-to-date aggregator of R tutorials/how-to videos, summarize/rate each one so that people know which ones to look at for learning which tasks. (Difficulty: Low, Effort: Medium)
Tool building
- Build software that creates a 2-d author list and averages people’s 2-d author lists. (Difficulty: Medium, Effort: Low)
- Create an R package that interacts with and downloads data from government websites and processes it in a way that is easy to analyze. (Difficulty: Medium, Effort: High)
_
_