## The landscape of data analysis

I have been getting some questions via email, LinkedIn, and Twitter about the content of the Data Analysis class I will be teaching for Coursera. Data Analysis and Data Science mean different things to different people, so I made a video describing how Data Analysis fits into the landscape of other quantitative classes.

Here is the corresponding presentation. I also made a tentative list of topics we will cover, subject to change at the instructor's whim. Here it is:

• The structure of a data analysis (steps in the process, knowing when to quit, etc.)
• Types of data (census, designed studies, randomized trials)
• Types of data analysis questions (exploratory, inferential, predictive, etc.)
• How to write up a data analysis (compositional style, reproducibility, etc.)
• Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
• Exploratory statistical models (clustering)
• Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
• Basic model checking (primarily visually)
• The prediction process
• Study design for prediction
• Cross-validation
• A couple of simple prediction models
• Basics of simulation for evaluating models
• Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)

Of course, that is a ton of material for 8 weeks, so we will be covering just the very basics. I think it is really important to remember that being a good Data Analyst is like being a good surgeon or writer. There is no such thing as a prodigy in surgery or writing, because both require long experience, trying lots of things out, and learning from mistakes. I hope to give people the basic information they need to get started and point to resources where they can learn more. I also hope to give them a chance to practice some of the basics a couple of times and to learn that in data analysis the first goal is to "do no harm".

• Sophie

Hey Jeff: In addition to teaching the statistical underpinnings of data analysis, will you be covering the management of very large data sets at all? For example, using something manageable like the Iris data set would allow an understanding of the statistical techniques, but many data sets are way larger than the average person's machine can handle.

• jtleek

Sophie, I will probably briefly touch on some aspects of big data (feature selection, multiple testing, etc.) but the focus of the class isn't on data management for large data sets. I will point to some resources and platforms that can be used to handle these data in specific cases, but won't go over them in detail.

• Bill

How about a link to the actual presentation, rather than to some commercial site which wants personal information?

• jtleek

The presentation is built on that commercial site. Feel free to give it incorrect personal information.

• Bill

Okay, but that won't work as well with a tablet, as they want you to install their own! proprietary! app!
Anyway, thanks for posting it.

• jtleek

I didn't realize it would ask for personal information. Next time I'll be sure to use open-source software!

From the web you can view the presentation without having to log in to Prezi.

• Andrew Jaffe

Nice use of Prezi!

• Stefan

I am currently taking the class "Computing for Data Analysis" (Roger Peng) and am looking forward to participating in your course. I would love to see Rafa Irizarry's lectures on Coursera, too. Is there a chance this might happen?

I really want to take this course and am trying to see if I can fit it in. Is there a set time this class takes place, or are the lectures pre-recorded?

You can watch the lectures on your own time.

• Satyendra Srivastava

Yes, I am doing the "Data Analysis" course. I am from a health background. Initially I had a lot of problems: a very steep learning curve! I am feeling better now, in the 5th week, and more in command. I still take all FOUR attempts to do my quiz (!) but am getting better at R. I never dreamt that I would ever get the hang of R and do something meaningful in it. I am beginning to enjoy and get interested in statistics because of this program. Thanks a lot to both Professors Jeff and Peng, and to Coursera!

