The landscape of data analysis

I have been getting some questions via email, LinkedIn, and Twitter about the content of the Data Analysis class I will be teaching for Coursera. Data Analysis and Data Science mean different things to different people. So I made a video describing how Data Analysis fits into the landscape of other quantitative classes here:

Here is the corresponding presentation. I also made a tentative list of topics we will cover, subject to change at the instructor's whim. Here it is:

  • The structure of a data analysis (steps in the process, knowing when to quit, etc.)
  • Types of data (census, designed studies, randomized trials)
  • Types of data analysis questions (exploratory, inferential, predictive, etc.)
  • How to write up a data analysis (compositional style, reproducibility, etc.)
  • Obtaining data from the web (through downloads mostly)
  • Loading data into R from different file types
  • Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
  • Exploratory statistical models (clustering)
  • Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
  • Basic model checking (primarily visually)
  • The prediction process
  • Study design for prediction
  • Cross-validation
  • A couple of simple prediction models
  • Basics of simulation for evaluating models
  • Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)

Of course, that is a ton of material for 8 weeks, so we will be covering just the very basics. I think it is really important to remember that being a good data analyst is like being a good surgeon or writer. There is no such thing as a prodigy in surgery or writing, because both require long experience, trying lots of things out, and learning from mistakes. I hope to give people the basic information they need to get started and point to resources where they can learn more. I also hope to give them a chance to practice some of the basics a couple of times and to learn that in data analysis the first goal is to "do no harm".

  • Sophie

    Hey Jeff: In addition to teaching the statistical underpinnings of data analysis, will you be covering the management of very large data sets at all? For example, using something manageable like the Iris data set would allow an understanding of the statistical techniques, but many data sets are far larger than the average person's machine can handle.

    • jtleek

      Sophie, I will probably briefly touch on some aspects of big data (feature selection, multiple testing, etc.) but the focus of the class isn't on data management for large data sets. I will point to some resources and platforms that can be used to handle these data in specific cases, but won't go over them in detail.

  • Bill

    How about a link to the actual presentation, rather than to some commercial site which wants personal information?

    • jtleek

      The presentation is built on that commercial site. Feel free to give it incorrect personal information.

      • Bill

        Okay, but that won't work as well with a tablet, as they want you to install their own! proprietary! app!
        Anyway, thanks for posting it.

        • jtleek

          I didn't realize it would ask for personal information. Next time I'll be sure to use open-source software!

    • http://twitter.com/hspter Hilary Parker

      From the web you can view the presentation without having to log in to Prezi.

  • Andrew Jaffe

    Nice use of Prezi!

  • jhprks

    Thank you, this was very informative. I've been looking for a good overview of the landscape of data analysis lately and I think I've found one. I've signed up for your Coursera class and I look forward to it!

  • Mike M

    Thanks, very informative! I am now really looking forward to the course.

  • Stefan

    I am currently taking the class "Computing for Data Analysis" (Roger Peng). I am looking forward to participating in your course, and I would love to see Rafa Irizarry's lectures on Coursera, too. Is there a chance this might happen?

  • http://www.facebook.com/joni.allen Joni Allen

    I really want to take this course and am trying to see if I can fit it in. Is there a set time this class takes place, or are the lectures pre-recorded?

    • http://twitter.com/CHOMPandSTOMP CHOMPandSTOMP

      You can watch the lectures on your own time.

  • Pingback: Data Analysis Course Starts 22 January 2013 « Reudismam

  • Sandra

    I'm excited to take this course! Thanks for the opportunity!

  • sukesh

    Looking forward to learning many new things.

  • alinsoar

    Thank you, Jeff!

    This will be one of the most interesting courses I will take on Coursera.

  • Satyendra Srivastava

    Yes, I am doing the "Data Analysis" course. I am from a health background. Initially I had a lot of problems: a very steep learning curve! I am feeling better now, in the 5th week, and more in command. I still take all FOUR attempts to do my quiz (!) but am getting better at R. I never dreamt that I would ever get the hang of R and do something meaningful in it. I am beginning to enjoy and get interested in statistics because of this program. Thanks a lot to both Professors Jeff and Peng, and Coursera!

  • http://www.facebook.com/andy.mitchell Andy Mitchell

    I want to take a moment to recommend this class. I just finished my final quiz and Jeff really delivered.

    I came into the class with no R and a 15-year-old undergrad math minor.

    If you want super-precise lectures and quizzes that match the slides, then don't sign up.

    But if you're willing to get messy and dive into the deep end of the (data) pool, then this is the class for you.

    My biggest area of growth was in "How to write up a data analysis." Now I know that data reports can be conversational, not dry and boring.

    The class has opened opportunities in my job, too. I now have an appointment to sit down with my company's leader, two other supervisors, and a peer, along with my laptop, R, and a Google doc of munged data, to do some important data analysis on the fly.

    Jeff, you and the other instructors and TAs have made me smarter and more marketable. Thank you.