The Johns Hopkins Data Science Specialization on Coursera

We are very proud to announce the Johns Hopkins Data Science Specialization on Coursera. You can see the official announcement from the Coursera folks here. This is the main reason Simply Statistics has been a little quiet lately.

The three of us (Brian Caffo, Roger Peng, and Jeff Leek), along with a couple of incredibly hard-working graduate students (Nick Carchedi of swirl fame and Sean Kross), have put together nine new one-month classes to run on the Coursera platform. The classes are:

  1. The Data Scientist’s Toolbox - A basic introduction to data and data science and a basic guide to R/RStudio/GitHub/Command Line Interface.
  2. R Programming - Introduction to R programming, from installing R to types, to functions, to control structures.
  3. Getting and Cleaning Data - An introduction to getting data from the web, from images, from APIs, and from databases. The course also covers how to go from raw data to tidy data.
  4. Exploratory Data Analysis - This course covers plotting in base graphics, lattice, and ggplot2, as well as clustering and other exploratory techniques. It also covers how to think about exploring data you haven't seen.
  5. Reproducible Research - This is one of the courses unique to our sequence. It covers how to think about reproducible research, evidence-based data analysis, reproducible research checklists, and knitr, markdown, R markdown, etc.
  6. Statistical Inference - This course covers the fundamentals of statistical inference from a practical perspective. The course covers both the technical details and important ideas like confounding.
  7. Regression Models - This course covers the fundamentals of linear and generalized linear regression modeling. It also serves as an introduction to how to "think about" relating variables to each other quantitatively.
  8. Practical Machine Learning - This course will cover the basic conceptual ideas in machine learning like in/out of sample errors, cross validation, and training and test sets. It will also cover a range of machine learning algorithms and their practical implementation.
  9. Developing Data Products - This course will cover how to develop tools for communicating data, methods, and analyses with other people. It will cover building R packages, Shiny, and Slidify, among other things.

There will also be a specialization project, consisting of a 10th class in which students will work on projects conducted with industry, government, and academic partners.

The classes represent some of the content we have previously covered in our popular Coursera classes and a ton of brand new content for this specialization. Here are some things that I think make our program stand out:

  • We will roll out 3 classes at a time starting in April. Once a class is running, it will run every single month concurrently.
  • The specialization offers a bunch of unique content, particularly in the courses Getting and Cleaning Data, Reproducible Research, and Developing Data Products.
  • All of the content is being developed as open source and open access on GitHub. You are welcome to check it out as we develop it and contribute!
  • You can take the first 9 courses of the specialization entirely for free.
  • You can choose to pay a very modest fee to get “Signature Track” certification in every course.

I have also created a little page that summarizes some of the unique aspects of our program. Scroll through it and you’ll find sharing links at the bottom. Please share with your friends; we think this is pretty cool: http://jhudatascience.org