The Johns Hopkins Data Science Specialization on Coursera

We are very proud to announce the Johns Hopkins Data Science Specialization on Coursera. You can see the official announcement from the Coursera folks here. This is the main reason Simply Statistics has been a little quiet lately.

The three of us (Brian Caffo, Roger Peng, and Jeff Leek), along with a couple of incredibly hard-working graduate students (Nick Carchedi of swirl fame and Sean Kross), have put together nine new one-month classes to run on the Coursera platform. The classes are:

  1. The Data Scientist's Toolbox - A basic introduction to data and data science and a basic guide to R/RStudio/GitHub/Command Line Interface.
  2. R Programming  - Introduction to R programming, from installing R to types, to functions, to control structures.
  3. Getting and Cleaning Data - An introduction to getting data from the web, from images, from APIs, and from databases. The course also covers how to go from raw data to tidy data.
  4. Exploratory Data Analysis - This course covers plotting in base graphics, lattice, ggplot2 and clustering and other exploratory techniques. It also covers how to think about exploring data you haven't seen.
  5. Reproducible Research - This is one of the courses unique to our sequence. It covers how to think about reproducible research, evidence-based data analysis, reproducible research checklists, and knitr, markdown, R markdown, etc.
  6. Statistical Inference  - This course covers the fundamentals of statistical inference from a practical perspective. The course covers both the technical details and important ideas like confounding.
  7. Regression Models  - This course covers the fundamentals of linear and generalized linear regression modeling. It also serves as an introduction to how to "think about" relating variables to each other quantitatively.
  8. Practical Machine Learning  - This course will cover the basic conceptual ideas in machine learning like in/out of sample errors, cross validation, and training and test sets. It will also cover a range of machine learning algorithms and their practical implementation.
  9. Developing Data Products  - This course will cover how to develop tools for communicating data, methods, and analyses with other people. It will cover building R packages, Shiny, and Slidify, among other things.

There will also be a specialization project - consisting of a 10th class where students will work on projects conducted with industry, government, and academic partners.

The classes represent some of the content we have previously covered in our popular Coursera classes and a ton of brand new content for this specialization. Here are some things that I think make our program stand out:

  • We will roll out 3 classes at a time starting in April. Once a class is running, it will run every single month concurrently.
  • The specialization offers a bunch of unique content, particularly in the courses Getting and Cleaning Data, Reproducible Research, and Developing Data Products.
  • All of the content is being developed open source and open-access on GitHub. You are welcome to check it out as we develop it and contribute!
  • You can take the first 9 courses of the specialization entirely for free.
  • You can choose to pay a very modest fee to get "Signature Track" certification in every course.

I have also created a little page that summarizes some of the unique aspects of our program. Scroll through it and you'll find sharing links at the bottom. Please share with your friends; we think this is pretty cool: http://jhudatascience.org

  • enedene

    This is great news!

  • ProspectiveStatStudent


    I find the initiative really interesting. However, it would be desirable to know which kind of project the prospective student would need to accomplish before enrolling:
    - Would it mean the student would become cheap labor for the industry, the government or the academia?

    - What if it would create conflict of interest with the student's job?

    • StevenS123

      This is the opportunity for you to get a cheap education from a top school, and a chance to have something to show in your portfolio by completing a real project. You will not replace $100k/year professionals with your four-week assignment using tools you've learned in 3 months taking this program.

      In fact, this is a great opportunity both for you and for companies to cheaply and accurately assess potential workers. Once they employ you, they will pay you; as far as I know, good data scientists are in high demand.

      As for students' jobs, I don't see how this would negatively affect them. Quite the opposite: students who complete this course and do a good final project could more easily find a job, as they have something to show for it. Sure, the job market could get denser, but the only people who could potentially have a problem with this are the ones going to top universities, as now everyone could get a chance. That scenario looks good to me: the idea that everyone gets a chance, not just a few who can afford it.

  • John Warlop

    Hello, I'm a current student in Coursera's "Computing for Data Analysis". I'm interested in your "Data Science Specialization". As far as I can tell, the "Computing for Data Analysis" class appears similar/identical to the "R Programming" class. I'd like to get the "Data Science Specialization" certificate, but I'd like to use "Computing for Data Analysis" to take the place of "R Programming". Do you plan on accepting this substitution?

  • John Warlop

    Don't know what happened with my other post. Can I substitute "Computing for Data Analysis" for "R Programming"?

    • cyoung

      I also would like to know about this, because I'm in the "Computing for Data Analysis" course and interested in this specialization.

  • flg

    To join the specialization, do you need to pay for the verified certificate? There doesn't seem to be an option to enroll without payment.

  • Tianyuan Cui

    As a big fan of your classes on Coursera, this is such great news to me! See you on Coursera.

  • jshoyer

    Presumably you'll stop offering your 'Computing for data analysis' and 'Data analysis' then?

    Thanks for all the new course material!

  • Will Johnson

    Noooo!! My life will be consumed by Coursera classes. I'm very excited for the nine courses. A month sounds like a very short time though. Is there an estimate of how much overlap there will be with previous classes?

  • ImAndy

    You can take the first 9 courses of the specialization entirely for free
    So how come each one is tabbed at $49 on the coursera site?

    • Chase

      You pay $49 for a formal certificate. You can take the class for free, but it costs $49 to get the certificate verifying your completion. Same as with most Coursera/edX courses.

  • Michael Timothy

    Awesome. Looking forward to it.

  • Jason

    You can take the first 9 courses of the specialization entirely for free.
    so does this mean the courses coming after will not be free?

  • ImAndy

    You can take the first 9 courses of the specialization entirely for free.
    According to coursera site it is $49 per class

  • Ricardo Vladimiro

    Great news. I'm currently taking Computing for Data Analysis (with Signature Track) and I'm waiting for Jeff's Data Analysis. I'm having a lot of fun with MOOCs related to data and statistics and I'm looking forward to this opportunity. Keep up the good work and thank you for sharing your knowledge!

  • Val

    They are not free as advertised at the top of this page: http://jhudatascience.org/
    Coursera wants $490 for a rehash of classes offered in the past.

  • Pedro Medeiros

    Will the specialization be repeated at a later date? April is impossible for me but I am very interested in this program.

  • http://www.twentylys.com/ TwentyLYS

    This seems to be the real deal! A great help to people who want to get started!!

  • Jose Antonio Gallego Vázquez

    I'm really interested in these courses, but how many can I take at the same time? Shall I start with the first and, when finished, go for the second, or can I take two or three at the same time? (I have no previous experience or training with the topic.) Best.

  • Paul s

    this specialization track idea seems great, but the practical how to is difficult to follow. i see 9 classes in an order - that's great. but, clearly, to schedule out from 1 to 9, i am forced to overlap some classes with others. which ones should i do concurrently? there isn't any guidance on the best way to go through the full set, and i find it quite frustrating.

  • SB

    How does this compare to the stat2x series on edX?

  • Jan


    These courses are very good intro courses. They can never replace full-fledged college courses, where the experience of a Prof is shared first hand (Master's and above). I have a Master's degree in this area - from 1979. Of course it was called something else back then and there were two major differences: 1) the amount of data was not as large (tho' it was "big" by that time's standards) and 2) the presentation layer, as compared to now, was nonexistent.

    Other than that nothing much has changed. R is a newer language, easier to use, etc, but nothing new. More brain power at one's disposal - due to the internet.