Simply Statistics

29
Oct

Computing for Data Analysis (Simply Statistics Edition)

As the entire East Coast gets soaked by Hurricane Sandy, I can’t help but think that this is the perfect time to…take a course online! Well, as long as you have electricity, that is. I live in a heavily tree-lined area and so it’s only a matter of time before the lights cut out on me (I’d better type quickly!). 

I just finished teaching my course Computing for Data Analysis through Coursera. This was my first experience teaching a course online and certainly my first experience teaching a course to more than 50,000 people. There were some bumps along the road, but the students who participated were fantastic at helping me smooth the way. In particular, the interaction on the discussion forums was very helpful. I couldn’t have done it without the students’ help. So, if you took my course over the past 4 weeks, thanks for participating!

Here are a few quick stats on course participation (as of today) for the curious:

  • 50,899: Number of students enrolled
  • 27,900: Number of users watching lecture videos
  • 459,927: Total number of streaming views (over 4 weeks)
  • 414,359: Total number of video downloads (not all courses allow this)
  • 14,375: Number of users submitting the weekly quizzes (graded)
  • 6,420: Number of users submitting the bi-weekly R programming assignments (graded)
  • 6,393 + 3,291: Total number of posts + comments to the discussion forum
  • 314,302: Total number of views in the discussion forum

I’ve received a number of emails from people who signed up in the middle of the course or after the course finished. Given that it was a 4-week course, signing up in the middle of the course meant you missed quite a bit of material. I will eventually be closing down the Coursera version of the course—at this point it’s not clear when it will be offered again on that platform but I would like to do so—and so access to the course material will be restricted. However, I’d like to make that material more widely available even if it isn’t in the Coursera format.

So I’m announcing today that next month I’ll be offering the Simply Statistics Edition of Computing for Data Analysis. This will be a slightly simplified version of the course that was offered on Coursera since I don’t have access to all of the cool platform features that they offer. But all of the original content will be available, along with some new material that I hope to add over the coming weeks.

If you are interested in taking this course or know of someone who is, please check back here soon for more details on how to sign up and get the course information.

28
Oct

Sunday Data/Statistics Link Roundup (10/28/12)

  1. An important article about anti-science sentiment in the U.S. (via David S.). The politicization of scientific issues such as global warming, evolution, and healthcare (think vaccination) makes the U.S. less competitive. I think the lack of statistical literacy and training in the U.S. is one of the sources of the problem. People use/skew/mangle statistical analyses and experiments to support their view, and without a statistically well-trained public, it all looks “reasonable and scientific”. But when science seems to contradict itself, it loses credibility. Another reason to teach statistics to everyone in high school.
  2. Scientific American was loaded this last week; here is another article on cancer screening. The article covers several of the issues that make it hard to convince people that screening isn’t always good. Confusion over the predictive value of a positive result is a huge one in cancer screening right now. The author of the piece is someone worth following on Twitter: @hildabast.
  3. A bunch of data on the use of GitHub. Always cool to see new data sets that are worth playing with for student projects, etc. (via Hilary M.).
  4. A really interesting post over at Stats Chat about why we study seemingly obvious things. Hint, the reason is that “obvious” things aren’t always true. 
  5. A story on “sentiment analysis” by NPR that suggests that most of the variation in a stock’s price during the day can be explained by the number of Facebook likes. Obviously, this is an interesting correlation. It would probably be more interesting for hedge funders/stockpickers if the correlation were with the change in stock price the next day. (via Dan S.)
  6. Yihui Xie visited our department this week. We had a great time chatting with him about knitr/animation and all the cool work he is doing. Here are his slides from the talk he gave. Particularly check out his idea for a fast journal. You are seeing the future of publishing.  
  7. Bonus Link: R is a trendy open source technology for big data
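
A quick worked calculation shows why the predictive value of a positive result (item 2 above) confuses people. This is only a sketch: the prevalence, sensitivity, and specificity below are hypothetical round numbers, not figures from the linked article, and the function name is made up for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), computed with Bayes' rule."""
    true_pos = prevalence * sensitivity               # diseased and test positive
    false_pos = (1 - prevalence) * (1 - specificity)  # healthy but test positive
    return true_pos / (true_pos + false_pos)

# Hypothetical screening test: 1% prevalence, 90% sensitivity, 90% specificity.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"Chance a positive result is a true positive: {ppv:.1%}")
```

With these made-up inputs, only about 1 in 12 positive results comes from someone who actually has the disease, even though the test sounds accurate. When the disease is rare, false positives from the large healthy group outnumber the true positives.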
26
Oct

I love those first discussions about a new research project

That has got to be the best reason to stay in academia. The meetings where it is just you and a bunch of really smart people thinking about tackling a new project, coming up with cool ideas, and dreaming about how you can really change the way the world works are so much fun.

There is no part of a research job that is better as far as I’m concerned. It is always downhill after that: you start running into pebbles, your code doesn’t work, or your paper gets rejected. But that first blissful planning meeting always seems so full of potential.

Just had a great one like that and am full of optimism.  

23
Oct

Let's make the Joint Statistical Meetings modular

Have you ever met a statistician who enjoys the Joint Statistical Meetings (JSM)? I haven’t. With the exception of the one night we catch up with old friends, there are few positive things we can say about JSM. They are way too big, and the two talks I want to see are always somehow scheduled at the same time as mine.

But statisticians actually like conferences. Most of us have a favorite statistics conference, or session within a bigger subject matter conference, that we look forward to going to. But it’s never JSM. So why can’t JSM just be a collection of these conferences? For sure we should drop the current format and come up with something new.

I propose that we start by giving each ASA section two non-concurrent sessions scheduled on two consecutive days (perhaps more slots for bigger sections) and let them do whatever they want. Hopefully they would turn this into the conference that they want to go to. It’s our meeting, we pay for it, so let’s turn it into something we like.

22
Oct

A statistical project bleg (urgent-ish)

We all know that politicians can play it a little fast and loose with the truth. This is particularly true in debates, where politicians have to think on their feet and respond to questions from the audience or from each other. 

Usually, we find out how truthful politicians are in the “post-game show”. The discussion of the veracity of the claims is usually based on independent fact checkers such as PolitiFact. Some of these fact checkers (PolitiFact in particular) live-tweet their reports on many of the issues discussed during the debate. This near-real-time fact-checking is possible because both candidates stick to a pretty fixed set of talking points.

What would be awesome is if someone could write an R script that would scrape the live data off of PolitiFact’s Twitter account and create a truthfulness meter that looks something like CNN’s instant reaction graph (see #7) for independent voters. The line would show the moving average of how honest each politician was being. How cool would it be to show the two candidates and how truthful they are being? If you did this, tell me it wouldn’t be a feature one of the major news networks would pick up…
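
The moving-average part of that idea can be sketched in a few lines. This is a rough illustration in Python rather than the R script asked for, and it leaves out the Twitter scraping entirely. The numeric scale attached to each Truth-O-Meter rating, the function name, and the sample fact checks are all assumptions made up for the example.

```python
from collections import defaultdict, deque

# Hypothetical numeric scale for PolitiFact's Truth-O-Meter ratings (an assumption).
RATING_SCORE = {
    "True": 1.0,
    "Mostly True": 0.75,
    "Half True": 0.5,
    "Mostly False": 0.25,
    "False": 0.0,
    "Pants on Fire": 0.0,
}

def truthfulness_meter(checks, window=2):
    """Map each speaker to the trailing moving average of their rating scores.

    `checks` is an iterable of (speaker, rating) pairs in the order reported.
    Each average covers that speaker's most recent `window` ratings.
    """
    recent = defaultdict(lambda: deque(maxlen=window))  # last `window` scores
    meter = defaultdict(list)                           # one average per check
    for speaker, rating in checks:
        recent[speaker].append(RATING_SCORE[rating])
        meter[speaker].append(sum(recent[speaker]) / len(recent[speaker]))
    return dict(meter)

# Made-up fact checks, in the order they might be tweeted (illustrative only).
checks = [
    ("Candidate A", "True"),
    ("Candidate B", "Half True"),
    ("Candidate A", "False"),
    ("Candidate B", "True"),
    ("Candidate A", "Half True"),
    ("Candidate A", "True"),
]
meter = truthfulness_meter(checks, window=2)
for speaker, averages in meter.items():
    print(speaker, averages)
```

Plotting each speaker's list against time would give the two lines of the meter. A larger `window` smooths the lines but makes them slower to react to a new rating.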

21
Oct

Sunday Data/Statistics Link Roundup (10/21/12)

  1. This scientific variant on the #whatshouldwecallme meme isn’t exclusive to statistics, but it is hilarious.
  2. This is a really interesting post that is a follow-up to the XKCD password security comic. The thing I find most interesting about this is that researchers realized the key problem with passwords was that we were looking at them purely from a computer science perspective. But people use passwords, so we need a person-focused approach to maximize security. This is a very similar idea to our previous post on an experimental foundation for statistics. Looks like Di Cook and others are already way ahead of us on this idea. It would be interesting to redefine optimality incorporating the knowledge that most of the time it is a person running the statistics. 
  3. This is another fascinating article about the math education wars. It starts off as the typical dueling schools issue in academia - two different schools of thought that routinely go after each other. But the interesting thing here is that it sounds like one side of this math debate is being waged by a person collecting data and the other by a side that isn’t. It is interesting how many areas are being touched by data - including what kind of math we should teach.
  4. I’m going to visit Minnesota in a couple of weeks. I was so pumped up to be an outlaw. Looks like I’m just a regular law abiding citizen though….
  5. Here are outstanding summaries of what went on at the Carl Morris Big Data conference this last week. Tons of interesting stuff there. Parts one, two, and three.
19
Oct

Simply Statistics Podcast #4: Interview with Rebecca Nugent

Interview with Rebecca Nugent of Carnegie Mellon University.

In this episode Jeff and I talk with Rebecca Nugent, Associate Teaching Professor in the Department of Statistics at Carnegie Mellon University. We talk with her about her work with the Census and the growing interest in statistics among undergraduates.

18
Oct

Statistics isn't math but statistics can produce math

Mathgen, the web site that can produce randomly generated mathematics papers, has apparently gotten a paper accepted in a peer-reviewed journal (although perhaps not the most reputable one). I am not at all surprised this happened, but it’s fun to read both the paper and the reviewer’s comments.

(Thanks to Kasper H. for the pointer.)