Tag: analytics

06
Jan

Sunday data/statistics link roundup (1/6/2013)

  1. Not really statistics, but this is an interesting article about how rational optimization by individual actors does not always lead to an optimal solution. Related, here is the coolest street sign I think I've ever seen, with a heatmap of traffic density to try to influence commuters.
  2. An interesting paper that talks about how clustering is only a really hard problem when there aren't obvious clusters. I was a little disappointed in the paper because it defines the "obviousness" of clusters only theoretically, in terms of a distance metric. There is very little discussion of the practical or visual distance metrics people use when looking at clustering dendrograms, etc.
  3. A post about the two cultures of statistical learning and a related post on how data-driven science is a failure of imagination. I think in both cases, it is worth pointing out that the only good data science is good science - i.e. it seeks to answer a real, specific question through the scientific method. However, I think for many modern scientific problems it is pretty naive to think we will be able to come to a full, mechanistic understanding complete with tidy theorems that describe all the properties of the system. I think the real failure of imagination is to think that science/statistics/mathematics won't change to tackle the realistic challenges posed in solving modern scientific problems.
  4. A graph that shows the incredibly strong correlation ( > 0.99!) between the growth of autism diagnoses and organic food sales. Another example where even really strong correlation does not imply causation.
  5. The Buffalo Bills are going to start an advanced analytics department (via Rafa and Chris V.), maybe they can take advantage of all this free play-by-play data from years of NFL games.
  6. A prescient interview with Isaac Asimov on learning, predicting the Khan Academy, MOOCs and other developments in online learning (via Rafa and Marginal Revolution).
  7. The statistical software signal - what your choice of software says about you. Just another reason we need a deterministic statistical machine.

 

02
Sep

Sunday Data/Statistics Link Roundup (9/2/2012)

  1. Just got back from IBC 2012 in Kobe, Japan. I was in an awesome session (organized by the inimitable Lieven Clement) with great talks by Matt McCall, Djork-Arne Clevert, Adetayo Kasim, and Willem Talloen. Willem’s talk nicely tied our work into the pharmaceutical development process and the bigger theme of big data. On the way home through SFO I saw this hanging in the airport. A fitting welcome back to the States. Although, as we talked about in our first podcast, I wonder how long the Big Data hype will last…
  2. Simina B. sent this link along for a master's program in analytics at NC State. It is interesting because it looks a lot like a master's in statistics program, but with a heavier emphasis on data collection/data management. I wonder what role the stat department down there is playing in this program, and whether we will see more like it pop up? Or whether programs like this, with more data management, will be run by stats departments in other places. Maybe our friends down in Raleigh have some thoughts for us. 
  3. If one set of weekly links isn’t enough to fill your procrastination quota, go check out NextGenSeek’s weekly stories. A bit genomics focused, but lots of cool data/statistics links in there too. Love the “extreme Venn diagrams”. 
  4. This seems almost like the fast statistics journal I proposed earlier. I can’t seem to access the first issue/editorial board, though. It doesn’t look like it is open access, so it’s still not perfect. But I love the sentiment of fast/single round review. We can do better though. I think Yihui X. has some really interesting ideas on how. 
  5. My wife taught for a year at Grinnell in Iowa and loved it there. They just released this cool data set with a bunch of information about the college. If all colleges did this, we could really dig in and learn a lot about the American higher education system (link via Hilary M.). 
  6. From the way-back machine, a rant from Rafa about meetings. Stay tuned this week for some Simply Statistics data about our first year on the series of tubes.

30
Nov

Selling the Power of Statistics

A few weeks ago we learned that Warren Buffett is a big IBM fan (a $10 billion fan, that is). Having heard that, I went over to the IBM web site to see what they’re doing these days. For starters, they’re not selling computers anymore! At least not the kind that I would use. One of the big things they do now is “Business Analytics and Optimization” (i.e. statistics), which is one of the reasons they bought SPSS and then later Algorithmics.

Roaming around the IBM web site, I found this little video on how IBM is involved with tennis matches like the US Open. It’s the usual promo video: a bit cheesy, but pretty interesting too. For example, they provide all the players an automatically generated post-game “match analysis DVD” that has summaries of all the data from their match with corresponding video.

It occurred to me that one of the challenges that a company like IBM faces is selling the “power of analytics” to other companies. They need to make these promo videos because, I guess, some companies are not convinced they need this whole analytics thing (or at least not from IBM). They probably need to do methods and software development too, but getting the deal in the first place is at least as important.

In contrast, here at Johns Hopkins, my experience has been that we don’t really need to sell the “power of statistics” to anyone. For the most part, researchers around here seem to be already “sold”. They understand that they are collecting a ton of data and they’re going to need statisticians to help them understand it. Maybe Hopkins is the exception, but I doubt it.

Good for us, I suppose, for now. But there is a danger that we take this kind of monopoly position for granted. Companies like IBM hire the same people we do (including one grad school classmate) and there’s no reason why they couldn’t become direct competitors. We need to continuously show that we can make sense of data in novel ways.