Simply Statistics


Prediction Markets for Science: What Problem Do They Solve?

I've recently seen a bunch of press on this paper, which describes an experiment with developing a prediction market for scientific results. From FiveThirtyEight:

Although replication is essential for verifying results, the current scientific culture does little to encourage it in most fields. That’s a problem because it means that misleading scientific results, like those from the “shades of gray” study, could be common in the scientific literature. Indeed, a 2005 study claimed that most published research findings are false.


The researchers began by selecting some studies slated for replication in the Reproducibility Project: Psychology — a project that aimed to reproduce 100 studies published in three high-profile psychology journals in 2008. They then recruited psychology researchers to take part in two prediction markets. These are the same types of markets that people use to bet on who’s going to be president. In this case, though, researchers were betting on whether a study would replicate or not.

There are all kinds of prediction markets these days--for politics, general ideas--so having one for scientific ideas is not too controversial. But I'm not sure I see exactly what problem is solved by having a prediction market for science. In the paper, they claim that the market-based bets were better predictors of replication outcomes than the general survey that was administered to the scientists. I'll admit that's an interesting result, but I'm not yet convinced.

First off, it's worth noting that this work comes out of the massive replication project conducted by the Center for Open Science, where I believe they have a fundamentally flawed definition of replication. So I'm not sure I can really agree with the idea of basing a prediction market on such a definition, but I'll let that go for now.

The purpose of most markets is some general notion of "price discovery". One popular market is the stock market and I think it's instructive to see how that works. Basically, people continuously bid on the shares of certain companies and markets keep track of all the bids/offers and the completed transactions. If you are interested in finding out what people are willing to pay for a share of Apple, Inc., then it's probably best to look at...what people are willing to pay. That's exactly what the stock market gives you. You only run into trouble when there's no liquidity, that is, when no one shows up to bid or offer, but that would be a problem for any market.

Now, suppose you're interested in finding out what the "true fundamental value" of Apple, Inc. is. Some people think the stock market gives you that at every instant, while others think that the stock market can behave irrationally for long periods of time. Perhaps in the very long run, you get a sense of the fundamental value of a company, but that may not be useful information at that point.

What does the market for scientific hypotheses give you? Well, it would be one thing if granting agencies participated in the market. Then, we would never have to write grant applications. The granting agencies could then signal what they'd be willing to pay for different ideas. But that's not what we're talking about.

Here, we're trying to get at whether a given hypothesis is true or not. The only real way to get information about that is to conduct an experiment. How many people betting in the markets will have conducted an experiment? Likely the minority, given that the whole point is to save money by not having people conduct experiments investigating hypotheses that are likely false.

But if market participants aren't contributing real information about an hypothesis, what are they contributing? Well, they're contributing their opinion about an hypothesis. How is that related to science? I'm not sure. Of course, participants could be experts in the field (although not necessarily) and so their opinions will be informed by past results. And ultimately, it's consensus amongst scientists that determines, after repeated experiments, whether an hypothesis is true or not. But at the early stages of investigation, it's not clear how valuable people's opinions are.

In a way, this reminds me of a time a while back when the EPA was soliciting "expert opinion" about the health effects of outdoor air pollution, as if that were a reasonable substitute for collecting actual data on the topic. At least it cost less money--just the price of a conference call.

There's a version of this playing out in the health tech market right now. Companies like Theranos and 23andMe are selling health products that they claim are better than some current benchmark. In particular, Theranos claims its blood tests are accurate when only using a tiny sample of blood. Is this claim true or not? No one outside Theranos knows for sure, but we can look to the financial markets.

Theranos can point to the marketplace and show that people are willing to pay for its products. Indeed, the $9 billion valuation of the private company is another indicator that people...highly value the company. But ultimately, we still don't know if their blood tests are accurate because we don't have any data. If we were to go by the financial markets alone, we would necessarily conclude that their tests are good, because why else would anyone invest so much money in the company?

I think there may be a role to play for prediction markets in science, but I'm not sure discovering the truth about nature is one of them.


Not So Standard Deviations: Episode 4 - A Gajillion Time Series

Episode 4 of Not So Standard Deviations is hot off the audio editor. In this episode Hilary first explains to me what the heck DevOps is, and then we talk about the statistical challenges in detecting rare events in an enormous set of time series data. There's also some discussion of Ben and Jerry's and the t-test, so you'll want to hang on for that.




The Statistics Identity Crisis: Am I a Data Scientist?

The joint ASA/Simply Statistics webinar on the statistics identity crisis is now live!


Discussion of the Theranos Controversy with Elizabeth Matsui

Theranos is a Silicon Valley diagnostic testing company that has been in the news recently. The story of Theranos has fascinated me because I think it represents a perfect collision of the tech startup culture and the health care culture, and it shows how combining the two can generate unique problems.

I talked with Elizabeth Matsui, a Professor of Pediatrics in the Division of Allergy and Immunology here at Johns Hopkins, to discuss Theranos, the realities of diagnostic testing, and the unique challenges that a health-tech startup faces with respect to doing good science and building products people want to buy.



Not So Standard Deviations: Episode 3 - Gilmore Girls

I just uploaded Episode 3 of Not So Standard Deviations so check your feeds. In this episode Hilary and I talk about our jobs and the life of the data scientist in both academia and the tech industry. It turns out that they're not as different as I would have thought.

Download the audio file for this episode.


Theranos runs head first into the realities of diagnostic testing

The Wall Street Journal has published a lengthy investigation into the diagnostic testing company Theranos.

The company offers more than 240 tests, ranging from cholesterol to cancer. It claims its technology can work with just a finger prick. Investors have poured more than $400 million into Theranos, valuing it at $9 billion and her majority stake at more than half that. The 31-year-old Ms. Holmes’s bold talk and black turtlenecks draw comparisons to Apple Inc. cofounder Steve Jobs.

If ever there were a warning sign, the comparison to Steve Jobs has got to be it.

But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fraction of the tests then sold to consumers, according to four former employees.

One former senior employee says Theranos was routinely using the device, named Edison after the prolific inventor, for only 15 tests in December 2014. Some employees were leery about the machine’s accuracy, according to the former employees and emails reviewed by The Wall Street Journal.
In a complaint to regulators, one Theranos employee accused the company of failing to report test results that raised questions about the precision of the Edison system. Such a failure could be a violation of federal rules for laboratories, the former employee said.

With these kinds of stories, it's always hard to tell whether there's reality here or it's just a bunch of axe grinding. But one thing that's for sure is that people are talking, and probably not for good reasons.

Minimal R Package Check List

A little while back I had the pleasure of flying in a small Cessna with a friend and for the first time I got to see what happens in the cockpit with a real pilot. One thing I noticed was that basically you don't lift a finger without going through some sort of check list. This starts before you even roll the airplane out of the hangar. It makes sense because flying is a pretty dangerous hobby and you want to prevent problems from occurring when you're in the air.

That experience got me thinking about what might be the minimal check list for building an R package, a somewhat less dangerous hobby. First off, much has changed (for the better) since I started making R packages and I wanted to have some clean documentation of the process, particularly with using RStudio's tools. So I wiped my installations of both R and RStudio and started from scratch to see what it would take to get someone to build their first R package.

The list is basically a "pre-flight" list---the presumption here is that you actually know the important details of building packages, but need to make sure that your environment is set up correctly so that you don't run into errors or problems. I find this is often a problem for me when teaching students to build packages because I focus on the details of actually making the packages (i.e. DESCRIPTION files, Roxygen, etc.) and forget that, way back when, I actually configured my environment to do this.

Pre-flight Procedures for R Packages

  1. Install most recent version of R
  2. Install most recent version of RStudio
  3. Open RStudio
  4. Install devtools package
  5. Click on Project --> New Project... --> New Directory --> R package
  6. Enter package name
  7. Delete boilerplate code and "hello.R" file
  8. Go to "man" directory and delete "hello.Rd" file
  9. In File browser, click on package name to go to the top level directory
  10. Click "Build" tab in environment browser
  11. Click "Configure Build Tools..."
  12. Check "Generate documentation with Roxygen"
  13. Check "Build & Reload" when Roxygen Options window opens --> Click OK
  14. Click OK in Project Options window

At this point, you're clear to build your package, which obviously involves writing R code and Roxygen documentation, filling in the package metadata, and building/checking your package.
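For reference, here's a rough sketch of what roughly the same setup and build steps look like from the R console using devtools, rather than the RStudio menus. This is just an illustration, not part of the check list above; the package name "mypackage" is a placeholder.

```r
## Install the tools (step 4 of the check list)
install.packages("devtools")
install.packages("roxygen2")

## Create a package skeleton in the working directory (steps 5-6);
## "mypackage" is a placeholder name
devtools::create("mypackage")

## From inside the package directory: generate the Rd files from the
## Roxygen comments, then build and check the package
setwd("mypackage")
devtools::document()
devtools::build()
devtools::check()
```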

If I'm missing a step or have too many steps, I'd like to hear about it. But I think this is the minimum number of steps you need to configure your environment for building R packages in RStudio.

UPDATE: I've made some changes to the check list and will be posting future updates/modifications to my GitHub repository.


Profile of Data Scientist Shannon Cebron

The "This is Statistics" campaign has a nice profile of Shannon Cebron, a data scientist working at the Baltimore-based Pegged Software.

What advice would you give to someone thinking of a career in data science?

Take some advanced statistics courses if you want to see what it’s like to be a statistician or data scientist. By that point, you’ll be familiar with enough statistical methods to begin solving real-world problems and understanding the power of statistical science.  I didn’t realize I wanted to be a data scientist until I took more advanced statistics courses, around my third year as an undergraduate math major.


Not So Standard Deviations: Episode 2 - We Got it Under 40 Minutes

Episode 2 of my podcast with Hilary Parker, Not So Standard Deviations, is out! In this episode, we talk about user testing for statistical methods, navigating the Hadleyverse, the crucial significance of rename(), and the secret reason for creating the podcast (hint: it rhymes with "bee"). Also, I erroneously claim that Bill Cleveland is way older than he actually is. Sorry Bill.

In other news, we are finally on iTunes so you can subscribe from there directly if you want (just search for "Not So Standard Deviations" or paste the link directly into your podcatcher).

Download the audio file for this episode.



Apple Music's Moment of Truth

Today is the day when Apple, Inc. learns whether its brand new streaming music service, Apple Music, is going to be a major contributor to the bottom line or just another streaming service (JASS?). Apple Music launched 3 months ago and all new users are offered a 3-month free trial. Today, that free trial ends and the big question is how many people will start to pay for their subscription, as opposed to simply canceling it. My guess is that most people (> 50%) will opt to pay, but that's a complete guess. For what it's worth, I'll be paying for my subscription. After adding all this music to my library, I'd hate to see it all go away.

Back on August 18, 2015, consumer market research firm MusicWatch released a study that claimed, among other things, that

Among people who had tried Apple Music, 48 percent reported they are not currently using the service.

This would suggest that almost half of people who had signed up for the free trial period of Apple Music were not interested in using it further and would likely not pay for it once the trial ended. If it were true, it would be a blow to the newly launched service.

But how did MusicWatch arrive at its number? It claimed to have surveyed 5,000 people in its study. Shortly before the survey by MusicWatch was released, Apple claimed that about 11 million people had signed up for their new Apple Music service (because the service had just launched, everyone who had signed up was in the free trial period). Clearly, 5,000 people do not make up the entire population, so we have but a small sample of users.

What was the target quantity that MusicWatch was trying to estimate? It seems that they wanted to know the percentage of all people who had signed up for Apple Music that were still using the service. Can they make inference about the entire population from the sample of 5,000?

If the sample is representative and the individuals are independent, we could use the number 48% as an estimate of the percentage in the population who no longer use the service. The press release from MusicWatch did not indicate any measure of uncertainty, so we don't know how reliable the number is.
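Just to get a rough sense of scale, here's a quick back-of-the-envelope calculation in R of the uncertainty we might attach to that 48% figure if we were willing to assume a simple random sample of independent respondents (an assumption we have no way of checking from the press release):

```r
## Reported estimate and sample size from the MusicWatch press release
p_hat <- 0.48
n <- 5000

## Standard error of a sample proportion under simple random sampling
se <- sqrt(p_hat * (1 - p_hat) / n)

## Approximate 95% confidence interval
ci <- p_hat + c(-1, 1) * 1.96 * se

round(se, 4)  ## about 0.007
round(ci, 3)  ## roughly 0.466 to 0.494
```

Under those (strong) assumptions, the sampling uncertainty is tiny--on the order of a percentage point--which is worth keeping in mind for the possible explanations discussed below.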

Interestingly, soon after the MusicWatch survey was released, Apple released a statement to the publication The Verge, stating that 79% of users who had signed up were still using the service (i.e. only 21% had stopped using it, as opposed to 48% reported by MusicWatch). In other words, Apple just came out and gave us the truth! This was unusual because Apple typically does not make public statements about newly launched products. I just found this amusing because I've never been in a situation where I was trying to estimate a parameter and then someone later just told me what its value was.

If we believe that Apple and MusicWatch were measuring the same thing in their analyses (and it's not clear that they were), then it would suggest that MusicWatch's estimate of the population percentage (48%) was quite far off from the true value (21%). What would explain this large difference?

  1. Random variation. It's true that MusicWatch's survey was a small sample relative to the full population, but at 5,000 people the sample was still fairly large. Furthermore, the analysis was fairly simple (just taking the proportion of users still using the service), so the uncertainty associated with that estimate is unlikely to be large.
  2. Selection bias. Recall that it's not clear how MusicWatch sampled its respondents, but it's possible that the way that they did it led them to capture a set of respondents who were less inclined to use Apple Music. Beyond this, we can't really say more without knowing the details of the survey process.
  3. Respondents are not independent. It's possible that the survey respondents are not independent of each other. This would primarily affect the uncertainty about the estimate, making it larger than we might expect if the respondents were all independent. However, since we do not know what MusicWatch's uncertainty about their estimate was in the first place, it's difficult to tell if dependence between respondents could play a role. Apple's number, of course, has no uncertainty.
  4. Measurement differences. This is the big one, in my opinion. We don't know how either MusicWatch or Apple defined "still using the service". You could imagine a variety of ways to determine whether a person was still using the service. You could ask "Have you used it in the last week?" or perhaps "Did you use it yesterday?" Responses to these questions would be quite different and would likely lead to different overall percentages of usage.