Why I disagree with Andrew Gelman's critique of my paper about the rate of false discoveries in the medical literature

With a colleague, I wrote a paper titled "Empirical estimates suggest most published medical research is true", which we quietly posted to arXiv a few days ago. I posted it to arXiv in the interest of open science and because we didn't want to delay the dissemination of our approach during the long review process. I didn't email anyone about the paper or talk to anyone about it, except my friends here locally.

I underestimated the internet. Yesterday, the paper was covered in this piece in MIT Technology Review. That exposure was enough for the paper to appear in a few different outlets. I'm totally comfortable with the paper, but I was not anticipating so much attention so quickly.

In particular, I was a little surprised to see it appear on Andrew Gelman's blog with the disheartening title, "I don’t believe the paper, “Empirical estimates suggest most published medical research is true.” That is, most published medical research may well be true, but I’m not at all convinced by the analysis being used to support this claim." I responded briefly this morning to his post, but then had to run off to teach class. After thinking about it a little more, I realized I have some objections to his critique.

His main criticisms of our paper are: (1) that we work with type I/type II errors instead of type S/type M errors (paragraph 2), (2) that we performed inference rather than looking at replication (paragraph 4), (3) that there is p-value hacking going on (paragraph 4), and (4) that our model does not apply because p-value hacking may change the assumptions that underlie this model in genomics.

I will handle each of these individually:

(1) This is primarily semantics. Andrew is concerned with interesting/uninteresting results through his Type S and Type M errors. We are concerned with true/false positives as defined by type I and type II errors (and a null hypothesis). You might believe that the null is never true - but then by the standards of the original paper all published research is true. Or you might say that a non-null result might have an effect size too small to be interesting - but the framework being used here is hypothesis testing, and we have explicitly stated how we define a true positive in that framework. We define the error rate as the rate of classifying things as null when they should be classified as alternative, and vice versa. We then estimate the false discovery rate under the framework used to calculate those p-values. So this is not an evidence-based criticism of our work; rather, it is a stated difference of opinion about the philosophy of statistics, not supported by conclusive data.
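In this hypothesis-testing framework the target quantity can be written compactly. The notation is introduced here only for illustration and is not taken from the paper: R is the number of results reported as significant (p < 0.05) and V is the number of those for which the null hypothesis is actually true. The second expression sketches a mixture model of the kind fit to the reported p-values, with a uniform component on (0, 0.05) for true nulls and a beta density truncated at 0.05 for true positives; the estimated science-wise FDR is the fitted weight \pi_0 of the uniform component.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right],
\qquad
f(p) = \pi_0 \cdot \frac{1}{0.05}
     + (1 - \pi_0)\,\frac{\mathrm{Beta}(p;\,a,\,b)}{F_{\mathrm{Beta}}(0.05;\,a,\,b)},
\qquad 0 < p < 0.05
```

Here Beta(p; a, b) is the beta density and F_Beta its cumulative distribution function, so each component integrates to 1 on (0, 0.05).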

(2) Gelman says he originally thought we would follow up specific p-values to see whether the results replicated, and he makes that a critique of our paper. That would definitely be another approach to the problem. Instead, we chose to perform statistical inference using justified and widely used statistical techniques. Others have taken the replication route, but of course that approach would also be fraught with difficulty: are the exact conditions replicable (e.g. for a clinical trial)? Can we sample from the same population (if it has changed or is hard to sample)? And what do we mean by replicates (would two p-values less than 0.05 be convincing?)? This again is not a criticism of our approach, but a description of a different analysis that Gelman wished to see.

(3)-(4) Gelman states, "You don’t have to be Uri Simonsohn to know that there’s a lot of p-hacking going on." Indeed, Uri Simonsohn wrote a paper in which he discusses the potential for p-value hacking. He does not collect data from real experiments/analyses, but uses simulations, theoretical arguments, and prospective experiments designed to show specific problems. While these arguments are useful and informative, they give no indication of the extent of p-value hacking in the medical literature. So this argument rests on Gelman's supposition that p-value hacking happens broadly, rather than on data.

My objection is that his critiques are based primarily on philosophy (1), on a wish that we had done the study a different way (2), and on assumptions about the way science works that are supported only by anecdotal evidence (3-4).

One thing you could very reasonably argue about is how sensitive our approach is to violations of our assumptions (which Gelman implied with criticisms 3-4). To address this, my co-author and I have now performed a simulation analysis. In the first simulation, we considered a case where every p-value less than 0.05 was reported and the null p-values were uniformly distributed, just as our assumptions state. We then plotted our estimates of the science-wise false discovery rate (swfdr) against the truth. Here our estimator works pretty well.
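For readers who want to try this kind of check without the repository, here is a minimal, self-contained sketch. It is hypothetical and deliberately simpler than the paper's estimator: as an assumption made only to get closed-form EM updates, the true-positive component is a one-parameter Beta(a, 1) truncated at 0.05 rather than the more flexible beta used in the paper. The function names and the simulation settings (80% of all tests null, alternative shape a = 0.1) are illustrative choices, not values from the paper.

```python
import math
import random


def simulate_reported_pvalues(n_tests, frac_null, alt_shape, threshold=0.05, seed=1):
    """Simulate tests and keep only p-values below `threshold` (the "reported" ones).

    Null p-values are Uniform(0, 1); alternative p-values are Beta(alt_shape, 1),
    drawn by inverse-CDF sampling as U ** (1 / alt_shape). Returns the reported
    p-values and the true share of them that came from the null (the true swfdr).
    """
    rng = random.Random(seed)
    reported, n_null_reported = [], 0
    for _ in range(n_tests):
        is_null = rng.random() < frac_null
        u = 1.0 - rng.random()  # in (0, 1], so log(p) is always defined later
        p = u if is_null else u ** (1.0 / alt_shape)
        if p < threshold:
            reported.append(p)
            n_null_reported += is_null
    return reported, n_null_reported / len(reported)


def fit_truncated_mixture(pvalues, threshold=0.05, max_iter=1000, tol=1e-7):
    """EM for pi0 * Uniform(0, t) + (1 - pi0) * Beta(a, 1) truncated to (0, t).

    On the rescaled values x = p / t the null density is 1 and the alternative
    density is a * x ** (a - 1), so both M-step updates have closed forms.
    Returns (pi0, a); pi0 is the estimated share of reported p-values that are null.
    """
    log_x = [math.log(p / threshold) for p in pvalues]
    pi0, a = 0.5, 0.5  # arbitrary starting values
    for _ in range(max_iter):
        # E-step: posterior probability that each reported p-value is a true positive.
        w = []
        for lx in log_x:
            alt = (1.0 - pi0) * a * math.exp((a - 1.0) * lx)
            w.append(alt / (pi0 + alt))
        # M-step: update the mixing weight and the beta shape parameter.
        sum_w = sum(w)
        new_pi0 = 1.0 - sum_w / len(w)
        new_a = -sum_w / sum(wi * lx for wi, lx in zip(w, log_x))
        converged = abs(new_pi0 - pi0) < tol and abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if converged:
            break
    return pi0, a


pvals, true_swfdr = simulate_reported_pvalues(n_tests=50_000, frac_null=0.8, alt_shape=0.1)
pi0_hat, a_hat = fit_truncated_mixture(pvals)
print(f"true swfdr = {true_swfdr:.3f}, estimated swfdr = {pi0_hat:.3f}")
```

Because the simulated data follow the fitted model exactly, this is the "assumptions hold" case; the estimate should track the true share of null p-values among those reported. It does not reproduce the figure from our simulations.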



We also simulated a pretty serious p-value hacking scenario, where people report only the minimum p-value they observe out of 20 p-values. Here our assumption of uniformity is strongly violated. But we still get pretty accurate estimates of the swfdr for the range of values (14%) we report in our paper.
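The hacking scenario itself is easy to reproduce. The sketch below is hypothetical and stand-alone (not the repository code): it assumes every study is null, computes 20 independent Uniform(0, 1) p-values, and reports only the smallest, and only when it falls below 0.05. It does not refit any model; it just shows how far the reported null p-values move away from Uniform(0, 0.05).

```python
import random


def hacked_null_pvalues(n_studies, n_tries=20, threshold=0.05, seed=2):
    """Hypothetical p-hacking model: each null study draws `n_tries` independent
    Uniform(0, 1) p-values and reports the minimum, but only if it is below `threshold`."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_studies):
        p = min(rng.random() for _ in range(n_tries))
        if p < threshold:
            reported.append(p)
    return reported


n_studies = 10_000
reported = hacked_null_pvalues(n_studies)
# Without hacking, 5% of null studies reach p < 0.05; with it, 1 - 0.95**20 of them do.
share_significant = len(reported) / n_studies
# Uniform(0, 0.05) p-values would give a mean of p / 0.05 close to 0.5.
mean_rescaled = sum(p / 0.05 for p in reported) / len(reported)
print(f"share of null studies reaching p < 0.05: {share_significant:.3f} "
      f"(theory: {1 - 0.95 ** 20:.3f})")
print(f"mean of p / 0.05 among reported: {mean_rescaled:.3f} (uniform: 0.500)")
```

Under this model the reported null p-values pile up toward 0 (their rescaled mean is about 0.42 rather than 0.5), which is exactly the shape a flexible true-positive component can partly absorb.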


Since I recognize these are only a couple of simulations, I have also put the code up on GitHub with the rest of our code for the paper so other people can test it out.

Whether you are convinced by Gelman or by my response, I agree with him that it is pretty unlikely that "most published research is false", so I'm glad our paper is at least bringing that important point up. I also hope that by introducing a new estimator of the science-wise FDR we inspire more methodological development, and that philosophical criticisms won't prevent people from looking at the data in new ways.




  • zbicyclist

    I do not know whether this would be at all helpful to you as an idea, but I'm going to float it anyway. (free advice being worth what you pay for it)

    Suppose one would take cases where the results were actually false, as in fraudulent. I'm thinking in particular of that case in which dozens of papers by one author in an anesthesiology journal were "involuntarily retracted", since that's medical research. I do not know how many cases there are that were investigated with this degree of intensity, but you need more than just this one. If we put these results into your model, they should clearly NOT fit, right? That wouldn't be definitive, of course, but might be a supporting point.

  • http://fellgernon.tumblr.com/ Leonardo Collado Torres

    Interesting paper and discussion on both blogs and it's good to see some degree of agreement after the comments exchange. The main question is definitely interesting and I liked how it was approached in the paper. Though as Jeff says, hopefully this is just the start.

    I find Gelman's argument about effect sizes interesting, as well as the possible trend toward reporting confidence intervals instead of p-values that was described in the paper. I wonder how the CIs could be used. For a start, they'll be more difficult to scrape.

  • Phil Price

    Your approach does assume that the amount of "p-value hacking" is small, doesn't it? It's interesting that you think the burden is on Gelman (or other critics) to show that this is false, rather than on you to show that it's a reasonable assumption.

    With regard to your item 1, you say "You might believe that the null is never true – but then by the standards of the original paper all published research is true" but that's not right at all: a Type S (sign) error is wrong by either standard. If you claim that such-and-such a treatment reduces mortality, but it actually increases it, then you're just wrong.

    Sorry if it seems like I'm piling on.

    • jtleek

      Phil - Thanks for the note. I'm absolutely happy to discuss the issues. For sure there are a number of ways in which the assumptions/analyses could be questioned, and I agree with a lot of them. I also have a prior belief that p-value hacking might happen. I also think that the null p-values might not be exactly U(0,1). I am doing my best to address these criticisms.

      However, there are two types of criticisms: (1) we have data or evidence that shows your analysis is flawed, and (2) we think it is probably flawed because of the way we assume the world works. I can respond to criticisms of the 1st type, since they are based on data and evidence. My frustration with this process is that most of the criticisms of my paper have been of the latter type. They make a judgement call - not based on published evidence of the rate of p-value hacking, or on an examination of my data to see if I have that problem, but as a vague and general criticism. These nebulous criticisms always exist for papers (and may even be right!) but I find them to be detrimental to the scientific process. So I strongly prefer to respond to criticisms that can be tested and either proved or falsified.

      • http://twitter.com/Neuro_Skeptic Neuroskeptic

        The criticisms that you call "nebulous" are not based on assumptions about the way the world works; they're based on doubting your assumptions in your model.

        It's true that this doubt often comes from anecdotal evidence etc. but it's still doubt, not an assumption.

        The burden of proof is on the person proposing the model to show that it's valid.

        I mean look. I could say, "my model is that all p-values are false, and false p-values follow a beta distribution because of p-hacking. Look at the data - it fits that distribution. I'm right."

        The response to this proposal would, quite rightly, be that I need to prove that false positives follow such a distribution.

        Your model assumes that false-positive p values are uniformly distributed. Why is that any better than my hypothetical assumption that they're beta distributed? Or any other arbitrary assumption?

        The only basis for the idea that they're uniform, as far as I can see, is that in theory (by definition) null p values should be uniformly distributed.

        But that's not an empirical finding, it's an assumption about how the world works. It assumes that in practice, in the published literature, false positive p-values behave as statistics textbooks say they ought to.

        If you run a million tests under the null hypothesis, the p values will indeed be uniformly distributed; but unless we also assume that all p values are equally likely to be published in their original form (not fudged), we cannot assume that this is true of the published literature.

        • jtleek

          You could make such a claim - that p-values are beta distributed under the null because the data look like that. I actually think that is perfectly illustrative of the type of criticisms that are frustrating.

          The reason they are not is that p-values have a theoretical property that states they are uniformly distributed under the null (http://en.wikipedia.org/wiki/P-value) when correctly calculated. We have also observed null p-values to behave as uniformly distributed in the literature (see genomics studies, where even with p-value hacking, batch effects, etc. they are often very close to uniform).

          You can obviously dispute the "when correctly calculated" bit, and I'd even agree with you that many might not be. But we have ample evidence that null p-values are close to uniformly distributed in real examples in the literature. We also have ample evidence that FDR estimators work and are widely used in many applications. These approaches rely on null p-value uniformity, as do ours. We also have theory that says this is how things should work.

          I completely agree that we should consider the robustness of the approach, and I've started to do that as you see above. But we should do that on the basis of empirical data. Since the mathematical theory says p-values should be uniform, we have seen evidence that this is true in past studies, and other people have successfully used that assumption to estimate the FDR elsewhere, I don't think it is a stretch to argue that it is a reasonable starting place for a model.

          That being said, as I mentioned, I'll be looking into robustness to violations of assumptions. I've also taken great pains to allow others to do the same. Given the attention this is getting, I expect any day we'll see at least a few examples where it doesn't work. In the meantime, I hope we can discuss this on a level where you ask: "Jeff, what if x is going on?", where x represents a specific, testable hypothesis, and I respond: "Neuroskeptic (I don't know your real name :-), I tried it and this is what happens." But I will no longer respond to criticisms like "We think p-value hacking might be going on, this violates your assumptions, so your method/results are wrong". It isn't specific, it isn't testable, and it isn't science.
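To make the "uniform under the null when correctly calculated" property concrete, here is a hypothetical simulation (not taken from the paper): a two-sided one-sample z-test with known standard deviation, run many times on data for which the null hypothesis is true. The function name, sample size and number of replicates are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist


def null_z_test_pvalues(n_sims=20_000, n=10, seed=6):
    """Two-sided one-sample z-test p-values when the null (mean 0, known SD 1) is true."""
    rng = random.Random(seed)
    norm = NormalDist()
    pvalues = []
    for _ in range(n_sims):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = xbar * math.sqrt(n)  # the SD of the sample mean is 1 / sqrt(n)
        pvalues.append(2 * (1 - norm.cdf(abs(z))))
    return pvalues


pvals = null_z_test_pvalues()
# With a correctly specified null, each tenth of [0, 1] should hold about 10% of the p-values.
decile_shares = [sum(k / 10 <= p < (k + 1) / 10 for p in pvals) / len(pvals) for k in range(10)]
share_below_05 = sum(p < 0.05 for p in pvals) / len(pvals)
print("share of p-values in each decile:", [round(s, 3) for s in decile_shares])
print("share below 0.05:", round(share_below_05, 3))
```

The point of contention in this thread is whether published null p-values behave like this idealized case; the simulation only shows what "correctly calculated" buys you.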

          • Keith O’Rourke

            Jeff: I know things are coming at you from all directions, but

            I think there may be unmeasured confounders in the epi studies and non-randomized clinical studies in your sample.

            Specifically, I think that bias could be 25% of the SD.

            Given that this is a reasonable possibility, the distribution of p-values given no effect is testable by simulation (example below).

            May not be science, but it's math (if you do it analytically).

            Now there probably is some empirical science that might be able to inform the size of unmeasured confounding bias...
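A minimal stand-in for this kind of calculation, under hypothetical assumptions chosen only for illustration: a two-arm comparison with n = 100 per arm, outcome SD 1, no true effect, and an unmeasured confounding bias of 0.25 SD in the estimated difference. The bias shifts the mean of the z-statistic by 0.25 * sqrt(n / 2), so the rejection rate can be computed analytically and checked by simulation.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_rate_analytic(bias_sd, n_per_arm, alpha=0.05):
    """P(two-sided p < alpha) for a two-arm z-test when the true effect is zero
    but the estimated difference is shifted by a confounding bias of bias_sd SDs."""
    shift = bias_sd * math.sqrt(n_per_arm / 2)  # mean of the z-statistic under the bias
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def rejection_rate_simulated(bias_sd, n_per_arm, alpha=0.05, n_sims=20_000, seed=4):
    """Monte Carlo version: draw the biased z-statistic directly and count p < alpha."""
    rng = random.Random(seed)
    shift = bias_sd * math.sqrt(n_per_arm / 2)
    hits = 0
    for _ in range(n_sims):
        z = rng.gauss(shift, 1.0)
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        hits += p < alpha
    return hits / n_sims


for bias in (0.0, 0.25):
    print(f"bias = {bias:.2f} SD: analytic {rejection_rate_analytic(bias, 100):.3f}, "
          f"simulated {rejection_rate_simulated(bias, 100):.3f}")
```

With these settings, a bias of a quarter of an SD raises the share of no-effect studies reaching p < 0.05 from 5% to a little over 40%, so the "null" p-values are far from uniform. How large such biases really are is the empirical question raised above.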


          • jtleek

            This is another specific hypothesis - unmeasured confounders. I'm happy to try to evaluate this under simulated conditions or to try to hunt down data to evaluate it. Of course, I don't have time to chase down all ideas so I'd love some help - I'm also perfectly happy for you to take my code and find the cases where it "breaks". I'll even post them here.

          • Keith O’Rourke

            Thanks for the reply.

            As C. S. Peirce once wrote, it is a good thing we die, otherwise we would live long enough to realise that anything we thought we understood, we don't. To live long enough to realise that unmeasured confounders are a theoretical rather than a practical concern would be a treat.

            But I realise that, as an academic, your primary responsibility is to catch on to a dream and see where it takes you, disregarding rather than ignoring criticisms.

            But the best advice I got was to pretend, no matter how hard it is to imagine, that those criticising were trying to HELP rather than trip you up.

            For help, respectfully, you should ask postdocs, grad students and research assistants.

  • http://twitter.com/Neuro_Skeptic Neuroskeptic

    Thanks for the comment. I have three questions:

    1. How do you respond to the claim, made by myself and many others, that the sample you selected (abstracts with significant p-values from the top-ranked journals) is likely to be better than the rest of the medical literature? It seems to me that your estimate is a lower bound on the likely overall false positive rate.

    2. Your (useful!) sensitivity analysis shows that once the true FDR hits 50%, your approach is unable to detect further increases, and that it underestimates the FDR (slightly) even for low values. This, to me, is a warning sign that the assumption of evenly distributed false positive p-values is, in principle, a serious vulnerability of your analysis. It's true that the sensitivity analysis is in one sense an extreme case (20 independent p-values per paper with the best one reported - unlikely to happen), but there are other kinds of p-hacking, e.g. outlier removal, picking the analysis method that gives the lowest p...

    3. We know that p-values "bunch" below 0.05 in some fields - this is not just a theoretical possibility, it's been shown empirically IIRC by Kahneman, Simonsohn and others (I can't find the references right now). In your sample this doesn't seem to have happened, but this suggests that Point 1 is more than just a possibility...
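A crude check for the "bunching" in point 3 can be sketched under one stated assumption: without p-hacking, the density of reported p-values does not increase as p approaches 0.05 (true for uniform null p-values and for the usual decreasing alternatives). The idea is to compare the count just below 0.05 with the count in the next bin down. The function name and bin width are hypothetical, and this simplified caliper-style comparison is not the method of any particular paper; a ratio well above 1 is suggestive, not proof.

```python
def bunching_ratio(pvalues, threshold=0.05, width=0.005):
    """Count of p-values in [threshold - width, threshold) divided by the count in the
    adjacent lower bin [threshold - 2 * width, threshold - width)."""
    upper = sum(threshold - width <= p < threshold for p in pvalues)
    lower = sum(threshold - 2 * width <= p < threshold - width for p in pvalues)
    return upper / lower if lower else float("inf")


# Four values just under 0.05 versus two in the bin below; 0.01 and 0.06 fall in neither bin.
print(bunching_ratio([0.01, 0.041, 0.042, 0.046, 0.047, 0.048, 0.049, 0.06]))
```

On real data one would still need a test of whether the ratio exceeds 1 by more than chance, for example a binomial test on the two counts.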

    • jtleek

      First of all - I really appreciate you taking the time to read my paper, post about it, and then even come over to my blog to respond. It makes me happy people are reading my paper.

      To summarize my response to your points below: I agree with most of what you have said and will likely write future papers on all of them. I only take exception to the fact that the whole approach was written off by you, Gelman, et al. as totally incorrect without data to support those claims. I agree the burden of proof is on the investigator to show that a new approach has merit. I think we have shown enough that our approach is worth considering and following up (especially as there is no other method currently available for estimating this quantity based on data in the literature!).

      Now to respond to your points:

      (1) With respect to (1) I totally agree with you. We are focusing on the top ranked journals and the abstracts of those papers which are the most significant results (likely). We totally agree that a more thorough analysis is warranted and may indeed show that secondary analyses are less likely to be accurate than primary analyses. Other types of journals may also suffer from false positives - although we intentionally included an epidemiological journal to see if the results changed and they did not. We actually explicitly state these limitations in the paper and also suggest exactly what you do - that further research is needed into other journals and secondary claims. So we agree with you! I don't see this as a failing of our first paper though, just beyond the scope (that will be our 2nd paper on the topic :-).

      (2) With respect to (2) I agree that in that simulation the FDR is underestimated slightly for low values and pretty seriously for high values. I also think that is a pretty extreme case - it suggests every scientist is engaged in p-value hacking. More than that, it suggests that every scientist is knowingly choosing the smallest p-value to report and not reporting any of the others. This would be tantamount to scientific misconduct on the part of a whole field. Even in this pretty extreme case, we are close to the truth. But your point is well taken - a more thorough sensitivity analysis and a deep discussion of p-value hacking is worth doing. But I would again say, I believe our current work should stand since it is the first attempt to perform such an analysis. But you have now given me an idea for our 3rd paper on the topic - "P-value hacking and the science-wise false discovery rate" want to be a co-author? You can pick the ways we hack, we'll hack away and try it out.

      (3) Yes we do know the p-values "bunch" below 0.05 in some fields. One reference is here: (http://www.tandfonline.com/doi/abs/10.1080/17470218.2012.711335). But as you point out, we don't see strong evidence in the data we have collected for this clumping. I will however, follow this point up to be sure. In the meantime, I would be a little hesitant to claim we are sure there is widespread p-value hacking going on when there is little supporting evidence in our data.

  • http://twitter.com/jamesthorniley James Thorniley

    I found this interesting. Having read through Gelman's blog and comments thread, and here, I found something a little funny: it seems that the same standard of criticism people are applying to your work (questioning validity of assumptions, Gelman even questions whether the general approach can work at all) is not applied to Simonsohn (who as far as I can tell uses broadly similar methods to you, though I'm not 100% sure) or to Ioannidis (who just doesn't present any data at all - the original Ioannidis paper you are responding to is in fact pure speculation as far as I can see).

    Well criticism is a healthy thing anyway, but I'm not sure if it's being applied equally!

    • http://twitter.com/Neuro_Skeptic Neuroskeptic

      Simonsohn's method is similar but it doesn't rely on assumptions about false positives. It makes assumptions about *true* positives and then says that any deviation from that must represent false ones; e.g. true positives should not "bunch" at just below 0.05 (because that's an entirely arbitrary line).

      That is an assumption but a much more conservative one than assuming that we know how false positives behave - a system can only go right in one particular way, but it can go wrong in many ways.

  • Alex D’Amour

    I think that the salient criticism you've received is valid but is something that you could respond to with additional evidence. Essentially, you've proposed a model for p-values that is internally coherent, but you haven't validated this model. To validate your model, you need to make predictions and check them.

    The most critical assumption to validate is that p-values in the literature under consideration follow a uniform distribution for "false positives".

    This is the most critical assumption of the paper -- after all, without it you'd just be clustering p-values without any interpretation for what each component corresponds to. You cite the theoretical result that p-values are uniformly distributed _if the null hypothesis is true_, but this is not realistic enough to stand on its own.

    The null hypothesis being true is a statement about the whole noise distribution in a study, and that entire distribution needs to be correct for the p-value to be uniformly distributed. This assumption is violated all the time -- for example, when papers report the results of regressions where the errors may be t-distributed instead of normally distributed as assumed under the null, or when the errors are heteroskedastic instead of homoskedastic as assumed under the null. Every kind of misspecification in the null distribution will yield a different non-uniform distribution over p-values.
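One concrete instance of a misspecified null, simpler than the t-distributed or heteroskedastic errors mentioned above and chosen only for illustration: compute a one-sample t-statistic on n = 5 observations for which the null is true, but refer it to the standard normal instead of its t distribution. The function name and settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def misspecified_null_rate(n=5, n_sims=20_000, alpha=0.05, seed=5):
    """Share of null datasets called significant when a one-sample t-statistic is
    (wrongly) referred to the standard normal instead of a t distribution with n - 1 df."""
    rng = random.Random(seed)
    norm = NormalDist()
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the null (mean 0) is true
        mean = sum(x) / n
        sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
        t = mean / (sd / math.sqrt(n))
        p = 2 * (1 - norm.cdf(abs(t)))  # misspecified reference distribution
        hits += p < alpha
    return hits / n_sims


rate = misspecified_null_rate()
print(f"share of null datasets with p < 0.05: {rate:.3f}")
```

With n = 5 this share should come out near 0.12 (the two-sided tail of a t distribution with 4 degrees of freedom beyond 1.96) instead of 0.05, so these null p-values are not uniform, and the excess sits exactly where a flexible true-positive component can absorb it.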

    This has serious implications for your model because the component of your mixture distribution that models false positives is inflexibly uniform, while the component that models true positives is a flexible beta distribution. That flexibility makes it more likely that p-values from misspecified models will be classified as "true positives", meaning that the more prone a discipline is to misspecifying a null hypothesis, the more likely your measure is to call the results "true positives". It should be a red flag that in your paper observational studies have a lower false positive rate than experimental studies do.

    You cite the success of the FDR literature as validation of this assumption. However, in some cases in the FDR literature, this assumption has been called into question (see Efron's "Large Scale Inference" book and the papers it's based on where he considers using an "empirical null" for microarray studies). More importantly, this context is very different from the standard FDR application. When it's the same experiment that you're running hundreds of thousands of times in parallel, you can usually get a good sense of what the true noise distribution is, and use that in specifying your null. But doing this sort of model validation on an entire literature seems quite daunting. Indeed, the propensity to misspecify is literature-, or even experimental-design-, specific. It's not realistic to cite the seeming uniformity of null p-values in one context as reason to believe that they will be uniform in another.

    The only way to respond to this is to show that false positive p-values _in the fields that you are studying_ do actually follow a uniform distribution, or to try to measure what the false positive distribution actually looks like and to incorporate that empirical estimate into your model. Admittedly, this requires collecting data on papers that were actually shown to be false, which is much more difficult to automate. But until you do this, critics who already doubt that p-values are a good indicator of valid scientific results are not going to simply spot you the assumption that the study authors specified a null hypothesis with the correct noise distribution.

    • jtleek

      Very thoroughly documented. Thanks for pointing this out. It is very similar to the arguments of Gelman and Neuroskeptic. I don't ask them to spot me the assumption and I'm not wed to the exact number 14%. What I ask is that we together come up with some specific hypotheses to test about robustness. If the estimates are reasonably robust to departures of the uniformity assumption - great. If not, I hope that our methodology - and our approach of trying to use data rather than theoretical conjecture - will inspire improvements and more robust estimates.

      However, I think that new models should be proposed, like ours. I also think they deserve to be critically evaluated with specific changes in hypotheses. We will certainly find cases where our estimates are not robust and those should be explored. But I'd strongly prefer to respond to specific cases, like the one Michael points out above with unreported multiple testing, than to respond to vague criticisms and statements that this is all "hopeless" as some of the critics have suggested.

  • John Storey

    Jeff -- I would like to commend you and Leah Jager for having the fortitude to tackle a challenging and controversial problem, and to openly share your data, code, and results with the community. Whereas the Ioannidis paper is almost entirely speculative, based on inputting unjustified numbers into the false positive paradox argument, you and Jager collected a substantial amount of data and fit a model to the data. You clearly stated your assumptions and made your analysis transparent, which has allowed others to quickly understand the results and form their own opinion.

    If the null p-values are systematically non-Uniform in the anti-conservative direction, then of course you're under-estimating the false discovery rate in the literature. I imagine this is why you clearly stated the Uniform distribution as an assumption -- so that readers who disagree with the assumption can judge the results through that lens. Also, if one doesn't believe that hypothesis testing is a valid scientific approach, then of course this individual will be critical of the paper.

    I only wish the statistics community had approached the Ioannidis paper with the level of scrutiny and intensity that is being applied to yours. I think your paper calls into question the lack of data supporting Ioannidis' claim. It would be interesting to determine how non-Uniform the p-values need to be before the false discovery rate is >50%, which is the claim that Ioannidis made. I'm assuming when he says "most" he means FDR > 50%.

    • Rafael Irizarry

      I second that...

      • Brian Caffo

        Thirded. When reading a lot of the criticisms, I think a key point is being missed. The manuscript is in the terms and framework from Ioannidis' paper. So, for example, critiques of sharp null testing versus scientific significance are irrelevant.

      • Michael Ezewoko

        Definitely, Professor Leek - I salute your transparency on the issue, your methods and the verifiable source code. I believe that the days of closed-door articles with heavily proprietary methods, software and unverifiable statements are coming to an end. Cheers!

  • Michael Last

    The paper does a good job arguing against Ioannidis' claim about the prior probabilities of medical studies being true or false. Under the model of Ioannidis' PLOS paper, it shows that the vast majority of published medical studies are true. This is a really cool result.

    However, we know that *many* medical studies fail to replicate. There was a paper a couple of years ago about putting a dead salmon in an MRI scanner and conducting a typical fMRI study. Using techniques commonly accepted in the literature, they found regions of activation (because they didn't correct for multiple testing). Ioannidis has a paper in JAMA showing that a number of highly cited studies failed to replicate in clinical trials; while the clinical trials replicated at a reasonable rate, 5/6 observational studies failed to replicate. Stan Young, at NISS, followed up on this and found that 19/20 observational studies that were followed up with clinical trials failed to replicate.

    So something other than Ioannidis' prior probability argument is necessary to explain what has been seen.

    • jtleek

      You have absolutely hit the right level of specificity with your multiple testing argument. In fact, I think that is likely the biggest potential flaw in our logic.

      I'd love to work with you to try to figure out the range of cases where unreported multiple testing leads to inaccurate science-wise false discovery rates. We can use our code and try out a few things.

      Any interest in working on a paper together on this? I have submitted our original paper for discussion at Biostatistics, but I'm interested in evaluating our assumptions further under specific scenarios like the one you have suggested.

      I'm totally happy to criticize my own work in cases where it doesn't apply. I agree the title of our paper was a bit cheeky, but it served its purpose - to draw attention to the fact that Ioannidis does not use data to make his arguments.

  • Seth Roberts

    You say you think the title of your paper "served its purpose - to draw attention to the fact that Ioannidis does not use data to make his arguments." I disagree. I don't get that from the title nor from the text of your paper.

    The abstract of your paper says that the original paper ("Why most published results are false") "suggest[s] that most published medical research is false" and that the original paper "call[s] into question" the accuracy of published results. I disagree here, too. Since its arguments were backed by no data at all, to me at least the original paper did not suggest most published research is false. It suggested nothing about how often papers are false. Nor did it imply anything about the accuracy of published results. You seem to be overestimating the original paper.

  • Vladimir Morozov


    I have a suggestion: test your methodology using only ALS (amyotrophic lateral sclerosis) animal drug-efficacy p-values. Since none of the ALS animal studies were reproduced, close to 100% of the published p-values are false. It would be interesting to see what your model shows for these p-values.

    Some background to support the claim that practically all the p-values from the ALS animal studies can be considered false: I work for als.net, which has re-tested (in the same animal model) probably all of the most significant "therapeutic" effects and failed to confirm them. Another failed effort to confirm the published results was a $1M prize (http://www.prize4life.org/page/prizes/treatment_prize) offered to anybody (including folks who published the papers about "successful" therapies) who could show a modest but still statistically significant effect.

    I am ready to help with searching and parsing the published ALS papers, if you are ready to run your analysis on the extracted p-values.


  • David Colquhoun

    I suspect the answer depends on what you count as "medical research". I can believe that Ioannidis was right for clinical trials, observational epidemiology, microarray data, some parts of psychology and fMRI work. If, on the other hand, you include the harder end of physiology and biophysics, I'd guess that the fraction of untrue results would be a good deal smaller. (CoI, I do the latter).

  • ezracolbert

    Looking at papers in NEJM and Lancet is like asking whether statistics PhDs publish more than one paper a year, using people who became full professors with tenure at Harvard/Stanford/CMU before the age of 30 as your study set.

    Or, if you asked one of my colleagues whether they are thinking of sending their next paper to JAMA/Lancet, etc., their normal answer would be "I wish the paper were that good". It's a conversation I've had many times: "Are you thinking of sending this to [Lancet]?" "Well, maybe if I get this next super exciting result..."

    My math stops at high school algebra, so to me it is (Arthur C. Clarke) magic that you can infer what proportion of p-values are right, but like most scientists, I don't really question stuff in other fields I don't understand.

    But still, I'm not sure I believe it, especially as microarrays are such a different thing. I find it hard to believe that models that work in microarray land (I do know about microarrays) work in publication land.

    However, if this really works, I think it would be of quite general interest and worth a publication in a good journal like Nature or Science.
    PS: the link at the end of the arXiv PDF to the supplementary info is broken, at least in my PDF viewer.


  • ezracolbert

    A simple empirical test of your method
    I think most biomedical researchers will take it as true that the proportion of false positive p-values (this is a little backward - a false pos is a p that is truly >0.05, but is reported as <0.05) will INCREASE as you go from top journals like Lancet and NEJM to lower-ranked journals; not to put too fine a point on it, as you descend into the nether regions of biomedical research publishing, there should be a big change.

    If you found that the proportion of false pos Ps increased or stayed the same, I would take that to mean that your method is flawed
    If you find the proportion decreasing, that would be consistent with the model
