"Why Most Published Research Findings Are False" by John P.A. Ioannidis
Published August 2005 in PLOS Medicine
The article raises concerns about the validity of current published research findings, emphasizing that many may be false. The truth of a research claim depends on factors like study power, bias, the number of studies on the same question, and the ratio of true to no relationships in a field. Smaller studies, smaller effect sizes, greater number and lesser selection of tested relationships, and greater flexibility in study designs contribute to a higher likelihood of false findings. Additionally, financial interests, prejudices, and multiple research teams chasing statistical significance further distort the probability of true findings. The article suggests that in many scientific fields, especially those with high numbers of studies and low pre-study probability, the majority of published claims are likely false, often reflecting prevailing biases rather than true effects.
The author also discusses the modeling of false positive findings, highlighting the misuse of p-values and the importance of considering pre-study probabilities, study power, and statistical significance. Bias is identified as a significant factor, encompassing various design, data, analysis, and presentation elements that can lead to false findings. The involvement of several independent teams in a field often results in a decrease in the probability of true findings. The article presents several corollaries regarding the likelihood of true research findings, considering factors like study size, effect size, number of tested relationships, study design flexibility, financial interests, and the number of research teams involved.
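The core of the paper's argument is a Bayesian calculation of the positive predictive value (PPV): the chance that a statistically significant finding reflects a true relationship, given the pre-study probability, the study's power, and the significance threshold. A minimal sketch of that calculation (the function name and example numbers are illustrative, not from the paper; Ioannidis parameterizes the same idea using pre-study odds and adds a bias term):

```python
def ppv(prior, power, alpha):
    """Positive predictive value: probability a 'significant' finding is true.

    prior: pre-study probability that the tested relationship is real
    power: probability the study detects a real effect (1 - beta)
    alpha: significance threshold (false-positive rate under the null)
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# A well-powered study (80%) of a long-shot hypothesis (1-in-10 prior)
# at the conventional alpha = 0.05:
print(ppv(prior=0.10, power=0.80, alpha=0.05))  # 0.64
```

Even under these favorable assumptions, more than a third of "significant" findings would be false; lower power, lower priors, or bias push the PPV lower still, which is the paper's central point.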
Is this strong evidence to falsify the claim?
Nancy Drew
Critique by Goodman and Greenland published April 2007
"while we agree that there are more false claims than many would suspect—based both on poor study design, misinterpretation of p-values, and perhaps analytic manipulation—the mathematical argument in the PLoS Medicine paper underlying the “proof” of the title's claim has a degree of circularity."
"Dr. Ioannidis utilizes a mathematical model that severely diminishes the evidential value of studies—even meta-analyses—such that none can produce more than modest evidence against the null hypothesis, and most are far weaker. This is why, in the offered “proof,” the only study types that achieve a posterior probability of 50% or more (large RCTs [randomized controlled trials] and meta-analysis of RCTs) are those to which a prior probability of 50% or more are assigned. So the model employed cannot be considered a proof that most published claims are untrue, but is rather a claim that no study or combination of studies can ever provide convincing evidence."
"In addition to the above problems, the paper claims to have proven something it describes as paradoxical; that the “hotter” an area is (i.e., the more studies published), the more likely studies in that area are to make false claims. We have shown this claim to be erroneous. The mathematical proof offered for this in the PLoS Medicine paper shows merely that the more studies published on any subject, the higher the absolute number of false positive (and false negative) studies. It does not show what the papers' graphs and text claim, viz, that the number of false claims will be a higher proportion of the total number of studies published (i.e., that the positive predictive value of each study decreases with increasing number of studies)."
"the claims that the model employed in this paper constitutes a “proof” that most published medical research claims are false, and that research in “hot” areas is most likely to be false, are unfounded."
Richard Smith, an editor of the British Medical Journal for 13 years, argued in 2006 that the peer review process itself introduces bias against negative results (papers illustrating the failure of a hypothesis tend not to be published, or even submitted), that reviews are incoherent and contradictory, that anonymity cannot be guaranteed on either end of the process, and that it is slow and costly to the scientific community, averaging at the time $5,000 per article published.
It is rather ironic to cite journal articles on the falsify side. I'm providing a New York Times article about James A. Lindsay, Helen Pluckrose, and Peter Boghossian, who got several hoax papers published in peer-reviewed journals. Their story is reported in many other places as well; searching for the authors' names will find them. The fiasco demonstrates that "peer review" guarantees little about the quality of the research being reported.
"The replication crisis is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method, such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.
The beginning of the replication crisis can be traced to a number of events in the early 2010s."
As of 2010, when this study was published, scientific journals didn't disclose their own financial conflicts of interest, even though they require their authors to do so.
This article assessed the quality of evidence for the first-listed primary outcomes in updated Cochrane reviews (considered a high standard of peer review in medicine). It found that only a minority of outcomes for health care interventions are supported by high-quality evidence as laid out through the peer review process. The quality of the evidence did not consistently improve or worsen in updated reviews.
Published February 21, 2020 in BMC, part of Springer Nature
"A reproducibility crisis is a situation where many scientific studies cannot be reproduced. Inappropriate practices of science, such as HARKing, p-hacking, and selective reporting of positive results, have been suggested as causes of irreproducibility.... I propose that a lack of raw data or data fabrication is another possible cause of irreproducibility."
"As of now, most journals from major publishers require or encourage the authors to [provide raw data to readers after publication, upon request,] but do not require them to deposit raw data before publication."
This article describes the author's experience as the editor of a scientific journal. When he asked a subset of manuscript authors to provide their raw data, about half withdrew their submissions, and those that didn't still had many issues in the raw data they provided. In total, 97% of the manuscripts whose authors were asked for raw data did not present data supporting their results. Some of those who withdrew published their manuscripts in other journals that didn't require raw data for publication. However, those journals did require or encourage authors to share their raw data with any readers who asked. So the author of this article asked the authors of those manuscripts for their raw data again, and most failed to respond; those that did respond provided only partial data.
In 2020, a major study claiming to have assessed the safety and efficacy of hydroxychloroquine for treating COVID-19 was published in The Lancet. It was used to shape international health policy and resulted in the delay of other large-scale trials of hydroxychloroquine.
However, it quickly became evident that the study's data was likely faked.
"Scientists and journalists noted that the Lancet paper’s data included impossibly high numbers of cases—exceeding official case or death counts for some continents and coming implausibly close for others. Similar data discrepancies were also identified in two previous studies that had relied on the company’s database. Inquiries by The Scientist and The Guardian, meanwhile, failed to identify any hospital that had contributed to the registry."
It's wild that this study was published in the highly respected, peer-reviewed Lancet, and yet journalists, scientists, and randos on the internet quickly noticed red flags, both about the study and about the company that had produced it.
This article about the scandal, published in October 2020, notes that "While a few parties have since accepted some responsibility and outlined plans to avoid similar situations in the future, the majority have not."
Eric Weinstein discusses how Ghislaine Maxwell's father, Robert Maxwell, diluted the quality of editorship at the leading journals from the mid-1960s onward, such that the scientific method and "peer review" now mean different things than they once did.
The problem isn’t the peer review process itself. Neither is there a “replication problem.”
In science, without confirmation of results, the initial data is unusable and is treated as just one possibility. When a test has inconclusive results, or can't be repeated, that information should be discarded. However, lots of tests don't get repeated, or have inconsistent results. Those still end up being used in the media or shared online. The information is taken at face value, without proper scientific procedure being followed. This leads to the spread of misinformation.
Most scientific hypotheses will prove to be false in the end. That a test cannot be replicated, or that its findings differ from what was expected, is not actually a problem in the sciences. If an experiment can't be replicated, the information from the test is unusable. If a hypothesis proves false or inconclusive, it can still help inform new tests. It's all part of the regular process. Most of what we learn will be falsified, corrected, expanded upon, or changed. Getting the same result across several independent repeats makes a finding far more likely to be accurate than a single test. The problem lies with how the information from experiments like these is spread. Scientific experiments only ever test for one thing at a time, and it happens slowly. People, however, like to extrapolate.
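The value of repeated tests can be made concrete with a toy false-positive calculation. Assuming the repeats are truly independent and each uses a conventional significance threshold of alpha = 0.05 (illustrative numbers, not taken from the discussion above), the chance that a false hypothesis passes every test by luck alone shrinks geometrically:

```python
ALPHA = 0.05  # chance one test of a false hypothesis looks "significant" by luck


def chance_all_k_tests_mislead(alpha, k):
    """Probability that k independent tests of a FALSE hypothesis
    all produce a false positive by chance alone: alpha ** k."""
    return alpha ** k


for k in (1, 2, 3):
    print(f"{k} consistent repeat(s) by pure chance: "
          f"{chance_all_k_tests_mislead(ALPHA, k):.6f}")
```

This sketch covers only false positives and leans on the independence assumption: repeats that share a flawed method or dataset have correlated errors and gain far less, which is one reason independent replication matters so much.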
The problem does not lie with the scientific process, or with the peer review process. The problem lies with how scientific information is spread, talked about, and understood by the general public. The attached article shows the different variables that can arise in experimentation, and ways to try to correct for them.
Is this strong evidence to support the claim?
Judge Judy
You say, "If a hypothesis proves to be false or inconclusive, it can still help inform new tests," but then isn't it a problem with the scientific process/publishing system that the experiments which prove false or inconclusive are much less likely to be published?
I was conflicted about which side to put this article in, since it talks about how "researchers were more likely to 'confirm' past results than refute them — results that did not conform to their expectation were more often systematically discarded or revised" and argues that the analysis portion of scientific studies needs to be blinded to avoid this bias. This would suggest that so-called "triple-blinded" studies, which do this, may be more likely to have reliable results, but it casts doubt on research that doesn't employ this corrective technique.
"Peer Review in Scientific Publications: Benefits, Critiques, & A Survival Guide"
Published October 2014 in The Journal of the International Federation of Clinical Chemistry and Laboratory Medicine
"Peer Review is defined as “a process of subjecting an author’s scholarly work, research or ideas to the scrutiny of others who are experts in the same field” (1). Peer review is intended to serve two primary purposes. Firstly, it acts as a filter to ensure that only high quality research is published, especially in reputable journals, by determining the validity, significance and originality of the study. Secondly, peer review is intended to improve the quality of manuscripts that are deemed suitable for publication. Peer reviewers provide suggestions to authors on how to improve the quality of their manuscripts, and also identify any errors that need correcting before publication."
Goodman and Greenland's critique: https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0040168