“P < 0.05” Might Not Mean What You Think: American Statistical Association Clarifies P Values
Beatrice Grabowski
JNCI: Journal of the National Cancer Institute, Volume 108, Issue 8, August 2016, djw194, https://doi.org/10.1093/jnci/djw194
Published: 10 August 2016
In 2011, the U.S. Supreme Court unanimously ruled in Matrixx Initiatives Inc. v. Siracusano that investors could sue a drug company for failing to report adverse drug effects—even though they were not statistically significant.
Describing the case in the April 2, 2011, issue of the Wall Street Journal, Carl Bialik wrote, “A group of mathematicians has been trying for years to have a core statistical concept debunked. Now the Supreme Court might have done it for them.” That conclusion may have been overly optimistic, since misguided use of the P value continued unabated. However, in 2014 concerns about misinterpretation and misuse of P values led the American Statistical Association (ASA) Board to convene a panel of statisticians and experts from a variety of disciplines to draft a policy statement on the use of P values and hypothesis testing. After a year of discussion, ASA published a consensus statement in American Statistician (doi:10.1080/00031305.2016.1154108).
The statement consists of six principles, in nontechnical language, on the proper interpretation of P values, hypothesis testing, science and policy decision-making, and the necessity for full reporting and transparency of research studies. However, assembling a short, clear statement by such a diverse group took longer and was more contentious than expected. Participants wrote supplementary commentaries, available online with the published statement.
The panel discussed many misconceptions about P values. Test your knowledge: Which of the following is true?
P > 0.05 is the probability that the null hypothesis is true.
1 minus the P value is the probability that the alternative hypothesis is true.
A statistically significant test result (P ≤ 0.05) means that the test hypothesis is false or should be rejected.
A P value greater than 0.05 means that no effect was observed.
If you answered “none of the above,” you may understand this slippery concept better than many researchers. The ASA panel defined the P value as “the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”
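The ASA definition can be made concrete with a small simulation. A permutation test builds the “specified statistical model” directly: under the null hypothesis that group labels are exchangeable, the P value is the fraction of relabeled datasets whose statistical summary is “equal to or more extreme than its observed value.” A minimal sketch, with hypothetical toy data (not from the article):

```python
import random

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation P value for a difference in group means.

    Under the null model that group labels are exchangeable, the P value
    is the fraction of relabeled datasets whose absolute mean difference
    is at least as extreme as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:  # "equal to or more extreme than its observed value"
            hits += 1
    return hits / n_resamples

# Hypothetical toy data: two small treatment groups.
p = permutation_p_value([5.1, 4.9, 6.2, 5.8, 5.5],
                        [4.2, 4.8, 4.4, 5.0, 4.1])
```

Note what the result is and is not: it is a tail probability computed under an assumed model, not the probability that either hypothesis is true.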
Why is the exact definition so important? Many authors use statistical software that presumably is based on the correct definition. “It’s very easy for researchers to get papers published and survive based on knowledge of what statistical packages are out there but not necessarily how to avoid the problems that statistical packages can create for you if you don’t understand their appropriate use,” said Barnett S. Kramer, M.D., M.P.H., JNCI’s former editor in chief and now director of the National Cancer Institute’s Division of Cancer Prevention. (Kramer was not on the ASA panel.)
Part of the problem lies in how people interpret P values. According to the ASA statement, “A conclusion does not immediately become ‘true’ on one side of the divide and ‘false’ on the other.” Valuable information may be lost because researchers may not pursue “insignificant” results. Conversely, small effects with “significant” P values may be biologically or clinically unimportant. At best, such practices may slow scientific progress and waste resources. At worst, they may cause grievous harm when adverse effects go unreported. The Supreme Court case involved the drug Zicam, which caused permanent hearing loss in some users. Another drug, rofecoxib (Vioxx), was taken off the market because of adverse cardiovascular effects. The drug companies involved did not report those adverse effects because of lack of statistical significance in the original drug tests (Rev. Soc. Econ. 2016;74:83–97; doi:10.1080/00346764.2016.1150730).
ASA panelists encouraged using alternative methods “that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes Factors; and other approaches such as decision-theoretic modeling and false discovery rates.” However, any method can be used invalidly. “If success is defined based on passing some magic threshold, biases may continue to exert their influence regardless of whether the threshold is defined by a P value, Bayes factor, false-discovery rate, or anything else,” wrote panelist John Ioannidis, Ph.D., professor of medicine and of health research and policy at Stanford University School of Medicine in Stanford, Calif.
Some panelists argued that the P value per se is not the problem and that it has its proper uses. A P value can sometimes be “more informative than an interval”—such as when “the predictor of interest is a multicategorical variable,” said Clarice Weinberg, Ph.D., who was not on the panel. “While it is true that P values are imperfect measures of the extent of evidence against the null hypothesis, confidence intervals have a host of problems of their own,” said Weinberg, deputy chief of the Biostatistics and Computational Biology Branch and a principal investigator at the National Institute of Environmental Health Sciences in Research Triangle Park, N.C.
Beyond simple misinterpretation of the P value and the associated loss of information, authors routinely, consciously or unconsciously, engage in data dredging (aka fishing, P-hacking) and selective reporting. “Any statistical technique can be misused and it can be manipulated especially after you see the data generated from the study,” Kramer said. “You can fish through a sea of data and find one positive finding and then convince yourself that even before you started your study that would have been the key hypothesis and it has a lot of plausibility to the investigator.”
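Kramer’s “sea of data” is easy to simulate: test enough hypotheses on pure noise and some will cross P < 0.05 by chance alone. The sketch below (an illustration, not from the article) runs many two-group comparisons where every null hypothesis is true, using a simple z-test with known variance:

```python
import random

def fishing_expedition(n_hypotheses=100, n_per_group=20, seed=1):
    """Simulate data dredging: run many tests on pure noise and count
    how many clear the conventional P < 0.05 threshold by chance.

    Both groups are drawn from the SAME normal distribution (mean 0,
    sigma 1), so every "significant" result is a false positive.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_hypotheses):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        mean_diff = sum(a) / n_per_group - sum(b) / n_per_group
        se = (2 / n_per_group) ** 0.5  # known sigma = 1, so a z-test
        if abs(mean_diff / se) > 1.96:  # two-sided P < 0.05
            false_positives += 1
    return false_positives

# With 100 null hypotheses, roughly 5 "findings" are expected by chance.
spurious = fishing_expedition()
```

Reporting only the winners of such an expedition, as if they had been pre-specified, is exactly the practice Kramer describes.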
In response to those practices and concerns about replicability in science, some journals have banned the P value and inferential statistics. Others, such as JNCI, require confidence intervals and effect sizes, which “convey what a P value does not: the magnitude and relative importance of an effect,” wrote panel member Regina Nuzzo, Ph.D., professor of mathematics and computer sciences at Gallaudet University in Washington, D.C. (Nature 2014;506:150–2).
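Nuzzo’s point can be sketched in a few lines: alongside any test, report the effect size and an interval conveying its magnitude. The toy data and the normal-approximation interval below are illustrative assumptions, not from the article:

```python
import statistics

def mean_diff_ci(group_a, group_b, z=1.96):
    """Difference in group means (the effect size) with an approximate
    95% confidence interval, via a normal approximation with the
    standard error pooled from both sample variances."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    return diff, (diff - z * se, diff + z * se)

# Hypothetical toy data: the interval reports how large the effect
# plausibly is, not just whether a threshold was crossed.
effect, (lo, hi) = mean_diff_ci([5.1, 4.9, 6.2, 5.8, 5.5],
                                [4.2, 4.8, 4.4, 5.0, 4.1])
```

An interval like this conveys what a lone P value does not: a tiny but “significant” effect and a large, clinically important one look very different once their magnitudes are on display.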
How can practice improve? Panel members emphasized the need for full reporting and transparency by authors as well as changes in statistics education. In his commentary, Don Berry, Ph.D., professor of biostatistics at the University of Texas M.D. Anderson Cancer Center in Houston, urged researchers to report every aspect of the study. “The specifics of data collection and curation and even your intentions and motivation are critical for inference. What have you not told the statistician? Have you deleted some data points or experimental units, possibly because they seemed to be outliers?” he wrote.
Kramer advised researchers to “consult a statistician when writing a grant application rather than after the study is finished; limit the number of hypotheses to be tested to a realistic number that doesn’t increase the false discovery rate; be conservative in interpreting the data; don’t consider P = 0.05 as a magic number; and whenever possible, provide confidence intervals.” He also suggested, “Webinars and symposia on this issue will be useful to clinical scientists and bench researchers because they’re often not trained in these principles.” As the ASA statement concludes, “No single index should substitute for scientific reasoning.”
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Issue Section: News