“P < 0.05” Might Not Mean What You Think: American Statistical Association Clarifies P Values (2023)

Volume 108 Issue 8 August 2016

    Journal Article

    Beatrice Grabowski

    JNCI: Journal of the National Cancer Institute, Volume 108, Issue 8, August 2016, djw194, https://doi.org/10.1093/jnci/djw194


    10 August 2016

    In 2011, the U.S. Supreme Court unanimously ruled in Matrixx Initiatives Inc. v. Siracusano that investors could sue a drug company for failing to report adverse drug effects—even though they were not statistically significant.

    Describing the case in the April 2, 2011, issue of the Wall Street Journal, Carl Bialik wrote, “A group of mathematicians has been trying for years to have a core statistical concept debunked. Now the Supreme Court might have done it for them.” That conclusion may have been overly optimistic, since misguided use of the P value continued unabated. However, in 2014 concerns about misinterpretation and misuse of P values led the American Statistical Association (ASA) Board to convene a panel of statisticians and experts from a variety of disciplines to draft a policy statement on the use of P values and hypothesis testing. After a year of discussion, ASA published a consensus statement in American Statistician (doi:10.1080/00031305.2016.1154108).

    The statement consists of six principles in nontechnical language on the proper interpretation of P values, hypothesis testing, science and policy decision-making, and the necessity for full reporting and transparency of research studies. However, assembling a short, clear statement by such a diverse group took longer and was more contentious than expected. Participants wrote supplementary commentaries, available online with the published statement.

    The panel discussed many misconceptions about P values. Test your knowledge: Which of the following is true?

    • P > 0.05 is the probability that the null hypothesis is true.

    • 1 minus the P value is the probability that the alternative hypothesis is true.

    • A statistically significant test result (P ≤ 0.05) means that the test hypothesis is false or should be rejected.

    • A P value greater than 0.05 means that no effect was observed.

    If you answered “none of the above,” you may understand this slippery concept better than many researchers. The ASA panel defined the P value as “the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”
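
    The ASA definition can be made concrete with a small sketch. Assuming made-up measurements for two groups, and taking "group labels do not matter" as the specified statistical model, an exact permutation test counts the share of relabelings whose mean difference is equal to or more extreme than the observed one. The function name and the data are illustrative assumptions, not part of the ASA statement.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in sample means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    relabelings = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        relabelings += 1
        # "Equal to or more extreme than its observed value"
        # (small tolerance guards against floating-point ties).
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / relabelings


# Invented illustrative measurements, not data from any study.
treated = [12, 14, 15, 17]
control = [9, 10, 11, 13]
p = permutation_p_value(treated, control)
print(f"p = {p:.4f}")
```

    With these invented numbers, 4 of the 70 possible relabelings are at least as extreme as the observed split, so p = 4/70, or about 0.057. That figure describes how surprising the data would be under the model. It is not the probability that the model is true.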

    Why is the exact definition so important? Many authors use statistical software that presumably is based on the correct definition. “It’s very easy for researchers to get papers published and survive based on knowledge of what statistical packages are out there but not necessarily how to avoid the problems that statistical packages can create for you if you don’t understand their appropriate use,” said Barnett S. Kramer, M.D., M.P.H., JNCI’s former editor in chief and now director of the National Cancer Institute’s Division of Cancer Prevention. (Kramer was not on the ASA panel.)

    Part of the problem lies in how people interpret P values. According to the ASA statement, “A conclusion does not immediately become ‘true’ on one side of the divide and ‘false’ on the other.” Valuable information may be lost because researchers may not pursue “insignificant” results. Conversely, small effects with “significant” P values may be biologically or clinically unimportant. At best, such practices may slow scientific progress and waste resources. At worst, they may cause grievous harm when adverse effects go unreported. The Supreme Court case involved the drug Zicam, which caused permanent hearing loss in some users. Another drug, rofecoxib (Vioxx), was taken off the market because of adverse cardiovascular effects. The drug companies involved did not report those adverse effects because of lack of statistical significance in the original drug tests (Rev. Soc. Econ. 2016;74:83–97; doi:10.1080/00346764.2016.1150730).
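
    The point that a "significant" P value can accompany an unimportant effect can be sketched numerically. The example below assumes a hypothetical difference of 0.5 units between two groups on a scale with a known standard deviation of 10 units, and uses a simple two-sided z-test; the helper name and all numbers are assumptions for illustration only.

```python
from math import erfc, sqrt


def two_sided_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means of two equal-size groups.

    Assumes a known, common standard deviation `sd` (a simplification).
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # 2 * (1 - Phi(|z|)) written with erfc for numerical accuracy.
    return erfc(abs(z) / sqrt(2))


# The same tiny effect, observed at increasing sample sizes.
p_by_n = {n: two_sided_z_p_value(0.5, 10.0, n) for n in (100, 1_000, 10_000, 100_000)}
for n, p in p_by_n.items():
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

    The effect size never changes, yet the P value crosses 0.05 once the groups are large enough. The P value alone therefore says nothing about whether the effect matters biologically or clinically.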

    ASA panelists encouraged using alternative methods “that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes Factors; and other approaches such as decision-theoretic modeling and false discovery rates.” However, any method can be used invalidly. “If success is defined based on passing some magic threshold, biases may continue to exert their influence regardless of whether the threshold is defined by a P value, Bayes factor, false-discovery rate, or anything else,” wrote panelist John Ioannidis, Ph.D., professor of medicine and of health research and policy at Stanford University School of Medicine in Stanford, Calif.

    Some panelists argued that the P value per se is not the problem and that it has its proper uses. A P value can sometimes be “more informative than an interval”—such as when “the predictor of interest is a multicategorical variable,” said Clarice Weinberg, Ph.D., who was not on the panel. “While it is true that P values are imperfect measures of the extent of evidence against the null hypothesis, confidence intervals have a host of problems of their own,” said Weinberg, deputy chief of the Biostatistics and Computational Biology Branch and principal investigator of the National Institute of Environmental Health Sciences in Research Triangle Park, N.C.

    Beyond simple misinterpretation of the P value and the associated loss of information, authors consciously or unconsciously but routinely engage in data dredging (aka fishing, P-hacking) and selective reporting. “Any statistical technique can be misused and it can be manipulated especially after you see the data generated from the study,” Kramer said. “You can fish through a sea of data and find one positive finding and then convince yourself that even before you started your study that would have been the key hypothesis and it has a lot of plausibility to the investigator.”
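
    Why fishing through data so reliably produces a "positive finding" can be sketched with a small calculation. The sketch assumes that every null hypothesis is true, that the tests are independent, and that each P value is therefore uniformly distributed between 0 and 1. The function names and the choice of 20 tests are illustrative assumptions.

```python
import random


def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent true-null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_dredging(n_tests, n_studies=20_000, alpha=0.05, seed=0):
    """Fraction of simulated studies in which at least one test looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null with a continuous test statistic,
        # each p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


analytic = chance_of_any_false_positive(20)
simulated = simulate_dredging(20)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

    Under these assumptions, testing 20 unrelated hypotheses yields at least one P < 0.05 in roughly 64% of studies even when no real effect exists. This is why a hypothesis chosen after seeing the data carries much less weight than one stated in advance.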

    In response to those practices and concerns about replicability in science, some journals have banned the P value and inferential statistics. Others, such as JNCI, require confidence intervals and effect sizes, which “convey what a P value does not: the magnitude and relative importance of an effect,” wrote panel member Regina Nuzzo, Ph.D., professor of mathematics and computer sciences at Gallaudet University in Washington, D.C. (Nature 2014;506:150–2).

    How can practice improve? Panel members emphasized the need for full reporting and transparency by authors as well as changes in statistics education. In his commentary, Don Berry, Ph.D., professor of biostatistics at the University of Texas M.D. Anderson Cancer Center in Houston, urged researchers to report every aspect of the study. “The specifics of data collection and curation and even your intentions and motivation are critical for inference. What have you not told the statistician? Have you deleted some data points or experimental units, possibly because they seemed to be outliers?” he wrote.

    Kramer advised researchers to “consult a statistician when writing a grant application rather than after the study is finished; limit the number of hypotheses to be tested to a realistic number that doesn’t increase the false discovery rate; be conservative in interpreting the data; don’t consider P = 0.05 as a magic number; and whenever possible, provide confidence intervals.” He also suggested, “Webinars and symposia on this issue will be useful to clinical scientists and bench researchers because they’re often not trained in these principles.” As the ASA statement concludes, “No single index should substitute for scientific reasoning.”

    © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

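    What does it mean when the p-value for a test is 0.05, and what does this value represent?

    A p-value of 0.05 means that, if the null hypothesis (for example, no difference between the means) and the other model assumptions were true, a result at least as extreme as the one observed would occur about 5% of the time. By convention, if the p-value is less than 0.05, we reject the null hypothesis and call the difference statistically significant. If the p-value is larger than 0.05, we cannot conclude that a difference exists, but we also have not shown that there is none.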

    When using a 0.05 level of significance, what is the p-value and what is your conclusion?

    If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. By convention, this means the evidence is treated as strong enough to reject the null hypothesis in favor of the alternative hypothesis.

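    What does a P value of 0.05 mean (Quizlet)?

    A P value of 0.05 does not mean that the probability of the null hypothesis being true is 1 in 20, or 5%. It means that, if the null hypothesis were true, results at least as extreme as those observed would be expected about 5% of the time. When a statistical test gives a P value of less than 0.05 (P < 0.05), the convention is to reject the null hypothesis. This does not prove that the alternative hypothesis is true.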

    Why is the p-value less than .05 significant?

    Again: A p-value of less than .05 means that there is less than a 5 percent chance of seeing these results (or more extreme results) in the world where the null hypothesis is true. This sounds nitpicky, but it's critical.

    What is meant by significant at p < 0.05 in the context of this experiment?

    Significant at p < 0.05 means that, if the manipulation of the independent variable had no effect, results at least as extreme as those of this experiment would be expected less than 5% of the time. It does not mean that there is a 5% chance that the findings are the result of chance, nor that we can be 95% confident that the difference was caused by the manipulation.

    What is the critical value for this test at the p = 0.05 level?

    To evaluate the statistical significance of an experimental result with two categories (one degree of freedom, df = 1), note that the chi-square critical value at p = 0.05 is 3.841.
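
    The 3.841 figure can be checked without a lookup table, assuming it refers to the chi-square distribution with one degree of freedom. A chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper 5% critical value is the square of the two-sided 5% normal critical value. The variable names below are illustrative.

```python
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal distribution (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)

# Chi-square (df = 1) critical value at the 0.05 level.
chi2_critical_df1 = z_critical ** 2
print(round(chi2_critical_df1, 3))  # 3.841
```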

    What does a .05 level of significance mean we are?

    For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. Lower significance levels indicate that you require stronger evidence before you will reject the null hypothesis.

    What is the symbol for a significance level of 0.05?

    Significance level symbol

    The level of significance is denoted α and is the probability of a type I error (rejecting a null hypothesis that is true); in this case, α = 0.05. Under the null hypothesis, values that deviate further from the expected value are less probable. Findings are described as “significant at x%.” For instance, “significant at 5%” denotes a p-value less than 0.05, or p < 0.05.

    What does a .05 level of significance mean (Quizlet)?

    The alpha level, also known as the significance level, is normally set to .05, which means that we may reject the null hypothesis only if the observed data are so unusual that they would have occurred by chance at most 5 percent of the time if the null hypothesis were true.

    What does an alpha of 0.05 mean: that the result you obtained could be expected to occur 5 times out of 100?

    An alpha level represents the number of times out of 100 you are willing to be incorrect when rejecting a null hypothesis that is actually true. If you choose an alpha level of 0.05, then in situations where the null hypothesis is true you will incorrectly reject it about 5 times out of 100. Those five times, both means would come from the same population.

    What does the p-value of a statistical result mean?

    The P value is defined as the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The P stands for probability. It measures how compatible the data are with the null hypothesis; it does not measure how likely it is that an observed difference between groups is due to chance.

    What can you determine from p-values that are less than .05 (Quizlet)?

    The only p-values that are less than .05 are for the Intercept (which we do not assess for significance) and ERA.

    What if the p-value is greater than 0.05 in a correlation?

    If the P-value is bigger than the significance level (α = 0.05), we fail to reject the null hypothesis. We conclude that the correlation is not statistically significant. In other words, we cannot conclude that there is a significant linear correlation between x and y in the population.

    What if the p-value is greater than 0.05 in a regression?

    A high P-value (> 0.05) means that we cannot conclude that the explanatory variable affects the dependent variable (here: whether Average_Pulse affects Calorie_Burnage). A high P-value is also called an insignificant P-value.

    What happens if the p-value is less than the significance level?

    If your P value is less than the chosen significance level, then you reject the null hypothesis; that is, you accept that your sample gives reasonable evidence to support the alternative hypothesis.

    How do you interpret the p-value (Quizlet)?

    In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.

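    What does a P-value of 0.01 indicate (Quizlet)?

    A p-value of 0.01 indicates that, if the null hypothesis were true, results at least as extreme as those observed would occur about 1% of the time. It does not indicate that the null hypothesis is true, or that there is weak to no sample evidence against the null hypothesis.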

    What is the p-value in terms of a confidence interval?

    In accordance with the conventional acceptance of statistical significance at a P-value of 0.05 or 5%, confidence intervals (CIs) are frequently calculated at a confidence level of 95%. In general, if an observed result is statistically significant at a P-value of 0.05, then the null value (for example, a difference of zero) should not fall within the 95% CI.
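
    The link between a 95% confidence interval and the 0.05 threshold can be sketched for the simplest case. The example assumes an estimate that is approximately normally distributed with a known standard error (a z-test); the function name and the two example estimates are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_test_and_ci(estimate, standard_error, null_value=0.0, level=0.95):
    """Return the two-sided z-test p-value and the matching confidence interval."""
    z = (estimate - null_value) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    interval = (estimate - z_crit * standard_error,
                estimate + z_crit * standard_error)
    return p_value, interval


# Two hypothetical estimates that fall on either side of the 0.05 line.
results = {}
for estimate in (2.0, 1.9):
    p_value, (low, high) = z_test_and_ci(estimate, standard_error=1.0)
    results[estimate] = (p_value, low, high)
    print(f"estimate = {estimate}: p = {p_value:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
```

    In this setting the interval excludes the null value exactly when P < 0.05. The interval also shows what the P value hides: the range of effect sizes that remain compatible with the data.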

    Is the p-value of a test the smallest?

    The P-value is the smallest significance level that leads us to reject the null hypothesis. Alternatively (and the way I prefer to think of P-values), the P-value is the probability that we'd observe a statistic at least as extreme as the one we did if the null hypothesis were true.


    Article information

    Author: Foster Heidenreich CPA

    Last Updated: 23/10/2023
