To p or not to p—The question of statistical significance: Part 2

In a previous post, we discussed the growing resistance to accepting the validity of p < 0.05 as a measure of a true effect. That debate is far from over and has picked up more steam. A March 2019 article, authored by a zoology professor, a professor of epidemiology and statistics, and a statistical methodologist and professor of marketing, and endorsed by more than 800 signatories, expresses the “pervasive problem” in lay language:

Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.
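
Both mistakes are easy to see with numbers. The sketch below (hypothetical effect estimates and standard errors, not data from any real studies) computes two-sided p-values and 95% confidence intervals for two studies that estimate exactly the same effect. The larger study clears the 0.05 threshold and the smaller one does not, yet the results do not conflict, and the smaller study’s p > 0.05 does not mean “no difference.”

```python
# A minimal sketch, with hypothetical numbers, of the two errors quoted above.
from scipy import stats

def summarize(label, effect, se):
    """Print a two-sided z-test p-value and 95% CI from an effect estimate and its standard error."""
    z = effect / se
    p = 2 * stats.norm.sf(abs(z))                # two-sided p-value
    lo, hi = effect - 1.96 * se, effect + 1.96 * se
    print(f"{label}: effect={effect:.2f}, 95% CI = ({lo:.2f}, {hi:.2f}), p = {p:.3f}")

# Both studies estimate the *same* effect; only the precision differs.
summarize("Study A (large n)", effect=0.20, se=0.08)   # p ~ 0.012 -> below 0.05
summarize("Study B (small n)", effect=0.20, se=0.12)   # p ~ 0.096 -> above 0.05; CI includes zero
```

Study B’s interval includes zero, but its best estimate is identical to Study A’s. Reading B as “no association,” or as contradicting A, would commit exactly the errors the quote describes.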

The authors of “Scientists Rise Up Against Statistical Significance,” published in the journal Nature, call for the entire concept of statistical significance to be abandoned, but they add this qualification:

We are not calling for a ban on P values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way—to decide whether a result refutes or supports a scientific hypothesis.
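
For contrast, here is an equally minimal sketch of the specialized use the authors exempt: a pre-specified quality-control rule, where applying a dichotomous cutoff is the point of the procedure. The process, spec, and numbers are invented for illustration.

```python
# A minimal sketch (hypothetical process and data) of a fixed quality-control
# decision rule, the kind of specialized use exempted in the quote above.
import numpy as np
from scipy import stats

target = 50.0                                       # spec: mean fill weight in grams
rng = np.random.default_rng(0)
batch = rng.normal(loc=50.4, scale=1.0, size=30)    # simulated measurements from one batch

t, p = stats.ttest_1samp(batch, popmean=target)     # does the batch mean deviate from spec?
print(f"t = {t:.2f}, p = {p:.3f}")
if p < 0.05:                                        # cutoff fixed in advance as the decision rule
    print("Flag batch for inspection.")
else:
    print("Batch passes this check.")
```

Here the threshold is not a claim about scientific truth; it is an operating rule chosen in advance, which is precisely the distinction the authors draw.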

Also in March 2019, The American Statistician devoted an entire issue to “Statistical Inference in the 21st Century: A World Beyond p < 0.05.” In the opening editorial, the journal editors reiterate the long list of p-value don’ts—all of which are still common in the literature, so lots of people haven’t gotten the message—and then explain the purpose of the special issue:

Knowing what not to do with p-values is indeed necessary, but it does not suffice. It is as though statisticians were asking users of statistics to tear out the beams and struts holding up the edifice of modern scientific research without offering solid construction materials to replace them. Pointing out old, rotting timbers was a good start, but now we need more.

The 43 articles in the special issue propose new ideas for good statistical practice and represent a carefully considered attempt to move beyond the 2016 “American Statistical Association Statement on P-Values: Context, Process, and Purpose.”

At the close of their introduction to the issue, the editors take a bold step:

The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely.

Worth repeating: “…it is time to stop using the term ‘statistically significant’ entirely.”