A recent comment in Nature calls for an end to the uncritical reporting of p-values as the main criterion for accepting or rejecting a hypothesis. The authors claim that p-value reporting fosters a dichotomous way of thinking, which leads to misinterpretation of results. In other words, a significant result does not mean there is an actual ‘difference’, nor does a lack of significance mean one should discard a potential difference.
“Don’t say statistically significant”
The main critique has little to do with the statistical use of p-values; in fact, the authors of the comment do not suggest that p-values should be banned from research. Rather, it has to do with the way they are interpreted, which eases the path to straightforward answers instead of confronting uncertainty. Hence, the criticism does not necessarily affect the way things are done, but the way things are said and shown.
As an alternative, they suggest talking about ‘compatibility intervals’ instead of confidence intervals, that is, concluding, for instance, that some results are compatible with hypothesis x or y. They also indicate that more importance should be given to the point estimate, as it is the value most compatible with the data, and values near it are more plausible than those farther away. Furthermore, the thresholds used in significance tests (usually 0.05) are arbitrary and often not justified. Finally, whether a result is significant or not might depend on the statistical assumptions made in the design of the model.
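To make the idea of reporting an estimate together with its interval more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the data are made up, the function name `compatibility_interval` is hypothetical, and the interval uses a simple normal approximation for the difference between two means (for samples this small a t-based interval would be more appropriate).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def compatibility_interval(a, b, level=0.95):
    """Return (estimate, low, high) for mean(a) - mean(b).

    Uses a normal approximation with unpooled sample variances.
    This is an illustrative sketch, not a recommended default.
    """
    estimate = mean(a) - mean(b)
    # Standard error of the difference between two independent means
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    # Two-sided critical value for the requested level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical measurements, invented for this example
treated = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 6.1]
control = [4.9, 4.5, 5.2, 5.0, 5.6, 4.4, 5.1, 5.3]

est, low, high = compatibility_interval(treated, control)
print(f"Estimated difference: {est:.2f}")
print(f"Values from {low:.2f} to {high:.2f} are compatible with the data")
```

The point of the wording in the last line is that the output describes a range of effect sizes compatible with the data, with the estimate as the most compatible value, rather than a yes/no verdict.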
Searching for alternatives
If one of Nature’s comments presented the problem, the other piece presented some alternatives. Here five statisticians offered some advice, and I was happy to see that one of them was J. Leek. I must confess I greatly admire the work of Jeff Leek and his group, and have followed their blog Simply Statistics for many years. Below I summarize some of those solutions:
- Use graphs to illustrate differences (preferably bars).
- Report in a non-misleading way.
- If theory and common sense go against a statistically significant result, you should question it.
- Be open about your data retrieval, processing and reporting practices.
- Report false-positive risk.
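
As an illustration of the last point, the sketch below computes a false-positive risk in Python using one common formulation: the share of ‘significant’ results that are actually false positives, given a significance threshold, the power of the test, and a prior probability that a real effect exists. The function name and the input values are my own assumptions for the example, and the prior in particular is something one has to argue for rather than compute.

```python
def false_positive_risk(alpha, power, prior):
    """Fraction of significant results that are false positives.

    alpha: significance threshold (e.g. 0.05)
    power: probability of detecting a real effect
    prior: assumed probability that a real effect exists
    """
    if not (0 < alpha < 1 and 0 < power <= 1 and 0 < prior < 1):
        raise ValueError("alpha, power and prior must be probabilities")
    false_positives = alpha * (1 - prior)  # no real effect, yet significant
    true_positives = power * prior         # real effect, and detected
    return false_positives / (false_positives + true_positives)


# Assumed inputs: threshold 0.05, power 0.8, and two different priors
print(f"{false_positive_risk(0.05, 0.8, 0.1):.2f}")  # → 0.36
print(f"{false_positive_risk(0.05, 0.8, 0.5):.2f}")  # → 0.06
```

With these assumed numbers, a threshold of 0.05 does not translate into a 5% chance of being wrong: when real effects are rare (prior of 0.1), roughly a third of the significant results would be false positives.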