The Durk Pearson & Sandy Shaw®
Life Extension News™
Volume 16 No. 11 • December 2013
The Difference Between Statistical Significance and Biological Meaningfulness
A recent paper1 discusses the importance of distinguishing between a statistically significant result in an experiment and a biologically (or clinically) meaningful one. Statistical significance at p<0.05 means that the null hypothesis, that there is no difference between the treatment and no-treatment conditions, is rejected: if the null hypothesis were true, a difference at least as large as the one observed would occur by chance less than 5% of the time. But what this doesn’t tell you is whether the difference is large enough to make a biological or clinical difference, which is really the important bottom line. Does the difference mean that a patient is likely to get a meaningful benefit from a therapy that is statistically significant in relation to a placebo control? It might, but it might not. Hence, the question is of an entirely different nature. The author of the new paper discussed here1 focuses on experimental design as a critical factor in getting an answer to the question of whether a significant result is meaningful.
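To illustrate the distinction, here is a small simulation of our own (not from the paper; the biomarker scale, means, and group sizes are invented): with a large enough sample, even a biologically trivial difference becomes highly "significant."

```python
# Sketch: statistical significance is not biological meaningfulness.
# All numbers here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000                                # very large groups
control = rng.normal(100.0, 15.0, n)       # hypothetical biomarker, mean 100, sd 15
treated = rng.normal(100.5, 15.0, n)       # true shift of only 0.5 units

t, p = stats.ttest_ind(treated, control)
effect = (treated.mean() - control.mean()) / 15.0   # standardized effect (Cohen's d)

print(f"p = {p:.2e}")               # far below 0.05: "statistically significant"
print(f"Cohen's d = {effect:.3f}")  # tiny effect: unlikely to be meaningful
```

The p value answers only "could chance alone produce a difference this large?"; the effect size is what bears on whether a patient would notice any benefit.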
One of the approaches discussed in the paper is that the p value should be used only to gauge whether the null hypothesis has been rejected, not whether another hypothesis has been accepted. As the author expresses it: “… each experiment is, in effect, one from a population of possible experiments and is thus only an estimate (with a distribution) of the real difference. By ‘bad luck’ the actual experiment performed can give an estimate in one of the tails and the study would be reported as ‘not significant’ even when there is a real difference (a Type II error).” “Importantly, the p value is the probability that a difference as large as or larger than that seen in the experiment would have occurred by chance alone if the treatment groups were in fact not different. It is NOT the probability that the null hypothesis is true, which is a frequent but serious misinterpretation.” “Ioannidis has highlighted the problems in a number of biological fields including genome-wide association studies, gene expression experiments, and the lack of reproducibility of highly cited papers that result from an over-reliance on statistical significance.”
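The author’s “bad luck” point about Type II errors can also be simulated (again our sketch, with invented numbers): when a real but modest difference is studied with small groups, most individual experiments come back “not significant.”

```python
# Sketch of a Type II error: a real difference exists, but small samples
# frequently fail to detect it. All parameters are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, true_shift, sd = 20, 5.0, 15.0     # small groups, modest real effect

trials, misses = 2000, 0
for _ in range(trials):
    a = rng.normal(100.0, sd, n)
    b = rng.normal(100.0 + true_shift, sd, n)
    _, p = stats.ttest_ind(b, a)
    if p >= 0.05:
        misses += 1   # real difference, yet reported "not significant"

print(f"Type II rate ≈ {misses / trials:.2f}")   # most experiments miss the effect
```

Any single one of those “not significant” experiments is just one draw from the population of possible experiments, exactly as the quoted passage describes.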
Scientists have begun developing concepts such as the minimum clinically important difference or the clinically relevant difference, but the evaluation of such measures would clearly depend on an appreciable amount of expert, but nevertheless subjective, judgment. The steering committee of the European Food Safety Authority (EFSA) is reported to recommend that “… the nature and size of biological changes or differences seen in studies that would be considered relevant should be defined before studies are initiated. The size of such changes should be used to design studies with sufficient statistical power to be able to detect effects of such size if they truly occurred.” (quoted in paper #1)
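That recommendation, decide the relevant difference first, then size the study to detect it, can be sketched with a standard normal-approximation sample-size formula (our illustration; the 5-unit relevant difference and sd of 15 are invented):

```python
# Sketch: choose n so a pre-specified relevant difference is detectable
# with adequate power (normal approximation for a two-group comparison).
import math
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a true difference
    of `delta` (two-sided test at `alpha`, desired `power`)."""
    z_a = norm.ppf(1 - alpha / 2)     # critical value for the test
    z_b = norm.ppf(power)             # quantile for the desired power
    return math.ceil(2 * ((z_a + z_b) * sd / delta) ** 2)

# Suppose (hypothetically) a 5-unit change in a biomarker with sd 15
# is the smallest biologically relevant difference:
print(n_per_group(delta=5.0, sd=15.0))   # about 142 subjects per group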
These issues have arisen in the context of equivalence, noninferiority, and superiority studies. “Instead of trying to show that there is no statistically significant difference between two formulations, the objective of these methods is to show that although there may be a difference, this is sufficiently small to be considered as not biologically important or relevant.”