Randomised trial protocols may incorporate interim analyses, with the potential to stop the study for futility if early data show insufficient promise of a treatment benefit. Previously, we have shown that this approach will theoretically lead to mis-estimation of the treatment effect. We now wished to ascertain the importance of this phenomenon in practice.
We reviewed the methods and results in a set of trials that had stopped for futility, identified through an extensive literature search. We recorded clinical areas, interventions, study design, outcomes, trial setting, sponsorship, planned and actual treatment effects, sample sizes; power; and if there was a data safety monitoring board, or a published protocol. We identified: if interim analyses were pre-specified, and how many analyses actually occurred; what pre-specified criteria might define futility; if a futility analysis formed the basis for stopping; who made the decision to stop; and the conditional power of each study, i.e. the probability of statistically significant results if the study were to continue to its complete sample size.
We identified 52 eligible trials, covering many clinical areas. Most trials had multiple centres, tested drugs, and 40% were industry sponsored. There were 75% where at least one interim analysis was planned a priori; a majority had only one interim analysis, typically with about half the target total sample size. A majority of trials did not pre-define a stopping rule, and a variety of reasons were given for stopping. Few studies calculated and reported low conditional power to justify the early stop. When conditional power could be calculated, it was typically low, especially under the current trend hypothesis. However, under the original design hypothesis, a few studies had relatively high conditional power. Data collection often continued after the interim analysis.
Although other factors will typically be involved, we conclude that, from the perspective of conditional power, stopping early for futility was probably reasonable in most cases, but documentation of the basis for stopping was often missing or vague. Interpretation of truncated trials would be enhanced by improved reporting of stopping protocols, and of their actual execution.