The fragility of statistically significant findings from randomized trials in spine surgery: a systematic survey

Abstract

BACKGROUND CONTEXT: Randomized controlled trials (RCTs) are the most trustworthy source of evidence for evaluating treatment effects, but RCTs of spine surgery interventions often produce discordant results. The Fragility Index is a novel metric that quantifies the robustness of statistically significant results.

PURPOSE: The aim was to determine the robustness of statistically significant results from RCTs of spine surgery interventions.

STUDY DESIGN/SETTING: This was a systematic survey.

PATIENT SAMPLE: The sample included RCTs of spine surgery interventions.

OUTCOME MEASURES: The Fragility Index is the minimum number of patients in a trial whose status would have to change from a nonevent to an event to turn a statistically significant result into a nonsignificant one. Events refer to the occurrence of any dichotomous outcome, such as successful fusion, incident fracture, adjacent segment degeneration, or achievement of a certain functional score. A small Fragility Index indicates that the statistical significance of a result hinges on only a few events, whereas a large Fragility Index increases confidence in the observed treatment effect.

METHODS: We systematically reviewed a database for evidence-based orthopedics and identified all RCTs that reported at least one statistically significant outcome (ie, p<.05). Two reviewers independently assessed eligibility and extracted data. We used the Fisher exact test to compute Fragility Index values and multivariable linear regression to evaluate potentially associated factors.

RESULTS: We identified 40 eligible RCTs with a median sample size of 132 patients (interquartile range [IQR] 79-208) and a median of 31 total outcome events for the chosen outcome (IQR 13-63). The median Fragility Index was two (IQR 1-3), meaning that in the median trial, adding two events to one treatment arm was enough to eliminate statistical significance. The Fragility Index was three events or fewer in 75% of the trials and was less than or equal to the number of patients lost to follow-up in 65% of the trials. Fragility Index values correlated positively with total sample size (r=0.35; p<.05). After adjustment for loss to follow-up and risk of bias, higher Fragility Index values were associated only with smaller (that is, more significant) reported p values (p<.01).

CONCLUSIONS: Statistically significant results in spine surgery RCTs are frequently fragile: the addition of only a small number of outcome events can eliminate their significance entirely. Surgeons, researchers, and other evidence users should exercise caution when interpreting findings from RCTs with low Fragility Index values and when applying these results to patient care.
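As a minimal sketch of how a Fragility Index can be computed for one dichotomous outcome, the Python code below repeatedly converts a nonevent into an event in one arm and re-runs a two-sided Fisher exact test until the p value reaches .05. It assumes, following the commonly used definition of the index, that events are added to the arm with fewer events while that arm's size stays fixed. The helper names `fisher_two_sided_p` and `fragility_index` are hypothetical and do not come from the survey. The Fisher test is written out with `math.comb` so the example has no dependencies; `scipy.stats.fisher_exact` could be substituted where SciPy is available.

```python
from math import comb


def fisher_two_sided_p(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table

        arm 1: a events, b nonevents
        arm 2: c events, d nonevents

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d      # arm sizes
    col1 = a + c                   # total events across both arms
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that arm 1 holds x of the col1 events, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Hypothetical helper: number of nonevent-to-event conversions, in the
    arm with fewer events, needed for the Fisher p value to reach alpha.

    Returns 0 if the result is not significant to begin with, and None if
    significance survives even when every patient in that arm has an event.
    """
    if events_1 > events_2:
        # Swap so that arm 1 is the arm with fewer events.
        events_1, n_1, events_2, n_2 = events_2, n_2, events_1, n_1
    e = events_1
    while fisher_two_sided_p(e, n_1 - e, events_2, n_2 - events_2) < alpha:
        if e == n_1:
            return None
        e += 1  # one more event and one fewer nonevent; n_1 is unchanged
    return e - events_1


print(fragility_index(0, 5, 5, 5))  # hypothetical trial, 0/5 vs 5/5 events; prints 2
```

In this made-up trial the initial two-sided p value is about .008; one added event raises it to about .048, and a second raises it to about .17, so the Fragility Index is 2, the same as the median reported in the survey. If two or more patients had been lost to follow-up, the result would also fall among the trials whose Fragility Index did not exceed their losses to follow-up.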