Machine learning models predicting risk of revision or secondary knee injury after anterior cruciate ligament reconstruction demonstrate variable discriminatory and accuracy performance: a systematic review.


abstract

  • BACKGROUND: To summarize the statistical performance of machine learning models in predicting revision, secondary knee injury, or reoperation following anterior cruciate ligament reconstruction (ACLR).
  • METHODS: Three online databases (PubMed, MEDLINE, EMBASE) were searched from database inception to February 6, 2024, to identify literature on the use of machine learning to predict revision, secondary knee injury (e.g., anterior cruciate ligament [ACL] or meniscus), or reoperation after ACLR. The authors adhered to the PRISMA and R-AMSTAR guidelines as well as the Cochrane Handbook for Systematic Reviews of Interventions. Demographic data and machine learning specifics were recorded. Model performance was recorded using discrimination, area under the curve (AUC), concordance, calibration, and Brier score. Factors deemed predictive of revision, secondary injury, or reoperation were also extracted. The MINORS criteria were used for methodological quality assessment.
  • RESULTS: Nine studies comprising 125,427 patients with a mean follow-up of 5.82 (0.08-12.3) years were included in this review. Two of nine (22.2%) studies served as external validation analyses. Five (55.6%) studies reported mean AUC (strongest model range: 0.77-0.997). Four (44.4%) studies reported mean concordance (strongest model range: 0.67-0.713). Two studies reported Brier score, calibration intercept, and calibration slope, with values among the highest-performing models ranging from 0.10 to 0.18, 0.0051 to 0.006, and 0.96 to 0.97, respectively. Four studies reported calibration error; all four demonstrated significant miscalibration at either two- or five-year follow-up in 10 of the 14 models assessed.
  • CONCLUSION: Machine learning models designed to predict the risk of revision or secondary knee injury demonstrate variable discriminatory performance when evaluated with AUC or concordance metrics. Calibration is also variable, with several models showing evidence of miscalibration at the two- or five-year mark. The lack of external validation of existing models limits the generalizability of these findings. Future research should focus on validating current models in addition to developing new multimodal neural networks to improve accuracy and reliability.
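For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how two of them are commonly computed for a binary outcome. It is not taken from the review or from any of the included studies. The function names `auc` and `brier_score` and the four-patient toy data are assumptions made purely for illustration. For a binary outcome the concordance statistic coincides with the AUC; time-to-event concordance, as used in survival models, is computed differently and is not shown here.

```python
# Illustrative sketch only: toy data, not results from the review.

def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome.

    Lower is better; 0 is a perfect prediction.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("inputs must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical example: 1 = revision occurred, 0 = no revision.
outcomes = [0, 0, 1, 1]
predicted_risk = [0.10, 0.40, 0.35, 0.80]

print("AUC:", auc(outcomes, predicted_risk))
print("Brier score:", brier_score(outcomes, predicted_risk))
```

Discrimination (AUC, concordance) measures how well a model ranks patients by risk, whereas the Brier score also penalizes predicted probabilities that sit far from the observed outcomes. This is why a model can discriminate well and still be miscalibrated.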

authors

  • Blackman, Benjamin
  • Vivekanantha, Prushoth
  • Mughal, Rafay
  • Pareek, Ayoosh
  • Bozzo, Anthony
  • Samuelsson, Kristian
  • de SA, Darren

publication date

  • January 4, 2025