Artificial intelligence in the interpretation of upper extremity trauma radiographs: a systematic review and meta-analysis. Journal Articles

abstract

  • BACKGROUND: Upper extremity fractures represent a significant reason for emergency room visits; however, nonexpert readings commonly lead to diagnostic errors, particularly missed fractures. Artificial intelligence (AI) has emerged as a promising tool to aid in fracture detection, but it has been shown to be comparable to physicians at best, so it remains unclear whether there is value in its increasing implementation. This review aims to analyze the existing literature on AI in the identification and interpretation of upper extremity fractures on X-ray and to assess the diagnostic performance of such AI models.
  • METHODS: Three databases were searched (MEDLINE, Embase, and CENTRAL) for studies involving AI and imaging in upper extremity orthopedics. The review was conducted in adherence to the Preferred Reporting Items for Systematic reviews and Meta-Analyses guidelines. Inclusion criteria were papers that (1) investigated fractures of the upper extremity, (2) included the use of AI models to identify or augment the identification of fractures on imaging, identify characteristics of images, or classify images, and (3) assessed X-ray, computed tomography, or magnetic resonance imaging identification of fractures. Exclusion criteria were papers that (1) were not published in English, (2) were case reports, conference abstracts, editorials, or review articles, (3) related to hand and wrist orthopedics, and (4) reported upper extremity data integrated with nonupper extremity data. Data on fracture detection accuracy, area under the curve, sensitivity, and specificity were recorded. The Quality Assessment of Diagnostic Accuracy Studies score was used to conduct a quality assessment of all included studies. A meta-analysis was conducted on the sensitivity, specificity, and AI-reader differences in sensitivity and specificity on relevant studies.
  • RESULTS: A total of 16 studies were included in this review. The mean accuracy of AI models across 5 studies was 89.9%. The mean area under the curve across 8 studies was 0.932. Across 10 studies, the pooled sensitivity and specificity for fracture detection of the AI models were 97.65% (95% confidence interval [CI]: 97.16%-98.13%, I² = 98.15%) and 91.38% (95% CI: 90.87%-91.89%, I² = 96.91%), respectively. The pooled AI-reader differences in sensitivity and specificity across the same 10 studies were 8.43% (95% CI: 7.97%-8.89%, I² = 79.45%) and 1.98% (95% CI: 1.47%-2.49%, I² = 90.50%), respectively.
  • DISCUSSION: The AI models show promising diagnostic accuracy in the detection of upper extremity fractures, but there is significant variability in the results. Future studies should investigate whether factors such as model type or anatomic location influence accuracy in order to guide physicians on where such models will meet a minimum standard of accuracy.
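
The pooled estimates, confidence intervals, and I² values reported above can be illustrated with a minimal sketch. The abstract does not state which pooling model the authors used, so the fixed-effect inverse-variance approach below is an assumption, and the function name `pool_proportions` and the study counts are hypothetical, not data from the review.

```python
import math

def pool_proportions(events, totals):
    """Fixed-effect inverse-variance pooling of per-study proportions.

    Illustrative only: the review's actual pooling method is not stated
    in the abstract. Assumes 0 < proportion < 1 in every study.
    Returns (pooled estimate, 95% CI tuple, I^2 in percent).
    """
    props = [e / n for e, n in zip(events, totals)]
    # Binomial variance of each study's proportion
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    weights = [1 / v for v in variances]
    total_w = sum(weights)

    pooled = sum(w * p for w, p in zip(weights, props)) / total_w
    se = math.sqrt(1 / total_w)
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)

    # Cochran's Q and the I^2 heterogeneity statistic
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    df = len(props) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2

# Hypothetical example: true positives and fracture counts from three studies
pooled, ci, i2 = pool_proportions(events=[90, 180, 45], totals=[100, 200, 60])
print(f"pooled sensitivity = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), I2 = {i2:.1f}%")
```

An I² near 0% means the studies agree beyond what sampling error would explain, while values above roughly 75% are conventionally read as considerable heterogeneity, which is the variability the discussion refers to.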

publication date

  • August 2025