How reliable are reliability studies of fracture classifications? A systematic review of their methodologies

Two independent reviewers searched MEDLINE and EMBASE for fracture classification reliability studies. Data were obtained on classifications, image modalities, fracture selection processes, sample sizes and their justification, type and number of raters, practical issues for the classification sessions, statistical methods, and results. A 10-item checklist was devised for quality assessment of methodologies. Forty-four studies assessing 32 fracture classification systems were included. We found wide variation in methodologies. For instance, the median number of raters was 5 (2-36) and the median number of fractures was 50 (10-200). The selection of fractures was considered representative in 17 of the 44 studies. The true distribution of classification categories was estimated in 9 studies. The kappa coefficient was the most commonly used measure of rater agreement (39/44 studies). Methodological issues are discussed. Given limitations in the use and interpretation of kappa coefficients, investigators should consider alternative methods that focus on the accuracy of the classification systems. A systematic methodological approach to developing and validating fracture classification systems needs to be established and adopted.
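
One limitation of kappa noted in the wider statistical literature is that its value depends on how the classification categories are distributed in the sample. The following is a minimal sketch of the standard two-rater Cohen's kappa, not the method of any reviewed study. The function name, the category labels "A" and "B", and both agreement tables are hypothetical and chosen only for illustration. Two raters agree on 90 of 100 fractures in each table, yet kappa differs because the category prevalence differs.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def expand(table):
    """Turn {(label_a, label_b): count} into two parallel lists of ratings."""
    a, b = [], []
    for (label_a, label_b), count in table.items():
        a.extend([label_a] * count)
        b.extend([label_b] * count)
    return a, b


# Hypothetical data: both tables have 90% observed agreement.
balanced = {("A", "A"): 45, ("B", "B"): 45, ("A", "B"): 5, ("B", "A"): 5}
skewed = {("A", "A"): 85, ("B", "B"): 5, ("A", "B"): 5, ("B", "A"): 5}

kappa_balanced = cohen_kappa(*expand(balanced))  # about 0.80
kappa_skewed = cohen_kappa(*expand(skewed))      # about 0.44
print(f"balanced: {kappa_balanced:.2f}, skewed: {kappa_skewed:.2f}")
```

With identical observed agreement, kappa falls from about 0.80 to about 0.44 when one category dominates, which is one reason a kappa value is difficult to interpret without knowing the distribution of categories in the sample.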
