An international study to increase concordance in Ki67 scoring
Although an important biomarker in breast cancer, Ki67 lacks scoring standardization, which has limited its clinical use. Our previous study found variability when laboratories used their own scoring methods on centrally stained tissue microarray slides. In the current study, 16 laboratories from eight countries calibrated to a specific Ki67 scoring method and then scored 50 centrally MIB-1 stained tissue microarray cases. Simple instructions prescribed the scoring pattern and staining thresholds for determining the percentage of stained tumor cells. To calibrate, laboratories scored 18 'training' and 'test' web-based images. Software tracked object selection and scoring. Calibration success was prespecified as a Root Mean Square Error of scores relative to the reference <0.6 and a Maximum Absolute Deviation from the reference <1.0 (log2-transformed data). The prespecified success criterion for tissue microarray scoring was an intraclass correlation significantly >0.70, with a target observed intraclass correlation of ≥0.90. Laboratory performance showed non-significant but promising trends of improvement through the calibration exercise (mean Root Mean Square Error decreased from 0.6 to 0.4 and mean Maximum Absolute Deviation from 1.6 to 0.9; paired t-test: P=0.07 for Root Mean Square Error, P=0.06 for Maximum Absolute Deviation). For tissue microarray scoring, the intraclass correlation estimate was 0.94 (95% credible interval: 0.90-0.97), markedly and significantly >0.70, the prespecified minimum target for success. Some discrepancies persisted, including around clinically relevant cutoffs. After calibrating to a common scoring method via a web-based tool, laboratories can achieve high inter-laboratory reproducibility in Ki67 scoring on centrally stained tissue microarray slides. Although these data are encouraging and suggest that it may be possible to standardize Ki67 scoring among pathology laboratories, clinically important discrepancies persist.
Before this biomarker could be recommended for clinical use, future research will need to extend this approach to biopsies and whole sections, account for staining variability, and link to outcomes.
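
The calibration criteria described above (Root Mean Square Error <0.6 and Maximum Absolute Deviation <1.0 on log2-transformed scores) can be illustrated with a minimal sketch. This is not the study's software: the function names, the input format (one Ki67 percentage per image for a laboratory and for the reference), and the small `offset` added before the log2 transform to avoid taking the logarithm of zero are all assumptions made for illustration.

```python
import math


def calibration_metrics(scores, reference, offset=0.1):
    """Return (rmse, max_abs_dev) of log2-transformed scores vs. reference.

    `scores` and `reference` are equal-length sequences of Ki67
    percentages, one per image. `offset` is a hypothetical constant added
    before the log2 transform so that a score of 0 is defined; it is an
    assumption of this sketch, not a value stated in the abstract.
    """
    if len(scores) != len(reference) or len(scores) == 0:
        raise ValueError("scores and reference must be non-empty and equal in length")

    # Per-image difference on the log2 scale.
    diffs = [
        math.log2(s + offset) - math.log2(r + offset)
        for s, r in zip(scores, reference)
    ]

    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    max_abs_dev = max(abs(d) for d in diffs)
    return rmse, max_abs_dev


def passes_calibration(rmse, max_abs_dev, rmse_limit=0.6, mad_limit=1.0):
    """Apply the prespecified thresholds quoted in the abstract."""
    return rmse < rmse_limit and max_abs_dev < mad_limit


if __name__ == "__main__":
    # Made-up example values, for demonstration only.
    lab = [5.0, 12.0, 30.0, 48.0]
    ref = [6.0, 10.0, 28.0, 55.0]
    rmse, mad = calibration_metrics(lab, ref)
    print(f"RMSE={rmse:.3f}, MAD={mad:.3f}, pass={passes_calibration(rmse, mad)}")
```

Because the differences are taken on the log2 scale, a Maximum Absolute Deviation of 1.0 corresponds to a score that is roughly double or half the reference value, which is why that threshold is stricter for low Ki67 percentages in absolute terms.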