An international multicenter study to evaluate reproducibility of automated scoring for assessment of Ki67 in breast cancer
The nuclear proliferation biomarker Ki67 has potential prognostic, predictive, and monitoring roles in breast cancer. Unacceptable between-laboratory variability has limited its clinical value. The International Ki67 in Breast Cancer Working Group investigated whether Ki67 immunohistochemistry can be analytically validated and standardized across laboratories using automated machine-based scoring. Sets of pre-stained core-cut biopsy sections of 30 breast tumors were circulated to 14 laboratories for scanning and automated assessment of the average and maximum percentage of tumor cells positive for Ki67. Seven unique scanners and 10 software platforms were involved in this study. Pre-specified analyses included evaluation of reproducibility between all laboratories (primary) as well as among those using scanners from a single vendor (secondary). The primary reproducibility metric was intraclass correlation coefficient between laboratories, with success considered to be intraclass correlation coefficient >0.80. Intraclass correlation coefficient for automated average scores across 16 operators was 0.83 (95% credible interval: 0.73-0.91) and intraclass correlation coefficient for maximum scores across 10 operators was 0.63 (95% credible interval: 0.44-0.80). For the laboratories using scanners from a single vendor (8 score sets), intraclass correlation coefficient for average automated scores was 0.89 (95% credible interval: 0.81-0.96), which was similar to the intraclass correlation coefficient of 0.87 (95% credible interval: 0.81-0.93) achieved using these same slides in a prior visual-reading reproducibility study. Automated machine assessment of average Ki67 has the potential to achieve between-laboratory reproducibility similar to that for a rigorously standardized pathologist-based visual assessment of Ki67. 
The observed intraclass correlation coefficient was lower for maximum than for average scoring methods, suggesting that maximum score methods may be suboptimal for consistent measurement of proliferation. Automated average scoring methods show promise for assessment of Ki67, but require further standardization and subsequent clinical validation.
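
For readers unfamiliar with the reproducibility metric, the sketch below shows one common way to compute a between-laboratory intraclass correlation coefficient from a tumors-by-laboratories score table. It is an illustration only: it uses the classical two-way random-effects, absolute-agreement, single-rater estimator, ICC(2,1), whereas the study reports credible intervals and so presumably used a Bayesian model that is not reproduced here. The function name and the example scores are hypothetical and are not study data.

```python
def icc_two_way_random(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per tumor, one column per laboratory.
    Assumes a complete table (every laboratory scored every tumor).
    """
    n = len(scores)        # number of tumors (subjects)
    k = len(scores[0])     # number of laboratories (raters)

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-tumor mean square
    ms_cols = ss_cols / (k - 1)               # between-laboratory mean square
    ms_err = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical Ki67 percentages: 4 tumors scored by 3 laboratories.
example_scores = [
    [12.0, 14.0, 11.0],
    [35.0, 38.0, 33.0],
    [5.0, 7.0, 6.0],
    [60.0, 55.0, 58.0],
]
icc = icc_two_way_random(example_scores)
# Success criterion stated in the abstract: ICC > 0.80.
print(f"ICC(2,1) = {icc:.2f}, meets 0.80 threshold: {icc > 0.80}")
```

Because this estimator penalizes systematic offsets between laboratories as well as random disagreement, a laboratory that consistently scores higher than the others lowers the coefficient even if it ranks the tumors identically.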