abstract
- PURPOSE: To assess the interobserver variability of readers when outlining breast tumors in MRI, study the reasons behind the variability, and quantify the effect of the variability on algorithmic imaging features extracted from breast MRI. METHODS: Four readers annotated breast tumors in the MRI examinations of 50 patients from one institution, using a bounding box to indicate each tumor. All of the annotated tumors were biopsy-proven cancers. The similarity of the bounding boxes was analyzed using Dice coefficients. An automatic tumor segmentation algorithm was then used to segment the tumors from the readers' annotations, and the segmented tumors were compared between readers, again using Dice coefficients as the similarity metric. Cases showing high interobserver variability (average Dice coefficient <0.8) after segmentation were analyzed by a panel of radiologists to identify the reasons for the low level of agreement. Furthermore, an imaging feature quantifying tumor and breast tissue enhancement dynamics was extracted from each patient's segmented tumor. Pearson's correlation coefficients were computed between the feature values for each pair of readers to assess the effect of the annotation on the feature values. Finally, the authors quantified the extent of variation in feature values caused by each of the individual reasons for low agreement. RESULTS: The average agreement between readers in terms of the overlap (Dice coefficient) of the bounding boxes was 0.60. Automatic tumor segmentation improved the average Dice coefficient for 92% of the cases, to an average value of 0.77. The mean agreement between readers, expressed as the correlation coefficient for the imaging feature, was 0.96. CONCLUSIONS: There is moderate variability between readers when identifying the rectangular outline of breast tumors on MRI. This variability is alleviated by automatic segmentation of the tumors. Furthermore, the moderate interobserver variability in terms of the bounding box does not translate into considerable variability in the assessment of enhancement dynamics. The authors propose some additional ways to further reduce the interobserver variability.
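
The agreement measures named in the abstract (Dice overlap of bounding boxes, Dice overlap of segmented tumors, and Pearson correlation of a feature between reader pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of a box as a pair of lower and upper corners, and the use of NumPy are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def dice_boxes(box_a, box_b):
    """Dice coefficient of two axis-aligned boxes.

    Each box is a hypothetical (lower_corner, upper_corner) pair, so the
    same code covers 2D rectangles and 3D cuboids. Boxes are assumed to
    have non-zero size.
    """
    lo_a, hi_a = (np.asarray(v, dtype=float) for v in box_a)
    lo_b, hi_b = (np.asarray(v, dtype=float) for v in box_b)
    # Per-axis overlap length, clamped to zero when the boxes are disjoint.
    overlap = np.clip(np.minimum(hi_a, hi_b) - np.maximum(lo_a, lo_b), 0.0, None)
    intersection = overlap.prod()
    total = (hi_a - lo_a).prod() + (hi_b - lo_b).prod()
    return float(2.0 * intersection / total)


def dice_masks(mask_a, mask_b):
    """Dice coefficient of two binary segmentation masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def mean_pairwise(items, similarity):
    """Average a similarity function over all unordered pairs of readers."""
    scores = [similarity(x, y) for x, y in combinations(items, 2)]
    return float(np.mean(scores))


def pearson(x, y):
    """Pearson correlation between two readers' feature values across cases."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    # Toy boxes for one case from three hypothetical readers.
    boxes = [((0, 0), (2, 2)), ((1, 0), (3, 2)), ((0, 0), (2, 2))]
    print(mean_pairwise(boxes, dice_boxes))
```

In this sketch, a per-case agreement score is the mean Dice coefficient over all reader pairs; with four readers that means six pairs per case. How the study aggregated pairs and cases is not stated in the abstract, so the averaging shown here is one plausible choice rather than the reported procedure.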