Using Natural Language Processing to Evaluate the Quality of Supervisor Narrative Comments in Competency-Based Medical Education Journal Articles uri icon

  •  
  • Overview
  •  
  • Research
  •  
  • Identity
  •  
  • Additional Document Info
  •  
  • View All
  •  

abstract

  • Abstract Purpose Learner development and promotion rely heavily on narrative assessment comments, but narrative assessment quality is rarely evaluated in medical education. Educators have developed tools such as the Quality of Assessment for Learning (QuAL) tool to evaluate the quality of narrative assessment comments; however, scoring the comments generated in medical education assessment programs is time intensive. The authors developed a natural language processing (NLP) model for applying the QuAL score to narrative supervisor comments. Method Samples of 2,500 Entrustable Professional Activities assessments were randomly extracted and deidentified from the McMaster (1,250 comments) and Saskatchewan (1,250 comments) emergency medicine (EM) residency training programs during the 2019–2020 academic year. Comments were rated using the QuAL score by 25 EM faculty members and 25 EM residents. The results were used to develop and test an NLP model to predict the overall QuAL score and QuAL subscores. Results All 50 raters completed the rating exercise. Approximately 50% of the comments had perfect agreement on the QuAL score, with the remaining resolved by the study authors. Creating a meaningful suggestion for improvement was the key differentiator between high- and moderate-quality feedback. The overall QuAL model predicted the exact human-rated score or 1 point above or below it in 87% of instances. Overall model performance was excellent, especially regarding the subtasks on suggestions for improvement and the link between resident performance and improvement suggestions, which achieved 85% and 82% balanced accuracies, respectively. Conclusions This model could save considerable time for programs that want to rate the quality of supervisor comments, with the potential to automatically score a large volume of comments. This model could be used to provide faculty with real-time feedback or as a tool to quantify and track the quality of assessment comments at faculty, rotation, program, or institution levels.

authors

  • Spadafore, Maxwell
  • Yilmaz, Yusuf
  • Rally, Veronica
  • Chan, Teresa
  • Russell, Mackenzie
  • Thoma, Brent
  • Singh, Sim
  • Monteiro, Sandra
  • Pardhan, Alim
  • Martin, Lynsey
  • Monrad, Seetha U
  • Woods, Rob

publication date

  • May 2024