The Quality of Assessment of Learning (Qual) Score: Validity Evidence for a Scoring System Aimed at Rating Short, Workplace-Based Comments on Trainee Performance

Abstract

Construct: This study seeks to gather validity evidence for the Quality of Assessment for Learning score (QuAL score), which was created to evaluate short qualitative comments attached to specific scores entered into a workplace-based assessment, a format common in competency-based medical education (CBME).

Background: In the age of CBME, qualitative comments play an important role in clarifying the quantitative scores rendered by observers at the bedside. Currently there are few practical tools that evaluate mixed data (e.g., associated score-and-comment data), other than the comprehensive Completed Clinical Evaluation Report Rating tool (CCERR), which was originally developed to rate end-of-rotation reports.

Approach: A multi-center, randomized cohort-based rating exercise was conducted to evaluate the rating properties of the QuAL score as compared to the CCERR. One group rated comments using the QuAL score, and the other group rated comments using the CCERR. A generalizability study (G-study) and a decision study (D-study) were conducted to determine the number of meta-raters needed for a reliable rating (phi-coefficient target of >0.80). Both scores were correlated against raters' gestalt perceptions of utility for both faculty and residents reading the scores.

Results: Twenty-five meta-raters from 20 sites participated in this rating exercise. The G-study revealed that the CCERR group (n = 13) rated the comments with a very high reliability (Phi = 0.97). Meanwhile, the QuAL group (n = 12) rated the comments with a similarly high reliability (Phi = 0.97). The QuAL score required only two raters to reach an acceptable target reliability of >0.80, while the CCERR required three. The QuAL score correlated with perceptions of utility (meta-rater usefulness, Pearson's r = 0.69, p < 0.001; perceived usefulness for trainee, r = 0.74, p < 0.001). The CCERR performed similarly, correlating with perceived faculty utility (r = 0.67, p < 0.001) and resident utility (r = 0.79, p < 0.001).

Conclusions: The QuAL score is a reliable rating score that correlates well with perceptions of utility. The QuAL score may be useful for rating shorter comments generated by workplace-based assessments.
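
For readers unfamiliar with decision studies, the logic of projecting how many meta-raters are needed to exceed a phi target can be illustrated with a minimal sketch. It assumes the simplest possible design (comments fully crossed with raters, one score per comment per rater), which may not match the design analyzed in this study. The function names and the variance components are hypothetical illustrations and are not values reported here.

```python
def phi_coefficient(var_comment: float, var_abs_error: float, n_raters: int) -> float:
    """Projected phi (dependability for absolute decisions) when each
    comment's score is averaged over n_raters raters.

    var_comment   : variance attributable to true differences between comments
    var_abs_error : absolute error variance for a single rater
                    (rater main effect plus residual, in a one-facet design)
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    # Averaging over raters divides the error variance by the number of raters.
    return var_comment / (var_comment + var_abs_error / n_raters)


def raters_needed(var_comment: float, var_abs_error: float,
                  target: float = 0.80, max_raters: int = 50):
    """Smallest number of raters whose projected phi exceeds the target,
    or None if the target is not reached within max_raters."""
    for n in range(1, max_raters + 1):
        if phi_coefficient(var_comment, var_abs_error, n) > target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical variance components, chosen only to illustrate the method.
    for label, var_c, var_e in [("tool A", 0.60, 0.25), ("tool B", 0.60, 0.40)]:
        n = raters_needed(var_c, var_e)
        print(f"{label}: {n} rater(s) needed for phi > 0.80")
```

In this sketch, a tool with less single-rater error variance reaches the target with fewer raters, which is the kind of comparison a D-study supports.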