Content specificity and oral certification examinations
This study reports on the generalizability of different skills assessed in the oral certification examinations in Internal Medicine of the Royal College of Physicians and Surgeons of Canada. Assessments from the 1992 examination were analyzed prospectively to determine (i) inter-rater reliability, (ii) the correlation between morning and afternoon sessions, and (iii) overall test reliability. Inter-rater reliability was acceptable and within the range reported in previous studies, but generalizability across sessions was very low, ranging from 0.30 to 0.47, presumably reflecting content specificity. As a consequence, overall test reliability was also low, ranging from 0.57 to 0.69. Collapsing the overall scores into three decision categories (pass, borderline, fail) lowered test reliability still further. Strategies to resolve this problem are suggested.
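The link between low cross-session correlation and low overall reliability can be illustrated with the standard Spearman-Brown prophecy formula. The sketch below is not the study's method: it assumes the sessions behave as parallel test forms, and the function names and the 0.80 target are hypothetical choices made for illustration.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability of a test made of k parallel sessions,
    given the correlation r_single between two single sessions."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def lengthening_factor(r_single: float, target: float) -> float:
    """Number of parallel sessions needed to reach the target reliability
    (the Spearman-Brown formula solved for k)."""
    return target * (1.0 - r_single) / (r_single * (1.0 - target))


if __name__ == "__main__":
    # Session correlations taken from the range reported in the abstract.
    for r in (0.30, 0.47):
        two_sessions = spearman_brown(r, 2)
        needed = lengthening_factor(r, 0.80)  # 0.80 is an assumed target
        print(f"r={r:.2f}: two-session reliability={two_sessions:.2f}, "
              f"sessions for 0.80={needed:.1f}")
```

Under these assumptions, session correlations of 0.30 and 0.47 would give two-session reliabilities of about 0.46 and 0.64, and reaching 0.80 would take roughly 9 and 5 sessions respectively. The reliabilities reported above (0.57 to 0.69) come from the study's own analysis, and this simplified formula is not expected to reproduce them.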