Abstract
This article may be of interest to physical therapy educators who are responsible for structuring station or practical examinations used to evaluate physical therapy students. The global intent of the article is to provide information that may be useful in selecting test items. Specifically, the purposes of this study were 1) to examine how two item-sampling strategies (one based on different diagnostic concepts, or diagnostic probes, and the other based on different anatomical sites) influenced the generalizability of a station examination, 2) to determine the interrater reliability during the station examination, and 3) to determine whether the status of the rater (that of observer or simulated patient) influenced the rating. Using a nested study design, 24 physical therapy students were assessed by eight raters. The raters were randomly and equally assigned to four teams. Each team assessed six students. One rater acted as the simulated patient for the first three students in each group, and the other rater acted as observer. This order was reversed for the last three students. Each student performed nine mini-diagnostic patient cases consisting of three diagnostic probes reproduced at three different anatomical sites. The results demonstrate that 1) similar diagnostic concepts can be generalized across anatomical sites, although different concepts or skills cannot be generalized at a given anatomical site or across sites; 2) interrater reliability was excellent; and 3) the status of the raters (ie, simulated patient or observer) did not bias the ratings. (ABSTRACT TRUNCATED AT 250 WORDS)
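The abstract reports excellent interrater reliability between the observer and simulated-patient raters but does not state which coefficient was used. As a minimal, hedged sketch only, the Python snippet below computes a two-way random-effects, single-rater intraclass correlation coefficient, ICC(2,1), for a hypothetical subjects-by-raters score matrix; the data, the choice of ICC form, and the function name are assumptions for illustration and are not taken from the article.

```python
import numpy as np


def icc_2_1(scores: np.ndarray) -> float:
    """Two-way random-effects, single-rater ICC(2,1) (Shrout-Fleiss form).

    `scores` is an (n_subjects x k_raters) matrix of ratings.
    Hypothetical illustration; the article does not specify the
    reliability statistic it used.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)  # per-subject means
    col_means = scores.mean(axis=0)  # per-rater means

    # Two-way ANOVA decomposition into mean squares
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: six students, each rated by two raters
# (observer and simulated patient), mirroring the team structure
# described in the abstract.
ratings = np.array(
    [
        [8, 9],
        [6, 6],
        [7, 8],
        [9, 9],
        [5, 6],
        [8, 8],
    ],
    dtype=float,
)

print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```

In practice, any equivalent reliability estimate (e.g., from a dedicated statistics package) would serve the same purpose; the point of the sketch is simply to show how agreement between two raters per student can be quantified.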