Screens for developmental delay generally provide a set of norms for different age groups. Development varies continuously with age, however, and applying a single criterion for an age range will inevitably produce misclassifications. In this report, we estimate the resulting error rate for one example: the cognitive subscale of the Bayley Scales of Infant and Toddler Development (BSID-III).
Data come from a general population sample of 594 children (305 male) aged 1 month to 42.5 months who received the BSID-III as part of a validation study. We used regression models to estimate the mean and variance of the cognitive subscale as a function of age. We then used these results to generate a dataset of one million simulated participants and compared each participant's delay classification when scored against a continuous age norm versus the norm for their age group. Finally, we applied the broader age bands used in two other instruments and explored the likely limits on measured validity when different instruments are compared.
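The simulation logic can be sketched as follows. The norm parameters here are illustrative assumptions, not the published BSID-III norms: a linearly increasing mean score, a constant standard deviation, 3-month age bands, and a delay criterion of z < -2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous norm: mean raw score rises linearly with age
# (months); SD held constant. Illustrative values only.
def mean_score(age):
    return 10.0 + 2.0 * age

SD = 8.0
CUTOFF_Z = -2.0  # "delay" = score more than 2 SD below the age-exact mean

n = 1_000_000
age = rng.uniform(1.0, 42.5, n)           # ages in months
score = rng.normal(mean_score(age), SD)   # simulated cognitive scores

# True status: each child compared with the norm at their exact age.
true_delay = score < mean_score(age) + CUTOFF_Z * SD

# Grouped status: each child compared with the norm at the midpoint
# of a hypothetical 3-month age band.
band_mid = np.floor(age / 3.0) * 3.0 + 1.5
grouped_delay = score < mean_score(band_mid) + CUTOFF_Z * SD

missed = true_delay & ~grouped_delay      # true cases the banding misses
false_pos = ~true_delay & grouped_delay   # apparent cases that are not true

print(f"missed: {missed.sum() / true_delay.sum():.1%}")
print(f"false positives among apparent cases: "
      f"{false_pos.sum() / grouped_delay.sum():.1%}")
```

Children older than the band midpoint face a threshold that is too low for their age (producing misses), while younger children face one that is too high (producing false positives), which is why the two error rates are roughly symmetric.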
When BSID-III age groups are used, 15% of true cases are missed and 15% of apparent cases are false positives. The wider age bands of the other two instruments produced error rates from 27% to 46%. Comparison across different age groupings suggests that, under certain assumptions, sensitivity in validation studies would be limited to 70% or less.
The use of age groups produces a large number of misclassifications. Although affected children will usually lie close to the threshold, this may still lead to misreferrals. These results may help to explain the poor measured agreement among developmental screens. Scoring methods that treat child age as continuous would improve instrument accuracy.