abstract
INTRODUCTION: The ideal parameters for predicting significant antenatal hydronephrosis remain controversial. Given the subjectivity of the Society of Fetal Urology (SFU) and Urinary Tract Dilation (UTD) grading systems, more objective measurements such as medullary pyramidal thickness (PT) and parenchymal thickness (ParT) may be useful.
OBJECTIVE: We sought to assess the interrater reliability of objective measures of hydronephrosis and of the UTD grading system among pediatric urologists at multiple institutions through the Societies of Pediatric Urology (SPU) Hydronephrosis Task Force Registry.
STUDY DESIGN: Fifteen renal sonograms of infants from a single center were chosen from patients enrolled in the registry's prospective database. Images were shared confidentially with pediatric urologists at participating institutions. Reviewers were taught standardized measurement techniques. Eight reviewers from five institutions analyzed each study and recorded anteroposterior renal pelvic diameter (APD), PT, ParT, and UTD grade. Interrater reliability was analyzed using the intraclass correlation coefficient (ICC; one-way random effects, consistency model) with 95 % CI for continuous variables and percent agreement for binary variables. Light's kappa with 95 % CI was calculated for the reliability of UTD grade.
RESULTS: Reviewers collected data on fifteen renal sonograms, for a total of 30 renal units. APD had excellent reliability, PT moderate to good reliability, and ParT poor to moderate reliability. One reviewer was found to be an outlier with respect to ParT measurements. When PT was assessed as a binary variable (>3 mm vs. <3 mm), the percent agreement between reviewers was 70 %. UTD grade had weak to moderate reliability, with a Light's kappa of 0.43. The most common discrepancy between graders was the distinction between UTD P2 and P3.
DISCUSSION/CONCLUSION: APD demonstrated the highest interrater reliability among objective assessments of antenatal hydronephrosis. PT had moderate to good reliability and was more reliable than ParT. This analysis highlights the need to incorporate more reliable methods to characterize UTD.
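For readers unfamiliar with the reliability statistics named in the study design, the following is a minimal sketch of how a one-way random effects ICC and Light's kappa could be computed. It is illustrative only: the abstract does not state which software or exact settings the authors used, the function names (`icc_one_way`, `cohen_kappa`, `light_kappa`) are hypothetical, the ICC shown is the single-measure form ICC(1,1), and confidence intervals are omitted. Light's kappa is taken here as the mean of Cohen's kappa over all pairs of reviewers.

```python
from itertools import combinations

import numpy as np


def icc_one_way(x):
    """Single-measure, one-way random effects ICC, i.e. ICC(1,1).

    x: array of shape (n_subjects, n_raters), e.g. renal units by reviewers.
    Assumption: this form is chosen for illustration; the study's exact
    ICC settings are not reproduced here.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' categorical grades."""
    a, b = np.asarray(a), np.asarray(b)
    categories = np.union1d(a, b)
    p_observed = np.mean(a == b)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


def light_kappa(ratings):
    """Light's kappa: mean of Cohen's kappa over all rater pairs.

    ratings: array of shape (n_subjects, n_raters) of categorical grades.
    """
    r = np.asarray(ratings)
    pairs = combinations(range(r.shape[1]), 2)
    return float(np.mean([cohen_kappa(r[:, i], r[:, j]) for i, j in pairs]))
```

Under these assumptions, a 30 x 8 array of APD measurements (renal units by reviewers) would be passed to `icc_one_way`, and a 30 x 8 array of UTD grades to `light_kappa`. Note that `cohen_kappa` is undefined when both raters assign every unit to the same single category, since chance agreement is then 1.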