Assessor burden, inter-rater agreement and user experience of the RoB-SPEO tool for assessing risk of bias in studies estimating prevalence of exposure to occupational risk factors: An analysis from the WHO/ILO Joint Estimates of the Work-related Burden of Disease and Injury


  • BACKGROUND: As part of the development of the World Health Organization (WHO)/International Labour Organization (ILO) Joint Estimates of the Work-related Burden of Disease and Injury, WHO and ILO carried out several systematic reviews to determine the prevalence of exposure to selected occupational risk factors. Risk of bias assessment for individual studies is a critical step of a systematic review. No tool existed for assessing the risk of bias in prevalence studies of exposure to occupational risk factors, so WHO and ILO developed and pilot tested the RoB-SPEO tool for this purpose. Here, we investigate the assessor burden, inter-rater agreement, and user experience of this new instrument, based on the above-mentioned WHO/ILO systematic reviews.

METHODS: Twenty-seven individual experts applied RoB-SPEO to assess risk of bias. Four systematic reviews provided a total of 283 individual assessments, carried out for 137 studies. For each study, two or more assessors independently assessed risk of bias across the eight RoB-SPEO domains, selecting one of RoB-SPEO's six ratings (i.e., "low", "probably low", "probably high", "high", "unclear" or "cannot be determined"). Assessors were asked to report the time taken to complete each assessment (i.e., an indicator of assessor burden) and to describe their user experience. To gauge assessor burden, we calculated the median and interquartile range of times taken per individual risk of bias assessment. To assess inter-rater reliability, we calculated a raw measure of inter-rater agreement (Pi) for each RoB-SPEO domain, ranging from Pi = 0.00 (no agreement) to Pi = 1.00 (perfect agreement). As subgroup analyses, Pi was also disaggregated by systematic review, assessor experience with RoB-SPEO (≤10 assessments versus >10 assessments), and assessment time (tertiles: ≤25 min versus 26-66 min versus ≥67 min). To describe user experience, we synthesised the assessors' comments and recommendations.

RESULTS: Assessors reported a median of 40 min to complete one assessment (interquartile range 21-120 min). For all domains, raw inter-rater agreement ranged from 0.54 to 0.82. Across domains, agreement varied by systematic review and by assessor experience with RoB-SPEO, and it increased with increasing assessment time. A small number of users recommended further development of instructions for selected RoB-SPEO domains, especially bias in selection of participants into the study (domain 1) and bias due to differences in numerator and denominator (domain 7).

DISCUSSION: Overall, our results indicated good agreement across the eight domains of the RoB-SPEO tool. The median assessment time was comparable to that of other risk of bias tools, indicating comparable assessor burden. However, there was considerable variation in time taken to complete assessments. Additional time spent on assessments may improve inter-rater agreement. Further development of the RoB-SPEO tool could focus on refining instructions for selected RoB-SPEO domains and on additional testing to assess agreement for different topic areas and with a wider range of assessors from different research backgrounds.
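
The two summary statistics described in the methods can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: raw agreement for one study is taken to be the proportion of assessor pairs that chose the same rating, and a domain's Pi is taken to be the mean of those proportions across studies. The function names, the ratings and the times below are hypothetical illustrations, not data or code from the WHO/ILO reviews.

```python
from itertools import combinations
from statistics import median, quantiles


def pairwise_agreement(ratings):
    """Proportion of assessor pairs that gave the same rating to one study."""
    pairs = list(combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def domain_agreement(ratings_by_study):
    """Mean pairwise agreement across studies for a single domain.

    Assumption: this averaging is one plausible reading of a "raw" Pi;
    the abstract does not give the exact formula.
    """
    scores = [pairwise_agreement(r) for r in ratings_by_study if len(r) >= 2]
    return sum(scores) / len(scores)


# Hypothetical ratings for one domain: one inner list per study,
# one rating per assessor.
domain_1 = [
    ["low", "low"],
    ["probably low", "low"],
    ["high", "high", "probably high"],
    ["unclear", "unclear"],
]
agreement = domain_agreement(domain_1)

# Hypothetical self-reported assessment times in minutes.
times_min = [15, 21, 30, 40, 45, 90, 120]
median_time = median(times_min)
q1, q2, q3 = quantiles(times_min, n=4, method="inclusive")

print(f"raw agreement: {agreement:.2f}")
print(f"median time: {median_time} min, IQR: {q1}-{q3} min")
```

Under this reading, a value of 1.00 means every pair of assessors agreed on every study, and 0.00 means no pair agreed on any study, which matches the range described in the methods.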


  • Momen, Natalie C
  • Streicher, Kai N
  • da Silva, Denise TC
  • Descatha, Alexis
  • Frings-Dresen, Monique HW
  • Gagliardi, Diana
  • Godderis, Lode
  • Loney, Tom
  • Mandrioli, Daniele
  • Modenese, Alberto
  • Morgan, Rebecca
  • Pachito, Daniela
  • Scheepers, Paul TJ
  • Sgargi, Daria
  • Paulo, Marília Silva
  • Schlünssen, Vivi
  • Sembajwe, Grace
  • Sørensen, Kathrine
  • Teixeira, Liliane R
  • Tenkate, Thomas
  • Pega, Frank

publication date

  • January 2022