Empirical evaluation of SUCRA-based treatment ranks in network meta-analysis: quantifying robustness using Cohen’s kappa


  • Objective: To provide a framework for quantifying the robustness of treatment ranks based on Surface Under the Cumulative RAnking curve (SUCRA) in network meta-analysis (NMA) and investigating potential factors associated with lack of robustness.
  • Methods: We propose the use of Cohen’s kappa to quantify the agreement between SUCRA-based treatment ranks estimated through NMA of a complete data set and a subset of it. We illustrate our approach using five published NMA data sets, where robustness was assessed by removing studies one at a time.
  • Results: Overall, SUCRA-based treatment ranks were robust to individual studies in the five data sets we considered. We observed more incidences of disagreement between ranks in the networks with larger numbers of treatments. Most treatments moved only one or two ranks up or down. The lowest quadratic weighted kappa estimate observed across all networks was in the network with the smallest number of treatments (4), where weighted kappa=40%. In the network with the largest number of treatments (12), the lowest observed quadratic weighted kappa=89%, reflecting a small shift in this network's treatment ranks overall. Preliminary observations suggest that a study’s size, the number of studies making a treatment comparison, and the agreement of a study’s estimated treatment effect(s) with those estimated by other studies making the same comparison(s) may explain the overall robustness of treatment ranks to studies.
  • Conclusions: Investigating robustness or sensitivity in an NMA may reveal outlying rank changes that are clinically or policy-relevant. Cohen’s kappa is a useful measure that permits investigation into study characteristics that may explain varying sensitivity to individual studies. However, this study presents a framework as a proof of concept and further investigation is required to identify potential factors associated with the robustness of treatment ranks using more extensive empirical evaluations.
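
The agreement measure named in the abstract can be illustrated with a minimal sketch of the standard quadratic weighted Cohen’s kappa, applied to two rank vectors for the same treatments. This is not the authors' code: the function name `quadratic_weighted_kappa` and the example ranks are hypothetical, and the ranks are assumed to be integers from 1 to k with no NMA fitted here.

```python
def quadratic_weighted_kappa(ranks_full, ranks_subset):
    """Quadratic weighted Cohen's kappa between two rank vectors.

    ranks_full[t] and ranks_subset[t] are the ranks (1..k) of treatment t
    from the complete data set and from a subset of it (for example, with
    one study removed). Both inputs are illustrative assumptions.
    """
    if len(ranks_full) != len(ranks_subset):
        raise ValueError("rank vectors must have the same length")
    n = len(ranks_full)
    k = max(max(ranks_full), max(ranks_subset))
    if k < 2:
        raise ValueError("at least two rank categories are required")

    # Observed cross-classification of ranks (rows: full, columns: subset).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ranks_full, ranks_subset):
        observed[a - 1][b - 1] += 1.0

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the
            # largest possible rank difference.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two rankings.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all ranks fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical example: four treatments, top two swap places in the subset.
kappa = quadratic_weighted_kappa([1, 2, 3, 4], [2, 1, 3, 4])
print(round(kappa, 3))  # approximately 0.8
```

In a leave-one-study-out assessment of the kind described above, a function like this would be called once per removed study, and the smallest value across studies would indicate the largest shift in ranks.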


  • Daly, Caitlin H
  • Neupane, Binod
  • Beyene, Joseph
  • Thabane, Lehana
  • Straus, Sharon E
  • Hamid, Jemila S

publication date

  • September 2019