Leveraging AI to Optimize Maintenance of Health Evidence and Offer a One-Stop Shop for Quality-Appraised Evidence Syntheses on the Effectiveness of Public Health Interventions: Quality Improvement Project. Journal Articles


abstract

BACKGROUND: Health Evidence provides access to quality appraisals for >10,000 evidence syntheses on the effectiveness and cost-effectiveness of public health and health promotion interventions. Maintaining Health Evidence has become increasingly resource-intensive due to the exponential growth of published literature. Innovative screening methods using artificial intelligence (AI) can potentially improve efficiency.

OBJECTIVE: The objectives of this project were to (1) assess the ability of AI-assisted screening to correctly predict nonrelevant references at the title and abstract level and investigate the consistency of this performance over time, and (2) evaluate the impact of AI-assisted screening on the overall monthly manual screening set.

METHODS: Training and testing were conducted using the DistillerSR AI Preview & Rank feature. A set of manually screened references (n=43,273) was uploaded and used to train the AI feature and to assign each reference a probability score predicting relevance. A minimum threshold was established at which the AI feature correctly identified all manually screened relevant references. The AI feature was then tested on a separate set of references (n=72,686) from the May 2019 to April 2020 monthly searches. The testing set was used to determine an optimal threshold that ensured >99% of relevant references would continue to be added to Health Evidence. The performance of AI-assisted screening at the title and abstract level was evaluated using recall, specificity, precision, negative predictive value, and the number of references removed by AI. The number and percentage of references removed by AI-assisted screening, along with the change in monthly manual screening time, were estimated using an implementation reference set (n=272,253) from November 2020 to 2023.

RESULTS: The minimum threshold in the training set was 0.068, which correctly removed 37% (n=16,122) of nonrelevant references. Analysis of the testing set identified an optimal threshold of 0.17, which removed 51,706 (71.14%) references using AI-assisted screening. A slight decrease in recall between the 0.068 minimum threshold (99.68%) and the 0.17 optimal threshold (94.84%) was noted, resulting in four references missed by AI-assisted screening that were subsequently included via manual screening at the full-text level. This was accompanied by an increase in specificity from 35.95% to 71.70%, doubling the proportion of references that AI-assisted screening correctly predicted as not relevant. Over 3 years of implementation, the number of references requiring manual screening was reduced by 70%, saving an estimated 382 hours of manual screening time.

CONCLUSIONS: Given the magnitude of newly published peer-reviewed evidence, curated evidence supports decision makers in making informed decisions. AI-assisted screening can be an important supplement to manual screening, reducing the number of references that require manual review and helping to ensure the continued availability of curated, high-quality synthesis evidence in public health.
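The threshold-selection procedure described in the methods can be illustrated with a minimal sketch. This is not the DistillerSR implementation; it assumes only that each reference carries an AI-assigned relevance probability and a manual (gold-standard) relevance label, and all function and variable names are illustrative.

```python
def screening_metrics(probabilities, labels, threshold):
    """Classify references as relevant when probability >= threshold,
    then compute recall, specificity, precision, and negative predictive
    value (NPV) against the manually screened labels."""
    tp = fp = tn = fn = 0
    for p, relevant in zip(probabilities, labels):
        predicted_relevant = p >= threshold
        if predicted_relevant and relevant:
            tp += 1          # relevant reference correctly retained
        elif predicted_relevant:
            fp += 1          # nonrelevant reference retained
        elif not relevant:
            tn += 1          # nonrelevant reference correctly removed
        else:
            fn += 1          # relevant reference incorrectly removed
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
        "removed": tn + fn,  # references excluded from manual screening
    }


def optimal_threshold(probabilities, labels, target_recall, candidates):
    """Return the highest candidate threshold whose recall still meets
    the target, maximizing the number of references removed (recall is
    non-increasing as the threshold rises)."""
    for t in sorted(candidates, reverse=True):
        if screening_metrics(probabilities, labels, t)["recall"] >= target_recall:
            return t
    return None
```

For example, with probabilities `[0.9, 0.8, 0.2, 0.05]` and labels `[True, True, False, False]`, a threshold of 0.17 yields a recall of 1.0 and a specificity of 0.5, removing one reference from the manual screening set.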

publication date

  • July 29, 2025