Journal article

Comparison of AI-assisted and human-generated plain language summaries for Cochrane reviews: protocol for a randomised trial (HIET-1)

Abstract

Effective communication of health information enables informed decision-making. Plain Language Summaries (PLS) of systematic reviews present complex health evidence in accessible language for the general public. Advances in artificial intelligence (AI), particularly large language models like OpenAI’s ChatGPT, offer potential enhancements in generating these summaries. This protocol outlines a randomised, parallel-group, two-armed, non-inferiority trial comparing the effectiveness of AI-assisted versus human-generated PLS. Adults aged 18 years or older who are proficient in English will be recruited online via the Prolific audience recruitment platform. Participants will be randomly assigned (1:1 ratio) to one of two groups: (1) Intervention group: receives three AI-assisted PLS created through a human-in-the-loop process with a large language model, based on Cochrane reviews published within the prior 3 months; (2) Control group: receives the three standard human-generated Cochrane PLS matching the summaries in the intervention group. The three PLS will be purposefully selected from Cochrane intervention reviews, representing different health conditions and varying levels of evidence certainty. The primary outcome is comprehension (aligned with QUEST's Understanding dimension), assessed via a standardised 10-item multiple-choice questionnaire for each PLS, structured according to Cochrane PLS template sections (review topic, aims, methods, main findings, limitations, and currency of evidence). Secondary outcomes, all aligned with QUEST framework dimensions, are: readability (Expression Style dimension), quality of information (Quality dimension), safety considerations (Safety and Harm dimension), and perceived trustworthiness (Trust and Confidence dimension). 
Non-inferiority margins are set at a 10% difference in mean scores for comprehension, 10% for quality of information and for safety, one grade level for readability, and 0.5 points for trustworthiness (scale 1-5). This study is open-label: neither participants nor researchers can be blinded, owing to the nature of the interventions and the public availability of human-generated Cochrane PLSs. The primary analysis population will exclude participants who fail attention checks or show evidence of poor engagement. Statistical analyses will employ mixed-effects models to account for the hierarchical data structure, with sensitivity analyses including all randomised participants. Safety outcomes will be assessed using separate frameworks for AI-assisted and human-generated summaries. If non-inferiority is established for any outcome, we will test for superiority for that outcome using the same model structure. By evaluating whether AI-assisted summaries are non-inferior to human-generated summaries in these five key dimensions, this study aims to provide insights into integrating AI technologies in health communication. Findings will inform future practices in disseminating evidence-based health information to the public.
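
As an illustration only, the non-inferiority logic described in the abstract can be sketched as a confidence-interval check against a pre-specified margin. The sketch below is not the protocol's analysis: it assumes a simple normal-approximation interval rather than the mixed-effects models the authors specify, it assumes a two-sided 95% interval, and the function names and all effect estimates and standard errors are hypothetical. Only the margins (0.5 points for trustworthiness, one grade level for readability) come from the abstract.

```python
from statistics import NormalDist


def ci_for_difference(diff, se, alpha=0.05):
    """Two-sided (1 - alpha) normal-approximation CI for a mean difference.

    diff is the estimated difference (AI-assisted minus human-generated),
    se is its standard error. Both are hypothetical inputs here.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se


def is_non_inferior(diff, se, margin, higher_is_better=True, alpha=0.05):
    """Return True if the CI excludes a deficit as large as the margin."""
    lower, upper = ci_for_difference(diff, se, alpha)
    if higher_is_better:
        # e.g. trustworthiness: AI may not be worse by `margin` or more
        return lower > -margin
    # e.g. readability grade level: lower is better, so bound the upper limit
    return upper < margin


def is_superior(diff, se, higher_is_better=True, alpha=0.05):
    """Follow-up superiority check: the CI must exclude zero in the good direction."""
    lower, upper = ci_for_difference(diff, se, alpha)
    return lower > 0 if higher_is_better else upper < 0


# Hypothetical estimates, for illustration only (not trial results).
trust_ok = is_non_inferior(diff=-0.10, se=0.12, margin=0.5)
readability_ok = is_non_inferior(diff=0.4, se=0.4, margin=1.0,
                                 higher_is_better=False)
trust_superior = is_superior(diff=-0.10, se=0.12)
```

With these made-up inputs, the trustworthiness interval stays above the -0.5 margin while the readability interval crosses the one-grade-level margin, which shows how a single trial can establish non-inferiority for some outcomes and not others.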

Authors

Devane D; Pope J; Byrne P; Forde E; Woloshin S; Culloty E; Dahly D; Elgersma IH; Munthe-Kaas H; Judge C

Journal

Journal of Clinical Epidemiology, Vol. 185

Publisher

Elsevier

Publication Date

September 1, 2025

DOI

10.1016/j.jclinepi.2025.111894

ISSN

0895-4356
