Journal article

Assessment of patient handouts on burns created by burn surgeons compared to ChatGPT-4o

Abstract

Physicians often face time constraints that may impact the delivery of patient education. Large language models have demonstrated promising results in patient education across various specialties. The aim of the present study was to investigate the quality and readability of ChatGPT-generated handouts on burns and compare them to a published handout. We asked ChatGPT-4o to generate and regenerate patient handouts for seven topics regarding burns. These handouts, along with a patient handout covering similar topics published by Hamilton Health Sciences, were assessed. The Quality of Generated Language Outputs for Patients (QGLOP) scale was used to assess the handouts on accuracy/comprehensiveness, bias, currency, and tone, with each domain scored out of 4 for a total of 16. The Simple Measure of Gobbledygook (SMOG) score was calculated to assess handout readability. The threshold for statistical significance was set at p < 0.05. The mean total QGLOP scores of the ChatGPT-4o handouts and the published handout did not significantly differ, nor did the scores for the individual domains of accuracy, bias, currency, and tone. ChatGPT-4o scored lower on the topic of skin care but higher on coping with burns; the two groups did not significantly differ on any other topic. We found that ChatGPT-4o could produce patient education handouts on burns with scores comparable to those of a handout published by a burn unit, suggesting that plastic surgeons would be similarly satisfied with both.
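For context, the SMOG grade referenced above is conventionally computed from the number of polysyllabic words (three or more syllables) across a sample of sentences, using McLaughlin's (1969) formula. The study does not state which tool it used to calculate SMOG, so the Python sketch below is illustrative only; the vowel-group syllable heuristic and function names are assumptions, not the authors' method.

    import re
    import math

    def count_syllables(word: str) -> int:
        # Assumed heuristic: count runs of consecutive vowels.
        # Dictionary-based syllable counters are more accurate in practice.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def smog_grade(text: str) -> float:
        # Rough sentence and word tokenization.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        # Polysyllables: words with three or more syllables.
        polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
        # McLaughlin's SMOG formula: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
        return 1.0430 * math.sqrt(polysyllables * (30 / len(sentences))) + 3.1291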

Authors

Khan H; Asghar M; Zhu XM; Shahrokhi S

Journal

Burns Open, Vol. 13

Publisher

Elsevier

Publication Date

January 1, 2026

DOI

10.1016/j.burnso.2025.100438

ISSN

2468-9122
