Assessing the Ability of ChatGPT to Screen Articles for Systematic
Reviews
Abstract
By organizing knowledge within a research field, Systematic Reviews (SRs)
provide valuable leads to steer research. Evidence suggests that SRs have
become first-class artifacts in software engineering. However, the tedious
manual effort associated with the screening phase of SRs makes these studies
costly and error-prone. While screening has traditionally been
considered not amenable to automation, the advent of generative AI-driven
chatbots backed by large language models is set to disrupt the field. In
this report, we propose an approach to leverage these novel technological
developments for automating the screening of SRs. We assess the consistency,
classification performance, and generalizability of ChatGPT in screening
articles for SRs and compare these figures with those of traditional
classifiers used in SR automation. Our results indicate that ChatGPT is a
viable option for automating SR processes, but integrating it into SR tools
requires careful consideration from developers.