Interpretation of Clinical Retinal Images Using an Artificial Intelligence Chatbot
abstract
PURPOSE: To assess the performance of Chat Generative Pre-Trained Transformer-4 in providing accurate diagnoses for retina teaching cases from OCTCases.
DESIGN: Cross-sectional study.
SUBJECTS: Retina teaching cases from OCTCases.
METHODS: We prompted a custom chatbot with 69 retina cases containing multimodal ophthalmic images, asking it to provide the most likely diagnosis. In a sensitivity analysis, we provided increasing amounts of clinical information about each case until the chatbot reached a correct diagnosis. We performed multivariable logistic regressions in Stata v17.0 (StataCorp LLC) to investigate associations between the amount of text-based information provided per prompt and the odds of the chatbot reaching a correct diagnosis, adjusting for the laterality of cases, the number of ophthalmic images provided, and the imaging modalities.
MAIN OUTCOME MEASURES: Our primary outcome was the proportion of cases for which the chatbot provided a correct diagnosis. Our secondary outcome was the chatbot's performance in relation to the amount of text-based information accompanying the ophthalmic images.
RESULTS: Across 69 retina cases collectively containing 139 ophthalmic images, the chatbot provided a definitive, correct diagnosis for 35 (50.7%) cases. The amount of clinical information the chatbot needed to reach a correct diagnosis varied; the entire patient description as presented by OCTCases was required for a majority of correctly diagnosed cases (23 of 35 cases, 65.7%). Relative to prompts containing only a patient's age and sex, prompts containing the entire patient description were associated with higher odds of a correct diagnosis (odds ratio = 10.1, 95% confidence interval = 3.3-30.3, P < 0.01). Although the chatbot provided an incorrect diagnosis for 34 (49.3%) cases, it listed the correct diagnosis within its differential diagnosis for 7 (20.6%) of these incorrectly answered cases.
CONCLUSIONS: This custom chatbot was able to accurately diagnose approximately half of the retina cases requiring multimodal input, albeit relying heavily on text-based contextual information that accompanied ophthalmic images. The diagnostic ability of the chatbot in interpretation of multimodal imaging without text-based information is currently limited. The appropriate use of the chatbot in this setting is of utmost importance, given bioethical concerns.
FINANCIAL DISCLOSURES: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
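The percentages and the odds ratio reported in the RESULTS can be checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not the authors' Stata analysis: it recomputes the proportions from the reported counts and, assuming the 95% confidence interval is a Wald interval symmetric on the log-odds scale (an assumption, not stated in the abstract), backs out an approximate standard error and P value. All variable names are hypothetical.

```python
import math

# Counts taken from the RESULTS paragraph of the abstract.
total_cases = 69
correct = 35
needed_full_description = 23          # subset of the correctly diagnosed cases
incorrect = total_cases - correct     # 34
in_differential = 7                   # subset of the incorrectly diagnosed cases


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


proportions = {
    "correct": pct(correct, total_cases),
    "needed_full_description": pct(needed_full_description, correct),
    "incorrect": pct(incorrect, total_cases),
    "in_differential": pct(in_differential, incorrect),
}

# Reported odds ratio and 95% confidence interval.
odds_ratio, ci_low, ci_high = 10.1, 3.3, 30.3

# ASSUMPTION: the interval is a Wald interval, i.e. log(OR) +/- 1.96 * SE.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Under that assumption the point estimate is the geometric mean of the bounds.
or_from_ci = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)

# Approximate two-sided P value from the implied z statistic.
z = math.log(odds_ratio) / se_log_or
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    for name, value in proportions.items():
        print(f"{name}: {value}%")
    print(f"implied SE of log(OR): {se_log_or:.3f}")
    print(f"OR implied by CI bounds: {or_from_ci:.2f}")
    print(f"approximate two-sided P: {p_two_sided:.2g}")
```

Under the stated assumption, the recomputed proportions match the reported 50.7%, 65.7%, 49.3%, and 20.6%, and the interval bounds are consistent with the reported odds ratio and with P < 0.01.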