Evaluating ChatGPT on Orbital and Oculofacial Disorders: Accuracy and Readability Insights

abstract

Purpose: To assess the accuracy and readability of responses generated by the artificial intelligence model, ChatGPT (version 4.0), to questions related to 10 essential domains of orbital and oculofacial disease.
Methods: A set of 100 questions related to the diagnosis, treatment, and interpretation of orbital and oculofacial diseases was posed to ChatGPT 4.0. Responses were evaluated by a panel of 7 experts based on appropriateness and accuracy, with performance scores measured on a 7-item Likert scale. Inter-rater reliability was determined via the intraclass correlation coefficient.
Results: The artificial intelligence model demonstrated accurate and consistent performance across all 10 domains of orbital and oculofacial disease, with an average appropriateness score of 5.3/6.0 (“mostly appropriate” to “completely appropriate”). Domains of cavernous sinus fistula, retrobulbar hemorrhage, and blepharospasm had the highest domain scores (average scores of 5.5 to 5.6), while the proptosis domain had the lowest (average score of 5.0/6.0). The intraclass correlation coefficient was 0.64 (95% CI: 0.52 to 0.74), reflecting moderate inter-rater reliability. The responses exhibited a high reading-level complexity, representing the comprehension levels of a college or graduate education.
Conclusions: This study demonstrates the potential of ChatGPT 4.0 to provide accurate information in the field of ophthalmology, specifically orbital and oculofacial disease. However, challenges remain in ensuring accurate and comprehensive responses across all disease domains. Future improvements should focus on refining the model’s correctness and eventually expanding the scope to visual data interpretation. Our results highlight the vast potential for artificial intelligence in educational and clinical ophthalmology contexts.

authors

Balas, Michael
Janic, Ana
Daigle, Patrick
Nijhawan, Navdeep
Hussain, Ahsen
Gill, Harmeet
Lahaie, Gabriela L
Belliveau, Michel J
Crawford, Sean A
Arjmand, Parnian
Ing, Edsel B

publication date

March 2024

has subject area

1113 Ophthalmology and Optometry (FoR)
Artificial Intelligence (MeSH)
Blepharospasm (MeSH)
Cavernous Sinus (MeSH)
Comprehension (MeSH)
Humans (MeSH)
Ophthalmology & Optometry (Science Metrix)
Reproducibility of Results (MeSH)

published in

Ophthalmic Plastic and Reconstructive Surgery (Journal)
