A Blinded Comparison of Three Generative Artificial Intelligence Chatbots for Orthopaedic Surgery Therapeutic Questions
Abstract
Objective: To compare the quality of responses from three chatbots (ChatGPT, Bing Chat, and AskOE) across a range of orthopaedic surgery therapeutic treatment questions.

Design: We identified a series of treatment-related questions spanning a range of orthopaedic surgery subspecialties. Each question was entered verbatim into all three chatbots (ChatGPT, Bing Chat, and AskOE), and the responses were evaluated with a standardized rubric.

Participants: Orthopaedic surgery experts affiliated with McMaster University and the University of Toronto reviewed all responses in a blinded fashion.

Outcomes: The primary outcomes were scores on a five-item assessment tool covering clinical correctness, clinical completeness, safety, usefulness, and references. The secondary outcome was the reviewers' preferred response for each question. We performed a mixed effects logistic regression to identify factors associated with selecting a preferred chatbot.

Results: Across all questions and answers, reviewers preferred AskOE significantly more often than both ChatGPT (P<0.001) and Bing Chat (P<0.001). AskOE also received significantly higher total evaluation scores than both ChatGPT (P<0.001) and Bing Chat (P<0.001). Further regression analysis showed that clinical correctness, clinical completeness, usefulness, and references were significantly associated with a preference for AskOE. Across all responses, four were judged to contain major errors: three from ChatGPT and one from AskOE.

Conclusions: Reviewers significantly preferred AskOE over ChatGPT and Bing Chat across a range of orthopaedic therapeutic questions. This technology has important implications for healthcare settings because it provides access to trustworthy answers in orthopaedic surgery.
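The central analysis is a mixed effects logistic regression relating the five rubric items to whether a response was the reviewer's preferred one. The sketch below is a minimal illustration of such a model, assuming hypothetical column names (preferred, correctness, completeness, safety, usefulness, references, reviewer, question) and Python's statsmodels; the abstract does not state which software, model specification, or data layout the authors actually used.

```python
# Minimal sketch of a mixed effects logistic regression, assuming
# hypothetical column names and synthetic data; not the authors' code.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)
n = 300  # hypothetical number of reviewer-response observations

# Hypothetical rubric scores (1-5) and a binary "preferred" outcome.
df = pd.DataFrame({
    "preferred": rng.integers(0, 2, n),
    "correctness": rng.integers(1, 6, n),
    "completeness": rng.integers(1, 6, n),
    "safety": rng.integers(1, 6, n),
    "usefulness": rng.integers(1, 6, n),
    "references": rng.integers(1, 6, n),
    "reviewer": rng.integers(0, 10, n),   # grouping factor: reviewer
    "question": rng.integers(0, 30, n),   # grouping factor: question
})

# Fixed effects: the five rubric items. Random intercepts for reviewer
# and question absorb the repeated-measures correlation that arises when
# each reviewer scores multiple responses to the same questions.
model = BinomialBayesMixedGLM.from_formula(
    "preferred ~ correctness + completeness + safety + usefulness + references",
    {"reviewer": "0 + C(reviewer)", "question": "0 + C(question)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes fit of the mixed GLM
print(result.summary())
```

With real data, positive and credibly nonzero coefficients on rubric items would indicate factors associated with a response being preferred, which is the form of evidence the abstract reports for correctness, completeness, usefulness, and references.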