GPT-4 as a Source of Healthcare Information for Hand-Surgery Patients
Stephen Parlamas, BS¹; Praneet Paidisetty, BS²; Kylie Swiekatowski, BS²; Ashton Mansour, MD²; Candice Teunis-Washko, MD²; Yuewei Wu-Fienberg, MD¹; Wendy Chen, MD²
¹McGovern Medical School at UT Health, Houston, TX; ²McGovern Medical School, Houston, TX
Purpose
Artificial intelligence (AI) is becoming increasingly important in healthcare, particularly for patient education, and hand surgeons should understand the implications of these technologies for patient care. This study investigated the accuracy, comprehensiveness, and readability of healthcare information provided by OpenAI's GPT-4 in response to common questions of varying complexity regarding elective hand procedures.
Methods
GPT-4 was asked questions of low, medium, and high complexity for five common hand conditions: carpal tunnel syndrome, trigger finger, cubital tunnel syndrome, distal radius fracture, and thumb carpometacarpal osteoarthritis. Low-complexity questions reflect what patients may search online, medium-complexity questions reflect what physicians are asked in the outpatient setting, and high-complexity questions reflect clinical vignettes requiring medical judgment that patients may pose to the software. Each question was posed to the chatbot three times. The quality of responses was evaluated on a 3-point numerical scale by four board-certified hand surgeons (two plastic, two orthopedic); half-point scores were given when an answer was felt to fall between values. Readability scores were generated using the Simple Measure of Gobbledygook (SMOG).
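For reference, the SMOG grade is conventionally computed from a 30-sentence sample as grade = 1.0430 × √(polysyllables × (30 / sentences)) + 3.1291, where polysyllables counts words of three or more syllables; the abstract does not specify the study's exact implementation.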
Results
A total of 105 AI responses were evaluated by each surgeon. The average rating across surgeons, question complexities, and all five conditions was 2.300, with low-complexity questions rated highest (2.492), followed by medium (2.318) and high complexity (2.183; p=0.02). AI responses received an average SMOG score of 14.9, equivalent to the reading level of a university junior. The chatbot did not provide a single response that adhered to the National Institutes of Health (NIH) recommended sixth-grade reading level; the lowest recorded SMOG score equated to the reading level of a high school junior, and the highest to that of a postgraduate student.
Conclusion
GPT-4 provided generally accurate, appropriate answers to questions of varying complexity about common hand procedures. The accuracy and thoroughness of its responses diminished when inquiries required consideration of individual patient factors, and comprehending the information required a reading level well beyond NIH guidelines. Though this AI model is promising, continued development is necessary to expand its clinical utility, and patients should continue to seek counseling from hand surgeons.