American Association for Hand Surgery


Back to 2025 Abstracts


Artificial Intelligence in Hand and Upper Extremity Surgery Education: Accuracy and Validity of ChatGPT-4o Versus UpToDate as a Learning Tool for Trainees
Caleb H Bercu, MD1; Brianna Rosner, BS2; Aneeq S. Chaudhry, BA2; Hannah Korah, PhD3; Isabel Bernal, DO4; Jonathan D Freedman, MD, PhD5; Aaron J Berger, MD, PhD1
1Division of Plastic and Reconstructive Surgery, Nicklaus Children's Hospital, Miami, FL; 2Florida International University Herbert Wertheim College of Medicine, Miami, FL; 3University of Arizona College of Medicine, Tucson, AZ; 4HCA Florida Westside Hospital, Plantation, FL; 5University of Miami, Miami, FL

Introduction

The use of artificial intelligence (AI) in medical education has risen rapidly in recent years. ChatGPT-4o users can ask clinical questions and receive management recommendations, streamlining information gathering compared to traditional resources like UpToDate. Previous studies have assessed the accuracy of ChatGPT in head, neck, and breast surgery, but none have examined hand/upper extremity surgery. This study aims to evaluate the accuracy of ChatGPT-4o compared to UpToDate and to categorize the validity of the sources provided by ChatGPT-4o.

Methods

Five hand/upper extremity surgery clinical cases were inserted into ChatGPT-4o with the phrases "Tell me how to manage ..." and "Give me references at the end of your response." A relevant UpToDate article was selected for each case. Evaluators included two hand/upper extremity surgeons and five medical students, who completed a web-based survey. Resources were rated on a scale from 1 to 3, with 1 indicating incomplete information and not useful; 2 indicating semi-complete information and somewhat useful; and 3 indicating a complete answer and useful for management.

Two reviewers scored the ChatGPT-4o references on a scale of 0 to 2: 0 indicated that the reference was not available with the described DOI number and source link or was incorrect; 1 indicated that the reference was available with the described DOI number and source link but was not related to the specific topic; 2 indicated that the reference was available with the described DOI number and source link and was strongly related to the topic.
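
The reference rubric above can be sketched in a few lines of Python. This is only an illustration of the scoring logic as described, not the reviewers' actual workflow; the function names (score_reference, score_shares) and the boolean inputs are hypothetical.

```python
from collections import Counter


def score_reference(found: bool, on_topic: bool) -> int:
    """Apply the 0-2 rubric to one reference (hypothetical helper).

    found:    the reference is available with the described DOI number
              and source link, and is not incorrect.
    on_topic: the reference is strongly related to the clinical topic.
    """
    if not found:
        return 0  # unavailable or incorrect reference
    return 2 if on_topic else 1


def score_shares(scores: list[int]) -> dict[int, float]:
    """Return the percentage of ratings at each rubric level (0, 1, 2)."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(scores)
    return {level: 100 * counts[level] / len(scores) for level in (0, 1, 2)}
```

For example, score_shares([0, 0, 1, 2]) gives 50% at level 0 and 25% at each of levels 1 and 2; these inputs are made up for illustration and are not the study's data.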

Results


Hand/upper extremity surgeons rated ChatGPT-4o and UpToDate as semi-complete and somewhat useful, with median scores of 2.00 and 2.50 respectively. A Student's t-test revealed no significant differences between resources (p=1; statistical significance p<0.05). Amongst medical students, ChatGPT had a median score of 2.00 and a mean score of 2.28, while UpToDate had a median score of 3.00 and a mean score of 2.44; no statistically significant differences were found (median p=0.157, mean p=0.161).

Of the 25 references provided by ChatGPT, 28% were accurate (score = 2), 6% were somewhat accurate (score = 1), and 66% were not accurate (score = 0). ChatGPT was limited to five references per case, in contrast to UpToDate. ChatGPT frequently altered details of the sources' authors and article titles.

Conclusion

Our findings indicate comparable perceived usefulness of ChatGPT-4o and UpToDate by hand/upper extremity surgeons and trainees. ChatGPT-4o holds promise as an educational tool; however, accuracy concerns remain. ChatGPT-4o generates factual inconsistencies, also known as "hallucinations", including references to articles that do not exist.