Back to 2025 Abstracts
AI Text-to-Image Generators and the Lack of Diversity in Hand Surgeon Demographic Representation
Isra Abdulwadood, BS (1); Meeti Mehta, BS (2); Kassandra Carrion, BA (3); Xinfei Miao, BS (4); Sonal Kumar, BA (5); Parul Rai, BS (6); Sabrina Lazar, BS (7); Heli S Patel, MBA (8); Noopur Gangopadhyay, MD (9); Wendy Chen, MD (10)
(1) Mayo Clinic, Scottsdale, AZ; (2) University of Pittsburgh, Pittsburgh, PA; (3) Stanford University, Palo Alto, CA; (4) California University of Science and Medicine, Colton, CA; (5) Ross University, Miramar, FL; (6) University of Texas Medical Branch, Galveston, TX; (7) Albany Medical College, Albany, NY; (8) Nova Southeastern University College of Allopathic Medicine, Davie, FL; (9) Ann & Robert H. Lurie Children's Hospital, Chicago, IL; (10) The Johns Hopkins University School of Medicine, Baltimore, MD
Purpose: Artificial intelligence (AI) models are already widely applied in medicine; however, recent studies have revealed significant gender and racial gaps in the use of AI for patient care and education. As a result, there is growing concern that these gaps may lead to unintended biases and inequalities in patient care. Furthermore, demographic disparities are well documented in many surgical subspecialties, including hand surgery, where women and people of color are often in the minority. This study analyzes the demographic representation of hand surgeons in AI-generated images to identify any disparities and to examine their implications for both the medical community and broader society.
Methods: We assessed three popular, publicly available AI text-to-image generators: DALL-E 3, Midjourney, and DreamStudio. Three reviewers independently evaluated over 300 AI-generated images, categorizing each by gender (female or male) and race (non-White or White). Inter-rater reliability was determined using Cohen's kappa. Chi-square tests were performed to compare the proportions of female and non-White hand surgeons in the AI-generated images with current demographic data for hand surgeons in the United States. The threshold for statistical significance was set at alpha = 0.01.
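To illustrate the inter-rater reliability step, the following is a minimal sketch of Cohen's kappa for two raters, written in plain Python. The function name `cohens_kappa` and the six example labels are hypothetical and are not taken from the study. The abstract does not say how agreement was combined across the three reviewers, so only the standard two-rater statistic is shown here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six images (illustration only, not study data).
rater_a = ["White", "White", "non-White", "White", "non-White", "White"]
rater_b = ["White", "non-White", "non-White", "White", "non-White", "White"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this toy example the raters agree on five of six images (observed agreement 0.833) against a chance agreement of 0.5, giving a kappa of about 0.667.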
Results: Cohen's kappa across the three AI platforms was 0.608 for racial agreement (moderate to substantial agreement) and 1 for gender agreement (perfect agreement). Cohen's kappa did not differ between AI platforms for either gender or racial agreement. For DALL-E 3, the proportion of rater-identified White surgeons in the image output (64%) differed significantly from the national average for plastic and reconstructive (PR) surgeons (76.6% White, p<0.01). In contrast, the proportion of male surgeons in DALL-E 3 output (91%) did not differ significantly from the national average for male PR surgeons (83%, p=0.03). Midjourney outputs depicted exclusively White (100%) and male (100%) PR surgeons, both significantly higher than the national averages (p<0.01). DreamStudio outputs reflected the national average for male PR surgeons (81%, p=0.59) but showed significantly more White PR surgeons (97%) than the national average.
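The comparisons against national averages can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. The sketch below is an illustration under stated assumptions, not the authors' analysis: the abstract reports only "over 300" images in total, so the figure of 100 images per platform (`ASSUMED_N`) is an assumption, and the helper name `chi_square_gof_binary` is hypothetical. The percentages are those reported above.

```python
import math


def chi_square_gof_binary(observed_count, n, expected_proportion):
    """Chi-square goodness-of-fit test for a two-category outcome (df = 1).

    Compares an observed count out of n against an expected proportion
    and returns (chi-square statistic, p-value).
    """
    expected_in = n * expected_proportion
    expected_out = n * (1 - expected_proportion)
    observed_out = n - observed_count

    chi2 = ((observed_count - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For df = 1 the chi-square survival function reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


ASSUMED_N = 100  # assumption: images per platform is not stated in the abstract
ALPHA = 0.01     # significance threshold given in the Methods

# (description, observed count under ASSUMED_N, national proportion from the abstract)
comparisons = [
    ("DALL-E 3, White", 64, 0.766),
    ("DALL-E 3, male", 91, 0.83),
    ("DreamStudio, male", 81, 0.83),
]

results = {}
for name, count, national in comparisons:
    chi2, p = chi_square_gof_binary(count, ASSUMED_N, national)
    results[name] = (chi2, p)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3f} ({verdict} at alpha = {ALPHA})")
```

Under this assumed sample size, the p-values for the two male comparisons come out near 0.03 and 0.59, in line with the reported values, and the DALL-E 3 race comparison falls below 0.01. A different sample size per platform would change the p-values.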
Conclusion: As AI technologies continue to shape healthcare, our study underscores the urgency of cultivating more inclusive AI datasets that accurately reflect the growing diversity within hand surgery. Addressing this gap is crucial for fostering equitable advancements in AI applications, enhancing medical education, and ensuring a comprehensive understanding of hand surgery.