Back to 2026 ePosters
Orthopedic Hand Surgery: Evaluating the Ability of ChatGPT to Improve Grade-Level Readability in Patient Education Materials
Samuel R Nofsinger, MA, Ben Kirby, MD, Peter Serour, BS, Peter Albrecht, BS, Julia AV Nuelle, MD; Daniel London, MD
University of Missouri, Columbia, MO
Introduction The American Medical Association (AMA) recommends that patient instructions be written at a maximum 6th-grade reading level. However, most educational handouts fail to meet this standard. We hypothesized that ChatGPT can simplify these materials to align with readability guidelines.
Materials & Methods All 95 patient education handouts from the American Society for Surgery of the Hand (ASSH) were independently evaluated using validated readability tools: the Patient Education Materials Assessment Tool (PEMAT), Flesch-Kincaid (FK), Automated Readability Index (ARI), Simple Measure of Gobbledygook (SMOG), Gunning Fog Index (GFI), and Coleman-Liau Index (CLI). FK scores were determined as the average of three separate online calculators; all other readability metrics were determined as the average of two online calculators. Handouts were processed through iterations of OpenAI's ChatGPT 3.5, 4, and 4o with a prompt instructing the model to translate the text to a 6th-grade reading level. Revised outputs were re-assessed using the same readability metrics. Two orthopedic hand surgeons independently evaluated the outputs for content preservation. T-tests were used to compare scores between groups.
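For readers unfamiliar with how grade-level scores are computed, the two Flesch-Kincaid metrics used here can be sketched in Python. This is an illustrative sketch only: the syllable counter below is a simplified heuristic, not the algorithm used by the online calculators in this study, and the function names are our own.

```python
import re

def count_syllables(word):
    # Simplified heuristic: count runs of consecutive vowels,
    # treating a trailing silent 'e' as non-syllabic.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def flesch_kincaid(text):
    """Return (grade level, reading ease) using the standard FK formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    grade = 0.39 * wps + 11.8 * spw - 15.59
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    return grade, ease
```

The formulas make the Results below concrete: shorter sentences lower `wps` and shorter words lower `spw`, both of which reduce the grade-level score and raise the reading-ease score.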
Results Compared to the ASSH originals, ChatGPT-revised handouts had a lower composite grade-level readability (8.6 vs 10.5) but remained above the 6th-grade target (Table 1, Figure 1). ChatGPT 4o improved readability on nearly all metrics: FK Grade Level (8.3 vs 10.3), FK Reading Ease (68.0 vs 54.9), ARI (7.4 vs 9.4), SMOG (8.3 vs 10.2), CLI (9.3 vs 10.9), and GFI (9.6 vs 11.9). Compared to ASSH, ChatGPT-generated handouts had lower total PEMAT scores (ChatGPT 4o: 71.1% vs 79.6%, p < 0.001), primarily due to lower understandability scores (ChatGPT 4o: 76.7% vs 86.3%, p < 0.001) rather than actionability (ChatGPT 4o: 56.4% vs 57.5%, p = 0.70), as shown in Table 2. Changes in readability were primarily due to lower total word count (410 vs 719), shorter sentence length (12 vs 21 words per sentence), and shorter word length (147 vs 155 syllables per 100 words). ChatGPT preserved the content of the original handouts in 94.7% of cases.
Conclusion Current educational handouts exceed the recommended 6th-grade reading level. Although ChatGPT did not reach this target, it increased readability while preserving content. Further refinement of ChatGPT models and prompts may enable better simplification of patient education materials, and AI-assisted revision of surgeon-written material could help address the challenge of making patient education materials more readable.

