Purpose
To evaluate whether large language model (LLM)-generated patient education materials for cataract surgery vary in readability, length, and accuracy based on demographic modifiers, including race, gender, geography, and insurance status.

Design
Cross-sectional study.

Methods
This study analyzed 7,000 responses from five LLMs (ChatGPT, Claude, Copilot, DeepSeek, and Gemini) between March and May 2025, using 280 standardized prompts that varied by race, gender, province/territory, and insurance coverage. Each prompt was submitted five times to each model (280 prompts × 5 submissions × 5 LLMs = 7,000 responses). Readability was assessed using the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease (FRE), and SMOG index. Accuracy was assessed by two blinded reviewers against American Academy of Ophthalmology (AAO) clinical guidelines. ANOVA was performed (α = 0.05).

Results
LLM outputs differed significantly across all metrics (p < 0.001). Gemini generated the longest (876 ± 143 words) and among the most complex text (FKGL 11.9 ± 1.2). Race, insurance status, and geography significantly affected readability. Prompts referencing Indigenous patients yielded the most complex responses (FKGL 11.1 ± 1.8; FRE 36.5 ± 7.9). Prompts specifying insured patients elicited longer, more complex responses than uninsured prompts (FKGL 11.0 ± 1.7 vs. 10.8 ± 1.7; 429 vs. 399 words; p < 0.001). Responses to prompts from Nunavut and Manitoba were the least readable (FKGL ≥ 11.1), while those from Quebec and Prince Edward Island were the most readable. Gender had minimal impact. No outputs contained clinically unsafe information, but most lacked sufficient depth. None of the responses met the American Medical Association (AMA) sixth-grade readability recommendation.

Conclusion
LLM-generated patient education for cataract surgery varies with patient demographics. These disparities may hinder equitable access to health information and highlight the need for bias-aware development of AI tools in healthcare.
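For reference, the three readability indices named in the Methods follow their standard published definitions (the abstract does not state any modification to them). With W = word count, S = sentence count, Y = syllable count, and P = count of polysyllabic words (three or more syllables):

```latex
\begin{align*}
\text{FRE}  &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W} \\
\text{FKGL} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59 \\
\text{SMOG} &= 1.0430\,\sqrt{P \cdot \frac{30}{S}} + 3.1291
\end{align*}
```

Lower FKGL and SMOG values and higher FRE values indicate more readable text; the AMA's sixth-grade target corresponds roughly to FKGL ≤ 6.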
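As an illustrative sketch only: the abstract does not name the software used for scoring or statistics, but the described pipeline (readability metrics per response, then ANOVA at α = 0.05) could be reproduced with the textstat and SciPy libraries. The response texts and group labels below are hypothetical placeholders, not study data.

```python
# Hypothetical sketch of the Methods pipeline; textstat and SciPy are
# assumptions, as the abstract does not specify the tooling used.
import textstat
from scipy import stats

# Placeholder LLM responses grouped by one demographic modifier
# (here, insurance status); the real study used 7,000 responses.
responses_by_group = {
    "insured":   ["Cataract surgery replaces the eye's clouded lens.",
                  "Phacoemulsification removes the opacified lens."],
    "uninsured": ["The doctor takes out the cloudy lens in your eye.",
                  "Surgery swaps the cloudy lens for a clear one."],
}

# Score each response with the three metrics reported in the study.
fkgl_by_group = {
    group: [textstat.flesch_kincaid_grade(text) for text in texts]
    for group, texts in responses_by_group.items()
}
sample = responses_by_group["insured"][0]
fre = textstat.flesch_reading_ease(sample)   # higher = easier to read
smog = textstat.smog_index(sample)           # grade-level estimate

# One-way ANOVA comparing FKGL across groups (alpha = 0.05).
f_stat, p_value = stats.f_oneway(*fkgl_by_group.values())
print(f"FKGL ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Readability differs significantly across groups.")
```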