New Model, Old Risks? Sociodemographic Bias and Adversarial Hallucinations Vulnerability in GPT-5.

Omar, Mahmud, Reem Agbareia, Donald U Apakama, Carol R Horowitz, Robert Freeman, Alexander W Charney, Girish N Nadkarni, and Eyal Klang. 2025. “New Model, Old Risks? Sociodemographic Bias and Adversarial Hallucinations Vulnerability in GPT-5.” medRxiv: The Preprint Server for Health Sciences.

Abstract

In an extension of our validated benchmarking work, GPT-5 showed no improvement in sociodemographic-linked decision variation compared with GPT-4o and seemed to be worse on several endpoints. We re-tested GPT-5 with a fixed pipeline: 500 physician-validated emergency vignettes, each replayed across 32 sociodemographic labels plus an unlabeled control, answering the same four questions (triage, further testing, treatment level, and need for mental-health assessment). This design holds clinical content constant to isolate the effect of the label. GPT-5 reproduced subgroup-linked variation, with higher assigned urgency and less advanced testing for several historically marginalized and intersectional groups. Notably, several LGBTQIA+ labels were flagged for mental-health screening in 100% of cases, versus 41-73% for comparable groups with GPT-4o. Additionally, in an adversarial re-run that inserted one fabricated medical detail into otherwise standard clinical cases, GPT-5 adopted or elaborated on the fabrication in 65% of runs (vs 53% for GPT-4o). A single mitigation prompt reduced this to 7.67%.
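The label-swap design described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the placeholder identifiers, and the assumption of a single run per vignette-label combination are all hypothetical (the abstract does not state the number of repetitions). Only the sizes (500 vignettes, 32 labels plus one control, four questions) come from the text.

```python
from itertools import product


def build_prompt_grid(vignettes, labels, control="unlabeled"):
    """Pair every vignette with every label plus one unlabeled control.

    Clinical content stays fixed within a vignette; only the label varies.
    """
    variants = [control] + list(labels)
    return list(product(vignettes, variants))


def flag_rate_by_label(records):
    """Compute the fraction of flagged answers per label.

    records: iterable of (label, flagged) pairs, where flagged is a bool.
    """
    totals, flagged = {}, {}
    for label, is_flagged in records:
        totals[label] = totals.get(label, 0) + 1
        flagged[label] = flagged.get(label, 0) + int(is_flagged)
    return {label: flagged[label] / totals[label] for label in totals}


# Sizes taken from the abstract; the identifiers are placeholders.
vignettes = [f"vignette_{i}" for i in range(500)]
labels = [f"label_{j}" for j in range(32)]

grid = build_prompt_grid(vignettes, labels)
n_prompts = len(grid)       # 500 vignettes * 33 variants
n_answers = n_prompts * 4   # four questions per prompt (assumes one run each)
```

Under these assumptions the grid holds 16,500 prompts and 66,000 answers. Comparing `flag_rate_by_label` output for a given label against the unlabeled control is one simple way to express the per-label differences the abstract reports.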

Last updated on 04/09/2026