Detecting stigmatizing language in clinical notes with large language models for addiction care.

Sethi, R., Caskey, J., Gao, Y., Churpek, M. M., Miller, T. A., Mayampurath, A., Salisbury-Afshar, E., Afshar, M., & Dligach, D. (2026). Detecting stigmatizing language in clinical notes with large language models for addiction care. npj Health Systems, 3(1), 15.

Abstract

Intensive care units (ICUs) produce numerous progress notes that may contain stigmatizing language, which perpetuates negative biases and punitive approaches toward patients. Patients with substance use disorders are particularly vulnerable to stigma. This study examined the performance of Large Language Models (LLMs) in identifying stigmatizing language. We annotated a dataset of over 77,000 stigmatizing and non-stigmatizing notes from the MIMIC-III database. Using Meta's Llama-3 8B Instruct LLM, we ran the following experiments for stigma detection: zero-shot prompting; in-context learning; in-context learning with selective retrieval; supervised fine-tuning (SFT); and keyword search. All approaches were evaluated on a held-out test set and an external validation set (University of Wisconsin Health System). SFT had the best performance with 97.2% accuracy, followed by in-context learning. The LLMs with in-context learning and SFT provided appropriate reasoning for false positives during human review, and both approaches identified clinical notes with stigmatizing language that were missed during annotation. SFT achieved 97.9% accuracy on the external validation dataset. LLMs, particularly with SFT and in-context learning, identify stigmatizing language in ICU notes with high accuracy, explain their reasoning in an asynchronous fashion, and can flag novel stigmatizing language not explicitly present in the training data or in existing guidelines.
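As a rough illustration of the zero-shot setting described above, the sketch below classifies a note excerpt with Meta's Llama-3 8B Instruct via the Hugging Face transformers chat pipeline. The prompt wording, label set, and output parsing are assumptions for illustration only, not the authors' published protocol.

```python
from transformers import pipeline

# Minimal zero-shot sketch (illustrative, not the authors' exact setup):
# ask an instruction-tuned LLM to label a clinical note excerpt and explain why.
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

def classify_note(note_text: str) -> str:
    """Return the model's label and brief rationale for one note excerpt."""
    messages = [
        {
            "role": "system",
            "content": (
                "Label the clinical note excerpt as STIGMATIZING or "
                "NON-STIGMATIZING and briefly explain your reasoning."
            ),
        },
        {"role": "user", "content": f"Note excerpt:\n{note_text}"},
    ]
    out = generator(messages, max_new_tokens=64, do_sample=False)
    # The chat pipeline returns the full conversation; the assistant reply is last.
    return out[0]["generated_text"][-1]["content"]

print(classify_note("Patient is a known drug abuser, non-compliant with treatment."))
```

The in-context learning variants in the study would add labeled example notes to the prompt before the target excerpt; SFT would instead update the model weights on the annotated corpus.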
