Publications

  • Dymm, Braydon, and Daniel M. Goldenholz. 2026. “Prompting Is All You Need: How to Make LLMs More Helpful for Clinical Decision Support.” medRxiv: The Preprint Server for Health Sciences. https://doi.org/10.64898/2026.02.12.26346005.

    IMPORTANCE: Large language models (LLMs) offer potential for clinical decision support, but their accuracy varies. Prompt engineering can enhance LLM behavior, yet best practices have yet to be formally explored in realistic clinical contexts in neurology.

    OBJECTIVE: To evaluate the impact of structured prompting versus naive prompting on the performance of five LLMs (two closed-source: OpenAI GPT-4o and OpenAI o3; three open-source: Meta Llama-4-Scout-17B-16E-Instruct, Llama-3.3-70B-Instruct-Turbo, and the reasoning model r1-1776) for thrombolytic clinical decision support (CDS) in acute stroke.

    DESIGN: Models responded to three novel ischemic stroke vignettes using either a naive question ("Should this patient be offered thrombolytics?") or a five-step structured prompt (CARDS) guiding information extraction, timing analysis, contraindication checking, decision process explanation, and risk-benefit discussion. Outputs were assessed across seven domains: guideline adherence, unsafe recommendations, risk recognition, guideline grading accuracy, inclusion of conversational explanation, clarity, and overall helpfulness.

    RESULTS: Structured prompts significantly enhanced performance across most domains, with varying effects between model families. For closed-source models (GPT-4o, o3), prompts structured in the CARDS style improved guideline adherence from 83.3% to 100%, eliminated unsafe recommendations (16.7% to 0%), and increased specific guideline grading accuracy from 0% to 100%. Similarly, the open-source reasoning model r1-1776 achieved these top-tier outcomes (100% adherence, 0% unsafe, 100% grading, 100% conversation) when structured prompts were applied, with grading and conversation improving from 0%. In contrast, other open-source models (Llama-4-Scout, Llama-3.3-70B) showed more modest gains: risk recognition improved (83.3% to 100%) and guideline grading accuracy increased (0% to 66.7%), while guideline adherence remained at 66.7% and unsafe recommendations persisted at 33.3%. Overall, structured prompting yielded the largest improvements in guideline grading accuracy and conversational reasoning across multiple models.

    CONCLUSION AND RELEVANCE: Structured prompting substantially enhances LLM performance for acute stroke thrombolysis CDS. Notably, some models, including the proprietary GPT-4o and o3, and the open-source reasoning model r1-1776, achieved excellent safety and adherence with structured prompts. For clinical deployment of any LLM, structured prompts are crucial, and vigilant human oversight remains essential.
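    The five-step CARDS structure named in the abstract can be illustrated with a short sketch. The step list below paraphrases the five named stages (information extraction, timing analysis, contraindication checking, decision process explanation, and risk-benefit discussion); the exact prompt wording used in the study is not reproduced here, and the vignette text and function name are hypothetical.

    ```python
    # Hypothetical sketch of a CARDS-style structured prompt versus a naive
    # prompt, following the five steps named in the abstract. The step
    # phrasings and helper function are illustrative assumptions, not the
    # study's actual prompts.

    NAIVE_PROMPT = "Should this patient be offered thrombolytics?"

    CARDS_STEPS = [
        "1. Extract the relevant clinical information from the vignette.",
        "2. Analyze symptom-onset timing relative to the thrombolysis window.",
        "3. Check for contraindications to thrombolytics.",
        "4. Explain your decision process step by step.",
        "5. Discuss the risks and benefits of the recommended course.",
    ]

    def build_cards_prompt(vignette: str) -> str:
        """Assemble a structured prompt from a stroke vignette (illustrative)."""
        steps = "\n".join(CARDS_STEPS)
        return f"{vignette}\n\nWork through the following steps:\n{steps}"

    if __name__ == "__main__":
        # Hypothetical vignette text for demonstration only.
        print(build_cards_prompt("A patient presents with acute ischemic stroke symptoms."))
    ```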

  • Goldenholz, Daniel M., Joshua C. Cheng, Chi-Yuan Chang, Robert Moss, and Brandon Westover. 2026. “Does Missing Medication Acutely Change Seizure Risk? A Prospective Study.” Annals of Neurology 99 (4): 1076-82. https://doi.org/10.1002/ana.78134.

    OBJECTIVE: To determine whether missing individual doses of anti-seizure medications (ASMs) elevates short-term seizure risk in people with drug-resistant epilepsy.

    METHODS: In a prospective, community-based cohort, adults with drug-resistant epilepsy (≥ 3 seizures/month) or their caregivers recorded seizures and ASM intake with smartphone applications for 10 months each. Individual-level analysis modeled the relationships between ASM adherence and seizure occurrence, as well as with a simplified seizure forecast via a 90-day moving average ("Napkin method"). Group-level analysis with a mixed-effects model examined the relationship between ASM adherence and simplified forecasts, while controlling for differences in individual seizure frequency via random effects.

    RESULTS: Twenty-seven participants (median age = 29 years) contributed 7,853 person-days. Individual-level analysis showed that only a small number (n = 2) of participants had a weak relationship between ASM adherence and seizure occurrence. Group-level analysis showed that seizure occurrence was strongly linked to the Napkin method forecast, but not to ASM adherence.

    INTERPRETATION: Among individuals with frequent, drug-resistant epilepsy, occasional missed ASM doses did not measurably raise immediate seizure risk. Whereas sustained nonadherence remains a clinical concern, clinicians may reassure patients that infrequent brief lapses are unlikely to trigger seizures acutely, yet they should continue emphasizing overall adherence for long-term seizure control. ANN NEUROL 2026;99:1076-1082.
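    The "Napkin method" named in the abstract is a simplified forecast via a 90-day moving average. A minimal sketch of one plausible reading, assuming a daily 0/1 seizure diary and a trailing 90-day window (the function name and implementation details are assumptions, not the study's code):

    ```python
    # Minimal sketch of a 90-day moving-average seizure forecast, one reading
    # of the "Napkin method" described in the abstract. The window length and
    # daily binary diary come from the study description; everything else is
    # an illustrative assumption.

    def napkin_forecast(daily_seizures: list[int], window: int = 90) -> list:
        """For each day, forecast seizure risk as the mean of the prior
        `window` days of the 0/1 seizure diary (None until enough history)."""
        forecasts = []
        for day in range(len(daily_seizures)):
            if day < window:
                forecasts.append(None)  # not enough history yet
            else:
                past = daily_seizures[day - window:day]
                forecasts.append(sum(past) / window)
        return forecasts
    ```

    For example, with a diary recording a seizure every third day, the forecast from day 90 onward sits near 1/3, i.e., the trailing base rate rather than a dose-by-dose signal.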