Inferring high-fat dietary patterns from electronic health record data using machine learning.

Yeh, Y.-Y., Lin, H.-Y., Guo, J., Sun, R. C., Jiang, S., Bian, J., & Dai, H. (2026). Inferring high-fat dietary patterns from electronic health record data using machine learning. JAMIA Open, 9(1), ooaf181.

Abstract

OBJECTIVES: Electronic health records (EHRs) rarely capture dietary detail, limiting diet-disease research. We aimed to develop machine learning (ML) computable phenotypes to identify high-fat diet (HFD) using variables typically available in EHRs.

MATERIALS AND METHODS: We used National Health and Nutrition Examination Survey (NHANES) 1999-2020 data, where 24-h dietary recall served as ground truth. Dietary fat intake was summarized into a score (0-30) based on percent energy from fat, carbohydrate, and protein; lower scores indicated HFD. We defined HFD at cutoffs of 10, 15, and 20, and trained ML models (Extreme Gradient Boosting, logistic regression, random forest) using EHR-compatible variables (demographics, comorbidities, labs, anthropometrics). Model interpretability was assessed using Shapley Additive Explanations. To evaluate clinical relevance, we compared cancer associations using ML-predicted vs true diet labels.
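The labeling step described above can be illustrated with a minimal sketch. It assumes a precomputed diet score in the 0-30 range and treats a score strictly below the cutoff as HFD. The abstract says only that lower scores indicate HFD, so the strict comparison, the helper names `label_hfd` and `labels_by_cutoff`, and the example scores are all assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Cutoffs named in the abstract; everything else here is hypothetical.
CUTOFFS = (10, 15, 20)


def label_hfd(score: float, cutoff: int) -> int:
    """Return 1 (HFD) if the score falls below the cutoff, else 0.

    Assumption: the comparison is strict ("<"). The abstract does not say
    whether a score equal to the cutoff counts as HFD.
    """
    if not 0 <= score <= 30:
        raise ValueError(f"score {score} is outside the 0-30 range")
    return int(score < cutoff)


def labels_by_cutoff(scores: Sequence[float]) -> Dict[int, List[int]]:
    """Build one binary label vector per cutoff."""
    return {c: [label_hfd(s, c) for s in scores] for c in CUTOFFS}


# Made-up scores for illustration only, not NHANES data.
scores = [4, 12, 18, 25]
labels = labels_by_cutoff(scores)
for cutoff, y in labels.items():
    print(cutoff, y)
```

Under this reading, a higher cutoff labels more participants as HFD, which matches the abstract's description of cutoff 20 as the broader definition.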

RESULTS: Machine learning models classified HFD with good performance, strongest at broader definitions. Random forest achieved an F1-score of 0.79 (recall 0.74, precision 0.84) at cutoff 20. Key predictors included race/ethnicity, triglycerides, obesity metrics (body mass index and derived indices), and metabolic panel results.
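As a quick consistency check, the reported F1-score can be recomputed from the reported precision and recall using the standard definition (the harmonic mean of the two). The function name `f1_from_precision_recall` is a hypothetical helper; only the three reported numbers come from the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for random forest at cutoff 20.
f1 = f1_from_precision_recall(precision=0.84, recall=0.74)
print(round(f1, 2))  # 0.79
```

The recomputed value rounds to 0.79, in agreement with the figure stated in the results.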

DISCUSSION: These findings indicate that dietary patterns, though seldom recorded in EHRs, can be inferred from routinely available variables. The ability of ML-derived phenotypes to reproduce known diet-disease relationships underscores their epidemiologic validity. Top predictors also align with established biological pathways linking obesity, lipid metabolism, and cancer risk, supporting plausibility.

CONCLUSION: A high-fat dietary pattern can be inferred from EHR-compatible variables using ML-based phenotyping. This approach offers a scalable tool to integrate diet into EHR-based research and precision medicine.

Last updated on 04/01/2026