Abstract
BACKGROUND: Accurate quantification of sodium intake based on self-reported dietary assessments has been a persistent challenge. We aimed to apply machine-learning (ML) algorithms to predict 24-hour urinary sodium excretion from self-reported questionnaire information.
METHODS AND RESULTS: We analyzed 3454 participants from the NHS (Nurses' Health Study), NHS-II (Nurses' Health Study II), and HPFS (Health Professionals Follow-Up Study) who had repeated measures of 24-hour urinary sodium excretion over 1 year. We used an ensemble approach to predict averaged 24-hour urinary sodium excretion from 36 characteristics. The TOHP-I (Trial of Hypertension Prevention I) was used for external validation. The final ML algorithms were applied to 167 920 nonhypertensive adults with 30-year follow-up to estimate confounder-adjusted hazard ratios (HRs) of incident hypertension for predicted sodium. Averaged 24-hour urinary sodium excretion was better predicted and calibrated with ML than with the food frequency questionnaire (Spearman correlation coefficient, 0.51 [95% CI, 0.49-0.54] with ML; 0.19 [95% CI, 0.16-0.23] with the food frequency questionnaire; 0.46 [95% CI, 0.42-0.50] in the TOHP-I). However, the prediction depended heavily on body size, and the prediction of energy-adjusted 24-hour sodium excretion was only modestly better using ML. ML-predicted sodium was modestly more strongly associated with incident hypertension than food frequency questionnaire-based sodium in the NHS-II (HR comparing Q5 versus Q1, 1.48 [95% CI, 1.40-1.56] with ML; 1.04 [95% CI, 0.99-1.08] with the food frequency questionnaire), but no material differences were observed in the NHS or HPFS.
CONCLUSIONS: The present ML algorithms improved prediction of participants' absolute 24-hour urinary sodium excretion. These algorithms may be a generalizable approach for predicting absolute sodium intake but do not substantially reduce the bias stemming from measurement error in disease associations.
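ILLUSTRATIVE SKETCH: The agreement statistic reported above is a Spearman correlation coefficient with a 95% CI between predicted and measured 24-hour urinary sodium excretion. The following minimal Python sketch shows one way such a statistic could be computed. It is not the authors' code: the percentile bootstrap for the CI is an assumption (the abstract does not state how the CIs were obtained), the function names are hypothetical, and the data in the demonstration are synthetic and unrelated to the study cohorts.

```python
import random
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Spearman's rho (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        # Resample participants with replacement, keeping (x, y) pairs intact.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic data for illustration only; not study data.
    rng = random.Random(42)
    measured = [rng.gauss(0.0, 1.0) for _ in range(200)]
    predicted = [0.5 * m + rng.gauss(0.0, 1.0) for m in measured]
    rho = spearman(predicted, measured)
    low, high = bootstrap_ci(predicted, measured, n_boot=500)
    print(f"Spearman rho = {rho:.2f} (95% CI, {low:.2f}-{high:.2f})")
```

Resampling whole participants keeps each predicted value paired with its measured value, so the interval reflects sampling variability in the rank agreement rather than in either variable alone.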