Metabolomic epidemiology is the high-throughput study of the relationship between metabolites and health-related traits. This emerging and rapidly growing field has improved our understanding of disease aetiology and contributed to advances in precision medicine. As the field continues to develop, metabolomic epidemiology could lead to the discovery of diagnostic biomarkers predictive of disease risk, aiding in earlier disease detection and better prognosis. In this Review, we discuss key advances facilitated by the field of metabolomic epidemiology for a range of conditions, including cardiometabolic diseases, cancer, Alzheimer's disease and COVID-19, with a focus on potential clinical utility. Core principles in metabolomic epidemiology, including study design, causal inference methods and multi-omic integration, are briefly discussed. Future directions required for clinical translation of metabolomic epidemiology findings are summarized, emphasizing public health implications. Further work is needed to establish which metabolites reproducibly improve clinical risk prediction in diverse populations and are causally related to disease progression.
Publications
2023
BACKGROUND: Current clinical decision tools for assessing bleeding risk in individuals with atrial fibrillation (AF) have limited performance and were developed for individuals treated with warfarin. This study develops and validates a clinical risk score to personalize estimates of bleeding risk for individuals with atrial fibrillation taking direct-acting oral anticoagulants (DOACs).
METHODS: In this secondary analysis of the RE-LY trial (Randomized Evaluation of Long-Term Anticoagulation Therapy), a risk score was developed among individuals taking dabigatran 150 mg twice per day at 951 centers in 44 countries to determine the comparative risk for bleeding on the basis of covariates derived in a Cox proportional hazards model. The risk prediction model was internally validated with bootstrapping. The model was then further developed in the GARFIELD-AF registry (Global Anticoagulant Registry in the Field-Atrial Fibrillation), with individuals taking dabigatran, edoxaban, rivaroxaban, and apixaban. To determine generalizability in external cohorts and among individuals on different DOACs, the risk prediction model was validated in the COMBINE-AF (A Collaboration Between Multiple Institutions to Better Investigate Non-Vitamin K Antagonist Oral Anticoagulant Use in Atrial Fibrillation) pooled clinical trial cohort and in the Régie de l'Assurance Maladie du Québec and Med-Echo administrative databases (RAMQ) from Quebec. The primary outcome was major bleeding. The risk score, termed the DOAC Score, was compared with the HAS-BLED score.
RESULTS: Of the 5684 patients in RE-LY, 386 (6.8%) experienced a major bleeding event within a median follow-up of 1.74 years. The prediction model had an optimism-corrected C statistic of 0.73 after internal validation with bootstrapping and was well calibrated based on visual inspection of calibration plots (goodness-of-fit P=0.57). The DOAC Score assigned points for age, creatinine clearance/glomerular filtration rate, underweight status, stroke/transient ischemic attack/embolism history, diabetes, hypertension, antiplatelet use, nonsteroidal anti-inflammatory drug use, liver disease, and bleeding history, with each additional point associated with a 48.7% (95% CI, 38.9%-59.3%; P<0.001) increase in major bleeding in RE-LY. The score had superior performance to the HAS-BLED score in RE-LY (C statistic, 0.73 versus 0.60; P for difference <0.001) and among 12 296 individuals in GARFIELD-AF (C statistic, 0.71 versus 0.66; P for difference = 0.025). The DOAC Score had stronger predictive performance than the HAS-BLED score in both validation cohorts, including 25 586 individuals in COMBINE-AF (C statistic, 0.67 versus 0.63; P for difference <0.001) and 11 945 individuals in RAMQ (C statistic, 0.65 versus 0.58; P for difference <0.001).
CONCLUSIONS: In individuals with atrial fibrillation potentially eligible for DOAC therapy, the DOAC Score can help stratify patients on the basis of expected bleeding risk.
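The C statistic used above to compare the DOAC Score with HAS-BLED can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the authors' code: the function name `harrell_c` and the toy inputs are assumptions made for illustration, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c(times, events, scores):
    """Harrell's concordance index (C statistic) for right-censored data.

    times  : follow-up time for each individual
    events : 1 if the event (e.g. major bleeding) was observed, 0 if censored
    scores : risk score for each individual (higher = higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the times differ and the
        # individual with the shorter time actually had the event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0   # higher score failed earlier: concordant
        elif scores[a] == scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four individuals, one censored at time 3.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
toy_scores = [4, 3, 2, 1]
print(harrell_c(toy_times, toy_events, toy_scores))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported values (for example, 0.73 versus 0.60 in RE-LY) are read.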
Although many novel gene-metabolite and gene-protein associations have been identified using high-throughput biochemical profiling, systematic studies that leverage human genetics to illuminate causal relationships between circulating proteins and metabolites are lacking. Here, we performed protein-metabolite association studies in 3,626 plasma samples from three human cohorts. We detected 171,800 significant protein-metabolite pairwise correlations between 1,265 proteins and 365 metabolites, including established relationships in metabolic and signaling pathways such as the protein thyroxine-binding globulin and the metabolite thyroxine, as well as thousands of new findings. In Mendelian randomization (MR) analyses, we identified putative causal protein-to-metabolite associations. We experimentally validated top MR associations in proof-of-concept plasma metabolomics studies in three murine knockout strains of key protein regulators. These analyses identified previously unrecognized associations between bioactive proteins and metabolites in human plasma. We provide publicly available data to be leveraged for studies in human metabolism and disease.
BACKGROUND: Black adults have higher incidence of all-cause death and worse cardiovascular outcomes when compared to other populations. The Duffy chemokine receptor is not expressed in a large majority of Black adults and the clinical implications of this are unclear.
METHODS: Here, we investigated the relationship of Duffy receptor status, high-sensitivity C-reactive protein (hs-CRP), and long-term cardiovascular outcomes in Black members of two contemporary, longitudinal cohort studies (the Jackson Heart Study and the Multi-Ethnic Study of Atherosclerosis). Data on 4,307 Black participants (2,942 Duffy null and 1,365 Duffy receptor positive, as defined using the single-nucleotide polymorphism (SNP) rs2814778) were included in this analysis.
RESULTS: Duffy null status was not independently associated with elevated serum hs-CRP levels after conditioning on known CRP locus alleles in linkage disequilibrium with the Duffy gene. Duffy null status was not found to be independently associated with higher incidence of all-cause mortality or secondary outcomes after adjusting for possible confounders in Black participants.
CONCLUSIONS: These findings suggest that the increased levels of hs-CRP found in Duffy null individuals are due to co-inheritance of CRP alleles known to influence circulating levels of hs-CRP and that Duffy null status was not associated with worse outcomes over the follow-up period in this cohort of well-balanced Black participants.
BACKGROUND: Sickle cell trait affects approximately 8% of Black individuals in the United States, along with many other individuals with ancestry from malaria-endemic regions worldwide. While traditionally considered a benign condition, recent evidence suggests that sickle cell trait is associated with lower eGFR and higher risk of kidney diseases, including kidney failure. The mechanisms underlying these associations remain poorly understood. We used proteomic profiling to gain insight into the pathobiology of sickle cell trait.
METHODS: We measured proteomics (N=1285 proteins assayed by Olink Explore) using baseline plasma samples from 592 Black participants with sickle cell trait and 1:1 age-matched Black participants without sickle cell trait from the prospective Women's Health Initiative cohort. Age-adjusted linear regression was used to assess the association between protein levels and sickle cell trait.
RESULTS: In age-adjusted models, 35 proteins were significantly associated with sickle cell trait after correction for multiple testing. Several of the sickle cell trait-protein associations were replicated in Black participants from two independent cohorts (Atherosclerosis Risk in Communities study and Jackson Heart Study) assayed using an orthogonal aptamer-based proteomic platform (SomaScan). Many of the validated sickle cell trait-associated proteins are known biomarkers of kidney function or injury (e.g., hepatitis A virus cellular receptor 1 [HAVCR1]/kidney injury molecule-1 [KIM-1], uromodulin [UMOD], ephrins), related to red cell physiology or hemolysis (erythropoietin [EPO], heme oxygenase 1 [HMOX1], and α-hemoglobin stabilizing protein) and/or inflammation (fractalkine, C-C motif chemokine ligand 2/monocyte chemoattractant protein-1 [MCP-1], and urokinase plasminogen activator surface receptor [PLAUR]). A protein risk score constructed from the top sickle cell trait-associated biomarkers was associated with incident kidney failure among those with sickle cell trait during Women's Health Initiative follow-up (odds ratio, 1.32; 95% confidence interval, 1.10 to 1.58).
CONCLUSIONS: We identified and replicated the association of sickle cell trait with a number of plasma proteins related to hemolysis, kidney injury, and inflammation.
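The age-adjusted linear regression described in the methods above can be sketched for a single protein as follows. This is an illustrative sketch only, not the study's analysis code: the function names and the toy values are assumptions, and the coefficient is obtained by residualizing both the protein level and the trait indicator on age, which gives the same estimate as the trait coefficient in a regression of protein level on trait and age together.

```python
def residualize(values, covariate):
    """Return residuals of `values` after simple linear regression on `covariate`."""
    n = len(values)
    v_mean = sum(values) / n
    c_mean = sum(covariate) / n
    sxy = sum((v - v_mean) * (c - c_mean) for v, c in zip(values, covariate))
    sxx = sum((c - c_mean) ** 2 for c in covariate)
    slope = sxy / sxx
    return [(v - v_mean) - slope * (c - c_mean) for v, c in zip(values, covariate)]


def age_adjusted_effect(protein, trait, age):
    """Age-adjusted regression coefficient of protein level on trait status.

    protein : protein level per participant
    trait   : 1 if the participant carries the trait, 0 otherwise
    age     : age per participant (the adjustment covariate)
    """
    protein_resid = residualize(protein, age)
    trait_resid = residualize(trait, age)
    numerator = sum(p * t for p, t in zip(protein_resid, trait_resid))
    denominator = sum(t * t for t in trait_resid)
    return numerator / denominator


# Hypothetical toy data constructed as protein = 2 * trait + 0.1 * age.
toy_age = [50, 60, 70, 80]
toy_trait = [0, 1, 1, 0]
toy_protein = [5, 8, 9, 8]
print(age_adjusted_effect(toy_protein, toy_trait, toy_age))
```

In a proteome-wide setting, one such coefficient would be estimated per protein, and the resulting P values would then be corrected for multiple testing.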
Cardiometabolic diseases, including cardiovascular disease and diabetes, are major causes of morbidity and mortality worldwide. Despite progress in prevention and treatment, recent trends show a stalling in the reduction of cardiovascular disease morbidity and mortality, paralleled by increasing rates of cardiometabolic disease risk factors in young adults, underscoring the importance of risk assessments in this population. This review highlights the evidence for molecular biomarkers for early risk assessment in young individuals. We examine the utility of traditional biomarkers in young individuals and discuss novel, nontraditional biomarkers specific to pathways contributing to early cardiometabolic disease risk. Additionally, we explore emerging omic technologies and analytical approaches that could enhance risk assessment for cardiometabolic disease.
Circulating metabolite levels may reflect the state of the human organism in health and disease; however, the genetic architecture of metabolites is not fully understood. We performed a whole-genome sequencing association analysis of both common and rare variants in up to 11,840 multi-ethnic participants from five studies with up to 1666 circulating metabolites. We discovered 1985 novel variant-metabolite associations and validated 761 previously reported locus-metabolite associations. Seventy-nine novel variant-metabolite associations were replicated, including three genetic loci on the X chromosome, demonstrating its involvement in metabolic regulation. Gene-based analyses provided further support for seven replicated metabolite-locus pairs and their biologically plausible genes. Among the novel replicated variant-metabolite pairs, follow-up analyses revealed that 26 metabolites colocalized with 21 tissues, seven metabolite-disease outcome associations were putatively causal, and seven metabolites might be regulated by plasma protein levels. Our results depict the genetic contribution to circulating metabolite levels, providing additional insights into human disease.
N-acyl amino acids are a large family of circulating lipid metabolites that modulate energy expenditure and fat mass in rodents. However, little is known about the regulation and potential cardiometabolic functions of N-acyl amino acids in humans. Here, we analyze the cardiometabolic phenotype associations and genomic associations of four plasma N-acyl amino acids (N-oleoyl-leucine, N-oleoyl-phenylalanine, N-oleoyl-serine, and N-oleoyl-glycine) in 2351 individuals from the Jackson Heart Study. We find that plasma levels of specific N-acyl amino acids are associated with cardiometabolic disease endpoints independent of free amino acid plasma levels and in patterns according to the amino acid head group. By integrating whole genome sequencing data with N-acyl amino acid levels, we find that the genetic determinants of N-acyl amino acid levels also cluster according to the amino acid head group. Furthermore, we identify the CYP4F2 locus as a genetic determinant of N-oleoyl-leucine and N-oleoyl-phenylalanine levels in human plasma. In experimental studies, we demonstrate that CYP4F2-mediated hydroxylation of N-oleoyl-leucine and N-oleoyl-phenylalanine results in metabolic diversification and production of many previously unknown lipid metabolites with varying characteristics of the fatty acid tail group, including several that structurally resemble fatty acid hydroxy fatty acids. These studies provide a structural framework for understanding the regulation and disease associations of N-acyl amino acids in humans and show that the diversity of this lipid signaling family can be significantly expanded through CYP4F-mediated ω-hydroxylation.