COVID-19 is one of the most consequential pandemics in the last century, yet the biological mechanisms that confer disease risk are incompletely understood. Further, heterogeneity in disease outcomes is influenced by race, though the relative contributions of structural/social and genetic factors remain unclear. Very recent unpublished work has identified two genetic risk loci that confer greater risk for respiratory failure in COVID-19: the ABO locus and the 3p21.31 locus. To understand how these loci might confer risk and whether this differs by race, we utilized proteomic profiling and genetic information from three cohorts including Black and White participants to identify proteins influenced by these loci. We observed that variants in the ABO locus are associated with levels of CD209/DC-SIGN, a known binding protein for SARS-CoV and other viruses, as well as multiple inflammatory and thrombotic proteins, while the 3p21.31 locus is associated with levels of CXCL16, a known inflammatory chemokine. Thus, integration of genetic information and proteomic profiling in biracial cohorts highlights putative mechanisms for genetic risk in COVID-19 disease.
Multi-omics
High-throughput proteomic profiling using antibody or aptamer-based affinity reagents is used increasingly in human studies. However, direct analyses to address the relative strengths and weaknesses of these platforms are lacking. We assessed findings from the SomaScan1.3K (N = 1301 reagents), the SomaScan5K platform (N = 4979 reagents), and the Olink Explore (N = 1472 reagents) profiling techniques in 568 adults from the Jackson Heart Study and 219 participants in the HERITAGE Family Study across four performance domains: precision, accuracy, analytic breadth, and phenotypic associations leveraging detailed clinical phenotyping and genetic data. Across these studies, we show evidence supporting more reliable protein target specificity and a higher number of phenotypic associations for the Olink platform, while the Soma platforms benefit from greater measurement precision and analytic breadth across the proteome.
Integrating genetic information with metabolomics has provided new insights into genes affecting human metabolism. However, gene-metabolite integration has been primarily studied in individuals of European ancestry, limiting the opportunity to leverage genomic diversity for discovery. In addition, these analyses have principally involved known metabolites, with the majority of the profiled peaks left unannotated. Here, we perform a whole-genome association study of 2,291 metabolite peaks (known and unknown features) in 2,466 Black individuals from the Jackson Heart Study. We identify 519 locus-metabolite associations for 427 metabolite peaks and validate our findings in two multi-ethnic cohorts. A significant proportion of these associations are in ancestry-specific alleles, including findings in APOE, TTR and CD36. We leverage tandem mass spectrometry to annotate unknown metabolites, providing new insight into hereditary diseases including transthyretin amyloidosis and sickle cell disease. Our integrative omics approach leverages genomic diversity to provide novel insights into diverse cardiometabolic diseases.
BACKGROUND: We recently identified 156 proteins in human plasma that were each associated with the net Framingham Cardiovascular Disease Risk Score using an aptamer-based proteomic platform in Framingham Heart Study Offspring participants. Here we hypothesized that performing genome-wide association studies and exome array analyses on the levels of each of these 156 proteins might identify genetic determinants of risk-associated circulating factors and provide insights into early cardiovascular pathophysiology.
METHODS: We studied the association of genetic variants with the plasma levels of each of the 156 Framingham Cardiovascular Disease Risk Score-associated proteins using linear mixed-effects models in 2 population-based cohorts. We performed discovery analyses on plasma samples from 759 participants of the Framingham Heart Study Offspring cohort, an observational study of the offspring of the original Framingham Heart Study and their spouses, and validated these findings in plasma samples from 1421 participants of the MDCS (Malmö Diet and Cancer Study). To evaluate the utility of this strategy in identifying new biological pathways relevant to cardiovascular disease pathophysiology, we performed studies in a cell-model system to experimentally validate the functional significance of an especially novel genetic association with circulating apolipoprotein E levels.
RESULTS: We identified 120 locus-protein associations in genome-wide analyses and 41 associations in exome array analyses, the majority of which have not been described previously. These loci explained up to 66% of interindividual plasma protein-level variation and, on average, accounted for 3 times the amount of variation explained by common clinical factors, such as age, sex, and diabetes mellitus status. We described overlap among many of these loci and cardiovascular disease genetic risk variants. Finally, we experimentally validated a novel association between circulating apolipoprotein E levels and the transcription factor phosphatase 1G. Knockdown of phosphatase 1G in a human liver cell model resulted in decreased apolipoprotein E transcription and apolipoprotein E protein levels in cultured supernatants.
CONCLUSIONS: We identified dozens of novel genetic determinants of proteins associated with the Framingham Cardiovascular Disease Risk Score and experimentally validated a new role for phosphatase 1G in lipoprotein biology. Further, genome-wide and exome array data for each protein have been made publicly available as a resource for cardiovascular disease research.
BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals.
METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651).
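As a rough illustration of the "genetic predictor" step described above, the sketch below computes a predicted protein level as a weighted sum of effect-allele dosages and then standardizes it across a cohort. The abstract does not state which prediction model was used, so the weighted allele score, the variant names (`rs_a`, `rs_b`, `rs_c`), the weights, and the toy cohort are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical per-variant weights for one protein (effect per copy of the
# effect allele). Real weights would be estimated in a discovery cohort.
WEIGHTS = {"rs_a": 0.40, "rs_b": -0.25, "rs_c": 0.10}


def predict_protein(dosages, weights=WEIGHTS):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual.

    Variants missing from `dosages` contribute nothing in this toy version.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def standardize(values):
    """Convert raw predictions to z-scores (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Toy genotyped cohort: one dict of dosages per individual (made-up data).
cohort = [
    {"rs_a": 0, "rs_b": 1, "rs_c": 2},
    {"rs_a": 1, "rs_b": 0, "rs_c": 1},
    {"rs_a": 2, "rs_b": 2, "rs_c": 0},
]

predicted = [predict_protein(d) for d in cohort]
z_scores = standardize(predicted)
```

The standardized scores could then be tested against each clinical phenotype, for example with one logistic regression per protein-phenotype pair.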
RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
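The odds ratios, confidence intervals, and P values reported in the RESULTS above can be checked against one another with a short calculation. This sketch assumes the intervals are Wald intervals, symmetric on the log-odds scale. The abstract does not say so, and the function name `p_from_or_ci` is ours.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate z statistic and two-sided p-value from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability, P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values as printed in the abstract (per 1-SD increment in protein level).
z_clec1b, p_clec1b = p_from_or_ci(1.31, 1.08, 1.58)
z_pdgfrb, p_pdgfrb = p_from_or_ci(0.79, 0.66, 0.94)
```

Under that assumption the recomputed P values fall close to the reported 0.006 and 0.008. Small gaps are expected because the published odds ratios and interval bounds are rounded to two decimals.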
Rationale: Genome-wide association studies have identified genetic loci associated with insulin resistance (IR) but pinpointing the causal genes of a risk locus has been challenging. Objective: To identify candidate causal genes for IR, we screened regional and biologically plausible genes (16 in total) near the top 10 IR-loci in risk-relevant cell types, namely preadipocytes and adipocytes. Methods and Results: We generated 16 human Simpson-Golabi-Behmel syndrome preadipocyte knockout lines, each with a single IR-gene knocked out by a lentivirus-mediated CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 system. We evaluated each gene knockout by screening IR-relevant phenotypes in the 3 insulin-sensitizing mechanisms, including adipogenesis, lipid metabolism, and insulin signaling. We performed genetic analyses using data from the Genotype-Tissue Expression portal expression quantitative trait loci database and the Accelerating Medicines Partnership Type 2 Diabetes Mellitus Knowledge Portal to evaluate whether candidate genes prioritized by our in vitro studies were expression quantitative trait loci genes in human subcutaneous adipose tissue, and whether expression of these genes is associated with risk of IR, type 2 diabetes mellitus, and cardiovascular diseases. We further validated the functions of 3 new adipose IR genes by overexpression-based phenotypic rescue in the Simpson-Golabi-Behmel syndrome preadipocyte knockout lines. Twelve genes, PPARG, IRS-1, FST, PEPD, PDGFC, MAP3K1, GRB14, ARL15, ANKRD55, RSPO3, COBLL1, and LYPLAL1, showed diverse phenotypes in the 3 insulin-sensitizing mechanisms, and the first 7 of these genes could affect all 3 mechanisms.
Five out of 6 expression quantitative trait loci genes are among the top candidate causal genes and the abnormal expression levels of these genes (IRS-1, GRB14, FST, PEPD, and PDGFC) in human subcutaneous adipose tissue could be associated with increased risk of IR, type 2 diabetes mellitus, and cardiovascular disease. Phenotypic rescue by overexpression of the candidate causal genes (FST, PEPD, and PDGFC) in the Simpson-Golabi-Behmel syndrome preadipocyte knockout lines confirmed their function in adipose IR. Conclusions: Twelve genes showed diverse phenotypes indicating differential roles in insulin sensitization, suggesting mechanisms bridging the association of their genomic loci with IR. We prioritized PPARG, IRS-1, GRB14, MAP3K1, FST, PEPD, and PDGFC as top candidate genes. Our work points to novel roles for FST, PEPD, and PDGFC in adipose tissue, with consequences for cardiometabolic diseases.
BACKGROUND: Increased left ventricular (LV) mass is associated with adverse cardiovascular events including heart failure (HF). Both increased LV mass and HF disproportionately affect Black individuals. To understand the underlying mechanisms, we undertook a proteomic screen in a Black cohort and compared the findings to results from a White cohort.
METHODS: We measured 1305 plasma proteins using the SomaScan platform in 1772 Black participants (mean age, 56 years; 62% women) in JHS (Jackson Heart Study) with LV mass assessed by 2-dimensional echocardiography. Incident HF was assessed in 1600 participants. We then compared protein associations in JHS to those observed in White participants from FHS (Framingham Heart Study; mean age, 54 years; 56% women).
RESULTS: In JHS, there were 110 proteins associated with LV mass and 13 proteins associated with incident HF hospitalization with false discovery rate <5% after multivariable adjustment. Several proteins showed expected associations with both LV mass and HF, including NT-proBNP (N-terminal pro-B-type natriuretic peptide; β=0.04; P=2×10⁻⁸; hazard ratio, 1.48; P=0.0001). The strongest association with LV mass was novel: LKHA4 (leukotriene-A4 hydrolase; β=0.05; P=5×10⁻¹⁵). This association was confirmed on an alternate proteomics platform and further supported by related metabolomic data. Fractalkine/CX3CL1 (C-X3-C Motif Chemokine Ligand 1) showed a novel association with incident HF (hazard ratio, 1.32; P=0.0002). While established biomarkers such as cystatin C and NT-proBNP showed consistent associations in Black and White individuals, LKHA4 and fractalkine were significantly different between the two groups.
CONCLUSIONS: We identified several novel biological pathways specific to Black adults hypothesized to contribute to the pathophysiologic cascade of LV hypertrophy and incident HF including LKHA4 and fractalkine.
Although many novel gene-metabolite and gene-protein associations have been identified using high-throughput biochemical profiling, systematic studies that leverage human genetics to illuminate causal relationships between circulating proteins and metabolites are lacking. Here, we performed protein-metabolite association studies in 3,626 plasma samples from three human cohorts. We detected 171,800 significant protein-metabolite pairwise correlations between 1,265 proteins and 365 metabolites, including established relationships in metabolic and signaling pathways such as the protein thyroxine-binding globulin and the metabolite thyroxine, as well as thousands of new findings. In Mendelian randomization (MR) analyses, we identified putative causal protein-to-metabolite associations. We experimentally validated top MR associations in proof-of-concept plasma metabolomics studies in three murine knockout strains of key protein regulators. These analyses identified previously unrecognized associations between bioactive proteins and metabolites in human plasma. We provide publicly available data to be leveraged for studies in human metabolism and disease.
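For readers unfamiliar with Mendelian randomization, the simplest single-variant estimator is the Wald ratio: the variant's effect on the outcome (here a metabolite) divided by its effect on the exposure (here a protein). The sketch below is a generic illustration, not the estimator used in the study, which the abstract does not specify. The input effect sizes are made-up numbers, and the standard error uses a delta-method approximation that assumes the two estimates are uncorrelated.

```python
import math


def wald_ratio(beta_gx, se_gx, beta_gy, se_gy):
    """Single-instrument MR estimate of the effect of exposure X on outcome Y.

    beta_gx, se_gx: variant-exposure (e.g. pQTL) effect and standard error.
    beta_gy, se_gy: variant-outcome (e.g. metabolite) effect and standard error.
    """
    estimate = beta_gy / beta_gx
    # Second-order delta-method variance, assuming zero covariance between
    # the variant-exposure and variant-outcome estimates.
    variance = (se_gy ** 2) / beta_gx ** 2 + (beta_gy ** 2) * (se_gx ** 2) / beta_gx ** 4
    se = math.sqrt(variance)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return estimate, se, p


# Hypothetical summary statistics for one cis-pQTL used as the instrument.
estimate, se, p = wald_ratio(beta_gx=0.5, se_gx=0.05, beta_gy=0.2, se_gy=0.04)
```

With several independent instruments, per-variant ratios like this are typically combined, for example by inverse-variance weighting.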
BACKGROUND: Most GWAS of plasma proteomics have focused on White individuals of European ancestry, limiting biological insight from other ancestry-enriched protein quantitative trait loci (pQTLs).
METHODS: We conducted a discovery GWAS of approximately 3,000 plasma proteins measured by the antibody-based Olink platform in 1,054 Black adults from the Jackson Heart Study (JHS) and validated our findings in the Multi-Ethnic Study of Atherosclerosis (MESA). The genetic architecture of identified pQTLs was further explored through fine mapping and admixture association analysis. Finally, using our pQTL findings, we performed a phenome-wide association study (PheWAS) across 2 large multiethnic electronic health record (EHR) systems in All of Us and BioMe.
RESULTS: We identified 1,002 pQTLs for 925 protein assays. Fine mapping and admixture analyses suggested allelic heterogeneity of the plasma proteome across diverse populations. We identified associations for variants enriched in African ancestry, many in diseases that lack precise biomarkers, including cis-pQTLs for cathepsin L (CTSL) and Siglec-9, which were linked with sarcoidosis and non-Hodgkin's lymphoma, respectively. We found concordant associations across clinical diagnoses and laboratory measurements, elucidating disease pathways, including a cis-pQTL associated with circulating CD58, WBC count, and multiple sclerosis.
CONCLUSIONS: Our findings emphasize the value of leveraging diverse populations to enhance biological insights from proteomics GWAS, and we have made this resource readily available as an interactive web portal.
FUNDING: NIH K08 HL161445-01A1; 5T32HL160522-03; HHSN268201600034I; HL133870.