Publications

2022

Granot-Hershkovitz E, Sun Q, Argos M, Zhou H, Lin X, Browning SR, et al. AFA: Ancestry-specific allele frequency estimation in admixed populations: The Hispanic Community Health Study/Study of Latinos. HGG advances. 2022;3(2):100096.

Allele frequency estimates in admixed populations, such as Hispanics and Latinos, rely on the sample's specific admixture composition and thus may differ between two seemingly similar populations. However, ancestry-specific allele frequencies, i.e., pertaining to the ancestral populations of an admixed group, may be particularly useful for prioritizing genetic variants for genetic discovery and personalized genomic health. We developed a method, ancestry-specific allele frequency estimation in admixed populations (AFA), to estimate the frequencies of biallelic variants in admixed populations with an unlimited number of ancestries. AFA uses maximum-likelihood estimation by modeling the conditional probability of having an allele given proportions of genetic ancestries. It can be applied using either local ancestry interval proportions encompassing the variant (local-ancestry-specific allele frequency estimations in admixed populations [LAFAs]) or global proportions of genetic ancestries (global-ancestry-specific allele frequency estimations in admixed populations [GAFAs]), which are easier to compute and are more widely available. Simulations and comparisons to existing software demonstrated the high accuracy of the method. We implemented AFA on high-quality imputed data of ∼9,000 Hispanics and Latinos from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an understudied, admixed population with three predominant continental ancestries: Amerindian, European, and African. Comparison of the European and African estimated frequencies to the respective gnomAD frequencies demonstrated high correlations (Pearson R² = 0.97-0.99). We provide a genome-wide dataset of the estimated ancestry-specific allele frequencies for available variants with allele frequency between 5% and 95% in at least one of the three ancestral populations. Association analysis of Amerindian-enriched variants with cardiometabolic traits identified five loci associated with lipid traits in Hispanics and Latinos, demonstrating the utility of ancestry-specific allele frequencies in admixed populations.
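
To illustrate the estimation idea described in this abstract, here is a minimal sketch. It is not the authors' AFA implementation. It assumes a simple binomial model in which each allele copy carried by person i is the alternate allele with probability sum_k p_ik * f_k, where p_ik are that person's ancestry proportions (global proportions here, in the spirit of GAFA) and f_k are the ancestry-specific frequencies. The likelihood is maximized with an EM iteration. The function name, its parameters, and the choice of EM as the optimizer are all assumptions made for illustration.

```python
import numpy as np


def estimate_ancestry_frequencies(genotypes, proportions, n_iter=1000, tol=1e-10):
    """Hypothetical helper: EM estimate of ancestry-specific allele frequencies.

    genotypes:   shape (n,), alternate-allele counts in {0, 1, 2}
    proportions: shape (n, K), ancestry proportions per person (rows sum to 1)

    Assumed model (not taken from the paper): each allele copy of person i is
    alternate with probability sum_k proportions[i, k] * freqs[k].
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(proportions, dtype=float)
    freqs = np.full(p.shape[1], 0.5)  # neutral starting point
    for _ in range(n_iter):
        # E-step: probability that an alternate (or reference) copy
        # originated from each ancestry, given the current frequencies.
        alt = p * freqs
        ref = p * (1.0 - freqs)
        alt_resp = alt / alt.sum(axis=1, keepdims=True)
        ref_resp = ref / ref.sum(axis=1, keepdims=True)
        # M-step: expected alternate and reference copies per ancestry.
        alt_counts = (g[:, None] * alt_resp).sum(axis=0)
        ref_counts = ((2.0 - g)[:, None] * ref_resp).sum(axis=0)
        new_freqs = alt_counts / (alt_counts + ref_counts)
        # Keep estimates strictly inside (0, 1) to avoid division by zero.
        new_freqs = np.clip(new_freqs, 1e-6, 1.0 - 1e-6)
        converged = np.max(np.abs(new_freqs - freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs
```

Calling `estimate_ancestry_frequencies(g, props)` with an (n,) vector of allele counts and an (n, 3) matrix of ancestry proportions returns one estimated frequency per ancestry. Swapping in local-ancestry proportions around the variant would correspond to the LAFA flavor under the same assumed model.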

Wang H, Kurniansyah N, Cade BE, Goodman MO, Chen H, Gottlieb DJ, et al. Upregulated heme biosynthesis increases obstructive sleep apnea severity: a pathway-based Mendelian randomization study. Scientific reports. 2022;12(1):1472.

Obstructive sleep apnea (OSA) is a common disorder associated with increased risk of cardiovascular disease and mortality. Iron and heme metabolism, implicated in ventilatory control and OSA comorbidities, was associated with OSA phenotypes in recent admixture mapping and gene enrichment analyses. However, its causal contribution was unclear. In this study, we performed pathway-level transcriptional Mendelian randomization (MR) analysis to investigate the causal relationships between iron and heme related pathways and OSA. In primary analysis, we examined the expression level of four iron/heme Reactome pathways as exposures and four OSA traits as outcomes using cross-tissue cis-eQTLs from the Genotype-Tissue Expression portal and published genome-wide summary statistics of OSA. We identify a significant putative causal association between an up-regulated heme biosynthesis pathway and a higher sleep time percentage of hypoxemia (p = 6.14 × 10⁻³). This association is supported by consistency of point estimates in one-sample MR in the Multi-Ethnic Study of Atherosclerosis using high coverage DNA and RNA sequencing data generated by the Trans-Omics for Precision Medicine project. Secondary analysis for 37 additional iron/heme Gene Ontology pathways did not reveal any significant causal associations. This study suggests a causal association between increased heme biosynthesis and OSA severity.

He S, Granot-Hershkovitz E, Zhang Y, Bressler J, Tarraf W, Yu B, et al. Blood metabolites predicting mild cognitive impairment in the study of Latinos-investigation of neurocognitive aging (HCHS/SOL). Alzheimer’s & dementia (Amsterdam, Netherlands). 2022;14(1):e12259.

INTRODUCTION: Blood metabolomics-based biomarkers may be useful to predict measures of neurocognitive aging.

METHODS: We tested the association of 707 blood metabolites, measured in 1451 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), with mild cognitive impairment (MCI) and global cognitive change assessed 7 years later. We further used Lasso penalized regression to construct a metabolomics risk score (MRS) that predicts MCI, potentially identifying a different set of metabolites than those discovered in individual-metabolite analysis.

RESULTS: We identified 20 metabolites predicting prevalent MCI and/or global cognitive change. Six of them were novel and 14 were previously reported as associated with neurocognitive aging outcomes. The MCI MRS comprised 61 metabolites and improved prediction accuracy from 84% (minimally adjusted model) to 89% in the entire dataset and from 75% to 87% among apolipoprotein E ε4 carriers.

DISCUSSION: Blood metabolites may serve as biomarkers identifying individuals at risk for MCI among US Hispanics/Latinos.

Kurniansyah N, Goodman MO, Kelly TN, Elfassy T, Wiggins KL, Bis JC, et al. A multi-ethnic polygenic risk score is associated with hypertension prevalence and progression throughout adulthood. Nature communications. 2022;13(1):3549.

In a multi-stage analysis of 52,436 individuals aged 17-90 across diverse cohorts and biobanks, we train, test, and evaluate a polygenic risk score (PRS) for hypertension risk and progression. The PRS is trained using genome-wide association studies (GWAS) for systolic blood pressure, diastolic blood pressure, and hypertension, respectively. For each trait, PRS is selected by optimizing the coefficient of variation (CV) across estimated effect sizes from multiple potential PRS using the same GWAS, after which the 3 trait-specific PRSs are combined via an unweighted sum called "PRSsum", forming the HTN-PRS. The HTN-PRS is associated with both prevalent and incident hypertension at 4-6 years of follow-up. This association is further confirmed in age-stratified analysis. In an independent biobank of 40,201 individuals, the HTN-PRS is confirmed to be predictive of increased risk for coronary artery disease, ischemic stroke, type 2 diabetes, and chronic kidney disease.
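
The "PRSsum" combination described in this abstract can be sketched as follows. This is an illustrative sketch only. It assumes that each trait-specific PRS is standardized to mean 0 and standard deviation 1 before the unweighted sum is taken, which the abstract does not state. The function names `standardize` and `prs_sum` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale one trait-specific PRS to mean 0 and SD 1 (assumed preprocessing)."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance and cannot be standardized")
    return [(s - mu) / sd for s in scores]


def prs_sum(*trait_scores):
    """Unweighted sum of standardized trait-specific PRSs, one value per person.

    Each argument is a list of scores for the same individuals in the same
    order, e.g. one list each for the systolic blood pressure, diastolic
    blood pressure, and hypertension PRSs.
    """
    if len({len(s) for s in trait_scores}) != 1:
        raise ValueError("all trait-specific PRS lists must have the same length")
    standardized = [standardize(s) for s in trait_scores]
    # Sum across traits for each individual, giving every trait equal weight.
    return [sum(values) for values in zip(*standardized)]
```

For example, `prs_sum(sbp_prs, dbp_prs, htn_prs)` would return one combined score per individual, where the three input lists are placeholder names for the trait-specific scores.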

Elgart M, Lyons G, Romero-Brufau S, Kurniansyah N, Brody JA, Guo X, et al. Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Communications biology. 2022;5(1):856.

Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.

Nelson SC, Gogarten SM, Fullerton SM, Isasi CR, Mitchell BD, North KE, et al. Social and scientific motivations to move beyond groups in allele frequencies: The TOPMed experience. American journal of human genetics. 2022;109(9):1582-90.

For the genomics community, allele frequencies within defined groups (or "strata") are useful across multiple research and clinical contexts. Benefits include allowing researchers to identify populations for replication or "look up" studies, enabling researchers to compare population-specific frequencies to validate findings, and facilitating assessment of variant pathogenicity in clinical contexts. However, there are potential concerns with stratified allele frequencies. These include potential re-identification (determining whether or not an individual participated in a given research study based on allele frequencies and individual-level genetic data), harm from associating stigmatizing variants with specific groups, potential reification of race as a biological rather than a socio-political category, and whether presenting stratified frequencies (and the downstream applications that this presentation enables) is consistent with participants' informed consents. The NHLBI Trans-Omics for Precision Medicine (TOPMed) program considered the scientific and social implications of different approaches for adding stratified frequencies to the TOPMed BRAVO (Browse All Variants Online) variant server. We recommend a novel approach of presenting ancestry-specific allele frequencies using a statistical method based upon local genetic ancestry inference. Notably, this approach does not require grouping individuals by either predominant global ancestry or race/ethnicity and, therefore, mitigates re-identification and other concerns as the mixture distribution of ancestral allele frequencies varies across the genome. Here we describe our considerations and approach, which can assist other genomics research programs facing similar issues of how to define and present stratified frequencies in publicly available variant databases.

Khan AT, Gogarten SM, McHugh CP, Stilp AM, Sofer T, Bowers ML, et al. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: Experiences from the NHLBI TOPMed program. Cell genomics. 2022;2(8).

How race, ethnicity, and ancestry are used in genomic research has wide-ranging implications for how research is translated into clinical care and incorporated into public understanding. Correlation between race and genetic ancestry contributes to unresolved complexity for the scientific community, as illustrated by heterogeneous definitions and applications of these variables. Here, we offer commentary and recommendations on the use of race, ethnicity, and ancestry across the arc of genetic research, including data harmonization, analysis, and reporting. While informed by our experiences as researchers affiliated with the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, these recommendations are applicable to basic and translational genomic research in diverse populations with genome-wide data. Moving forward, considerable collaborative effort will be required to ensure that race, ethnicity, and ancestry are described and used appropriately to generate scientific knowledge that yields broad and equitable benefit.

Wainschtein P, Jain D, Zheng Z, TOPMed Anthropometry Working Group, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Cupples A, et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nature genetics. 2022;54(3):263-7.

Analyses of data from genome-wide association studies on unrelated individuals have shown that, for human traits and diseases, approximately one-third to two-thirds of heritability is captured by common SNPs. However, it is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular whether the causal variants are rare, or whether it is overestimated due to bias in inference from pedigree data. Here we estimated heritability for height and body mass index (BMI) from whole-genome sequence data on 25,465 unrelated individuals of European ancestry. The estimated heritability was 0.68 (standard error 0.10) for height and 0.30 (standard error 0.10) for body mass index. Low minor allele frequency variants in low linkage disequilibrium (LD) with neighboring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection. Our results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.

Hu X, Qiao D, Kim W, Moll M, Balte PP, Lange LA, et al. Polygenic transcriptome risk scores for COPD and lung function improve cross-ethnic portability of prediction in the NHLBI TOPMed program. American journal of human genetics. 2022;109(5):857-70.

While polygenic risk scores (PRSs) enable early identification of genetic risk for chronic obstructive pulmonary disease (COPD), predictive performance is limited when the discovery and target populations are not well matched. Hypothesizing that the biological mechanisms of disease are shared across ancestry groups, we introduce a PrediXcan-derived polygenic transcriptome risk score (PTRS) to improve cross-ethnic portability of risk prediction. We constructed the PTRS using summary statistics from application of PrediXcan on large-scale GWASs of lung function (forced expiratory volume in 1 s [FEV1] and its ratio to forced vital capacity [FEV1/FVC]) in the UK Biobank. We examined prediction performance and cross-ethnic portability of PTRS through smoking-stratified analyses both on 29,381 multi-ethnic participants from TOPMed population/family-based cohorts and on 11,771 multi-ethnic participants from TOPMed COPD-enriched studies. Analyses were carried out for two dichotomous COPD traits (moderate-to-severe and severe COPD) and two quantitative lung function traits (FEV1 and FEV1/FVC). While the proposed PTRS showed weaker associations with disease than PRS for European ancestry, the PTRS showed stronger association with COPD than PRS for African Americans (e.g., odds ratio [OR] = 1.24 [95% confidence interval [CI]: 1.08-1.43] for PTRS versus 1.10 [0.96-1.26] for PRS among heavy smokers with ≥ 40 pack-years of smoking) for moderate-to-severe COPD. Cross-ethnic portability of the PTRS was significantly higher than the PRS (paired t test p < 2.2 × 10⁻¹⁶ with portability gains ranging from 5% to 28%) for both dichotomous COPD traits and across all smoking strata. Our study demonstrates the value of PTRS for improved cross-ethnic portability compared to PRS in predicting COPD risk.