Allele frequency estimates in admixed populations, such as Hispanics and Latinos, depend on the sample's specific admixture composition and thus may differ between two seemingly similar populations. However, ancestry-specific allele frequencies, i.e., frequencies pertaining to the ancestral populations of an admixed group, may be particularly useful for prioritizing genetic variants for genetic discovery and personalized genomic health. We developed a method, ancestry-specific allele frequency estimation in admixed populations (AFA), to estimate the frequencies of biallelic variants in admixed populations with an unlimited number of ancestries. AFA uses maximum-likelihood estimation, modeling the conditional probability of having an allele given an individual's proportions of genetic ancestry. It can be applied using either local ancestry interval proportions encompassing the variant (local-ancestry-specific allele frequency estimations in admixed populations [LAFAs]) or global proportions of genetic ancestries (global-ancestry-specific allele frequency estimations in admixed populations [GAFAs]), which are easier to compute and more widely available. Simulations and comparisons to existing software demonstrated the high accuracy of the method. We implemented AFA on high-quality imputed data of ∼9,000 Hispanics and Latinos from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an understudied, admixed population with three predominant continental ancestries: Amerindian, European, and African. The estimated European and African frequencies were highly correlated with the respective gnomAD frequencies (Pearson R² = 0.97-0.99). We provide a genome-wide dataset of the estimated ancestry-specific allele frequencies for available variants with allele frequency between 5% and 95% in at least one of the three ancestral populations.
Association analysis of Amerindian-enriched variants with cardiometabolic traits identified five loci associated with lipid traits in Hispanics and Latinos, demonstrating the utility of ancestry-specific allele frequencies in admixed populations.
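The AFA likelihood rests on the conditional probability of carrying an allele given ancestry proportions. A minimal sketch, assuming that this probability is the ancestry-weighted mixture q_i = Σ_k a_ik f_k and that an individual's two allele copies are independent, can maximize that likelihood with an expectation-maximization (EM) loop. The function name, starting values, and stopping rule are illustrative choices, not AFA's published implementation; the proportions a_ik can be either local (LAFA-like) or global (GAFA-like).

```python
import random


def estimate_ancestry_allele_freqs(genotypes, proportions, n_iter=200, tol=1e-9):
    """EM sketch for ancestry-specific allele frequencies f_1..f_K.

    Assumed model (illustrative): individual i has genotype g_i in {0, 1, 2}
    and ancestry proportions a_i1..a_iK summing to 1; each of the two allele
    copies is the alternate allele with probability q_i = sum_k a_ik * f_k.
    """
    n_anc = len(proportions[0])
    freqs = [0.5] * n_anc  # neutral starting values
    for _ in range(n_iter):
        alt = [0.0] * n_anc  # expected alternate copies attributed to ancestry k
        tot = [0.0] * n_anc  # expected total copies attributed to ancestry k
        for g, a in zip(genotypes, proportions):
            q = sum(a_k * f_k for a_k, f_k in zip(a, freqs))
            for k in range(n_anc):
                # E-step: posterior probability that a copy came from ancestry k
                w_alt = a[k] * freqs[k] / q if q > 0 else 0.0
                w_ref = a[k] * (1 - freqs[k]) / (1 - q) if q < 1 else 0.0
                alt[k] += g * w_alt
                tot[k] += g * w_alt + (2 - g) * w_ref
        # M-step: frequency = expected alternate copies / expected copies
        new = [alt[k] / tot[k] if tot[k] > 0 else freqs[k] for k in range(n_anc)]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


# Usage on simulated two-ancestry data with true frequencies (0.1, 0.7).
random.seed(1)
true_f = (0.1, 0.7)
genos, props = [], []
for _ in range(2000):
    a1 = random.random()
    q = a1 * true_f[0] + (1 - a1) * true_f[1]
    props.append((a1, 1 - a1))
    genos.append(sum(random.random() < q for _ in range(2)))
print([round(f, 2) for f in estimate_ancestry_allele_freqs(genos, props)])
```

Because individuals differ in their ancestry proportions, the per-ancestry frequencies are identifiable even though no allele copy is labeled with its ancestry; the E-step distributes each copy across ancestries in proportion to a_ik f_k (alternate) or a_ik (1 - f_k) (reference).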
Diverse Populations
In a multi-stage analysis of 52,436 individuals aged 17-90 across diverse cohorts and biobanks, we train, test, and evaluate a polygenic risk score (PRS) for hypertension risk and progression. The PRS is trained using genome-wide association studies (GWASs) of systolic blood pressure, diastolic blood pressure, and hypertension. For each trait, the PRS is selected by optimizing the coefficient of variation (CV) across estimated effect sizes from multiple candidate PRSs built from the same GWAS, after which the three trait-specific PRSs are combined via an unweighted sum, called "PRSsum", to form the HTN-PRS. The HTN-PRS is associated with both prevalent hypertension and incident hypertension at 4-6 years of follow-up. This association is further confirmed in age-stratified analyses. In an independent biobank of 40,201 individuals, the HTN-PRS is confirmed to be predictive of increased risk for coronary artery disease, ischemic stroke, type 2 diabetes, and chronic kidney disease.
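The two steps of the HTN-PRS construction, CV-based selection per trait and an unweighted PRSsum, can be sketched as follows. This assumes that "optimizing" the CV means choosing the candidate with the lowest CV of its effect-size estimates, and that each trait-specific PRS is standardized before summing; both are assumptions, and the candidate names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(effects):
    """CV = standard deviation / |mean| of a candidate PRS's effect-size estimates."""
    return pstdev(effects) / abs(mean(effects))


def select_prs(candidates):
    """Pick the candidate PRS whose effect-size estimates have the lowest CV.

    `candidates` maps a hypothetical PRS name to a list of effect-size
    estimates for that PRS; minimizing the CV is an assumed criterion.
    """
    return min(candidates, key=lambda name: coefficient_of_variation(candidates[name]))


def standardize(scores):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def prs_sum(*trait_scores):
    """Unweighted sum of trait-specific PRSs (each standardized first, an assumption)."""
    z = [standardize(scores) for scores in trait_scores]
    return [sum(values) for values in zip(*z)]


# Hypothetical inputs: two candidate PRSs for one trait, and three trait-specific
# PRSs (SBP, DBP, hypertension) for three individuals.
best = select_prs({"prs_a": [0.21, 0.20, 0.22], "prs_b": [0.35, 0.10, 0.25]})
htn_prs = prs_sum([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 3.0])
```

Standardizing first keeps one trait's PRS from dominating the sum merely because its scale is larger; whether the original work does this is not stated in the abstract.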
Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this limitation with a machine learning approach, validated for nine complex phenotypes in a multi-ancestry population. We use an ensemble method for SNP selection followed by gradient-boosted trees (XGBoost) to capture non-linearities and interaction effects. We compare our results to standard linear PRS models developed using PRSice, LDpred2, and lassosum2. Including a PRS as a feature in an XGBoost model increases the percentage of variance explained, relative to the standard linear PRS model, by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to models trained in specific racial/ethnic groups and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
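The reported gains are relative increases in variance explained, not absolute percentage-point changes. A small sketch of that arithmetic, using illustrative values rather than the study's results: a model moving from R² = 0.05 to R² = 0.10 corresponds to a 100% relative increase.

```python
def variance_explained(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - y_bar) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def relative_increase(r2_new, r2_baseline):
    """Relative change in variance explained versus a baseline (e.g., linear PRS) model."""
    return (r2_new - r2_baseline) / r2_baseline


# Illustrative numbers only: 0.05 -> 0.10 is a 100% relative increase.
gain = relative_increase(0.10, 0.05)
```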
For the genomics community, allele frequencies within defined groups (or "strata") are useful across multiple research and clinical contexts. Benefits include allowing researchers to identify populations for replication or "look up" studies, enabling researchers to compare population-specific frequencies to validate findings, and facilitating assessment of variant pathogenicity in clinical contexts. However, there are potential concerns with stratified allele frequencies. These include potential re-identification (determining whether or not an individual participated in a given research study based on allele frequencies and individual-level genetic data), harm from associating stigmatizing variants with specific groups, potential reification of race as a biological rather than a socio-political category, and whether presenting stratified frequencies (and the downstream applications that this presentation enables) is consistent with participants' informed consents. The NHLBI Trans-Omics for Precision Medicine (TOPMed) program considered the scientific and social implications of different approaches for adding stratified frequencies to the TOPMed BRAVO (Browse All Variants Online) variant server. We recommend a novel approach of presenting ancestry-specific allele frequencies using a statistical method based upon local genetic ancestry inference. Notably, this approach does not require grouping individuals by either predominant global ancestry or race/ethnicity and therefore mitigates re-identification and other concerns, because the mixture distribution of ancestral allele frequencies varies across the genome. Here we describe our considerations and approach, which can assist other genomics research programs facing similar issues of how to define and present stratified frequencies in publicly available variant databases.
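To illustrate why a local-ancestry approach does not require grouping individuals, the simplest counting estimator assigns each haplotype at a variant to its inferred local ancestry and computes a frequency per ancestry. This is a sketch assuming phased haplotypes with hard local ancestry calls; the statistical method actually used for BRAVO may differ, and the ancestry labels are hypothetical.

```python
from collections import defaultdict


def local_ancestry_allele_freqs(haplotypes):
    """Allele frequency per local ancestry at one variant.

    `haplotypes` is a list of (ancestry_label, allele) pairs, one per haplotype,
    where ancestry_label is the inferred local ancestry at this variant and
    allele is 1 for the alternate allele and 0 for the reference allele.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for ancestry, allele in haplotypes:
        total[ancestry] += 1
        alt[ancestry] += allele
    return {ancestry: alt[ancestry] / total[ancestry] for ancestry in total}


# Hypothetical haplotypes from three admixed individuals at one variant.
freqs = local_ancestry_allele_freqs(
    [("AFR", 1), ("AFR", 0), ("EUR", 0), ("EUR", 0), ("AMR", 1), ("EUR", 1)]
)
```

Because a single individual's two haplotypes can fall into different ancestry bins, and those assignments change from locus to locus, the per-ancestry counts do not map onto any fixed group of people.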
How race, ethnicity, and ancestry are used in genomic research has wide-ranging implications for how research is translated into clinical care and incorporated into public understanding. Correlation between race and genetic ancestry contributes to unresolved complexity for the scientific community, as illustrated by heterogeneous definitions and applications of these variables. Here, we offer commentary and recommendations on the use of race, ethnicity, and ancestry across the arc of genetic research, including data harmonization, analysis, and reporting. While informed by our experiences as researchers affiliated with the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, these recommendations are applicable to basic and translational genomic research in diverse populations with genome-wide data. Moving forward, considerable collaborative effort will be required to ensure that race, ethnicity, and ancestry are described and used appropriately to generate scientific knowledge that yields broad and equitable benefit.
INTRODUCTION: We studied the replication and generalization of previously identified metabolites potentially associated with global cognitive function in multiple race/ethnicities and assessed the contribution of diet to these associations.
METHODS: We tested metabolite-cognitive function associations in U.S. Hispanic/Latino adults (n = 2222) from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) and in European American (n = 1365) and African American (n = 478) adults from the Atherosclerosis Risk In Communities (ARIC) Study. We applied Mendelian Randomization (MR) analyses to assess causal associations between the metabolites and cognitive function and between Mediterranean diet and cognitive function.
RESULTS: Six metabolites were consistently associated with lower global cognitive function across all studies. Of these, four were sugar-related (e.g., ribitol). MR analyses provided weak evidence for a potential causal effect of ribitol on cognitive function and bi-directional effects of cognitive performance on diet.
DISCUSSION: Several diet-related metabolites were associated with global cognitive function across studies with different race/ethnicities.
HIGHLIGHTS: Metabolites associated with cognitive function in Puerto Rican adults were recently identified. We demonstrate the generalizability of these associations across diverse race/ethnicities. Most identified metabolites are related to sugars. Mendelian Randomization (MR) provides weak evidence for a causal effect of ribitol on cognitive function. Beta-cryptoxanthin and other metabolites highlight the importance of a healthy diet.
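The abstract does not name the MR estimator used. As a sketch of one common choice, the fixed-effect inverse-variance weighted (IVW) estimator combines per-variant Wald ratios from summary statistics; the variant effect sizes below are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Combines per-variant Wald ratios beta_outcome / beta_exposure, weighting
    each by beta_exposure**2 / se_outcome**2. Returns (estimate, standard error).
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, sqrt(1 / den)


# Hypothetical instruments for a metabolite (exposure) and cognitive score (outcome).
est, se = ivw_estimate([0.2, 0.1, 0.3], [-0.04, -0.02, -0.06], [0.01, 0.02, 0.015])
```

Each variant's Wald ratio here is -0.2, so the IVW estimate is -0.2; with real data, heterogeneity across ratios would call for random-effects or pleiotropy-robust sensitivity analyses.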
Estimated glomerular filtration rate (eGFR) is used to evaluate kidney function and determine the presence of chronic kidney disease (CKD), a highly prevalent disease in the US [1, 2, 3] whose prevalence varies among subgroups of Hispanic/Latino individuals [4, 5]. The polygenic risk score (PRS) is a popular method that uses results from large genome-wide association studies (GWASs) to estimate an individual's genetic risk of disease [7]. However, because summary statistics from GWAS meta-analyses of Hispanic/Latino populations are limited, PRSs for these populations must be computed using GWASs of other ancestries. The performance of eGFR PRSs derived from other GWAS reference populations has not been examined in Hispanic/Latino individuals. Using Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), we compared eGFR PRSs constructed from GWAS-significant variants, by clumping and thresholding (C&T) [8], and with PRS-CS [22], as well as a combination of PRSs calculated from different reference GWAS meta-analyses of European and multi-ethnic studies. All eGFR PRSs were highly associated with eGFR (p < 1E-20). Additionally, eGFR PRSs were significantly associated with lower risk of prevalent CKD at visit 1 or 2 and of incident CKD at visit 2, with the combined PRSs performing best. These PRS findings were replicated in an additional dataset of Hispanic/Latino individuals from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) [17].
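Clumping and thresholding (C&T) can be sketched as a greedy pass over SNPs in order of increasing p-value: keep a SNP only if it passes the p-value threshold and is not in LD with an already kept index SNP nearby, then score individuals as a weighted sum of risk-allele dosages. This assumes an LD lookup from a reference panel is available; the SNP ids, LD values, and parameter defaults are hypothetical, not those used in the study, and real analyses typically rely on dedicated software.

```python
def clump_and_threshold(snps, ld_r2, p_threshold=5e-8, r2_max=0.1, window_bp=250_000):
    """Greedy clumping followed by p-value thresholding.

    snps: list of dicts with keys "id", "chrom", "pos", "p", "beta" (GWAS summary stats).
    ld_r2: callable returning the LD r^2 between two SNP ids (assumed reference panel).
    """
    index_snps = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        if snp["p"] > p_threshold:
            break  # sorted by p, so no later SNP can pass the threshold
        in_ld = any(
            kept["chrom"] == snp["chrom"]
            and abs(kept["pos"] - snp["pos"]) <= window_bp
            and ld_r2(kept["id"], snp["id"]) > r2_max
            for kept in index_snps
        )
        if not in_ld:
            index_snps.append(snp)
    return index_snps


def polygenic_score(dosages, index_snps):
    """Weighted sum of risk-allele dosages; dosages maps SNP id -> 0..2."""
    return sum(s["beta"] * dosages[s["id"]] for s in index_snps)


snps = [
    {"id": "rs_a", "chrom": 1, "pos": 1_000, "p": 1e-12, "beta": 0.05},
    {"id": "rs_b", "chrom": 1, "pos": 5_000, "p": 1e-10, "beta": 0.04},
    {"id": "rs_c", "chrom": 2, "pos": 9_000, "p": 1e-9, "beta": -0.03},
    {"id": "rs_d", "chrom": 3, "pos": 2_000, "p": 0.2, "beta": 0.01},
]
ld = {frozenset({"rs_a", "rs_b"}): 0.8}  # hypothetical LD between rs_a and rs_b
selected = clump_and_threshold(snps, lambda x, y: ld.get(frozenset({x, y}), 0.0))
score = polygenic_score({"rs_a": 2, "rs_c": 1}, selected)
```

Here rs_b is dropped because it is in strong LD with the more significant rs_a, and rs_d fails the p-value threshold.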
OBJECTIVE: In the United States, Hispanic/Latino adults face a high burden of obesity; yet not all individuals are equally affected, partly because of this ethnic group's marked sociocultural diversity. We sought to analyze how the level of acculturation, a complex biosocial phenomenon that remains understudied, modifies the genetic effects on body mass index (BMI) in Hispanic/Latino adults.
METHODS: Among 11,747 Hispanic/Latino adults aged 18 to 76 years in the Hispanic Community Health Study/Study of Latinos from four urban communities (2008-2011), we a) tested our hypothesis that the effect of a genetic risk score (GRS) for increased BMI may be exacerbated by higher levels of acculturation and b) examined whether GRS-by-acculturation interactions varied by gender or Hispanic/Latino background group. All genetic models controlled for relatedness, age, gender, principal components of ancestry, study center, and complex study design within a generalized estimating equation framework.
RESULTS: Each additional GRS risk allele was associated with a 0.34 kg/m² increase in weighted mean BMI. The estimated main effect of the GRS on BMI varied across both acculturation level and gender. The difference between high and low acculturation ranged from 0.03 to 0.23 kg/m² per risk allele, varying by acculturation measure and gender.
CONCLUSIONS: These results suggest the presence of effect modification by acculturation, with stronger effects on BMI among highly acculturated individuals and female immigrants. Future studies of obesity in the Hispanic/Latino community should account for sociocultural environments and consider their intersection with gender to better target obesity interventions.
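Effect modification here means that the BMI change per risk allele depends on acculturation. A minimal sketch, assuming an unweighted GRS (a count of BMI-increasing alleles) and a linear interaction model BMI = b0 + b_G·GRS + b_A·A + b_GA·GRS·A + covariates, shows how the per-allele effect shifts with acculturation level A. The coefficients are hypothetical, not the study's estimates, and fitting the generalized estimating equation model itself is omitted.

```python
def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: total number of BMI-increasing alleles (0, 1, or 2 per SNP)."""
    return sum(risk_allele_counts)


def per_allele_effect(beta_grs, beta_interaction, acculturation):
    """BMI change (kg/m^2) per extra risk allele at a given acculturation level,
    under the assumed model BMI = b0 + b_G*GRS + b_A*A + b_GA*GRS*A + covariates."""
    return beta_grs + beta_interaction * acculturation


grs = genetic_risk_score([0, 1, 2, 1])
# Hypothetical coefficients, comparing high (A = 1) with low (A = 0) acculturation.
difference = per_allele_effect(0.30, 0.10, 1) - per_allele_effect(0.30, 0.10, 0)
```

Under this model the high-versus-low difference in per-allele effect is simply b_GA times the difference in acculturation score, which is the quantity the reported 0.03 to 0.23 kg/m² range describes.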
BACKGROUND: Metabolic pathways are related to physiological functions and disease states and are influenced by genetic variation and environmental factors. Hispanic/Latino individuals have ancestry-derived genomic regions (local ancestry) from their recent admixture that are less well characterized for associations with metabolite abundance and disease risk.
METHODS: We performed admixture mapping of 640 circulating metabolites in 3887 Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Metabolites were quantified in fasting serum through non-targeted mass spectrometry (MS) analysis using ultra-performance liquid chromatography-MS/MS. Replication was performed in 1856 nonoverlapping HCHS/SOL participants with metabolomic data.
RESULTS: By leveraging local ancestry, this study identified significant ancestry-enriched associations for 78 circulating metabolites at 484 independent regions, including 116 novel metabolite-genomic region associations that replicated in an independent sample. Among the main findings, we identified Native American enriched genomic regions at chromosomes 11 and 15, mapping to FADS1/FADS2 and LIPC, respectively, associated with reduced long-chain polyunsaturated fatty acid metabolites implicated in metabolic and inflammatory pathways. An African-derived genomic region at chromosome 2 was associated with N-acetylated amino acid metabolites. This region, mapped to ALMS1, is associated with chronic kidney disease, a disease that disproportionately burdens individuals of African descent.
CONCLUSIONS: Our findings provide important insights into differences in metabolite quantities related to ancestry in admixed populations, including metabolites related to regulation of lipid polyunsaturated fatty acids and N-acetylated amino acids, which may have implications for common diseases in these populations.
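At its core, admixture mapping tests, region by region, whether the number of copies (0, 1, or 2) of a given local ancestry is associated with the outcome. A bare-bones sketch for one region and one metabolite uses an ordinary least-squares slope and its standard error; a full analysis would typically also adjust for covariates such as global ancestry and relatedness, which are omitted here, and the data are hypothetical.

```python
from math import sqrt


def admixture_mapping_test(ancestry_copies, metabolite):
    """OLS slope and standard error of metabolite level on the number (0, 1, 2)
    of copies of one local ancestry at a genomic region."""
    n = len(ancestry_copies)
    x_bar = sum(ancestry_copies) / n
    y_bar = sum(metabolite) / n
    sxx = sum((x - x_bar) ** 2 for x in ancestry_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ancestry_copies, metabolite))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ancestry_copies, metabolite))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, se


# Hypothetical data: local ancestry copies and metabolite levels for six individuals.
slope, se = admixture_mapping_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.4, 1.6, 2.0, 2.2])
```

Repeating this test across all regions and metabolites is what creates the multiple-testing burden that admixture mapping studies must correct for.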
Polygenic risk scores (PRSs) are weighted sums of risk allele counts of single-nucleotide polymorphisms (SNPs) associated with a disease or trait. PRSs are typically constructed from published results of Genome-Wide Association Studies (GWASs), the majority of which have been performed in large populations of European ancestry (EA) individuals. Although many genotype-trait associations have generalized across populations, the optimal choice of SNPs and weights for PRSs may differ between populations due to different linkage disequilibrium (LD) and allele frequency patterns. We compare various approaches for PRS construction, using GWAS results from both large EA studies and a smaller study in Hispanics/Latinos: the Hispanic Community Health Study/Study of Latinos (HCHS/SOL, n = 12,803). We consider multiple approaches for selecting SNPs and for computing SNP weights. We study the performance of the resulting PRSs in an independent study of Hispanics/Latinos from the Women's Health Initiative (WHI, n = 3,582). We support our investigation with simulation studies of potential genetic architectures at a single locus. We observed that selecting variants based on EA GWASs generally performs well, except for blood pressure traits. However, using EA GWASs for weight estimation was suboptimal, and using non-EA GWAS results to estimate weights improved results.
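The best-performing strategy described above separates two roles: one GWAS chooses the SNPs, and another supplies their weights. A minimal sketch, assuming both GWASs report effects aligned to the same risk allele; the SNP ids, p-values, effect sizes, and threshold are hypothetical.

```python
def cross_ancestry_prs(genotypes, ea_gwas, target_gwas, p_threshold=5e-8):
    """PRS using SNPs selected by EA GWAS p-values and weighted by effect sizes
    from a GWAS in the target population (e.g., Hispanics/Latinos).

    genotypes: dict SNP id -> risk-allele count (0, 1, 2) for one individual.
    ea_gwas: dict SNP id -> p-value from the EA GWAS (used only for selection).
    target_gwas: dict SNP id -> effect size from the target-population GWAS,
        assumed aligned to the same risk allele as the genotype counts.
    """
    selected = [
        snp for snp, p in ea_gwas.items()
        if p < p_threshold and snp in target_gwas and snp in genotypes
    ]
    return sum(target_gwas[snp] * genotypes[snp] for snp in selected)


score = cross_ancestry_prs(
    genotypes={"rs1": 2, "rs2": 1, "rs3": 1},
    ea_gwas={"rs1": 1e-10, "rs2": 1e-3, "rs3": 1e-9},
    target_gwas={"rs1": 0.2, "rs2": 0.5, "rs3": -0.1},
)
```

The large EA GWAS has the power to rank SNPs reliably, while the target-population GWAS yields weights that reflect local LD and allele frequencies, which is one way to read the pattern in the results above.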