In a multi-stage analysis of 52,436 individuals aged 17-90 across diverse cohorts and biobanks, we train, test, and evaluate a polygenic risk score (PRS) for hypertension risk and progression. The PRS is trained using genome-wide association studies (GWAS) of systolic blood pressure, diastolic blood pressure, and hypertension. For each trait, a PRS is selected by optimizing the coefficient of variation (CV) across estimated effect sizes from multiple candidate PRSs built from the same GWAS, after which the three trait-specific PRSs are combined via an unweighted sum called "PRSsum", forming the HTN-PRS. The HTN-PRS is associated with both prevalent hypertension and incident hypertension at 4-6 years of follow-up. This association is further confirmed in age-stratified analysis. In an independent biobank of 40,201 individuals, the HTN-PRS is confirmed to be predictive of increased risk for coronary artery disease, ischemic stroke, type 2 diabetes, and chronic kidney disease.
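The PRSsum combination described above can be illustrated with a minimal sketch. It assumes that each trait-specific PRS is standardized to mean 0 and variance 1 before summing, which the abstract does not state; the function names, trait keys, and toy values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Scale one trait-specific PRS to mean 0 and variance 1 (an assumption)."""
    center = mean(scores)
    spread = pstdev(scores)
    if spread == 0:
        raise ValueError("PRS has zero variance and cannot be standardized")
    return [(value - center) / spread for value in scores]


def prs_sum(trait_scores):
    """Unweighted sum of standardized trait-specific PRSs, one value per individual.

    trait_scores maps a trait name to a list of per-individual scores;
    all lists must follow the same individual order.
    """
    standardized = [standardize(scores) for scores in trait_scores.values()]
    lengths = {len(scores) for scores in standardized}
    if len(lengths) != 1:
        raise ValueError("all trait-specific PRSs must cover the same individuals")
    return [sum(per_person) for per_person in zip(*standardized)]


# Toy data for three individuals (hypothetical values).
toy_scores = {
    "sbp": [1.0, 2.0, 3.0],
    "dbp": [2.0, 4.0, 6.0],
    "htn": [0.0, 5.0, 10.0],
}
htn_prs = prs_sum(toy_scores)
```

Because the sum is unweighted, each trait contributes equally once the scores share a common scale; without standardization, the trait with the largest score variance would dominate.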
Polygenic Risk Scores
Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
Estimated glomerular filtration rate (eGFR) is used to evaluate kidney function and determine the presence of chronic kidney disease (CKD), a highly prevalent disease in the US [1, 2, 3] that varies among subgroups of Hispanic/Latino individuals [4, 5]. The polygenic risk score (PRS) is a popular method that uses large genome-wide association studies (GWASs) to provide a strong estimate of disease risk [7]. However, due to the limited availability of summary statistics from GWAS meta-analyses based on Hispanic/Latino populations, PRSs can only be computed using GWASs of other ancestries. The performance of eGFR PRSs derived from other GWAS reference populations has not been examined in Hispanic/Latino populations. We compared PRS constructions for eGFR prediction in Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) using GWAS-significant variants, clumping and thresholding (C&T) [8], and PRS-CS [22], as well as a combination of PRSs calculated with different reference GWAS meta-analyses from European and multi-ethnic studies. All eGFR PRSs were highly associated with eGFR (p < 1E-20). Additionally, eGFR PRSs were significantly associated with lower risk of prevalent CKD at visit 1 or 2 and incident CKD at visit 2, with the combined PRSs having the best performance. These PRS findings were replicated in an additional dataset of Hispanic/Latino individuals using data from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) [17].
OBJECTIVE: In the United States, Hispanic/Latino adults face a high burden of obesity; yet, not all individuals are equally affected, due in part to this ethnic group's marked sociocultural diversity. We sought to analyze the modification of body mass index (BMI) genetic effects in Hispanic/Latino adults by their level of acculturation, a complex biosocial phenomenon that remains understudied.
METHODS: Among 11,747 Hispanic/Latino adults in the Hispanic Community Health Study/Study of Latinos aged 18 to 76 years from four urban communities (2008-2011), we a) tested our hypothesis that the effect of a genetic risk score (GRS) for increased BMI may be exacerbated by higher levels of acculturation and b) examined whether GRS-acculturation interactions varied by gender or Hispanic/Latino background group. All genetic modeling controlled for relatedness, age, gender, principal components of ancestry, center, and complex study design within a generalized estimating equation framework.
RESULTS: We observed an increase of 0.34 kg/m² in weighted mean BMI per GRS risk allele. The estimated main effect of GRS on BMI varied across both acculturation level and gender. The difference in GRS effect between high and low acculturation ranged from 0.03 to 0.23 kg/m² per risk allele, depending on the acculturation measure and gender.
CONCLUSIONS: These results suggest the presence of effect modification by acculturation, with stronger effects on BMI among highly acculturated individuals and female immigrants. Future studies of obesity in the Hispanic/Latino community should account for sociocultural environments and consider their intersection with gender to better target obesity interventions.
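The effect modification examined in this abstract can be illustrated with a deliberately simplified sketch: estimate the per-allele BMI slope separately in low- and high-acculturation groups and take the difference. This is not the study's method, which used a generalized estimating equation framework with covariate adjustment and survey design; the binary acculturation indicator, the function names, and the toy data are all hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (single predictor, with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def grs_effect_by_acculturation(grs, bmi, high_acculturation):
    """Return (slope_low, slope_high, difference) of BMI per GRS risk allele.

    With a binary modifier, the difference in stratum-specific slopes equals
    the GRS-by-acculturation interaction coefficient of an unadjusted linear
    model in which every term is allowed to differ by stratum.
    """
    low = [(g, b) for g, b, h in zip(grs, bmi, high_acculturation) if not h]
    high = [(g, b) for g, b, h in zip(grs, bmi, high_acculturation) if h]
    slope_low = ols_slope([g for g, _ in low], [b for _, b in low])
    slope_high = ols_slope([g for g, _ in high], [b for _, b in high])
    return slope_low, slope_high, slope_high - slope_low


# Toy data (hypothetical): three individuals per acculturation group.
grs = [0, 1, 2, 0, 1, 2]
bmi = [25.0, 25.1, 25.2, 26.0, 26.3, 26.6]
high_acculturation = [False, False, False, True, True, True]
slope_low, slope_high, difference = grs_effect_by_acculturation(
    grs, bmi, high_acculturation
)
```

A positive difference means each additional risk allele is associated with a larger BMI increase in the high-acculturation group, which is the direction the conclusions describe.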
Polygenic risk scores (PRSs) are weighted sums of risk allele counts of single-nucleotide polymorphisms (SNPs) associated with a disease or trait. PRSs are typically constructed based on published results from genome-wide association studies (GWASs), the majority of which have been performed in large populations of European ancestry (EA) individuals. Although many genotype-trait associations have generalized across populations, the optimal choice of SNPs and weights for PRSs may differ between populations due to different linkage disequilibrium (LD) and allele frequency patterns. We compare various approaches for PRS construction, using GWAS results from both large EA studies and a smaller study in Hispanics/Latinos: the Hispanic Community Health Study/Study of Latinos (HCHS/SOL, n = 12,803). We consider multiple approaches for selecting SNPs and for computing SNP weights. We study the performance of the resulting PRSs in an independent study of Hispanics/Latinos from the Women's Health Initiative (WHI, n = 3,582). We support our investigation with simulation studies of potential genetic architectures in a single locus. We observed that selecting variants based on EA GWASs generally performs well, except for blood pressure traits. However, the use of EA GWASs for weight estimation was suboptimal. Using non-EA GWAS results to estimate weights improved results.
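The definition of a PRS as a weighted sum of risk allele counts can be made concrete with a minimal sketch. The SNP identifiers, weights, and genotype values below are hypothetical, and skipping SNPs that lack either a weight or a genotype is an assumption of this sketch, not a procedure taken from the study.

```python
def polygenic_risk_score(risk_allele_counts, snp_weights):
    """Weighted sum of risk allele counts for one individual.

    risk_allele_counts maps SNP id -> number of risk alleles (0, 1, or 2).
    snp_weights maps SNP id -> per-allele weight, e.g. a GWAS effect estimate.
    SNPs missing from either mapping are skipped (an assumption of this sketch).
    """
    shared_snps = risk_allele_counts.keys() & snp_weights.keys()
    return sum(snp_weights[snp] * risk_allele_counts[snp] for snp in shared_snps)


# Hypothetical weights, as might be taken from a GWAS, and one toy genotype.
snp_weights = {"snp_a": 0.10, "snp_b": 0.50, "snp_c": 0.30}
risk_allele_counts = {"snp_a": 2, "snp_b": 0, "snp_c": 1, "snp_d": 2}
score = polygenic_risk_score(risk_allele_counts, snp_weights)
```

The comparison in the abstract amounts to changing which SNPs appear in `snp_weights` (variant selection) and which GWAS supplies their values (weight estimation), while the scoring step itself stays the same.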