Increased urine albumin excretion is highly prevalent in Hispanics/Latinos. Previous studies have found an association between urine albumin excretion and Amerindian ancestry in Hispanic/Latino populations. Admixture between racial/ethnic groups creates long-range linkage disequilibrium between variants with different allele frequencies in the founding populations, and this can be used to localize genes. Hispanic/Latino genomes are an admixture of European, African, and Amerindian ancestries. We leveraged this admixture to identify associations between urine albumin excretion (urine albumin-to-creatinine ratio [UACR]) and genomic regions harboring variants with highly differentiated allele frequencies among the ancestral populations. Admixture mapping analysis of 12,212 Hispanic Community Health Study/Study of Latinos participants, using a linear mixed model, identified three novel genome-wide significant signals on chromosomes 2, 11, and 16. The admixture mapping signal on chromosome 2, spanning q11.2-14.1 and not previously reported for UACR, is driven by a difference between Amerindian ancestry and the other two ancestries (P<5.7 × 10⁻⁵). Within this locus, two common variants in the proapoptotic BCL2L11 gene were associated with UACR: rs116907128 (allele frequency =0.14; P=1.5 × 10⁻⁷) and rs586283 (C allele frequency =0.35; P=4.2 × 10⁻⁷). In a secondary analysis, rs116907128 accounted for most of the admixture mapping signal observed in the region. The rs116907128 variant is common among full-heritage Pima Indians (A allele frequency =0.54) but is monomorphic in the 1000 Genomes European and African populations. In a replication analysis in a sample of full-heritage Pima Indians, rs116907128 was significantly associated with UACR (P=0.01; n=1568). Our findings provide evidence for Amerindian-specific variants influencing the variation of urine albumin excretion in Hispanics/Latinos.
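The core admixture mapping test regresses the trait on the number of copies (0, 1, or 2) of one ancestry at a locus, adjusting for that ancestry's genome-wide proportion. The sketch below is an illustration only: it uses ordinary least squares with a normal-approximation p-value in place of the linear mixed model used in the study, so it ignores relatedness and other covariates, and the function names are hypothetical.

```python
import math

def _residualize(values, covariate):
    """Residuals from a simple linear regression of values on covariate (with intercept)."""
    n = len(values)
    mx = sum(covariate) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx
    return [y - my - slope * (x - mx) for x, y in zip(covariate, values)]

def admixture_mapping_test(trait, local_count, global_prop):
    """OLS test of trait ~ local ancestry count + global ancestry proportion.

    local_count: copies (0, 1, 2) of one ancestry (e.g. Amerindian) at an interval.
    global_prop: genome-wide proportion of that same ancestry per person.
    Returns (beta, se, two-sided normal-approximation p) for local_count.
    """
    # Frisch-Waugh-Lovell: partial global ancestry out of both trait and local count.
    ry = _residualize(trait, global_prop)
    rx = _residualize(local_count, global_prop)
    sxx = sum(x * x for x in rx)
    beta = sum(x * y for x, y in zip(rx, ry)) / sxx
    n = len(trait)
    rss = sum((y - beta * x) ** 2 for x, y in zip(rx, ry))
    sigma2 = rss / (n - 3)  # parameters: intercept, global ancestry, local ancestry
    se = math.sqrt(sigma2 / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p
```

Adjusting for global ancestry matters because local and global ancestry are correlated; without it, any trait association with overall ancestry would show up at every interval.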
Publications
2017
This corrects the article DOI: 10.1038/ncomms15805.
Trans-ethnic meta-analysis of genome-wide association studies (GWAS) across diverse populations can increase power to detect complex trait loci when the underlying causal variants are shared between ancestry groups. However, heterogeneity in allelic effects between GWAS at these loci can occur that is correlated with ancestry. Here, we present a novel approach to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry. We employ trans-ethnic meta-regression, implemented in the MR-MEGA software, to model allelic effects as a function of axes of genetic variation derived from a matrix of mean pairwise allele frequency differences between GWAS. Through detailed simulations, we demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis across a range of scenarios of heterogeneity in allelic effects between ethnic groups. We also demonstrate improved fine-mapping resolution, in loci containing a single causal variant, compared to these meta-analysis approaches and PAINTOR, and equivalent performance to MANTRA at reduced computational cost. Application of MR-MEGA to trans-ethnic GWAS of kidney function in 71,461 individuals indicates stronger signals of association than fixed-effects meta-analysis when heterogeneity in allelic effects is correlated with ancestry. Application of MR-MEGA to fine-mapping four type 2 diabetes susceptibility loci in 22,086 cases and 42,539 controls highlights: (i) strong evidence for heterogeneity in allelic effects that is correlated with ancestry only at the index SNP for the association signal at the CDKAL1 locus; and (ii) 99% credible sets with six or fewer variants for five distinct association signals.
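As a rough illustration of the meta-regression idea, the sketch below derives a single axis of genetic variation by classical multidimensional scaling (MDS) of a mean pairwise allele-frequency-difference matrix, then regresses per-GWAS allelic effects on that axis with inverse-variance weights. It assumes one axis and classical MDS computed by power iteration; the function names are hypothetical and this is not the MR-MEGA implementation.

```python
import math

def mean_af_distance(freqs_a, freqs_b):
    """Mean absolute allele-frequency difference between two GWAS over shared SNPs."""
    return sum(abs(a - b) for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)

def first_axis(dist, iters=500):
    """Leading classical-MDS coordinate for each GWAS from a K x K distance matrix."""
    k = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    rmean = [sum(row) / k for row in d2]
    gmean = sum(rmean) / k
    # Double-centred matrix B = -1/2 * J D^2 J (D^2 is symmetric, so row means = column means).
    b = [[-0.5 * (d2[i][j] - rmean[i] - rmean[j] + gmean) for j in range(k)]
         for i in range(k)]
    v = [float(i + 1) for i in range(k)]  # start vector not parallel to the all-ones null vector
    for _ in range(iters):  # power iteration for the leading eigenvector of B
        w = [sum(b[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(k)) for i in range(k))
    return [x * math.sqrt(max(lam, 0.0)) for x in v]

def meta_regression(betas, ses, axis):
    """Weighted fit of beta_k = b0 + b1 * axis_k with weights 1 / se_k^2.

    Returns (b0, b1, chi2_assoc, chi2_resid): chi2_assoc (df = 2) compares the fit
    with beta = 0 in every GWAS; chi2_resid (df = K - 2) is residual heterogeneity.
    """
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, axis)) / sw
    ybar = sum(wi * y for wi, y in zip(w, betas)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, axis))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, axis, betas))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum(wi * (y - b0 - b1 * x) ** 2 for wi, x, y in zip(w, axis, betas))
    total = sum(wi * y * y for wi, y in zip(w, betas))
    return b0, b1, total - rss, rss
```

When allelic effects are identical across GWAS, the slope on the axis is zero and no heterogeneity remains; a non-zero slope is what "heterogeneity correlated with ancestry" looks like in this model.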
Hypertension prevalence varies between ethnic groups, possibly due to differences in genetic, environmental, and cultural determinants. Hispanic/Latino Americans are a diverse and understudied population. We performed a genome-wide association study (GWAS) of blood pressure (BP) traits in 12,278 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In the discovery phase we identified eight previously unreported BP loci. In the replication stage, we tested these loci in the 1982 Pelotas Birth Cohort Study of admixed Southern Brazilians, the COGENT-BP study of individuals of African descent, women of European descent from the Women's Health Initiative (WHI), and a sample of European descent from the UK Biobank. No loci met the Bonferroni-adjusted level of statistical significance (0.0024). Two loci had marginal evidence of replication: rs78701042 (NGF) with diastolic BP (P = 0.008 in the 1982 Pelotas Birth Cohort Study), and rs7315692 (SLC5A8) with systolic BP (P = 0.007 in European ancestry replication). We investigated whether previously reported loci associated with BP in studies of European, African, and Asian ancestry generalize to Hispanics/Latinos. Overall, 26% of the known associations in studies of individuals of European and Chinese ancestries generalized, while only a single association previously discovered in people of African descent generalized.
Heritability is the proportion of phenotypic variance in a population that is attributable to individual genotypes. Heritability is considered an important measure in both evolutionary biology and medicine, and is routinely estimated and reported in genetic epidemiology studies. In population-based genome-wide association studies (GWAS), mixed models are used to estimate variance components, from which a heritability estimate is obtained. The estimated heritability is the proportion of the model's total variance that is due to the genetic relatedness matrix (kinship measured from genotypes). Current practice is to estimate the precision of the heritability estimate either by bootstrapping, which is slow, or by a normal asymptotic approximation; however, this approximation fails to hold near the boundaries of the parameter space or when the sample size is small. In this paper we propose to estimate variance components via Haseman-Elston regression, derive the asymptotic distribution of the variance components and proportions of variance, and use them to construct confidence intervals (CIs). We further extend the method to obtain unbiased variance component estimators and to construct CIs by meta-analyzing information from multiple studies. We demonstrate our approach on data from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
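A minimal sketch of the Haseman-Elston step, under stated assumptions: the phenotype is already centred, every pair i ≤ j is used, and pairwise phenotype products are regressed on kinship and an identity indicator with no intercept. The function name `he_regression` is hypothetical, and the sketch omits the paper's asymptotic confidence intervals and meta-analysis.

```python
def he_regression(y, kinship):
    """Haseman-Elston variance-component estimates from all pairs i <= j.

    y: phenotype values, assumed already centred (mean zero).
    kinship: n x n genetic relatedness matrix (list of lists).
    Fits y_i * y_j = sg2 * K_ij + se2 * [i == j] by least squares.
    Returns (sg2, se2, h2); estimates are unconstrained, so h2 can leave [0, 1].
    """
    skk = skd = sdd = skz = sdz = 0.0
    n = len(y)
    for i in range(n):
        for j in range(i, n):
            z = y[i] * y[j]             # response: product of phenotypes
            k = kinship[i][j]           # regressor 1: genetic relatedness
            d = 1.0 if i == j else 0.0  # regressor 2: identity (residual variance)
            skk += k * k
            skd += k * d
            sdd += d * d
            skz += k * z
            sdz += d * z
    # Solve the 2 x 2 normal equations [[skk, skd], [skd, sdd]] @ [sg2, se2] = [skz, sdz].
    det = skk * sdd - skd * skd
    sg2 = (skz * sdd - skd * sdz) / det
    se2 = (skk * sdz - skd * skz) / det
    return sg2, se2, sg2 / (sg2 + se2)
```

For example, with two sib pairs (kinship 0.5 within pairs) and y = [1, 0.5, -1, -0.5], this returns sg2 = 1.0, se2 = -0.375 and h2 = 1.6, an estimate outside the parameter space. Such small-sample boundary behaviour is exactly where a plain normal approximation for the CI breaks down.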
BACKGROUND: Despite ethnic disparities in lipid profiles, there are few genome-wide association studies investigating genetic variation of lipids in non-European ancestry populations. In this study, we present findings from genetic association analyses for total cholesterol, low density lipoprotein cholesterol (LDL), high density lipoprotein cholesterol (HDL), and triglycerides in a large Hispanic/Latino cohort in the U.S., the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
METHODS: We estimated a heritability of approximately 20% for each lipid trait, similar to previous estimates in Europeans. To search for novel lipid loci, we performed conditional association analysis in which the statistical model was adjusted for previously reported SNPs associated with any of the four lipid traits. SNPs that remained genome-wide significant (P < 5 × 10⁻⁸) after conditioning on known loci were evaluated for replication.
RESULTS: We identified eight potentially novel lipid signals with minor allele frequencies <1%, none of which replicated. We tested previously reported SNP-trait associations for generalization to Hispanics/Latinos via a statistical framework. The generalization analysis revealed that approximately 50% of previously established lipid variants generalize to HCHS/SOL based on directional FDR r-value < 0.05. Some failures to generalize were due to lack of power.
CONCLUSIONS: These results demonstrate that many loci associated with lipid levels are shared across populations.
BACKGROUND: Although time-domain measures of heart rate variability (HRV) are used to estimate cardiac autonomic tone and disease risk in multiethnic populations, the genetic epidemiology of HRV in Hispanics/Latinos has not been characterized.
OBJECTIVE: The purpose of this study was to conduct a genome-wide association study of heart rate (HR) and its variability in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Women's Health Initiative Hispanic SNP-Health Association Resource project (n = 13,767).
METHODS: We estimated HR (bpm), standard deviation of normal-to-normal interbeat intervals (SDNN, ms), and root mean square of successive differences between normal-to-normal interbeat intervals (RMSSD, ms) from resting, standard 12-lead ECGs. We estimated associations between each phenotype and 17 million genotyped or imputed single nucleotide polymorphisms (SNPs), accounting for relatedness and adjusting for age, sex, study site, and ancestry. Cohort-specific estimates were combined using fixed-effects, inverse-variance meta-analysis. We investigated replication for select SNPs exceeding genome-wide (P < 5 × 10⁻⁸) or suggestive (P < 10⁻⁶) significance thresholds.
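The time-domain measures and the meta-analysis step described above can be sketched in Python as follows. This assumes artifact-free normal-to-normal intervals in milliseconds and uses the sample standard deviation for SDNN; both function names are hypothetical, and the meta-analysis p-value uses a normal approximation.

```python
import math
import statistics

def hrv_time_domain(nn_ms):
    """HR (bpm), SDNN (ms) and RMSSD (ms) from normal-to-normal intervals in ms."""
    hr = 60000.0 / statistics.mean(nn_ms)  # 60,000 ms per minute / mean interval
    sdnn = statistics.stdev(nn_ms)         # sample SD; some pipelines use the population SD
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]  # successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return hr, sdnn, rmssd

def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis; returns (beta, se, two-sided p)."""
    w = [1.0 / s ** 2 for s in ses]  # each cohort weighted by 1 / variance
    beta = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p
```

For intervals [800, 810, 790, 800] ms, HR is 75 bpm and RMSSD is the square root of 200, about 14.1 ms.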
RESULTS: Two genome-wide significant SNPs replicated in a European ancestry cohort, one for RMSSD (rs4963772; chromosome 12) and another for SDNN (rs12982903; chromosome 19). A suggestive SNP for HR (rs236352; chromosome 6) replicated in an African-American cohort. Functional annotation of replicated SNPs in cardiac and neuronal tissues identified potentially causal variants and mechanisms.
CONCLUSION: This first genome-wide association study of HRV and HR in Hispanics/Latinos underscores the potential for even modestly sized samples of non-European ancestry to inform the genetic epidemiology of complex traits.
Admixture mapping can be used to detect genetic association regions in admixed populations, such as Hispanics/Latinos, by estimating associations between local ancestry allele counts and the trait of interest. We performed admixture mapping of the blood pressure traits systolic and diastolic blood pressure (SBP, DBP), mean arterial pressure (MAP), and pulse pressure (PP), in a dataset of 12,116 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Hispanics/Latinos have three predominant ancestral populations (European, African, and Amerindian), for each of which we separately tested local ancestry intervals across the genome. We identified four regions that were significantly associated with a blood pressure trait at the genome-wide admixture mapping significance level. A 6p21.31 Amerindian ancestry association region has multiple known associations, but none explained the admixture mapping signal. We identified variants that completely explained this signal. One of these variants had p-values of 0.02 (MAP) and 0.04 (SBP) in replication testing in Pima Indians. An 11q13.4 Amerindian ancestry association region spans a variant that was previously reported (p-value = 0.001) in a targeted association study of blood pressure (BP) traits and variants in the vitamin D pathway. There was no replication evidence supporting an association in the identified 17q25.3 Amerindian ancestry association region. For a region on 6p12.3, associated with African ancestry, we did not identify any candidate variants driving the association; this signal may be driven by rare variants. Whole genome sequence data may be necessary to fine map these association signals, which may contribute to disparities in BP traits between diverse populations.
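Testing each ancestry separately starts from per-ancestry local ancestry counts at each interval. The sketch below assumes a hypothetical input format of one (haplotype 1, haplotype 2) ancestry call per person at an interval and converts those calls into 0/1/2 counts for each ancestry; the function and constant names are illustrative only.

```python
# Hypothetical ancestry labels; real local ancestry callers use their own codes.
ANCESTRIES = ("European", "African", "Amerindian")

def local_ancestry_counts(calls, ancestries=ANCESTRIES):
    """Per-ancestry allele counts (0, 1, 2) at one interval from diploid ancestry calls.

    calls: list of (hap1_ancestry, hap2_ancestry) tuples, one per person.
    Returns {ancestry: [count per person]}.
    """
    return {anc: [sum(1 for hap in pair if hap == anc) for pair in calls]
            for anc in ancestries}
```

Because the three counts for each person always sum to 2, they cannot all enter one model with an intercept; fitting one model per ancestry avoids that collinearity and yields an ancestry-specific test at every interval.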
We propose a weighted pseudolikelihood method for analyzing the association of a SNP set, for example, SNPs in a gene or a genetic pathway or network, with multiple secondary phenotypes in case-control genetic association studies. To boost analysis power, we assume that the SNP-specific effects are shared across all secondary phenotypes using a scaled mean model. We estimate regression parameters using Inverse Probability Weighted (IPW) estimating equations obtained from the weighted pseudolikelihood, which accounts for case-control sampling to prevent potential ascertainment bias. To test the effect of a SNP set, we propose a weighted variance component pseudo-score test. We also propose a penalized IPW pseudolikelihood method for selecting a subset of SNPs that are associated with the multiple secondary phenotypes. We show that the proposed variable selection procedure has the oracle properties and is robust to misspecification of the correlation structure among secondary phenotypes. We select the tuning parameter using a weighted Bayesian Information-like Criterion (wBIC). We evaluate the finite sample performance of the proposed methods via simulations, and illustrate the methods by analyzing multiple secondary smoking behavior outcomes in a lung cancer case-control genetic association study.
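To illustrate how inverse probability weighting corrects for case-control sampling, here is a simplified sketch with one SNP and one secondary phenotype rather than the SNP-set, scaled-mean model described above. It assumes a known disease prevalence and weights each sampled case by prevalence / n_cases and each control by (1 - prevalence) / n_controls; `ipw_weights` and `ipw_linear_fit` are hypothetical names.

```python
def ipw_weights(is_case, prevalence):
    """Inverse-probability-of-sampling weights for a case-control sample (assumed scheme).

    Cases stand in for prevalence * N people and controls for (1 - prevalence) * N,
    so weights are proportional to the inverse of each group's sampling fraction.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    return [prevalence / n_cases if c else (1 - prevalence) / n_controls for c in is_case]

def ipw_linear_fit(y, x, w):
    """Solve the weighted estimating equations sum_i w_i (y_i - a - b x_i) (1, x_i) = 0."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b  # (intercept, SNP effect on the secondary phenotype)
```

With a 10% prevalence and 2 cases among 6 subjects, each case gets weight 0.05 and each control 0.225, so cases, which are heavily oversampled, count for less when estimating population-level secondary-phenotype effects.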
Temporomandibular disorder (TMD) is a musculoskeletal condition characterized by pain and reduced function in the temporomandibular joint and/or associated masticatory musculature. Prevalence in the United States is 5%, and TMD is twice as common among women as among men. We conducted a discovery genome-wide association study (GWAS) of TMD in 10,153 participants (769 cases, 9,384 controls) of the US Hispanic Community Health Study/Study of Latinos (HCHS/SOL). The most promising single-nucleotide polymorphisms (SNPs) were tested in a meta-analysis of 4 independent cohorts. One replication cohort was from the United States, and the others were from Germany, Finland, and Brazil, totaling 1,911 TMD cases and 6,903 controls. A locus near the sarcoglycan alpha gene (SGCA), rs4794106, was suggestive in the discovery analysis (P = 2.6 × 10⁻⁶) and replicated (1-tailed P = 0.016) in the Brazilian cohort. In the discovery cohort, sex-stratified analysis identified 2 additional genome-wide significant loci in females. One, lying upstream of the relaxin/insulin-like family peptide receptor 2 gene (RXFP2) (chromosome 13, rs60249166, odds ratio [OR] = 0.65, P = 3.6 × 10⁻⁸), was replicated among females in the meta-analysis (1-tailed P = 0.052). The other (chromosome 17, rs1531554, OR = 0.68, P = 2.9 × 10⁻⁸) was replicated among females (1-tailed P = 0.002), as well as in the meta-analysis of both sexes (1-tailed P = 0.021). A novel genome-wide significant locus (rs73460075, OR = 0.56, P = 3.8 × 10⁻⁸) in an intron of the dystrophin gene DMD (X chromosome) and a suggestive locus on chromosome 7 (rs73271865, P = 2.9 × 10⁻⁷) upstream of the Sp4 transcription factor gene (SP4) were identified in the discovery cohort, but neither was replicated. The SGCA gene encodes alpha-sarcoglycan, which is involved in the cellular structure of muscle fibers and, along with dystrophin, forms part of the dystrophin-glycoprotein complex.
Functional annotation suggested that several of these variants reside in loci that regulate processes relevant to TMD pathobiologic processes.
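The one-tailed replication p-values reported above test for an effect in the same direction as in discovery. A minimal sketch, assuming the replication effect is given on the log-odds scale (log OR) with its standard error and a normal approximation; the function name is hypothetical.

```python
import math

def one_tailed_replication_p(beta_rep, se_rep, discovery_sign):
    """One-sided p-value for an effect in the discovery direction.

    beta_rep, se_rep: replication effect (e.g. log OR) and its standard error.
    discovery_sign: +1 if the discovery effect was positive, -1 if negative.
    """
    z = discovery_sign * beta_rep / se_rep  # positive z = same direction as discovery
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
```

For a protective discovery effect such as OR = 0.65, pass `discovery_sign=-1` and `beta_rep=math.log(or_rep)`; an effect in the opposite direction then yields a one-tailed p-value above 0.5 rather than counting as replication.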