Trans-ethnic meta-analysis of genome-wide association studies (GWAS) across diverse populations can increase power to detect complex trait loci when the underlying causal variants are shared between ancestry groups. However, heterogeneity in allelic effects between GWAS at these loci can occur that is correlated with ancestry. Here, a novel approach is presented to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry. We employ trans-ethnic meta-regression to model allelic effects as a function of axes of genetic variation, derived from a matrix of mean pairwise allele frequency differences between GWAS, and implemented in the MR-MEGA software. Through detailed simulations, we demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis across a range of scenarios of heterogeneity in allelic effects between ethnic groups. We also demonstrate improved fine-mapping resolution, in loci containing a single causal variant, compared to these meta-analysis approaches and PAINTOR, and equivalent performance to MANTRA at reduced computational cost. Application of MR-MEGA to trans-ethnic GWAS of kidney function in 71,461 individuals indicates stronger signals of association than fixed-effects meta-analysis when heterogeneity in allelic effects is correlated with ancestry. Application of MR-MEGA to fine-mapping four type 2 diabetes susceptibility loci in 22,086 cases and 42,539 controls highlights: (i) strong evidence for heterogeneity in allelic effects that is correlated with ancestry only at the index SNP for the association signal at the CDKAL1 locus; and (ii) 99% credible sets with six or fewer variants for five distinct association signals.
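The meta-regression idea described above can be illustrated for a single SNP and a single axis of genetic variation. This is a minimal sketch under stated assumptions, not the MR-MEGA implementation: the function name `meta_regression_one_axis` is hypothetical, the axis coordinates are taken as given rather than derived from the allele frequency distance matrix, and only one axis is used. Per-study effect estimates are regressed on the axis with inverse-variance weights, and weighted deviances are compared to separate overall association, heterogeneity correlated with ancestry, and residual heterogeneity.

```python
def meta_regression_one_axis(betas, ses, axis):
    """Weighted regression of per-study allelic effects on one ancestry axis.

    betas, ses: per-study effect estimates and standard errors for one SNP.
    axis: per-study coordinate on an axis of genetic variation (assumed given).
    Returns the fitted coefficients and three deviance-based statistics.
    """
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, axis))
    sy = sum(wi * b for wi, b in zip(w, betas))
    sxx = sum(wi * x * x for wi, x in zip(w, axis))
    sxy = sum(wi * x * b for wi, x, b in zip(w, axis, betas))

    # Closed-form weighted least squares for intercept and slope.
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    pooled = sy / sw  # intercept-only fit (fixed-effects estimate)

    # Weighted deviances under three nested models.
    dev_null = sum(wi * b ** 2 for wi, b in zip(w, betas))
    dev_mean = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, betas))
    dev_model = sum(
        wi * (b - intercept - slope * x) ** 2
        for wi, b, x in zip(w, betas, axis)
    )
    return {
        "intercept": intercept,
        "slope": slope,
        "association": dev_null - dev_model,     # 2 df with one axis
        "ancestry_het": dev_mean - dev_model,    # 1 df with one axis
        "residual_het": dev_model,               # (studies - 2) df
    }


result = meta_regression_one_axis(
    betas=[0.1, 0.2, 0.3], ses=[0.1, 0.1, 0.1], axis=[-1.0, 0.0, 1.0]
)
```

In this toy input the effects lie exactly on a line in the axis coordinate, so all heterogeneity is attributed to ancestry and the residual term is zero. Under the assumptions above, each statistic would be referred to a chi-squared distribution with the degrees of freedom noted in the comments.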
Publications
2017
Hypertension prevalence varies between ethnic groups, possibly due to differences in genetic, environmental, and cultural determinants. Hispanic/Latino Americans are a diverse and understudied population. We performed a genome-wide association study (GWAS) of blood pressure (BP) traits in 12,278 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In the discovery phase we identified eight previously unreported BP loci. In the replication stage, we tested these loci in the 1982 Pelotas Birth Cohort Study of admixed Southern Brazilians, the COGENT-BP study of African descent, women of European descent from the Women's Health Initiative (WHI), and a sample of European descent from the UK Biobank. No loci met the Bonferroni-adjusted level of statistical significance (0.0024). Two loci had marginal evidence of replication: rs78701042 (NGF) with diastolic BP (P = 0.008 in the 1982 Pelotas Birth Cohort Study), and rs7315692 (SLC5A8) with systolic BP (P = 0.007 in European ancestry replication). We investigated whether previously reported loci associated with BP in studies of European, African, and Asian ancestry generalize to Hispanics/Latinos. Overall, 26% of the known associations in studies of individuals of European and Chinese ancestries generalized, while only a single association previously discovered in people of African descent generalized.
Heritability is the proportion of phenotypic variance in a population that is attributable to individual genotypes. Heritability is considered an important measure in both evolutionary biology and in medicine, and is routinely estimated and reported in genetic epidemiology studies. In population-based genome-wide association studies (GWAS), mixed models are used to estimate variance components, from which a heritability estimate is obtained. The estimated heritability is the proportion of the model's total variance that is due to the genetic relatedness matrix (kinship measured from genotypes). Current practice is to use bootstrapping, which is slow, or normal asymptotic approximation to estimate the precision of the heritability estimate; however, this approximation fails to hold near the boundaries of the parameter space or when the sample size is small. In this paper we propose to estimate variance components via a Haseman-Elston regression, find the asymptotic distribution of the variance components and proportions of variance, and use them to construct confidence intervals (CIs). Our method is further developed to obtain unbiased variance components estimators and construct CIs by meta-analyzing information from multiple studies. We demonstrate our approach on data from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
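A Haseman-Elston style point estimate of heritability can be sketched as follows. This is a simplified illustration, not the estimator from the paper: it assumes a standardized phenotype with no covariates, uses a regression through the origin of phenotype cross-products on off-diagonal kinship entries, and omits the asymptotic distribution and confidence interval construction. The function name `haseman_elston_h2` is hypothetical.

```python
import math


def haseman_elston_h2(phenotypes, kinship):
    """Regress z_i * z_j on K_ij over pairs i < j (no intercept).

    phenotypes: list of trait values, one per individual.
    kinship: square list-of-lists genetic relatedness matrix.
    Returns the slope, an unconstrained estimate of heritability.
    """
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    # Population standard deviation (denominator n) is an assumption here.
    sd = math.sqrt(sum((y - mean) ** 2 for y in phenotypes) / n)
    z = [(y - mean) / sd for y in phenotypes]

    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += kinship[i][j] * z[i] * z[j]
            den += kinship[i][j] ** 2
    return num / den


# Toy data: four individuals, three related pairs.
K = [
    [1.0, 0.5, 0.5, 0.0],
    [0.5, 1.0, 0.0, 0.5],
    [0.5, 0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0, 1.0],
]
h2 = haseman_elston_h2([1.0, -1.0, 1.0, -1.0], K)
```

Because the slope is not constrained to lie between 0 and 1, small samples can give estimates outside that range, which is one reason interval estimation near the boundary needs care.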
BACKGROUND: Despite ethnic disparities in lipid profiles, there are few genome-wide association studies investigating genetic variation of lipids in non-European ancestry populations. In this study, we present findings from genetic association analyses for total cholesterol, low density lipoprotein cholesterol (LDL), high density lipoprotein cholesterol (HDL), and triglycerides in a large Hispanic/Latino cohort in the U.S., the Hispanic Community Health Study / Study of Latinos (HCHS/SOL).
METHODS: We estimated a heritability of approximately 20% for each lipid trait, similar to previous estimates in Europeans. To search for novel lipid loci, we performed conditional association analysis in which the statistical model was adjusted for previously reported SNPs associated with any of the four lipid traits. SNPs that remained genome-wide significant (P < 5 × 10⁻⁸) after conditioning on known loci were evaluated for replication.
RESULTS: We identified eight potentially novel lipid signals with minor allele frequencies <1%, none of which replicated. We tested previously reported SNP-trait associations for generalization to Hispanics/Latinos via a statistical framework. The generalization analysis revealed that approximately 50% of previously established lipid variants generalize to HCHS/SOL based on directional FDR r-value < 0.05. Some failures to generalize were due to lack of power.
CONCLUSIONS: These results demonstrate that many loci associated with lipid levels are shared across populations.
BACKGROUND: Although time-domain measures of heart rate variability (HRV) are used to estimate cardiac autonomic tone and disease risk in multiethnic populations, the genetic epidemiology of HRV in Hispanics/Latinos has not been characterized.
OBJECTIVE: The purpose of this study was to conduct a genome-wide association study of heart rate (HR) and its variability in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Women's Health Initiative Hispanic SNP-Health Association Resource project (n = 13,767).
METHODS: We estimated HR (bpm), standard deviation of normal-to-normal interbeat intervals (SDNN, ms), and root mean squared difference in successive, normal-to-normal interbeat intervals (RMSSD, ms) from resting, standard 12-lead ECGs. We estimated associations between each phenotype and 17 million genotyped or imputed single nucleotide polymorphisms (SNPs), accounting for relatedness and adjusting for age, sex, study site, and ancestry. Cohort-specific estimates were combined using fixed-effects, inverse-variance meta-analysis. We investigated replication for select SNPs exceeding genome-wide (P < 5 × 10⁻⁸) or suggestive (P < 10⁻⁶) significance thresholds.
RESULTS: Two genome-wide significant SNPs replicated in a European ancestry cohort, one for RMSSD (rs4963772; chromosome 12) and another for SDNN (rs12982903; chromosome 19). A suggestive SNP for HR (rs236352; chromosome 6) replicated in an African-American cohort. Functional annotation of replicated SNPs in cardiac and neuronal tissues identified potentially causal variants and mechanisms.
CONCLUSION: This first genome-wide association study of HRV and HR in Hispanics/Latinos underscores the potential for even modestly sized samples of non-European ancestry to inform the genetic epidemiology of complex traits.
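The two time-domain HRV measures and the pooling step named in the methods above follow standard definitions and can be sketched briefly. The function names are hypothetical, and the use of the sample standard deviation (denominator n - 1) for SDNN is an assumption, since conventions differ between software packages.

```python
import math


def sdnn(nn_ms):
    """Standard deviation of normal-to-normal intervals, in ms."""
    n = len(nn_ms)
    mean = sum(nn_ms) / n
    return math.sqrt(sum((x - mean) ** 2 for x in nn_ms) / (n - 1))


def rmssd(nn_ms):
    """Root mean squared difference of successive NN intervals, in ms."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooling of cohort-specific estimates.

    Returns the pooled effect, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


intervals = [800.0, 810.0, 790.0, 800.0]  # toy NN intervals in ms
sdnn_value = sdnn(intervals)
rmssd_value = rmssd(intervals)
pooled_beta, pooled_se, pooled_z = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
```

With equal standard errors the pooled effect is the simple average of the cohort estimates, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.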
Admixture mapping can be used to detect genetic association regions in admixed populations, such as Hispanics/Latinos, by estimating associations between local ancestry allele counts and the trait of interest. We performed admixture mapping of the blood pressure traits systolic and diastolic blood pressure (SBP, DBP), mean arterial pressure (MAP), and pulse pressure (PP), in a dataset of 12,116 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Hispanics/Latinos have three predominant ancestral populations (European, African, and Amerindian), for each of which we separately tested local ancestry intervals across the genome. We identified four regions that were significantly associated with a blood pressure trait at the genome-wide admixture mapping level. A 6p21.31 Amerindian ancestry association region has multiple known associations, but none explained the admixture mapping signal. We identified variants that completely explained this signal. One of these variants had p-values of 0.02 (MAP) and 0.04 (SBP) in replication testing in Pima Indians. An 11q13.4 Amerindian ancestry association region spans a variant that was previously reported (p-value = 0.001) in a targeted association study of blood pressure (BP) traits and variants in the vitamin D pathway. There was no replication evidence supporting an association in the identified 17q25.3 Amerindian ancestry association region. For a region on 6p12.3, associated with African ancestry, we did not identify any candidate variants driving the association; the signal may be driven by rare variants. Whole genome sequence data may be necessary to fine map these association signals, which may contribute to disparities in BP traits between diverse populations.
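The core test in admixture mapping, an association between a trait and the local ancestry allele count at one genomic interval, can be sketched with an ordinary least squares fit. This is a deliberately reduced illustration: the function name `local_ancestry_test` is hypothetical, and the sketch ignores the covariates, global ancestry adjustment, and relatedness that a real analysis would account for.

```python
import math


def local_ancestry_test(counts, trait):
    """Simple linear regression of a trait on local ancestry counts (0, 1, 2).

    counts: number of alleles of one ancestry carried at the interval.
    trait: quantitative trait values for the same individuals.
    Returns the slope, its standard error, and the t statistic.
    """
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, trait))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(counts, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


slope, se, t_stat = local_ancestry_test(
    counts=[0, 1, 2, 0, 1, 2],
    trait=[1.0, 2.0, 3.0, 1.2, 2.2, 3.2],
)
```

In a three-way admixed population the same test would be repeated separately for each ancestry, with counts recoded as copies of that ancestry versus the other two.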
We propose a weighted pseudolikelihood method for analyzing the association of a SNP set, for example, SNPs in a gene or a genetic pathway or network, with multiple secondary phenotypes in case-control genetic association studies. To boost analysis power, we assume that the SNP-specific effects are shared across all secondary phenotypes using a scaled mean model. We estimate regression parameters using Inverse Probability Weighted (IPW) estimating equations obtained from the weighted pseudolikelihood, which accounts for case-control sampling to prevent potential ascertainment bias. To test the effect of a SNP set, we propose a weighted variance component pseudo-score test. We also propose a penalized IPW pseudolikelihood method for selecting a subset of SNPs that are associated with the multiple secondary phenotypes. We show that the proposed variable selection procedure has the oracle properties and is robust to misspecification of the correlation structure among secondary phenotypes. We select the tuning parameter using a weighted Bayesian Information-like Criterion (wBIC). We evaluate the finite sample performance of the proposed methods via simulations, and illustrate the methods by the analysis of the multiple secondary smoking behavior outcomes in a lung cancer case-control genetic association study.
Temporomandibular disorder (TMD) is a musculoskeletal condition characterized by pain and reduced function in the temporomandibular joint and/or associated masticatory musculature. Prevalence in the United States is 5% and twice as high among women as men. We conducted a discovery genome-wide association study (GWAS) of TMD in 10,153 participants (769 cases, 9,384 controls) of the US Hispanic Community Health Study/Study of Latinos (HCHS/SOL). The most promising single-nucleotide polymorphisms (SNPs) were tested in meta-analysis of 4 independent cohorts. One replication cohort was from the United States, and the others were from Germany, Finland, and Brazil, totaling 1,911 TMD cases and 6,903 controls. A locus near the sarcoglycan alpha gene (SGCA), rs4794106, was suggestive in the discovery analysis (P = 2.6 × 10⁻⁶) and replicated (i.e., 1-tailed P = 0.016) in the Brazilian cohort. In the discovery cohort, sex-stratified analysis identified 2 additional genome-wide significant loci in females. One lying upstream of the relaxin/insulin-like family peptide receptor 2 (RXP2) (chromosome 13, rs60249166, odds ratio [OR] = 0.65, P = 3.6 × 10⁻⁸) was replicated among females in the meta-analysis (1-tailed P = 0.052). The other (chromosome 17, rs1531554, OR = 0.68, P = 2.9 × 10⁻⁸) was replicated among females (1-tailed P = 0.002), as well as replicated in meta-analysis of both sexes (1-tailed P = 0.021). A novel locus at genome-wide level of significance (rs73460075, OR = 0.56, P = 3.8 × 10⁻⁸) in the intron of the dystrophin gene DMD (X chromosome), and a suggestive locus on chromosome 7 (rs73271865, P = 2.9 × 10⁻⁷) upstream of the Sp4 Transcription Factor (SP4) gene were identified in the discovery cohort, but neither of these was replicated. The SGCA gene encodes SGCA, which is involved in the cellular structure of muscle fibers and, along with DMD, forms part of the dystrophin-glycoprotein complex. Functional annotation suggested that several of these variants reside in loci that regulate processes relevant to TMD pathobiologic processes.
Circulating white blood cell (WBC) counts (neutrophils, monocytes, lymphocytes, eosinophils, basophils) differ by ethnicity. The genetic factors underlying basal WBC traits in Hispanics/Latinos are unknown. We performed a genome-wide association study of total WBC and differential counts in a large, ethnically diverse US population sample of Hispanics/Latinos ascertained by the Hispanic Community Health Study and Study of Latinos (HCHS/SOL). We demonstrate that several previously known WBC-associated genetic loci (e.g. the African Duffy antigen receptor for chemokines null variant for neutrophil count) are generalizable to WBC traits in Hispanics/Latinos. We identified and replicated common and rare germ-line variants at FLT3 (a gene often somatically mutated in leukemia) associated with monocyte count. The common FLT3 variant rs76428106 has a large allele frequency differential between African and non-African populations. We also identified several novel genetic loci involving or regulating hematopoietic transcription factors (CEBPE-SLC7A7, CEBPA and CRBN-TRNT1) associated with basophil count. The minor allele of the CEBPE variant associated with lower basophil count has been previously associated with Amerindian ancestry and higher risk of acute lymphoblastic leukemia in Hispanics. Together, these data suggest that germline genetic variation affecting transcriptional and signaling pathways that underlie WBC development and lineage specification can contribute to inter-individual as well as ethnic differences in peripheral blood cell counts (normal hematopoiesis) in addition to susceptibility to leukemia (malignant hematopoiesis).
In genome-wide association studies (GWAS), "generalization" is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.