Admixture mapping can be used to detect genetic association regions in admixed populations, such as Hispanics/Latinos, by estimating associations between local ancestry allele counts and the trait of interest. We performed admixture mapping of the blood pressure traits systolic and diastolic blood pressure (SBP, DBP), mean arterial pressure (MAP), and pulse pressure (PP), in a dataset of 12,116 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Hispanics/Latinos have three predominant ancestral populations (European, African, and Amerindian), for each of which we separately tested local ancestry intervals across the genome. We identified four regions that were significantly associated with a blood pressure trait at the genome-wide admixture mapping level. A 6p21.31 Amerindian ancestry association region has multiple known associations, but none explained the admixture mapping signal. We identified variants that completely explained this signal. One of these variants had p-values of 0.02 (MAP) and 0.04 (SBP) in replication testing in Pima Indians. An 11q13.4 Amerindian ancestry association region spans a variant that was previously reported (p-value = 0.001) in a targeted association study of blood pressure (BP) traits and variants in the vitamin D pathway. There was no replication evidence supporting an association in the identified 17q25.3 Amerindian ancestry association region. For a region on 6p12.3, associated with African ancestry, we did not identify any candidate variants driving the association; it may be driven by rare variants. Whole genome sequence data may be necessary to fine map these association signals, which may contribute to disparities in BP traits between diverse populations.
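The basic association test behind admixture mapping can be sketched as a per-interval regression of the trait on the number of alleles (0, 1, or 2) inherited from one ancestral population at that interval. The sketch below is a minimal illustration under stated assumptions, not the study's actual model: it assumes ordinary least squares, a normal approximation for the p-value, and a hypothetical covariate matrix, and the function name `admixture_mapping_test` is hypothetical. The genome-wide admixture mapping significance threshold is not given here.

```python
import math

import numpy as np


def admixture_mapping_test(trait, ancestry_counts, covariates):
    """Regress a trait on local ancestry allele counts at one interval.

    trait: (n,) phenotype values, e.g. SBP.
    ancestry_counts: (n,) values 0, 1 or 2, the number of alleles of one
        ancestry (e.g. Amerindian) carried at this interval.
    covariates: (n, p) adjustment covariates (hypothetical choice, e.g. age,
        sex, global ancestry proportions).
    Returns (beta, se, two-sided p-value) for the ancestry count term.
    """
    n = trait.shape[0]
    # Design matrix: intercept, ancestry count, then covariates
    X = np.column_stack([np.ones(n), ancestry_counts, covariates])
    beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = math.sqrt(cov[1, 1])
    z = beta[1] / se
    # Normal approximation to the t distribution (reasonable for large n)
    p = math.erfc(abs(z) / math.sqrt(2))
    return float(beta[1]), se, p
```

In a genome scan, this test would be repeated for every local ancestry interval and, as described above, separately for European, African, and Amerindian ancestry counts.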
Publications by Year: 2017
We propose a weighted pseudolikelihood method for analyzing the association of a SNP set (for example, SNPs in a gene or in a genetic pathway or network) with multiple secondary phenotypes in case-control genetic association studies. To boost power, we assume that the SNP-specific effects are shared across all secondary phenotypes using a scaled mean model. We estimate regression parameters using inverse probability weighted (IPW) estimating equations obtained from the weighted pseudolikelihood, which accounts for case-control sampling to prevent potential ascertainment bias. To test the effect of a SNP set, we propose a weighted variance component pseudo-score test. We also propose a penalized IPW pseudolikelihood method for selecting a subset of SNPs that are associated with the multiple secondary phenotypes. We show that the proposed variable selection procedure has the oracle properties and is robust to misspecification of the correlation structure among secondary phenotypes. We select the tuning parameter using a weighted Bayesian Information-like Criterion (wBIC). We evaluate the finite sample performance of the proposed methods via simulations, and illustrate the methods by analyzing multiple secondary smoking behavior outcomes in a lung cancer case-control genetic association study.
Temporomandibular disorder (TMD) is a musculoskeletal condition characterized by pain and reduced function in the temporomandibular joint and/or associated masticatory musculature. Prevalence in the United States is 5%, and twice as high among women as among men. We conducted a discovery genome-wide association study (GWAS) of TMD in 10,153 participants (769 cases, 9,384 controls) of the US Hispanic Community Health Study/Study of Latinos (HCHS/SOL). The most promising single-nucleotide polymorphisms (SNPs) were tested in a meta-analysis of 4 independent cohorts. One replication cohort was from the United States, and the others were from Germany, Finland, and Brazil, totaling 1,911 TMD cases and 6,903 controls. A locus near the sarcoglycan alpha gene (SGCA), rs4794106, was suggestive in the discovery analysis (P = 2.6 × 10⁻⁶) and replicated (1-tailed P = 0.016) in the Brazilian cohort. In the discovery cohort, sex-stratified analysis identified 2 additional genome-wide significant loci in females. One, lying upstream of the relaxin/insulin-like family peptide receptor 2 gene (RXFP2) (chromosome 13, rs60249166, odds ratio [OR] = 0.65, P = 3.6 × 10⁻⁸), was replicated among females in the meta-analysis (1-tailed P = 0.052). The other (chromosome 17, rs1531554, OR = 0.68, P = 2.9 × 10⁻⁸) was replicated among females (1-tailed P = 0.002), as well as in the meta-analysis of both sexes (1-tailed P = 0.021). A novel locus at the genome-wide level of significance (rs73460075, OR = 0.56, P = 3.8 × 10⁻⁸) in an intron of the dystrophin gene DMD (X chromosome), and a suggestive locus on chromosome 7 (rs73271865, P = 2.9 × 10⁻⁷) upstream of the Sp4 transcription factor gene (SP4), were identified in the discovery cohort, but neither was replicated. The SGCA gene encodes the SGCA protein, which is involved in the cellular structure of muscle fibers and, along with DMD, forms part of the dystrophin-glycoprotein complex. Functional annotation suggested that several of these variants reside in loci that regulate processes relevant to TMD pathobiology.
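The replication results above are reported as 1-tailed P-values, meaning the follow-up test only counts evidence in the direction of the discovery effect. A minimal sketch of that idea, assuming a normal (Wald) approximation on the log-OR scale; the function and argument names are hypothetical and this is not necessarily how the replication cohorts computed their P-values.

```python
import math


def one_tailed_replication_p(discovery_beta, replication_beta, replication_se):
    """One-tailed replication P-value in the direction of the discovery effect.

    Betas are log odds ratios (so OR < 1 means beta < 0). A replication
    estimate pointing the same way as the discovery estimate gives a small
    P-value; an opposite-direction estimate gives P > 0.5.
    """
    z = replication_beta / replication_se
    if discovery_beta < 0:
        # Discovery OR < 1: test the alternative that the effect is negative
        z = -z
    # Upper-tail standard normal probability P(Z >= z)
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For example, a discovery OR below 1 (such as 0.65) corresponds to a negative log-OR, so only a replication estimate that is also below 1 can yield a small 1-tailed P-value.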
Circulating white blood cell (WBC) counts (neutrophils, monocytes, lymphocytes, eosinophils, basophils) differ by ethnicity. The genetic factors underlying basal WBC traits in Hispanics/Latinos are unknown. We performed a genome-wide association study of total WBC and differential counts in a large, ethnically diverse US population sample of Hispanics/Latinos ascertained by the Hispanic Community Health Study and Study of Latinos (HCHS/SOL). We demonstrate that several previously known WBC-associated genetic loci (e.g. the African Duffy antigen receptor for chemokines null variant for neutrophil count) are generalizable to WBC traits in Hispanics/Latinos. We identified and replicated common and rare germ-line variants at FLT3 (a gene often somatically mutated in leukemia) associated with monocyte count. The common FLT3 variant rs76428106 has a large allele frequency differential between African and non-African populations. We also identified several novel genetic loci involving or regulating hematopoietic transcription factors (CEBPE-SLC7A7, CEBPA and CRBN-TRNT1) associated with basophil count. The minor allele of the CEBPE variant associated with lower basophil count has been previously associated with Amerindian ancestry and higher risk of acute lymphoblastic leukemia in Hispanics. Together, these data suggest that germline genetic variation affecting transcriptional and signaling pathways that underlie WBC development and lineage specification can contribute to inter-individual as well as ethnic differences in peripheral blood cell counts (normal hematopoiesis) in addition to susceptibility to leukemia (malignant hematopoiesis).
Case-control studies are designed to study associations between risk factors and a single, primary outcome. Information about additional, secondary outcomes is also collected, but association studies targeting such secondary outcomes should account for the case-control sampling scheme; otherwise, results may be biased. Often, one uses inverse probability weighted (IPW) estimators to estimate population effects in such studies. IPW estimators are robust, as they only require correct specification of the mean regression model of the secondary outcome on covariates, and knowledge of the disease prevalence. However, IPW estimators are inefficient relative to estimators that make additional assumptions about the data generating mechanism. We propose a class of estimators for the effect of risk factors on a secondary outcome in case-control studies that combine IPW with an additional modeling assumption: specification of the disease outcome probability model. We incorporate this model via a mean zero control function. We derive the class of all regular and asymptotically linear estimators corresponding to our modeling assumption, when the secondary outcome mean is modeled using either the identity or the log link. We find the efficient estimator in our class of estimators and show that it reduces to standard IPW when the model for the primary disease outcome is unrestricted, and is more efficient than standard IPW when the model is either parametric or semiparametric.
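The standard IPW estimator referred to above needs only the secondary outcome's mean model and the disease prevalence. The sketch below shows that baseline for the identity link: each participant is weighted by the inverse of their relative sampling probability, and the weighted estimating equation is solved in closed form. It is a minimal illustration under these assumptions, with hypothetical function names; it does not implement the proposed control-function estimators.

```python
import numpy as np


def ipw_weights(disease, prevalence):
    """Inverse-probability-of-sampling weights for a case-control sample.

    Cases are over-represented relative to the population, so each case gets
    weight prevalence / n_cases and each control gets
    (1 - prevalence) / n_controls (weights are defined up to a constant).
    """
    disease = np.asarray(disease, dtype=bool)
    n1 = disease.sum()
    n0 = disease.size - n1
    return np.where(disease, prevalence / n1, (1.0 - prevalence) / n0)


def ipw_secondary_effect(y, X, disease, prevalence):
    """Solve sum_i w_i x_i (y_i - x_i' beta) = 0 for the identity-link model
    E[Y | X] = X beta, i.e. weighted least squares with IPW weights."""
    w = ipw_weights(disease, prevalence)
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)
```

With these weights, cases and controls together represent the population in the proportions prevalence and 1 - prevalence, which is what removes the ascertainment bias; the price is that the weights ignore any information in a disease model, which is the inefficiency the proposed estimators target.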
In genome-wide association studies (GWAS), "generalization" is the replication of a genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.
Prior GWAS have identified loci associated with red blood cell (RBC) traits in populations of European, African, and Asian ancestry. These studies have not included individuals with an Amerindian ancestral background, such as Hispanics/Latinos, nor evaluated the full spectrum of genomic variation beyond single nucleotide variants. Using a custom genotyping array enriched for Amerindian ancestral content and 1000 Genomes imputation, we performed GWAS in 12,502 participants of Hispanic Community Health Study and Study of Latinos (HCHS/SOL) for hematocrit, hemoglobin, RBC count, RBC distribution width (RDW), and RBC indices. Approximately 60% of previously reported RBC trait loci generalized to HCHS/SOL Hispanics/Latinos, including African ancestral alpha- and beta-globin gene variants. In addition to the known 3.8kb alpha-globin copy number variant, we identified an Amerindian ancestral association in an alpha-globin regulatory region on chromosome 16p13.3 for mean corpuscular volume and mean corpuscular hemoglobin. We also discovered and replicated three genome-wide significant variants in previously unreported loci for RDW (SLC12A2 rs17764730, PSMB5 rs941718), and hematocrit (PROX1 rs3754140). Among the proxy variants at the SLC12A2 locus we identified rs3812049, located in a bi-directional promoter between SLC12A2 (which encodes a red cell membrane ion-transport protein) and an upstream anti-sense long-noncoding RNA, LINC01184, as the likely causal variant. We further demonstrate that disruption of the regulatory element harboring rs3812049 affects transcription of SLC12A2 and LINC01184 in human erythroid progenitor cells. Together, these results reinforce the importance of genetic study of diverse ancestral populations, in particular Hispanics/Latinos.
Puerto Ricans are disproportionately affected by asthma in the USA. In this study, we aimed to identify genetic variants that confer susceptibility to asthma in Puerto Ricans. We conducted a meta-analysis of genome-wide association studies (GWAS) of asthma in Puerto Ricans, including participants from the Genetics of Asthma in Latino Americans (GALA) I-II, the Hartford-Puerto Rico Study, and the Hispanic Community Health Study. Moreover, we examined whether susceptibility loci identified in previous meta-analyses of GWAS are associated with asthma in Puerto Ricans. The only locus to achieve genome-wide significance was chromosome 17q21, as evidenced by our top single nucleotide polymorphism (SNP), rs907092 (OR 0.71, p = 1.2 × 10⁻¹²) at IKZF3. Similar to results in non-Puerto Ricans, SNPs in genes in the same linkage disequilibrium block as IKZF3 (e.g. ZPBP2, ORMDL3 and GSDMB) were significantly associated with asthma in Puerto Ricans. With regard to results from a meta-analysis in Europeans, we replicated findings for rs2305480 at GSDMB, but not for SNPs in any other genes. On the other hand, we replicated results from a meta-analysis of North American populations for SNPs at IL1RL1, TSLP and GSDMB, but not for IL33. Our findings suggest that common variants on chromosome 17q21 have the greatest effects on asthma in Puerto Ricans.
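The abstract does not state which meta-analysis model was used to combine the cohorts. As an illustration only, the sketch below assumes the common fixed-effect, inverse-variance-weighted (IVW) approach, which pools per-study log odds ratios with weights equal to the inverse of their squared standard errors; the function name `ivw_meta` is hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: per-study log odds ratios for one SNP.
    ses: their standard errors.
    Returns (pooled log-OR, pooled SE, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p
```

The pooled odds ratio is `math.exp(beta)`; an OR below 1, as reported for rs907092, corresponds to a negative pooled log-OR.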
Few genome-wide association studies (GWAS) of type 2 diabetes (T2D) have been conducted in U.S. Hispanics/Latinos of diverse backgrounds, who are disproportionately affected by diabetes. We conducted a GWAS in 2,499 T2D case subjects and 5,247 control subjects from six Hispanic/Latino background groups in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Our GWAS identified two known loci (TCF7L2 and KCNQ1) reaching genome-wide significance levels. Conditional analysis on known index single nucleotide polymorphisms (SNPs) indicated an additional independent signal at KCNQ1, represented by an African ancestry-specific variant, rs1049549 (odds ratio 1.49 [95% CI 1.27-1.75]). This association was consistent across Hispanic/Latino background groups and replicated in the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium. Among 80 previously known index SNPs at T2D loci, 66 showed associations in the same direction as previously reported, and 14 significantly generalized to HCHS/SOL. A genetic risk score based on these 80 index SNPs was significantly associated with T2D (odds ratio 1.07 [1.06-1.09] per risk allele), with a stronger effect observed in nonobese than in obese individuals. Our study identified a novel independent signal suggesting an African ancestry-specific allele at KCNQ1 for T2D. Associations between previously identified loci and T2D were generally observed in this large cohort of U.S. Hispanics/Latinos.
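An odds ratio "per risk allele" for a genetic risk score is what a logistic regression of case status on the total risk-allele count estimates. The sketch below assumes an unweighted score (a simple count of risk alleles across the index SNPs) and an unadjusted logistic model fitted by Newton-Raphson; the abstract does not say whether the score was weighted or which covariates were included, and the function names are hypothetical.

```python
import numpy as np


def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: total number of risk alleles per person.

    risk_allele_counts: (n_people, n_snps) array with entries 0, 1 or 2,
    coded so that each entry counts the allele reported to increase T2D risk.
    """
    return np.asarray(risk_allele_counts).sum(axis=1)


def logistic_or_per_unit(x, y, n_iter=25):
    """Fit logit P(y = 1) = a + b * x by Newton-Raphson.

    Returns exp(b), the odds ratio per one-unit increase in x
    (here, per additional risk allele).
    """
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)  # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)
    return float(np.exp(beta[1]))
```

Under this model, an OR of 1.07 per risk allele compounds multiplicatively, so a person carrying 10 more risk alleles than another has about 1.07¹⁰ ≈ 1.97 times the odds of T2D.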
Hypertension is a leading cause of global disease, mortality, and disability. While individuals of African descent suffer a disproportionate burden of hypertension and its complications, they have been underrepresented in genetic studies. To identify novel susceptibility loci for blood pressure and hypertension in people of African ancestry, we performed both single-trait and multiple-trait genome-wide association analyses. We analyzed 21 genome-wide association studies comprising 31,968 individuals of African ancestry, and validated our results with an additional 54,395 individuals from multi-ethnic studies. These analyses identified nine loci with eleven independent variants that reached genome-wide significance (P < 1.25 × 10⁻⁸) for systolic or diastolic blood pressure, hypertension, or combined traits. Single-trait analyses identified two loci (TARID/TCF21 and LLPH/TMBIM4) and multiple-trait analyses identified one novel locus (FRMD3) for blood pressure. At these three loci, as well as at GRP20/CDH17, associated variants had alleles common only in African-ancestry populations. Functional annotation showed enrichment for genes expressed in immune and kidney cells, as well as in heart and vascular cells/tissues. Experiments driven by these findings and using angiotensin-II induced hypertension in mice showed altered kidney mRNA expression of six genes, suggesting their potential role in hypertension. Our study provides new evidence for genes related to hypertension susceptibility, and underscores the need to study African-ancestry populations in order to identify biologic factors contributing to hypertension.