Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
Publications
2016
Investigators often meta-analyze multiple genome-wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta-analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta-analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed-effect model for meta-analysis, "MetaCor," which accounts for correlation between stratum-specific effect estimates. Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a "pooled" GWAS, especially with different minor allele frequencies (MAFs) between strata. We illustrate the benefits of MetaCor on two GWASs in the HCHS/SOL. Analysis of dental caries (tooth decay) stratified by ancestry group detected a genome-wide significant SNP (rs7791001, P-value = 3.66×10⁻⁸, compared to 4.67×10⁻⁷ in pooled), with different MAFs between strata. Stratified analysis of body mass index (BMI) by ancestry group and sex reduced overall inflation from λGC=1.050 (pooled) to λGC=1.028 (MetaCor). Furthermore, even after removing close relatives to obtain nearly uncorrelated strata, a naïve stratified analysis resulted in λGC=1.058 compared to λGC=1.027 for MetaCor.
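The genomic inflation factor λGC quoted in the abstract above is conventionally defined as the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null (about 0.455). The following is a minimal sketch of that conventional definition, not code from the MetaCor paper; the function name `genomic_inflation` is hypothetical, and it assumes the input is a list of two-sided P-values from 1-df tests.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Estimate lambda_GC from two-sided P-values of 1-df tests.

    Hypothetical helper for illustration only.
    """
    std_normal = NormalDist()
    # A two-sided P-value p corresponds to a chi-square(1) statistic z**2,
    # where z is the standard normal quantile at p / 2.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a chi-square(1) distribution: squared normal quantile at 0.25.
    expected_median = std_normal.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median


# Perfectly uniform P-values give no inflation.
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
lambda_null = genomic_inflation(uniform_p)

# Systematically smaller P-values give a value above 1.
lambda_inflated = genomic_inflation([p / 2.0 for p in uniform_p])
```

Values close to 1 indicate that the bulk of the test statistics follow the null distribution, and values above 1 indicate inflation.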
The difference-in-differences (DID) approach is a well-known strategy for estimating the effect of an exposure in the presence of unobserved confounding. The approach is most commonly used when pre- and post-exposure outcome measurements are available, and one can assume that the association of the unobserved confounder with the outcome is equal in the two exposure groups, and constant over time. Then, one recovers the treatment effect by regressing the change in outcome over time on the exposure. In this paper, we interpret the difference-in-differences as a negative outcome control (NOC) approach. We show that the pre-exposure outcome is a negative control outcome, as it cannot be influenced by the subsequent exposure, and it is affected by both observed and unobserved confounders of the exposure-outcome association of interest. The relation between DID and NOC provides simple conditions under which negative control outcomes can be used to detect and correct for confounding bias. However, for general negative control outcomes, the DID-like assumption may be overly restrictive and rarely credible, because it requires that both the outcome of interest and the control outcome are measured on the same scale. Thus, we present a scale-invariant generalization of the DID that may be used in broader NOC contexts. The proposed approach is demonstrated in simulations and on a Normative Aging Study data set, in which Body Mass Index is used for NOC of the relationship between air pollution and inflammatory outcomes.
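The classical DID step described above, regressing the change in outcome on the exposure, can be sketched for the simplest case of a binary exposure and no covariates, where the regression slope equals the difference in mean change between exposure groups. This is an illustration of standard DID only, not the scale-invariant generalization proposed in the paper; the function name `did_estimate` and the toy numbers are made up for the example.

```python
from statistics import mean


def did_estimate(pre, post, exposed):
    """Classical difference-in-differences for a binary exposure.

    Hypothetical helper: with no covariates, the slope from regressing
    (post - pre) on a 0/1 exposure is the difference in mean change.
    """
    change = [y1 - y0 for y0, y1 in zip(pre, post)]
    exposed_change = [d for d, a in zip(change, exposed) if a == 1]
    unexposed_change = [d for d, a in zip(change, exposed) if a == 0]
    return mean(exposed_change) - mean(unexposed_change)


# Toy data (invented): exposed subjects start higher because of an
# unobserved confounder that affects pre and post outcomes equally.
pre = [5.0, 6.0, 2.0, 3.0]
post = [8.0, 9.0, 3.0, 4.0]
exposed = [1, 1, 0, 0]

did = did_estimate(pre, post, exposed)

# A naive comparison of post-exposure means also absorbs the baseline gap.
naive = mean(post[:2]) - mean(post[2:])
```

In this toy data the naive post-exposure contrast is larger than the DID estimate because it includes the baseline difference between the groups, which the differencing removes.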
We analyzed genome-wide association studies (GWASs), including data from 71,638 individuals from four ancestries, for estimated glomerular filtration rate (eGFR), a measure of kidney function used to define chronic kidney disease (CKD). We identified 20 loci attaining genome-wide-significant evidence of association (p < 5 × 10⁻⁸) with kidney function and highlighted that allelic effects on eGFR at lead SNPs are homogeneous across ancestries. We leveraged differences in the pattern of linkage disequilibrium between diverse populations to fine-map the 20 loci through construction of "credible sets" of variants driving eGFR association signals. Credible variants at the 20 eGFR loci were enriched for DNase I hypersensitivity sites (DHSs) in human kidney cells. DHS credible variants were expression quantitative trait loci for NFATC1 and RGS14 (at the SLC34A1 locus) in multiple tissues. Loss-of-function mutations in ancestral orthologs of both genes in Drosophila melanogaster were associated with altered sensitivity to salt stress. Renal mRNA expression of Nfatc1 and Rgs14 in a salt-sensitive mouse model was also reduced after exposure to a high-salt diet or induced CKD. Our study (1) demonstrates the utility of trans-ethnic fine mapping through integration of GWASs involving diverse populations with genomic annotation from relevant tissues to define molecular mechanisms by which association signals exert their effect and (2) suggests that salt sensitivity might be an important marker for biological processes that affect kidney function and CKD in humans.
Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10⁻²⁸) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.
(1) OBJECTIVE: To examine the relationship between the choice of second-generation antidepressant drug treatment and long-term weight change; (2) METHODS: We conducted a retrospective cohort study to investigate the relationship between choice of antidepressant medication and weight change at two years among adult patients with a new antidepressant treatment episode between January, 2006 and October, 2009 in a large health system in Washington State. Medication use, encounters, diagnoses, height, and weight were collected from electronic databases. We modeled change in weight and BMI at two years after initiation of treatment using inverse probability weighted linear regression models that adjusted for potential confounders. Fluoxetine was the reference treatment; (3) RESULTS: In intent-to-treat analyses, non-smokers who initiated bupropion treatment on average lost 7.1 lbs compared to fluoxetine users who were non-smokers (95% CI: -11.3, -2.8; p-value < 0.01); smokers who initiated bupropion treatment gained on average 2.2 lbs compared to fluoxetine users who were smokers (95% CI: -2.3, 6.8; p-value = 0.33). Changes in weight associated with all other antidepressant medications were not significantly different than fluoxetine, except for sertraline users, who gained an average of 5.9 lbs compared to fluoxetine users (95% CI: 0.8, 10.9; p-value = 0.02); (4) CONCLUSION: Antidepressant drug therapy is significantly associated with long-term weight change at two years. Bupropion may be considered as the first-line drug of choice for overweight and obese patients unless there are other existing contraindications.
RATIONALE: Obstructive sleep apnea is a common disorder associated with increased risk for cardiovascular disease, diabetes, and premature mortality. Although there is strong clinical and epidemiologic evidence supporting the importance of genetic factors in influencing obstructive sleep apnea, its genetic basis is still largely unknown. Prior genetic studies focused on traits defined using the apnea-hypopnea index, which contains limited information on potentially important genetically determined physiologic factors, such as propensity for hypoxemia and respiratory arousability.
OBJECTIVES: To define novel genetic risk loci for obstructive sleep apnea, we conducted genome-wide association studies of quantitative traits in Hispanic/Latino Americans from three cohorts.
METHODS: Genome-wide data from as many as 12,558 participants in the Hispanic Community Health Study/Study of Latinos, Multi-Ethnic Study of Atherosclerosis, and Starr County Health Studies population-based cohorts were meta-analyzed for association with the apnea-hypopnea index, average oxygen saturation during sleep, and average respiratory event duration.
MEASUREMENTS AND MAIN RESULTS: Two novel loci were identified at genome-level significance (rs11691765, GPR83, P = 1.90 × 10⁻⁸ for the apnea-hypopnea index, and rs35424364, C6ORF183/CCDC162P, P = 4.88 × 10⁻⁸ for respiratory event duration) and seven additional loci were identified with suggestive significance (P < 5 × 10⁻⁷). Secondary sex-stratified analyses also identified one significant and several suggestive associations. Multiple loci overlapped genes with biologic plausibility.
CONCLUSIONS: These are the first genome-level significant findings reported for obstructive sleep apnea-related physiologic traits in any population. These findings identify novel associations in inflammatory, hypoxia signaling, and sleep pathways.
BACKGROUND: Osteoporosis is a major public health problem associated with excess disability and mortality. It is estimated that 50-70% of the variation in osteoporotic fracture risk is attributable to genetic factors. The purpose of this hypothesis-generating study was to identify possible genetic determinants of fracture among African American (AA) women in a GWAS meta-analysis.
METHODS: Data on clinical fractures (all fractures except fingers, toes, face, skull or sternum) were analyzed among AA female participants in the Women's Health Initiative (WHI) (N = 8155), Cardiovascular Health Study (CHS) (N = 504), BioVU (N = 704), Health ABC (N = 651), and the Johnston County Osteoarthritis Project (JoCoOA) (N = 291). Affymetrix (WHI) and Illumina (Health ABC, JoCoOA, BioVU, CHS) GWAS panels were used for genotyping, and a 1:1 ratio of YRI:CEU HapMap haplotypes was used as an imputation reference panel. We used Cox proportional hazard models or logistic regression to evaluate the association of 2.5 million SNPs with fracture risk, adjusting for ancestry, age, and geographic region where applicable. We conducted a fixed-effects, inverse variance-weighted meta-analysis. Genome-wide significance was set at P < 5 × 10⁻⁸.
RESULTS: One SNP, rs12775980 in an intron of SVIL on chromosome 10p11.2, reached genome-wide significance (P = 4.0 × 10⁻⁸). Although this SNP has a low minor allele frequency (0.03), there was no evidence for heterogeneity of effects across the studies (I² = 0). This locus was not reported in any previous osteoporosis-related GWA studies. We also interrogated previously reported GWA-significant loci associated with fracture or bone mineral density in our data. One locus (SMOC1) generalized, but overall there was not substantial evidence of generalization. Possible reasons for the lack of generalization are discussed.
CONCLUSION: This GWAS meta-analysis of fractures in African American women identified a potentially novel locus in the supervillin gene, which encodes a platelet-associated factor and was previously associated with platelet thrombus formation in African Americans. If validated in other populations of African descent, these findings suggest potential new mechanisms involved in fracture that may be particularly important among African Americans.
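The fixed-effects, inverse variance-weighted meta-analysis and the I² heterogeneity statistic named in the abstract above follow standard formulas: each study's effect estimate is weighted by the reciprocal of its squared standard error, and I² is derived from Cochran's Q. The sketch below shows those textbook formulas only; the function name `fixed_effects_meta` and the input values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effects_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    Hypothetical helper returning (pooled beta, pooled SE,
    two-sided P-value, I-squared as a fraction between 0 and 1).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = sqrt(1.0 / total_weight)
    z = beta / se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q measures the spread of study estimates around the pooled one.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I-squared is the share of Q in excess of its null expectation, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p_value, i_squared


# Invented inputs: two studies that agree, then two that disagree.
beta_a, se_a, p_a, i2_a = fixed_effects_meta([0.2, 0.2], [0.1, 0.1])
beta_b, se_b, p_b, i2_b = fixed_effects_meta([0.0, 1.0], [0.1, 0.1])
```

When the study estimates agree, Q falls at or below its degrees of freedom and I² is reported as 0, which is the situation the RESULTS paragraph describes for rs12775980.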
BACKGROUND: Social Anxiety Disorder (SAD) is linked to social norms and role expectations which are culture dependent, such as the construal of one's self as independent or interdependent in relation to others. The current study is the first to examine SAD symptoms among Ethiopian and former Soviet Union immigrants to Israel compared to a sample of native Israelis. We investigated the relationship between SAD, ethnicity and independent/interdependent self-construals.
METHODS: A total of 261 students (151 native-born Israelis, 60 Ethiopian immigrants and 50 students from the former USSR) were administered the Liebowitz Social Anxiety Scale (LSAS), the Self-construal Scale (SCS), Beck Depression Inventory (BDI) and a socio-demographic questionnaire.
RESULTS: Ethiopians exhibited the highest SAD scores, while no differences were found between the FSU immigrants and native-born Israelis. Additionally, Ethiopians and native-born Israeli students exhibited similarly high interdependence scores. Finally, SAD scores were predicted by gender, origin, and independent and interdependent self-construals.
CONCLUSION: Immigration per se is not a universal risk factor for SAD, and ethnological-cultural factors do contribute specifically to SAD. Possible psychological mediators between culture and the susceptibility to SAD are the interdependent and independent self-construals. When treating immigrants, clinicians and health care providers are advised to consider the effect of cultural influence on the mental well-being and integration process of immigrants into their host country.
US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a "genetic-analysis group" variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.