Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 cohorts with 88,316 MD cases and 902,757 controls to previously reported data. This analysis used a range of measures to define MD and included samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latin American participants (32%). The multi-ancestry GWAS identified 53 significantly associated novel loci. For loci from GWAS in European ancestry samples, fewer than expected were transferable to other ancestry groups. Fine mapping benefited from additional sample diversity. A transcriptome-wide association study identified 205 significantly associated novel genes. These findings suggest that, for MD, increasing ancestral and global diversity in genetic studies may be particularly important to ensure discovery of core genes and inform about transferability of findings.
Genome-wide association study
2024
2023
BACKGROUND: Risk for venous thromboembolism has a strong genetic component. Whole genome sequencing from the TOPMed program (Trans-Omics for Precision Medicine) allowed us to look for new associations, particularly rare variants missed by standard genome-wide association studies.
METHODS: The 3793 cases and 7834 controls (11.6% of cases were individuals of African, Hispanic/Latino, or Asian ancestry) were analyzed using a single variant approach and an aggregate gene-based approach using our primary filter (included only loss-of-function and missense variants predicted to be deleterious) and our secondary filter (included all missense variants).
RESULTS: Single variant analyses identified associations at 5 known loci. Aggregate gene-based analyses identified only PROC (odds ratio, 6.2 for carriers of rare variants; P=7.4×10⁻¹⁴) when using our primary filter. Employing our secondary variant filter led to a smaller effect size at PROC (odds ratio, 3.8; P=1.6×10⁻¹⁴), while excluding variants found only in rare isoforms led to a larger one (odds ratio, 7.5). Different filtering strategies improved the signal for 2 other known genes: PROS1 became significant (minimum P=1.8×10⁻⁶ with the secondary filter), while SERPINC1 did not (minimum P=4.4×10⁻⁵ with minor allele frequency <0.0005). Results were largely the same when restricting the analyses to include only unprovoked cases; however, one novel gene, MS4A1, became significant (P=4.4×10⁻⁷ using all missense variants with minor allele frequency <0.0005).
CONCLUSIONS: Here, we have demonstrated the importance of using multiple variant filtering strategies, as we detected additional genes when filtering variants based on their predicted deleteriousness, frequency, and presence on the most expressed isoforms. Our primary analyses did not identify new candidate loci; thus larger follow-up studies are needed to replicate the novel MS4A1 locus and to identify additional rare variation associated with venous thromboembolism.
BACKGROUND: Genome-wide association studies (GWAS) for obstructive sleep apnoea (OSA) are limited due to the underdiagnosis of OSA, leading to misclassification of OSA, which consequently reduces statistical power. We performed a GWAS of OSA in the Million Veteran Program (MVP) of the U.S. Department of Veterans Affairs (VA) healthcare system, where OSA prevalence is close to its true population prevalence.
METHODS: We performed GWAS of 568,576 MVP participants, stratified by biological sex and by harmonized race/ethnicity and genetic ancestry (HARE) groups of White, Black, Hispanic, and Asian individuals. We considered both BMI adjusted (BMI-adj) and unadjusted (BMI-unadj) models. We replicated associations in independent datasets, and analysed the heterogeneity of OSA genetic associations across HARE and sex groups. We finally performed a larger meta-analysis GWAS of MVP, FinnGen, and the MGB Biobank, totalling 916,696 individuals.
FINDINGS: MVP participants are 91% male. OSA prevalence is 21%. In MVP there were 18 and 6 genome-wide significant loci in BMI-unadj and BMI-adj analyses, respectively, corresponding to 21 association regions. Of these, 17 were not previously reported in association with OSA, and 13 replicated in FinnGen (False Discovery Rate p-value < 0.05). There were widespread significant differences in genetic effects between men and women, but less so across HARE groups. Meta-analysis of MVP, FinnGen, and MGB biobank revealed 17 additional, previously unreported, genome-wide significant regions.
INTERPRETATION: Sex differences in genetic associations with OSA are widespread, likely associated with multiple OSA risk factors. OSA shares genetic underpinnings with several sleep phenotypes, suggesting shared aetiology and causal pathways.
FUNDING: Described in acknowledgements.
2019
Dental caries and periodontitis account for a vast burden of morbidity and healthcare spending, yet their genetic basis remains largely uncharacterized. Here, we identify self-reported dental disease proxies which have similar underlying genetic contributions to clinical disease measures and then combine these in a genome-wide association study meta-analysis, identifying 47 novel and conditionally-independent risk loci for dental caries. We show that the heritability of dental caries is enriched for conserved genomic regions and partially overlapping with a range of complex traits including smoking, education, personality traits and metabolic measures. Using cardio-metabolic traits as an example in Mendelian randomization analysis, we estimate causal relationships and provide evidence suggesting that the processes contributing to dental caries may have undesirable downstream effects on health.
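As an illustration of the Mendelian randomization step mentioned above, the following is a minimal sketch of an inverse-variance weighted (IVW) causal estimate computed from summary statistics. The abstract does not say which estimator was used, so IVW is an assumption here, and the function name `ivw_estimate` and the input values in the usage note are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted Mendelian randomization estimate.

    Each list holds one entry per genetic instrument (hypothetical inputs):
    beta_exposure -- SNP effect on the exposure (e.g. a dental caries proxy)
    beta_outcome  -- SNP effect on the outcome (e.g. a cardio-metabolic trait)
    se_outcome    -- standard error of beta_outcome
    Returns (causal estimate, standard error) using first-order weights.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2          # precision of the outcome association
        numerator += weight * bx * by
        denominator += weight * bx * bx
    if denominator == 0.0:
        raise ValueError("instruments have no effect on the exposure")
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

For example, `ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])` combines two made-up instruments whose per-SNP ratios both equal 0.5. Real analyses would also check instrument validity, which this sketch does not attempt.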
2016
BACKGROUND: Genome-wide association studies (GWAS) have made little progress in identifying variants linked to depression. We hypothesized that examining depressive symptoms and considering gene-environment interaction (GxE) might improve efficiency for gene discovery. We therefore conducted a GWAS and genome-wide by environment interaction study (GWEIS) of depressive symptoms.
METHODS: Using data from the SHARe cohort of the Women's Health Initiative, comprising African Americans (n = 7,179) and Hispanics/Latinas (n = 3,138), we examined genetic main effects and GxE with stressful life events and social support. We also conducted a heritability analysis using genome-wide complex trait analysis (GCTA). Replication was attempted in four independent cohorts.
RESULTS: No SNPs achieved genome-wide significance for main effects in either discovery sample. The top signals in African Americans were rs73531535 (located 20 kb from GPR139, P = 5.75 × 10⁻⁸) and rs75407252 (intronic to CACNA2D3, P = 6.99 × 10⁻⁷). In Hispanics/Latinas, the top signals were rs2532087 (located 27 kb from CD38, P = 2.44 × 10⁻⁷) and rs4542757 (intronic to DCC, P = 7.31 × 10⁻⁷). In the GWEIS with stressful life events, one interaction signal was genome-wide significant in African Americans (rs4652467; P = 4.10 × 10⁻¹⁰; located 14 kb from CEP350). This interaction was not observed in a smaller replication cohort. Although heritability estimates for depressive symptoms and stressful life events were each less than 10%, the two traits were strongly genetically correlated (rG = 0.95), suggesting that the common variation underlying self-reported depressive symptoms and that underlying stressful life event exposure, though each modest, overlapped strongly in this sample.
CONCLUSIONS: Our results underscore the need for larger samples, more GWEIS, and greater investigation into genetic and environmental determinants of depressive symptoms in minorities.
Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
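The "fit the null model once, then score-test each variant" idea described above can be sketched in a deliberately simplified form. This is not GMMAT itself: the sketch assumes an intercept-only logistic null model with no covariates and no random effects, so it does not account for relatedness or population structure. The names `fit_null` and `score_test` are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_null(y):
    """Fit an intercept-only logistic null model to a 0/1 trait.

    With no covariates, the fitted probability is the case fraction.
    Assumption: no random effects, unlike a logistic mixed model.
    """
    mu = fmean(y)
    if not 0.0 < mu < 1.0:
        raise ValueError("trait must include both cases and controls")
    return {"mu": mu, "residuals": [yi - mu for yi in y]}


def score_test(genotypes, null):
    """Score test of one variant, reusing the already-fitted null model.

    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    mu = null["mu"]
    g_bar = fmean(genotypes)
    # Score: genotype-weighted sum of null-model residuals.
    score = sum(g * r for g, r in zip(genotypes, null["residuals"]))
    # Variance of the score after adjusting genotypes for the intercept.
    variance = mu * (1.0 - mu) * sum((g - g_bar) ** 2 for g in genotypes)
    if variance == 0.0:
        return 0.0, 1.0  # monomorphic variant carries no information
    stat = score ** 2 / variance
    p_value = 2.0 * NormalDist().cdf(-math.sqrt(stat))
    return stat, p_value
```

In use, `fit_null` would be called once per trait and `score_test` once per variant, which is what keeps the per-variant cost low; only the null fit would change in a mixed-model setting.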
Investigators often meta-analyze multiple genome-wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta-analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta-analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed-effect model for meta-analysis, "MetaCor," which accounts for correlation between stratum-specific effect estimates. Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a "pooled" GWAS, especially with different minor allele frequencies (MAFs) between strata. We illustrate the benefits of MetaCor on two GWASs in the HCHS/SOL. Analysis of dental caries (tooth decay) stratified by ancestry group detected a genome-wide significant SNP (rs7791001, P-value = 3.66×10⁻⁸, compared to 4.67×10⁻⁷ in the pooled analysis), with different MAFs between strata. Stratified analysis of body mass index (BMI) by ancestry group and sex reduced overall inflation from λGC=1.050 (pooled) to λGC=1.028 (MetaCor). Furthermore, even after removing close relatives to obtain nearly uncorrelated strata, a naïve stratified analysis resulted in λGC=1.058 compared to λGC=1.027 for MetaCor.
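The genomic inflation factor λGC reported above is conventionally computed as the median observed 1-df chi-square statistic divided by the median expected under the null. The sketch below follows that convention, assuming two-sided p-values as input. The function name `genomic_inflation` is hypothetical, and the abstract does not describe how the authors computed their values.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided GWAS p-values.

    Each p-value (assumed to lie in (0, 1]) is converted back to a
    1-df chi-square statistic, and the median is compared with the
    median of a chi-square(1) distribution (about 0.455).
    """
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to z = inv_cdf(p / 2), chi2 = z**2.
    chi_squares = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi_squares) / expected_median
```

Values near 1 indicate little overall inflation of test statistics; values above 1 indicate more inflation, as in the pooled versus MetaCor comparison above.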
We analyzed genome-wide association studies (GWASs), including data from 71,638 individuals from four ancestries, for estimated glomerular filtration rate (eGFR), a measure of kidney function used to define chronic kidney disease (CKD). We identified 20 loci attaining genome-wide-significant evidence of association (p < 5 × 10⁻⁸) with kidney function and highlighted that allelic effects on eGFR at lead SNPs are homogeneous across ancestries. We leveraged differences in the pattern of linkage disequilibrium between diverse populations to fine-map the 20 loci through construction of "credible sets" of variants driving eGFR association signals. Credible variants at the 20 eGFR loci were enriched for DNase I hypersensitivity sites (DHSs) in human kidney cells. DHS credible variants were expression quantitative trait loci for NFATC1 and RGS14 (at the SLC34A1 locus) in multiple tissues. Loss-of-function mutations in ancestral orthologs of both genes in Drosophila melanogaster were associated with altered sensitivity to salt stress. Renal mRNA expression of Nfatc1 and Rgs14 in a salt-sensitive mouse model was also reduced after exposure to a high-salt diet or induced CKD. Our study (1) demonstrates the utility of trans-ethnic fine mapping through integration of GWASs involving diverse populations with genomic annotation from relevant tissues to define molecular mechanisms by which association signals exert their effect and (2) suggests that salt sensitivity might be an important marker for biological processes that affect kidney function and CKD in humans.
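One common way to build a credible set is to convert per-variant Bayes factors into posterior probabilities and keep the top-ranked variants until a target coverage is reached. The sketch below follows that approach under stated assumptions: a single causal variant per locus, equal prior probability for every variant, and a 95% default coverage. The abstract specifies none of these, and the function name `credible_set` and the variant labels in the usage note are hypothetical.

```python
import math


def credible_set(log_bayes_factors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities reach `coverage`.

    log_bayes_factors -- dict mapping variant id to natural-log Bayes factor
    Assumes one causal variant per locus and a flat prior over variants.
    Returns (list of variant ids in the set, dict of posterior probabilities).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_bayes_factors.values())
    weights = {v: math.exp(lbf - max_log) for v, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    posteriors = {v: w / total for v, w in weights.items()}

    selected = []
    cumulative = 0.0
    # Add variants from most to least probable until coverage is met.
    for variant, prob in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected, posteriors
```

For instance, calling it with `{"rsA": math.log(90), "rsB": math.log(9), "rsC": math.log(1)}` gives posteriors of roughly 0.90, 0.09 and 0.01. Populations with different linkage disequilibrium patterns tend to concentrate the Bayes factors on fewer variants, which shrinks the resulting set.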
Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10⁻²⁸) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.