Publications by Year: 2013

2013

Jia X, Han B, Onengut-Gumuscu S, Chen WM, Concannon P, Rich S, Raychaudhuri S, Bakker P. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One. 2013;8(6):e64683.
DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize the performance of SNP2HLA, we constructed two European-ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than the SNP density of the genotyping platform, is critical to achieving high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.
Gusev A, Bhatia G, Zaitlen N, Vilhjálmsson B, Diogo D, Stahl E, Gregersen P, Worthington J, Klareskog L, Raychaudhuri S, Plenge R, Pasaniuc B, Price A. Quantifying missing heritability at known GWAS loci. PLoS Genet. 2013;9(12):e1003993.
Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed all SNPs at known GWAS loci to explain 1.29× more heritability than GWAS-associated SNPs on average (P = 3.3 × 10⁻⁵). For some diseases, this increase was individually significant: 2.07× for Multiple Sclerosis (MS) (P = 6.5 × 10⁻⁹) and 1.48× for Crohn's Disease (CD) (P = 1.3 × 10⁻³); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained 7.15× more MS heritability than known MS SNPs (P < 1.0 × 10⁻¹⁶) and 2.20× more CD heritability than known CD SNPs (P = 6.1 × 10⁻⁹), with an analogous increase for all autoimmune diseases analyzed. We also observed significant increases in an analysis of >20,000 Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with 2.37× more heritability from all SNPs at GWAS loci (P = 2.3 × 10⁻⁶) and 5.33× more heritability from all autoimmune disease loci (P < 1 × 10⁻¹⁶) compared to known RA SNPs (including those identified in this cohort). Our methods adjust for linkage disequilibrium (LD) between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture.
Diogo D, Kurreeman F, Stahl E, Liao K, Gupta N, Greenberg J, Rivas M, Hickey B, Flannick J, Thomson B, Guiducci C, Ripke S, Adzhubey I, Barton A, Kremer J, Alfredsson L, Consortium of Rheumatology Researchers of North America, Rheumatoid Arthritis Consortium International, Sunyaev S, Martin J, Zhernakova A, Bowes J, Eyre S, Siminovitch K, Gregersen P, Worthington J, Klareskog L, Padyukov L, Raychaudhuri S, Plenge R. Rare, low-frequency, and common variants in the protein-coding sequence of biological candidate genes from GWASs contribute to risk of rheumatoid arthritis. Am J Hum Genet. 2013;92(1):15–27.
The extent to which variants in the protein-coding sequence of genes contribute to risk of rheumatoid arthritis (RA) is unknown. In this study, we addressed this issue by deep exon sequencing and large-scale genotyping of 25 biological candidate genes located within RA risk loci discovered by genome-wide association studies (GWASs). First, we assessed the contribution of rare coding variants in the 25 genes to the risk of RA in a pooled sequencing study of 500 RA cases and 650 controls of European ancestry. We observed an accumulation of rare nonsynonymous variants exclusive to RA cases in IL2RA and IL2RB (burden test: p = 0.007 and p = 0.018, respectively). Next, we assessed the aggregate contribution of low-frequency and common coding variants to the risk of RA by dense genotyping of the 25 gene loci in 10,609 RA cases and 35,605 controls. We observed a strong enrichment of coding variants with a nominal signal of association with RA (p < 0.05) after adjusting for the best signal of association at the loci (enrichment p = 6.4 × 10⁻⁴). For one locus containing CD2, we found that a missense variant, rs699738 (c.798C>A [p.His266Gln]), and a noncoding variant, rs624988, reside on distinct haplotypes and independently contribute to the risk of RA (p = 4.6 × 10⁻⁶). Overall, our results indicate that variants (distributed across the allele-frequency spectrum) within the protein-coding portion of a subset of biological candidate genes identified by GWASs contribute to the risk of RA. Further, we have demonstrated that very large sample sizes will be required for comprehensively identifying the independent alleles contributing to the missing heritability of RA.
Cui J, Stahl E, Saevarsdottir S, Miceli C, Diogo D, Trynka G, Raj T, Mirkov MU, Canhão H, Ikari K, Terao C, Okada Y, Wedrén S, Askling J, Yamanaka H, Momohara S, Taniguchi A, Ohmura K, Matsuda F, Mimori T, Gupta N, Kuchroo M, Morgan A, Isaacs J, Wilson A, Hyrich K, Herenius M, Doorenspleet M, Tak PP, Crusius B, van der Horst-Bruinsma I, Wolbink GJ, van Riel P, van de Laar M, Guchelaar HJ, Shadick N, Allaart C, Huizinga T, Toes R, Kimberly R, Bridges L, Criswell L, Moreland L, Fonseca JE, de Vries N, Stranger B, De Jager P, Raychaudhuri S, Weinblatt M, Gregersen P, Mariette X, Barton A, Padyukov L, Coenen MJ, Karlson E, Plenge R. Genome-wide association study and gene expression analysis identifies CD84 as a predictor of response to etanercept therapy in rheumatoid arthritis. PLoS Genet. 2013;9(3):e1003394.
Anti-tumor necrosis factor alpha (anti-TNF) biologic therapy is a widely used treatment for rheumatoid arthritis (RA). It is unknown why some RA patients fail to respond adequately to anti-TNF therapy, which limits the development of clinical biomarkers to predict response or new drugs to target refractory cases. To understand the biological basis of response to anti-TNF therapy, we conducted a genome-wide association study (GWAS) meta-analysis of more than 2 million common variants in 2,706 RA patients from 13 different collections. Patients were treated with one of three anti-TNF medications: etanercept (n = 733), infliximab (n = 894), or adalimumab (n = 1,071). We identified a SNP (rs6427528) at the 1q23 locus that was associated with change in disease activity score (ΔDAS) in the etanercept subset of patients (P = 8 × 10⁻⁸), but not in the infliximab or adalimumab subsets (P > 0.05). The SNP is predicted to disrupt transcription factor binding site motifs in the 3' UTR of an immune-related gene, CD84, and the allele associated with better response to etanercept was associated with higher CD84 gene expression in peripheral blood mononuclear cells (P = 1 × 10⁻¹¹ in 228 non-RA patients and P = 0.004 in 132 RA patients). Consistent with the genetic findings, higher CD84 gene expression correlated with lower cross-sectional DAS (P = 0.02, n = 210) and showed a non-significant trend for better ΔDAS in a subset of RA patients with gene expression data (n = 31, etanercept-treated). A small, multi-ethnic replication showed a non-significant trend towards an association among etanercept-treated RA patients of Portuguese ancestry (n = 139, P = 0.4), but no association among patients of Japanese ancestry (n = 151, P = 0.8). Our study demonstrates that an allele associated with response to etanercept therapy is also associated with CD84 gene expression, and further that CD84 expression correlates with disease activity. These findings support a model in which CD84 genotypes and/or expression may serve as a useful biomarker for response to etanercept treatment in RA patients of European ancestry.
Raj T, Kuchroo M, Replogle J, Raychaudhuri S, Stranger B, De Jager P. Common risk alleles for inflammatory diseases are targets of recent positive selection. Am J Hum Genet. 2013;92(4):517–29.
Genome-wide association studies (GWASs) have identified hundreds of loci harboring genetic variation influencing inflammatory-disease susceptibility in humans. It has been hypothesized that present-day inflammatory diseases may have arisen, in part, due to pleiotropic effects of host resistance to pathogens over the course of human history, with significant selective pressures acting to increase host resistance to pathogens. The extent to which genetic factors underlying inflammatory-disease susceptibility have been influenced by selective processes can now be quantified more comprehensively than previously possible. To understand the evolutionary forces that have shaped inflammatory-disease susceptibility and to elucidate functional pathways affected by selection, we performed a systems-based analysis to integrate (1) published GWASs for inflammatory diseases, (2) a genome-wide scan for signatures of positive selection in a population of European ancestry, (3) functional genomics data comprising protein-protein interaction networks, and (4) a genome-wide expression quantitative trait locus (eQTL) mapping study in peripheral blood mononuclear cells (PBMCs). We demonstrate that loci for inflammatory-disease susceptibility are enriched for genomic signatures of recent positive natural selection, with selected loci forming a highly interconnected protein-protein interaction network. Further, we identify 21 loci for inflammatory-disease susceptibility that display signatures of recent positive selection, of which 13 also show evidence of cis-regulatory effects on genes within the associated locus. Thus, our integrated analyses highlight a set of susceptibility loci that might subserve a shared molecular function and have experienced selective pressure over the course of human history; today, these loci play a key role in influencing susceptibility to multiple different inflammatory diseases, in part through alterations of gene expression in immune cells.
Li G, Diogo D, Wu D, Spoonamore J, Dancik V, Franke L, Kurreeman F, Rossin E, Duclos G, Hartland C, Zhou X, Li K, Liu J, De Jager P, Siminovitch K, Zhernakova A, Raychaudhuri S, Bowes J, Eyre S, Padyukov L, Gregersen P, Worthington J, Rheumatoid Arthritis Consortium International (RACI), Gupta N, Clemons P, Stahl E, Tolliday N, Plenge R. Human genetics in rheumatoid arthritis guides a high-throughput drug screen of the CD40 signaling pathway. PLoS Genet. 2013;9(5):e1003487.
Although genetic and non-genetic studies in mouse and human implicate the CD40 pathway in rheumatoid arthritis (RA), there are no approved drugs that inhibit CD40 signaling for clinical care in RA or any other disease. Here, we sought to understand the biological consequences of a CD40 risk variant in RA discovered by a previous genome-wide association study (GWAS) and to perform a high-throughput drug screen for modulators of CD40 signaling based on human genetic findings. First, we fine-map the CD40 risk locus in 7,222 seropositive RA patients and 15,870 controls, together with deep sequencing of CD40 coding exons in 500 RA cases and 650 controls, to identify a single SNP that explains the entire signal of association (rs4810485, P = 1.4 × 10⁻⁹). Second, we demonstrate that subjects homozygous for the RA risk allele have ∼33% more CD40 on the surface of primary human CD19+ B lymphocytes than subjects homozygous for the non-risk allele (P = 10⁻⁹), a finding corroborated by expression quantitative trait loci (eQTL) analysis in peripheral blood mononuclear cells from 1,469 healthy control individuals. Third, we use retroviral shRNA infection to perturb the amount of CD40 on the surface of a human B lymphocyte cell line (BL2) and observe a direct correlation between the amount of CD40 protein and phosphorylation of RelA (p65), a subunit of the NF-κB transcription factor. Finally, we develop a high-throughput NF-κB luciferase reporter assay in BL2 cells activated with trimerized CD40 ligand (tCD40L) and conduct a high-throughput screen (HTS) of 1,982 chemical compounds and FDA-approved drugs. After a series of counter-screens and testing in primary human CD19+ B cells, we identify 2 novel chemical inhibitors not previously implicated in inflammation or CD40-mediated NF-κB signaling. Our study demonstrates proof-of-concept that human genetics can be used to guide the development of phenotype-based, high-throughput small-molecule screens to identify potential novel therapies in complex traits such as RA.
While studies associating genomic variants with complex traits have become increasingly productive, the molecular mechanisms that underlie these associations are rarely understood. Because only a small fraction of trait-associated variants can be linked to coding sequences, investigators have speculated that many of the underlying causal alleles influence non-coding gene regulatory sites. Recent studies have successfully identified mechanisms for non-coding alleles at individual loci. Now, genome-wide chromatin assays have produced maps of dozens of genomic annotations of the non-coding genome across multiple tissues, cell types and cell lines. This offers a tremendous opportunity to integrate these annotations with complex trait signals to interpret associated variants globally and to prioritize likely causal alleles. Here, we review examples of mechanisms by which common non-coding alleles result in phenotypes. We discuss efforts to integrate common trait-associated variants with genomic annotations. Finally, we highlight some caveats of these approaches and outline future directions for improvement.
Trynka G, Sandor C, Han B, Xu H, Stranger B, Liu S, Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet. 2013;45(2):124–30.
If trait-associated variants alter regulatory regions, then they should fall within chromatin marks in relevant cell types. However, it is unclear which of the many marks are most useful in defining cell types associated with disease and fine mapping variants. We hypothesized that informative marks are phenotypically cell type specific; that is, SNPs associated with the same trait likely overlap marks in the same cell type. We examined 15 chromatin marks and found that those highlighting active gene regulation were phenotypically cell type specific. Trimethylation of histone H3 at lysine 4 (H3K4me3) was the most phenotypically cell type specific (P < 1 × 10⁻⁶), driven by colocalization of variants and marks rather than gene proximity (P < 0.001). H3K4me3 peaks overlapped with 37 SNPs for plasma low-density lipoprotein concentration in the liver (P < 7 × 10⁻⁵), 31 SNPs for rheumatoid arthritis within CD4⁺ regulatory T cells (P = 1 × 10⁻⁴), 67 SNPs for type 2 diabetes in pancreatic islet cells (P = 0.003) and the liver (P = 0.003), and 14 SNPs for neuropsychiatric disease in neuronal tissues (P = 0.007). We show how cell type-specific H3K4me3 peaks can inform the fine mapping of associated SNPs to identify causal variation.
Hu X, Kim H, Brennan P, Han B, Baecher-Allan C, De Jager P, Brenner M, Raychaudhuri S. Application of user-guided automated cytometric data analysis to large-scale immunoprofiling of invariant natural killer T cells. Proc Natl Acad Sci U S A. 2013;110(47):19030–5.
Defining and characterizing pathologies of the immune system requires precise and accurate quantification of abundances and functions of cellular subsets via cytometric studies. At this time, data analysis relies on manual gating, which is a major source of variability in large-scale studies. We devised an automated, user-guided method, X-Cyt, which specializes in rapidly and robustly identifying targeted populations of interest in large data sets. We first applied X-Cyt to quantify CD4⁺ effector and central memory T cells in 236 samples, demonstrating high concordance with manual analysis (r = 0.91 and 0.95, respectively) and outperforming other available methods. We then quantified the rare mucosal-associated invariant T (MAIT) cell population in 35 samples, achieving a concordance of 0.98 with manual analysis. Finally, we characterized the population dynamics of invariant natural killer T (iNKT) cells, a particularly rare peripheral lymphocyte population, in 110 individuals by assaying 19 markers. We demonstrated that although iNKT cell numbers and marker expression are highly variable in the population, iNKT abundance correlates with sex and age, and the expression of phenotypic and functional markers correlates closely with CD4 expression.