Publications

2014

Okada Y, Han B, Tsoi L, Stuart P, Ellinghaus E, Tejasvi T, Chandran V, Pellett F, Pollock R, Bowcock A, Krueger G, Weichenthal M, Voorhees J, Rahman P, Gregersen P, Franke A, Nair R, Abecasis G, Gladman D, Elder J, Bakker P, Raychaudhuri S. Fine mapping major histocompatibility complex associations in psoriasis and its clinical subtypes. Am J Hum Genet. 2014;95(2):162–72.
Psoriasis vulgaris (PsV) risk is strongly associated with variation within the major histocompatibility complex (MHC) region, but its genetic architecture has yet to be fully elucidated. Here, we conducted a large-scale fine-mapping study of PsV risk in the MHC region in 9,247 PsV-affected individuals and 13,589 controls of European descent by imputing class I and II human leukocyte antigen (HLA) genes from SNP genotype data. In addition, we imputed sequence variants for MICA, an MHC HLA-like gene that has been associated with PsV, to evaluate association at that locus as well. We observed that HLA-C*06:02 demonstrated the lowest p value for overall PsV risk (p = 1.7 × 10⁻³⁶⁴). Stepwise analysis revealed multiple HLA-C*06:02-independent risk variants in both class I and class II HLA genes for PsV susceptibility (HLA-C*12:03, HLA-B amino acid positions 67 and 9, HLA-A amino acid position 95, and HLA-DQα1 amino acid position 53; p < 5.0 × 10⁻⁸), but no apparent risk conferred by MICA. We further evaluated risk of two major clinical subtypes of PsV, psoriatic arthritis (PsA; n = 3,038) and cutaneous psoriasis (PsC; n = 3,098). We found that risk heterogeneity between PsA and PsC might be driven by HLA-B amino acid position 45 (P_omnibus = 2.2 × 10⁻¹¹), indicating that different genetic factors underlie the overall risk of PsV and the risk of specific PsV subphenotypes. Our study illustrates the value of high-resolution HLA and MICA imputation for fine mapping causal variants in the MHC.
Hu X, Kim H, Raj T, Brennan P, Trynka G, Teslovich N, Slowikowski K, Chen WM, Onengut S, Baecher-Allan C, De Jager P, Rich S, Stranger B, Brenner M, Raychaudhuri S. Regulation of gene expression in autoimmune disease loci and the genetic basis of proliferation in CD4+ effector memory T cells. PLoS Genet. 2014;10(6):e1004404.
Genome-wide association studies (GWAS) and subsequent dense-genotyping of associated loci identified over a hundred single-nucleotide polymorphism (SNP) variants associated with the risk of rheumatoid arthritis (RA), type 1 diabetes (T1D), and celiac disease (CeD). Immunological and genetic studies suggest a role for CD4-positive effector memory T (CD4+ TEM) cells in the pathogenesis of these diseases. To elucidate mechanisms of autoimmune disease alleles, we investigated molecular phenotypes in CD4+ effector memory T cells potentially affected by these variants. In a cohort of genotyped healthy individuals, we isolated high purity CD4+ TEM cells from peripheral blood, then assayed relative abundance, proliferation upon T cell receptor (TCR) stimulation, and the transcription of 215 genes within disease loci before and after stimulation. We identified 46 genes regulated by cis-acting expression quantitative trait loci (eQTL), the majority of which we detected in stimulated cells. Eleven of the 46 genes with eQTLs were previously undetected in peripheral blood mononuclear cells. Of 96 risk alleles of RA, T1D, and/or CeD in densely genotyped loci, eleven overlapped cis-eQTLs, of which five alleles completely explained the respective signals. A non-coding variant, rs389862A, increased proliferative response (p = 4.75 × 10⁻⁸). In addition, baseline expression of seventeen genes in resting cells reliably predicted proliferative response after TCR stimulation. Strikingly, however, there was no evidence that risk alleles modulated CD4+ TEM abundance or proliferation. Our study underscores the power of examining molecular phenotypes in relevant cells and conditions for understanding pathogenic mechanisms of disease variants.
Han B, Diogo D, Eyre S, Kallberg H, Zhernakova A, Bowes J, Padyukov L, Okada Y, González-Gay M, Rantapää-Dahlqvist S, Martin J, Huizinga T, Plenge R, Worthington J, Gregersen P, Klareskog L, Bakker P, Raychaudhuri S. Fine mapping seronegative and seropositive rheumatoid arthritis to shared and distinct HLA alleles by adjusting for the effects of heterogeneity. Am J Hum Genet. 2014;94(4):522–32.
Despite progress in defining human leukocyte antigen (HLA) alleles for anti-citrullinated-protein-autoantibody-positive (ACPA+) rheumatoid arthritis (RA), identifying HLA alleles for ACPA-negative (ACPA−) RA has been challenging because of clinical heterogeneity within clinical cohorts. We imputed 8,961 classical HLA alleles, amino acids, and SNPs from Immunochip data in a discovery set of 2,406 ACPA− RA case and 13,930 control individuals. We developed a statistical approach to identify and adjust for clinical heterogeneity within ACPA− RA and observed independent associations for serine and leucine at position 11 in HLA-DRβ1 (p = 1.4 × 10⁻¹³, odds ratio [OR] = 1.30) and for aspartate at position 9 in HLA-B (p = 2.7 × 10⁻¹², OR = 1.39) within the peptide binding grooves. These amino acid positions induced associations at HLA-DRB1*03 (encoding serine at 11) and HLA-B*08 (encoding aspartate at 9). We validated these findings in an independent set of 427 ACPA− case subjects, carefully phenotyped with a highly sensitive ACPA assay, and 1,691 control subjects (HLA-DRβ1 Ser11+Leu11: p = 5.8 × 10⁻⁴, OR = 1.28; HLA-B Asp9: p = 2.6 × 10⁻³, OR = 1.34). Although both amino acid sites drove risk of ACPA+ and ACPA− disease, the effects of individual residues at HLA-DRβ1 position 11 were distinct (p < 2.9 × 10⁻¹⁰⁷). We also identified an association with ACPA+ RA at HLA-A position 77 (p = 2.7 × 10⁻⁸, OR = 0.85) in 7,279 ACPA+ RA case and 15,870 control subjects. These results contribute to mounting evidence that ACPA+ and ACPA− RA are genetically distinct and potentially have separate autoantigens contributing to pathogenesis. We expect that our approach might have broad applications in analyzing clinical conditions with heterogeneity at both major histocompatibility complex (MHC) and non-MHC regions.
Okada Y, Kim K, Han B, Pillai N, Ong R, Saw WY, Luo M, Jiang L, Yin J, Bang SY, Lee HS, Brown M, Bae SC, Xu H, Teo YY, Bakker P, Raychaudhuri S. Risk for ACPA-positive rheumatoid arthritis is driven by shared HLA amino acid polymorphisms in Asian and European populations. Hum Mol Genet. 2014;23(25):6916–26.
Previous studies have emphasized ethnically heterogeneous human leukocyte antigen (HLA) classical allele associations to rheumatoid arthritis (RA) risk. We fine-mapped RA risk alleles within the major histocompatibility complex (MHC) in 2782 seropositive RA cases and 4315 controls of Asian descent. We applied imputation to determine genotypes for eight class I and II HLA genes to Asian populations for the first time using a newly constructed pan-Asian reference panel. First, we empirically measured high imputation accuracy in Asian samples. Then we observed the most significant association in HLA-DRβ1 at amino acid position 13, located outside the classical shared epitope (P_omnibus = 6.9 × 10⁻¹³⁵). The individual residues at position 13 have relative effects that are consistent with published effects in European populations (His > Phe > Arg > Tyr ≅ Gly > Ser), but the observed effects in Asians are generally smaller. Applying stepwise conditional analysis, we identified additional independent associations at positions 57 (conditional P_omnibus = 2.2 × 10⁻³³) and 74 (conditional P_omnibus = 1.1 × 10⁻⁸). Outside of HLA-DRβ1, we observed independent effects for amino acid polymorphisms within HLA-B (Asp9, conditional P = 3.8 × 10⁻⁶) and HLA-DPβ1 (Phe9, conditional P = 3.0 × 10⁻⁵) concordant with European populations. Our trans-ethnic HLA fine-mapping study reveals that (i) a common set of amino acid residues confer shared effects in European and Asian populations and (ii) these same effects can explain ethnically heterogeneous classical allelic associations (e.g. HLA-DRB1*09:01) due to allele frequency differences between populations. Our study illustrates the value of high-resolution imputation for fine-mapping causal variants in the MHC.
Slowikowski K, Hu X, Raychaudhuri S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics. 2014;30(17):2496–7.
We created a fast, robust and general C++ implementation of a single-nucleotide polymorphism (SNP) set enrichment algorithm to identify cell types, tissues and pathways affected by risk loci. It tests trait-associated genomic loci for enrichment of specificity to conditions (cell types, tissues and pathways). We use a non-parametric statistical approach to compute empirical P-values by comparison with null SNP sets. As a proof of concept, we present novel applications of our method to four sets of genome-wide significant SNPs associated with red blood cell count, multiple sclerosis, celiac disease and HDL cholesterol. AVAILABILITY AND IMPLEMENTATION: http://broadinstitute.org/mpg/snpsea. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
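The empirical P-value idea mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the SNPsea implementation (which is in C++): the function names, the mean-specificity score, the uniform random sampling of null SNP sets, and the +1 correction are all assumptions made for illustration only.

```python
import random


def empirical_p_value(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 in numerator and denominator is an assumed convention that
    keeps the estimate above zero for a finite number of null sets.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def specificity_score(snp_set, specificity):
    """Hypothetical score: mean specificity value over the SNPs in the set."""
    return sum(specificity[s] for s in snp_set) / len(snp_set)


def enrichment_p(trait_snps, all_snps, specificity, n_null=1000, seed=0):
    """Compare the trait SNP set with random SNP sets of the same size.

    Assumption: null sets are drawn uniformly at random from all_snps;
    a real analysis would likely match null sets on locus properties.
    """
    rng = random.Random(seed)
    observed = specificity_score(trait_snps, specificity)
    null_scores = [
        specificity_score(rng.sample(all_snps, len(trait_snps)), specificity)
        for _ in range(n_null)
    ]
    return empirical_p_value(observed, null_scores)
```

Here `all_snps` is a list of SNP identifiers and `specificity` maps each identifier to a numeric specificity value for one condition; both are hypothetical inputs.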

2013

Jia X, Han B, Onengut-Gumuscu S, Chen WM, Concannon P, Rich S, Raychaudhuri S, Bakker P. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One. 2013;8(6):e64683.
DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.
Lin C, Karlson E, Canhao H, Miller T, Dligach D, Chen PJ, Perez RNG, Shen Y, Weinblatt M, Shadick N, Plenge R, Savova G. Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. PLoS One. 2013;8(8):e69932.
OBJECTIVE: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. MATERIALS AND METHODS: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. RESULTS: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. CONCLUSION: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
Lim E, Raychaudhuri S, Sanders S, Stevens C, Sabo A, MacArthur D, Neale B, Kirby A, Ruderfer D, Fromer M, Lek M, Liu L, Flannick J, Ripke S, Nagaswamy U, Muzny D, Reid J, Hawes A, Newsham I, Wu Y, Lewis L, Dinh H, Gross S, Wang LS, Lin CF, Valladares O, Gabriel S, DePristo M, Altshuler D, Purcell S, NHLBI Exome Sequencing Project, State M, Boerwinkle E, Buxbaum J, Cook E, Gibbs R, Schellenberg G, Sutcliffe J, Devlin B, Roeder K, Daly M. Rare complete knockouts in humans: population distribution and significant role in autism spectrum disorders. Neuron. 2013;77(2):235–42.
To characterize the role of rare complete human knockouts in autism spectrum disorders (ASDs), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a 2-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤ 5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudoautosomal regions on the X chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together, these results provide compelling evidence that rare autosomal and X chromosome complete gene knockouts are important inherited risk factors for ASD.
Liao K, Kurreeman F, Li G, Duclos G, Murphy S, Guzman R, Cai T, Gupta N, Gainer V, Schur P, Cui J, Denny J, Szolovits P, Churchill S, Kohane I, Karlson E, Plenge R. Associations of autoantibodies, autoimmune risk alleles, and clinical diagnoses from the electronic medical records in rheumatoid arthritis cases and non-rheumatoid arthritis controls. Arthritis Rheum. 2013;65(3):571–81.
OBJECTIVE: The significance of non-rheumatoid arthritis (RA) autoantibodies in patients with RA is unclear. The aim of this study was to assess associations of autoantibodies with autoimmune risk alleles and with clinical diagnoses from the electronic medical records (EMRs) among RA cases and non-RA controls. METHODS: Data on 1,290 RA cases and 1,236 non-RA controls of European genetic ancestry were obtained from the EMRs of 2 large academic centers. The levels of anti-citrullinated protein antibodies (ACPAs), antinuclear antibodies (ANAs), anti-tissue transglutaminase antibodies (AGTAs), and anti-thyroid peroxidase (anti-TPO) antibodies were measured. All subjects were genotyped for autoimmune risk alleles, and the association between number of autoimmune risk alleles present and number of types of autoantibodies present was studied. A phenome-wide association study (PheWAS) was conducted to study potential associations between autoantibodies and clinical diagnoses among RA cases and non-RA controls. RESULTS: The mean ages were 60.7 years in RA cases and 64.6 years in non-RA controls. The proportion of female subjects was 79% in each group. The prevalence of ACPAs and ANAs was higher in RA cases compared to controls (each P < 0.0001); there were no differences in the prevalence of anti-TPO antibodies and AGTAs. Carriage of higher numbers of autoimmune risk alleles was associated with increasing numbers of autoantibody types in RA cases (P = 2.1 × 10⁻⁵) and non-RA controls (P = 5.0 × 10⁻³). From the PheWAS, the presence of ANAs was significantly associated with a diagnosis of Sjögren's/sicca syndrome in RA cases. CONCLUSION: The increased frequency of autoantibodies in RA cases and non-RA controls was associated with the number of autoimmune risk alleles carried by an individual. PheWAS of EMR data, with linkage to laboratory data obtained from blood samples, provide a novel method to test for the clinical significance of biomarkers in disease.
Viatte S, Plant D, Raychaudhuri S. Genetics and epigenetics of rheumatoid arthritis. Nat Rev Rheumatol. 2013;9(3):141–53.
Investigators have made key advances in rheumatoid arthritis (RA) genetics in the past 10 years. Although genetic studies have had limited influence on clinical practice and drug discovery, they are currently generating testable hypotheses to explain disease pathogenesis. Firstly, we review here the major advances in identifying RA genetic susceptibility markers both within and outside of the MHC. Understanding how genetic variants translate into pathogenic mechanisms and ultimately into phenotypes remains a mystery for most of the polymorphisms that confer susceptibility to RA, but functional data are emerging. Interplay between environmental and genetic factors is poorly understood and in need of further investigation. Secondly, we review current knowledge of the role of epigenetics in RA susceptibility. Differences in the epigenome could represent one of the ways in which environmental exposures translate into phenotypic outcomes. The best understood epigenetic phenomena include post-translational histone modifications and DNA methylation events, both of which have critical roles in gene regulation. Epigenetic studies in RA represent a new area of research with the potential to answer unsolved questions.