Publications

2015

Trynka G, Westra HJ, Slowikowski K, Hu X, Xu H, Stranger B, Klein R, Han B, Raychaudhuri S. Disentangling the Effects of Colocalizing Genomic Annotations to Functionally Prioritize Non-coding Variants within Complex-Trait Loci. Am J Hum Genet. 2015;97(1):139–52.
Identifying genomic annotations that differentiate causal from trait-associated variants is essential to fine mapping disease loci. Although many studies have identified non-coding functional annotations that overlap disease-associated variants, these annotations often colocalize, complicating the ability to use these annotations for fine mapping causal variation. We developed a statistical approach (Genomic Annotation Shifter [GoShifter]) to assess whether enriched annotations are able to prioritize causal variation. GoShifter defines the null distribution of an annotation overlapping an allele by locally shifting annotations; this approach is less sensitive to biases arising from local genomic structure than commonly used enrichment methods that depend on SNP matching. Local shifting also allows GoShifter to identify independent causal effects from colocalizing annotations. Using GoShifter, we confirmed that variants in expression quantitative trait loci drive gene-expression changes through DNase-I hypersensitive sites (DHSs) near transcription start sites and independently through 3' UTR regulation. We also showed that (1) 15%-36% of trait-associated loci map to DHSs independently of other annotations; (2) loci associated with breast cancer and rheumatoid arthritis harbor potentially causal variants near the summits of histone marks rather than full peak bodies; (3) variants associated with height are highly enriched in embryonic stem cell DHSs; and (4) we can effectively prioritize causal variation at specific loci.
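The local-shifting idea in this abstract can be illustrated with a minimal sketch. This is not the GoShifter implementation: the data layout (each locus as a window of a given length with SNP positions and annotation intervals in window coordinates), the function names, and the add-one permutation p-value are all assumptions made for illustration.

```python
import random

def shift_intervals(intervals, offset, length):
    """Circularly shift half-open intervals inside the window [0, length)."""
    shifted = []
    for start, end in intervals:
        s = (start + offset) % length
        e = s + (end - start)
        if e <= length:
            shifted.append((s, e))
        else:
            # The interval runs past the window edge, so wrap it around.
            shifted.append((s, length))
            shifted.append((0, e - length))
    return shifted

def overlaps(snps, intervals):
    """True if any SNP position falls inside any annotation interval."""
    return any(s <= pos < e for pos in snps for s, e in intervals)

def local_shift_pvalue(loci, n_perm=1000, seed=0):
    """Compare the observed number of overlapping loci with a null built
    by shifting annotations within each locus (hypothetical layout)."""
    rng = random.Random(seed)
    observed = sum(overlaps(l["snps"], l["annotations"]) for l in loci)
    hits = 0
    for _ in range(n_perm):
        count = 0
        for l in loci:
            offset = rng.randrange(l["length"])
            moved = shift_intervals(l["annotations"], offset, l["length"])
            count += overlaps(l["snps"], moved)
        if count >= observed:
            hits += 1
    # Add-one estimate keeps the p-value strictly positive (an assumption).
    return observed, (hits + 1) / (n_perm + 1)

# Toy loci with invented coordinates, for illustration only.
toy_loci = [
    {"length": 100, "snps": [10, 42], "annotations": [(40, 45)]},
    {"length": 200, "snps": [150], "annotations": [(10, 20), (148, 155)]},
]
observed_count, p_value = local_shift_pvalue(toy_loci, n_perm=200)
```

Because the annotations are shifted only within each locus, the null preserves the local density and size of the annotations, which is the property the abstract contrasts with SNP-matching approaches.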
Lee H, Byrne E, Hultman C, Kähler A, Vinkhuyzen A, Ripke S, Andreassen O, Frisell T, Gusev A, Hu X, Karlsson R, Mantzioris V, McGrath J, Mehta D, Stahl E, Zhao Q, Kendler K, Sullivan P, Price A, O’Donovan M, Okada Y, Mowry B, Raychaudhuri S, Wray N, International SWGPGCRAC, Authors SWGPGC, Byerley W, Cahn W, Cantor R, Cichon S, Cormican P, Curtis D, Djurovic S, Escott-Price V, Gejman P, Georgieva L, Giegling I, Hansen T, Ingason A, Kim Y, Konte B, Lee P, McIntosh A, McQuillin A, Morris D, Nöthen M, O’Dushlaine C, Olincy A, Olsen L, Pato C, Pato M, Pickard B, Posthuma D, Rasmussen H, Rietschel M, Rujescu D, Schulze T, Silverman J, Thirumalai S, Werge T, Collaborators SWGPGC, Agartz I, Amin F, Azevedo M, Bass N, Black D, Blackwood D, Bruggeman R, Buccola N, Choudhury K, Cloninger R, Corvin A, Craddock N, Daly M, Datta S, Donohoe G, Duan J, Dudbridge F, Fanous A, Freedman R, Freimer N, Friedl M, Gill M, Gurling H, De Haan L, Hamshere M, Hartmann A, Holmans P, Kahn R, Keller M, Kenny E, Kirov G, Krabbendam L, Krasucki R, Lawrence J, Lencz T, Levinson D, Lieberman J, Lin DY, Linszen D, Magnusson P, Maier W, Malhotra A, Mattheisen M, Mattingsdal M, McCarroll S, Medeiros H, Melle I, Milanova V, Myin-Germeys I, Neale B, Ophoff R, Owen M, Pimm J, Purcell S, Puri V, Quested D, Rossin L, Ruderfer D, Sanders A, Shi J, Sklar P, St Clair D, Stroup S, Os J, Visscher P, Wiersma D, Zammit S, Rheumatoid Arthritis Consortium International Authors, Bridges L, Choi H, Coenen MJ, Vries N, Dieud P, Greenberg J, Huizinga T, Padyukov L, Siminovitch K, Tak P, Worthington J, Rheumatoid Arthritis Consortium International Collaborators, De Jager P, Denny J, Gregersen P, Klareskog L, Mariette X, Plenge R, Laar M, Riel P. New data and an old puzzle: the negative association between schizophrenia and rheumatoid arthritis. Int J Epidemiol. 2015;44(5):1706–21.
BACKGROUND: A long-standing epidemiological puzzle is the reduced rate of rheumatoid arthritis (RA) in those with schizophrenia (SZ) and vice versa. Traditional epidemiological approaches to determine if this negative association is underpinned by genetic factors would test for reduced rates of one disorder in relatives of the other, but sufficiently powered data sets are difficult to achieve. The genomics era presents an alternative paradigm for investigating the genetic relationship between two uncommon disorders. METHODS: We use genome-wide common single nucleotide polymorphism (SNP) data from independently collected SZ and RA case-control cohorts to estimate the SNP correlation between the disorders. We test a genotype × environment (GxE) hypothesis for SZ with environment defined as winter- vs summer-born. RESULTS: We estimate a small but significant negative SNP-genetic correlation between SZ and RA (-0.046, s.e. 0.026, P = 0.036). The negative correlation was stronger for the SNP set attributed to coding or regulatory regions (-0.174, s.e. 0.071, P = 0.0075). Our analyses led us to hypothesize a gene-environment interaction for SZ in the form of immune challenge. We used month of birth as a proxy for environmental immune challenge and estimated the genetic correlation between winter-born and non-winter-born SZ to be significantly less than 1 for coding/regulatory region SNPs (0.56, s.e. 0.14, P = 0.00090). CONCLUSIONS: Our results are consistent with epidemiological observations of a negative relationship between SZ and RA reflecting, at least in part, genetic factors. Results of the month of birth analysis are consistent with pleiotropic effects of genetic variants dependent on environmental context.
Triebwasser M, Roberson E, Yu Y, Schramm E, Wagner E, Raychaudhuri S, Seddon J, Atkinson J. Rare Variants in the Functional Domains of Complement Factor H Are Associated With Age-Related Macular Degeneration. Invest Ophthalmol Vis Sci. 2015;56(11):6873–8.
PURPOSE: Age-related macular degeneration (AMD) has a substantial genetic risk component, as evidenced by the risk from common genetic variants uncovered in the first genome-wide association studies. More recently, it has become apparent that rare genetic variants also play an independent role in AMD risk. We sought to determine if rare variants in complement factor H (CFH) played a role in AMD risk. METHODS: We had previously collected DNA from a large population of patients with advanced age-related macular degeneration (A-AMD) and controls for targeted deep sequencing of candidate AMD risk genes. In this analysis, we tested for an increased burden of rare variants in CFH in 1665 cases and 752 controls from this cohort. RESULTS: We identified 65 missense, nonsense, or splice-site mutations with a minor allele frequency ≤ 1%. Rare variants with minor allele frequency ≤ 1% (odds ratio [OR] = 1.5, P = 4.4 × 10⁻²), 0.5% (OR = 1.6, P = 2.6 × 10⁻²), and all singletons (OR = 2.3, P = 3.3 × 10⁻²) were enriched in A-AMD cases. Moreover, we observed loss-of-function rare variants (nonsense, splice-site, and loss of a conserved cysteine) in 10 cases and serum levels of FH were decreased in all 5 with an available sample (haploinsufficiency). Further, rare variants in the major functional domains of CFH were increased in cases (OR = 3.2; P = 1.4 × 10⁻³) and the magnitude of the effect correlated with the disruptive nature of the variant, location in an active site, and inversely with minor allele frequency. CONCLUSIONS: In this large A-AMD cohort, rare variants in the CFH gene were enriched and tended to be located in functional sites or led to low serum levels. These data, combined with those indicating a similar, but even more striking, increase in rare variants found in CFI, strongly implicate complement activation in A-AMD etiopathogenesis as CFH and CFI interact to inhibit the alternative pathway.
Steenbergen, Raychaudhuri, Rodríguez-Rodríguez, Rantapää-Dahlqvist, Berglin, Toes, Huizinga, Fernández-Gutiérrez, Gregersen, Helm-van Mil. Association of valine and leucine at HLA-DRB1 position 11 with radiographic progression in rheumatoid arthritis, independent of the shared epitope alleles but not independent of anti-citrullinated protein antibodies. Arthritis Rheumatol. 2015;67(4):877–86.
OBJECTIVE: For decades it has been known that the HLA-DRB1 shared epitope (SE) alleles are associated with an increased risk of development and progression of rheumatoid arthritis (RA). Recently, the following variations in the peptide-binding grooves of HLA molecules that predispose to RA development have been identified: Val and Leu at HLA-DRB1 position 11, Asp at HLA-B position 9, and Phe at HLA-DPB1 position 9. This study was undertaken to investigate whether these variants are also associated with radiographic progression in RA, independent of SE and anti-citrullinated protein antibody (ACPA) status. METHODS: A total of 4,911 radiograph sets from 1,878 RA patients included in the Leiden Early Arthritis Clinic (The Netherlands), Umeå (Sweden), Hospital Clinico San Carlos-Rheumatoid Arthritis (Spain), and National Data Bank for Rheumatic Diseases (US) cohorts were studied. HLA was imputed using single-nucleotide polymorphism data from an Immunochip, and the amino acids listed above were tested in relation to radiographic progression per cohort using an additive model. Results from the 4 cohorts were combined in inverse-variance weighted meta-analyses using a fixed-effects model. Analyses were conditioned on SE and ACPA status. RESULTS: Val and Leu at HLA-DRB1 position 11 were associated with more radiographic progression (meta-analysis P = 5.11 × 10⁻⁷); this effect was independent of SE status (meta-analysis P = 0.022) but not independent of ACPA status. Phe at HLA-DPB1 position 9 was associated with more severe radiographic progression (meta-analysis P = 0.024), though not independent of SE status. Asp at HLA-B position 9 was not associated with radiographic progression. CONCLUSION: Val and Leu at HLA-DRB1 position 11 conferred a risk of a higher rate of radiographic progression independent of SE status but not independent of ACPA status. These findings support the relevance of these amino acids at position 11.
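The inverse-variance weighted fixed-effects meta-analysis mentioned in this abstract follows a standard formula, sketched below. The per-cohort effect estimates and standard errors are invented placeholders, not values from the paper, and the function name is hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-cohort effect estimates with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p

# Hypothetical estimates for four cohorts (illustration only).
cohort_betas = [0.04, 0.06, 0.02, 0.05]
cohort_ses = [0.02, 0.03, 0.04, 0.025]
pooled_beta, pooled_se, p_value = fixed_effect_meta(cohort_betas, cohort_ses)
print(f"beta={pooled_beta:.4f} se={pooled_se:.4f} p={p_value:.3g}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precisely measured cohorts.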
Yu S, Liao K, Shaw S, Gainer V, Churchill S, Szolovits P, Murphy S, Kohane I, Cai T. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. J Am Med Inform Assoc. 2015;22(5):993–1000.
OBJECTIVE: Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. MATERIALS AND METHODS: Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. RESULTS: The authors applied the method to develop algorithms to identify patients with rheumatoid arthritis (RA) and coronary artery disease (CAD) cases among those with rheumatoid arthritis from a large multi-institutional EHR. The areas under the receiver operating characteristic curve (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to the AUCs of 0.938 and 0.929 by models trained with expert-curated features. DISCUSSION: Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. CONCLUSION: The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping.
Yarwood A, Han B, Raychaudhuri S, Bowes J, Lunt M, Pappas D, Kremer J, Greenberg J, Plenge R, Rheumatoid Arthritis Consortium International (RACI), Worthington J, Barton A, Eyre S. A weighted genetic risk score using all known susceptibility variants to estimate rheumatoid arthritis risk. Ann Rheum Dis. 2015;74(1):170–6.
BACKGROUND: There is currently great interest in the incorporation of genetic susceptibility loci into screening models to identify individuals at high risk of disease. Here, we present the first risk prediction model including all 46 known genetic loci associated with rheumatoid arthritis (RA). METHODS: A weighted genetic risk score (wGRS) was created using 45 RA non-human leucocyte antigen (HLA) susceptibility loci, imputed amino acids at HLA-DRB1 (11, 71 and 74), HLA-DPB1 (position 9), HLA-B (position 9) and gender. The wGRS was tested in 11 366 RA cases and 15 489 healthy controls. The risk of developing RA was estimated using logistic regression by dividing the wGRS into quintiles. The ability of the wGRS to discriminate between cases and controls was assessed by receiver operator characteristic analysis and discrimination improvement tests. RESULTS: Individuals in the highest risk group showed significantly increased odds of developing anti-cyclic citrullinated peptide-positive RA compared to the lowest risk group (OR 27.13, 95% CI 23.70 to 31.05). The wGRS was validated in an independent cohort that showed similar results (area under the curve 0.78, OR 18.00, 95% CI 13.67 to 23.71). Comparison of the full wGRS with a wGRS in which HLA amino acids were replaced by an HLA tag single-nucleotide polymorphism showed a significant loss of sensitivity and specificity. CONCLUSIONS: Our study suggests that in RA, even when using all known genetic susceptibility variants, prediction performance remains modest; while this is insufficiently accurate for general population screening, it may prove of more use in targeted studies. Our study has also highlighted the importance of including HLA variation in risk prediction models.
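A weighted genetic risk score of the kind described in this abstract is commonly computed as a sum of risk-allele counts weighted by log odds ratios. The sketch below assumes that common additive form, not the paper's exact model (which also included HLA amino acids and gender); the genotypes, odds ratios, and function names are hypothetical.

```python
import math

def weighted_grs(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2), each weighted by ln(OR)."""
    return sum(c * math.log(o) for c, o in zip(risk_allele_counts, odds_ratios))

def quintile_labels(scores):
    """Assign each score a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores) + 1
    return labels

# Invented odds ratios for three loci and genotypes for five people.
example_ors = [1.5, 2.0, 1.2]
example_genotypes = [
    [2, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
    [0, 1, 0],
]
example_scores = [weighted_grs(g, example_ors) for g in example_genotypes]
example_quintiles = quintile_labels(example_scores)
```

Grouping individuals by quintile, as in the abstract, lets the odds of disease in the highest-scoring group be compared against the lowest-scoring group in a logistic regression.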
Kim K, Jiang X, Cui J, Lu B, Costenbader K, Sparks J, Bang SY, Lee HS, Okada Y, Raychaudhuri S, Alfredsson L, Bae SC, Klareskog L, Karlson E. Interactions between amino acid-defined major histocompatibility complex class II variants and smoking in seropositive rheumatoid arthritis. Arthritis Rheumatol. 2015;67(10):2611–23.
OBJECTIVE: To define the interaction between cigarette smoking and HLA polymorphisms in seropositive rheumatoid arthritis (RA), in the context of a recently identified amino acid-based HLA model for RA susceptibility. METHODS: We imputed Immunochip data on HLA amino acids and classical alleles from 3 case-control studies (the Swedish Epidemiological Investigation of Rheumatoid Arthritis [EIRA] study [1,654 cases and 1,934 controls], the Nurses' Health Study [NHS] [229 cases and 360 controls], and the Korean RA Cohort Study [1,390 cases and 735 controls]). We examined the interaction effects of heavy smoking (>10 pack-years) and the genetic risk score (GRS) of multiple RA-associated amino acid positions (positions 11, 13, 71, and 74 in HLA-DRβ1, position 9 in HLA-B, and position 9 in HLA-DPβ1), as well as the interaction effects of heavy smoking and the GRS of HLA-DRβ1 4-amino acid haplotypes (assessed via attributable proportion due to interaction [AP] using the additive interaction model). RESULTS: Heavy smoking and all investigated HLA amino acid positions and haplotypes were associated with RA susceptibility in the 3 populations. In the interaction analysis, we found a significant deviation from the expected additive joint effect between heavy smoking and the HLA-DRβ1 4-amino acid haplotype (AP 0.416, 0.467, and 0.796, in the EIRA, NHS, and Korean studies, respectively). We further identified the key interacting variants as being located at HLA-DRβ1 amino acid positions 11 and 13 but not at any of the other RA risk-associated amino acid positions. For residues in positions 11 and 13, there were similar patterns between RA risk effects and interaction effects. CONCLUSION: Our findings of significant gene-environment interaction effects indicate that a physical interaction between citrullinated autoantigens produced by smoking and HLA-DR molecules is characterized by the HLA-DRβ1 4-amino acid haplotype, primarily by positions 11 and 13.
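The attributable proportion due to interaction (AP) used in this abstract has a standard definition under the additive interaction model: the relative excess risk due to interaction divided by the risk in the doubly exposed group. The sketch below uses that textbook formula with odds ratios standing in for relative risks; the input values and function name are invented for illustration and are not taken from the paper.

```python
def attributable_proportion(or_both, or_gene_only, or_smoking_only):
    """AP = RERI / OR_both, where RERI = OR_both - OR_gene - OR_smoking + 1.

    All odds ratios are relative to the group with neither exposure.
    """
    reri = or_both - or_gene_only - or_smoking_only + 1.0
    return reri / or_both

# Hypothetical odds ratios (illustration only).
ap_example = attributable_proportion(or_both=10.0, or_gene_only=3.0, or_smoking_only=2.0)
print(f"AP = {ap_example:.2f}")

# With purely additive joint effects (OR_both = 3 + 2 - 1), AP is zero.
ap_additive = attributable_proportion(or_both=4.0, or_gene_only=3.0, or_smoking_only=2.0)
```

An AP above zero means the joint effect of the two exposures exceeds the sum of their separate effects, which is the deviation from additivity the abstract reports.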
Diogo D, Bastarache L, Liao K, Graham R, Fulton R, Greenberg J, Eyre S, Bowes J, Cui J, Lee A, Pappas D, Kremer J, Barton A, Coenen MJ, Franke B, Kiemeney L, Mariette X, Richard-Miceli C, Canhao H, Fonseca J, Vries N, Tak P, Crusius B, Nurmohamed M, Kurreeman F, Mikuls T, Okada Y, Stahl E, Larson D, Deluca T, O’Laughlin M, Fronick C, Fulton L, Kosoy R, Ransom M, Bhangale T, Ortmann W, Cagan A, Gainer V, Karlson E, Kohane I, Murphy S, Martin J, Zhernakova A, Klareskog L, Padyukov L, Worthington J, Mardis E, Seldin M, Gregersen P, Behrens T, Raychaudhuri S, Denny J, Plenge R. TYK2 protein-coding variants protect against rheumatoid arthritis and autoimmunity, with no evidence of major pleiotropic effects on non-autoimmune complex traits. PLoS One. 2015;10(4):e0122271.
Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes leaves important challenges to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3 × 10⁻²¹), A928V (rs35018800, OR = 0.53, P = 1.2 × 10⁻⁹), and I684S (rs12720356, OR = 0.86, P = 4.6 × 10⁻⁷). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE; P(omnibus) = 6 × 10⁻¹⁸), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; P(omnibus) = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.
Kim K, Bang SY, Lee HS, Cho SK, Choi CB, Sung YK, Kim TH, Jun JB, Yoo DH, Kang YM, Kim SK, Suh CH, Shim SC, Lee SS, Lee J, Chung WT, Choe JY, Shin HD, Lee JY, Han BG, Nath S, Eyre S, Bowes J, Pappas D, Kremer J, González-Gay M, Rodriguez-Rodriguez L, Ärlestig L, Okada Y, Diogo D, Liao K, Karlson E, Raychaudhuri S, Rantapää-Dahlqvist S, Martin J, Klareskog L, Padyukov L, Gregersen P, Worthington J, Greenberg J, Plenge R, Bae SC. High-density genotyping of immune loci in Koreans and Europeans identifies eight new rheumatoid arthritis risk loci. Ann Rheum Dis. 2015;74(3):e13.
OBJECTIVE: A highly polygenic aetiology and high degree of allele-sharing between ancestries have been well elucidated in genetic studies of rheumatoid arthritis. Recently, the high-density genotyping array Immunochip for immune disease loci identified 14 new rheumatoid arthritis risk loci among individuals of European ancestry. Here, we aimed to identify new rheumatoid arthritis risk loci using Korean-specific Immunochip data. METHODS: We analysed Korean rheumatoid arthritis case-control samples using the Immunochip and genome-wide association studies (GWAS) array to search for new risk alleles of rheumatoid arthritis with anticitrullinated peptide antibodies. To increase power, we performed a meta-analysis of Korean data with previously published European Immunochip and GWAS data for a total sample size of 9,299 Korean and 45,790 European case-control samples. RESULTS: We identified eight new rheumatoid arthritis susceptibility loci (TNFSF4, LBH, EOMES, ETS1-FLI1, COG6, RAD51B, UBASH3A and SYNGR1) that passed a genome-wide significance threshold (p < 5 × 10⁻⁸), with evidence for three independent risk alleles at 1q25/TNFSF4. The risk alleles from the seven new loci except for the TNFSF4 locus (monomorphic in Koreans), together with risk alleles from previously established RA risk loci, exhibited a high correlation of effect sizes between ancestries. Further, we refined the number of single nucleotide polymorphisms (SNPs) that represent potentially causal variants through a trans-ethnic comparison of densely genotyped SNPs. CONCLUSIONS: This study demonstrates the advantage of dense-mapping and trans-ancestral analysis for identification of potentially causal SNPs. In addition, our findings support the importance of T cells in the pathogenesis and the frequent overlap of risk loci among diverse autoimmune diseases.