Publications

2012

Kurreeman F, Stahl E, Okada Y, Liao K, Diogo D, Raychaudhuri S, Freudenberg J, Kochi Y, Patsopoulos N, Gupta N, CLEAR investigators, Sandor C, Bang SY, Lee HS, Padyukov L, Suzuki A, Siminovitch K, Worthington J, Gregersen P, Hughes L, Reynolds R, Bridges L, Bae SC, Yamamoto K, Plenge R. Use of a multiethnic approach to identify rheumatoid-arthritis-susceptibility loci, 1p36 and 17q12. Am J Hum Genet. 2012;90(3):524–32.
We have previously shown that rheumatoid arthritis (RA) risk alleles overlap between different ethnic groups. Here, we utilize a multiethnic approach to show that we can effectively discover RA risk alleles. Thirteen putatively associated SNPs that had not yet exceeded genome-wide significance (p < 5 × 10(-8)) in our previous RA genome-wide association study (GWAS) were analyzed in independent sample sets consisting of 4,366 cases and 17,765 controls of European, African American, and East Asian ancestry. Additionally, we conducted an overall association test across all 65,833 samples (a GWAS meta-analysis plus the replication samples). Of the 13 SNPs investigated, four were significantly below the study-wide Bonferroni corrected p value threshold (p < 0.0038) in the replication samples. Two SNPs (rs3890745 at the 1p36 locus [p = 2.3 × 10(-12)] and rs2872507 at the 17q12 locus [p = 1.7 × 10(-9)]) surpassed genome-wide significance in all 16,659 RA cases and 49,174 controls combined. We used available GWAS data to fine map these two loci in Europeans and East Asians, and we found that the same allele conferred risk in both ethnic groups. A series of bioinformatic analyses identified TNFRSF14-MMEL1 at the 1p36 locus and IKZF3-ORMDL3-GSDMB at the 17q12 locus as the genes most likely associated with RA. These findings demonstrate empirically that a multiethnic approach is an effective strategy for discovering RA risk loci, and they suggest that combining GWASs across ethnic groups represents an efficient strategy for gaining statistical power.
Raychaudhuri S, Sandor C, Stahl E, Freudenberg J, Lee HS, Jia X, Alfredsson L, Padyukov L, Klareskog L, Worthington J, Siminovitch K, Bae SC, Plenge R, Gregersen P, Bakker P. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat Genet. 2012;44(3):291–6.
The genetic association of the major histocompatibility complex (MHC) to rheumatoid arthritis risk has commonly been attributed to alleles in HLA-DRB1. However, debate persists about the identity of the causal variants in HLA-DRB1 and the presence of independent effects elsewhere in the MHC. Using existing genome-wide SNP data in 5,018 individuals with seropositive rheumatoid arthritis (cases) and 14,974 unaffected controls, we imputed and tested classical alleles and amino acid polymorphisms in HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1, as well as 3,117 SNPs across the MHC. Conditional and haplotype analyses identified that three amino acid positions (11, 71 and 74) in HLA-DRβ1 and single-amino-acid polymorphisms in HLA-B (at position 9) and HLA-DPβ1 (at position 9), which are all located in peptide-binding grooves, almost completely explain the MHC association to rheumatoid arthritis risk. This study shows how imputation of functional variation from large reference panels can help fine map association signals in the MHC.
Stahl E, Wegmann D, Trynka G, Gutierrez-Achury J, Do R, Voight B, Kraft P, Chen R, Kallberg H, Kurreeman F, Diabetes Genetics Replication and Meta-analysis Consortium, Myocardial Infarction Genetics Consortium, Kathiresan S, Wijmenga C, Gregersen P, Alfredsson L, Siminovitch K, Worthington J, Bakker P, Raychaudhuri S, Plenge R. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet. 2012;44(5):483–9.
The genetic architectures of common, complex diseases are largely uncharacterized. We modeled the genetic architecture underlying genome-wide association study (GWAS) data for rheumatoid arthritis and developed a new method using polygenic risk-score analyses to infer the total liability-scale variance explained by associated GWAS SNPs. Using this method, we estimated that, together, thousands of SNPs from rheumatoid arthritis GWAS explain an additional 20% of disease risk (excluding known associated loci). We further tested this method on datasets for three additional diseases and obtained comparable estimates for celiac disease (43% excluding the major histocompatibility complex), myocardial infarction and coronary artery disease (48%) and type 2 diabetes (49%). Our results are consistent with simulated genetic models in which hundreds of associated loci harbor common causal variants and a smaller number of loci harbor multiple rare causal variants. These analyses suggest that GWAS will continue to be highly productive for the discovery of additional susceptibility loci for common diseases.

2011

Rossin E, Lage K, Raychaudhuri S, Xavier R, Tatar D, Benita Y, International Inflammatory Bowel Disease Genetics Consortium, Cotsapas C, Daly M. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 2011;7(1):e1001273.
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
Zhernakova A, Stahl E, Trynka G, Raychaudhuri S, Festen E, Franke L, Westra HJ, Fehrmann R, Kurreeman F, Thomson B, Gupta N, Romanos J, McManus R, Ryan A, Turner G, Brouwer E, Posthumus M, Remmers E, Tucci F, Toes R, Grandone E, Mazzilli MC, Rybak A, Cukrowska B, Coenen MJ, Radstake T, Riel P, Li Y, Bakker P, Gregersen P, Worthington J, Siminovitch K, Klareskog L, Huizinga T, Wijmenga C, Plenge R. Meta-analysis of genome-wide association studies in celiac disease and rheumatoid arthritis identifies fourteen non-HLA shared loci. PLoS Genet. 2011;7(2):e1002004.
Epidemiology and candidate gene studies indicate a shared genetic basis for celiac disease (CD) and rheumatoid arthritis (RA), but the extent of this sharing has not been systematically explored. Previous studies demonstrate that 6 of the established non-HLA CD and RA risk loci (out of 26 loci for each disease) are shared between both diseases. We hypothesized that there are additional shared risk alleles and that combining genome-wide association study (GWAS) data from each disease would increase power to identify these shared risk alleles. We performed a meta-analysis of two published GWAS on CD (4,533 cases and 10,750 controls) and RA (5,539 cases and 17,231 controls). After genotyping the top associated SNPs in 2,169 CD cases and 2,255 controls, and 2,845 RA cases and 4,944 controls, 8 additional SNPs demonstrated P < 5 × 10(-8) in a combined analysis of all 50,266 samples, including four SNPs that have not been previously confirmed in either disease: rs10892279 near the DDX6 gene (P(combined) = 1.2 × 10(-12)), rs864537 near CD247 (P(combined) = 2.2 × 10(-11)), rs2298428 near UBE2L3 (P(combined) = 2.5 × 10(-10)), and rs11203203 near UBASH3A (P(combined) = 1.1 × 10(-8)). We also confirmed that 4 gene loci previously established in either CD or RA are associated with the other autoimmune disease at combined P < 5 × 10(-8) (SH2B3, 8q24, STAT4, and TRAF1-C5). From the 14 shared gene loci, 7 SNPs showed a genome-wide significant effect on expression of one or more transcripts in the linkage disequilibrium (LD) block around the SNP. These associations implicate antigen presentation and T-cell activation as a shared mechanism of disease pathogenesis and underscore the utility of cross-disease meta-analysis for identification of genetic risk factors with pleiotropic effects between two clinically distinct diseases.
Janse M, Lamberts L, Franke L, Raychaudhuri S, Ellinghaus E, Muri Boberg K, Melum E, Folseraas T, Schrumpf E, Bergquist A, Björnsson E, Fu J, Jan Westra H, Groen H, Fehrmann R, Smolonska J, Berg L, Ophoff R, Porte R, Weismüller T, Wedemeyer J, Schramm C, Sterneck M, Günther R, Braun F, Vermeire S, Henckaerts L, Wijmenga C, Ponsioen C, Schreiber S, Karlsen T, Franke A, Weersma R. Three ulcerative colitis susceptibility loci are associated with primary sclerosing cholangitis and indicate a role for IL2, REL, and CARD9. Hepatology. 2011;53(6):1977–85.
Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease characterized by inflammation and fibrosis of the bile ducts. Both environmental and genetic factors contribute to its pathogenesis. To further clarify its genetic background, we investigated susceptibility loci recently identified for ulcerative colitis (UC) in a large cohort of 1,186 PSC patients and 1,748 controls. Single nucleotide polymorphisms (SNPs) tagging 13 UC susceptibility loci were initially genotyped in 854 PSC patients and 1,491 controls from Benelux (331 cases, 735 controls), Germany (265 cases, 368 controls), and Scandinavia (258 cases, 388 controls). Subsequently, a joint analysis was performed with an independent second Scandinavian cohort (332 cases, 257 controls). SNPs at chromosomes 2p16 (P-value 4.12 × 10(-4)), 4q27 (P-value 4.10 × 10(-5)), and 9q34 (P-value 8.41 × 10(-4)) were associated with PSC in the joint analysis after correcting for multiple testing. In PSC patients without inflammatory bowel disease (IBD), SNPs at 4q27 and 9q34 were nominally associated (P < 0.05). We applied additional in silico analyses to identify likely candidate genes at PSC susceptibility loci. To identify nonrandom, evidence-based links we used GRAIL (Gene Relationships Across Implicated Loci) analysis showing interconnectivity between genes in six out of in total nine PSC-associated regions. Expression quantitative trait analysis from 1,469 Dutch and UK individuals demonstrated that five out of nine SNPs had an effect on cis-gene expression. These analyses prioritized IL2, CARD9, and REL as novel candidates. CONCLUSION: We have identified three UC susceptibility loci to be associated with PSC, harboring the putative candidate genes REL, IL2, and CARD9. These results add to the scarce knowledge on the genetic background of PSC and imply an important role for both innate and adaptive immunological factors.
Yu Y, Bhangale T, Fagerness J, Ripke S, Thorleifsson G, Tan P, Souied E, Richardson A, Merriam J, Buitendijk G, Reynolds R, Raychaudhuri S, Chin K, Sobrin L, Evangelou E, Lee P, Lee A, Leveziel N, Zack D, Campochiaro B, Campochiaro P, Smith T, Barile G, Guymer R, Hogg R, Chakravarthy U, Robman L, Gustafsson O, Sigurdsson H, Ortmann W, Behrens T, Stefansson K, Uitterlinden A, Duijn C, Vingerling J, Klaver C, Allikmets R, Brantley M, Baird P, Katsanis N, Thorsteinsdottir U, Ioannidis J, Daly M, Graham R, Seddon J. Common variants near FRK/COL10A1 and VEGFA are associated with advanced age-related macular degeneration. Hum Mol Genet. 2011;20(18):3699–709.
Despite significant progress in the identification of genetic loci for age-related macular degeneration (AMD), not all of the heritability has been explained. To identify variants which contribute to the remaining genetic susceptibility, we performed the largest meta-analysis of genome-wide association studies to date for advanced AMD. We imputed 6 036 699 single-nucleotide polymorphisms with the 1000 Genomes Project reference genotypes on 2594 cases and 4134 controls with follow-up replication of top signals in 5640 cases and 52 174 controls. We identified two new common susceptibility alleles, rs1999930 on 6q21-q22.3 near FRK/COL10A1 [odds ratio (OR) 0.87; P = 1.1 × 10(-8)] and rs4711751 on 6p12 near VEGFA (OR 1.15; P = 8.7 × 10(-9)). In addition to the two novel loci, 10 previously reported loci in ARMS2/HTRA1 (rs10490924), CFH (rs1061170, and rs1410996), CFB (rs641153), C3 (rs2230199), C2 (rs9332739), CFI (rs10033900), LIPC (rs10468017), TIMP3 (rs9621532) and CETP (rs3764261) were confirmed with genome-wide significant signals in this large study. Loci in the recently reported genes ABCA1 and COL8A1 were also detected with suggestive evidence of association with advanced AMD. The novel variants identified in this study suggest that angiogenesis (VEGFA) and extracellular collagen matrix (FRK/COL10A1) pathways contribute to the development of advanced AMD.
Raychaudhuri S, Iartchouk O, Chin K, Tan P, Tai A, Ripke S, Gowrisankar S, Vemuri S, Montgomery K, Yu Y, Reynolds R, Zack D, Campochiaro B, Campochiaro P, Katsanis N, Daly M, Seddon J. A rare penetrant mutation in CFH confers high risk of age-related macular degeneration. Nat Genet. 2011;43(12):1232–6.
Two common variants in the gene encoding complement factor H (CFH), the Y402H substitution (rs1061170, c.1204C>T)(1-4) and the intronic rs1410996 SNP(5,6), explain 17% of age-related macular degeneration (AMD) liability. However, proof for the involvement of CFH, as opposed to a neighboring transcript, and knowledge of the potential mechanism of susceptibility alleles are lacking. Assuming that rare functional variants might provide mechanistic insights, we used genotype data and high-throughput sequencing to discover a rare, high-risk CFH haplotype with a c.3628C>T mutation that resulted in an R1210C substitution. This allele has been implicated previously in atypical hemolytic uremic syndrome, and it abrogates C-terminal ligand binding(7,8). Genotyping R1210C in 2,423 AMD cases and 1,122 controls demonstrated high penetrance (present in 40 cases versus 1 control, P = 7.0 × 10(-6)) and an association with a 6-year-earlier onset of disease (P = 2.3 × 10(-6)). This result suggests that loss-of-function alleles at CFH are likely to drive AMD risk. This finding represents one of the first instances in which a common complex disease variant has led to the discovery of a rare penetrant mutation.
Chen R, Stahl E, Kurreeman F, Gregersen P, Siminovitch K, Worthington J, Padyukov L, Raychaudhuri S, Plenge R. Fine mapping the TAGAP risk locus in rheumatoid arthritis. Genes Immun. 2011;12(4):314–8.
A common allele at the TAGAP gene locus demonstrates a suggestive, but not conclusive association with risk of rheumatoid arthritis (RA). To fine map the locus, we conducted comprehensive imputation of CEU HapMap single-nucleotide polymorphisms (SNPs) in a genome-wide association study (GWAS) of 5,500 RA cases and 22,621 controls (all of European ancestry). After controlling for population stratification with principal components analysis, the strongest signal of association was to an imputed SNP, rs212389 (P=3.9 × 10(-8), odds ratio=0.87). This SNP remained highly significant upon conditioning on the previous RA risk variant (rs394581, P=2.2 × 10(-5)) or on a SNP previously associated with celiac disease and type I diabetes (rs1738074, P=1.7 × 10(-4)). Our study has refined the TAGAP signal of association to a single haplotype in RA, and in doing so provides conclusive statistical evidence that the TAGAP locus is associated with RA risk. Our study also underscores the utility of comprehensive imputation in large GWAS data sets to fine map disease risk alleles.
Kurreeman F, Liao K, Chibnik L, Hickey B, Stahl E, Gainer V, Li G, Bry L, Mahan S, Ardlie K, Thomson B, Szolovits P, Churchill S, Murphy S, Cai T, Raychaudhuri S, Kohane I, Karlson E, Plenge R. Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. Am J Hum Genet. 2011;88(1):57–69.
Discovering and following up on genetic associations with complex phenotypes require large patient cohorts. This is particularly true for patient cohorts of diverse ancestry and clinically relevant subsets of disease. The ability to mine the electronic health records (EHRs) of patients followed as part of routine clinical care provides a potential opportunity to efficiently identify affected cases and unaffected controls for appropriate-sized genetic studies. Here, we demonstrate proof-of-concept that it is possible to use EHR data linked with biospecimens to establish a multi-ethnic case-control cohort for genetic research of a complex disease, rheumatoid arthritis (RA). In 1,515 EHR-derived RA cases and 1,480 controls matched for both genetic ancestry and disease-specific autoantibodies (anti-citrullinated protein antibodies [ACPA]), we demonstrate that the odds ratios and aggregate genetic risk score (GRS) of known RA risk alleles measured in individuals of European ancestry within our EHR cohort are nearly identical to those derived from a genome-wide association study (GWAS) of 5,539 autoantibody-positive RA cases and 20,169 controls. We extend this approach to other ethnic groups and identify a large overlap in the GRS among individuals of European, African, East Asian, and Hispanic ancestry. We also demonstrate that the distribution of a GRS based on 28 non-HLA risk alleles in ACPA+ cases partially overlaps with ACPA- subgroup of RA cases. Our study demonstrates that the genetic basis of rheumatoid arthritis risk is similar among cases of diverse ancestry divided into subsets based on ACPA status and emphasizes the utility of linking EHR clinical data with biospecimens for genetic studies.