Publications

2009

Linden M, Feitsma A, Cessie S, Kern M, Olsson L, Raychaudhuri S, Begovich A, Chang M, Catanese J, Kurreeman F, Nies J, Heijde D, Gregersen P, Huizinga T, Toes R, Helm-van Mil A. Association of a single-nucleotide polymorphism in CD40 with the rate of joint destruction in rheumatoid arthritis. Arthritis Rheum. 2009;60(8):2242–7.
OBJECTIVE: The severity of joint destruction in rheumatoid arthritis (RA) is highly variable from patient to patient and is influenced by genetic factors. Genome-wide association studies have enormously boosted the field of the genetics of RA susceptibility, but risk loci for RA severity remain poorly defined. A recent meta-analysis of genome-wide association studies identified 6 genetic regions for susceptibility to autoantibody-positive RA: CD40, KIF5A/PIP4K2C, CDK6, CCL21, PRKCQ, and MMEL1/TNFRSF14. The purpose of this study was to investigate whether these newly described genetic regions are associated with the rate of joint destruction. METHODS: RA patients enrolled in the Leiden Early Arthritis Clinic were studied (n=563). Yearly radiographs were scored using the Sharp/van der Heijde method (median followup 5 years; maximum followup 9 years). The rate of joint destruction between genotype groups was compared using a linear mixed model, correcting for age, sex, and treatment strategies. A total of 393 anti-citrullinated protein antibody (ACPA)-positive RA patients from the North American Rheumatoid Arthritis Consortium (NARAC) who had radiographic data available were used for the replication study. RESULTS: The TT and CC/CG genotypes of 2 single-nucleotide polymorphisms, rs4810485 (CD40) and rs42041 (CDK6), respectively, were associated with a higher rate of joint destruction in ACPA-positive RA patients (P=0.003 and P=0.012, respectively), with rs4810485 being significant after Bonferroni correction for multiple testing. The association of the CD40 minor allele with the rate of radiographic progression was replicated in the NARAC cohort (P=0.021). CONCLUSION: A polymorphism in the CD40 locus is associated with the rate of joint destruction in patients with ACPA-positive RA. Our findings provide one of the first non-HLA-related genetic severity factors that has been replicated.
Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, Besenbacher S, Jonasdottir A, Sigurdsson A, Kristinsson KT, Jonasdottir A, Frigge M, Gylfason A, Olason P, Gudjonsson S, Sverrisson S, Stacey S, Sigurgeirsson B, Benediktsdottir K, Sigurdsson H, Jonsson T, Benediktsson R, Olafsson J, Johannsson OT, Hreidarsson A, Sigurdsson G, DIAGRAM Consortium, Ferguson-Smith A, Gudbjartsson D, Thorsteinsdottir U, Stefansson K. Parental origin of sequence variants associated with complex diseases. Nature. 2009;462(7275):868–74.
Effects of susceptibility variants may depend on from which parent they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.
Lee Y, Raychaudhuri S, Cui J, De Vivo I, Ding B, Alfredsson L, Padyukov L, Costenbader K, Seielstad M, Graham R, Klareskog L, Gregersen P, Plenge R, Karlson E. The PRL -1149 G/T polymorphism and rheumatoid arthritis susceptibility. Arthritis Rheum. 2009;60(5):1250–4.
OBJECTIVE: Previous studies have demonstrated that the PRL -1149 T (minor) allele decreases prolactin expression and may be associated with autoimmune disease. The aim of this study was to determine the role of the PRL -1149 G/T polymorphism (rs1341239) in rheumatoid arthritis (RA) susceptibility. METHODS: We examined the association between PRL -1149 G/T and RA risk in 4 separate study populations, consisting of a total of 3,405 RA cases and 4,111 controls of self-reported white European ancestry. Samples were genotyped using 1 of 3 genotyping platforms, and strict quality control metrics were applied. We tested for association using a 2-tailed Cochran-Mantel-Haenszel additive, fixed-effects model. RESULTS: In the individual populations, odds ratios (ORs) for an association between PRL -1149 T and RA risk ranged from 0.80 to 0.97. In a joint meta-analysis across all 4 populations, the OR for an association between PRL -1149 T and RA risk was 0.90 (95% confidence interval 0.84-0.96, P=0.001). CONCLUSION: Our findings indicate a possible association between the PRL -1149 T allele and decreased RA risk. The effect size is small but similar to ORs for other genetic polymorphisms associated with complex traits, including RA.
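As an illustration of how a fixed-effects pooled odds ratio across several study populations can be computed, the following minimal sketch implements the Mantel-Haenszel estimator on 2x2 allele-count tables. It is not the authors' code: the function name `mantel_haenszel_or` and the counts in `toy_strata` are hypothetical, and the sketch covers only the pooled estimate, not the accompanying Cochran-Mantel-Haenszel significance test.

```python
from typing import List, Tuple

# One stratum (study population) is a 2x2 allele-count table (a, b, c, d):
#   a = T alleles in cases,    b = G alleles in cases,
#   c = T alleles in controls, d = G alleles in controls.
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: List[Table]) -> float:
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over all strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d  # total alleles in this stratum
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for illustration only; these are NOT data from the study.
toy_strata = [
    (10, 20, 20, 20),
    (30, 60, 60, 60),
]
pooled_or = mantel_haenszel_or(toy_strata)
print(f"pooled OR = {pooled_or:.2f}")
```

Both toy strata have a per-stratum odds ratio of 0.5, so the pooled estimate is also 0.5. With real data, larger strata receive more weight in both sums.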
Raychaudhuri S, Thomson B, Remmers E, Eyre S, Hinks A, Guiducci C, Catanese J, Xie G, Stahl E, Chen R, Alfredsson L, Amos C, Ardlie K, BIRAC Consortium, Barton A, Bowes J, Burtt N, Chang M, Coblyn J, Costenbader K, Criswell L, Crusius B, Cui J, De Jager P, Ding B, Emery P, Flynn E, Harrison P, Hocking L, Huizinga T, Kastner D, Ke X, Kurreeman F, Lee A, Liu X, Li Y, Martin P, Morgan A, Padyukov L, Reid D, Seielstad M, Seldin M, Shadick N, Steer S, Tak P, Thomson W, Helm-van Mil A, Horst-Bruinsma I, Weinblatt M, Wilson A, Wolbink GJ, Wordsworth P, YEAR Consortium, Altshuler D, Karlson E, Toes R, Vries N, Begovich A, Siminovitch K, Worthington J, Klareskog L, Gregersen P, Daly M, Plenge R. Genetic variants at CD28, PRDM1 and CD2/CD58 are associated with rheumatoid arthritis risk. Nat Genet. 2009;41(12):1313–8.
To discover new rheumatoid arthritis (RA) risk loci, we systematically examined 370 SNPs from 179 independent loci with P < 0.001 in a published meta-analysis of RA genome-wide association studies (GWAS) of 3,393 cases and 12,462 controls. We used Gene Relationships Across Implicated Loci (GRAIL), a computational method that applies statistical text mining to PubMed abstracts, to score these 179 loci for functional relationships to genes in 16 established RA disease loci. We identified 22 loci with a significant degree of functional connectivity. We genotyped 22 representative SNPs in an independent set of 7,957 cases and 11,958 matched controls. Three were convincingly validated: CD2-CD58 (rs11586238, P = 1 x 10(-6) replication, P = 1 x 10(-9) overall), CD28 (rs1980422, P = 5 x 10(-6) replication, P = 1 x 10(-9) overall) and PRDM1 (rs548234, P = 1 x 10(-5) replication, P = 2 x 10(-8) overall). An additional four were replicated (P < 0.0023): TAGAP (rs394581, P = 0.0002 replication, P = 4 x 10(-7) overall), PTPRC (rs10919563, P = 0.0003 replication, P = 7 x 10(-7) overall), TRAF6-RAG1 (rs540386, P = 0.0008 replication, P = 4 x 10(-6) overall) and FCGR2A (rs12746613, P = 0.0022 replication, P = 2 x 10(-5) overall). Many of these loci are also associated to other immunologic diseases.
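The replication threshold of P < 0.0023 quoted above is consistent with a Bonferroni correction for the 22 SNPs genotyped, although the abstract does not say so explicitly. The sketch below works through that arithmetic under the assumption of a family-wise error rate of 0.05 and checks the seven replication P values listed in the abstract against it. The variable names are illustrative.

```python
ALPHA = 0.05   # assumed family-wise error rate (not stated in the abstract)
N_TESTS = 22   # representative SNPs genotyped in the replication set

# Bonferroni-corrected per-test threshold.
threshold = ALPHA / N_TESTS

# Replication P values as reported in the abstract.
replication_p = {
    "CD2-CD58 (rs11586238)": 1e-6,
    "CD28 (rs1980422)": 5e-6,
    "PRDM1 (rs548234)": 1e-5,
    "TAGAP (rs394581)": 0.0002,
    "PTPRC (rs10919563)": 0.0003,
    "TRAF6-RAG1 (rs540386)": 0.0008,
    "FCGR2A (rs12746613)": 0.0022,
}

# Loci whose replication P value falls below the corrected threshold.
passing = [locus for locus, p in replication_p.items() if p < threshold]

print(f"threshold = {threshold:.4f}")
print(f"{len(passing)} of {len(replication_p)} loci pass")
```

Under this assumption the threshold rounds to 0.0023 and all seven listed loci fall below it, which matches the abstract's description of three validated and four additional replicated loci.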
De Jager P, Jia X, Wang J, Bakker P, Ottoboni L, Aggarwal N, Piccio L, Raychaudhuri S, Tran D, Aubin C, Briskin R, Romano S, International MS Genetics Consortium, Baranzini S, McCauley J, Pericak-Vance M, Haines J, Gibson R, Naeglin Y, Uitdehaag B, Matthews P, Kappos L, Polman C, McArdle W, Strachan D, Evans D, Cross A, Daly M, Compston A, Sawcer S, Weiner H, Hauser S, Hafler D, Oksenberg J. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat Genet. 2009;41(7):776–82.
We report the results of a meta-analysis of genome-wide association scans for multiple sclerosis (MS) susceptibility that includes 2,624 subjects with MS and 7,220 control subjects. Replication in an independent set of 2,215 subjects with MS and 2,116 control subjects validates new MS susceptibility loci at TNFRSF1A (combined P = 1.59 x 10(-11)), IRF8 (P = 3.73 x 10(-9)) and CD6 (P = 3.79 x 10(-9)). TNFRSF1A harbors two independent susceptibility alleles: rs1800693 is a common variant with modest effect (odds ratio = 1.2), whereas rs4149584 is a nonsynonymous coding polymorphism of low frequency but with stronger effect (allele frequency = 0.02; odds ratio = 1.6). We also report that the susceptibility allele near IRF8, which encodes a transcription factor known to function in type I interferon signaling, is associated with higher mRNA expression of interferon-response pathway genes in subjects with MS.

2008

Raychaudhuri S, Remmers E, Lee A, Hackett R, Guiducci C, Burtt N, Gianniny L, Korman B, Padyukov L, Kurreeman F, Chang M, Catanese J, Ding B, Wong S, Helm-van Mil A, Neale B, Coblyn J, Cui J, Tak P, Wolbink GJ, Crusius B, Horst-Bruinsma I, Criswell L, Amos C, Seldin M, Kastner D, Ardlie K, Alfredsson L, Costenbader K, Altshuler D, Huizinga T, Shadick N, Weinblatt M, Vries N, Worthington J, Seielstad M, Toes R, Karlson E, Begovich A, Klareskog L, Gregersen P, Daly M, Plenge R. Common variants at CD40 and other loci confer risk of rheumatoid arthritis. Nat Genet. 2008;40(10):1216–23.
To identify rheumatoid arthritis risk loci in European populations, we conducted a meta-analysis of two published genome-wide association (GWA) studies totaling 3,393 cases and 12,462 controls. We genotyped 31 top-ranked SNPs not previously associated with rheumatoid arthritis in an independent replication of 3,929 autoantibody-positive rheumatoid arthritis cases and 5,807 matched controls from eight separate collections. We identified a common variant at the CD40 gene locus (rs4810485, P = 0.0032 replication, P = 8.2 x 10(-9) overall, OR = 0.87). Along with other associations near TRAF1 (refs. 2,3) and TNFAIP3 (refs. 4,5), this implies a central role for the CD40 signaling pathway in rheumatoid arthritis pathogenesis. We also identified association at the CCL21 gene locus (rs2812378, P = 0.00097 replication, P = 2.8 x 10(-7) overall), a gene involved in lymphocyte trafficking. Finally, we identified evidence of association at four additional gene loci: MMEL1-TNFRSF14 (rs3890745, P = 0.0035 replication, P = 1.1 x 10(-7) overall), CDK6 (rs42041, P = 0.010 replication, P = 4.0 x 10(-6) overall), PRKCQ (rs4750316, P = 0.0078 replication, P = 4.4 x 10(-6) overall), and KIF5A-PIP4K2C (rs1678542, P = 0.0026 replication, P = 8.8 x 10(-8) overall).
Bakker P, Ferreira M, Jia X, Neale B, Raychaudhuri S, Voight B. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17(R2):R122–8.
Motivated by the overwhelming success of genome-wide association studies, droves of researchers are working vigorously to exchange and to combine genetic data to expediently discover genetic risk factors for common human traits. The primary tools that fuel these new efforts are imputation, allowing researchers who have collected data on a diversity of genotype platforms to share data in a uniformly exchangeable format, and meta-analysis for pooling statistical support for a genotype-phenotype association. As many groups are forming collaborations to engage in these efforts, this review collects a series of guidelines, practical detail and learned experiences from a variety of individuals who have contributed to the subject.

2003

Raychaudhuri S, Altman R. A literature-based method for assessing the functional coherence of a gene group. Bioinformatics. 2003;19(3):396–401.
MOTIVATION: Many experimental and algorithmic approaches in biology generate groups of genes that need to be examined for related functional properties. For example, gene expression profiles are frequently organized into clusters of genes that may share functional properties. We evaluate a method, neighbor divergence per gene (NDPG), that uses scientific literature to assess whether a group of genes are functionally related. The method requires only a corpus of documents and an index connecting the documents to genes. RESULTS: We evaluate NDPG on 2796 functional groups generated by the Gene Ontology consortium in four organisms: mouse, fly, worm and yeast. NDPG finds functional coherence in 96, 92, 82 and 45% of the groups (at 99.9% specificity) in yeast, mouse, fly and worm respectively.
Raychaudhuri S, Chang J, Imam F, Altman R. The computational analysis of scientific literature to define and recognize gene expression clusters. Nucleic Acids Res. 2003;31(15):4553–60.
A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis of gene expression data offers an opportunity to incorporate functional information about the genes when defining expression clusters. We have created a method that associates gene expression profiles with known biological functions. Our method has two steps. First, we apply hierarchical clustering to the given gene expression data set. Secondly, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. In the case where a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references, and the second contains newly discovered genes in Drosophila melanogaster; many have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention. In both cases, we identified novel clusters that were not noted by the original investigators.

2002

Raychaudhuri S, Chang J, Sutphin P, Altman R. Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature. Genome Res. 2002;12(1):203–14.
Functional characterizations of thousands of gene products from many species are described in the published literature. These discussions are extremely valuable for characterizing the functions not only of these gene products, but also of their homologs in other organisms. The Gene Ontology (GO) is an effort to create a controlled terminology for labeling gene functions in a more precise, reliable, computer-readable manner. Currently, the best annotations of gene function with the GO are performed by highly trained biologists who read the literature and select appropriate codes. In this study, we explored the possibility that statistical natural language processing techniques can be used to assign GO codes. We compared three document classification methods (maximum entropy modeling, naïve Bayes classification, and nearest-neighbor classification) to the problem of associating a set of GO codes (for biological process) to literature abstracts and thus to the genes associated with the abstracts. We showed that maximum entropy modeling outperforms the other methods and achieves an accuracy of 72% when ascertaining the function discussed within an abstract. The maximum entropy method provides confidence measures that correlate well with performance. We conclude that statistical methods may be used to assign GO codes and may be useful for the difficult task of reassignment as terminology standards evolve over time.
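To make the document-classification setting concrete, the following minimal sketch implements a multinomial naïve Bayes classifier with add-one smoothing, one of the three methods the abstract compares (and not the best-performing one, which was maximum entropy modeling). It is not the authors' implementation: the function names, the toy abstracts, and the two category labels are hypothetical stand-ins for real abstracts and GO biological process codes.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]], Set[str]]


def train_naive_bayes(docs: List[Tuple[str, str]]) -> Model:
    """Fit log priors and smoothed log word likelihoods from (text, label) pairs."""
    label_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)

    total_docs = sum(label_counts.values())
    log_prior = {lab: math.log(n / total_docs) for lab, n in label_counts.items()}

    log_lik: Dict[str, Dict[str, float]] = {}
    for lab in label_counts:
        # Add-one (Laplace) smoothing over the whole vocabulary.
        total = sum(word_counts[lab].values()) + len(vocab)
        log_lik[lab] = {
            w: math.log((word_counts[lab][w] + 1) / total) for w in vocab
        }
    return log_prior, log_lik, vocab


def classify(text: str, model: Model) -> str:
    """Return the label with the highest posterior log score for the text."""
    log_prior, log_lik, vocab = model
    scores = {}
    for lab, prior in log_prior.items():
        score = prior
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                score += log_lik[lab][word]
        scores[lab] = score
    return max(scores, key=scores.get)


# Toy training data for illustration only; not taken from the study.
toy_abstracts = [
    ("kinase signaling cascade phosphorylation", "signal transduction"),
    ("receptor signaling pathway activation", "signal transduction"),
    ("ribosome translation protein synthesis", "protein biosynthesis"),
    ("trna ribosome translation elongation", "protein biosynthesis"),
]
model = train_naive_bayes(toy_abstracts)
print(classify("ribosome translation initiation", model))
```

In the setting the abstract describes, a gene would inherit the codes predicted for the abstracts linked to it. This sketch stops at per-abstract prediction and does not produce the confidence measures the abstract attributes to the maximum entropy method.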