Publications by Year: 2013

2013

Viatte S, Plant D, Raychaudhuri S. Genetics and epigenetics of rheumatoid arthritis. Nat Rev Rheumatol. 2013;9(3):141–53.
Investigators have made key advances in rheumatoid arthritis (RA) genetics in the past 10 years. Although genetic studies have had limited influence on clinical practice and drug discovery, they are currently generating testable hypotheses to explain disease pathogenesis. Firstly, we review here the major advances in identifying RA genetic susceptibility markers both within and outside of the MHC. Understanding how genetic variants translate into pathogenic mechanisms and ultimately into phenotypes remains a mystery for most of the polymorphisms that confer susceptibility to RA, but functional data are emerging. Interplay between environmental and genetic factors is poorly understood and in need of further investigation. Secondly, we review current knowledge of the role of epigenetics in RA susceptibility. Differences in the epigenome could represent one of the ways in which environmental exposures translate into phenotypic outcomes. The best understood epigenetic phenomena include post-translational histone modifications and DNA methylation events, both of which have critical roles in gene regulation. Epigenetic studies in RA represent a new area of research with the potential to answer unsolved questions.
Liao K, Kurreeman F, Li G, Duclos G, Murphy S, Guzman R, Cai T, Gupta N, Gainer V, Schur P, Cui J, Denny J, Szolovits P, Churchill S, Kohane I, Karlson E, Plenge R. Associations of autoantibodies, autoimmune risk alleles, and clinical diagnoses from the electronic medical records in rheumatoid arthritis cases and non-rheumatoid arthritis controls. Arthritis Rheum. 2013;65(3):571–81.
OBJECTIVE: The significance of non-rheumatoid arthritis (RA) autoantibodies in patients with RA is unclear. The aim of this study was to assess associations of autoantibodies with autoimmune risk alleles and with clinical diagnoses from the electronic medical records (EMRs) among RA cases and non-RA controls. METHODS: Data on 1,290 RA cases and 1,236 non-RA controls of European genetic ancestry were obtained from the EMRs of 2 large academic centers. The levels of anti-citrullinated protein antibodies (ACPAs), antinuclear antibodies (ANAs), anti-tissue transglutaminase antibodies (AGTAs), and anti-thyroid peroxidase (anti-TPO) antibodies were measured. All subjects were genotyped for autoimmune risk alleles, and the association between number of autoimmune risk alleles present and number of types of autoantibodies present was studied. A phenome-wide association study (PheWAS) was conducted to study potential associations between autoantibodies and clinical diagnoses among RA cases and non-RA controls. RESULTS: The mean ages were 60.7 years in RA cases and 64.6 years in non-RA controls. The proportion of female subjects was 79% in each group. The prevalence of ACPAs and ANAs was higher in RA cases compared to controls (each P < 0.0001); there were no differences in the prevalence of anti-TPO antibodies and AGTAs. Carriage of higher numbers of autoimmune risk alleles was associated with increasing numbers of autoantibody types in RA cases (P = 2.1 × 10⁻⁵) and non-RA controls (P = 5.0 × 10⁻³). From the PheWAS, the presence of ANAs was significantly associated with a diagnosis of Sjögren's/sicca syndrome in RA cases. CONCLUSION: The increased frequency of autoantibodies in RA cases and non-RA controls was associated with the number of autoimmune risk alleles carried by an individual. PheWAS of EMR data, with linkage to laboratory data obtained from blood samples, provide a novel method to test for the clinical significance of biomarkers in disease.
Consortium CDGPG, Lee H, Ripke S, Neale B, Faraone S, Purcell S, Perlis R, Mowry B, Thapar A, Goddard M, Witte J, Absher D, Agartz I, Akil H, Amin F, Andreassen O, Anjorin A, Anney R, Anttila V, Arking D, Asherson P, Azevedo M, Backlund L, Badner J, Bailey A, Banaschewski T, Barchas J, Barnes M, Barrett T, Bass N, Battaglia A, Bauer M, Bayés M, Bellivier F, Bergen S, Berrettini W, Betancur C, Bettecken T, Biederman J, Binder E, Black D, Blackwood D, Bloss C, Boehnke M, Boomsma D, Breen G, Breuer R, Bruggeman R, Cormican P, Buccola N, Buitelaar J, Bunney W, Buxbaum J, Byerley W, Byrne E, Caesar S, Cahn W, Cantor R, Casas M, Chakravarti A, Chambert K, Choudhury K, Cichon S, Cloninger R, Collier D, Cook E, Coon H, Cormand B, Corvin A, Coryell W, Craig D, Craig I, Crosbie J, Cuccaro M, Curtis D, Czamara D, Datta S, Dawson G, Day R, De Geus E, Degenhardt F, Djurovic S, Donohoe G, Doyle A, Duan J, Dudbridge F, Duketis E, Ebstein R, Edenberg H, Elia J, Ennis S, Etain B, Fanous A, Farmer A, Ferrier N, Flickinger M, Fombonne E, Foroud T, Frank J, Franke B, Fraser C, Freedman R, Freimer N, Freitag C, Friedl M, Frisén L, Gallagher L, Gejman P, Georgieva L, Gershon E, Geschwind D, Giegling I, Gill M, Gordon S, Gordon-Smith K, Green E, Greenwood T, Grice D, Gross M, Grozeva D, Guan W, Gurling H, De Haan L, Haines J, Hakonarson H, Hallmayer J, Hamilton S, Hamshere M, Hansen T, Hartmann A, Hautzinger M, Heath A, Henders A, Herms S, Hickie I, Hipolito M, Hoefels S, Holmans P, Holsboer F, Hoogendijk W, Hottenga JJ, Hultman C, Hus V, Ingason A, Ising M, Jamain S, Jones E, Jones I, Jones L, Tzeng JY, Kähler A, Kahn R, Kandaswamy R, Keller M, Kennedy J, Kenny E, Kent L, Kim Y, Kirov G, Klauck S, Klei L, Knowles J, Kohli M, Koller D, Konte B, Korszun A, Krabbendam L, Krasucki R, Kuntsi J, Kwan P, Landén M, Långström N, Lathrop M, Lawrence J, Lawson W, Leboyer M, Ledbetter D, Lee P, Lencz T, Lesch KP, Levinson D, Lewis C, Li J, Lichtenstein P, Lieberman J, Lin DY, Linszen D, Liu C, 
Lohoff F, Loo S, Lord C, Lowe J, Lucae S, MacIntyre D, Madden P, Maestrini E, Magnusson P, Mahon P, Maier W, Malhotra A, Mane S, Martin C, Martin N, Mattheisen M, Matthews K, Mattingsdal M, McCarroll S, McGhee K, McGough J, McGrath P, McGuffin P, McInnis M, McIntosh A, McKinney R, McLean A, McMahon F, McMahon W, McQuillin A, Medeiros H, Medland S, Meier S, Melle I, Meng F, Meyer J, Middeldorp C, Middleton L, Milanova V, Miranda A, Monaco A, Montgomery G, Moran J, Moreno-De-Luca D, Morken G, Morris D, Morrow E, Moskvina V, Muglia P, Mühleisen T, Muir W, Müller-Myhsok B, Murtha M, Myers R, Myin-Germeys I, Neale M, Nelson S, Nievergelt C, Nikolov I, Nimgaonkar V, Nolen W, Nöthen M, Nurnberger J, Nwulia E, Nyholt D, O’Dushlaine C, Oades R, Olincy A, Oliveira G, Olsen L, Ophoff R, Osby U, Owen M, Palotie A, Parr J, Paterson A, Pato C, Pato M, Penninx B, Pergadia M, Pericak-Vance M, Pickard B, Pimm J, Piven J, Posthuma D, Potash J, Poustka F, Propping P, Puri V, Quested D, Quinn E, Ramos-Quiroga JA, Rasmussen H, Raychaudhuri S, Rehnström K, Reif A, Ribasés M, Rice J, Rietschel M, Roeder K, Roeyers H, Rossin L, Rothenberger A, Rouleau G, Ruderfer D, Rujescu D, Sanders A, Sanders S, Santangelo S, Sergeant J, Schachar R, Schalling M, Schatzberg A, Scheftner W, Schellenberg G, Scherer S, Schork N, Schulze T, Schumacher J, Schwarz M, Scolnick E, Scott L, Shi J, Shilling P, Shyn S, Silverman J, Slager S, Smalley S, Smit J, Smith E, Sonuga-Barke E, St Clair D, State M, Steffens M, Steinhausen HC, Strauss J, Strohmaier J, Stroup S, Sutcliffe J, Szatmari P, Szelinger S, Thirumalai S, Thompson R, Todorov A, Tozzi F, Treutlein J, Uhr M, Oord E, Van Grootheest G, Os J, Vicente A, Vieland V, Vincent J, Visscher P, Walsh C, Wassink T, Watson S, Weissman M, Werge T, Wienker T, Wijsman E, Willemsen G, Williams N, Willsey J, Witt S, Xu W, Young A, Yu T, Zammit S, Zandi P, Zhang P, Zitman F, Zöllner S, Devlin B, Kelsoe J, Sklar P, Daly M, O’Donovan M, Craddock N, Sullivan P, Smoller J, 
Kendler K, Wray N, International Inflammatory Bowel Disease Genetics Consortium (IIBDGC). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45(9):984–94.
Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
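For readers unfamiliar with the quantities reported in this abstract, the two estimates are conventionally defined as follows. The symbols are notation chosen here for illustration and do not come from the abstract: σ²_g is the additive genetic variance captured by SNPs, σ²_P is the total variance in liability, and σ_{g,12} is the genetic covariance between disorders 1 and 2.

```latex
h^2_{\mathrm{SNP}} = \frac{\sigma^2_{g}}{\sigma^2_{P}},
\qquad
r_g = \frac{\sigma_{g,12}}{\sqrt{\sigma^2_{g,1}\,\sigma^2_{g,2}}}
```

Under these definitions, the reported 17-29% corresponds to h²_SNP for each disorder, and a value such as 0.68 for schizophrenia and bipolar disorder corresponds to r_g for that pair.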
Seddon J, Yu Y, Miller E, Reynolds R, Tan P, Gowrisankar S, Goldstein J, Triebwasser M, Anderson H, Zerbib J, Kavanagh D, Souied E, Katsanis N, Daly M, Atkinson J, Raychaudhuri S. Rare variants in CFI, C3 and C9 are associated with high risk of advanced age-related macular degeneration. Nat Genet. 2013;45(11):1366–70.
To define the role of rare variants in advanced age-related macular degeneration (AMD) risk, we sequenced the exons of 681 genes within all reported AMD loci and related pathways in 2,493 cases and controls. We first tested each gene for increased or decreased burden of rare variants in cases compared to controls. We found that 7.8% of AMD cases compared to 2.3% of controls are carriers of rare missense CFI variants (odds ratio (OR) = 3.6; P = 2 × 10⁻⁸). There was a predominance of dysfunctional variants in cases compared to controls. We then tested individual variants for association with disease. We observed significant association with rare missense alleles in genes other than CFI. Genotyping in 5,115 independent samples confirmed associations with AMD of an allele in C3 encoding p.Lys155Gln (replication P = 3.5 × 10⁻⁵, OR = 2.8; joint P = 5.2 × 10⁻⁹, OR = 3.8) and an allele in C9 encoding p.Pro167Ser (replication P = 2.4 × 10⁻⁵, OR = 2.2; joint P = 6.5 × 10⁻⁷, OR = 2.2). Finally, we show that the allele of C3 encoding Gln155 results in resistance to proteolytic inactivation by CFH and CFI. These results implicate loss of C3 protein regulation and excessive alternative complement activation in AMD pathogenesis, thus informing both the direction of effect and mechanistic underpinnings of this disorder.
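As a rough check on the CFI carrier frequencies quoted above, the sketch below computes a crude odds ratio directly from the two proportions. This is an unadjusted calculation for illustration only; the abstract does not say how the reported OR was estimated, and the function name `odds_ratio` is hypothetical.

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Crude odds ratio from carrier proportions in cases and controls."""
    odds_cases = p_cases / (1.0 - p_cases)            # odds of carrying a variant among cases
    odds_controls = p_controls / (1.0 - p_controls)   # odds of carrying a variant among controls
    return odds_cases / odds_controls


# Carrier frequencies quoted in the abstract: 7.8% of cases, 2.3% of controls.
crude_or = odds_ratio(0.078, 0.023)
print(round(crude_or, 1))  # prints 3.6
```

The crude value rounds to the OR of 3.6 given in the abstract, although the published estimate may come from a different model.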
Liao K, Cai T, Gainer V, Cagan A, Murphy S, Liu C, Churchill S, Shaw S, Kohane I, Solomon D, Plenge R, Karlson E. Lipid and lipoprotein levels and trend in rheumatoid arthritis compared to the general population. Arthritis Care Res (Hoboken). 2013;65(12):2046–50.
OBJECTIVE: Differences in lipid levels associated with cardiovascular (CV) risk between rheumatoid arthritis (RA) patients and the general population remain unclear. Determining these differences is important in understanding the role of lipids in CV risk in RA. METHODS: We studied 2,005 RA subjects from 2 large academic medical centers. We extracted electronic medical record data on the first low-density lipoprotein (LDL) measurement, and total cholesterol and high-density lipoprotein (HDL) measurements within 1 year of the LDL measurement. Subjects with an electronic statin prescription prior to the first LDL measurement were excluded. We compared lipid levels in RA patients to recently published levels from the general US population using the t-test and stratifying by published parameters, i.e., 2007-2010, and women. We determined lipid trends using separate linear regression models for total cholesterol, LDL cholesterol, and HDL cholesterol, testing the association between year of measurement (1989-2010) and lipid level, adjusted for age and sex. Lipid trends in RA were qualitatively compared to the published general population trends. RESULTS: Women with RA had a significantly lower total cholesterol (186 versus 200 mg/dl; P = 0.002) and LDL cholesterol (105 versus 118 mg/dl; P = 0.001) compared to the general population (2007-2010). HDL cholesterol was not significantly different in the 2 groups. In the RA cohort, total cholesterol and LDL cholesterol significantly decreased each year, while HDL cholesterol increased (all with P < 0.0001), consistent with overall trends observed in a previous study. CONCLUSION: RA patients appear to have an overall lower total cholesterol and LDL cholesterol than the general population despite the general overall increased risk of CV disease in RA from observational studies.
Chen Y, Carroll R, Hinz EM, Shah A, Eyler A, Denny J, Xu H. Applying active learning to high-throughput phenotyping algorithms for electronic health records data. J Am Med Inform Assoc. 2013;20(e2):e253–9.
OBJECTIVES: Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms. METHODS: We integrated an uncertainty sampling AL approach with support vector machines-based phenotyping algorithms and evaluated its performance using three annotated disease cohorts including rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which contained at least all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of the AL was compared with a passive learning (PL) approach based on random sampling. RESULTS: Our evaluation showed that AL outperformed PL on three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. AL also achieved a reduction of 68% for VTE with an optimal AUC of 0.70 using refined features. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples. CONCLUSIONS: This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.
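The contrast between uncertainty sampling (AL) and random sampling (PL) described above can be sketched as a selection step. This is a minimal illustration, not the authors' implementation: it assumes uncertainty is measured as the absolute distance of each unlabeled sample from the SVM decision boundary, which is one common choice that the abstract does not confirm. The function names and the example scores are hypothetical.

```python
import random


def select_most_uncertain(decision_scores, batch_size):
    """Active learning step: pick the unlabeled samples nearest the decision boundary.

    decision_scores holds signed classifier outputs (e.g. SVM margins);
    a smaller absolute value is treated as higher uncertainty (an assumption).
    """
    ranked = sorted(range(len(decision_scores)), key=lambda i: abs(decision_scores[i]))
    return ranked[:batch_size]


def select_random(n_unlabeled, batch_size, seed=0):
    """Passive learning baseline: pick samples uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(n_unlabeled), batch_size)


# Hypothetical decision scores for five unlabeled records.
scores = [2.1, -0.05, 0.8, -1.7, 0.3]
picked = select_most_uncertain(scores, 2)
print(picked)  # prints [1, 4]
```

In a full loop, the selected records would be sent for annotation, added to the training set, and the classifier retrained before the next selection round.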
Xie G, Roshandel D, Sherva R, Monach P, Lu EY, Kung T, Carrington K, Zhang S, Pulit S, Ripke S, Carette S, Dellaripa P, Edberg J, Hoffman G, Khalidi N, Langford C, Mahr A, St Clair W, Seo P, Specks U, Spiera R, Stone J, Ytterberg S, Raychaudhuri S, Bakker P, Farrer L, Amos C, Merkel P, Siminovitch K. Association of granulomatosis with polyangiitis (Wegener’s) with HLA-DPB1*04 and SEMA6A gene variants: evidence from genome-wide analysis. Arthritis Rheum. 2013;65(9):2457–68.
OBJECTIVE: To identify genetic determinants of granulomatosis with polyangiitis (Wegener's) (GPA). METHODS: We carried out a genome-wide association study (GWAS) of 492 GPA cases and 1,506 healthy controls (white subjects of European descent), followed by replication analysis of the most strongly associated signals in an independent cohort of 528 GPA cases and 1,228 controls. RESULTS: Genome-wide significant associations were identified in 32 single-nucleotide polymorphic (SNP) markers across the HLA region, the majority of which were located in the HLA-DPB1 and HLA-DPA1 genes encoding the class II major histocompatibility complex (MHC) DPβ chain 1 and DPα chain 1 proteins, respectively. Peak association signals in these 2 genes, emanating from SNPs rs9277554 (for DPβ chain 1) and rs9277341 (DPα chain 1) were strongly replicated in an independent cohort (in the combined analysis of the initial cohort and the replication cohort, P = 1.92 × 10⁻⁵⁰ and 2.18 × 10⁻³⁹, respectively). Imputation of classic HLA alleles and conditional analyses revealed that the SNP association signal was fully accounted for by the classic HLA-DPB1*04 allele. An independent single SNP, rs26595, near SEMA6A (the gene for semaphorin 6A) on chromosome 5, was also associated with GPA, reaching genome-wide significance in a combined analysis of the GWAS and replication cohorts (P = 2.09 × 10⁻⁸). CONCLUSION: We identified the SEMA6A and HLA-DP loci as significant contributors to risk for GPA, with the HLA-DPB1*04 allele almost completely accounting for the MHC association. These two associations confirm the critical role of immunogenetic factors in the development of GPA.
Jia X, Han B, Onengut-Gumuscu S, Chen WM, Concannon P, Rich S, Raychaudhuri S, Bakker P. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One. 2013;8(6):e64683.
DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.
Lin C, Karlson E, Canhao H, Miller T, Dligach D, Chen PJ, Perez RNG, Shen Y, Weinblatt M, Shadick N, Plenge R, Savova G. Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. PLoS One. 2013;8(8):e69932.
OBJECTIVE: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. MATERIALS AND METHODS: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. RESULTS: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. CONCLUSION: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.