Publications

2008

Hazelhurst, Scott, Winston Hide, Zsuzsanna Lipták, Ramon Nogueira, and Richard Starfield. 2008. “An Overview of the wcd EST Clustering Tool.” Bioinformatics (Oxford, England) 24(13):1542-6. doi: 10.1093/bioinformatics/btn203.

The wcd system is an open source tool for clustering expressed sequence tags (EST) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d(2) distance function or edit distance, improving existing implementations of d(2). It supports merging, refinement and reclustering of clusters. It is 'drop in' compatible with the StackPack clustering package. wcd supports parallelization under both shared memory and cluster architectures. It is distributed with an EMBOSS wrapper allowing wcd to be installed as part of an EMBOSS installation (and so provided by a web server).

AVAILABILITY: wcd is distributed under a GPL licence and is available from http://code.google.com/p/wcdest.

SUPPLEMENTARY INFORMATION: Additional experimental results. The wcd manual, a companion paper describing underlying algorithms, and all datasets used for experimentation can also be found at www.bioinf.wits.ac.za/~scott/wcdsupp.html.

Howe, Doug, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White, and Seung Yon Rhee. 2008. “Big Data: The Future of Biocuration.” Nature 455(7209):47-50. doi: 10.1038/455047a.

Kaur, Mandeep, Sebastian Schmeier, Cameron R MacPherson, Oliver Hofmann, Winston A Hide, Stephen Taylor, Nick Willcox, and Vladimir B Bajic. 2008. “Prioritizing Genes of Potential Relevance to Diseases Affected by Sex Hormones: An Example of Myasthenia Gravis.” BMC Genomics 9:481. doi: 10.1186/1471-2164-9-481.

BACKGROUND: About 5% of western populations are afflicted by autoimmune diseases many of which are affected by sex hormones. Autoimmune diseases are complex and involve many genes. Identifying these disease-associated genes contributes to development of more effective therapies. Also, association studies frequently imply genomic regions that contain disease-associated genes but fall short of pinpointing these genes. The identification of disease-associated genes has always been challenging and to date there is no universal and effective method developed.

RESULTS: We have developed a method to prioritize disease-associated genes for diseases affected strongly by sex hormones. Our method uses various types of information available for the genes, but no information that directly links genes with the disease. It generates a score for each of the considered genes and ranks genes based on that score. We illustrate our method on early-onset myasthenia gravis (MG) using genes potentially controlled by estrogen and localized in a genomic segment (which contains the MHC and surrounding region) strongly associated with MG. Based on the considered genomic segment 283 genes are ranked for their relevance to MG and responsiveness to estrogen. The top three ranked genes, HLA-G, TAP2 and HLA-DRB1, are implicated in autoimmune diseases, while TAP2 is associated with SNPs characteristic for MG. Within the top 35 prioritized genes our method identifies 90% of the 10 already known MG-associated genes from the considered region without using any information that directly links genes to MG. Among the top eight genes we identified HLA-G and TUBB as new candidates. We show that our ab-initio approach outperforms the other methods for prioritizing disease-associated genes.

CONCLUSION: We have developed a method to prioritize disease-associated genes under the potential control of sex hormones. We demonstrate the success of this method by prioritizing the genes localized in the MHC and surrounding region and evaluating the role of these genes as potential candidates for estrogen control as well as MG. We show that our method outperforms the other methods. The method has a potential to be adapted to prioritize genes relevant to other diseases.

Hofmann, Oliver, Otavia L Caballero, Brian J Stevenson, Yao-Tseng Chen, Tzeela Cohen, Ramon Chua, Christopher A Maher, Sumir Panji, Ulf Schaefer, Adele Kruger, Minna Lehvaslaiho, Piero Carninci, Yoshihide Hayashizaki, Victor Jongeneel, Andrew J G Simpson, Lloyd J Old, and Winston Hide. 2008. “Genome-Wide Analysis of Cancer/Testis Gene Expression.” Proceedings of the National Academy of Sciences of the United States of America 105(51):20422-7. doi: 10.1073/pnas.0810777105.

Cancer/Testis (CT) genes, normally expressed in germ line cells but also activated in a wide range of cancer types, often encode antigens that are immunogenic in cancer patients, and present potential for use as biomarkers and targets for immunotherapy. Using multiple in silico gene expression analysis technologies, including twice the number of expressed sequence tags used in previous studies, we have performed a comprehensive genome-wide survey of expression for a set of 153 previously described CT genes in normal and cancer expression libraries. We find that although they are generally highly expressed in testis, these genes exhibit heterogeneous gene expression profiles, allowing their classification into testis-restricted (39), testis/brain-restricted (14), and a testis-selective (85) group of genes that show additional expression in somatic tissues. The chromosomal distribution of these genes confirmed the previously observed dominance of X chromosome location, with CT-X genes being significantly more testis-restricted than non-X CT. Applying this core classification in a genome-wide survey we identified >30 CT candidate genes; 3 of them, PEPP-2, OTOA, and AKAP4, were confirmed as testis-restricted or testis-selective using RT-PCR, with variable expression frequencies observed in a panel of cancer cell lines. Our classification provides an objective ranking for potential CT genes, which is useful in guiding further identification and characterization of these potentially important diagnostic and therapeutic targets.

2007

Kruger, Adele, Oliver Hofmann, Piero Carninci, Yoshihide Hayashizaki, and Winston Hide. 2007. “Simplified Ontologies Allowing Comparison of Developmental Mammalian Gene Expression.” Genome Biology 8(10):R229.

Model organisms represent an important resource for understanding the fundamental aspects of mammalian biology. Mapping of biological phenomena between model organisms is complex and if it is to be meaningful, a simplified representation can be a powerful means for comparison. The Developmental eVOC ontologies presented here are simplified orthogonal ontologies describing the temporal and spatial distribution of developmental human and mouse anatomy. We demonstrate the ontologies by identifying genes showing a bias for developmental brain expression in human and mouse.

Stevenson, Brian J, Christian Iseli, Sumir Panji, Monique Zahn-Zabal, Winston Hide, Lloyd J Old, Andrew J Simpson, and Victor Jongeneel. 2007. “Rapid Evolution of Cancer/Testis Genes on the X Chromosome.” BMC Genomics 8:129.

BACKGROUND: Cancer/testis (CT) genes are normally expressed only in germ cells, but can be activated in the cancer state. This unusual property, together with the finding that many CT proteins elicit an antigenic response in cancer patients, has established a role for this class of genes as targets in immunotherapy regimes. Many families of CT genes have been identified in the human genome, but their biological function for the most part remains unclear. While it has been shown that some CT genes are under diversifying selection, this question has not been addressed before for the class as a whole.

RESULTS: To shed more light on this interesting group of genes, we exploited the generation of a draft chimpanzee (Pan troglodytes) genomic sequence to examine CT genes in an organism that is closely related to human, and generated a high-quality, manually curated set of human:chimpanzee CT gene alignments. We find that the chimpanzee genome contains homologues to most of the human CT families, and that the genes are located on the same chromosome and at a similar copy number to those in human. Comparison of putative human:chimpanzee orthologues indicates that CT genes located on chromosome X are diverging faster and are undergoing stronger diversifying selection than those on the autosomes or than a set of control genes on either chromosome X or autosomes.

CONCLUSION: Given their high level of diversifying selection, we suggest that CT genes are primarily responsible for the observed rapid evolution of protein-coding genes on the X chromosome.

Seoighe, Cathal, Farahnaz Ketwaroo, Visva Pillay, Konrad Scheffler, Natasha Wood, Rodger Duffet, Marketa Zvelebil, Neil Martinson, James McIntyre, Lynn Morris, and Winston Hide. 2007. “A Model of Directional Selection Applied to the Evolution of Drug Resistance in HIV-1.” Molecular Biology and Evolution 24(4):1025-31.

Understanding how pathogens acquire resistance to drugs is important for the design of treatment strategies, particularly for rapidly evolving viruses such as HIV-1. Drug treatment can exert strong selective pressures and sites within targeted genes that confer resistance frequently evolve far more rapidly than the neutral rate. Rapid evolution at sites that confer resistance to drugs can be used to help elucidate the mechanisms of evolution of drug resistance and to discover or corroborate novel resistance mutations. We have implemented standard maximum likelihood methods that are used to detect diversifying selection and adapted them for use with serially sampled reverse transcriptase (RT) coding sequences isolated from a group of 300 HIV-1 subtype C-infected women before and after single-dose nevirapine (sdNVP) to prevent mother-to-child transmission. We have also extended the standard models of codon evolution for application to the detection of directional selection. Through simulation, we show that the directional selection model can provide a substantial improvement in sensitivity over models of diversifying selection. Five of the sites within the RT gene that are known to harbor mutations that confer resistance to nevirapine (NVP) strongly supported the directional selection model. There was no evidence that other mutations that are known to confer NVP resistance were selected in this cohort. The directional selection model, applied to serially sampled sequences, also had more power than the diversifying selection model to detect selection resulting from factors other than drug resistance. Because inference of selection from serial samples is unlikely to be adversely affected by recombination, the methods we describe may have general applicability to the analysis of positive selection affecting recombining coding sequences when serially sampled data are available.

Schwegmann, Anita, Reto Guler, Antony J Cutler, Berenice Arendse, William G C Horsnell, Alexandra Flemming, Andreas H Kottmann, Gregory Ryan, Winston Hide, Michael Leitges, Cathal Seoighe, and Frank Brombacher. 2007. “Protein Kinase C Delta Is Essential for Optimal Macrophage-Mediated Phagosomal Containment of Listeria monocytogenes.” Proceedings of the National Academy of Sciences of the United States of America 104(41):16251-6.

Activation of macrophages and subsequent "killing" effector functions against infectious pathogens are essential for the establishment of protective immunity. NF-IL6 is a transcription factor downstream of IFN-gamma and TNF in the macrophage activation pathway required for bacterial killing. Comparison of microarray expression profiles of Listeria monocytogenes (LM)-infected macrophages from WT and NF-IL6-deficient mice enabled us to identify candidate genes downstream of NF-IL6 involved in the unknown pathways of LM killing independent of reactive oxygen intermediates and reactive nitrogen intermediates. One differentially expressed gene, PKCdelta, had higher mRNA levels in the LM-infected NF-IL6-deficient macrophages as compared with WT. To define the role of PKCdelta during listeriosis, we infected PKCdelta-deficient mice with LM. PKCdelta-deficient mice were highly susceptible to LM infection with increased bacterial burden and enhanced histopathology despite enhanced NF-IL6 mRNA expression. Subsequent studies in PKCdelta-deficient macrophages demonstrated that, despite elevated levels of proinflammatory cytokines and NO production, increased escape of LM from the phagosome into the cytoplasm and uncontrolled bacterial growth occurred. Taken together these data identified PKCdelta as a critical factor for confinement of LM within macrophage phagosomes.

Lombard, Zane, Nicki Tiffin, Oliver Hofmann, Vladimir B Bajic, Winston Hide, and Michele Ramsay. 2007. “Computational Selection and Prioritization of Candidate Genes for Fetal Alcohol Syndrome.” BMC Genomics 8:389.

BACKGROUND: Fetal alcohol syndrome (FAS) is a serious global health problem and is observed at high frequencies in certain South African communities. Although in utero alcohol exposure is the primary trigger, there is evidence for genetic and other susceptibility factors in FAS development. No genome-wide association or linkage studies have been performed for FAS, making computational selection and prioritization of candidate disease genes an attractive approach.

RESULTS: 10174 candidate genes were initially selected from the whole genome using a previously described method, which selects candidate genes according to their expression in disease-affected tissues. Thereafter, candidates were prioritized for experimental investigation by investigating criteria pertinent to FAS and binary filtering. 29 criteria were assessed by mining various database sources to populate criteria-specific gene lists. Candidate genes were then prioritized for experimental investigation using a binary system that assessed the criteria gene lists against the candidate list, and candidate genes were scored accordingly. A group of 87 genes was prioritized as candidates for future experimental validation. The validity of the binary prioritization method was assessed by investigating the protein-protein interactions, functional enrichment and common promoter element binding sites of the top-ranked genes.

CONCLUSION: This analysis highlighted a list of strong candidate genes from the TGF-beta, MAPK and Hedgehog signalling pathways, which are all integral to fetal development and potential targets for alcohol's teratogenic effect. We conclude that this novel bioinformatics approach effectively prioritizes credible candidate genes for further experimental analysis.

2006

Mehrle, Alexander, Heiko Rosenfelder, Ingo Schupp, Coral del Val, Dorit Arlt, Florian Hahne, Stephanie Bechtel, Jeremy Simpson, Oliver Hofmann, Winston Hide, Karl-Heinz Glatting, Wolfgang Huber, Rainer Pepperkok, Annemarie Poustka, and Stefan Wiemann. 2006. “The LIFEdb Database in 2006.” Nucleic Acids Research 34(Database issue):D415-8.

LIFEdb (http://www.LIFEdb.de) integrates data from large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. New features of LIFEdb include (i) an updated user interface with enhanced query capabilities, (ii) a configurable output table and the option to download search results in XML, (iii) the integration of data from cell-based screening assays addressing the influence of protein-overexpression on cell proliferation and (iv) the display of the relative expression ('Electronic Northern') of the genes under investigation using curated gene expression ontology information. LIFEdb enables researchers to systematically select and characterize genes and proteins of interest, and presents data and information via its user-friendly web-based interface.