Glioblastoma is the most common brain tumor. Median survival in unselected patients is <10 months. The tumor harbors stem-like cells that self-renew and propagate upon serial transplantation in mice, although the clinical relevance of these cells has not been well documented. We have performed the first genome-wide analysis that directly relates the gene expression profile of nine enriched populations of glioblastoma stem cells (GSCs) to five identically isolated and cultivated populations of stem cells from the normal adult human brain. Although the two cell types share common stem- and lineage-related markers, GSCs show more heterogeneous gene expression. We identified a number of pathways that are dysregulated in GSCs. A subset of these pathways has previously been identified in leukemic stem cells, suggesting that cancer stem cells of different origin may have common features. Genes upregulated in GSCs were also highly expressed in embryonic and induced pluripotent stem cells. We found that canonical Wnt-signaling plays an important role in GSCs, but not in adult human neural stem cells. We also identified a 30-gene signature highly overexpressed in GSCs. The expression of these signature genes correlates with clinical outcome and demonstrates the clinical relevance of GSCs.
Publications
2013
PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high-resolution Cap Analysis of Gene Expression (CAGE) sequencing, we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133(+) melanoma cell lines and stem cells. Additional exon array sampling finds P6 to be active in populations enriched for mesenchyme, neural stem cells and within CD133(+) enriched Ewing sarcomas. Relative to previously characterized PROM1 promoters, the P6 promoter is enriched for an HMGI/Y (HMGA1) family transcription factor binding site motif and exhibits different epigenetic modifications relative to the canonical promoter region of PROM1.
The initiation of cellular programs is orchestrated by key transcription factors and chromatin regulators that activate or inhibit target gene expression. To generate a compendium of chromatin factors that establish the epigenetic code during developmental haematopoiesis, a large-scale reverse genetic screen was conducted targeting orthologues of 425 human chromatin factors in zebrafish. A set of chromatin regulators was identified that target different stages of primitive and definitive blood formation, including factors not previously implicated in haematopoiesis. We identified 15 factors that regulate development of primitive erythroid progenitors and 29 factors that regulate development of definitive haematopoietic stem and progenitor cells. These chromatin factors are associated with SWI/SNF and ISWI chromatin remodelling, SET1 methyltransferase, CBP-p300-HBO1-NuA4 acetyltransferase, HDAC-NuRD deacetylase, and Polycomb repressive complexes. Our work provides a comprehensive view of how specific chromatin factors and their associated complexes play a major role in the establishment of haematopoietic cells in vivo.
2012
Complex diseases result from contributions of multiple genes that act in concert through pathways. Here we present a method to prioritize novel candidate disease-susceptibility genes according to their biological similarity to known disease-related genes. Each gene's likely disease susceptibility is scored by analyzing seven features of human genes captured in H-InvDB. Taking rheumatoid arthritis (RA) and prostate cancer (PC) as two examples, we evaluated the efficiency of our method. Highly scored genes obtained included TNFSF12 and OSM as candidate disease genes for RA and PC, respectively. Subsequent characterization of these genes based upon an extensive literature survey reinforced the validity of these highly scored genes as possible disease-susceptibility genes. Our approach, Prioritization ANalysis of Disease Association (PANDA), is an efficient and cost-effective method to narrow down a large set of genes into smaller subsets that are most likely to be involved in the disease pathogenesis.
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE), an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at http://discovery.hsci.harvard.edu.
African Americans are disproportionately affected by type 2 diabetes (T2DM), yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a Genome Wide Association Study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort and 122 SNPs (n = 98 independent loci) were further tested through genotyping three additional validation cohorts followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P<0.0071), were directionally consistent in the Replication cohort and were associated with T2DM in subjects without nephropathy (P<0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P<2.5×10(-8)). SNP rs7560163 (P = 7.0×10(-9), OR (95% CI) = 0.75 (0.67-0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P<0.05) and reached more nominal levels of significance (P<2.5×10(-5)) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele, which implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
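The abstract above pools five cohorts by meta-analysis but does not state which model was used. As a minimal sketch, assuming an inverse-variance weighted fixed-effect model (one common choice for GWAS meta-analysis, not necessarily the authors'), per-cohort log odds ratios can be combined as follows. The function names and all input values are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-cohort effect estimates (log odds ratios).
    ses:   matching standard errors.
    Returns (pooled beta, pooled SE, z statistic, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


def odds_ratio_ci(beta, se, z_crit=1.959964):
    """Convert a log odds ratio and its SE to (OR, lower, upper) at 95%."""
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors for five cohorts.
    betas = [-0.30, -0.25, -0.35, -0.20, -0.28]
    ses = [0.10, 0.12, 0.15, 0.11, 0.13]
    beta, se, z, p = fixed_effect_meta(betas, ses)
    print(odds_ratio_ci(beta, se), p)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the larger or more precisely measured cohorts.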
Circulating levels of adiponectin, a hormone produced predominantly by adipocytes, are highly heritable and are inversely associated with type 2 diabetes mellitus (T2D) and other metabolic traits. We conducted a meta-analysis of genome-wide association studies in 39,883 individuals of European ancestry to identify genes associated with metabolic disease. We identified 8 novel loci associated with adiponectin levels and confirmed 2 previously reported loci (P = 4.5×10(-8)-1.2×10(-43)). Using a novel method to combine data across ethnicities (N = 4,232 African Americans, N = 1,776 Asians, and N = 29,347 Europeans), we identified two additional novel loci. Expression analyses of 436 human adipocyte samples revealed that mRNA levels of 18 genes at candidate regions were associated with adiponectin concentrations after accounting for multiple testing (p<3×10(-4)). We next developed a multi-SNP genotypic risk score to test the association of adiponectin-decreasing risk alleles with metabolic traits and diseases using consortia-level meta-analytic data. This risk score was associated with increased risk of T2D (p = 4.3×10(-3), n = 22,044), increased triglycerides (p = 2.6×10(-14), n = 93,440), increased waist-to-hip ratio (p = 1.8×10(-5), n = 77,167), increased glucose two hours post oral glucose tolerance testing (p = 4.4×10(-3), n = 15,234), increased fasting insulin (p = 0.015, n = 48,238), but with lower HDL-cholesterol concentrations (p = 4.5×10(-13), n = 96,748) and decreased BMI (p = 1.4×10(-4), n = 121,335). These findings identify novel genetic determinants of adiponectin levels, which, taken together, influence risk of T2D and markers of insulin resistance.
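The multi-SNP genotypic risk score mentioned above is not specified in the abstract. A minimal sketch of one common construction, assuming the score is a sum of risk-allele counts optionally weighted by per-allele effect size, is shown below. The SNP identifiers, weights and function name are hypothetical, not taken from the study.

```python
def genotypic_risk_score(dosages, weights=None):
    """Sum of risk-allele counts, optionally weighted per SNP.

    dosages: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2).
    weights: dict mapping SNP id -> per-allele effect size; if None, every
             SNP in `dosages` counts equally (unweighted allele count).
    """
    if weights is None:
        weights = {snp: 1.0 for snp in dosages}
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for SNPs: {sorted(missing)}")
    for snp in weights:
        if not 0 <= dosages[snp] <= 2:
            raise ValueError(f"dosage for {snp} must be between 0 and 2")
    return sum(weights[snp] * dosages[snp] for snp in weights)


if __name__ == "__main__":
    # Hypothetical individual and hypothetical effect sizes.
    person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
    effects = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.20}
    print(genotypic_risk_score(person, effects))
    print(genotypic_risk_score(person))
```

Weighting by effect size lets strongly associated SNPs contribute more to the score, while the unweighted form simply counts risk alleles.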
Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10(-6) revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10(-7)) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.
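The abstract above reports results at a stated false discovery rate without saying how the FDR was estimated. As an illustration only, assuming the Benjamini-Hochberg procedure (a widely used option, not confirmed as the authors' method), nominal p-values can be converted to FDR-adjusted values like this. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each adjusted value is the smallest FDR level at which that test
    would be declared significant.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: a smaller p-value never gets a larger adjusted value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical nominal p-values for four SNP tests.
    nominal = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(nominal, benjamini_hochberg(nominal)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

A SNP would then be retained at a chosen FDR level when its adjusted value falls at or below that level.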
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
BACKGROUND: Animal studies suggest that early-life lead exposure influences gene expression and production of proteins associated with Alzheimer's disease (AD).
OBJECTIVES: We attempted to assess the relationship between early-life lead exposure and potential biomarkers for AD among young men and women. We also attempted to assess whether early-life lead exposure was associated with changes in expression of AD-related genes.
METHODS: We used sandwich enzyme-linked immunosorbent assays (ELISA) to measure plasma concentrations of amyloid β proteins Aβ40 and Aβ42 among 55 adults who had participated as newborns and young children in a prospective cohort study of the effects of lead exposure on development. We used RNA microarray techniques to analyze gene expression.
RESULTS: Mean plasma Aβ42 concentrations were lower among 13 participants with high umbilical cord blood lead concentrations (≥ 10 μg/dL) than in 42 participants with lower cord blood lead concentrations (p = 0.08). Among 10 participants with high prenatal lead exposure, we found evidence of an inverse relationship between umbilical cord lead concentration and expression of ADAM metallopeptidase domain 9 (ADAM9), reticulon 4 (RTN4), and low-density lipoprotein receptor-related protein associated protein 1 (LRPAP1) genes, whose products are believed to affect Aβ production and deposition. Gene network analysis suggested enrichment in gene sets involved in nerve growth and general cell development.
CONCLUSIONS: Data from our exploratory study suggest that prenatal lead exposure may influence Aβ-related biological pathways that have been implicated in AD onset. Gene network analysis identified further candidates to study the mechanisms of developmental lead neurotoxicity.