The evolutionarily conserved gene lin-28 encodes an RNA-binding protein and is an important regulator of the proper temporal succession of several developmental events in both invertebrates and vertebrates. At the cellular level, LIN-28 promotes stemness and proliferation, and inhibits differentiation, a feature best illustrated by its ability to induce pluripotency when ectopically expressed in human fibroblasts in combination with NANOG, OCT4, and SOX2. Mammalian LIN28 functions in part by regulating processing of the let-7 microRNA through a GGAG binding site in the pre-let-7's distal loop region. However, many human and animal let-7 precursors lack the GGAG binding motif. In order to dissect the molecular mechanisms underlying its biological functions in a living animal, we identified a map of LIN-28 interactions with the transcriptome by in vivo HITS-CLIP in Caenorhabditis elegans. LIN-28 binds a large pool of messenger RNAs, and a substantial fraction of the bona fide LIN-28 targets are involved in aspects of animal development. Furthermore, our data show that LIN-28 regulates the expression of the let-7 microRNA by binding its primary transcript in a previously unknown region, revealing a novel regulatory mechanism.
Publications by Author: Hongyu Zhao
While cancer is a serious health issue, there are very few genetic biomarkers that predict predisposition, prognosis, diagnosis, and treatment response. Recently, sequence variations that disrupt microRNA (miRNA)-mediated regulation of genes have been shown to be associated with many human diseases, including cancer. In an early example, a variant at one particular single nucleotide polymorphism (SNP) in a let-7 miRNA complementary site in the 3' untranslated region (3' UTR) of the KRAS gene was associated with risk and outcome of various cancers. The KRAS oncogene is an important regulator of cellular proliferation and is frequently mutated in cancers. To discover additional sequence variants in the 3' UTR of KRAS with potential as genetic biomarkers, we resequenced the complete 3' UTR of KRAS in multiple non-small cell lung cancer and epithelial ovarian cancer cases, either by Sanger sequencing or by capture enrichment followed by high-throughput sequencing. Here we report a comprehensive list of sequence variations identified in cases, some of which may dysregulate expression of KRAS by altering putative miRNA complementary sites. Notably, rs712, rs9266, and one novel variant may have a functional role in regulation of KRAS by disrupting complementary sites of various miRNAs, including let-7 and miR-181.
Small, noncoding RNAs (sncRNAs), including microRNAs (miRNAs), impact diverse biological events through the control of gene expression and genome stability. However, the role of these sncRNAs in aging remains largely unknown. To understand the contribution of sncRNAs to the aging process, we performed small RNA profiling by deep-sequencing over the course of Caenorhabditis elegans (C. elegans) aging. Many small RNAs, including a significant number of miRNAs, change their expression during aging in C. elegans. Further studies of miRNA expression changes under conditions that modify lifespan demonstrate the tight control of their expression during aging. Adult-specific loss of argonaute-like gene-1 (alg-1) activity, which is necessary for miRNA maturation and function, resulted in an abnormal lifespan, suggesting that miRNAs are, indeed, required in adulthood for normal aging. miRNA target prediction algorithms combined with transcriptome data and pathway enrichment analysis revealed likely targets of these age-associated miRNAs with known roles in aging, such as mitochondrial metabolism. Furthermore, a computational analysis of our deep-sequencing data identified additional age-associated sncRNAs, including miRNA star strands, novel miRNA candidates, and endo-siRNA sequences. We also show an increase of specific transfer RNA (tRNA) fragments during aging, which are known to be induced in response to stress in several organisms. This study suggests that sncRNAs including miRNAs contribute to lifespan regulation in C. elegans, and indicates new connections between aging, stress responses, and the small RNA world.
MOTIVATION: MicroRNAs (miRNAs) play a crucial role in tumorigenesis and development through their effects on target genes. The characterization of miRNA-gene interactions will lead to a better understanding of cancer mechanisms. Many computational methods have been developed to infer miRNA targets with or without expression data. Because expression datasets are in general limited in size, most existing methods concatenate datasets from multiple studies to form one aggregated dataset to increase sample size and power. However, such simple aggregation mostly identifies miRNA-gene interactions that are common across datasets, whereas interactions specific to one dataset may be missed. Recent releases of The Cancer Genome Atlas data provide paired expression profiling of miRNAs and genes in multiple tumors with sufficiently large sample size. To study both common and cancer-specific interactions, it is desirable to develop a method that can jointly analyze multiple cancers to study miRNA-gene interactions without combining all the data into one single dataset.
RESULTS: We developed a novel statistical method to jointly analyze expression profiles from multiple cancers to identify miRNA-gene interactions that are both common across cancers and specific to certain cancers. The benefit of this joint analysis approach is demonstrated by both simulation studies and real data analysis of The Cancer Genome Atlas datasets. Compared with simple aggregate analysis or single sample analysis, our method can effectively use the shared information among different but related cancers to improve the identification of miRNA-gene interactions. Another useful property of our method is that it can estimate similarity among cancers through their shared miRNA-gene interactions.
AVAILABILITY AND IMPLEMENTATION: The program, MCMG, implemented in R is available at http://bioinformatics.med.yale.edu/group/.
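The motivation above, that concatenating datasets can dilute a cancer-specific miRNA-gene interaction, can be illustrated with a toy calculation. The sketch below is not the MCMG method and does not use its code; the function name `pearson` and the expression values are hypothetical and chosen only so that one cancer shows a perfect negative miRNA-gene correlation and the other shows none.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired expression values (miRNA, gene) in two cancers.
mirna_a, gene_a = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]  # strong repression in cancer A
mirna_b, gene_b = [1, 2, 3, 4, 5], [2, 4, 3, 4, 2]  # no linear relation in cancer B

r_a = pearson(mirna_a, gene_a)                         # per-cancer analysis, cancer A
r_b = pearson(mirna_b, gene_b)                         # per-cancer analysis, cancer B
r_pooled = pearson(mirna_a + mirna_b, gene_a + gene_b)  # simple aggregation

print(f"cancer A: {r_a:.2f}, cancer B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

In this toy setting the pooled correlation is weaker than the correlation in cancer A alone, which is the kind of cancer-specific signal that simple aggregation tends to blur.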
Next-generation sequencing is widely used to study complex diseases because of its ability to identify both common and rare variants without prior single nucleotide polymorphism (SNP) information. Pooled sequencing of implicated target regions can lower costs and allow more samples to be analyzed, thus improving statistical power for disease-associated variant detection. Several methods for disease association tests of pooled data and for optimal pooling designs have been developed under certain assumptions of the pooling process, for example, equal/unequal contributions to the pool, sequencing depth variation, and error rate. However, these simplified assumptions may not portray the many factors affecting pooled sequencing data quality, such as PCR amplification during target capture and sequencing, reference allele preferential bias, and others. As a result, the properties of the observed data may differ substantially from those expected under the simplified assumptions. Here, we use real datasets from targeted sequencing of pooled samples, together with microarray SNP genotypes of the same subjects, to identify and quantify factors (biases and errors) affecting the observed sequencing data. Through simulations, we find that these factors have a significant impact on the accuracy of allele frequency estimation and the power of association tests. Furthermore, we develop a workflow protocol to incorporate these factors in data analysis to reduce the potential biases and errors in pooled sequencing data and to gain better estimation of allele frequencies. The workflow, Psafe, is available at http://bioinformatics.med.yale.edu/group/.
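As a rough illustration of how one such factor, reference allele preferential bias, can distort allele frequency estimates from pooled reads, consider the sketch below. It is not the Psafe workflow; the bias model (reference-allele reads recovered `ref_bias` times as efficiently as alternate-allele reads), the function names, and the read counts are all assumptions made for this example.

```python
def naive_alt_frequency(alt_reads, total_reads):
    """Observed fraction of reads carrying the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def bias_corrected_alt_frequency(alt_reads, total_reads, ref_bias):
    """Invert an assumed reference-bias model.

    Assumed model: with true alternate frequency p, the observed fraction is
    q = p / (p + ref_bias * (1 - p)). Solving for p gives
    p = ref_bias * q / (1 + (ref_bias - 1) * q).
    """
    if ref_bias <= 0:
        raise ValueError("ref_bias must be positive")
    q = naive_alt_frequency(alt_reads, total_reads)
    return ref_bias * q / (1.0 + (ref_bias - 1.0) * q)

# Hypothetical pool: 100 alternate-allele reads out of 700, assumed bias of 1.5.
naive = naive_alt_frequency(100, 700)
corrected = bias_corrected_alt_frequency(100, 700, ref_bias=1.5)
print(f"naive: {naive:.3f}, corrected: {corrected:.3f}")
```

With `ref_bias=1.0` the corrected estimate equals the naive one, so the correction only matters when the bias is actually present; in practice the bias would have to be estimated, for example from subjects with known genotypes.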