Introduction

Transcribed and expressed sequences occupy only roughly 2-5 percent of a genome, but sequencing them can lead to the discovery of genes and helps mine the key information in the genome. Studying the temporal and spatial expression profiles of genes involved in a specific phenotype or a crucial biological process is important for revealing gene (family) function, gene network structure and pathways (Donson et al., 2002). The expression level of genes can indicate the type, developmental phase and response state of the corresponding cell. Expression profiling allows gene expression patterns to be studied systematically and globally, and it can also provide clues for physiological research. Recent research showed that the tendency of a normal cell to become cancerous could be predicted early from its gene expression profile (Shoemaker et al., 2002), which means that the regulatory network of gene expression can indicate the direction a cell will take before symptoms emerge. Understanding precisely how the transcriptional network regulates life processes can help reveal the principles of the systematic expression of genes, the informational characteristics of development and the theoretical basis of the dynamic balance of regulatory networks. The common tools for processing genome-scale expression data are descriptive. The EST (Expressed Sequence Tag) concept, originated by Adams (Adams et al., 1991), is a classical technology complementary to nucleotide arrays and SAGE (Serial Analysis of Gene Expression). Gene expression profiling based on ESTs has matured in both theory and practice since the work of Okubo (Okubo et al., 1992) a decade ago. A sufficiently large collection of ESTs can represent the gene expression state of the source tissue or cell (Mekhedov et al., 2000) and can be used to explore the complex relationship between gene expression patterns and genome sequence (Iseli et al., 2002).
ESTs have been a cost-efficient and valuable resource for genome annotation and have played an important role in functional genomics research (Ewing et al., 1999; Fernandes et al., 2002; Lee et al., 2002; Okano et al., 2001; Qutob et al., 2000). With this technology, specific genes can be identified (Zhan et al., 2000) and metabolic pathways can be interpreted (Ohlrogge et al., 2000), with some key genes cloned efficiently (Runnsley et al., 1996). For example, Cahoon (Cahoon et al., 1999) obtained the gene for fatty-acid conjugase, the key enzyme of lipid-linked biosynthesis, by analysing EST datasets from the oil-producing tissues of Momordica charantia and Impatiens balsamina. ESTs specific to a tissue or biological process, identified from the original EST datasets, are an essential source for high-quality cDNA arrays (Lofftus et al., 1999). Data mining of EST datasets can also reveal information or rules about pre-mRNA processing mechanisms, such as signal elements, alternative splicing or alternative 3' end processing sites (Kan et al., 2001). Gene regulation exists on five levels: the DNA level, the transcriptional level, the post-transcriptional level, the translational level and the post-translational level. In eukaryotes, formation of mature mRNA requires post-transcriptional modification of the pre-mRNA, essentially including 5' capping, intron splicing and 3' end processing. The general structure of a mature transcript includes the 5' Untranslated Region (5'UTR), the Open Reading Frame (ORF) and the 3' Untranslated Region (3'UTR). The 3'UTR is transcript-specific (Coulson et al., 1997; Wu et al., 2000) and plays a crucial role in post-transcriptional regulation, including intracellular localization and transport of the mRNA, mRNA stability and translation efficiency (Mignone et al., 2002). The cis-elements in the 3'UTR and the 3'clip region can regulate 3' end processing by interacting with the specific trans-elements involved in 3' end processing (Pauws et al., 2001).
Research on mammals shows that 3' end processing consists of two key steps: cleavage at a specific site, and addition of a Poly(A+) tail to the new end. The core cis-elements have been identified by deletion experiments and sequence analysis (Barabino et al., 1999). There are three kinds of elements: the Poly(A+) site, the Position Element (PE) and the Downstream Element (DSE). The Poly(A+) site is a dinucleotide with the conserved composition YA (Y: C or T) (Chen et al., 1995; Zhao et al., 1999) and is also called the cleavage site (CS). The PE lies 10~30 nt upstream of the Poly(A+) site and has the conserved motif AATAAA. The DSE is a T- or GT-rich element downstream of the Poly(A+) site and plays an important role in stabilizing the complex of trans-elements. In plants, the elements involved in 3' end processing are more dispersed, less conserved and more complicated (Rothnie, 1996). A simple model of the distribution of the elements is shown in Fig.1 (Zhao et al., 1999). No highly conserved pattern has been found for the PE in plant mRNA, and different mRNAs have diverse PEs, each of which is efficient only for its own mRNA (Zheng et al., 2000). In plants the PE is called the Near Upstream Element (NUE). A poorly conserved element that is crucial for the efficiency of 3' end processing, the Far Upstream Element (FUE), lies upstream of the NUE and is also called the Efficiency Element (EE). The composition of the Poly(A+) site is similar to that in mammals. The structure and distribution of the elements related to 3' end processing vary widely among different mRNAs and plants. Even for the same gene, the elements that function in 3' end processing can differ, which can cause polymorphism at the 3' end of the mature mRNA. Some genes have multiple NUEs, e.g. the pea rbcS-E9 gene (Fig.2). At least four kinds of trans-elements are known to participate in 3' end processing (Zheng et al., 2000): Cleavage and Polyadenylation Specificity Factor (CPSF), which recognizes the NUE; Cleavage stimulation Factor (CstF), which recognizes the FUE; Cleavage Factors (CFs), which are responsible for cleaving the pre-mRNA 3' end; and Poly(A+) Polymerase (PAP), which produces the Poly(A+) tail. A potential model of how these factors assemble is shown in Fig.3. Notably, not all mature mRNAs have a Poly(A+) tail, e.g. histone mRNA.

Fig.1. Plant cis-elements in mRNA 3' end processing (from Zhao et al., 1999)


Fig.2. Multiple Poly(A+) sites of the pea rbcS-E9 gene, from http://www.uky.edu/~aghunt00/polya.signal.html.


Fig.3. Plant trans-elements in mRNA 3' end processing, redrawn from Zheng et al., 2000. CPSF recognizes the NUE directly, CstF recognizes the FUE, CFs are required for the cleavage reaction, and PAP is required for generating the Poly(A+) sequence.
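
As a rough illustration of the mammalian layout described above (an AATAAA motif 10~30 nt upstream of a YA cleavage site), the following Python sketch checks a sequence for both features. The function names, the coordinate convention (0-based position of the A in the YA dinucleotide) and the choice to measure distance from the first base of the motif are assumptions made for this example and are not taken from the cited papers. The plant NUE, being far less conserved, would need a looser pattern than the single motif used here.

```python
import re


def find_position_elements(seq, cleavage_pos, motif="AATAAA", window=(10, 30)):
    """Return start offsets of `motif` lying 10-30 nt upstream of the cleavage site.

    cleavage_pos: 0-based index of the A in the YA dinucleotide (assumed convention).
    Distance is measured from the first base of the motif (assumed convention).
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    # A lookahead pattern reports overlapping occurrences as well.
    for match in re.finditer("(?=%s)" % re.escape(motif), seq):
        distance = cleavage_pos - match.start()
        if window[0] <= distance <= window[1]:
            hits.append(match.start())
    return hits


def has_ya_site(seq, cleavage_pos):
    """True if the dinucleotide ending at cleavage_pos is YA (Y = C or T)."""
    seq = seq.upper().replace("U", "T")
    if cleavage_pos < 1 or cleavage_pos >= len(seq):
        return False
    return seq[cleavage_pos] == "A" and seq[cleavage_pos - 1] in "CT"
```

For a sequence such as "GGGG" + "AATAAA" + 14 C residues + "CA" + "TTTT", with the cleavage position set to the index of the final A, the motif at offset 4 falls inside the window and the YA check succeeds.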

Data mining of the cis-elements of 3' end processing has provided many clues for further experiments (van Helden et al., 2000; Graber et al., 1999). In silico experiments based on Poly(A+) EST datasets can help identify and characterize significant cis-elements (Pauws et al., 2001). Accumulating data and analysing the primary structure of the 3'UTR and 3'clip can also support research on secondary structure (Pesole et al., 1999), promoting understanding of the sequence characteristics of the 3' end region. The algorithms applied in this kind of research focus on statistics and analysis of sequence composition: identifying potential elements by determining the statistical significance of nucleotide words (referred to as "words" in the following text), by characterizing the positional distribution of each word, and by aligning and comparing words that are similar in composition and distribution. Statistical models, clustering and discriminant models, Markov models and others have been employed in data mining (van Helden et al., 2000). Some research has also dealt with the relationship between sequence and other biological characteristics, e.g. the association between sequence patterns and gene function (Conklin et al., 2002). Comparison of mature mRNAs from the same gene expressed in distinct tissues indicated that the distribution of Poly(A+) sites is tissue-specific to some extent (Beaudoing et al., 2001). Rice (Oryza sativa) is one of the most important cereal crops in the world. It has become a model plant because of its economic value, small genome size (430 Mb), high gene density and syntenic relations with other cereals (Serageldin, 2002).
With draft genome sequences for japonica and indica rice released in public databases, the arduous and important task ahead is functional genomics research, which aims to reveal the gene regulation and interaction network based on precise annotation of the 30,000~50,000 genes (Yu et al., 2002; Goff et al., 2002). As an important part of functional genomics, large-scale EST analysis has progressed for rice. An early large-scale sequencing and analysis project on the rice genome generated an enormous collection of ESTs (Sasaki et al., 1994; Sasaki et al., 1996; Yamamoto et al., 1997). A total of 202,290 rice ESTs had been released in dbEST at NCBI (as of 2003.5.2, http://www.ncbi.nlm.nih.gov/dbEST_summary.html), and the number is growing ever more rapidly. However, in terms of data size, the number of ESTs is low compared with the progress of the genome sequencing projects and with other plants that lack a genome sequence. In terms of data sources, large-scale EST libraries generated from specific biological processes or tissues, e.g. from the interaction between plant and microbial pathogen, are lacking. In terms of data characteristics, most ESTs are from the 5' end. The lack of transcript-specific 3' ESTs hinders further EST-based research, not only expression profiling, cDNA arrays and genome sequence analysis, but also research on the elements related to 3' end processing. For these reasons, as part of our rice gene discovery plan, we generated 25,160 high-quality ESTs by large-scale 3' sequencing of three cDNA libraries. Analysis was performed on the three EST datasets, derived respectively from leaf induced by Magnaporthe grisea, stem at the 3- to 5-leaf stage and endosperm 10~15 days after anthesis. Each library reflects an important tissue expression pattern under specific conditions.
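
EST counts per gene are commonly read as a digital expression profile of the source library. The Python sketch below shows one simple way such profiles could be tabulated so that libraries of different size can be compared. It assumes that each EST has already been assigned to a gene or cluster by some earlier step; the function name, the input dictionaries and the normalization to counts per 10,000 ESTs are hypothetical choices for this example, not the procedure used for the datasets described here.

```python
from collections import Counter


def expression_profile(est_to_gene, library_ests, per=10000):
    """Tabulate normalized EST counts per gene for each library.

    est_to_gene:  dict mapping EST id -> gene/cluster id (assumed precomputed).
    library_ests: dict mapping library name -> list of EST ids.
    Returns {library: {gene: ESTs per `per` assigned ESTs}}.
    """
    profile = {}
    for library, ests in library_ests.items():
        # ESTs without a gene assignment are skipped.
        counts = Counter(est_to_gene[e] for e in ests if e in est_to_gene)
        total = sum(counts.values())
        if total == 0:
            profile[library] = {}
        else:
            profile[library] = {g: n * per / total for g, n in counts.items()}
    return profile
```

A gene that is present in one library's profile and absent from another's is a candidate for tissue- or condition-specific expression, although with small libraries such differences need a statistical test before being trusted.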
Comparative analysis of the expression patterns among the three EST datasets was performed to investigate similarities and differences among the three distinct libraries. Apart from general analyses of expression patterns, no systematic and detailed analysis of 3' end processing elements in rice has been performed, and there has been little research on the general structure of the rice 3'UTR. Hence, we investigated sequence features in the 3'UTR and 3'clip based on our non-redundant 3' EST database and the published rice genome sequence.
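
The word-based statistics mentioned earlier (word significance and positional distribution) can be illustrated with a minimal Python sketch. It compares the observed count of each k-nucleotide word with the count expected if bases were independent, and tabulates where a word starts relative to the 3' end of each sequence. The independent-base background, the observed/expected ratio and all function names are assumptions chosen for brevity; the cited studies use more elaborate statistical and Markov models.

```python
from collections import Counter
from math import prod


def word_counts(seqs, k):
    """Count every overlapping word of length k across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(seqs, k):
    """Observed/expected ratio per word under an independent-base background."""
    bases = Counter("".join(seqs))
    total = sum(bases.values())
    freq = {b: n / total for b, n in bases.items()}
    # Number of positions at which a word of length k can start.
    windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    observed = word_counts(seqs, k)
    return {w: n / (prod(freq[b] for b in w) * windows)
            for w, n in observed.items()}


def position_profile(seqs, word):
    """Histogram of distances (nt) from each word start to the sequence 3' end."""
    profile = Counter()
    for s in seqs:
        start = s.find(word)
        while start != -1:
            profile[len(s) - start] += 1
            start = s.find(word, start + 1)  # step by 1 to allow overlaps
    return profile
```

Applied to 3'UTR sequences trimmed so that each ends at its Poly(A+) site, a word with a high ratio and a sharply peaked position profile would be a candidate cis-element worth examining further.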