3'end processing sequence analysis

Download

Download the two 3'-end sequence datasets used in our 3'-end processing sequence analysis:


Surrounding dataset

Upstream dataset

Introduction

After processing, two sequence datasets were constructed. The "Upstream" dataset contains 7662 sequences, each consisting of the 150 bases upstream of the putative poly(A+) site. The "Surrounding" dataset comprises 1693 sequences of 250 bases each, spanning positions -150 to +100 relative to the putative cleavage site. The poly(A+) site itself is defined as position -1. All sequences were aligned at the putative poly(A+) site.
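
The coordinate convention above can be made concrete with a small sketch. It assumes that the Surrounding sequences have no position 0 (positions -150 to -1 followed by +1 to +100, which gives exactly 250 bases); this is inferred from the stated lengths and is not spelled out in the text. The function names are hypothetical.

```python
def index_to_position(i: int, upstream: int = 150) -> int:
    """Map a 0-based string index to a site-relative position.

    Assumption: there is no position 0, so index upstream-1 is position -1
    (the putative poly(A+) site) and index upstream is position +1.
    """
    return i - upstream if i < upstream else i - upstream + 1


def position_to_index(pos: int, upstream: int = 150) -> int:
    """Inverse mapping: site-relative position to 0-based string index."""
    if pos == 0:
        raise ValueError("position 0 is not used in this convention")
    return pos + upstream if pos < 0 else pos + upstream - 1
```

With the default of 150 upstream bases, the first base of a Surrounding sequence maps to -150 and the last of its 250 bases maps to +100.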

Only sequences from our non-redundant dataset were selected for statistical analysis, to avoid skewing the statistics. To avoid ambiguity, sequences containing an "N" were also eliminated.
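
A minimal sketch of this filtering step follows. How the non-redundant dataset was actually built is not described here, so dropping exact duplicates is only a stand-in assumption for that step; the function name is hypothetical.

```python
def filter_sequences(seqs):
    """Drop sequences containing 'N' and exact duplicates, keeping first occurrences.

    Assumption: exact-duplicate removal stands in for the (unspecified)
    redundancy removal used to build the non-redundant dataset.
    """
    seen = set()
    kept = []
    for s in seqs:
        s = s.upper()
        if "N" in s:
            continue  # ambiguous base call, eliminated
        if s in seen:
            continue  # already kept an identical sequence
        seen.add(s)
        kept.append(s)
    return kept
```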

To identify signal sequences, we measured the positional distribution of 6-mer words in the sequences flanking the putative poly(A+) site, separately in each of the two datasets. A Markov chain model was used to measure word overrepresentation, since it provides a reliable basis for estimating expected word frequencies in large sequence sets. A chi-square statistic was calculated to screen for words with a biased positional distribution, and parallel analysis of the two datasets was used to filter out false positives.
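
The sketch below illustrates one way these three quantities could be computed. The order of the Markov chain and the null hypothesis of the chi-square test are not given in the text, so two assumptions are made: the expected count uses the common maximal-order estimate N(w1..wk-1) * N(w2..wk) / N(w2..wk-1), and the chi-square statistic compares binned positional counts against a uniform distribution. All function names and the `bin_width` parameter are hypothetical.

```python
from collections import Counter


def count_words(seqs, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def markov_expected(word, seqs):
    """Expected count of `word` under a maximal-order Markov estimate (assumed order)."""
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 bases")
    sub = count_words(seqs, k - 1)
    inner = count_words(seqs, k - 2)
    denom = inner[word[1:-1]]
    if denom == 0:
        return 0.0
    return sub[word[:-1]] * sub[word[1:]] / denom


def position_profile(word, seqs):
    """Occurrences of `word` at each start index; sequences must be aligned, equal length."""
    k = len(word)
    n_pos = len(seqs[0]) - k + 1
    profile = [0] * n_pos
    for s in seqs:
        for i in range(n_pos):
            if s[i:i + k] == word:
                profile[i] += 1
    return profile


def chi_square_position_bias(profile, bin_width=10):
    """Chi-square statistic of binned counts against a uniform positional null (assumed)."""
    total = sum(profile)
    if total == 0:
        return 0.0
    chi2 = 0.0
    for start in range(0, len(profile), bin_width):
        chunk = profile[start:start + bin_width]
        # Expected count is proportional to the bin's share of positions,
        # so a shorter final bin is handled correctly.
        expected = total * len(chunk) / len(profile)
        chi2 += (sum(chunk) - expected) ** 2 / expected
    return chi2
```

A word whose observed count is well above `markov_expected` and whose `chi_square_position_bias` is large in both datasets would be a candidate signal under these assumptions.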

Based on the positional distribution profiles and base frequency statistics, we propose a model for the distribution and features of the cis-elements involved in mRNA 3'-end processing (Figure 1). The core components of the model are a T-rich region surrounding the poly(A+) site, a Near Upstream Element (NUE), and a Far Upstream Element (FUE). The poly(A+) site itself is a YA dinucleotide (Y: C or T), and the T-rich region downstream of it is generally more conserved than the one upstream. The NUE is an A/T-rich region located 10 to 30 nt upstream of the poly(A+) site and includes two specific sequences, AATAAA and TATATA. AATAAA is a typical positional element that determines the poly(A+) site downstream of it. The FUE lies 50 to 70 nt upstream of the poly(A+) site and occurs in two forms: one is ATGTAA-like, with a core consensus motif TGTA, and the other is T/GT-rich. In addition, some mRNAs carry a run of consecutive A residues immediately downstream of the poly(A+) site.

Figure 1. General structure of mRNA 3'-end processing related sequence in rice
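
As an illustration of how the proposed model could be applied to a single sequence, the sketch below checks an Upstream-style sequence for the NUE motifs and the FUE core motif in their stated windows. It assumes the last base of the sequence is position -1 and that a motif must lie entirely inside its window; neither detail is stated in the text. The T-rich regions, the T/GT-rich FUE form, and the YA dinucleotide are not checked. All names are hypothetical.

```python
NUE_MOTIFS = ("AATAAA", "TATATA")  # specific NUE sequences named in the model
FUE_CORE = "TGTA"                  # core consensus of the ATGTAA-like FUE


def window(seq, far, near):
    """Bases at positions -far..-near inclusive, where seq[-1] is position -1 (assumed)."""
    n = len(seq)
    return seq[max(0, n - far): n - near + 1]


def annotate(seq):
    """Report which model elements occur in their expected upstream windows."""
    seq = seq.upper()
    nue_region = window(seq, 30, 10)  # NUE: 10 to 30 nt upstream
    fue_region = window(seq, 70, 50)  # FUE: 50 to 70 nt upstream
    return {
        "NUE": [m for m in NUE_MOTIFS if m in nue_region],
        "FUE": FUE_CORE in fue_region,
    }
```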