Centre for Statistical & Survey Methodology Working Paper Series

The Generalized Bernoulli Modeling approach is used to analyze pattern changes in the intron sequences of the model plant species Arabidopsis thaliana. The influence of intron length and GC count (the number of G and C nucleotides) on intron sequence pattern changes is examined. Two other gene properties, the gene expression level and the function of the encoded protein, are also assessed. Of the randomly sampled intron sequences, 10.71% were identified as having a sequence pattern change. Our study shows that GC count and intron length significantly influence intron pattern change, whereas gene expression level and protein function have little effect. Our results suggest that, in Arabidopsis thaliana, shorter introns with a higher GC count may be more likely to have pattern changes detected in their sequences, and this information could help in checking whether an intron is functional. This study may benefit further research on intron function.
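The abstract does not spell out the Generalized Bernoulli Model itself. As a hedged illustration only: "generalized Bernoulli distribution" is a standard name for the categorical distribution, so assuming the model treats an intron as draws from a categorical distribution over A, C, G and T, a single change in base composition could be located with a likelihood-ratio scan like the sketch below. This is a generic change-point technique swapped in for illustration, not the paper's procedure; the function names, the `min_seg` parameter and the single-change-point assumption are hypothetical. `gc_count` computes the GC covariate discussed above.

```python
import math
from collections import Counter


def categorical_loglik(seq: str) -> float:
    """Maximized log-likelihood of seq under a single categorical
    (generalized Bernoulli) distribution over the symbols it contains."""
    n = len(seq)
    counts = Counter(seq)
    # MLE of each symbol's probability is its relative frequency.
    return sum(c * math.log(c / n) for c in counts.values())


def best_change_point(seq: str, min_seg: int = 10):
    """Hypothetical single change-point scan.

    For every split k (keeping at least min_seg bases on each side), compute
    the likelihood-ratio statistic
        2 * (loglik(seq[:k]) + loglik(seq[k:]) - loglik(seq))
    and return (best_k, best_statistic). Returns (None, 0.0) if no split
    improves on the single-segment model.
    """
    seq = seq.upper()
    whole = categorical_loglik(seq)
    best_pos, best_stat = None, 0.0
    for k in range(min_seg, len(seq) - min_seg + 1):
        stat = 2.0 * (categorical_loglik(seq[:k])
                      + categorical_loglik(seq[k:]) - whole)
        if stat > best_stat:
            best_pos, best_stat = k, stat
    return best_pos, best_stat


def gc_count(seq: str) -> int:
    """Number of G and C nucleotides in seq (case-insensitive)."""
    return sum(1 for base in seq.upper() if base in "GC")


if __name__ == "__main__":
    # Toy sequence: AT-rich first half, GC-rich second half.
    toy = "AT" * 20 + "GC" * 20
    print(len(toy), gc_count(toy), best_change_point(toy))
```

On the toy sequence the scan places the change at position 40, where the AT-rich block ends. In practice the statistic would still need a significance threshold (for example from a permutation or asymptotic null distribution), which the abstract does not specify.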