A case study on pattern changes in Arabidopsis thaliana intron sequences
Introns make up a large portion of eukaryotic genomes, but their function has not yet been fully elucidated. This study aims to use a generalized Bernoulli modeling approach to estimate change points in the GC distribution of Arabidopsis thaliana intron sequences, and to investigate whether gene properties (gene expression level and protein function) correlate with these intron pattern changes. The study also demonstrates how intron length and GC count (the number of G and C nucleotides) influence the pattern changes. Among the randomly sampled intron sequences, 10.71% were found to contain pattern changes. GC count and intron length appear to be the main factors influencing intron pattern changes: short introns with a high GC count are more likely to have pattern changes detected in their sequences. Gene properties showed little influence on the pattern changes. The pattern changes identified here may benefit further investigation of intron functionality.