The probability distribution of the TSS-TLS distance is organism-characteristic and can be used for promoter prediction
Transcription is a complex process that involves interactions between promoter cis-elements and multiple trans-acting protein factors. These interactions rely not only on specific sequence recognition between the cis- and trans-factors but also on a particular spatial arrangement of the factors within a complex. The relative positioning of the cis-elements involved provides the framework for this spatial arrangement. The present study examines the distribution of the distance between the transcription start site and the translation start site of a gene (TSS-TLS) to test the assumption that, over the course of evolution, the TSS-TLS distance has become a distinct characteristic of a given organism. Four representative organisms (Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana and Homo sapiens) were chosen to study the probability distribution of the TSS-TLS distance. The statistical results show that the distance distributions vary significantly among species and are not independent of species. The distances appear to increase from the simple prokaryote to the more complex eukaryotes. With species-specific distance distribution data, computational promoter prediction tools can be improved for higher accuracy.
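As an illustration of how a claim such as "the distance distributions are not independent of species" might be checked, the sketch below bins TSS-TLS distances per species and computes a chi-square statistic of independence on the resulting contingency table. The abstract does not state which statistical test was used, so the choice of test, the function names `bin_counts` and `chi_square_independence`, the bin edges, and the toy distances are all assumptions made for illustration. They are not data or methods from the study.

```python
from bisect import bisect_right


def bin_counts(distances, edges):
    """Count distances per bin.

    Bin 0 holds values below edges[0], bin i holds values in
    [edges[i-1], edges[i]), and the last bin holds values >= edges[-1].
    """
    counts = [0] * (len(edges) + 1)
    for d in distances:
        counts[bisect_right(edges, d)] += 1
    return counts


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    Rows are species, columns are distance bins. Assumes every row and
    every column has at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if distance bin were independent of species.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical TSS-TLS distances in base pairs (NOT data from the study).
toy_distances = {
    "species_A": [10, 25, 30, 40, 55, 60],
    "species_B": [120, 150, 200, 260, 310, 90],
}
edges = [50, 100, 200]  # hypothetical bin edges in base pairs

table = [bin_counts(d, edges) for d in toy_distances.values()]
stat, dof = chi_square_independence(table)
print(table, stat, dof)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom would argue against independence. In practice, expected counts this small would make the chi-square approximation unreliable, so real data would need many more genes per species or an exact test.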