Journal of Probability and Statistics — An Open Access Journal

Journal of Probability and Statistics
Volume 2012 (2012), Article ID 830575, 18 pages
http://dx.doi.org/10.1155/2012/830575

Research Article

Finding Transcription Factor Binding Motifs for Coregulated Genes by Combining Sequence Overrepresentation with Cross-Species Conservation

Hui Jia¹ and Jinming Li¹,²

¹School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551
²Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou Dadao Bei 1838, Guangzhou 510515, China

Received 1 March 2012; Accepted 29 April 2012

Academic Editor: Xiaohua Douglas Zhang

Copyright © 2012 Hui Jia and Jinming Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Novel computational methods for finding transcription factor binding motifs have long been sought because identifying them experimentally is tedious. However, the currently prevailing methods yield a large number of false positive predictions because transcription factor binding sites (TFBSs) are short and variable. We propose here a method that combines sequence overrepresentation and cross-species sequence conservation to detect TFBSs in the upstream regions of a given set of coregulated genes. We applied the method to 35 S. cerevisiae transcription factors with known DNA binding motifs (using orthologous sequences from the genomes of S. mikatae, S. bayanus, and S. paradoxus), and it outperformed the single-genome-based motif finding methods MEME and AlignACE as well as the multiple-genome-based methods PHYME and Footprinter for the majority of these transcription factors. Compared with the prevailing motif finding software, our method has some advantages in finding transcription factor binding motifs for potentially coregulated genes when the upstream sequences of the genes are available for multiple closely related species. Although we used yeast genomes to assess our method in this study, it might also be applied to other organisms, provided that suitable related species exist and the upstream sequences of the coregulated genes can be obtained for them.