An SNN-GA Approach for the Prediction of Transcription Factor Binding Sites
Heike Sichtig and Alberto Riva
Department of Molecular Genetics and Microbiology and UF Genetics Institute
University of Florida, USA

Motivation: The computational identification of genomic regulatory elements is the basis for large-scale investigation of regulatory networks and of the general principles underlying genome organization and function. Because most regulatory elements are complex and underspecified, identifying them requires purpose-built computational methods. Our understanding of complex adaptive biological systems, from the cellular to the molecular level, can prove valuable in this context. One example is the SNN-GA machine learning framework, which combines spiking neural networks (SNNs), capable of realistically modeling neurological systems, with genetic algorithms (GAs) that tune the network parameters during learning.
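
The abstract does not specify the neuron model, the input encoding, or the GA operators, so the following is only an illustrative sketch of the general SNN-GA idea, not the authors' system. It assumes a single leaky integrate-and-fire neuron whose synaptic weights are evolved by a simple elitist GA (one-point crossover plus Gaussian mutation), with fitness defined as the fraction of examples where the neuron fires if and only if the label is positive. All names (lif_spike_count, fitness, evolve) and parameter values (threshold, leak, population size) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Frame = Sequence[int]              # one time step: a 0/1 spike per input line
Example = Tuple[List[Frame], int]  # (input spike train, label: 1 = neuron should fire)


def lif_spike_count(weights: Sequence[float], frames: Sequence[Frame],
                    threshold: float = 1.0, leak: float = 0.9) -> int:
    """Simulate one leaky integrate-and-fire neuron and count its output spikes."""
    v = 0.0
    spikes = 0
    for frame in frames:
        # The membrane potential decays by `leak`, then integrates weighted input spikes.
        v = leak * v + sum(w * s for w, s in zip(weights, frame))
        if v >= threshold:
            spikes += 1
            v = 0.0  # reset after an output spike
    return spikes


def fitness(weights: Sequence[float], data: Sequence[Example]) -> float:
    """Fraction of examples classified correctly (fires iff label == 1)."""
    correct = sum((lif_spike_count(weights, frames) > 0) == bool(label)
                  for frames, label in data)
    return correct / len(data)


def evolve(data: Sequence[Example], n_inputs: int, pop_size: int = 30,
           generations: int = 40, mutation_sd: float = 0.3,
           seed: int = 0) -> Tuple[List[float], List[float]]:
    """Elitist GA over synaptic weights; returns (best weights, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(pop_size)]
    history: List[float] = []
    n_elite = max(2, pop_size // 5)
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, data), reverse=True)
        history.append(fitness(ranked[0], data))
        elite = ranked[:n_elite]
        children = [list(w) for w in elite]  # elites survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_inputs) if n_inputs > 1 else 0
            # One-point crossover followed by Gaussian mutation of every weight.
            children.append([w + rng.gauss(0.0, mutation_sd) for w in a[:cut] + b[cut:]])
        pop = children
    best = max(pop, key=lambda w: fitness(w, data))
    return best, history
```

For example, with `data = [([[1, 0]] * 3, 1), ([[0, 1]] * 3, 0)]`, the hand-picked weights `[1.5, -1.0]` make the neuron fire three times on the positive train and never on the negative one, and `evolve(data, n_inputs=2)` searches for weights with the same behavior. Keeping the elite unchanged guarantees that the best fitness never decreases from one generation to the next.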

Results: We present a description and an initial evaluation of an SNN-GA system for the computational prediction of transcription factor binding sites (TFBSs) in DNA sequences. The goal of our work is to develop a classifier that reduces the number of false-positive TFBS predictions through more accurate modeling of the information contained in the training alignments, with the potential to elucidate complex interactions involved in transcriptional and epigenetic regulation. Our method represents the structure of a TFBS as a neural network trained on real TFBS data from the TRANSFAC® database and on appropriately generated negative examples. Unlike classical TFBS representation methods, this approach can take dependencies between nucleotide positions into account; we therefore defined four alternative network structures to represent a TFBS, each based on a different hypothesis about its internal structure. We performed cross-validation experiments to compare the performance of our method against established tools such as MATCH and MAPPER. The results show that our classifier has the potential to attain very high classification accuracy, and they allowed us to identify the network structure offering the best predictive performance.
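
The claim that a network can capture dependencies between nucleotide positions can be illustrated with a toy alignment. The sketch below is a hedged illustration, not the authors' encoding or data pipeline. It assumes a one-hot input encoding for the network, negatives generated by shuffling positive sites (the abstract only says negatives are "appropriately generated"), and a log-odds position weight matrix as one common example of a position-independent representation. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

BASES = "ACGT"


def one_hot(site: str) -> List[int]:
    """Flatten a site into a 4*L binary vector (A, C, G, T indicators per position)."""
    vec: List[int] = []
    for base in site.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def shuffled_negatives(sites: Sequence[str], seed: int = 0) -> List[str]:
    """Hypothetical negative generator: shuffle each positive site.
    Keeps base composition but destroys positional structure (a shuffle can
    occasionally reproduce the original site; a real pipeline would filter these)."""
    rng = random.Random(seed)
    negatives = []
    for site in sites:
        chars = list(site)
        rng.shuffle(chars)
        negatives.append("".join(chars))
    return negatives


def pwm_log_odds(sites: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Position-independent log2-odds matrix against a uniform (0.25) background."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        matrix.append({b: math.log2((counts[b] + pseudocount) / total / 0.25) for b in BASES})
    return matrix


def pwm_score(matrix: List[Dict[str, float]], site: str) -> float:
    """Sum of per-position scores: each position contributes independently."""
    return sum(col[b] for col, b in zip(matrix, site))


def pair_counts(sites: Sequence[str], i: int, j: int) -> Counter:
    """Joint counts for positions i and j, which a PWM cannot represent."""
    return Counter((s[i], s[j]) for s in sites)
```

In the toy alignment `sites = ["AT", "TA", "AT", "TA"]`, the two positions are perfectly coupled. Yet `pwm_score(pwm_log_odds(sites), "AA")` equals the score of the observed site "AT", because each column sees A and T equally often; `pair_counts(sites, 0, 1)` shows that "AA" never occurs. A model over the joint `one_hot` input, such as a neural network, can in principle learn this coupling.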