Motivation: The computational identification of genomic regulatory elements is the basis
for the large-scale investigation of regulatory networks and of the general principles
underlying genome organization and function. The complex and underspecified nature
of most regulatory elements requires the development of ad hoc computational methods
for their identification. Our understanding of complex biological adaptive systems, from
the cellular to the molecular level, can prove valuable in this context. An example is the
SNN-GA machine learning framework, which combines spiking neural networks, able to
realistically model neurological systems, with genetic algorithms, used to tune the
network parameters during the learning process.
Results: We present a description and an initial evaluation of an SNN-GA system for the
computational prediction of transcription factor binding sites (TFBS) in DNA sequences.
The goal of our work is to develop a classifier able to reduce the number of false
positives in the predicted TFBSs, through a more accurate modeling of the information
contained in the alignments in the training data, with the potential to elucidate complex
interactions connected to transcriptional and epigenetic regulation. Our method
represents the structure of a TFBS as a neural network that is trained using real TFBS
data from the TRANSFAC® database and appropriately generated negative examples.
Compared with classical TFBS representation methods, this approach is able to take
dependencies between nucleotide positions into account, and we have therefore defined
four alternative network structures to represent TFBSs, based on different hypotheses
about their internal structures. We have performed cross-validation experiments to
compare the performance of our method against established ones such as MATCH and
MAPPER. The results show that our classifier has the potential to attain very high
classification accuracy, and they allow us to identify the network structure offering the best