genePI was developed as part of the diploma project "Supervised classification of regulatory regions in the human genome" by Petra Schwalie, under the supervision of Prof. Dr. Finn Drabløs. We showed that a classification system based on internal DNA properties, without additional promoter-specific motif data, can correctly distinguish promoter regions from other genomic regions. A support vector
machine classifier was trained and tested using feature vectors based on predicted DNA properties, nucleotide
content, sequence complexity and repeat data. Our approach for gene promoter identification (genePI) could
successfully recognise all promoter classes, showing high sensitivity on general promoters, miRNA promoters and
bidirectional promoters. Overall, it performed better than the methods FirstEF, Eponine, FProm, ARTS, ProStar, Ep3 and ProSOM.
The genePI strategy can reliably identify promoter regions in the human genome. These predictions
can subsequently be used by other tools to predict, e.g., transcription start sites and transcription factor
binding sites. In addition, this classification approach can serve as a general strategy for classifying functional regions
in DNA, as we have shown for enhancers. Provided that a high-quality training set with distinct DNA properties is available, a classifier capable of
recognising new functional elements of the same type can be developed.
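To make the idea of a feature vector more concrete, here is a minimal sketch of how a DNA sequence could be turned into a fixed-length numeric vector covering two of the feature groups mentioned above: nucleotide content and sequence complexity. The specific features chosen here (mono- and dinucleotide frequencies, the CpG observed/expected ratio and linguistic complexity) and all function names are illustrative assumptions, not genePI's documented feature set; the predicted DNA properties and repeat data that genePI also uses are not reproduced.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def nucleotide_content(seq: str) -> dict:
    """Mono- and dinucleotide frequencies; positions that are not A/C/G/T (e.g. N) are skipped."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2] for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values()) or 1  # avoid division by zero on empty input
    n_di = sum(di.values()) or 1
    feats = {b: mono[b] / n_mono for b in BASES}
    feats.update({a + b: di[a + b] / n_di for a, b in product(BASES, repeat=2)})
    return feats


def cpg_observed_expected(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    length = sum(seq.count(b) for b in BASES)
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * length / (c * g)


def linguistic_complexity(seq: str, max_k: int = 6) -> float:
    """Product over k of observed distinct k-mers / maximum possible distinct k-mers.

    Only k-mers made entirely of A/C/G/T are counted. Values near 1 indicate
    high complexity; low-complexity repeats score close to 0.
    """
    seq = seq.upper()
    score = 1.0
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        if n_windows <= 0:
            break
        observed = {
            seq[i:i + k] for i in range(n_windows)
            if all(b in BASES for b in seq[i:i + k])
        }
        score *= len(observed) / min(4 ** k, n_windows)
    return score


def feature_vector(seq: str) -> list:
    """Concatenate the features above into one fixed-length vector (4 + 16 + 2 = 22 values)."""
    content = nucleotide_content(seq)
    vec = [content[key] for key in sorted(content)]
    vec.append(cpg_observed_expected(seq))
    vec.append(linguistic_complexity(seq))
    return vec


if __name__ == "__main__":
    print(len(feature_vector("ACGTTGCAACGCGT")))  # 22
```

Because every sequence maps to a vector of the same length, vectors from positive (e.g. promoter) and negative training sequences could then be passed to any standard SVM implementation, for instance scikit-learn's `sklearn.svm.SVC`; which kernel and parameters genePI itself used is not stated here.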
All datasets used for training and testing genePI, as well as for developing the enhancer classifier, can be downloaded here. In the promoter directory, CAGE data are stored separately, and the positive sequence files are named according to the pattern human_strand_cage-category.txt.
Bioinformatics & Gene Regulation
Department of Cancer Research and Molecular Medicine
Norwegian University of Science and Technology
Trondheim, Norway