About genePI

genePI was developed as part of the diploma project "Supervised classification of regulatory regions in the human genome" by Petra Schwalie, under the supervision of Prof. Dr. Finn Drabløs. We showed that a classification system based on internal DNA properties, without additional promoter-specific motif data, is able to correctly distinguish promoter regions from other genomic regions. A support vector machine classifier was trained and tested using feature vectors based on predicted DNA properties, nucleotide content, sequence complexity and repeat data. Our approach for gene promoter identification (genePI) successfully recognised all promoter classes, showing high sensitivity on general promoters, miRNA promoters and bidirectional promoters. In general, it performed better than the methods FirstEF, Eponine, FProm, ARTS, ProStar, Ep3 and ProSOM.
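As a rough illustration of what a sequence-derived feature vector can look like, the sketch below computes nucleotide content and a simple sequence complexity measure for a DNA string. It is not the genePI feature set: the function names (gc_content, kmer_entropy, feature_vector) and the choice of dinucleotide Shannon entropy as the complexity measure are assumptions made for this example, and the predicted DNA properties and repeat data mentioned above are left out.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution.

    Used here as a stand-in complexity measure (an assumption of this
    sketch); low values indicate repetitive, low-complexity sequence.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def feature_vector(seq: str) -> list:
    """Hypothetical feature vector: A, C, G, T frequencies, GC content,
    and dinucleotide entropy, in that order."""
    seq = seq.upper()
    n = len(seq) or 1  # avoid division by zero on empty input
    base_freqs = [seq.count(base) / n for base in "ACGT"]
    return base_freqs + [gc_content(seq), kmer_entropy(seq, 2)]


if __name__ == "__main__":
    # Toy sequences only; real input would be genomic regions.
    for example in ("GGCGCGCCGGGC", "ATATATATATAT"):
        print(example, feature_vector(example))
```

Vectors of this kind, computed for known promoter and non-promoter regions, could then be passed to any support vector machine implementation for training and testing.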

The genePI strategy can reliably identify promoter regions in the human genome. These predictions can subsequently be used by other tools, for example to predict transcription start sites and transcription factor binding sites. In addition, this classification approach can serve as a general strategy for classifying functional regions in DNA, as we have shown for enhancers. Provided that a high-quality training set with distinct DNA properties is available, a classifier capable of recognising new functional elements of the same type can be developed.


Performance measure of several promoter prediction tools including genePI



All datasets used for training and testing genePI, as well as for developing the enhancer classifier, can be downloaded here. In the promoter directory, the CAGE data is stored separately, and the positive sequence files are named human_strand_cage-category.txt.


Bioinformatics & Gene Regulation
Department of Cancer Research and Molecular Medicine
Norwegian University of Science and Technology
Trondheim, Norway