MotifLab: A tools and data integration workbench for motif discovery and regulatory sequence analysis
Kjetil Klepper
Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway


Discovering binding motifs and binding sites for transcription factors is an important problem in bioinformatics, and many tools have been proposed to search for novel motifs or to scan for potential sites that match established binding motifs. Unfortunately, traditional motif discovery and scanning methods that rely only on sequence data tend to produce many false predictions. However, it has been demonstrated that additional information, such as gene expression, sequence conservation, the location of DNase hypersensitive (HS) sites and epigenetic marks, has the potential to reduce the number of spurious predictions and to discriminate between functional and non-functional binding sites. Much of the data that could prove useful for this purpose is already available at genome-wide scales, and more data for different organisms, cell types and conditions is being published at an increasing rate.

MotifLab is a general software workbench for regulatory sequence analysis designed to make it easy to incorporate different types of data into the motif discovery process. A key application of MotifLab is constructing positional priors tracks from various sequence feature annotations. Positional priors highlight those parts of the sequences that are considered more likely to contain functional binding sites. Motif discovery methods can use them to guide the search, or they can be applied in a post-processing step to filter out unpromising predictions. MotifLab can interface with several popular motif discovery tools (including MEME, Priority, MDscan, Weeder and BioProspector) to predict both individual binding sites and combinations of sites that could potentially function together (cis-regulatory modules).
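
To make the idea of a positional prior concrete, the following Python sketch combines two per-base feature tracks into a single prior and then uses it to filter predicted sites in a post-processing step. It is only an illustration under stated assumptions and does not reflect MotifLab's own interface: the function names `combine_tracks` and `filter_sites`, the weighted-average combination rule, the threshold and the toy track values are all hypothetical.

```python
from typing import List, Tuple


def combine_tracks(tracks: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted average of per-base tracks (each assumed to be scaled to [0, 1]).

    The weighted average is an assumed combination rule chosen for simplicity.
    """
    if not tracks or len(tracks) != len(weights):
        raise ValueError("need one weight per track")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all tracks must cover the same sequence length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * track[i] for track, w in zip(tracks, weights)) / total
        for i in range(length)
    ]


def filter_sites(
    sites: List[Tuple[int, int]], prior: List[float], threshold: float
) -> List[Tuple[int, int]]:
    """Keep half-open (start, end) sites whose mean prior reaches the threshold."""
    kept = []
    for start, end in sites:
        window = prior[start:end]
        if window and sum(window) / len(window) >= threshold:
            kept.append((start, end))
    return kept


# Toy data (hypothetical): two feature tracks over an 8 bp sequence.
conservation = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
dnase_hs = [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

prior = combine_tracks([conservation, dnase_hs], [1.0, 1.0])

# Two predicted sites: one inside the supported region, one outside it.
predicted_sites = [(2, 5), (5, 8)]
kept = filter_sites(predicted_sites, prior, threshold=0.5)
print(kept)  # [(2, 5)]
```

In this toy case the site at positions 2 to 4 lies in a region that is both conserved and DNase HS, so it is kept, while the site at positions 5 to 7 has no supporting evidence and is discarded. A prior-guided search would instead feed `prior` into the motif discovery method itself.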