MotifLab is a general workbench for analyzing regulatory sequence regions and
discovering transcription factor binding sites and cis-regulatory modules.
MotifLab can improve performance of binding site predictions by allowing users
to integrate several motif discovery tools as well as different types of data,
for instance phylogenetic conservation, epigenetic marks, DNase HS sites,
ChIP-Seq data, positional binding preferences of transcription factors, TF-TF
interactions, TF expression and target gene expression.
With more than 40 data-processing operations available, users can create,
import, export, transform and manipulate data objects such as DNA sequences,
numeric- or region-based data tracks, transcription factor binding motifs,
regulatory modules, background models, text- and numeric-variables, and
collections and partitions of data objects.
Various statistical analyses can be applied to evaluate or compare data
objects and results can be output in many common text-based data formats or in
HTML with images included. Interactive tools allow users to search for and
highlight interesting features of datasets, for instance potential networks of
binding sites for interacting transcription factors.
The program is written in Java and can either be run in interactive mode with
a graphical user interface providing powerful data visualization
capabilities, or the analysis to be performed can be defined in a
protocol-script and executed by MotifLab in batch-mode from a command-line
interface. The latter option allows MotifLab to be incorporated as a component
in larger data processing pipelines.
- Automatically obtain sequence data from a list of gene identifiers or
specified genomic intervals.
- Annotate your sequences with data downloaded from internet resources
such as the UCSC Genome Browser or DAS servers with the click of a button,
or load your own annotation data from files.
- Import and export data in a variety of supported formats, including FASTA,
GFF, GTF, BED, WIG, PSP and CisML (for sequence data) and Transfac, Jaspar,
TAMO, MEME minimal motif, XMS, INCLUSive and raw PSSM format (for motif
data). Some data objects can even be output to formatted HTML with images.
- Transform numeric data objects with
arithmetic operations (add, subtract, multiply, divide) or other mathematical
functions such as logarithms, rounding functions and many more.
- Apply sliding window functions to numeric
tracks to smooth data or to detect edges, peaks and valleys.
- Mask portions of DNA sequences by changing
the letter case or by replacing the bases with other letters such as N
or X.
- Combine information from several tracks into
new tracks.
- Most data objects can be edited manually
within the GUI by altering their values in dialog boxes. You can also edit
sequence annotation tracks by drawing directly into the sequence browser.
- Import collections of motifs from databases like TRANSFAC and
JASPAR with the click of a button, and scan your sequences to find
potential matching sites.
- Run external de novo motif
discovery methods from within MotifLab to search for novel motifs.
- Create positional priors tracks to
guide motif discovery methods in their search, and use machine learning to
train classifiers that can automatically generate such tracks based on
information from various features.
- Compare newly discovered motifs to
existing motif libraries using different comparison measures.
- Use biological knowledge and annotation data,
such as conservation, DNase HS sites and ChIP-Seq data, to filter out likely
false predictions.
- Use ensemble methods to compare and combine
predictions made by different algorithms.
- Annotate your motifs with additional
useful information, such as the names of transcription factors binding to
the motif, which organisms and tissues these TFs are expressed in, and
references to other factors that are known to interact with factors binding
to the motif.
- Search your sequences for recurring
combinations of binding motifs that could represent cis-regulatory modules.
- Or define your own modules manually and scan
sequences to see if they contain these modules.
- Impose constraints on the motifs appearing in
a module, such as the order of the motifs, the distance between them or
their relative orientation.
- You can also generate a library of modules
automatically based on information about known interaction partners for each
motif (provided that your motifs have such annotations).
- Count the number of times each motif occurs
in your sequences and compare these numbers to expected frequencies to identify
motifs that are significantly overrepresented in your dataset.
- Compare one subset of your sequences to
another to see if some motifs appear with higher frequency in one of the
subsets.
- Use linear regression to identify the motifs
that correlate best with gene expression.
- Find which motifs have the highest average
conservation level across all their binding sites.
- Analyse the positional distribution of
binding sites to see if some motifs tend to appear at a certain distance
relative to the transcription start site.
- Compare different data tracks (or other
objects) to each other to see how well they correlate.
- Evaluate how well different features can be
used as predictors of others.
- Combine results from several different
analyses into larger meta-analyses to simplify comparisons.
- Sequences, along with any annotation data
tracks and motif prediction tracks, are visualized in an internal sequence browser.
Since all data is kept in memory at all times, the browser supports very fast
navigation, with panning to show different parts of the sequences and zooming
to any scale.
- The visualization is highly customizable. Numeric tracks can be displayed as either
graphs or heatmaps (color gradients) and region tracks can be displayed
in "compact mode" with overlapping regions on top of each other or in
"expanded mode" with overlapping regions beneath each other.
- Tracks containing motif and module predictions
have special status. At 1000% scale and higher, motif binding regions can
be shown with superimposed "match logos" that convey information about
both the appearance of the binding motif itself and also how well the motif
matches the sequence in that region.
- The visibility of individual motifs, modules,
sequences or data tracks can be easily altered to show only the parts
that you are most interested in focusing on at any moment.
- Interactive tools can be used to search for
motifs and highlight binding sites with selected properties, for
instance motifs that match the consensus sequence "CACGTG", motifs
that bind the transcription factor "CREB" or binding sites where the average
conservation level within the site is greater than some dynamically chosen
cutoff value.
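
The conservation filter described in the last item can be sketched in a few
lines of Python. This is only an illustration of the idea, not MotifLab code
or its API: the Site class, the function names and the toy data are all
hypothetical, and the conservation track is assumed to hold one numeric value
per base.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass(frozen=True)
class Site:
    """A predicted binding site (hypothetical type, not a MotifLab class).

    Coordinates are 0-based; start is inclusive and end is exclusive.
    """
    motif: str
    start: int
    end: int


def average_conservation(site: Site, track: Sequence[float]) -> float:
    """Mean of the per-base conservation values covered by the site."""
    return mean(track[site.start:site.end])


def filter_by_conservation(
    sites: Sequence[Site], track: Sequence[float], cutoff: float
) -> List[Site]:
    """Keep only the sites whose average conservation is strictly above cutoff."""
    return [s for s in sites if average_conservation(s, track) > cutoff]


# Hypothetical toy data: one conservation value per base of a 12 bp sequence.
conservation = [0.1, 0.2, 0.9, 0.8, 1.0, 0.9, 0.7, 0.8, 0.1, 0.0, 0.2, 0.1]
sites = [Site("CREB", 2, 8), Site("MYC", 6, 12)]

kept = filter_by_conservation(sites, conservation, cutoff=0.5)
print([s.motif for s in kept])
```

With this toy data, the first site covers the well-conserved stretch and is
kept, while the second site extends into poorly conserved bases and is
dropped. Raising or lowering the cutoff argument mimics the dynamically chosen
cutoff value mentioned above.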