Protocol |
Description |
Notes |
Motif Count |
This protocol will perform motif scanning and output a table showing
the number of times each motif is found in the sequences.
An overrepresentation p-value is calculated for each motif by comparing
the motif's observed frequency in the sequences to its expected frequency
based on the number of times it is predicted in a scrambled version of
the original sequences (i.e. randomly created artificial sequences with
the same oligonucleotide composition as the original sequences).
|
|
Estimate expected motif occurrence frequencies |
This protocol will generate 50 random DNA sequences with specified length
based on a chosen background model and then perform motif scanning in these
sequences and calculate the occurrence frequency of each motif. These
frequencies can be used as "expected frequencies" to calculate p-values for
overrepresentation with the "count motif occurrences" analysis. Note that the
scanning method and settings used for these artificial control sequences
should be the same as those used when analysing the actual target sequences.
|
|
TFBS filtering |
This protocol performs motif scanning in a set of sequences with motifs from
TRANSFAC and then proceeds to filter out predictions according to different
criteria, such as binding sites that are not conserved, binding sites
overlapping with known repeat regions, binding sites that are not located
within a DNase hypersensitivity site, binding sites that are not supported by
a ChIP-seq peak region for the corresponding TF, or binding sites that do not
have sites for known interaction partners within a specified distance.
|
|
Motif Discovery Benchmark |
This protocol will generate 20 random DNA sequences with specified length
based on a chosen background model and then plant up to 5 selected motifs at
random locations in these sequences. A few de novo motif discovery methods
will be run to predict the locations of the motifs in the sequences, and the
performance of the methods will be evaluated. Note that in order to use this
protocol you must have installed/configured the motif discovery methods used
here (or rewrite the protocol to use other methods instead).
|
|
Forskolin analysis |
This is the protocol (slightly modified) which was used for the third use case
example presented in the MotifLab publication to identify interesting motifs
and binding sites in promoter regions of genes whose expression was
significantly changed in response to treatment with forskolin. The protocol
performs motif scanning and then finds motifs that are significantly
overrepresented compared to an expected motif frequency, as well as finding
motifs that have a high average conservation across all binding sites and
motifs that tend to appear in the same location relative to the TSS in several
sequences. Finally, results from these three different analyses are collated
into a larger analysis and the motifs are ranked according to the combined
rank sum of these three properties.
|
|
Simple use of positional priors to guide motif discovery |
This protocol demonstrates how some simple numeric tracks, where the value in
each position correlates with the probability of observing a binding site in
that position, can be used as "positional priors" tracks that can guide motif
discovery programs towards the target motifs. Tracks such as Conservation,
DNase hypersensitivity tracks and ChIP-seq tracks can be used either directly
or with minimal processing (depending on the motif discovery method used).
The protocol also demonstrates how operations can be used to manually create
a positional priors track with a specific search focus, namely searching for
additional motifs in neighborhood regions around known binding sites.
|
|
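Two calculations described in the table above can be illustrated outside MotifLab. The following Python sketch is not MotifLab code and does not reproduce its implementation. It assumes a one-sided binomial test for the overrepresentation p-value (the descriptions above do not say which test is used) and a plain sum of per-property ranks for the combined ranking in the forskolin protocol. All function names, parameters and example values are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def overrepresentation_pvalue(observed_hits, positions_scanned, expected_frequency):
    """Assumed test: chance of seeing at least `observed_hits` matches when each
    scanned position matches with probability `expected_frequency` (for example,
    the per-position frequency measured in scrambled or random control sequences)."""
    return binomial_upper_tail(observed_hits, positions_scanned, expected_frequency)


def combined_rank_order(property_scores):
    """Rank motifs by the sum of their ranks over several properties.

    `property_scores` is a list of dicts mapping motif name -> score, where a
    higher score is better and every motif appears in every dict. Ties are
    broken alphabetically, which is a simplification.
    """
    rank_sums = {}
    for scores in property_scores:
        ordered = sorted(scores, key=lambda m: (-scores[m], m))
        for rank, motif in enumerate(ordered, start=1):
            rank_sums[motif] = rank_sums.get(motif, 0) + rank
    return sorted(rank_sums, key=lambda m: (rank_sums[m], m))


if __name__ == "__main__":
    # Hypothetical numbers: 30 hits in 1000 scanned positions, expected rate 1%.
    print(overrepresentation_pvalue(30, 1000, 0.01))
    # Hypothetical scores for overrepresentation, conservation and positional bias.
    print(combined_rank_order([
        {"A": 5.0, "B": 3.0, "C": 1.0},
        {"A": 0.9, "B": 0.2, "C": 0.5},
        {"A": 0.7, "B": 0.8, "C": 0.1},
    ]))
```

The expected frequency passed to `overrepresentation_pvalue` only makes sense if the control sequences were scanned with the same method and settings as the target sequences, as noted for the "Estimate expected motif occurrence frequencies" protocol.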