Motif and Module Collections


This page contains collections of motifs and modules from published papers or online motif databases.
Note that there is a substantial degree of overlap between some of the collections.

Import motif collection from file
Right-click the link of a collection below and select "Save Target As..." (Internet Explorer) or "Save Link As..." (Firefox) from the context menu to save the collection file to your local file system. In MotifLab, select "Add New ⇒ Motif Collection" from the Data menu. In the collection dialog, go to the "Import" tab and press the "Browse" button to locate the file you just saved. Set the format to "MotifLabFormat".

Import motif collection from URL
Right-click the link of a collection below and select "Copy Shortcut..." (Internet Explorer) or "Copy Link Location..." (Firefox) from the context menu. In MotifLab, select "Add New ⇒ Motif Collection" from the Data menu. In the collection dialog, go to the "Import" tab and paste the URL you just copied into the "File or URL" textbox (using CTRL+V on Windows or ⌘+V on Mac). Set the format to "MotifLabFormat".


Collection Description


JASPAR (full)


CORE
FAM
PBM
PBM_HLH
PBM_HOMEO
PHYLOFACTS
POLII
SPLICE
Motifs from the 2016 update of the popular JASPAR collection. The full collection contains all motifs from both the CORE collection and the specialized collections, including duplicate models for updated motifs. For instance, the model "MA0003" is included in three versions "MA0003.1", "MA0003.2" and "MA0003.3". Since MotifLab does not allow dots in motif identifiers, the dots have been replaced by underscores (e.g. "MA0003_3").

The CORE and specialized collections only contain the latest version of each motif, and the version suffix has been stripped from the identifiers (but the original identifier with suffix is kept in the property "JASPAR").
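
The two identifier conventions described above can be illustrated with a short Python sketch. The function names are hypothetical and are not part of MotifLab or JASPAR; the sketch only assumes that the original identifiers have the form "name.version", as in "MA0003.3".

```python
def to_full_collection_id(jaspar_id: str) -> str:
    """Replace dots with underscores, as done for the full collection."""
    return jaspar_id.replace(".", "_")


def to_latest_version_id(jaspar_id: str) -> str:
    """Drop the version suffix, as done for the CORE and specialized collections."""
    # partition() splits at the first dot; an identifier without a dot is returned unchanged.
    base, _, _version = jaspar_id.partition(".")
    return base


print(to_full_collection_id("MA0003.3"))  # MA0003_3
print(to_latest_version_id("MA0003.3"))   # MA0003
```

In the CORE and specialized collections the original identifier, including its version suffix, remains available in the "JASPAR" property, so the stripped form loses no information.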

Citation:
Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., Zhang, A., Parcy, F., Lenhard, B., Sandelin, A. and Wasserman, W. (2015) "JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles", Nucleic Acids Research 44: D110-D115 doi:10.1093/nar/gkv1176


HOCOMOCO v10

Human
Mouse
Combined
A collection of 641 human and 427 mouse mononucleotide count matrices from the HOCOMOCO motif database (version 10, September 2015). Each motif has a "PWM quality" rating from "A" (best) to "D" (worst), or the letter "S" to denote a secondary (mostly single-box) model, which allows two models for the same TF. All motifs for both human and mouse have identifiers of the form "Mnnnnn", where the 5-digit incremental number starts at 01001. In the combined collection, the human motif identifiers have been renamed to start with the prefix "MH" and the mouse motif identifiers to start with "MM" to avoid overlapping identifiers.
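
The reason for the prefixes in the combined collection can be seen in a small Python sketch. It is only an illustration: the function name and the dictionary layout are hypothetical, and it assumes that the leading "M" of each identifier is replaced by the species prefix (so "M01001" becomes "MH01001" or "MM01001"), which the description above does not state explicitly.

```python
def combine_collections(human: dict, mouse: dict) -> dict:
    """Merge two collections keyed by motif identifier without losing entries."""
    combined = {}
    for prefix, collection in (("MH", human), ("MM", mouse)):
        for motif_id, motif in collection.items():
            if not motif_id.startswith("M"):
                raise ValueError(f"unexpected identifier: {motif_id!r}")
            # Assumption: the leading "M" is replaced by the species prefix.
            combined[prefix + motif_id[1:]] = motif
    return combined


# Both species number their motifs from 01001, so the raw identifiers collide.
human = {"M01001": "human model"}
mouse = {"M01001": "mouse model"}

combined = combine_collections(human, mouse)
print(sorted(combined))  # ['MH01001', 'MM01001']
```

Without the renaming step, a plain merge of the two dictionaries would keep only one of the two "M01001" entries.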

Citation:
Ivan V. Kulakovskiy, Ilya E. Vorontsov, Ivan S. Yevshin, Anastasiia V. Soboleva, Artem S. Kasianov, Haitham Ashoor, Wail Ba-alawi, Vladimir B. Bajic, Yulia A. Medvedeva, Fedor A. Kolpakov and Vsevolod J. Makeev (2016) "HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models", Nucleic Acids Research 44(D1): D116-D125 doi:10.1093/nar/gkv1249


RegulonDB
The RegulonDB collection contains 82 motifs from Escherichia coli K-12.

Citation:
Gama-Castro S, Salgado H, Santos-Zavaleta A, Ledezma-Tejeida D, Muñiz-Rascado L, García-Sotelo JS, Alquicira-Hernández K, Martínez-Flores I, Pannier L, Castro-Mondragón JA, Medina-Rivera A, Solano-Lira H, Bonavides-Martínez C, Pérez-Rueda E, Alquicira-Hernández S, Porrón-Sotelo L, López-Fuentes A, Hernández-Koutoucheva A, Moral-Chávez VD, Rinaldi F, Collado-Vides J. (2016) "RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond", Nucleic Acids Research 44(D1): D133-43 doi:10.1093/nar/gkv1156


PLACE
PLACE is a database of cis-acting regulatory DNA elements in plants.
The collection contains 469 motifs.

Citation:
Kenichi Higo, Yoshihiro Ugawa, Masao Iwamoto and Tomoko Korenaga (1999) "Plant cis-acting regulatory DNA elements (PLACE) database: 1999", Nucleic Acids Research 27(1): 297-300 doi:10.1093/nar/27.1.297


HOMER
The HOMER motif collection contains 332 models based mostly on ChIP-Seq data.

Citation:
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK (2010) "Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities", Molecular Cell 38(4): 576-89 doi:10.1016/j.molcel.2010.05.004


Jolma
A collection of 843 motif models based on high-throughput SELEX experiments in humans.

Citation:
Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, Palin K, Vaquerizas JM, Vincentelli R, Luscombe NM, Hughes TR, Lemaire P, Ukkonen E, Kivioja T, Taipale J. (2013) "DNA-binding specificities of human transcription factors", Cell 152(1-2): 327-39 doi:10.1016/j.cell.2012.12.009


Kellis
A collection of 2065 known and discovered motif models based on ENCODE2 ChIP-Seq data, produced by the Kellis lab at MIT.

Citation:
Pouya Kheradpour and Manolis Kellis (2013) "Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments", Nucleic Acids Research 42(5): 2976-2987 doi:10.1093/nar/gkt1249


3D-footprint
A collection of 1118 motifs for various species obtained from the 3D-footprint database.

Citation:
Contreras-Moreira, B. (2010) "3D-footprint: a database for the structural analysis of protein-DNA complexes", Nucleic Acids Research 38: D91-D97 doi:10.1093/nar/gkp781


CIS-BP

renamed
CIS-BP is a meta-collection that includes binding motifs from several other collections, such as TRANSFAC, JASPAR and HOCOMOCO, as well as from other publications. It contains 5099 motif models in all, although many are duplicate models for the same TFs. The "renamed" collection is identical to the regular collection except that the motif identifiers (originally of the form "Mnnnn") have been changed to "CISBPnnnn".
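
As a minimal sketch of the renaming used for the "renamed" collection, the Python snippet below rewrites an identifier of the form "Mnnnn" to "CISBPnnnn". The function name is hypothetical, and the pattern assumes exactly four digits, following the "Mnnnn" notation used above.

```python
import re

# "M" followed by four digits, matching the "Mnnnn" notation.
CISBP_ID = re.compile(r"M(\d{4})")


def rename_cisbp_id(motif_id: str) -> str:
    """Return the "CISBPnnnn" form of an "Mnnnn" identifier."""
    match = CISBP_ID.fullmatch(motif_id)
    if match is None:
        raise ValueError(f"not an identifier of the form Mnnnn: {motif_id!r}")
    return "CISBP" + match.group(1)


print(rename_cisbp_id("M0042"))  # CISBP0042
```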

Citation:
Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G, Walhout AJ, Bouget FY, Ratsch G, Larrondo LF, Ecker JR, Hughes TR. (2014) "Determination and inference of eukaryotic transcription factor sequence specificity", Cell 158(6): 1431-43 doi:10.1016/j.cell.2014.08.009