A steganographic approach to genome-wide motif finding and its applications
Weixiong Zhang
Department of Computer Science and Engineering, Department of Genetics
Washington University in St. Louis, USA


Understanding transcriptional regulation is critical for elucidating complex biological processes and human diseases. Transcriptional regulation is largely determined by the binding of transcription factors (TFs) to TF binding motifs (TFBMs). Because TFs often act synergistically and form complexes, TFBMs often appear in modules. Although many motif-finding methods have been developed, discovering TFBMs and motif modules remains a challenge, particularly on a genome-wide scale.

We approach the problem of discovering TFBMs from a steganographic perspective, in which secret messages (motifs) are embedded in a stegoscript (the genome). I will first describe WordSpy, an efficient algorithm for genome-wide motif finding. I will then turn to the problem of motif-module discovery, discuss our WordModuler algorithm for this problem, and present results on cis-element modules involved in yeast cell-cycle regulation.
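The abstract does not spell out WordSpy's statistical model, so the sketch below is NOT WordSpy. It only illustrates, under simplifying assumptions, the basic intuition behind the steganographic view: "hidden" words (candidate motifs) occur in the sequence far more often than a background model predicts. Assumptions: a zero-order background (independent nucleotides, frequencies estimated from the input), a rough binomial z-score that ignores the dependence between overlapping word occurrences, and the hypothetical function name `overrepresented_words`.

```python
from collections import Counter
from math import sqrt


def overrepresented_words(seqs, k, top=5):
    """Rank k-mers by a z-score of observed vs. expected counts.

    Illustrative sketch only (not WordSpy): the background is a
    zero-order model whose nucleotide frequencies are estimated from
    the input sequences themselves.
    """
    base_counts = Counter()
    word_counts = Counter()
    windows = 0  # number of valid (ACGT-only) k-mer windows counted
    for s in seqs:
        s = s.upper()
        base_counts.update(s)
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set("ACGT"):
                word_counts[w] += 1
                windows += 1

    total = sum(base_counts[b] for b in "ACGT")
    if total == 0:
        return []
    freq = {b: base_counts[b] / total for b in "ACGT"}

    scores = {}
    for w, observed in word_counts.items():
        # Probability of the word at one position under the background.
        p = 1.0
        for b in w:
            p *= freq[b]
        expected = windows * p
        # Binomial approximation; overlapping windows make this rough.
        sd = sqrt(expected * (1 - p))
        scores[w] = (observed - expected) / sd if sd > 0 else 0.0

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For example, planting the same 6-mer once in each of 20 random 200-nt sequences and calling `overrepresented_words(seqs, 6)` should place that word at the top of the ranking, since any particular 6-mer is expected only about once by chance. A realistic motif finder would also need a higher-order background, handling of reverse complements and degenerate motifs, and correction for multiple testing.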

In the second part of my talk, I will discuss two applications of our motif identification and analysis methods. The first aims to understand the regulation of modules of co-expressed genes that are differentially expressed in the brains of patients with Alzheimer's disease. The results indicate that many genes co-expressed in diabetes, cardiovascular disease and Alzheimer's disease share the same regulatory mechanism. The second application identifies stress-responsive microRNAs in plants through a cis-element-based transcriptome analysis. Using this method, we predict 19 microRNAs in 11 microRNA families to be inducible by cold stress in the model plant Arabidopsis. Experimental validation shows that, among the eleven microRNAs, eight are differentially induced and three are expressed at constant levels under low temperature. Our results expand the number of cold-inducible microRNAs from four to eight.