Gary Stormo

Department of Genetics, Center for Genome Sciences
Washington University Medical School

30 years of PWMs: Where are we now and what comes next?

Position weight matrices (PWMs, also called weight matrices or position-specific scoring matrices, PSSMs) were introduced in a 1982 paper as a method to represent the specificity of a nucleic acid binding protein. Within a few years they became the most commonly used representation, and at the same time many alternative methods for determining PWMs emerged, from fitting experimental data to statistical analyses of collections of binding sites to the development of several motif discovery algorithms. Over the same period the limitations of PWMs became apparent and various alternatives were developed to provide better representations, although limited data left PWMs as the primary model because of their relatively small number of parameters. Current high-throughput experimental methods allow us to address, in detail and for many different factors, two important questions: how accurate are their PWM representations, and how can better representations be developed? This talk will cover some history, some current work and some speculations about the future.
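As an illustration of the representation discussed in this abstract, the following is a minimal sketch of one common way to build and use a PWM: log-odds scores estimated from aligned binding sites with pseudocounts, then summed position by position over a sequence window. The function names, the pseudocount of 1, the uniform background and the example sites are assumptions made for illustration; this is not the method of the 1982 paper or of any particular tool.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Return a log-odds PWM (one dict per position) from equal-length aligned sites."""
    # Assumed uniform background unless one is supplied.
    background = background or {base: 0.25 for base in ALPHABET}
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background frequency.
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in ALPHABET})
    return pwm

def score(pwm, window):
    """Additive score: each position contributes independently of the others."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical aligned sites, for illustration only.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_pwm = build_pwm(example_sites)
```

A matrix of width w needs only about 3w free parameters, which is the small parameter count the abstract refers to. The independence between positions built into `score` is the kind of limitation that alternative representations try to relax.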

Qing Zhou

Department of Statistics
University of California, Los Angeles.

Constructing sparse binding landscapes by penalized posterior sampling.

We develop a penalized posterior sampling method to construct binding landscapes of DNA-binding factors from ChIP-seq data, nucleosome occupancy models, and often a large set of position-specific weight matrices (PWMs). The method uses penalty counts to achieve a sparse selection of PWMs and a more accurate prediction of binding site locations. Applications to mouse ChIP-seq data demonstrate the effectiveness of this method compared to other scanning-based approaches.

Saurabh Sinha

Department of Computer Science
University of Illinois at Urbana-Champaign.

Modeling transcription factor occupancy profiles in Drosophila.

Transcriptional regulation is the result of transcription factors (TFs) binding the DNA and interacting with the basal transcriptional machinery, and with each other, to regulate gene expression. In recent years, chromatin immunoprecipitation (ChIP)-based genome-wide assays of TF occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. With the availability of ChIP-chip and ChIP-seq data sets, attempts have been made to correlate occupancy profiles to the TF’s binding specificity as described by a motif. The ultimate goal is to be able to model a TF’s occupancy profile in any given cellular condition. In this talk, I will present our ongoing work towards this goal. We have analyzed TF-ChIP data sets in Drosophila using our statistical thermodynamics-based model of occupancy. While the baseline model uses only the TF’s motif in predicting ChIP profiles, we incorporated additional features into the model, including TF concentration, competition and cooperation between TFs, and DNA accessibility, to test whether these features make a significant contribution to measured occupancy. We find evidence for a variety of factors influencing TF-DNA interaction, some common across many ChIP data sets and some more specific to a few data sets.
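To make the idea of a thermodynamics-based occupancy model concrete, here is a minimal textbook-style sketch for a single site. It is not the model used in this work. It assumes a two-state (bound or unbound) equilibrium in which the statistical weight of the bound state is concentration times exp(-energy), with energy in units of kT. The `accessibility` multiplier and all function names are hypothetical choices for illustration.

```python
import math

def site_occupancy(energy, concentration, accessibility=1.0):
    """Probability that one site is bound by one factor at equilibrium.

    energy: binding energy in units of kT (lower means stronger binding).
    accessibility: assumed multiplicative scaling of the bound-state weight.
    """
    weight = accessibility * concentration * math.exp(-energy)
    return weight / (1.0 + weight)

def competitive_occupancy(weights):
    """Occupancy of one site by several mutually exclusive factors.

    weights: dict mapping factor name to its bound-state statistical weight.
    The unbound state has weight 1, so the probabilities sum to less than 1.
    """
    partition = 1.0 + sum(weights.values())
    return {name: w / partition for name, w in weights.items()}
```

In this sketch, raising the concentration or the accessibility raises occupancy, while adding a competitor lowers it. These are the kinds of effects the abstract describes testing against measured ChIP profiles.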

Remo Rohs

Molecular and Computational Biology Program
Departments of Biological Sciences and Chemistry
University of Southern California

New approaches to genome analysis based on the integration of DNA sequence and shape

High-throughput sequencing technologies continue to produce large amounts of DNA sequence information. The availability of whole genomes has dramatically changed biological and biomedical research and our understanding of cellular functions, biological processes, and disease. Whereas analyzing the genome as a linear one-dimensional string of letters provides answers to many biological questions, proteins recognize DNA as a three-dimensional object. Considering DNA as a double helix with sequence-dependent shape enables the biophysical characterization of protein-DNA readout. This presentation describes new approaches to genome analysis based on the integration of sequence and shape, including the evolutionary relationships between transcription factor binding sites, motif search, and de novo motif discovery.

Wei Wang

Department of Chemistry and Biochemistry
University of California, San Diego.

Delineation of epigenetic landscape in human cells

The concept of the epigenetic landscape is widely used to describe how cell fate is decided. With the fast accumulation of genomic and epigenomic data, it is tempting to fill biological details into this abstract framework and delineate the epigenetic landscape at the systems level. Two perspectives on delineating the epigenetic landscape in human cells will be discussed. The first perspective is to depict the epigenetic states by genome-wide measurements of chromatin states and DNA methylation across diverse human cells, which paints a global view of the relationship between epigenetic modifications and cell type specificity. Comparative annotation of these epigenomes can reveal the regulatory elements responsible for specifying cell functions. The second perspective is to determine the cell states based on genetic networks and delineate the potential landscape of cell states that defines cell specificity. Recent progress on developing efficient computational methods to overcome the hurdles of modeling complex genetic networks will be reported. The applications of such a systems biology approach in phenotype prediction will be discussed.

Andrew Smith

Molecular and Computational Biology section
Division of Biological Sciences
University of Southern California.

Precisely bounding genomic regulatory regions in mammals using high-resolution DNA-methylation data

DNA methylation is unique among epigenomic marks in that it is associated with individual nucleotides, and in mammals it occurs almost exclusively at CpG sites. Since the emergence of high-throughput bisulfite sequencing, several full-genome single-CpG methylation profiles, or methylomes, have been produced in human, mouse and chimp. Mammalian genomes are highly methylated almost everywhere, except at hypomethylated regions (HMRs) typically associated with promoters or enhancers. These HMRs can be identified with very high precision to indicate the likely boundaries of regions that are accessible to transcription factor binding. HMR boundaries are highly consistent across methylomes and often identifiable to a single nucleotide position, suggesting a mechanism regulating the boundaries. However, at many tissue-specific genes we observe tissue-specific shifts in HMR boundaries around promoters. Similarly, cases of expanding or contracting HMRs can be found when comparing methylomes between species. We hypothesize that these boundaries indicate precisely where the regulatory elements reside near a gene, and may indicate the extent of the proximal promoter for a given gene. I will discuss the available data, explain how these data are analyzed and describe interesting examples. I will also discuss the implications for our understanding of promoter organization in mammals.

Siddarth Selvaraj

Ludwig Institute for Cancer Research
Bioinformatics and Systems Biology Graduate Program
University of California

Identification and Characterization of Topological Domains in Mammalian Genomes

The structural organization of the genome has a fundamental role in its function. The transcription regulatory process, for example, involves higher order chromatin structures, where transcriptional activation is frequently mediated through long range looping interactions between promoters and enhancers and accompanied by dynamic chromatin movement in the nucleus [2,3]. It has also been well recognized that heterochromatin, the highly compacted chromatin structure, drives gene silencing, while euchromatin, the open chromatin structure, promotes gene activation. As such, knowledge of the higher order chromatin structure is essential for a full understanding of transcriptional control and other nuclear processes.

Manoj Hariharan

Department of Genetics
Stanford University

Context-specific Combinatorial Interaction of Transcription Factors in Gene Regulation

It is well known that transcription factors (TFs) form complexes with other TFs or cofactors to exert transcriptional regulation of genes. However, this is the first time that a series of high-throughput experiments has been performed to identify the binding sites of more than a hundred TFs in ~70 different cell types, as part of the ENCODE project. This gave us a unique opportunity to address one of the central themes of gene regulation, i.e., combinatorial interaction among TFs. Expression profiles, DNase I hypersensitive sites, nucleosome-depleted sites and histone modification data were also available across the entire genome, which enabled cross-comparison and made integration feasible and robust. Here we describe our findings based on ~100 TFs across five cell lines. We describe the context specificity of particular combinations of TF-TF interactions. We will also discuss how these combinations can yield varying outputs in terms of gene regulation and functional modularity.

Idit Kosti

Faculty of Biology
Technion - Israel Institute of Technology

An integrated regulatory network reveals pervasive cross-regulation among transcription and splicing factors

The operation of a living cell depends on its ability to regulate its different functions. The master regulators in the cell are proteins that control the function of many other genes by several mechanisms. Transcription factors can differentially activate or repress transcription of genes by binding to their regulatory elements. A second major mechanism of gene expression regulation occurs at the level of alternative splicing. Alternative splicing is regulated by splicing factors that bind to short regulatory motifs on the RNA and dictate the final gene architecture. Traditionally the gene expression pathway was regarded as being composed of independent steps, from RNA transcription to protein translation. To date, there is increasing evidence for coupling between the different processes of the pathway, specifically between transcription and splicing.

Logan J. Everett

Institute for Diabetes, Obesity and Metabolism
Department of Genetics
Perelman School of Medicine
University of Pennsylvania

Cistromic analysis reveals novel insights into hepatic CREB regulatory mechanisms

The liver is a central organ in the maintenance of blood glucose homeostasis throughout the normal feeding/fasting cycle, and disruption of this homeostatic function is a contributing factor in a number of metabolic diseases [1]. In mammalian systems, a drop in blood glucose triggers the release of the hormone glucagon, which triggers the production of cAMP in liver hepatocytes [2]. The cAMP-Response Element Binding protein (CREB) was initially identified as a primary effector of transcriptional changes in response to cAMP signaling [3]. In particular, CREB has been shown to be a major regulator of fasting-induced hepatic gluconeogenesis [4], but the precise molecular mechanisms by which CREB achieves functional specificity in different physiological contexts remain to be elucidated. Increased cAMP levels have been shown to activate Protein Kinase A, which phosphorylates CREB on Serine 133, thereby promoting interactions between CREB and transcriptional co-activators [2]. In vitro studies have also suggested that phosphorylation of S133 promotes CREB binding to specific DNA sequences [5], providing a potential mechanism by which cAMP induces the expression of some CREB target genes, but not others. CREB has also been shown to respond to a number of other signaling pathways, and regulate a broad array of gene transcription programs in different tissues [2]. Thus, CREB remains an interesting potential therapeutic target in the treatment of metabolic diseases, but challenges remain in specifically targeting the downstream gluconeogenic program.

Andrei Thomas-Tikhonenko, Ph.D.

Perelman School of Medicine at the University of Pennsylvania.

Quantitative transcriptome-wide analysis of the Myc-miR-17-92 axis

The Myc family oncoproteins are non-canonical transcription factors that regulate more than 15% of the human transcriptome. Recent work from several laboratories including our own has indicated that some of these unusually broad effects could be mediated through deregulation of a limited number of microRNAs, both Myc-repressed (e.g., miR-34a) and Myc-activated (members of the miR-17~92 cluster). To measure, in an unbiased fashion, the microRNA component of the Myc pathway, we first performed mRNA profiling of human P493-6 lymphoblastoid cells carrying a repressible c-MYC allele. Both Myc-repressed and Myc-stimulated genes exhibited enrichment for predicted binding sites for Myc-regulated miRNAs; however, the most strongly enriched miRNAs were members of the miR-17~92 cluster. Thus, we repeated the profiling analysis on P493-6 cells saturated with exogenous miR-17~92 mimics to render this miR cluster refractory to changes in c-Myc levels. More than 1400 Myc-repressed genes were found to be stabilized in the presence of steady miR-17~92 levels. Subsequent Gene Set Enrichment Analysis (GSEA) demonstrated that these miR-17~92-dependent Myc targets are selectively enriched in negative regulators of several pathways involved in B-cell proliferation. The significance of these pathways for lymphomagenesis will be discussed.

An-Yuan Guo

Hubei Bioinformatics & Molecular Imaging Key Laboratory
Department of Systems Biology
College of Life Science and Technology
Huazhong University of Science and Technology

MicroRNA and transcription factor co-regulatory network analysis reveals miR-19 inhibits CYLD in T-cell acute lymphoblastic leukemia

T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive hematological malignancy accounting for about 15% and 25% of pediatric and adult acute lymphoblastic leukemia, respectively. T-ALL is usually characterized by proliferation of thymocytes at various stages of development, with high white blood cell counts, mediastinal lymph node enlargement and central nervous system involvement. Although this neoplastic disorder originates in the thymus, it spreads throughout all organs and is rapidly fatal without therapy. Compared with the more common B-cell lineage ALL, T-ALL has historically had a worse prognosis. Current multi-agent combination chemotherapy provides an overall survival rate of 60%-70% in children and only 30%-40% in adults. Currently, understanding of the etiology of T-ALL has largely come from studies of gene abnormalities. Although the oncogenicity of these genes is well established, understanding of the transformational programs and multi-step pathogenesis of T-ALL remains limited. In particular, the regulatory networks governing T-ALL gene expression remain elusive.

Zhengqing Ouyang

Howard Hughes Medical Institute and Program in Epithelial Biology
Department of Genetics and Center for Genomics and Personalized Medicine
Stanford University School of Medicine

SeqFold: Accurate genome-scale RNA structure reconstruction integrating experimental measurements provides insights into gene regulation

Regulatory information in RNA is encoded not only in its primary sequence, but also in its structure with complex base pairing patterns. Virtually every step in the gene expression program, from transcription to splicing and translation, is influenced by RNA structure. Precise mapping of RNA structure is essential for understanding the functions of RNAs, especially for the large set of functionally uncharacterized non-coding RNAs (ncRNAs) (Wan et al. 2011). Experimental methods for RNA structure determination, although quite accurate, have traditionally been able to analyze only a single RNA per experiment, and are limited in the length of RNA they can probe. Computational methods that aim to predict RNA structure from primary sequence have been developed and, with increasing computational power, can be applied to a large number of RNAs. However, in silico algorithms have variable accuracy, and their applicability under real experimental conditions may be limited.

Harmen Bussemaker

Department of Biological Sciences
Columbia University.

Dissecting transcription factor networks using high-throughput sequencing and quantitative genetics

In this talk I will describe our recent efforts to elucidate the molecular interactions underlying the behavior of gene regulatory networks. I will first demonstrate how deep sequencing of interactions between DNaseI and naked DNA uncovers a strong dependence of cleavage rate on nucleotide sequence, which can be related to the width of the minor groove; an additional, equally striking dependence on CpG methylation status exists, allowing us to predict methylation status from in vivo DNaseI profiles. Next, I will show how SELEX-seq, a novel methodology for quantifying in vitro interactions between transcription factor complexes and DNA, allowed us to discover that heterodimerization with the cofactor Extradenticle gives rise to large differences in DNA binding specificity between Hox proteins that are absent when these proteins bind as monomers. Finally, I will demonstrate how linear modeling of genetic variation in mRNA expression levels combined with prior information about the DNA binding specificity of transcription factors can be used to map the loci ("aQTLs") whose allelic variation modulates their regulatory activity.

Roderic Guigo

Center for Genomic Regulation
Universitat Pompeu Fabra

Interrogating RNA heterogeneity

The unfolding of the instructions encoded in the genome is triggered by the transcription of DNA into RNA, and the subsequent processing of the resulting primary RNA transcripts into functional mature RNAs. RNA is thus the first phenotype of the genome, mediating all other phenotypic changes at the organism level caused by changes in the DNA sequence. While current technology is too primitive to provide accurate measurements of the RNA content of the cell, the recent development of Massively Parallel Sequencing Instruments has dramatically increased the resolution with which we can monitor cellular RNA. Using these instruments, the ENCODE project has surveyed the RNA content of multiple cell lines and subcellular compartments. The results of these surveys underscore pervasive transcription, as well as great RNA heterogeneity between and within cells. Comparison of RNA surveys with other genome-wide epigenetic surveys (such as those of binding sites for Transcription Factors, or of Histone modifications) reveals a very tight coupling between the different pathways involved in RNA processing, transcription and splicing in particular.

Igor Zwir

University of Granada, Spain.

Mapping sequence to numbers: A quantitative model of promoter binding and gene transcription kinetics under DNA accessibility constraints

A major challenge in biology is to develop quantitative, predictive models of gene regulation that unfold over time in response to environmental changes. Promoters contain transcription factor binding sites differing in their affinity and accessibility, but little is understood about how these variables combine to generate a single fine-tuned, quantitative response. By using the targets of the PhoP DNA binding protein in Salmonella, we were able to quantify the relations between transcription factor input and expression output. We developed a model capable of capturing variable changes in in vitro measurements of kinetic constants of individual binding sites; combining them in promoters with multiple sites; and inferring the corresponding effective affinities that drive disparate in vivo kinetic binding behaviors of PhoP co-regulated promoters. The model faithfully reproduced the observed quantitative changes in binding and transcription onset times and levels (which are not necessarily correlated) that occurred when the affinity of the transcription factor for its binding sites was altered by the silencing effect of nucleoid-associated proteins, which impede transcription factor access to the DNA. Furthermore, because in vivo binding and in vivo transcription kinetics were independently modeled, we were able to identify key cis-acting elements that are responsible for discrepant behaviors. Because quantitative measurements are not always available for a given regulator, the quantitative parameters of this model were mapped to sequence motifs representing cis-acting elements arranged in particular promoter architectures, and safely replaced to achieve a stand-alone predictive model. This model, based on sequence analysis, was able to predict gene expression of PhoP-regulated genes that were not previously used in the model construction, without measuring their biochemical parameters. Finally, the know-how gained in this study can serve as a basis for application to other systems and/or species with less detailed genome-wide experiments.

Alex Hartemink

Department of Computer Science
Duke University.

Toward a Mechanistic Understanding of Transcriptional Regulation: A Systems Perspective on Genome Occupancy

A key goal of regulatory genomics is to predict gene expression (and more immediately, transcript production) from genomic sequence. One significant milestone toward this goal will be to accurately predict promoter occupancy from genomic sequence. We adopt a systems perspective to model the competitive binding of multiple factors along the genome, with an eye toward a more mechanistic understanding of genome occupancy and its dynamics. Hundreds of different factors adorn the eukaryotic genome, binding to it in large numbers. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models consider genomic positions to be either binding sites or not binding sites. We present COMPETE, a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an occupancy profile, a probabilistic description of the DNA occupancy of each factor at each position along the genome.
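The kind of probabilistic occupancy profile described in this abstract can be illustrated with a small forward-backward computation over all non-overlapping binding configurations. This is a generic sketch under stated assumptions, not the COMPETE implementation. Each factor is assumed to have a fixed footprint width and a precomputed statistical weight for binding with its left edge at each position (for example, concentration times an affinity term), and bound factors may not overlap. The function name and input layout are hypothetical.

```python
def occupancy_profiles(length, factors):
    """Per-position occupancy probability for each competing factor.

    length: number of positions in the sequence.
    factors: dict name -> (width, weights), where weights[i] is the assumed
             statistical weight of the factor bound with its left edge at i
             (so len(weights) == length - width + 1).
    """
    # forward[i]: total weight of all configurations covering positions [0, i).
    forward = [0.0] * (length + 1)
    forward[0] = 1.0
    for end in range(1, length + 1):
        total = forward[end - 1]  # position end-1 left unbound (weight 1)
        for width, weights in factors.values():
            start = end - width
            if start >= 0:
                total += forward[start] * weights[start]
        forward[end] = total

    # backward[i]: total weight of all configurations covering [i, length).
    backward = [0.0] * (length + 1)
    backward[length] = 1.0
    for start in range(length - 1, -1, -1):
        total = backward[start + 1]
        for width, weights in factors.values():
            if start + width <= length:
                total += weights[start] * backward[start + width]
        backward[start] = total

    partition = forward[length]
    profiles = {}
    for name, (width, weights) in factors.items():
        profile = [0.0] * length
        for start in range(length - width + 1):
            # Probability that this factor is bound with its left edge at start.
            p = forward[start] * weights[start] * backward[start + width] / partition
            for pos in range(start, start + width):
                profile[pos] += p
        profiles[name] = profile
    return profiles
```

Because every configuration assigns each position to at most one factor, the occupancies at any position sum to at most 1, and raising one factor's weights lowers the others' occupancy. For genome-scale sequences a practical version would work in log space or rescale to avoid overflow.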

Marit Ackermann

Cellular Networks and Systems Biology
Biotechnology Center
TU Dresden

Assessing the impact of natural genetic variation on gene expression dynamics

Natural genetic variation affects gene expression levels and thereby affects molecular and physiological phenotypes such as protein levels, cell morphology or disease phenotypes. A genetic locus containing a sequence variant that affects transcript levels of a gene is called an expression quantitative trait locus (eQTL). Differences in mRNA expression levels caused by natural genetic variation can manifest themselves between individuals, populations, environments and, very importantly, between cell types and tissues.