RegGenSig 2013 Abstracts

Ben Lehner: THE GENETICS OF INDIVIDUALS: WHY WOULD A MUTATION KILL ME, BUT NOT YOU?
Andrea Califano: Assembly and interrogation of tumor-specific regulatory models reveals master regulators of tumor maintenance and chemosensitivity.
Erik van Nimwegen: The transcription factors democracy: Completely automated inference of genome-wide regulatory interactions from sequencing data.
Jason Ernst: Interplay between chromatin state, regulator binding, and regulatory motifs
Stein Aerts: Motif-based identification of master regulators and direct TF-target interactions in human and Drosophila gene networks
Struan F.A. Grant: Following functional clues based on the genetic commonalities of diabetes and cancer
Johannes Söding: Drosophila Pol II core promoters cluster into four classes characterized by distinct sets of motifs, regulatory properties, and nucleosome patterning.
Alan Moses: Systematic identification of conserved non-coding sequences in plants
Boris Lenhard: Alternative and Overlapping Determinants of Transcription Start Site Selection in Vertebrate Promoters
Bartek Wilczyński: Predicting regulatory domain boundaries from chromatin immunoprecipitation data

Poster Sessions

Ben Lehner

EMBL-CRG Systems Biology and ICREA, Centre for Genomic Regulation, UPF, Barcelona, Spain

Title: THE GENETICS OF INDIVIDUALS: WHY WOULD A MUTATION KILL ME, BUT NOT YOU?

Abstract: To what extent is it possible to predict the phenotypic differences among individuals from their completely sequenced genomes? We use model organisms (yeast, worms and tumours) to understand when you can, and why you cannot, predict the characteristics of individuals from their genome sequences.

Andrea Califano

Title: Assembly and interrogation of tumor-specific regulatory models reveals master regulators of tumor maintenance and chemosensitivity.

Abstract: The recent onslaught of molecular data, across multiple human malignancies, is producing an unprecedented repertoire of genetic and epigenetic alterations contributing to tumorigenesis and progression. Yet, the direct impact of this knowledge on tumor treatment and prevention is still largely unproven. Loss of tumor suppressor function is difficult to target pharmacologically and, with a handful of exceptions, alterations providing potential drug targets are relatively infrequent in cancer patients and are thus unlikely to support clinical development.

By reconstructing and interrogating the in vivo regulatory logic of the cancer cell, which integrates multiple aberrant signals resulting from genetic and epigenetic alterations, systems biology is starting to elucidate and mechanistically validate both oncogene and non-oncogene addiction mechanisms. These mechanisms are exquisitely dependent on the molecular landscape of cancer subtypes, can be targeted pharmacologically, and are frequently synergistic, thus providing uniquely specific entry points for combination therapy.

In this presentation, we will discuss recent results in the discovery of synergistic, non-oncogene addiction mechanisms and their application to the stratification and treatment of high-grade glioma, non-small cell lung cancer, and prostate cancer. The approach is highly extensible and has been applied to a variety of additional tumor subtypes, to the study of stem cell differentiation, reprogramming, and pluripotency control, as well as to the study of neurodegenerative diseases.

Erik van Nimwegen

Title: The transcription factors democracy: Completely automated inference of genome-wide regulatory interactions from sequencing data.

Abstract: How do gene regulatory networks control cell fate and identity in higher eukaryotic organisms? Although gene expression and chromatin state dynamics are ultimately encoded by constellations of binding sites recognized by regulators such as transcription factors (TFs) and microRNAs (miRNAs), our understanding of this regulatory code and its context-dependent read-out remains very limited. Experimental researchers interested in elucidating the key regulatory interactions acting within a particular biological system of interest face the difficulty that in higher eukaryotes, there are thousands of potential regulators, and it is not feasible to investigate all of these using direct experimentation. Although it has become relatively straightforward, using next-generation sequencing, to obtain genome-wide measurements of gene expression, chromatin state, and TF-binding dynamics, it is typically far beyond the expertise of experimental groups to connect such data to the actions of individual regulators. And even when experimentalists team up with expert computational biologists, inferring key regulators and their genome-wide interactions from high-throughput data remains highly challenging, typically involving 'case-by-case' development of methodology.

In recent years we have developed a methodology that combines automated processing of next-generation sequencing data with genome-wide predictions of TF binding sites and miRNA target sites to model gene expression or chromatin modifications in terms of these sites. This completely automated system, called ISMARA, is available through a web interface (ismara.unibas.ch) and requires only the uploading of raw microarray or sequencing (RNA-seq or ChIP-seq) data. ISMARA then automatically identifies the key TFs and miRNAs driving expression/chromatin changes and makes detailed predictions regarding their regulatory roles. These include predicted activities of the regulators across the samples, their genome-wide targets, enriched gene categories among the targets, and direct interactions between the regulators.

In the presentation I will discuss various aspects of the methodology implemented in ISMARA, and illustrate the power of the approach by demonstrating that, for well-studied model systems, ISMARA consistently identifies known key regulators and their actions ab initio in a completely automated fashion and without any tunable parameters.
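As a rough illustration of what "modelling expression in terms of predicted sites" can look like, the sketch below fits a linear model in which the expression of each promoter in each sample is approximated by the sum, over regulators, of the number of predicted sites times a per-sample regulator activity. The abstract does not specify the model, so the linear form, the ridge penalty, and all names (infer_motif_activities, site_counts, ridge) are assumptions made for illustration only and are not ISMARA's actual implementation.

```python
import numpy as np


def infer_motif_activities(site_counts, expression, ridge=1.0):
    """Estimate activities A such that expression ~= site_counts @ A.

    site_counts: (promoters x regulators) predicted site counts (hypothetical input)
    expression:  (promoters x samples) expression values (hypothetical input)
    ridge:       assumed L2 penalty; not taken from the abstract
    """
    N = np.asarray(site_counts, dtype=float)
    E = np.asarray(expression, dtype=float)
    n_regulators = N.shape[1]
    # Ridge-regularised least squares: (N^T N + ridge * I) A = N^T E
    lhs = N.T @ N + ridge * np.eye(n_regulators)
    rhs = N.T @ E
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(1.0, size=(200, 5)).astype(float)
    true_activities = rng.normal(size=(5, 3))
    expr = counts @ true_activities
    estimated = infer_motif_activities(counts, expr, ridge=1e-8)
    print(np.round(estimated - true_activities, 6))
```

Each row of the returned matrix is one regulator and each column one sample, so a regulator whose row varies strongly across samples would be a candidate driver of the observed expression changes under this simplified model.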

Jason Ernst

Title: Interplay between chromatin state, regulator binding, and regulatory motifs

Abstract: The regions bound by sequence-specific transcription factors can be highly variable across different cell types, despite the static nature of the underlying genome sequence. This has been partly attributed to changes in chromatin accessibility, but a systematic picture has been hindered by the lack of large-scale datasets. In this talk I will describe our efforts analyzing 456 binding experiments for 119 regulators and 84 chromatin maps generated by ENCODE in six human cell types and relating those to a global map of regulatory motif instances for these factors. We find specific and robust chromatin state preferences for each regulator beyond the previously reported open-chromatin association, suggesting a much richer chromatin landscape beyond simple accessibility. The preferentially bound chromatin states of regulators were enriched for sequence motifs of regulators relative to all states, suggesting that these preferences are at least partly encoded by the genomic sequence. Relative to all regions bound by a regulator, however, regulatory motifs were surprisingly depleted in the regulator's preferentially bound states, suggesting additional non-sequence-specific binding beyond the level predicted by the regulatory motifs. Such permissive binding was largely restricted to open-chromatin regions showing histone modification marks characteristic of active enhancer and promoter regions, whereas open-chromatin regions lacking such marks did not show permissive binding. Lastly, the vast majority of co-binding of regulator pairs is predicted by the chromatin state preferences of individual regulators. Overall, our results suggest a joint role of sequence motifs and specific chromatin states beyond mere accessibility in mediating regulator binding dynamics across different cell types.

Joint work with Manolis Kellis

Stein Aerts

Title: Motif-based identification of master regulators and direct TF-target interactions in human and Drosophila gene networks

Abstract: We revisit the problem of motif discovery in metazoan co-expressed gene sets. In this talk we discuss how classical motif discovery, as well as modern 'track discovery', can complement ChIP-seq assays and how both remain invaluable for deciphering gene regulatory networks. This is particularly true for biological systems that are less amenable to high-throughput methods, and for processes for which the master regulators are as yet unknown. We illustrate the power of motif discovery by mapping an extensive gene regulatory network underlying Drosophila eye development. To this end, we exploit (1) tissue-specific gene expression across three Drosophila species; (2) multiple genetic perturbations and cell sorting experiments in the eye disc; and (3) open chromatin profiling using FAIRE-seq. We identify several new targetomes of eye-related transcription factors, such as Glass, the master regulator of photoreceptor differentiation.

As a next step towards the integration of motif discovery with gene regulatory network inference, we developed iRegulon, a Cytoscape plugin that unites cis-regulatory sequence analysis with biological network tools. Using iRegulon, we re-analyzed microRNA target sets, signaling pathways, Gene Ontology classes, STRING and GeneMania networks, TF perturbation signatures, and finally twenty thousand cancer gene signatures. Through meta-analysis we summarize TF-target interactions yielding “meta-targetomes” that can be useful to annotate re-sequenced cancer genomes.

Struan F.A. Grant

Division of Human Genetics, Children's Hospital of Philadelphia Research Institute, Perelman School of Medicine, University of Pennsylvania, USA

Title: Following functional clues based on the genetic commonalities of diabetes and cancer

Abstract: The repertoire of genes already established to play a role in the pathogenesis of type 2 diabetes (T2D) has grown substantially due to recent genome-wide association studies (GWAS). In 2006, we discovered the strong association of variants in the transcription factor 7 like 2 (TCF7L2) gene with T2D. Other investigators have already independently replicated this finding in different ethnicities and, interestingly, in the first GWAS of T2D in Caucasians, the strongest association was indeed with TCF7L2; this is now considered the most significant genetic finding in T2D to date.

Interestingly, there is also a very strong connection between TCF7L2 and cancer. The key 8q24 locus, found through GWAS to be the genomic region most strongly associated with a number of cancers, contributes to disease pathogenesis through mutation of an upstream TCF7L2-binding element driving the transcription of the MYC gene. Indeed, it has been known for some years that TCF7L2 harbors specific mutations that strongly influence colorectal cancer risk, and genomic sequencing of colorectal adenocarcinomas has identified a recurrent VTI1A-TCF7L2 gene fusion. Furthermore, many of the T2D GWAS-derived risk-conferring alleles have been shown to protect against prostate cancer; in addition, THADA, JAZF1 and TCF2 are loci that have been strongly detected in separate GWAS analyses of prostate cancer and T2D. Thus, TCF7L2 and other T2D-associated genes also appear to be key players in cancer pathogenesis; however, the underlying mechanism is still far from understood.

We previously performed chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) with this transcription factor to elucidate its binding repertoire genome-wide. Unexpectedly, and despite employing a carcinoma cell line, the genes with TCF7L2 binding sites are strongly enriched in pathway categories related to metabolic functions and traits, further suggesting a role for metabolism in cancer. Furthermore, the list of loci bound by TCF7L2 harbors a highly significant over-representation of GWAS loci associated with T2D and cardiovascular disease.

With all these intriguing facts in mind, we are taking forward the loci that are common to T2D and cancer GWAS outcomes and investigating the impact on cell proliferation with the ultimate goal of testing their role in beta-cell proliferation in mice, a mechanism which still largely eludes the diabetes research community.

Johannes Söding

Title: Drosophila Pol II core promoters cluster into four classes characterized by distinct sets of motifs, regulatory properties, and nucleosome patterning.

Holger Hartmann*, Mark E. L. Heron*, Anja Kiesel*, Lukas Utz, Claudia Gugenmus, and Johannes Söding (* equal contributions)

Abstract: Core promoters (CPs) are the sites in the genome that recruit the basal transcription machinery in order to initiate transcription. High-throughput measurements of transcription start site distributions have established the existence of two classes of eukaryotic CPs: Narrow-peaked or “focussed” promoters are usually highly regulated, while broad-peaked or “dispersed” promoters mostly belong to constitutively expressed housekeeping genes. These two classes differ in their motif composition, and it is becoming clear that their motifs influence which combinations of basal transcription factors assemble into a functional preinitiation complex.

We have systematically studied, using Drosophila melanogaster as an example, the link between core promoter elements and the resulting regulatory properties. Our motif discovery method XXmotif finds 12 known and 7 novel, conserved core promoter elements (CPEs). These motifs fall into four groups that tend to co-occur and that characterize four overlapping classes of CPs: (1) strongly regulated, stallable, INR-enriched CPs, mostly from developmental genes, (2) highly inducible, TATA-containing CPs, (3) constitutive CPs from housekeeping genes, and (4) very strongly constitutively active CPs, mostly from ribosomal genes. Furthermore, each class has a characteristic dinucleotide profile that is correlated with its nucleosome patterning and 5’-nucleosome-free region. The four CP classes hint at four major, alternative pathways of transcription initiation, each of which uses a different set of basal transcription factors and thereby determines regulatory response. Despite employing different motifs, the same four classes of CPs are likely to exist in humans and other species.

Alan Moses

Title: Systematic identification of conserved non-coding sequences in plants

Abstract: Despite the central importance of noncoding DNA in gene regulation and evolution, our understanding of the genomic extent and nature of selection on plant noncoding regions remains limited. This is in contrast to other clades containing model organisms (mammals, fruit fly, budding yeast, etc.) where studies of sequence conservation across large numbers of related genomes have provided a powerful approach to identify and characterize functional noncoding sequences. To systematically identify conserved non-coding regions in Arabidopsis and its close relatives (crucifers) we sequenced three Brassicaceae species and analyzed them alongside six previously sequenced crucifer genomes. We compared the conservation of non-coding DNA in plants to what had been previously observed in other organisms. For example, although we find that these plants have shorter and fewer conserved non-coding sequences than have been observed in animals, genes involved in development, particularly transcription factors, are associated with large numbers of the most highly constrained non-coding sequences. Remarkably, since plants’ and animals’ most recent common ancestor was likely unicellular, this suggests that complex regulatory control of developmental patterning transcription factors evolved independently in the two major lineages of complex multicellular life. We also performed whole genome motif-finding on the conserved non-coding sequences and identified known and novel transcription factor binding specificities, as well as other motifs. Finally, using population genomics data, we tested for more recent evidence of selection on the conserved non-coding sequences and regulatory motifs.

Boris Lenhard

Imperial College London

b.lenhard@imperial.ac.uk

Title: Alternative and Overlapping Determinants of Transcription Start Site Selection in Vertebrate Promoters

Vertebrate promoters differ with respect to the precision of transcription start site (TSS) selection and the sequence motifs that determine it. The highly precise transcription start sites are found at fixed distances from e.g. TATA box motifs, while most TATA-less promoters allow transcription to start within a broader region. Here we report that TSS selection rules change systematically on subsets of promoters in development and differentiation time courses. The first and most intriguing is the promoter grammar change during the maternal-to-zygotic transition in early embryonic development. We have analysed transcription initiation sites at 1 bp resolution in combination with histone modification at core promoter regions at high spatial precision in the course of early development of zebrafish. We show that the switch from the maternal to the zygotic transcriptome is accompanied by a switch between two fundamentally different mechanisms for defining transcription initiation. Upon zygotic transcription activation, the maternal-specific W-box motif-dependent TSS definition is replaced with a SS|WW dinucleotide enrichment boundary-associated grammar. The two grammars coexist in core promoters of ubiquitously expressed genes in close proximity or in an overlapping fashion and thus enable the continuous expression of these genes in the two very different intracellular environments. The switch in promoter interpretation constitutes a central part of the mechanism for setting up the promoters for the regulation of early development. We show that related, albeit less dramatic, systematic changes in TSS selection occur during male spermatogenesis and skeletal muscle differentiation. To ensure gene expression in all stages of these processes, the corresponding promoters must accommodate all the required grammars, often in an overlapping fashion.

Bartek Wilczyński

Title: Predicting regulatory domain boundaries from chromatin immunoprecipitation data

Pawel Bednarz, Bartek Wilczyński

Institute of Informatics, University of Warsaw, Poland

During development of a multicellular organism, cells undergo an orchestrated series of transformations. Through coordinated proliferation and differentiation a complex system of thousands of complementary cells is formed. One of the key aspects of this process is regulation of transcription, allowing different cells to express different sets of proteins leading to variability in cell morphology and function. In this process, expression of thousands of genes needs to be tightly controlled as misexpression of an important gene in a wrong tissue or at a wrong developmental stage would in many cases lead to developmental defects. This level of control is achieved through the action of transcription factors binding to enhancers, or more generally regulatory elements, leading to very selective activation or repression of their target genes.

Enhancers usually act on target genes' promoters by physically interacting (through co-factor proteins) with the core transcriptional machinery. While in the majority of cases of studied enhancers we have a notion of a target gene, it may be difficult to assign regulatory elements to target genes, especially in the light of recent findings showing enhancer sharing between genes for both humans (Sanyal et al., 2012) and mice (Li et al., 2012).

With the progress of mapping transcription factor binding sites through chromatin immunoprecipitation-based experiments, we are getting closer to having complete maps of regulatory elements in multiple species (Negre et al., 2011). With complementary techniques such as DNase I hypersensitivity (Thomas et al., 2011) and FAIRE (Giresi et al., 2007) we can get an even more comprehensive picture of the universe of regulatory regions genome-wide. Building comprehensive models of regulatory interactions is therefore becoming one of the main challenges in the field.

In a recent work (Wilczyński et al., 2012), we have shown for mesoderm development in Drosophila that given a comprehensive set of ChIP-chip experiments for relevant transcription factors (Zinzen et al., 2009), it is possible to build a computational model that makes accurate predictions of tissue- and stage-specific gene expression patterns. One of the key elements of the model, which could not be removed without loss of prediction accuracy, was at least a rudimentary notion of regulatory domains, in this case based on binding of insulator proteins (Negre et al., 2010).

In order to explore the problem of predicting regulatory domain boundaries more deeply, we have used a supervised machine learning approach to build a model of boundary elements using modENCODE data. The model was trained on large-scale mapping of chromatin domains (Sexton et al., 2012) and subsequently tested on independent datasets, both high-throughput (Filion et al., 2010) and targeted tests of insulator regions through luciferase assays (Srinivasan and Mishra, 2012). Our model achieves an AUC of over 0.80 in cross-validation experiments, generalizes well across different datasets and outperforms other approaches such as Hidden Markov Models (Ernst and Kellis, 2012) or sequence-based predictions (Srinivasan and Mishra, 2012).
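For readers unfamiliar with the AUC figure quoted above, the following minimal sketch shows one standard way to compute the area under the ROC curve for a binary boundary/non-boundary classifier: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name roc_auc and the toy labels and scores are hypothetical and are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (boundary) or 0 (non-boundary); hypothetical encoding
    scores: classifier scores, higher meaning "more likely a boundary"
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.80 means that a randomly chosen true boundary outscores a randomly chosen non-boundary in more than four out of five comparisons.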

Posters

1	Anna Lyubetskaya (also oral presentation)	Reconstructing the regulatory network of TB: transcription factor binding distribution and properties
2	Idit Kosti (also oral presentation)	Does intragenic DNA methylation determine differential exon expression?
3	Morgane Thomas-Chollier	Deciphering genome-wide cis-regulation with RSAT: application to the glucocorticoid receptor
4	Remy Nicolle	Modelling normal cells identifies master regulators in cancer
5	Mayetri Gupta	Bayesian inference of gene regulatory networks from factorial time-course experiments with applications to bone fracture healing
6	Marcin Joachimiak	Deep surveys of biological modules: K-biclustering gene expression and phenotype data
7	Alastair Kilpatrick	Stochastic algorithms for motif discovery: a comparison of sampling strategies
8	Michal Dabrowski	Comparison of Jaspar, Transfac and Genomatix motif libraries on ChIP-seq data for 44 transcription factors
9	Michal Dabrowski	Integrating Nencki Genomics webservices via Taverna workbench
10	Inbal Paz	DRIMust: a web server for discovering rank imbalanced motifs using suffix trees
11	Jieun Jeong	Polycomb repression and RNA polymerase in neural tube development
12	Alena van Bömmel	Detection of co-regulating transcription factors in 34 human cell types using predicted DNA-binding affinity on DNase hypersensitive sites
13	Marleen Claeys	Regulatory motif detection using different types of evolutionary conservation information
14	Joshua Welch	Investigating the role of transcribed pseudogenes in breast cancer
15	Yaron Orenstein	Inferring binding site motifs from high-throughput in vitro data
16	Lex Overmars	REPs, genetic insulators that enable differential regulation of gene expression in bacteria
17	Mohamed Elati	MiRnaBoost: Multi-view AdaBoost for microRNA target prediction
18	Stefan Naulaerts	Integrative biological itemset mining in cancer research
19	Galip Gurkan Yardimci	Prediction of genome-wide in vivo transcription factor binding using factor-specific DNase footprinting models