This is the downloadable material for the article 'A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs' (submitted to Nucleic Acids Research).
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly becoming the method of choice for discovering cell-specific transcription factor binding locations genome-wide. When sequenced tags are aligned to the genome, binding locations appear as peaks in the tag profile. Several programs have been designed to identify such peaks, but evaluating them has been difficult because benchmark datasets are lacking. We have created benchmark datasets for three transcription factors by manually evaluating a selection of potential binding regions that cover typical variation in peak size and appearance. Testing five programs on this benchmark showed, first, that external control or background data was essential to limit the number of false positive peaks reported by the programs. However, over 80% of these peaks could be filtered out manually by visual inspection alone, without using additional background data, which shows that peak shape information is not fully exploited by the evaluated programs. Second, none of the programs returned peak regions that matched the actual resolution of ChIP-seq data. Our results showed that ChIP-seq peaks should be narrowed down to 100-400 bp, which is sufficient to identify unique peaks and binding sites. Based on these results, we propose a meta-approach that gives improved peak definitions.
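As an illustration of what "peaks in the tag profile" and narrowing a peak to 100-400 bp can mean in practice, the sketch below builds a per-base tag profile over a candidate region and returns a fixed-width window centred on the profile maximum (summit). It is a generic summit-centred trimming example, not the meta-approach proposed in the article. The function names `tag_profile` and `narrow_peak` are hypothetical, tags are assumed to be half-open (start, end) intervals already extended to the fragment length, and the default width of 200 bp is an arbitrary choice within the 100-400 bp range.

```python
from typing import List, Tuple

Tag = Tuple[int, int]  # hypothetical: half-open (start, end) in genome coordinates


def tag_profile(tags: List[Tag], region_start: int, region_end: int) -> List[int]:
    """Per-base tag coverage over [region_start, region_end).

    Assumes each tag has already been extended to the estimated fragment length.
    Uses a difference array so each tag costs O(1) before the final prefix sum.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must be non-empty")
    diff = [0] * (length + 1)
    for start, end in tags:
        # Clip the tag to the region and convert to region-relative offsets.
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    profile, running = [], 0
    for i in range(length):
        running += diff[i]
        profile.append(running)
    return profile


def narrow_peak(tags: List[Tag], region_start: int, region_end: int,
                width: int = 200) -> Tuple[int, int, int]:
    """Return (start, end, summit) for a `width`-bp window centred on the summit.

    The summit is the first position with maximal coverage; the window is
    shifted, if needed, so that it stays inside the original region.
    """
    profile = tag_profile(tags, region_start, region_end)
    summit = region_start + max(range(len(profile)), key=profile.__getitem__)
    start = max(region_start, summit - width // 2)
    end = min(region_end, start + width)
    start = max(region_start, end - width)
    return start, end, summit


if __name__ == "__main__":
    example_tags = [(1000, 1200), (1050, 1250), (1100, 1300)]
    print(narrow_peak(example_tags, 900, 1600))
```

Here three overlapping tags in a 700 bp candidate region are reduced to a 200 bp window, (1000, 1200), around the summit at position 1100. Real data would need strand-aware tag shifting and, as the article stresses, control or background data to reject false positive regions before any narrowing.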
Bioinformatics & Gene Regulation
Department of Cancer Research and Molecular Medicine
Norwegian University of Science and Technology