peak finding

About Peak Finding

Ultra-high-throughput sequencing experiments usually seek to identify locations where sequence reads cluster into regions of enrichment, in which overlapping reads appear as peaks. For ChIP-Seq experiments, such regions represent in vivo locations where proteins of interest (e.g. modified histones or transcription factors) were associated with DNA; for transcriptome experiments, they correspond to the locations of transcribed exons. Many peak finding programs are already available. They usually take the output of a mapping program (such as Eland or Bowtie) as input.
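To make the idea of "overlapping reads appear as peaks" concrete, here is a minimal sketch that computes per-base read depth from mapped read intervals and reports the deepest position. The function names, the half-open (start, end) coordinate convention and the example reads are assumptions made for illustration; none of the programs described on this page is claimed to work this way internally.

```python
from collections import defaultdict

def coverage_profile(reads):
    """Return {position: depth} for half-open read intervals (start, end)."""
    # Record +1 where a read starts and -1 just past where it ends.
    deltas = defaultdict(int)
    for start, end in reads:
        deltas[start] += 1
        deltas[end] -= 1

    depth = 0
    profile = {}
    positions = sorted(deltas)
    # Sweep left to right; depth is constant between consecutive breakpoints.
    for pos, nxt in zip(positions, positions[1:]):
        depth += deltas[pos]
        for p in range(pos, nxt):
            profile[p] = depth
    return profile

def peak_summit(reads):
    """Return (position, depth) of the leftmost deepest position."""
    profile = coverage_profile(reads)
    pos = max(sorted(profile), key=profile.get)
    return pos, profile[pos]

# Hypothetical mapped reads on one chromosome, as (start, end) coordinates.
example_reads = [(100, 136), (110, 146), (120, 156), (400, 436)]
summit = peak_summit(example_reads)  # (120, 3): three reads overlap from 120
```

The sketch expects at least one read. Real peak finders add steps that are omitted here, such as strand handling, fragment-length extension and statistical testing against a background.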

Input formats:

Format Mapping programs Reference
No Reference Available
Eland Export
Eland formats from pipeline (export format) No Reference Available
MapView (MAQ)
Map (MAQ)

Input data files:

If the mapped data come from a paired-end experiment, users should provide two files, one per end. Otherwise, only one sample data file is required. If you also have control data, please upload it together with the sample data.


MACS is a Model-based method to analyze ChIP-Seq data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.
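The two ideas in this paragraph, shifting tags and testing against a locally varying Poisson rate, can be sketched as follows. This is a simplified illustration and not the MACS source code. It assumes that the local rate is taken as the largest of a genome-wide estimate and estimates from regions of the control sample around the candidate window. The function names, region lengths and counts are all hypothetical.

```python
import math

def shift_tag(position, strand, shift):
    """Move a tag position toward the 3' end by `shift` bases."""
    return position + shift if strand == "+" else position - shift

def local_lambda(control_counts, window_len, genome_lambda):
    """Pick the largest expected count among several background estimates.

    control_counts maps a surrounding-region length (bp) to the number of
    control tags seen in that region around the candidate window.
    """
    rates = [genome_lambda]
    for region_len, count in control_counts.items():
        # Scale the regional count down to the candidate window's length.
        rates.append(count * window_len / region_len)
    return max(rates)

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    # Subtraction loses precision for very small tails; fine for a sketch.
    return max(0.0, 1.0 - cdf)

# Hypothetical numbers: a 200 bp window holding 12 shifted ChIP tags.
lam = local_lambda({5000: 100, 10000: 150}, window_len=200, genome_lambda=2.0)
p_value = poisson_sf(12, lam)  # chance of 12 or more tags under the local rate
```

Taking the maximum makes the test conservative: a window in a region where the control is locally dense needs more ChIP tags to reach the same p-value than it would under the genome-wide rate alone.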

MACS was proposed by Yong Zhang et al. at Harvard University. The details of MACS can be found in the original paper: Model-based Analysis of ChIP-Seq. The standalone software package was written by Yong Zhang et al. and can be downloaded from:


FindPeaks is another efficient application for use with ChIP-Seq experimental data. It includes novel functionality for identifying areas of gene enrichment and transcription factor binding site locations, as well as for estimating DNA fragment size distributions in enriched areas. The FindPeaks application can also generate UCSC-compatible custom WIG track files from aligned-read files for short-read sequencing technology. FindPeaks was proposed by Fejes et al. in the paper: FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology.


The ChIPSeq peak finding method was used by Johnson et al. in their 2007 Science publication, Genome-Wide Mapping of in Vivo Protein-DNA Interactions. In this method, ChIP and control reads were analyzed jointly for each experiment to identify regions with an over-representation of reads in the ChIP sample relative to the control sample. Candidate enriched regions were identified as aggregations of at least a threshold number of ChIP reads (13 in their paper), with no two adjacent reads separated by more than 100 bp, and were assigned the number of reads as a score. The threshold needs to be selected for each study based on the structure of the data set and on the false-discovery rate that the study will tolerate.
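The candidate-region step described above can be sketched in a few lines. This is an illustrative reading of the description, not the code used by Johnson et al.: the function name, the use of a single coordinate per read and the example positions are assumptions, and the comparison against the control sample is left out.

```python
def candidate_regions(read_positions, min_reads=13, max_gap=100):
    """Group read positions into clusters and keep the dense ones.

    Returns (first_position, last_position, read_count) tuples; the read
    count doubles as the region's score.
    """
    regions = []
    cluster = []
    for pos in sorted(read_positions):
        # A gap larger than max_gap closes the current cluster.
        if cluster and pos - cluster[-1] > max_gap:
            if len(cluster) >= min_reads:
                regions.append((cluster[0], cluster[-1], len(cluster)))
            cluster = []
        cluster.append(pos)
    # Flush the final cluster.
    if len(cluster) >= min_reads:
        regions.append((cluster[0], cluster[-1], len(cluster)))
    return regions

# Hypothetical ChIP read positions on one chromosome:
# 15 reads spaced 10 bp apart, then three scattered reads.
chip_reads = list(range(1000, 1150, 10)) + [5000, 5050, 5400]
regions = candidate_regions(chip_reads)  # [(1000, 1140, 15)]
```

Lowering `min_reads` admits more candidate regions at the cost of more false positives, which is why the threshold has to be tuned per data set.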