Introduction
  Data Convert
 

Convert between different ChIP-Seq data formats:

WIG format to BED format

The wiggle (WIG) format is used to display dense, continuous data in the UCSC Genome Browser. It is line-oriented, and the first line must be a track definition line, which carries several options describing the data. BED format does not require a definition line. In a BED file, every line has the same number of fields, separated by TAB characters. Each BED line has three required fields and up to nine additional optional fields.

See more details on the BED format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1 and the WIG format: http://genome.ucsc.edu/goldenPath/help/wiggle.html.
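The coordinate shift involved in this conversion can be sketched as follows. This is only an illustration, not the converter's actual implementation: the function name wig_to_bed is hypothetical, and the four-column output (chromosome, start, end, value) is an assumed layout. It relies on the UCSC conventions that WIG positions are 1-based while BED starts are 0-based and BED ends are exclusive.

```python
def wig_to_bed(lines):
    """Yield (chrom, start, end, value) tuples in BED coordinates.

    Hypothetical helper: handles variableStep and fixedStep sections only.
    """
    mode = chrom = None
    span = step = 1
    pos = None  # next 1-based position in a fixedStep section
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser definition lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            parts = line.split()
            mode = parts[0]
            opts = dict(p.split("=", 1) for p in parts[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts["step"])
            continue
        if mode == "fixedStep":
            # One value per line; positions advance by `step`.
            yield (chrom, pos - 1, pos - 1 + span, float(line))
            pos += step
        elif mode == "variableStep":
            # Each line is "<1-based position> <value>".
            start_text, value_text = line.split()
            start = int(start_text) - 1
            yield (chrom, start, start + span, float(value_text))
        else:
            raise ValueError("data line before any declaration: " + line)


example = [
    "track type=wiggle_0 name=example",
    "variableStep chrom=chr1 span=2",
    "101 1.5",
    "fixedStep chrom=chr2 start=11 step=5",
    "3",
]
for chrom, start, end, value in wig_to_bed(example):
    print(chrom, start, end, value, sep="\t")
```

Subtracting 1 from the WIG position gives the BED start, and adding the span to that start gives the exclusive BED end.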

Eland format to BED format

An Eland file is a tab-delimited text file, with lines that look like:
>1-1-557-760 TCTTTTATAGCTCCAAACTTTTTTTTA U0 1 0 0 2 63780267 R ..
>1-1-993-248 TTTTCTTTCTTTGTTTTTTTCTTCCTT U0 1 3 153 12 24687605 R ..
>1-1-371-674 AATTTCTCTCTGCTGAAGCTCTTCTTA U1 0 1 0 3 183334250 F .. 23G
>1-1-678-626 ACCCAGAATGCGCTGTTGATTTTTAGT U1 0 1 0 8 70748333 F .. 27G
>1-1-174-418 TGTGTCCCTTTGTAATGAATCACTATC U2 0 0 1 4 103570835 F .. 23G 24C
>1-1-105-657 TTCTAATTTGTATATTTGGACCATTTA U1 0 1 2 12 100574279 R .. 2C

The fields of Eland lines:
1. Sequence name (derived from file name and line number if format is not Fasta)
2. Sequence
3. Type of match:
   NM - no match found.
   QC - no matching done: QC failure (too many Ns basically).
   RM - no matching done: repeat masked (may be seen if repeatFile.txt was specified).
   U0 - Best match found was a unique exact match.
   U1 - Best match found was a unique 1-error match.
   U2 - Best match found was a unique 2-error match.
   R0 - Multiple exact matches found.
   R1 - Multiple 1-error matches found, no exact matches.
   R2 - Multiple 2-error matches found, no exact or 1-error matches.
4. Number of exact matches found.
5. Number of 1-error matches found.
6. Number of 2-error matches found.
The remaining fields are only present if a unique best match was found (i.e. the match code in field 3 begins with "U").
7. Genome file in which match was found.
8. Position of match (bases in file are numbered starting at 1).
9. Direction of match (F=forward strand, R=reverse).
10. How N characters in read were interpreted: ("."=not applicable, "D"=deletion, "I"=insertion).
The remaining fields are only present for a unique inexact match (i.e. the match code was U1 or U2).
11. Position and type of first substitution error (e.g. 12A: base 12 was A, not whatever it was in the read).
12. Position and type of second substitution error, as above.
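The field list above suggests how a single Eland line could be mapped to a BED line. The sketch below is a guess at that mapping, not the converter's actual code. The function name eland_to_bed is hypothetical, the genome file name is used unchanged as the chromosome, the score column is a placeholder 0, and the reported position is assumed to be the leftmost base on the forward strand.

```python
def eland_to_bed(line):
    """Return a 6-column BED line for a uniquely matched read, else None.

    Hypothetical helper; fields are split on any whitespace.
    """
    fields = line.split()
    # Fields 7-10 exist only when the match code starts with "U".
    if len(fields) < 9 or not fields[2].startswith("U"):
        return None
    name = fields[0].lstrip(">")
    seq = fields[1]
    chrom = fields[6]            # assumption: genome file name doubles as chromosome
    start = int(fields[7]) - 1   # Eland positions are 1-based, BED starts are 0-based
    end = start + len(seq)       # BED end is exclusive
    strand = "+" if fields[8] == "F" else "-"
    return "\t".join([chrom, str(start), str(end), name, "0", strand])


print(eland_to_bed(">read1\tACGT\tU0\t1\t0\t0\tchr1.fa\t101\tF\t.."))
```

Reads with match codes NM, QC, RM, R0, R1 or R2 carry no position, so the sketch returns None for them.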

Bowtie format to BED format

Bowtie output is a tab-delimited text file in which each line has 8 fields:
1. Name of read that aligned.
2. Reference strand aligned to, + for forward strand, - for reverse.
3. Name of reference sequence where alignment occurs, or numeric ID if no name was provided.
4. 0-based offset into the forward reference strand where leftmost character of the alignment occurs.
5. Read sequence (reverse-complemented if orientation is -).
6. ASCII-encoded read qualities (reversed if orientation is -). The encoded quality values are on the Phred scale and the encoding is ASCII-offset by 33 (ASCII char !).
7. If -M was specified and the prescribed ceiling was exceeded for this read, this column contains the value of the ceiling, indicating that at least that many valid alignments were found in addition to the one reported. Otherwise, this column contains the number of other instances where the same sequence aligned against the same reference characters as were aligned against in the reported alignment. This is not the number of other places the read aligns with the same number of mismatches. The number in this column is generally not a good proxy for that number (e.g., the number in this column may be '0' while the number of other alignments with the same number of mismatches might be large).
8. Comma-separated list of mismatch descriptors. If there are no mismatches in the alignment, this field is empty. A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5') end of the read.

See more details on the Bowtie format.
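Because the Bowtie offset in field 4 is already 0-based and refers to the leftmost aligned base on the forward strand, mapping a line to BED needs no coordinate shift. A minimal sketch follows. The function name bowtie_to_bed, the six-column output layout, and the placeholder score of 0 are assumptions made for illustration, not the converter's documented behavior.

```python
def bowtie_to_bed(line):
    """Return a 6-column BED line for one Bowtie alignment line.

    Hypothetical helper based on the 8-field layout described above.
    """
    fields = line.rstrip("\n").split("\t")
    name, strand, chrom, offset, seq = fields[:5]
    start = int(offset)      # already 0-based, leftmost position
    end = start + len(seq)   # BED end is exclusive
    return "\t".join([chrom, str(start), str(end), name, "0", strand])


print(bowtie_to_bed("read1\t-\tchr3\t1000\tACGTACGT\tIIIIIIII\t0\t"))
```

The strand symbols + and - are the same in both formats, so field 2 is copied through unchanged.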

Extract columns from BED format files

Enter the field numbers, separated by commas. Those fields are extracted from the input BED file and written to a new tab-delimited file.

For example, extract the following fields from the input file:
   5,3,1,2,4
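The extraction described here could be sketched as below. The function name extract_columns is hypothetical. The sketch assumes that field numbers are 1-based and that the output keeps the order in which they are listed, which is what the example 5,3,1,2,4 appears to imply.

```python
def extract_columns(bed_lines, field_spec):
    """Yield tab-delimited lines holding only the requested fields.

    Hypothetical helper; field_spec is a comma-separated string such as
    "5,3,1,2,4", with fields assumed to be numbered from 1.
    """
    indexes = [int(n) - 1 for n in field_spec.split(",")]
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[i] for i in indexes)


bed = ["chr1\t100\t200\tpeak1\t37\t+"]
for row in extract_columns(bed, "5,3,1,2,4"):
    print(row)
```

With this input the fields come out in the order score, end, chromosome, start, name. A line with fewer fields than the largest requested number raises an IndexError in this sketch.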