icSHAPE-pipe

A pipeline to calculate SHAPE/DMS reactivity scores

File Format

icSHAPE-pipe generates many intermediate files as it runs. This section describes the format of each one.

*.genomeCoor.bed file

The *.genomeCoor.bed file is produced by parseGTF from a GTF or GFF3 file. Each line gives the genomic location of one transcript, together with its gene name, gene ID, and gene type. The file is used as input for annotation and coordinate conversion.

chr1 958246 959256 - NOC2L=ENSG00000188976.10 ENST00000469563.1 retained_intron 959215-959256,958246-959081
chr1 998962 1000172 - HES4=ENSG00000188290.10 ENST00000428771.6 protein_coding 999692-1000172,999526-999613,998962-999432 999974-1000172,998962-999061
  1. Chromosome ID
  2. Chromosome start
  3. Chromosome end
  4. Chromosome strand
  5. gene_name=gene_id
  6. transcript_id
  7. gene_type
  8. exons
  9. utrs
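
The column layout above can be read with a few lines of Python. This is a minimal sketch, not part of icSHAPE-pipe: the function name parse_genome_coor_line is hypothetical, fields are assumed to be whitespace-separated, and the utrs column is assumed to be absent for transcripts without UTRs, as in the first example line.

```python
def parse_genome_coor_line(line):
    """Parse one *.genomeCoor.bed line into a dict (hypothetical helper)."""
    fields = line.split()  # assumption: whitespace-separated columns
    chrom, start, end, strand, gene, transcript_id, gene_type, exons = fields[:8]
    # Column 5 has the form gene_name=gene_id
    gene_name, _, gene_id = gene.partition("=")

    def ranges(text):
        # "959215-959256,958246-959081" -> [(959215, 959256), (958246, 959081)]
        return [tuple(int(x) for x in r.split("-")) for r in text.split(",") if r]

    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "gene_name": gene_name,
        "gene_id": gene_id,
        "transcript_id": transcript_id,
        "gene_type": gene_type,
        "exons": ranges(exons),
        # Column 9 is missing when the transcript has no UTRs (assumption)
        "utrs": ranges(fields[8]) if len(fields) > 8 else [],
    }


record = parse_genome_coor_line(
    "chr1 958246 959256 - NOC2L=ENSG00000188976.10 ENST00000469563.1 "
    "retained_intron 959215-959256,958246-959081"
)
print(record["gene_name"], record["exons"])
```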

*.tab file

The *.tab file is generated by sam2tab, which converts each read in a SAM/BAM file into one record, a simplified representation of the alignment.

chr22 + 10698831 10698842 11574357 11574369
chr22 + 10747846 10747858
  1. Chromosome ID
  2. Chromosome strand
  3. Map start
  4. Map end/Junction start
  5. Junction end
  6. Map end
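
Because a spliced read adds a junction start and a junction end between the map start and map end, the coordinates after the strand can be read as consecutive (start, end) pairs of aligned blocks. The sketch below illustrates this; parse_tab_line is a hypothetical name, fields are assumed to be whitespace-separated, and the assumption that reads with more than one junction continue the same pairing is not confirmed by the examples above.

```python
def parse_tab_line(line):
    """Split one *.tab record into chromosome, strand and aligned blocks."""
    fields = line.split()  # assumption: whitespace-separated columns
    chrom, strand = fields[0], fields[1]
    coords = [int(x) for x in fields[2:]]
    if not coords or len(coords) % 2 != 0:
        raise ValueError("expected an even number of coordinates: " + line)
    # Pair coordinates: (map start, junction start), (junction end, map end), ...
    blocks = list(zip(coords[0::2], coords[1::2]))
    return chrom, strand, blocks


# A spliced read yields two blocks; an unspliced read yields one.
print(parse_tab_line("chr22 + 10698831 10698842 11574357 11574369"))
print(parse_tab_line("chr22 + 10747846 10747858"))
```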

*.gTab file

The *.gTab file is an intermediate file that records reactivity scores by genomic coordinate. The number of columns varies, so the file begins with header lines that declare what each column contains.

@ColNum 9
@ChrID 1
@Strand 2
@ChrPos 3
@Base 4
@N_RT 5
@N_BD 6
@Shape 7
@ShapeNum 8
@WindowShape 9
chr22 + 16405373 A 2 106 0.130102 7 0.142857,0.125,0.125,0.125,0.125,0.125,0.142857,
chr22 + 16405374 C 4 108 0.260204 7 0.285714,0.25,0.25,0.25,0.25,0.25,0.285714,

Lines beginning with @ declare which column holds each field. .gTab files that have a @Shape column can be converted to .shape files.
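
Since the column layout is declared in the header, a reader should look columns up by name instead of by fixed position. The following is a minimal sketch under stated assumptions: read_gtab is a hypothetical name, each header line is assumed to have the form "@Name N" with N a 1-based column number, and fields are assumed to be whitespace-separated. Values are kept as strings.

```python
def read_gtab(lines):
    """Return (column index map, list of records) from *.gTab lines."""
    columns = {}   # field name -> 0-based column index
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("@"):
            name, value = line[1:].split()
            if name != "ColNum":  # @ColNum is the column count, not a field
                columns[name] = int(value) - 1
            continue
        fields = line.split()
        records.append({name: fields[idx] for name, idx in columns.items()})
    return columns, records


example = """@ColNum 9
@ChrID 1
@Strand 2
@ChrPos 3
@Base 4
@N_RT 5
@N_BD 6
@Shape 7
@ShapeNum 8
@WindowShape 9
chr22 + 16405373 A 2 106 0.130102 7 0.142857,0.125,0.125,0.125,0.125,0.125,0.142857,
"""
columns, records = read_gtab(example.splitlines())
print(records[0]["ChrPos"], records[0]["Shape"])
```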

*.shape file

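The *.shape file records the reactivity information for each transcript, one transcript per line. Each line has three leading columns followed by one score per base, so the number of columns equals the transcript length plus three. NULL marks a base with no score.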

yeast_16S 1800 27362.172 NULL NULL NULL NULL NULL 0.052 0.416 0.182 0.956
yeast_23S 3675 266383.172 NULL NULL NULL NULL NULL 0.030 0.261 0.037 0.826
  1. Transcript ID
  2. Transcript Length
  3. RPKM
  4. score for 1st base
  5. score for 2nd base
  6. score for 3rd base
  7. score for 4th base
  8. ...
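
A line in this layout can be parsed as sketched below. The function name parse_shape_line is hypothetical, fields are assumed to be whitespace-separated, RPKM is assumed to be numeric as in the examples above, and NULL is mapped to None. The example lines above are shortened, so the sketch does not check that the number of scores matches the transcript length.

```python
def parse_shape_line(line):
    """Parse one *.shape line into (transcript_id, length, rpkm, scores)."""
    fields = line.split()  # assumption: whitespace-separated columns
    transcript_id = fields[0]
    length = int(fields[1])
    rpkm = float(fields[2])  # assumption: numeric, as in the examples
    # One entry per base; NULL (no score) becomes None
    scores = [None if s == "NULL" else float(s) for s in fields[3:]]
    return transcript_id, length, rpkm, scores


tid, length, rpkm, scores = parse_shape_line(
    "yeast_16S 1800 27362.172 NULL NULL NULL NULL NULL 0.052 0.416 0.182 0.956"
)
valid = [s for s in scores if s is not None]
print(tid, length, len(valid))
```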