Genome annotation files (GTF from Ensembl/Gencode and GFF3 from RefSeq) are collected in this directory. Each *.genomeCoor.bed file is parsed from a GTF/GFF3 file with GAP.
All genome files are located in /150T/zhangqf/GenomeAnnotation/[Gencode|NCBI]
GTF files are collected from Gencode (for human and mouse) or Ensembl (for other species). A GTF file records gene positions and gene structure (UTR and CDS). GTF is version 2 of the GFF format. The following GTF files are provided:
All genome files are located in /150T/zhangqf/GenomeAnnotation/Gencode
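
To inspect a GTF file directly, the sketch below splits one record into its nine tab-separated fields and turns the attribute column into a dictionary. It is a minimal illustration, not part of GAP: the function name `parse_gtf_line` and the sample record are invented here, and the attribute parser assumes that no attribute value contains a semicolon.

```python
def parse_gtf_line(line):
    """Parse one GTF record (hypothetical helper, not part of GAP)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record has exactly 9 tab-separated fields")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes look like: key "value"; key "value";
    # Assumption: values never contain ';'. For repeated keys (e.g. tag)
    # only the first value is kept.
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes.setdefault(key, value.strip().strip('"'))

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Invented record, for illustration only.
sample = 'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G0001"; transcript_id "T0001"; gene_name "GENE1";'
record = parse_gtf_line(sample)
print(record["feature"], record["start"], record["end"], record["attributes"]["gene_id"])
```

Lines that start with `#` are header comments in GTF files and should be skipped before calling the function.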
GFF3 files are collected from NCBI. A GFF3 file records gene positions and gene structure (UTR and CDS). The following GFF3 files are provided:
All genome files are located in /150T/zhangqf/GenomeAnnotation/NCBI
All GTF and GFF3 files are converted to *.genomeCoor.bed files with GAP: parseGTF.py -g genome.gtf -s ensembl -o prefix --genome genome.fasta. The converted files are easier to read than GTF/GFF3 and can be used as input to GAP.
column | content |
---|---|
1 | chromosome id |
2 | chromosome start |
3 | chromosome end |
4 | strand |
5 | gene_name=gene_id |
6 | transcript_id |
7 | gene type |
8 | exons |
9 | utrs |
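
The column layout above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the GAP reader itself: it assumes the file is tab-delimited, that column 5 is split on its first "=", and that the utrs column may be missing on some lines. The exons and utrs columns are kept as raw strings because their internal format is not described here. The names `GenomeCoorRecord` and `parse_genome_coor_line`, and the sample line, are invented for illustration.

```python
from typing import NamedTuple


class GenomeCoorRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    gene_name: str
    gene_id: str
    transcript_id: str
    gene_type: str
    exons: str  # raw text of column 8
    utrs: str   # raw text of column 9, "" if the column is absent


def parse_genome_coor_line(line: str) -> GenomeCoorRecord:
    """Parse one line of a *.genomeCoor.bed file (assumed tab-delimited)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, start, end, strand, gene, transcript_id, gene_type, exons = fields[:8]
    # Assumption: the utrs column can be missing on some lines.
    utrs = fields[8] if len(fields) > 8 else ""
    # Column 5 has the form gene_name=gene_id; split on the first "=".
    gene_name, _, gene_id = gene.partition("=")
    return GenomeCoorRecord(chrom, int(start), int(end), strand,
                            gene_name, gene_id, transcript_id,
                            gene_type, exons, utrs)


# Invented line, for illustration only.
sample = "chr1\t100\t500\t+\tGENE1=G0001\tT0001\tprotein_coding\t100-200,300-500\t100-120"
rec = parse_genome_coor_line(sample)
print(rec.gene_name, rec.gene_id, rec.transcript_id, rec.start, rec.end)
```

To read a whole file, call the function on each non-empty line, for example inside `with open(path) as handle: records = [parse_genome_coor_line(l) for l in handle if l.strip()]`.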
Transcriptome files are extracted from the genome with GAP. They follow the naming pattern *_transcriptome.fa