THE COLLECTION OF COMMON GENOME ANNOTATIONS

GTF (Annotation) files

Genome annotation files (GTF from Ensembl/Gencode and GFF3 from RefSeq) are collected in this directory. The *.genomeCoor.bed files are parsed from the GTF/GFF3 files with GAP.

All annotation files are located in /150T/zhangqf/GenomeAnnotation/[Gencode|NCBI].

gtf files

GTF files are collected from Gencode (for human and mouse) or Ensembl (for other species). A GTF file records gene positions and gene structure (UTRs and CDS). GTF is a stricter variant of version 2 of the GFF format. The following GTF files are currently provided:

hg19.gtf hg38.gtf mm9.gtf mm10.gtf yeast.gtf chicken.gtf chimpanzee.gtf drosophila.gtf macaque.gtf horse.gtf TAIR10.gtf

All GTF files are located in /150T/zhangqf/GenomeAnnotation/Gencode.
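For quick inspection outside GAP, a GTF record can be split with the Python standard library alone. The sketch below is a minimal illustration, not part of GAP: it assumes the nine tab-separated GTF columns and the `key "value";` attribute pairs used by Ensembl/Gencode, and it does not handle semicolons inside quoted values. The function name `parse_gtf_line` is hypothetical.

```python
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_gtf_line(line):
    """Split one GTF record into a dict; column 9 becomes a dict of attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # GTF coordinates are 1-based and inclusive at both ends
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    attrs = {}
    # Attributes look like: gene_id "ENSG..."; transcript_id "ENST..."; tag "basic";
    # Assumption: no attribute value contains a semicolon
    for item in record["attributes"].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        value = value.strip().strip('"')
        if key not in attrs:
            attrs[key] = value
        elif isinstance(attrs[key], list):
            attrs[key].append(value)          # third or later repeat of a key
        else:
            attrs[key] = [attrs[key], value]  # first repeat: promote to a list
    record["attributes"] = attrs
    return record
```

For example, `parse_gtf_line(line)["attributes"]["gene_id"]` returns the gene ID of a record; keys that Gencode repeats on one line, such as `tag`, come back as a list.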

gff3 files

GFF3 files are collected from NCBI (RefSeq). A GFF3 file records gene positions and gene structure (UTRs and CDS). The following GFF3 files are currently provided:

hg19.gff3 hg38.gff3 mm10.gff3 cattle.gff3 chicken.gff3 chimpanzee.gff3 macaque.gff3 rat.gff3 zebrafish.gff3 alpaca.gff3 baboon.gff3 bonobo.gff3 brown_kiwi.gff3 cat.gff3 dolphin.gff3 ferret.gff3 garter_snake.gff3 golden_eagle.gff3 green_monkey.gff3 marmoset.gff3 naked_mole_rat.gff3 opossum.gff3 platypus.gff3 rabbit.gff3 sheep.gff3 yeast.gff3

All GFF3 files are located in /150T/zhangqf/GenomeAnnotation/NCBI.
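GFF3 differs from GTF mainly in the ninth column, which uses `tag=value` pairs separated by semicolons, with multiple values separated by commas and reserved characters percent-encoded. A minimal sketch of an attribute parser, assuming well-formed GFF3 as described in the GFF3 specification; the function name `parse_gff3_attributes` is hypothetical and not part of GAP.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse the 9th GFF3 column into {tag: [values]}.

    Pairs are separated by ';', tags from values by '=', and multiple
    values by ','. Reserved characters are percent-encoded (e.g. %3B for ';',
    %2C for ','), so decoding is done only after splitting.
    """
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs
```

For instance, an attribute column such as `ID=rna-XM_012177871.1;Parent=gene-LOC105611997` (an illustrative value, not copied from the provided files) yields `{'ID': ['rna-XM_012177871.1'], 'Parent': ['gene-LOC105611997']}`.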

*.genomeCoor.bed files

All GTF and GFF3 files are converted to *.genomeCoor.bed files with the GAP script parseGTF.py:

parseGTF.py -g genome.gtf -s ensembl -o prefix --genome genome.fasta

These files are easier to read than the raw annotations, and they can be used as input to GAP. Each line describes one transcript.
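When several annotations need converting, the parseGTF.py call can be scripted. This is a sketch only: it reuses exactly the flags shown in the command, assumes parseGTF.py is on `PATH`, and keeps `-s ensembl` as the default because that is the only source value given here; check the script's own help for the value to use with NCBI GFF3 files. The helper names `build_parse_gtf_command` and `convert` are hypothetical.

```python
import shlex
import subprocess


def build_parse_gtf_command(annotation, prefix, genome_fasta, source="ensembl"):
    """Assemble the documented parseGTF.py call as an argument list."""
    return ["parseGTF.py", "-g", annotation, "-s", source,
            "-o", prefix, "--genome", genome_fasta]


def convert(annotation, prefix, genome_fasta, source="ensembl"):
    """Run parseGTF.py once; raises CalledProcessError if it exits non-zero."""
    cmd = build_parse_gtf_command(annotation, prefix, genome_fasta, source)
    print("Running:", shlex.join(cmd))  # shlex.join requires Python 3.8+
    subprocess.run(cmd, check=True)
```

For example, `convert("/150T/zhangqf/GenomeAnnotation/Gencode/mm10.gtf", "mm10", "mm10.fa")` would run the conversion for mouse mm10, where `mm10.fa` is a hypothetical stand-in for the real genome FASTA path. Passing an argument list instead of a shell string avoids quoting problems with unusual file names.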

chr1 1280077 1284975 - LOC105611997=105611997 XM_012177871.1 mRNA 1281685-1284975,1281575-1281682,1280077-1281572 1282247-1284975,1280077-1281333
chr1 1292405 1293440 - LOC101105206=101105206 XM_004003444.3 mRNA 1292818-1293440,1292405-1292750
chr1 1307762 1309351 - LOC101105463=101105463 XM_012178048.2 mRNA 1307762-1309351 1309072-1309351,1307762-1308141

Column  Content
1       chromosome ID
2       start coordinate
3       end coordinate
4       strand
5       gene_name=gene_id
6       transcript_id
7       gene type (e.g. mRNA)
8       exons (comma-separated start-end ranges)
9       UTRs (comma-separated start-end ranges; may be absent)
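To use *.genomeCoor.bed files in your own scripts, each line can be mapped onto the columns in the table above. A minimal sketch, assuming fields are separated by whitespace (tabs or spaces) and that no field contains spaces; the ninth column is treated as optional because the second example line has no UTR field. The names `CoorRecord` and `parse_genome_coor_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[int, int]


def _ranges(text):
    """Turn '100-200,300-400' into [(100, 200), (300, 400)]; '' gives []."""
    result = []
    for part in text.split(","):
        if part:
            start, end = part.split("-")
            result.append((int(start), int(end)))
    return result


@dataclass
class CoorRecord:
    chrom: str
    start: int
    end: int
    strand: str
    gene_name: str
    gene_id: str
    transcript_id: str
    gene_type: str
    exons: List[Range]
    utrs: List[Range]


def parse_genome_coor_line(line):
    """Map one *.genomeCoor.bed line onto the nine documented columns."""
    fields = line.split()  # assumption: no field contains whitespace
    if len(fields) not in (8, 9):
        raise ValueError(f"expected 8 or 9 columns, got {len(fields)}")
    # Column 5 is gene_name=gene_id; split at the last '='
    gene_name, _, gene_id = fields[4].rpartition("=")
    return CoorRecord(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        strand=fields[3],
        gene_name=gene_name,
        gene_id=gene_id,
        transcript_id=fields[5],
        gene_type=fields[6],
        exons=_ranges(fields[7]),
        # Column 9 (UTRs) may be missing, as in the second example line
        utrs=_ranges(fields[8]) if len(fields) == 9 else [],
    )
```

Exons are kept in the order they appear in the file; in the minus-strand examples they are listed from the highest coordinate down, so sort them first if you need ascending genomic order.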

Transcriptome files

Transcriptome files are extracted from the genome sequence with GAP and follow the naming rule *_transcriptome.fa. An example record:

>XM_014073160.1 106554484|LOC106554484|mRNA
GCCCTCCCTGCTCTCCTAGCCTCACCATGCCCACCATGCTCTCCCTGCTGGGCCAGGGCA
CCACCGGTAAGTCGAGGACTACCAGTATATTCACTCTGCTCCCTTCTCTTCCTGCAGACC
GGGTGGGTTGGCCCCTTCCTGCCGTGGAAGTGGATTATCCCGAAGGCCTTGATCGCTTCA
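Each header appears to carry the transcript ID followed by pipe-separated fields; from the example, these look like gene ID, gene name and gene type, but that layout is inferred from a single record rather than documented. A minimal reader for such files, handling sequences wrapped over several lines, might look like this (the function name `read_transcriptome` is hypothetical):

```python
def read_transcriptome(lines):
    """Yield (transcript_id, header_fields, sequence) for each FASTA record.

    `lines` can be an open file or any iterable of strings. The header is split
    at the first space into the transcript ID and a '|'-separated description.
    """
    tid, fields, chunks = None, [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tid is not None:
                yield tid, fields, "".join(chunks)  # emit the previous record
            name, _, desc = line[1:].partition(" ")
            tid = name
            fields = desc.split("|") if desc else []
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over many lines
    if tid is not None:
        yield tid, fields, "".join(chunks)  # emit the final record
```

For example, `with open("hg38_transcriptome.fa") as fh: seqs = {tid: seq for tid, _, seq in read_transcriptome(fh)}` builds a transcript-to-sequence dictionary; the filename is hypothetical but follows the *_transcriptome.fa naming rule.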