Genome Resources

Collection of Common Genome Annotations and Resources

Genome files .fa .fai

Genome files are stored in FASTA format and cover many species, including human, mouse, chicken and chimpanzee. Data are collected from Ensembl and Gencode. A FASTA index (.fai) file is located in the same directory.

Directory:

/150T/zhangqf/GenomeAnnotation/genome/[*.fa|*.fai]
/150T/zhangqf/GenomeAnnotation/genome/more/[*.fa|*.fai]
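The .fai index is what makes random access into these large FASTA files cheap. The sketch below assumes the standard samtools faidx column layout (name, length, byte offset, bases per line, bytes per line) and fetches a region by seeking straight to it instead of reading the whole chromosome. `read_fai` and `fetch` are hypothetical helper names; in practice `samtools faidx` or pysam does the same job.

```python
def read_fai(fai_path):
    """Parse a samtools-style .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            length, offset, linebases, linewidth = (int(x) for x in fields[1:5])
            index[fields[0]] = (length, offset, linebases, linewidth)
    return index


def fetch(fasta_path, index, chrom, start, end):
    """Return bases [start, end) of chrom (0-based, half-open) by seeking into the FASTA."""
    length, offset, linebases, linewidth = index[chrom]
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""

    def byte_pos(pos):
        # record offset + bytes of the full lines before pos + column within its line
        return offset + (pos // linebases) * linewidth + pos % linebases

    with open(fasta_path, "rb") as fh:
        fh.seek(byte_pos(start))
        raw = fh.read(byte_pos(end - 1) - byte_pos(start) + 1)
    # Remove line terminators that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

For example, `fetch(fa, read_fai(fa + ".fai"), "chr1", 999, 1100)` returns bases 1,000 to 1,100 in 1-based coordinates, assuming the index is named `<genome>.fa.fai` as samtools writes it.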

Transcriptome files _transcriptome.fa

Transcriptome files are stored in FASTA format and cover many species, including human, mouse, chicken and chimpanzee. Transcript sequences are extracted from the genome according to the GTF/GFF3 annotation with GAP:

parseGTF.py -g genome.gtf -s ensembl -o prefix --genome genome.fasta

Directory:

/150T/zhangqf/GenomeAnnotation/genome/Gencode/*_transcriptome.fa
/150T/zhangqf/GenomeAnnotation/genome/NCBI/*_transcriptome.fa
/150T/zhangqf/GenomeAnnotation/genome/NCBI/more/*_transcriptome.fa
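Extracting a transcript sequence amounts to concatenating its exons in genomic order and reverse-complementing transcripts on the minus strand. A minimal sketch of that idea, assuming a GTF whose `exon` records carry a `transcript_id` attribute and a genome already loaded as a {chromosome: sequence} dict. `extract_transcripts` is a hypothetical name, and this is not the code inside GAP's parseGTF.py.

```python
import re
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def extract_transcripts(gtf_lines, genome):
    """Build spliced transcript sequences from GTF 'exon' records.

    gtf_lines: iterable of GTF lines (1-based, inclusive coordinates).
    genome: dict mapping chromosome name -> sequence string.
    """
    exons = defaultdict(list)  # transcript_id -> [(chrom, start, end, strand), ...]
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((fields[0], int(fields[3]), int(fields[4]), fields[6]))

    transcripts = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # genomic order, regardless of strand
        chrom, strand = parts[0][0], parts[0][3]
        seq = "".join(genome[chrom][start - 1:end] for _, start, end, _ in parts)
        transcripts[tid] = reverse_complement(seq) if strand == "-" else seq
    return transcripts
```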

GTF files (Annotation) .gtf .gff3 .genomeCoor.bed

Genome annotation files (GTF from Ensembl/Gencode and GFF3 from RefSeq) are collected in these directories. The *.genomeCoor.bed files are parsed from the GTF/GFF3 files with GAP.

Directory:

/150T/zhangqf/GenomeAnnotation/NCBI/[*.gff3|*.gtf|*.genomeCoor.bed]
/150T/zhangqf/GenomeAnnotation/NCBI/more/[*.gff3|*.gtf|*.genomeCoor.bed]
/150T/zhangqf/GenomeAnnotation/Gencode/[*.gff3|*.gtf|*.genomeCoor.bed]
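GTF (Ensembl/Gencode) and GFF3 (RefSeq) differ mainly in the attribute column (column 9): GTF writes `key "value";` pairs, while GFF3 writes `key=value` pairs with comma-separated multiple values and percent-encoded special characters. A small sketch that parses both into the same {key: [values]} shape. The function names are hypothetical, and splitting on `;` assumes no quoted GTF value itself contains a semicolon.

```python
from urllib.parse import unquote


def parse_gtf_attributes(text):
    """GTF column 9: key "value"; key "value"; ... (keys such as 'tag' may repeat)."""
    attrs = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs.setdefault(key, []).append(value.strip().strip('"'))
    return attrs


def parse_gff3_attributes(text):
    """GFF3 column 9: key=value;key=v1,v2 with percent-encoded ';', '=', ',' etc."""
    attrs = {}
    for item in text.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        # Split on literal commas first, then decode escapes such as %2C and %3B.
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs
```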

Rfam database .dot .seed

The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs). Conserved human RNA structures (*.dot) have been parsed from the multiple sequence alignment files (*.seed).

Directory:

/150T/zhangqf/GenomeAnnotation/Rfam/Parsed_Structure/human.dot
/150T/zhangqf/GenomeAnnotation/Rfam/14.1/Rfam.seed
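Rfam.seed is a multi-family Stockholm file: each family's consensus structure sits on `#=GC SS_cons` lines in WUSS notation, and each family record ends with `//`. The sketch below pulls out the accession, ID and SS_cons of every family and collapses WUSS into plain dot-bracket. It drops pseudoknot letter pairs and is only an illustration; it does not reproduce how human.dot was generated. Helper names are hypothetical.

```python
def read_stockholm_ss_cons(lines):
    """Yield (accession, family_id, ss_cons) for each record of a Stockholm file."""
    acc = fam_id = None
    ss_chunks = []
    for line in lines:
        fields = line.split(None, 2)
        if fields[:2] == ["#=GF", "AC"]:
            acc = fields[2].strip()
        elif fields[:2] == ["#=GF", "ID"]:
            fam_id = fields[2].strip()
        elif fields[:2] == ["#=GC", "SS_cons"]:
            ss_chunks.append(fields[2].strip())  # may be split over several blocks
        elif fields[:1] == ["//"]:
            yield acc, fam_id, "".join(ss_chunks)
            acc, fam_id, ss_chunks = None, None, []


def wuss_to_dotbracket(ss):
    """Map WUSS nested-pair brackets to '()' and everything else to '.'.

    Pseudoknot letter pairs (Aa, Bb, ...) are treated as unpaired in this sketch.
    """
    out = []
    for ch in ss:
        if ch in "<([{":
            out.append("(")
        elif ch in ">)]}":
            out.append(")")
        else:
            out.append(".")
    return "".join(out)
```

Opening the file with `encoding="latin-1"` avoids decode errors if it contains any non-UTF-8 bytes, e.g. `with open("/150T/zhangqf/GenomeAnnotation/Rfam/14.1/Rfam.seed", encoding="latin-1") as fh: families = list(read_stockholm_ss_cons(fh))`.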

Known structures .dot

Known RNA structures have been collected from various databases and publications. Conserved rRNA structures from the CRW database are also included.

Directory:

/150T/zhangqf/GenomeAnnotation/Known_Structures/*.dot
/150T/zhangqf/GenomeAnnotation/Known_Structures/CRW/*.dot
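Most work with these .dot structures starts by turning the dot-bracket string into a list of base pairs. A minimal stack-based sketch: each bracket type gets its own stack, so pseudoknots written with `[]` or `{}` alongside `()` are still paired correctly. `dotbracket_to_pairs` is a hypothetical helper, and reading the sequence and structure lines out of a .dot file is left out because the exact file layout is not documented here.

```python
def dotbracket_to_pairs(structure):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string.

    Raises ValueError on unbalanced brackets.
    """
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unbalanced '{ch}' at position {pos}")
            pairs.append((stack.pop(), pos))
        # any other character ('.', ',', ...) is treated as unpaired
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unbalanced '{opener}' at position {stack[-1]}")
    return sorted(pairs)
```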

Small RNAs .fa .dot .bed

Well-known small RNAs (tRNA, miRNA, snoRNA and snRNA) have been collected from various databases. Secondary structures are provided for tRNAs and miRNAs, and box motifs (C/D box and H/ACA box) are provided for snoRNAs.

Human and mouse rRNAs (including pre-rRNAs) are also included.

Directory:

/150T/zhangqf/GenomeAnnotation/rRNA
/150T/zhangqf/GenomeAnnotation/tRNA
/150T/zhangqf/GenomeAnnotation/miRNA
/150T/zhangqf/GenomeAnnotation/snoRNA
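Box C/D snoRNAs carry a C box (consensus RUGAUGA) near the 5' end and a D box (consensus CUGA) near the 3' end. As an illustration only, the sketch below scans a sequence for exact matches to those consensus motifs within a fixed distance of each end. `find_cd_boxes` and the 30-nt `window` are hypothetical; real box annotation tolerates mismatches, H/ACA boxes are not covered, and this is not how the provided box annotations were produced.

```python
import re

# Consensus motifs written in DNA letters (U -> T); R = A or G.
# The lookahead lets overlapping matches be reported.
C_BOX = re.compile(r"(?=([AG]TGATGA))")
D_BOX = re.compile(r"(?=(CTGA))")


def find_cd_boxes(seq, window=30):
    """Return 1-based starts of exact C box hits in the first `window` nt
    and exact D box hits in the last `window` nt."""
    seq = seq.upper().replace("U", "T")
    c_hits = [m.start() + 1 for m in C_BOX.finditer(seq) if m.start() < window]
    d_from = max(0, len(seq) - window)
    d_hits = [m.start() + 1 for m in D_BOX.finditer(seq, d_from)]
    return c_hits, d_hits
```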

Index bowtie2 STAR hisat2

Genome indexes for bowtie2, STAR and hisat2 are pre-built for human, mouse and yeast. The STAR and hisat2 indexes were built with an annotation file, so they include splice junction information.

Directory:

/150T/zhangqf/GenomeAnnotation/INDEX/bowtie2
/150T/zhangqf/GenomeAnnotation/INDEX/STAR (with splice junctions)
/150T/zhangqf/GenomeAnnotation/INDEX/hisat2 (with splice junctions)

Warning: STAR indexes are not compatible between STAR versions, and the STAR version has changed over time. Use /150T/zhangqf/GenomeAnnotation/INDEX/bin/STAR for mapping so that the binary matches these indexes.
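A sketch of assembling alignment commands against these indexes from Python, following the warning to use the bundled STAR binary. It uses only standard STAR options (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--readFilesCommand`, `--outFileNamePrefix`, `--outSAMtype`) and the basic options shared by bowtie2 and hisat2 (`-p`, `-x`, `-1`/`-2`, `-U`, `-S`). The per-species index paths are not listed in this document, so any path you pass in is your own; the function names are hypothetical.

```python
import subprocess

# Bundled STAR binary, which matches the version used for the prebuilt STAR indexes.
STAR_BIN = "/150T/zhangqf/GenomeAnnotation/INDEX/bin/STAR"


def star_command(genome_dir, read1, read2=None, prefix="sample.", threads=8):
    """Build a STAR alignment command that writes a coordinate-sorted BAM."""
    cmd = [STAR_BIN, "--runThreadN", str(threads), "--genomeDir", genome_dir,
           "--readFilesIn", read1]
    if read2:
        cmd.append(read2)
    if read1.endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]  # let STAR read gzipped FASTQ
    cmd += ["--outFileNamePrefix", prefix, "--outSAMtype", "BAM", "SortedByCoordinate"]
    return cmd


def short_read_command(aligner, index_prefix, read1, read2=None, sam="out.sam", threads=8):
    """Build a bowtie2 or hisat2 command; both accept these basic options."""
    if aligner not in ("bowtie2", "hisat2"):
        raise ValueError("aligner must be 'bowtie2' or 'hisat2'")
    cmd = [aligner, "-p", str(threads), "-x", index_prefix]
    cmd += ["-1", read1, "-2", read2] if read2 else ["-U", read1]
    return cmd + ["-S", sam]


def run(cmd):
    """Execute a command list, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(star_command("/150T/zhangqf/GenomeAnnotation/INDEX/STAR/<species_index>", "R1.fq.gz", "R2.fq.gz"))`, where `<species_index>` stands for whichever per-species subdirectory applies.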

Chain files .chain.gz

Chain files are used to convert coordinates between genome assemblies, such as hg19 => hg38 and mm9 => mm10.

Directory:

/150T/zhangqf/GenomeAnnotation/chain
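A chain file describes ungapped aligned blocks between a target (old) and a query (new) assembly: a `chain` header line (score, tName, tSize, tStrand, tStart, tEnd, qName, qSize, qStrand, qStart, qEnd, id) is followed by `size dt dq` lines and a final `size` line. To show how a coordinate is converted, here is a minimal sketch that lifts a single 0-based position. It handles only '+'-strand query chains and returns the first matching chain rather than the best-scoring one; the helper names are hypothetical. For real conversions, UCSC liftOver or CrossMap are the usual tools.

```python
import gzip


def parse_chains(lines):
    """Return a list of (header_fields, blocks) from chain-format lines."""
    chains, header, blocks = [], None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            if header is not None:
                chains.append((header, blocks))
            header, blocks = fields, []
        else:
            blocks.append([int(x) for x in fields])  # [size, dt, dq] or final [size]
    if header is not None:
        chains.append((header, blocks))
    return chains


def read_chains(path):
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return parse_chains(fh)


def lift_position(chains, chrom, pos):
    """Map a 0-based target position to (query_chrom, query_pos), or None if unmapped."""
    for header, blocks in chains:
        if header[2] != chrom or header[9] != "+":
            continue  # sketch: '-' strand query chains are skipped
        t, q = int(header[5]), int(header[10])
        if not t <= pos < int(header[6]):
            continue
        for block in blocks:
            size = block[0]
            if t <= pos < t + size:
                return header[7], q + (pos - t)
            if len(block) == 3:
                t += size + block[1]  # skip gap in target
                q += size + block[2]  # skip gap in query
    return None  # position falls in a gap or outside every chain
```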

Size files .size

Size files record the length of each chromosome and can be used as the genome file input for bedtools. STAR also produces an equivalent file when building an index (chrNameLength.txt).

Directory:

/150T/zhangqf/GenomeAnnotation/size
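Because the first two columns of a .fai index are the sequence name and its length, a size file can also be derived for any genome in the genome directory. A minimal sketch (hypothetical helper names) that writes the tab-separated `chrom<TAB>length` layout bedtools expects for its genome file, with an optional filter, for example to drop contigs whose names contain an underscore. From the shell, `cut -f1,2 genome.fa.fai` gives the same two columns.

```python
def fai_to_sizes(fai_path, out_path, keep=None):
    """Write 'chrom<TAB>length' lines from a .fai index; keep(name) can filter names."""
    with open(fai_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if not line.strip():
                continue
            name, length = line.split("\t")[:2]
            if keep is None or keep(name):
                fout.write(f"{name}\t{length}\n")


def read_sizes(path):
    """Load a size file into {chrom: length}."""
    sizes = {}
    with open(path) as fh:
        for line in fh:
            if line.strip():
                name, length = line.split()[:2]
                sizes[name] = int(length)
    return sizes
```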