Collection of Common Genome Annotations and Resources
Genome files are stored in FASTA format and cover many species, such as human, mouse, chicken and chimpanzee. Data from Ensembl and Gencode are collected. A FASTA index (.fai) file is located in the same directory as each genome file.
Directory:
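The .fai index makes it possible to read a region without loading the whole genome. The following is a minimal sketch, assuming the index follows the usual five-column samtools faidx layout (name, length, byte offset, bases per line, bytes per line). The function names load_fai and fetch are illustrative and are not part of this collection.

```python
def load_fai(fai_path):
    """Return {name: (length, offset, line_bases, line_width)} from a .fai file."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            length, offset, line_bases, line_width = map(int, fields[1:5])
            index[fields[0]] = (length, offset, line_bases, line_width)
    return index


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of sequence `name` (0-based, half-open)."""
    length, offset, line_bases, line_width = index[name]
    start = max(0, start)
    end = min(end, length)
    if start >= end:
        return ""
    # Byte position of a base = sequence offset
    #   + (number of full lines before it) * bytes per line
    #   + column within its line.
    first = offset + (start // line_bases) * line_width + start % line_bases
    last = offset + ((end - 1) // line_bases) * line_width + (end - 1) % line_bases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first + 1)
    # Drop the line breaks that fall inside the requested window.
    return raw.decode("ascii").replace("\n", "").replace("\r", "")
```

For example, fetch("genome.fa", load_fai("genome.fa.fai"), "chr1", 0, 60) would return the first 60 bases of chr1, where the file names are placeholders for whichever genome is used.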
Transcriptome files are stored in FASTA format and cover many species, such as human, mouse, chicken and chimpanzee. The sequences are retrieved from the genome using the GTF/GFF3 annotation with GAP: parseGTF.py -g genome.gtf -s ensembl -o prefix --genome genome.fasta
Directory:
Genome annotation files (GTF from Ensembl/Gencode and GFF3 from RefSeq) are collected in this directory. The *.genomeCoor.bed files are parsed from the GTF/GFF3 files with GAP.
Directory:
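For quick inspection of the GTF files without GAP, one feature line can be split into its nine standard columns. This is a minimal sketch, assuming the standard GTF layout (tab-separated columns, 1-based inclusive coordinates, attributes written as key "value"; pairs). The function name parse_gtf_line is illustrative. It does not reproduce what parseGTF.py does, and it does not handle GFF3 attributes (key=value).

```python
import re

# Matches `key "quoted value";` as well as unquoted values such as `exon_number 1;`.
_ATTRIBUTE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_gtf_line(line):
    """Parse one GTF feature line into a dict, or return None for comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    attributes = {}
    for match in _ATTRIBUTE.finditer(fields[8]):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Keys such as `tag` may repeat; this sketch keeps the first value only.
        attributes.setdefault(key, value)
    return {
        "seqname": fields[0],
        "source": fields[1],
        "feature": fields[2],
        "start": int(fields[3]),  # 1-based, inclusive
        "end": int(fields[4]),    # 1-based, inclusive
        "strand": fields[6],
        "attributes": attributes,
    }
```

Keeping only the first value of a repeated key is a simplification; a list per key would be needed if every tag matters.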
The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs). Conserved human RNA structures (*.dot) have been parsed from the multiple-alignment seed files (*.seed).
Directory:
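A dot-bracket structure can be converted to a list of base pairs with a simple stack. This is a minimal sketch, assuming the structure is a plain dot-bracket string that uses only "(", ")" and unpaired symbols. The exact layout of the *.dot files is not described here, so extracting the structure line from a file is left out. The function name dot_to_pairs is illustrative, and pseudoknot brackets such as "[" and "]" are treated as unpaired.

```python
def dot_to_pairs(structure):
    """Return sorted (i, j) base pairs, 0-based, from a dot-bracket string."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))  # closes the most recent '('
        # every other symbol is treated as unpaired
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)
```

For the hairpin "((..))" this gives the pairs (0, 5) and (1, 4).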
Known RNA structures are collected from various databases and publications. The conserved rRNA structures from the CRW database are also included.
Directory:
Well-known small RNAs (tRNA, miRNA, snoRNA and snRNA) are collected from various databases. Structures are provided for tRNA and miRNA, and box annotations (C/D box and H/ACA box) are provided for snoRNA.
rRNAs (including pre-rRNAs) of human and mouse are also collected.
Directory:
Genome indexes for bowtie2, STAR and hisat2 are pre-built for human, mouse and yeast. An annotation file was provided when building the STAR and hisat2 indexes.
Directory:
Chain files are used to convert coordinates between genome versions, such as hg19=>hg38 and mm9=>mm10.
Directory:
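To check which assemblies and chromosomes a chain file connects, its header lines can be read directly. This is a minimal sketch, assuming the UCSC chain format, where each alignment block starts with a 13-field line beginning with the word "chain". The names ChainHeader and parse_chain_header are illustrative. The sketch only reads headers and does not convert coordinates.

```python
from collections import namedtuple

# Field order follows the UCSC chain header: the target (t_*) is the source
# assembly and the query (q_*) is the assembly being converted to.
ChainHeader = namedtuple(
    "ChainHeader",
    "score t_name t_size t_strand t_start t_end "
    "q_name q_size q_strand q_start q_end chain_id",
)


def parse_chain_header(line):
    """Parse one 'chain ...' header line into a ChainHeader."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError("not a chain header line: %r" % line)
    return ChainHeader(
        score=int(fields[1]),
        t_name=fields[2], t_size=int(fields[3]), t_strand=fields[4],
        t_start=int(fields[5]), t_end=int(fields[6]),
        q_name=fields[7], q_size=int(fields[8]), q_strand=fields[9],
        q_start=int(fields[10]), q_end=int(fields[11]),
        chain_id=fields[12],
    )
```

Iterating over the lines of a chain file and calling parse_chain_header on those that start with "chain" gives a quick summary of the chromosome pairs it covers.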
Size files record the length of each chromosome. They can be used as input for bedtools. A size file is also produced when building an index with STAR (file: chrNameLength.txt).
Directory:
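A size file is a two-column, tab-separated table of chromosome name and length, which is also what the first two columns of a .fai index contain. This is a minimal sketch, under that assumption, for reading a size file and for deriving one from a .fai file. The function names read_chrom_sizes and sizes_from_fai are illustrative.

```python
def read_chrom_sizes(path):
    """Return {chromosome: length} from a two-column, tab-separated size file."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, length = line.split("\t")[:2]
            sizes[name] = int(length)
    return sizes


def sizes_from_fai(fai_path, out_path):
    """Write a size file from the first two columns of a .fai index."""
    # A .fai file has at least two columns, so it can be read the same way.
    sizes = read_chrom_sizes(fai_path)
    with open(out_path, "w") as out:
        for name, length in sizes.items():
            out.write("%s\t%d\n" % (name, length))
    return sizes
```

The dictionary keeps the order of the input file, so the written size file lists chromosomes in the same order as the index.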