Index

Genome indexes for bowtie2 and STAR are pre-built for human, mouse and yeast. An annotation file is supplied when the STAR index is built.

All indexes are located in /150T/zhangqf/GenomeAnnotation/INDEX

STAR index

STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. GTF/GFF3 annotation files are supplied when the index is built.

The index-building command is recorded in the file run.sh in each folder. Most indexes were built with the command icSHAPE-pipe starbuild -i genome.fa -o ./ --gtf gtffile -p 20 --noscaffold. This means the index covers chromosomes only; the scaffolds in the genome are removed. icSHAPE-pipe is available on GitHub.

The STAR indexes are located in /150T/zhangqf/GenomeAnnotation/INDEX/STAR

Directory	content
hg38_Gencode	Human (hg38) genome index with Gencode annotation
hg38_NCBI	Human (hg38) genome index with NCBI annotation
mm10_Gencode	Mouse (mm10) genome index with Gencode annotation
mm10_NCBI	Mouse (mm10) genome index with NCBI annotation
yeast_Gencode	Yeast genome index with Gencode annotation
human_rRNA_tRNA_mtRNA	Human rRNA, tRNA and mtRNA
mouse_rRNA_tRNA_mtRNA	Mouse rRNA, tRNA and mtRNA

Warning: STAR changes between versions, and an index built with one version is not compatible with other versions. Use /150T/zhangqf/GenomeAnnotation/INDEX/bin/STAR for mapping.
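A minimal sketch of mapping with the pinned STAR binary, following the warning above. The input file reads.fastq, the output prefix star_out/, single-end reads and the thread count are placeholders chosen for illustration, not part of the original setup. The guard skips the run when the binary or the reads are missing.

```shell
# Sketch: map single-end reads with the STAR binary that matches the indexes.
STAR_BIN=/150T/zhangqf/GenomeAnnotation/INDEX/bin/STAR
STAR_INDEX=/150T/zhangqf/GenomeAnnotation/INDEX/STAR/hg38_Gencode
STAR_READS=reads.fastq      # placeholder input file (assumption)
STAR_PREFIX=star_out/       # placeholder output prefix (assumption)

run_star() {
    "$STAR_BIN" --runThreadN 20 \
        --genomeDir "$STAR_INDEX" \
        --readFilesIn "$STAR_READS" \
        --outFileNamePrefix "$STAR_PREFIX" \
        --outSAMtype BAM SortedByCoordinate
}

# Only run when the pinned binary and the reads are actually present.
if [ -x "$STAR_BIN" ] && [ -f "$STAR_READS" ]; then
    mkdir -p "$STAR_PREFIX"
    run_star
else
    echo "skipped: $STAR_BIN or $STAR_READS not found"
fi
```

Calling the binary by its full path, rather than whichever STAR is first on PATH, is what keeps the aligner version in step with the index.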

hisat2 index New!

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). GTF/GFF3 annotation files are supplied when the index is built.

The index-building command is recorded in the file run.sh in each folder. Most indexes were built with the commands hisat2_extract_splice_sites.py hg38.gtf > hg38.splice; faformat -in ${genome} -out hg38.fa -fp_chrid "^chr"; hisat2-build --ss hg38.splice -p 20 hg38.fa hg38. This means the index covers chromosomes only; the scaffolds in the genome are removed.
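To illustrate what "chromosomes only" means in practice, the sketch below applies the same ID pattern "^chr" to a toy FASTA file. It uses awk as a stand-in and is not how faformat is implemented; the file names, sequence names and sequences are made up for the example.

```shell
# Toy FASTA: two chromosomes plus one scaffold (made-up names and sequences).
cat > toy_genome.fa <<'EOF'
>chr1 first chromosome
ACGTACGT
ACGT
>scaffold_0001 unplaced scaffold
GGGGCCCC
>chrM mitochondrion
TTTTAAAA
EOF

# On each header line, decide whether the record ID (first word, without ">")
# starts with "chr"; print header and sequence lines only while keep is true.
awk '/^>/ { keep = (substr($1, 2) ~ /^chr/) } keep' toy_genome.fa > toy_chrom_only.fa

# List the records that survived the filter.
grep '^>' toy_chrom_only.fa
```

Note that a pattern like this only removes scaffolds whose names do not begin with "chr", so the result depends on the naming scheme of the input genome.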

The hisat2 indexes are located in /150T/zhangqf/GenomeAnnotation/INDEX/hisat2

Directory	content
hg38_Gencode	Human (hg38) genome index with Gencode annotation
hg38_NCBI	Human (hg38) genome index with NCBI annotation
mm10_Gencode	Mouse (mm10) genome index with Gencode annotation
mm10_NCBI	Mouse (mm10) genome index with NCBI annotation
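A hedged sketch of mapping against one of these indexes. The index basename hg38 inside hg38_Gencode is an assumption inferred from the hisat2-build command above; check run.sh or the file names in the folder before relying on it. reads.fastq and hisat2_out.sam are placeholder names, and single-end reads are assumed.

```shell
# Sketch: single-end mapping with hisat2 against a pre-built index.
# The trailing "hg38" is the assumed index basename, not a confirmed path.
HISAT2_INDEX=/150T/zhangqf/GenomeAnnotation/INDEX/hisat2/hg38_Gencode/hg38
HISAT2_READS=reads.fastq    # placeholder input file (assumption)
HISAT2_SAM=hisat2_out.sam   # placeholder output file (assumption)

# Only run when hisat2 is installed and the reads exist.
if command -v hisat2 >/dev/null 2>&1 && [ -f "$HISAT2_READS" ]; then
    hisat2 -p 20 -x "$HISAT2_INDEX" -U "$HISAT2_READS" -S "$HISAT2_SAM"
else
    echo "skipped: hisat2 or $HISAT2_READS not found"
fi
```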

bowtie2 index

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

The index-building command is recorded in the file run.sh in each folder. Most indexes were built with the command faformat -in genome.fa -out tmp.genome.fa -fp_chrid "^chr" followed by bowtie2-build --threads 20 tmp.genome.fa genome. This means the index covers chromosomes only; the scaffolds in the genome are removed.

The bowtie2 indexes are located in /150T/zhangqf/GenomeAnnotation/INDEX/bowtie2

Directory	content
hg38	Human (hg38) genome index
mm10	Mouse (mm10) genome index
human_rRNA_tRNA_mtRNA	Human rRNA, tRNA and mtRNA
mouse_rRNA_tRNA_mtRNA	Mouse rRNA, tRNA and mtRNA
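A hedged sketch of mapping against the hg38 index. The basename genome is an assumption taken from the bowtie2-build command above; confirm it against run.sh or the .bt2 file names in the folder. reads.fastq and bowtie2_out.sam are placeholder names, and single-end reads are assumed.

```shell
# Sketch: single-end mapping with bowtie2 against a pre-built index.
# The trailing "genome" is the assumed index basename, not a confirmed path.
BOWTIE2_INDEX=/150T/zhangqf/GenomeAnnotation/INDEX/bowtie2/hg38/genome
BOWTIE2_READS=reads.fastq     # placeholder input file (assumption)
BOWTIE2_SAM=bowtie2_out.sam   # placeholder output file (assumption)

# Only run when bowtie2 is installed and the reads exist.
if command -v bowtie2 >/dev/null 2>&1 && [ -f "$BOWTIE2_READS" ]; then
    bowtie2 -p 20 -x "$BOWTIE2_INDEX" -U "$BOWTIE2_READS" -S "$BOWTIE2_SAM"
else
    echo "skipped: bowtie2 or $BOWTIE2_READS not found"
fi
```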

blastn index New!

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.

The index-building command is recorded in the file run.sh in each folder. Most indexes were built with the command faformat -in /path/hg38.fa -out /dev/stdout -fp_chrid "^chr" | awk '{print $1}' > hg38.fa followed by makeblastdb -in hg38.fa -dbtype nucl -title hg38 -parse_seqids -out hg38. This means the index covers chromosomes only; the scaffolds in the genome are removed.

The blastn indexes are located in /150T/zhangqf/GenomeAnnotation/INDEX/blast

Directory	content
hg38	Human (hg38) genome index
mm10	Mouse (mm10) genome index
zebrafish_ncbi	Zebrafish (from NCBI) genome index

Basic blastn usage: blastn -query query.fa -db blastdb -out result.txt

For a very sensitive search that reports almost all hits for short queries (<50 nt): blastn -db blastdb -query query.fa -word_size 4 -task blastn-short -gapopen 1 -outfmt 7 -penalty -1 -num_threads 20 -evalue 1000 -out results.tabular
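With -evalue 1000 the tabular output usually contains many weak hits, so a post-filter is often useful. The sketch below assumes the default -outfmt 7 column order (query, subject, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score). The rows are made up for illustration, and the thresholds (identity of at least 95 and length of at least 20) and file names are arbitrary examples.

```shell
# Made-up example of -outfmt 7 output: "#" comment lines plus tab-separated hits.
{
    printf '%s\n' '# Query: q1'
    printf '%s\n' '# 3 hits found'
    printf 'q1\tchr1\t100.000\t30\t0\t0\t1\t30\t1000\t1029\t1e-08\t60.0\n'
    printf 'q1\tchr2\t90.000\t20\t2\t0\t5\t24\t500\t519\t0.5\t28.0\n'
    printf 'q1\tchr3\t100.000\t8\t0\t0\t10\t17\t42\t49\t900\t16.0\n'
} > toy_results.tabular

# Skip comment lines; keep hits with identity >= 95 and alignment length >= 20.
# Print query, subject, subject start, subject end and e-value.
awk -F'\t' -v OFS='\t' \
    '!/^#/ && $3 >= 95 && $4 >= 20 { print $1, $2, $9, $10, $11 }' \
    toy_results.tabular > toy_results.filtered.tsv

cat toy_results.filtered.tsv
```

To apply the same filter to real output, replace toy_results.tabular with results.tabular and adjust the thresholds to the query length.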