Genome indexes for Bowtie 2 and STAR are pre-built for human, mouse and yeast. An annotation file was supplied when building each STAR index.
All indexes are located in `/150T/zhangqf/GenomeAnnotation/INDEX`.
STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. GTF/GFF3 annotation files were supplied when building the indexes.
The index-building command is recorded in the file `run.sh` in each folder. Most indexes were built with the command `icSHAPE-pipe starbuild -i genome.fa -o ./ --gtf gtffile -p 20 --noscaffold`. That is, the index is built for chromosomes only and the scaffolds in the genome are removed. icSHAPE-pipe is available on GitHub.
The STAR indexes are located in `/150T/zhangqf/GenomeAnnotation/INDEX/STAR`.
Directory | Content |
---|---|
hg38_Gencode | Human (hg38) genome index with Gencode annotation |
hg38_NCBI | Human (hg38) genome index with NCBI annotation |
mm10_Gencode | Mouse (mm10) genome index with Gencode annotation |
mm10_NCBI | Mouse (mm10) genome index with NCBI annotation |
yeast_Gencode | Yeast genome index with Gencode annotation |
human_rRNA_tRNA_mtRNA | Human rRNA, tRNA and mtRNA |
mouse_rRNA_tRNA_mtRNA | Mouse rRNA, tRNA and mtRNA |
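
As a rough sketch of how one of these directories might be used for mapping, the helper below passes the folder to STAR as `--genomeDir`. This assumes the index files sit directly in the listed folder (the build command above writes to `./`). The helper name `map_with_star` and the file names are hypothetical, and the thread count and output type are only example settings.

```shell
# Root of the pre-built indexes (path taken from this page).
INDEX_ROOT=/150T/zhangqf/GenomeAnnotation/INDEX
STAR_INDEX="$INDEX_ROOT/STAR/hg38_Gencode"   # any directory from the table above

# map_with_star READS OUT_PREFIX   (hypothetical helper, not part of STAR)
# The directory part of OUT_PREFIX must already exist.
map_with_star() {
    STAR --runThreadN 20 \
         --genomeDir "$STAR_INDEX" \
         --readFilesIn "$1" \
         --outFileNamePrefix "$2" \
         --outSAMtype BAM SortedByCoordinate
}

# Example call (file names are placeholders):
# map_with_star reads.fastq star_out/sample1_
```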
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). GTF/GFF3 annotation files were supplied when building the indexes.
The index-building command is recorded in the file `run.sh` in each folder. Most indexes were built with three commands in sequence: `hisat2_extract_splice_sites.py hg38.gtf > hg38.splice`, then `faformat -in ${genome} -out hg38.fa -fp_chrid "^chr"`, then `hisat2-build --ss hg38.splice -p 20 hg38.fa hg38`. That is, the index is built for chromosomes only and the scaffolds in the genome are removed.
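
The `faformat -fp_chrid "^chr"` step appears to keep only sequences whose ID matches the given pattern. `faformat` is not documented on this page, so its exact behaviour is an assumption; as a stand-in, the same kind of filtering could be sketched with `awk`. The function name `filter_fasta` is hypothetical.

```shell
# filter_fasta PATTERN < in.fa > out.fa   (hypothetical helper)
# Keeps FASTA records whose sequence ID (first word after ">") matches PATTERN.
filter_fasta() {
    awk -v pat="$1" '
        /^>/ { id = substr($1, 2); keep = (id ~ pat) }  # header line: test the ID
        keep                                            # print lines of kept records
    '
}

# Example call (file names are placeholders):
# filter_fasta "^chr" < genome.fa > genome.chr.fa
```

Note that this only removes scaffolds when their IDs do not start with `chr`; scaffolds named like `chrUn_...` would still pass the `^chr` pattern.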
The HISAT2 indexes are located in `/150T/zhangqf/GenomeAnnotation/INDEX/hisat2`.
Directory | Content |
---|---|
hg38_Gencode | Human (hg38) genome index with Gencode annotation |
hg38_NCBI | Human (hg38) genome index with NCBI annotation |
mm10_Gencode | Mouse (mm10) genome index with Gencode annotation |
mm10_NCBI | Mouse (mm10) genome index with NCBI annotation |
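
A minimal sketch of single-end mapping against one of these indexes follows. HISAT2 expects an index basename rather than a folder; the basename `hg38` used here is an assumption taken from the `hisat2-build` command quoted above, so check `run.sh` in the folder before relying on it. The helper name `map_with_hisat2` and the file names are hypothetical.

```shell
INDEX_ROOT=/150T/zhangqf/GenomeAnnotation/INDEX
# Assumption: the index basename inside the folder is "hg38".
HISAT2_INDEX="$INDEX_ROOT/hisat2/hg38_Gencode/hg38"

# map_with_hisat2 READS OUT_SAM   (hypothetical helper, not part of HISAT2)
map_with_hisat2() {
    hisat2 -p 20 -x "$HISAT2_INDEX" -U "$1" -S "$2"
}

# Example call (file names are placeholders):
# map_with_hisat2 reads.fastq sample1.sam
```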
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
The index-building command is recorded in the file `run.sh` in each folder. Most indexes were built with the command `faformat -in genome.fa -out tmp.genome.fa -fp_chrid "^chr"` followed by `bowtie2-build --threads 20 tmp.genome.fa genome`. That is, the index is built for chromosomes only and the scaffolds in the genome are removed.
The Bowtie 2 indexes are located in `/150T/zhangqf/GenomeAnnotation/INDEX/bowtie2`.
Directory | Content |
---|---|
hg38 | Human (hg38) genome index |
mm10 | Mouse (mm10) genome index |
human_rRNA_tRNA_mtRNA | Human rRNA, tRNA and mtRNA |
mouse_rRNA_tRNA_mtRNA | Mouse rRNA, tRNA and mtRNA |
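
Since Bowtie 2 supports paired-end alignment, a paired-end run against the `hg38` index might look like the sketch below. The index basename `genome` is an assumption taken from the `bowtie2-build` command quoted above, so confirm it in the folder's `run.sh`. The helper name `map_pairs_with_bowtie2` and the file names are hypothetical.

```shell
INDEX_ROOT=/150T/zhangqf/GenomeAnnotation/INDEX
# Assumption: the index basename inside the folder is "genome".
BOWTIE2_INDEX="$INDEX_ROOT/bowtie2/hg38/genome"

# map_pairs_with_bowtie2 READS_1 READS_2 OUT_SAM   (hypothetical helper)
map_pairs_with_bowtie2() {
    bowtie2 -p 20 -x "$BOWTIE2_INDEX" -1 "$1" -2 "$2" -S "$3"
}

# Example call (file names are placeholders):
# map_pairs_with_bowtie2 reads_1.fastq reads_2.fastq sample1.sam
```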
BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
The index-building command is recorded in the file `run.sh` in each folder. Most indexes were built with the command `faformat -in /path/hg38.fa -out /dev/stdout -fp_chrid "^chr" | awk '{print $1}' > hg38.fa` followed by `makeblastdb -in hg38.fa -dbtype nucl -title hg38 -parse_seqids -out hg38`. That is, the index is built for chromosomes only and the scaffolds in the genome are removed.
The BLAST indexes are located in `/150T/zhangqf/GenomeAnnotation/INDEX/blast`.
Directory | Content |
---|---|
hg38 | Human (hg38) genome index |
mm10 | Mouse (mm10) genome index |
zebrafish_ncbi | Zebrafish (from NCBI) genome index |
Basic `blastn` usage: `blastn -query query.fa -db blastdb -out result.txt`.
For a very sensitive search that reports nearly all hits for short queries (<50 nt): `blastn -db blastdb -query query.fa -word_size 4 -task blastn-short -gapopen 1 -outfmt 7 -penalty -1 -num_threads 20 -evalue 1000 -out results.tabular`
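
A very sensitive search with `-evalue 1000` typically returns many weak hits, so some post-filtering is usually needed. The sketch below assumes the default `-outfmt 7` column order (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) and keeps hits at or above a minimum identity. The function name `filter_hits` and the threshold are hypothetical.

```shell
# filter_hits MIN_IDENTITY < results.tabular   (hypothetical helper)
# Prints query, subject, % identity, alignment length and e-value
# for every hit whose % identity is at least MIN_IDENTITY.
filter_hits() {
    awk -F '\t' -v OFS='\t' -v min="${1:-0}" '
        /^#/ { next }                       # outfmt 7 comment lines start with "#"
        NF >= 12 && $3 + 0 >= min + 0 { print $1, $2, $3, $4, $11 }
    '
}

# Example call (file name as in the command above):
# filter_hits 90 < results.tabular
```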