Last active
February 25, 2021 23:31
Building a custom reference for kb-python in a virtual environment
## This step applies only on Compute Canada clusters, where modules are available!
module load python/3.7.4
## Create a virtual environment using Python
python3 -m venv ./kb_python_env
## Activate the environment
source ./kb_python_env/bin/activate
## Install kb-python inside the environment
pip3 install kb-python
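## Define paths for the reference files and the index
ref_file_dir="."
index_dir="./kb_custom_index"
## Make sure the index directory exists before kb ref writes into it
mkdir -p "$index_dir"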
## Download the reference files from GENCODE
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/GRCm38.primary_assembly.genome.fa.gz ## genome FASTA
gunzip GRCm38.primary_assembly.genome.fa.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz ## GTF file (Comprehensive gene annotation)
gunzip gencode.vM25.annotation.gtf.gz
## Add your custom sequence to the reference
## It is important that the FASTA header and GTF entries are formatted exactly like the other genes and transcripts in the reference files,
## otherwise the reference building won't work correctly!
cat GRCm38.primary_assembly.genome.fa custom_sequence.fasta > GRCm38.primary_assembly.genome.with_custom_seq.fa
cat gencode.vM25.annotation.gtf custom_sequence.gtf > gencode.vM25.annotation.with_custom_seq.gtf
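## Optional sanity check (a minimal sketch, not part of the original workflow):
## every sequence name in column 1 of the custom GTF should match a header ID in the custom FASTA
## (the header text up to the first whitespace, without the leading ">").
## check_custom_seqnames is a hypothetical helper name introduced here for illustration.
check_custom_seqnames() {
    ## $1 = FASTA file, $2 = GTF file; returns 1 if any GTF sequence name has no FASTA header
    awk '
        NR == FNR {
            ## First file (FASTA): remember the ID of every header line
            if (/^>/) { sub(/^>/, "", $1); ids[$1] = 1 }
            next
        }
        ## Second file (GTF): skip comment and empty lines
        /^#/ || NF == 0 { next }
        !($1 in ids) && !seen[$1]++ {
            print "No FASTA header for GTF sequence name: " $1
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1" "$2"
}
check_custom_seqnames custom_sequence.fasta custom_sequence.gtf \
    || echo "Check custom_sequence.fasta and custom_sequence.gtf before building the reference" >&2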
## Build the reference
kb ref --workflow standard \
    -i "$index_dir/kb_ref.GRCm38.with_custom_seq.idx" \
    -g "$ref_file_dir/t2g_kb_ref.GRCm38.with_custom_seq" \
    -f1 "$ref_file_dir/cdna.GRCm38.with_custom_seq" \
    "$ref_file_dir/GRCm38.primary_assembly.genome.with_custom_seq.fa" \
    "$ref_file_dir/gencode.vM25.annotation.with_custom_seq.gtf"
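## Optional check after the build (a sketch, not part of the original workflow):
## confirm that the index, transcript-to-gene map and cDNA file requested above exist and are non-empty.
## report_missing_outputs is a hypothetical helper name; the defaults repeat the paths defined earlier
## so that the check can also be run on its own.
report_missing_outputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
report_missing_outputs \
    "${index_dir:-./kb_custom_index}/kb_ref.GRCm38.with_custom_seq.idx" \
    "${ref_file_dir:-.}/t2g_kb_ref.GRCm38.with_custom_seq" \
    "${ref_file_dir:-.}/cdna.GRCm38.with_custom_seq" \
    || echo "kb ref did not produce all expected outputs" >&2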