Created April 9, 2024 15:42
ramongallego/a613b0a054fdb1e8e6a66178f53fca11
kraken2 db from NCBI db
# First, download the right database from NCBI to your computer. You can either go to https://ftp.ncbi.nlm.nih.gov/blast/db/
# and download it by hand or, if it is a multi-file db, run
# perl <path/to/ncbi/bin>/update_blastdb.pl --decompress <name_of_db>
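# To list the databases available for download, run
# perl <path/to/ncbi/bin>/update_blastdb.pl --showall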
# I recommend keeping one db per folder, because a second download into the same folder might overwrite the taxdb.btd/taxdb.bti files
######
#
## Once the db is downloaded, the next step is to extract its sequences as FASTA
# There is probably an easier way to turn a downloaded db into a Kraken DB, but here we are
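## USAGE: bash Format_NCBI_4_Kraken2.sh <folder_with_NCBI_db>/<NCBI_db_name> <KRAKEN_DB_NAME>
set -euo pipefail  # stop at the first failing command or unset variable
if [ "$#" -ne 2 ]; then
  echo "USAGE: bash Format_NCBI_4_Kraken2.sh <folder_with_NCBI_db>/<NCBI_db_name> <KRAKEN_DB_NAME>" >&2
  exit 1
fi
DBNAME=$2
input=$1
input_folder=$(dirname "$input")
input_db=$(basename "$input")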
## STEP1 Extract accession, taxid and sequence for every entry; each output line looks like >ACCESSION|kraken:taxid|TAXID,SEQUENCE
blastdbcmd -db "$input" -entry all -out "${input_folder}/${input_db}.txt" -outfmt ">%a|kraken:taxid|%T,%s"
## STEP2 Reformat as FASTA: split each line at the comma into a header line and a sequence line
awk -F',' '{print $1"\n"$2}' "${input_folder}/${input_db}.txt" > "${input_folder}/${input_db}.fasta"
## STEP3 Download the NCBI taxonomy and start your custom db
kraken2-build --download-taxonomy --use-ftp --db "$DBNAME"
## STEP4 Add the new sequences to the db library
kraken2-build --add-to-library "${input_folder}/${input_db}.fasta" --db "$DBNAME"
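To see what STEP2 does without a real BLAST database, here is a minimal sketch that feeds the same awk command two made-up lines in the `>%a|kraken:taxid|%T,%s` shape that STEP1 writes. The accessions (`XX000001.1`, `XX000002.1`) and sequences are hypothetical; 9606 and 10090 are the NCBI taxids for human and mouse.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory, removed when the script exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two hypothetical records in the ">accession|kraken:taxid|taxid,sequence" layout written by STEP1
printf '%s\n' \
  '>XX000001.1|kraken:taxid|9606,ACGTACGTAC' \
  '>XX000002.1|kraken:taxid|10090,TTGGCCAATT' > "$tmp/toy.txt"

# STEP2 unchanged: text before the comma becomes the header, text after it the sequence
awk -F',' '{print $1"\n"$2}' "$tmp/toy.txt" > "$tmp/toy.fasta"

cat "$tmp/toy.fasta"
```

Each record turns into one `>XX000001.1|kraken:taxid|9606` header followed by its sequence on the next line. The split works because `%s` prints the whole sequence on one line and sequences contain no commas, and the resulting `kraken:taxid` headers are the form `kraken2-build --add-to-library` accepts without an accession-to-taxid map.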
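Before the sequences reach the library, it can help to check that the FASTA really alternates header and sequence lines and that every header carries a usable taxid. The function below is a hypothetical helper (`check_kraken_fasta` is not part of Kraken2 or BLAST+). It assumes the two-lines-per-record layout written by STEP2, and it treats a taxid of `0` as "no taxid assigned", which is an assumption about how blastdbcmd reports entries without taxonomy.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: validate a FASTA produced by STEP2.
# Assumes exactly two lines per record (header, then sequence).
# Prints each problem to stderr and returns 1 if any were found.
check_kraken_fasta() {
  awk '
    NR % 2 == 1 {
      # Odd lines must be headers of the form >accession|kraken:taxid|<digits>
      if ($0 !~ /^>[^|]+\|kraken:taxid\|[0-9]+$/) {
        printf "line %d: malformed header: %s\n", NR, $0 > "/dev/stderr"; bad++
      } else if ($0 ~ /\|0$/) {
        # Assumption: taxid 0 means the BLAST entry had no taxid
        printf "line %d: taxid is 0: %s\n", NR, $0 > "/dev/stderr"; bad++
      }
      next
    }
    # Even lines must be non-empty sequences that do not start with ">"
    $0 == "" || /^>/ { printf "line %d: missing sequence\n", NR > "/dev/stderr"; bad++ }
    END {
      if (NR % 2 != 0) { print "odd number of lines: last record has no sequence" > "/dev/stderr"; bad++ }
      exit (bad > 0)
    }
  ' "$1"
}

# Example on a hypothetical two-record file whose second header has taxid 0
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' '>XX000001.1|kraken:taxid|9606' 'ACGT' '>XX000002.1|kraken:taxid|0' 'TTGG' > "$tmp/toy.fasta"
if check_kraken_fasta "$tmp/toy.fasta"; then echo "FASTA looks OK"; else echo "FASTA has problems"; fi
```

Calling it on `<NCBI_db_name>.fasta` between STEP2 and STEP4 would catch bad records before they are added. Note that the script stops after `--add-to-library`: the library still has to be indexed with `kraken2-build --build --db <KRAKEN_DB_NAME>` before `kraken2` can classify reads against it.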