@genomewalker
Created April 29, 2019 20:16
+ set -e
+ MMSEQS=/vol/attached/opt/MMseqs2-7-4e23d/bin/mmseqs
+ DIR=/vol/attached/gtdb
+ SDIR=/vol/scratch/gtdb
+ export 'OMPI_MCA_btl=^openib'
+ OMPI_MCA_btl='^openib'
+ export OMP_NUM_THREADS=28
+ OMP_NUM_THREADS=28
+ RUNNER='mpirun --mca btl_tcp_if_include ens3 -n 10 --map-by ppr:1:node --bind-to none '
+ /vol/attached/opt/MMseqs2-7-4e23d/bin/mmseqs clusterupdate /vol/scratch/gtdb/marine_hmp_db_03112017 /vol/scratch/gtdb/mg_gtdb_orfs_db /vol/scratch/gtdb/marine_hmp_db_03112017_clu /vol/attached/gtdb/mg_gtdb_update/mg_gtdb_db_052019 /vol/attached/gtdb/mg_gtdb_update/mg_gtdb_db_052019_clu /vol/attached/gtdb/mg_gtdb_update/tmp --min-seq-id 0.3 -s 5 --cov-mode 0 -c 0.8
Program call:
clusterupdate /vol/scratch/gtdb/marine_hmp_db_03112017 /vol/scratch/gtdb/mg_gtdb_orfs_db /vol/scratch/gtdb/marine_hmp_db_03112017_clu /vol/attached/gtdb/mg_gtdb_update/mg_gtdb_db_052019 /vol/attached/gtdb/mg_gtdb_update/mg_gtdb_db_052019_clu /vol/attached/gtdb/mg_gtdb_update/tmp --min-seq-id 0.3 -s 5 --cov-mode 0 -c 0.8
MMseqs Version: GITDIR-NOTFOUND-MPI
Sub Matrix blosum62.out
Add backtrace false
Alignment mode 0
E-value threshold 0.001
Seq. Id Threshold 0.3
Seq. Id. Mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage Mode 0
Max. sequence length 65535
Compositional bias 1
Realign hit false
Max Reject 2147483647
Max Accept 2147483647
Include identical Seq. Id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 11
Gap extension cost 1
Threads 28
Verbosity 3
Sensitivity 5
K-mer size 0
K-score 2147483647
Alphabet size 21
Offset result 0
Split DB 0
Split mode 2
Split Memory Limit 0
Diagonal Scoring 1
Exact k-mer matching 0
Mask Residues 1
Minimum Diagonal score 15
Spaced Kmer 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq.id. and coverage false
Sort results 0
In substitution scoring mode, performs global alignment along the diagonal false
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Filter MSA 1
Maximum sequence identity threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select n most diverse seqs 1000
Omit Consensus false
Min codons in orf 1
Max codons in length 2147483647
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Add Orf Stop false
Number search iterations 1
Start sensitivity 4
Search steps 1
Run a seq-profile search in slice mode false
Strand selection 1
Disk space limit 0
Sets the MPI runner mpirun --mca btl_tcp_if_include ens3 -n 10 --map-by ppr:1:node --bind-to none
Remove Temporary Files false
Cluster mode 0
Max depth connected component 1000
Similarity type 2
Single step clustering true
Cascaded clustering steps 3
Kmer per sequence 21
Shift hash 5
Include only extendable false
Skip sequence with n repeating k-mers 0
Match sequences by their ID false
Recover Deleted false
===================================================
=== Update the new sequences with the old keys ====
===================================================
===================================================
====== Filter out the new from old sequences ======
===================================================
===================================================
======= Extract representative sequences ==========
===================================================
===================================================
======== Search the new sequences against =========
========= previous (rep seq of) clusters ==========
===================================================
Program call:
search /vol/attached/gtdb/mg_gtdb_update/tmp/NEWDB.newSeqs /vol/attached/gtdb/mg_gtdb_update/tmp/OLDDB.repSeq /vol/attached/gtdb/mg_gtdb_update/tmp/newSeqsHits /vol/attached/gtdb/mg_gtdb_update/tmp/search --sub-mat blosum62.out -a 0 --alignment-mode 0 -e 0.001 --min-seq-id 0.3 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --realign 0 --max-rejected 2147483647 --max-accept 1 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --gap-open 11 --gap-extend 1 --threads 28 -v 3 -s 5 -k 0 --k-score 2147483647 --alph-size 21 --offset-result 0 --split 0 --split-mode 2 --split-memory-limit 0 --diag-score 1 --exact-kmer-matching 0 --mask 1 --min-ungapped-score 15 --spaced-kmer-mode 1 --rescore-mode 0 --filter-hits 0 --sort-results 0 --global-alignment 0 --mask-profile 1 --e-profile 0.001 --wg 0 --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --omit-consensus 0 --min-length 1 --max-length 2147483647 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --add-orf-stop 0 --num-iterations 1 --start-sens 4 --sens-steps 1 --slice-search 0 --strand 1 --disk-space-limit 0 --remove-tmp-files 0
MMseqs Version: GITDIR-NOTFOUND-MPI
Sub Matrix blosum62.out
Add backtrace false
Alignment mode 0
E-value threshold 0.001
Seq. Id Threshold 0.3
Seq. Id. Mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage Mode 0
Max. sequence length 65535
Max. results per query 300
Compositional bias 1
Realign hit false
Max Reject 2147483647
Max Accept 1
Include identical Seq. Id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 11
Gap extension cost 1
Threads 28
Verbosity 3
Sensitivity 5
K-mer size 0
K-score 2147483647
Alphabet size 21
Offset result 0
Split DB 0
Split mode 2
Split Memory Limit 0
Diagonal Scoring 1
Exact k-mer matching 0
Mask Residues 1
Minimum Diagonal score 15
Spaced Kmer 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq.id. and coverage false
Sort results 0
In substitution scoring mode, performs global alignment along the diagonal false
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Filter MSA 1
Maximum sequence identity threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select n most diverse seqs 1000
Omit Consensus false
Min codons in orf 1
Max codons in length 2147483647
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Add Orf Stop false
Number search iterations 1
Start sensitivity 4
Search steps 1
Run a seq-profile search in slice mode false
Strand selection 1
Disk space limit 0
Sets the MPI runner mpirun --mca btl_tcp_if_include ens3 -n 10 --map-by ppr:1:node --bind-to none
Remove Temporary Files false
MPI Init...
Rank: 0 Size: 10
Program call:
prefilter /vol/attached/gtdb/mg_gtdb_update/tmp/NEWDB.newSeqs /vol/attached/gtdb/mg_gtdb_update/tmp/OLDDB.repSeq /vol/attached/gtdb/mg_gtdb_update/tmp/search/5596834897263602452/pref_5.0 --sub-mat blosum62.out -k 0 --k-score 2147483647 --alph-size 21 --max-seq-len 65535 --max-seqs 300 --offset-result 0 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 28 -v 3 -s 5.0
MMseqs Version: GITDIR-NOTFOUND-MPI
Sub Matrix blosum62.out
Sensitivity 5
K-mer size 0
K-score 2147483647
Alphabet size 21
Max. sequence length 65535
Max. results per query 300
Offset result 0
Split DB 0
Split mode 2
Split Memory Limit 0
Coverage threshold 0.8
Coverage Mode 0
Compositional bias 1
Diagonal Scoring 1
Exact k-mer matching 0
Mask Residues 1
Minimum Diagonal score 15
Include identical Seq. Id. false
Spaced Kmer 1
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Spaced k-mer pattern
Local temporary path
Threads 28
Verbosity 3
Initialising data structures...
Using 28 threads.
Could not find precomputed index. Compute index.
Touch data file /vol/attached/gtdb/mg_gtdb_update/tmp/OLDDB.repSeq ... Done.
Substitution matrices...
Substitution matrices...
Use kmer size 7 and split 1 using Query split mode.
Needed memory (81983279223 byte) of total memory (243445174272 byte)
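
Note on the memory line above: the two byte counts are easier to judge in GiB and as a fraction. The sketch below only re-expresses the numbers printed in the log. The helper names are illustrative and are not part of MMseqs2.

```python
# Byte counts copied from the "Needed memory ... of total memory ..." log line.
NEEDED_BYTES = 81_983_279_223
TOTAL_BYTES = 243_445_174_272

GIB = 1024 ** 3  # bytes per GiB


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def memory_fraction(needed: int, total: int) -> float:
    """Share of total memory that the estimate asks for."""
    return needed / total


if __name__ == "__main__":
    print(f"needed: {to_gib(NEEDED_BYTES):.2f} GiB")
    print(f"total:  {to_gib(TOTAL_BYTES):.2f} GiB")
    print(f"share:  {memory_fraction(NEEDED_BYTES, TOTAL_BYTES):.1%}")
```

On these numbers the prefilter estimate is roughly a third of the node's memory, which fits the log's choice of a single target split ("split 1").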
Target database: /vol/attached/gtdb/mg_gtdb_update/tmp/OLDDB.repSeq(Size: 32465074)
Index table k-mer threshold: 106
Index table: counting k-mers...
................................................................................................... 1 Mio. sequences processed
................................................................................................... 2 Mio. sequences processed
................................................................................................... 3 Mio. sequences processed
................................................................................................... 4 Mio. sequences processed
................................................................................................... 5 Mio. sequences processed
................................................................................................... 6 Mio. sequences processed
................................................................................................... 7 Mio. sequences processed
................................................................................................... 8 Mio. sequences processed
................................................................................................... 9 Mio. sequences processed
................................................................................................... 10 Mio. sequences processed
................................................................................................... 11 Mio. sequences processed
................................................................................................... 12 Mio. sequences processed
................................................................................................... 13 Mio. sequences processed
................................................................................................... 14 Mio. sequences processed
................................................................................................... 15 Mio. sequences processed
................................................................................................... 16 Mio. sequences processed
................................................................................................... 17 Mio. sequences processed
................................................................................................... 18 Mio. sequences processed
................................................................................................... 19 Mio. sequences processed
................................................................................................... 20 Mio. sequences processed
................................................................................................... 21 Mio. sequences processed
................................................................................................... 22 Mio. sequences processed
................................................................................................... 23 Mio. sequences processed
................................................................................................... 24 Mio. sequences processed
................................................................................................... 25 Mio. sequences processed
................................................................................................... 26 Mio. sequences processed
................................................................................................... 27 Mio. sequences processed
................................................................................................... 28 Mio. sequences processed
................................................................................................... 29 Mio. sequences processed
................................................................................................... 30 Mio. sequences processed
................................................................................................... 31 Mio. sequences processed
................................................................................................... 32 Mio. sequences processed
..............................................
Index table: Masked residues: 94184143
Index table: fill...
................................................................................................... 1 Mio. sequences processed
................................................................................................... 2 Mio. sequences processed
................................................................................................... 3 Mio. sequences processed
................................................................................................... 4 Mio. sequences processed
................................................................................................... 5 Mio. sequences processed
................................................................................................... 6 Mio. sequences processed
................................................................................................... 7 Mio. sequences processed
................................................................................................... 8 Mio. sequences processed
................................................................................................... 9 Mio. sequences processed
................................................................................................... 10 Mio. sequences processed
................................................................................................... 11 Mio. sequences processed
................................................................................................... 12 Mio. sequences processed
................................................................................................... 13 Mio. sequences processed
................................................................................................... 14 Mio. sequences processed
................................................................................................... 15 Mio. sequences processed
................................................................................................... 16 Mio. sequences processed
................................................................................................... 17 Mio. sequences processed
................................................................................................... 18 Mio. sequences processed
................................................................................................... 19 Mio. sequences processed
................................................................................................... 20 Mio. sequences processed
................................................................................................... 21 Mio. sequences processed
................................................................................................... 22 Mio. sequences processed
................................................................................................... 23 Mio. sequences processed
................................................................................................... 24 Mio. sequences processed
................................................................................................... 25 Mio. sequences processed
................................................................................................... 26 Mio. sequences processed
................................................................................................... 27 Mio. sequences processed
................................................................................................... 28 Mio. sequences processed
................................................................................................... 29 Mio. sequences processed
................................................................................................... 30 Mio. sequences processed
................................................................................................... 31 Mio. sequences processed
................................................................................................... 32 Mio. sequences processed
..............................................
Index table: removing duplicate entries...
Index table init done.
DB statistic
Entries: 3435016814
DB Size: 30850100884 (byte)
Avg Kmer Size: 2.68361
Top 10 Kmers
GITSPKL 7839
GNGGTPS 5634
DGVIGSP 3658
LLGPGKT 3280
GNGGTPT 3266
IDSNVGT 3030
FLNSHRT 2799
SERSRET 2370
DLIHDNS 2326
ERRDSNV 2257
Min Kmer Size: 0
Empty list: 596886817
Time for index table init: 0h 2m 35s 107ms
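
Note on the DB statistic block above: the reported "Avg Kmer Size" can be reproduced from the other figures. Dividing "Entries" by 20^7 gives 2.68361, whereas dividing by 21^7 (the reported alphabet size of 21 with the k-mer size of 7) does not. This suggests that one letter, possibly the unknown residue X, is excluded from the index. That is an inference from the numbers in this log, not something the log states. The function names below are illustrative.

```python
# Values copied from the "DB statistic" block of the log.
ENTRIES = 3_435_016_814     # "Entries"
EMPTY_LISTS = 596_886_817   # "Empty list"
KMER_SIZE = 7               # "Use kmer size 7"


def table_slots(alphabet_size: int, k: int) -> int:
    """Number of distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k


def avg_list_len(entries: int, alphabet_size: int, k: int) -> float:
    """Average number of index entries per possible k-mer."""
    return entries / table_slots(alphabet_size, k)


if __name__ == "__main__":
    # Assumption under test: 20 indexed letters versus the reported 21.
    for alphabet in (20, 21):
        avg = avg_list_len(ENTRIES, alphabet, KMER_SIZE)
        print(f"alphabet {alphabet}: {avg:.5f} entries per k-mer")
    # Share of k-mer lists that are empty, under the 20-letter assumption.
    print(f"empty share: {EMPTY_LISTS / table_slots(20, KMER_SIZE):.1%}")
```

Under the same 20-letter assumption, a little under half of all possible 7-mers have no entry in the target index.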
Query database type: Aminoacid
Target database type: Aminoacid
Time for init: 0h 2m 39s 698ms
Query database: /vol/attached/gtdb/mg_gtdb_update/tmp/NEWDB.newSeqs(size=93723190)
Process prefiltering step 1 of 10
k-mer similarity threshold: 106
k-mer match probability: 0
Starting prefiltering scores calculation (step 1 of 10)
Query db start 1 to 9887005
Target db start 1 to 32465074
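
Note on the query split above: the query database holds 93723190 sequences and is processed in 10 steps, but the first step covers sequences 1 to 9887005, slightly more than an even tenth. The sketch below compares the two figures, reading the "Query db start ... to ..." range as sequence indices, which is an assumption. The log does not say how the chunk boundaries are chosen.

```python
# Values copied from the prefilter log lines.
QUERY_SEQS = 93_723_190       # "Query database: ...(size=93723190)"
STEPS = 10                    # "Process prefiltering step 1 of 10"
FIRST_CHUNK_END = 9_887_005   # "Query db start 1 to 9887005"


def even_chunk(total: int, steps: int) -> float:
    """Chunk size if sequences were divided evenly by count."""
    return total / steps


def chunk_share(chunk: int, total: int) -> float:
    """Fraction of all query sequences covered by one chunk."""
    return chunk / total


if __name__ == "__main__":
    print(f"even split:  {even_chunk(QUERY_SEQS, STEPS):.0f} sequences per step")
    print(f"first chunk: {FIRST_CHUNK_END} sequences")
    print(f"share:       {chunk_share(FIRST_CHUNK_END, QUERY_SEQS):.2%}")
```

A first chunk larger than an even tenth would be consistent with chunks balanced on something other than sequence count, for example total residues, but that remains a guess from this one data point.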