Differences in bowtie2 alignment time between concatenated and non-concatenated genomes
Number of references (concat): 52,517
Number of references (no concat): 8,086,857
Number of reads: 164,369,171
# CONCAT
$ time bowtie2-build --seed 42 --threads 24 genomes-concat.fa genomes-concat
...
Renaming genomes-concat.3.bt2l.tmp to genomes-concat.3.bt2l
Renaming genomes-concat.4.bt2l.tmp to genomes-concat.4.bt2l
Renaming genomes-concat.1.bt2l.tmp to genomes-concat.1.bt2l
Renaming genomes-concat.2.bt2l.tmp to genomes-concat.2.bt2l
Renaming genomes-concat.rev.1.bt2l.tmp to genomes-concat.rev.1.bt2l
Renaming genomes-concat.rev.2.bt2l.tmp to genomes-concat.rev.2.bt2l
real 1899m39.067s
user 30028m59.046s
sys 216m29.524s
$ bowtie2 -t -x genomes-concat -p 24 -D 15 -R 2 -N 1 -L 22 -i S,1,1.15 --np 1 --mp " 1,1" --rdg " 0,1" --rfg " 0,1" --score-min " L,0,-0.1" --no-unal -U bd5c17818444fb4fa10dbe268e5659af.fq.gz -t > genomes-concat.sam
Time loading reference: 00:05:25
Time loading forward index: 00:17:23
Time loading mirror index: 00:02:51
Multiseed full-index search: 05:49:30
164369171 reads; of these:
  164369171 (100.00%) were unpaired; of these:
    133880111 (81.45%) aligned 0 times
    11556636 (7.03%) aligned exactly 1 time
    18932424 (11.52%) aligned > 1 times
18.55% overall alignment rate
Time searching: 06:15:19
Overall time: 06:15:20
# 2.8 GB
$ time samtools sort -@24 -m10G genomes-concat.sam -O BAM -o genomes-concat.sorted.bam
[bam_sort_core] merging from 0 files and 24 in-memory blocks...
real 0m28.768s
user 1m36.260s
sys 0m2.123s
$ java -Xms2g -Xmx100g -jar picard.jar MarkDuplicates -INPUT genomes-concat.sorted.bam -OUTPUT genomes-concat.sorted.rmdup.bam -METRICS_FILE genomes-concat.sorted.rmdup.metrics -ASO null -VALIDATION_STRINGENCY LENIENT -MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 1000 -REMOVE_DUPLICATES TRUE --MAX_RECORDS_IN_RAM 10000000
...
[Sat Jun 17 20:37:37 CEST 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 1.69 minutes.
Runtime.totalMemory()=2147483648
# NO CONCAT
$ time bowtie2-build --seed 42 --threads 48 genomes-noconcat.fa.gz genomes-noconcat
...
Renaming genomes-noconcat.3.bt2l.tmp to genomes-noconcat.3.bt2l
Renaming genomes-noconcat.4.bt2l.tmp to genomes-noconcat.4.bt2l
Renaming genomes-noconcat.1.bt2l.tmp to genomes-noconcat.1.bt2l
Renaming genomes-noconcat.2.bt2l.tmp to genomes-noconcat.2.bt2l
Renaming genomes-noconcat.rev.1.bt2l.tmp to genomes-noconcat.rev.1.bt2l
Renaming genomes-noconcat.rev.2.bt2l.tmp to genomes-noconcat.rev.2.bt2l
real 2085m12.387s
user 68766m15.297s
sys 210m33.451s
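The `real`/`user` values above are in minutes, which is awkward to compare at this scale. A minimal sketch for converting them to hours, assuming the `NNNmSS.sss` format shown in these logs; `time_to_hours` is a hypothetical helper name, not part of bowtie2. Note that the two builds ran with different thread counts (24 vs 48), so the wall-clock times are not a like-for-like comparison.

```shell
# Convert a `time` value such as "1899m39.067s" to hours, two decimals.
# time_to_hours is a hypothetical helper used only for these notes.
time_to_hours() {
  awk -v t="$1" 'BEGIN {
    split(t, parts, "m")       # parts[1] = minutes, parts[2] = seconds plus "s"
    sub(/s$/, "", parts[2])    # drop the trailing "s"
    printf "%.2f\n", (parts[1] * 60 + parts[2]) / 3600
  }'
}

# Values copied from the two bowtie2-build runs in these notes.
echo "concat   real: $(time_to_hours 1899m39.067s) h, user: $(time_to_hours 30028m59.046s) h"
echo "noconcat real: $(time_to_hours 2085m12.387s) h, user: $(time_to_hours 68766m15.297s) h"
```

Both builds come out at well over a day of wall-clock time (roughly 31.7 h and 34.8 h).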
$ bowtie2 -t -x genomes-noconcat -p 24 -D 15 -R 2 -N 1 -L 22 -i S,1,1.15 --np 1 --mp " 1,1" --rdg " 0,1" --rfg " 0,1" --score-min " L,0,-0.1" --no-unal -U bd5c17818444fb4fa10dbe268e5659af.fq.gz -t > genomes-noconcat.sam
Time loading reference: 00:00:29
Time loading forward index: 00:02:01
Time loading mirror index: 00:01:02
Multiseed full-index search: 05:35:44
164369171 reads; of these:
  164369171 (100.00%) were unpaired; of these:
    133882682 (81.45%) aligned 0 times
    11556509 (7.03%) aligned exactly 1 time
    18929980 (11.52%) aligned > 1 times
18.55% overall alignment rate
Time searching: 05:39:48
Overall time: 05:39:49
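To put the two alignment runs side by side, the sketch below turns the "Overall time" values into seconds and takes the difference in read counts between the two bowtie2 summaries. All numbers are copied from the logs in these notes; `hms_to_seconds` and the variable names are hypothetical, chosen only for illustration.

```shell
# Convert HH:MM:SS to seconds (awk reads "06" as the decimal number 6).
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

concat=$(hms_to_seconds 06:15:20)     # "Overall time", concat index
noconcat=$(hms_to_seconds 05:39:49)   # "Overall time", no-concat index
diff_s=$(( concat - noconcat ))

echo "concat: ${concat}s, noconcat: ${noconcat}s, difference: ${diff_s}s"
awk -v a="$concat" -v b="$noconcat" 'BEGIN { printf "concat/noconcat ratio: %.3f\n", a / b }'

# Read-count deltas (concat minus no concat), from the two summaries.
unaligned_delta=$(( 133880111 - 133882682 ))   # aligned 0 times
unique_delta=$(( 11556636 - 11556509 ))        # aligned exactly 1 time
multi_delta=$(( 18932424 - 18929980 ))         # aligned > 1 times
echo "delta unaligned: ${unaligned_delta}, unique: ${unique_delta}, multi: ${multi_delta}"
```

With these inputs the concat run is 2131 s (about 35.5 min) slower overall, and it aligns 2571 more reads than the no-concat run, so the two runs are close but not identical.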
# 3.3 GB
$ time samtools sort -@24 -m10G genomes-noconcat.sam -O BAM -o genomes-noconcat.sorted.bam
[bam_sort_core] merging from 0 files and 24 in-memory blocks...
real 0m45.846s
user 2m16.526s
sys 0m2.889s
$ java -Xms2g -Xmx100g -jar picard.jar MarkDuplicates -INPUT genomes-noconcat.sorted.bam -OUTPUT genomes-noconcat.sorted.rmdup.bam -METRICS_FILE genomes-noconcat.sorted.rmdup.metrics -ASO null -VALIDATION_STRINGENCY LENIENT -MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 1000 -REMOVE_DUPLICATES TRUE --MAX_RECORDS_IN_RAM 10000000
...
[Sat Jun 17 20:35:07 CEST 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 4.79 minutes.
Runtime.totalMemory()=30651973632
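Picard reports `Runtime.totalMemory` in bytes, which hides how far apart the two MarkDuplicates runs are. A small sketch for converting the two values to GiB; `bytes_to_gib` is a hypothetical helper, and the inputs are the byte counts logged in these notes.

```shell
# Convert a byte count to GiB (1 GiB = 1024^3 bytes), two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

echo "concat:   $(bytes_to_gib 2147483648) GiB"    # matches the -Xms2g starting heap
echo "noconcat: $(bytes_to_gib 30651973632) GiB"
```

The concat run stays at the 2 GiB starting heap, while the no-concat run grows to roughly 28.5 GiB.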