
@genomewalker
Last active June 22, 2023 05:09

Differences in bowtie2 alignment time between concatenated and non-concatenated genomes

  • Number of references (concatenated): 52,517
  • Number of references (not concatenated): 8,086,857
  • Number of reads: 164,369,171
# CONCAT
$ bowtie2-build --seed 42 --threads 24 genomes-concat.fa genomes-concat

...
Renaming genomes-concat.3.bt2l.tmp to genomes-concat.3.bt2l
Renaming genomes-concat.4.bt2l.tmp to genomes-concat.4.bt2l
Renaming genomes-concat.1.bt2l.tmp to genomes-concat.1.bt2l
Renaming genomes-concat.2.bt2l.tmp to genomes-concat.2.bt2l
Renaming genomes-concat.rev.1.bt2l.tmp to genomes-concat.rev.1.bt2l
Renaming genomes-concat.rev.2.bt2l.tmp to genomes-concat.rev.2.bt2l

real    1899m39.067s
user    30028m59.046s
sys     216m29.524s

$ bowtie2 -t -x genomes-concat -p 24 -D 15 -R 2 -N 1 -L 22 -i S,1,1.15 --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min "L,0,-0.1" --no-unal -U bd5c17818444fb4fa10dbe268e5659af.fq.gz > genomes-concat.sam

Time loading reference: 00:05:25
Time loading forward index: 00:17:23
Time loading mirror index: 00:02:51
Multiseed full-index search: 05:49:30
164369171 reads; of these:
  164369171 (100.00%) were unpaired; of these:
    133880111 (81.45%) aligned 0 times
    11556636 (7.03%) aligned exactly 1 time
    18932424 (11.52%) aligned >1 times
18.55% overall alignment rate
Time searching: 06:15:19
Overall time: 06:15:20
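The `-t` timings above show a noticeable index-loading phase before the search starts. The following is a minimal sketch that sums that phase, assuming bash and awk are available. `hms_to_s` is a hypothetical helper, and the values are copied by hand from the log lines above.

```shell
# Hypothetical helper: convert HH:MM:SS (as printed by `bowtie2 -t`) to seconds.
hms_to_s() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# "Time loading reference", "forward index" and "mirror index" from the log above.
load=0
for t in 00:05:25 00:17:23 00:02:51; do
  load=$((load + $(hms_to_s "$t")))
done
overall=$(hms_to_s 06:15:20)  # "Overall time"

echo "index loading: ${load} s ($((load / 60))m$((load % 60))s)"
awk -v a="$load" -v b="$overall" 'BEGIN { printf "share of overall time: %.1f%%\n", 100 * a / b }'
```

With these numbers, the script reports 1539 s (25m39s) of loading, roughly 6.8% of the overall 06:15:20.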

# 2.8 GB
$ time samtools sort -@24 -m10G genomes-concat.sam -O BAM -o genomes-concat.sorted.bam
[bam_sort_core] merging from 0 files and 24 in-memory blocks...

real    0m28.768s
user    1m36.260s
sys     0m2.123s


$ java -Xms2g -Xmx100g -jar picard.jar MarkDuplicates -INPUT genomes-concat.sorted.bam -OUTPUT genomes-concat.sorted.rmdup.bam -METRICS_FILE genomes-concat.sorted.rmdup.metrics -ASO null -VALIDATION_STRINGENCY LENIENT -MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 1000 -REMOVE_DUPLICATES TRUE --MAX_RECORDS_IN_RAM 10000000

...
[Sat Jun 17 20:37:37 CEST 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 1.69 minutes.
Runtime.totalMemory()=2147483648

# NO CONCAT
$ bowtie2-build --seed 42 --threads 48 genomes-noconcat.fa.gz genomes-noconcat

...
Renaming genomes-noconcat.3.bt2l.tmp to genomes-noconcat.3.bt2l
Renaming genomes-noconcat.4.bt2l.tmp to genomes-noconcat.4.bt2l
Renaming genomes-noconcat.1.bt2l.tmp to genomes-noconcat.1.bt2l
Renaming genomes-noconcat.2.bt2l.tmp to genomes-noconcat.2.bt2l
Renaming genomes-noconcat.rev.1.bt2l.tmp to genomes-noconcat.rev.1.bt2l
Renaming genomes-noconcat.rev.2.bt2l.tmp to genomes-noconcat.rev.2.bt2l

real    2085m12.387s
user    68766m15.297s
sys     210m33.451s
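To compare the two index builds, you can convert the `time` output to hours. The following is a minimal sketch using bash and awk. `to_hours` is a hypothetical helper, and the inputs are copied from the `real` and `user` lines of both builds. The two builds used different `--threads` values (24 and 48), so their wall-clock times are not a like-for-like comparison.

```shell
# Hypothetical helper: convert a `time` value such as "1899m39.067s" to hours.
to_hours() {
  echo "$1" | awk -Fm '{ sub(/s$/, "", $2); printf "%.2f\n", ($1 * 60 + $2) / 3600 }'
}

# Wall-clock (real) and CPU (user) time of each bowtie2-build run.
concat_real=$(to_hours 1899m39.067s)     # --threads 24
noconcat_real=$(to_hours 2085m12.387s)   # --threads 48
concat_user=$(to_hours 30028m59.046s)
noconcat_user=$(to_hours 68766m15.297s)

echo "concat:    real ${concat_real} h, user ${concat_user} h"
echo "no concat: real ${noconcat_real} h, user ${noconcat_user} h"
```

This prints 31.66 h versus 34.75 h of wall-clock time, and 500.48 h versus 1146.10 h of user CPU time.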


$ bowtie2 -t -x genomes-noconcat -p 24 -D 15 -R 2 -N 1 -L 22 -i S,1,1.15 --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min "L,0,-0.1" --no-unal -U bd5c17818444fb4fa10dbe268e5659af.fq.gz > genomes-noconcat.sam

Time loading reference: 00:00:29
Time loading forward index: 00:02:01
Time loading mirror index: 00:01:02
Multiseed full-index search: 05:35:44
164369171 reads; of these:
  164369171 (100.00%) were unpaired; of these:
    133882682 (81.45%) aligned 0 times
    11556509 (7.03%) aligned exactly 1 time
    18929980 (11.52%) aligned >1 times
18.55% overall alignment rate
Time searching: 05:39:48
Overall time: 05:39:49
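The two bowtie2 runs can be compared directly from their summaries. The following is a minimal bash/awk sketch with values copied by hand from both logs. `hms_to_s` is a hypothetical helper, and "aligned" here means the sum of the "exactly 1 time" and ">1 times" lines.

```shell
# Hypothetical helper: convert HH:MM:SS (as printed by `bowtie2 -t`) to seconds.
hms_to_s() { echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'; }

# "Overall time" lines from the two bowtie2 logs.
concat_t=$(hms_to_s 06:15:20)
noconcat_t=$(hms_to_s 05:39:49)
dt=$((concat_t - noconcat_t))
printf 'overall time difference: %d s (%dm%02ds)\n' "$dt" $((dt / 60)) $((dt % 60))

# Reads aligned exactly once plus reads aligned >1 times.
concat_aln=$((11556636 + 18932424))
noconcat_aln=$((11556509 + 18929980))
echo "aligned reads: concat=${concat_aln} noconcat=${noconcat_aln} diff=$((concat_aln - noconcat_aln))"
```

With these values, the script prints a difference of 2131 s (35m31s) and 2571 more aligned reads in the concatenated run. That difference is too small to change the reported 18.55% overall alignment rate.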

# 3.3 GB
$ time samtools sort -@24 -m10G genomes-noconcat.sam -O BAM -o genomes-noconcat.sorted.bam
[bam_sort_core] merging from 0 files and 24 in-memory blocks...

real    0m45.846s
user    2m16.526s
sys     0m2.889s

$ java -Xms2g -Xmx100g -jar picard.jar MarkDuplicates -INPUT genomes-noconcat.sorted.bam -OUTPUT genomes-noconcat.sorted.rmdup.bam -METRICS_FILE genomes-noconcat.sorted.rmdup.metrics -ASO null -VALIDATION_STRINGENCY LENIENT -MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 1000 -REMOVE_DUPLICATES TRUE --MAX_RECORDS_IN_RAM 10000000

...
[Sat Jun 17 20:35:07 CEST 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 4.79 minutes.
Runtime.totalMemory()=30651973632
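Picard's `Runtime.totalMemory()` line reports the JVM's current heap size in bytes. This is the memory the JVM has committed, not necessarily the amount in use. The following is a short sketch that converts both values to GiB. `to_gib` is a hypothetical helper, and the byte counts are copied from the two MarkDuplicates logs.

```shell
# Hypothetical helper: convert a byte count to GiB with two decimals.
to_gib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1024 / 1024 / 1024 }'; }

# Runtime.totalMemory() values reported by Picard MarkDuplicates.
echo "concat:    $(to_gib 2147483648) GiB"
echo "no concat: $(to_gib 30651973632) GiB"
```

This gives 2.00 GiB for the concatenated run, which matches the `-Xms2g` initial heap, and 28.55 GiB for the non-concatenated run. MarkDuplicates also took 1.69 versus 4.79 minutes.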