Skip to content

Instantly share code, notes, and snippets.

@nh13
Last active November 5, 2017 06:16
Show Gist options
  • Select an option

  • Save nh13/b103c353405102c3a47f1814e0c8bd69 to your computer and use it in GitHub Desktop.

Select an option

Save nh13/b103c353405102c3a47f1814e0c8bd69 to your computer and use it in GitHub Desktop.
A tool to regenerate the NM/UQ/MD tags on a SAM or BAM file Raw with logging
package com.nilshomer.bfx.examples.utilities
import com.fulcrumgenomics.FgBioDef.PathToFasta
import com.fulcrumgenomics.bam.Bams
import com.fulcrumgenomics.bam.api.{SamOrder, SamSource, SamWriter}
import com.fulcrumgenomics.cmdline.ClpGroups
import com.fulcrumgenomics.commons.CommonsDef.{PathToBam, SafelyClosable}
import com.fulcrumgenomics.commons.util.LazyLogging
import com.fulcrumgenomics.sopt.{arg, clp}
import com.fulcrumgenomics.util.{Io, ProgressLogger}
import com.nilshomer.bfx.examples.cmdline.BfxExamplesTool
import htsjdk.samtools.reference.ReferenceSequenceFileWalker
@clp(
description = "Regenerates the NM/UQ/MD tags on a SAM or BAM file.",
group = ClpGroups.Utilities
)
class RegenerateNmUqMdTags
( @arg(flag='i', doc="Input SAM or BAM file of aligned reads in coordinate order.") val input: PathToBam,
@arg(flag='o', doc="Output SAM or BAM file.") val output: PathToBam,
@arg(flag='r', doc="Reference sequence FASTA file.") val ref: PathToFasta
) extends BfxExamplesTool with LazyLogging {
Io.assertReadable(input)
Io.assertReadable(ref)
Io.assertCanWriteFile(output)
def execute(): Unit = {
val in = SamSource(this.input)
val walker = new ReferenceSequenceFileWalker(ref.toFile)
val out = SamWriter(output, in.header, ref=Some(ref))
val progress = ProgressLogger(logger, unit=1e6.toInt)
require(SamOrder(in.header).contains(SamOrder.Coordinate), "Input must be coordinate sorted")
logger.info(s"Reading from $input")
logger.info(s"Writing to $output")
in.foreach { rec =>
Bams.regenerateNmUqMdTags(rec, walker)
out += rec
progress.record(rec)
}
logger.info(s"Wrote ${progress.getCount} records")
in.safelyClose()
out.close()
}
}
$java -Xmx2g -jar tools/target/scala-2.12/bfx-examples-tools.jar RegenerateNmUqMdTags -i in.bam -r hs38DH.fa -o out.bam
[2017/11/04 23:13:24 | BfxExamplesMain | Info] Executing RegenerateNmUqMdTags from bfx-examples version 0.1.0-SNAPSHOT as nhomer@Nils on JRE 1.8.0_66-b17 without snappy
[2017/11/04 23:13:24 | RegenerateNmUqMdTags | Info] Reading from in.bam
[2017/11/04 23:13:24 | RegenerateNmUqMdTags | Info] Writing to out.bam
[2017/11/04 23:13:42 | RegenerateNmUqMdTags | Info] processed 1,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 17s. Last read position: chr3:93,470,582
[2017/11/04 23:14:00 | SamWriter | Info] Wrote 2,000,000 records. Elapsed time: 00:00:35s. Time for last 2,000,000: 35s. Last read position: chr6:127,749,553
[2017/11/04 23:14:00 | RegenerateNmUqMdTags | Info] processed 2,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 17s. Last read position: chr6:127,749,553
[2017/11/04 23:14:17 | RegenerateNmUqMdTags | Info] processed 3,000,000 records. Elapsed time: 00:00:52s. Time for last 1,000,000: 16s. Last read position: chr9:130,862,720
[2017/11/04 23:14:30 | SamWriter | Info] Wrote 4,000,000 records. Elapsed time: 00:01:05s. Time for last 2,000,000: 29s. Last read position: chr9:130,872,057
[2017/11/04 23:14:30 | RegenerateNmUqMdTags | Info] processed 4,000,000 records. Elapsed time: 00:01:05s. Time for last 1,000,000: 13s. Last read position: chr9:130,872,057
[2017/11/04 23:14:42 | RegenerateNmUqMdTags | Info] processed 5,000,000 records. Elapsed time: 00:01:18s. Time for last 1,000,000: 12s. Last read position: chr9:130,874,801
[2017/11/04 23:14:55 | SamWriter | Info] Wrote 6,000,000 records. Elapsed time: 00:01:30s. Time for last 2,000,000: 24s. Last read position: chr9:130,878,519
[2017/11/04 23:14:55 | RegenerateNmUqMdTags | Info] processed 6,000,000 records. Elapsed time: 00:01:30s. Time for last 1,000,000: 12s. Last read position: chr9:130,878,519
[2017/11/04 23:15:07 | RegenerateNmUqMdTags | Info] processed 7,000,000 records. Elapsed time: 00:01:43s. Time for last 1,000,000: 12s. Last read position: chr9:134,837,505
[2017/11/04 23:15:25 | SamWriter | Info] Wrote 8,000,000 records. Elapsed time: 00:02:00s. Time for last 2,000,000: 30s. Last read position: chr14:86,718,794
[2017/11/04 23:15:25 | RegenerateNmUqMdTags | Info] processed 8,000,000 records. Elapsed time: 00:02:00s. Time for last 1,000,000: 17s. Last read position: chr14:86,718,794
[2017/11/04 23:15:42 | RegenerateNmUqMdTags | Info] processed 9,000,000 records. Elapsed time: 00:02:17s. Time for last 1,000,000: 17s. Last read position: chr22:32,161,343
[2017/11/04 23:15:46 | RegenerateNmUqMdTags | Info] Wrote 9216420 records
[2017/11/04 23:15:47 | BfxExamplesMain | Info] RegenerateNmUqMdTags completed. Elapsed time: 2.40 minutes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment