Returns an array (sample,bam) for a BAM or CRAM file.
It reads the SAM header of the BAM or CRAM and extracts the identifier from the SAM read groups: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMReadGroupRecord.html
This would be the equivalent of:
find "${params.bamdir}" -type f -name "*.bam" | while read F
do
samtools view -H \$F | grep '^@RG' | tr "\\t" "\\n" | grep ^SM | cut -d ':' -f 2 | head -n 1 | tr "\\n" "," && echo \$F
done > inputs.csv
Channel
.fromPath("/path/to/input.bam")
.sampleAndBam()
.map { T-> T[0]+" "+T[1] }
.println { it }
S1 /path/to/input.bam
You can also specify which part of the read group you want to extract:
Channel
.fromPath("/path/to/input.bam")
.sampleAndBam(by:"LB")
.map { T-> T[0]+" "+T[1] }
.println { it }
L2 /path/to/input.bam
L1 /path/to/input.bam
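For comparison, a rough shell sketch of what the `by:` option does on the header text. The function name `rg_tag` is made up for this illustration and is not part of the plugin; it only assumes tab-separated `@RG` lines with `TAG:value` fields, as in a SAM header.

```shell
# Hypothetical helper (not part of the plugin): print the distinct values of
# one read-group tag (default: SM) found in a SAM header read from stdin.
rg_tag() {
    tag="${1:-SM}"
    awk -F '\t' -v tag="$tag" '
        $1 == "@RG" {
            for (i = 2; i <= NF; i++) {
                # each field looks like TAG:value; the value itself may contain ":"
                if (substr($i, 1, length(tag) + 1) == (tag ":")) {
                    v = substr($i, length(tag) + 2)
                    # keep the first occurrence only, in header order
                    if (!seen[v]++) print v
                }
            }
        }
    '
}
```

It would be used as `samtools view -H /path/to/input.bam | rg_tag LB`.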
Extracts the SAM sequence dictionary from a file using https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/utils/SAMSequenceDictionaryExtractor.html
Returns an array (name,length,file) for each sequence in the dictionary.
Channel
.fromPath("/path/to/input.vcf")
.samDictionary()
.map { T-> T[0]+",length="+T[1]+","+T[2] }
.println { it }
chr1,length=249250621,/path/to/input.vcf
chr2,length=243199373,/path/to/input.vcf
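For a VCF input, this is roughly what you would get by reading the `##contig` lines of the header. The sketch below is only an approximation: `vcf_contigs` is a made-up name, and it assumes each contig line starts with `ID=` and carries a `length=` attribute, which is not guaranteed for every VCF.

```shell
# Hypothetical helper (not part of the plugin): read a VCF header from stdin
# and print "name,length=N" for every ##contig line that declares a length.
vcf_contigs() {
    sed -n 's/^##contig=<ID=\([^,>]*\),.*length=\([0-9][0-9]*\).*/\1,length=\2/p'
}
```

It would be used as `bcftools view --header-only /path/to/input.vcf | vcf_contigs`. Contig lines without a length are silently skipped.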
Returns an array (sampleName,file).
Extracts the samples from a VCF using https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/vcf/VCFHeader.html#getSampleNamesInOrder--
This would be the equivalent of:
bcftools view --header-only ${vcf} | grep "^#CHROM" -m1 | cut -f 10- | tr "\\t" "\\n" | awk '{printf("%s,${vcf}\\n",\$1);}'
Channel
.fromPath("/path/to/input.vcf")
.sampleInVcf()
.map { T-> T[0]+","+T[1] }
.println { it }
sample1,/path/to/input.vcf
sample2,/path/to/input.vcf
sample3,/path/to/input.vcf
sample4,/path/to/input.vcf
Looks cool. I would only suggest maintaining the same naming pattern, something like:
splitBam, splitSamDictionary, splitVcf
... or something along these lines.