ideas for some new nextflow operators (would require htsjdk lib) . was : https://twitter.com/PaoloDiTommaso/status/1159073940448432129

sampleAndBam

returns an array (sample,bam) from a bam or a file.

it gets the SAM header from a BAM or a CRAM and extracts the identifier from the sam read groups https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMReadGroupRecord.html

Would be the equivalent of:

find "${params.bamdir}" -type f -name "*.bam" | while read F
do
	samtools view -H \$F | grep '^@RG' | tr "\\t" "\\n" | grep ^SM | cut -d ':' -f 2 | head -n 1 | tr "\\n" ","  && echo \$F
done > inputs.csv

Channel
    .fromPath("/path/to/input.bam")
    .sampleAndBam()
    .map { T->  T[0]+" "+T[1] }
    .println { it }


S1 /path/to/input.bam

you can also specify wich part of the read group you want' to extract

Channel
    .fromPath("/path/to/input.bam")
    .sampleAndBam(by:"LB")
    .map { T->  T[0]+" "+T[1] }
    .println { it }

L2 /path/to/input.bam
L1 /path/to/input.bam

samDictionary

extracts the SAM sequence dictionary from a file using https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/utils/SAMSequenceDictionaryExtractor.html

returns an array (name,length,file)

Channel
    .fromPath("/path/to/input.vcf")
    .samDictionary()
    .map { T->  T[0]+",length="+T[1]+","+T[2] }
    .println { it }


chr1,length=249250621,/path/to/input.vcf
chr2,length=243199373,/path/to/input.vcf

sampleInVcf

returns an array (sampleName,file)

extract the Samples from a VCF using : https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/vcf/VCFHeader.html#getSampleNamesInOrder--

would be the equivalent of

bcftools view --header-only ${vcf} | grep "^#CHROM" -m1 | cut -f 10- | tr "\\t" "\\n" | awk '{printf("%s,${vcf}\\n",\$1);}

Channel
    .fromPath("/path/to/input.vcf")
    .sampleInVcf()
    .map { T->  T[0]+","+T[2] }
    .println { it }


sample1,/path/to/input.vcf
sample2,/path/to/input.vcf
sample3,/path/to/input.vcf
sample4,/path/to/input.vcf

lindenb/README.md

sampleAndBam

samDictionary

sampleInVcf

pditommaso commented Aug 7, 2019

Uh oh!

lindenb commented Aug 7, 2019

Uh oh!

pditommaso commented Aug 7, 2019 •

edited

Loading

Uh oh!

lindenb commented Aug 7, 2019

Uh oh!

pditommaso commented Aug 7, 2019

Uh oh!

lindenb/README.md

sampleAndBam

samDictionary

sampleInVcf

pditommaso commented Aug 7, 2019

Uh oh!

lindenb commented Aug 7, 2019

Uh oh!

pditommaso commented Aug 7, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lindenb commented Aug 7, 2019

Uh oh!

pditommaso commented Aug 7, 2019

Uh oh!

pditommaso commented Aug 7, 2019 •

edited

Loading