@danielecook
Last active July 17, 2019 14:08
conda activate primary-env
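function extract_gts() {
    # Convert one germline VCF into a TSV tagged with patient, sample, and file name.
    local i="${1}"
    local patient sample
    # Patient ID: first LTX<digits> match in the path.
    patient=$(echo "${i}" | grep -Eo "LTX[0-9]+" | head -n 1)
    # Sample name: fifth "/"-separated component of the path.
    sample=$(echo "${i}" | cut -f 5 -d "/")
    # Prefix every row with patient/sample/fname, then drop exact duplicate lines.
    vcf2tsv -g "${i}" | \
    awk -v patient="${patient}" -v fname="${i}" -v OFS="\t" -v sample="${sample}" 'NR == 1 { print "patient", "sample", "fname", $0 } NR > 1 { print patient, sample, fname, $0 }' | \
    awk '!arr[$0]++' > "gl_genotypes/${patient}.tsv"
}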
export -f extract_gts
# Get a list of germline files
ls release_*/**/exome/Platypus/*GL/Analysis_SNPs/*raw_GL_calls.vcf > gl_files.txt
# Extract genotypes in parallel (the output directory must exist first)
mkdir -p gl_genotypes
parallel --verbose -j 32 extract_gts :::: gl_files.txt
# Combine the per-patient genotype files written to gl_genotypes/
tut stack gl_genotypes/*.tsv | pigz -p 32 > gl_genotypes.tsv.gz
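
The two awk stages inside `extract_gts` can be tried in isolation without `vcf2tsv` or real VCF files. The sketch below is only an illustration: the helper name `annotate_and_dedupe`, the input table, and the patient, sample, and file names are all made up and are not part of the original script.

```shell
# Hypothetical helper (not in the original script): prefix each row of a
# tab-separated table with patient/sample/fname columns, then drop exact
# duplicate lines while keeping first-seen order.
annotate_and_dedupe() {
    local patient="$1" sample="$2" fname="$3"
    awk -v patient="${patient}" -v sample="${sample}" -v fname="${fname}" -v OFS="\t" '
        NR == 1 { print "patient", "sample", "fname", $0 }   # header row
        NR > 1  { print patient, sample, fname, $0 }          # data rows
    ' | awk '!seen[$0]++'
}

# Made-up input standing in for vcf2tsv output; the second data row is repeated.
printf 'CHROM\tPOS\n1\t100\n1\t100\n2\t200\n' | annotate_and_dedupe LTX001 GL_sample example.vcf
```

The `!seen[$0]++` pattern prints a line only the first time it is seen, so the repeated data row appears once in the output.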