Pierre Lindenbaum lindenb

@lindenb
lindenb / Usage.md
Created June 11, 2022 12:49
https://www.biostars.org/p/9526679/ How to merge 20K single-sample VCFs *without* using plink or plink2? #bcftools #nextflow

Usage

find path/to/dir -type f -name "S*.vcf.gz" > jeter.list
nextflow run --vcfs ${PWD}/jeter.list biostar9526718.nf

Add -C config.cfg to configure your cluster config...
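
Before launching the workflow, it can help to confirm that the list is non-empty and that every path in it exists. A minimal sketch, assuming one path per line as written by the `find` command above; the function name `check_vcf_list` is hypothetical and not part of the gist:

```shell
#!/bin/bash
# check_vcf_list is a hypothetical helper, not part of the original gist.
# It fails if the list is missing or empty, or if any listed path
# is not a regular file.
check_vcf_list() {
    local list="$1"
    if [ ! -s "$list" ]; then
        echo "empty or missing list: $list" >&2
        return 1
    fi
    local missing=0
    # the "|| [ -n ... ]" part also handles a last line without a newline
    while IFS= read -r vcf || [ -n "$vcf" ]; do
        if [ ! -f "$vcf" ]; then
            echo "missing file: $vcf" >&2
            missing=1
        fi
    done < "$list"
    return "$missing"
}
```

It could be chained in front of the run, for example `check_vcf_list jeter.list && nextflow run --vcfs ${PWD}/jeter.list biostar9526718.nf`.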

@lindenb
lindenb / biostar9524046.nf
Created May 21, 2022 14:26
biostar9524046.nf https://www.biostars.org/p/9524046/ Forum:Compare the samples in a VCF concordance picard
nextflow.enable.dsl=2
params.vcf=""
workflow {
picard = downloadPicard()
samples_ch = vcf2samples(params.vcf).splitCsv(header: false,sep:'\t',strip:true)
pair_ch = samples_ch.combine(samples_ch).filter{T->!T[1].equals(T[3])}
concordances_ch = concordance(picard,pair_ch)
@lindenb
lindenb / biostar9523782.nf
Last active May 19, 2022 16:28
https://www.biostars.org/p/9523782/ blast biostar nextflow fasta sequence align
nextflow.enable.dsl=2
/* full path to query directory */
params.qdir="/DIR1"
/* full path to target/database directory */
params.tdir="/DIR2"
workflow {
#!/bin/bash
# https://github.community/t/how-to-create-full-release-from-command-line-not-just-a-tag/916/2
if [ "$#" -ne 2 ]; then
# report the usage error on stderr and exit with a valid (0-255) status
echo "Expected: 'version' 'message'" >&2
exit 1
fi
@lindenb
lindenb / FastGatk.java
Created June 9, 2021 08:37
Fast GATK: calling a small region for a large number of Bams with GATK HaplotypeCaller
/*
The MIT License (MIT)
Copyright (c) 2021 Pierre Lindenbaum
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
@lindenb
lindenb / cromwell.log
Created February 24, 2021 18:01
cromwell wdl
$ time java -jar cromwell-57.jar run jeter.wdl
[2021-02-24 18:43:16,75] [info] Running with database db.url = jdbc:hsqldb:mem:43a7a2b5-d8e7-4129-a9d5-c939dbf31ddf;shutdown=false;hsqldb.tx=mvcc
[2021-02-24 18:45:41,45] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2021-02-24 18:45:42,04] [info] [RenameWorkflowOptionsInMetadata] 100%
[2021-02-24 18:45:44,94] [info] Running with database db.url = jdbc:hsqldb:mem:f57101ad-0a60-4bc1-947f-efb7610b1497;shutdown=false;hsqldb.tx=mvcc
[2021-02-24 18:45:55,25] [info] Slf4jLogger started
[2021-02-24 18:45:58,25] [info] Workflow heartbeat configuration:
{
"cromwellId" : "cromid-3d07b24",
@lindenb
lindenb / .gitignore
Last active November 20, 2020 14:15
a basic GAWK extension generating the reverse complement of a DNA sequence
*.o
*.so
*~
gawk-*
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Sequence Braiding</title>
<meta name="description" content="">
<meta name="author" content="">
@lindenb
lindenb / Makefile
Last active October 6, 2020 13:31
test nextflow dsl2.
PREFIX=20201006.DSL2
OUTDIR=work
all: main.nf $(OUTDIR)/jeter.fastqs.list
	mkdir -p $(OUTDIR)
	nextflow run -with-dag workflow.dot -with-report -with-timeline -with-trace -resume \
		-work-dir "$(OUTDIR)" \
		--fastqs $(OUTDIR)/jeter.fastqs.list \
		$<
	@echo "output is $(OUTDIR)"
@lindenb
lindenb / Usage.md
Created September 10, 2020 12:23
Question: is there any way to filter NCBI datasets by sample type?
$ wget -q -O - "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19826&targ=self&form=xml&view=quick" |\
  xsltproc biostar460606.xsl - | bash
 
GSM495051 !Sample_source_name_ch1 = noncancer tissue
GSM495052 !Sample_source_name_ch1 = gastric cancer tissue
GSM495053 !Sample_source_name_ch1 = noncancer tissue
GSM495054 !Sample_source_name_ch1 = gastric cancer tissue
GSM495055 !Sample_source_name_ch1 = noncancer tissue
GSM495056 !Sample_source_name_ch1 = gastric cancer tissue
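
The listing above has one sample per line: the GSM accession, the `!Sample_source_name_ch1` key, then ` = ` and the sample type. A minimal sketch of filtering it by type, assuming exactly that layout; the function name `filter_samples` is hypothetical and not part of the gist:

```shell
#!/bin/bash
# filter_samples is a hypothetical helper, not part of the original gist.
# Reads lines like "GSM495051 !Sample_source_name_ch1 = noncancer tissue"
# on stdin and prints the accession of every sample whose type equals $1.
filter_samples() {
    awk -v want="$1" '
        {
            # everything after the first " = " is the sample type
            i = index($0, " = ")
            if (i > 0 && substr($0, i + 3) == want) print $1
        }'
}
```

The comparison is an exact match on purpose: a substring search for "cancer tissue" would also match "noncancer tissue". Appending `| filter_samples "gastric cancer tissue"` to the pipeline above would then keep only the accessions of that type.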