QC Illumina flowcell, demultiplex, and QC mapped BAM
Installation · Usage · Future Improvements
The main pipeline multi-sample-run-level-pipeline.Snakefile depends on bwa, Picard, Python 3.6+, and R.
❯ snakemake \
-s multi-sample-run-level-pipeline.Snakefile \
--cores ${max_cores} \
--rerun-incomplete \
--restart-times ${retry_times} \
--config \
run_folder=${run_folder} \
sample_sheet=${sample_sheet} \
reference=${reference} \
bait_intervals=${bait_intervals} \
run_output=${run_output}
All wrapped tools support on-the-fly conversion of Python arguments to tool arguments.
For example, the following conversion is applied for a typical Picard tool:
>>> params:
>>> create_index=True,
>>> output_extension=None,
>>> adapters_to_check=['INDEXED', 'FLUIDIGM']
# Represented as the following CLI expansion
"""
picard tool \
CREATE_INDEX=true \
OUTPUT_EXTENSION=null \
ADAPTERS_TO_CHECK=INDEXED \
ADAPTERS_TO_CHECK=FLUIDIGM
"""
A similar example for bwa:
>>> params:
>>> v=2,
>>> p=True
# Represented as the following CLI expansion
'bwa mem -p -v2'
[Figure: graph representation of this pipeline for analyzing one sample]
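The Picard and bwa conversions shown above can be approximated with a few lines of Python. This is only a minimal sketch, not the wrappers' actual code: the function names `picard_args`, `picard_value`, and `bwa_args` are hypothetical, and sorting the bwa keys is an assumption that happens to reproduce the flag order in the example.

```python
def picard_value(value):
    """Render one Python value the way the Picard example above shows it."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def picard_args(params):
    """Expand a params mapping into Picard-style KEY=value tokens.

    Hypothetical helper: a list value repeats the option once per element.
    """
    tokens = []
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            tokens.append(f"{key.upper()}={picard_value(item)}")
    return tokens


def bwa_args(params):
    """Expand a params mapping into short bwa-style flags.

    Assumption: True becomes a bare flag, False/None is dropped, and any
    other value is appended directly to the flag. Keys are sorted only so
    that the output is deterministic.
    """
    tokens = []
    for key in sorted(params):
        value = params[key]
        if value is True:
            tokens.append(f"-{key}")
        elif value is False or value is None:
            continue
        else:
            tokens.append(f"-{key}{value}")
    return tokens


picard_tokens = picard_args({
    "create_index": True,
    "output_extension": None,
    "adapters_to_check": ["INDEXED", "FLUIDIGM"],
})
bwa_command = "bwa mem " + " ".join(bwa_args({"v": 2, "p": True}))
```

Returning a list of tokens rather than a single string leaves quoting and line continuation to whatever assembles the final shell command.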
All Picard tools support the following Snakemake resource objects, which are expanded into the JVM options below. Boolean choices must be given as an integer (0 or 1), e.g. use_async_io_read_samtools=1.
-XX:GCHeapFreeLimit={gc_heap_free_limit}
-XX:GCTimeLimit={gc_time_limit}
-Xmx{malloc}m
-Dsamjdk.buffer_size={samjdk_buffer_size}
-Dsamjdk.use_async_io_read_samtools={use_async_io_read_samtools}
-Dsamjdk.use_async_io_write_samtools={use_async_io_read_samtools}
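As an illustration of how the resource objects above might be turned into JVM options, here is a rough sketch. The helper `jvm_options` is hypothetical, and two behaviours in it are assumptions rather than documented facts: integer booleans are rendered as `true`/`false`, and any option whose resource is missing is skipped. The templates are copied from the list above, including the write option reusing the read resource.

```python
# Templates copied from the README; the last one reuses the read resource,
# exactly as listed there.
JVM_OPTION_TEMPLATES = [
    "-XX:GCHeapFreeLimit={gc_heap_free_limit}",
    "-XX:GCTimeLimit={gc_time_limit}",
    "-Xmx{malloc}m",
    "-Dsamjdk.buffer_size={samjdk_buffer_size}",
    "-Dsamjdk.use_async_io_read_samtools={use_async_io_read_samtools}",
    "-Dsamjdk.use_async_io_write_samtools={use_async_io_read_samtools}",
]

# Resources that carry a boolean encoded as 0 or 1.
BOOLEAN_RESOURCES = {"use_async_io_read_samtools"}


def jvm_options(resources):
    """Render JVM options for the resources that were actually supplied."""
    values = dict(resources)
    for name in BOOLEAN_RESOURCES:
        if name in values:
            # Assumption: 0/1 is translated to the strings Java expects.
            values[name] = "true" if values[name] else "false"

    options = []
    for template in JVM_OPTION_TEMPLATES:
        try:
            options.append(template.format(**values))
        except KeyError:
            # Assumption: an unset resource leaves the JVM default in place.
            continue
    return options


# Illustrative values only.
example_options = jvm_options({"malloc": 4000, "use_async_io_read_samtools": 1})
```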
All tasks for all samples log to ${run_output}/logs.
- Duplicates are marked but not removed.
- Target intervals are assumed to be the same as the bait intervals.
- Per-sample settings for bait intervals, target intervals, and reference genome would be easy to implement but are not yet supported.
- A rule cannot have both dynamic and static outputs, so all compressed barcode files go untracked in the DAG.
- The basecalls run input directory is hard-coded for the NextSeq platform.
- The path to all task wrappers is hard-coded as they have not been pushed to the official Snakemake wrapper repository.
- Resources and resource scaling on retry attempt are hard-coded and, at the moment, not configurable through a master configuration file. Resources can be limited with the CLI option --resources.
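For readers unfamiliar with resource scaling on retry: Snakemake accepts a callable as a resource value and passes it the current `attempt` number. The sketch below shows one way such scaling could be written. The factory name `scale_with_attempt` and the numbers are hypothetical, and the pipeline's own hard-coded scaling may differ.

```python
def scale_with_attempt(base, step=None):
    """Build a resource callable that grows with each retry attempt.

    `base` is the value for the first attempt; `step` is added for every
    further attempt and defaults to `base` (linear scaling).
    """
    increment = base if step is None else step

    def resource(wildcards, attempt):
        # Snakemake counts attempts from 1.
        return base + increment * (attempt - 1)

    return resource


# Inside a Snakefile this might be used as:
#
#     rule some_picard_step:
#         resources:
#             malloc=scale_with_attempt(4000)
#
malloc = scale_with_attempt(4000)
first_try, third_try = malloc(None, 1), malloc(None, 3)
```

Moving the base values into a configuration file would then only require reading `base` and `step` from `config` instead of hard-coding them.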
TODO:
These tasks are not necessary but would increase the reproducibility, portability, and ease-of-use of these pipelines:
- Use tagged master of snakemake_wrappers GitHub branch
- Comment assignments and methods of this script (pre-rule definitions)
- Sub-pipeline pre-alignment, alignment, and post-alignment
- Mark unmapped and mapped.raw as temp()
- Abstract away some config settings to a YAML file
- Use click for a CLI interface to all pipelines
- Write setup.py installer script
- Support environment.yml in all wrappers
- Write tests in all wrappers
- Look up bait_intervals, target_intervals, and reference for each sample (flowcells with mixed sample sets)
- Deploy only using Miniconda and Snakemake
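The per-sample lookup item in the list above could be approached along these lines. This is a sketch under stated assumptions: the `samples` config key, the function `sample_settings`, and all file names are hypothetical and do not exist in the current pipeline. Run-level values act as defaults, and target intervals fall back to the bait intervals, mirroring the current behaviour.

```python
def sample_settings(sample, config):
    """Resolve reference and interval files for one sample.

    Values under the hypothetical config["samples"][sample] mapping
    override the run-level defaults.
    """
    settings = {
        "reference": config["reference"],
        "bait_intervals": config["bait_intervals"],
    }
    settings.update(config.get("samples", {}).get(sample, {}))
    # Target intervals default to the (possibly overridden) bait intervals.
    settings.setdefault(
        "target_intervals",
        config.get("target_intervals", settings["bait_intervals"]),
    )
    return settings


# Placeholder file names for illustration only.
config = {
    "reference": "genome_a.fa",
    "bait_intervals": "panel_a.interval_list",
    "samples": {
        "sample2": {
            "reference": "genome_b.fa",
            "bait_intervals": "panel_b.interval_list",
        },
    },
}
default_sample = sample_settings("sample1", config)
mixed_sample = sample_settings("sample2", config)
```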

