@mvdbeek
Created February 13, 2025 16:28
workflows.yaml file for brc
workflows:
- trs_id: '#workflow/github.com/iwc-workflows/assembly-with-flye/main/versions/v0.2'
  workflow_categories:
  - ASSEMBLY
  workflow_name: Genome assembly with Flye
  workflow_description: Assemble long reads with Flye, then view assembly statistics
    and assembly graph
  ploidy: any
  parameters:
    Input sequence reads:
      class: File
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/atacseq/main/versions/v1.0'
  workflow_categories:
  - REGULATION
  workflow_name: ATACseq
  workflow_description: 'This workflow takes as input a collection of paired fastq.
    It will remove bad quality and adapters with cutadapt. Map with Bowtie2 end-to-end.
    Will remove reads on MT and unconcordant pairs and pairs with mapping quality
    below 30 and PCR duplicates. Will compute the pile-up on 5'' +- 100bp. Will call
    peaks and count the number of reads falling in the 1kb region centered on the
    summit. Will compute 2 normalization for coverage: normalized by million reads
    and normalized by million reads in peaks. Will plot the number of reads for each
    fragment length.'
  ploidy: any
  parameters:
    PE fastq input:
      class: Collection
    reference_genome:
      class: text
    effective_genome_size:
      class: integer
    bin_size:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/average-bigwig-between-replicates/main/versions/v0.2'
  workflow_categories:
  - REGULATION
  - TRANSCRIPTOMICS
  workflow_name: Average Bigwig between replicates
  workflow_description: 'We assume the identifiers of the input list are like:
    sample_name_replicateID.
    The identifiers of the output list will be:
    sample_name'
  ploidy: any
  parameters:
    Bigwig to average:
      class: Collection
    bin_size:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/bacterial-genome-assembly/main/versions/v1.1.5'
  workflow_categories:
  - ASSEMBLY
  workflow_name: Bacterial Genome Assembly using Shovill
  workflow_description: Assembly of bacterial paired-end short read data with generation
    of quality metrics and reports
  ploidy: any
  parameters:
    Input adapter trimmed sequence reads (forward):
      class: File
      ext:
      - fastq
      - fastq.gz
      - fastqsanger
      - fastqsanger.gz
    Input adapter trimmed sequence reads (reverse):
      class: File
      ext:
      - fastq
      - fastq.gz
      - fastqsanger
      - fastqsanger.gz
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/baredsc/baredSC-1d-logNorm/versions/v0.5'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: baredSC_1d_logNorm
  workflow_description: Run baredSC in 1 dimension in logNorm for 1 to N gaussians
    and combine models.
  ploidy: any
  parameters:
    Tabular with raw expression values:
      class: File
      ext: tabular
    Gene name:
      class: text
    Maximum value in logNorm:
      class: float
    Maximum number of Gaussians to study:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/baredsc/baredSC-2d-logNorm/versions/v0.5'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: baredSC_2d_logNorm
  workflow_description: Run baredSC in 2 dimensions in logNorm for 1 to N gaussians
    and combine models.
  ploidy: any
  parameters:
    Tabular with raw expression values:
      class: File
      ext: tabular
    Gene name for x axis:
      class: text
    maximum value in logNorm for x-axis:
      class: float
    Gene name for y axis:
      class: text
    maximum value in logNorm for y-axis:
      class: float
    Maximum number of Gaussians to study:
      class: integer
    compute p-value:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/brew3r/main/versions/v0.2'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: BREW3R
  workflow_description: This workflow takes a collection of BAM (output of STAR) and
    a gtf. It extends the input gtf using de novo annotation.
  ploidy: any
  parameters:
    Input gtf:
      class: File
      ext: gtf
    BAM collection:
      class: Collection
      ext: bam
    strandedness:
      class: text
    minimum coverage:
      class: integer
    minimum FPKM for merge:
      class: float
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/chipseq-pe/main/versions/v0.12'
  workflow_categories:
  - REGULATION
  workflow_name: ChIPseq_PE
  workflow_description: This workflow takes as input a collection of paired fastqs.
    Remove adapters with cutadapt, map pairs with bowtie2. Keep MAPQ30 and concordant
    pairs. MACS2 for paired bam.
  ploidy: any
  parameters:
    PE fastq input:
      class: Collection
    adapter_forward:
      class: text
    adapter_reverse:
      class: text
    reference_genome:
      class: text
    effective_genome_size:
      class: integer
    normalize_profile:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/chipseq-sr/main/versions/v0.12'
  workflow_categories:
  - REGULATION
  workflow_name: ChIPseq_SR
  workflow_description: This workflow takes as input a collection of fastqs (single
    reads). Remove adapters with cutadapt, map with bowtie2. Keep MAPQ30. MACS2 for
    bam with fixed extension or model.
  ploidy: any
  parameters:
    SR fastq input:
      class: Collection
    adapter_forward:
      class: text
    reference_genome:
      class: text
    effective_genome_size:
      class: integer
    normalize_profile:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/consensus-peaks/consensus-peaks-atac-cutandrun/versions/v1.2'
  workflow_categories:
  - REGULATION
  workflow_name: Get Confident Peaks From ATAC or CUTandRUN replicates
  workflow_description: This workflow takes as input BAM from ATAC-seq or CUT&RUN.
    It calls peaks on each replicate and intersects them. In parallel, each BAM is
    subsetted to smallest number of reads. Peaks are called using all subsets combined.
    Only peaks called using a combination of all subsets which have summits intersecting
    the intersection of at least x replicates will be kept.
  ploidy: any
  parameters:
    n rmDup BAM:
      class: Collection
      ext: bam
    Minimum number of overlap:
      class: integer
    effective_genome_size:
      class: integer
    bin_size:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/consensus-peaks/consensus-peaks-chip-pe/versions/v1.2'
  workflow_categories:
  - REGULATION
  workflow_name: Get Confident Peaks From ChIP_PE replicates
  workflow_description: This workflow takes as input PE BAM from ChIP-seq. It calls
    peaks on each replicate and intersects them. In parallel, each BAM is subsetted
    to smallest number of reads. Peaks are called using all subsets combined. Only
    peaks called using a combination of all subsets which have summits intersecting
    the intersection of at least x replicates will be kept.
  ploidy: any
  parameters:
    n rmDup BAMPE:
      class: Collection
      ext: bam
    Minimum number of overlap:
      class: integer
    effective_genome_size:
      class: integer
    bin_size:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/consensus-peaks/consensus-peaks-chip-sr/versions/v1.2'
  workflow_categories:
  - REGULATION
  workflow_name: Get Confident Peaks From ChIP_SR replicates
  workflow_description: This workflow takes as input SR BAM from ChIP-seq. It calls
    peaks on each replicate and intersects them. In parallel, each BAM is subsetted
    to smallest number of reads. Peaks are called using all subsets combined. Only
    peaks called using a combination of all subsets which have summits intersecting
    the intersection of at least x replicates will be kept.
  ploidy: any
  parameters:
    n rmDup BAMSR:
      class: Collection
      ext: bam
    Minimum number of overlap:
      class: integer
    effective_genome_size:
      class: integer
    bin_size:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/cutandrun/main/versions/v0.13'
  workflow_categories:
  - REGULATION
  workflow_name: CUTandRUN
  workflow_description: This workflow takes as input a collection of paired fastq.
    Remove adapters with cutadapt, map pairs with bowtie2 allowing dovetail. Keep
    MAPQ30 and concordant pairs. BAM to BED. MACS2 with "ATAC" parameters.
  ploidy: any
  parameters:
    PE fastq input:
      class: Collection
    adapter_forward:
      class: text
    adapter_reverse:
      class: text
    reference_genome:
      class: text
    effective_genome_size:
      class: integer
    normalize_profile:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex/versions/v0.5'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: scRNA-seq_preprocessing_10X_cellPlex
  workflow_description: This workflow processes the CMO fastqs with CITE-seq-Count
    and includes the translation step required for cellPlex processing. In parallel
    it processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils
    and reformats all outputs to be easily used by the function 'Read10X' from Seurat.
  ploidy: any
  parameters:
    fastq PE collection GEX:
      class: Collection
    reference genome:
      class: text
    gtf:
      class: File
    cellranger_barcodes_3M-february-2018.txt:
      class: File
    Barcode Size is same size of the Read:
      class: boolean
    fastq PE collection CMO:
      class: Collection
    sample name and CMO sequence collection:
      class: Collection
      ext: csv
    Number of expected cells:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-v3/versions/v0.5'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: scRNA-seq_preprocessing_10X_v3_Bundle
  workflow_description: This workflow processes the Gene Expression fastqs with STARsolo,
    filters cells with DropletUtils and reformats all outputs to be easily used by the
    function 'Read10X' from Seurat.
  ploidy: any
  parameters:
    fastq PE collection:
      class: Collection
    reference genome:
      class: text
    gtf:
      class: File
    cellranger_barcodes_3M-february-2018.txt:
      class: File
    Barcode Size is same size of the Read:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/generic-variant-calling-wgs-pe/main/versions/v0.1.1'
  workflow_categories:
  - VARIANT_CALLING
  workflow_name: Generic variation analysis on WGS PE data
  workflow_description: Workflow for variant analysis against a reference genome
    in GenBank format
  ploidy: any
  parameters:
    Paired Collection:
      class: Collection
      ext:
      - fastqsanger
      - fastqsanger.gz
    GenBank genome:
      class: File
      ext: genbank
    Name for genome database:
      class: text
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/goseq/main/versions/v0.1'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: Goseq GO-KEGG Enrichment Analysis
  workflow_description: This workflow is used for GO and KEGG enrichment analysis
    using GOseq tools.
  ploidy: any
  parameters:
    Select genome to use:
      class: text
    Differential expression result:
      class: File
      ext: tabular
    Select gene ID format:
      class: text
    gene length:
      class: File
      ext: tabular
    KEGG pathways:
      class: File
      ext: tabular
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/haploid-variant-calling-wgs-pe/main/versions/v0.1'
  workflow_categories:
  - VARIANT_CALLING
  workflow_name: Paired end variant calling in haploid system
  workflow_description: Workflow for variant analysis against a reference genome in
    GenBank format
  ploidy: any
  parameters:
    Paired Collection:
      class: Collection
      ext:
      - fastqsanger
      - fastqsanger.gz
    Annotation GTF:
      class: File
    Genome fasta:
      class: File
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/hic-hicup-cooler/chic-fastq-to-cool-hicup-cooler/versions/v0.3'
  workflow_categories:
  - REGULATION
  workflow_name: cHi-C_fastqToCool_hicup_cooler
  workflow_description: This workflow takes as input a collection of paired fastq.
    It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ
    and for the region captured. Then, they are sorted by cooler to generate a tabix
    dataset. Cooler is used to generate a balanced cool file to the desired resolution.
  ploidy: any
  parameters:
    PE fastq input:
      class: Collection
    genome name:
      class: text
    Restriction enzyme:
      class: text
    No fill-in:
      class: boolean
    minimum MAPQ:
      class: integer
    Bin size in bp:
      class: integer
    Interactions to consider to calculate weights in normalization step:
      class: text
    capture region (chromosome):
      class: text
    capture region (start):
      class: integer
    capture region (end):
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-fastq-to-cool-hicup-cooler/versions/v0.3'
  workflow_categories:
  - REGULATION
  workflow_name: Hi-C_fastqToCool_hicup_cooler
  workflow_description: This workflow takes as input a collection of paired fastq.
    It uses HiCUP to go from fastq to validPair file using the middle of the fragment
    as coordinates. The pairs are filtered for MAPQ and sorted by cooler to generate
    a tabix dataset. Cooler is used to generate a balanced cool file to the desired
    resolution.
  ploidy: any
  parameters:
    PE fastq input:
      class: Collection
    genome name:
      class: text
    Restriction enzyme:
      class: text
    No fill-in:
      class: boolean
    minimum MAPQ:
      class: integer
    Bin size in bp:
      class: integer
    Interactions to consider to calculate weights in normalization step:
      class: text
    region for matrix plotting:
      class: text
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-fastq-to-pairs-hicup/versions/v0.3'
  workflow_categories:
  - REGULATION
  workflow_name: Hi-C_fastqToPairs_hicup
  workflow_description: This workflow takes as input a collection of paired fastq.
    It uses HiCUP to go from fastq to validPair file. First truncate the fastq using
    the cutting sequence to guess the fill-in. Then map the truncated fastq. Then
    assign to fragment and filter the self-ligated and dangling ends or internal (it
    can also filter for the size). Then it removes the duplicates. Convert the output
    to be compatible with juicebox or cooler using the middle of the fragment as coordinates.
    Finally filter for mapping quality
  ploidy: any
  parameters:
    PE fastq input:
      class: Collection
    genome name:
      class: text
    Restriction enzyme:
      class: text
    No fill-in:
      class: boolean
    minimum MAPQ:
      class: integer
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-juicermediumtabix-to-cool-cooler/versions/v0.3'
  workflow_categories:
  - REGULATION
  workflow_name: Hi-C_juicermediumtabixToCool_cooler
  workflow_description: This workflow uses as input a collection of juicer medium
    tabix files and a genome name. It builds balanced cool file to the desired resolution.
  ploidy: any
  parameters:
    Bin size in bp:
      class: integer
    genome name:
      class: text
    Juicer Medium Tabix with validPairs:
      class: Collection
    Interactions to consider to calculate weights in normalization step:
      class: text
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/polish-with-long-reads/main/versions/v0.1'
  workflow_categories:
  - ASSEMBLY
  workflow_name: Assembly polishing with long reads
  workflow_description: Racon polish with long reads, x4
  ploidy: any
  parameters:
    Assembly to be polished:
      class: File
    long reads:
      class: File
    'minimap setting (for long reads) ':
      class: text
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/pseudobulk-worflow-decoupler-edger/main/versions/v0.1.1'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: Differential gene expression for single-cell data using pseudo-bulk
    counts with edgeR
  workflow_description: This workflow uses the decoupler tool in Galaxy to generate
    pseudobulk counts from an annotated AnnData file obtained from scRNA-seq analysis.
    Following the pseudobulk step, differential expression genes (DEG) are calculated
    using the edgeR tool. The workflow also includes data sanitation steps to ensure
    smooth operation of edgeR and minimizing potential issues. Additionally, a Volcano
    plot tool is used to visualize the results after the DEG analysis.
  ploidy: any
  parameters:
    Source AnnData file:
      class: File
      ext:
      - h5
      - h5ad
    'Pseudo-bulk: Fields to merge':
      class: text
    Group by column:
      class: text
    Sample key column:
      class: text
    Name Your Raw Counts Layer:
      class: text
    Factor fields:
      class: text
    Formula:
      class: text
    Gene symbol column:
      class: text
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/quality-and-contamination-control/main/versions/v1.1.6'
  workflow_categories:
  - ASSEMBLY
  workflow_name: Quality and Contamination Control For Genome Assembly
  workflow_description: Short paired-end read analysis to provide quality analysis,
    read cleaning and taxonomy assignation
  ploidy: any
  parameters:
    Input sequence reads (forward):
      class: File
      ext:
      - fastq
      - fastq.gz
      - fastqsanger
      - fastqsanger.gz
    Input sequence reads (reverse):
      class: File
      ext:
      - fastq
      - fastq.gz
      - fastqsanger
      - fastqsanger.gz
    Select a taxonomy database:
      class: text
    Select a NCBI taxonomy database:
      class: text
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/rnaseq-de/main/versions/v0.2'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: RNAseq_DE_filtering_plotting
  workflow_description: 'This workflow can only work on an experimental setup with
    exactly 2 conditions. It takes two collections of count tables as input and performs
    differential expression analysis. Additionally it filters for DE genes based on
    adjusted p-value and log2 fold changes thresholds. It also generates informative
    plots.
    '
  ploidy: any
  parameters:
    Counts from changed condition:
      class: Collection
    Counts from reference condition:
      class: Collection
    Count files have header:
      class: boolean
    Gene Annotaton:
      class: File
    Adjusted p-value threshold:
      class: float
    log2 fold change threshold:
      class: float
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/rnaseq-pe/main/versions/v1.1'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: RNA-seq for Paired-end fastqs
  workflow_description: 'This workflow takes as input a list of paired-end fastqs.
    Adapters and bad quality bases are removed with fastp. Reads are mapped with STAR
    with ENCODE parameters and genes are counted simultaneously as well as normalized
    coverage (per million mapped reads) on uniquely mapped reads. The counts are reprocessed
    to be similar to HTSeq-count output. Alternatively, featureCounts can be used
    to count the reads/fragments per gene. FPKM are computed with cufflinks and/or
    with StringTie. The unstranded normalized coverage is computed with bedtools.
    '
  ploidy: any
  parameters:
    Collection paired FASTQ files:
      class: Collection
    Forward adapter:
      class: text
    Reverse adapter:
      class: text
    Generate additional QC reports:
      class: boolean
    Reference genome:
      class: text
    GTF file of annotation:
      class: File
    Strandedness:
      class: text
    Use featureCounts for generating count tables:
      class: boolean
    Compute Cufflinks FPKM:
      class: boolean
    GTF with regions to exclude from FPKM normalization with Cufflinks:
      class: File
    Compute StringTie FPKM:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/rnaseq-sr/main/versions/v1.1'
  workflow_categories:
  - TRANSCRIPTOMICS
  workflow_name: RNA-seq for Single-read fastqs
  workflow_description: 'This workflow takes as input a list of single-end fastqs.
    Adapters and bad quality bases are removed with fastp. Reads are mapped with STAR
    with ENCODE parameters and genes are counted simultaneously as well as normalized
    coverage (per million mapped reads) on uniquely mapped reads. The counts are reprocessed
    to be similar to HTSeq-count output. Alternatively, featureCounts can be used
    to count the reads/fragments per gene. FPKM are computed with cufflinks and/or
    with StringTie. The unstranded normalized coverage is computed with bedtools.
    '
  ploidy: any
  parameters:
    Collection of FASTQ files:
      class: Collection
    Forward adapter:
      class: text
    Generate additional QC reports:
      class: boolean
    Reference genome:
      class: text
    GTF file of annotation:
      class: File
    Strandedness:
      class: text
    Use featureCounts for generating count tables:
      class: boolean
    Compute Cufflinks FPKM:
      class: boolean
    GTF with regions to exclude from FPKM normalization with Cufflinks:
      class: File
    Compute StringTie FPKM:
      class: boolean
  active: false
- trs_id: '#workflow/github.com/iwc-workflows/variation-reporting/main/versions/v0.1.1'
  workflow_categories:
  - VARIANT_CALLING
  workflow_name: Generic variation analysis reporting
  workflow_description: This workflow takes a VCF dataset of variants produced by
    any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling
    and generates tabular lists of variants by Samples and by Variant, and an overview
    plot of variants and their allele-frequencies.
  ploidy: any
  parameters:
    Variation data to report:
      class: Collection
      ext:
      - vcf
      - vcf_bgzip
    AF Filter:
      class: float
    DP Filter:
      class: integer
    DP_ALT Filter:
      class: integer
  active: false