
@ionox0
ionox0 / toil_tmp_dirs.md
Last active December 11, 2019 17:15
Understanding Toil temporary directories

Here are all of the current arguments related to the input, output, and temporary directories used by Toil and cwltool:


cwltool:

--outdir - Final outputs directory

--tmpdir-prefix - A path prefix for folders that do not exist yet but will be created and used as intermediate working directories
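A minimal Python sketch of how a prefix such as /scratch/run1/tmp can turn into fresh working directories. It assumes the prefix is split into a parent directory (which must already exist) and a name prefix that is passed to tempfile.mkdtemp; cwltool's own handling may differ between versions, and make_tmpdir_from_prefix is a hypothetical helper name.

```python
import os
import tempfile


def make_tmpdir_from_prefix(tmpdir_prefix: str) -> str:
    """Create a new, uniquely named directory whose path starts with tmpdir_prefix.

    Assumption (not confirmed by the notes above): the prefix is split into a
    parent directory and a basename prefix, e.g.
    "/scratch/run1/tmp" -> ("/scratch/run1", "tmp").
    The parent must already exist; the returned directory is freshly created.
    """
    parent, basename = os.path.split(tmpdir_prefix)
    return tempfile.mkdtemp(prefix=basename, dir=parent or None)


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for a scratch area such as /scratch/run1
    prefix = os.path.join(base, "tmp")
    # Each call yields a distinct directory such as <base>/tmp<random suffix>
    print(make_tmpdir_from_prefix(prefix))
    print(make_tmpdir_from_prefix(prefix))
```

This is why the folders "do not yet exist": only the prefix is fixed up front, and the random suffix is chosen when each directory is created.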

@ionox0
ionox0 / bsub_copy.py
Last active January 27, 2020 15:53
Script to copy a bunch of folders using bsub
# Uses bsub with rsync to copy large folders
#
# Uses a walltime limit of 60 hours (should be enough for very large folders)
#
# Runs in parallel across each subfolder of the source folder, and creates logs for each as well
#
# Usage: python bsub_copy.py /source/folder /dest/folder
import os
import sys
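The body of bsub_copy.py is cut off here. As a hedged sketch of the approach its header describes, the hypothetical build_bsub_commands below creates one bsub job per subfolder, each running rsync -a with a 60-hour run limit (-W 60:00) and its own log file (-o). The job names and the copy_logs location are assumptions, not taken from the original script.

```python
import os
import shlex
import subprocess
import sys


def build_bsub_commands(source, dest, log_dir, walltime="60:00"):
    """Return one bsub command (as an argv list) per subfolder of source.

    Hypothetical helper: the real bsub_copy.py body is not shown.
    """
    commands = []
    for name in sorted(os.listdir(source)):
        src_path = os.path.join(source, name)
        if not os.path.isdir(src_path):
            continue  # only subfolders get their own job
        log_path = os.path.join(log_dir, name + ".log")
        commands.append([
            "bsub",
            "-W", walltime,        # LSF run limit, hours:minutes
            "-J", "copy_" + name,  # job name (assumed naming scheme)
            "-o", log_path,        # per-job output log
            "rsync", "-a", src_path, dest,  # no trailing slash: copies into dest/<name>
        ])
    return commands


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python bsub_copy.py /source/folder /dest/folder
    source, dest = sys.argv[1], sys.argv[2]
    log_dir = os.path.join(dest, "copy_logs")  # assumption: logs live next to the copy
    os.makedirs(log_dir, exist_ok=True)
    for cmd in build_bsub_commands(source, dest, log_dir):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)  # submits the job; bsub returns immediately
```

Because bsub only submits the job, the loop finishes quickly and the copies run in parallel on the cluster.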
@ionox0
ionox0 / CWL_style_guide.md
Last active October 30, 2019 14:38
CWL style guide

Building on the CWL Recommended Best Practices

1. Use label: to describe what happens in each step:

class: Workflow

label: A workflow to do 1. 2. 3.

steps:
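One way to check this rule is a small pass over a workflow that has already been parsed into a Python dict (for example with a YAML loader). missing_labels is a hypothetical helper, and the step names trim and align are made up for illustration. It accepts both forms CWL allows for steps: a list of step objects with id fields, or a map keyed by step id.

```python
def missing_labels(workflow):
    """Return "<workflow>" and/or the ids of steps that have no label."""
    missing = []
    if not workflow.get("label"):
        missing.append("<workflow>")
    steps = workflow.get("steps", [])
    if isinstance(steps, dict):
        # Map form: {step_id: step_object}
        items = steps.items()
    else:
        # List form: [{"id": ..., ...}, ...]
        items = ((step.get("id", "?"), step) for step in steps)
    for step_id, step in items:
        if not step.get("label"):
            missing.append(step_id)
    return missing


# Example with hypothetical step names:
wf = {
    "class": "Workflow",
    "label": "A workflow to do 1. 2. 3.",
    "steps": {
        "trim": {"label": "Trim adapters", "run": "trim.cwl"},
        "align": {"run": "align.cwl"},
    },
}
print(missing_labels(wf))  # ['align']
```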
@ionox0
ionox0 / fake_dir_structure_copy.py
Created October 28, 2019 17:22
copying a directory structure to a new dummy structure in python
import os

# Note: you must already be in the directory that you are trying to copy
output_location = '/some/other/place'
for (root, dirs, files) in os.walk('.'):
    # Make the new directory
    if not os.path.exists(os.path.join(output_location, root)):
        os.mkdir(os.path.join(output_location, root))
    # Make the new (empty) files
    for file in files:
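Because the snippet is cut off, here is a complete sketch of the same idea under a few assumptions: the source directory is passed in instead of relying on the current working directory, os.makedirs(..., exist_ok=True) replaces the exists/mkdir pair, and each file is recreated empty by opening it in append mode. copy_dir_structure is a hypothetical name.

```python
import os


def copy_dir_structure(source, destination):
    """Recreate source's directory tree under destination, with empty files."""
    for root, dirs, files in os.walk(source):
        # Path of this directory relative to source, e.g. "." or "a/b"
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(destination, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            # Append mode creates an empty file, or leaves an existing one untouched
            open(os.path.join(target_dir, name), "a").close()
```

Keep destination outside source so the walk never picks up the copy itself. Files that already exist at the destination are left as they are rather than truncated.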
@ionox0
ionox0 / conformance.md
Last active October 21, 2019 13:58
CWL conformance testing

Run the CWL conformance tests using Toil, specifically with the LSF batch system:

./run_test.sh RUNNER=toil-cwl-runner EXTRA="--batchSystem lsf"
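If the tests are launched from Python rather than a shell, the same invocation can be built as an argument list. This is a sketch; conformance_command is a hypothetical helper. The shell quotes around --batchSystem lsf only keep it in one argument, so in the list it becomes the single element EXTRA=--batchSystem lsf.

```python
def conformance_command(batch_system="lsf", runner="toil-cwl-runner"):
    """Build the argv for the ./run_test.sh invocation above (no shell involved)."""
    return [
        "./run_test.sh",
        "RUNNER=" + runner,
        # One argv element; the shell version needs quotes to achieve the same
        "EXTRA=--batchSystem " + batch_system,
    ]


# Example, run from the directory that contains run_test.sh:
#   import subprocess
#   subprocess.run(conformance_command(), check=True)
```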
@ionox0
ionox0 / gist:1211f4dfa8d554bf62565572345ad783
Created October 17, 2019 20:39
CWL Developers Meeting Notes
Thursday - Intros, topics
CWL 1.1
- WallTime
-
Extensions
-
runIf --> when
@ionox0
ionox0 / toil_lsf.md
Last active September 24, 2019 23:19
toil_lsf_job_issue

In case anyone’s still wondering how Toil’s lsf.py job tracking works on Juno.

bjobs -l <jobid> works on Juno, but the string that lsf.py tries to match in its output is wrong:

Tue Sep 24 19:12:37: Started 1 Task(s) on Host(s) <jx15>, Allocated 1 Slot(s) 

DOES NOT MATCH WITH:

                elif line.find("Started on ") > -1:
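A hedged sketch of a more tolerant check, in Python like lsf.py itself: one regular expression that accepts both the "Started on " form lsf.py searches for and the "Started 1 Task(s) on Host(s)" form that bjobs -l prints on Juno. This is an illustration, not Toil's actual fix, and job_has_started is a hypothetical name.

```python
import re

# Optional "<n> Task(s) " between "Started" and "on " covers both output forms.
STARTED_RE = re.compile(r"Started (?:\d+ Task\(s\) )?on ")


def job_has_started(bjobs_output):
    """Return True if any line of `bjobs -l` output reports the job as started."""
    return any(STARTED_RE.search(line) for line in bjobs_output.splitlines())
```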
@ionox0
ionox0 / tips.md
Last active September 11, 2019 14:08
Toil & CWL tips

For a comprehensive explanation of Toil, see: https://media.nature.com/original/nature-assets/nbt/journal/v35/n4/extref/nbt.3772-S1.pdf

See the logs for just one job by searching the full log file for that job's Toil-generated ID (so you need to know the ID first):

cat cwltoil.log | grep jobVM1fIs

Grep for full commands in the Toil logs

This gives you a more concise view of the commands being run. Note that this information is only available from Toil when running with --logDebug, or when using the pipeline_submit script with --logLevel DEBUG.
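The grep pattern itself is not shown here. As a sketch, assuming DEBUG lines shaped like the "Issued the job command: ... with job id: ..." line in the toil_debug.md gist, the hypothetical issued_commands helper pulls out each command with its batch system job ID, optionally keeping only the commands for one Toil job ID such as jobVM1fIs.

```python
import re

# Assumed DEBUG line shape (from the toil_debug.md excerpt):
# DEBUG:toil.batchSystems.abstractGridEngineBatchSystem:Issued the job command: <cmd> with job id: <n>
COMMAND_RE = re.compile(r"Issued the job command: (.*) with job id: (\d+)")


def issued_commands(lines, job_id=None):
    """Return (batch system job id, command) pairs, optionally for one Toil job ID."""
    results = []
    for line in lines:
        match = COMMAND_RE.search(line)
        if match and (job_id is None or job_id in match.group(1)):
            results.append((match.group(2), match.group(1)))
    return results


# Example:
#   with open("cwltoil.log") as log:
#       for batch_id, command in issued_commands(log, "jobVM1fIs"):
#           print(batch_id, command)
```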

@ionox0
ionox0 / toil_job_debug.md
Last active August 14, 2019 15:48
Debugging a failed job in the Toil jobstore

Note: The "job" that we're dealing with in this case is actually an instance of the Toil JobGraph class. Here is the JobGraph class hierarchy. You can see that a JobGraph is similar to a Job, as it is one of its descendants.

Here is the Job class, for completeness.

Let's try an example.

From the Toil log files, you will be able to find the Toil job ID for the failed job. Here we show a job being submitted for the third time after two failed attempts. We can see the CWL file that defines the job, along with its resource requirements and the bsub command that is issued.

1.

There is then logging to indicate the failure of the job with job ID 9ntS9h.
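To spot retried jobs like this one without reading the whole log, a sketch can count how often each job was issued. It assumes INFO lines shaped like the toil.leader "Issued job" line in the toil_debug.md gist; issue_counts is a hypothetical helper, and any job store path with a count above 1 was submitted more than once.

```python
import re
from collections import Counter

# Assumed INFO line shape (from the toil_debug.md excerpt):
# INFO:toil.leader:Issued job '<cwl file>' u/v/job9ntS9h with job batch system ID: 45 and ...
ISSUED_RE = re.compile(r"Issued job '[^']*' (\S+) with job batch system ID")


def issue_counts(lines):
    """Count how many times each Toil job (keyed by job store path) was issued."""
    counts = Counter()
    for line in lines:
        match = ISSUED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example:
#   with open("cwltoil.log") as log:
#       retried = {path: n for path, n in issue_counts(log).items() if n > 1}
```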

@ionox0
ionox0 / toil_debug.md
Last active August 13, 2019 17:49
Debugging a failed job in the Toil jobstore

From the Toil log files, you will be able to find the Toil job ID for the failed job. Here we show a job being submitted for the third time after two failed attempts. We can see the CWL file that defines the job, along with its resource requirements and the bsub command that is issued.

1.

There is then logging to indicate the failure of the job with job ID 9ntS9h.

DEBUG:toil.batchSystems.abstractGridEngineBatchSystem:Issued the job command: /home/johnsoni/virtualenvs/pipeline_1.1.14/bin/_toil_worker file:///home/johnsoni/pipeline_1.1.14/ACCESS-Pipeline/cwl_tools/trimgalore/trimgalore.cwl file:/home/johnsoni/juno_ACCESS/5500-FZ/5500-FZ-1.1.14/tmp/jobstore-2888ebd4-bd4a-11e9-a01d-ec0d9a88a15a u/v/job9ntS9h with job id: 45
INFO:toil.leader:Issued job 'file:///home/johnsoni/pipeline_1.1.14/ACCESS-Pipeline/cwl_tools/trimgalore/trimgalore.cwl' u/v/job9ntS9h with job batch system ID: 45 and cores: 2, disk: 20.5 G, and memory: 15.6 G

...
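The INFO "Issued job" line packs the job's CWL file, job store path, batch system ID, and resource request into one string. A minimal sketch for pulling those fields apart, assuming the exact wording shown in this log; parse_issued_job is a hypothetical name, and the regular expression may need adjusting for other Toil versions.

```python
import re

# Assumes the wording of the toil.leader line in this log excerpt.
ISSUED_RE = re.compile(
    r"Issued job '(?P<cwl>[^']+)' (?P<path>\S+) "
    r"with job batch system ID: (?P<batch_id>\d+) "
    r"and cores: (?P<cores>[\d.]+), disk: (?P<disk>[\d.]+ ?\w+), "
    r"and memory: (?P<memory>[\d.]+ ?\w+)"
)


def parse_issued_job(line):
    """Return a dict of the fields in an 'Issued job' line, or None if it doesn't match."""
    match = ISSUED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Short Toil job ID: last path component without its "job" prefix (u/v/job9ntS9h -> 9ntS9h)
    name = fields["path"].rsplit("/", 1)[-1]
    fields["job_id"] = name[3:] if name.startswith("job") else name
    return fields
```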