@scottfrazer
Created March 22, 2016 20:15

Cromtool

Python command line utility for working with Cromwell in DSDE

Installation

Tested with Python 3; it might also work with Python 2.

python setup.py install

Virtual environment recommended setup

Make sure Python 3.4+ is installed. From the root of this repository:

$ pyvenv-3.5 venv
$ source venv/bin/activate
$ python setup.py develop

It's a good idea to now symbolically link ./venv/bin/cromtool into a directory in your PATH. I typically link it into ~/bin. Any time you modify the contents of this directory, the cromtool executable will be immediately affected.

First-time Setup

Run cromtool to initialize directories and files:

$ cromtool
Initializing /Users/sfrazer/.cromtool...
Initializing /Users/sfrazer/.cromtool/build...
Initializing /Users/sfrazer/.cromtool/tmp...
Writing default config to /Users/sfrazer/.cromtool/config

======================================
A default configuration file has been written to /Users/sfrazer/.cromtool/config

Please modify this file and put in values for username and passwords
======================================

Usage

Each cromtool sub-command is described below.

server

Manage a list of Cromwell servers. Other sub-commands refer to these servers by name (for example, --server=dsde-dev).

$ cromtool server ls
|Name        |URL                                                 |
|------------|----------------------------------------------------|
|dsde-staging|https://cromwell.dsde-staging.broadinstitute.org:443|
|local       |http://localhost:8000                               |
|dsde-dev    |https://cromwell.dsde-dev.broadinstitute.org:443    |
$ cromtool server add local-80 http://localhost
$ cromtool server ls
|Name        |URL                                                 |
|------------|----------------------------------------------------|
|dsde-dev    |https://cromwell.dsde-dev.broadinstitute.org:443    |
|dsde-staging|https://cromwell.dsde-staging.broadinstitute.org:443|
|local       |http://localhost:8000                               |
|local-80    |http://localhost:80                                 |
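
In the listing above, http://localhost is stored as http://localhost:80, which suggests the URL is normalized to carry an explicit port. A minimal sketch of that kind of normalization follows; the function name and the port table are assumptions for illustration, not cromtool's actual code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed scheme defaults; cromtool's own table (if any) is not shown in this README.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_server_url(url):
    """Return url unchanged if it names a port, else add the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return url
    # Raises KeyError for schemes other than http/https.
    port = DEFAULT_PORTS[parts.scheme]
    # Note: any user:password@ prefix is dropped in this sketch.
    netloc = "{}:{}".format(parts.hostname, port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize_server_url("http://localhost"))
```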

access-token

Acquire a Google Access Token:

$ cromtool access-token
ya29._wEWJqH4_c402W621this_is_super_secret_dont_tell_anybody_about_it

refresh-token

Acquire a Google Refresh Token:

$ cromtool refresh-token
1/another_secret_string

jes-instances

Returns a table of all VMs that are running JES jobs. This data is pieced together from the output of a gcloud compute instances list command.

The last column gives a gcloud compute ssh command that will allow SSH access to the VM where the job is running. Note that the actual process is running in a Docker container on that VM.

It is recommended that you specify --project explicitly. Otherwise, the default project that gcloud is currently configured for is used (see the output of gcloud info).

Example usage:

$ cromtool jes-instances --project=broad-dsde-dev
> gcloud compute --project=broad-dsde-dev instances list --format json
|Name                    |Pipeline ID         |Run ID                                                  |Machine Type |Status |gcloud                                                          |
|------------------------|--------------------|--------------------------------------------------------|-------------|-------|----------------------------------------------------------------|
|ggp-10444405266723971505|15579851035708136685|EJSbz4KvKhixw7Sqjej--JABIMO73rS7FyoMc3RhZ2luZ1F1ZXVl    |n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-10444405266723971505|
|ggp-15855547600557060489|9083337551142404666 |EOe-6pe1KhiJ44CUlZGOhdwBIMO73rS7FyoPcHJvZHVjdGlvblF1ZXVl|n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-15855547600557060489|
|ggp-17029143444771312572|7915637536437851425 |EP2kgLm1Khi8p5D20ffqqewBIMO73rS7FyoPcHJvZHVjdGlvblF1ZXVl|n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-17029143444771312572|
|ggp-1771482711520258106 |16475625741508474430|EMytz9e3Khi60KSPv8Tkyhggw7vetLsXKg9wcm9kdWN0aW9uUXVldWU |n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-1771482711520258106 |
|ggp-3157797711739161870 |10910293093633253841|EJrvtpyzKhiOmoSGnImw6Ssgw7vetLsXKg9wcm9kdWN0aW9uUXVldWU |n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-3157797711739161870 |
|ggp-7010617938722704116 |17059662677337818778|EJPWn-qzKhj0lbGs8futpWEgw7vetLsXKg9wcm9kdWN0aW9uUXVldWU |n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-7010617938722704116 |
|ggp-8015122898931865014 |7747849907777471718 |EOfS-a24Khi2q7uv-PnbnW8gw7vetLsXKg9wcm9kdWN0aW9uUXVldWU |n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-8015122898931865014 |
|ggp-8473594771888441731 |968724465190265928  |EN7rpNm5KhiDk5LozLGQzHUgw7vetLsXKg9wcm9kdWN0aW9uUXVldWU |n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-8473594771888441731 |
|ggp-9284367781915948616 |954632328327854207  |EL2CrKqvKhjI7Lmp6ems7IABIMO73rS7FyoPcHJvZHVjdGlvblF1ZXVl|n1-standard-1|RUNNING|gcloud compute ssh --zone=us-central1-a ggp-9284367781915948616 |
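
The last column can be derived from the instance name and zone in the gcloud JSON. The sketch below assumes each JSON entry has a "name" field and a "zone" field given as a full resource URL; how cromtool extracts the Pipeline ID and Run ID is not shown in this README and is left out here.

```python
import json


def ssh_commands(instances_json):
    """Map each instance name to a gcloud compute ssh command line."""
    commands = {}
    for instance in json.loads(instances_json):
        # Assumed: the zone is a resource URL; keep only its last path segment.
        zone = instance["zone"].rsplit("/", 1)[-1]
        commands[instance["name"]] = "gcloud compute ssh --zone={} {}".format(
            zone, instance["name"]
        )
    return commands


# Hypothetical input shaped like `gcloud compute instances list --format json`.
sample = json.dumps([
    {
        "name": "ggp-8473594771888441731",
        "zone": "https://www.googleapis.com/compute/v1/projects/broad-dsde-dev/zones/us-central1-a",
    }
])
print(ssh_commands(sample)["ggp-8473594771888441731"])
```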

jes-job

The jes-job subcommand will return lots of information about the GCE virtual machine and Docker container running on that virtual machine.

jes-job needs an operation ID (the Run ID column in the output of the jes-instances subcommand) and a project. It is recommended to specify --project, but the default configured for gcloud will be used if one is not provided.

In the example below, the gcloud compute ssh command is given to SSH to the VM and the docker exec command is used once SSH'd to that VM to get a shell on the container running the job.

On top of that, jes-job will print disk usage statistics (the output of df -h) as well as a list of files that have been localized (via tree -h /mnt/local-disk on the GCE VM).

$ cromtool jes-job --project=broad-dsde-dev EN7rpNm5KhiDk5LozLGQzHUgw7vetLsXKg9wcm9kdWN0aW9uUXVldWU
> gcloud compute --project=broad-dsde-dev instances list --format json

Pipeline ID: 968724465190265928
Run ID: EN7rpNm5KhiDk5LozLGQzHUgw7vetLsXKg9wcm9kdWN0aW9uUXVldWU
SSH: gcloud compute --project=broad-dsde-dev ssh --zone=us-central1-a ggp-8473594771888441731

> gcloud compute --project=broad-dsde-dev ssh --zone=us-central1-a ggp-8473594771888441731 sudo "docker ps"
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
95a875c07bb7        broadgdac/tool_gistic2:141   "dumb-init /tmp/ggp-9"   17 hours ago        Up 17 hours                             backstabbing_hodgkin

Docker exec: sudo docker exec -t -i 95a875c07bb7 bash -l

> gcloud compute --project=broad-dsde-dev ssh --zone=us-central1-a ggp-8473594771888441731 "sudo df -h"
Warning: Permanently added '104.197.167.190' (RSA) to the list of known hosts.
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  9.8G  4.2G  5.1G  45% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   372M  140K  372M   1% /run
/dev/disk/by-uuid/e8292f07-3714-41a2-ae8e-4a059f383139  9.8G  4.2G  5.1G  45% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   743M  212K  743M   1% /run/shm
cgroup                                                  1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/disk/by-uuid/e8292f07-3714-41a2-ae8e-4a059f383139  9.8G  4.2G  5.1G  45% /var/lib/docker/aufs
/dev/sdb                                                9.8G  270M  9.0G   3% /mnt/local-disk
none                                                    9.8G  4.2G  5.1G  45% /var/lib/docker/aufs/mnt/95a875c07bb726998d5d2176bf7dcad05827b13dd5b9b55c8ec2d5c49081a204

> gcloud compute --project=broad-dsde-dev ssh --zone=us-central1-a ggp-8473594771888441731 "sudo apt-get -qq install tree && tree -h /mnt/local-disk"
Warning: Permanently added '104.197.167.190' (RSA) to the list of known hosts.
/mnt/local-disk
├── [ 14M]  all_data_by_genes.txt
├── [ 46K]  all_lesions.conf_99.txt
├── [468K]  all_thresholded.by_genes.mat
├── [5.1M]  all_thresholded.by_genes.txt
├── [8.4K]  amp_genes.conf_99.txt
├── [4.5M]  amp_qplot.fig
├── [7.1K]  amp_qplot.png
├── [2.6K]  arraylistfile.txt
├── [2.6K]  array_list.txt
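
The "Docker exec" line in the output above can be built from the first column of the docker ps listing. A rough sketch, assuming the listing has one header row followed by one row per container (the function name is hypothetical):

```python
def docker_exec_commands(docker_ps_output):
    """Return one `docker exec` command per container listed by `docker ps`."""
    lines = docker_ps_output.strip().splitlines()
    commands = []
    # Skip the header row; the container ID is the first whitespace-separated field.
    for line in lines[1:]:
        if line.strip():
            container_id = line.split()[0]
            commands.append("sudo docker exec -t -i {} bash -l".format(container_id))
    return commands


listing = (
    "CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES\n"
    '95a875c07bb7        broadgdac/tool_gistic2:141   "dumb-init /tmp/ggp-9"   17 hours ago        Up 17 hours                             backstabbing_hodgkin\n'
)
print(docker_exec_commands(listing)[0])
```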

mysql

Acquire a MySQL connection string for a particular environment. Edit ~/.cromtool/config to set the users and passwords for each environment.

It is recommended to use this in sub-shell form, $(cromtool mysql --env=dsde-dev), as seen in the example below.

NOTE: this will output a string with a MySQL password in it

$ $(cromtool mysql --env=dsde-dev)
Warning: Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 106677
Server version: 5.6.26 (Google)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

run

Runs a WDL file through Cromwell in one of two ways:

  1. Locally with JAR files built via the cromtool build subcommand. Requires the --build flag to specify the name of the build to use.
  2. Remotely via calls to the Cromwell REST API using the --server flag

The WDL file, inputs JSON file, and workflow options can be specified one of two ways:

  1. Via --wdl, --inputs, and --options flags, which each should specify the path to the appropriate file.
  2. Via --prefix which will take the value and append .wdl, .json, and .options.json to find the WDL file, inputs JSON file, and workflow options JSON file respectively.
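
The --prefix expansion amounts to appending three fixed suffixes. A small sketch follows; the function name is hypothetical, and letting an explicit flag override the prefix-derived path is an assumption, since the README does not say what happens when both are given.

```python
def resolve_workflow_files(prefix=None, wdl=None, inputs=None, options=None):
    """Return the (wdl, inputs, options) paths for a run."""
    if prefix is not None:
        # Assumption: an explicitly given path wins over the one derived from the prefix.
        wdl = wdl or prefix + ".wdl"
        inputs = inputs or prefix + ".json"
        options = options or prefix + ".options.json"
    return wdl, inputs, options


print(resolve_workflow_files(prefix="wdl/jes0"))
```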

Example of running a job from the command line runner:

$ cromtool run --build=0.12 --prefix=wdl/jes0 --poll
> java -jar /Users/sfrazer/.cromtool/build/0.12/cromwell-0.12.jar run wdl/jes0.wdl wdl/jes0.json wdl/jes0.options.json
[info] Slf4jLogger started
[info] Default backend: LOCAL
...

Example of running a job on a remote server:

$ cromtool run --server=dsde-dev --prefix=wdl/jes0 --poll
POST https://cromwell.dsde-dev.broadinstitute.org:443/workflows/v1
Content-Length: 1658
...

When running with --server, HTTP requests and responses will be printed to standard out. The --poll option will cause cromtool to poll the workflow every 5 seconds after it is submitted. These polling requests and responses will be printed to standard out as well.
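
The polling behaviour can be pictured as a simple loop. This is only a sketch: fetch_status is a hypothetical callable standing in for the HTTP status request, and stopping on the Succeeded, Failed, or Aborted states is an assumption about when polling should end.

```python
import time

# Assumed terminal workflow states; polling stops when one of these is seen.
TERMINAL_STATES = {"Succeeded", "Failed", "Aborted"}


def poll_workflow(fetch_status, interval=5.0, sleep=time.sleep):
    """Call fetch_status() every `interval` seconds until a terminal state is returned."""
    while True:
        status = fetch_status()
        print("Workflow status:", status)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
```

Passing sleep as a parameter keeps the loop easy to exercise without waiting, for example by handing in a function that only records the requested delays.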

build

Manages builds of Cromwell stored in ~/.cromtool/build/

Add a build from a Git tree-like (a reference such as 0.11) in the Cromwell repository:

$ cromtool build add 0.11
> git init
Initialized empty Git repository in /Users/sfrazer/.cromtool/tmp/tmpr5dqmcz4/.git/
> git remote add origin https://github.com/broadinstitute/cromwell.git
> git fetch
...
> git reset --hard 0.11
HEAD is now at ec83068 Merge branch 'develop'
> sbt assembly
...

List builds:

$ cromtool build ls
|Name|Git tree-like|JAR Path                                             |
|----|-------------|-----------------------------------------------------|
|0.11|0.11         |/Users/sfrazer/.cromtool/build/0.11/cromwell-0.11.jar|
|0.12|0.12         |/Users/sfrazer/.cromtool/build/0.12/cromwell-0.12.jar|

Remove build:

$ cromtool build rm 0.11

query

Given a workflow ID and a --server, this will generate HTTPie commands for querying status, outputs, logs, and metadata:

$ cromtool query --server=dsde-dev 7e88bcb9-57c5-44ea-8319-0e2179e5a327
http 'https://cromwell.dsde-dev.broadinstitute.org:443/workflows/v1/7e88bcb9-57c5-44ea-8319-0e2179e5a327/status' 'Authorization: Bearer secret_access_token'
http 'https://cromwell.dsde-dev.broadinstitute.org:443/workflows/v1/7e88bcb9-57c5-44ea-8319-0e2179e5a327/outputs' 'Authorization: Bearer secret_access_token'
http 'https://cromwell.dsde-dev.broadinstitute.org:443/workflows/v1/7e88bcb9-57c5-44ea-8319-0e2179e5a327/logs' 'Authorization: Bearer secret_access_token'
http 'https://cromwell.dsde-dev.broadinstitute.org:443/workflows/v1/7e88bcb9-57c5-44ea-8319-0e2179e5a327/metadata' 'Authorization: Bearer secret_access_token'
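
The four commands differ only in the final path segment. A minimal sketch that reproduces the output format shown above (the function name is hypothetical, and the token is passed in rather than acquired):

```python
ENDPOINTS = ("status", "outputs", "logs", "metadata")


def httpie_commands(server_url, workflow_id, access_token):
    """Build one HTTPie command line per workflow endpoint."""
    base = "{}/workflows/v1/{}".format(server_url.rstrip("/"), workflow_id)
    header = "Authorization: Bearer {}".format(access_token)
    return ["http '{}/{}' '{}'".format(base, endpoint, header) for endpoint in ENDPOINTS]


for command in httpie_commands(
    "https://cromwell.dsde-dev.broadinstitute.org:443",
    "7e88bcb9-57c5-44ea-8319-0e2179e5a327",
    "secret_access_token",
):
    print(command)
```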

status

Given a workflow ID and a --server, this will use the /workflows/v1/<ID>/metadata endpoint to build a table of the sub-job statuses:

$ cromtool status --server=dsde-dev ac61c1f2-21ae-46ae-a3fa-6df8ff43ad86
Workflow status: Succeeded
|FQN      |status|
|---------|------|
|sfrazer.x|Done  |
|sfrazer.z|Done  |
|sfrazer.y|Done  |
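
A sketch of how such a table might be built from the metadata response. It assumes the JSON has a top-level "calls" object mapping each fully-qualified call name to a list of attempts, each carrying an "executionStatus" field; both function names are hypothetical.

```python
def call_statuses(metadata):
    """Return (FQN, status) rows, using the most recent attempt of each call."""
    rows = []
    for fqn, attempts in metadata.get("calls", {}).items():
        # Assumed layout: a list of attempts, the last one being the latest.
        rows.append((fqn, attempts[-1]["executionStatus"]))
    return rows


def format_table(headers, rows):
    """Render a pipe-delimited table with columns padded to the widest cell."""
    table = [tuple(headers)] + [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def render(cells):
        return "|" + "|".join(c.ljust(w) for c, w in zip(cells, widths)) + "|"

    separator = "|" + "|".join("-" * w for w in widths) + "|"
    return "\n".join([render(table[0]), separator] + [render(r) for r in table[1:]])


metadata = {"status": "Succeeded", "calls": {"sfrazer.x": [{"executionStatus": "Done"}]}}
print("Workflow status:", metadata["status"])
print(format_table(("FQN", "status"), call_statuses(metadata)))
```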