Quick Start Guide

The following is a step-by-step example of how to build a docker image that complies with the "bioboxes" standard. Following the standard means that only a small amount of code has to be changed in order to build customized assembly docker images. All the steps required to build and run a docker image should be included here, but you should consult "bioboxes.org" for details.

If you use the minia assembler as an example, and follow these steps, you should be able to build and run the minia image without changing any code.

Copy templates

Copy the minia assembler example to use as template files (e.g. git clone https://github.com/bioboxes/minia). The files you need are:

Dockerfile
Taskfile
run.sh
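Before customizing anything, it can help to confirm that the three template files are actually present. The following is a minimal sketch; check_templates is a hypothetical helper name and is not part of the bioboxes tooling.

```shell
# check_templates [DIR]: hypothetical helper (not part of bioboxes).
# Returns 0 if Dockerfile, Taskfile and run.sh all exist in DIR
# (default: current directory); otherwise lists what is missing.
check_templates() {
    dir="${1:-.}"
    missing=0
    for f in Dockerfile Taskfile run.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After cloning, you could run check_templates minia, assuming the files sit at the top level of the clone.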

If you want to run the minia assembler image as a test, you can download some reads with: wget https://www.dropbox.com/s/uxgn6cqngctqv74/reads.fq.gz

Customize files

Dockerfile

The purpose of the Dockerfile is to:

install all the required software dependencies.
set up the directory structure required by your software and by biobox validation
add files that can be seen by the software that will be run in the docker container.
set which command will be launched when the container is started with docker run. This command is set by the "ENTRYPOINT" instruction but can be overridden (e.g. docker run --entrypoint=/bin/bash).

All you really need to change are the first two lines:

FROM sjackman/linuxbrew
MAINTAINER <your name and email>

You probably want to change the base image to something like FROM ubuntu:latest or FROM debian:latest.

And content under:

install minia
add schema, tasks, run scripts

Don't change:

Locations for biobox validator
install yaml2json and jq tools
Install the biobox file validator
download the assembler schema

Note that, for simplicity, you should keep run.sh and the Taskfile in the same directory as the Dockerfile.

Taskfile

The Taskfile is what actually contains the code to run your software. You can hard code different parameter settings here. The syntax is described at http://bioboxes.org/guide/developer/create-a-task/.

run.sh

(or "assemble" in following link)

(http://bioboxes.org/guide/developer/putting-everything-together/) This file runs the code in the Taskfile as well as the validation steps. You'll have to customize a few lines in this script. In the minia example, the read paths are saved into a file called "minia_input" so that when the minia assembler is called, it can find the reads. You'll have to set up your data and directory structure accordingly.

biobox.yaml

(http://bioboxes.org/guide/user/using-a-biobox/) Create this file by copying the code below (from https://github.com/bioboxes/minia, under "create a biobox.yaml"). Change the file names in the "value" entries to your own fastq file names.

---
version: "0.9.0"
arguments:
 - fastq:
   - id: "test_reads_1"
     type: "paired"
     value: "/bbx/input/reads_1.fq.gz" 
   - id: "test_reads_2"
     type: "paired"
     value: "/bbx/input/reads_2.fq.gz" 
   - id: "test_reads_3"
     type: "paired"
     value: "/bbx/input/reads_3.fq.gz"
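If you have many read files, writing the entries by hand gets tedious. The sketch below generates a file with the same layout as the example above. make_biobox_yaml is a hypothetical helper, the "reads_N" ids are an arbitrary choice, and the type is fixed to "paired" only because the example uses it; adjust these to whatever the bioboxes schema requires for your data.

```shell
# make_biobox_yaml FILE...: hypothetical helper that prints a biobox.yaml
# with one fastq entry per file name given on the command line.
# Paths are rewritten to /bbx/input/, where the input directory is mounted.
make_biobox_yaml() {
    printf '%s\n' '---' 'version: "0.9.0"' 'arguments:' ' - fastq:'
    i=1
    for f in "$@"; do
        printf '   - id: "reads_%d"\n' "$i"           # id scheme is an assumption
        printf '     type: "paired"\n'                # copied from the example
        printf '     value: "/bbx/input/%s"\n' "$(basename "$f")"
        i=$((i + 1))
    done
}
```

For example, make_biobox_yaml reads_1.fq.gz reads_2.fq.gz > biobox.yaml writes a file with two entries.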

Build and run the docker image

After copying all the necessary template files, customizing them, and creating a file called biobox.yaml (the name is hard coded in run.sh), you need to set up the directory structure by doing:

mkdir input_data output_data
mv <your fastq files> input_data
mv biobox.yaml input_data

Build a docker image. You need to rebuild the image any time you change one of the files.

sudo docker build --rm --tag myassembler .

Here you've specified an image name ("myassembler") and the path to the Dockerfile (".").

Run the image and mount some volumes so the software running in the container can see your data.

sudo docker run \
  --volume="$(pwd)/input_data:/bbx/input:ro" \
  --volume="$(pwd)/output_data:/bbx/output:rw" \
  --rm \
  myassembler \
  default

This command mounts the directories you made (with data) onto directories that the container can see when it's run. Note that "default" is an argument to the run.sh script, which gets run because of the ENTRYPOINT instruction in the Dockerfile.
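A missing biobox.yaml or output directory is an easy mistake to make before running the container. The sketch below checks the layout described above and then prints, without executing, the matching docker run command. print_run_command is a hypothetical helper name.

```shell
# print_run_command IMAGE [TASK] [BASE_DIR]: hypothetical helper.
# Verifies that BASE_DIR (default: current directory) holds
# input_data/biobox.yaml and output_data/, then prints the docker run
# command for that layout. Nothing is executed.
print_run_command() {
    image="$1"
    task="${2:-default}"
    base="${3:-$(pwd)}"
    if [ ! -f "$base/input_data/biobox.yaml" ]; then
        echo "missing: $base/input_data/biobox.yaml" >&2
        return 1
    fi
    if [ ! -d "$base/output_data" ]; then
        echo "missing: $base/output_data/" >&2
        return 1
    fi
    echo "sudo docker run" \
        "--volume=$base/input_data:/bbx/input:ro" \
        "--volume=$base/output_data:/bbx/output:rw" \
        "--rm $image $task"
}
```

Running print_run_command myassembler from the directory that contains input_data and output_data prints the command with the "default" task filled in.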

The run.sh script finds the fastq files described in biobox.yaml and finds the appropriate command in Taskfile.
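To make the lookup step concrete, here is a rough sketch of how a task name could be resolved to a command. It assumes a Taskfile with one "name: command" entry per line; that format, the lookup_task name, and the sed-based approach are assumptions for illustration, and the run.sh in the minia example may do this differently.

```shell
# lookup_task TASKFILE NAME: hypothetical helper that prints the command
# registered under NAME. Assumes one "name: command" entry per line.
lookup_task() {
    taskfile="$1"
    name="$2"
    # On the first line starting with "NAME:", strip the prefix and print
    # the rest; then stop reading.
    cmd="$(sed -n "/^${name}:/{s/^${name}: *//p;q;}" "$taskfile")"
    if [ -z "$cmd" ]; then
        echo "no such task: $name" >&2
        return 1
    fi
    printf '%s\n' "$cmd"
}
```

With this model, the "default" argument passed to docker run simply selects which line of the Taskfile gets executed.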

Output from the command in the Taskfile is put into /bbx/output, which is mounted from the output_data directory on the host, so you will be able to access it after the container quits. If you ran the minia image with the suggested reads, you should see a minia.contigs.fa file with 5 contigs.
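One quick way to check the result is to count the FASTA header lines in the output file. This is a small sketch; count_contigs is a hypothetical helper name, and the exact location of minia.contigs.fa inside output_data is an assumption you may need to adjust.

```shell
# count_contigs FASTA: print the number of sequences in a FASTA file,
# counted as lines that start with ">".
count_contigs() {
    grep -c '^>' "$1"
}
```

For the test run described above, count_contigs output_data/minia.contigs.fa should report the number of contigs minia produced.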

Some Hints:

For testing, this command is useful since you can see what the container sees. It runs the docker image but opens an interactive bash shell ("-it") by overriding the ENTRYPOINT set in the Dockerfile:

sudo docker run \
  --volume=/home/ubuntu/input_data:/bbx/input:rw \
  --volume=/home/ubuntu/output_data:/bbx/output:rw \
  --rm \
  --entrypoint=/bin/bash -it \
  myassembler

Add this to your Dockerfile to include any required scripts that your pipeline uses:

ADD fastqLengths.pl /usr/local/bin/fastqLengths.pl

/usr/local/bin/ is already in $PATH, but you can add other directories:

ENV PATH /usr/local/megahit:$PATH

Each instruction in the Dockerfile (RUN, ENTRYPOINT, CMD) runs in a new shell, so this command

RUN cd Ray-2.3.1/ && make 'MAXKMERLENGTH=73' && make install

is different from:

RUN cd Ray-2.3.1/
RUN make 'MAXKMERLENGTH=73' 
RUN make install

The first case is correct. The second case will try to run "make" in the root directory instead of the expected Ray-2.3.1/, because the cd from the first RUN does not carry over to the next one.
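The same effect can be reproduced outside of docker with plain sh -c calls, which is a rough analogy for how separate RUN lines behave. The sketch below uses a temporary directory and an empty Ray-2.3.1/ folder as stand-ins; no real build is run.

```shell
# Stand-in for the image's working directory, with an empty Ray-2.3.1/.
work="$(mktemp -d)"
mkdir "$work/Ray-2.3.1"

# Separate shells, like separate RUN lines: the cd is forgotten,
# so pwd reports the starting directory.
separate="$(cd "$work" && sh -c 'cd Ray-2.3.1/' && sh -c 'pwd')"

# One shell, like a single RUN joined with &&: pwd runs inside Ray-2.3.1/.
chained="$(cd "$work" && sh -c 'cd Ray-2.3.1/ && pwd')"

echo "separate shells end up in: $(basename "$separate")"
echo "chained commands end up in: $(basename "$chained")"
```

Only the chained variant ends up in Ray-2.3.1/, which is why the single RUN line with && is the one that builds correctly.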

The difference between RUN and CMD in the Dockerfile: RUN runs a command in a shell while the image is being built. CMD specifies the default arguments for the command given in ENTRYPOINT. For example:

ENTRYPOINT ["sudo", "-E", "/bin/bash", "/usr/local/bin/run"]
CMD ["default"]

CMD arguments can be overridden by arguments given at the end of the docker run command.
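The interaction between ENTRYPOINT, CMD and the docker run arguments can be modelled in a few lines of shell. This is only an illustration of the rule, not how docker is implemented; effective_command is a hypothetical name, and the entrypoint and default are copied from the example above.

```shell
# effective_command [ARGS...]: illustrative model of what the container
# ends up running for the ENTRYPOINT and CMD shown above.
effective_command() {
    entrypoint="sudo -E /bin/bash /usr/local/bin/run"
    default_args="default"
    if [ "$#" -gt 0 ]; then
        # Arguments given after the image name replace CMD entirely.
        echo "$entrypoint $*"
    else
        # No arguments: the CMD value is appended to ENTRYPOINT.
        echo "$entrypoint $default_args"
    fi
}
```

Calling effective_command with no arguments corresponds to docker run myassembler, while effective_command mytask corresponds to docker run myassembler mytask, where "mytask" is a made-up task name.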

michaelbarton/gist:996c0affbd32aae4b144