The following is a step-by-step example of how to build a docker image that complies with the "bioboxes" standard, which means only a small amount of code should have to be changed in order to build customized assembly docker images. All the steps required to build and run a docker image should be included here, but you should consult "bioboxes.org" for details.
If you use the minia assembler as an example, and follow these steps, you should be able to build and run the minia image without changing any code.
Copy the minia assembler example to get the template files (e.g. git clone https://github.com/bioboxes/minia):
- Dockerfile
- Taskfile
- run.sh
If you want to run the minia assembler image as a test, you can download some reads (wget https://www.dropbox.com/s/uxgn6cqngctqv74/reads.fq.gz).
The purpose of the Dockerfile is to:
- install all the required software dependencies.
- set up the directory structure required by your software and by biobox validation
- add files that can be seen by the software that will be run in docker container.
- set which command will be launched when the container is run by docker run. This command is set by the "ENTRYPOINT" instruction but can be overridden (e.g. docker run --entrypoint=/bin/bash).
All you really need to change are the first two lines
FROM sjackman/linuxbrew #… you probably want to change to something like
FROM <ubuntu:latest|debian:latest>
MAINTAINER <your name and email>
And content under:
- install minia
- add schema, tasks, run scripts
Don't change:
- Locations for biobox validator
- install yaml2json and jq tools
- Install the biobox file validator
- download the assembler schema
Note that, for simplicity, you should have a run.sh and a Taskfile in the same directory as the Dockerfile.
(http://bioboxes.org/guide/developer/create-a-task/) The Taskfile is what actually contains the code to run your software. You can hard-code different parameter settings here. The syntax is described at the link above.
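As a rough illustration of how a task name can map to a command, the sketch below writes a small Taskfile and looks up an entry by name. It assumes one "name: command" entry per line. The task names, the echo commands and the lookup_task helper are hypothetical stand-ins, not the real minia settings.

```shell
#!/bin/sh
# Sketch only: assumes a Taskfile with one "name: command" entry per line.
workdir=$(mktemp -d)
taskfile="$workdir/Taskfile"

# Hypothetical tasks; a real Taskfile would call your assembler here.
cat > "$taskfile" <<'EOF'
default: echo assembling with default settings
careful: echo assembling with careful settings
EOF

# Print the command registered for the given task name (empty if none).
lookup_task() {
    grep "^$1:" "$taskfile" | cut -d ':' -f 2- | sed 's/^ *//'
}

task_cmd=$(lookup_task default)
if [ -z "$task_cmd" ]; then
    echo "Abort, no task found for 'default'." >&2
    exit 1
fi
echo "$task_cmd"
```

Keeping each parameter set under its own task name lets the same image be run with different settings by changing only the final argument to docker run.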
run.sh (or "assemble" in the following link)
(http://bioboxes.org/guide/developer/putting-everything-together/) This file runs the code in the Taskfile as well as the validation steps. You'll have to customize a few lines in this script. In the minia example, the read paths are saved into a file called "minia_input" so that the minia assembler can find them when it is called. You'll have to set up your data/directory structure accordingly.
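To make the "minia_input" step concrete, here is a minimal sketch that collects read paths from a biobox.yaml into a file with one path per line. It is not the actual run.sh: the Dockerfile installs yaml2json and jq, presumably for this job, while this sketch substitutes plain sed so that it runs anywhere. The sample file contents are assumptions.

```shell
#!/bin/sh
# Sketch only: pulls every quoted "value:" entry out of a biobox.yaml
# with sed and writes one read path per line to minia_input.
workdir=$(mktemp -d)

# Hypothetical input file with two read sets.
cat > "$workdir/biobox.yaml" <<'EOF'
---
version: "0.9.0"
arguments:
  - fastq:
    - id: "test_reads_1"
      type: "paired"
      value: "/bbx/input/reads_1.fq.gz"
    - id: "test_reads_2"
      type: "paired"
      value: "/bbx/input/reads_2.fq.gz"
EOF

# Keep only lines of the form  value: "<path>"  and reduce them to <path>.
sed -n 's/^ *value: *"\(.*\)" *$/\1/p' "$workdir/biobox.yaml" > "$workdir/minia_input"

cat "$workdir/minia_input"
```

A line-based sed match like this only works for simple, consistently formatted files; a real YAML parser is the safer choice inside the container.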
biobox.yaml
(http://bioboxes.org/guide/user/using-a-biobox/) Create this file by copying the code below (from https://github.com/bioboxes/minia, under "create a biobox.yaml"). You need to change the "reads_*.fq.gz" file names to your own fastq file names.
---
version: "0.9.0"
arguments:
  - fastq:
    - id: "test_reads_1"
      type: "paired"
      value: "/bbx/input/reads_1.fq.gz"
    - id: "test_reads_2"
      type: "paired"
      value: "/bbx/input/reads_2.fq.gz"
    - id: "test_reads_3"
      type: "paired"
      value: "/bbx/input/reads_3.fq.gz"
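If you have many read files, writing the entries by hand gets tedious. The sketch below generates a file in the same layout from a list of fastq file names. The make_biobox_yaml helper and the "test_reads_N" id scheme are illustrative assumptions, and every entry is marked "paired" simply because the example above does so.

```shell
#!/bin/sh
# Sketch only: print a biobox.yaml that lists each given fastq file
# as a "paired" entry under /bbx/input (the container-side mount point).
make_biobox_yaml() {
    printf '%s\n' '---' 'version: "0.9.0"' 'arguments:' '  - fastq:'
    n=1
    for reads in "$@"; do
        printf '    - id: "test_reads_%s"\n' "$n"
        printf '      type: "paired"\n'
        printf '      value: "/bbx/input/%s"\n' "$reads"
        n=$((n + 1))
    done
}

# Hypothetical file names; replace with your own.
make_biobox_yaml reads_1.fq.gz reads_2.fq.gz
```

Redirect the output to a file named biobox.yaml (for example, make_biobox_yaml my_reads.fq.gz > biobox.yaml) before moving it into the input directory.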
After copying all the necessary template files, customizing them, and creating a file called biobox.yaml (the name is hard-coded in run.sh), you need to set up the directory structure by doing:
- mkdir input_data output_data
- mv <your fastq files> input_data
- mv biobox.yaml input_data
- Build a docker image. You need to rebuild the image any time you change one of the files.
sudo docker build --rm --tag myassembler .
where you've specified an image name of your choice ("myassembler") and the path to the directory containing the Dockerfile (".").
- Run the image and mount some volumes so the software running in the container can see your data.
sudo docker run \
  --volume="$(pwd)/input_data:/bbx/input:ro" \
  --volume="$(pwd)/output_data:/bbx/output:rw" \
  --rm \
  myassembler \
  default
This command mounts the directories you made (with data) onto directories that the container can see when it's run. Note that "default" is an argument to the run.sh script, which gets run because of the ENTRYPOINT instruction in the Dockerfile.
The run.sh script finds the fastq files described in biobox.yaml and finds the appropriate command in Taskfile.
Output from the command in the Taskfile is put into /bbx/output, which is mounted from your local output_data directory, so you will be able to access it after the container exits. If you ran the minia image with the suggested reads, you should see a minia.contigs.fa file with 5 contigs.
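The directory setup and run command described above can be sketched as a small script. This is a dry run under stated assumptions: it works in a throwaway directory with empty placeholder files, and it only prints the docker command rather than executing it. The setup_biobox_dirs and print_run_command helpers are hypothetical names.

```shell
#!/bin/sh
# Sketch only: create the input/output layout inside a project directory
# and print (not run) the matching docker command.
setup_biobox_dirs() {
    dir=$1
    shift
    mkdir -p "$dir/input_data" "$dir/output_data"
    # Move biobox.yaml and each listed reads file into input_data.
    for f in biobox.yaml "$@"; do
        mv "$dir/$f" "$dir/input_data/"
    done
}

print_run_command() {
    printf 'sudo docker run --volume="%s/input_data:/bbx/input:ro"' "$1"
    printf ' --volume="%s/output_data:/bbx/output:rw" --rm %s default\n' "$1" "$2"
}

# Demonstration in a throwaway directory with empty placeholder files.
project=$(mktemp -d)
touch "$project/biobox.yaml" "$project/reads.fq.gz"
setup_biobox_dirs "$project" reads.fq.gz
print_run_command "$project" myassembler
```

Printing the command first is a cheap way to check that both host paths exist and point where you expect before starting a long assembly.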
- For testing, this command is useful since you can see what the container sees. It runs the docker image but opens an interactive bash shell ("-it") by overriding the ENTRYPOINT instruction in the Dockerfile:
sudo docker run \
  --volume=/home/ubuntu/input_data:/bbx/input:rw \
  --volume=/home/ubuntu/output_data:/bbx/output:rw \
  --rm \
  --entrypoint=/bin/bash -it \
  myassembler
- Add this to your Dockerfile to include any required scripts that your pipeline uses:
ADD fastqLengths.pl /usr/local/bin/fastqLengths.pl
- /usr/local/bin/ is already in $PATH, but you can add other directories:
ENV PATH /usr/local/megahit:$PATH
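The effect of the ENV PATH line can be tried out in an ordinary shell. In this sketch a throwaway directory stands in for /usr/local/megahit, and "mytool" is a hypothetical script standing in for whatever your pipeline calls. It is a shell analogue of the Dockerfile instruction, not something to paste into the Dockerfile.

```shell
#!/bin/sh
# Sketch only: shell analogue of "ENV PATH /usr/local/megahit:$PATH".
# A throwaway directory stands in for /usr/local/megahit.
tooldir=$(mktemp -d)

# Hypothetical helper script, standing in for a tool your pipeline calls.
cat > "$tooldir/mytool" <<'EOF'
#!/bin/sh
echo "mytool ran"
EOF
chmod +x "$tooldir/mytool"

# Prepend the directory so the shell can find the tool by name alone.
PATH="$tooldir:$PATH"
export PATH

command -v mytool
mytool
```

Because the new directory is placed first, a tool there takes precedence over a same-named tool elsewhere on the PATH.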
- Each RUN instruction in the Dockerfile runs in a new shell, so this command:
RUN cd Ray-2.3.1/ && make 'MAXKMERLENGTH=73' && make install
is different from:
RUN cd Ray-2.3.1/
RUN make 'MAXKMERLENGTH=73'
RUN make install
The first case is correct. The second case will try to run "make" in the starting directory (the root directory unless WORKDIR is set) and not in the expected Ray-2.3.1/
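The same behaviour can be reproduced outside docker. In the sketch below each "sh -c" call stands in for one RUN instruction, and pwd stands in for make. The directory is an empty placeholder created for the demonstration, so nothing is actually compiled.

```shell
#!/bin/sh
# Sketch only: each "sh -c" below stands in for one RUN instruction.
builddir=$(mktemp -d)
mkdir "$builddir/Ray-2.3.1"
cd "$builddir" || exit 1

# One shell: the cd is still in effect when pwd runs.
chained=$(sh -c 'cd Ray-2.3.1/ && pwd')

# Two shells: the cd is lost when the first shell exits.
sh -c 'cd Ray-2.3.1/'
separate=$(sh -c 'pwd')

echo "chained:  $chained"
echo "separate: $separate"
```

The chained form reports a path ending in Ray-2.3.1, while the separate form reports the parent directory, which is why the three-line Dockerfile version runs make in the wrong place.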
- The difference between RUN and CMD in the Dockerfile: RUN executes a command in a shell while the image is being built. CMD is used to specify default parameters for the command specified in ENTRYPOINT. For example:
ENTRYPOINT ["sudo", "-E", "/bin/bash", "/usr/local/bin/run"]
CMD ["default"]
CMD arguments can be overridden by arguments given at the end of the docker run command.
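To see how the two instructions combine, here is a small simulation in plain shell. It only mimics the rule described above for the ENTRYPOINT and CMD pair shown in the example. The effective_command helper and the "careful" task name are hypothetical.

```shell
#!/bin/sh
# Sketch only: mimics how Docker combines an exec-form ENTRYPOINT with CMD.
# Arguments passed after the image name replace CMD; otherwise CMD is used.
effective_command() {
    entrypoint='sudo -E /bin/bash /usr/local/bin/run'
    default_cmd='default'
    if [ "$#" -gt 0 ]; then
        echo "$entrypoint $*"
    else
        echo "$entrypoint $default_cmd"
    fi
}

effective_command            # like: docker run myassembler
effective_command careful    # like: docker run myassembler careful
```

The first call prints the entrypoint followed by "default"; the second prints it followed by "careful", which is how a different task in the Taskfile would be selected at run time.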