First make sure you have a sufficiently modern Java (version 11 or higher). You can check with:
java -version
If your Java is too old you can try installing a newer one using sdkman:
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install java 11.0.2-open
sdk default java 11.0.2-open
After running these commands once, you will probably want to ensure that your new Java is always used by default. You can do this by adding the following line to your ~/.bash_profile:
source "$HOME/.sdkman/bin/sdkman-init.sh"
Once Java is installed you can install Nextflow. Start by following the instructions on the Nextflow website to create a nextflow executable in your working directory.
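At the time of writing the Nextflow website suggests a one-liner like the following (check the site in case the current command differs):
curl -s https://get.nextflow.io | bash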
When this is done you should move the nextflow executable into a sensible directory and make sure that directory is on your PATH. My suggestion is:
mkdir ~/bin
mv nextflow ~/bin
echo 'export PATH=${PATH}:${HOME}/bin' >> ~/.bash_profile
These commands make a new bin directory in your home directory, move nextflow into it, and then permanently add this bin directory to your PATH.
Test that everything is working by logging out, logging back in again, and typing nextflow. When you do this you should see the Nextflow help instructions, indicating that Nextflow ran successfully. You should now be able to run the nextflow command from anywhere on your system.
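If you prefer a quicker check that doesn't print the full help text, asking for the version should also work:
nextflow -version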
All of the software for the marine omics pipelines is captured within self-contained machine images. The first time you run a workflow it will need to download the appropriate machine image and store it somewhere on your system. These images are moderately large (~1-2 GB) so the download will take a few minutes. In addition to downloading, your system will need to translate the image from docker format into singularity format, which takes longer (sometimes around 10 minutes).
Because these images take so long to download and build, Nextflow will try to store them in a cache. You control the location of cached images with the NXF_SINGULARITY_CACHEDIR environment variable.
A special complication on JCU systems is that genomics1 and genomics2 (the MCB servers) suffer very poor IO performance when writing to your home directory. When working on these systems it is therefore important not to put the singularity cache in your home directory.
The instructions below provide solutions for various situations depending on which machines you intend to use. If you only intend to use zodiac (which does not suffer the home-directory IO problem) it is sufficient to keep the cache in your home directory:
mkdir -p "${HOME}/.nxf/singularity_cache"
echo 'export NXF_SINGULARITY_CACHEDIR=${HOME}/.nxf/singularity_cache' >> ~/.bash_profile
Use this option if you intend to use the marine omics pipelines on all machines (i.e. zodiac and genomics[12]).
First create the required singularity directories (do this while logged in to genomics1 or genomics2; you will need to do it on each machine separately).
Note that jcxxx stands for your JC number.
mkdir -p "${HOME}/.nxf/singularity_cache"
mkdir -p /fast/jcxxx/.nxf/singularity_cache
mkdir -p /fast/jcxxx/tmp
mkdir -p /fast/jcxxx/.singularity
Add the following lines to your ~/.bash_profile:
if [[ ! $HOSTNAME =~ genomics[12] ]]; then
    export NXF_SINGULARITY_CACHEDIR="${HOME}/.nxf/singularity_cache"
else
    export SINGULARITY_CACHEDIR="/fast/jcxxx/.singularity"
    export SINGULARITY_TMPDIR="/fast/jcxxx/tmp"
    export APPTAINER_CACHEDIR="/fast/jcxxx/.singularity"
    export APPTAINER_TMPDIR="/fast/jcxxx/tmp"
    export NXF_SINGULARITY_CACHEDIR="/fast/jcxxx/.nxf/singularity_cache"
fi
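After logging out and back in (or running source ~/.bash_profile) you can verify that the right cache location is in effect on each machine:
echo "$NXF_SINGULARITY_CACHEDIR"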
A good way to test things is to run one of the pipelines that come with built-in tests. To run a test of movp on genomics[12] try
nextflow run marine-omics/movp -latest -profile genomics,test -r main
Or if you want to test on zodiac try
nextflow run marine-omics/movp -latest -profile zodiac,test -r main
Although the default settings are designed to capture common use cases, you might sometimes find that you need to customise resource requests or set custom arguments for some workflow processes. These can be accommodated by creating a local.config file within the directory where you launch your nextflow job.
The example below shows the overall structure of the file. Enclose everything within a single process directive; individual tasks can then be addressed using their names.
Within each block you can set custom values for cpus, memory, queue and ext.args. The first three are probably self-explanatory. The last setting, ext.args, allows you to provide command-line arguments directly to the tool.
process {
    withName: 'gatk_mark_duplicates' {
        cpus = 12
        memory = 30.GB
    }
    withName: 'fastp' {
        ext.args = '--trim_front1 28 --poly_g_min_len 5'
    }
    withName: 'freebayes' {
        queue = 'long'
        ext.args = '-0 --use-duplicate-reads --genotype-qualities --strict-vcf'
        cpus = 64
    }
}
After creating this file you can tell nextflow to use these settings by providing -c local.config at the command line.
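For example, combining this with the test command from earlier (substitute your own pipeline and profile as appropriate):
nextflow run marine-omics/movp -latest -profile genomics,test -r main -c local.config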
One of the very nice features of Nextflow is that it integrates with a monitoring service called Tower. To set this up:
- Create a Tower account and sign in. An easy way to do this is to sign in with your GitHub profile by going to tower.nf
- Follow the instructions in the Tower documentation to create an access token
- Add your token permanently by adding the following lines to your .bash_profile (edit as appropriate for your specific token)
export TOWER_ACCESS_TOKEN=eyxxxxxxxxxxxxxxxQ1ZTE=
export NXF_VER=20.10.0
- Run your pipeline with the -with-tower option
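For example (again using the test command from earlier):
nextflow run marine-omics/movp -latest -profile genomics,test -r main -with-tower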
Now you can visit tower.nf and you should be able to see your running pipeline there.