
@philippbayer
Last active March 3, 2021 02:34
Running EDTA on Pawsey with Singularity

First, to download EDTA:

module load singularity
singularity pull EDTA.sif docker://quay.io/biocontainers/edta:1.9.4--0

That'll make a new file called EDTA.sif containing everything in the EDTA v1.9.4 container.

Then we have a problem: Pawsey allows only 1 million files per user and running several EDTA runs for several genomes at once will hit that limit.

So we make an overlay fs via https://pawseysc.github.io/sc19-containers/04-writable-containers/index.html

I use a 40GB overlay FS since that should usually be enough (testing with B. oleracea and Amborella data ran out of space with 20GB).

singularity exec docker://library/ubuntu:18.04 bash -c " \
  mkdir -p overlay_tmp/upper overlay_tmp/work && \
  dd if=/dev/zero of=my_overlay count=40000 bs=1M && \
  mkfs.ext3 -d overlay_tmp my_overlay && \
  rm -rf overlay_tmp \
  "

This will make a 40GB file called my_overlay which contains that filesystem. (Sidenote: normally you could use EDTA.sif to run this, but for some reason that image doesn't contain the mkfs.ext3 program, so I use the standard Ubuntu 18.04 image instead. The first time you run this command it will download the image to your Singularity folder; after that it will reuse the downloaded image.)

For two species I needed to increase count from 40000 (40GB) to 60000 (60GB).

You can use regular Linux tools for that if your fs already exists:

e2fsck -f my_overlay && \
  resize2fs my_overlay 60G
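
If you end up creating overlays of different sizes from scratch, the dd/mkfs.ext3 command from above can be wrapped in a small function. This is only a sketch: overlay_count and make_overlay are made-up helper names, and the size is converted with the same "1000 blocks of 1M per GB" convention used above.

```shell
#!/usr/bin/env bash
# Sketch only: overlay_count and make_overlay are hypothetical helpers,
# not part of Singularity or EDTA.

# Number of 1M blocks to hand to dd for a size given in GB
# (same convention as above: 40 GB -> count=40000).
overlay_count() {
  local size_gb=$1
  echo $(( size_gb * 1000 ))
}

# Create an empty ext3 overlay file <name> of <size_gb> GB.
make_overlay() {
  local name=$1 size_gb=$2 count tmp
  count=$(overlay_count "$size_gb")
  tmp="${name}_tmp"   # per-overlay scratch directory
  singularity exec docker://library/ubuntu:18.04 bash -c "
    mkdir -p $tmp/upper $tmp/work &&
    dd if=/dev/zero of=$name count=$count bs=1M &&
    mkfs.ext3 -d $tmp $name &&
    rm -rf $tmp
  "
}

# Usage (commented out because it needs Singularity):
# make_overlay my_overlay 60
```

The function only parameterises the command shown earlier; overlay names containing spaces are not handled.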

Important note: newer versions of Singularity (>3.6) treat the overlay system slightly differently, and you have to make specific folders (see https://sylabs.io/guides/3.6/user-guide/persistent_overlays.html ). Even newer versions seem to have a singularity overlay create command which does it all for you.

Let's copy the reference into that overlay FS. By default, EDTA seems to write the output into the current working directory, and it's not so easy to cd around within singularity.

singularity exec --overlay `pwd`/my_overlay  EDTA.sif mkdir  /files
singularity exec --overlay `pwd`/my_overlay  EDTA.sif cp reference.fa  /files

Then, to run EDTA:

export LC_ALL=C # to get rid of an annoying warning message
singularity exec --overlay `pwd`/my_overlay EDTA.sif \
  bash -c 'cd /files && EDTA.pl --genome reference.fa' > STDOUT 2> STDERR

This will run EDTA within the overlay FS. The bash -c 'cd' trick is needed to change the working directory. If you don't do that, EDTA will write its output outside the overlay FS.
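
The results stay inside my_overlay, so once EDTA has finished they have to be copied out again. A minimal sketch, assuming the final TE library is written as <genome>.mod.EDTA.TElib.fa in the working directory (check the listing of /files for your EDTA version); te_lib_path and fetch_te_lib are made-up helper names.

```shell
#!/usr/bin/env bash
# Sketch only: te_lib_path and fetch_te_lib are hypothetical helpers.
# The output file name is an assumption about EDTA's naming scheme.

# Path of the final TE library inside the overlay for a given genome file.
te_lib_path() {
  echo "/files/$(basename "$1").mod.EDTA.TElib.fa"
}

# Copy the TE library from the overlay into the current host directory.
fetch_te_lib() {
  local genome=$1 overlay=${2:-my_overlay}
  singularity exec --overlay "$(pwd)/$overlay" EDTA.sif \
    cp "$(te_lib_path "$genome")" "$(pwd)/"
}

# Usage (commented out because it needs Singularity and a finished run):
# fetch_te_lib reference.fa
```

Like the cp of reference.fa into the overlay, this relies on Singularity making the current directory available inside the container.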

A remaining problem is that the overlay FS is completely locked by Singularity, so even running ls while EDTA is doing its thing will fail with an error:

singularity exec --overlay `pwd`/my_overlay  EDTA.sif bash -c 'cd /files && ls'
FATAL:   error while locking ext3 overlay partition from my_overlay: can't open my_overlay for writing, currently in use by another process

That also means that when you want to run many concurrent EDTA runs you'll have to make one overlay FS for each.
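
One way to get an overlay per run is to keep a fresh, unused overlay as a template and copy it for every genome. The following is a sketch under that assumption; overlay_name and run_edta are made-up helper names, and the genome file names in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: overlay_name and run_edta are hypothetical helpers.
# Assumes my_overlay is an empty overlay that no process is using.

# Derive an overlay file name from a genome file name,
# e.g. brassica.fa -> overlay_brassica
overlay_name() {
  local base
  base=$(basename "$1")
  echo "overlay_${base%.*}"
}

# Copy the template overlay, load the genome into it, and run EDTA there.
run_edta() {
  local genome=$1 template=${2:-my_overlay}
  local name overlay
  name=$(overlay_name "$genome")
  overlay="$(pwd)/$name"
  cp "$template" "$overlay"
  singularity exec --overlay "$overlay" EDTA.sif mkdir -p /files
  singularity exec --overlay "$overlay" EDTA.sif cp "$genome" /files
  singularity exec --overlay "$overlay" EDTA.sif \
    bash -c "cd /files && EDTA.pl --genome $(basename "$genome")" \
    > "$name.stdout" 2> "$name.stderr"
}

# Usage (commented out because it needs Singularity and real genomes):
# for g in genome_a.fa genome_b.fa; do run_edta "$g" & done; wait
```

On a cluster each run_edta call would more likely go into its own batch job than into a background process, but the one-overlay-per-run idea stays the same.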

BONUS

Sometimes the EDTA GitHub repository has a bugfix that the Singularity image doesn't have yet. You can run the EDTA from GitHub via the Singularity image too:

git clone git@github.com:oushujun/EDTA.git
 
export LC_ALL=C # to get rid of an annoying warning message
# double quotes so that `pwd` expands on the host, where EDTA was cloned
singularity exec --overlay `pwd`/my_overlay EDTA.sif \
  bash -c "cd /files && `pwd`/EDTA/EDTA.pl --genome reference.fa" > STDOUT 2> STDERR

That way you get all the dependencies from the container, but the main EDTA script is the newest version.
