Fermi Containerization Notes

@brianv0, July 21, 2017

End of week report

Joris and I have been working on a plan for containers; here are our progress and findings so far.

Core technologies to adopt

We believe we will want a few flavors of container images, and that we will need to leverage both Singularity and Docker.

We believe the adoption of CVMFS will be both necessary and viable. At SLAC, we know that there is a 100GB caching proxy set up for CVMFS locally, and that batch nodes have a 25GB local cache. This means that container images and libraries distributed through CVMFS will likely already be in the CVMFS cache at SLAC during production data processing. We believe this will mitigate scaling issues when, for example, hundreds of L1 jobs start up.
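As a rough illustration of what we might check on a batch node, something like the following; the repository name fermi.example.org is a placeholder, since the real repository path is still to be decided:

    # Verify the (hypothetical) repository is mounted and reachable
    cvmfs_config probe fermi.example.org

    # Inspect the local cache quota; CVMFS_QUOTA_LIMIT is in MB, so a
    # 25GB node cache should report roughly 25000 here
    cvmfs_config showconfig fermi.example.org | grep CVMFS_QUOTA_LIMIT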

Container flavors

  • Uber-container. The uber-container will be a full collection of all libraries rolled up into one large image. This will be useful for testing and for simplified distribution of images, most likely for small-scale jobs (n < 50), but it could be used for production provided that container distribution/caching is addressed. We don't expect, for example, L1 to use these containers. For this flavor, it is probably most important that the image be available for Docker (see the invocation sketch after this list).

  • Lightweight containers + software mount-in. These containers are much smaller, consisting of a centos6 base image and a few OS base packages which aren't provided by GLAST_EXT; exactly what goes into this container will probably be an evolving discussion. We will mount external folders into the container (bind mounts in Singularity, volumes in Docker), for example GLAST_EXT and GlastRelease. For distribution of those libraries, we are planning on leveraging CVMFS at SLAC and abroad. For distribution of the containers themselves, we plan to leverage CVMFS as well, potentially with a fallback to an image served over the internet (HTTP via nginx, or singularity-hub.org). It is most important that this image be available for Singularity.
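To make the two flavors concrete, invocation might look roughly like the following; the image names, repository paths, and the runGlastJob.sh script are placeholders, not decisions:

    # Uber-container: everything baked into one image, run via Docker
    docker run --rm fermi/uber:latest runGlastJob.sh

    # Lightweight container: centos6 base image, with GLAST_EXT and
    # GlastRelease bind-mounted in from CVMFS (-B maps host:container)
    singularity exec \
        -B /cvmfs/fermi.example.org/glast_ext:/glast_ext \
        -B /cvmfs/fermi.example.org/glastrelease:/glastrelease \
        fermi-light.img runGlastJob.sh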

Known and potential issues

  • By default, Singularity does not appear to allow users to bind-mount arbitrary folders at execution time. This means the mount points a user might want, such as /afs/slac/.../ground/software mapped to /software inside the container, must already exist in the image at build time. We will be exploring this further. One workaround is to always create a few well-known filesystem roots in the container at build time in case they are needed, e.g. /nfs, /u, /sps, /afs, /gpfs, etc. (see the definition-file sketch after this list). There may also be issues with the automounter when running batch jobs at SLAC; we are not yet clear on what those might be. We plan to work with SLAC IT to determine whether a configuration issue is involved, but we also need to know whether this issue will exist for jobs run at IN2P3 and/or on the grid, where we may not be able to control the Singularity configuration as easily.

  • Our containers will likely need several configuration files mounted in or added at container build time as well, especially for L1. These might include database connection information (e.g. an Oracle wallet) and Xrootd client configuration, as sketched below. This may be covered by the previous issue, but we haven't gotten there yet.
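A minimal sketch of the build-time workaround as a Singularity definition file; the package list and all paths are illustrative assumptions:

    Bootstrap: docker
    From: centos:6

    %post
        # OS base packages not provided by GLAST_EXT (illustrative list)
        yum -y install which tar gzip

        # Pre-create well-known filesystem roots so they can serve as
        # bind-mount targets at execution time
        mkdir -p /nfs /u /sps /afs /gpfs /software

With the mount points in place, site-specific configuration could then be bound in at run time:

    # Bind in an Oracle wallet and an Xrootd client configuration
    # (host paths are placeholders)
    singularity exec \
        -B /path/to/oracle_wallet:/software/oracle_wallet \
        -B /path/to/xrootd_client.cfg:/software/xrootd_client.cfg \
        fermi-light.img runGlastJob.sh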

Next steps

  • We need to get the libraries into CVMFS.

  • We plan to investigate the best plan of attack for integrating containers into the pipeline. We are not yet sure of the best way to run a job in the pipeline, but we believe all batch jobs will be submitted with the container executing the pipeline wrapper. We anticipate this may cause some issues with job start/stop processing, especially with regard to relying on the host's email daemon. Another option is having the wrapper run the command inside the container instead, but that could lead to other issues, so we will need to understand those as well. Both patterns are sketched after this list.

  • We need SLAC IT to upgrade Singularity on the RHEL7 machine. CVMFS is slated to be installed there but is not yet available.
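The two integration patterns under discussion would look roughly like this for an LSF submission; the bsub usage, wrapper name, and image are assumptions for illustration:

    # Option A: the batch job launches the container, and the pipeline
    # wrapper runs entirely inside it (loses the host's email daemon)
    bsub singularity exec fermi-light.img pipeline-wrapper.sh myTask

    # Option B: the wrapper runs on the host and only the task command
    # runs inside the container
    bsub pipeline-wrapper.sh "singularity exec fermi-light.img myTask"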
