Containerisation of software
Containers provide a means to install and run your analysis code in a self-contained environment with all required dependencies, somewhat like a virtual machine. Unlike a virtual machine, however, a container shares the host system's operating system kernel rather than running its own, which reduces the disk size of the container image and improves performance.
We make use of two containerization platforms:
- docker: the best-known and most widely integrated platform; it requires administrator privileges to use.
- singularity: a free, open-source platform which does not require administrator privileges and is better suited to high-performance and high-throughput computing projects. Singularity can run docker images, but not vice versa.
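As a minimal sketch of running a docker image with singularity, the snippet below uses the `docker://` URI scheme with `singularity exec`. The image name is the hypothetical example used elsewhere on this page, not a real image, and the command run inside the container is only illustrative.

```shell
# Hypothetical image reference; replace it with an image you can access.
IMAGE="docker://containers.ligo.org/albert.einstein/my-awesome-image:sometag"

if command -v singularity >/dev/null 2>&1; then
    # Pull the docker image, convert it, and run one command inside it.
    singularity exec "$IMAGE" cat /etc/os-release
else
    echo "singularity not found; would run: singularity exec $IMAGE cat /etc/os-release"
fi
```

No administrator privileges are needed for this; the image is converted and cached under the user's own account.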
Typically an analysis pipeline will use continuous integration to build and push a docker image to a container registry (like dockerhub and/or containers.ligo.org). Once an image is in a publicly-accessible registry it can be published to CVMFS, making it available to various high-throughput computing resources running singularity.
Creating A Docker Image
Detailed instructions for working with the containers.ligo.org container registry can be found on the computing wiki.
In short, developers are required to:
- Host their software in the gitlab repository at git.ligo.org
- Create a recipe, in the form of a Dockerfile, which installs all software dependencies and reproduces the desired runtime environment
- Build and push the image created by the Dockerfile to the container registry
- Ideally: integrate the image build into the project's CI pipeline, so that appropriately tagged container images automatically stay in step with the software in the repository.
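The recipe, build and push steps above might look roughly like the following sketch. It assumes a pip-installable Python project in the current directory; the base image, the install step, and the fallback image name and tag are illustrative only. `CI_REGISTRY_IMAGE` and `CI_COMMIT_TAG` are variables that GitLab CI predefines, while `PUSH_IMAGE` is a hypothetical switch introduced here so the script defaults to a dry run.

```shell
# Image name and tag: inside GitLab CI these predefined variables are set
# automatically; the fallbacks are the hypothetical values used on this page.
IMAGE="${CI_REGISTRY_IMAGE:-containers.ligo.org/albert.einstein/my-awesome-image}"
TAG="${CI_COMMIT_TAG:-sometag}"

# Write a minimal recipe (assumes a pip-installable Python project here;
# adapt the base image and install steps to your own software).
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
COPY . /src
RUN pip install --no-cache-dir /src
EOF

# Build and push only when explicitly requested, since pushing needs docker
# and a prior `docker login containers.ligo.org`.
if [ "${PUSH_IMAGE:-no}" = "yes" ]; then
    docker build -t "$IMAGE:$TAG" .
    docker push "$IMAGE:$TAG"
else
    echo "dry run: docker build -t $IMAGE:$TAG . && docker push $IMAGE:$TAG"
fi
```

Running the same commands from a CI job on tagged commits is one way to keep image tags matched to software releases.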
To make a container image available on high-throughput resources, it should be published to CVMFS.
Publishing Singularity Images To CVMFS
To publish a docker image to CVMFS:
NOTE: the interaction of the github and gitlab APIs apparently does not allow wildcards (*) in image names. You must explicitly state which image tag(s) to publish.
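A small pre-flight check along the lines of the note above can catch a wildcard or a missing tag before you open a pull request. This is only a sketch: `is_publishable_ref` is a hypothetical helper, not part of any OSG tooling, and it checks nothing beyond the shape of the string.

```shell
# Return 0 if the image reference names an explicit tag and has no wildcard.
is_publishable_ref() {
    case "$1" in
        *'*'*)  return 1 ;;   # contains a literal '*': wildcards are rejected
        */*:?*) return 0 ;;   # registry/path:tag with a non-empty tag
        *)      return 1 ;;   # no tag given after the image path
    esac
}

# Example with the hypothetical image name used on this page.
if is_publishable_ref 'containers.ligo.org/albert.einstein/my-awesome-image:sometag'; then
    echo "ok: explicit tag, no wildcard"
fi
```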
The Open Science Grid (OSG) consortium maintains a service which automatically converts publicly-available docker images to singularity and publishes them to CVMFS. Taking advantage of this service is as simple as a one-line change to a text file in an OSG github repository, submitted through a standard pull request.
Suppose you have a docker image in the LIGO gitlab registry you would like to access in CVMFS. Let's call that image:
Fork the OSG CVMFS singularity synchronisation repository (click "Fork")
Clone your fork:
git clone git@github.com:astroclark/cvmfs-singularity-sync.git
Set upstream and create a branch:
cd cvmfs-singularity-sync
git remote add upstream git@github.com:opensciencegrid/cvmfs-singularity-sync.git
git checkout -b my-awesome-image
Edit docker_images.txt to include your image. You can see some examples from our community already:
$ grep -A 3 'LIGO - user defined images' docker_images.txt
# LIGO - user defined images
containers.ligo.org/joshua.willis/pycbc:latest
containers.ligo.org/james-clark/gwrucio:latest
containers.ligo.org/albert.einstein/my-awesome-image:sometag
Add your image around here. Commit and push your new branch to your fork:
$ git add docker_images.txt
$ git commit -m "Added Albert Einstein's awesome image to CVMFS publishing via the OSG"
$ git push origin my-awesome-image
Counting objects: 5, done.
Delta compression using up to 48 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 368 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
remote:
remote: Create a pull request for 'my-awesome-image' on GitHub by visiting:
remote:      https://github.com/albert.einstein/cvmfs-singularity-sync/pull/new/my-awesome-image
remote:
To git@github.com:albert.einstein/cvmfs-singularity-sync.git
 * [new branch]      my-awesome-image -> my-awesome-image
Follow the github URL to open a pull request on the OSG github repository to merge your changes upstream.
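The edit to docker_images.txt described above can also be scripted. The sketch below inserts an image line directly under the "# LIGO - user defined images" header shown in the grep output, unless that exact line is already present. `add_ligo_image` is a hypothetical helper, and it assumes the header appears in the file exactly as shown; review the resulting diff before committing.

```shell
# Insert IMAGE directly below the LIGO section header of FILE, unless the
# exact line is already listed. Usage: add_ligo_image FILE IMAGE
add_ligo_image() {
    file=$1
    image=$2
    header='# LIGO - user defined images'

    # Nothing to do if the image is already listed verbatim.
    if grep -qxF -- "$image" "$file"; then
        return 0
    fi

    # Refuse to edit a file that lacks the expected section header.
    grep -qxF -- "$header" "$file" || return 1

    tmp=$(mktemp) || return 1
    # Copy every line through, adding the image right after the header.
    awk -v img="$image" -v hdr="$header" \
        '{ print } $0 == hdr { print img }' "$file" > "$tmp" || return 1
    # Write back in place so the file keeps its original permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `add_ligo_image docker_images.txt containers.ligo.org/albert.einstein/my-awesome-image:sometag` followed by `git diff` should show a one-line addition.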