Containerisation of software

Containers provide a means to install and run your analysis code in a self-contained environment with all required dependencies, somewhat like a virtual machine. Unlike a virtual machine, however, a container uses the host system's underlying operating system, reducing the disk size of the container image and improving performance.

We make use of two containerisation platforms:

Docker: the best-known and most widely integrated platform; it requires administrator privileges to use.
Singularity: a free, open-source platform which does not require administrator privileges and is better suited to high-performance and high-throughput computing projects. Singularity can run Docker images, but not vice versa.
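
As an illustration of Singularity running a Docker image directly, here is a minimal sketch. It assumes Singularity is installed on the host. The helper names `docker_uri` and `run_in_image` are hypothetical, and the image name in the usage comment is a placeholder rather than a real image.

```shell
# Hypothetical helper: prefix a registry path with the docker:// scheme
# that Singularity uses to fetch an image from a Docker registry.
docker_uri() {
    printf 'docker://%s\n' "$1"
}

# Hypothetical helper: run a command inside a Docker image via Singularity.
# Requires singularity on the host; the image is fetched and converted by
# Singularity before the command runs.
run_in_image() {
    image="$1"
    shift
    singularity exec "$(docker_uri "$image")" "$@"
}

# Example call (placeholder image name):
#   run_in_image containers.ligo.org/albert.einstein/my-awesome-image:sometag cat /etc/os-release
```

No administrator privileges are needed for the `singularity exec` call, which is why this route suits shared computing resources.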

Typically, an analysis pipeline uses continuous integration to build and push a Docker image to a container registry (such as Docker Hub and/or containers.ligo.org). Once an image is in a publicly accessible registry, it can be published to CVMFS, making it available to various high-throughput computing resources running Singularity.

Terminology: an "image" is the set of files which defines a container's environment; a "container" is a running instance of an image.

Creating Containers

Detailed legacy instructions for working with the containers.ligo.org container registry and Docker can be found on the computing wiki.

In short, developers should:

Create a Dockerfile: a recipe which installs all software dependencies and reproduces the complete runtime environment.
Use Docker to build and push the image created by the Dockerfile to containers.ligo.org. Due to security considerations, Docker is not available on shared login-hosts. You can skip installing Docker locally by using a git.ligo.org Continuous Integration environment. This automation will also improve the transparency and reproducibility of your work.
Ideally, integrate the image build into the project's CI pipeline so that appropriately tagged container images automatically stay in step with the software in the repository.
To make the container image available on high-throughput resources, publish it to CVMFS.
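
The first two steps above can be sketched as follows. This is a minimal illustration, not a recommended recipe: the base image, the package list and the image name are all assumptions, and `build_and_push` is a hypothetical helper which must be called on a machine (or CI runner) where Docker is available.

```shell
# Placeholder image name; substitute your own namespace, project and tag.
IMAGE="containers.ligo.org/albert.einstein/my-awesome-image:sometag"

# Write a minimal Dockerfile into a scratch build directory.
# The base image and the packages installed are illustrative assumptions.
BUILD_DIR="$(mktemp -d)"
cat > "${BUILD_DIR}/Dockerfile" <<'EOF'
FROM debian:stable-slim
# Install the runtime dependencies of the analysis code.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-numpy \
    && rm -rf /var/lib/apt/lists/*
# Create bind points for host directories used at IGWN Grid sites.
RUN mkdir -p /cvmfs /hdfs /gpfs /ceph /hadoop
EOF

# Hypothetical helper: authenticate, build the image and push it to the registry.
build_and_push() {
    docker login containers.ligo.org
    docker build -t "${IMAGE}" "${BUILD_DIR}"
    docker push "${IMAGE}"
}
```

Nothing is built until `build_and_push` is called, so the script can be inspected safely on a login host where Docker is unavailable.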

Deployment example (1/2)

Consider a real-world example where a science team wish to use IGWN grid resources to analyze a set of gravitational event simulations using two pre-release versions of LALSuite:

The current HEAD of the master branch.
A specific branch on a developer's fork of the same repository.

We require access to pre-release development code which is not yet deployed in the IGWN production environments in CVMFS. The solution is to build Docker container images for LALSuite master and for the development branch.

This repository contains Dockerfiles and a CI script to build images for an example of this use case.

Note that building Docker images can be time-consuming and computationally intensive. This CI script is therefore configured to run manually to minimise the load on the GitLab instance: an authorised person must trigger the Docker build through the GitLab user interface. Other mechanisms to control the workflow are available.
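
A manually triggered build job might look like the sketch below, which writes a minimal `.gitlab-ci.yml`. This is not the CI script from the example repository. The job name `build-image`, the `latest` tag and the use of a `docker:dind` service are assumptions, and the available runners may require a different setup. The `CI_REGISTRY*` variables are predefined by GitLab CI, and `when: manual` is the keyword which makes the job wait for a person to start it.

```shell
# Target file; defaults to the repository root. Set CI_FILE to write elsewhere.
# Note that this overwrites any existing file at that path.
CI_FILE="${CI_FILE:-.gitlab-ci.yml}"

# The quoted heredoc delimiter keeps the $CI_* variables unexpanded, so they
# are resolved by GitLab CI at job run time rather than by this shell.
cat > "${CI_FILE}" <<'EOF'
build-image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"
  when: manual
EOF
```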

The principles illustrated here also apply to cases where users require a dependency that cannot easily be met through the IGWN conda environments or via native package managers on IGWN-managed resources.

Publishing Singularity Images To CVMFS

Singularity bind points for IGWN frame data

If your jobs require access to directories on the host which do not already exist in the container image, those directories should be created in the Dockerfile.

For example, add the following to your Dockerfile to guarantee frame data access at IGWN Grid sites:

RUN mkdir -p /cvmfs /hdfs /gpfs /ceph /hadoop
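
One way to confirm that the bind points made it into the image is to run a small check inside the container. The sketch below is only an illustration: `missing_bind_points` is a hypothetical helper, and its directory list is copied from the `RUN` line above.

```shell
# Hypothetical helper: print each expected bind-point directory that is
# missing under the given root (default: /). No output means all are present.
missing_bind_points() {
    root="${1:-/}"
    for dir in cvmfs hdfs gpfs ceph hadoop; do
        [ -d "${root%/}/${dir}" ] || echo "/${dir}"
    done
}

# Example call, run from a shell inside the container:
#   missing_bind_points
```

The optional root argument exists only so that the check can be tried against a scratch directory before the image is built.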

Power Users

To publish a docker image to CVMFS:

Fork this repository
Add the registry URL for your image to docker_images.txt
Submit a pull request

NOTE: the interaction of the GitHub and GitLab APIs apparently does not allow wildcards (*) in image names. You must explicitly state which image tag(s) to publish.
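
Before submitting a pull request, an entry can be checked against the restriction in the note above. The sketch below is a local sanity check only, not part of the OSG tooling: `check_image_entry` is a hypothetical helper, and the `registry/path:tag` form it expects is inferred from the example entries in this guide.

```shell
# Hypothetical helper: accept only entries with no wildcard characters and
# an explicit tag on the final path component (for example name:tag).
check_image_entry() {
    entry="$1"
    name_and_tag="${entry##*/}"   # final path component of the entry
    case "$entry" in
        *\**) echo "wildcards are not allowed: $entry" >&2; return 1 ;;
    esac
    case "$name_and_tag" in
        ?*:?*) return 0 ;;
        *) echo "missing explicit tag: $entry" >&2; return 1 ;;
    esac
}

# Example call (placeholder image name):
#   check_image_entry containers.ligo.org/albert.einstein/my-awesome-image:sometag
```

The function returns a non-zero status and prints a reason to standard error when an entry is rejected, so it can gate a commit in a script.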

Everyone else

The Open Science Grid (OSG) consortium maintains a service which automatically converts publicly available Docker images to Singularity and publishes them to CVMFS. Taking advantage of this service requires only a one-line change to a text file in an OSG GitHub repository, submitted through a standard pull request.

Suppose you have a docker image in the LIGO gitlab registry you would like to access in CVMFS. Let's call that image:

containers.ligo.org/albert.einstein/my-awesome-image:sometag

Fork the OSG CVMFS singularity synchronisation repository (click "Fork")

Clone your fork:

git clone git@github.com:albert.einstein/cvmfs-singularity-sync.git

Set upstream and create a branch:

cd cvmfs-singularity-sync
git remote add upstream git@github.com:opensciencegrid/cvmfs-singularity-sync.git
git checkout -b my-awesome-image

Update docker_images.txt to include your image. You can see some examples from our community already:

$ grep -A 3 'LIGO - user defined images'  docker_images.txt
# LIGO - user defined images
containers.ligo.org/joshua.willis/pycbc:latest
containers.ligo.org/james-clark/gwrucio:latest
containers.ligo.org/albert.einstein/my-awesome-image:sometag

Add your image in this section of the file. Then commit and push your new branch to your fork:

$ git add docker_images.txt
$ git commit -m "Added Albert Einstein's awesome image to CVMFS publishing via the OSG"
$ git push origin my-awesome-image
Counting objects: 5, done.
Delta compression using up to 48 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 368 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
remote:
remote: Create a pull request for 'my-awesome-image' on GitHub by visiting:
remote:      https://github.com/albert.einstein/cvmfs-singularity-sync/pull/new/my-awesome-image
remote:
To git@github.com:albert.einstein/cvmfs-singularity-sync.git
 * [new branch]      my-awesome-image -> my-awesome-image

Follow the GitHub URL printed in the output above to open a pull request against the OSG GitHub repository and merge your changes upstream.

Deployment example (2/2)

The pull request to publish the images for the LALSuite master- / fork-branch example can be found HERE

Specifically, the URLs which were added to docker_images.txt were:

containers.ligo.org/james-clark/tgr_images/testing_gr_fta:latest
containers.ligo.org/james-clark/tgr_images/lalsuite-master:latest

If you are not sure what those URLs should be, explore the Packages & Registries > Container Registry options in the navbar in GitLab. The container registry information for the example can be found HERE.