Accessing software with HTCondor¶
Because IGWN grid jobs can run at any number of computing centres, including opportunistic resources not operated by IGWN members, job configurations cannot assume that a job will be matched with a resource that already has a local installation of the required software or its dependencies. Such software must be available from a globally accessible repository or sent with the job using HTCondor's file transfer mechanism.
Summary of use cases¶
The following table lists some typical use cases and suggested software deployment solutions:
|Use case|Suggested deployment|
|---|---|
|Production-level analysis code|IGWN conda environments (CVMFS) or CVMFS-deployed Singularity container|
|Developmental / pre-release code|User-supplied Singularity container (CVMFS)|
|Statically compiled binary / shell script|Transfer executable with job|
Conda environments in CVMFS¶
For instructions on how to use executables from a Conda environment in an HTCondor workflow, please see Conda environments.
Singularity container images in CVMFS¶
A container is a unit of software that packages code and its dependencies to provide a complete and extremely portable run-time environment. Singularity is one of several container technologies, and is particularly well suited to shared distributed computing environments.
A complete recipe for building and publishing containers to CVMFS can be found in Containerisation of software.
Once your Singularity container has been published to CVMFS:
configure your job submit file to use the path to the executable inside the container:
executable = /usr/bin/mything
specify the path to the container image in CVMFS:
MY.SingularityImage = "/cvmfs/singularity.opensciencegrid.org/<org>/<image>:<tag>"
ensure that your job matches a host with the ability to run Singularity by specifying the following requirements expression:
requirements = HAS_SINGULARITY=?=True
HTCondor job using a Singularity container in CVMFS
executable = /usr/bin/BayesWave
transfer_executable = False
MY.SingularityImage = "/cvmfs/singularity.opensciencegrid.org/lscsoft/bayeswave:v1.0.6"
requirements = (HAS_SINGULARITY=?=True)
log = example.log
error = example.err
output = example.out
request_disk = 1GB
request_memory = 16GB
accounting_group = ligo.dev.o5.cbc.pe.bayeswave
queue
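The three steps above can be checked mechanically before a job is submitted. The following is a minimal sketch, not part of HTCondor or any IGWN tool: `parse_submit` and `check_singularity_submit` are hypothetical helper names, and the parser assumes simple one-per-line `key = value` commands with no macros or multi-line values.

```python
def parse_submit(text):
    """Parse simple `key = value` lines from a submit description.

    Assumption: one command per line; lines starting with '#' are comments.
    Keys are lower-cased so that matching ignores case.
    """
    commands = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        commands[key.strip().lower()] = value.strip()
    return commands


def check_singularity_submit(text):
    """Return a list of problems; an empty list means all three steps are present."""
    commands = parse_submit(text)
    problems = []
    if "executable" not in commands:
        problems.append("no executable set")
    image = commands.get("my.singularityimage", "").strip('"')
    if not image:
        problems.append("MY.SingularityImage is not set")
    elif not image.startswith("/cvmfs/"):
        problems.append("MY.SingularityImage does not point to CVMFS")
    if "has_singularity" not in commands.get("requirements", "").lower():
        problems.append("requirements does not mention HAS_SINGULARITY")
    return problems


example = """
executable = /usr/bin/BayesWave
transfer_executable = False
MY.SingularityImage = "/cvmfs/singularity.opensciencegrid.org/lscsoft/bayeswave:v1.0.6"
requirements = (HAS_SINGULARITY=?=True)
queue
"""
print(check_singularity_submit(example))
```

An empty list means the submit description sets an executable, points at an image under /cvmfs/, and requires HAS_SINGULARITY. A check like this does not confirm that the image or the executable actually exists.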
Transferring Singularity images using HTCondor file transfer¶
Work in progress
This section is a work in progress; please consider contributing to complete it if you can.
Singularity images via Dockerhub¶
It is possible, but not generally recommended, to specify a Docker URI for the Singularity image. For example:
MY.SingularityImage = "docker://igwn/software"
In this case, HTCondor will invoke a singularity pull command prior to starting your job, which will download the Docker image and convert it to a Singularity image on the fly.
Limit use of Docker URIs for MY.SingularityImage
Setting MY.SingularityImage = "docker://some/image" means every job in your workflow will attempt to download and convert that container image. This can quickly induce enormous loads on local worker nodes and overwhelm the network bandwidth at the execute point. Please only use this option for very small-scale development tests.
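A rough estimate shows why this scales badly. The sketch below simply multiplies the number of jobs by the image size; the job count and image size are made-up illustrative numbers, not measurements of any IGWN workflow, and the function name is hypothetical.

```python
def total_download_gb(n_jobs, image_size_gb):
    """Total data pulled if every job downloads the image itself (no shared cache)."""
    return n_jobs * image_size_gb


# Hypothetical workflow: 5000 jobs, each pulling a 2 GB image
print(total_download_gb(5000, 2.0), "GB")
```

With these assumed numbers the workflow would pull about 10 TB in total, before counting the time each job spends converting the image.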
If your software and all of its dependencies can be installed without administrator privileges and are relatively platform-independent, a variation on transferring the executable with the job is to use a script that installs the software on the fly at the start of job execution (but be aware that this will add to your jobs' run time). In all cases, be mindful of the bandwidth and disk space requirements of potentially large statically compiled binaries and on-the-fly installations: software and data distributed through CVMFS use smart caching mechanisms to make efficient use of bandwidth; installing packages as part of your job generally does not.
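As an illustration of an on-the-fly installation, here is a minimal sketch of a job wrapper for pip-installable Python software. It assumes that Python and pip are available on the execute host and that the host can reach a package index; `install_command` and `run_with_install` are hypothetical names. `_CONDOR_SCRATCH_DIR` is the environment variable HTCondor sets to the job's scratch directory.

```python
import os
import subprocess
import sys


def install_command(target, packages):
    """Build a pip command that installs `packages` into `target` (no admin rights needed)."""
    return [sys.executable, "-m", "pip", "install", "--quiet",
            "--target", target, *packages]


def run_with_install(packages, payload_args):
    """Install `packages` into the job scratch directory, then run the payload."""
    # HTCondor sets _CONDOR_SCRATCH_DIR in the job environment; fall back to the cwd.
    scratch = os.environ.get("_CONDOR_SCRATCH_DIR", os.getcwd())
    target = os.path.join(scratch, "site-packages")
    subprocess.run(install_command(target, packages), check=True)

    # Make the freshly installed packages importable by the payload.
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in (target, env.get("PYTHONPATH")) if p)
    return subprocess.run([sys.executable, *payload_args], env=env).returncode
```

A wrapper script used as the job's executable could end with `sys.exit(run_with_install(["example-package"], sys.argv[1:]))`, where `example-package` is a placeholder. Every job repeats the download, which is the bandwidth cost described above.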
To use this method of software distribution, configure the job submit file with the path to your executable and tell HTCondor to transfer the executable with the job:
HTCondor submit file with executable transfer enabled
Specify the path to the executable and enable file transfer:
executable = /home/albert.einstein/hello_world.sh
transfer_executable = true
log = example.log
error = example.err
output = example.out
request_disk = 1GB
request_memory = 1GB
accounting_group = ligo.dev.o5.compsoft.hello.world
queue
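Before submitting, it can be worth confirming that the file named in executable exists and is not unexpectedly large, since it is sent with the job. This is a minimal sketch only: `check_executable` is a hypothetical helper, and the 100 MB threshold is an arbitrary illustrative value, not an HTCondor or IGWN limit.

```python
import os
import tempfile


def check_executable(path, max_size_mb=100):
    """Return a list of warnings about a file that will be transferred with the job."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    warnings = []
    size_mb = os.path.getsize(path) / 1e6
    if size_mb > max_size_mb:
        warnings.append(f"{path} is {size_mb:.0f} MB; consider distributing it via CVMFS")
    return warnings


# Usage example with a small temporary script standing in for the real executable
with tempfile.NamedTemporaryFile(suffix=".sh") as handle:
    handle.write(b"#!/bin/sh\necho hello\n")
    handle.flush()
    print(check_executable(handle.name))
```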