Using Conda environments with HTCondor
Many IGWN workflows are built around software provided by a Conda environment, which may include special shell environment variables or other considerations.
There is no first-class way to tell HTCondor that your job relies on a custom software environment (a Conda environment, or a Python virtualenv, or similar), so the environment that your job lands in on the Execute Point by default may not be exactly what you need it to be.
There are a few different ways to ensure that your job executes with the required software and in the required environment; which one you need depends on the requirements of your workflow.
Use conda run to execute your process
The most robust way to use Conda with HTCondor is to mimic the shell pattern:
conda activate <my-environment>
<exe> <arguments>
The Conda project provides an entry point called conda run which simulates this in a single command, e.g.
conda run --name <my-environment> --no-capture-output <exe> <arguments>
--no-capture-output
By default, conda run captures the output and error streams and emits them once the process completes. The --no-capture-output option tells conda run to stream the output and error streams directly to the console.
This is optional, but will mimic the interactive process output better.
If neither your job nor your monitoring of its progress relies on streamed stdout (e.g. because the scientific output is written to a file), this option can be safely removed.
Referencing conda environments by path, rather than name
If your Conda environment is in a non-standard path, you can reference it using the --prefix <path> argument, rather than --name <name>:
conda run --prefix </path/to/conda/env> <exe> <arguments>
The argument to --prefix should be the directory that contains the bin/ and lib/ directories.
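Before passing a directory to --prefix, it can help to confirm that it has the layout described above. The following is a minimal shell sketch, not part of Conda itself: the function name is_conda_prefix is hypothetical, and the check covers only the bin/ and lib/ directories mentioned here.

```shell
# Hypothetical helper: succeed if the given directory looks like an
# environment prefix, i.e. it contains both bin/ and lib/ subdirectories.
is_conda_prefix() {
    [ -d "$1/bin" ] && [ -d "$1/lib" ]
}

# Example usage (the path is a placeholder):
# is_conda_prefix /path/to/conda/env && conda run --prefix /path/to/conda/env python3 --version
```

A directory that fails this check is probably a parent directory (or a subdirectory) of the environment, rather than the environment prefix itself.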
For example, when using the IGWN Conda Distribution, from any currently-active environment:
$ python3 --version
Python 3.6.8
$ conda run --name igwn python3 --version
Python 3.10.9
(output may not exactly match)
You can configure HTCondor to use this pattern as follows:
executable = </path/to/conda>/bin/conda
arguments = run --name <env-name> --no-capture-output <exe> <arguments>
For example:
executable = /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/bin/conda
arguments = run --name igwn-py310-testing --no-capture-output python --version
This will result in the output Python 3.10.9, as in the example above.
A complete example compatible with the IGWN Grid would be as follows:
IGWN HTCondor job using conda run
executable = /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/bin/conda
arguments = run --name igwn --no-capture-output BayesWave --ifo H1 --H1-flow 32 --H1-cache LALSimAdLIGO --H1-channel LALSimAdLIGO --trigtime 900000000.00 --srate 512 --seglen 4 --PSDstart 900000000 --PSDlength 1024 --NCmin 2 --NCmax 2 --dataseed 1234 --outputDir results
log = example.log
error = example.err
output = example.out
transfer_output_files = results/
request_cpus = 1
request_disk = 1GB
request_memory = 16GB
accounting_group = ligo.dev.o4.burst.paramest.bayeswave
queue
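If you submit many jobs that follow this pattern, the submit description can be generated from a small shell function. This is only a sketch under stated assumptions: the function name conda_run_submit is hypothetical, the log file names are the placeholders used above, and no argument may contain whitespace or quote characters (so that the arguments line needs no extra quoting). Resource requests and accounting_group are left out and would need to be added as in the complete example.

```shell
# Hypothetical helper: print a submit description that runs a command
# through `conda run` in the named environment.
# Usage: conda_run_submit <env-name> <exe> [<arguments>...]
# Assumes no argument contains whitespace or quote characters.
conda_run_submit() {
    env_name=$1
    shift
    # "$*" joins the remaining arguments with single spaces
    cat <<EOF
executable = /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/bin/conda
arguments = run --name ${env_name} --no-capture-output $*
log = example.log
error = example.err
output = example.out
queue
EOF
}

# Print a submit description for the `python --version` example
conda_run_submit igwn python --version
```

The output can be redirected to a file and passed to condor_submit.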
Use the absolute path of the executable
In some cases it may be enough to just ensure that your executable is specified using its absolute path:
executable = /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn/bin/python
arguments = --version
transfer_executable = false
The absolute path will ensure that the correct installation of the executable is used (rather than relying on your shell searching through $PATH).
transfer_executable = false
Conda environments are built using RPATH links between shared object libraries. These links are almost always specified relative to $ORIGIN, i.e. the actual location of the executable itself. This means that the runtime location of the executable is critical to its ability to resolve dynamic links, so we must tell HTCondor not to attempt to copy the executable into the scratch directory for the job.
This is done via transfer_executable = false.
Please see the conda-build documentation for details on how conda-build links shared libraries or compiled executables.
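To see these links for a particular file, you can inspect its dynamic section. The sketch below assumes a Linux host with readelf (from GNU binutils) installed; the function name show_rpath is hypothetical.

```shell
# Hypothetical helper: print the RPATH/RUNPATH entries recorded in the
# dynamic section of an ELF file (an executable or a shared library).
# Assumes a Linux host with readelf (GNU binutils) available.
# Prints nothing if the file is not an ELF file or has no such entries.
show_rpath() {
    readelf -d "$1" 2>/dev/null | grep -E '\((RPATH|RUNPATH)\)'
}

# Example usage (the file name is a placeholder):
# show_rpath /path/to/conda/env/lib/libexample.so
```

Entries that begin with $ORIGIN are resolved relative to the location of the file itself, which is why copying the executable elsewhere can break library loading.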
This option is not required with the conda run pattern above because conda is a Python script that is relocatable (as long as the Python interpreter referred to in its shebang line is not relocated).
This does not set environment variables
Just using the absolute path of the executable will probably ensure that the right executable, Python modules, or shared libraries are used, but will not guarantee that shell environment variables are set.
If your process relies upon the shell environment variables (or other settings) that are configured during conda activate, please see the solution above, Use conda run to execute your process.
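To find out whether activation sets anything your process depends on, you can compare the environment seen with and without a wrapper command. This is a minimal sketch: the function name env_changes is hypothetical, and it relies only on the standard env, sort, comm, and mktemp utilities.

```shell
# Hypothetical helper: list the environment variables that are added or
# changed when `env` is run through the given wrapper command.
# Usage: env_changes <wrapper> [<wrapper-arguments>...]
env_changes() {
    before=$(mktemp)
    after=$(mktemp)
    env | sort > "$before"          # environment inherited from the current shell
    "$@" env | sort > "$after"      # environment seen under the wrapper
    comm -13 "$before" "$after"     # lines that appear only under the wrapper
    rm -f "$before" "$after"
}

# Example usage:
# env_changes conda run --name igwn
```

Any variable listed for conda run that your process needs will be missing when only the absolute path of the executable is used.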