Using Conda environments with HTCondor¶

Many IGWN workflows are built around software provided by a Conda environment, which may include special shell environment variables or other considerations.

There is no first-class way to tell HTCondor that your job relies on a custom software environment (a Conda environment, or a Python virtualenv, or similar), so the environment that your job lands in on the Execute Point by default may not be exactly what you need it to be.

There are a few different ways to ensure that your job executes with the required software and in the required environment; which one you need will depend on the requirements of your workflow.

Use `conda run` to execute your process¶

The most robust way to use Conda with HTCondor is to mimic the shell pattern:

conda activate <my-environment>
<exe> <arguments>

The Conda project provides an entry point called conda run, which simulates this in a single command, e.g.

conda run --name <my-environment> --no-capture-output <exe> <arguments>

--no-capture-output

By default, conda run captures the standard output and error streams of the process and emits them only once the process completes. The --no-capture-output option tells conda run to pass both streams directly to the console as they are produced.

This is optional, but will mimic the interactive process output better.

If neither your job nor your monitoring of its progress relies on streaming stdout (e.g. the scientific output is written to a file), this option can be safely removed.

Referencing conda environments by path, rather than name

If your Conda environment is in a non-standard path, you can reference it using the --prefix <path> argument, rather than --name <name>:

conda run --prefix </path/to/conda/env> <exe> <arguments>

The argument to --prefix should be the directory that contains the bin/ and lib/ directories.
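As a minimal sketch of the rule above, the following Python helper checks that a directory contains bin/ and lib/ before building a conda run --prefix command. The function names are hypothetical, chosen for illustration only, and the check assumes a Linux-style environment layout.

```python
from pathlib import Path


def looks_like_env_prefix(prefix):
    """Return True if `prefix` contains both bin/ and lib/ directories."""
    root = Path(prefix)
    return (root / "bin").is_dir() and (root / "lib").is_dir()


def conda_run_by_prefix(prefix, command):
    """Build the argument list for `conda run --prefix <prefix> <command...>`.

    `command` is a list such as ["python", "--version"].
    """
    if not looks_like_env_prefix(prefix):
        raise ValueError(f"{prefix} does not contain bin/ and lib/")
    return ["conda", "run", "--prefix", str(prefix), *command]
```

The returned list could be passed to subprocess.run on a machine where conda is on $PATH; the helper itself only inspects the filesystem and does not call conda.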

For example, when using the IGWN Conda Distribution, from any currently-active environment:

Example usage of conda run

$ python3 --version
Python 3.6.8
$ conda run --name igwn python3 --version
Python 3.10.9

(output may not exactly match)

You can configure HTCondor to use this pattern as follows:

executable = </path/to/conda>/bin/conda
arguments = run --name <env-name> --no-capture-output <exe> <arguments>

For example:

HTCondor Submit instructions to support conda run

executable = /cvmfs/software.igwn.org/conda/bin/conda
arguments = run --name igwn-py310-testing --no-capture-output python --version

This will result in the output Python 3.10.9 as in the example above.
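If you generate submit files from a script, the two lines above can be assembled programmatically. The sketch below is one possible approach in plain Python; the function name conda_run_submit_lines is hypothetical and not part of HTCondor or Conda. Because HTCondor's argument quoting rules differ from the shell's, this sketch simply refuses any argument that would need quoting rather than attempting to quote it.

```python
def conda_run_submit_lines(conda_exe, env_name, command, capture_output=False):
    """Return the `executable` and `arguments` submit lines for `conda run`."""
    for word in command:
        # Keep the sketch simple: reject empty words and anything containing
        # whitespace or quote characters, which would need HTCondor quoting.
        if not word or any(ch.isspace() or ch in "'\"" for ch in word):
            raise ValueError(f"argument needs HTCondor quoting: {word!r}")
    words = ["run", "--name", env_name]
    if not capture_output:
        words.append("--no-capture-output")
    words.extend(command)
    return [f"executable = {conda_exe}", f"arguments = {' '.join(words)}"]


for line in conda_run_submit_lines(
    "/cvmfs/software.igwn.org/conda/bin/conda",
    "igwn-py310-testing",
    ["python", "--version"],
):
    print(line)
```

Run as written, this prints the same executable and arguments lines as the submit instructions shown above.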

A complete example compatible with the IGWN Grid would be as follows:

IGWN HTCondor job using conda run

executable = /cvmfs/software.igwn.org/conda/bin/conda
arguments = run --name igwn --no-capture-output BayesWave --ifo H1 --H1-flow 32 --H1-cache LALSimAdLIGO --H1-channel LALSimAdLIGO --trigtime 900000000.00 --srate 512 --seglen 4 --PSDstart 900000000 --PSDlength 1024 --NCmin 2 --NCmax 2 --dataseed 1234 --outputDir results

log = example.log
error = example.err
output = example.out

transfer_output_files = results/

request_cpus = 1
request_disk = 1GB
request_memory = 16GB

accounting_group = ligo.dev.o4.burst.paramest.bayeswave

queue

Use the absolute path of the executable¶

In some cases it may be enough to just ensure that your executable is specified using its absolute path:

Specifying executables in Conda environments with absolute paths

executable = /cvmfs/software.igwn.org/conda/envs/igwn/bin/python
arguments = --version
transfer_executable = false

The absolute path will ensure that the correct installation of the executable is used (rather than relying on your shell searching through $PATH).

transfer_executable = false

Conda environments are built using RPATH links between shared object libraries. These links are almost always specified relative to the $ORIGIN, i.e. the actual location of the executable itself. This means that the runtime location of the executable is critical to its ability to resolve dynamic links, so we must tell HTCondor to not attempt to copy the executable into the scratch directory for the job.

This is done via transfer_executable = false.

Please see the conda-build documentation for details on how conda-build links shared libraries or compiled executables.

This option is not required with the conda run pattern above because conda is a Python script that is relocatable (as long as the Python interpreter referred to in its shebang line is not relocated).
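To illustrate why the location of the executable matters, the following sketch mimics how an $ORIGIN-relative RPATH entry expands. It is a simplification of what the dynamic loader does (it ignores symlinks and other search paths). The RPATH value $ORIGIN/../lib and the scratch path /scratch/job_1234 are assumptions made for illustration.

```python
import posixpath


def resolve_rpath(rpath_entry, executable):
    """Expand $ORIGIN in an RPATH entry for an executable at the given path."""
    # $ORIGIN stands for the directory that contains the executable.
    origin = posixpath.dirname(executable)
    return posixpath.normpath(rpath_entry.replace("$ORIGIN", origin))


rpath = "$ORIGIN/../lib"  # assumed RPATH entry, for illustration only

# Executable run in place: the RPATH resolves inside the Conda environment.
print(resolve_rpath(rpath, "/cvmfs/software.igwn.org/conda/envs/igwn/bin/python"))

# Executable copied to a hypothetical scratch directory: the same RPATH now
# resolves to a directory that does not hold the environment's libraries.
print(resolve_rpath(rpath, "/scratch/job_1234/python"))
```

The first call resolves to /cvmfs/software.igwn.org/conda/envs/igwn/lib, while the second resolves to /scratch/lib, which is why a transferred copy of the executable would fail to find its shared libraries.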

This does not set environment variables

Just using the absolute path of the executable will probably ensure that the right executable, Python modules, or shared libraries are used, but will not guarantee that shell environment variables are set.

If your process relies upon the shell environment variables (or other settings) that are configured during conda activate, please use the solution described above under "Use conda run to execute your process".
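One way to find out whether this matters for your job is to run a small check as the job payload and see which variables are present. The sketch below is a hypothetical helper, not part of Conda or HTCondor. CONDA_PREFIX is used as the example because conda activate sets it; the list of variables your own process needs is an assumption you would have to fill in.

```python
import os


def missing_variables(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# CONDA_PREFIX is set by `conda activate`; extend this list with whatever
# variables your own process relies on.
REQUIRED = ["CONDA_PREFIX"]

if __name__ == "__main__":
    missing = missing_variables(REQUIRED)
    if missing:
        print("not set in this job environment:", ", ".join(missing))
    else:
        print("all required variables are set")
```

Submitting this script with the absolute-path pattern, and then again through conda run, gives a direct comparison of the two job environments.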
