Using Conda environments with HTCondor
Many IGWN workflows are built around software provided by a Conda environment, which may include special shell environment variables or other considerations.
There is no first-class way to tell HTCondor that your job relies on a custom software environment (a Conda environment, or a Python virtualenv, or similar), so the environment that your job lands in on the Execute Point by default may not be exactly what you need it to be.
There are a few different ways to ensure that your job executes with the required software, and in the required environment; which one you need will depend on the requirements of your workflow.
Use conda run to execute your process
The most robust way to use Conda with HTCondor is to mimic the shell pattern:
conda activate <my-environment>
<exe> <arguments>
The Conda project provides an entry point called conda run which simulates this in a single command, e.g.
conda run --name <my-environment> --no-capture-output <exe> <arguments>
By default, conda run captures the output and error streams, and emits them only once the process completes. The --no-capture-output option tells conda run to stream the output and error streams directly to the console.
This is optional, but will mimic the interactive process output better.
If your job, or monitoring of the job progress, isn't reliant on streaming stdout, e.g. the scientific output is written to a file, this option can be safely removed.
Referencing conda environments by path, rather than name
If your Conda environment is in a non-standard path, you can reference it using the --prefix <path> argument, rather than --name:
conda run --prefix </path/to/conda/env> <exe> <arguments>
The argument to
--prefix should be the directory that contains the
For example, when using the IGWN Conda Distribution, from any currently-active environment:
$ python3 --version
Python 3.6.8
$ conda run --name igwn python3 --version
Python 3.10.9
(output may not exactly match)
You can configure HTCondor to use this pattern as follows:
executable = </path/to/conda>/bin/conda
arguments = run --name <env-name> --no-capture-output <exe> <arguments>
For example:
executable = /cvmfs/software.igwn.org/conda/bin/conda
arguments = run --name igwn-py310-testing --no-capture-output python --version
This will result in the output Python 3.10.9, as in the example above.
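If you generate submit files from a script, the same two lines can be assembled programmatically. The following is only a minimal sketch in plain Python that does not use any HTCondor API. The helper name conda_run_submit is hypothetical, and it refuses any word containing whitespace or quotes rather than attempting HTCondor's argument quoting rules.

```python
def conda_run_submit(conda_exe, env_name, command, capture_output=False):
    """Return the 'executable' and 'arguments' lines for a `conda run` job.

    Hypothetical helper, for illustration only.
    """
    for word in [env_name, *command]:
        # Keep the sketch simple: no attempt is made at HTCondor quoting,
        # so anything that would need quoting is rejected outright.
        if not word or any(ch.isspace() or ch in "'\"" for ch in word):
            raise ValueError(f"cannot safely place {word!r} in 'arguments'")
    words = ["run", "--name", env_name]
    if not capture_output:
        # Stream stdout/stderr instead of buffering until the process exits.
        words.append("--no-capture-output")
    words.extend(command)
    return f"executable = {conda_exe}\narguments = {' '.join(words)}\n"


print(conda_run_submit(
    "/cvmfs/software.igwn.org/conda/bin/conda",
    "igwn-py310-testing",
    ["python", "--version"],
))
```

The printed text reproduces the executable and arguments lines of the example above; the remaining submit commands (log, output, resource requests, queue) would still need to be added by hand.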
A complete example compatible with the IGWN Grid would be as follows:
IGWN HTCondor job using conda run
executable = /cvmfs/software.igwn.org/conda/bin/conda
arguments = run --name igwn --no-capture-output BayesWave --ifo H1 --H1-flow 32 --H1-cache LALSimAdLIGO --H1-channel LALSimAdLIGO --trigtime 900000000.00 --srate 512 --seglen 4 --PSDstart 900000000 --PSDlength 1024 --NCmin 2 --NCmax 2 --dataseed 1234 --outputDir results
log = example.log
error = example.err
output = example.out
transfer_output_files = results/
request_cpus = 1
request_disk = 1GB
request_memory = 16GB
accounting_group = ligo.dev.o4.burst.paramest.bayeswave
queue
Use the absolute path of the executable
In some cases it may be enough to just ensure that your executable is specified using its absolute path:
executable = /cvmfs/software.igwn.org/conda/envs/igwn/bin/python
arguments = --version
transfer_executable = false
The absolute path will ensure that the correct installation of the executable is used (rather than relying on your shell searching through $PATH).
transfer_executable = false
Conda environments are built using RPATH links between shared object libraries. These links are almost always specified relative to $ORIGIN, i.e. the actual location of the executable itself. This means that the runtime location of the executable is critical to its ability to resolve dynamic links, so we must tell HTCondor not to attempt to copy the executable into the scratch directory for the job.
This is done via transfer_executable = false.
Please see the conda-build documentation for details on how conda-build links shared libraries or compiled executables.
This option is not required with the conda run pattern above because conda is a Python script that is relocatable (as long as the Python interpreter referred to in its shebang line is not relocated).
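To see why the location of the executable matters, it can help to expand an $ORIGIN-relative entry by hand. The sketch below is a purely lexical approximation of what the dynamic loader does, not a reimplementation of it. The RPATH value $ORIGIN/../lib and the scratch path /scratch/job/python are assumptions chosen for illustration, and resolve_origin is a hypothetical helper name.

```python
import posixpath


def resolve_origin(executable, rpath):
    """Expand $ORIGIN in an RPATH entry relative to the executable's directory."""
    origin = posixpath.dirname(executable)
    # Lexical normalisation only; symlinks are not followed in this sketch.
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


rpath = "$ORIGIN/../lib"  # assumed entry, for illustration only

# Executable left in place inside the Conda environment.
print(resolve_origin("/cvmfs/software.igwn.org/conda/envs/igwn/bin/python", rpath))

# Hypothetical copy of the same executable in a job scratch directory.
print(resolve_origin("/scratch/job/python", rpath))
```

In the first case the entry resolves to the lib directory of the Conda environment; in the second it resolves to /scratch/lib, which would not contain the environment's shared libraries.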
This does not set environment variables
Just using the absolute path of the executable will probably ensure that the right executable, Python modules, or shared libraries are used, but will not guarantee that shell environment variables are set.
If your process relies upon the shell environment variables (or other settings) that are configured during conda activate, please see the above solution, Use conda run to execute your process.
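One way to check which situation a job is in is to submit a small diagnostic script and inspect its output. The sketch below assumes only the standard CONDA_PREFIX and CONDA_DEFAULT_ENV variables that Conda sets on activation; the function name environment_report is hypothetical. Run via the absolute path of the interpreter, the two Conda variables would be expected to come back as None unless they are inherited from elsewhere; run via conda run, they should be populated.

```python
import os
import sys


def environment_report(environ=None):
    """Summarise the interpreter in use and the Conda activation variables."""
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Set when an environment is activated; None if absent.
        "CONDA_PREFIX": environ.get("CONDA_PREFIX"),
        "CONDA_DEFAULT_ENV": environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```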