
Using Conda environments with HTCondor

Many IGWN workflows are built around software provided by a Conda environment, which may include special shell environment variables or other considerations.

There is no first-class way to tell HTCondor that your job relies on a custom software environment (a Conda environment, or a Python virtualenv, or similar), so the environment that your job lands in on the Execute Point by default may not be exactly what you need it to be.

There are a few different ways to ensure that your job executes with the required software, in the required environment. Which one you need will depend on the requirements of your workflow.

Use conda run to execute your process

The most robust way to use Conda with HTCondor is to mimic the shell pattern:

conda activate <my-environment>
<exe> <arguments>

The Conda project provides an entry point called conda run which simulates this in a single command, e.g.

conda run --name <my-environment> --no-capture-output <exe> <arguments>

--no-capture-output

By default conda run captures output and error logs, and emits them once the process completes. The --no-capture-output option tells conda run to stream the output and error streams directly to the console.

This is optional, but will mimic the interactive process output better.

If neither your job nor your monitoring of its progress relies on streaming stdout (e.g. the scientific output is written to a file), this option can be safely removed.

Referencing conda environments by path, rather than name

If your Conda environment is in a non-standard path, you can reference it using the --prefix <path> argument, rather than --name <name>:

conda run --prefix </path/to/conda/env> <exe> <arguments>

The argument to --prefix should be the directory that contains the bin/ and lib/ directories.
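Before passing a directory to --prefix, it can help to check that it has the expected layout. The following is a minimal sketch based only on the bin/ and lib/ rule above; the function name is_conda_prefix is illustrative and not part of Conda.

```shell
# Return success if the directory looks like a Conda environment prefix,
# i.e. it contains both a bin/ and a lib/ directory.
is_conda_prefix() {
    [ -d "$1/bin" ] && [ -d "$1/lib" ]
}

# Typical use (requires conda on $PATH):
#   is_conda_prefix </path/to/conda/env> && \
#       conda run --prefix </path/to/conda/env> <exe> <arguments>
```

This only checks the directory layout; it does not verify that the environment itself is complete or usable.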

For example, when using the IGWN Conda Distribution, from any currently-active environment:

Example usage of conda run
$ python3 --version
Python 3.6.8
$ conda run --name igwn python3 --version
Python 3.10.9

(output may not exactly match)

You can configure HTCondor to use this pattern as follows:

executable = </path/to/conda>/bin/conda
arguments = run --name <env-name> --no-capture-output <exe> <arguments>

For example:

HTCondor Submit instructions to support conda run
executable = /cvmfs/software.igwn.org/conda/bin/conda
arguments = run --name igwn-py310-testing --no-capture-output python --version

This will result in the output Python 3.10.9 as in the example above.
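If you already have a command line that works interactively, the two submit lines can be generated rather than written by hand. This is a rough sketch; the helper name conda_run_submit_lines is hypothetical, and it assumes that none of the arguments contain spaces or quotes (those would need HTCondor's own argument quoting).

```shell
# Print the two submit-description lines that wrap a command in `conda run`.
# Usage: conda_run_submit_lines <path-to-conda> <env-name> <exe> [arguments...]
conda_run_submit_lines() {
    conda_exe=$1
    env_name=$2
    shift 2
    # "$*" joins the remaining words (the executable and its arguments) with spaces.
    printf 'executable = %s\n' "$conda_exe"
    printf 'arguments = run --name %s --no-capture-output %s\n' "$env_name" "$*"
}

# Typical use:
#   conda_run_submit_lines /cvmfs/software.igwn.org/conda/bin/conda igwn python --version
```

The printed lines can then be pasted into a submit file alongside the usual log, output, and resource request settings.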

A complete example compatible with the IGWN Grid would be as follows:

IGWN HTCondor job using conda run

executable = /cvmfs/software.igwn.org/conda/bin/conda
arguments = run --name igwn --no-capture-output BayesWave --ifo H1 --H1-flow 32 --H1-cache LALSimAdLIGO --H1-channel LALSimAdLIGO --trigtime 900000000.00 --srate 512 --seglen 4 --PSDstart 900000000 --PSDlength 1024 --NCmin 2 --NCmax 2 --dataseed 1234 --outputDir results

log = example.log
error = example.err
output = example.out

transfer_output_files = results/

request_cpus = 1
request_disk = 1GB
request_memory = 16GB

accounting_group = ligo.dev.o4.burst.paramest.bayeswave

queue

Use the absolute path of the executable

In some cases it may be enough to just ensure that your executable is specified using its absolute path:

Specifying executables in Conda environments with absolute paths
executable = /cvmfs/software.igwn.org/conda/envs/igwn/bin/python
arguments = --version
transfer_executable = false

The absolute path will ensure that the correct installation of the executable is used (rather than relying on your shell searching through $PATH).

transfer_executable = false

Conda environments are built using RPATH links between shared object libraries. These links are almost always specified relative to the $ORIGIN, i.e. the actual location of the executable itself. This means that the runtime location of the executable is critical to its ability to resolve dynamic links, so we must tell HTCondor to not attempt to copy the executable into the scratch directory for the job.

This is done via transfer_executable = false.

Please see the conda-build documentation for details on how conda-build links shared libraries or compiled executables.

This option is not required with the conda run pattern above because conda is a Python script that is relocatable (as long as the Python interpreter referred to in its shebang line is not relocated).
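To see the RPATH links described above for a particular executable, one option is to inspect its dynamic section. This sketch assumes a Linux machine with readelf (from GNU binutils) installed; the function name show_rpath is illustrative.

```shell
# Print the RPATH/RUNPATH entries recorded in an ELF executable or library.
# Requires readelf (GNU binutils); returns non-zero if no entry is found.
show_rpath() {
    readelf -d "$1" | grep -E 'RPATH|RUNPATH'
}

# Typical use, with the interpreter from the example above:
#   show_rpath /cvmfs/software.igwn.org/conda/envs/igwn/bin/python
```

For executables in a Conda environment the reported path typically contains $ORIGIN, which is why the executable must stay where it was installed.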

This does not set environment variables

Just using the absolute path of the executable will probably ensure that the right executable, Python modules, or shared libraries are used, but will not guarantee that shell environment variables are set.

If your process relies upon the shell environment variables (or other settings) that are configured during conda activate, please use the conda run pattern described above in Use conda run to execute your process.
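To find out which variables activation adds for a given environment, one approach is to compare two environment dumps. The sketch below assumes each dump holds one NAME=value pair per line (multi-line values would confuse it) and that the first dump is not empty; the function name new_env_vars is illustrative.

```shell
# List the variable names that appear in the second dump but not in the first.
# Both arguments are files containing NAME=value lines, e.g. the output of `env`.
new_env_vars() {
    awk -F= 'NR == FNR { seen[$1] = 1; next }   # first file: remember every name
             !($1 in seen) { print $1 }         # second file: print unseen names
            ' "$1" "$2"
}

# Typical use (requires conda on $PATH):
#   env > base.txt
#   conda run --name igwn env > conda.txt
#   new_env_vars base.txt conda.txt
```

If this prints variables that your process needs, the absolute-path approach alone is unlikely to be enough.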

Machine requirements

Conda environments are populated by downloading packages (basically tarballs) that contain pre-compiled libraries and executables and other architecture-independent code (e.g. Python modules, environment files).

To ensure that applications will run properly, the machine on which the code is executed must be compatible with the machine on which the code was compiled. For many applications this just means matching the right architecture (e.g. x86_64 or arm64); however, for more complex applications you may have to ensure that the runtime machine supports the same instruction sets as were available on the build machine.

IGWN Conda Distribution

The IGWN Conda Distribution environments in CVMFS are built to support x86_64 architecture execution with a minimum microarchitecture level of x86_64-v3. So, to be safe, users executing code in one of these environments should include the appropriate machine requirements for HTCondor to match against compatible machines:

IGWN Conda Distribution HTCondor machine requirements
requirements = (Arch == "X86_64") && (Microarch >= "x86_64-v3")

Custom environments

Custom conda environments may have different requirements depending on how, when, and where they were created.

To check the architecture assumptions of your conda distribution and environment you can try something like this:

Check conda architecture
conda info | grep archspec

This will report something like:

Example conda archspec
$ conda info | grep archspec
       virtual packages : __archspec=1=skylake

The important bit is the final word after the =1=, in this example skylake.

Likely output:

  • core2: Intel Core 2. This is an example of microarchitecture level 1 or 2.

  • skylake: AVX2-enabled Skylake (without AVX512). This is an example of microarchitecture level 3.

  • skylake_avx512: AVX512-enabled Skylake. This is an example of microarchitecture level 4.
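In a script, the final word of the archspec line can be extracted rather than read by eye. This is a minimal sketch that assumes the line has the __archspec=<version>=<name> form shown above; the function name archspec_name is illustrative.

```shell
# Print the microarchitecture name from `conda info` output read on stdin,
# i.e. the final word after "__archspec=<version>=".
archspec_name() {
    sed -n 's/.*__archspec=[^=]*=\([^ ]*\).*/\1/p'
}

# Typical use (requires conda on $PATH):
#   conda info | archspec_name
```

Lines without an __archspec entry produce no output, so an empty result means the value could not be determined this way.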

Additionally, a created conda environment may include a package that directly indicates the required microarchitecture level. To discover this:

Check conda environment microarch requirements
conda list microarch-level

There are two possible outputs:

  • Blank (no lines). This means that there are no packages in the conda environment that explicitly require a minimum microarchitecture level.

  • Non-blank:

    Example conda list microarch-level output
    # packages in environment at /home/user/.conda/envs/science:
    #
    # Name                    Version                   Build  Channel
    _x86_64-microarch-level   4               2_skylake_avx512    conda-forge
    

    This indicates a minimum requirement for x86_64 microarchitecture level 4. For this example the appropriate HTCondor requirements are:

    HTCondor machine requirements for Microarch level 4
    requirements = (Arch == "X86_64") && (Microarch >= "x86_64-v4")
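The requirements line can also be derived from the conda list output directly. The sketch below assumes the column layout shown in the example above (package name first, level in the Version column); the function name microarch_requirements is hypothetical.

```shell
# Read `conda list microarch-level` output on stdin and print the matching
# HTCondor requirements line. Returns non-zero if no level package is listed.
microarch_requirements() {
    awk '$1 == "_x86_64-microarch-level" {
             # $2 is the Version column, which holds the microarchitecture level.
             printf "requirements = (Arch == \"X86_64\") && (Microarch >= \"x86_64-v%s\")\n", $2
             found = 1
         }
         END { exit !found }'
}

# Typical use (requires conda on $PATH):
#   conda list --name <env-name> microarch-level | microarch_requirements
```

When the function prints nothing and returns non-zero, the environment has no explicit minimum level, and the archspec check above is the better guide.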