
Data Management on the IGWN Computing Grid

The IGWN Computing Grid connects geographically distributed resources that do not share a file system, so data must be managed carefully, both into and out of each job.

HTCondor file transfer

IGWN workflows should use HTCondor's file transfer mechanism to transfer data to and from jobs, including between stages of a workflow.
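A minimal sketch of a submit file that uses HTCondor file transfer is shown here; the executable and file names are hypothetical. `should_transfer_files = YES` enables file transfer, `transfer_input_files` lists the files copied into the job's scratch directory before it starts, and `transfer_output_files` lists the files copied back to the submit machine when the job exits (`when_to_transfer_output = ON_EXIT`).

```
# Hypothetical executable and file names, for illustration only
universe                = vanilla
executable              = analyse.sh
arguments               = input.dat result.dat

# Copy input.dat to the job, and copy result.dat back when the job exits
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
transfer_output_files   = result.dat

log                     = job.log
output                  = job.out
error                   = job.err
queue
```

In a multi-stage workflow (for example one managed with DAGMan), a later job can list the earlier job's `result.dat` in its own `transfer_input_files`, so data move between stages without any shared file system.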

The exception to this recommendation is reading data that are centrally managed and cached using CVMFS or the Open Science Data Federation (OSDF) caches.

Accessing cached IGWN data with HTCondor

Many IGWN workflows require reading instrumental data from centralised data archives. IGWN CompSoft supports this on the IGWN Computing Grid via the Open Science Data Federation, which enables dynamic access to IGWN data from any machine as if the files were available locally.

The following strategies are available for accessing data in an HTCondor workflow:

Read data from CVMFS

CVMFS is a software and data distribution service that makes remote data available via a POSIX-like file system. See the CVMFS page in this guide for more details.

Workflows that need to read data from CVMFS can configure their job requirements to ensure that the jobs only match with machines that have the necessary CVMFS repositories available.
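The `=?=` operator used in the requirements in this guide is the ClassAd meta-equals operator: it evaluates to `True` only when the attribute is defined and equal to the right-hand side, so machines that do not advertise the attribute at all are excluded instead of producing an undefined result. A CVMFS requirement can be combined with other constraints using `&&`; the sketch below, for illustration only, additionally restricts jobs to Linux machines using the standard `OpSys` machine attribute.

```
# Match only Linux machines that advertise the GWOSC CVMFS repository
requirements = (HAS_CVMFS_gwosc_osgstorage_org =?= True) && (OpSys == "LINUX")
```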

IGWN private data in CVMFS

For proprietary IGWN data (e.g. h(t)), which require credentials for access, a custom requirement ensures that the target machine has the data available and can handle the credentials appropriately.

If your job needs access to IGWN proprietary data, including paths returned by the GWDataFind server at https://datafind.ligo.org, the requirement command should be:

requirements = HAS_CVMFS_IGWN_PRIVATE_DATA =?= True

Private data access requires credentials

Accessing IGWN private data requires configuring your job to include credentials.
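A minimal sketch of a submit file that combines the private-data requirement with a credential request might look like the following. It uses HTCondor's `use_oauth_services` and `<service>_oauth_permissions` submit commands, but the service name `scitokens` and the scope `read:/ligo` are assumptions for illustration; check the Credentials page of this guide for the values your submit host actually uses. The executable name is hypothetical.

```
# Hypothetical executable, for illustration only
universe                = vanilla
executable              = read_strain.sh

# Only match machines that provide IGWN private data in CVMFS
requirements            = HAS_CVMFS_IGWN_PRIVATE_DATA =?= True

# Request an OAuth token for the job
# (ASSUMPTION: service name and permission scope depend on your pool)
use_oauth_services          = scitokens
scitokens_oauth_permissions = read:/ligo

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = job.log
output                  = job.out
error                   = job.err
queue
```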

GWOSC public data in CVMFS

For the GWOSC public CVMFS data repository (gwosc.osgstorage.org), the requirement command should be:

requirements = HAS_CVMFS_gwosc_osgstorage_org =?= True
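Because a CVMFS repository is mounted under `/cvmfs/<repository name>`, a job that matches this requirement can read GWOSC files directly from `/cvmfs/gwosc.osgstorage.org/`, and the input data do not need to be listed in `transfer_input_files`. The sketch below assumes a hypothetical `analyse.sh` that takes a file path as its argument; `<path/to/file>` is a placeholder, not a real path in the repository.

```
# Hypothetical executable; <path/to/file> is a placeholder
universe                = vanilla
executable              = analyse.sh
arguments               = /cvmfs/gwosc.osgstorage.org/<path/to/file>

# Only match machines that mount the GWOSC CVMFS repository
requirements            = HAS_CVMFS_gwosc_osgstorage_org =?= True

# Outputs are still returned using HTCondor file transfer
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = job.log
output                  = job.out
error                   = job.err
queue
```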

Download data using StashCP

The IGWN Computing Grid also supports using the OSDF client in HTCondor jobs to dynamically transfer data from the Open Science Data Federation caches. This provides a method to transfer data from the same centralised data archives without relying on CVMFS (which is not universally available).
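One way this can look in practice is to give an `osdf://` URL in `transfer_input_files`, so that HTCondor's file transfer plugin fetches the file from an OSDF cache before the job starts. This is a sketch under stated assumptions: it assumes the HTCondor installation on the execute machines supports the `osdf://` scheme, the namespace and path are placeholders, and the executable name is hypothetical. Proprietary data fetched this way would also need credentials.

```
# Hypothetical executable; <namespace> and <path/to/file> are placeholders
universe                = vanilla
executable              = analyse.sh

# ASSUMPTION: the execute machines' HTCondor supports osdf:// URLs,
# which are fetched from an OSDF cache by a file transfer plugin
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = osdf:///<namespace>/<path/to/file>

log                     = job.log
output                  = job.out
error                   = job.err
queue
```

Unlike the CVMFS approach, this does not add a CVMFS requirement, so the job can match machines without the repository mounted.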

Work in Progress

This section is a work in progress and will be updated soon.