Data Management on the IGWN Computing Grid¶
The IGWN Computing Grid connects geographically distributed resources that do not share a file system, so data moving into and out of a job must be managed carefully.
HTCondor file transfer¶
IGWN workflows should use HTCondor's file transfer mechanism to transfer data to and from jobs, including between stages of a workflow.
The exception to this recommendation is reading data that are centrally managed and cached using CVMFS or the Open Science Data Federation (OSDF) Cache.
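A minimal sketch of a submit description that relies on HTCondor file transfer, rather than a shared file system, might look like the following. The executable and file names (`analyse.sh`, `config.ini`, `result.hdf5`) are hypothetical placeholders, and the resource requests are illustrative rather than recommended values.

```text
# Sketch of a submit description that uses HTCondor file transfer.
# All file names below are hypothetical placeholders.
universe = vanilla
executable = analyse.sh
arguments = config.ini result.hdf5

# Do not assume a shared file system between submit and execute machines
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# Files copied into the job sandbox before the job starts
transfer_input_files = config.ini
# Files copied back from the job sandbox when the job exits
transfer_output_files = result.hdf5

output = job.out
error = job.err
log = job.log

# Illustrative resource requests only
request_cpus = 1
request_memory = 2GB
request_disk = 1GB

queue
```

In a multi-stage workflow, a file named in `transfer_output_files` for one stage would typically appear in `transfer_input_files` for the next stage.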
Accessing cached IGWN data with HTCondor¶
Many IGWN workflows require reading instrumental data from centralised data archives. IGWN CompSoft supports this on the IGWN Computing Grid via the Open Science Data Federation, which enables dynamic access to IGWN data from any machine, as if those files were available locally.
There are a few different strategies for accessing data in an HTCondor workflow:
Read data from CVMFS¶
CVMFS is a software and data distribution service that makes remote data available via a POSIX-like file system. See CVMFS (in this guide) for more details.
Workflows that need to read data from CVMFS can configure their job requirements to ensure that the jobs only match with machines that have the necessary CVMFS repositories available.
IGWN private data in CVMFS¶
For proprietary IGWN data (e.g. h(t)), which require Credentials for access, a custom requirement is used to ensure that the target machine has the data and can handle the credentials appropriately.
If your job needs access to IGWN proprietary data, including paths returned by the GWDataFind server at https://datafind.ligo.org, the requirements command should be:
requirements = HAS_CVMFS_IGWN_PRIVATE_DATA =?= True
Private data access requires credentials
Accessing the IGWN private data requires configuring your job to include Credentials.
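As a rough sketch, the requirement above might sit in a submit description like the following. The executable name and its argument are hypothetical placeholders, and the credential configuration described in Credentials is deliberately not shown; it must be added before the job can read the data. The `=?=` operator is the ClassAd "is identical to" comparison, which evaluates to False (rather than UNDEFINED) on machines that do not advertise the attribute at all.

```text
# Sketch only: read_frames.sh and frame-paths.txt are hypothetical placeholders.
# Credential configuration (see Credentials) is not shown and must be added.
universe = vanilla
executable = read_frames.sh
arguments = frame-paths.txt

# Only match machines that advertise the IGWN private data repository
requirements = (HAS_CVMFS_IGWN_PRIVATE_DATA =?= True)

# Small inputs still travel with the job via HTCondor file transfer
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = frame-paths.txt

log = job.log
queue
```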
GWOSC public data in CVMFS¶
For the GWOSC public CVMFS data repository (gwosc.osgstorage.org), the requirements command should be:
requirements = HAS_CVMFS_gwosc_osgstorage_org =?= True
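A submit description takes a single `requirements` expression, so a job that needs more than one repository would combine the conditions with the ClassAd logical AND operator. A minimal sketch, assuming a job that reads both the private and the public data:

```text
# Sketch: require both the IGWN private and GWOSC public CVMFS repositories.
requirements = (HAS_CVMFS_IGWN_PRIVATE_DATA =?= True) && (HAS_CVMFS_gwosc_osgstorage_org =?= True)
```

CVMFS repositories are conventionally mounted under `/cvmfs/<repository name>`, so on a matching machine the public data would be expected under `/cvmfs/gwosc.osgstorage.org`.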
Download data using StashCP¶
The IGWN Computing Grid also supports using the OSDF client in HTCondor jobs to dynamically transfer data from the Open Science Data Federation caches. This provides a method to transfer data from the same centralised data archives without relying on CVMFS (which is not universally available).
Work in Progress
This section is a work in progress, and will be updated soon.