Accounting tags are required
Jobs submitted from IGWN hosts without accounting tags will fail immediately. See: Accounting.
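As a sketch, an accounting tag is attached with the `accounting_group` (and optionally `accounting_group_user`) submit commands; the tag and username below are placeholders, not valid values:

```
# Hypothetical accounting tag -- substitute a valid tag for your group/search
accounting_group      = ligo.dev.o4.example.analysis
accounting_group_user = albert.einstein
```

See the Accounting documentation for the list of valid tags.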
Collections of resources where you can run jobs in HTCondor are called "pools".
In this section, we discuss some special requirements and options to target the pool and resources in which your jobs run.
Workflows can be submitted to the IGWN Grid only from specific machines. For IGWN members, those machines are:
| Site | Pools available |
|---|---|
| Caltech (LIGO) | Local & IGWN pools available |
| Hanford (LIGO) | Local & IGWN pools available |
| Livingston (LIGO) | Local & IGWN pools available |
| Nikhef | IGWN Grid only, limited |
Login follows the same directions given here for connecting to generic collaboration resources; both ssh and gsissh access are supported.
Each submit host is configured to connect to the underlying HTCondor workload manager. Any computing task you wish to run on the IGWN pool should be submitted from one of these submit hosts.
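As a minimal sketch (the executable, arguments, and file names below are placeholders), a job is described in a submit file and handed to HTCondor with `condor_submit`:

```
# example.sub -- minimal vanilla-universe job (placeholder names)
universe   = vanilla
executable = /bin/echo
arguments  = "hello from the IGWN Grid"
log        = example.log
error      = example.err
output     = example.out
queue 1
```

Submit it from one of the hosts above with `condor_submit example.sub`, and monitor its progress with `condor_q`.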
## IGWN grid job requirements
No special requirements needed!
Jobs submitted from IGWN Grid submit hosts will run on the IGWN Grid without any additional entries in the submit file.
IGWN Grid jobs run through the glide-in model for distributed computing: special "pilot" jobs run on the native batch workflow management system at a computing site and provision slots for your HTCondor jobs to run on.
How to force jobs to run in a glidein (testing)
At locations that offer a local pool in addition to the IGWN pool, you may find it useful for testing purposes to require that your jobs run through a glidein, preventing them from using any available resources in the local pool:

```
requirements = (IS_GLIDEIN=?=True)
```
This is not a pre-requisite to using the IGWN Grid, however, and jobs submitted from IGWN Grid hosts should behave identically regardless of the pool they land in, provided they have been configured as described in this documentation.
Additional `requirements` may need to be given to indicate to HTCondor other resources that are needed by your job(s). The following table summarises some relevant requirement expressions, and what they mean for the target execute host:
| Requirement | Meaning for the execute host |
|---|---|
| | the … |
| | the … |
| | GWF data files are available via CVMFS (paths as returned from …) |
| | jobs can run within user-provided (or remote) Singularity images |
| | jobs can run within CVMFS-distributed Singularity images |
Specifying multiple requirements to HTCondor
Multiple requirements should be combined using the `&&` intersection operator:

```
requirements = (HAS_LIGO_FRAMES=?=True) && (HAS_SINGULARITY=?=True)
```
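For context, a combined requirements expression sits alongside the usual submit commands. The following is a sketch only; the executable and file names are placeholders:

```
# Sketch: request frame-file access and Singularity support (placeholder names)
universe     = vanilla
executable   = /lalapps_somejob
requirements = (HAS_LIGO_FRAMES=?=True) && (HAS_SINGULARITY=?=True)
log          = example.log
error        = example.err
output       = example.out
queue 1
```

Jobs carrying such requirements will only match slots whose machine ClassAds advertise the corresponding attributes as `True`.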
You shouldn't need to do this for most use-cases
If the requirements are set correctly, there should be no need to restrict the sites at which jobs run. If site-specific problems do occur, the first action should be to notify the IGWN computing community via the help-desk. Restricting sites is a stop-gap solution which may help urgent workflows complete, or facilitate special-case data-access patterns.
To opt in to specific sites, use the `+DESIRED_Sites` HTCondor directive in your submit file.
Restricting the sites jobs run at
Restrict jobs to a specific site, using double-quotes:

```
+DESIRED_Sites = "LIGO-CIT"
```

Multiple desired sites can be declared as a comma-separated list:

```
+DESIRED_Sites = "LIGO-CIT,GATech"
```

To opt out of (blacklist) a site, mark it as undesired:

```
+UNDESIRED_Sites = "GATech"
```
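In context, a site restriction is just another line in the submit file. The following is a sketch with placeholder executable and file names:

```
# Sketch: pin a job to two specific sites (placeholder names)
universe       = vanilla
executable     = /lalapps_somejob
+DESIRED_Sites = "LIGO-CIT,GATech"
log            = example.log
error          = example.err
output         = example.out
queue 1
```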
Available sites include (but are not necessarily limited to):
| Site tag | Location |
|---|---|
| | Brookhaven National Lab. (USA) |
| | IN2P3, Lyon (France) |
| | INFN (Italy) |
| `GATech` | Georgia Tech (USA) |
| | KISTI (Korea) |
| `LIGO-CIT` | Caltech (USA) |
| | LIGO Hanford Observatory (USA) |
| | LIGO Livingston Observatory (USA) |
| | Nikhef (Netherlands) |
| | LSU (USA) |
| | RAL (UK) |
| | Pacific Research Platform (Global, mostly USA) |
| | Syracuse (USA) |
| | LSU (USA) |
| | Univ. of Chicago (USA) |
| | UCSD (USA) |
## Using a local pool
Some IGWN Grid submit hosts provide access to both the IGWN Grid pool and a local cluster pool. In the case of `ldas-osg.ligo.caltech.edu`, for example, the local pool is the "usual" CIT cluster accessed through `ldas-grid.ligo.caltech.edu` and, as such, allows access to resources such as the shared filesystem.
Usage of the local pool operates under an opt-in model, where jobs must "flock" from the IGWN Grid to the local pool. Jobs which are not able to run on IGWN Grid resources must be further restricted to only run in the local pool.
HTCondor submit file for jobs in a local pool
Allow jobs to flock to the local pool by setting `+flock_local = True`:

```
universe = vanilla
executable = /lalapps_somejob
+flock_local = True
log = example.log
error = example.err
output = example.out
queue 1
```
Restrict jobs to the local pool by flocking and restricting the external sites at which they can run:

```
universe = vanilla
executable = /lalapps_somejob
+flock_local = True
+DESIRED_Sites = "none"
log = example.log
error = example.err
output = example.out
queue 1
```
Finally, to restrict jobs to the local pool and leverage a shared filesystem, disable file transfers:

```
universe = vanilla
executable = /lalapps_somejob
should_transfer_files = NO
+flock_local = True
+DESIRED_Sites = "none"
log = example.log
error = example.err
output = example.out
queue 1
```
Using the local pool can be a powerful tool for, e.g.:
- pre-processing jobs that extract data for further distribution via HTCondor file transfer, increasing the number of potential sites at which you can run;
- short-duration post-processing jobs such as output data aggregation or web-page generation.
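Such a split can be expressed, for example, as a DAGMan workflow in which a local-pool post-processing job depends on distributed analysis jobs. The submit-file names below are placeholders:

```
# workflow.dag -- sketch of a mixed IGWN Grid / local-pool workflow
# analysis.sub: runs on the IGWN Grid, results returned via HTCondor file transfer
# postproc.sub: sets +flock_local = True and +DESIRED_Sites = "none" to stay local
JOB ANALYSIS analysis.sub
JOB POSTPROC postproc.sub
PARENT ANALYSIS CHILD POSTPROC
```

The whole workflow is then submitted with `condor_submit_dag workflow.dag`, and DAGMan holds back the post-processing job until the analysis job completes.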