Job submission
Accounting tags are required
Jobs submitted from IGWN hosts without an accounting tag will fail instantly. See: Accounting.
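For reference, the tags are set directly in the submit file; a minimal sketch, where the group and username values are illustrative placeholders (see the Accounting page for the valid tags):

```
# Illustrative placeholders only -- use your own valid accounting tag and username
accounting_group = ligo.dev.o4.cbc.explore.test
accounting_group_user = albert.einstein
```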
Collections of resources where you can run jobs in HTCondor are called "pools".
In this section, we discuss some special requirements and options to target the pool and resources in which your jobs run.
Submission hosts
Workflows can be submitted to the IGWN Grid only from specific machines.
For IGWN members, those machines are:
Hostname | Location | Notes |
---|---|---|
ldas-osg.ligo.caltech.edu | Caltech (LIGO) | Local & IGWN pools available |
ldas-osg.ligo-wa.caltech.edu | Hanford (LIGO) | Local & IGWN pools available |
ldas-osg.ligo-la.caltech.edu | Livingston (LIGO) | Local & IGWN pools available |
stro.nikhef.nl | Nikhef | IGWN Grid-only, limited /home space |
Login follows the same directions given here to connect to generic collaboration resources, hence both `ssh` and `gsissh` access are supported.
Each submit host is configured to connect to the underlying HTCondor workload manager; any computing task you wish to run on the IGWN pool should be submitted from one of these submit hosts.
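As a point of reference, a minimal job description might look something like the following sketch, in which the executable and file names are illustrative placeholders and the accounting tag values follow the note above; it can then be submitted with `condor_submit` and monitored with `condor_q`:

```
# example.sub -- illustrative sketch of a minimal job description
universe   = Vanilla
executable = ./my_script.sh                        # placeholder executable
accounting_group = ligo.dev.o4.cbc.explore.test    # placeholder, see Accounting
accounting_group_user = albert.einstein            # placeholder username
log    = example.log
error  = example.err
output = example.out
queue 1
```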
IGWN grid job requirements
Jobs submitted from IGWN Grid submit hosts will now run on the IGWN Grid without any additional entries in the submit file `requirements`.
IGWN Grid jobs run through the glide-in model for distributed computing: special "pilot" jobs run on the native batch workflow management system at a computing site and provision slots for your HTCondor jobs to run on.
At some locations, which offer both a local pool and the IGWN pool, you may find it useful for testing purposes to require that your jobs run through a glidein, preventing them from using any available resources in the local pool:
```
requirements = (IS_GLIDEIN=?=True)
```
This is not a pre-requisite to using the IGWN Grid, however, and jobs submitted from IGWN Grid hosts should behave identically regardless of the pool they land in, provided they have been configured as described in this documentation.
Other `requirements` may be given to indicate to HTCondor other resources that are needed by your job(s). The following table summarises some relevant requirement expressions, and what they mean for the target execute host:
Requirement | Description |
---|---|
`(HAS_LIGO_FRAMES=?=True)` | GWF data files are available via CVMFS (paths as returned from `datafind.ligo.org:443`, see CVMFS data discovery) |
`(HAS_SINGULARITY=?=True)` | jobs can run within Singularity containers |
Specifying multiple requirements to HTCondor
Multiple requirements should be combined using the logical AND operator `&&`:
```
requirements = (HAS_LIGO_FRAMES=?=True) && (HAS_SINGULARITY=?=True)
```
Restricting sites
You shouldn't need to do this for most use-cases
If `requirements` are set correctly, there should be no need to restrict the site at which jobs run. If site-specific problems do occur, the first action should be to notify the IGWN computing community via the help-desk. Restricting sites is a stop-gap solution which may help urgent workflows complete or facilitate special-case data-access patterns.
To opt in to specific sites, use the `+DESIRED_Sites` HTCondor directive in your submit file.
Restricting the sites at which jobs run
Restrict jobs to a specific site, using double-quotes:
+DESIRED_Sites = "LIGO-CIT"
Multiple desired sites can be declared as a comma-separated list:
+DESIRED_Sites = "LIGO-CIT,GATech"
To opt out of (blacklist) a site, mark it as undesired:
```
+UNDESIRED_Sites = "GATech"
```
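For context, this directive sits alongside the usual commands in the submit file; a minimal sketch, with an illustrative executable name and site labels taken from the table below:

```
universe   = Vanilla
executable = ./my_analysis.sh       # placeholder; accounting tags omitted for brevity
+DESIRED_Sites = "LIGO-CIT,GATech"  # only run at these sites
log    = example.log
error  = example.err
output = example.out
queue 1
```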
Available sites include (but are not necessarily limited to):
Label | Location |
---|---|
BNL | Brookhaven National Lab. (USA) |
CCIN2P3 | IN2P3, Lyon (France) |
CNAF | INFN (Italy) |
GATech | Georgia Tech (USA) |
KISTI | KISTI (Korea) |
LIGO-CIT | Caltech (USA) |
LIGO-WA | LIGO Hanford Observatory (USA) |
LIGO-LA | LIGO Livingston Observatory (USA) |
NIKHEF | Nikhef (Netherlands) |
QB2 | LSU (USA) |
RAL | RAL (UK) |
SDSC-PRP | Pacific Research Platform (Global, mostly USA) |
SU-ITS | Syracuse (USA) |
SuperMIC | LSU (USA) |
UChicago | Univ. of Chicago (USA) |
UCSD | UCSD (USA) |
Using a local pool
Some IGWN Grid submit hosts provide access to both the IGWN Grid pool and a local cluster pool. In the case of `ldas-osg.ligo.caltech.edu`, for example, this local pool is the "usual" CIT cluster accessed through `ldas-grid.ligo.caltech.edu` and, as such, allows access to resources like the shared filesystem, including `/home` directories.
Usage of the local pool operates under an opt-in model, where jobs must "flock" from the IGWN Grid to the local pool. Jobs which are not able to run on IGWN Grid resources must be further restricted to only run in the local pool.
HTCondor submit file for jobs in a local pool
Allow jobs to run in a local pool using `+flock_local`:
```
universe = Vanilla
executable = /lalapps_somejob
+flock_local = True
log = example.log
error = example.err
output = example.out
queue 1
```
Restrict jobs to the local pool by flocking to it and declaring that no external sites are desired:
```
universe = Vanilla
executable = /lalapps_somejob
+flock_local = True
+DESIRED_Sites = "none"
log = example.log
error = example.err
output = example.out
queue 1
```
Finally, to restrict jobs to the local pool and leverage a shared filesystem, disable file transfers:
```
universe = Vanilla
executable = /lalapps_somejob
should_transfer_files = NO
+flock_local = True
+DESIRED_Sites = "none"
log = example.log
error = example.err
output = example.out
queue 1
```
Using the local pool can be a powerful tool for, e.g.:
- pre-processing jobs that extract data for further distribution using HTCondor file transfer, increasing the number of potential sites at which you can run (see the sketch after this list),
- short-duration post-processing jobs, such as output data aggregation or web-page generation.
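As an illustration of the first pattern, a DAGMan workflow could run a pre-processing job in the local pool and then hand its outputs to analysis jobs on the IGWN Grid via HTCondor file transfer. A minimal sketch, in which the job and submit-file names are hypothetical:

```
# example.dag -- illustrative sketch; job and submit-file names are hypothetical.
# prep.sub would set +flock_local = True and +DESIRED_Sites = "none" so that
# PREP runs in the local pool; analyze.sub would omit these, allowing ANALYZE
# to run anywhere on the IGWN Grid and receive the extracted data via
# HTCondor file transfer.
JOB PREP    prep.sub
JOB ANALYZE analyze.sub
PARENT PREP CHILD ANALYZE
```

The workflow would then be submitted from the same host with `condor_submit_dag example.dag`.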