Data discovery¶
IGWN data are stored in a number of locations, and which data are available depends on how they are accessed.
For file-based access, including using CVMFS, see Dataset discovery.
For direct, remote access to specific data channels, without using files or datasets, see Remote data discovery with NDS.
Dataset discovery¶
Datasets archived in files should be discoverable using GWDataFind, an HTTP-based API to the index of available files for a given computing centre.
Data are indexed using the following metadata parameters:
- observatory: the single-character prefix for the observatory

  Prefix  Name
  A       LIGO India
  G       GEO600
  K       KAGRA
  L       LIGO Livingston
  H       LIGO Hanford
  V       Virgo

- frametype: the dataset name
- the GPS [start, stop) interval of the contained data
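As an illustration of how these three parameters fit together, the sketch below recovers them from a file name and checks a half-open `[start, stop)` interval for overlap. It assumes the file follows the common `<observatory>-<frametype>-<gps_start>-<duration>.gwf` naming pattern with integer GPS values; the `parse_frame_name` helper, the `FrameMetadata` type, and the example path are hypothetical and are not part of GWDataFind.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FrameMetadata(NamedTuple):
    """Metadata for one file (hypothetical helper type)."""

    observatory: str  # single-character prefix, e.g. "L"
    frametype: str    # dataset name, e.g. "L1_HOFT_C00"
    start: int        # GPS start (inclusive)
    stop: int         # GPS stop (exclusive)

    def overlaps(self, start: int, stop: int) -> bool:
        """Return True if [start, stop) overlaps this file's interval."""
        return self.start < stop and start < self.stop


def parse_frame_name(path: str) -> FrameMetadata:
    """Parse '<obs>-<frametype>-<gps_start>-<duration>.gwf' (assumed pattern)."""
    stem = PurePosixPath(path).stem
    # The first field is the observatory, the last two are start and duration.
    observatory, rest = stem.split("-", 1)
    frametype, start, duration = rest.rsplit("-", 2)
    return FrameMetadata(observatory, frametype, int(start), int(start) + int(duration))


# Hypothetical example path, for illustration only.
meta = parse_frame_name("/data/L-L1_HOFT_C00-1187008512-4096.gwf")
print(meta)
print(meta.overlaps(1187008866, 1187008898))
```

Because the interval is half-open, a request that starts exactly at the file's `stop` value does not overlap the file.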
Client
For more details, and better examples, about the GWDataFind client, see
Api
For a full description of the GWDataFind HTTP API, see
GWDataFind servers¶
Distributed data server¶
An instance of the GWDataFind server is operated centrally to support discovery of any data distributed using CVMFS.
This instance should be used for all offline analysis of data by IGWN members.
The address of the service is
https://datafind.ligo.org
Requires SciToken authorisation
This service requires SciToken authorisation.
Computing-centre servers¶
Each computing centre operates its own GWDataFind server to enable discovery of its available data. The address of the server will be automatically configured in the following shell environment variable
GWDATAFIND_SERVER
which the GWDataFind client will automatically use.
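For scripts that need to know which server will be contacted, a minimal sketch of reading this variable is shown here. The `resolve_server` helper is hypothetical, and falling back to the distributed server when the variable is unset is an assumption of this sketch, not documented client behaviour.

```python
import os

# Address of the centrally operated server, as given in this document.
DISTRIBUTED_SERVER = "https://datafind.ligo.org"


def resolve_server(environ=None):
    """Return the configured GWDataFind server address.

    Uses GWDATAFIND_SERVER if it is set and non-empty; otherwise falls
    back to the distributed server (an assumption of this sketch).
    """
    environ = os.environ if environ is None else environ
    return environ.get("GWDATAFIND_SERVER") or DISTRIBUTED_SERVER


print(resolve_server())
```

Passing a mapping instead of relying on `os.environ` keeps the helper easy to check in isolation.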
No authorisation required for local access
The local GWDataFind servers are usually configured to accept connections without authorisation from other hosts on the same network.
Discovering datasets¶
Example
To discover which datasets are available at a given location for an observatory:
from gwdatafind import find_types
print(find_types("L"))
python3 -m gwdatafind -o L --show-types
Discovering data URLs¶
The GWDataFind servers will return URLs for specific data files.
File URLs¶
The default response when using the client library is to return file:// URLs:
Discover file URLs
To find the file:// URLs (or paths) for a dataset:
from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898))
python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898
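Many tools expect plain paths, not file:// URLs. The sketch below shows one way to convert them using only the Python standard library. The `file_url_to_path` helper and the example URL are hypothetical; in practice the URLs would come from `find_urls`.

```python
from urllib.parse import unquote, urlparse


def file_url_to_path(url):
    """Convert a file:// URL into a local filesystem path."""
    parts = urlparse(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file:// URL: {url!r}")
    # Any host component (e.g. 'localhost') is dropped; only the path is kept.
    return unquote(parts.path)


# Hypothetical URL, for illustration only.
url = "file://localhost/cvmfs/example/L-L1_HOFT_C00-1187008512-4096.gwf"
print(file_url_to_path(url))
```

Rejecting other schemes explicitly avoids silently treating a remote URL as a local path.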
OSDF URLs¶
A GWDataFind server can also return osdf:// URLs, intended to work with HTCondor file transfer:
Discover OSDF URLs
To find the osdf:// URLs for a dataset:
from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898, urltype="osdf"))
python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898 -u osdf
Only works with CVMFS paths
A GWDataFind server can only return osdf:// URLs for corresponding /cvmfs paths, so this likely only works with https://datafind.ligo.org.
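To show how such URLs might be handed to HTCondor file transfer, the sketch below formats a list of osdf:// URLs as a `transfer_input_files` line for a submit description. The `transfer_input_files_line` helper and the example URLs are hypothetical; real URLs would come from `find_urls(..., urltype="osdf")`.

```python
def transfer_input_files_line(urls):
    """Format osdf:// URLs as an HTCondor 'transfer_input_files' line."""
    urls = list(urls)
    bad = [url for url in urls if not url.startswith("osdf://")]
    if bad:
        raise ValueError(f"not osdf:// URLs: {bad}")
    # HTCondor accepts a comma-separated list of files and URLs.
    return "transfer_input_files = " + ", ".join(urls)


# Hypothetical URLs, for illustration only.
example_urls = [
    "osdf:///igwn/example/L-L1_HOFT_C00-1187008512-4096.gwf",
    "osdf:///igwn/example/L-L1_HOFT_C00-1187012608-4096.gwf",
]
print(transfer_input_files_line(example_urls))
```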
Remote data discovery with NDS¶
The LIGO Laboratory supports the Network Data Server (NDS), which enables remote access to observatory data. Version 2 of the NDS protocol supports remote, authenticated access for collaboration members, from anywhere in the world.
Info
For full details, see the NDS2 client documentation.
In contrast to gwdatafind, which locates files containing a specific data set, NDS2 operates solely on the data channel, and will directly return the data for that channel, regardless of which dataset contains it. This can be extremely valuable when you are not sure which dataset contains a specific channel of interest.
Example
To download data for a specific data channel:
import nds2
conn = nds2.connection("nds.ligo.caltech.edu")
print(conn.fetch(1187008866, 1187008898, ["H1:GDS-CALIB_STRAIN"]))