
Data discovery

IGWN data are stored in a number of locations, and which data are available depends on how they are accessed.

For file-based access, including using CVMFS, see Dataset discovery.

For direct access to (remote) specific data channels, without using files or datasets, see Remote data discovery with NDS.

Dataset discovery

Datasets archived in files should be discoverable using GWDataFind, an HTTP-based API to the index of available files for a given computing centre.

Data are indexed using the following metadata parameters:

  • observatory - the single-character prefix for the observatory

    Prefix  Name
    ------  ---------------
    A       LIGO India
    G       GEO600
    K       KAGRA
    L       LIGO Livingston
    H       LIGO Hanford
    V       Virgo
  • frametype - the dataset name

  • the GPS [start, stop) interval of the contained data
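These three parameters usually map directly onto the conventional IGWN file name, <observatory>-<frametype>-<gpsstart>-<duration>.<extension>, so they can typically be recovered from a returned URL. The sketch below assumes that naming convention (it is not part of the GWDataFind API); the helpers parse_filename and overlaps are hypothetical, and the example path is illustrative only.

```python
import os
from typing import NamedTuple
from urllib.parse import urlparse


class FileMetadata(NamedTuple):
    observatory: str  # single-character prefix, e.g. "L"
    frametype: str    # dataset name, e.g. "L1_HOFT_C00"
    start: int        # GPS start time (inclusive)
    stop: int         # GPS stop time (exclusive)


def parse_filename(url_or_path: str) -> FileMetadata:
    """Parse a name assumed to follow '<obs>-<frametype>-<start>-<duration>.<ext>'."""
    # accept a bare path or a URL (file://, osdf://, ...)
    name = os.path.basename(urlparse(url_or_path).path)
    stem = name.rsplit(".", 1)[0]  # drop the extension (.gwf, .hdf5, ...)
    observatory, rest = stem.split("-", 1)
    frametype, start, duration = rest.rsplit("-", 2)
    start_gps = int(start)
    return FileMetadata(observatory, frametype, start_gps, start_gps + int(duration))


def overlaps(meta: FileMetadata, start: int, stop: int) -> bool:
    """True if the file's [start, stop) overlaps the query interval [start, stop)."""
    return meta.start < stop and start < meta.stop


# hypothetical path, used only to illustrate the convention
meta = parse_filename("file://localhost/data/L-L1_HOFT_C00-1187008512-4096.gwf")
print(meta)
print(overlaps(meta, 1187008866, 1187008898))  # True
```

Because the intervals are half-open, a file ending at GPS 1187012608 does not overlap a query that starts at exactly 1187012608.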

Client

For more details and further examples of using the GWDataFind client, see

https://gwdatafind.readthedocs.io/

API

For a full description of the GWDataFind HTTP API, see

https://computing.docs.ligo.org/gwdatafind/server/api/

GWDataFind servers

Distributed data server

An instance of the GWDataFind server is operated centrally to support discovery of any data distributed using CVMFS.

This instance should be used for all offline analysis of data by IGWN members.

The address of the service is

https://datafind.ligo.org

Requires SciToken authorisation

This service requires SciToken authorisation.

Computing-centre servers

Each computing centre operates its own GWDataFind server to enable discovery of the data available at that centre. The address of the server is configured automatically in the following shell environment variable

GWDATAFIND_SERVER

which the GWDataFind client will automatically use.
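The client performs this lookup itself; the hypothetical helper resolve_datafind_server below only illustrates the assumed precedence, in which an explicitly supplied server (comparable to the client's host keyword) overrides GWDATAFIND_SERVER. It is a sketch, not the client's implementation.

```python
import os
from typing import Optional


def resolve_datafind_server(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: pick the GWDataFind server to query."""
    # an explicitly supplied server takes precedence
    if explicit:
        return explicit
    # otherwise fall back to the address configured by the computing centre
    server = os.environ.get("GWDATAFIND_SERVER")
    if not server:
        raise RuntimeError("GWDATAFIND_SERVER is not set; pass a server explicitly")
    return server
```

Passing "datafind.ligo.org" explicitly, for example, would target the central CVMFS server regardless of the local configuration.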

No authorisation required for local access

The local GWDataFind servers are usually configured to accept connections without authorisation from other hosts on the same network.

Discovering datasets

Example

To discover which datasets are available at a given location for an observatory:

Python:

from gwdatafind import find_types
print(find_types("L"))

Shell:

python3 -m gwdatafind -o L --show-types

Discovering data URLs

GWDataFind servers return URLs for the data files that match a query.

File URLs

The default response when using the client library is to return file:// URLs:

Discover file URLs

To find the file:// URLs (or paths) for a dataset:

Python:

from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898))

Shell:

python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898
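Tools that expect plain filesystem paths need the file:// prefix removed first. A minimal sketch using only the standard library follows; the helper name file_url_to_path and the example path are hypothetical, and it assumes the host part of the URL (for example localhost) can be ignored on the machine where the query was run.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(url: str) -> str:
    """Convert a file:// URL into a local filesystem path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {url!r}")
    # drop the host part (e.g. 'localhost') and keep only the decoded path
    return url2pathname(parsed.path)


print(file_url_to_path("file://localhost/data/L-L1_HOFT_C00-1187008512-4096.gwf"))
```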

OSDF URLs

A GWDataFind server can also return osdf:// URLs, intended to work with HTCondor file transfer:

Discover OSDF URLs

To find the osdf:// URLs for a dataset:

Python:

from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898, urltype="osdf"))

Shell:

python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898 -u osdf

Only works with CVMFS paths

A GWDataFind server can only return osdf:// URLs for data that have corresponding /cvmfs paths, so in practice this is likely to work only with https://datafind.ligo.org.
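Since osdf:// URLs are intended for HTCondor file transfer, they would typically be listed in the comma-separated transfer_input_files command of a submit description. The sketch below only formats that line; the helper name and the example URLs are hypothetical, and it assumes the execute points have HTCondor's OSDF transfer support available.

```python
from typing import Iterable


def transfer_input_files(urls: Iterable[str]) -> str:
    """Format OSDF URLs as an HTCondor 'transfer_input_files' submit command."""
    urls = list(urls)
    bad = [u for u in urls if not u.startswith("osdf://")]
    if bad:
        raise ValueError(f"expected osdf:// URLs, got: {bad}")
    # HTCondor expects a comma-separated list of inputs
    return "transfer_input_files = " + ", ".join(urls)


# hypothetical URLs, for illustration only
print(transfer_input_files([
    "osdf:///hypothetical/L-L1_HOFT_C00-1187008512-4096.gwf",
    "osdf:///hypothetical/L-L1_HOFT_C00-1187012608-4096.gwf",
]))
```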

Remote data discovery with NDS

The LIGO Laboratory supports the Network Data Server (NDS), which enables remote access to observatory data. Version 2 of the NDS protocol supports remote, authenticated access for collaboration members, from anywhere in the world.

Info

For full details, see the NDS2 client documentation.

In contrast to GWDataFind, which locates the files that contain a specific dataset, NDS2 operates on individual data channels and returns the data for a channel directly, regardless of which dataset contains it. This is especially useful when you are not sure which dataset contains a channel of interest.
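Channel names carry their own interferometer prefix, for example H1:GDS-CALIB_STRAIN, so the observatory can often be read from the name when switching between NDS2 and a GWDataFind query. The sketch below assumes channel names of the form <IFO>:<rest>, where the first character of the interferometer prefix matches the observatory prefix in the table above; the helper name is hypothetical.

```python
def observatory_for_channel(channel: str) -> str:
    """Return the single-character observatory prefix of a channel name.

    Assumes names like 'H1:GDS-CALIB_STRAIN', where 'H1' is the interferometer
    and its first character ('H') is the observatory prefix.
    """
    ifo, sep, _ = channel.partition(":")
    if not sep or not ifo:
        raise ValueError(f"channel name has no interferometer prefix: {channel!r}")
    return ifo[0]


print(observatory_for_channel("H1:GDS-CALIB_STRAIN"))  # H
```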

Example

To download data for a specific data channel:

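import nds2
conn = nds2.connection("nds.ligo.caltech.edu")  # requires valid credentials
# fetch 32 s around GW170817: one buffer is returned per requested channel
buffers = conn.fetch(1187008866, 1187008898, ["H1:GDS-CALIB_STRAIN"])
for buf in buffers:
    print(buf.channel.name, len(buf.data))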
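A quick sanity check on the data returned by conn.fetch is that a gap-free span should contain (stop - start) × sample rate samples. The sketch below assumes the 16384 Hz sample rate of H1:GDS-CALIB_STRAIN; the helper name expected_samples is hypothetical.

```python
def expected_samples(gps_start: int, gps_stop: int, sample_rate: float) -> int:
    """Number of samples in a gap-free [gps_start, gps_stop) span."""
    if gps_stop <= gps_start:
        raise ValueError("gps_stop must be after gps_start")
    return round((gps_stop - gps_start) * sample_rate)


# 32 s of H1:GDS-CALIB_STRAIN, assuming a 16384 Hz sample rate
print(expected_samples(1187008866, 1187008898, 16384))  # 524288
```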