Data discovery

As described in Data distribution, IGWN data are stored in a number of locations; which data are available, and how they should be accessed, depends on the location.

For file-based dataset access, including using OSDF and CVMFS, see Dataset discovery.

For direct remote access to specific data channels, without using files or datasets, see Remote data discovery with NDS.

Dataset discovery

Datasets archived in files, including data distributed via OSDF, are discoverable using GWDataFind.

Data files are indexed using the following metadata parameters:

  • observatory - the single-character prefix for the observatory

    Prefix Name
    A LIGO India
    G GEO600
    K KAGRA
    L LIGO Livingston
    H LIGO Hanford
    V Virgo

    (or a combination of prefixes for files containing data for multiple observatories)

  • frametype - the dataset name

  • the GPS [start, stop) interval of the contained data
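The three metadata parameters above can be illustrated with a minimal sketch. It assumes that data files follow the common `<observatory>-<frametype>-<gps_start>-<duration>.gwf` naming pattern, which is not stated on this page; the file name used below and the helpers `parse_frame_filename`, `observatory_names` and `overlaps` are hypothetical and are not part of GWDataFind.

```python
import re
from typing import NamedTuple

# Observatory prefixes, copied from the table above.
OBSERVATORIES = {
    "A": "LIGO India",
    "G": "GEO600",
    "K": "KAGRA",
    "L": "LIGO Livingston",
    "H": "LIGO Hanford",
    "V": "Virgo",
}

# ASSUMPTION: files are named <observatory>-<frametype>-<gps_start>-<duration>.gwf
FILENAME_PATTERN = re.compile(
    r"^(?P<observatory>[A-Z]+)-(?P<frametype>[^-]+)"
    r"-(?P<start>\d+)-(?P<duration>\d+)\.gwf$"
)


class FrameMetadata(NamedTuple):
    observatory: str  # one or more single-character prefixes
    frametype: str    # the dataset name
    start: int        # GPS start (inclusive)
    stop: int         # GPS stop (exclusive)


def parse_frame_filename(name: str) -> FrameMetadata:
    """Extract the index metadata from a (hypothetical) file name."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    start = int(match["start"])
    stop = start + int(match["duration"])
    return FrameMetadata(match["observatory"], match["frametype"], start, stop)


def observatory_names(prefix: str) -> list:
    """Expand a (possibly combined) prefix such as "HL" into names."""
    return [OBSERVATORIES[char] for char in prefix]


def overlaps(meta: FrameMetadata, start: int, stop: int) -> bool:
    """True if the half-open interval [start, stop) overlaps the file."""
    return meta.start < stop and start < meta.stop


meta = parse_frame_filename("L-L1_HOFT_C00-1187008512-4096.gwf")
print(meta)
print(observatory_names(meta.observatory))
print(overlaps(meta, 1187008866, 1187008898))
```

Because the interval is half-open, a request that starts exactly at a file's stop time does not overlap that file.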

Client

For more details and further examples of using the GWDataFind client, see

https://gwdatafind.readthedocs.io/

API

For a full description of the GWDataFind HTTP API, see

https://computing.docs.ligo.org/gwdatafind/server/api/

GWDataFind servers

Distributed data

A GWDataFind server instance is operated centrally to support discovery of any data distributed using OSDF.

This instance should be used for all offline analysis of data by IGWN members.

The address of the service is

https://datafind.igwn.org

Requires SciToken authorisation

This service requires SciToken authorisation.

Computing-centre servers

Any of the IGWN Computing Centres may operate their own GWDataFind server to enable discovery of locally-available data. The address of the server will be automatically configured in the following shell environment variable:

GWDATAFIND_SERVER

which the GWDataFind client will use automatically.
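The GWDataFind client reads this variable itself, so no extra code is normally needed. As a minimal sketch of the same idea for your own scripts, the hypothetical helper `resolve_server` below (not part of GWDataFind) looks up `GWDATAFIND_SERVER` and falls back to an address that the caller supplies explicitly; the choice of fallback is an assumption left to the caller.

```python
import os


def resolve_server(environ=None, fallback=None):
    """Return the GWDataFind server address to use.

    Hypothetical helper: prefers the GWDATAFIND_SERVER environment
    variable, then an explicit fallback supplied by the caller.
    """
    environ = os.environ if environ is None else environ
    server = environ.get("GWDATAFIND_SERVER") or fallback
    if not server:
        raise ValueError(
            "GWDATAFIND_SERVER is not set and no fallback was given"
        )
    return server


# On a computing-centre host the variable is expected to be set already;
# elsewhere, the caller chooses a fallback explicitly.
print(resolve_server(fallback="https://datafind.igwn.org"))
```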

No authorisation required for local access

The local GWDataFind servers are usually configured to accept connections without authorisation from other hosts on the same network.

Public data

The public data available via GWOSC are indexed and discoverable using the GWDataFind server at

https://datafind.gw-openscience.org

Discovering datasets

Example

To discover which datasets are available at a given location for an observatory:

Python:

from gwdatafind import find_types
print(find_types("L"))

Shell:

python3 -m gwdatafind -o L --show-types

Discovering data URLs

The GWDataFind servers will return URLs for specific data files.

File URLs

The default response when using the client library is to return file:// URLs:

Discover file URLs

To find the file:// URLs (or paths) for a dataset:

Python:

from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898))

Shell:

python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898
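Many tools expect a plain file-system path rather than a file:// URL. The following is a minimal sketch of the conversion using only the Python standard library; the helper `url_to_path` and the sample URL are hypothetical, and the exact form of the URLs returned by a given server is an assumption.

```python
from urllib.parse import unquote, urlparse


def url_to_path(url):
    """Convert a file:// URL (or a bare path) into a local path.

    Hypothetical helper: rejects any other URL scheme, such as osdf://,
    because those do not refer to a local file.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"not a local file URL: {url!r}")
    # Undo any percent-encoding in the path component.
    return unquote(parsed.path)


# Hypothetical URL of the kind a server might return.
sample = "file://localhost/data/frames/L-L1_HOFT_C00-1187008512-4096.gwf"
print(url_to_path(sample))
```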

OSDF URLs

A GWDataFind server can also return osdf:// URLs, which represent paths under an OSDF namespace:

Discover OSDF URLs

To find the osdf:// URLs for a dataset:

Python:

from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898, urltype="osdf"))

Shell:

python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898 -u osdf

Only works with paths also visible from CVMFS

A GWDataFind server can only return osdf:// URLs for corresponding /cvmfs paths, so this likely only works with https://datafind.igwn.org.

Remote data discovery with NDS

The LIGO Laboratory supports the Network Data Server (NDS), which enables remote access to observatory data. Version 2 of the NDS protocol supports remote, authenticated access for collaboration members, from anywhere in the world.

Info

For full details, see the NDS2 client documentation.

In contrast to GWDataFind, which locates files containing a specific dataset, NDS2 operates solely on the data channel, and will directly return the data for that channel, regardless of which dataset contains it. This can be extremely valuable when you are not sure which dataset contains a specific channel of interest.

Example

To download data for a specific data channel:

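import nds2
# open a connection to the NDS2 server (authenticated access)
conn = nds2.connection("nds.ligo.caltech.edu")
# fetch data for the named channel between the GPS start and stop times
print(conn.fetch(1187008866, 1187008898, ["H1:GDS-CALIB_STRAIN"]))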