Accessing IGWN data¶
This page describes which data are available and how they can be accessed in various regimes.
For details on how these data are transferred between computing centres, please see the pages on Bulk data distribution and Low-latency distribution.
Gravitational-wave Frame files (GWF)¶
Each of the current generation of IGWN detectors continuously produces data that are archived in Gravitational-Wave Frame (GWF, `.gwf`) files. This is a custom binary file format that allows for extremely efficient storage of a large quantity of heterogeneous data.
For details on the GWF format, see LIGO-T970130.
Data discovery¶
The GWF files are stored in a number of locations, and which data are available depends on the grid computing centre you want to use. Additionally, data may be available for remote use via NDS2 (see below).
Local data discovery¶
Local data files are discoverable using `gwdatafind`, a Python library and command-line interface backed by an index of all available files.
Info
For more details, and better examples, see the gwdatafind documentation.
Data are indexed using the following metadata parameters:
- `observatory` - the single-character prefix for the observatory:

  Prefix | Name |
  ---|---|
  G | GEO600 |
  K | KAGRA |
  L | LIGO Livingston |
  H | LIGO Hanford |
  V | Virgo |

- `frametype` - the dataset name (see Available datasets below)
- and the GPS `[start, stop)` interval of the contained data
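
Because the GPS interval is half-open, a file covering `[start, stop)` contains data at `start` but not at `stop`, so a file and a query that meet end-to-start do not overlap. A minimal sketch of that rule follows; the `overlaps` helper is hypothetical and is not part of `gwdatafind`.

```python
def overlaps(file_start, file_stop, query_start, query_stop):
    """Return True if two half-open GPS intervals [start, stop) share any time.

    Hypothetical helper, for illustration only.
    """
    return file_start < query_stop and query_start < file_stop


# A file spanning [1187008512, 1187012608) contains this short query interval
print(overlaps(1187008512, 1187012608, 1187008866, 1187008898))
# A query that starts exactly where the file stops does not touch it
print(overlaps(1187008512, 1187012608, 1187012608, 1187012700))
```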
Discovering datasets¶
Example
To discover which datasets are available at a given location for an observatory:
```python
from gwdatafind import find_types
print(find_types("L"))
```

```shell
python -m gwdatafind -o L --show-types
```
Discovering file URLs¶
Example
To find the file URLs (or paths) for a dataset:
```python
from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898))
```

```shell
python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898
```
CVMFS data discovery¶
In a typical configuration `gwdatafind` is configured (via the `LIGO_DATAFIND_SERVER` environment variable) to connect to a local server that reads an index of local files.
However, a special server has been configured at `datafind.ligo.org:443` to return URLs that point to locations in a CVMFS file system, allowing remote files to be accessed as if they were local.
Any (authorised) user can query that server to discover file URLs, and needs only to configure CVMFS on their system to read the files as if they were local:
Configuring CVMFS for IGWN data access
See Accessing proprietary IGWN data via CVMFS for details on how to configure CVMFS on a Linux host (or container) to enable IGWN data access.
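
The choice between the locally configured server and a custom one can be sketched as follows. The sketch assumes that an explicitly given host takes priority and that the `LIGO_DATAFIND_SERVER` environment variable is used otherwise. The `resolve_server` helper and the `datafind.example.org:80` address are hypothetical; this illustrates the idea and is not how `gwdatafind` itself is implemented.

```python
import os


def resolve_server(host=None, environ=os.environ):
    """Pick the server to query (hypothetical helper, for illustration).

    An explicit ``host`` wins; otherwise fall back to the
    LIGO_DATAFIND_SERVER environment variable.
    """
    if host:
        return host
    try:
        return environ["LIGO_DATAFIND_SERVER"]
    except KeyError:
        raise ValueError(
            "no host given and LIGO_DATAFIND_SERVER is not set"
        ) from None


# Placeholder local configuration (not a real server address)
local_env = {"LIGO_DATAFIND_SERVER": "datafind.example.org:80"}

print(resolve_server(environ=local_env))
print(resolve_server("datafind.ligo.org:443", environ=local_env))
```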
Example
To query for files using a custom server:
```python
from gwdatafind import find_urls
print(find_urls("L", "L1_HOFT_C00", 1187008866, 1187008898, host="datafind.ligo.org:443"))
```

```shell
python -m gwdatafind -o L -t L1_HOFT_C00 -s 1187008866 -e 1187008898 -r datafind.ligo.org:443
```
will return (at time of writing):
'file://localhost/cvmfs/oasis.opensciencegrid.org/ligo/frames/O2/hoft/L1/L-L1_HOFT_C00-11870/L-L1_HOFT_C00-1187008512-4096.gwf'
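
The file name in that URL appears to follow the pattern `<observatory>-<frametype>-<GPS start>-<duration>.gwf`, which matches the metadata parameters listed earlier. Assuming that pattern holds (it is inferred from the example above, not guaranteed for every file), a short sketch can recover the local path and the metadata from a URL. The `parse_gwf_url` helper is hypothetical.

```python
import os.path
from urllib.parse import urlparse


def parse_gwf_url(url):
    """Split a GWF file URL into its local path and indexed metadata.

    Hypothetical helper; assumes the file name is
    <observatory>-<frametype>-<gpsstart>-<duration>.gwf
    """
    path = urlparse(url).path
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext != ".gwf":
        raise ValueError(f"not a GWF file: {url}")
    observatory, frametype, start, duration = stem.split("-")
    start, duration = int(start), int(duration)
    return {
        "path": path,
        "observatory": observatory,
        "frametype": frametype,
        "start": start,
        "stop": start + duration,  # exclusive end of the interval
    }


url = (
    "file://localhost/cvmfs/oasis.opensciencegrid.org/ligo/frames/O2/hoft/L1/"
    "L-L1_HOFT_C00-11870/L-L1_HOFT_C00-1187008512-4096.gwf"
)
info = parse_gwf_url(url)
print(info["path"])
print(info["frametype"], info["start"], info["stop"])
```

With CVMFS configured, the `path` value is the location that can be opened directly on the local file system.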
Remote data discovery with NDS¶
The LIGO Laboratory supports a tool called Network Data Server (NDS), which enables remote access to observatory data. Version 2 of the NDS protocol supports remote, authenticated access for collaboration members, from anywhere in the world.
Info
For full details, see the NDS2 client documentation.
In contrast to `gwdatafind`, which locates files containing a specific dataset, NDS2 operates solely on the data channel name, and will directly return the data for that channel, regardless of which dataset contains it. This can be extremely valuable when you are not sure which dataset contains a specific channel of interest.
Example
To download data for a specific data channel:
```python
import nds2
conn = nds2.connection("nds.ligo.caltech.edu")
print(conn.fetch(1187008866, 1187008898, ["H1:GDS-CALIB_STRAIN"]))
```
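
The channel name in that example, `H1:GDS-CALIB_STRAIN`, begins with a detector prefix (`H1`) separated from the rest of the name by a colon. Assuming that convention, and assuming the first letter of the detector prefix is the single-character observatory prefix used by `gwdatafind`, the two naming schemes can be related with a sketch like the one below. The `split_channel` helper is hypothetical and is not part of the NDS2 client.

```python
def split_channel(channel):
    """Split 'IFO:NAME' into (observatory, detector, name).

    Hypothetical helper, for illustration only.
    """
    detector, sep, name = channel.partition(":")
    if not sep or not detector or not name:
        raise ValueError(f"expected 'IFO:NAME', got {channel!r}")
    # Assumption: the observatory prefix is the first letter of the detector
    return detector[0], detector, name


print(split_channel("H1:GDS-CALIB_STRAIN"))
```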
Available datasets¶
The following is an incomplete, but representative, reference for the datasets that may be available.
Warning
Not all datasets are available at each grid computing centre.
Dataset (frametype) | Description |
---|---|
H1_R | All 'raw' data channels, stored at the native sampling rate |
H1_T | Second trends of all 'raw' channels, including .mean, .min, and .max |
H1_M | Minute trends of all 'raw' channels, including .mean, .min, and .max |
H1_HOFT_C00 | Strain h(t) and metadata generated using the real-time calibration pipeline |
H1_HOFT_CXY | Strain h(t) and metadata generated using the off-line calibration pipeline at version XY |
H1_GWOSC_O2_4KHZ_R1 | 4 kHz strain h(t) and metadata as released by the Gravitational Wave Open Science Center (GWOSC) for the O2 data release |
H1_GWOSC_O2_16KHZ_R1 | 16 kHz strain h(t) and metadata as released by the Gravitational Wave Open Science Center (GWOSC) for the O2 data release |
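
The table lists LIGO Hanford (`H1`) names only. The `L1_HOFT_C00` dataset used in the examples above suggests that LIGO Livingston names follow the same pattern with a different prefix. Under that assumption, which may not hold for other observatories, calibrated-strain dataset names can be built programmatically. The `hoft_frametype` helper below is hypothetical.

```python
def hoft_frametype(detector, version=0):
    """Build a calibrated-strain dataset name such as 'H1_HOFT_C00'.

    Hypothetical helper; ``version`` is the two-digit calibration version
    (the XY in H1_HOFT_CXY), with 0 being the real-time pipeline.
    """
    if not 0 <= version <= 99:
        raise ValueError("version must fit in two digits")
    return f"{detector}_HOFT_C{version:02d}"


print(hoft_frametype("H1"))
print(hoft_frametype("L1", version=2))
```

Names built this way are only candidates; whether a dataset actually exists at a given computing centre still has to be checked with `find_types`.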