CernVM File System (CVMFS)
What is CVMFS?
The CernVM File System ("CernVM-FS" or "CVMFS") is a tool for efficient global distribution of software and data that do not change frequently. Its name reflects its origins in the CernVM virtual machines used by the high-energy physics community; however, it has much wider applicability. Clients download files on demand and cache them on local disk, so after the initial download, subsequent access is fast.
Within IGWN, CVMFS is being used to distribute both instrument data ("frame files") and analysis software for use at the shared computing centres and by distributed workflows. You may elect to install CVMFS on your workstation for ease of access to recent data and software.
Platform compatibility
CVMFS is fully developed and tested on Linux, and mostly works on macOS (with the explicit exception of X.509 authentication). Native Windows is not supported; however, CVMFS works well under version 2 of the Windows Subsystem for Linux (WSL 2). Windows users should follow the Linux instructions for their WSL distribution of choice, noting any Extra step for WSL users along the way.
Full documentation of CVMFS and its inner workings can be browsed online at https://cvmfs.readthedocs.io/.
Installing the CVMFS client
First, install the client using the instructions specific to your platform:
Debian / Ubuntu:
-
Configure the necessary extra Apt repository for CERN-maintained CVMFS using these instructions.
-
Configure the necessary extra Apt repository for cvmfs-contrib using these instructions.
-
Update and install the relevant packages:
sudo apt-get update
sudo apt-get install cvmfs cvmfs-config-osg
macOS:
-
Install macFUSE by downloading the .dmg installer from the latest release.
-
Install the CVMFS client using these instructions.
-
Reboot your machine to finish installing CVMFS.
-
Manually configure the OSG config repo:
git clone https://github.com/opensciencegrid/cvmfs-config-osg.git /tmp/cvmfs-config-osg
sudo make install -C /tmp/cvmfs-config-osg
rm -rf /tmp/cvmfs-config-osg
sudo ln -s /etc/cvmfs/config.d/config-osg.opensciencegrid.org.conf /etc/cvmfs/default.d/
-
Manually mount the OSG config repo:
sudo mkdir -p /cvmfs/config-osg.opensciencegrid.org
sudo mount -t cvmfs config-osg.opensciencegrid.org /cvmfs/config-osg.opensciencegrid.org
Red Hat Enterprise Linux and derivatives:
-
Install and configure the OSG repositories using these instructions.
Install OSG ≤3.5 for your platform
To use X.509 authentication, you must install the OSG 3.5 repository, not version 3.6 or later.
-
Install the relevant packages:
sudo yum -y install cvmfs cvmfs-config-osg cvmfs-x509-helper
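On macOS each repository has to be mounted by hand with the same two commands (the last macOS step above, and the "Extra step for macOS users" notes later on this page). If you mount several repositories, a small script can generate those commands for any repository name. This is a minimal sketch; the helper names macos_mount_commands and mount_repo_macos, and the dry_run switch, are illustrative and not part of CVMFS.

```python
import shlex
import subprocess
from typing import List

CVMFS_ROOT = "/cvmfs"  # mount root used throughout this page


def macos_mount_commands(repo: str) -> List[List[str]]:
    """Return the mkdir + mount commands used on macOS for one repository."""
    mountpoint = f"{CVMFS_ROOT}/{repo}"
    return [
        ["sudo", "mkdir", "-p", mountpoint],
        ["sudo", "mount", "-t", "cvmfs", repo, mountpoint],
    ]


def mount_repo_macos(repo: str, dry_run: bool = True) -> None:
    """Print (dry run, the default) or execute the mount commands for repo."""
    for cmd in macos_mount_commands(repo):
        if dry_run:
            print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        else:
            subprocess.run(cmd, check=True)  # raises if a command fails


if __name__ == "__main__":
    mount_repo_macos("config-osg.opensciencegrid.org")
```

With the default dry_run=True nothing is executed; the printed lines match the manual commands above, so you can review them before running with dry_run=False.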
Once the client is installed, you should configure it (on all platforms) using the following short set of steps:
-
Run a basic setup:
sudo cvmfs_config setup
Extra step for WSL users
If you are running Linux under the Windows Subsystem for Linux, the cvmfs_config setup call above does nothing except alert you to run this instead:
sudo cvmfs_config wsl2_start
-
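Create a default.local configuration for CVMFS that references the repositories you care about:
sudo bash -c 'cat > /etc/cvmfs/default.local' << EOF
CVMFS_REPOSITORIES=<my-cvmfs-repo>
CVMFS_QUOTA_LIMIT=20000
CVMFS_HTTP_PROXY=DIRECT
EOF
Here CVMFS_QUOTA_LIMIT is the soft limit on the local cache size in megabytes (about 20 GB), and CVMFS_HTTP_PROXY=DIRECT makes the client contact the servers without an HTTP proxy.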
Configuring CVMFS_REPOSITORIES
The CVMFS_REPOSITORIES variable should be a comma-separated list of fully qualified repository names, e.g.
CVMFS_REPOSITORIES=oasis.opensciencegrid.org,virgo.infn.it
See the Useful CVMFS repositories section below for details of useful repositories and how to configure them.
-
Reload the configuration and verify the file system:
cvmfs_config probe
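The default.local file created above is just a list of KEY=value lines. If you configure several machines, you may prefer to render it from a list of repositories. The following is a minimal Python sketch assuming the same three settings as the example above; render_default_local is a hypothetical helper name.

```python
from typing import Iterable


def render_default_local(
    repositories: Iterable[str],
    quota_limit_mb: int = 20000,
    http_proxy: str = "DIRECT",
) -> str:
    """Render /etc/cvmfs/default.local content for the given repositories."""
    # Strip whitespace, drop empty entries and duplicates, keep first-seen order
    repos = list(dict.fromkeys(r.strip() for r in repositories if r.strip()))
    if not repos:
        raise ValueError("at least one repository is required")
    lines = [
        # comma-separated list of repository names, no spaces
        "CVMFS_REPOSITORIES=" + ",".join(repos),
        # soft limit on the local cache size, in megabytes
        f"CVMFS_QUOTA_LIMIT={quota_limit_mb}",
        # DIRECT: contact the servers without an HTTP proxy
        f"CVMFS_HTTP_PROXY={http_proxy}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_default_local(["oasis.opensciencegrid.org", "virgo.infn.it"]), end="")
```

Saved as, say, render_default_local.py (a hypothetical file name), its output could be installed with python render_default_local.py | sudo tee /etc/cvmfs/default.local, followed by cvmfs_config probe as above.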
Useful CVMFS repositories
This section describes a number of useful CVMFS repositories, and how to configure them.
OASIS (oasis.opensciencegrid.org)
OASIS is the OSG Application Software Installation Service, the recommended way to install software on the Open Science Grid. IGWN uses it as the host repository for the IGWN Conda Distribution.
OASIS software is (mostly) built for Linux
Most of the software available on OASIS (including the IGWN Conda Distribution) is compiled for Linux, so it is unlikely to work on macOS; however, the OASIS repository also distributes data and other files that may be used on that platform.
To configure OASIS:
-
Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include oasis.opensciencegrid.org:
CVMFS_REPOSITORIES=oasis.opensciencegrid.org
Extra step for macOS users
On macOS you need to manually mount the new repo:
sudo mkdir -p /cvmfs/oasis.opensciencegrid.org
sudo mount -t cvmfs oasis.opensciencegrid.org /cvmfs/oasis.opensciencegrid.org
-
Probe the repo to check that it works
cvmfs_config probe oasis.opensciencegrid.org
-
You should now be able to see the IGWN Conda Distribution, as well as a large collection of software from IGWN and other projects:
$ /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/condabin/conda --version
conda 4.9.2
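Each repository section on this page asks you to extend CVMFS_REPOSITORIES rather than replace it. A minimal Python sketch of that edit, assuming default.local contains at most one unquoted CVMFS_REPOSITORIES=... line (add_repositories is a hypothetical helper, not a CVMFS tool):

```python
import re

# Matches a whole, unquoted CVMFS_REPOSITORIES=... line (assumed format)
_REPOS_LINE = re.compile(r"^CVMFS_REPOSITORIES=(.*)$", re.MULTILINE)


def add_repositories(config_text: str, *new_repos: str) -> str:
    """Return config_text with new_repos appended to CVMFS_REPOSITORIES.

    Existing entries keep their order and duplicates are skipped; if no
    CVMFS_REPOSITORIES line exists yet, one is appended at the end.
    """
    match = _REPOS_LINE.search(config_text)
    existing = match.group(1).split(",") if match else []
    wanted = [r.strip() for r in (*existing, *new_repos) if r.strip()]
    line = "CVMFS_REPOSITORIES=" + ",".join(dict.fromkeys(wanted))
    if match:
        # Replace only the matched line; everything else is left untouched
        return config_text[: match.start()] + line + config_text[match.end():]
    sep = "\n" if config_text and not config_text.endswith("\n") else ""
    return config_text + sep + line + "\n"


if __name__ == "__main__":
    before = "CVMFS_REPOSITORIES=oasis.opensciencegrid.org\nCVMFS_QUOTA_LIMIT=20000\n"
    print(add_repositories(before, "igwn.osgstorage.org", "ligo.osgstorage.org"), end="")
```

To apply it, read /etc/cvmfs/default.local, pass the text through add_repositories, and write the result back as root (for example via sudo tee).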
IGWN proprietary data (igwn.osgstorage.org)
Only for registered IGWN members
Access to the IGWN proprietary data distributed with CVMFS is restricted to registered IGWN collaboration members. For information on accounts and access to collaboration services see Accounts.
For information on how to access Open Data via CVMFS, see GW Open Data below.
Prerequisites
Please complete the configuration of oasis.opensciencegrid.org before configuring igwn.osgstorage.org.
Not available on macOS
The macOS CVMFS client does not work with X.509-authenticated repositories at this time. It is possible that in the future support will be available when the shift to SciTokens is complete.
Proprietary IGWN data are distributed via CVMFS to make them available to jobs using the distributed HTC grid supported by the OSG. Access to the files is restricted to authorised IGWN members who must include authorisation credentials with jobs that require access to the data. The files are discoverable using the GWDataFind service hosted at datafind.ligo.org.
To configure IGWN proprietary data via CVMFS:
-
Install cvmfs-x509-helper to support authorised access via SciTokens or X.509, using the package manager for your platform (apt-get, dnf or yum):
sudo apt-get -y install cvmfs-x509-helper
sudo dnf -y install cvmfs-x509-helper
sudo yum -y install cvmfs-x509-helper
-
Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include igwn.osgstorage.org and ligo.osgstorage.org:
CVMFS_REPOSITORIES=igwn.osgstorage.org,ligo.osgstorage.org,oasis.opensciencegrid.org
-
See How to generate a SciToken or How to generate a credential (X.509) for details on how to generate a SciToken or X.509 credential based on the (digital) identity you hold. Short version:
Generating an X.509 credential
conda install -c conda-forge ciecplib
ecp-get-cert -i login.ligo.org -u marie.curie
(replace marie.curie with your own username)
-
Probe the repos to check that they work
cvmfs_config probe ligo.osgstorage.org
cvmfs_config probe igwn.osgstorage.org
This might not work
If you don't have a valid X.509 credential, cvmfs_config probe will fail; see step 3 above.
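Because the probe fails without a credential, it can help to check first that a proxy file is present. This sketch assumes the conventional lookup order used by X.509 tools: the X509_USER_PROXY environment variable if set, otherwise /tmp/x509up_u<uid>. It is an assumption that ecp-get-cert writes to that location on your system, and the check only tests that a non-empty file exists; it does not validate the certificate or its expiry. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def x509_proxy_path(env: Optional[Mapping[str, str]] = None, uid: Optional[int] = None) -> Path:
    """Return where an X.509 proxy is conventionally looked for.

    Assumed lookup order: $X509_USER_PROXY if set, otherwise the
    conventional default /tmp/x509up_u<uid>.
    """
    env = os.environ if env is None else env
    override = env.get("X509_USER_PROXY")
    if override:
        return Path(override)
    uid = os.getuid() if uid is None else uid  # os.getuid is POSIX-only
    return Path(f"/tmp/x509up_u{uid}")


def proxy_file_present(path: Path) -> bool:
    """True if a non-empty file exists at path (expiry is NOT checked)."""
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    path = x509_proxy_path()
    if proxy_file_present(path):
        print(f"found proxy file: {path}")
    else:
        print(f"no proxy at {path}; generate a credential (step 3) before probing")
```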
Once you have a valid X.509 credential, you should be able to use GWDataFind to discover data available via CVMFS.
Discovering data via CVMFS
See CVMFS data discovery for details on discovering data available via CVMFS.
GW Open Data (gwosc.osgstorage.org)
Data published by the Gravitational Wave Open Science Center (GWOSC) can be downloaded directly from its website, but are also made available via CVMFS. These files are discoverable using the GWDataFind service hosted at datafind.gw-openscience.org.
To configure GW Open Data via CVMFS:
-
Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include gwosc.osgstorage.org:
CVMFS_REPOSITORIES=gwosc.osgstorage.org
Extra step for macOS users
On macOS you need to manually mount the new repo:
sudo mkdir -p /cvmfs/gwosc.osgstorage.org
sudo mount -t cvmfs gwosc.osgstorage.org /cvmfs/gwosc.osgstorage.org
-
Probe the repo to check that it works
cvmfs_config probe gwosc.osgstorage.org
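Every repository section ends with a cvmfs_config probe step. To check several repositories in one go from Python, the following sketch wraps the same command with subprocess. It assumes cvmfs_config probe exits with a non-zero status when a repository fails; probe_repositories and run_quietly are hypothetical helper names, and the runner argument exists only so the logic can be exercised without CVMFS installed.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# A runner takes a command (argument list) and returns its exit status
Runner = Callable[[List[str]], int]


def run_quietly(cmd: List[str]) -> int:
    """Run cmd, discarding its output; return 127 if the program is missing."""
    try:
        return subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except FileNotFoundError:
        return 127  # the status a shell reports for "command not found"


def probe_repositories(repos: Iterable[str], runner: Runner = run_quietly) -> Dict[str, bool]:
    """Probe each repository; map its name to True if the probe succeeded."""
    return {repo: runner(["cvmfs_config", "probe", repo]) == 0 for repo in repos}


if __name__ == "__main__":
    results = probe_repositories(["oasis.opensciencegrid.org", "gwosc.osgstorage.org"])
    for repo, ok in results.items():
        print(f"{repo}: {'OK' if ok else 'FAILED'}")
```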
The GWOSC data are not restricted in the same way as those from igwn.osgstorage.org, so as soon as you have configured the CVMFS repository, you should be able to query for and read data:
Discovering and reading data from gwosc.osgstorage.org
First install some software to discover and read the data:
conda install -c conda-forge gwdatafind gwpy python-ldas-tools-framecpp
Then we can query for a file URL around GW150914, discover the channels (data streams) it contains, and read the strain channel:
>>> from gwdatafind import find_urls
>>> files = find_urls('H', 'H1_LOSC_4_V1', 1126259460, 1126259464, host='datafind.gw-openscience.org')
>>> print(files)
['file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf']
>>> from gwpy.io.gwf import get_channel_names
>>> print(get_channel_names(files[0]))
['H1:LOSC-DQMASK', 'H1:LOSC-INJMASK', 'H1:LOSC-STRAIN']
>>> from gwpy.timeseries import TimeSeries
>>> print(TimeSeries.read(files, 'H1:LOSC-STRAIN', start=1126259460, end=1126259464))
TimeSeries([2.36153281e-19, 2.43491425e-19, 2.21593212e-19, ...,
1.42904638e-19, 1.36820401e-19, 1.21621713e-19]
unit: dimensionless,
t0: 1126259460.0 s,
dt: 0.000244140625 s,
name: H1:LOSC-STRAIN,
channel: H1:LOSC-STRAIN)
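find_urls returns file:// URLs. Some tools expect plain filesystem paths instead, and the file name itself appears to encode the observatory, frame type, GPS start time and duration. The sketch below converts the URL using only the standard library, and parses the name assuming the <observatory>-<frame type>-<GPS start>-<duration>.gwf pattern visible in the example output above; url_to_path, parse_frame_name and FrameFile are hypothetical names.

```python
import os
from typing import NamedTuple
from urllib.parse import unquote, urlparse


class FrameFile(NamedTuple):
    observatory: str
    frame_type: str
    gps_start: int
    duration: int

    @property
    def gps_end(self) -> int:
        return self.gps_start + self.duration

    def covers(self, start: int, end: int) -> bool:
        """True if [start, end) lies entirely within this file's span."""
        return self.gps_start <= start and end <= self.gps_end


def url_to_path(url: str) -> str:
    """Convert a file:// URL (e.g. from gwdatafind) to a local path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {url}")
    return unquote(parsed.path)


def parse_frame_name(path: str) -> FrameFile:
    """Parse '<obs>-<type>-<gps start>-<duration>.gwf' (assumed naming pattern)."""
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext != ".gwf":
        raise ValueError(f"not a .gwf file: {path}")
    observatory, rest = stem.split("-", 1)
    frame_type, start, duration = rest.rsplit("-", 2)
    return FrameFile(observatory, frame_type, int(start), int(duration))
```

Applied to files[0] from the session above, url_to_path gives the /cvmfs/gwosc.osgstorage.org/... path, and parse_frame_name reports that the file spans GPS 1126256640 to 1126260736, which covers the requested 1126259460 to 1126259464 interval.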
Many other tools and libraries are available to read GWF files, both in Python and in other languages.
OSG Singularity images (singularity.opensciencegrid.org)
IGWN (and other groups) publish Singularity container images via CVMFS for distributed use, including by HTC grid jobs.
To configure OSG Singularity images via CVMFS:
-
Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include singularity.opensciencegrid.org:
CVMFS_REPOSITORIES=oasis.opensciencegrid.org,singularity.opensciencegrid.org
Keeping OASIS is optional
In this example we also configure the oasis.opensciencegrid.org repo, mainly because it provides a build of singularity itself. singularity.opensciencegrid.org should work independently of oasis.opensciencegrid.org if you have your own build of singularity.
Extra step for macOS users
On macOS you need to manually mount the new repo:
sudo mkdir -p /cvmfs/singularity.opensciencegrid.org
sudo mount -t cvmfs singularity.opensciencegrid.org /cvmfs/singularity.opensciencegrid.org
-
Probe the repo to check that it works
cvmfs_config probe singularity.opensciencegrid.org
-
You can now run a simple test to validate that singularity can execute images:
Validating singularity.opensciencegrid.org
$ /cvmfs/oasis.opensciencegrid.org/mis/singularity/bin/singularity \
    exec \
    /cvmfs/singularity.opensciencegrid.org/igwn/software:el7 \
    cat /etc/os-release
NAME="Scientific Linux"
VERSION="7.9 (Nitrogen)"
ID="scientific"
ID_LIKE="rhel centos fedora"
VERSION_ID="7.9"
PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:scientific-linux-devel@listserv.fnal.gov"
REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
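The test above prints the image's /etc/os-release. To turn that check into a scripted assertion (for example in a CI job), you could capture the output and parse its KEY=value lines. This sketch reuses the singularity binary path and image from the example; parse_os_release and image_os_release are hypothetical helpers, and quoting is undone with shlex, which handles the simple quoted values shown above.

```python
import shlex
import subprocess
from typing import Dict

SINGULARITY = "/cvmfs/oasis.opensciencegrid.org/mis/singularity/bin/singularity"
IMAGE = "/cvmfs/singularity.opensciencegrid.org/igwn/software:el7"


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release KEY=value lines, removing shell-style quoting."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        info[key] = " ".join(shlex.split(value))
    return info


def image_os_release(image: str = IMAGE, singularity: str = SINGULARITY) -> Dict[str, str]:
    """Run `cat /etc/os-release` inside image and parse the result."""
    result = subprocess.run(
        [singularity, "exec", image, "cat", "/etc/os-release"],
        capture_output=True, text=True, check=True,
    )
    return parse_os_release(result.stdout)
```

On a machine with both repositories mounted, image_os_release()["ID"] would be expected to equal "scientific" for the image shown above.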
Publishing images to CVMFS
See Publishing Singularity Images To CVMFS for details on how to publish Docker images to CVMFS for distribution in singularity.opensciencegrid.org.