CernVM File System (CVMFS)¶
What is CVMFS?¶
The CernVM File System ("CernVM-FS" or "CVMFS") is a tool that allows for efficient global distribution of software and data that do not change frequently. Its name reflects its origins as a file system for virtual machines used by the high energy physics community, but it has much wider applicability and usage. It caches files on disk so that, after the initial download, file access for the client is fast.
Within IGWN, CVMFS is being used to distribute both instrument data ("frame files") and analysis software for use at the shared computing centres and by distributed workflows. You may elect to install CVMFS on your workstation for ease of access to recent data and software.
Platform compatibility
CVMFS is fully developed and tested for Linux, and works for the most part on macOS (with the explicit exception of X.509 authentication). There is no support for native Windows use; however, CVMFS works well on the Windows Subsystem for Linux (WSL) version 2. Users of Windows should follow the Linux instructions relevant for their WSL distribution of choice, noting any Extra steps for Windows users along the way.
Full documentation of CVMFS and its inner workings can be browsed online at https://cvmfs.readthedocs.io/.
Installing the CVMFS client¶
-
Install the client using the instructions specific to your platform and location:
-
Configure the necessary extra Apt repository for CVMFS using these instructions
-
Install CVMFS:
sudo apt-get update
sudo apt-get install cvmfs
-
Configure the necessary extra Apt repository for cvmfs-contrib using these instructions.
-
Install the cvmfs-config-<domain> package that best matches your location. <domain> should be
egi - for users in Europe
osg - for everyone else
For example:
apt-get update
apt-get install cvmfs-config-osg
-
Install macFUSE by downloading the .dmg installer from the latest release.
You may be asked to reboot your machine after this step.
-
Install the CVMFS client using these instructions.
-
Reboot your machine to finish installing CVMFS.
-
Manually configure the cvmfs-config repo that best matches your location:
Users in Europe should configure their client to use the EGI configuration:
git clone https://github.com/cvmfs-contrib/cvmfs-config-egi.git /tmp/cvmfs-config-egi
sudo /bin/bash <<EOF
mkdir -p \
    /etc/cvmfs/default.d \
    /etc/cvmfs/config.d \
    /etc/cvmfs/keys/egi.eu
install -m 444 /tmp/cvmfs-config-egi/60-egi.conf /etc/cvmfs/default.d
install -m 444 /tmp/cvmfs-config-egi/config-egi.egi.eu.conf /etc/cvmfs/config.d
install -m 444 /tmp/cvmfs-config-egi/egi.eu.pub /etc/cvmfs/keys/egi.eu
EOF
rm -rf /tmp/cvmfs-config-egi
Then manually mount the config repo:
sudo mkdir -p /cvmfs/config-egi.egi.eu
sudo mount -t cvmfs config-egi.egi.eu /cvmfs/config-egi.egi.eu
Users outside of Europe should configure their client to use the OSG configuration:
git clone https://github.com/opensciencegrid/cvmfs-config-osg.git /tmp/cvmfs-config-osg
sudo /bin/bash <<EOF
mkdir -p \
    /etc/cvmfs/default.d \
    /etc/cvmfs/config.d \
    /etc/cvmfs/keys/opensciencegrid.org
install -m 444 /tmp/cvmfs-config-osg/60-osg.conf /etc/cvmfs/default.d
install -m 444 /tmp/cvmfs-config-osg/config-osg.opensciencegrid.org.conf /etc/cvmfs/config.d
install -m 444 /tmp/cvmfs-config-osg/opensciencegrid.org.pub /etc/cvmfs/keys/opensciencegrid.org
EOF
rm -rf /tmp/cvmfs-config-osg
Then manually mount the config repo:
sudo mkdir -p /cvmfs/config-osg.opensciencegrid.org
sudo mount -t cvmfs config-osg.opensciencegrid.org /cvmfs/config-osg.opensciencegrid.org
-
Configure the necessary extra Yum repository for CVMFS using these instructions
-
Install CVMFS:
dnf -y install cvmfs
-
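Configure the necessary extra Yum repository for cvmfs-contrib using these instructions.
-
Install the cvmfs-config-<domain> configuration package that best matches your location (egi for users in Europe, osg for everyone else). For example:
dnf -y install cvmfs-config-osg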
-
Install and configure the OSG repositories using these instructions.
-
Install the relevant packages
yum -y install cvmfs cvmfs-config-osg
-
-
Next, configure the client (on all platforms) using the following short set of steps:
-
Run a basic setup:
sudo cvmfs_config setup
Extra step for WSL users
If you are running a Linux distribution using the Windows Subsystem for Linux, the above cvmfs_config setup call will do nothing, but will alert you to instead run this:
sudo cvmfs_config wsl2_start
-
Create a default.local configuration for CVMFS that references the repositories you care about:
sudo bash -c 'cat > /etc/cvmfs/default.local' << EOF
CVMFS_HTTP_PROXY=DIRECT
CVMFS_QUOTA_LIMIT=20000
CVMFS_REPOSITORIES=<my-cvmfs-repo>
EOF
Configuring CVMFS_HTTP_PROXY
The CVMFS_HTTP_PROXY variable defines what proxies (if any) CVMFS should use when attempting to download data over HTTP. The value DIRECT is a special case to avoid using a proxy altogether. For full details of how to configure this best for your client, see https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#proxy-lists
Configuring CVMFS_REPOSITORIES
The CVMFS_REPOSITORIES variable should be defined as a comma-separated list of repository names, e.g.
CVMFS_REPOSITORIES=gwosc.osgstorage.org
See the Useful CVMFS repositories section below for details of useful repositories and how to configure them.
-
Reload the configuration and verify the file system:
cvmfs_config probe
-
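The default.local file created in the configuration step above is a short list of KEY=VALUE lines, so its contents can be generated for any set of repositories. The following is a minimal sketch, not part of CVMFS itself: the function name `render_default_local` and its parameters are hypothetical, and the default values simply mirror the example above. It assumes the proxy value needs no shell quoting (as is the case for DIRECT).

```python
def render_default_local(repositories, http_proxy="DIRECT", quota_limit_mb=20000):
    """Return the text of a default.local file for the given repositories.

    Hypothetical helper: the defaults mirror the example configuration
    in this guide. CVMFS_QUOTA_LIMIT is a cache size in megabytes.
    """
    if not repositories:
        raise ValueError("at least one repository name is required")
    lines = [
        f"CVMFS_HTTP_PROXY={http_proxy}",
        f"CVMFS_QUOTA_LIMIT={quota_limit_mb}",
        # CVMFS_REPOSITORIES is a comma-separated list of repository names
        "CVMFS_REPOSITORIES=" + ",".join(repositories),
    ]
    return "\n".join(lines) + "\n"


text = render_default_local(["software.igwn.org", "gwosc.osgstorage.org"])
print(text, end="")
```

The printed text could then be written to /etc/cvmfs/default.local with administrator privileges, followed by cvmfs_config probe to check the result.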
Mounting a CVMFS repository¶
Repositories can be mounted manually by adding their name to the CVMFS_REPOSITORIES variable in /etc/cvmfs/default.local, e.g.:
CVMFS_REPOSITORIES="software.igwn.org"
See below for details about mounting repositories on your platform.
After attempting to mount a repository, you should probe the repository to check that it works:
cvmfs_config probe software.igwn.org
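A similar check can be made from a script by trying to list the repository directory. This is a rough sketch rather than a replacement for cvmfs_config probe: the function name `repository_is_readable` is hypothetical, and it assumes that a working repository appears as a non-empty directory under /cvmfs (an unmounted mount point is typically missing or empty).

```python
import os


def repository_is_readable(name, root="/cvmfs"):
    """Return True if <root>/<name> can be listed and is not empty.

    Hypothetical helper. On systems with autofs, listing the directory
    is itself an access attempt, so it may trigger the mount.
    """
    path = os.path.join(root, name)
    try:
        return len(os.listdir(path)) > 0
    except OSError:
        # Missing directory, permission problem, or a broken mount
        return False


print(repository_is_readable("software.igwn.org"))
```

The result depends on the machine it runs on, so treat a False value as a prompt to run cvmfs_config probe for a proper diagnosis.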
Mounting a CVMFS repository on Linux¶
On Linux systems that use the EGI or OSG CVMFS configuration repositories, it is unlikely that you will need to manually mount the repository, or even add it to the CVMFS_REPOSITORIES variable. autofs and cvmfs2 will automatically mount the repository as soon as any user attempts to access it.
Mounting a CVMFS repository on macOS¶
On macOS there is no autofs service to automatically mount CVMFS repositories, so they must be manually mounted as follows:
sudo mkdir -p /cvmfs/<repo-name>
sudo mount -t cvmfs <repo-name> /cvmfs/<repo-name>
For example:
sudo mkdir -p /cvmfs/software.igwn.org
sudo mount -t cvmfs software.igwn.org /cvmfs/software.igwn.org
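When several repositories are needed on macOS, the same pair of commands has to be repeated for each one. The sketch below only builds the command lines shown above for a list of repository names and does not run them; the function name `macos_mount_commands` is hypothetical.

```python
def macos_mount_commands(repositories):
    """Build the mkdir/mount command pairs for each repository name.

    Hypothetical helper: returns argument lists suitable for printing
    or for passing to subprocess.run(), in the order they should run.
    """
    commands = []
    for repo in repositories:
        mount_point = f"/cvmfs/{repo}"
        commands.append(["sudo", "mkdir", "-p", mount_point])
        commands.append(["sudo", "mount", "-t", "cvmfs", repo, mount_point])
    return commands


for argv in macos_mount_commands(["software.igwn.org", "oasis.opensciencegrid.org"]):
    print(" ".join(argv))
```

Printing the commands first makes it easy to review them before running anything with sudo.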
Authorised access¶
Some CVMFS repositories are configured to require users to present authorisation credentials to gain access.
To configure CVMFS to enable access to restricted repositories:
-
Install cvmfs-x509-helper to support authorised access via SciTokens, using the command that matches your platform's package manager:
apt-get -y install cvmfs-x509-helper
dnf -y install cvmfs-x509-helper
yum -y install cvmfs-x509-helper
-
See How to generate a SciToken for details on how to generate a SciToken based on the (digital) identity you hold. Short version:
Generating a SciToken
conda install -c conda-forge htgettoken
htgettoken --scopes read:/kagra read:/ligo read:/virgo
Each repository below lists the relevant scope required for token-based authorisation.
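The scopes to request depend on which restricted repositories you intend to read. As an illustrative sketch, the mapping below copies the repository names and token scopes listed in this guide, and the hypothetical function `htgettoken_command` assembles a command in the same form as the short version above. It does not run htgettoken, and any additional options your site may need are not covered here.

```python
# Repository name -> token scope, as listed for each repository in this guide.
# Repositories that need no token (e.g. software.igwn.org) are not included.
TOKEN_SCOPES = {
    "kagra.storage.igwn.org": "read:/kagra",
    "ligo.storage.igwn.org": "read:/ligo",
    "virgo.storage.igwn.org": "read:/virgo",
    "shared.storage.igwn.org": "read:/shared",
}


def htgettoken_command(repositories):
    """Return an htgettoken argument list covering the given repositories.

    Hypothetical helper: returns None when none of the repositories
    requires a token.
    """
    scopes = sorted({TOKEN_SCOPES[r] for r in repositories if r in TOKEN_SCOPES})
    if not scopes:
        return None
    return ["htgettoken", "--scopes", *scopes]


command = htgettoken_command(["ligo.storage.igwn.org", "software.igwn.org"])
print(" ".join(command))
```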
Useful CVMFS repositories:¶
This section describes a number of useful CVMFS repositories, and how to configure them.
All of the repositories below should be automatically mounted on systems that have the CVMFS client configured using the EGI or OSG configuration repositories.
OASIS (oasis.opensciencegrid.org)¶
Name: oasis.opensciencegrid.org
Token scopes: -
OASIS is the OSG Application Software Installation Service, the recommended way to distribute software on the Open Science Grid.
IGWN Software (software.igwn.org)¶
Name: software.igwn.org
Token scopes: -
IGWN distributes its software in a dedicated CVMFS repository called software.igwn.org. This repo is the host for the IGWN Conda Distribution.
IGWN CVMFS software is built for Linux
Most of the software available on software.igwn.org (including the IGWN Conda Distribution) is compiled for Linux, so is unlikely to work on macOS. However, there are data and other files distributed in the OASIS repo that may be used on that platform.
You should now be able to see the IGWN Conda Distribution, as well as a few other pieces of software from IGWN:
$ /cvmfs/software.igwn.org/conda/condabin/conda --version
conda 4.9.2
Older IGWN Conda Distribution environments require OASIS
To use IGWN Conda Distribution environments older than 20230523, you will need to ensure that the oasis.opensciencegrid.org repository is also configured.
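Whether a given environment falls under this rule can be checked by comparing its date stamp with 20230523. The sketch below assumes, as a labelled assumption, that the environment name ends in an eight-digit YYYYMMDD stamp (for example a hypothetical name such as igwn-py39-20220827); the function name `needs_oasis` is also hypothetical.

```python
import re
from datetime import date, datetime

# Environments older than this date require the OASIS repository
OASIS_CUTOFF = date(2023, 5, 23)


def needs_oasis(environment_name):
    """Return True if the environment's date stamp is before the cutoff.

    Assumes the name ends with a YYYYMMDD stamp; raises ValueError if not.
    """
    match = re.search(r"(\d{8})$", environment_name)
    if match is None:
        raise ValueError(f"no YYYYMMDD stamp at the end of {environment_name!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
    return stamp < OASIS_CUTOFF


print(needs_oasis("igwn-py39-20220827"))
```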
Proprietary IGWN data¶
Only for registered IGWN members
Proprietary IGWN data distributed with CVMFS are accessible only to registered IGWN collaboration members. For information on accounts and access to collaboration services see Accounts.
For information on how to access Open Data via CVMFS, see GW Open Data below.
Proprietary IGWN data are distributed via CVMFS to make them available to distributed workflows. Each collaboration operates its own data origin (canonical copy server) and makes its data available via a separate CVMFS repository.
The files are discoverable using the GWDataFind service hosted at datafind.igwn.org.
Discovering data via CVMFS
See CVMFS data discovery for details on discovering data available via CVMFS.
KAGRA data (kagra.storage.igwn.org)¶
Name: kagra.storage.igwn.org
Token scopes: read:/kagra
The KAGRA collaboration operates a data origin and CVMFS repository for data from the KAGRA instrument.
LIGO data (ligo.storage.igwn.org)¶
Name: ligo.storage.igwn.org
Token scopes: read:/ligo
The LIGO collaboration operates a data origin and CVMFS repository for data from the LIGO-Hanford and LIGO-Livingston instruments.
Virgo data (virgo.storage.igwn.org)¶
Name: virgo.storage.igwn.org
Token scopes: read:/virgo
The Virgo collaboration operates a data origin and CVMFS repository for data from the Virgo instrument.
Shared data (shared.storage.igwn.org)¶
Name: shared.storage.igwn.org
Token scopes: read:/shared
IGWN operates a 'shared' data origin for derived and/or auxiliary analysis data files that may be shared by many workflows.
Work in progress
The shared data origin and CVMFS repositories are not yet configured. Please watch this space, or contact the IGWN Computing group for more details or to offer effort to complete this work.
GW Open Data (gwosc.osgstorage.org)¶
Name: gwosc.osgstorage.org
Token scopes: -
Data published by the Gravitational Wave Open Science Center are directly downloadable from their website, but are also made available via CVMFS. These files are discoverable using the GWDataFind service hosted at datafind.gw-openscience.org.
Discovering and reading data from gwosc.osgstorage.org
First install some software to discover data, and read it:
conda install -c conda-forge gwdatafind gwpy python-ldas-tools-framecpp
Then we can query for a file URL around GW150914, discover the channels (data streams) it contains, and read the strain channel:
>>> from gwdatafind import find_urls
>>> files = find_urls('H', 'H1_LOSC_4_V1', 1126259460, 1126259464, host='datafind.gw-openscience.org')
>>> print(files)
['file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf']
>>> from gwpy.io.gwf import get_channel_names
>>> print(get_channel_names(files[0]))
['H1:LOSC-DQMASK', 'H1:LOSC-INJMASK', 'H1:LOSC-STRAIN']
>>> from gwpy.timeseries import TimeSeries
>>> print(TimeSeries.read(files, 'H1:LOSC-STRAIN', start=1126259460, end=1126259464))
TimeSeries([2.36153281e-19, 2.43491425e-19, 2.21593212e-19, ...,
            1.42904638e-19, 1.36820401e-19, 1.21621713e-19]
           unit: dimensionless,
           t0: 1126259460.0 s,
           dt: 0.000244140625 s,
           name: H1:LOSC-STRAIN,
           channel: H1:LOSC-STRAIN)
Many other tools and libraries are available to read GWF files, both in Python and in other languages.
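The URLs returned by find_urls in the example above point at files under /cvmfs, and the file names themselves carry the time span of each file. The following sketch uses only the Python standard library to turn such a URL into a local path and to split the file name into its parts. The function name `parse_gwf_url` is hypothetical, and the pattern `<observatory>-<description>-<GPS start>-<duration>.gwf` is assumed from the example URL above.

```python
import os
from urllib.parse import urlparse


def parse_gwf_url(url):
    """Split a file:// URL for a GWF file into its path and name fields.

    Hypothetical helper: assumes the file name has the form
    <observatory>-<description>-<GPS start>-<duration>.gwf
    """
    path = urlparse(url).path
    stem, extension = os.path.splitext(os.path.basename(path))
    if extension != ".gwf":
        raise ValueError(f"not a GWF file: {path}")
    observatory, description, gps_start, duration = stem.split("-")
    return {
        "path": path,
        "observatory": observatory,
        "description": description,
        "gps_start": int(gps_start),
        "gps_end": int(gps_start) + int(duration),
    }


url = (
    "file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/"
    "frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf"
)
info = parse_gwf_url(url)
# Check that the file covers the GPS interval requested in the example above
print(info["gps_start"] <= 1126259460 and 1126259464 <= info["gps_end"])
```

This kind of check needs no access to the file itself, which can be useful for confirming which files a workflow will read before CVMFS downloads them.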
OSG Singularity images (singularity.opensciencegrid.org)¶
Name: singularity.opensciencegrid.org
Token scopes: -
IGWN (and other groups) publish Singularity container images via CVMFS so that they are available to distributed workflows, including HTC grid jobs.
Publishing images to CVMFS
See Publishing Singularity Images To CVMFS for details on how to publish Docker images to CVMFS for distribution in singularity.opensciencegrid.org.