CernVM File System (CVMFS)¶
What is CVMFS?¶
The CernVM File System ("CernVM-FS" or "CVMFS") is a tool for efficient global distribution of software and data that do not change frequently. Its name reflects its origins as a file system for virtual machines used by the high energy physics community, but it has much wider applicability and usage. It caches files on local disk so that, after the initial download, file access for the client is fast.
Within IGWN, CVMFS is being used to distribute both instrument data ("frame files") and analysis software for use at the shared computing centres and by distributed workflows. You may elect to install CVMFS on your workstation for ease of access to recent data and software.
CVMFS is fully developed and tested for Linux, and works for the most part on macOS (with the explicit exception of X.509 authentication). There is no support for native Windows use; however, CVMFS works well on the Windows Subsystem for Linux (WSL) version 2. Users of Windows should follow the Linux instructions relevant for their WSL distribution of choice, noting any extra steps for WSL users along the way.
Full documentation of CVMFS and its inner workings can be browsed online at https://cvmfs.readthedocs.io/.
Installing the CVMFS client¶
Install the client using the instructions specific to your platform and location:
Configure the necessary extra Apt repository for CVMFS using these instructions
sudo apt-get update
sudo apt-get install cvmfs
Install the cvmfs-config-<domain> package that best matches your location:
egi: for users in Europe
osg: for everyone else
apt-get update
apt-get install cvmfs-config-osg
You may be asked to reboot your machine after this step.
Install the CVMFS client using these instructions.
Reboot your machine to finish installing CVMFS.
Manually configure the cvmfs-config repo that best matches your location:
Users in Europe should configure their client to use the EGI configuration:
git clone https://github.com/cvmfs-contrib/cvmfs-config-egi.git /tmp/cvmfs-config-egi
sudo /bin/bash <<EOF
mkdir -p \
    /etc/cvmfs/default.d \
    /etc/cvmfs/config.d \
    /etc/cvmfs/keys/egi.eu
install -m 444 /tmp/cvmfs-config-egi/60-egi.conf /etc/cvmfs/default.d
install -m 444 /tmp/cvmfs-config-egi/config-egi.egi.eu.conf /etc/cvmfs/config.d
install -m 444 /tmp/cvmfs-config-egi/egi.eu.pub /etc/cvmfs/keys/egi.eu
EOF
rm -rf /tmp/cvmfs-config-egi
Then manually mount the config repo:
sudo mkdir -p /cvmfs/config-egi.egi.eu
sudo mount -t cvmfs config-egi.egi.eu /cvmfs/config-egi.egi.eu
Users outside of Europe should configure their client to use the OSG configuration:
git clone https://github.com/opensciencegrid/cvmfs-config-osg.git /tmp/cvmfs-config-osg
sudo /bin/bash <<EOF
mkdir -p \
    /etc/cvmfs/default.d \
    /etc/cvmfs/config.d \
    /etc/cvmfs/keys/opensciencegrid.org
install -m 444 /tmp/cvmfs-config-osg/60-osg.conf /etc/cvmfs/default.d
install -m 444 /tmp/cvmfs-config-osg/config-osg.opensciencegrid.org.conf /etc/cvmfs/config.d
install -m 444 /tmp/cvmfs-config-osg/opensciencegrid.org.pub /etc/cvmfs/keys/opensciencegrid.org
EOF
rm -rf /tmp/cvmfs-config-osg
Then manually mount the config repo:
sudo mkdir -p /cvmfs/config-osg.opensciencegrid.org
sudo mount -t cvmfs config-osg.opensciencegrid.org /cvmfs/config-osg.opensciencegrid.org
Install and configure the OSG repositories using these instructions.
Install the relevant packages
yum -y install cvmfs cvmfs-config-osg
Next, configure the client (on all platforms) using the following short set of steps:
Run a basic setup:
sudo cvmfs_config setup
Extra step for WSL users
If you are running a Linux distribution using the Windows Subsystem for Linux, the above cvmfs_config setup call will do nothing, but will alert you to instead run this:
sudo cvmfs_config wsl2_start
Create a default.local configuration for CVMFS that references the repositories you care about:
sudo bash -c 'cat > /etc/cvmfs/default.local' << EOF
CVMFS_HTTP_PROXY=DIRECT
CVMFS_QUOTA_LIMIT=20000
CVMFS_REPOSITORIES=<my-cvmfs-repo>
EOF
The CVMFS_HTTP_PROXY variable defines what proxies (if any) CVMFS should use when attempting to download data over HTTP.
DIRECT is a special case to avoid using a proxy altogether. For full details of how to configure this best for your client, see
The CVMFS_REPOSITORIES variable should be defined as a comma-separated list of repository names, e.g.
See the Useful CVMFS repositories section below for details of useful repositories and how to configure them.
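The default.local file described above is a plain list of KEY=VALUE lines, so its contents can also be generated programmatically, for example when provisioning several machines. The following is a minimal sketch and not part of the CVMFS tooling: the function name render_default_local and its parameters are hypothetical, and the repository names are only examples taken from this page. It builds the text and writes nothing under /etc/cvmfs.

```python
def render_default_local(repositories, http_proxy="DIRECT", quota_limit_mb=20000):
    """Build the text of a default.local file (hypothetical helper).

    repositories   -- iterable of repository names to list in CVMFS_REPOSITORIES
    http_proxy     -- value for CVMFS_HTTP_PROXY ("DIRECT" means no proxy)
    quota_limit_mb -- value for CVMFS_QUOTA_LIMIT (cache soft limit, in megabytes)
    """
    repositories = list(repositories)
    if not repositories:
        raise ValueError("at least one repository name is required")
    lines = [
        f"CVMFS_HTTP_PROXY={http_proxy}",
        f"CVMFS_QUOTA_LIMIT={quota_limit_mb}",
        # CVMFS expects a comma-separated list of repository names
        "CVMFS_REPOSITORIES=" + ",".join(repositories),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_default_local(["software.igwn.org", "oasis.opensciencegrid.org"]), end="")
```

The output can be reviewed and then installed as /etc/cvmfs/default.local with the same sudo approach used in the step above.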
Reload the configuration and verify the file system:
Mounting a CVMFS repository¶
Repositories can be mounted manually by adding their name to the
CVMFS_REPOSITORIES variable in
See below for details about mounting repositories on your platform.
After attempting to mount a repository, you should
probe the repository to assert that it works:
cvmfs_config probe software.igwn.org
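A script that depends on a repository may want a similar check before it starts work. The sketch below is an assumption-laden illustration, not a replacement for cvmfs_config probe: the function name repo_is_accessible is hypothetical, and it only tests whether the repository directory under /cvmfs can be listed and is non-empty. An unmounted mount point typically appears as a missing or empty directory, which this check reports as inaccessible.

```python
import os


def repo_is_accessible(repo, root="/cvmfs"):
    """Return True if <root>/<repo> can be listed and is not empty.

    Hypothetical helper: listing the directory is enough to trigger an
    automatic mount on systems where one is configured.
    """
    path = os.path.join(root, repo)
    try:
        return len(os.listdir(path)) > 0
    except OSError:
        # Missing directory, permission problem, or failed mount
        return False


if __name__ == "__main__":
    name = "software.igwn.org"
    state = "accessible" if repo_is_accessible(name) else "NOT accessible"
    print(f"{name}: {state}")
```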
Mounting a CVMFS repository on Linux¶
On Linux systems that use the EGI or OSG CVMFS configuration repositories, it is unlikely that you will need to manually mount the repository, or even add it to the
cvmfs2 will automatically mount the repository as soon as any user attempts to access it.
Mounting a CVMFS repository on macOS¶
On macOS there is no
autofs service to automatically mount CVMFS repositories, so they must be manually mounted as follows:
sudo mkdir -p /cvmfs/<repo-name>
sudo mount -t cvmfs <repo-name> /cvmfs/<repo-name>
sudo mkdir -p /cvmfs/software.igwn.org
sudo mount -t cvmfs software.igwn.org /cvmfs/software.igwn.org
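Because each repository needs its own pair of commands on macOS, it can be convenient to generate them for a whole list of repositories. The following is a minimal sketch under that assumption; the function name mount_commands is hypothetical, and the code only builds the command strings shown above without running anything.

```python
import shlex


def mount_commands(repos, root="/cvmfs"):
    """Return the mkdir/mount command lines for each repository name.

    Hypothetical helper: nothing is executed, the commands are only
    returned as strings so they can be reviewed first.
    """
    commands = []
    for repo in repos:
        mount_point = f"{root}/{repo}"
        commands.append(shlex.join(["sudo", "mkdir", "-p", mount_point]))
        commands.append(shlex.join(["sudo", "mount", "-t", "cvmfs", repo, mount_point]))
    return commands


if __name__ == "__main__":
    for line in mount_commands(["software.igwn.org", "oasis.opensciencegrid.org"]):
        print(line)
```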
Some CVMFS repositories are configured to require users to present authorisation credentials to gain access.
To configure CVMFS to enable access to restricted repositories:
Install cvmfs-x509-helper to support authorised access via SciTokens:
apt-get -y install cvmfs-x509-helper
dnf -y install cvmfs-x509-helper
yum -y install cvmfs-x509-helper
See How to generate a SciToken for details on how to generate a SciToken based on the (digital) identity you hold. Short version:
Generating a SciToken
conda install -c conda-forge htgettoken
htgettoken --scopes read:/kagra read:/ligo read:/virgo
Each repository below lists the relevant
scope required for token-based authorisation.
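When access to a restricted repository fails, it can help to check which scopes a token actually carries. A SciToken is a JSON Web Token whose scopes are stored as a space-separated string in the scope claim, so they can be read without any extra libraries. The sketch below is illustrative only: the function name token_scopes is hypothetical, the token is passed in as a string (no token file location is assumed), and the signature is deliberately not verified, so this is for inspection and must not be used for authorisation decisions.

```python
import base64
import json


def token_scopes(token):
    """Return the list of scopes in a JWT-formatted token (hypothetical helper).

    The payload is decoded WITHOUT verifying the signature.
    """
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("scope", "").split()


def has_scope(token, required):
    """Return True if the token lists the required scope exactly."""
    return required in token_scopes(token)
```

For example, has_scope(token, "read:/ligo") reports whether a token obtained with the htgettoken command above lists the read:/ligo scope.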
Useful CVMFS repositories:¶
This section describes a number of useful CVMFS repositories, and how to configure them.
All of the repositories below should be automatically mounted on systems that have the CVMFS client configured using the EGI or OSG configuration repositories.
Token scopes: -
OASIS is the OSG Application Software Installation Service, the recommended way to distribute software on the Open Science Grid.
IGWN Software (
Token scopes: -
IGWN distributes its software in a dedicated CVMFS repository called
software.igwn.org. This repo is the host for the IGWN Conda Distribution.
IGWN CVMFS software is built for Linux
Most of the software available on software.igwn.org (including the IGWN Conda Distribution) is compiled for Linux, so it is unlikely to work on macOS. However, there are data and other files distributed in the OASIS repo that may be used on that platform.
You should now be able to see the IGWN Conda Distribution, as well as a few other pieces of software from IGWN:
$ /cvmfs/software.igwn.org/conda/condabin/conda --version
conda 4.9.2
Older IGWN Conda Distribution environments require OASIS
To use IGWN Conda Distribution environments older than 20230523, you will need to ensure that the oasis.opensciencegrid.org repository is also configured.
Proprietary IGWN data¶
Only for registered IGWN members
Access to proprietary IGWN data distributed with CVMFS is restricted to registered IGWN collaboration members. For information on accounts and access to collaboration services see Accounts.
For information on how to access Open Data via CVMFS, see GW Open Data below.
Proprietary IGWN data are distributed via CVMFS to make them available to distributed workflows. Each collaboration operates its own data origin (canonical copy server) and makes its data available via a separate CVMFS repository.
The files are discoverable using the GWDataFind service hosted at
Discovering data via CVMFS
See CVMFS data discovery for details on discovering data available via CVMFS.
KAGRA data (
The KAGRA collaboration operates a data origin and CVMFS repository for data from the KAGRA instrument.
Work in progress
The KAGRA data origin and CVMFS repositories are not yet configured. Please watch this space, or contact the IGWN Computing group for more details or to offer effort to complete this work.
LIGO data (
The LIGO collaboration operates a data origin and CVMFS repository for data from the LIGO-Hanford and LIGO-Livingston instruments.
Virgo data (
The Virgo collaboration operates a data origin and CVMFS repository for data from the Virgo instrument.
Shared data (
IGWN operates a 'shared' data origin for derived and/or auxiliary analysis data files that may be shared by many workflows.
Work in progress
The shared data origin and CVMFS repositories are not yet configured. Please watch this space, or contact the IGWN Computing group for more details or to offer effort to complete this work.
GW Open Data (
Token scopes: -
Data published by the Gravitational Wave Open Science Center are directly downloadable from their website, but are also made available via CVMFS. These files are discoverable using the GWDataFind service hosted at
Discovering and reading data from
First install some software to discover data, and read it:
conda install -c conda-forge gwdatafind gwpy python-ldas-tools-framecpp
Then we can query for a file URL around GW150914, discover the channels (data streams) it contains, and read the strain channel:
>>> from gwdatafind import find_urls
>>> files = find_urls('H', 'H1_LOSC_4_V1', 1126259460, 1126259464, host='datafind.gw-openscience.org')
>>> print(files)
['file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf']
>>> from gwpy.io.gwf import get_channel_names
>>> print(get_channel_names(files))
['H1:LOSC-DQMASK', 'H1:LOSC-INJMASK', 'H1:LOSC-STRAIN']
>>> from gwpy.timeseries import TimeSeries
>>> print(TimeSeries.read(files, 'H1:LOSC-STRAIN', start=1126259460, end=1126259464))
TimeSeries([2.36153281e-19, 2.43491425e-19, 2.21593212e-19, ...,
            1.42904638e-19, 1.36820401e-19, 1.21621713e-19]
           unit: dimensionless,
           t0: 1126259460.0 s,
           dt: 0.000244140625 s,
           name: H1:LOSC-STRAIN,
           channel: H1:LOSC-STRAIN)
Many other tools and libraries are available to read GWF files, both in Python and in other languages.
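The file URL returned in the session above points at an ordinary path under /cvmfs, and its file name follows the usual GWF naming pattern of observatory, tag, GPS start time, and duration separated by hyphens. The sketch below shows how that URL could be turned into a local path and checked against a requested GPS time. It uses only the Python standard library; the function names url_to_path, parse_gwf_name, and covers are hypothetical, and integer GPS start and duration fields are assumed.

```python
import os
from urllib.parse import urlparse


def url_to_path(url):
    """Convert a file:// URL into a local filesystem path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url}")
    return parsed.path


def parse_gwf_name(path):
    """Split '<obs>-<tag>-<gps_start>-<duration>.gwf' into its four fields."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    observatory, tag, gps_start, duration = stem.split("-")
    return observatory, tag, int(gps_start), int(duration)


def covers(path, gps_time):
    """Return True if the file name indicates that it contains gps_time."""
    _obs, _tag, start, duration = parse_gwf_name(path)
    return start <= gps_time < start + duration


if __name__ == "__main__":
    url = ("file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/"
           "frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf")
    path = url_to_path(url)
    print(path)
    print(parse_gwf_name(path))
    print(covers(path, 1126259460))
```

Checking os.path.exists on the resulting path is a quick way to confirm that the repository holding the file is mounted.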
OSG Singularity images (
Token scopes: -
IGWN (and other groups) publish Singularity container images via CVMFS for distributed availability, including to HTC grid jobs.
Publishing images to CVMFS
See Publishing Singularity Images To CVMFS for details on how to publish Docker images to CVMFS for distribution in