
CernVM File System (CVMFS)

What is CVMFS?

The CernVM File System ("CernVM-FS" or "CVMFS") is a tool for efficient global distribution of software and data that do not change frequently. Its name reflects its origins in virtual machines used by the high-energy physics community, but it has much wider applicability and usage. It caches files on local disk so that, after the initial download, file access for the client is fast.

Within IGWN, CVMFS is being used to distribute both instrument data ("frame files") and analysis software for use at the shared computing centres and by distributed workflows. You may elect to install CVMFS on your workstation for ease of access to recent data and software.

Platform compatibility

CVMFS is fully developed and tested for Linux, and mostly works on macOS (with the explicit exception of X.509 authentication). There is no native Windows support; however, CVMFS works well on the Windows Subsystem for Linux (WSL) version 2. Windows users should follow the Linux instructions relevant to their WSL distribution of choice, noting any extra steps for WSL users along the way.
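
As a rough aid for choosing which set of instructions to follow, the sketch below classifies the current platform. The function name `cvmfs_platform` is hypothetical, and the WSL check is a heuristic that assumes WSL kernels carry "microsoft" in their release string; it does not distinguish WSL 1 from WSL 2.

```shell
# Classify the current platform for the purpose of choosing CVMFS instructions.
# Arguments (both optional, defaulting to the live system values):
#   $1: kernel name, as printed by `uname -s`
#   $2: kernel release, as printed by `uname -r`
cvmfs_platform() (
    kernel_name="${1:-$(uname -s)}"
    kernel_release="${2:-$(uname -r)}"
    case "$kernel_name" in
        Darwin) echo "macos" ;;
        Linux)
            # Heuristic (an assumption): WSL kernels include "microsoft"
            # somewhere in the release string.
            case "$kernel_release" in
                *[Mm]icrosoft*) echo "wsl" ;;
                *) echo "linux" ;;
            esac
            ;;
        *) echo "unsupported" ;;
    esac
)

echo "Detected platform: $(cvmfs_platform)"
```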

Full documentation of CVMFS and its inner workings can be browsed online at https://cvmfs.readthedocs.io/.

Installing the CVMFS client

  1. Install the client using the instructions specific to your platform and location:

    Linux with Apt (e.g. Debian, Ubuntu):

    1. Configure the necessary extra Apt repository for CVMFS using these instructions.

    2. Install CVMFS:

      sudo apt-get update
      sudo apt-get install cvmfs
      
    3. Configure the necessary extra Apt repository for cvmfs-contrib using these instructions.

    4. Install the cvmfs-config-<domain> package that best matches your location, where <domain> is one of:

      • egi - for users in Europe
      • osg - for everyone else

      For example:

      sudo apt-get update
      sudo apt-get install cvmfs-config-osg
      
    macOS:

    1. Install macFUSE by downloading the .dmg installer from the latest release.

      You may be asked to reboot your machine after this step.

    2. Install the CVMFS client using these instructions.

    3. Reboot your machine to finish installing CVMFS.

    4. Manually configure the cvmfs-config repo that best matches your location:

      Users in Europe should configure their client to use the EGI configuration:

      git clone https://github.com/cvmfs-contrib/cvmfs-config-egi.git /tmp/cvmfs-config-egi
      sudo /bin/bash <<EOF
      mkdir -p \
        /etc/cvmfs/default.d \
        /etc/cvmfs/config.d \
        /etc/cvmfs/keys/egi.eu
      install -m 444 /tmp/cvmfs-config-egi/60-egi.conf /etc/cvmfs/default.d
      install -m 444 /tmp/cvmfs-config-egi/config-egi.egi.eu.conf /etc/cvmfs/config.d
      install -m 444 /tmp/cvmfs-config-egi/egi.eu.pub /etc/cvmfs/keys/egi.eu
      EOF
      rm -rf /tmp/cvmfs-config-egi
      

      Then manually mount the config repo:

      sudo mkdir -p /cvmfs/config-egi.egi.eu
      sudo mount -t cvmfs config-egi.egi.eu /cvmfs/config-egi.egi.eu
      

      Users outside of Europe should configure their client to use the OSG configuration:

      git clone https://github.com/opensciencegrid/cvmfs-config-osg.git /tmp/cvmfs-config-osg
      sudo /bin/bash <<EOF
      mkdir -p \
        /etc/cvmfs/default.d \
        /etc/cvmfs/config.d \
        /etc/cvmfs/keys/opensciencegrid.org
      install -m 444 /tmp/cvmfs-config-osg/60-osg.conf /etc/cvmfs/default.d
      install -m 444 /tmp/cvmfs-config-osg/config-osg.opensciencegrid.org.conf /etc/cvmfs/config.d
      install -m 444 /tmp/cvmfs-config-osg/opensciencegrid.org.pub /etc/cvmfs/keys/opensciencegrid.org
      EOF
      rm -rf /tmp/cvmfs-config-osg
      

      Then manually mount the config repo:

      sudo mkdir -p /cvmfs/config-osg.opensciencegrid.org
      sudo mount -t cvmfs config-osg.opensciencegrid.org /cvmfs/config-osg.opensciencegrid.org
      
    Linux with dnf (e.g. RHEL-compatible distributions):

    1. Configure the necessary extra Yum repository for CVMFS using these instructions.

    2. Install CVMFS:

      dnf -y install cvmfs
      
    3. Configure the necessary extra Yum repository for cvmfs-contrib using these instructions.

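    4. Install the cvmfs-config-<domain> configuration package that best matches your location (egi for users in Europe, osg for everyone else), for example: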

      dnf -y install cvmfs-config-osg
      
    Linux with Yum, using the OSG repositories:

    1. Install and configure the OSG repositories using these instructions.

    2. Install the relevant packages:

      yum -y install cvmfs cvmfs-config-osg
      
  2. Next, configure the client (on all platforms) using the following short set of steps:

    1. Run a basic setup:

      sudo cvmfs_config setup
      

      Extra step for WSL users

      If you are running a Linux distribution using the Windows Subsystem for Linux, the above cvmfs_config setup call will do nothing, but will alert you to instead run this:

      sudo cvmfs_config wsl2_start
      
    2. Create a default.local configuration for CVMFS that references the repositories you care about:

      sudo bash -c 'cat > /etc/cvmfs/default.local' << EOF
      CVMFS_HTTP_PROXY=DIRECT
      CVMFS_QUOTA_LIMIT=20000
      CVMFS_REPOSITORIES=<my-cvmfs-repo>
      EOF
      

      Configuring CVMFS_HTTP_PROXY

      The CVMFS_HTTP_PROXY variable defines what proxies (if any) CVMFS should use when attempting to download data over HTTP.

      The value DIRECT is a special case to avoid using a proxy altogether. For full details of how to configure this best for your client, see

      https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#proxy-lists

      Configuring CVMFS_REPOSITORIES

      The CVMFS_REPOSITORIES variable should be defined as a comma-separated list of repository names, e.g.

      CVMFS_REPOSITORIES=gwosc.osgstorage.org
      

      See the Useful CVMFS repositories section below for details of useful repositories and how to configure them.

    3. Reload the configuration and verify the file system:

      cvmfs_config probe
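
To illustrate the comma-separated format of CVMFS_REPOSITORIES, the sketch below prints a default.local body for any number of repository names. The function name `render_default_local` is hypothetical; the proxy and quota values are simply the ones used in the steps above (CVMFS_QUOTA_LIMIT is expressed in megabytes).

```shell
# Print a default.local body for the repositories given as arguments.
# CVMFS_HTTP_PROXY and CVMFS_QUOTA_LIMIT reuse the values from the steps above.
render_default_local() (
    repos=""
    for repo in "$@"; do
        # Join the names with commas and no spaces.
        repos="${repos:+$repos,}$repo"
    done
    printf 'CVMFS_HTTP_PROXY=DIRECT\n'
    printf 'CVMFS_QUOTA_LIMIT=20000\n'
    printf 'CVMFS_REPOSITORIES=%s\n' "$repos"
)

# Preview only: nothing is written to /etc/cvmfs.
render_default_local gwosc.osgstorage.org software.igwn.org
```

Once the preview looks right, the output could be piped into `sudo tee /etc/cvmfs/default.local` to write the file.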
      

Mounting a CVMFS repository

Repositories can be mounted manually by adding their name to the CVMFS_REPOSITORIES variable in /etc/cvmfs/default.local, e.g.:

CVMFS_REPOSITORIES="software.igwn.org"

See below for details about mounting repositories on your platform.

After attempting to mount a repository, you should probe it to confirm that it works:

cvmfs_config probe software.igwn.org

Mounting a CVMFS repository on Linux

On Linux systems that use the EGI or OSG CVMFS configuration repositories, it is unlikely that you will need to manually mount the repository, or even add it to the CVMFS_REPOSITORIES variable.

autofs and cvmfs2 will automatically mount the repository as soon as any user attempts to access it.
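
Because mounting is triggered by access, simply listing a repository directory is enough to test it. The following is a minimal sketch; the function name `cvmfs_repo_available` is hypothetical, and the optional second argument exists only so the check can be pointed at a directory other than /cvmfs.

```shell
# Report whether a repository is reachable below a CVMFS root directory.
#   $1: repository name, e.g. software.igwn.org
#   $2: CVMFS root (optional, defaults to /cvmfs)
cvmfs_repo_available() (
    repo="$1"
    root="${2:-/cvmfs}"
    # Listing the directory is the access that prompts autofs to mount it.
    ls "$root/$repo" >/dev/null 2>&1
)

for repo in oasis.opensciencegrid.org software.igwn.org; do
    if cvmfs_repo_available "$repo"; then
        echo "$repo: available"
    else
        echo "$repo: not available"
    fi
done
```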

Mounting a CVMFS repository on macOS

On macOS there is no autofs service to automatically mount CVMFS repositories, so they must be manually mounted as follows:

sudo mkdir -p /cvmfs/<repo-name>
sudo mount -t cvmfs <repo-name> /cvmfs/<repo-name>

For example:

sudo mkdir -p /cvmfs/software.igwn.org
sudo mount -t cvmfs software.igwn.org /cvmfs/software.igwn.org
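
When several repositories are needed, the same two commands repeat for each name. The sketch below only prints the commands (a dry run) so they can be reviewed first; the function name `print_mount_commands` is hypothetical.

```shell
# Print the mkdir/mount commands for each repository given as an argument.
# This is a dry run: the commands are printed, not executed.
print_mount_commands() (
    for repo in "$@"; do
        printf 'sudo mkdir -p /cvmfs/%s\n' "$repo"
        printf 'sudo mount -t cvmfs %s /cvmfs/%s\n' "$repo" "$repo"
    done
)

print_mount_commands oasis.opensciencegrid.org software.igwn.org
```

After reviewing the printed commands, they could be run by piping the output to `sh`.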

Authorised access

Some CVMFS repositories are configured to require users to present authorisation credentials to gain access.

To configure CVMFS to enable access to restricted repositories:

  1. Install cvmfs-x509-helper to support authorised access via SciTokens, using the command for your package manager (Apt, dnf, or Yum respectively):

    apt-get -y install cvmfs-x509-helper
    
    dnf -y install cvmfs-x509-helper
    
    yum -y install cvmfs-x509-helper
    
  2. See How to generate a SciToken for details on how to generate a SciToken based on the (digital) identity you hold. Short version:

    Generating a SciToken

    conda install -c conda-forge htgettoken
    htgettoken --scopes read:/kagra read:/ligo read:/virgo
    

Each repository below lists the relevant scope required for token-based authorisation.
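
After generating a token, it can help to check that a token file actually exists. The sketch below follows the file-based steps of the WLCG Bearer Token Discovery convention ($BEARER_TOKEN_FILE, then $XDG_RUNTIME_DIR/bt_u<uid>, then /tmp/bt_u<uid>). That htgettoken writes to one of these locations in your setup is an assumption, and the function name `bearer_token_path` is hypothetical.

```shell
# Print the path where a bearer token file would be looked up, following the
# file-based steps of the WLCG Bearer Token Discovery convention.
bearer_token_path() (
    if [ -n "${BEARER_TOKEN_FILE:-}" ]; then
        echo "$BEARER_TOKEN_FILE"
    elif [ -n "${XDG_RUNTIME_DIR:-}" ]; then
        echo "$XDG_RUNTIME_DIR/bt_u$(id -u)"
    else
        echo "/tmp/bt_u$(id -u)"
    fi
)

token_file="$(bearer_token_path)"
# -s is true only if the file exists and is not empty.
if [ -s "$token_file" ]; then
    echo "Found a token file at $token_file"
else
    echo "No token file at $token_file"
fi
```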

Useful CVMFS repositories

This section describes a number of useful CVMFS repositories, and how to configure them.

All of the repositories below should be automatically mounted on systems that have the CVMFS client configured using the EGI or OSG configuration repositories.

OASIS (oasis.opensciencegrid.org)

Name: oasis.opensciencegrid.org
Token scopes: -

OASIS is the OSG Application Software Installation Service, the recommended way to distribute software on the Open Science Grid.

IGWN Software (software.igwn.org)

Name: software.igwn.org
Token scopes: -

IGWN distributes its software in a dedicated CVMFS repository called software.igwn.org. This repo is the host for the IGWN Conda Distribution.

IGWN CVMFS software is built for Linux

Most of the software available on software.igwn.org (including the IGWN Conda Distribution) is compiled for Linux and so is unlikely to work on macOS. However, there are data and other files distributed in the OASIS repo that may be used on that platform.

You should now be able to see the IGWN Conda Distribution, as well as a few other pieces of software from IGWN:

$ /cvmfs/software.igwn.org/conda/condabin/conda --version
conda 4.9.2

Older IGWN Conda Distribution environments require OASIS

To use IGWN Conda Distribution environments older than 20230523, you will need to ensure that the oasis.opensciencegrid.org repository is also configured.

Proprietary IGWN data

Only for registered IGWN members

Access to proprietary IGWN data distributed via CVMFS is restricted to registered IGWN collaboration members. For information on accounts and access to collaboration services, see Accounts.

For information on how to access Open Data via CVMFS, see GW Open Data below.

Proprietary IGWN data are distributed via CVMFS to make them available to distributed workflows. Each collaboration operates its own data origin (canonical copy server) and makes its data available via a separate CVMFS repository.

The files are discoverable using the GWDataFind service hosted at datafind.igwn.org.

Discovering data via CVMFS

See CVMFS data discovery for details on discovering data available via CVMFS.

KAGRA data (kagra.storage.igwn.org)

Name: kagra.storage.igwn.org
Token scopes: read:/kagra

The KAGRA collaboration operates a data origin and CVMFS repository for data from the KAGRA instrument.

Work in progress

The KAGRA data origin and CVMFS repositories are not yet configured. Please watch this space, or contact the IGWN Computing group for more details or to offer effort to complete this work.

LIGO data (ligo.storage.igwn.org)

Name: ligo.storage.igwn.org
Token scopes: read:/ligo

The LIGO collaboration operates a data origin and CVMFS repository for data from the LIGO-Hanford and LIGO-Livingston instruments.

Virgo data (virgo.storage.igwn.org)

Name: virgo.storage.igwn.org
Token scopes: read:/virgo

The Virgo collaboration operates a data origin and CVMFS repository for data from the Virgo instrument.

Shared data (shared.storage.igwn.org)

Name: shared.storage.igwn.org
Token scopes: read:/shared

IGWN operates a 'shared' data origin for derived and/or auxiliary analysis data files that may be shared by many workflows.

Work in progress

The shared data origin and CVMFS repositories are not yet configured. Please watch this space, or contact the IGWN Computing group for more details or to offer effort to complete this work.
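
The token scopes listed for the proprietary repositories above can be collected into a small lookup, which may be convenient when scripting token requests. This is only a sketch built from the listings in this section; the function name `required_scope` is hypothetical, and the KAGRA and shared entries refer to repositories that are not yet configured.

```shell
# Print the token scope listed in this section for a repository name.
# Prints nothing for repositories that list no scope.
required_scope() (
    case "$1" in
        kagra.storage.igwn.org)  echo "read:/kagra" ;;   # not yet configured
        ligo.storage.igwn.org)   echo "read:/ligo" ;;
        virgo.storage.igwn.org)  echo "read:/virgo" ;;
        shared.storage.igwn.org) echo "read:/shared" ;;  # not yet configured
        *) : ;;
    esac
)

for repo in ligo.storage.igwn.org virgo.storage.igwn.org gwosc.osgstorage.org; do
    scope="$(required_scope "$repo")"
    echo "$repo: ${scope:-no token required}"
done
```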

GW Open Data (gwosc.osgstorage.org)

Name: gwosc.osgstorage.org
Token scopes: -

Data published by the Gravitational Wave Open Science Center are directly downloadable from their website, but are also made available via CVMFS. These files are discoverable using the GWDataFind service hosted at datafind.gw-openscience.org.

Discovering and reading data from gwosc.osgstorage.org

First install some software to discover data, and read it:

conda install -c conda-forge gwdatafind gwpy python-ldas-tools-framecpp

Then we can query for a file URL around GW150914, discover the channels (data streams) it contains, and read the strain channel:

>>> from gwdatafind import find_urls
>>> files = find_urls('H', 'H1_LOSC_4_V1', 1126259460, 1126259464, host='datafind.gw-openscience.org')
>>> print(files)
['file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf']
>>> from gwpy.io.gwf import get_channel_names
>>> print(get_channel_names(files[0]))
['H1:LOSC-DQMASK', 'H1:LOSC-INJMASK', 'H1:LOSC-STRAIN']
>>> from gwpy.timeseries import TimeSeries
>>> print(TimeSeries.read(files, 'H1:LOSC-STRAIN', start=1126259460, end=1126259464))
TimeSeries([2.36153281e-19, 2.43491425e-19, 2.21593212e-19, ...,
            1.42904638e-19, 1.36820401e-19, 1.21621713e-19]
           unit: dimensionless,
           t0: 1126259460.0 s,
           dt: 0.000244140625 s,
           name: H1:LOSC-STRAIN,
           channel: H1:LOSC-STRAIN)

Many other tools and libraries are available to read GWF files, both in Python and in other languages.
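
Tools that expect a plain path rather than a URL need the file:// prefix removed from the URLs returned by GWDataFind. The sketch below does this in the shell and then checks whether the file is readable; the function name `file_url_to_path` is hypothetical.

```shell
# Convert a file:// URL, as returned by GWDataFind, into a local path.
file_url_to_path() (
    url="$1"
    case "$url" in
        file://localhost/*) echo "${url#file://localhost}" ;;
        file:///*)          echo "${url#file://}" ;;
        *) echo "not a local file URL: $url" >&2; return 1 ;;
    esac
)

path="$(file_url_to_path 'file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf')"
echo "$path"

if [ -r "$path" ]; then
    echo "File is readable"
else
    echo "File is not readable (is gwosc.osgstorage.org mounted?)"
fi
```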

OSG Singularity images (singularity.opensciencegrid.org)

Name: singularity.opensciencegrid.org
Token scopes: -

IGWN (and other groups) publish Singularity container images via CVMFS for distributed availability, including to HTC grid jobs.

Publishing images to CVMFS

See Publishing Singularity Images To CVMFS for details on how to publish Docker images to CVMFS for distribution in singularity.opensciencegrid.org.