CernVM File System (CVMFS)

What is CVMFS?

The CernVM File System ("CernVM-FS" or "CVMFS") is a tool for efficient global distribution of software and data that do not change frequently. Its name reflects its origins in virtual machines used by the high energy physics community, but it has much wider applicability. It caches files on local disk so that, after the initial download, file access for the client is fast.

Within IGWN, CVMFS is being used to distribute both instrument data ("frame files") and analysis software for use at the shared computing centres and by distributed workflows. You may elect to install CVMFS on your workstation for ease of access to recent data and software.

Platform compatibility

CVMFS is fully developed and tested for Linux, and mostly works on macOS (with the explicit exception of X.509 authentication). Native Windows is not supported, but CVMFS works well on the Windows Subsystem for Linux (WSL) version 2; Windows users should follow the Linux instructions for their WSL distribution of choice, noting any extra steps for WSL users along the way.

Full documentation of CVMFS and its inner workings can be browsed online at https://cvmfs.readthedocs.io/.

Installing the CVMFS client

First, install the client using the instructions specific to your platform:

Debian/Ubuntu (Apt)

  1. Configure the necessary extra Apt repository for CERN-maintained CVMFS using these instructions.

  2. Configure the necessary extra Apt repository for cvmfs-contrib using these instructions.

  3. Update and install the relevant packages:

    sudo apt-get update
    sudo apt-get install cvmfs cvmfs-config-osg
    
macOS

  1. Install macFUSE by downloading the .dmg installer from the latest release.

  2. Install the CVMFS client using these instructions.

  3. Reboot your machine to finish installing CVMFS.

  4. Manually configure the OSG config repo:

    git clone https://github.com/opensciencegrid/cvmfs-config-osg.git /tmp/cvmfs-config-osg
    sudo make install -C /tmp/cvmfs-config-osg
    rm -rf /tmp/cvmfs-config-osg
    sudo ln -s /etc/cvmfs/config.d/config-osg.opensciencegrid.org.conf /etc/cvmfs/default.d/
    
  5. Manually mount the OSG config repo:

    sudo mkdir -p /cvmfs/config-osg.opensciencegrid.org
    sudo mount -t cvmfs config-osg.opensciencegrid.org /cvmfs/config-osg.opensciencegrid.org
    
yum-based distributions

  1. Install and configure the OSG repositories using these instructions.

    Install OSG ≤3.5 for your platform

    In order to use X.509, it is important to install the OSG 3.5 repository, and not the 3.6 or later versions.

  2. Install the relevant packages:

    sudo yum -y install cvmfs cvmfs-config-osg cvmfs-x509-helper
    

Once the client is installed, you should configure it (on all platforms) using the following short set of steps:

  1. Run a basic setup:

    sudo cvmfs_config setup
    

    Extra step for WSL users

    If you are running a Linux distribution under the Windows Subsystem for Linux, the above cvmfs_config setup call does nothing except prompt you to run the following instead:

    sudo cvmfs_config wsl2_start
    
  2. Create a default.local configuration for CVMFS that references the repositories you care about:

    sudo bash -c 'cat > /etc/cvmfs/default.local' << EOF
    CVMFS_REPOSITORIES=<my-cvmfs-repo>
    CVMFS_QUOTA_LIMIT=20000
    CVMFS_HTTP_PROXY=DIRECT
    EOF
    

    Configuring CVMFS_REPOSITORIES

    The CVMFS_REPOSITORIES variable should be defined as a comma-separated list of repository domains, e.g.

    CVMFS_REPOSITORIES=oasis.opensciencegrid.org,virgo.infn.it
    

    See the Useful CVMFS repositories section below for details of useful repositories and how to configure them.

  3. Reload the configuration and verify the file system:

    cvmfs_config probe
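
The repository sections of this guide each ask you to update CVMFS_REPOSITORIES "to include" a new domain, which means extending the existing comma-separated list rather than replacing it. The following is a minimal sketch of that merge in Python. The function name add_repositories is hypothetical and not part of CVMFS; it only transforms the text of a default.local file and leaves writing the result back to /etc/cvmfs/default.local (with root privileges) to you.

```python
def add_repositories(config_text, new_repos):
    """Return config_text with new_repos merged into CVMFS_REPOSITORIES.

    Existing repositories keep their order, duplicates are skipped, and
    all other configuration lines are left untouched.
    """
    lines = config_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith("CVMFS_REPOSITORIES="):
            current = line.split("=", 1)[1].strip().strip("'\"")
            repos = [r.strip() for r in current.split(",") if r.strip()]
            break
    else:
        # No CVMFS_REPOSITORIES line yet: start a new list
        index, repos = None, []

    for repo in new_repos:
        if repo not in repos:
            repos.append(repo)

    new_line = "CVMFS_REPOSITORIES=" + ",".join(repos)
    if index is None:
        lines.append(new_line)
    else:
        lines[index] = new_line
    return "\n".join(lines) + "\n"


example = (
    "CVMFS_REPOSITORIES=oasis.opensciencegrid.org\n"
    "CVMFS_QUOTA_LIMIT=20000\n"
    "CVMFS_HTTP_PROXY=DIRECT\n"
)
print(add_repositories(example, ["gwosc.osgstorage.org"]), end="")
```

After writing the merged text back to the configuration file, run cvmfs_config probe again to confirm that every listed repository can be mounted.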
    

Useful CVMFS repositories

This section describes a number of useful CVMFS repositories, and how to configure them.

OASIS (oasis.opensciencegrid.org)

OASIS is the OSG Application Software Installation Service, the recommended way to install software on the Open Science Grid. IGWN uses it as the host repository for the IGWN Conda Distribution.

OASIS software is (mostly) built for Linux

Most of the software available on OASIS (including the IGWN Conda Distribution) is compiled for Linux, so it is unlikely to work on macOS. However, the OASIS repo also distributes data and other files that may be used on that platform.

To configure OASIS:

  1. Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include oasis.opensciencegrid.org:

    CVMFS_REPOSITORIES=oasis.opensciencegrid.org
    

    Extra step for macOS users

    On macOS you need to manually mount the new repo:

    sudo mkdir -p /cvmfs/oasis.opensciencegrid.org
    sudo mount -t cvmfs oasis.opensciencegrid.org /cvmfs/oasis.opensciencegrid.org
    
  2. Probe the repo to check that it works:

    cvmfs_config probe oasis.opensciencegrid.org
    
  3. You should now be able to see the IGWN Conda Distribution, as well as a large collection of software from IGWN and other projects:

    $ /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/condabin/conda --version
    conda 4.9.2
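
If you want to check several repositories from a script rather than by hand, one option is simply to list each repository's top-level directory. The sketch below assumes the repositories appear under /cvmfs, as in the paths used throughout this guide; the helper name repository_entries is hypothetical. On Linux the first access normally triggers the mount, whereas on macOS the repository must already have been mounted manually as described above.

```python
import os


def repository_entries(repo, root="/cvmfs"):
    """Return the sorted top-level entries of a repository, or [] if unreadable."""
    path = os.path.join(root, repo)
    try:
        return sorted(os.listdir(path))
    except OSError:
        # Directory missing, not mounted, or access denied
        return []


for repo in ["oasis.opensciencegrid.org"]:
    entries = repository_entries(repo)
    status = "ok" if entries else "not available"
    print(f"{repo}: {status}")
```

An empty result does not say why the repository is unavailable; cvmfs_config probe remains the better diagnostic in that case.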
    

IGWN proprietary data (igwn.osgstorage.org)

Only for registered IGWN members

The IGWN proprietary data distributed with CVMFS are accessible only to registered IGWN collaboration members. For information on accounts and access to collaboration services see Accounts.

For information on how to access Open Data via CVMFS, see GW Open Data below.

Prerequisites

Please complete the configuration of oasis.opensciencegrid.org before configuring igwn.osgstorage.org.

Not available on macOS

The macOS CVMFS client does not currently work with X.509-authenticated repositories. Support may become available once the shift to SciTokens is complete.

Proprietary IGWN data are distributed via CVMFS to make them available to jobs using the distributed HTC grid supported by the OSG. Access to the files is restricted to authorised IGWN members who must include authorisation credentials with jobs that require access to the data. The files are discoverable using the GWDataFind service hosted at datafind.ligo.org.

To configure IGWN proprietary data via CVMFS:

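  1. Install cvmfs-x509-helper to support authorised access via SciTokens or X.509, using the command that matches your platform's package manager:

    # Apt-based distributions
    sudo apt-get -y install cvmfs-x509-helper

    # dnf-based distributions
    sudo dnf -y install cvmfs-x509-helper

    # yum-based distributions
    sudo yum -y install cvmfs-x509-helper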
    
  2. Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include igwn.osgstorage.org and ligo.osgstorage.org:

    CVMFS_REPOSITORIES=igwn.osgstorage.org,ligo.osgstorage.org,oasis.opensciencegrid.org
    
  3. Generate a SciToken or X.509 credential based on the (digital) identity you hold; see How to generate a SciToken or How to generate a credential (X.509) for details. Short version:

    Generating an X.509 credential

    conda install -c conda-forge ciecplib
    ecp-get-cert -i login.ligo.org -u marie.curie
    
  4. Probe the repos to check that they work:

    cvmfs_config probe ligo.osgstorage.org
    cvmfs_config probe igwn.osgstorage.org
    

    This might not work

    If you don't have a valid X.509 credential, the cvmfs_config probe will fail; see step 3 above.

Once you have a valid X.509 credential, you should be able to use GWDataFind to discover data available via CVMFS.
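
Before probing, it can help to confirm that a credential file is actually present. The sketch below assumes the conventional grid location for an X.509 credential: the path named by the X509_USER_PROXY environment variable if set, otherwise /tmp/x509up_u followed by your numeric user ID. Whether your tools use that location is an assumption you should verify on your own system, and the function name x509_credential_path is hypothetical. The check covers existence only; it does not validate the certificate or its expiry.

```python
import os


def x509_credential_path(environ=None, uid=None):
    """Return the conventional X.509 credential path (assumed location)."""
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid
    # X509_USER_PROXY takes precedence over the per-user default in /tmp
    return environ.get("X509_USER_PROXY") or f"/tmp/x509up_u{uid}"


path = x509_credential_path()
if os.path.isfile(path):
    print(f"found credential file at {path}")
else:
    print(f"no credential file at {path}; generate one before probing")
```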

Discovering data via CVMFS

See CVMFS data discovery for details on discovering data available via CVMFS.

GW Open Data (gwosc.osgstorage.org)

Data published by the Gravitational Wave Open Science Center are directly downloadable from their website, but are also made available via CVMFS. These files are discoverable using the GWDataFind service hosted at datafind.gw-openscience.org.

To configure GW Open Data via CVMFS:

  1. Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include gwosc.osgstorage.org:

    CVMFS_REPOSITORIES=gwosc.osgstorage.org
    

    Extra step for macOS users

    On macOS you need to manually mount the new repo:

    sudo mkdir -p /cvmfs/gwosc.osgstorage.org
    sudo mount -t cvmfs gwosc.osgstorage.org /cvmfs/gwosc.osgstorage.org
    
  2. Probe the repo to check that it works:

    cvmfs_config probe gwosc.osgstorage.org
    

The GWOSC data are not restricted in the same way as those from igwn.osgstorage.org, meaning that as soon as you have configured the CVMFS repository, you should be able to query for and read data:

Discovering and reading data from gwosc.osgstorage.org

First, install the software needed to discover and read the data:

conda install -c conda-forge gwdatafind gwpy python-ldas-tools-framecpp

Then we can query for a file URL around GW150914, discover the channels (data streams) it contains, and read the strain channel:

>>> from gwdatafind import find_urls
>>> files = find_urls('H', 'H1_LOSC_4_V1', 1126259460, 1126259464, host='datafind.gw-openscience.org')
>>> print(files)
['file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf']
>>> from gwpy.io.gwf import get_channel_names
>>> print(get_channel_names(files[0]))
['H1:LOSC-DQMASK', 'H1:LOSC-INJMASK', 'H1:LOSC-STRAIN']
>>> from gwpy.timeseries import TimeSeries
>>> print(TimeSeries.read(files, 'H1:LOSC-STRAIN', start=1126259460, end=1126259464))
TimeSeries([2.36153281e-19, 2.43491425e-19, 2.21593212e-19, ...,
            1.42904638e-19, 1.36820401e-19, 1.21621713e-19]
           unit: dimensionless,
           t0: 1126259460.0 s,
           dt: 0.000244140625 s,
           name: H1:LOSC-STRAIN,
           channel: H1:LOSC-STRAIN)

Many other tools and libraries are available to read GWF files, both in Python and in other languages.
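
Tools that expect a plain filesystem path rather than a file:// URL need the URL converted first. The following is a minimal sketch using only the Python standard library; the helper name local_path is hypothetical, and the URL is the one returned in the example above. The final check reports whether the file is readable on your machine, which depends on gwosc.osgstorage.org being mounted.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path(url):
    """Convert a file:// URL, as returned by find_urls, to a local Path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url}")
    # The host part ("localhost") is dropped; only the path is kept
    return Path(parsed.path)


url = (
    "file://localhost/cvmfs/gwosc.osgstorage.org/gwdata/O1/strain.4k/"
    "frame.v1/H1/1126170624/H-H1_LOSC_4_V1-1126256640-4096.gwf"
)
path = local_path(url)
print(path)
print("readable" if path.is_file() else "missing: is gwosc.osgstorage.org mounted?")
```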

OSG Singularity images (singularity.opensciencegrid.org)

IGWN (and other groups) publish Singularity container images via CVMFS for distributed availability, including to HTC grid jobs.

To configure OSG Singularity images via CVMFS:

  1. Update the CVMFS_REPOSITORIES configuration variable in /etc/cvmfs/default.local to include singularity.opensciencegrid.org:

    CVMFS_REPOSITORIES=oasis.opensciencegrid.org,singularity.opensciencegrid.org
    

    Keeping OASIS is optional

    In this example we also configure the oasis.opensciencegrid.org repo, mainly because it provides a build of singularity itself, but we believe that singularity.opensciencegrid.org can work independently of oasis.opensciencegrid.org if you have your own build of singularity.

    Extra step for macOS users

    On macOS you need to manually mount the new repo:

    sudo mkdir -p /cvmfs/singularity.opensciencegrid.org
    sudo mount -t cvmfs singularity.opensciencegrid.org /cvmfs/singularity.opensciencegrid.org
    
  2. Probe the repo to check that it works:

    cvmfs_config probe singularity.opensciencegrid.org
    
  3. You can now run a simple test to validate that singularity can execute images:

    Validating singularity.opensciencegrid.org

    $ /cvmfs/oasis.opensciencegrid.org/mis/singularity/bin/singularity \
          exec \
          /cvmfs/singularity.opensciencegrid.org/igwn/software:el7 \
          cat /etc/os-release
    NAME="Scientific Linux"
    VERSION="7.9 (Nitrogen)"
    ID="scientific"
    ID_LIKE="rhel centos fedora"
    VERSION_ID="7.9"
    PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
    HOME_URL="http://www.scientificlinux.org//"
    BUG_REPORT_URL="mailto:scientific-linux-devel@listserv.fnal.gov"
    
    REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
    REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
    REDHAT_SUPPORT_PRODUCT="Scientific Linux"
    REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
    

Publishing images to CVMFS

See Publishing Singularity Images To CVMFS for details on how to publish Docker images to CVMFS for distribution in singularity.opensciencegrid.org.