NC State High Performance Computing

Getting Started with HPC storage systems

    HPC users have a number of file systems available for their use. Effective use of the HPC resources requires some understanding of the types of available file systems and their intended use.

    The general types of storage available are:

      • Home Directory
      • Scratch Space (local and shared)
      • Directories for sharing with support staff (/share/Support and /share/help)
      • Mass Storage System

    The following sections describe each of these in some detail, including the intended use of each storage resource.

  • Home Directory

    Each user has a home directory, which is shared by all cluster nodes. Individual user file quotas are enforced on home directories. Total available space in the home file system is small by design, and these quotas are used to manage the available space. Home directories are intended to hold commonly used scripts, environment configuration files, and modest-size source trees. Home directories are backed up daily, and only one copy of each file is retained in the backup. Files that have been deleted for more than 5 days are subject to removal from the backup.

  • Scratch Space

    Scratch space is intended to be used for the storage requirements for running jobs. In particular, large input or output files should use scratch space during job execution. Users should create a directory for their use to avoid potential file name conflicts with other users.

    Scratch space is not backed up.
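    Creating a per-user directory can be sketched as follows; this is a minimal example in which /share/group_name stands for the shared scratch directory described below, and $USER keeps the directory name unique:

```shell
# Minimal sketch: create a personal subdirectory in shared scratch to
# avoid file name conflicts with other users. group_name is a
# placeholder for your project's group name.
mkdir -p /share/group_name/$USER
```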

    • Local Scratch Space

      Local scratch space is directly connected to the compute node. On the henry2 Linux cluster the local scratch file system available to users is /scratch. Local scratch file system contents are available only on the node to which the file system is directly connected. Because there is no way to know ahead of time which nodes a job will be assigned, use of local scratch space must be managed from the user's LSF script: the script must both move files into the space and remove them after execution completes. Local scratch files on the cluster are subject to immediate removal at the completion of the LSF job.
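      As a rough sketch, an LSF script that stages data through local scratch might look like the following; the #BSUB options, application name, and file names are illustrative only:

```shell
#!/bin/bash
#BSUB -n 1            # illustrative resource request
#BSUB -W 30           # illustrative wall-clock limit (minutes)

# Stage files into node-local scratch, run, stage results out, and
# guarantee cleanup when the job ends. LSF sets $LSB_JOBID at run time.
WORKDIR=/scratch/$USER/$LSB_JOBID
mkdir -p "$WORKDIR"
trap 'rm -rf "$WORKDIR"' EXIT    # remove local files on job completion

cp ~/input.dat "$WORKDIR/"       # move input into local scratch
cd "$WORKDIR"
./my_app input.dat > output.dat  # my_app is a hypothetical application
cp output.dat ~/results/         # copy results back before cleanup
```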

      Local scratch space is relatively small - a few GB to a few dozen GB, depending on the node - and must be carefully managed by the user. Except for a few very special cases, use of local scratch space should be avoided.

      Like all Linux systems, compute nodes have a world writable /tmp file system. This space is essential for the proper operation of the operating system and many applications. /tmp on the compute nodes is very small (~2GB) and should never be used for user file storage.

    • Shared Scratch Space

      The henry2 Linux cluster also has shared scratch space. This file system is attached to the login nodes and to all of the compute nodes. Each HPC project has a group-writable scratch directory /share/group_name. If group_name is not known, it can be found using the groups command. The first group name listed in the output is the default group and the one that should have a directory under /share.
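      For example, the default group name and the corresponding /share path can be found with:

```shell
# The first entry in `groups` output is the default group; the shared
# scratch directory should be /share/<that name>.
GROUP_NAME=$(groups | awk '{print $1}')
echo "/share/$GROUP_NAME"
```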

      Shared scratch file systems are subject to periodic purge and are not backed up. A per project quota is enforced on each shared scratch file system.

      Any file in shared scratch space is subject to removal at any time. A purge is used to maintain free space in the file system. While the purge generally allows files to remain on the shared scratch file systems for a week or more, during periods of high disk use files that are only a day or two old may also be removed.

      /share directories are stored in the /gpfs_common file system. The /share/group_name symbolic link should always be used when referencing this space. From time to time it may be necessary to move the actual location of these directories. Using the symbolic link will ensure that scripts continue to work regardless of any changes in the actual path.

  • Directories for sharing with support staff (/share/Support and /share/help)

    Support staff can write to /share/Support, which users can read.
    Users can write to /share/help, which support staff can read. /share/help is configured as a drop box: users can write files into the directory but cannot list or read its contents.
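    Sending a file to support staff is then a single copy; the file name here is illustrative:

```shell
# Copy a file into the drop box for support staff to read.
# job_debug.log is an illustrative file name.
cp job_debug.log /share/help/
```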

  • Mass Storage System

    Mass storage space is intended to hold important files that are too large to be stored in users' home directories. Users requiring mass storage space should request that a mass storage directory be created for their use.

    It is anticipated that research groups will have up to a 1 TB group quota for mass storage space, with options to purchase additional quota if required.

    Mass storage space is available from all login nodes. It is not available from compute nodes and cannot be used as an alternative to scratch space for running jobs.

    • Configuration

      The mass storage file system is /ncsu/volume1. It is available via NFS from the HPC login nodes.

    • Space for User Maintained Executables

      Acceptable Use Policy
        The directory tree /usr/local/usrapps is intended to provide space
          for user-installed and user-maintained applications
        The space is not to be used for data or as working space from which
          to execute jobs
        Applications must be maintained and patched to minimize potential
          security vulnerabilities
        Access should be managed via Linux group permissions - care needs
          to be taken by the group installing an application to set access
          appropriately for any license restrictions
        Applications that require root access to install are not permitted.
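      Group-based access control might be set up as in the following sketch; the project name my_project and the application path are illustrative, not actual HPC names:

```shell
# Restrict an application directory to members of the owning project
# group. The path and group name are hypothetical examples.
APP_DIR=/usr/local/usrapps/my_project/my_app
chgrp -R my_project "$APP_DIR"   # hand ownership to the project group
chmod -R o-rwx "$APP_DIR"        # no access for users outside the group
chmod -R g+rX "$APP_DIR"         # group may read files and enter directories
```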

      Procedure to request use of space under /usr/local/usrapps

        Submit a request to the oit_hpc ServiceNow group - email to
        The request should include:
          The name of the HPC project that will be responsible for the application
          The name of the application to be installed
          A statement that the group is authorized (by the owner of the
            application) to install the application on the HPC Linux cluster
        A directory will be created with group read/write access for the
          requesting project
        The project group will be responsible for installing and maintaining
          the application

    • Backups

      Backup frequency for the HPC storage system is daily from the /home, /ncsu/volume1, and /ncsu/volume2 file systems to a tape library. One copy of each file is maintained in the tape library. When a file is modified on disk the new version of the file replaces any previous backup of that file.

      Files removed from /home, /ncsu/volume1, or /ncsu/volume2 file system will remain in the backup for at least five days.

      A consequence of this backup policy is that when a file is updated under the same name, the new version overwrites the previous backup copy during the daily update. If earlier versions of a file may be needed, use a naming scheme that gives each version a unique file name so that each version retains its own backup copy.
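      One simple naming scheme is to embed a timestamp in the file name; results.dat is an illustrative file name:

```shell
# Each saved version gets a unique, timestamped name, so each version
# keeps its own copy in the daily backup rather than being overwritten.
cp results.dat "results_$(date +%Y%m%d_%H%M%S).dat"
```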

    • HSM

      An additional level of management is utilized on /ncsu/volume1. Tivoli Space Manager is used to migrate older, larger files from the file system disk to tape. Migrated files are retrieved automatically if they are accessed.

      Space Manager seeks to maintain the disk usage level of /ncsu/volume1 between 85% and 90%.
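      A migrated (stubbed) file can often be spotted by comparing its logical size with its on-disk footprint, as sketched below; bigfile.dat is an illustrative file name:

```shell
# A migrated file typically still reports its full logical size via
# ls -l, while du shows the small on-disk footprint that remains until
# the file is recalled from tape. bigfile.dat is illustrative.
ls -lh bigfile.dat   # logical size
du -h  bigfile.dat   # on-disk blocks (small while migrated to tape)
```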

Last modified: December 16 2017 12:58:06.