Getting Started with HPC storage systems
HPC users have a number of file systems available for their
use. Effective use of the HPC resources requires some
understanding of the types of available file systems and
their intended use.
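The general types of storage available are:
- home directory
- local scratch space
- shared scratch space
- mass storage space
- space for user maintained executables
The following sections describe each of these in some detail, including the intended use of these storage resources.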
Each user has a home directory, which is shared by all cluster nodes. Individual user file quotas are enforced on home directories: total available space in the home file system is relatively small by design, and the quotas are used to manage it. Home directories are intended to hold commonly used scripts, environment configuration files, and modest-size source trees. Home directories are backed up daily, and only one copy of each file is retained in the backup. Files that were deleted more than 5 days ago are subject to removal from the backup.
Scratch space is intended to meet the storage requirements of running jobs. In particular, large input or output files should be kept in scratch space during job execution. Scratch file systems are world writable, so users should create a directory for their own use to avoid potential file name conflicts with other users.
Scratch space is not backed up.
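The per-user directory suggested above could be created at the start of a job along the following lines. This is a minimal sketch, not a site-provided tool: the function name `make_user_scratch_dir` is hypothetical, and restricting the directory to its owner (mode 0700) is an assumption rather than a documented requirement.

```python
import getpass
import os


def make_user_scratch_dir(scratch_root, username=None):
    """Create <scratch_root>/<username> if needed and return its path."""
    # Default to the login name of the user running the job.
    username = username or getpass.getuser()
    path = os.path.join(scratch_root, username)
    # exist_ok=True makes the call safe to repeat at the top of every job.
    os.makedirs(path, exist_ok=True)
    # Assumption: owner-only access. makedirs honours the umask, so the
    # mode is set explicitly here instead of relying on its mode argument.
    os.chmod(path, 0o700)
    return path
```

A job would call it with the scratch file system it intends to use, for example `make_user_scratch_dir("/scratch")`, and then write all of its files beneath the returned path.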
Local scratch space is directly connected to the compute node. On the Linux cluster the local scratch file system available to users is /scratch. Its contents are available only on the node to which the file system is directly connected. Because there is no way to know ahead of time which nodes a job will be assigned, use of local scratch space must be managed from the user's LSF script, including both the movement of files to the space and the removal of files after execution completes. Files in local scratch space on the cluster are subject to immediate removal at the completion of the LSF job.
Local scratch space is relatively small (a few GB to a few dozen GB, depending on the node) and must be carefully managed by the user. Except for a few very special cases, use of local scratch space should be avoided.
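For the special cases where local scratch is used, the stage-in, run, stage-out, and clean-up sequence described above might look like the sketch below. It assumes a Python helper invoked from the job script; the name `local_scratch` and its parameters are hypothetical, and the same steps can equally be written directly in the LSF script.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def local_scratch(scratch_root, inputs, results_dir):
    """Stage inputs into a private work directory, then save results and clean up."""
    # A uniquely named directory avoids collisions with other jobs on the node.
    work = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        for src in inputs:
            shutil.copy2(src, work)  # stage in
        yield work  # the caller runs its computation inside `work`
        # Stage out: reached only if the caller's block finished without error.
        os.makedirs(results_dir, exist_ok=True)
        staged = {os.path.basename(p) for p in inputs}
        for name in os.listdir(work):
            path = os.path.join(work, name)
            if name not in staged and os.path.isfile(path):
                shutil.copy2(path, results_dir)
    finally:
        # Always remove the work directory, even when the job step failed.
        shutil.rmtree(work, ignore_errors=True)
```

Inside a `with local_scratch("/scratch", inputs, results_dir) as work:` block the job reads and writes only under `work`. New top-level files are copied to `results_dir` on success, and the work directory is removed in every case.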
Like all Linux systems, compute nodes have a world writable /tmp file system. This space is essential for the proper operation of the operating system and many applications. /tmp on the compute nodes is very small (~2GB) and should never be used for user file storage.
Shared scratch file systems are subject to periodic purge and are not backed up. A per project quota is enforced on each shared scratch file system.
Any file in shared scratch space is subject to removal at any time. A purge is used to maintain free space in the file system. While the purge generally allows files to remain on the shared scratch file systems for a week or more, during periods of high disk use this may not be true and files that are only a day or two old may also be removed by the purge.
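Since files may be purged after as little as a day or two, it can help to list what has been sitting in a shared scratch directory for a while and copy anything important elsewhere. The sketch below is illustrative only: the function name `files_older_than` is hypothetical, and judging age by modification time is an assumption, since the criteria the purge actually uses are not stated here.

```python
import os
import time


def files_older_than(root, days, now=None):
    """Return files under root last modified more than `days` days ago, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # the file disappeared while we were scanning
            if mtime < cutoff:
                old.append((mtime, path))
    return [path for _mtime, path in sorted(old)]
```

For example, `files_older_than(my_scratch_dir, 1)` lists everything untouched for more than a day, which is roughly the set of files that could be removed during periods of high disk use.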
As with local scratch space, this storage is intended to provide the large storage space required by jobs during execution.
A GPFS file system is also available on the Linux cluster (henry2). This file system, /gpfs_share, has a per-project quota and is also not backed up. Codes that spend significant amounts of time doing parallel I/O, and any code using MPI-IO, should use /gpfs_share.
As of spring 2014 the shared scratch file systems (actually now file sets) /share, /share2, and /share3 are also provided via a GPFS file system, /gpfs_common.
It is anticipated that research groups will have up to a 1TB group quota for mass storage space with options to purchase additional quota if required.
Mass storage space is available from all login nodes. It is not available from compute nodes and cannot be used as an alternative to scratch space for running jobs.
Acceptable Use Policy
- Directory tree /usr/local/usrapps is intended to provide space for user installed and maintained applications.
- Space is not to be used for data or as working space from which to execute jobs.
- Applications must be maintained/patched to minimize potential security vulnerabilities.
- Access should be managed via Linux group permissions; care needs to be taken by the group installing an application to set access appropriately for any license restrictions.
- Applications that require root access to install are not permitted.
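As one illustration of managing access through group permissions, the sketch below removes all access for users outside the owning user and group from an installed application tree, which may be appropriate for software whose license covers only one project. The function name `remove_world_access` is hypothetical, and whether this restriction is right depends on the license of the application in question.

```python
import os
import stat


def remove_world_access(root):
    """Clear the 'other' permission bits on root and everything beneath it."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is visited once as dirpath, so handle it with its files.
        for path in [dirpath] + [os.path.join(dirpath, n) for n in filenames]:
            if os.path.islink(path):
                continue  # os.chmod would follow the link and alter its target
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            new_mode = mode & ~0o007  # keep owner and group bits unchanged
            if new_mode != mode:
                os.chmod(path, new_mode)
                changed.append(path)
    return changed
```

The function returns the paths it modified, so running it a second time on the same tree returns an empty list.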
Procedure to request use of space under /usr/local/usrapps
- Submit request to oit_hpc Service Now group - email to email@example.com
- Request should include:
  - Name of HPC Project that will be responsible for the application
  - Name of the application to be installed
  - Statement that the group is authorized (by owner of the application) to install the application on the HPC Linux cluster
- A directory will be created with group read/write access for the requesting project
- Project group will be responsible for installing and maintaining the application
The /home, /ncsu/volume1, and /ncsu/volume2 file systems are backed up daily to a tape library. One copy of each file is maintained in the tape library. When a file is modified on disk, the new version of the file replaces any previous backup of that file.
Files removed from the /home, /ncsu/volume1, or /ncsu/volume2 file systems will remain in the backup for at least five days.
A consequence of the backup policy is that a file updated under the same name overwrites its backup version during the daily update. When previous versions of a file may be needed, use a file naming scheme that gives each version a unique file name.
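One possible naming scheme is to copy a file to a timestamped name before modifying it, so that each saved version is a distinct file in the backup. The sketch below is an assumption about how that might be done; the function name `save_version` and the timestamp format are hypothetical choices.

```python
import os
import shutil
import time


def save_version(path, timestamp=None):
    """Copy `path` to a sibling file with a timestamp in its name; return the new path."""
    # time.localtime(None) means "now"; a fixed timestamp can be passed for testing.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(timestamp))
    base, ext = os.path.splitext(path)
    versioned = f"{base}.{stamp}{ext}"
    # Note: two saves within the same second would share a name and overwrite.
    shutil.copy2(path, versioned)
    return versioned
```

Calling `save_version("params.txt")` before each edit leaves copies such as `params.<timestamp>.txt` alongside the working file.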
HSM
An additional level of management is utilized on /ncsu/volume1. Tivoli Space Manager is used to migrate older, larger files from the file system disk to tape. Migrated files are retrieved automatically if they are accessed.
Space Manager seeks to maintain the disk usage level for /ncsu/volume1 between 85% and 90%.
There are currently two mass storage file systems, /ncsu/volume1 and /ncsu/volume2. Users will only be provided a directory on one of these file systems.
Separate file servers are used for /ncsu/volume1 and /ncsu/volume2. Both file systems are available from login nodes via NFS.
Last modified: February 02 2017 12:17:33.