Storage areas on HPC clusters
Projects and users receive different areas to store files and other data. Some areas are used for temporary files during job execution while others are for storing project data.
Overview
The following table summarizes the different storage options for Betzy, Fram, and Saga. Below the table, we give recommendations and discuss the pros and cons of the various storage areas.
Directory | Purpose | Default quota | Backup |
---|---|---|---|
/cluster/home/$USER ($HOME) | User data | 20 GiB / 100 K files | Only if quota enforced |
/cluster/work/jobs/$SLURM_JOB_ID ($SCRATCH) | Per-job data | N/A | No |
/localscratch/$SLURM_JOB_ID ($LOCALSCRATCH) (Fram/Saga) | Per-job data | Requested size | No |
/cluster/work/users/$USER ($USERWORK) | Staging and job data | N/A | No |
/cluster/projects/<project_name> | Project data | 1 TiB | Yes |
/cluster/shared/<folder_name> | Shared data | Individual | No |
User areas and project areas are private: Data handling and storage policy is documented here.
The $LOCALSCRATCH area is only implemented on Fram and Saga.
In addition to the areas in the table above, the clusters mount the NIRD project areas as /nird/datapeak/NSxxxxK for NIRD Data Peak (TS) projects and /nird/datalake/NSxxxxK for NIRD Data Lake (DL) projects on the login nodes (but not on the compute nodes).
The /cluster file system is a high-performance parallel file system. On Fram, it is a Lustre system with a total storage space of 2.3 PB, and on Saga it is a BeeGFS system with a total storage space of 6.5 PB. For performance optimizations, consult Optimizing storage performance.
Home directory
The home directory is /cluster/home/$USER. The location is stored in the environment variable $HOME. Storage quota is enabled on home directories; the default quota is 20 GiB and 100 000 files, so it is not advisable to run jobs in $HOME. However, it is perfectly fine to store the stderr and stdout logs from your batch jobs in $HOME so that they are available for review in case of problems with a job.
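For example, a job script can direct its Slurm log files to the home directory while the job itself runs in a scratch area. This is only a sketch; the account name, paths and program are placeholders:

```bash
#!/bin/bash
# Sketch: keep only the Slurm logs under the home directory.
# nnXXXXk, the logs/ path and my_program are placeholders.
#SBATCH --account=nnXXXXk
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --output=/cluster/home/%u/logs/%x-%j.out   # stdout in the home directory
#SBATCH --error=/cluster/home/%u/logs/%x-%j.err    # stderr in the home directory

# The logs/ directory must exist before the job is submitted.
cd $SCRATCH                          # run the job outside $HOME
srun $SLURM_SUBMIT_DIR/my_program    # placeholder program
```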
The home directory should be used for storing tools, scripts, application sources or other relevant data which must have a backup.
The home directory is only accessible to the user. Files that should be accessible by other users in a project must be placed in the project area.
The home directory is backed up only if Storage quota is enforced; backups are daily snapshots for the last 7 days and weekly snapshots for the last 6 weeks.
Job scratch area
Each job gets an area /cluster/work/jobs/$SLURM_JOB_ID
that is
automatically created for the job, and automatically deleted when the
job finishes. The location is stored in the environment variable
$SCRATCH
available in the job. $SCRATCH
is only accessible by the
user running the job.
On Fram and Saga there are two scratch areas (see also below).
The area is meant as a temporary scratch area during job execution. This area is not backed up (documentation about backup).
There are special commands (savefile
and cleanup
) one can use in
the job script to ensure that files are copied back to the submit
directory $SLURM_SUBMIT_DIR
(where sbatch
was run).
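A minimal sketch of a job script that runs in the job scratch area and uses savefile to make sure the result file ends up back in the submit directory; the account, file and program names are placeholders:

```bash
#!/bin/bash
# Sketch: run in $SCRATCH and mark result.dat for automatic copy-back.
# nnXXXXk, input.dat, result.dat and my_program are placeholders.
#SBATCH --account=nnXXXXk
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=2G

# Stage input from the submit directory into the per-job scratch area
cp $SLURM_SUBMIT_DIR/input.dat $SCRATCH/
cd $SCRATCH

# Copy result.dat back to $SLURM_SUBMIT_DIR when the job finishes,
# even if the job script fails before reaching its last line
savefile result.dat

srun $SLURM_SUBMIT_DIR/my_program input.dat > result.dat   # placeholder program
```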
Note
Pros of running jobs in the job scratch area
There is less risk of interference from other jobs because every job ID has its own scratch directory.
Because the scratch directory is removed when the job finishes, the scripts do not need to clean up temporary files.
Warning
Cons of running jobs in the job scratch area
Since the area is removed automatically, it can be hard to debug jobs that fail.
One must use the special commands to copy files back in case the job script crashes before it has finished.
If the main node of a job crashes (i.e., not the job script, but the node itself), the special commands might not be run, so files might be lost.
Job scratch area on local disk
This only exists on Fram and Saga.
A job on Fram/Saga can request a scratch area on local disk on the node
it is running on. This is done by specifying
--gres=localscratch:<size>, for example --gres=localscratch:20G for 20 GiB.
Normal compute nodes on Fram have 198 GiB of disk that can be handed out to local scratch areas, and the bigmem nodes have 868 GiB. On Saga most nodes have 330 GiB; a few of the bigmem nodes have 7 TiB, the hugemem nodes have 13 TiB and the GPU nodes have either 406 GiB or 8 TiB. If a job tries to use more space on the area than it requested, it will get a “disk quota exceeded” or “no space left on device” error (the exact message depends on the program doing the writing). Please do not ask for more than you actually need, since other users might share the local scratch space with you (Saga only).
Jobs that request a local scratch area get an area /localscratch/$SLURM_JOB_ID
that is automatically created for the job, and automatically deleted
when the job finishes. The location is stored in the environment
variable $LOCALSCRATCH
available in the job. $LOCALSCRATCH
is
only accessible by the user running the job.
Note that since this area is on local disk on the compute node, it is probably not useful for jobs running on more than one node (the job would get one independent area on each node).
The area is meant to be used as a temporary scratch area during job execution by jobs that do a lot of disk I/O operations (either metadata operations or read/write operations). Using it for such jobs will speed them up and reduce the load on the /cluster file system.
This area is not backed up (documentation about backup).
Currently, there are no special commands to ensure that files are
copied back automatically, so one has to do that with cp
commands or
similar in the job script.
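Putting the pieces together, a job script using the local disk scratch area could look like the following sketch; the account, requested size, file and program names are placeholders:

```bash
#!/bin/bash
# Sketch: request local disk, run the I/O-heavy work there, copy results back.
# nnXXXXk, the 20G size, input.dat, result.dat and my_program are placeholders.
#SBATCH --account=nnXXXXk
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH --gres=localscratch:20G      # request 20 GiB on the node's local disk

# Stage input onto the node-local scratch area
cp $SLURM_SUBMIT_DIR/input.dat $LOCALSCRATCH/
cd $LOCALSCRATCH

srun $SLURM_SUBMIT_DIR/my_program input.dat > result.dat   # placeholder program

# No automatic copy-back exists for $LOCALSCRATCH, so copy results manually
cp result.dat $SLURM_SUBMIT_DIR/
```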
Note
Pros of running jobs in the local disk job scratch area
Input/output operations are faster than on the /cluster file system.
Great if you need to write/read a large number of files.
It reduces the load on the /cluster file system.
There is less risk of interference from other jobs because every job ID has its own scratch directory.
Because the scratch directory is removed when the job finishes, the scripts do not need to clean up temporary files.
Warning
Cons of running jobs in the local disk job scratch area
Since the area is removed automatically, it can be hard to debug jobs that fail.
Not suitable for files larger than 198-300 GB.
One must make sure to use cp commands or similar in the job script to copy files back.
If the main node of a job crashes (i.e., not the job script, but the node itself), files might be lost.
User work area
Each user has an area /cluster/work/users/$USER
. The location is
stored in the environment variable $USERWORK
.
This area is not backed up (documentation about backup).
By default, $USERWORK
is a private area and only accessible by
the user owning the area. However, it is possible to grant other
users access here, for e.g., debugging purposes. Note that write
access to your $USERWORK
can not be granted to others.
To allow others to read your work area, you may use the command:
chmod o+rx $USERWORK
Note that by doing so you will allow everyone on the machine to
access your user work directory. If you want to share the results
in $USERWORK
with other people in the project, the best way is to
move them to the project area.
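If you later want to revoke that access again, the corresponding command is standard chmod usage:

```bash
chmod o-rx $USERWORK
```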
The $USERWORK
directory is meant for files that are used by one
or more jobs. All result files must be moved out from this area
after the jobs finish, otherwise they will be automatically deleted
after a while (see notes below). We highly encourage users to keep
this area tidy, since both high disk usage and the automatic deletion process reduce disk performance. The best solution is to clean up any unnecessary data after each job.
File deletion depends on the newest of the creation, modification and access times of a file, and on the total usage of the file system. The oldest files will be deleted first, and a weekly scan removes files older than 21 days (up to 42 days if sufficient storage is available).
When file system usage reaches 70%, files older than 21 days are subject to automatic deletion. If usage is over 90%, files older than 17 days are subject to automatic deletion.
It is not allowed to try to circumvent the automatic deletion by, for instance, running scripts that touch all files.
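To see which of your files are getting close to the deletion age, a standard find command can give a rough overview (a sketch; it only checks the modification time, while the deletion policy also considers creation and access times):

```bash
# List files in $USERWORK that have not been modified for more than 21 days
find $USERWORK -type f -mtime +21
```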
Note
Pros of running jobs in the user work area
Since job files are not removed immediately when a job finishes, it is easier to debug failing jobs.
There is no need to use special commands to copy files back in case the job script or node crashes before the job has finished.
Warning
Cons of running jobs in the user work area
There is a risk of interference from other jobs unless one makes sure to run each job in a separate subdirectory inside $USERWORK (see the sketch after this list).
Because job files are not removed when the job finishes, one has to remember to clean up temporary files afterwards.
One has to remember to move result files to the project area if one wants to keep them. Otherwise they will eventually be deleted by the automatic file deletion.
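A minimal sketch of running each job in its own subdirectory under $USERWORK and moving the results to the project area afterwards; the account, paths, file and program names are placeholders:

```bash
#!/bin/bash
# Sketch: one subdirectory per job ID under $USERWORK, results moved out afterwards.
# nnXXXXk, input.dat, result.dat and my_program are placeholders.
#SBATCH --account=nnXXXXk
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G

# A separate directory per job avoids interference between jobs
workdir=$USERWORK/$SLURM_JOB_ID
mkdir -p $workdir
cp $SLURM_SUBMIT_DIR/input.dat $workdir/
cd $workdir

srun $SLURM_SUBMIT_DIR/my_program input.dat > result.dat   # placeholder program

# Keep what you need by copying it to the project area, then tidy up;
# anything left behind is eventually removed by the automatic deletion
cp result.dat /cluster/projects/nnXXXXk/
cd $SLURM_SUBMIT_DIR
rm -rf $workdir
```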
Project area
All HPC projects have a dedicated local space to share data between project
members, located at /cluster/projects/<project_name>
.
The project area is controlled by Storage quota. The default quota for HPC projects is 1 TiB, but projects can apply for more during the application process, up to a maximum of 10 TiB on Fram and Saga and 20 TiB on Betzy.
Also, after the project has been created, project members can request an increase of the quota up to 10/20 TiB by documenting why this is needed. Such requests should be submitted by the project leader via e-mail to contact@sigma2.no. Note that only files that are relevant for further computation jobs should be kept on the HPC machine; HPC systems are not intended for long-term storage. In your request, please include answers to the following questions:
How large are the input files? (Approximate or exact numbers are fine.)
How many such input files will be used in a single job?
At what rate do you intend to process your data? (Approximate GB per week or equivalent.)
What size are your output files and will you use this data as input in further analysis?
Please explain why you cannot benefit from the /cluster/work area.
NIRD is tightly connected with our HPC systems, and data can be moved between the two both quickly and easily. Please explain why staging data from NIRD is not sufficient for your project.
Based on your answers above, how much storage quota do you think you need?
Requests for more than 10/20 TiB require an application for a separate NIRD project area. On special occasions, storage above 10/20 TiB can be permitted. This requires an investigation of the workflow to ensure that needs cannot be satisfied through an allocation on NIRD. Granted disk space above 10/20 TiB is charged according to the Contribution model, Storage category B.
Note that unused quota can also be withdrawn for technical reasons (too little space) or organisational reasons (reduced needs, lower usage, fewer members of the group, or fewer compute hours).
Daily backup is taken to NIRD (documentation about backup).
Note
Pros of running jobs in the project area
Since job files are not removed automatically when a job finishes, it is easier to debug failing jobs.
There is no need to use special commands to copy files back in case the job script or node crashes before the job has finished.
There is no need to move result files to save them permanently or give the rest of the project access to them.
Warning
Cons of running jobs in the project area
There is a risk of interference from other jobs unless one makes sure to run each job in a separate sub-directory inside the project area.
Because job files are not removed when the job finishes, one has to remember to clean up temporary files afterwards, otherwise they can fill up the quota.
There is a risk of using all of the disk quota if one runs many jobs and/or jobs needing a lot of storage at the same time.
Decommissioning
Starting with the 2020.1 resource allocation period, storage decommissioning procedures have been established for the HPC storage systems. This is to ensure predictable storage for users and projects, and to make the provisioning more sustainable for Sigma2. For more details, please visit the data decommissioning policies page.