NIRD - National Infrastructure for Research Data
NIRD is the National e-Infrastructure for Research Data. It is owned and operated by UNINETT Sigma2.
The NIRD infrastructure offers storage services, archiving services, and processing capacity for computing on the stored data. It offers services and capacities to any scientific discipline that requires access to advanced, large-scale, or high-end resources for storing, processing, or publishing research data, or for searching digital databases and collections.
NIRD will provide storage resources with yearly capacity upgrades, data security through geo-replication (data stored at two physical locations), adaptable application services, support for multiple storage protocols, migration to third-party cloud providers, and more. Alongside the national high-performance computing resources, NIRD forms the backbone of the national e-infrastructure for research and education in Norway, connecting data and computing resources for efficient provisioning of services.
The NIRD storage system consists of DDN SFA14K controllers and 3400 x 10 TB NL-SAS drives, with a total capacity of 2 x 11 PiB. The solution is based on the DDN GridScaler® parallel file system, which supports multiple file, block and object protocols.
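As a rough cross-check of the drive figures, the sketch below converts the raw drive capacity to PiB. It assumes the usual vendor convention of decimal terabytes (1 TB = 10^12 bytes) against binary pebibytes (1 PiB = 2^50 bytes); the function name raw_capacity_pib is illustrative only.

```shell
#!/usr/bin/env bash
# Convert a drive count and per-drive size into raw capacity in PiB.
# Assumption: drive sizes are decimal TB (10^12 bytes); PiB is 2^50 bytes.
raw_capacity_pib() {
  # $1 = number of drives, $2 = capacity of one drive in TB
  awk -v n="$1" -v tb="$2" 'BEGIN { printf "%.1f\n", n * tb * 1e12 / 2^50 }'
}

raw_capacity_pib 3400 10   # prints 30.2
```

So the drives hold roughly 30 PiB raw. The quoted total of 2 x 11 PiB (22 PiB) is lower than that; this page does not say how the difference is used.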
To gain access to the storage services, a formal application is required. The process is explained on the How to apply for a user account page.
Users must be registered and authorised by the person responsible for the project before they get access.
To access or transfer data, you may use the following tools:
sftp. Visit the transferring files page for details.
You access your $HOME on NIRD and the project data storage area through the login containers. The login containers run on servers directly connected to the storage at both sites, Tromsø and Trondheim, so that data can be handled right where the primary data resides. Each login container offers a maximum of 16 CPU cores and 128 GiB of memory.
Login containers can be accessed via the following addresses:
Note that we run four login containers per site.
If you plan to start a screen session on one of the login containers, or you wish to copy data with WinSCP, you should log in to a specific container.
X can take values from 0 to 3.
Each user has a home directory
<username> is the username. The default quota for home directories is 20 GiB and 100 000 files. To check the disk usage and quotas, type:
Home directories on NIRD also contain a backup of Betzy, Fram and Saga home directories (when relevant) in
To account for this, the default quota is doubled when relevant.
Note that this is a backup from the HPC cluster; you cannot transfer files to the cluster by putting them here.
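The quota rule can be expressed as a small check. This is a sketch: within_home_quota is a hypothetical helper, not a NIRD command, and it assumes that the doubling applies to both the space and the file-count limits (40 GiB and 200 000 files).

```shell
#!/usr/bin/env bash
# Sketch: compare home-directory usage with the default NIRD home quota
# (20 GiB and 100 000 files), doubled when an HPC home backup is kept.
# within_home_quota is a hypothetical helper, not a NIRD tool.

GIB=$((1024 * 1024 * 1024))

# within_home_quota USED_BYTES FILE_COUNT HAS_HPC_BACKUP
#   HAS_HPC_BACKUP is 1 if a Betzy/Fram/Saga backup is kept in the home
#   directory, otherwise 0. Succeeds if both limits are respected.
within_home_quota() {
  local factor=1
  if [ "$3" -eq 1 ]; then
    factor=2  # assumption: both limits double when the backup is present
  fi
  local max_bytes=$((20 * GIB * factor))
  local max_files=$((100000 * factor))
  [ "$1" -le "$max_bytes" ] && [ "$2" -le "$max_files" ]
}
```

For a rough local estimate, the inputs could come from GNU `du -sb "$HOME" | cut -f1` and `find "$HOME" -type f | wc -l`; these figures may not match the quota accounting on NIRD, which remains authoritative.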
The total storage space of /scratch is 15 TB.
Each user has a scratch directory
This area is meant for temporary data and is not backed up.
When file system usage reaches 75%, files are subject to automatic deletion.
There is no quota in the scratch area.
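With 15 TB in total, the 75% mark corresponds to about 11.25 TB in use. The sketch below reads the fill level from the standard `df -P` output and compares it with that threshold. The function names are illustrative, and df reports the file system that holds the given path, which on a parallel file system may not correspond exactly to the scratch area.

```shell
#!/usr/bin/env bash
# Sketch: check how close a file system is to the 75% fill level at which
# files in NIRD scratch become subject to automatic deletion.
# fill_percent and at_deletion_risk are illustrative names, not NIRD tools.

DELETION_THRESHOLD=75  # percent, from the scratch policy above

# fill_percent DIR: print the integer fill level of the file system holding DIR
fill_percent() {
  # -P gives POSIX output: one header line, then one line for the file
  # system, with the use percentage in column 5 (for example "42%").
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# at_deletion_risk PERCENT: succeed if PERCENT has reached the threshold
at_deletion_risk() {
  [ "$1" -ge "$DELETION_THRESHOLD" ]
}
```

For example, `at_deletion_risk "$(fill_percent "$SCRATCH_DIR")" && echo "scratch is at or above 75%"`, where SCRATCH_DIR is a placeholder for your scratch directory.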
Each NIRD Data Storage project gets a project area
NSxxxxK is the ID of the project.
With some exceptions, agreed mutually with the project leader, NIRD Data Storage projects are stored at two sites and asynchronously geo-replicated.
The main purpose of the replica is to ensure data integrity and resilience in case of major damage at the primary site.
We advise projects to assess which of their datasets need a higher level of security and should be replicated. This helps optimize the storage space used by the project.
In general, consider which data can easily be reproduced and which are copies of files stored on other storage resources. Such data normally do not need replication and can be excluded from it.
Instructions for excluding files and folders from replication are given on the Granular Replication page.
For every project that has requested replication, the data is stored on a primary data volume on one site and the replica on the other site.
The primary site is chosen for operational convenience: it is the site closest to where the data is consumed, namely NIRD-TOS if the data is analysed on the Fram HPC cluster, or NIRD-TRD if it is analysed on the Saga or Betzy HPC clusters.
Projects can read from and write to the primary site, but cannot read from or write to the secondary site.
Users should log in to the login container nearest to the primary data storage.
The project area has a quota on disk space and the number of files, and you can see the quota and the current usage by running:
$ dusage -p NSxxxxK
In addition to geo-replication, NIRD supports snapshots of project areas and home directories, allowing deleted data to be recovered. For more information, visit the backup page.
The NIRD Toolkit provides a platform for pre- and post-processing analysis, data-intensive processing, visualization, artificial intelligence and machine learning. The NIRD Toolkit services have access to your NIRD project area. The available services are described in the NIRD Toolkit documentation.
Mounts on HPC
When relevant, the NIRD Storage project areas are also mounted on the login nodes of the Betzy, Fram or Saga HPC clusters.
Only the primary data volumes for projects are mounted to the HPC clusters:
projects from NIRD-TOS to Fram
projects from NIRD-TRD to Betzy and Saga
You can check what the primary site is for a project by running the following on a NIRD login node:
$ readlink /projects/NSxxxxK
Replace “xxxx” with the number of the project you want to check. The command prints a path starting with either /tos-project or /trd-project.
If it starts with “tos” then the primary site is in Tromsø (login-tos.nird.sigma2.no)
If it starts with “trd” then the primary site is in Trondheim (login-trd.nird.sigma2.no)
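The readlink check, together with the mount rules earlier in this section, can be wrapped in a small script. This is a sketch: site_login_host and site_hpc_mounts are hypothetical helper names, while the path prefixes, login addresses and cluster mapping are the ones stated on this page.

```shell
#!/usr/bin/env bash
# Sketch: turn the output of `readlink /projects/NSxxxxK` into the nearest
# login address and the HPC clusters that can mount the project.
# Helper names are hypothetical; the mapping follows this documentation.

# site_login_host TARGET: print the login address for the primary site
site_login_host() {
  case "$1" in
    /tos-project*) echo "login-tos.nird.sigma2.no" ;;  # primary site Tromsø
    /trd-project*) echo "login-trd.nird.sigma2.no" ;;  # primary site Trondheim
    *) echo "unrecognised project path: $1" >&2; return 1 ;;
  esac
}

# site_hpc_mounts TARGET: print the clusters on which the primary volume
# can be mounted (when relevant)
site_hpc_mounts() {
  case "$1" in
    /tos-project*) echo "Fram" ;;
    /trd-project*) echo "Betzy Saga" ;;
    *) echo "unrecognised project path: $1" >&2; return 1 ;;
  esac
}

# On a NIRD login node, pass a project ID in the NSxxxxK form to check it.
if [ "$#" -eq 1 ]; then
  target=$(readlink "/projects/$1") || exit 1
  echo "login: $(site_login_host "$target")"
  echo "mounted on: $(site_hpc_mounts "$target")"
fi
```

Saved under a name of your choice (for example nird-site.sh, a hypothetical name), it could be run on a NIRD login node as `bash nird-site.sh NSxxxxK` with the real project ID.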
To avoid performance impact and operational issues, NIRD $HOME and project areas are not mounted on any of the compute nodes of the HPC clusters.
For more information, visit the Betzy, Fram and Saga page.