NIRD Tiered Storage and Data Lake
NIRD consists of two separate storage systems, namely Tiered Storage (NIRD TS) and Data Lake (NIRD DL).
The NIRD TS have several tiers spanned by single filesystem and designed for active projects, while NIRD DL has a more flat structure, designed for less active data and sharing data across multiple projects and interfacing with external storages. Both are based on IBM Elastic Storage System.
NIRD TS contains building blocks based on:
- ESS 3200 building blocks for NVMe drives
- ESS 5000 building blocks for NL-SAS drives
and the NIRD DL contains building blocks based only on:
- ESS 5000 building blocks for NL-SAS drives
Below is the detail information of the difference between TS and DL.
TS vs DL Capacity and Performance
Tiered Storage(TS) |
Data Lake(DL) |
---|---|
Capacity : 22 PB |
Capacity : 11 PB |
Performance: |
Performance: |
TS vs DL Architecture
Tiered Storage(TS) |
Data Lake(DL) |
---|---|
Performance and capacity tiers |
Flat architecture (no tiers) |
Designed for active projects |
Designed for less active data, data libraries, sharing data across multiple projects and interfacing with external storages |
Data integrity secured by: |
Data integrity secured by: |
TS vs DL Functionalities
Tiered Storage(TS) |
Data Lake(DL) |
---|---|
Protocols: POSIX, GPFS, NFS and SMB(Not yet enabled) |
Protocols: POSIX, GPFS,S3(To be enabled later in 2023) |
APIs: GPFS, Discover REST API(Initially to be used internally, might be offered for selected projects) |
APIs: GPFS, S3(To be enabled later in 2023) |
Possibilities for: |
RBAC via S3 |
ACLs and extended attributes |
ACLs and extended attributes |
TS vs DL Filesystems
Tiered Storage(TS) |
Data Lake(DL) |
---|---|
Project storage |
Data Lake project storage |
User’s home |
Backup for TS |
Primary site for the NIRD Research Data Archive |
Secondary site for the NIRD Research |
Scratch storage |
NIRD TS(Resource: projects) and NIRD DL (Resource: datalake) will have separate quota based on the project allocation. You can see the quota and the current usage by running:
$ dusage -p NSxxxxK
where NSxxxxK
is the ID of the project.