NIRD S3

What is object storage?

Object storage is designed to manage large volumes of unstructured data. It stores data as discrete units, known as objects, each accompanied by metadata and a unique identifier. Each object is accessed via its own unique key, and all data blocks belonging to a file are kept together as a single entity, complete with the relevant metadata and unique ID.

In object storage, 'buckets' play a role analogous to directories in traditional (POSIX) file storage.
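
As a rough illustration (the file and bucket names here are hypothetical, following the naming pattern used later in this guide), a file stored under a directory corresponds to an object addressed by a bucket name and an object key:

POSIX path:  /nird/datalake/NSxxxxK/my-directory/results/run1.nc
S3 object:   s3://my-directory-nsxxxxk/results/run1.nc  (bucket: my-directory-nsxxxxk, key: results/run1.nc)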

About NIRD S3

On NIRD, object storage is available as a service component within the NIRD Data Lake, supporting the efficient consumption and ingestion of unstructured data. This service also facilitates data streaming, data sharing with external collaborators, and more.

The NIRD Data Lake provides unified access to data, allowing the same data to be accessed simultaneously through multiple protocols, such as POSIX, NFS, and S3.

The object storage on the NIRD Data Lake enables seamless collaboration without the need for data staging between locations. It also integrates with data processing frameworks and AI/ML platforms, providing fast and easy access to large datasets for training models and running simulations.

NIRD S3 comes with enhanced load balancing and high I/O performance, with a measured throughput of 27 GB/s.

NIRD S3 supports single objects of up to 10 TB in size and up to 100 million objects per bucket.

Working with NIRD S3

This guide provides a brief overview of how to use NIRD S3, though it does not cover all possible use cases and functionalities.

Obtaining S3 Access

To gain access to NIRD S3, you must be a member of an active project on NIRD, have an allocation on the NIRD Data Lake, and own the directory for which you’re requesting S3 access.

Use the RequestS3Access command to submit a request. This request is processed periodically by the system, which will then register an S3 account and create a bucket linked to the specified directory.

nird-login $ RequestS3Access /nird/datalake/NSxxxxK/my-directory

Once the request has been processed successfully, further information is sent to your email address.

S3 client configuration

Several CLI clients are available, such as the AWS CLI, s3cmd, MinIO Client, and rclone. Cyberduck may be used as a graphical user interface. Please consult each tool's documentation for detailed help.

The examples below are based on the AWS CLI and the following example configuration.

.aws/config

[default]
region = us-east-1

[profile s3test]
region = us-east-1
s3 =
    multipart_chunksize = 5GB
    multipart_threshold = 2GB
    max_concurrent_requests = 100

.aws/credentials

[default]
aws_access_key_id =
aws_secret_access_key =
[s3test]
aws_access_key_id = my-s3-user
aws_secret_access_key = my-s3-password
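
With the profile in place, the configuration can be checked by listing the buckets visible to the account; a minimal sketch, using the same endpoint as in the examples below:

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 ls

This should list the bucket(s) your S3 account has access to, such as the bucket created for your directory.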

List bucket

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 ls s3://my-directory-nsxxxxk/

Put object

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 cp local-file.nc s3://my-directory-nsxxxxk/
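
To upload many files in one go, the AWS CLI's sync command can also be used; a minimal sketch, assuming a hypothetical local directory my-results/ and the same bucket as above:

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 sync my-results/ s3://my-directory-nsxxxxk/my-results/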

Get object

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 cp s3://my-directory-nsxxxxk/somefile .
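
For sharing an object with external collaborators, a time-limited presigned URL can be generated; a sketch, assuming presigned URLs are supported on this endpoint, using the object from the example above and a one-hour expiry:

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 presign s3://my-directory-nsxxxxk/somefile --expires-in 3600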

Use the S3 API to fetch object metadata

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3api head-object --bucket my-directory-nsxxxxk --key somefile
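
Custom metadata can also be attached when an object is uploaded, and will then appear in the head-object output above; a sketch using s3api put-object with a hypothetical metadata key:

aws --profile s3test --endpoint https://s3.nird.sigma2.no s3api put-object --bucket my-directory-nsxxxxk --key somefile --body local-file.nc --metadata project=NSxxxxK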