NIRD S3
What is object storage?
Object storage is specifically designed to manage large volumes of unstructured data. It stores data as discrete units, known as objects, each accompanied by metadata and a unique identifier. Each object is accessed via its unique key, which ensures that all data blocks belonging to a file are kept together as a single entity, complete with its relevant metadata and unique ID.
In the context of object storage, traditional file storage directories (POSIX) are analogous to ‘buckets.’
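The bucket/key model can be pictured as a flat key-value store: each key maps to one object (its data plus its metadata), and any `/` in a key is simply part of the name rather than a real directory. A minimal illustrative sketch in Python (the function names here are invented for illustration and are not part of any S3 API):

```python
# A bucket modelled as a flat mapping from object key to (data, metadata).
# Keys like "results/run1/output.nc" look hierarchical, but the "/" is just
# part of the key string -- there are no real directories inside a bucket.
bucket = {}

def put_object(key, data, **metadata):
    """Store data together with its metadata under a single unique key."""
    bucket[key] = {"data": data, "metadata": metadata}

def get_object(key):
    """Retrieve the whole object (data + metadata) by its key."""
    return bucket[key]

put_object("results/run1/output.nc", b"...", content_type="application/x-netcdf")
obj = get_object("results/run1/output.nc")
print(obj["metadata"]["content_type"])  # application/x-netcdf
```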
About NIRD S3
On NIRD, object storage is available as a service component within the NIRD Data Lake, supporting the efficient consumption and ingestion of unstructured data. This service also facilitates data streaming, data sharing with external collaborators, and more.
The NIRD Data Lake provides unified access to data, allowing the same data to be accessed simultaneously through multiple protocols, such as POSIX, NFS, and S3.
The object storage on the NIRD Data Lake facilitates seamless collaboration without the need for data staging between locations. It also allows integration with data processing frameworks and AI/ML platforms, providing fast and easy access to large datasets for training models and running simulations.
NIRD S3 comes with enhanced load balancing and high I/O throughput, with a measured throughput of 27 GB/s.
NIRD S3 supports single objects of up to 10 TB in size and up to 100 million objects per bucket.
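Objects of this size are normally uploaded in parts (see the multipart_chunksize setting in the AWS configuration further below). Assuming the common S3 limit of 10,000 parts per multipart upload, a quick back-of-the-envelope check shows that a 10 TB object fits comfortably when 5 GB parts are used:

```python
import math

TB = 1000 ** 4
GB = 1000 ** 3

object_size = 10 * TB   # maximum single-object size on NIRD S3
chunk_size = 5 * GB     # multipart chunk size from the AWS config example below

parts = math.ceil(object_size / chunk_size)
print(parts)            # 2000 parts, well under the usual 10,000-part limit
```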
Working with NIRD S3
This guide provides a brief overview of how to use NIRD S3, though it does not cover all possible use cases and functionalities.
Obtaining S3 Access
To gain access to NIRD S3, you must be a member of an active project on NIRD, have an allocation on the NIRD Data Lake, and own the directory for which you’re requesting S3 access.
Use the RequestS3Access command to submit a request. This request is processed periodically by the system, which will then register an S3 account and create a bucket linked to the specified directory.
nird-login $ RequestS3Access /nird/datalake/NSxxxxK/my-directory
Upon successful request, further information is sent to your email address.
Please read the information in the email carefully and pay particular attention to the NAME of your bucket in the message. Some characters are not permitted in bucket names, so the name of the directory you created for the bucket may have been modified (e.g. underscores replaced).
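The exact renaming rules are not documented here, but the effect can be illustrated with a hypothetical sanitizer. Standard S3 bucket names must be lowercase and may only contain letters, digits, hyphens, and dots; the sketch below assumes NIRD applies similar rules and is for illustration only, not NIRD's actual implementation:

```python
import re

def sanitize_bucket_name(name):
    # Hypothetical illustration: S3 bucket names must be lowercase and may
    # contain only letters, digits, hyphens and dots, so characters such as
    # underscores are typically replaced with an allowed character.
    name = name.lower()
    return re.sub(r"[^a-z0-9.-]", "-", name)

print(sanitize_bucket_name("My_Project_Data"))  # my-project-data
```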
S3 client configuration
Several CLI clients are available, such as the AWS CLI, s3cmd, MinIO Client, and rclone. As a graphical client, Cyberduck may be used. Please consult each tool's documentation for detailed help.
Below are two examples: rclone and the AWS CLI.
rclone
The examples below show how to set up rclone and use its command-line interface. The rclone application can do a range of file transfers (cp-, scp-, and rsync-style operations), but only S3 is shown here.
Create a configuration for rclone, either by running
rclone config
or by editing the config file directly, which is located at:
$HOME/.config/rclone/rclone.conf
Editing the config file directly is simpler than running the interactive configuration, as only a few lines are needed to set up rclone with S3.
An example config could look like this; fill in your credentials from the file on NIRD:
[S3]
type = s3
provider = Other
access_key_id = xxxxxxxxxxx
secret_access_key = yyyyyyyyyyyyy
endpoint = https://s3.nird.sigma2.no
acl = private
After the config file is updated, rclone is ready for use against the NIRD Data Lake.
rclone ls S3:<user>-ns<project>k-<user>/
1099511627776 tmp.dta
102400000000 tmp.m1
263095463936 tmp.medium
1073741824 tmp.small
Once we have listed the object names, we can copy the file we want to use.
rclone copy S3:<user>-ns<project>k-<user>/tmp.medium .
To monitor progress and transfer speed use the option -P.
rclone copy S3:user-nsXXXXk-user/tmp.medium . -P
Transferred: 245.027 GiB / 245.027 GiB, 100%, 19.587 MiB/s, ETA 0s
Transferred: 1 / 1, 100%
Elapsed time: 4m56.9s
Copying in the other direction:
rclone copy tmp.m1 S3:user-nsXXXXk-user/ -P --ignore-checksum
Since the receiver has to read the whole file to calculate the checksum, which takes a considerable amount of time, checksumming is turned off here so that the measurement reflects the transfer speed only.
It’s possible to run multiple simultaneous transfers when copying many files (shell wildcards don’t work with rclone, hence the --include option):
rclone copy S3:user-nsXXXXk-user/ . --include "tmp.m?"\
--transfers=10 -P --ignore-checksum
Transferred: 953.674 GiB / 953.674 GiB, 100%, 5.225 GiB/s, ETA 0s
Checks: 0 / 0, -, Listed 19
Transferred: 10 / 10, 100%
Elapsed time: 2m22.0s
The combined transfer bandwidth can be quite good. On the other hand, computing checksums at both ends takes time and impedes the transfer, causing it to take considerably longer.
When transferring large amounts of data, quite high transfer speeds can be obtained, as measured from the User Access Nodes (uan, or login nodes). The example below transfers a set of 1000 smaller files of 512 MB each with a varying number of simultaneous transfers (threads).
| Transfers | Transfer speed [GB/s] |
|---|---|
| 10 | 7.376 |
| 20 | 10.26 |
| 30 | 11.15 |
| 40 | 10.20 |
| 50 | 11.33 |
| 60 | 10.56 |
At 10 GB/s it is possible to transfer roughly 36 TB per hour, which is quite good.
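The hourly figure follows directly from the sustained rate (a quick sanity check, using decimal units with 1 TB = 1000 GB):

```python
rate_gb_per_s = 10                 # sustained transfer rate in GB/s
seconds_per_hour = 3600

# Convert GB/hour to TB/hour (1 TB = 1000 GB).
tb_per_hour = rate_gb_per_s * seconds_per_hour / 1000
print(tb_per_hour)                 # 36.0 TB per hour
```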
For more details on what rclone can do, see the rclone documentation.
Amazon Web Services client
Examples using the AWS CLI, with example configuration files listed below.
.aws/config
The [profile s3test] section name in .aws/config must match the [s3test] section name in .aws/credentials.
[default]
region = us-east-1
[profile s3test]
region = us-east-1
s3 =
multipart_chunksize = 5GB
multipart_threshold = 2GB
max_concurrent_requests = 100
.aws/credentials
The values of aws_access_key_id and aws_secret_access_key in .aws/credentials must be taken from the <username>-<project>-s3creds.txt file, as shown below. This file is created automatically after the S3 access request described above and is located in your home directory.
[default]
aws_access_key_id =
aws_secret_access_key =
[s3test]
aws_access_key_id = <Access Key in the file>
aws_secret_access_key = <Secret Key in the file>
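Both .aws/config and .aws/credentials use INI syntax, so the profile values can also be read programmatically with Python's standard configparser. The sketch below parses an inline example; in practice you would read the real file from your home directory:

```python
import configparser

# Example credentials in the INI format used by .aws/credentials. For the real
# file, use cp.read(os.path.expanduser("~/.aws/credentials")) instead.
credentials_text = """
[s3test]
aws_access_key_id = xxxxxxxxxxx
aws_secret_access_key = yyyyyyyyyyyyy
"""

cp = configparser.ConfigParser()
cp.read_string(credentials_text)

print(cp["s3test"]["aws_access_key_id"])  # xxxxxxxxxxx
```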
List bucket
aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 ls s3://username-nsxxxxk-bucketname/
Put object
aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 cp local-file.nc s3://username-nsxxxxk-bucketname/
Get object
aws --profile s3test --endpoint https://s3.nird.sigma2.no s3 cp s3://username-nsxxxxk-bucketname/somefile .
Use S3 API to fetch file metadata
aws --profile s3test --endpoint https://s3.nird.sigma2.no s3api head-object --bucket username-nsxxxxk-bucketname --key somefile