PyTorch on Olivia
This family of guides shows three ways to run PyTorch on Olivia:
- Module path, using the NRIS GPU software stack.
- Direct container path, running Apptainer explicitly.
- EESSI path, using the EESSI software stack.
The main focus is the HPC workflow: start on one GPU, then scale to multiple GPUs on one node, and then to multiple nodes.
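One common way to keep a single training script working at every stage of this workflow is to read the process layout from environment variables. The sketch below assumes the script is launched with `torchrun`, which sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each process. When those variables are absent, as in a plain single-GPU run, it falls back to one process. `DistInfo` and `read_dist_env` are hypothetical helpers, not part of PyTorch, and the guides themselves may organize this differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    """Process layout for the current process (hypothetical helper)."""
    rank: int         # global index of this process across all nodes
    local_rank: int   # index of this process on its node, i.e. which GPU to use
    world_size: int   # total number of processes (GPUs) in the job

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1

    @property
    def is_main(self) -> bool:
        # Rank 0 is conventionally the process that logs and saves checkpoints.
        return self.rank == 0


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read torchrun-style variables, defaulting to a single-process run."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    info = read_dist_env()
    # With PyTorch available, a training script would typically continue with
    # torch.cuda.set_device(info.local_rank) and, if info.is_distributed,
    # torch.distributed.init_process_group(backend="nccl").
    print(info)
```

The same script then runs unchanged on one GPU, on four GPUs of one node, or across several nodes; only the launch command changes.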
Guide Structure
Use the reference pages first:
Then follow the execution guides:
Performance Summary
This 3-part guide walks you through scaling PyTorch training on Olivia’s GH200 GPUs:
| Configuration | Throughput (images/s) | Speedup vs. single GPU |
|---|---|---|
| Single GPU (Part 1) | ~5,100 | 1x |
| 4 GPUs on 1 node (Part 2) | ~37,000 | 7x |
| 8 GPUs on 2 nodes (Part 3) | ~63,000 | 12x |
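The multi-GPU guides (Parts 2 and 3) use FP16 mixed precision for improved performance, so their speedups over the single-GPU baseline reflect both the extra GPUs and the reduced precision. This is largely why the 4-GPU speedup exceeds 4x.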
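The speedup column can be reproduced from the throughput column. The sketch below uses the approximate figures from the table and also reports per-GPU scaling efficiency (speedup divided by GPU count). It assumes Part 1 is the full-precision baseline, so efficiencies above 100% should be read as the combined effect of more GPUs and FP16, not as pure parallel scaling. The `scaling` function is a hypothetical helper.

```python
# Approximate throughputs from the performance summary: (GPU count, images/s).
results = {
    "1 GPU (Part 1)": (1, 5_100),
    "4 GPUs, 1 node (Part 2)": (4, 37_000),
    "8 GPUs, 2 nodes (Part 3)": (8, 63_000),
}
baseline = results["1 GPU (Part 1)"][1]


def scaling(gpus: int, throughput: float, baseline: float) -> tuple:
    """Return (speedup over baseline, speedup per GPU)."""
    speedup = throughput / baseline
    efficiency = speedup / gpus
    return speedup, efficiency


for name, (gpus, tput) in results.items():
    s, e = scaling(gpus, tput, baseline)
    print(f"{name}: speedup {s:.1f}x, per-GPU efficiency {e:.0%}")
```

This gives speedups of about 7.3x and 12.4x, which the table rounds to 7x and 12x. Going from 4 to 8 GPUs raises throughput by about 1.7x (63,000 / 37,000), below the ideal 2x; the step across nodes plausibly pays for inter-node communication.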
Note
Key considerations for Olivia:
- The login node is x86_64, while the GPU compute nodes are AArch64 (ARM).
- Software and containers must therefore be built for ARM to run on the compute nodes; x86_64 binaries or images that work on the login node will not run there.
- Set up projects in project or work storage, not in your home directory.
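The architecture and storage points above can be checked from inside a job before training starts. This is a minimal preflight sketch using only the Python standard library: `platform.machine()` reports `aarch64` on ARM Linux nodes and `x86_64` on x86 nodes, and the second check flags a working directory under `$HOME`. The helper names are hypothetical, and the sketch does not assume any particular path for Olivia's project or work storage, so it only warns about the home directory.

```python
import platform
from pathlib import Path
from typing import Optional


def is_arm_node(machine: Optional[str] = None) -> bool:
    """True if the (given or current) machine architecture is 64-bit ARM."""
    machine = (machine or platform.machine()).lower()
    return machine in {"aarch64", "arm64"}


def is_under_home(path: Path, home: Optional[Path] = None) -> bool:
    """True if `path` lies inside the home directory (symlinks resolved)."""
    home = (home or Path.home()).resolve()
    try:
        path.resolve().relative_to(home)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    if not is_arm_node():
        print(f"Note: running on {platform.machine()}; "
              "the GPU compute nodes are aarch64.")
    if is_under_home(Path.cwd()):
        print("Warning: working directory is inside $HOME; "
              "use project or work storage instead.")
```

Running it at the top of a batch script catches the two most common mistakes early: launching an x86_64 build on the ARM nodes, and writing data into the home directory.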