PyTorch on Olivia

This set of guides shows how to run PyTorch on Olivia in three ways:

  1. Module path through the NRIS GPU software stack.

  2. Direct container path using Apptainer explicitly.

  3. EESSI path using the EESSI software stack.

The main focus is the HPC workflow: start on one GPU, then scale to multiple GPUs on one node, and then to multiple nodes.
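One way to keep a single training script usable across all three stages is to read the process layout from the environment instead of hard-coding it. The sketch below is an illustration rather than the guides' own code: it assumes the job is started with a launcher such as `torchrun`, which sets `WORLD_SIZE`, `RANK`, and `LOCAL_RANK`, and it falls back to a single-process layout when they are absent. The helper name `distributed_context` is hypothetical.

```python
import os


def distributed_context(env=None):
    """Return the process layout, defaulting to one process on one GPU.

    Hypothetical helper: assumes a launcher such as torchrun exported
    WORLD_SIZE, RANK and LOCAL_RANK for multi-GPU and multi-node runs.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))  # total number of processes
    rank = int(env.get("RANK", "0"))              # global index of this process
    local_rank = int(env.get("LOCAL_RANK", "0"))  # index on this node, often the GPU index
    return {
        "world_size": world_size,
        "rank": rank,
        "local_rank": local_rank,
        "distributed": world_size > 1,
    }


if __name__ == "__main__":
    print(distributed_context())
```

With no launcher variables set, the helper reports a single process, which matches the single-GPU starting point; the same script then picks up the larger layout when launched across several GPUs or nodes.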

Guide Structure

Use the reference pages first:

  1. PyTorch software options

  2. Models, datasets, caches, and overlays

  3. Adding Python packages to module and container paths

  4. Monitoring and debugging jobs

Then follow the execution guides:

  1. Single-GPU guide

  2. Multi-GPU guide

  3. Multi-node guide

Performance Summary

The three execution guides scale PyTorch training across Olivia’s GH200 GPUs. Approximate throughput at each stage:

Configuration                 Throughput       Speedup
Single GPU (Part 1)           ~5,100 img/s     1x
4 GPUs on 1 node (Part 2)     ~37,000 img/s    7x
8 GPUs on 2 nodes (Part 3)    ~63,000 img/s    12x

The multi-GPU guides use FP16 mixed precision for improved performance.
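The speedup figures can be reproduced from the throughput figures with a few lines of arithmetic. The sketch below uses the approximate img/s values quoted above; the function names `speedup` and `efficiency` are illustrative, and per-GPU efficiency is an extra metric that the summary itself does not report.

```python
def speedup(baseline, throughput):
    """Throughput relative to the single-GPU baseline."""
    return throughput / baseline


def efficiency(baseline, throughput, gpus):
    """Speedup divided by GPU count; 1.0 would be perfect linear scaling."""
    return speedup(baseline, throughput) / gpus


# Approximate figures from the performance summary (img/s).
BASELINE = 5_100
RUNS = [
    ("4 GPUs on 1 node", 4, 37_000),
    ("8 GPUs on 2 nodes", 8, 63_000),
]

if __name__ == "__main__":
    for label, gpus, throughput in RUNS:
        s = speedup(BASELINE, throughput)
        e = efficiency(BASELINE, throughput, gpus)
        print(f"{label}: speedup {s:.1f}x, per-GPU efficiency {e:.2f}")
```

An efficiency above 1.0 indicates that the runs are not a like-for-like comparison. One possible cause here is the FP16 mixed precision used in the multi-GPU guides, although the summary does not state which precision the single-GPU figure used.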

Note

Key considerations for Olivia:

  1. The login node is x86_64, while the GPU compute nodes are AArch64 (64-bit ARM).

  2. Software and containers must therefore be built for ARM to run on the compute nodes.

  3. Set up projects in project or work storage, not in your home directory.
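Because the login and compute nodes differ in architecture, it can help to confirm where a script is actually running before installing packages or building anything. The check below is a minimal sketch using only the Python standard library; the helper name `is_arm64` is hypothetical, and the accepted spellings (`aarch64`, `arm64`) are the values `platform.machine()` commonly reports on 64-bit ARM systems.

```python
import platform


def is_arm64(machine=None):
    """Return True when the machine string denotes 64-bit ARM."""
    machine = platform.machine() if machine is None else machine
    return machine.lower() in {"aarch64", "arm64"}


if __name__ == "__main__":
    arch = platform.machine()
    if is_arm64(arch):
        print(f"{arch}: ARM node, matches the GPU compute nodes")
    else:
        print(f"{arch}: not ARM, software built here may not run on the GPU compute nodes")
```

Running this inside a job should report an ARM node, while running it on the login node should report x86_64.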