Job Types on Olivia
Olivia is designed for large-scale parallel jobs and GPU-accelerated workloads. With its high-performance compute nodes featuring 256 or 288 CPUs and substantial memory per node, Olivia is suited for computationally intensive applications that can scale across many cores.
The basic allocation units on Olivia are cpus, memory and GPUs, or whole nodes. The details about how the billing units are calculated can be found in Projects and accounting. Note that the number of GPUs is counted separately, not as part of the billing units.
Small
Allocation units: cpus and memory
Job Limits:
maximum 256 billing units
maximum 1 node
Maximum walltime: 7 days
Priority: normal
Available resources:
88 nodes with 256 AMD cpus and 741 GiB RAM
Parameter for sbatch/salloc:
--partition=small (can be omitted; small is the default)
Job Scripts: Small
This is the default job type, meant for CPU-only jobs that need less than a whole node. The partition is good for:
Memory-intensive applications requiring substantial RAM but fewer than 256 CPUs.
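As an illustration, a minimal job script for the small partition could look like the sketch below. The project account, job name, resource numbers and program name are placeholders, to be replaced with your own:

```shell
#!/bin/bash
#SBATCH --account=nnXXXXk        # placeholder: your project account
#SBATCH --job-name=small-test    # hypothetical job name
#SBATCH --partition=small        # optional; small is the default
#SBATCH --ntasks=32              # example: 32 CPUs on a single node
#SBATCH --mem-per-cpu=2G         # example memory request
#SBATCH --time=0-01:00:00        # walltime; max 7 days on small

set -o errexit                   # exit the script on errors

srun ./my_program                # placeholder for your application
```

The script is submitted with `sbatch`, e.g. `sbatch small_job.sh`.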
Large
Allocation units: whole nodes
Job Limits:
maximum 9 nodes
Maximum walltime: 7 days
Priority: normal
Available resources:
172 nodes with 256 AMD cpus and 741 GiB RAM
Parameter for sbatch/salloc:
--partition=large
Job Scripts: Large
This is meant for larger CPU-only jobs, needing at least one node. The partition is good for:
Large-scale parallel computations
Memory-intensive applications requiring substantial RAM and many CPUs
Jobs that can efficiently utilize many CPU cores
Scientific simulations requiring significant computational resources
Note that jobs are allocated whole nodes, regardless of what
--ntasks and/or --ntasks-per-node specify. If needed, a notice
about this is printed at submission. It can be suppressed by
setting the environment variable
SLURM_SUBMIT_SUPPRESS_NTASKS_WARNING or
SLURM_SUBMIT_SUPPRESS_WARNINGS to 1 (the latter suppresses all
submission warnings).
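Since large jobs are allocated whole nodes, a script typically requests full nodes and fills them with tasks. A sketch, assuming a hypothetical MPI program (account, node count and program name are placeholders):

```shell
#!/bin/bash
#SBATCH --account=nnXXXXk          # placeholder: your project account
#SBATCH --job-name=large-test      # hypothetical job name
#SBATCH --partition=large
#SBATCH --nodes=4                  # whole nodes; max 9 on large
#SBATCH --ntasks-per-node=256      # one task per CPU core on these nodes
#SBATCH --time=1-00:00:00          # walltime; max 7 days

set -o errexit                     # exit the script on errors

srun ./my_mpi_program              # placeholder MPI application
```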
Accel
Allocation units: cpus, memory and GPUs
Job Limits:
minimum 1 GPU
maximum 32 GPUs
Maximum walltime: 7 days
Priority: normal
Available resources: 76 nodes (max 60 per project) with 288 ARM64 cpus, 808 GiB RAM and 4 GH200 GPUs.
Parameter for sbatch/salloc:
--partition=accel, --gpus=N, --gpus-per-node=N or similar, with N being the number of GPUs
Job Scripts: Accel
Accel jobs give access to use the Grace Hopper nodes that combine ARM64 CPUs with NVIDIA GH200 GPUs. This is useful for AI/ML training, inference, and other GPU-accelerated applications.
Can be combined with --qos=devel to get higher priority, but then the maximum walltime (2 hours)
and resource limits of devel apply.
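A sketch of an accel job script requesting GPUs (the account, resource numbers and program name are placeholders, not recommendations):

```shell
#!/bin/bash
#SBATCH --account=nnXXXXk        # placeholder: your project account
#SBATCH --job-name=gpu-test      # hypothetical job name
#SBATCH --partition=accel
#SBATCH --gpus=2                 # between 1 and 32 GPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16       # example CPU request
#SBATCH --mem=64G                # example memory request
#SBATCH --time=0-04:00:00        # walltime; max 7 days

set -o errexit                   # exit the script on errors

srun ./my_gpu_program            # placeholder GPU application
```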
Devel
Allocation units: cpus, memory and GPUs
Job Limits:
maximum 1152 billing units per job
maximum 16 GPUs per job
maximum 2304 billing units in use at the same time
maximum 32 GPUs in use at the same time
maximum 2 running jobs per user
Maximum walltime: 2 hours
Priority: high
Available resources: devel jobs can run on any node on Olivia
Parameter for sbatch/salloc:
--qos=devel
Job Scripts: Devel
This is meant for small, short development or test jobs. Devel jobs get higher priority so that they run as soon as possible. In return, there are limits on the size and number of devel jobs.
Can be combined with --partition=small, --partition=large or
--partition=accel to increase priority, while the maximum walltime
and job limits of devel apply.
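For example, a short GPU test on the Grace Hopper nodes could be sketched as below, combining --partition=accel with --qos=devel (account and program name are placeholders):

```shell
#!/bin/bash
#SBATCH --account=nnXXXXk        # placeholder: your project account
#SBATCH --job-name=devel-test    # hypothetical job name
#SBATCH --partition=accel        # or small / large
#SBATCH --qos=devel              # high priority; max 2 hours
#SBATCH --gpus=1
#SBATCH --time=0-00:30:00

set -o errexit                   # exit the script on errors

srun ./my_test_program           # placeholder test application
```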
If you have temporary development needs that cannot be fulfilled by the devel job type, please contact us at support@nris.no.