Introduction to using GPU compute

A GPU, or Graphics Processing Unit, is a computational unit which, as the name suggests, is optimized for graphics tasks. Nearly every computing device we interact with contains a GPU of some sort, responsible for transforming the information we want to display into actual pixels on our screens.

One question that immediately presents itself is: if GPUs are optimized for graphics, why are they interesting as a computational resource? The full answer is complicated, but the short explanation is that many computational tasks have a lot in common with graphical computations. GPUs are optimized for working with the pixels on a screen, and a lot of them. Since the operations on these pixels are almost identical, mainly working on floating point values, they can be run in parallel on dedicated hardware (i.e. the GPU) that is tailored and optimized for this particular task. This already sounds quite a bit like working with a discrete grid in, for example, atmospheric simulation, which points to why GPUs can be interesting in a computational context.

Since GPUs are optimized for working on grids of data and transforming that data, they are well suited for matrix calculations. As an indication of this, we can compare the theoretical performance of one GPU with one CPU.

Precision           AMD Epyc 7742 (Betzy)    Nvidia P100 (Saga)    Nvidia A100 (Betzy)
Half Precision
Single Precision
Double Precision

Based on this, it is no wonder that tensor libraries such as TensorFlow and PyTorch report speedups between 23x and 190x on accelerators compared to using only a CPU.

Getting started

To get started we first have to SSH into Saga:

[me@mylaptop]$ ssh <username>

From the hardware specification we see that there should be 8 GPU nodes available on Saga, and from the available job types we identify --partition=accel as the relevant hardware partition for GPU jobs. You can run the sinfo command to check the available partitions on Saga:

[me@login.SAGA]$ sinfo
normal*      up 7-00:00:00     88   resv c1-[1-22,24-28,30,32-37,39-52,56],c2-[1-3,6,8,10-18,21-23,29-33,36-39,43,54-56],c3-[1-2,5-6,8-10,13,15]
normal*      up 7-00:00:00      5    mix c5-28,c10-41,c11-[31,57-58]
normal*      up 7-00:00:00    226  alloc c1-[23,29,31,38,53-55],c2-[4-5,7,9,19-20,24-28,34-35,40-42,44-53],c3-[3-4,7,11-12,14,16-28],c5-[1-27,29-59],c10-[1-40,42-60],c11-[1-30,32-56,59-60]
bigmem       up 14-00:00:0      1 drain* c6-2
bigmem       up 14-00:00:0     32    mix c3-[29-35,38-56],c6-[1,3-7]
bigmem       up 14-00:00:0      2  alloc c3-[36-37]
bigmem       up 14-00:00:0      1   idle c6-8
accel        up 14-00:00:0      2    mix c7-[3,8]
accel        up 14-00:00:0      4  alloc c7-[1-2,4,6]
accel        up 14-00:00:0      2   idle c7-[5,7]
optimist     up   infinite      1 drain* c6-2
optimist     up   infinite     88   resv c1-[1-22,24-28,30,32-37,39-52,56],c2-[1-3,6,8,10-18,21-23,29-33,36-39,43,54-56],c3-[1-2,5-6,8-10,13,15]
optimist     up   infinite     39    mix c3-[29-35,38-56],c5-28,c6-[1,3-7],c7-[3,8],c10-41,c11-[31,57-58]
optimist     up   infinite    232  alloc c1-[23,29,31,38,53-55],c2-[4-5,7,9,19-20,24-28,34-35,40-42,44-53],c3-[3-4,7,11-12,14,16-28,36-37],c5-[1-27,29-59],c7-[1-2,4,6],c10-[1-40,42-60],c11-[1-30,32-56,59-60]
optimist     up   infinite      3   idle c6-8,c7-[5,7]

Here we see that the accel partition contains 8 nodes in total, 2 of which are unused at the moment (idle), 4 are fully occupied (alloc) and 2 are partially occupied (mix). We can also read from this that the maximum time limit for a GPU job is 14 days, which might be relevant for your production calculations.
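The per-state counts read off the listing above can also be computed rather than counted by eye. The following is a minimal sketch, assuming sinfo's default column order (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST) as shown in the listing; the helper name `summarise_partition` is hypothetical and not part of Slurm.

```shell
# Summarise how many nodes of one partition are in each state.
# Reads sinfo-style lines on stdin; the partition name is the first argument.
# NOTE: summarise_partition is an illustrative helper, not a Slurm command.
summarise_partition() {
    awk -v partition="$1" '
        {
            name = $1; state = $5
            sub(/\*$/, "", name)     # the default partition carries a trailing "*"
            sub(/\*$/, "", state)    # states such as "drain*" carry one as well
        }
        name == partition { count[state] += $4; total += $4 }
        END {
            # Sort the per-state lines so the output order is stable.
            for (state in count) print state, count[state] | "sort"
            close("sort")
            print "total", total + 0
        }
    '
}
```

On a login node this could be used as `sinfo --noheader | summarise_partition accel`. For the listing above it reports 4 alloc, 2 idle and 2 mix nodes, 8 in total.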

To select the correct partition, use the --partition=accel flag with either salloc (interactive) or sbatch (job script). This flag ensures that your job only runs on machines in the accel partition, which have GPUs attached. However, to actually interact with one or more GPUs we also have to add --gpus=N, which tells Slurm that we would like to use N GPUs (N can be between 1 and 4 on Saga, since each node has 4 GPUs).


There are multiple ways of requesting GPUs apart from --gpus=N, such as --gpus-per-task to specify the number of GPUs that each task should get access to. Check out the official Slurm documentation for more on how to specify the number of GPUs.
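For batch use, the same flags can be placed in a job script. The following is a minimal sketch, assuming small resource values suitable for a short test; the job name `gpu-test` and the `gpu_status` variable are illustrative choices, and the account placeholder has to be replaced with your own project number before submitting.

```shell
#!/bin/bash
# Illustrative job name (assumption, pick your own).
#SBATCH --job-name=gpu-test
#SBATCH --account=<your project number>
# Run on the GPU partition and request a single GPU.
#SBATCH --partition=accel
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=00:02:00

# Print the GPUs that are visible to this job and remember the exit code.
gpu_status=0
nvidia-smi || gpu_status=$?

if [ "$gpu_status" -ne 0 ]; then
    echo "nvidia-smi failed (exit code $gpu_status): no GPU is visible to this job" >&2
fi
```

The script would be submitted with sbatch, and the nvidia-smi listing then ends up in the job's output file.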

Interactive testing

All projects should have access to GPU resources, and to that end we will start by simply testing that we can get access to a single GPU. To do this we will run an interactive job using the salloc command, on the accel partition and asking for a single GPU:

[me@login.SAGA]$ salloc --ntasks=1 --mem-per-cpu=1G --time=00:02:00 --partition=accel --gpus=1 --qos=devel --account=<your project number>
salloc: Pending job allocation 4318997
salloc: job 4318997 queued and waiting for resources
salloc: job 4318997 has been allocated resources
salloc: Granted job allocation 4318997
salloc: Waiting for resource configuration
salloc: Nodes c7-7 are ready for job

Once we land on the compute node we can inspect the GPU hardware with the nvidia-smi command (roughly the equivalent of top for Nvidia GPUs):

[me@c7-7.SAGA]$ nvidia-smi
Wed Nov  3 14:14:47 2021       
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:14:00.0 Off |                    0 |
| N/A   33C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |

Here we can find useful things like CUDA library/driver version and the name of the graphics card (Tesla P100-PCIE...), but also information about currently running processes that are “GPU aware” (none at the moment). If you don’t get any useful information out of the nvidia-smi command (e.g. command not found or No devices were found) you likely missed the --partition=accel and/or --gpus=N options in your Slurm command, which means that you won’t actually have access to any GPU (even if there might be one physically on the machine).
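The two failure modes described above can be told apart with a small check. The following is a sketch under stated assumptions: the function name `check_gpu_access` and the wording of its messages are hypothetical, and it relies on `nvidia-smi --list-gpus` printing one line starting with `GPU ` per visible device.

```shell
# Report whether the current shell can see any Nvidia GPU.
# Returns 0 and prints one line per GPU if any are visible, 1 otherwise.
# NOTE: check_gpu_access is an illustrative helper, not part of Slurm or CUDA.
check_gpu_access() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi: command not found on this node" >&2
        return 1
    fi

    # Keep only the per-device lines; grep finds nothing if no GPU is visible.
    gpu_lines="$(nvidia-smi --list-gpus 2>/dev/null | grep '^GPU ' || true)"
    if [ -z "$gpu_lines" ]; then
        echo "No GPUs visible: check --partition=accel and --gpus=N" >&2
        return 1
    fi

    echo "$gpu_lines"
}
```

Calling `check_gpu_access` at the top of an interactive session or job script gives an early, readable error instead of a failure later in the actual computation.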


In the above Slurm specification we combined --qos=devel with GPUs and interactive operations so that we can experiment with commands interactively. This can be a good way to perform short tests to ensure that libraries correctly pick up GPUs when developing your experiments. Read more about --qos=devel in our guide on interactive jobs.

Simple GPU test runs

In the following we present a few minimal standalone code examples using different acceleration strategies and programming languages. The purpose of all these examples is the same (compile, run and verify), so you can choose the version that suits you best.

Next steps

Transitioning your application to GPU can be a daunting challenge. We have documented a few ways to get started in our development here, but if you are unsure please don’t hesitate to contact us at

We also have a few tutorials on specific GPU related topics: