Introduction to using GPU compute
A GPU, or Graphics Processing Unit, is a computational unit which, as the name suggests, is optimized for graphics tasks. Nearly every computing device we interact with contains a GPU of some sort, responsible for transforming the information we want to display into actual pixels on our screens.
One question that might immediately present itself is: if GPUs are optimized for graphics, why are they interesting as a computational resource? The full answer is of course complicated, but the short explanation is that many computational tasks have a lot in common with graphics workloads. A GPU has to process a very large number of pixels, and since the operations on these pixels are almost identical, mainly floating point arithmetic, they can be run in parallel on dedicated hardware (i.e. the GPU) that is tailored and optimized for exactly this task. This already sounds quite a bit like working on a discrete grid in, e.g., an atmospheric simulation, which hints at why GPUs can be interesting in a computational context.
Since GPUs are optimized for applying the same transformation across large grids of data, they are also well suited for matrix calculations. To get some indication of this, we can compare the theoretical performance of a single GPU with that of a single CPU:
| | AMD Epyc 7742 (Betzy) | Nvidia P100 (Saga) | Nvidia A100 (Betzy) |
|---|---|---|---|
| Half Precision | N/A | 18.7 TFLOPS | 78 TFLOPS |
| Single Precision | 1.3 TFLOPS | 9.3 TFLOPS | 19.5 TFLOPS |
| Double Precision | N/A | 4.7 TFLOPS | 9.7 TFLOPS |
Based on this it is no wonder that tensor libraries such as TensorFlow and PyTorch report speedups on accelerators of between 23x and 190x compared to using only a CPU.
Getting started
To get started we first have to SSH into Saga:
[me@mylaptop]$ ssh <username>@saga.sigma2.no
From the hardware specification we see that there should be 8 GPU nodes available on Saga, and from the available job types we identify --partition=accel as the relevant hardware partition for GPU jobs. You can run the sinfo command to check the available partitions on Saga:
[me@login.SAGA]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 7-00:00:00 88 resv c1-[1-22,24-28,30,32-37,39-52,56],c2-[1-3,6,8,10-18,21-23,29-33,36-39,43,54-56],c3-[1-2,5-6,8-10,13,15]
normal* up 7-00:00:00 5 mix c5-28,c10-41,c11-[31,57-58]
normal* up 7-00:00:00 226 alloc c1-[23,29,31,38,53-55],c2-[4-5,7,9,19-20,24-28,34-35,40-42,44-53],c3-[3-4,7,11-12,14,16-28],c5-[1-27,29-59],c10-[1-40,42-60],c11-[1-30,32-56,59-60]
bigmem up 14-00:00:0 1 drain* c6-2
bigmem up 14-00:00:0 32 mix c3-[29-35,38-56],c6-[1,3-7]
bigmem up 14-00:00:0 2 alloc c3-[36-37]
bigmem up 14-00:00:0 1 idle c6-8
accel up 14-00:00:0 2 mix c7-[3,8]
accel up 14-00:00:0 4 alloc c7-[1-2,4,6]
accel up 14-00:00:0 2 idle c7-[5,7]
optimist up infinite 1 drain* c6-2
optimist up infinite 88 resv c1-[1-22,24-28,30,32-37,39-52,56],c2-[1-3,6,8,10-18,21-23,29-33,36-39,43,54-56],c3-[1-2,5-6,8-10,13,15]
optimist up infinite 39 mix c3-[29-35,38-56],c5-28,c6-[1,3-7],c7-[3,8],c10-41,c11-[31,57-58]
optimist up infinite 232 alloc c1-[23,29,31,38,53-55],c2-[4-5,7,9,19-20,24-28,34-35,40-42,44-53],c3-[3-4,7,11-12,14,16-28,36-37],c5-[1-27,29-59],c7-[1-2,4,6],c10-[1-40,42-60],c11-[1-30,32-56,59-60]
optimist up infinite 3 idle c6-8,c7-[5,7]
Here we see that the accel partition contains 8 nodes in total, 2 of which are unused at the moment (idle), 4 are fully occupied (alloc) and 2 are partially occupied (mix). We can also read from this that the maximum time limit for a GPU job is 14 days, which might be relevant for your production calculations.
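If you are only interested in the GPU nodes, sinfo can also be limited to a single partition using its standard --partition option, for example:
[me@login.SAGA]$ sinfo --partition=accel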
To select the correct partition, use the --partition=accel flag with either salloc (interactive) or sbatch (job script). This flag ensures that your job only runs on machines in the accel partition, which have GPUs attached. However, to be able to actually interact with one or more GPUs we also have to add --gpus=N, which tells Slurm that we would like to use N GPUs (N can be a number between 1 and 4 on Saga, since each node has 4 GPUs).
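For batch jobs the same flags go into the job script for sbatch. Below is a minimal sketch of such a script; the resource numbers are only examples and the account is a placeholder, so adjust both to your own project and workload:
#!/bin/bash
#SBATCH --account=<your project number>
#SBATCH --job-name=gpu-test        # any descriptive name
#SBATCH --partition=accel          # run on the GPU partition
#SBATCH --gpus=1                   # request a single GPU
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=00:02:00

# Show which GPU(s) the job was given
nvidia-smi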
Tip
There are multiple ways of requesting GPUs apart from --gpus=N, such as --gpus-per-task, which specifies the number of GPUs that each task should get access to. Check out the official Slurm documentation for more on how to specify the number of GPUs.
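As a hedged sketch of the per-task variant, an interactive request where two tasks each get their own GPU could look like this (the resource numbers are again only examples):
[me@login.SAGA]$ salloc --ntasks=2 --gpus-per-task=1 --mem-per-cpu=1G --time=00:02:00 --partition=accel --qos=devel --account=<your project number>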
Interactive testing
All projects should have access to GPU resources, and to that end we will start by simply testing that we can get access to a single GPU. To do this we will run an interactive job with the salloc command on the accel partition, asking for a single GPU:
[me@login.SAGA]$ salloc --ntasks=1 --mem-per-cpu=1G --time=00:02:00 --partition=accel --gpus=1 --qos=devel --account=<your project number>
salloc: Pending job allocation 4318997
salloc: job 4318997 queued and waiting for resources
salloc: job 4318997 has been allocated resources
salloc: Granted job allocation 4318997
salloc: Waiting for resource configuration
salloc: Nodes c7-7 are ready for job
Once we land on the compute node we can inspect the GPU hardware with the nvidia-smi command (this is more or less the top equivalent for Nvidia GPUs):
[me@c7-8.SAGA]$ nvidia-smi
Wed Nov 3 14:14:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00 Driver Version: 455.32.00 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:14:00.0 Off | 0 |
| N/A 33C P0 30W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Here we can find useful information such as the CUDA library/driver version and the name of the graphics card (Tesla P100-PCIE...), but also a list of currently running processes that are “GPU aware” (none at the moment). If you don’t get any useful information out of the nvidia-smi command (e.g. command not found or No devices were found), you likely missed the --partition=accel and/or --gpus=N options in your Slurm command, which means that you won’t actually have access to any GPU (even if there might be one physically on the machine).
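As an extra sanity check from inside the job, you can list the visible devices explicitly with nvidia-smi -L (one line per GPU). Slurm also normally exports CUDA_VISIBLE_DEVICES for the allocated GPUs, although the exact environment depends on the site configuration, so treat the following as a sketch:
[me@c7-8.SAGA]$ nvidia-smi -L
[me@c7-8.SAGA]$ echo $CUDA_VISIBLE_DEVICES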
Tip
In the above Slurm specification we combined --qos=devel with GPUs and interactive operations so that we can experiment with commands interactively. This can be a good way to perform short tests to ensure that libraries correctly pick up GPUs when developing your experiments. Read more about --qos=devel in our guide on interactive jobs.
Simple GPU test runs
In the following we present a few minimal, standalone code examples using different acceleration strategies and programming languages. The purpose of all these examples is the same (compile, run and verify), so you can choose the version that suits you best.
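As a rough, hedged illustration of what such a workflow can look like on Saga, the steps below compile and launch a made-up CUDA source file; the module version and the file name example.cu are placeholders only, so check module avail CUDA and the individual example pages for the real details:
[me@login.SAGA]$ module load CUDA/11.1.1-GCC-10.2.0   # placeholder version, check module avail CUDA
[me@login.SAGA]$ nvcc -O2 -o example example.cu       # compile the chosen example on the login node
[me@login.SAGA]$ srun --partition=accel --gpus=1 --ntasks=1 --mem-per-cpu=1G --time=00:05:00 --account=<your project number> ./example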
Next steps
Transitioning your application to GPUs can be a daunting challenge. We have documented a few ways to get started in our development guides, but if you are unsure, please don’t hesitate to contact us at support@nris.no.
We also have a few tutorials on specific GPU-related topics: