Job Scripts on Betzy
This page documents how to specify the queue system parameters for the different job types on Betzy. See Job Types on Betzy for information about the different job types on Betzy.
The basic type of job on Betzy is the normal job. Most of the other job types are “variants” of a normal job.
Normal jobs must specify account (--account), walltime limit
(--time) and number of nodes (--nodes). The jobs can specify how
many tasks should run per node and how many cpus should be used by
each task.
A typical job specification for a normal job would be
#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=10 --ntasks-per-node=128
This will start 128 tasks (processes) on each node, one for each cpu on the node.
All normal jobs get exclusive access to whole nodes (all CPUs and memory). If a job tries to use more (resident) memory than is configured on the nodes, it will be killed. Currently, this limit is 244 GiB, but it can change. If a job requires more memory per task than 244 GiB divided by 128 tasks, the trick is to limit the number of tasks per node in the following way:
#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=10 --ntasks-per-node=16
The example above will use only 16 tasks per node, giving each task just over 15 GiB (244 GiB / 16 tasks). Note that it is the total memory usage on each node that counts, so one of the tasks can use more than 15 GiB, as long as the total usage on the node is less than 244 GiB.
To run multithreaded applications, use
--cpus-per-task to allocate
the right number of cpus to each task. For instance:
#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=4 --ntasks-per-node=4 --cpus-per-task=32
Note that setting
--cpus-per-task does not bind the tasks to the
given number of cpus for normal jobs; it merely sets
$OMP_NUM_THREADS so that OpenMP jobs by default will use the right
number of threads. (It is possible to override this number by setting
$OMP_NUM_THREADS in the job script.)
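For instance, to use 16 threads per task instead of the 32 implied by --cpus-per-task=32 above, one could put something like this in the job script before the application is launched (a sketch; the program name is a placeholder):

export OMP_NUM_THREADS=16   # override the default of 32 set from --cpus-per-task
srun ./myprog               # placeholder binary; srun starts one copy per task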
The Betzy Sample MPI Job page has an example of a normal MPI job.
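For orientation, a complete script along those lines could look roughly like this (a sketch only; the module and program names are placeholders, so consult that page for a tested example):

#!/bin/bash
#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=10 --ntasks-per-node=128

set -o errexit                # stop the script on errors
module purge                  # start with a clean module environment
module load MyProgram/1.2.3   # placeholder module for the application

srun ./myprog                 # placeholder MPI binary; one task per cpu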
Preproc jobs must specify
--partition=preproc. In addition, they
must specify wall time limit, the number of tasks and the amount of
memory per cpu. A preproc job is assigned the requested cpus
and memory exclusively, but shares nodes with other jobs. (Currently,
there is only one node in the preproc partition.) If a
preproc job tries to use more resident memory than requested, it gets
killed. The maximal wall time limit for preproc jobs is 1 day.
Here is an example that asks for 3 tasks, 4 cpus per task, and 2 GiB RAM per cpu:
#SBATCH --account=MyProject --job-name=MyJob
#SBATCH --partition=preproc
#SBATCH --time=1-0:0:0
#SBATCH --ntasks=3 --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
Note that even though the memory specification is called --mem-per-cpu, the memory limit the job gets on the node is for the total usage by all processes on the node, so in the above example, it would get a limit of 3 * 4 * 2 GiB = 12 GiB. The queue system doesn't care how the memory usage is divided between the processes or threads, as long as the total usage on the node is below the limit.
Also note that contrary to normal jobs, preproc jobs will be bound to the cpu cores they are allocated, so the above sample job will have access to 12 cores. However, the three tasks are free to use all cores the job has access to (12 in this example).
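If in doubt about which cores a job was bound to, one can print the CPU affinity from within the job script, for instance (a sketch using the standard taskset tool):

taskset -cp $$   # lists the cores this job's shell is allowed to run on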
Accel jobs are those that require GPUs to perform calculations. To
ensure that your job runs only on machines with GPUs, the
--partition=accel option must be supplied. Also, to get access to
one or more GPUs, one needs to request a number of GPUs with the
--gpus=N specification (see below for more ways to specify the GPUs
for your job). In addition, the jobs must specify wall time limit,
the number of tasks and the amount of memory per cpu or GPU.
See the preproc job section above for details about specifying tasks
and memory.
For a simple job, only requiring 1 GPU, the following example configuration could be used:
#SBATCH --account=MyProject
#SBATCH --job-name=SimpleGPUJob
#SBATCH --time=0-00:05:00
#SBATCH --mem-per-cpu=1G
#SBATCH --partition=accel
#SBATCH --gpus=1
The following example starts 2 tasks each with a single GPU. This is useful for MPI enabled jobs where each rank should be assigned a GPU.
#SBATCH --account=MyProject
#SBATCH --job-name=MPIGPUJob
#SBATCH --time=0-00:05:00
#SBATCH --mem-per-cpu=1G
#SBATCH --ntasks=2 --gpus=2
#SBATCH --partition=accel
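To verify that the tasks actually see the GPUs they were allocated, one can for instance list them from within the job (a sketch using the standard nvidia-smi tool):

srun nvidia-smi -L   # each task lists the GPU(s) visible to it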
There are other GPU related specifications that can be used, and that parallel some of the cpu related specifications. The most useful are probably:
--gpus-per-node: How many GPUs the job should have on each node.
--gpus-per-task: How many GPUs the job should have per task. Requires the use of --ntasks.
--gpus-per-socket: How many GPUs the job should have on each socket. Requires the use of --sockets-per-node.
--mem-per-gpu: How much RAM the job should have for each GPU. Can be used instead of --mem-per-cpu (but cannot be used together with it).
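As an illustration, these can be combined for a multi-node GPU job roughly like this (a sketch; the node, task and memory numbers are arbitrary examples, and MyProject is a placeholder):

#SBATCH --account=MyProject
#SBATCH --job-name=MultiGPUJob
#SBATCH --time=0-01:00:00
#SBATCH --partition=accel
#SBATCH --nodes=2 --ntasks-per-node=4
#SBATCH --gpus-per-node=4
#SBATCH --mem-per-gpu=8G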
Due to a bug in Slurm, --gpus-per-task is not working correctly on Betzy; jobs using
this option will be billed more core hours than what the job is actually using.
Users should revert to using --gpus=N on Betzy for now.
See the sbatch or srun documentation for the details, and other GPU related specifications.
(The old way of specifying GPUs:
--gres=gpu:N is still supported,
but is less flexible than the above specification.)
devel jobs must specify
--qos=devel. A devel job is like a normal
job, except that it has restrictions on job length and size.
#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --qos=devel
#SBATCH --time=00:30:00
#SBATCH --nodes=2 --ntasks-per-node=128
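Once saved to a file (here called myjob.sh for illustration), the script is submitted and monitored with the usual Slurm commands:

sbatch myjob.sh             # submit the job script; prints the job id
squeue -u $USER             # list your pending and running jobs
scontrol show job <jobid>   # detailed information about one job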