Job Scripts on Fram

This page documents how to specify the queue system parameters for the different job types on Fram. See Fram Job Types for information about the different job types on Fram.

Normal

The basic type of job on Fram is the normal job. Most of the other job types are "variants" of a normal job.

Normal jobs must specify account (--account), walltime limit (--time) and number of nodes (--nodes). The jobs can specify how many tasks should run per node and how many CPUs should be used by each task.

A typical job specification for a normal job would be

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=10 --ntasks-per-node=32

This will start 32 tasks (processes) on each node, one for each cpu on the node.

All normal jobs gets exclusive access to whole nodes (all CPUs and memory). If a job tries to use more (resident) memory than is configured on the nodes, it will be killed. Currently, this limit is 60 GiB, but it can change. If a job would require more memory per task than the given 60 GiB split by 32 tasks, the trick is to limit the number of tasks per node the following way:

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=10 --ntasks-per-node=4

This example above will use only 4 tasks per node, giving each task 15 GiB. Note that is the total memory usage on each node that counts, so one of the tasks can use more than 15 GiB, as long as the total is less than 60 GiB.

If your job needs more than 60 GiB per task, the only option on Fram is to use a bigmem job (see below).

To run multithreaded applications, use --cpus-per-task to allocate the right number of cpus to each task. For instance:

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-0:0:0
#SBATCH --nodes=4 --ntasks-per-node=2 --cpus-per-task=16

Note that setting --cpus-per-task does not bind the tasks to the given number of cpus for normal jobs; it merely sets $OMP_NUM_THREADS so that OpenMP jobs by default will use the right number of threads. (It is possible to override this number by setting $OMP_NUM_THREADS in the job script.)

The Fram Sample MPI Job page has an example of a normal MPI job.

See Fram Job Placement for optional parameters for controlling which nodes a normal job is run on.

Preproc

Preproc jobs are specified just like normal jobs, except that they also must specify --qos=preproc, and they are only allocated 1 node (so there is no need to specify --nodes). Otherwise, they are just like normal jobs.

Example of a general preproc specification (1 node, 4 tasks with 8 CPUs/task):

#SBATCH --account=MyProject --job-name=MyJob
#SBATCH --qos=preproc
#SBATCH --time=1:0:0
#SBATCH --ntasks-per-node=4 --cpus-per-task=8

Here is a simpler preproc job (one task on one node):

#SBATCH --account=MyProject --job-name=MyJob
#SBATCH --qos=preproc
#SBATCH --time=1:0:0

Bigmem

Bigmem jobs must specify --partition=bigmem. In addition, they must specify wall time limit, the number of tasks and the amount of memory memory per cpu. A bigmem job is assigned the requested cpus and memory exclusively, but shares nodes with other jobs. If a bigmem job tries to use more resident memory than requested, it gets killed. The maximal wall time limit for bigmem jobs is 14 days.

Here is an example that asks for 2 nodes, 3 tasks per node, 4 cpus per task, and 32 GiB RAM per cpu:

#SBATCH --account=MyProject --job-name=MyJob
#SBATCH --partition=bigmem
#SBATCH --time=1-0:0:0
#SBATCH --nodes=2 --ntasks-per-node=3 --cpus-per-task=4
#SBATCH --mem-per-cpu=32G

Note that even though the memory specification is called --mem-per-cpu, the memory limit the job gets on the node is for the total usage by all processes on the node, so in the above example, it would get a limit of 3 4 32 GiB = 384 GiB. The queue system doesn't care how the memory usage is divided between the processes or threads, as long as the total usage on the node is below the limit.

Also note that contrary to normal jobs, bigmem jobs will be bound to the cpu cores they are allocated, so the above sample job will have access to 12 cores on each node. However, the three tasks are free to use all cores the job has access to on the node (12 in this example).

Here is a simpler example, which only asks for 16 tasks (of 1 cpu each) and 32 GiB RAM per task; it does not care how the tasks are allocated on the nodes:

#SBATCH --account=MyProject --job-name=MyJob
#SBATCH --partition=bigmem
#SBATCH --time=1-0:0:0
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=32G

Optimist

Optimist jobs are specified just like normal jobs, except that they also must must specify --qos=optimist, and should not specify wall time limit. They run on the same nodes as normal jobs.

An optimist job can be scheduled if there are free resources at least 30 minutes when the job is considered for scheduling. However, it can be requeued before 30 minutes have passed, so there is no gurarantee of a minimum run time. When an optimist job is requeued, it is first sent a SIGTERM signal. This can be trapped in order to trigger a checkpoint. After 30 seconds, the job receives a SIGKILL signal, which cannot be trapped.

A simple optimist job specification might be:

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --partition=optimist
#SBATCH --nodes=4 --ntasks-per-node=32

Devel

devel jobs must specify --qos=devel. A devel job is like a normal job, except that it has restrictions on job length and size.

For instance:

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --qos=devel
#SBATCH --time=00:30:00
#SBATCH --nodes=2 --ntasks-per-node=32

Short

short jobs must specify --qos=short. A short job is like a normal job, except that it has restrictions on job length and size. It differs from devel jobs in that it allows somewhat longer and larger jobs, but typically have longer wait time.

For instance:

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --qos=short
#SBATCH --time=2:00:00
#SBATCH --nodes=8 --ntasks-per-node=32

results matching ""

    No results matching ""