Job Types on Fram
Fram is designed to run medium-sized parallel jobs. If you need to run serial jobs or “narrow” parallel jobs, Saga is a better choice.
Most jobs on Fram are normal jobs.
Jobs requiring a lot of memory (> 4 GiB/cpu) should run as bigmem jobs. Jobs that need only a single cpu can also run as small bigmem jobs.
Jobs that are very short, or that implement checkpointing, can run as optimist jobs. These jobs use resources that would otherwise sit idle for a short time, and they are requeued as soon as a non-optimist job needs those resources.
For development or testing, there are two job types. devel usually has the shortest wait time during office hours, but is limited to small, short jobs. short allows slightly larger and longer jobs, but will probably have longer wait times.
Here is a more detailed description of the different job types on Fram:
Normal
Allocation units: whole nodes
Job Limits:
minimum 1 node, maximum 32 nodes (can be increased)
Maximum walltime: 7 days
Priority: normal
Available resources: 996 nodes with 32 cpus and 59 GiB RAM
Parameter for sbatch/salloc:
None, normal is the default
Job Scripts: Normal
This is the default job type. Most jobs are normal jobs. Most of the other job types are “variants” of a normal job.
In normal jobs, the queue system hands out complete nodes. If a project needs more than 32 nodes per job, and the application in question can actually scale beyond 32 nodes, please send a request to support@nris.no.
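As a minimal sketch, a normal job script could look like the one below. The account nnXXXXk, the job name, and the program ./my_program are placeholders and do not come from this page; the node count and walltime are arbitrary values within the limits listed above. The snippet writes the script to normal_job.sh so it can be inspected before it is submitted.

```shell
# Write a sketch of a normal job script to normal_job.sh.
# nnXXXXk, the job name and ./my_program are placeholders.
cat > normal_job.sh <<'EOF'
#!/bin/bash
# Placeholder project account and job name:
#SBATCH --account=nnXXXXk
#SBATCH --job-name=normal_test
# 1 day of walltime, within the 7-day limit:
#SBATCH --time=1-00:00:00
# Whole nodes, within the 1 to 32 node limit, one task per cpu:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32

set -o errexit   # stop at the first failing command
set -o nounset   # treat unset variables as errors

srun ./my_program
EOF

# No --partition or --qos is given, so this would run as a normal job.
# Submit with: sbatch normal_job.sh
```

Because normal is the default, the script selects the job type simply by not asking for any other one.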
Bigmem
Allocation units: cpus and memory
Job Limits:
(none)
Maximum walltime: 14 days
Priority: normal
Available resources:
8 nodes (max 7 per user) with 32 cpus and 494 GiB RAM
Parameter for sbatch/salloc:
--partition=bigmem
Job Scripts: Bigmem
Bigmem jobs are meant for jobs that need a lot of memory (RAM), typically more than 4 GiB per cpu. (The normal nodes on Fram have slightly less than 2 GiB per cpu.)
For bigmem jobs, the queue system hands out cpus and memory, not whole nodes.
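The per-cpu figures above follow from the node sizes listed on this page, and a bigmem job requests cpus and memory rather than nodes. The sketch below first redoes that arithmetic and then writes a possible bigmem job script to bigmem_job.sh. The account nnXXXXk, the job name, the program ./my_program, and the requested sizes (4 tasks with 8 GiB each) are illustrative assumptions.

```shell
# Memory per cpu in MiB, from the node sizes given on this page.
normal_mib_per_cpu=$(( 59 * 1024 / 32 ))    # 1888 MiB, slightly under 2 GiB
bigmem_mib_per_cpu=$(( 494 * 1024 / 32 ))   # 15808 MiB, roughly 15.4 GiB
echo "normal: ${normal_mib_per_cpu} MiB/cpu, bigmem: ${bigmem_mib_per_cpu} MiB/cpu"

# Write a sketch of a bigmem job script to bigmem_job.sh.
# nnXXXXk, the job name and ./my_program are placeholders.
cat > bigmem_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=nnXXXXk
#SBATCH --job-name=bigmem_test
#SBATCH --partition=bigmem
#SBATCH --time=02:00:00
# Ask for cpus and memory, not nodes: 4 tasks x 8 GiB = 32 GiB in total.
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=8G

set -o errexit
set -o nounset

srun ./my_program
EOF

# Submit with: sbatch bigmem_job.sh
```

A request of 8 GiB per cpu is well above what a normal node can offer per cpu, which is the situation bigmem is meant for.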
Devel
Allocation units: whole nodes
Job Limits:
minimum 1 node, maximum 8 nodes per job
maximum 1 running job at a time per user
Maximum walltime: 30 minutes
Priority: high
Available resources: 8 nodes with 32 cpus and 59 GiB RAM between 07:00 and 21:00 on weekdays
Parameter for sbatch/salloc:
--qos=devel
Job Scripts: Devel
This is meant for small, short development or test jobs. Devel jobs have access to a set of dedicated nodes during the daytime on weekdays, so that the jobs start as soon as possible. In return, there are limits on the size and number of devel jobs.
If you have temporary development needs that cannot be fulfilled by the devel or short job types, please contact us at support@nris.no.
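Options given on the sbatch command line take precedence over #SBATCH lines inside the script, so an existing job script can be tried as a devel job without editing it. The sketch below only prints the command it would run. The function name devel_submit_cmd and the script name my_job.sh are made up for illustration, and the defaults (1 node, 30 minutes) are simply the smallest size and the longest walltime that devel allows.

```shell
# Print (do not run) an sbatch command that submits an existing
# job script as a devel job. Arguments: script [nodes] [walltime].
devel_submit_cmd() {
    local script="$1"
    local nodes="${2:-1}"
    local walltime="${3:-00:30:00}"
    printf 'sbatch --qos=devel --nodes=%s --time=%s %s\n' \
        "$nodes" "$walltime" "$script"
}

# Example: a 2-node, 15-minute test of a hypothetical my_job.sh.
devel_submit_cmd my_job.sh 2 00:15:00
```

Printing the command first makes it easy to check the size and walltime against the devel limits before submitting.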
Short
Allocation units: whole nodes
Job Limits:
minimum 1 node, maximum 10 nodes per job
maximum 16 nodes in use at the same time
Maximum walltime: 2 hours
Priority: high (slightly lower than devel)
Available resources: 16 nodes with 32 cpus and 59 GiB RAM (shared with normal)
Parameter for sbatch/salloc:
--qos=short
Job Scripts: Short
This is also meant for development or test jobs. It allows slightly longer and wider jobs than devel, but has slightly lower priority, and no dedicated resources. This usually results in a longer wait time than devel jobs, at least on weekdays.
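The choice between devel and short for a test run follows from the size and walltime limits listed for the two job types. The helper below is a sketch of that decision, not an official tool; the function name suggest_job_type is an assumption, and it only looks at node count and walltime, not at the per-user limits on running jobs.

```shell
# Suggest a job type for a test run, given its node count and its
# walltime in minutes, using the devel and short limits on this page.
suggest_job_type() {
    local nodes="$1"
    local minutes="$2"
    if [ "$nodes" -le 8 ] && [ "$minutes" -le 30 ]; then
        echo devel      # up to 8 nodes and 30 minutes
    elif [ "$nodes" -le 10 ] && [ "$minutes" -le 120 ]; then
        echo short      # up to 10 nodes and 2 hours
    else
        echo normal     # too large or too long for either test type
    fi
}

suggest_job_type 4 20     # fits within the devel limits
suggest_job_type 10 90    # too wide for devel, fits short
suggest_job_type 12 60    # too wide for both
```

The helper prefers devel whenever a job fits, since devel has the higher priority and dedicated nodes during the daytime on weekdays.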
Optimist
Allocation units: whole nodes
Job Limits:
minimum 1 node, maximum 32 nodes (can be increased)
Maximum walltime: none. Jobs start as soon as resources are available for at least 30 minutes, but can be requeued at any time, so there is no guaranteed minimum run time.
Priority: low
Available resources: optimist jobs run on the normal nodes.
Parameter for sbatch/salloc:
--qos=optimist
Job Scripts: Optimist
The optimist job type is meant for very short jobs, or jobs with checkpointing (i.e., they save state regularly, so they can restart from where they left off).
Optimist jobs get lower priority than other jobs, but will start as soon as there are free resources for at least 30 minutes. However, when any non-optimist job needs those resources, the optimist job is stopped and put back in the job queue. This can happen before the optimist job has run for 30 minutes, so there is no guaranteed minimum run time.
Therefore, all optimist jobs must use checkpointing, and access to the optimist job type is only given to projects that demonstrate that they can use checkpointing. If you want to run optimist jobs, send a request to support@nris.no.
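To illustrate what checkpointing means for a job script, here is a minimal sketch of an optimist job that records its progress after every step and resumes from that record when it is started again after a requeue. Everything apart from --qos=optimist is an assumption made for the example: the account nnXXXXk, the job name, the requested size and time, the file name checkpoint.txt, and the loop that stands in for real work. A real application would normally save and restore its own state.

```shell
#!/bin/bash
#SBATCH --account=nnXXXXk
#SBATCH --job-name=optimist_test
#SBATCH --qos=optimist
#SBATCH --nodes=1
#SBATCH --time=02:00:00

set -o errexit
set -o nounset

# Run steps 1..total_steps, saving the last finished step to a file.
# If the file already exists, continue after the step stored in it.
run_with_checkpoint() {
    local checkpoint="$1"
    local total_steps="$2"
    local step=0
    if [ -f "$checkpoint" ]; then
        step=$(cat "$checkpoint")
    fi
    while [ "$step" -lt "$total_steps" ]; do
        step=$((step + 1))
        # Placeholder for the real work of this step.
        echo "step $step done"
        # Write to a temporary file and rename it, so that a requeue
        # in the middle of the write cannot leave a broken checkpoint.
        echo "$step" > "$checkpoint.tmp"
        mv "$checkpoint.tmp" "$checkpoint"
    done
    echo "finished at step $step"
}

run_with_checkpoint checkpoint.txt 5
```

When a requeued job starts again, the script runs from the top, finds checkpoint.txt, and skips the steps that were already completed.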