Job Types on Fram

Fram is designed to run medium-sized parallel jobs. If you need to run serial jobs or “narrow” parallel jobs, Saga is a better choice.

Most jobs on Fram are normal jobs.

Jobs requiring a lot of memory (> 4 GiB/cpu) should run as bigmem jobs. Jobs that require only a single cpu can also run as small bigmem jobs.

Jobs that are very short, or that implement checkpointing, can run as optimist jobs. Optimist jobs use resources that would otherwise be idle for a short time, and are requeued when a non-optimist job needs those resources.

For development or testing, there are two job types. The devel type usually has the shortest wait time during office hours, but is limited to small, short jobs. The short type allows slightly larger and longer jobs, but will probably have longer wait times.

Here is a more detailed description of the different job types on Fram:

Normal

  • Allocation units: whole nodes

  • Job Limits:

    • minimum 1 node, maximum 32 nodes (can be increased)

  • Maximum walltime: 7 days

  • Priority: normal

  • Available resources: 996 nodes with 32 cpus and 59 GiB RAM

  • Parameter for sbatch/salloc:

    • None, normal is the default

  • Job Scripts: Normal

This is the default job type. Most jobs are normal jobs. Most of the other job types are “variants” of a normal job.

In normal jobs, the queue system hands out complete nodes. If a project needs more than 32 nodes per job, and the application in question can actually scale beyond 32 nodes, please send a request to support@nris.no.
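As a rough illustration of the limits above, the following Python sketch builds the header of a job script for a normal job. The helper names `slurm_time` and `normal_job_header` and the account name `nnXXXXk` are hypothetical and only serve the example; `--account`, `--nodes` and `--time` are standard sbatch options. The checks assume the default limits of 32 nodes and 7 days listed above.

```python
MAX_NODES = 32               # default limit for normal jobs (can be increased on request)
MAX_MINUTES = 7 * 24 * 60    # maximum walltime: 7 days


def slurm_time(minutes: int) -> str:
    """Format a number of minutes as days-hours:minutes:seconds for --time."""
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days}-{hours:02d}:{mins:02d}:00"


def normal_job_header(account: str, nodes: int, minutes: int) -> str:
    """Return the #SBATCH lines for a normal job (hypothetical helper)."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"normal jobs use 1 to {MAX_NODES} nodes")
    if not 0 < minutes <= MAX_MINUTES:
        raise ValueError("normal jobs can run for at most 7 days")
    # No --partition or --qos line: normal is the default job type.
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --account={account}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={slurm_time(minutes)}",
    ])


# "nnXXXXk" is a placeholder, not a real project account.
print(normal_job_header("nnXXXXk", nodes=4, minutes=120))
```

Because whole nodes are handed out, the sketch only asks for a node count and leaves per-cpu settings out.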

Bigmem

  • Allocation units: cpus and memory

  • Job Limits:

    • (none)

  • Maximum walltime: 14 days

  • Priority: normal

  • Available resources:

    • 8 nodes with 32 cpus and 494 GiB RAM

  • Parameter for sbatch/salloc:

    • --partition=bigmem

  • Job Scripts: Bigmem

Bigmem jobs are meant for jobs that need a lot of memory (RAM), typically more than 4 GiB per cpu. (The normal nodes on Fram have slightly less than 2 GiB per cpu.)

For bigmem jobs, the queue system hands out cpus and memory, not whole nodes.
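The per-cpu figures quoted above follow directly from the node sizes. The sketch below redoes that arithmetic and applies the 4 GiB per cpu rule of thumb. The function names are hypothetical. `--ntasks` and `--mem-per-cpu` are standard sbatch options, but using them together with `--partition=bigmem` in exactly this way is an assumption, not something taken from the Fram job script pages.

```python
CPUS_PER_NODE = 32
NODE_RAM_GIB = {"normal": 59, "bigmem": 494}   # node sizes from the lists above
BIGMEM_THRESHOLD_GIB = 4.0                     # "typically more than 4 GiB per cpu"


def ram_per_cpu_gib(node_type: str) -> float:
    """Memory available per cpu when all 32 cpus of a node are in use."""
    return NODE_RAM_GIB[node_type] / CPUS_PER_NODE


def should_use_bigmem(required_gib_per_cpu: float) -> bool:
    """Apply the rule of thumb from the text above."""
    return required_gib_per_cpu > BIGMEM_THRESHOLD_GIB


def bigmem_flags(ntasks: int, gib_per_cpu: int) -> list:
    """sbatch flags for a bigmem job that requests cpus and memory (a sketch)."""
    return [
        "--partition=bigmem",
        f"--ntasks={ntasks}",
        f"--mem-per-cpu={gib_per_cpu}G",
    ]


print(round(ram_per_cpu_gib("normal"), 2))   # slightly less than 2 GiB per cpu
print(round(ram_per_cpu_gib("bigmem"), 2))
print(should_use_bigmem(8), bigmem_flags(ntasks=4, gib_per_cpu=8))
```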

Devel

  • Allocation units: whole nodes

  • Job Limits:

    • minimum 1 node, maximum 8 nodes per job

    • maximum 1 running job at a time per user

  • Maximum walltime: 30 minutes

  • Priority: high

  • Available resources: 8 nodes with 32 cpus and 59 GiB RAM between 07:00 and 21:00 on weekdays

  • Parameter for sbatch/salloc:

    • --qos=devel

  • Job Scripts: Devel

This is meant for small, short development or test jobs. Devel jobs have access to a set of dedicated nodes during the daytime on weekdays, so that the jobs start as soon as possible. On the other hand, there are limits on the size and number of devel jobs.

If you have temporary development needs that cannot be fulfilled by the devel or short job types, please contact us at support@nris.no.
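To make the availability window of the dedicated devel nodes concrete, here is a small sketch that checks whether a given time falls inside it. The function name is hypothetical. The sketch assumes that the time is given in the cluster's local time and that the window ends at 21:00 sharp; neither detail is stated above.

```python
from datetime import datetime


def devel_nodes_dedicated(when: datetime) -> bool:
    """True if `when` is on a weekday between 07:00 and 21:00.

    Assumes `when` is expressed in the cluster's local time.
    """
    is_weekday = when.weekday() < 5      # Monday is 0, Friday is 4
    in_daytime = 7 <= when.hour < 21     # 07:00 included, 21:00 excluded (assumption)
    return is_weekday and in_daytime


print(devel_nodes_dedicated(datetime(2024, 1, 3, 10, 0)))   # a Wednesday morning
print(devel_nodes_dedicated(datetime(2024, 1, 6, 10, 0)))   # a Saturday morning
```

The result only says whether the dedicated nodes are available to devel jobs at that time; it does not predict how quickly a job will start.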

Short

  • Allocation units: whole nodes

  • Job Limits:

    • minimum 1 node, maximum 10 nodes per job

    • maximum 16 nodes in use at the same time

  • Maximum walltime: 2 hours

  • Priority: high (slightly lower than devel)

  • Available resources: 16 nodes with 32 cpus and 59 GiB RAM (shared with normal)

  • Parameter for sbatch/salloc:

    • --qos=short

  • Job Scripts: Short

This is also meant for development or test jobs. It allows slightly longer and wider jobs than devel, but has slightly lower priority, and no dedicated resources. This usually results in a longer wait time than devel jobs, at least on work days.
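The per-job limits of the two development job types can be compared with a short sketch like the one below. The table and the function name are hypothetical helpers. Only the per-job node and walltime limits are checked, not the limit of one running devel job per user or the 16 nodes in use at the same time for short.

```python
# Per-job limits taken from the devel and short descriptions above.
DEV_LIMITS = {
    "devel": {"max_nodes": 8, "max_minutes": 30, "flag": "--qos=devel"},
    "short": {"max_nodes": 10, "max_minutes": 120, "flag": "--qos=short"},
}


def fitting_dev_job_types(nodes: int, minutes: int) -> list:
    """Return the development job types whose per-job limits allow this job."""
    return [
        name
        for name, limits in DEV_LIMITS.items()
        if 1 <= nodes <= limits["max_nodes"] and 0 < minutes <= limits["max_minutes"]
    ]


for nodes, minutes in [(2, 20), (10, 90), (12, 10)]:
    names = fitting_dev_job_types(nodes, minutes)
    flags = [DEV_LIMITS[name]["flag"] for name in names]
    print(nodes, "nodes,", minutes, "minutes:", flags or "neither devel nor short")
```

When both types fit, devel is listed first because it has the higher priority and the dedicated nodes.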

Optimist

  • Allocation units: whole nodes

  • Job Limits:

    • minimum 1 node, maximum 32 nodes (can be increased)

  • Maximum walltime: None. The jobs will start as soon as resources are available for at least 30 minutes, but can be requeued at any time, so there is no guaranteed minimum run time.

  • Priority: low

  • Available resources: optimist jobs run on the normal nodes.

  • Parameter for sbatch/salloc:

    • --qos=optimist

  • Job Scripts: Optimist

The optimist job type is meant for very short jobs, or jobs with checkpointing (i.e., they save state regularly, so they can restart from where they left off).

Optimist jobs get lower priority than other jobs, but will start as soon as there are free resources for at least 30 minutes. However, when a non-optimist job needs those resources, the optimist job is stopped and put back on the job queue. This can happen before the optimist job has run for 30 minutes, so there is no guaranteed minimum run time.

Therefore, all optimist jobs must use checkpointing, and access to run optimist jobs will only be given to projects that demonstrate that they can use checkpointing. If you want to run optimist jobs, send a request to support@nris.no.
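Checkpointing, as described above, means saving state regularly so that a requeued job can continue from where it left off. The following is a minimal, generic sketch of that pattern, not a Fram-specific recipe. The function name, the JSON state file and the `stop_after` parameter (which only simulates a requeue) are assumptions made for the example.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, stop_after=None):
    """Sum the step numbers 0..total_steps-1, saving the state after every step."""
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved state instead of starting over.
        with open(checkpoint_path) as fh:
            state = json.load(fh)

    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state                     # simulated requeue
        state["total"] += state["step"]      # the actual "work"
        state["step"] += 1
        done_this_run += 1
        # Write to a temporary file first, then rename, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)
    return state


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "state.json")
    first = run_with_checkpoints(10, path, stop_after=3)   # interrupted after 3 steps
    second = run_with_checkpoints(10, path)                # resumes at step 3
print(first, second)
```

A real application would checkpoint less often than every step and would store its own data, but the load, work, save cycle stays the same.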