Job Types on Saga

Saga is designed to run serial and small ("narrow") parallel jobs, in addition to GPU jobs. If you need to run "wider" parallel jobs, Fram is a better choice.

The basic allocation units on Saga are cpu and memory.

Most jobs on Saga are normal jobs.

Jobs requiring a lot of memory (> 8 GiB/cpu) should run as bigmem jobs.

Jobs that are very short, or that implement checkpointing, can run as optimist jobs. Optimist jobs use resources that would otherwise sit idle, but they are stopped and requeued as soon as a non-optimist job needs those resources.

For development or testing, use a devel job.
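As a rough illustration, the rules of thumb above can be written as a small Python sketch. The function name, its parameters and the order of the checks are assumptions made for this example, not Saga policy; only the job type names and the 8 GiB per cpu threshold come from this page.

```python
def suggest_job_type(mem_per_cpu_gib, gpus=0, testing=False, requeue_ok=False):
    """Suggest a Saga job type from the rules of thumb on this page.

    Hypothetical helper: the order of the checks is an assumption.
    requeue_ok means the job is very short or uses checkpointing,
    so it can tolerate being stopped and requeued.
    """
    if testing:
        return "devel"      # development or test runs
    if gpus > 0:
        return "accel"      # jobs that need GPUs
    if mem_per_cpu_gib > 8:
        return "bigmem"     # more than 8 GiB of memory per cpu
    if requeue_ok:
        return "optimist"   # may run on idle resources, can be requeued
    return "normal"         # the default job type


print(suggest_job_type(4))
print(suggest_job_type(16))
print(suggest_job_type(4, gpus=1))
```

The sketch only suggests a starting point; the detailed descriptions of each job type list the actual limits.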

Here is a more detailed description of the different job types on Saga:

Normal

  • Allocation units: cpus and memory
  • Job Limits:
    • maximum 256 cpus
  • Maximum walltime: 7 days
  • Priority: normal
  • Available resources: 200 nodes with 40 cpus and 186 GiB RAM, in total 8000 cpus and 36.4 TiB RAM.
  • Job Scripts: Saga Normal Job Scripts

This is the default job type. Most jobs are normal jobs.

Bigmem

  • Allocation units: cpus and memory
  • Job Limits:
    • (none)
  • Maximum walltime: 14 days
  • Priority: normal
  • Available resources:
    • 28 nodes with 40 cpus and 377 GiB RAM
    • 8 nodes with 64 cpus and 3021 GiB RAM
    • In total 1632 cpus and 33.9 TiB RAM.
  • Job Scripts: Saga bigmem Job Scripts

Bigmem jobs are meant for jobs that need a lot of memory (RAM), typically more than 8 GiB per cpu. (The normal nodes on Saga have slightly more than 4.5 GiB per cpu.)
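The memory-per-cpu figures follow from the node specifications listed on this page. The short Python sketch below redoes that arithmetic; the dictionary and function names are hypothetical, and the node counts and sizes are copied from the lists above. It gives roughly 4.65 GiB per cpu for the 186 GiB normal nodes, and about 9.4 and 47.2 GiB per cpu for the two kinds of bigmem node.

```python
# (number of nodes, cpus per node, GiB RAM per node), copied from this page
NODE_TYPES = {
    "normal": (200, 40, 186),
    "bigmem, 377 GiB": (28, 40, 377),
    "bigmem, 3021 GiB": (8, 64, 3021),
}


def mem_per_cpu_gib(cpus_per_node, ram_gib_per_node):
    """Memory available per cpu if a node's RAM is shared evenly."""
    return ram_gib_per_node / cpus_per_node


for name, (nodes, cpus, ram) in NODE_TYPES.items():
    print(f"{name}: {mem_per_cpu_gib(cpus, ram):.3f} GiB per cpu")

# Totals for the bigmem nodes
bigmem_cpus = 28 * 40 + 8 * 64
bigmem_ram_tib = (28 * 377 + 8 * 3021) / 1024
print(bigmem_cpus, "cpus,", round(bigmem_ram_tib, 1), "TiB RAM")
```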

Accel

  • Allocation units: cpus, memory and GPUs
  • Job Limits:
    • (none)
  • Maximum walltime: 14 days
  • Priority: normal
  • Available resources: 8 nodes with 24 cpus, 377 GiB RAM and 4 GPUs, in total 192 cpus, 3019 GiB RAM and 32 GPUs.
  • Job Scripts: Saga accel Job Scripts

Accel jobs give access to the GPUs.

Optimist

  • Allocation units: cpus and memory
  • Job Limits:
    • maximum 256 cpus
  • Maximum walltime: None. The jobs will start as soon as resources are available for at least 30 minutes, but can be requeued at any time, so there is no guaranteed minimum run time.
  • Priority: low
  • Available resources: optimist jobs can run on any node on Saga
  • Job Scripts: Saga optimist Job Scripts

The optimist job type is meant for very short jobs, or jobs with checkpointing (i.e., they save state regularly, so they can restart from where they left off).

Optimist jobs get lower priority than other jobs, but will start as soon as there are free resources for at least 30 minutes. However, as soon as a non-optimist job needs those resources, the optimist job is stopped and put back in the job queue. This can happen before the optimist job has run for 30 minutes, so there is no guaranteed minimum run time.

Therefore, all optimist jobs must use checkpointing, and access to run optimist jobs will only be given to projects that demonstrate that they can use checkpointing. If you want to run optimist jobs, send a request to support@metacenter.no.
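To make the idea of checkpointing concrete, here is a minimal Python sketch of a job that saves its state regularly and resumes from the last saved state when it is restarted. It is an illustration only: the function names, the JSON file format and the toy workload (summing integers) are assumptions for this example, and real applications usually rely on their own checkpoint mechanism.

```python
import json
import os


def save_checkpoint(state, path):
    """Write the state to a temporary file, then rename it into place.

    The rename is atomic, so a job stopped in the middle of a write
    never leaves a half-written checkpoint behind.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_with_checkpoint(total_steps, checkpoint_path, save_every=10):
    """Sum the integers 0..total_steps-1, resuming from a checkpoint if one exists."""
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # continue where the previous run stopped

    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, checkpoint_path)

    save_checkpoint(state, checkpoint_path)
    return state["total"]
```

If the job is requeued, the next run loses at most `save_every` steps of work. Choosing that interval is a trade-off between the time spent writing checkpoints and the amount of work repeated after a restart.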

Devel

  • Allocation units: cpus, memory and GPUs
  • Job Limits:
    • maximum 128 cpus per job
    • maximum 256 cpus in use at the same time
    • maximum 2 running jobs per user
  • Maximum walltime: 2 hours
  • Priority: high
  • Available resources: devel jobs can run on any node on Saga
  • Job Scripts: Saga devel Job Scripts

The devel job type is meant for small, short development or test jobs. Devel jobs get higher priority so that they start as soon as possible. In return, there are limits on the size and number of devel jobs.

If you have temporary development needs that cannot be fulfilled by the devel or short job types, please contact us at support@metacenter.no.
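The per-job limits listed for each job type on this page can be collected in a small table and checked before submitting. The Python sketch below is one way to do that; the names are hypothetical, the "maximum 256 cpus" limit for normal and optimist jobs is treated as a per-job cap (an assumption), and the devel limits on concurrent cpus and running jobs are not modelled.

```python
# Limits copied from the job type descriptions on this page.
# None means no limit is listed. Walltime is in hours.
LIMITS = {
    "normal":   {"max_cpus": 256,  "max_hours": 7 * 24},
    "bigmem":   {"max_cpus": None, "max_hours": 14 * 24},
    "accel":    {"max_cpus": None, "max_hours": 14 * 24},
    "optimist": {"max_cpus": 256,  "max_hours": None},
    "devel":    {"max_cpus": 128,  "max_hours": 2},
}


def check_request(job_type, cpus, walltime_hours):
    """Return a list of limit violations; an empty list means the request fits."""
    limits = LIMITS[job_type]
    problems = []
    if limits["max_cpus"] is not None and cpus > limits["max_cpus"]:
        problems.append(f"{cpus} cpus exceeds the {limits['max_cpus']} cpu limit")
    if limits["max_hours"] is not None and walltime_hours > limits["max_hours"]:
        problems.append(
            f"{walltime_hours} h exceeds the {limits['max_hours']} h walltime limit"
        )
    return problems


# A devel request that is too large and too long reports both problems.
print(check_request("devel", cpus=200, walltime_hours=3))
```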
