Job States

Commands such as squeue, sacct and scontrol show job report the state of each job. All job states are explained in the JOB STATE CODES section of the squeue documentation page.

Here is a table with the most common states:

| Name | Long name | Description |
| --- | --- | --- |
| PD | Pending | Job is waiting to be started |
| CF | Configuring | The queue system is starting up the job |
| R | Running | Job is running |
| CG | Completing | Job is finishing |
| CD | Completed | Job has finished |
| CA | Cancelled | Job has been cancelled, either before or after it started |
| F | Failed | Job has exited with a non-zero exit status |
| TO | Timeout | Job did not finish within its time limit and was cancelled |
| PR | Preempted | Job was requeued because a higher priority job needed the resources |
| NF | Node_fail | Job was requeued because of a problem with one of its compute nodes |
| OOM | Out_of_memory | Job was cancelled because it tried to use too much memory |
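
As an illustration, the state codes above can be turned into a quick per-state summary of your jobs. The following is a minimal sketch, not an official tool: it assumes the text comes from `squeue -h -u <user> -o "%i %t"` (job ID and compact state code, no header), and the helper names `parse_states` and `query_my_jobs` as well as the sample job IDs are made up for this example.

```python
import subprocess
from collections import Counter

# Short state codes and long names, copied from the table above.
STATE_NAMES = {
    "PD": "Pending",
    "CF": "Configuring",
    "R": "Running",
    "CG": "Completing",
    "CD": "Completed",
    "CA": "Cancelled",
    "F": "Failed",
    "TO": "Timeout",
    "PR": "Preempted",
    "NF": "Node_fail",
    "OOM": "Out_of_memory",
}


def parse_states(squeue_output):
    """Parse lines of '<jobid> <state code>' into (jobid, long name) pairs."""
    jobs = []
    for line in squeue_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        jobid, code = fields[0], fields[1]
        # Fall back to the raw code for states not listed in the table.
        jobs.append((jobid, STATE_NAMES.get(code, code)))
    return jobs


def query_my_jobs(user):
    """Run squeue for one user; only works on a machine with Slurm installed."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i %t"],
        capture_output=True, text=True, check=True,
    )
    return parse_states(result.stdout)


# Made-up sample output, used here instead of a live squeue call.
sample = "1001 R\n1002 PD\n1003 PD\n1004 CG\n"
summary = Counter(name for _, name in parse_states(sample))
for name, count in summary.items():
    print(f"{name}: {count}")
```

On a login node, `query_my_jobs("your_username")` would replace the sample text with live data. Keeping the parsing separate from the squeue call makes the sketch easy to try without access to the cluster.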

These commands can also report the reason why a job is in its current state. This is most useful for pending jobs. All these reasons are explained in the JOB REASON CODES section of the squeue documentation page.

Here is a table with the most common reasons:

| Name | Description |
| --- | --- |
| Resources | The job is waiting for resources to become idle |
| Priority | There are jobs with higher priority than this job. The job might still be started if that does not delay any of those jobs |
| AssocGrpCPUMinutesLimit | (On Fram) There are not enough hours left on the quota to start the job |
| AssocBillingMinutes | (On Saga) There are not enough hours left on the quota to start the job |
| ReqNodeNotAvail | One or more of the job’s required nodes are currently not available, typically because they are down or reserved |
| Dependency | The job is waiting for jobs it depends on to start or finish |
| JobHeldUser | The job has been put on hold by the user |
| JobHeldAdmin | The job has been put on hold by an admin. Please contact support if you don’t know why it is being held |
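
To see at a glance why pending jobs are waiting, the reasons can be grouped and paired with a short hint. This is a rough sketch under stated assumptions: the input is taken to be the output of `squeue -h -t PD -o "%i %r"` (job ID and reason for pending jobs), the hint texts are paraphrased from the table above, and the function name `group_by_reason` and the sample job IDs are hypothetical.

```python
from collections import defaultdict

# Short explanations, paraphrased from the table above.
REASON_HINTS = {
    "Resources": "waiting for resources to become idle",
    "Priority": "queued behind higher-priority jobs",
    "AssocGrpCPUMinutesLimit": "not enough hours left on the quota (Fram)",
    "AssocBillingMinutes": "not enough hours left on the quota (Saga)",
    "ReqNodeNotAvail": "a required node is down or reserved",
    "Dependency": "waiting for a job it depends on",
    "JobHeldUser": "held by the user",
    "JobHeldAdmin": "held by an admin; contact support if unexpected",
}


def group_by_reason(squeue_output):
    """Group job IDs by reason from lines of '<jobid> <reason>'."""
    groups = defaultdict(list)
    for line in squeue_output.splitlines():
        fields = line.split(maxsplit=1)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        jobid, reason = fields
        # Assumption: extra detail (such as a node list) may follow the
        # reason after a comma; keep only the reason name itself.
        reason = reason.split(",", 1)[0].strip()
        groups[reason].append(jobid)
    return dict(groups)


# Made-up sample output, used here instead of a live squeue call.
sample = "2001 Priority\n2002 Resources\n2003 Priority\n2004 JobHeldUser\n"
for reason, jobids in group_by_reason(sample).items():
    hint = REASON_HINTS.get(
        reason, "see JOB REASON CODES in the squeue documentation"
    )
    print(f"{reason} ({len(jobids)} job(s)): {hint}")
```

Reasons that are not in the table fall through to a pointer to the full list, so the sketch does not hide codes it does not know about.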