Job States

Commands like `squeue`, `sacct` and `scontrol show job` show the state of each job. All job states are explained in the JOB STATE CODES section of the `squeue` documentation page.

Here is a table with the most common ones:

| Name | Long name | Description |
|---|---|---|
| PD | Pending | Job is waiting to be started |
| CF | Configuring | The queue system is starting up the job |
| R | Running | Job is running |
| CG | Completing | Job is finishing |
| CD | Completed | Job has finished |
| CA | Cancelled | Job has been cancelled, either before or after it started |
| F | Failed | Job has exited with a non-zero exit status |
| TO | Timeout | Job didn’t finish in time, and was cancelled |
| PR | Preempted | Job was requeued because a higher priority job needed the resources |
| NF | Node_fail | Job was requeued because of a problem with one of its compute nodes |
| OOM | Out_of_memory | Job was cancelled because it tried to use too much memory |

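
As a minimal sketch, the states in the table above can be displayed with the three commands roughly as follows. The function names are hypothetical helpers, not part of Slurm; the `squeue` format specifiers used are `%i` (job ID), `%j` (job name), `%t` (short state name) and `%T` (long state name).

```shell
# Short (%t) and long (%T) state of your own pending and running jobs.
my_job_states() {
    squeue -u "$USER" -o "%.12i %.20j %.3t %T"
}

# State and exit code of a job, including one that has already finished.
# -X limits the output to the job allocation itself (no job steps).
finished_job_state() {
    sacct -X -j "$1" --format=JobID,JobName,State,ExitCode
}

# Read `scontrol show job <jobid>` output on stdin and print only the state.
# Assumes the output contains a field of the form JobState=<STATE>.
extract_job_state() {
    grep -o 'JobState=[A-Z_]*' | cut -d= -f2
}
```

For example, with a made-up job ID, `finished_job_state 1234567` prints the final state of that job, and `scontrol show job 1234567 | extract_job_state` prints just its current state.
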
These commands can also give the reason why a job is in its current state. This is most useful for pending jobs. All reasons are explained in the JOB REASON CODES section of the `squeue` documentation page.

Here is a table with the most common ones:

| Name | Description |
|---|---|
| Resources | The job is waiting for resources to become idle |
| Priority | There are jobs with higher priority than this job. The job might still be started if it does not delay any of those jobs |
| AssocGrpBillingMinutes | There are not enough hours left on the quota to start the job |
| ReqNodeNotAvail | One or more of the job’s required nodes is currently not available, typically because it is down or reserved |
| Dependency | The job is waiting for jobs it depends on to start or finish |
| JobHeldUser | The job has been put on hold by the user |
| JobHeldAdmin | The job has been put on hold by an admin. Please contact support if you don’t know why it is being held |

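
To see which of these reasons applies to your own pending jobs, something like the following sketch may help. The function names are hypothetical helpers; the `squeue` options used are `-t PD` (only pending jobs), `-h` (no header line) and the format specifier `%r` (reason).

```shell
# List your pending jobs with the reason each one is waiting.
# %i = job ID, %j = job name, %r = reason.
my_pending_jobs() {
    squeue -u "$USER" -t PD -o "%.12i %.20j %r"
}

# Print only the reason of each of your pending jobs, one per line.
my_pending_reasons() {
    squeue -u "$USER" -t PD -h -o "%r"
}

# Count how often each reason occurs.
# Reads one reason per line on stdin, prints "<count> <reason>",
# most frequent reason first.
count_reasons() {
    awk '{ n[$0]++ } END { for (r in n) print n[r], r }' | sort -rn
}
```

Running `my_pending_reasons | count_reasons` then gives a quick overview, which is handy when many jobs are queued at once.
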