Monitoring jobs
How to check whether your job is running
To check the status of all your jobs, use squeue:
squeue --me
For an explanation of the output of this command, see squeue output examples.
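If the default columns are more than you need, squeue's --format option can restrict the output. The following is a minimal sketch, assuming a POSIX shell; the function name my_jobs_brief and the choice of columns are illustrative, not part of the original documentation.

```shell
# Hypothetical helper: list your own jobs with a reduced set of columns.
#   %i = job id       %j = job name     %T = job state (long form)
#   %M = time used    %R = reason (pending jobs) or node list (running jobs)
my_jobs_brief() {
    squeue --me --format="%i %j %T %M %R"
}
```

Calling my_jobs_brief on a login node prints one line per job, which can be handy in scripts that only need the id and state.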
You can also get a quick view of the status of a single job with
squeue -j JobId
where JobId is the job id number that sbatch returns. To see more details about a job, use
scontrol show job JobId
Both commands show the job state (the ST column in squeue output, the JobState field in scontrol output) and, for a pending job, the reason why it is waiting. Job States describes a few of the more common ones.
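To pick a single value such as the state or the pending reason out of the scontrol output, a small filter can help. This is a sketch, assuming the usual Key=Value layout of scontrol show job output; the helper name job_field is hypothetical, and values that contain spaces (for example some job names) would be cut at the first space.

```shell
# Hypothetical helper: read `scontrol show job` output on stdin and print
# the value of one Key=Value field (for example JobState or Reason).
job_field() {
    # Put every Key=Value pair on its own line, then keep the first match.
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Usage (on a system with Slurm):
#   scontrol show job JobId | job_field JobState
#   scontrol show job JobId | job_field Reason
```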
While a job is running, it is possible to view some of its usage statistics with the sstat command, and after it has finished, sacct will give you similar information:
sstat -j JobId
sacct -j JobId
Both sstat and sacct have an option --format to select which fields to show. See the documentation of the commands for the available fields and what they mean.
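As an illustration of --format, the sketch below wraps both commands with a fixed field list. The function and variable names are hypothetical, and the selected fields are only one reasonable choice among those the commands document.

```shell
# Fields requested from each command; adjust to your needs.
SSTAT_FIELDS="JobID,AveCPU,MaxRSS"
SACCT_FIELDS="JobID,JobName,State,Elapsed,MaxRSS,ExitCode"

# Hypothetical helper: usage statistics of a job that is still running.
running_job_usage() {
    sstat -j "$1" --format="$SSTAT_FIELDS"
}

# Hypothetical helper: accounting summary of a job that has finished.
finished_job_usage() {
    sacct -j "$1" --format="$SACCT_FIELDS"
}
```

For example, finished_job_usage JobId prints one row per job step with the state, elapsed time, peak memory and exit code.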
When a job has finished, the output file slurm-JobId.out will contain some usage statistics from sstat and sacct.
Cancelling jobs and putting jobs on hold
You can cancel running or pending (waiting) jobs with scancel. For instance:
scancel JobId # Cancel job with id JobId (as returned from sbatch)
scancel --user=MyUsername # Cancel all your jobs
scancel --account=MyProject # Cancel all jobs in MyProject
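scancel can also narrow the selection, for instance by job state or job name. The following sketch assumes the --state and --name options of scancel; the function names are hypothetical and the user name is passed in explicitly.

```shell
# Hypothetical helper: cancel only the jobs of a user that are still waiting,
# leaving running jobs untouched.
cancel_pending_jobs() {
    scancel --user="$1" --state=PENDING
}

# Hypothetical helper: cancel all jobs of a user with a given job name.
cancel_jobs_named() {
    scancel --user="$1" --name="$2"
}

# Usage (on a system with Slurm):
#   cancel_pending_jobs MyUsername
#   cancel_jobs_named MyUsername MyJobName
```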
The command scontrol can be used to further control pending or running jobs:
scontrol requeue JobId: Requeue a running job. The job will be stopped, and its state changed to pending.
scontrol hold JobId: Hold a pending job. This prevents the queue system from starting the job. The job reason will be set to JobHeldUser.
scontrol release JobId: Release a held job. This allows the queue system to start the job.
It is also possible to submit a job and put it on hold immediately with sbatch --hold JobScript.
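Submitting on hold and releasing later can be combined in a script. This is a minimal sketch, assuming sbatch's --parsable option to capture the job id; the function name submit_held is hypothetical.

```shell
# Hypothetical helper: submit a job script on hold and print its job id.
submit_held() {
    # --parsable makes sbatch print only the job id, followed by
    # ";clustername" on multi-cluster setups, which cut strips off.
    sbatch --hold --parsable "$1" | cut -d ';' -f 1
}

# Usage (on a system with Slurm):
#   jobid=$(submit_held JobScript)
#   scontrol release "$jobid"    # allow the queue system to start the job
```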