Useful Slurm commands and tools for managing jobs
Slurm, the job scheduler used on the HPC clusters, has a number of useful commands for managing jobs. Here is a growing collection of useful Slurm commands. Slurm commands can also be found in the official Slurm documentation.
Commands for sbatch
There are two ways of giving sbatch an option. One way is to include the option in the job script by adding #SBATCH before it (just as you would for the required sbatch options such as #SBATCH --nodes). The other way is to give the option on the command line when submitting the job script. For example, to use the option --test-only, you would submit the job with sbatch --test-only job_script.sh.
The --test-only option validates the script, reports any missing information (misspelled input files, invalid arguments, etc.) and gives an estimate of when the job would start running. It does not actually submit the job to the queue.
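The two ways of passing an option to sbatch can be sketched with a minimal job script. This is only an illustration: the job name, node count and time limit are placeholder values, and a real job on the clusters may need further required options that are not shown here.

```shell
#!/bin/bash
# job_script.sh: a minimal sketch; all values below are placeholders.
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --time=00:05:00

# bash treats the #SBATCH lines above as ordinary comments; only sbatch reads them.
message="job payload would run here"
echo "$message"

# The same kind of option can instead be given at submission time, for example:
#   sbatch --test-only job_script.sh
```

The #SBATCH lines must come before the first executable command in the script, since sbatch stops reading them after that point. An option given on the command line takes precedence over the same option set in the script.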
A job on Fram or Saga can request a scratch area on local disk on the node it is running on to speed up I/O-intensive jobs. This option is not useful for jobs running on more than one node. Currently, there is no special command that copies files back automatically, so you have to do that with cp or similar commands in the job script. More information on using this option is found here: Job scratch area on local disk.
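The manual copy-back step could look roughly like the sketch below. The variable name SCRATCH_DIR is hypothetical and stands in for whatever path the cluster provides for the local scratch area; the sketch falls back to a temporary directory so it can be tried outside a job. SLURM_SUBMIT_DIR is the standard Slurm variable holding the directory the job was submitted from.

```shell
#!/bin/bash
# Sketch: work in a scratch directory, then copy the results back by hand.

# Assumption: SCRATCH_DIR is a hypothetical name for the scratch path;
# outside a job, a temporary directory is used instead.
scratch="${SCRATCH_DIR:-$(mktemp -d)}"
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"

# Do the I/O-heavy work inside the scratch area (placeholder command).
cd "$scratch"
echo "result data" > output.txt

# Nothing is copied back automatically, so copy explicitly before the job ends.
cp output.txt "$submit_dir/"
```

Putting the cp step at the very end of the job script means results are lost if the job is killed earlier, for instance when it hits its time limit, so it can be worth copying intermediate results back as they are produced.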
Other Slurm commands
Job statistics can be found with sstat for running jobs and with sacct for completed jobs. On the command line, use either command with the option -j followed by the job ID number.
$ sstat -j JobId
$ sacct -j JobId
sacct has an option --format to select which fields to show. See the official Slurm documentation on sstat and on sacct for details.
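As a rough illustration of --format, the sketch below asks sacct for a handful of fields. The field names chosen here (JobID, JobName, Elapsed, MaxRSS, State) are one possible selection, not a recommendation from this page, and the job ID is simply the one used in the seff example. The guard around the call is only there so the snippet does not fail on a machine without Slurm.

```shell
#!/bin/bash
# Sketch: select specific accounting fields for one job.
jobid=4200691                                # example job ID, replace with your own
fields="JobID,JobName,Elapsed,MaxRSS,State"  # illustrative field selection

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$jobid" --format="$fields"
else
    # Without Slurm installed, just show the command that would be run.
    echo "sacct -j $jobid --format=$fields"
fi
```

Running sacct --helpformat prints the list of field names that --format accepts.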
Here is a growing collection of useful tools available on the command line.
seff is a nice tool which we can use on completed jobs. For example, here we ask for a summary of job number 4200691:
$ seff 4200691
Job ID: 4200691
Cluster: saga
User/Group: user/user
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:01
CPU Efficiency: 2.70% of 00:00:37 core-walltime
Job Wall-clock time: 00:00:37
Memory Utilized: 3.06 GB
Memory Efficiency: 89.58% of 3.42 GB
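If the efficiency figures are needed in a script, they can be pulled out of the seff summary with standard text tools. The sketch below works on the sample output shown above, stored in a variable so that it runs without Slurm; on a cluster the text would come from seff itself. It assumes the label wording matches the sample, which may differ between seff versions.

```shell
#!/bin/bash
# Sample seff output copied from the example; on a cluster one could use
# seff_output=$(seff 4200691) instead.
seff_output='Job ID: 4200691
Cluster: saga
User/Group: user/user
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:01
CPU Efficiency: 2.70% of 00:00:37 core-walltime
Job Wall-clock time: 00:00:37
Memory Utilized: 3.06 GB
Memory Efficiency: 89.58% of 3.42 GB'

# Take the text after "Label: " and keep only the number before the % sign.
cpu_eff=$(printf '%s\n' "$seff_output" | awk -F': ' '/^CPU Efficiency/ {print $2}' | cut -d'%' -f1)
mem_eff=$(printf '%s\n' "$seff_output" | awk -F': ' '/^Memory Efficiency/ {print $2}' | cut -d'%' -f1)

echo "CPU efficiency: ${cpu_eff}%  Memory efficiency: ${mem_eff}%"
```

For this sample the two values are 2.70 and 89.58: the job used its memory request well but spent almost all of its wall-clock time without using the CPU.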
Slurm Job Script Generator
A tool for generating Slurm job scripts tailored for our HPC clusters: Slurm Job Script Generator