Submitting jobs
The HPC clusters are resources that are shared between many users, and to ensure fair use everyone must do their computations by submitting jobs through a queue system (batch system) that will execute the applications on the available resources. In our case Slurm is used as workload manager and job scheduler.
When you log in to a cluster, you are logged in to a login node shared by all users. The login nodes are meant for logging in, copying files, editing, compiling, running short tests (no more than a couple of minutes), submitting jobs, checking job status, etc. If you are unsure about the basic interaction with Unix-like systems, here is a good resource to start with. Jobs started via Slurm run on the compute nodes.
Note that it is not allowed to run jobs directly on the login nodes.
Note
Email notification from completed Slurm scripts is currently disabled on all machines, and it will likely take quite a while (possibly months) before we can re-enable it. The reason is technical and stems from the way the infrastructure is set up: it is non-trivial for us to re-enable this in a robust and secure way. Sorry for the inconvenience.
Jobs
It is possible to run commands interactively on the cluster, which can be a good way to test your commands, or work with interactive applications like MATLAB. See interactive for more details. However, the normal way to run a computation on the cluster is to submit a job script to a job queue; the job is started when one or more suitable compute nodes are available.
Job scripts are submitted with the sbatch command:
sbatch YourJobscript
The sbatch command returns a jobid, a number that identifies the
submitted job. The job will be waiting in the job queue until there
are free compute resources it can use. A job in that state is said to
be pending (PD). When it has started, it is called running (R).
Any output (stdout or stderr) of the job script will be written to a
file called slurm-<jobid>.out in the directory where you ran
sbatch, unless otherwise specified.
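When scripting around sbatch, it can be handy to capture the jobid, for example to work out the name of the default output file. The following is a minimal sketch, assuming sbatch prints its usual message of the form "Submitted batch job <jobid>"; jobid_from_sbatch is a hypothetical helper name, not part of Slurm.

```shell
# jobid_from_sbatch: read sbatch's message on stdin and print only the jobid.
# Assumes the message has the form "Submitted batch job <jobid>".
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# On a cluster (requires Slurm):
#   jobid=$(sbatch YourJobscript | jobid_from_sbatch)
#   echo "Output will be written to slurm-${jobid}.out"
```

Slurm also provides sbatch --parsable, which prints the jobid without the surrounding text and may be the simpler choice where it fits.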
It is also possible to pass arguments to the job script, like this:
sbatch YourJobscript arg1 arg2
These will be available as the variables $1, $2, etc. in the job
script, so in this example, $1 would have the value arg1 and $2
the value arg2.
All commands in the job script are performed on the compute-node(s) allocated by the queue system. The script also specifies a number of requirements (memory usage, number of CPUs, run-time, etc.), used by the queue system to find one or more suitable machines for the job.
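As a rough sketch of what such a job script might contain, the snippet below writes a minimal script to a file and shows how arguments passed on the sbatch command line appear as $1 and $2. The file name example_job.sh, the account nnXXXXk, and all resource values are placeholders chosen for illustration, not recommendations; the settings that are required or sensible differ between clusters.

```shell
# Write a minimal job script to a file (example_job.sh is a hypothetical name).
cat > example_job.sh <<'EOF'
#!/bin/bash
# Requirements read by the queue system (placeholder values, adjust as needed).
# Hypothetical project account:
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
# Run-time limit (hh:mm:ss):
#SBATCH --time=00:05:00
# Number of tasks and memory per CPU:
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G

set -o errexit   # exit on errors
set -o nounset   # treat unset variables as errors

# Arguments given after the script name on the sbatch command line
echo "First argument: $1"
echo "Second argument: $2"
EOF
```

On a cluster, this would be submitted with sbatch example_job.sh arg1 arg2, and the two echo lines would end up in slurm-<jobid>.out.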
More information about Slurm
For more information about the Slurm parameters and job script settings, see Slurm parameter.
A more detailed description of the queue system can be found in Queue System Concepts.
If you are already used to PBS/Torque, but not Slurm, you might find Porting from PBS/Torque useful.
Job Queue
Jobs in the job queue are started on a priority basis, and a job gets higher priority the longer it has to wait in the queue. A detailed description can be found in Job Scheduling.
To see the list of running or pending jobs in the queue, use the
command squeue. Some useful squeue options:
-j jobids    show only the specified jobs
-w nodes     show only jobs on the specified nodes
-A projects  show only jobs belonging to the specified projects
-t states    show only jobs in the specified states (pending, running, suspended, etc.)
-u users     show only jobs belonging to the specified users
All specifications can be comma-separated lists. Examples:
squeue -j 14132,14133 # shows jobs 14132 and 14133
squeue -w c23-11 # shows jobs running on c23-11
squeue -u foo -t PD # shows pending jobs belonging to user 'foo'
squeue -A bar # shows all jobs in the project 'bar'
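These options can be combined with squeue's output formatting to build small summaries. The following is a minimal sketch, assuming the standard squeue options -h (no header line) and -o "%T" (print only the job state); count_states is a hypothetical helper name, not part of Slurm.

```shell
# count_states: read one job state per line on stdin and print
# "<STATE> <count>" for each state, sorted by state name.
count_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# On a cluster (requires Slurm), summarise your own jobs by state:
#   squeue -h -u "$USER" -o "%T" | count_states
```

Keeping the counting in a separate function means it can be tried on any list of states without access to the queue system.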
To see all pending jobs, in priority order, you can use pending,
which is a small wrapper for squeue. See pending --help for
details and options.
For a description of common job states, see Job States. For an overview of the output from squeue see squeue output examples.
To get an overview of the available and used resources on the cluster,
you can use the qsumm (“Queue Summary”) command. It will by default
show the number of available billing units, how many are used
by running jobs and wanted by pending jobs, for all jobs together, as
well as by each project. For instance:
$ qsumm
Billing units in job queue, per project.
Run 'qsumm --man' for details.
Account        Limit Running Pending
------------------------------------
Sum normal    172416  151971    5952
nn1002k       172416    3840       4
nn10054k      172416    1024       2
nn11022k      172416    6144       .
nn11023k      172416     512       .
nn11063k      172416       .      16
nn12019k      172416   57344       .
nn12037k      172416    1280     772
nn12055k      172416    2048       .
nn2834k       172416   25600       .
nn2916k       172416    1024       .
nn2993k       172416   20480    5120
nn4654k       172416     768       .
nn5023k       172416       .       1
nn8015k       172416     512       .
nn8104k       172416       .       1
nn9039k       172416      15       .
nn9188k       172416   13312      17
nn9238k       172416    1044       .
nn9352k       172416   13184       .
nn9372k       172416     512       .
nn9391k       172416    1536       .
nn9560k       172416     768      17
nn9600k       172416     512       2
nn9894k       172416     512       .
------------------------------------
Total sum     172416  151971    5952
This shows that the cluster (Betzy, in this case) has 172416 billing units available, that 151971 are currently used by running jobs, and that pending jobs want 5952 billing units in total. It also shows how much each project has running or pending.
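As a worked example of reading this output, the number of billing units not used by running jobs is the limit minus the running total: 172416 - 151971 = 20445. A minimal sketch of the same arithmetic follows, assuming the "Total sum" line has the column order shown in the example output; free_units is a hypothetical helper name, not a qsumm feature.

```shell
# free_units: read qsumm output on stdin and print Limit minus Running,
# taken from the "Total sum" line (assumed columns: Limit Running Pending).
free_units() {
    awk '$1 == "Total" && $2 == "sum" { print $3 - $4 }'
}

# On a cluster (requires qsumm):
#   qsumm | free_units
```

This is only a rough indication; whether a given job can start also depends on its priority and the resources it requests.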
It is possible to get qsumm to show GPUs instead of billing units,
limit it to certain partitions or jobs belonging to specific users:
$ qsumm --gpu # Show GPUs instead of billing units. Especially useful on Olivia
$ qsumm --partition=<partition(s)> # Limit output to specific partitions
$ qsumm --user=<user(s)> # Limit output to specific users
Separate multiple partitions or users with commas (“,”). See qsumm --help
for a summary of options, or qsumm --man for the full manual.