Job Dependencies
In the following we demonstrate how to add dependencies between jobs using the Slurm option --dependency.
The full list of dependency types can be found in the Slurm
documentation, but we will show the most useful cases here:
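| Option | Explanation |
|---|---|
| --dependency=after:<jobid> | job can start after <jobid> has started (or has been cancelled) |
| --dependency=afterany:<jobid> | job can start after <jobid> has terminated, regardless of its exit status |
| --dependency=afterok:<jobid> | job can start only if <jobid> has completed successfully (exit code 0) |
| --dependency=afternotok:<jobid> | job can start only if <jobid> has failed (non-zero exit code, time limit, node failure, etc.) |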
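Several <jobid>s of the same dependency type can be combined in a colon-separated list, e.g. afterok:<jobid>:<jobid>, while different dependency types are combined in a comma-separated list, e.g. afterok:<jobid>,afternotok:<jobid> (all conditions must then be satisfied).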
Note
The --dependency option must be added to the sbatch command before the name of the job script. If you put it after the script, it will be treated as an argument to the script, not to the sbatch command. If the dependency was added successfully, you should see (Dependency) in the NODELIST(REASON) column of the squeue output.
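To check this from a script rather than by eye, a small shell function along the following lines could be used. This is only a sketch: the function name waiting_on_dependency is hypothetical, and it relies on the squeue options -h (no header), -j (select a job) and -o '%r' (print only the reason, without parentheses). A job whose dependency can never be satisfied shows the reason DependencyNeverSatisfied instead, which this function deliberately does not count as waiting.

```shell
# Hypothetical helper: succeeds if the given job is pending on a dependency.
# -h drops the header line, -o '%r' prints only the reason (no parentheses).
waiting_on_dependency() {
  local reason
  reason=$(squeue -h -j "$1" -o '%r')
  [ "$reason" = "Dependency" ]
}

# Example: waiting_on_dependency 123124 && echo "123124 is waiting"
```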
Beware of exit status
With some of the options (afterok and afternotok in particular) it is important to keep in mind the exit status of your job script, since this is what indicates whether or not the job finished successfully. By default the script returns the exit status of the last command executed, which does not necessarily reflect the overall success of the job. It is therefore highly recommended to add the following to the script:
set -o errexit # Exit the script on any error
set -o nounset # Treat any unset variables as an error
as well as capturing errors in critical commands along the way:
mycommand || exit 1
and finally explicitly return 0 in case the script finishes successfully:
# Successful exit
exit 0
Standard Slurm errors like out-of-memory or time limit will of course be captured automatically.
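The effect of errexit can be tried out without submitting anything, by running tiny scripts in a child bash and looking at the exit status they report. The sketch below also shows set -o pipefail, which is not part of the recommendation above: errexit alone ignores a failure in the middle of a pipeline (command1 | command2), so pipefail may be worth adding to job scripts that use pipes.

```shell
# Each case runs in a child bash, so its failure does not stop this demo.

# No errexit: 'false' fails, but the script reports the status of 'echo'
status_plain=0
bash -c 'false; echo "last command succeeded"' || status_plain=$?

# errexit: the script stops at 'false' and reports failure
status_errexit=0
bash -c 'set -o errexit; false; echo "never printed"' || status_errexit=$?

# errexit alone does not see a failure in the middle of a pipeline ...
status_pipe=0
bash -c 'set -o errexit; false | cat; echo "pipeline failure was masked"' || status_pipe=$?

# ... unless pipefail is also set (an addition, not in the recommendation above)
status_pipefail=0
bash -c 'set -o errexit -o pipefail; false | cat; echo "never printed"' || status_pipefail=$?

echo "plain=${status_plain} errexit=${status_errexit} pipe=${status_pipe} pipefail=${status_pipefail}"
# prints: plain=0 errexit=1 pipe=0 pipefail=1
```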
Examples
Here pre.sh is a pre-processing step for job-1.sh, job-2.sh, etc.:
$ sbatch pre.sh
Submitted batch job 123123
$ for i in 1 2 3 4 5; do sbatch --dependency=afterok:123123 job-${i}.sh; done
Submitted batch job 123124
Submitted batch job 123125
Submitted batch job 123126
Submitted batch job 123127
Submitted batch job 123128
$ squeue -u $USER
JOBID PARTITION NAME ST USER TIME NODES NODELIST(REASON)
123124 normal job-1 PD me 0:00 1 (Dependency)
123125 normal job-2 PD me 0:00 1 (Dependency)
123126 normal job-3 PD me 0:00 1 (Dependency)
123127 normal job-4 PD me 0:00 1 (Dependency)
123128 normal job-5 PD me 0:00 1 (Dependency)
123123 normal pre R me 0:28 1 c1-1
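Instead of copying the jobid of pre.sh by hand, it can be captured directly. A sketch, assuming the sbatch option --parsable, which prints only the jobid (followed by ;clustername on multi-cluster setups); the function name submit_after_pre is hypothetical:

```shell
# Hypothetical helper: submit pre.sh, then job-1.sh ... job-N.sh,
# each allowed to start only if pre.sh completes successfully.
submit_after_pre() {
  local n=$1 pre_id i
  # --parsable prints "jobid" or "jobid;cluster"
  pre_id=$(sbatch --parsable pre.sh) || return 1
  pre_id=${pre_id%%;*}   # keep only the jobid
  for ((i = 1; i <= n; i++)); do
    sbatch --dependency=afterok:"${pre_id}" "job-${i}.sh"
  done
}
```

Calling submit_after_pre 5 should then submit the same six jobs as in the example above.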
Here post.sh is a post-processing step for job-1.sh, job-2.sh, etc.:
$ for i in 1 2 3 4 5; do sbatch job-${i}.sh; done
Submitted batch job 123123
Submitted batch job 123124
Submitted batch job 123125
Submitted batch job 123126
Submitted batch job 123127
$ sbatch --dependency=afterok:123123:123124:123125:123126:123127 post.sh
Submitted batch job 123128
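The same pattern can be scripted by collecting the jobids in a bash array and joining them with colons, so that post.sh waits for all of them. Again a sketch built on sbatch --parsable; submit_with_postprocessing is a hypothetical name:

```shell
# Hypothetical helper: submit job-1.sh ... job-N.sh, then post.sh, which
# may start only if all of them complete successfully.
submit_with_postprocessing() {
  local n=$1 i id deps
  local -a ids=()
  for ((i = 1; i <= n; i++)); do
    id=$(sbatch --parsable "job-${i}.sh") || return 1
    ids+=("${id%%;*}")   # keep only the jobid
  done
  # Join the jobids with ':' in a subshell, so IFS is unchanged afterwards
  deps=$(IFS=:; echo "${ids[*]}")
  sbatch --dependency=afterok:"${deps}" post.sh
}
```

With five jobs this produces a single option of the form --dependency=afterok:<jobid>:<jobid>:...; the join is done in a subshell so that changing IFS does not affect the rest of the function.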
Here job-2.sh is a fallback/retry in case job-1.sh fails:
$ sbatch job-1.sh
Submitted batch job 123123
$ sbatch --dependency=afternotok:123123 job-2.sh
Submitted batch job 123124
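If job-1.sh succeeds, the afternotok condition can never be met, and job-2.sh stays in the queue with the reason DependencyNeverSatisfied until it is cancelled. The sbatch option --kill-on-invalid-dep=yes asks Slurm to remove such a job automatically (some sites already configure this as the default). A hedged sketch, with the hypothetical function name submit_with_fallback:

```shell
# Hypothetical helper: submit a main job and a fallback that runs only if
# the main job fails; the fallback is removed if the main job succeeds.
submit_with_fallback() {
  local main_script=$1 fallback_script=$2 main_id
  main_id=$(sbatch --parsable "$main_script") || return 1
  main_id=${main_id%%;*}   # keep only the jobid
  sbatch --dependency=afternotok:"${main_id}" --kill-on-invalid-dep=yes "$fallback_script"
}
```

Usage: submit_with_fallback job-1.sh job-2.sh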
If for some reason you want your jobs to run one after the other, each job must depend on the previous one. This is a bit cumbersome to do in a loop, since the sbatch command prints the text string “Submitted batch job” before the jobid, but we can extract the jobid with the command awk '{ print $4 }' (which returns the 4th field of the string) and use it in a loop as follows (note that the first job must be submitted individually, as it has no dependencies):
$ lastid=`sbatch job-1.sh | awk '{ print $4 }'`
$ echo $lastid
123123
$ for i in 2 3 4 5; do lastid=`sbatch --dependency=afterany:${lastid} job-${i}.sh | awk '{ print $4 }'`; echo ${lastid}; done
123124
123125
123126
123127
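A somewhat more robust variant of this loop uses sbatch --parsable instead of awk, so it does not depend on the exact wording of the sbatch message. This is a sketch with the hypothetical function name submit_chain. It uses afterany so that each job waits until the previous one has ended, whatever its exit status; with afterok instead, a failure would leave the next job pending with the reason DependencyNeverSatisfied.

```shell
# Hypothetical helper: submit the given job scripts so that each one
# starts only after the previous one has ended (afterany).
# Prints the jobid of every submitted job, one per line.
submit_chain() {
  local last_id="" script
  for script in "$@"; do
    if [ -z "$last_id" ]; then
      # The first job has no dependency
      last_id=$(sbatch --parsable "$script") || return 1
    else
      last_id=$(sbatch --parsable --dependency=afterany:"${last_id}" "$script") || return 1
    fi
    last_id=${last_id%%;*}   # drop ";cluster" if present
    echo "$last_id"
  done
}
```

Usage: submit_chain job-1.sh job-2.sh job-3.sh job-4.sh job-5.sh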