Job array howto
In this example we wish to run many similar sequential jobs in parallel using job arrays. We take Python as an example but this does not matter for the job arrays:
#!/usr/bin/env python
import time
print('start at ' + time.strftime('%H:%M:%S'))
print('sleep for 10 seconds ...')
time.sleep(10)
print('stop at ' + time.strftime('%H:%M:%S'))
Download the script:
Try it out:
$ python array_test.py
start at 15:23:48
sleep for 10 seconds ...
stop at 15:23:58
Good. Now we would like to run this script 16 times at (more or less) the same time. For this we use the following
#!/bin/bash
#####################
# job-array example #
#####################
## Substitute with your project name:
#SBATCH --account=YourProject
#SBATCH --job-name=array_example
# 16 jobs will run in this array at the same time
#SBATCH --array=1-16
# each job will run for maximum five minutes
# d-hh:mm:ss
#SBATCH --time=0-00:05:00
# you must not place bash commands before the last #SBATCH directive
## Set safer defaults for bash
set -o errexit
set -o nounset
module --quiet purge # Clear any inherited modules
# each job will see a different ${SLURM_ARRAY_TASK_ID}
echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
python test.py > output_${SLURM_ARRAY_TASK_ID}.txt
Download the script:
This is a script for running a normal array job on Saga. It can easily be changed to run on Fram or Betzy, or use a different job type.
Submit the script with sbatch
and after a while you should see 16
output files in your submit directory:
$ ls -l output*txt
-rw------- 1 user user 60 Oct 14 14:44 output_1.txt
-rw------- 1 user user 60 Oct 14 14:44 output_10.txt
-rw------- 1 user user 60 Oct 14 14:44 output_11.txt
-rw------- 1 user user 60 Oct 14 14:44 output_12.txt
-rw------- 1 user user 60 Oct 14 14:44 output_13.txt
-rw------- 1 user user 60 Oct 14 14:44 output_14.txt
-rw------- 1 user user 60 Oct 14 14:44 output_15.txt
-rw------- 1 user user 60 Oct 14 14:44 output_16.txt
-rw------- 1 user user 60 Oct 14 14:44 output_2.txt
-rw------- 1 user user 60 Oct 14 14:44 output_3.txt
-rw------- 1 user user 60 Oct 14 14:44 output_4.txt
-rw------- 1 user user 60 Oct 14 14:44 output_5.txt
-rw------- 1 user user 60 Oct 14 14:44 output_6.txt
-rw------- 1 user user 60 Oct 14 14:44 output_7.txt
-rw------- 1 user user 60 Oct 14 14:44 output_8.txt
-rw------- 1 user user 60 Oct 14 14:44 output_9.txt
Observe that they all started (approximately) at the same time:
$ grep start output*txt
output_1.txt:start at 14:43:58
output_10.txt:start at 14:44:00
output_11.txt:start at 14:43:59
output_12.txt:start at 14:43:59
output_13.txt:start at 14:44:00
output_14.txt:start at 14:43:59
output_15.txt:start at 14:43:59
output_16.txt:start at 14:43:59
output_2.txt:start at 14:44:00
output_3.txt:start at 14:43:59
output_4.txt:start at 14:43:59
output_5.txt:start at 14:43:58
output_6.txt:start at 14:43:59
output_7.txt:start at 14:43:58
output_8.txt:start at 14:44:00
output_9.txt:start at 14:43:59