Packaging smaller parallel jobs into one large job

There are several ways to package smaller jobs into one large parallel job. The preferred way is to use Job Arrays. Here we present a more pedestrian alternative, which can offer a lot of flexibility.
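
For comparison, a very small Job Array sketch could look like the following. The resource numbers and the binary name my-binary are only placeholders here, not a recommended configuration:

#!/bin/bash
#SBATCH --job-name=MyArrayJob
## Run 8 independent copies of this script, numbered 0-7:
#SBATCH --array=0-7
#SBATCH --ntasks=16
#SBATCH --time=0-00:10:00

## Each array task would typically use ${SLURM_ARRAY_TASK_ID} to select its own input.
srun ./my-binary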

In this example we imagine that we wish to run 8 MPI jobs at the same time, each using 16 tasks, thus totalling 128 tasks. Once they finish, we wish to do a post-processing step and then run another set of 8 jobs with 16 tasks each:

#!/bin/bash

## Job name:
#SBATCH --job-name=MyLargeJob
## Allocating amount of resources:
#SBATCH --nodes=4
## Number of tasks (aka processes) to start on each node: Pure mpi, one task per core
#SBATCH --ntasks-per-node=32
## No memory per task since this option is turned off on Fram in QOS normal.
## Run for 10 minutes, syntax is d-hh:mm:ss
#SBATCH --time=0-00:10:00 


cd ${SLURM_SUBMIT_DIR}

# first set of parallel runs
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &

wait

# here a post-processing step
# ...

# another set of parallel runs
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &
mpirun -n 16 ./my-binary &

wait

exit 0

The wait commands are important here: the run script only continues to the post-processing step, and eventually to the end of the script, once all the commands started with & have completed. Without them, the script would reach its end while the background mpirun processes were still running, and Slurm would then end the job and kill those processes.
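
If all the runs are identical, or only differ by an index, the same pattern can be written more compactly with a shell loop. The following is just a sketch of the block between the two wait commands, assuming my-binary needs no per-run arguments:

# launch 8 independent 16-task runs in the background
for i in $(seq 1 8); do
    mpirun -n 16 ./my-binary &
done

# continue only after all 8 runs have finished
wait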
