Job Placement on Fram
The compute nodes on Fram are divided into four groups, called islands. Each island has about the same number of nodes. The Infiniband network throughput ("speed") within an island is higher than the throughput between islands. Some jobs need high network throughput between its nodes, and will usually run faster if they run within a single island.
Therefore, the queue system is configured to run each job within one island, if that does not delay the job too much. It works like this: When a job is submitted, the queue system lets the job wait until there are enough free resources so that it can run within one island. If this has not happened when the job has waited 7 days1, the job will be started on more than one island.
Overriding the Setup
The downside of requiring that all nodes belonging to a job should be in the same island, is that the job might have to wait longer in the queue, especially if the job needs many nodes. Some jobs do not need high network throughput between its nodes. For such jobs, you can override the setup, either for individual jobs or for all your jobs.
For individual jobs, you can use the switch
--switches=N[@time] on the
command line when submitting the job, where N is the maximal number of
islands to use (1, 2, 3 or 4), and time (optional) is the maximum time to
man sbatch for details. Two examples:
--switches=2 # Allow two islands --switches=1@4-0:0:0 # Change max wait time to 4 days
The maximal possible wait time to specify is 28 days1. A longer time will silently be truncated to 28 days!
Note that putting this option in an
#SBATCH line in the job script will
not work (it will silently be overridden by the environment variables we
set to get the default behaviour)!
On the other hand, you might want to guarantee that your job never,
ever, starts on more than one island. The easiest way to do that is to
--constraint=[island1|island2|island3|island4] instead (this option
can be used either on the command line or in the job script).
Changing the Defaults
For changing the default for your jobs, you can change the followin environment variables:
SBATCH_REQ_SWITCH: Max number of islands for
SALLOC_REQ_SWITCH: Max number of islands for
SRUN_REQ_SWITCH: Max number of islands for
SBATCH_WAIT4SWITCH: Max wait time for
SALLOC_WAIT4SWITCH: Max wait time for
SRUN_WAIT4SWITCH: Max wait time for
srun jobs are interactive jobs; see
Interactive Jobs.) As above, the maximal possible wait
time to specify is 28 days1, and any time longer than that will silently be
truncated. The change takes effect for jobs submitted after you change the
variables. For instance, to change the default to allow two islands, and wait
up to two weeks:
export SBATCH_REQ_SWITCH=2 export SALLOC_REQ_SWITCH=2 export SRUN_REQ_SWITCH=2 export SBATCH_WAIT4SWITCH=14-00:00:00 export SALLOC_WAIT4SWITCH=14-00:00:00 export SRUN_WAIT4SWITCH=14-00:00:00
Note that we do not recommend that you unset these variables. If you want
your jobs to start on any nodes, whichever island they are on, simply set
*_REQ_SWITCH variables to 4. Specifically, if you unset the
*_WAIT4SWITCH variables, they will default to 28 days1. Also, in the
future we might change the underlying mechanism, in which case unsetting these
variables will have no effect (but setting them will).
1. The limits might change in the future. ↩