Software module scheme
Since a HPC cluster is shared among many users, and also holds a significant size in contrast to most desktop compute machinery around, the amount of installed software spans many applications in many different versions and quite a few of them are installed typically non-standard places for easier maintenance (for admin crew), practical and security reasons. It is not possible (nor desirable) to use them all at the same time, since different versions of the same application may conflict with each other. Therefore, it is practical to provide the production environment for a given application outside of the application itself. This is done using a set of instructions and variable settings that are specific for the given application called an application module. This also simplifies control of which application versions are available in a specific session.
The main command for using this system is the module command. You can find a list of all its options by typing:
module --help
We use the lmod module system; for more info see https://lmod.readthedocs.io/en/latest/ in NRIS currently. Below we listed the most commonly used options, but also feel free to investigate options in this toolset more thoroughly on developers site.
Which modules are currently loaded?
To see the modules currently active in your session, use the command:
module list
Which modules are available?
In order to see a complete list of available modules, issue the command:
module avail
The resulting list will contain module names conforming to the following pattern:
name of the module
/
version
The avail
option can also be used to search for specific software, e.g.
module avail netcdf
will list all modules matching the string “netcdf” (case insensitive).
Note
Some modules are mainly intended as dependencies for others, and are typically
not very useful by themselves. Such modules are made hidden to the module avail
command to avoid cluttering the listed output. However, if you are compiling
your own code some of these might still be useful, and you can still load them.
To include hidden modules you can add the --show-hidden
option to the module avail
search.
How to load a module
In order to make, for instance, the NetCDF library available, get the full list of available netCDF modules first by typing:
module avail netCDF
Pick up one from the list (for example netCDF/4.9.2-gompi-2023a) and then issue the command:
module load netCDF/4.9.2-gompi-2023a
Note that we currently do not have default modules on NRIS machines, so you need to write full module name when loading!
How to unload a module
Keeping with the above example, use the following command to unload the NetCDF module again:
module unload netCDF
Note that this will only unload the loaded module with “netCDF/”-namebase, in this case the module named netCDF/4.4.1.1-intel-2018a-HDF5-1.8.19. To unload everything you can type
module purge
Note
The module purge
command will inform you that some modules (like StdEnv
)
were not unloaded. Such modules are made “sticky” because they are necessary
for the system to work, and they should not be --force
purged as the message
suggest. If this warning message annoys you, you can suppress it with the --quiet
option instead.
How to switch to a different version of a module
Switching to another version is similar to loading a specific version. As an example, if you want to switch from the current loaded netCDF to an older one; netCDF/4.9.0-gompi-2022b:
module switch netCDF/4.9.2-gompi-2023a netCDF/4.9.0-gompi-2022b
This, more compact syntax will fortunately also work:
module switch netCDF netCDF/4.9.0-gompi-2022b
Note
We are using self-contained modules in NRIS, meaning that a given module
loads all dependecies necessary. This is in slight contrast to old policies and also means it is possible
to make a mess if you load extra modules in job scripts after loading the main software module.
We recommend doing module list
after every load (to inspect) and unloading any
conflicting packages, if possible. It is also good practice to start all job scripts
with a module purge
, before loading all necessary modules for the calculation.
How to save and restore your module environment
When you have loaded all necessary modules for a particular purpose and made sure that your environment is working correctly, you can save it with
module save <name-of-env>
and later restore it with
module restore <name-of-env>
To list all your saved environments
module savelist
This feature is particularly convenient if you spend a lot of time compiling/debugging in interactive sessions. For production calculations using job scripts it is still recommended to load each module explicitly for clarity.
GPU modules
Saga
There are two types of GPU nodes on Saga, located in two distinct SLURM partitions:
Intel CPU with 4X Tesla P100, 16GB,
--partition=accel
AMD CPUs with 4XA100, 80GB,
--partition=a100
These are different architectures. By default, Saga loads the Intel software environment,
If you want to run/compile software for the nodes with the AMD CPUs and A100 GPUS
you need to get an allocation on the a100
partition. Then, inside the allocation,
or inside your job script, switch the module environment
module --force swap StdEnv Zen2Env
Note that installed modules can vary between the two node types.