Using Arm Performance Reports on Fram

Due to a bug in older versions of OpenMPI, Arm Performance Reports on Fram works only with OpenMPI 3.1.3 and newer. If you have compiled your application with OpenMPI 3.1.1, you do not need to recompile it: the two versions are compatible, so simply load the 3.1.3 module instead.
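A quick way to confirm that the loaded OpenMPI is new enough is sketched below. It is only an illustration: the helper names version_at_least and openmpi_version are made up for this example, the version comparison relies on GNU "sort -V", and the parsing assumes that the first line of "mpirun --version" looks like "mpirun (Open MPI) 3.1.3".

```shell
#!/bin/bash
# Sketch: check that the OpenMPI found in PATH is at least version 3.1.3.

# succeed if version $1 is greater than or equal to version $2
version_at_least() {
    local lowest
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# print the version from the first line of "mpirun --version"
# (assumed format: "mpirun (Open MPI) 3.1.3")
openmpi_version() {
    mpirun --version 2>/dev/null | awk 'NR == 1 { print $NF }'
}

# only run the check if an mpirun is available at all
if command -v mpirun >/dev/null 2>&1; then
    if version_at_least "$(openmpi_version)" 3.1.3; then
        echo "OpenMPI is new enough for perf-report"
    else
        echo "OpenMPI is too old: load the 3.1.3 module first" >&2
    fi
fi
```

Run it after loading your modules. If it reports that OpenMPI is too old, load the 3.1.3 module and check again.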

Arm Performance Reports does not work in "express launch mode". In other words, it cannot be combined with commands that start with "mpirun" or "srun". Use it only in "compatibility mode", where the number of MPI tasks is passed as an argument to perf-report.
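The difference between the two modes can be sketched as follows. This is a dry run that only prints the compatibility-mode command line instead of executing it. The fallback of 128 tasks is an assumption for illustration and is used only when SLURM_NTASKS is not set, i.e. outside a Slurm job.

```shell
#!/bin/bash
# Sketch: express launch mode versus compatibility mode (dry run only).

# number of MPI tasks; the fallback value 128 is purely illustrative
ntasks="${SLURM_NTASKS:-128}"

# express launch mode (does NOT work here): perf-report wraps the launcher
#   perf-report srun ./myexample.x
#   perf-report mpirun -n "${ntasks}" ./myexample.x

# compatibility mode: perf-report itself receives the number of tasks
cmd="perf-report -n ${ntasks} ./myexample.x"
echo "${cmd}"
```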

Use Arm Performance Reports only in job scripts or on an interactive compute node, never on a login node.
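One way to guard against accidentally profiling on a login node is to check for a Slurm job environment first. The sketch below assumes that Slurm sets SLURM_JOB_ID inside batch and interactive jobs and that it is unset on login nodes. The function name inside_slurm_job is made up for this example.

```shell
#!/bin/bash
# Sketch: refuse to profile unless we are inside a Slurm job.

# succeed only when running inside a Slurm job (batch or interactive);
# the ":-" default keeps this safe under "set -o nounset"
inside_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if inside_slurm_job; then
    echo "inside Slurm job ${SLURM_JOB_ID}: OK to run perf-report"
else
    echo "no Slurm job detected: do not run perf-report here" >&2
fi
```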

Profiling a batch script

Consider the following example job script, which stands in for the computation you wish to profile:

#!/bin/bash -l

#SBATCH --account=YourAccount
#SBATCH --job-name=without-apr
#SBATCH --time=0-00:05:00
#SBATCH --nodes=4 --ntasks-per-node=32
#SBATCH --qos=short

# recommended bash safety settings
set -o errexit  # make bash exit on any error
set -o nounset  # treat unset variables as errors

srun ./myexample.x  # <- we will need to modify this line

All we need to do is load the Arm-PerfReports/20.0.3 module and replace the srun command with perf-report (remember to adjust "YourAccount"):

#!/bin/bash -l

#SBATCH --account=YourAccount
#SBATCH --job-name=with-apr
#SBATCH --time=0-00:05:00
#SBATCH --nodes=4 --ntasks-per-node=32
#SBATCH --qos=short

# recommended bash safety settings
set -o errexit  # make bash exit on any error
set -o nounset  # treat unset variables as errors

module load Arm-PerfReports/20.0.3  # <- we added this line

perf-report -n ${SLURM_NTASKS} ./myexample.x  # <- we modified this line

In other words, replace srun or mpirun -n ${SLURM_NTASKS} with perf-report -n ${SLURM_NTASKS}.

That's it.

Profiling on an interactive compute node

To run interactive tests, submit an interactive job to Slurm using srun (not salloc), e.g.:

# obtain an interactive compute node for 30 minutes
# adjust "YourAccount"
$ srun --nodes=1 --ntasks-per-node=32 --time=00:30:00 --qos=devel --account=YourAccount --pty bash -i

# load the module
$ module load Arm-PerfReports/20.0.3

# profile my application
$ perf-report -n ${SLURM_NTASKS} ./myexample.x
