Gaussian job examples on NRIS machines
Note
Here we present tested examples for various job types on the different NRIS machines. This page is under more or less continuous development, so if you find something missing or not working as expected, do not hesitate to contact us.
Expected knowledge base
Before you run any Gaussian calculations, or any other calculations on NRIS machines for that matter, you are expected to familiarize yourself with the specifics of the NRIS machinery, in particular how to log in, set up your environment, and submit and monitor jobs on the machine in question.
Finding available Gaussian versions and submitting a standard Gaussian job
To see which versions of Gaussian are available on a given machine, type the following command after logging in to the machine in question:
module avail Gaussian
To use Gaussian, type
module load Gaussian/<version>
specifying one of the available versions.
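As an illustration, with the module versions referred to later on this page the session might look like this (the available versions will differ between machines and over time):

$ module avail Gaussian
Gaussian/g16_B.01    Gaussian/g16_C.01
$ module load Gaussian/g16_C.01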
Please inspect the job script examples before submitting jobs!
To run an example: create a directory, step into it, create an input file (for example, for water; see below), download a job script (for example, the Fram CPU job script shown below), and submit the script with:
$ sbatch fram_g16.sh
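For instance, a minimal session for the water example might look as follows (the directory name is arbitrary and the job ID will differ; water.com and fram_g16.sh are listed below):

$ mkdir water_opt && cd water_opt
$ # create water.com with the input shown below
$ # place the fram_g16.sh job script (shown below) in the same directory
$ sbatch fram_g16.sh
Submitted batch job 1234567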
Gaussian input file examples
Water input example (water.com; note the blank line at the end):
%chk=water
%mem=500MB
#p b3lyp/cc-pVDZ opt

structure optimization of water

0 1
O
H 1 0.96
H 1 0.96 2 109.471221

Caffeine input example (caffeine.com; note the blank line at the end):
%chk=caffeine
%mem=5GB
#p b3lyp/cc-pVQZ

caffeine molecule example

0 1
C 1.179579 0.000000 -0.825950
C 2.359623 0.000000 0.016662
C 2.346242 0.000000 1.466600
C 0.000000 0.000000 1.440000
C 4.217536 0.000000 1.154419
C 4.765176 -0.384157 -0.964164
C 1.058378 -0.322767 3.578004
C -1.260613 -0.337780 -0.608570
N 1.092573 0.000000 2.175061
N 0.000000 0.000000 0.000000
N 3.391185 0.000000 1.965657
N 3.831536 0.000000 0.062646
O -1.345306 0.000000 1.827493
O 1.192499 0.000000 -2.225890
H -1.997518 -0.535233 0.168543
H -1.598963 0.492090 -1.227242
H -1.138698 -1.225644 -1.227242
H 0.031688 -0.271264 3.937417
H 1.445570 -1.329432 3.728432
H 1.672014 0.388303 4.129141
H 4.218933 -0.700744 -1.851470
H 5.400826 0.464419 -1.212737
H 5.381834 -1.206664 -0.604809
H 5.288201 0.000000 1.353412

Running Gaussian on Fram
On Fram, jobs currently get exclusive access to whole nodes by default. This means that if you ask for less than a full node, the remaining cores will sit idle while still being allocated to your job, so you should normally request full nodes for Gaussian jobs. Keep this in mind when submitting jobs.
Job script example (fram_g16.sh):
#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --output=slurm.%j.log
# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01
# name of input file without extension
input=water
# set the heap-size for the job to 20GB
export GAUSS_LFLAGS2="--LindaOptions -s 20000000"
export PGI_FASTMATH_CPU=avx2
# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR
# split large temporary files into smaller parts
lfs setstripe --stripe-count 8 $GAUSS_SCRDIR
# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR
# run the program
cd $GAUSS_SCRDIR
time g16.ib $input.com > $input.out
# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR
exit 0
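Once the job has finished, a quick sanity check is to search the output file for Gaussian's termination message, e.g.:

$ grep "Normal termination" water.out

A successful run prints a line starting with "Normal termination of Gaussian 16"; if it is absent, inspect the end of water.out and the Slurm log for error messages.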
Running Gaussian on Saga
On Saga there are more restrictions and tricky situations to consider than on Fram. First and foremost, the setup is heterogeneous, with some nodes having 52 cores and most nodes having 40 cores. Second, there is a 256-core limit per job on Saga, effectively limiting the useful maximum number of nodes for a Gaussian job to 6. Third, since nodes are shared by default, you need a way to specify resource allocations in a shared environment that is not necessarily homogeneous across the nodes of your job.
We are currently working to find a solution to all of these challenges; as of now, our advice is:
- Jobs using up to and including 2 nodes can follow the standard advice for running jobs on Saga.
- For 3 nodes and above, you need to either run on full nodes or use the Slurm exclusive flag: #SBATCH --exclusive. We prefer the latter due to robustness; see the sketch below.

To facilitate this, the g16 wrapper has been edited to be both backwards compatible and adjusted to our more recent insights. If you are not using this wrapper, please inspect it to find the syntax to use in your own job script. The wrapper(s) are available in the Gaussian software folder; the current name is g16.ib.
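As a sketch, the Slurm header of a 3-node exclusive job on 40-core nodes could look like this (adjust the task count to the nodes you are targeting):

#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive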
Job script example (saga_g16.sh):
#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=32G
#SBATCH --output=slurm.%j.log
# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01
export GAUSS_LFLAGS2="--LindaOptions -s 20000000"
export PGI_FASTMATH_CPU=avx2
# name of input file without extension
input=water
# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR
# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR
# run the program
cd $GAUSS_SCRDIR
time g16.ib $input.com > $input.out
# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR
exit 0
Running Gaussian on GPUs on Saga
Both of the current g16 versions on Saga support GPU offloading, and we have provided an alternative wrapper script for launching the GPU version. The only things that need to change in the run script are the resource allocation, by adding --gpus=N and --partition=accel, and using the g16.gpu wrapper script instead of g16.ib.

The g16.gpu script is available through the standard Gaussian modules, Gaussian/g16_B.01 and Gaussian/g16_C.01 (the latter will likely have better GPU performance, since it is the more recent version).
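In practice, the difference relative to the saga_g16.sh script above amounts to only a few lines; a sketch with a single GPU:

#SBATCH --gpus=1
#SBATCH --partition=accel
# ... rest of the script as before, but launch with the GPU wrapper:
time g16.gpu $input.com > $input.out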
There are some important limitations for the current GPU version:
- It can only be run on a single node (up to 24 CPU cores + 4 GPUs), so please specify --nodes=1.
- The number of GPUs must be specified with the --gpus=N flag (not --gpus-per-task).
- The billing ratio between GPUs and CPUs is 6:1 on Saga, so the natural way to increment resources is to add six CPUs per GPU.
- Not all parts of Gaussian are able to utilize GPU resources. From the official docs:

GPUs are effective for larger molecules when doing DFT energies, gradients and frequencies (for both ground and excited states), but they are not effective for small jobs. They are also not used effectively by post-SCF calculations such as MP2 or CCSD.
Run script example (gpu_g16.sh):
#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gpus=1
#SBATCH --partition=accel
#SBATCH --mem=96G
#SBATCH --output=slurm.%j.log
# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01
export PGI_FASTMATH_CPU=skylake
# name of input file without extension
input=caffeine
# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR
# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR
# run the program
cd $GAUSS_SCRDIR
time g16.gpu $input.com > $input.out
# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR
exit 0
Some timing examples are listed below for a single-point energy calculation on the caffeine molecule using a large quadruple-zeta basis set. The requested resources are chosen based on billing units (see Projects and accounting), where one GPU is the equivalent of six CPU cores. The memory is then chosen such that it will not be the determining factor for the overall billing.
| Configuration     | CPUs | GPUs | MEM  | Run time | Speedup | Billing | CPU-hrs |
|-------------------|------|------|------|----------|---------|---------|---------|
| Reference         | 1    | 0    | 4G   | 6h51m26s | 1.0     | 1       | 6.9     |
| 1 GPU equivalent  | 6    | 0    | 20G  | 1h00m45s | 6.8     | 6       | 6.1     |
| 2 GPU equivalents | 12   | 0    | 40G  | 36m08s   | 11.4    | 12      | 7.2     |
| 3 GPU equivalents | 18   | 0    | 60G  | 30m14s   | 13.6    | 18      | 9.1     |
| 4 GPU equivalents | 24   | 0    | 80G  | 19m52s   | 20.7    | 24      | 7.9     |
| Full normal node  | 40   | 0    | 140G | 13m05s   | 31.4    | 40      | 8.7     |
| 1/4 GPU node      | 6    | 1    | 80G  | 22m41s   | 18.1    | 6       | 2.3     |
| 1/2 GPU node      | 12   | 2    | 160G | 15m44s   | 26.2    | 12      | 3.1     |
| 3/4 GPU node      | 18   | 3    | 240G | 12m03s   | 34.1    | 18      | 3.6     |
| Full GPU node     | 24   | 4    | 320G | 10m12s   | 40.3    | 24      | 4.1     |
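To see how the last two columns relate, take the full GPU node as a worked example (assuming the billing is determined by the dominant resource, here the CPU count versus six times the GPU count): billing = max(24, 6 × 4) = 24 units, and the consumed CPU-hours are then billing × wall time = 24 × 10m12s ≈ 24 × 0.17 h ≈ 4.1.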
The general impression from these numbers is that Gaussian scales quite well for this particular calculation, and the last column shows that the GPU version is consistently about a factor of two more efficient than the CPU version when comparing the actual consumed CPU-hours. This will of course depend on the conversion factor from CPU to GPU billing, which varies with system configuration, but at least with the current 6:1 ratio on Saga it seems to pay off to use the GPU version over the CPU version (queuing time not taken into account).
If you find any issues with the GPU version of Gaussian, please contact us at our support line.
Note
The timings in the table above represent a single use case, and the behavior might be very different in other situations. Please perform simple benchmarks to check that the program runs efficiently with your particular computational setup. Also do not hesitate to contact us if you need guidance on GPU efficiency, see our extended GPU Support.