Gaussian job examples on NRIS machines

Note

Here we present tested examples for various job types on the different NRIS machines. This page is under more or less continuous development, so if you find things missing and/or not working as expected, do not hesitate to contact us.

Expected knowledge base

Before you run any Gaussian calculations, or any other calculations on NRIS machines for that matter, you are expected to familiarize yourself with the specifics of the NRIS machinery. As a decent minimal curriculum, you should know how to log in to the machines, use the module system, and submit and monitor jobs through Slurm.

Finding available Gaussian versions and submitting a standard Gaussian job

To see which versions of Gaussian are available on a given machine, type the following command after logging in to the machine in question:

module avail Gaussian

To use Gaussian, type

module load Gaussian/<version>

specifying one of the available versions.
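
For example, to load the version used in the job script examples below:

module load Gaussian/g16_C.01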

Please inspect the job script examples before submitting jobs!

To run an example: create a directory, step into it, create an input file (for example the water input below), download a job script (for example the Fram CPU job script below), and submit the script with:

$ sbatch fram_g16.sh
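
Put together, a minimal session on the login node might look like the following sketch (the job script path is a placeholder; copy the script from wherever you downloaded it):

$ mkdir water_example && cd water_example
$ nano water.com               # paste the water input below, keep the trailing blank line
$ cp /path/to/fram_g16.sh .    # placeholder path to the downloaded job script
$ sbatch fram_g16.sh
$ squeue -u $USER              # check that the job is queued or running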

Gaussian input file examples

  • Water input example (note the blank line at the end; water.com):

%chk=water
%mem=500MB
#p b3lyp/cc-pVDZ opt

structure optimization of water

0 1
O
H 1 0.96
H 1 0.96 2 109.471221

  • Caffeine input example (note the blank line at the end; caffeine.com):

%chk=caffeine
%mem=5GB
#p b3lyp/cc-pVQZ

caffeine molecule example

0 1
C     1.179579     0.000000    -0.825950
C     2.359623     0.000000     0.016662
C     2.346242     0.000000     1.466600
C     0.000000     0.000000     1.440000
C     4.217536     0.000000     1.154419
C     4.765176    -0.384157    -0.964164
C     1.058378    -0.322767     3.578004
C    -1.260613    -0.337780    -0.608570
N     1.092573     0.000000     2.175061
N     0.000000     0.000000     0.000000
N     3.391185     0.000000     1.965657
N     3.831536     0.000000     0.062646
O    -1.345306     0.000000     1.827493
O     1.192499     0.000000    -2.225890
H    -1.997518    -0.535233     0.168543
H    -1.598963     0.492090    -1.227242
H    -1.138698    -1.225644    -1.227242
H     0.031688    -0.271264     3.937417
H     1.445570    -1.329432     3.728432
H     1.672014     0.388303     4.129141
H     4.218933    -0.700744    -1.851470
H     5.400826     0.464419    -1.212737
H     5.381834    -1.206664    -0.604809
H     5.288201     0.000000     1.353412

Running Gaussian on Fram

On Fram, jobs currently run on exclusive nodes by default, which means that each job gets its nodes to itself. Keep in mind that if you ask for less than a full node, the remaining cores will sit idle while still being allocated to your job, so you should normally request whole nodes' worth of tasks.

  • Job script example (fram_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01

# name of input file without extension
input=water

# set the heap size for the job to 20 GB (passed via Linda options)
export GAUSS_LFLAGS2="--LindaOptions -s 20000000"
# use the AVX2-optimized math code path
export PGI_FASTMATH_CPU=avx2

# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# split large temporary files into smaller parts
lfs setstripe --stripe-count 8 $GAUSS_SCRDIR

# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# run the program
cd $GAUSS_SCRDIR
time g16.ib $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

Running Gaussian on Saga

On Saga there are more restrictions and tricky situations to consider than on Fram. First and foremost, the setup is heterogeneous: some nodes have 52 cores while most have 40. Second, Saga enforces a 256-core limit per job, which effectively caps a Gaussian job at 6 nodes. Third, since nodes are shared by default, you need to set resource allocations in a sharing environment where the nodes you are given are not necessarily homogeneous.

We are currently working on a solution to these challenges. Our advice for now is as follows: jobs on up to and including 2 nodes can be run following the standard advice for running jobs on Saga. For 3 nodes and above, you either need to run with full nodes or use the Slurm exclusive flag: #SBATCH --exclusive. We prefer the latter due to robustness.
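
A minimal sketch of the relevant Slurm header lines for a 3-node job with the exclusive flag (account name and core count are placeholders to adapt):

#SBATCH --account=nnXXXXk
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive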

To facilitate this, the g16 wrapper has been edited to remain backwards compatible while incorporating this more recent insight. If you are not using the wrapper, please inspect it to find the syntax to use in your own job script. The wrappers are all available in the Gaussian software folder; the current name is g16.ib.
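
If you want to see what the wrapper actually does, one way (assuming it is on your PATH after loading the module) is:

$ module load Gaussian/g16_C.01
$ less $(which g16.ib)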

  • Job script example (saga_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=32G
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01

export GAUSS_LFLAGS2="--LindaOptions -s 20000000"
export PGI_FASTMATH_CPU=avx2

# name of input file without extension
input=water

# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# run the program
cd $GAUSS_SCRDIR
time g16.ib $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

Running Gaussian on GPUs on Saga

Both of the current g16 versions on Saga support GPU offloading, and we have provided an alternative wrapper script for launching the GPU version. The only things that need to change in the run script are the resource allocation, by adding --gpus=N and --partition=accel, and the use of the g16.gpu wrapper script instead of g16.ib. The g16.gpu script is available through the standard Gaussian modules, Gaussian/g16_B.01 and Gaussian/g16_C.01 (the latter will likely have better GPU performance since it is the more recent version).
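
In other words, compared with the CPU job script, only these lines differ (the complete run script is shown below):

#SBATCH --gpus=1            # request one GPU
#SBATCH --partition=accel   # run on the GPU partition

# ...rest of the script unchanged, except for the wrapper:
time g16.gpu $input.com > $input.out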

There are some important limitations for the current GPU version:

  • It can only be run on a single node (up to 24 CPU cores + 4 GPUs), so please specify --nodes=1

  • The number of GPUs must be specified with the --gpus=N flag (not --gpus-per-task)

  • The billing ratio between GPUs and CPUs is 6:1 on Saga, so the natural way to increment resources is to add 6 CPUs per GPU

  • Not all parts of Gaussian are able to utilize GPU resources. From the official docs:

GPUs are effective for larger molecules when doing DFT energies, gradients and frequencies
(for both ground and excited states), but they are not effective for small jobs. They are
also not used effectively by post-SCF calculations such as MP2 or CCSD.
  • Run script example (gpu_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gpus=1
#SBATCH --partition=accel
#SBATCH --mem=96G
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01

export PGI_FASTMATH_CPU=skylake

# name of input file without extension
input=caffeine

# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# run the program
cd $GAUSS_SCRDIR
time g16.gpu $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

Some timing examples are listed below for a single-point energy calculation on the caffeine molecule using a large quadruple-zeta basis set. The requested resources are chosen based on billing units, see Projects and accounting, where one GPU is the equivalent of six CPU cores. The memory is then chosen such that it will not be the determining factor for the overall billing.

Configuration      | CPUs | GPUs | MEM  | Run time | Speedup | Billing | CPU-hrs
-------------------|------|------|------|----------|---------|---------|--------
Reference          |    1 |    0 |   4G | 6h51m26s |     1.0 |       1 |     6.9
1 GPU equivalent   |    6 |    0 |  20G | 1h00m45s |     6.8 |       6 |     6.1
2 GPU equivalents  |   12 |    0 |  40G |   36m08s |    11.4 |      12 |     7.2
3 GPU equivalents  |   18 |    0 |  60G |   30m14s |    13.6 |      18 |     9.1
4 GPU equivalents  |   24 |    0 |  80G |   19m52s |    20.7 |      24 |     7.9
Full normal node   |   40 |    0 | 140G |   13m05s |    31.4 |      40 |     8.7
1/4 GPU node       |    6 |    1 |  80G |   22m41s |    18.1 |       6 |     2.3
1/2 GPU node       |   12 |    2 | 160G |   15m44s |    26.2 |      12 |     3.1
3/4 GPU node       |   18 |    3 | 240G |   12m03s |    34.1 |      18 |     3.6
Full GPU node      |   24 |    4 | 320G |   10m12s |    40.3 |      24 |     4.1
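
The Billing and CPU-hrs columns follow from the accounting rules: in this table the billing units equal max(CPUs, 6 × GPUs), and CPU-hrs is billing multiplied by wall time in hours. A small sketch recomputing the 1/4 GPU node row (the max() rule is our reading of the table, not an official formula; see Projects and accounting for the authoritative rules):

# recompute CPU-hrs for the "1/4 GPU node" row: 6 CPUs, 1 GPU, 22m41s
cpus=6; gpus=1
runtime_s=$((22*60 + 41))                      # wall time in seconds
billing=$(( cpus > 6*gpus ? cpus : 6*gpus ))   # assumed rule: max(CPUs, 6*GPUs)
awk -v b=$billing -v s=$runtime_s 'BEGIN { printf "%.1f CPU-hrs\n", b*s/3600 }'   # prints 2.3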

The general impression from these numbers is that Gaussian scales quite well for this particular calculation, and the last column shows that the GPU version is consistently about a factor of two more efficient than the CPU version when comparing the actual consumed CPU-hours. This will of course depend on the conversion factor from CPU to GPU billing, which depends on the system configuration, but at least with the current 6:1 ratio on Saga it seems to pay off to use the GPU version over the CPU version (queuing time not taken into account).

If you find any issues with the GPU version of Gaussian, please contact us at our support line.

Note

The timings in the table above represent a single use case, and the behavior might be very different in other situations. Please perform simple benchmarks to check that the program runs efficiently with your particular computational setup. Also, do not hesitate to contact us if you need guidance on GPU efficiency; see our extended GPU Support.