Gaussian job examples on NRIS machines

Note

Here we present tested examples for various job types on the different NRIS machines. This page is under more or less continuous development, and if you find things missing and/or not working as expected, do not hesitate to contact us.

Expected knowledge base

Before you run any Gaussian calculations, or any other calculations on NRIS machines for that matter, you are expected to familiarize yourself with the specifics of the NRIS machinery. A decent minimal curriculum covers how to find available software versions, load modules, and submit jobs, as outlined below.

Finding available Gaussian versions and submitting a standard Gaussian job

To see which versions of Gaussian are available on a given machine, type the following command after logging into the machine in question:

module avail Gaussian

To use Gaussian, type

module load Gaussian/<version>

specifying one of the available versions.
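
For example, to load the version used in the Saga job scripts below (availability may differ between machines):

module load Gaussian/g16_C.01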

Please inspect the job script examples before submitting jobs!

To run an example: create a directory, step into it, create an input file (for example for water, see below), download a job script (for example the Saga CPU job script shown below), and submit the script with:

$ sbatch saga_g16.sh
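
The job can then be monitored with standard Slurm commands, for instance:

$ squeue -u $USER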

Gaussian input file examples

  • Water input example (note the blank line at the end; water.com):

%chk=water
%mem=500MB
#p b3lyp/cc-pVDZ opt

structure optimization of water

0 1
O
H 1 0.96
H 1 0.96 2 109.471221

  • Caffeine input example (note the blank line at the end; caffeine.com):

%chk=caffeine
%mem=5GB
#p b3lyp/cc-pVQZ

caffeine molecule example

0 1
C     1.179579     0.000000    -0.825950
C     2.359623     0.000000     0.016662
C     2.346242     0.000000     1.466600
C     0.000000     0.000000     1.440000
C     4.217536     0.000000     1.154419
C     4.765176    -0.384157    -0.964164
C     1.058378    -0.322767     3.578004
C    -1.260613    -0.337780    -0.608570
N     1.092573     0.000000     2.175061
N     0.000000     0.000000     0.000000
N     3.391185     0.000000     1.965657
N     3.831536     0.000000     0.062646
O    -1.345306     0.000000     1.827493
O     1.192499     0.000000    -2.225890
H    -1.997518    -0.535233     0.168543
H    -1.598963     0.492090    -1.227242
H    -1.138698    -1.225644    -1.227242
H     0.031688    -0.271264     3.937417
H     1.445570    -1.329432     3.728432
H     1.672014     0.388303     4.129141
H     4.218933    -0.700744    -1.851470
H     5.400826     0.464419    -1.212737
H     5.381834    -1.206664    -0.604809
H     5.288201     0.000000     1.353412

Running Gaussian on Saga

On Saga there are more restrictions and tricky situations to consider than on Fram. First and foremost, the setup is heterogeneous, with some nodes having 52 cores and most nodes having 40 cores. Secondly, on Saga there is a 256-core limit per job, effectively limiting the useful maximum number of nodes for a Gaussian job on Saga to 6. And third, since you share the nodes by default, you need a way to set resource allocations in a sharing environment that is not necessarily homogeneous across your given nodes.

Currently, we are working to find a solution to all these challenges, and as of now our advice is as follows: jobs on up to and including 2 nodes can be run following the standard advice for running jobs on Saga. For 3 nodes and above you either need to run with full nodes or use the Slurm exclusive flag: #SBATCH --exclusive. We prefer the latter due to robustness, as sketched below.
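
As a sketch, the header of a 3-node job using the exclusive flag could look like this (account name and core counts are placeholders to adapt):

#SBATCH --account=nnXXXXk
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive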

To facilitate this, the g16 wrapper has been edited to be both backwards compatible and to reflect our more recent insights. If you are not using this wrapper, please look into it to find the syntax to use in your job script. The wrapper(s) are all available in the Gaussian software folder; the current name is g16.ib.

  • Job script example (saga_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=32G
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01

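# extra runtime flags: options passed to Linda at launch and the CPU math-library target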
export GAUSS_LFLAGS2="--LindaOptions -s 20000000"
export PGI_FASTMATH_CPU=avx2

# name of input file without extension
input=water

# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# run the program
cd $GAUSS_SCRDIR
time g16.ib $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

Running Gaussian on GPUs on Saga

Both of the current g16 versions on Saga support GPU offloading, and we have provided an alternative wrapper script for launching the GPU version. The only things that need to change in the run script are the resource allocation, by adding --gpus=N and --partition=accel, and using the g16.gpu wrapper script instead of g16.ib. The g16.gpu script is available through the standard Gaussian modules, Gaussian/g16_B.01 and Gaussian/g16_C.01 (the latter will likely have better GPU performance since it is the more recent version).
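
Relative to the Saga CPU script above, the essential changes are sketched here (N being the number of GPUs requested); a full example follows below:

#SBATCH --gpus=N
#SBATCH --partition=accel

time g16.gpu $input.com > $input.out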

There are some important limitations for the current GPU version:

  • It can only be run as single-node (up to 24 CPU cores + 4 GPUs), so please specify --nodes=1

  • The number of GPUs must be specified with the --gpus=N flag (not --gpus-per-task)

  • The billing ratio between GPUs and CPUs is 6:1 on Saga, so the natural way to increment resources is to add 6 CPUs per GPU

  • Not all parts of Gaussian are able to utilize GPU resources. From the official docs:

GPUs are effective for larger molecules when doing DFT energies, gradients and frequencies
(for both ground and excited states), but they are not effective for small jobs. They are
also not used effectively by post-SCF calculations such as MP2 or CCSD.

  • Run script example (gpu_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gpus=1
#SBATCH --partition=accel
#SBATCH --mem=96G
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module --quiet purge
module load Gaussian/g16_C.01
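# math-library CPU target for Saga's accel (GPU) nodes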
export PGI_FASTMATH_CPU=skylake

# name of input file without extension
input=caffeine

# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# run the program
cd $GAUSS_SCRDIR
time g16.gpu $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

Some timing examples are listed below for a single-point energy calculation on the caffeine molecule using a large quadruple-zeta basis set. The requested resources are chosen based on billing units, see Projects and accounting, where one GPU is the equivalent of six CPU cores. The memory is then chosen such that it will not be the determining factor for the overall billing.
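
As a sketch of the billing principle (see Projects and accounting for the authoritative rules): a job is billed by the maximum of its CPU, GPU, and memory equivalents, so e.g. 12 CPUs + 2 GPUs give max(12, 2 × 6) = 12 billing units, in line with the table below.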

| Configuration     | CPUs | GPUs | MEM  | Run time | Speedup | Billing | CPU-hrs |
|-------------------|------|------|------|----------|---------|---------|---------|
| Reference         | 1    | 0    | 4G   | 6h51m26s | 1.0     | 1       | 6.9     |
| 1 GPU equivalent  | 6    | 0    | 20G  | 1h00m45s | 6.8     | 6       | 6.1     |
| 2 GPU equivalents | 12   | 0    | 40G  | 36m08s   | 11.4    | 12      | 7.2     |
| 3 GPU equivalents | 18   | 0    | 60G  | 30m14s   | 13.6    | 18      | 9.1     |
| 4 GPU equivalents | 24   | 0    | 80G  | 19m52s   | 20.7    | 24      | 7.9     |
| Full normal node  | 40   | 0    | 140G | 13m05s   | 31.4    | 40      | 8.7     |
| 1/4 GPU node      | 6    | 1    | 80G  | 22m41s   | 18.1    | 6       | 2.3     |
| 1/2 GPU node      | 12   | 2    | 160G | 15m44s   | 26.2    | 12      | 3.1     |
| 3/4 GPU node      | 18   | 3    | 240G | 12m03s   | 34.1    | 18      | 3.6     |
| Full GPU node     | 24   | 4    | 320G | 10m12s   | 40.3    | 24      | 4.1     |

The general impression from these numbers is that Gaussian scales quite well for this particular calculation, and we see from the last column that the GPU version is consistently about a factor of two more efficient than the CPU version when comparing the actual consumed CPU-hours. This will of course depend on the conversion factor from CPU to GPU billing, which depends on the system configuration, but at least with the current ratio of 6:1 on Saga it seems to pay off to use the GPU version over the CPU version (queuing time not taken into account).

If you find any issues with the GPU version of Gaussian, please contact us at our support line.

Note

The timings in the table above represent a single use case, and the behavior might be very different in other situations. Please perform simple benchmarks to check that the program runs efficiently with your particular computational setup. Also do not hesitate to contact us if you need guidance on GPU efficiency, see our extended GPU Support.

Running Gaussian on Betzy and Olivia with Linda parallelization

Linda is Gaussian’s method of process parallelization. Linda allows certain calculations to run in less time, and it is essential for multi-node Gaussian calculations, as it handles Gaussian’s communication across nodes. Since the minimum number of nodes for a job on Betzy is 4, knowing how to use Linda is essential for proper usage.

More information on Linda and parallelization in Gaussian can be found in Gaussian’s documentation here, and you can read here about which specific calculations are sped up. In both of those links, navigate to the Parallel sections.

In order to run Gaussian with Linda parallelization on Betzy and Olivia, one needs to set up passwordless login to the nodes, authorize these nodes for login in the job script, and put the necessary lines in the Gaussian input file. The following is a guide on how to do this.

Setting up passwordless login to the nodes

Before running any Gaussian job, do the following in your home directory. You only have to do this once.

  1. Generate a keypair for the nodes. For this, run the following in the terminal:

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -C "$USER@<MACHINE>" -N ""

This will generate a passwordless keypair. Replace <MACHINE> with either betzy or olivia.

  2. Add the public key (ending with .pub) to the authorized keys by running the following commands in the terminal:

touch ~/.ssh/authorized_keys
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys

  3. If you don’t have a ~/.ssh/config file, you can create it by running the following command in the terminal; otherwise, skip to the next step:

touch ~/.ssh/config

  4. Open the ~/.ssh/config file and add the lines below to the end of it with your preferred editor:

Host *.<MACHINE>.sigma2.no
IdentityFile ~/.ssh/id_ed25519
BatchMode yes
StrictHostKeyChecking yes

replacing <MACHINE> with either betzy or olivia.
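
You can check that the passwordless setup works with a quick self-login test from a login node (a sketch; BatchMode makes ssh fail rather than prompt for a password):

ssh -o BatchMode=yes $(hostname) true && echo "SSH OK" || echo "SSH failed"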

Running Gaussian using Linda workers on Betzy

The following is an example script for Betzy; later we will show the same for Olivia. In this script we want to run a 4-node job, where each node will start 1 Linda worker using 8 processes.

  • The input file (`water_Linda.com`):

%chk=water_Linda
%mem=500MB
%NPROCSHARED=8
#p b3lyp/cc-pVDZ opt

structure optimization of water

0 1
O
H 1 0.96
H 1 0.96 2 109.471221

Note that the preamble contains the line `%NPROCSHARED=8`, which specifies the number of cores for each worker.
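
In total, this example will thus run 4 nodes × 1 Linda worker × 8 shared processes = 32 Gaussian processes.
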
  • Job script example (betzy_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --job-name=example
#SBATCH --time=0-00:10:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1             ## Linda does not use this variable
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module --quiet purge
module load Gaussian/16.C.01-AVX2

# name of input file without extension
input=water_Linda

# tune the math library for the CPU architecture
export PGI_FASTMATH_CPU=avx2


# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/users/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# add all nodes to list of known hosts
for HN in $(scontrol show hostnames)
do
    mkdir -p ~/.ssh
    touch ~/.ssh/known_hosts
    chmod 600 ~/.ssh/known_hosts
    ssh-keygen -R "$HN" 2>/dev/null || true
    ssh-keyscan -H "$HN" >> ~/.ssh/known_hosts 2>/dev/null || true
    
    ## Test if the nodes are reachable
    ssh -o BatchMode=yes "$HN" true && echo "SSH OK" || echo "SSH failed"
done


# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# run the program
cd $GAUSS_SCRDIR

# Add line specifying the amount of Linda Workers per node.
NodeList=$(scontrol show hostnames | while read n ; do echo $n:1 ; done |  paste -d, -s)
{
printf '%%LINDAWORKERS=%s\n' "$NodeList"
cat $input.com
} > $input.tmp && mv $input.tmp $input.com


time g16 < $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

There are two important sections in the script, which will be explained below:

  1. These lines:

for HN in $(scontrol show hostnames) 
do
    mkdir -p ~/.ssh
    touch ~/.ssh/known_hosts
    ssh-keygen -R "$HN" 2>/dev/null || true
    ssh-keyscan -H "$HN" >> ~/.ssh/known_hosts 2>/dev/null || true
    
    # This line tests that the connections are set up properly; it is not
    # strictly necessary, but helps in case of issues.
    ssh -o BatchMode=yes "$HN" true && echo "SSH OK" || echo "SSH failed"
done

These lines allow Linda to log in to the different nodes without a yes/no prompt appearing; if this prompt appears, Linda will crash.

  2. These lines:

NodeList=$(scontrol show hostnames | while read n ; do echo $n:1 ; done |  paste -d, -s)

{
printf '%%LINDAWORKERS=%s\n' "$NodeList"
cat $input.com
} > $input.tmp && mv $input.tmp $input.com

These lines tell Gaussian how to distribute the Linda workers across the nodes, even if just one node is used. Because the loop body `do echo $n:1` appends `:1` to each hostname, only one Linda worker will be started on each node. This adds a line like the following at the top of your input file:

%LindaWorkers=[node_1]:1,[node_2]:1, ...

If you want more Linda workers per node, change the number after the colon accordingly, as sketched below. For more information about this specific part of the preamble, check the Gaussian documentation linked above.
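
As a minimal sketch, starting two Linda workers on each node only requires changing that suffix:

NodeList=$(scontrol show hostnames | while read n ; do echo $n:2 ; done | paste -d, -s)

which would prepend a preamble line like %LindaWorkers=[node_1]:2,[node_2]:2, ...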

Note

Since the minimum number of nodes per job on Betzy is 4, only heavy calculations should be run there, such that each node’s CPU utilization is maximized. Saga and Olivia are better suited for smaller calculations.

Running Gaussian using Linda workers on Olivia

Here is an example script for running Gaussian on Olivia. The specifics are as described above for Betzy, and the same input file is used.

  • Job script example (olivia_g16.sh):

#!/bin/bash -l
#SBATCH --account=nnXXXXk
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --cpus-per-task=8       # change this when more cores are needed
#SBATCH --mem-per-cpu=2GB
#SBATCH --job-name=example
#SBATCH --output=slurm.%j.log

# make the program and environment visible to this script
module restore
module load NRIS/CPU
module load Gaussian/16.C.01-AVX2
module list

# name of input file without extension
input=water_Linda

# tune the math library for the CPU architecture
export PGI_FASTMATH_CPU=avx2



# create the temporary folder
export GAUSS_SCRDIR=/cluster/work/projects/nnXXXXk/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR

# copy input file to temporary folder
cp $SLURM_SUBMIT_DIR/$input.com $GAUSS_SCRDIR

# cd into the temporary folder
cd $GAUSS_SCRDIR

# add all nodes to list of known hosts
# If you are not using Linda, you can delete the following lines and jump straight to "run the program"
for HN in $(scontrol show hostnames)
do
    mkdir -p ~/.ssh
    touch ~/.ssh/known_hosts
    chmod 600 ~/.ssh/known_hosts
    ssh-keygen -R "$HN" 2>/dev/null || true
    ssh-keyscan -H "$HN" >> ~/.ssh/known_hosts 2>/dev/null || true
    
    ## Test if the nodes are reachable
    ssh -o BatchMode=yes "$HN" true && echo "SSH OK" || echo "SSH failed"
done


# Add line specifying the amount of Linda Workers per node to the input file.
NodeList=$(scontrol show hostnames | while read n ; do echo $n:1 ; done |  paste -d, -s)
{
printf '%%LINDAWORKERS=%s\n' "$NodeList"
cat $input.com
} > $input.tmp && mv $input.tmp $input.com


# run the program
time g16 < $input.com > $input.out

# copy result files back to submit directory
cp $input.out $input.chk $SLURM_SUBMIT_DIR

exit 0

Note

Gaussian defaults to 1 CPU if none are specified. Changing the variable --cpus-per-task in the submit script is not sufficient to increase Gaussian’s CPU usage; you must also set the %NPROCSHARED line in the input file to the correct number of CPUs.
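
For instance, the Olivia script above requests --cpus-per-task=8, so the matching input file should contain the following line, as water_Linda.com does:

%NPROCSHARED=8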