Installing software with Conda

Conda is a tool to install packages (Python/R/C/C++/…) and their dependencies. You can do this yourself without administrator permissions on the cluster.

It also lets you create and keep track of separate environments for different projects. With multiple environments you can keep different versions of the same software, or otherwise incompatible software collections, installed side by side. You can then share a list of the installed packages with collaborators or colleagues, so they can set up the same environment in a matter of minutes.

Conda comes in several distributions, but the idea and functionality are more or less the same:

  • Anaconda: Full-fledged distribution that includes a large number of pre-installed packages. Use this if you don’t want to install any packages yourself, just need to test something, and are sure that the package you need is part of Anaconda.

  • Miniconda: Minimal distribution that comes with only the essential packages required to set up Conda. Use this if you want to install dependencies and keep track of your environment.

  • Mamba: A re-implementation of Conda for fast dependency resolution.

  • Micromamba: Ultra-lightweight Mamba which supports the most important conda commands.

  • Miniforge: Minimal installation of Conda and Mamba where the default channel is conda-forge. Use this when working on Betzy.

Note

Due to licensing issues, only Miniforge3 is available on Betzy. Miniforge works in the same way as described below; just replace any instance of EBROOTANACONDA3, EBROOTMINICONDA3 or EBROOTMAMBA with EBROOTMINIFORGE3.

We typically provide modules for Anaconda, Miniconda and Mamba. Any of the three is fine.

The Conda workflow consists of several steps:

  • Create an environment

  • Activate the environment

  • Install software, libraries and/or packages in the environment

  • Run software and packages that you have installed

These steps are performed slightly differently on the clusters than on your local machine.

In the next sections we will go through the different steps on our machines and provide some useful tips for reproducibility or ease of use.

Quickstart simple environment setup

To create an environment, start by loading the Conda module of your choice: Anaconda3, Miniconda3 or Mamba. The example here uses the Anaconda3/2022.10 module, but any other Conda module works:

module load Anaconda3/2022.10
source ${EBROOTANACONDA3}/bin/activate

with Miniconda3 the last command would be

source ${EBROOTMINICONDA3}/bin/activate

and with Mamba it would be

source ${EBROOTMAMBA}/bin/activate

Note

The rest of the commands are the same for all three distributions, except that with Mamba you have to replace conda with mamba in all the terminal commands, e.g. mamba install numpy.

At this point you should be in the base environment of Conda (your terminal should have (base) before the directory name), where you can run basic Python scripts, use the Conda executable to create and manage environments, install software or packages, etc.
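If the same commands need to work regardless of which Conda module is loaded, the choice of EBROOT variable can be wrapped in a small helper. The following is only a sketch: find_conda_activate is a hypothetical function name, and it assumes that the loaded module has set one of the EBROOT variables mentioned in this section.

```shell
# Hypothetical helper: print the path of the activate script that belongs to
# whichever Conda module is currently loaded.
find_conda_activate() {
    local root
    for root in "${EBROOTANACONDA3:-}" "${EBROOTMINICONDA3:-}" \
                "${EBROOTMAMBA:-}" "${EBROOTMINIFORGE3:-}"; do
        if [ -n "$root" ]; then
            echo "${root}/bin/activate"
            return 0
        fi
    done
    echo "no Conda module loaded" >&2
    return 1
}

# Source the activate script only if a module was found.
if activate_script="$(find_conda_activate)"; then
    source "$activate_script"
fi
```

The helper checks the variables in a fixed order and uses the first one that is set, so it is only meant for sessions where a single Conda module is loaded.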

Conda downloads a lot of files when installing packages and keeps them in a cache to speed up subsequent installations of the same software. If you are not careful, this can clutter your home directory. To avoid that, we encourage you to specify where you want to create the environment and where you want to keep the package cache. Your project directory is a good location for both.

To specify the software cache you can run this command in your terminal

export CONDA_PKGS_DIRS=/cluster/projects/nn____k/conda/username/package-cache

The package-cache stores tar-balls, logfiles and other side products of software installation. Some of these files are stored to make subsequent installations in different environments more streamlined.

This cache can be cleaned by running

conda clean -a
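Before cleaning, it can be useful to see how much space the cache takes and what would be removed. The sketch below assumes that CONDA_PKGS_DIRS points to a single directory; cache_usage is a hypothetical helper name, and the fallback path is the placeholder used in this section.

```shell
# Hypothetical helper: report the disk usage of a package cache directory.
cache_usage() {
    local cache_dir="$1"
    if [ -d "$cache_dir" ]; then
        du -sh "$cache_dir"
    else
        echo "no package cache at ${cache_dir} yet"
    fi
}

cache_usage "${CONDA_PKGS_DIRS:-/cluster/projects/nn____k/conda/username/package-cache}"

# Preview what `conda clean -a` would remove, without deleting anything.
if command -v conda >/dev/null 2>&1; then
    conda clean --all --dry-run
fi
```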

When creating the environment we specify the environment path by using the --prefix option.

conda create --prefix /cluster/projects/nn____k/conda/username/my-env

All files related to the environment will now be stored under the /cluster/projects/nn____k/conda/username/my-env/ directory. To activate this environment we need to specify its path

conda activate /cluster/projects/nn____k/conda/username/my-env

Notice that the (base) in your terminal has now changed to something similar to (my-env). This indicates that you are now in the context of the my-env environment.

Once you have activated the environment you can install software or libraries, numpy for example, by running

conda install numpy

Now any Python script that uses numpy can be run in the terminal with the version of numpy you installed in the step above.

Some of the libraries or software you install might need to come from channels other than the default ones. Scipy, for example, can be installed from the conda-forge channel. This can be done in a number of ways; here are two:

conda install conda-forge::scipy

alternatively

conda install --channel conda-forge scipy

Both commands do the same thing.

Some packages can only be installed with pip, but this can also be done through Conda:

conda install pip
pip install jupyter

These two commands will first install pip in the context of the Conda environment (otherwise you would use the global installation, which can introduce other problems) and then use this pip to install the package jupyter.

Note

pip does not always need to be installed this way, as it may already have been installed by a previous command (such as installing any Python version). It is still good practice, since it avoids using different versions of the same packages simultaneously.
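To check that the pip on your PATH really belongs to the active environment and not to a global installation, you can compare its location with CONDA_PREFIX, the variable that conda activate sets to the path of the active environment. This is a minimal sketch; pip_is_from_env is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the first pip on PATH lives inside
# the currently active Conda environment.
pip_is_from_env() {
    local pip_path
    pip_path="$(command -v pip)" || return 1
    case "$pip_path" in
        "${CONDA_PREFIX:-/nonexistent}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

if pip_is_from_env; then
    echo "pip comes from the active environment"
else
    echo "pip is missing or comes from outside the active environment" >&2
fi
```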

You can at any time list the packages installed in the environment by running

conda list

which, for the packages we have installed here, will show something like this:

# packages in environment at /path/to/my-env:
#
# Name                    Version                   Build  Channel
.
.
.
jupyter                   1.0.0                    pypi_0    pypi
numpy                     2.0.0           py312h8813227_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
python                    3.12.4          h37a9e06_0_cpython    conda-forge
scipy                     1.14.0          py312hb9702fa_1    conda-forge
.
.
.

where we have omitted most of the other dependencies.
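The Channel column of this listing shows where each package came from, and packages installed with pip are marked pypi. As a small sketch, the hypothetical helper pip_packages below filters conda list output for those entries. It is demonstrated on a few lines copied from the listing above, so it runs without Conda.

```shell
# Hypothetical helper: given `conda list` output on stdin, print the names
# of the packages whose channel column is "pypi" (installed with pip).
pip_packages() {
    awk '!/^#/ && $NF == "pypi" { print $1 }'
}

# Demonstration on sample lines instead of a live environment.
pip_packages <<'EOF'
# Name                    Version                   Build  Channel
jupyter                   1.0.0                    pypi_0    pypi
numpy                     2.0.0           py312h8813227_0    conda-forge
scipy                     1.14.0          py312hb9702fa_1    conda-forge
EOF
# prints: jupyter
```

In an activated environment the same filter can be applied with conda list | pip_packages.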

Once you are done with your environment you can delete it with

conda deactivate
conda remove --prefix /cluster/projects/nn____k/conda/username/my-env --all

Warning

Conda will at some point advise you to run the conda init command. Do not run this command. It will change your .bashrc and make it very difficult for support to troubleshoot any of your issues.

Using environment.yml files

One of the advantages of Conda is that you can share your environment specification with collaborators through environment.yml files, which they can use to create their own copy of your environment.

Such an environment.yml file can look like this:

name: my-env
channels:
  - defaults
dependencies:
  - python=3.10
  - numpy
  - pandas
  - scipy

In order to create an environment from this file you run

conda env create --prefix /cluster/projects/nn____k/conda/username/my-env --file environment.yml

given that the file is named environment.yml. This will install all the dependencies listed from the necessary channels.

To create this file you can either write the environment.yml file manually following these instructions or use conda to export the list of installed packages in your environment automatically. To do the latter, first activate the environment with conda activate /path/to/my-env and then run this command

conda env export > environment.yml

This will overwrite any environment.yml file in the current directory and list all installed packages in the environment. The list can be long, because it includes system-specific dependencies that were installed at the same time, which may make the environment file incompatible across platforms. To show only the packages you explicitly asked for, run the following command

conda env export --from-history
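An exported specification usually ends with a prefix: line that records the absolute path of the environment on the exporting machine, which is of no use to collaborators. A common idiom is to filter that line out before saving the file. The sketch below uses a hypothetical helper, strip_prefix, and runs on a hand-written sample rather than a real export.

```shell
# Hypothetical helper: drop the machine-specific "prefix:" line from an
# exported environment specification read on stdin.
strip_prefix() {
    grep -v '^prefix:'
}

# Demonstration on a small hand-written sample instead of a real export.
strip_prefix <<'EOF'
name: my-env
channels:
  - conda-forge
dependencies:
  - numpy
  - scipy
prefix: /cluster/projects/nn____k/conda/username/my-env
EOF
```

With an activated environment this would be used as conda env export --from-history | strip_prefix > environment.yml.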

If you change the environment.yml file for an existing environment and want to install the new packages, activate the environment and run this in your terminal

conda env update --file environment.yml

Activating the environment in your job script

We activate the environment in the job script the same way we activate it interactively on the command line (above). The additional SBATCH directives on top are unrelated to the Conda part:

 1#!/usr/bin/env bash
 2
 3#                change this
 4#                    |
 5#                    v
 6#SBATCH --account=nn____k
 7#SBATCH --job-name=example
 8#SBATCH --qos=devel
 9#SBATCH --ntasks=1
10#SBATCH --time=00:02:00
11
12# settings to catch errors in bash scripts; these must come after the #SBATCH lines
13set -euf -o pipefail
14
15# the actual module version might be different
16module load Anaconda3/2022.10
17source ${EBROOTANACONDA3}/bin/activate
18
19#                               change this
20#                                   |
21#                                   v
22conda activate /cluster/projects/nn____k/conda/username/myproject
23
24python --version
25python example.py

We need three lines before running any code that depends on the packages in your environment: loading the module, sourcing the activate script, and activating your environment with conda activate.

If you used Miniconda instead of Anaconda, then lines 16 and 17 (above) might look like this instead (version might be different):

module load Miniconda3/22.11.1-1
source ${EBROOTMINICONDA3}/bin/activate
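In a job script it can help to fail early if the activation did not work, instead of silently running with the wrong Python. The sketch below relies on CONDA_PREFIX, which conda activate sets to the path of the active environment; require_conda_env is a hypothetical helper name, and the path in the comment is the placeholder used in the job script above.

```shell
# Hypothetical helper: fail unless the active Conda environment is the
# expected one (compared as an exact path, without a trailing slash).
require_conda_env() {
    local expected="$1"
    if [ "${CONDA_PREFIX:-}" != "$expected" ]; then
        echo "error: expected Conda environment ${expected}, found '${CONDA_PREFIX:-none}'" >&2
        return 1
    fi
}

# In a job script, call it right after the `conda activate ...` line, e.g.:
#   require_conda_env /cluster/projects/nn____k/conda/username/myproject || exit 1
```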

Help! I ran conda init and/or I see (base) every time I log in

We advise against running conda init, but if you have run it you can “undo” this by deleting the lines that Conda added to your .bashrc or similar file and restarting your terminal (log off and back in again).

You can find your .bashrc file in your home directory. The lines that you must remove are

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/cluster/software/Anaconda3/2022.10/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/cluster/software/Anaconda3/2022.10/etc/profile.d/conda.sh" ]; then
        . "/cluster/software/Anaconda3/2022.10/etc/profile.d/conda.sh"
    else
        export PATH="/cluster/software/Anaconda3/2022.10/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Once you have deleted these lines and restarted your session you will be free of the (base) prefix.
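If you prefer not to edit the file by hand, the block can also be removed with sed by deleting everything between the two marker comments. This is a sketch under the assumption that the markers appear exactly as shown above; remove_conda_init is a hypothetical helper name. It keeps a backup with the suffix .bak and is demonstrated here on a scratch file rather than on your real .bashrc.

```shell
# Hypothetical helper: delete the block that `conda init` added to a shell
# startup file, keeping a backup copy next to it (<file>.bak).
remove_conda_init() {
    local rc_file="$1"
    sed -i.bak '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$rc_file"
}

# Demonstration on a scratch file with a shortened conda block.
scratch="$(mktemp)"
cat > "$scratch" <<'EOF'
export EDITOR=vim
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
unset __conda_setup
# <<< conda initialize <<<
alias ll='ls -l'
EOF

remove_conda_init "$scratch"
cat "$scratch"   # only the EDITOR and alias lines remain
```

Once you have checked the result on a copy, the same call on the real file would be remove_conda_init ~/.bashrc.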

Container solution

If you are interested in using Conda through a Singularity/Apptainer container, have a look at https://github.com/bast/apptainer-conda/.