Models, Datasets, Caches, and Overlays on Olivia
This page summarizes recommended defaults for storing models, datasets, Hugging Face caches, and overlay images on Olivia.
Use it together with PyTorch Software Options on Olivia and Adding Python Packages to PyTorch Containers.
Recommended Defaults
Use the module path by default.
Use the direct container path when you need explicit control over container launch details.
Do not plan around extending EESSI with pip install.
Store models, datasets, caches, and overlays in project or work storage, not in your home directory.
Use one overlay per project, not one overlay per job.
Build overlays from a requirements.txt file and reuse them across related jobs.
If several users need the same models or datasets, use a shared project location.
Note
Home storage is limited by default, so large model and dataset caches should not be allowed to accumulate there.
Recommended Layout
A reasonable default layout is:
/cluster/work/projects/<project>/<user>/my_project/
├── code/
├── data/
├── hf_cache/
│   ├── hub/
│   ├── datasets/
│   └── torch/
└── overlays/
    └── project_overlay.img
If several users in the same project need access to the same models or datasets, place shared caches and overlays in a project-shared location instead.
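The layout above could be created with a few lines of shell. This is a minimal sketch: PROJECT_ROOT is a hypothetical variable that Olivia does not set for you, and the fallback to ./my_project exists only so the snippet runs anywhere.

```shell
# PROJECT_ROOT is a hypothetical variable. On Olivia it would typically be
# /cluster/work/projects/<project>/<user>/my_project (fill in your own values).
# The fallback below is an assumption so the sketch runs outside the cluster.
PROJECT_ROOT="${PROJECT_ROOT:-${PWD}/my_project}"

# Create the code, data, cache, and overlay directories in one call.
mkdir -p \
    "${PROJECT_ROOT}/code" \
    "${PROJECT_ROOT}/data" \
    "${PROJECT_ROOT}/hf_cache/hub" \
    "${PROJECT_ROOT}/hf_cache/datasets" \
    "${PROJECT_ROOT}/hf_cache/torch" \
    "${PROJECT_ROOT}/overlays"

echo "Created layout under ${PROJECT_ROOT}"
```

Because mkdir -p is idempotent, the snippet can sit at the top of a job script without harming an existing tree.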
Overlay Recommendation
If additional Python packages are needed, prefer the module path or the direct container path.
For project work, the recommended default is:
Create one overlay per project.
Build it from a requirements.txt file.
Store it in the project area.
Reuse it across related jobs.
Note
The package-install workflow is documented separately in Adding Python Packages to PyTorch Containers. This page only describes the recommended organization.
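One way to enforce the reuse rule is to have each job script look for the existing project overlay instead of building a new one. The sketch below only checks for the file. find_project_overlay and PROJECT_ROOT are hypothetical names, and the overlays/project_overlay.img path follows the layout recommended on this page.

```shell
# find_project_overlay is a hypothetical helper, not part of any Olivia tooling.
# It prints the overlay path if the image exists and returns non-zero otherwise.
find_project_overlay() {
    overlay="${1}/overlays/project_overlay.img"
    if [ -f "${overlay}" ]; then
        printf '%s\n' "${overlay}"
    else
        echo "No overlay at ${overlay}; build it once from requirements.txt first." >&2
        return 1
    fi
}

# PROJECT_ROOT is an assumed variable; the fallback keeps the sketch runnable.
PROJECT_ROOT="${PROJECT_ROOT:-${PWD}/my_project}"
if OVERLAY="$(find_project_overlay "${PROJECT_ROOT}")"; then
    echo "Reusing overlay: ${OVERLAY}"
fi
```

Building the overlay itself is deliberately left out here, since that workflow belongs to the package-install guide.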
Hugging Face and Torch Cache Locations
PyTorch and Hugging Face workflows often download model weights, datasets, and cache files automatically.
On Olivia, redirect those caches away from your home directory.
The PyTorch guide examples use the following pattern:
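# SCRIPT_DIR is assumed to be set earlier in the job script and to point
# at a directory in project or work storage, not in your home directory.
HF_ROOT="${SCRIPT_DIR}/hf_cache"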
mkdir -p "${HF_ROOT}/hub" "${HF_ROOT}/datasets" "${HF_ROOT}/torch"
export HF_HOME="${HF_ROOT}"
export HF_HUB_CACHE="${HF_ROOT}/hub"
export HF_DATASETS_CACHE="${HF_ROOT}/datasets"
export TRANSFORMERS_CACHE="${HF_ROOT}/hub"
export TORCH_HOME="${HF_ROOT}/torch"
The variables have the following roles:
HF_HOME sets the general Hugging Face home directory.
HF_HUB_CACHE stores model files downloaded from the Hugging Face Hub.
HF_DATASETS_CACHE stores datasets handled through Hugging Face Datasets.
TRANSFORMERS_CACHE stores cached model files used by Transformers. Recent Transformers releases deprecate it in favor of HF_HOME, but setting it is harmless and keeps older versions pointed at the same location.
TORCH_HOME stores Torch-related cached files such as downloaded model artifacts.
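Before a job starts downloading, it may be worth checking that none of these variables still points into the home directory. The following is a minimal sketch; check_cache_vars is a hypothetical helper, and it only compares path prefixes against $HOME.

```shell
# check_cache_vars is a hypothetical helper, not part of any Olivia tooling.
# It returns non-zero if a cache variable is unset or points inside $HOME.
check_cache_vars() {
    status=0
    for name in HF_HOME HF_HUB_CACHE HF_DATASETS_CACHE TORCH_HOME; do
        # POSIX-compatible indirect lookup; the names are fixed, so eval is safe.
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "WARNING: ${name} is unset" >&2
            status=1
        else
            case "${value}" in
                "${HOME}"|"${HOME}"/*)
                    echo "WARNING: ${name}=${value} is inside the home directory" >&2
                    status=1
                    ;;
            esac
        fi
    done
    return "${status}"
}

check_cache_vars || echo "Cache variables need attention before downloading models." >&2
```

The check is purely textual, so a symlink from the home directory into work storage would still be flagged.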
Where to Put Models and Datasets
For personal work, store models and datasets under your own project or work directory.
For shared project work, store them in a shared project location.
Apply the same rule to datasets downloaded from outside Hugging Face.
Do not let large model and dataset caches build up in the home directory.
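To see whether caches have already accumulated in the home directory, the usual default locations can be inspected. The sketch below assumes the libraries' standard defaults of ~/.cache/huggingface and ~/.cache/torch, which apply when neither the cache variables nor XDG_CACHE_HOME are set. report_home_caches is a hypothetical helper.

```shell
# report_home_caches is a hypothetical helper. It prints the disk usage of the
# default Hugging Face and Torch cache directories under a given home
# directory (defaults to $HOME), or notes that a directory is absent.
report_home_caches() {
    home_dir="${1:-${HOME}}"
    for dir in "${home_dir}/.cache/huggingface" "${home_dir}/.cache/torch"; do
        if [ -d "${dir}" ]; then
            du -sh "${dir}"
        else
            echo "${dir}: not present"
        fi
    done
}

report_home_caches
```

If either directory turns out to be large, moving its contents into the project cache and setting the variables described above keeps later downloads out of the home directory.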