Distributed LLM Fine-Tuning & Inference on HPC Systems - event in Oslo - June 09-10 2026

The Norwegian Research Infrastructure Services (NRIS) is hosting a two-day, in-person, hands-on course at UiO in Oslo. Gain practical experience with single-GPU fine-tuning, multi-GPU scaling on single-node and multi-node setups, and optimized LLM inference on a high-performance computing (HPC) system, and build applied skills in optimizing large language models in HPC environments.

When: 9 and 10 June 2026, 09:30-16:00 both days

Where: UiO Campus, Oslo - Ole-Johan Dahls Hus: Seminar room Java

Instructor: Hicham Agueny

HPC System: Olivia

Content: In this course, you will learn to:

Implement parameter-efficient fine-tuning using LoRA and QLoRA
Configure and launch distributed fine-tuning on multiple GPUs and across multiple nodes
Perform distributed LLM inference
Monitor and analyze GPU utilization and profile GPU memory

Course program and schedule

Day 1 — Single-GPU Fine-Tuning & HPC Foundations

Theme: Build an efficient single-GPU fine-tuning workflow on an HPC system.

Morning Session (09:30–12:00) — HPC Fundamentals & Fine-Tuning Optimization

HPC Foundations for LLM Workloads
- Overview of the Olivia supercomputer
- Containerized environments
LLM Fine-Tuning Fundamentals
- Parameter-efficient fine-tuning with LoRA
- Quantized fine-tuning with QLoRA

Afternoon Session (13:00–15:30) — Hands-On: Single-GPU workflow for QA and XSum Tasks

LoRA fine-tuning workflow
Quantized fine-tuning with QLoRA: FP4 vs BF16 comparison
GPU monitoring and memory profiling

Wrap-Up & Discussion (15:30–16:00)

Outcome: Participants implement and optimize a complete single-GPU fine-tuning pipeline with performance diagnostics on an HPC system.

Day 2 — Distributed Training & Optimized Inference

Theme: Scale fine-tuning and inference across multiple GPUs while minimizing communication overhead.

Morning Session (09:30–12:00) — Distributed Fine-Tuning

Distributed Training Concepts
- DDP vs FSDP
- Communication and scaling efficiency
Hands-On: Multi-GPU Fine-Tuning on a single node & across nodes for QA and XSum Tasks
- Multi-GPU LoRA & QLoRA fine-tuning
- Profiling distributed workloads

Afternoon Session (13:00–15:30) — Hands-On: Optimized Inference

Introduction to the vLLM inference engine
Single-GPU inference benchmarking
Multi-GPU inference

Wrap-Up & Discussion (15:30–16:00)

Outcome: Participants scale fine-tuned models and inference across multiple GPUs, interpret performance metrics, and apply optimization strategies suitable for HPC allocations.

Target audience: This course is ideal for researchers, developers, and students with Python experience who want hands-on skills in scalable LLM training and inference on an HPC system.

Registration: Register here

Practical information: The course is free of charge and has a maximum capacity of 30 participants. Lunch is included.

Coordinators: Eirik Skjerve, Burcin Buket Ogul

Contact us

You can always contact our support team.