Distributed LLM Fine-Tuning & Inference on HPC Systems - event in Oslo - June 09-10 2026
The Norwegian Research Infrastructure Services (NRIS) is hosting a two-day, in-person, hands-on course at UiO in Oslo. Gain practical experience with single-GPU fine-tuning, multi-GPU scaling on single-node and multi-node setups, and optimized LLM inference on a high-performance computing (HPC) system. Build applied skills in optimizing large language models in HPC environments.
When: 9 and 10 June 2026, 09:30-16:00 both days
Where: UiO Campus, Oslo - Ole-Johan Dahls Hus: Seminar room Java
Instructor: Hicham Agueny
HPC System: Olivia
Content: In this course, you will learn to:
Implement parameter-efficient fine-tuning using LoRA and QLoRA
Configure and launch distributed fine-tuning on multiple GPUs and across multiple nodes
Perform distributed LLM inference
Monitor and analyze GPU utilization and profile GPU memory
Course program and schedule
Day 1 — Single-GPU Fine-Tuning & HPC Foundations
Theme: Build an efficient single-GPU fine-tuning workflow on an HPC system.
Morning Session (09:30–12:00) — HPC Fundamentals & Fine-Tuning Optimization
HPC Foundations for LLM Workloads
Overview of the Olivia supercomputer
Containerized environments
LLM Fine-Tuning Fundamentals
Parameter-efficient fine-tuning with LoRA
Quantized fine-tuning with QLoRA
Afternoon Session (13:00–15:30) — Hands-On: Single-GPU workflow for QA and XSum Tasks
LoRA fine-tuning workflow
Quantized fine-tuning with QLoRA: FP4 vs BF16 comparison
GPU monitoring and memory profiling
Wrap-Up & Discussion (15:30–16:00)
Outcome: Participants implement and optimize a complete single-GPU fine-tuning pipeline with performance diagnostics on an HPC system.
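As a rough illustration of why the LoRA approach practiced on Day 1 counts as parameter-efficient, the sketch below compares the trainable parameters of a single linear layer under full fine-tuning and under LoRA. The layer size (4096 x 4096) and rank (8) are illustrative assumptions, not values taken from the course material, and the helper names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights when only the low-rank factors are updated.

    LoRA freezes the original matrix and learns an update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Assumed, illustrative sizes (not from the course material).
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"LoRA trains {ratio:.2%} of the layer's weights")
```

Under these assumptions the low-rank factors account for well under one percent of the layer's weights, which is what lets a single GPU hold the optimizer state for fine-tuning.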
Day 2 — Distributed Training & Optimized Inference
Theme: Scale fine-tuning and inference across multiple GPUs while minimizing communication overhead.
Morning Session (09:30–12:00) — Distributed Fine-Tuning
Distributed Training Concepts
DDP vs FSDP
Communication and scaling efficiency
Hands-On: Multi-GPU Fine-Tuning on a single node & across nodes for QA and XSum Tasks
Multi-GPU LoRA & QLoRA fine-tuning
Profiling distributed workloads
Afternoon Session (13:00–15:30) — Hands-On: Optimized Inference
Introduction to the vLLM inference engine
Single-GPU inference benchmarking
Multi-GPU inference
Wrap-Up & Discussion (15:30–16:00)
Outcome: Participants scale fine-tuned models and inference across multiple GPUs, interpret performance metrics, and apply optimization strategies suitable for HPC allocations.
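The scaling-efficiency topic from the Day 2 morning session can be sketched as a small calculation. The timings below are made-up placeholder numbers rather than measurements from Olivia, and the function names are hypothetical helpers.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """How many times faster the multi-GPU run is than the single-GPU run."""
    return t_single / t_multi


def scaling_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Speedup divided by GPU count; 1.0 would mean perfect linear scaling."""
    return speedup(t_single, t_multi) / n_gpus


# Placeholder wall-clock times in seconds for the same workload (assumed values).
t_1gpu = 1200.0
t_4gpu = 400.0

s = speedup(t_1gpu, t_4gpu)
e = scaling_efficiency(t_1gpu, t_4gpu, n_gpus=4)

print(f"speedup on 4 GPUs: {s:.2f}x")
print(f"scaling efficiency: {e:.0%}")
```

An efficiency below 1.0 typically points to communication or data-loading overhead, which is one reason to profile distributed workloads before requesting more GPUs in an allocation.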
Target audience: This course is ideal for researchers, developers, and students with Python experience who want hands-on skills in scalable LLM training and inference on an HPC system.
Registration: Register here
Practical information: The course is free of charge and has a maximum capacity of 30 participants. Lunch is included.
Coordinators: Eirik Skjerve, Burcin Buket Ogul
Contact us
You can always contact us by sending an email to support@nris.no.