Distributed LLM Fine-Tuning & Inference on HPC Systems - event in Oslo - June 09-10 2026

The Norwegian Research Infrastructure Services (NRIS) is hosting a two-day, in-person course at UiO in Oslo. Over the two days, you will gain practical, hands-on experience with single-GPU fine-tuning, multi-GPU scaling on single-node and multi-node setups, and optimized LLM inference on a high-performance computing (HPC) system, building applied skills in optimizing large language models in HPC environments.

When: 9 and 10 June 2026, 09:30-16:00 both days

Where: UiO Campus, Oslo - Ole-Johan Dahls Hus: Seminar room Java

Instructor: Hicham Agueny

HPC System: Olivia

Content: In this course, you will learn to:

  • Implement parameter-efficient fine-tuning using LoRA and QLoRA

  • Configure and launch distributed fine-tuning on multiple GPUs and across multiple nodes

  • Perform distributed LLM inference

  • Monitor and analyze GPU utilization and profile GPU memory usage

Course program and schedule

Day 1 — Single-GPU Fine-Tuning & HPC Foundations

Theme: Build an efficient single-GPU fine-tuning workflow on an HPC system.

Morning Session (09:30–12:00) — HPC Fundamentals & Fine-Tuning Optimization

  1. HPC Foundations for LLM Workloads

    • Overview of the Olivia supercomputer

    • Containerized environments

  2. LLM Fine-Tuning Fundamentals

    • Parameter-efficient fine-tuning with LoRA

    • Quantized fine-tuning with QLoRA

Afternoon Session (13:00–15:30) — Hands-On: Single-GPU workflow for QA and XSum Tasks

  • LoRA fine-tuning workflow

  • Quantized fine-tuning with QLoRA: FP4 vs BF16 comparison

  • GPU monitoring and memory profiling

Wrap-Up & Discussion (15:30–16:00)

Outcome: Participants implement and optimize a complete single-GPU fine-tuning pipeline with performance diagnostics on an HPC system.

Day 2 — Distributed Training & Optimized Inference

Theme: Scale fine-tuning and inference across multiple GPUs while minimizing communication overhead.

Morning Session (09:30–12:00) — Distributed Fine-Tuning

  1. Distributed Training Concepts

    • DDP vs FSDP

    • Communication and scaling efficiency

  2. Hands-On: Multi-GPU Fine-Tuning on a Single Node and Across Nodes for QA and XSum Tasks

    • Multi-GPU LoRA & QLoRA fine-tuning

    • Profiling distributed workloads

Afternoon Session (13:00–15:30) — Hands-On: Optimized Inference

  • Introduction to the vLLM inference engine

  • Single-GPU inference benchmarking

  • Multi-GPU inference

Wrap-Up & Discussion (15:30–16:00)

Outcome: Participants scale fine-tuning and inference across multiple GPUs, interpret performance metrics, and apply optimization strategies suitable for HPC allocations.



Target audience: This course is ideal for researchers, developers, and students with Python experience who want hands-on skills in scalable LLM training and inference on an HPC system.

Registration: Register here

Practical information: The course is free of charge and has a maximum capacity of 30 participants. Lunch is included.

Coordinators: Eirik Skjerve, Burcin Buket Ogul

Contact us

You can always contact us by sending an email to support@nris.no.