Distributed LLM Fine-Tuning & Inference on HPC Systems: March 25-26, 2026

The Norwegian Research Infrastructure Services (NRIS) is hosting a two-day, in-person, hands-on course in Bergen. Participants will work with single-GPU fine-tuning, multi-GPU scaling, and optimized LLM inference on a high-performance computing (HPC) system, building applied skills in optimizing large language models in HPC environments.

When: 25th and 26th of March, 2026, 09:30-16:00 both days

Where: Nygårdsgaten 5, Bergen (exact conference room will be provided)

Instructor: Hicham Agueny

HPC System: Olivia

Content: In this course, you will learn to:

Implement parameter-efficient fine-tuning using LoRA and QLoRA
Configure and launch distributed training workloads across multiple GPUs
Perform distributed LLM inference
Monitor and analyze GPU utilization and profile GPU memory usage

Course program and schedule

Day 1 — Single-GPU Fine-Tuning & HPC Foundations

Theme: Build an efficient single-GPU fine-tuning workflow on an HPC system.

Morning Session (09:30–12:00) — HPC Fundamentals & Fine-Tuning Optimization

HPC Foundations for LLM Workloads
- Overview of the Olivia supercomputer
- Storage hierarchy strategy
- Containerized environments
LLM Fine-Tuning Fundamentals
- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Quantization within QLoRA (FP4, FP8, BF16)
- Memory–throughput trade-offs
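
As a rough illustration of the memory side of these trade-offs, the sketch below estimates how many parameters a LoRA adapter trains for one weight matrix and how much memory the base weights occupy at different bit widths. It is a back-of-the-envelope calculation, not course material: the layer dimensions, rank, and 7-billion-parameter model size are hypothetical, and the estimate covers weights only (no activations, optimizer state, or quantization metadata).

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by LoRA for one d_out x d_in weight matrix.

    LoRA trains two low-rank factors, A (rank x d_in) and B (d_out x rank),
    while the original weight matrix stays frozen.
    """
    return rank * (d_in + d_out)


def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Memory in GiB needed to store n_params weights at a given bit width."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    d_in, d_out, rank = 4096, 4096, 8
    full = d_in * d_out
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"Full matrix: {full} params, LoRA adapter: {lora} params")

    n_params = 7_000_000_000  # assumed model size
    for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
        print(f"{name}: {weight_memory_gib(n_params, bits):.1f} GiB for base weights")
```

Lower bit widths shrink the frozen base weights, which is what lets QLoRA-style fine-tuning fit larger models on a single GPU; the throughput cost of the extra dequantization work has to be measured on the actual hardware.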

Afternoon Session (13:00–15:30) — Hands-On: Single-GPU Workflow

End-to-end LoRA fine-tuning workflow
Quantized fine-tuning: FP4 vs FP8 vs BF16 comparison
GPU monitoring and memory profiling

Wrap-Up & Discussion (15:30–16:00)

Outcome: Participants implement and optimize a complete single-GPU fine-tuning pipeline with performance diagnostics on an HPC system.

Day 2 — Distributed Training & Optimized Inference

Theme: Scale fine-tuning and inference across multiple GPUs while minimizing communication overhead.

Morning Session (09:30–12:00) — Distributed Fine-Tuning

Distributed Training Concepts
- DDP vs FSDP
- Communication overhead and scaling efficiency
Hands-On: Multi-GPU Fine-Tuning
- Multi-GPU LoRA & QLoRA fine-tuning
- Profiling distributed workloads
- Throughput and scaling efficiency analysis
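
For orientation, scaling efficiency is commonly defined as the measured speedup divided by the number of GPUs. The sketch below shows that calculation under this assumed definition; the function name and the throughput figures are hypothetical and do not come from measurements on Olivia.

```python
def scaling_efficiency(throughput_1gpu: float, throughput_ngpu: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved on n_gpus.

    Throughput can be in any consistent unit, e.g. tokens or samples per second.
    A value of 1.0 means perfect linear scaling.
    """
    if throughput_1gpu <= 0 or n_gpus <= 0:
        raise ValueError("throughput_1gpu and n_gpus must be positive")
    speedup = throughput_ngpu / throughput_1gpu
    return speedup / n_gpus


if __name__ == "__main__":
    # Hypothetical measurements: (number of GPUs, tokens per second).
    baseline = 100.0
    for n, tput in [(2, 190.0), (4, 350.0), (8, 600.0)]:
        eff = scaling_efficiency(baseline, tput, n)
        print(f"{n} GPUs: efficiency {eff:.2%}")
```

Efficiency that drops as GPUs are added usually points to communication overhead or input-pipeline bottlenecks, which is where profiling the distributed workload becomes useful.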

Afternoon Session (13:00–15:30) — Hands-On: Optimized Inference

Introduction to the vLLM inference engine
Single-GPU inference benchmarking
Multi-GPU inference scaling
Latency vs throughput trade-offs

Wrap-Up & Discussion (15:30–16:00)

Outcome: Participants scale fine-tuned models and inference across multiple GPUs, interpret performance metrics, and apply optimization strategies suitable for HPC allocations.

Target audience: This course is ideal for researchers, developers, and students with Python experience who want hands-on skills in scalable LLM training and inference on an HPC system.

Registration: Register here

Practical information: The course is free of charge and has a maximum capacity of 30 participants. Food, light pastries, and coffee/tea will be served both days.

Coordinator: Eirik Skjerve

Contact us

You can always contact us by sending an email to support@nris.no.