K3s NVIDIA GPU Passthrough Guide — Proxmox VE + K3s

Stream GPU power into your K3s homelab: pass an NVIDIA GeForce RTX 3070 from Proxmox into a K3s VM and accelerate AI, video, and compute workloads.


Prerequisites

  • VT‑d (Intel) or AMD‑Vi enabled in the BIOS (menu names vary by vendor; often under Advanced › System Agent on Intel boards)

  • Proxmox node with an NVIDIA GPU installed (e.g. RTX 3070 on host still‑fawn)

  • VM OS: Ubuntu 22.04 / 24.04 or Debian 12

  • Existing K3s ≥ v1.32 cluster (API or GUI install)

  • Host‑side tools: lspci, dmesg, update-grub, modprobe, nvidia-ctk, kubectl, crictl

⚠️ Beware: Most BIOSes ship with IOMMU/VT‑d disabled. Double‑check and turn it on before continuing.


1. Enable IOMMU & VFIO on Proxmox Host

# A) Add IOMMU flags to GRUB (Intel shown; AMD hosts usually need only iommu=pt,
#    since AMD-Vi is enabled by default)
sed -i 's/quiet/quiet intel_iommu=on iommu=pt/' /etc/default/grub
update-grub && reboot

# B) Load VFIO modules at boot (tee -a appends rather than clobbering /etc/modules;
#    on kernel ≥ 6.2 / Proxmox 8, vfio_virqfd was merged into vfio and is omitted here)
echo -e "vfio\nvfio_iommu_type1\nvfio_pci" | sudo tee -a /etc/modules
update-initramfs -u

# C) Blacklist host GPU drivers
echo -e "blacklist nouveau\nblacklist nvidia" | sudo tee /etc/modprobe.d/blacklist-gpu.conf
update-initramfs -u

# D) Bind GPU to VFIO — replace IDs with your lspci -nn output
echo 'options vfio-pci ids=10de:2484,10de:228b disable_vga=1' | sudo tee /etc/modprobe.d/vfio.conf
update-initramfs -u && reboot
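
The IDs in step D are from this host's RTX 3070 (GPU 10de:2484 plus its HDMI audio function 10de:228b). Look up your own like so — the commented output is what an RTX 3070 reports:

lspci -nn | grep -i nvidia
# 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
# 01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)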

Verification

# Kernel enabled IOMMU? (Intel logs "DMAR: IOMMU enabled"; AMD logs "AMD-Vi" lines)
dmesg | grep -e DMAR -e AMD-Vi

# GPU bound to vfio-pci? (replace 01:00.0 with your GPU's PCI address)
lspci -k -s 01:00.0 | grep 'vfio-pci'  # Expect: Kernel driver in use: vfio-pci

# Every PCIe device isolated in its own IOMMU group?
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU Group ${g##*/}:"
  for d in "$g"/devices/*; do
    echo -e "\t$(lspci -nns ${d##*/})"
  done
done
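
In healthy output the GPU and its audio function (01:00.0 and 01:00.1) share a group containing nothing else. If unrelated devices land in the same group, you must pass them all through together, or look into the ACS override patch with its weaker isolation guarantees.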

IOMMU Setup Common Missteps

  • Missing iommu=pt → inconsistent passthrough behavior.

  • Host driver not blacklisted → the GPU is never released to the VM.


2. Create & Configure the VM (Proxmox GUI)

  1. VM Options: BIOS = OVMF (UEFI), Machine = q35, CPU Type = host

  2. Hardware → Add → PCI Device: choose 01:00.0 (GPU) → enable All Functions and PCI‑Express

  3. Start VM
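
Prefer the shell? The same configuration via qm — a sketch, where 100 is an assumed VM ID and 01:00 is the GPU address found earlier:

qm set 100 --bios ovmf --machine q35 --cpu host
qm set 100 --hostpci0 01:00,pcie=1  # omitting the function (.0) passes all functions
qm start 100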

Verification (inside VM)

lspci -nn | grep -i nvidia

VM Config Common Missteps

  • Forgetting All Functions → only the GPU function (01:00.0) passes through; its HDMI audio (01:00.1) stays behind.

  • CPU type left at the default (kvm64) → AVX and other host flags are unavailable inside the VM; set CPU type to host.


3. Install NVIDIA Drivers & Configure K3s Containerd

Two community quick‑start paths — pick one:

a. GPU Operator via Helm (fully automated)

helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
kubectl create namespace gpu-operator
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --values values.yaml
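
The install above references a values.yaml this guide never shows. A minimal sketch for K3s — these toolkit env vars follow NVIDIA's documented K3s setup, and the paths are K3s defaults (adjust to your cluster):

values.yaml

toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia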

b. One-shot nvidia-ctk injection (manual shim only)

sudo nvidia-ctk runtime configure \
  --runtime=containerd \
  --config /var/lib/rancher/k3s/agent/etc/containerd/config.toml
sudo systemctl restart k3s  # on an agent-only node: sudo systemctl restart k3s-agent

If you used path (a), skip (b) — the GPU Operator performs the containerd injection itself. Fair warning: I haven't verified that claim in isolation; I ran both paths, and it only started working after a restart.
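
Path (b) also assumes the NVIDIA driver and the NVIDIA Container Toolkit are already installed inside the VM — nvidia-ctk only wires them into containerd. A minimal sketch for Ubuntu following NVIDIA's documented apt setup (the driver version here is an assumption; sudo ubuntu-drivers autoinstall can pick one for you):

# Driver (inside the VM)
sudo apt-get update && sudo apt-get install -y nvidia-driver-550

# Container toolkit repo + package
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo reboot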

NVIDIA Install Verification

sudo crictl info | grep -A3 '"nvidia"'  # plain crictl may miss K3s's socket; try: sudo k3s crictl info
nvidia-smi

NVIDIA Common Missteps From Blogs

  • Editing /etc/containerd/config.toml — K3s ignores it; the live config is /var/lib/rancher/k3s/agent/etc/containerd/config.toml.

  • Expecting hand edits to that file to survive — K3s regenerates config.toml on every start (recent K3s also auto-detects the NVIDIA runtime); persistent customizations belong in config.toml.tmpl.

  • Forgetting to run nvidia-ctk before K3s starts.


4. Deploy the NVIDIA Device Plugin

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.1/deployments/static/nvidia-device-plugin.yml
# (Optional) Restrict to the GPU node only. The DaemonSet in that manifest is named
# nvidia-device-plugin-daemonset, and the nvidia.com/gpu.present label is set by
# GPU Feature Discovery / the GPU Operator, not out of the box:
kubectl patch ds nvidia-device-plugin-daemonset -n kube-system \
  --type=json -p '[{"op":"add","path":"/spec/template/spec/nodeSelector","value":{"nvidia.com/gpu.present":"true"}}]'
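
On K3s the NVIDIA runtime gets registered in containerd but is not made the default, so pods must opt in through a RuntimeClass — a minimal one, with the handler name matching the "nvidia" runtime shown by crictl info:

kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF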

K3s NVIDIA Integration Verification

kubectl get ds nvidia-device-plugin-daemonset -n kube-system
kubectl logs -l name=nvidia-device-plugin-ds -n kube-system | head -n 20
kubectl describe node still-fawn | grep nvidia.com/gpu  # swap still-fawn for your GPU node; expect: nvidia.com/gpu: 1


5. Smoke‑test with a CUDA Pod

gpu-test.yaml

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  # runtimeClassName: nvidia  # uncomment if the nvidia runtime is not containerd's default (see step 4)
  containers:
  - name: cuda-smi
    image: nvidia/cuda:12.4.1-base-ubuntu22.04  # the old 11.0-base tag has been removed from Docker Hub
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1

kubectl apply -f gpu-test.yaml
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-test --timeout=1m
kubectl logs gpu-test  # Expect full nvidia-smi output
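
A full nvidia-smi table in the log means scheduling, the device plugin, and the runtime shim all line up. Clean up afterwards:

kubectl delete pod gpu-test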

References

NVIDIA GPU Operator

Proxmox GPU Passthrough Docs

UntouchedWagons/K3S-NVidia: A guide on using NVidia GPUs for transcoding or AI in Kubernetes

Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit

NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes

How to use GPUs with DevicePlugin in OpenShift 3.10

NVIDIA GPU passthrough with k3s? : r/kubernetes

QEMU / KVM CPU model configuration — QEMU documentation

Fatal glibc error: CPU does not support x86-64-v2 · Issue #287 · JATOS/JATOS

Adding A GPU node to a K3S Cluster – Radical Geek Technology Solution

Enable IOMMU or VT-d in your motherboard BIOS - BIOS - Tutorials - InformatiWeb

still-fawn.maas details | maas MAAS

Intel® Core™ i5-4460 Processor

edenreich/ollama-kubernetes: A POC I’m going to demo about how to deploy Ollama onto Kubernetes

Enabled GPU passthrough of Intel HD 610 with GVT-g in Proxmox 8 | Proxmox Support Forum

Homelab K3s HA Setup
