Ollama GPU Server Setup

This guide shows how to run the Ollama server on a Kubernetes node with an NVIDIA GPU. It assumes the NVIDIA device plugin is already installed on the cluster, so the GPU is exposed to the scheduler as the nvidia.com/gpu resource.
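
Before committing anything, it can be worth confirming that the device plugin is actually advertising the GPU to the scheduler. One quick check (the node name is a placeholder):

    kubectl describe node <gpu-node-name> | grep nvidia.com/gpu

The resource should appear with a non-zero count under both Capacity and Allocatable; if it does not, the deployment's GPU request will never be scheduled.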

Deployment via Flux

FluxCD manages the Ollama deployment under gitops/clusters/homelab/apps/ollama/. Commit the manifest files to the repository and Flux creates the namespace, deployment, and service automatically. The deployment mounts an emptyDir volume at /root/.ollama for model storage, which means downloaded models are lost whenever the pod is rescheduled; replace it with a persistent volume claim if you want models to survive restarts.
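
The committed manifests are the source of truth, but a minimal deployment sketch gives the idea. The namespace, labels, image tag, and single-GPU request below are assumptions for illustration, not a copy of the repository files:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ollama-server
      namespace: ollama
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ollama-server
      template:
        metadata:
          labels:
            app: ollama-server
        spec:
          containers:
            - name: ollama
              image: ollama/ollama:latest
              ports:
                - containerPort: 11434        # Ollama's default listen port
              resources:
                limits:
                  nvidia.com/gpu: 1           # claims one GPU via the device plugin
              volumeMounts:
                - name: models
                  mountPath: /root/.ollama    # model storage path described above
          volumes:
            - name: models
              emptyDir: {}                    # swap for a persistentVolumeClaim to keep models

If the accompanying service exposes port 80 and targets 11434, the curl example later in this guide works unchanged.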

Adding and Serving Models

  1. Pull a model into the running pod:

    kubectl exec deployment/ollama-server -- ollama pull llama3
    
  2. Verify the model is available:

    kubectl exec deployment/ollama-server -- ollama list
    
  3. Send a test request to the service:

    curl http://<service-ip>:80/api/generate -d '{"model":"llama3","prompt":"Hello"}'
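
Note that /api/generate streams the reply as a sequence of JSON objects by default. For a one-shot test it is often easier to disable streaming; the service IP is still a placeholder:

    curl http://<service-ip>:80/api/generate \
      -d '{"model":"llama3","prompt":"Hello","stream":false}'

Recent Ollama releases also provide an ollama ps command, which you can run inside the pod (via kubectl exec, as in the steps above) to check whether the loaded model is running on the GPU or has fallen back to the CPU.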