This tutorial guides you through the process of setting up a Kubernetes environment on a GPU-enabled server. We will install and configure kubectl, helm, and minikube, ensuring GPU compatibility for workloads requiring accelerated computing. By the end of this tutorial, you will have a fully functional Kubernetes environment ready to deploy the vLLM Production Stack.
Before you begin, ensure the following:
- You have access to a GPU-enabled server with a working NVIDIA driver installed on the host.
- Docker is installed and running.
- You have sudo privileges for installing system packages.
Clone the repository and navigate to the utils/ folder:
git clone https://github.com/vllm-project/production-stack.git
cd production-stack/utils
Execute the script install-kubectl.sh:
bash install-kubectl.sh
Explanation:
This script downloads the latest version of kubectl, the Kubernetes command-line tool, and places it in your PATH for easy execution.
Expected Output:
kubectl was downloaded and installed.
Verify the installation:
kubectl version --client
Example output:
Client Version: v1.32.1
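For reference, the core of such an install script typically follows the official kubectl installation steps; a sketch (assuming a Linux x86_64 host, not the script's exact contents):

```shell
# Download the latest stable kubectl release for Linux x86_64
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
# Install it into a directory on the PATH with the right ownership and permissions
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```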
Execute the script install-helm.sh:
bash install-helm.sh
Verify the installation:
helm version
Example output:
version.BuildInfo{Version:"v3.17.0", GitCommit:"301108edc7ac2a8ba79e4ebf5701b0b6ce6a31e4", GitTreeState:"clean", GoVersion:"go1.23.4"}
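The Helm install script is typically a thin wrapper around Helm's official installer; roughly (a sketch, not the script's exact contents):

```shell
# Fetch and run Helm's official install script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
```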
Before proceeding, ensure Docker runs without requiring sudo. To add your user to the docker group, run:
sudo usermod -aG docker $USER && newgrp docker
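You can confirm the change took effect by running a Docker command as your user; for example:

```shell
# Should list containers without a "permission denied" error
docker ps
# Optionally confirm your user is now in the docker group
id -nG | grep -qw docker && echo "docker group OK"
```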
If Minikube is already installed on your system, we recommend uninstalling the existing version before proceeding. You may use one of the following commands based on your operating system and package manager:
# Ubuntu / Debian
sudo apt remove minikube
# RHEL / CentOS / Fedora
sudo yum remove minikube
# or
sudo dnf remove minikube
# macOS (installed via Homebrew)
brew uninstall minikube
# Arch Linux
sudo pacman -Rs minikube
# Windows (via Chocolatey)
choco uninstall minikube
# Windows (via Scoop)
scoop uninstall minikube
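After uninstalling, you can confirm no stale binary remains on your PATH:

```shell
# Prints the binary path if minikube is still installed; otherwise confirms removal
command -v minikube || echo "minikube removed"
# If a leftover binary remains at a custom location, remove it manually, e.g.:
# sudo rm -f /usr/local/bin/minikube
```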
After removing the previous installation, please execute the script provided below to install the latest version.
Execute the script install-minikube-cluster.sh:
bash install-minikube-cluster.sh
Explanation:
This script installs Minikube, starts a GPU-enabled Kubernetes cluster, and uses Helm to deploy the NVIDIA gpu-operator chart to manage GPU resources within the cluster.
Expected Output:
If everything goes smoothly, you should see example output like the following:
😄 minikube v1.35.0 on Ubuntu 22.04 (kvm/amd64)
❗ minikube skips various validations when --force is supplied; this may lead to unexpected behavior
✨ Using the docker driver based on user configuration
......
......
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
"nvidia" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
......
......
NAME: gpu-operator-1737507918
LAST DEPLOYED: Wed Jan 22 01:05:21 2025
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
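To confirm the operator's components come up, you can watch the pods in the gpu-operator namespace (the timeout value below is an arbitrary example; adjust as needed):

```shell
# All pods should eventually reach Running or Completed status
kubectl get pods -n gpu-operator
# Optionally block until every pod in the namespace is Ready
kubectl -n gpu-operator wait --for=condition=Ready pods --all --timeout=300s
```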
Some troubleshooting tips for installing gpu-operator:
If gpu-operator fails to start because of the commonly seen "too many open files" issue with Minikube (and kind), the quick fix below may help.
The issue shows up as one or more gpu-operator pods stuck in CrashLoopBackOff status, and can be confirmed by checking their logs. For example:
$ kubectl -n gpu-operator logs daemonset/nvidia-device-plugin-daemonset -c nvidia-device-plugin
IS_HOST_DRIVER=true
NVIDIA_DRIVER_ROOT=/
DRIVER_ROOT_CTR_PATH=/host
NVIDIA_DEV_ROOT=/
DEV_ROOT_CTR_PATH=/host
Starting nvidia-device-plugin
I0131 19:35:42.895845 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
d475b2cf
commit: d475b2cfcf12b983a4975d4fc59d91af432cf28e
>
I0131 19:35:42.895917 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
E0131 19:35:42.895933 1 main.go:173] failed to create FS watcher for /var/lib/kubelet/device-plugins/: too many open files
The fix is well documented by kind and also works for Minikube.
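The fix from kind's known-issues documentation raises the host's inotify limits; the same sysctl settings apply when the Minikube node runs under the Docker driver:

```shell
# Raise inotify limits on the host (values taken from kind's known-issues docs)
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
# To persist the settings across reboots:
echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_user_instances = 512" | sudo tee -a /etc/sysctl.conf
```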
Ensure Minikube is running:
minikube status
Expected output:
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
Verify GPU access within Kubernetes:
kubectl describe nodes | grep -i gpu
Expected output:
nvidia.com/gpu: 1
... (plus many lines related to gpu information)
Deploy a test GPU workload:
kubectl run gpu-test --image=nvidia/cuda:12.2.0-runtime-ubuntu22.04 --restart=Never -- nvidia-smi
Wait for Kubernetes to pull the image and create the pod, then check the logs to confirm GPU access:
kubectl logs gpu-test
You should see the nvidia-smi output in the terminal.
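Once you have confirmed the GPU is visible, you can remove the test pod so it does not linger in the cluster:

```shell
# Clean up the completed test pod
kubectl delete pod gpu-test
```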
By following this tutorial, you have successfully set up a Kubernetes environment with GPU support on your server. You are now ready to deploy and test vLLM Production Stack on Kubernetes. For further configuration and workload-specific setups, consult the official documentation for kubectl, helm, and minikube.
What’s next: