# Setup NVIDIA GPU for JupyterHub
> **Note**
> This page assumes that the NVIDIA GPU Operator is installed and working on your Kubernetes cluster. See NVIDIA's Getting Started documentation for details on setting up the GPU Operator.
To give JupyterHub singleuser environments access to GPUs, the `nvidia` runtime class needs to be set on the singleuser container. Do not change the default runtime for the whole cluster: doing so can interrupt other pods, such as the CNI (Container Network Interface) pods, and bring the entire cluster down. Instead, set the runtime per pod by adding `extra_pod_config` to the Helm chart configuration for the GPU singleuser profile. The upstream documentation is not explicit about this option or about where it belongs, so an example of the relevant section of the Helm values is shown below:
```yaml
- display_name: NVIDIA Tesla T4, ~16 GB, ~4 CPUs
  slug: gpu
  description: "Start a container on a dedicated node with a GPU"
  profile_options:
    image:
      display_name: Image
      choices:
        tensorflow:
          display_name: Pangeo Tensorflow ML Notebook
          slug: "tensorflow"
          kubespawner_override:
            image: "pangeo/ml-notebook:2023.05.18"
        pytorch:
          display_name: Pangeo PyTorch ML Notebook
          default: true
          slug: "pytorch"
          kubespawner_override:
            image: "pangeo/pytorch-notebook:2023.05.18"
  kubespawner_override:
    mem_limit: null
    mem_guarantee: 14G
    environment:
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
    extra_pod_config:
      runtimeClassName: "nvidia"
    extra_resource_limits:
      nvidia.com/gpu: "1"
```
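For reference, the same profile can also be expressed directly as a KubeSpawner `profile_list` entry in `jupyterhub_config.py` (useful when configuring KubeSpawner outside the Helm chart). A minimal sketch, mirroring the Helm values above:

```python
# A KubeSpawner profile_list entry equivalent to the Helm values above.
# In a real deployment this would be assigned to c.KubeSpawner.profile_list
# in jupyterhub_config.py.
gpu_profile = {
    "display_name": "NVIDIA Tesla T4, ~16 GB, ~4 CPUs",
    "slug": "gpu",
    "description": "Start a container on a dedicated node with a GPU",
    "kubespawner_override": {
        "mem_limit": None,
        "mem_guarantee": "14G",
        "environment": {"NVIDIA_DRIVER_CAPABILITIES": "compute,utility"},
        # Per-pod runtime class: only GPU singleuser pods use the nvidia
        # runtime, so the cluster-wide default runtime stays untouched.
        "extra_pod_config": {"runtimeClassName": "nvidia"},
        "extra_resource_limits": {"nvidia.com/gpu": "1"},
    },
}

print(gpu_profile["kubespawner_override"]["extra_pod_config"]["runtimeClassName"])
```

The key point is the same in both forms: `runtimeClassName` is scoped to the singleuser pod via `extra_pod_config`, not set cluster-wide.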