Set up Nvidia GPU for JupyterHub

Note

This assumes that the Nvidia GPU Operator is installed and working on your Kubernetes cluster. See Nvidia's Getting Started documentation for details on setting up the GPU Operator.
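To confirm the prerequisite before going further, something like the following should show healthy operator pods and the `nvidia` RuntimeClass. The `gpu-operator` namespace is an assumption based on the operator's default install; adjust it if you installed into a different namespace.

```shell
# Check that the GPU Operator pods are running.
# (Namespace is an assumption; use whichever namespace you installed into.)
kubectl get pods -n gpu-operator

# The operator should have registered an "nvidia" RuntimeClass,
# which is what extra_pod_config below refers to.
kubectl get runtimeclass nvidia
```

If the RuntimeClass is missing, the pod spec change shown later in this page will fail to schedule GPU pods.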

To give JupyterHub singleuser environments access to GPUs, the `nvidia` runtime class needs to be set on the singleuser container itself, not made the cluster-wide default. Changing the default runtime for the whole cluster can bring it down by interrupting other pods, such as the CNI (Container Network Interface) pods. Instead, set the runtime per profile by adding `extra_pod_config` to the Helm chart configuration for the GPU singleuser image. Neither this approach nor where to place the setting is made clear in the upstream documentation. An example of what this looks like in the values used for the Helm install is as follows:

```yaml
- display_name: NVIDIA Tesla T4, ~16 GB, ~4 CPUs
  slug: gpu
  description: "Start a container on a dedicated node with a GPU"
  profile_options:
    image:
      display_name: Image
      choices:
        tensorflow:
          display_name: Pangeo Tensorflow ML Notebook
          slug: "tensorflow"
          kubespawner_override:
            image: "pangeo/ml-notebook:2023.05.18"
        pytorch:
          display_name: Pangeo PyTorch ML Notebook
          default: true
          slug: "pytorch"
          kubespawner_override:
            image: "pangeo/pytorch-notebook:2023.05.18"
  kubespawner_override:
    mem_limit: null
    mem_guarantee: 14G
    environment:
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
    extra_pod_config:
      runtimeClassName: "nvidia"
    extra_resource_limits:
      nvidia.com/gpu: "1"
```
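The reason `extra_pod_config` works here is that KubeSpawner merges its keys into the top level of the generated pod spec, so `runtimeClassName: nvidia` lands on the singleuser pod only, leaving the cluster default runtime untouched. The following is an illustrative sketch of that merge semantics, not KubeSpawner's actual code; the pod spec fields shown are simplified placeholders.

```python
# Illustrative sketch only -- not KubeSpawner's real implementation.
# It models how extra_pod_config keys are merged into the generated
# pod spec, which is why runtimeClassName ends up on the singleuser
# pod without changing the cluster-wide default runtime.

# Simplified stand-in for the pod spec KubeSpawner generates.
pod_spec = {
    "containers": [
        {"name": "notebook", "image": "pangeo/pytorch-notebook:2023.05.18"}
    ],
    "restartPolicy": "OnFailure",
}

# The value taken from kubespawner_override.extra_pod_config above.
extra_pod_config = {"runtimeClassName": "nvidia"}

# extra_pod_config entries are applied as top-level pod spec fields.
pod_spec.update(extra_pod_config)

print(pod_spec["runtimeClassName"])
```

Once a server is spawned with this profile, running `nvidia-smi` in a terminal inside the notebook container is a quick way to confirm the GPU is visible.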