gpu-operator-validator


Minimal container image for gpu-operator-validator, the validator component of the NVIDIA GPU Operator.

Download this Image

The image is available on cgr.dev:

docker pull cgr.dev/<cgr-registry>/gpu-operator-validator:latest

Usage

The NVIDIA GPU Operator creates, configures, and manages GPUs on Kubernetes and automates the tasks involved in bootstrapping GPU nodes.

The gpu-operator-validator image provides the validator for the NVIDIA GPU Operator. The validator runs as a DaemonSet and checks that all operator components are working as expected on every GPU node. It must run on a cluster whose nodes have GPU support, such as a GPU-enabled VM or a kind cluster configured for GPUs.

Helm Installation for GPU-Operator

You can deploy the GPU Operator with this validator image using the NVIDIA Helm chart:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
$ helm upgrade --install gpu-operator nvidia/gpu-operator \
    --set validator.repository=cgr.dev/<cgr-registry> \
    --set validator.image=gpu-operator-validator \
    --set validator.version=latest \
    --set operator.repository=cgr.dev/<cgr-registry> \
    --set operator.image=gpu-operator \
    --set operator.version=latest-dev
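The operator creates the validator DaemonSet asynchronously, so it may not exist the moment the Helm command returns. The following is a minimal sketch of one way to wait for it, not part of the chart itself. It assumes the DaemonSet is named nvidia-operator-validator (inferred from the validator pod name in the listing further down) and that it lives in the current namespace. The function name and the POLL_INTERVAL and ROLLOUT_TIMEOUT variables are hypothetical.

```shell
# Sketch: wait until the validator DaemonSet exists, then until it is rolled out.
# Assumption: the DaemonSet name defaults to "nvidia-operator-validator",
# inferred from the pod name prefix; pass a different name as $1 if needed.
wait_for_validator() {
    ds_name="${1:-nvidia-operator-validator}"
    max_attempts="${2:-30}"
    attempt=0

    # The DaemonSet is created by the operator, so poll until it appears.
    until kubectl get daemonset "$ds_name" >/dev/null 2>&1; do
        attempt=$((attempt + 1))
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "daemonset $ds_name not found after $max_attempts attempts" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-10}"
    done

    # Block until every scheduled validator pod reports ready, or time out.
    kubectl rollout status "daemonset/$ds_name" --timeout="${ROLLOUT_TIMEOUT:-300s}"
}
```

Calling `wait_for_validator` with no arguments uses the defaults; a non-zero return means the DaemonSet never appeared or did not become ready in time.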

It might take a couple of minutes for all the pods and DaemonSets to be created. Once they are created, you can verify them by looking for the following:

Verify Deployments and Daemonsets

$ kubectl get pods
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-pdhdn                                   1/1     Running     0          10m
gpu-operator-6cf77ccd94-jv2dm                                 1/1     Running     0          11m
gpu-operator-node-feature-discovery-gc-7b4fd865f4-578fw       1/1     Running     0          11m
gpu-operator-node-feature-discovery-master-5dff7dbb89-xzsc8   1/1     Running     0          11m
gpu-operator-node-feature-discovery-worker-65gvq              1/1     Running     0          11m
nvidia-container-toolkit-daemonset-zkpsh                      1/1     Running     0          10m
nvidia-cuda-validator-jhz7l                                   0/1     Completed   0          10m
nvidia-dcgm-exporter-vg5wz                                    1/1     Running     0          10m
nvidia-device-plugin-daemonset-gbmd4                          1/1     Running     0          10m
nvidia-operator-validator-qb22l                               1/1     Running     0          10m


# Verify Node Feature Discovery Deployments
$ kubectl get deploy -l app.kubernetes.io/name=node-feature-discovery

# Verify Node Feature Discovery DaemonSet
$ kubectl get daemonset -l app.kubernetes.io/name=node-feature-discovery

# Verify GPU Operator ReplicaSet
$ kubectl get rs -l app.kubernetes.io/name=gpu-operator
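The pod listing above can also be checked mechanically rather than by eye. The sketch below is one possible approach, assuming the default `kubectl get pods` table layout (NAME, READY, STATUS, ...). The function name check_pods is hypothetical. It treats a pod as healthy if it is Completed, or Running with all containers ready, and prints everything else.

```shell
# Sketch: read `kubectl get pods` table output on stdin and report pods that
# are neither Completed nor Running with all containers ready.
# Returns non-zero if any such pod is found.
check_pods() {
    awk '
        $1 == "NAME" { next }   # skip the header row
        NF < 3       { next }   # skip blank or truncated lines
        {
            # READY column looks like "1/1": ready containers / total containers
            split($2, ready, "/")
            ok = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
            if (!ok) {
                print $1 ": " $3 " (" $2 ")"
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `kubectl get pods | check_pods` prints nothing and returns zero for the listing shown above, since nvidia-cuda-validator is Completed and every other pod is Running with 1/1 containers ready.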

For more information, refer to the NVIDIA GPU Operator documentation.

Licenses

Chainguard Images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" version of this image:

  • GCC-exception-3.1
  • GPL-3.0-or-later
  • LGPL-2.1-or-later
  • MIT
  • MPL-2.0
  • PROPRIETARY

For a complete list of licenses, please refer to this Image's SBOM.


© 2024 Chainguard, Inc.