dcgm-exporter

Download this Image

The image is available on cgr.dev:

docker pull cgr.dev/chainguard/dcgm-exporter:latest

Usage

DCGM-Exporter is a tool based on the Go APIs to NVIDIA DCGM that allows users to gather GPU metrics and understand workload behavior or monitor GPUs in clusters. DCGM Exporter is written in Go and exposes GPU metrics at an HTTP endpoint (/metrics) for monitoring solutions such as Prometheus.

Testing the NVIDIA DCGM Exporter image requires an environment with attached GPUs. If you have one, here is one way to use the image:

Using Docker

Run Image

Install Docker Engine and configure it with your credentials so that it can pull the image.

Run the image:

docker run -d --rm \
   --gpus all \
   --net host \
   --cap-add SYS_ADMIN \
   cgr.dev/chainguard/dcgm-exporter:latest \
   -f /etc/dcgm-exporter/dcp-metrics-included.csv
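
The exporter may need a few seconds after the container starts before it answers on port 9400. A minimal sketch of a retry loop follows; the function name wait_for_metrics and its default values are illustrative assumptions, not part of the image or of DCGM Exporter.

```shell
# Poll the metrics endpoint until it responds or the attempts run out.
# wait_for_metrics is a hypothetical helper; the defaults are assumptions.
wait_for_metrics() {
  url=${1:-http://localhost:9400/metrics}
  tries=${2:-10}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}
```

Calling wait_for_metrics with no arguments polls http://localhost:9400/metrics up to 10 times, 2 seconds apart, and returns non-zero if the endpoint never answers.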

Retrieve the metrics:

$ curl localhost:9400/metrics

The output should look something like this:

# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
# TYPE DCGM_FI_DEV_MEMORY_TEMP gauge
...
DCGM_FI_DEV_SM_CLOCK{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"} 139
DCGM_FI_DEV_MEM_CLOCK{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"} 405
DCGM_FI_DEV_MEMORY_TEMP{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"} 9223372036854775794
...
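
The DCGM_FI_DEV_MEMORY_TEMP value in the sample output, 9223372036854775794, is not a plausible temperature. It appears to match the "blank" sentinel range that DCGM uses for integer fields that are unavailable or unsupported (values at or above 0x7ffffffffffffff0, which is 9223372036854775792); treat that reading as an assumption to check against the DCGM headers for your version. Under that assumption, a small filter can list the affected metrics. The function name flag_blank_values is hypothetical.

```shell
# Print the name and value of every sample whose value falls in the assumed
# DCGM blank-sentinel range (>= 9223372036854775792, i.e. 0x7ffffffffffffff0).
# Values are compared as 19-digit strings because awk numbers are doubles and
# cannot represent integers this large exactly.
flag_blank_values() {
  awk '
    /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blank lines
    {
      v = $NF ""                        # last field is the sample value, as a string
      if (v ~ /^[0-9]+$/ && length(v) == 19 && v >= "9223372036854775792") {
        name = $1
        sub(/[{].*/, "", name)          # drop the label set from the metric name
        print name " " v
      }
    }
  '
}
```

For example, `curl -s localhost:9400/metrics | flag_blank_values` would list only the metrics that report such a sentinel.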

Helm Installation

Step 1: Add and Update Helm Repository

Add the NVIDIA DCGM Exporter repository and update it to ensure you have access to the latest charts.

$ helm repo add gpu-helm-charts \
   https://nvidia.github.io/dcgm-exporter/helm-charts

$ helm repo update

Step 2: Install NVIDIA DCGM Exporter

Install NVIDIA DCGM Exporter using Helm. The command below lets Helm generate the release name and overrides the chart's image repository and tag so that the Chainguard image is used.

$ helm install \
  --generate-name \
  gpu-helm-charts/dcgm-exporter \
  --set image.repository=cgr.dev/chainguard/dcgm-exporter \
  --set image.tag=latest
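
If a fixed release name, a dedicated namespace, or a pinned chart version is preferred over --generate-name, the same install can be wrapped as sketched here. The function name install_dcgm_exporter and the release name dcgm-exporter are assumptions chosen for illustration; --namespace, --create-namespace, and --version are standard helm install flags.

```shell
# Install the chart under a fixed release name in a chosen namespace,
# pinned to a specific chart version.
# install_dcgm_exporter and the release name "dcgm-exporter" are hypothetical.
install_dcgm_exporter() {
  namespace=${1:?usage: install_dcgm_exporter NAMESPACE CHART_VERSION}
  chart_version=${2:?usage: install_dcgm_exporter NAMESPACE CHART_VERSION}
  helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
    --namespace "$namespace" \
    --create-namespace \
    --version "$chart_version" \
    --set image.repository=cgr.dev/chainguard/dcgm-exporter \
    --set image.tag=latest
}
```

The chart versions available in the repository can be listed with `helm search repo gpu-helm-charts/dcgm-exporter --versions` before choosing one to pass as the second argument.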

Step 3: Verify Installation

$ kubectl get pods -A

NAMESPACE     NAME                                                              READY   STATUS      RESTARTS   AGE
default       dcgm-exporter-2-1603213075-w27mx                                  1/1     Running     0          2m18s
kube-system   calico-kube-controllers-8f59968d4-g28x8                           1/1     Running     1          43m
kube-system   calico-node-zfnfk                                                 1/1     Running     1          43m
kube-system   coredns-f9fd979d6-p7djj                                           1/1     Running     1          43m
kube-system   coredns-f9fd979d6-qhhgq                                           1/1     Running     1          43m
kube-system   etcd-ip-172-31-92-253                                             1/1     Running     1          43m
kube-system   kube-apiserver-ip-172-31-92-253                                   1/1     Running     2          43m
kube-system   kube-controller-manager-ip-172-31-92-253                          1/1     Running     1          43m
kube-system   kube-proxy-mh528                                                  1/1     Running     1          43m
kube-system   kube-scheduler-ip-172-31-92-253                                   1/1     Running     1          43m
kube-system   nvidia-device-plugin-1603211071-7hlk6                             1/1     Running     0          35m
prometheus    alertmanager-kube-prometheus-stack-1603-alertmanager-0            2/2     Running     0          33m
prometheus    kube-prometheus-stack-1603-operator-6b95bcdc79-wmbkn              2/2     Running     0          33m
prometheus    kube-prometheus-stack-1603211794-grafana-67ff56c449-tlmxc         2/2     Running     0          33m
prometheus    kube-prometheus-stack-1603211794-kube-state-metrics-877df67c49f   1/1     Running     0          33m
prometheus    kube-prometheus-stack-1603211794-prometheus-node-exporter-b5fl9   1/1     Running     0          33m
prometheus    prometheus-kube-prometheus-stack-1603-prometheus-0                3/3     Running     1          33m
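
On a busy cluster the full listing is long, so it can help to look only at the exporter pods. The sketch below filters the output of kubectl get pods -A for pods whose name contains dcgm-exporter and whose STATUS column is not Running. The function name exporter_pods_not_running is hypothetical, and the column positions assume the default -A layout shown above.

```shell
# Read `kubectl get pods -A` output on stdin and print every dcgm-exporter pod
# that is not in the Running state, as "namespace/name status".
# Assumes the default column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
exporter_pods_not_running() {
  awk 'NR > 1 && $2 ~ /dcgm-exporter/ && $4 != "Running" { print $1 "/" $2 " " $4 }'
}
```

Running `kubectl get pods -A | exporter_pods_not_running` prints nothing when every exporter pod is healthy.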

For more information, including how to set it up with the Prometheus stack, refer to the official documentation.

Licenses

Chainguard Images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" version of this image:

  • Apache-2.0

  • GCC-exception-3.1

  • GPL-3.0-or-later

  • LGPL-2.1-or-later

  • MIT

  • MPL-2.0

  • PROPRIETARY

For a complete list of licenses, please refer to this Image's SBOM.

© 2024 Chainguard. All Rights Reserved.