Download this Image
The image is available on cgr.dev:
docker pull cgr.dev/chainguard/dcgm-exporter:latest
Usage
DCGM-Exporter is a tool based on the Go APIs to NVIDIA DCGM that allows users to gather GPU metrics and understand workload behavior or monitor GPUs in clusters. It exposes GPU metrics at an HTTP endpoint (/metrics) for monitoring solutions such as Prometheus.
Testing the NVIDIA DCGM Exporter image requires an environment with connected GPUs. If you have GPUs available, here's one way to use this image:
Using Docker
Run Image
Install Docker Engine and configure it with your credentials to pull the image.
Run the image:
docker run -d --rm \
--gpus all \
--net host \
--cap-add SYS_ADMIN \
cgr.dev/chainguard/dcgm-exporter:latest \
-f /etc/dcgm-exporter/dcp-metrics-included.csv
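The -f flag tells the exporter which DCGM fields to collect, here using the CSV bundled in the image. If you only need a subset of metrics, one approach is to mount your own CSV; the sketch below is illustrative (the file name, mount path, and trimmed field list are assumptions modeled on the default dcp-metrics-included.csv format):
# Hypothetical custom field list; format mirrors dcp-metrics-included.csv
cat > custom-metrics.csv <<'EOF'
# DCGM field,          Prometheus type, help string
DCGM_FI_DEV_SM_CLOCK,  gauge,           SM clock frequency (in MHz).
DCGM_FI_DEV_MEM_CLOCK, gauge,           Memory clock frequency (in MHz).
EOF

docker run -d --rm \
--gpus all \
--net host \
--cap-add SYS_ADMIN \
-v "$(pwd)/custom-metrics.csv:/etc/dcgm-exporter/custom-metrics.csv:ro" \
cgr.dev/chainguard/dcgm-exporter:latest \
-f /etc/dcgm-exporter/custom-metrics.csv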
Retrieve the metrics
$ curl localhost:9400/metrics
The output should look something like this:
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
# TYPE DCGM_FI_DEV_MEMORY_TEMP gauge
...
DCGM_FI_DEV_SM_CLOCK{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"} 139
DCGM_FI_DEV_MEM_CLOCK{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"} 405
DCGM_FI_DEV_MEMORY_TEMP{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"} 9223372036854775794
...
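As mentioned above, these metrics are intended to be scraped by a monitoring system such as Prometheus. A minimal scrape configuration might look like the following sketch (the job name and scrape interval are arbitrary; point the target at wherever the exporter is reachable):
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: dcgm-exporter          # arbitrary job name
    static_configs:
      - targets: ['localhost:9400']  # dcgm-exporter's default port
EOF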
Helm Installation
Step 1: Add and Update Helm Repository
Add the NVIDIA DCGM Exporter repository and update it to ensure you have access to the latest charts.
$ helm repo add gpu-helm-charts \
https://nvidia.github.io/dcgm-exporter/helm-charts
$ helm repo update
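To confirm the chart is available after updating, you can search the repository (the version columns in the output will vary):
$ helm search repo gpu-helm-charts/dcgm-exporter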
Step 2: Install NVIDIA DCGM Exporter
Install NVIDIA DCGM Exporter using Helm, overriding the image repository and tag so the chart uses the Chainguard image.
$ helm install \
--generate-name \
gpu-helm-charts/dcgm-exporter \
--set image.repository=cgr.dev/chainguard/dcgm-exporter \
--set image.tag=latest
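Equivalently, the same image overrides can be kept in a values file instead of passing --set flags; this is a minimal sketch (the file name is arbitrary):
cat > dcgm-values.yaml <<'EOF'
image:
  repository: cgr.dev/chainguard/dcgm-exporter
  tag: latest
EOF

$ helm install --generate-name gpu-helm-charts/dcgm-exporter -f dcgm-values.yaml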
Step 3: Verify Installation
$ kubectl get pods -A
NAMESPACE     NAME                                                               READY   STATUS    RESTARTS   AGE
default       dcgm-exporter-2-1603213075-w27mx                                   1/1     Running   0          2m18s
kube-system   calico-kube-controllers-8f59968d4-g28x8                            1/1     Running   1          43m
kube-system   calico-node-zfnfk                                                  1/1     Running   1          43m
kube-system   coredns-f9fd979d6-p7djj                                            1/1     Running   1          43m
kube-system   coredns-f9fd979d6-qhhgq                                            1/1     Running   1          43m
kube-system   etcd-ip-172-31-92-253                                              1/1     Running   1          43m
kube-system   kube-apiserver-ip-172-31-92-253                                    1/1     Running   2          43m
kube-system   kube-controller-manager-ip-172-31-92-253                           1/1     Running   1          43m
kube-system   kube-proxy-mh528                                                   1/1     Running   1          43m
kube-system   kube-scheduler-ip-172-31-92-253                                    1/1     Running   1          43m
kube-system   nvidia-device-plugin-1603211071-7hlk6                              1/1     Running   0          35m
prometheus    alertmanager-kube-prometheus-stack-1603-alertmanager-0             2/2     Running   0          33m
prometheus    kube-prometheus-stack-1603-operator-6b95bcdc79-wmbkn               2/2     Running   0          33m
prometheus    kube-prometheus-stack-1603211794-grafana-67ff56c449-tlmxc          2/2     Running   0          33m
prometheus    kube-prometheus-stack-1603211794-kube-state-metrics-877df67c49f    1/1     Running   0          33m
prometheus    kube-prometheus-stack-1603211794-prometheus-node-exporter-b5fl9    1/1     Running   0          33m
prometheus    prometheus-kube-prometheus-stack-1603-prometheus-0                 3/3     Running   1          33m
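You can also check that the exporter is serving metrics inside the cluster by port-forwarding to its service. The service name below is an assumption; use kubectl get svc to find the name Helm generated for your release:
$ kubectl get svc
$ kubectl port-forward svc/dcgm-exporter-2-1603213075 9400:9400 &
$ curl localhost:9400/metrics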
For more information, and for setting it up with the Prometheus stack, refer to the official documentation: