The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Chainguard Containers are regularly-updated, secure-by-default container images.
For those with access, this container image is available on cgr.dev:
docker pull cgr.dev/ORGANIZATION/tritonserver:latest
Be sure to replace the ORGANIZATION placeholder with the name used for your organization's private repository within the Chainguard Registry.
This image supports the Python, ONNX Runtime, OpenVINO and TensorRT backends only.
You can test this image locally with docker:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v ${YOUR_MODELS_DIRECTORY:-$PWD}:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver:latest \
--model-repository=/models
If you wish to run the server on CPU only, omit the --gpus all line.
The following examples share a single repository for all of the backends. To get started, create a project directory and navigate into it:
mkdir -p ~/triton-examples && cd $_
Then download the example model server, client script, and configuration files:
curl https://codeload.github.com/chainguard-dev/triton-examples/tar.gz/main | \
tar -xz --strip=1 triton-examples-main
After downloading these files, your folder structure should be as follows:
.
├── client.py
├── onnxruntime-backend
│   ├── fetch-model.sh
│   └── onnxruntime
│       ├── 1
│       │   └── model.onnx
│       └── config.pbtxt
├── openvino-backend
│   ├── fetch-model.sh
│   └── openvino
│       ├── 1
│       │   └── model.onnx
│       └── config.pbtxt
├── python-backend
│   └── python
│       ├── 1
│       │   └── model.py
│       └── config.pbtxt
├── README.md
└── tensorrt-backend
    ├── fetch-model.sh
    ├── model.onnx
    └── tensorrt
        ├── 1
        └── config.pbtxt
You can now connect to the server using a client for each of the examples. For simplicity, we will run a client script on the host machine, but client inference can be containerized using the Python Chainguard Container Image for inclusion in your orchestration setup.
Assuming that Python is available on your system's path as python3, create a virtual environment:
python3 -m venv venv && source venv/bin/activate
Install the Triton client library using pip:
pip install 'tritonclient[grpc]'
The client script should now be runnable from the current directory:
python3 ./client.py --help
usage: Tritonserver Client Tests [-h] [-s SERVER] model

Testing for Tritonserver

positional arguments:
  model                Model that will be used with the client

options:
  -h, --help           show this help message and exit
  -s, --server SERVER  Host that will be used for the GRPC client (e.g.: localhost:8001)
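The provided client.py handles these requests for you, but if you prefer to write your own client, the following is a minimal sketch of a gRPC inference request using the tritonclient library. The tensor names INPUT0 and OUTPUT0, the (5, 5) shape, and the model name are placeholders for illustration; the real names and shapes are defined by each model's config.pbtxt.

import numpy as np
import tritonclient.grpc as grpcclient

# Connect to the gRPC endpoint exposed on port 8001 while the server is running.
client = grpcclient.InferenceServerClient(url="localhost:8001")
assert client.is_server_ready()

# Build a request with a random FP32 input. INPUT0/OUTPUT0 and the (5, 5) shape
# are placeholders; use the names and dims from your model's config.pbtxt.
data = np.random.rand(5, 5).astype(np.float32)
infer_input = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(
    model_name="python",  # placeholder: the name of a model loaded by the server
    inputs=[infer_input],
    outputs=[grpcclient.InferRequestedOutput("OUTPUT0")],
)
print(result.as_numpy("OUTPUT0"))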
The following example runs a variant of the add_sub example for the Triton Server Python backend.
Change your working directory to the python-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/python-backend
Run the following command to mount the model repository and start the server with the model defined in the model.py file:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the python model:
+---------+---------+--------+
| Model   | Version | Status |
+---------+---------+--------+
| python  | 1       | READY  |
+---------+---------+--------+
Then run the client script:
python ../client.py python
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[0.5522413 0.64158934 0.19804768 0.87941355 0.5255043 ]\n [0.03742671 0.5047181 0.5687971 0.7528154 0.09557169]\n [0.8530532 0.3704309 0.11962368 0.2563551 0.7490047 ]\n [0.61212635 0.43093833 0.44432703 0.20261322 0.06146438]\n [0.24954486 0.0787174 0.1349516 0.717098 0.46025884]]"
],
"expected": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"output": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"successful": true
}
]
This shows that the client successfully connected to the model server and executed elementwise addition and subtraction operations on two sample vectors.
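For reference, a Triton Python backend model is an ordinary Python file that defines a TritonPythonModel class with an execute() method. The sketch below is not the model.py shipped with this example; it is a hypothetical model that doubles its FP32 input, with placeholder tensor names that would need to match the accompanying config.pbtxt.

import numpy as np
# Provided by the Triton Python backend at runtime; not installable via pip.
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        # Triton may batch incoming requests; return one response per request.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            out0 = pb_utils.Tensor("OUTPUT0", (in0 * 2).astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses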
Change your working directory to the onnxruntime-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/onnxruntime-backend
This example requires an ONNX model that is fetched from the internet. Run the script in the current directory to download it into the model storage location for the onnxruntime model:
./fetch-model.sh
+ mkdir -p onnxruntime/1
+ curl -fSLo ./onnxruntime/1/model.onnx https://github.com/triton-inference-server/onnxruntime_backend/raw/604ee7ae2d75d0204ec756aaf7d7edf5317e7dcc/test/initializer_as_input/models/add_with_initializer/1/model.onnx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   269  100   269    0     0   2402      0 --:--:-- --:--:-- --:--:--  2402
+ set +x
Model successfully fetched to onnxruntime/1/model.onnx
Run the following command to mount the model repository and run the server:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the onnxruntime model:
+-------------+---------+--------+
| Model       | Version | Status |
+-------------+---------+--------+
| onnxruntime | 1       | READY  |
+-------------+---------+--------+
Then run the client script:
python ../client.py onnxruntime
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[0.04237414 0.63609475 0.88362867 0.724177 0.240701 ]\n [0.358571 0.16024649 0.12010413 0.47096097 0.09345072]\n [0.6444194 0.61650777 0.6638608 0.49962732 0.3688811 ]\n [0.0204376 0.6174347 0.05064286 0.04272859 0.49577346]\n [0.68124044 0.77822125 0.6928203 0.50161165 0.25527555]]"
],
"expected": "[[0.08474828 1.2721895 1.7672573 1.448354 0.481402 ]\n [0.717142 0.32049298 0.24020825 0.94192195 0.18690144]\n [1.2888387 1.2330155 1.3277216 0.99925464 0.7377622 ]\n [0.0408752 1.2348694 0.10128573 0.08545718 0.9915469 ]\n [1.3624809 1.5564425 1.3856406 1.0032233 0.5105511 ]]",
"output": "[[0.08474828 1.2721895 1.7672573 1.448354 0.481402 ]\n [0.717142 0.32049298 0.24020825 0.94192195 0.18690144]\n [1.2888387 1.2330155 1.3277216 0.99925464 0.7377622 ]\n [0.0408752 1.2348694 0.10128573 0.08545718 0.9915469 ]\n [1.3624809 1.5564425 1.3856406 1.0032233 0.5105511 ]]",
"successful": true
}
]
This shows that the client successfully connected to the model server and executed a multiplication of the input matrix by the scalar 2.
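When adapting these examples to your own ONNX models, the input and output tensor names and shapes will differ. One way to discover them is to query the running server for a model's metadata; the sketch below assumes the server started above is still running and serving the onnxruntime model.

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
# Prints the model's declared inputs and outputs (names, datatypes, shapes).
print(client.get_model_metadata("onnxruntime"))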
Change your working directory to the openvino-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/openvino-backend
This example runs an ONNX model that is fetched from the internet. Run the script in the current directory to download it into the model storage location for the openvino model:
./fetch-model.sh
+ mkdir -p openvino/1
+ curl -fSLo ./openvino/1/model.onnx https://github.com/onnx/models/raw/b1eeaa1ac722dcc1cd1a8284bde34393dab61c3d/validated/vision/classification/resnet/model/resnet50-caffe2-v1-9.onnx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 97.7M  100 97.7M    0     0  5391k      0  0:00:18  0:00:18 --:--:-- 10.1M
+ set +x
Model successfully fetched to openvino/1/model.onnx
Run the following command to mount the model repository and run the server:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the openvino model:
+----------+---------+--------+
| Model    | Version | Status |
+----------+---------+--------+
| openvino | 1       | READY  |
+----------+---------+--------+
Then run the client script:
python ../client.py openvino
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[[[0.76338285 0.46184912 0.92636037 ... 0.4257808 0.61404836\n 0.9067718 ]\n [0.65512913 0.74693495 0.07375129 ... 0.37925065 0.4888047\n 0.04267222]\n [0.04240799 0.08182416 0.69489807 ... 0.4103226 0.054923\n 0.0582601 ]\n ...\n [0.9834254 0.7005278 0.11914089 ... 0.29851222 0.14448294\n 0.65900624]\n [0.154907760.6532571 0.8287187 ... 0.36543208 0.12733477\n 0.3147746 ]\n [0.45976332 0.68108255 0.8520731 ... 0.99021596 0.9573471\n 0.7810805 ]]\n\n [[0.0842445 0.3005944 0.3265607 ... 0.6121345 0.5080284\n 0.85021585]\n [0.24282897 0.4927684 0.4689886 ... 0.99156994 0.75396144\n 0.4774928 ]\n [0.80796444 0.00248269 0.13700046 ... 0.14362834 0.8269185\n 0.28405726]\n ...\n [0.8429374 0.13909613 0.65293604 ... 0.04426242 0.19225791\n 0.33422643]\n [0.26046273 0.6121224 0.576417 ... 0.46340346 0.608027\n 0.39018032]\n [0.7119001 0.4588718 0.15979071... 0.3650059 0.83611363\n 0.6298459 ]]\n\n [[0.00699139 0.36632583 0.6074161 ... 0.08094972 0.55059016\n 0.0456534 ]\n [0.3950255 0.6318781 0.43853968 ... 0.09412231 0.06041615\n 0.84371537]\n [0.06924959 0.74535745 0.61118585 ... 0.07594369 0.4584373\n 0.41392347]\n ...\n [0.47875118 0.52679694 0.2972078 ... 0.40715238 0.58498055\n 0.6465085 ]\n [0.31188497 0.51325756 0.22442417 ... 0.31170854 0.8710871\n 0.2910038 ]\n [0.6793682 0.49418375 0.41446647 ... 0.6936627 0.9575656\n 0.14582857]]]]"
],
"expected": "(1,1000)",
"output": [
1,
1000
],
"successful": true
}
]
This shows that the client successfully connected to the model server and ran inference, with the output matching the expected (1, 1000) shape.
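In a real application you would typically decode the (1, 1000) classification output instead of only checking its shape. The following is a minimal post-processing sketch, assuming the raw output has already been retrieved as a NumPy array (for example via result.as_numpy() on the client response); the resulting class indices would then be mapped to labels using an ImageNet label file.

import numpy as np

def top_predictions(scores, k=5):
    """Return the indices and probabilities of the k highest-scoring classes."""
    # If the model already returns probabilities, the softmax step can be skipped.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    top = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in top]

# scores would be the (1, 1000) array returned by the server for one image.
scores = np.random.rand(1, 1000).astype(np.float32)[0]
print(top_predictions(scores))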
Change your working directory to the tensorrt-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/tensorrt-backend
This example requires converting an ONNX model into a TensorRT engine (a plan file). First, fetch the ONNX model from the internet with the following command:
./fetch-model.sh
+ curl -fSLo ./model.onnx https://raw.githubusercontent.com/triton-inference-server/onnxruntime_backend/refs/heads/main/test/initializer_as_input/models/add_with_initializer/1/model.onnx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   269  100   269    0     0   1252      0 --:--:-- --:--:-- --:--:--  1257
+ set +x
Model successfully fetched to the current working directory
Then convert the model to a model.plan file with the following command, which places the file in your current directory:
docker run \
--gpus all \
--rm -it \
-u "$(id -u)" \
-e "LD_LIBRARY_PATH=/usr/local/tensorrt/lib" \
-v "${PWD}:/work" \
-w "/work" \
--entrypoint /usr/local/tensorrt/bin/trtexec \
cgr.dev/ORGANIZATION/tritonserver:latest \
--onnx=model.onnx --saveEngine=model.plan --fp16
Move the model into the tensorrt model repository:
mv ./model.plan tensorrt/1/model.plan
Run the following command to mount the model repository and run the server:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the tensorrt model:
+----------+---------+--------+
| Model    | Version | Status |
+----------+---------+--------+
| tensorrt | 1       | READY  |
+----------+---------+--------+
Then run the client script:
python ../client.py tensorrt
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[0.5522413 0.64158934 0.19804768 0.87941355 0.5255043 ]\n [0.03742671 0.5047181 0.5687971 0.7528154 0.09557169]\n [0.8530532 0.3704309 0.11962368 0.2563551 0.7490047 ]\n [0.61212635 0.43093833 0.44432703 0.20261322 0.06146438]\n [0.24954486 0.0787174 0.1349516 0.717098 0.46025884]]"
],
"expected": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"output": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"successful": true
}
]
This shows that the client successfully connected to the model server and ran element-wise addition on a random input matrix, with the output matching the expected values.
Chainguard's free tier of Starter container images are built with Wolfi, our minimal Linux undistro.
All other Chainguard Containers are built with Chainguard OS, Chainguard's minimal Linux operating system designed to produce container images that meet the requirements of a more secure software supply chain.
The main features of Chainguard Containers include:
For cases where you need container images with shells and package managers to build or debug, most Chainguard Containers come paired with a development, or -dev, variant.
In all other cases, including Chainguard Containers tagged as :latest or with a specific version number, the container images include only an open-source application and its runtime dependencies. These minimal container images typically do not contain a shell or package manager.
Although the -dev container image variants have similar security features as their more minimal versions, they include additional software that is typically not necessary in production environments. We recommend using multi-stage builds to copy artifacts from the -dev variant into a more minimal production image.
To improve security, Chainguard Containers include only essential dependencies. Need more packages? Chainguard customers can use Custom Assembly to add packages, either through the Console, chainctl, or the API.
To use Custom Assembly in the Chainguard Console: navigate to the image you'd like to customize in your Organization's list of images, and click on the Customize image button at the top of the page.
Refer to our Chainguard Containers documentation on Chainguard Academy. Chainguard also offers VMs and Libraries — contact us for access.
This software listing is packaged by Chainguard. The trademarks set forth in this offering are owned by their respective companies, and use of them does not imply any affiliation, sponsorship, or endorsement by such companies.