Chainguard Container for tritonserver-fips
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Chainguard Containers are regularly-updated, secure-by-default container images.
Download this Container Image
For those with access, this container image is available on cgr.dev:
docker pull cgr.dev/ORGANIZATION/tritonserver-fips:latest
Be sure to replace the ORGANIZATION placeholder with the name used for your organization's private repository within the Chainguard Registry.
Compatibility Notes
This image supports only the Python, ONNX Runtime, OpenVINO, and TensorRT backends.
FIPS Support
The tritonserver-fips Chainguard Image ships with a validated redistribution of OpenSSL's FIPS provider module. For more on FIPS support in Chainguard Images, consult the guide on FIPS-enabled Chainguard Images on Chainguard Academy.
Getting Started
You can test this image locally with docker:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v ${YOUR_MODELS_DIRECTORY:-$PWD}:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver-fips:latest \
--model-repository=/models
If you wish to run the server on CPU only, omit the --gpus all line.
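Once the container is running, you can confirm that the server is up by polling Triton's standard HTTP health endpoints on the port published as 8000 above. The following is a minimal sketch using only the Python standard library:
import urllib.request
# Triton exposes KServe v2 health endpoints on its HTTP port (8000 above).
# A 200 response from /v2/health/ready means the server is ready to accept
# inference requests; anything else is printed as an error.
for endpoint in ("/v2/health/live", "/v2/health/ready"):
    try:
        with urllib.request.urlopen(f"http://localhost:8000{endpoint}") as resp:
            print(f"{endpoint}: HTTP {resp.status}")
    except Exception as exc:  # connection refused, HTTP errors, and so on
        print(f"{endpoint}: {exc}")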
Examples
The following examples use a shared repository for all of the backends. To get started, create a project directory and navigate into it:
mkdir -p ~/triton-examples && cd $_
Then download the example model server, client script, and configuration files:
curl https://codeload.github.com/chainguard-dev/triton-examples/tar.gz/main | \
tar -xz --strip=1 triton-examples-main
After downloading these files, your folder structure should be as follows:
.
├── client.py
├── onnxruntime-backend
│  ├── fetch-model.sh
│  └── onnxruntime
│     ├── 1
│     │  └── model.onnx
│     └── config.pbtxt
├── openvino-backend
│  ├── fetch-model.sh
│  └── openvino
│     ├── 1
│     │  └── model.onnx
│     └── config.pbtxt
├── python-backend
│  └── python
│     ├── 1
│     │  └── model.py
│     └── config.pbtxt
├── README.md
└── tensorrt-backend
   ├── fetch-model.sh
   ├── model.onnx
   └── tensorrt
      ├── 1
      └── config.pbtxt
You can now connect to the server using a client for each of the examples. For simplicity, we will run a client script on the host machine, but client inference can be containerized using the Python Chainguard Container Image for inclusion in your orchestration setup.
Assuming that Python is available on your system's PATH as python, create and activate a virtual environment:
python3 -m venv venv && source venv/bin/activate
Install the Triton client library using pip:
pip install 'tritonclient[grpc]'
The client script should now be runnable from the current directory:
python3 ./client.py --help
usage: Tritonserver Client Tests [-h] [-s SERVER] model

Testing for Tritonserver

positional arguments:
  model                Model that will be used with the client

options:
  -h, --help           show this help message and exit
  -s, --server SERVER  Host that will be used for the GRPC client (e.g.: localhost:8001)
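The client.py script wraps the Triton gRPC client library. If you want to issue requests from your own code instead, the following sketch shows the general shape of an inference call with tritonclient; the tensor names INPUT0 and OUTPUT0 are placeholders, since the real names, shapes, and datatypes are declared in each model's config.pbtxt:
import numpy as np
import tritonclient.grpc as grpcclient
# Connect to the gRPC endpoint published on port 8001 above.
client = grpcclient.InferenceServerClient(url="localhost:8001")
# Placeholder tensor definition; consult the model's config.pbtxt for the
# actual input names, shapes, and datatypes.
data = np.random.rand(1, 5, 5).astype(np.float32)
infer_input = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)
result = client.infer(
    model_name="python",
    inputs=[infer_input],
    outputs=[grpcclient.InferRequestedOutput("OUTPUT0")],
)
print(result.as_numpy("OUTPUT0"))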
Python backend
The following example runs a variant of the add_sub example for the Triton Server Python backend.
Change your working directory to the python-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/python-backend
Run the following command to mount the model repository and start the server with the model defined in the model.py file:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver-fips:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the python model:
+---------+---------+--------+
| Model | Version | Status |
+---------+---------+--------+
| python | 1 | READY |
+---------+---------+--------+
Then run the client script:
python ../client.py python
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[0.5522413 0.64158934 0.19804768 0.87941355 0.5255043 ]\n [0.03742671 0.5047181 0.5687971 0.7528154 0.09557169]\n [0.8530532 0.3704309 0.11962368 0.2563551 0.7490047 ]\n [0.61212635 0.43093833 0.44432703 0.20261322 0.06146438]\n [0.24954486 0.0787174 0.1349516 0.717098 0.46025884]]"
],
"expected": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"output": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"successful": true
}
]
This shows that the client successfully connected to the model server and executed elementwise addition and subtraction operations on two sample vectors.
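For reference, a Python backend model is simply a model.py that implements the TritonPythonModel interface. The model.py shipped in the example repository is more elaborate, but a minimal sketch of the structure looks like the following, again using INPUT0 and OUTPUT0 as stand-ins for the names declared in config.pbtxt:
import triton_python_backend_utils as pb_utils
class TritonPythonModel:
    """A minimal Python backend model that doubles each input tensor."""
    def execute(self, requests):
        # execute receives a batch of requests and must return one
        # InferenceResponse per request, in the same order.
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            output_array = input_tensor.as_numpy() * 2
            output_tensor = pb_utils.Tensor("OUTPUT0", output_array)
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses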
ONNX Runtime backend
Change your working directory to the onnxruntime-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/onnxruntime-backend
This example requires an onnx model that is fetched from the internet. Run the script in the current directory to download it to the model storage location for the onnxruntime model:
./fetch-model.sh
+ mkdir -p onnxruntime/1
+ curl -fSLo ./onnxruntime/1/model.onnx https://github.com/triton-inference-server/onnxruntime_backend/raw/604ee7ae2d75d0204ec756aaf7d7edf5317e7dcc/test/initializer_as_input/models/add_with_initializer/1/model.onnx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   269  100   269    0     0   2402      0 --:--:-- --:--:-- --:--:--  2402
+ set +x
Model successfully fetched to onnxruntime/1/model.onnx
Run the following command to mount the model repository and run the server:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver-fips:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the onnxruntime model:
+-------------+---------+--------+
| Model | Version | Status |
+-------------+---------+--------+
| onnxruntime | 1 | READY |
+-------------+---------+--------+
Then run the client script:
python ../client.py onnxruntime
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[0.04237414 0.63609475 0.88362867 0.724177 0.240701 ]\n [0.358571 0.16024649 0.12010413 0.47096097 0.09345072]\n [0.6444194 0.61650777 0.6638608 0.49962732 0.3688811 ]\n [0.0204376 0.6174347 0.05064286 0.04272859 0.49577346]\n [0.68124044 0.77822125 0.6928203 0.50161165 0.25527555]]"
],
"expected": "[[0.08474828 1.2721895 1.7672573 1.448354 0.481402 ]\n [0.717142 0.32049298 0.24020825 0.94192195 0.18690144]\n [1.2888387 1.2330155 1.3277216 0.99925464 0.7377622 ]\n [0.0408752 1.2348694 0.10128573 0.08545718 0.9915469 ]\n [1.3624809 1.5564425 1.3856406 1.0032233 0.5105511 ]]",
"output": "[[0.08474828 1.2721895 1.7672573 1.448354 0.481402 ]\n [0.717142 0.32049298 0.24020825 0.94192195 0.18690144]\n [1.2888387 1.2330155 1.3277216 0.99925464 0.7377622 ]\n [0.0408752 1.2348694 0.10128573 0.08545718 0.9915469 ]\n [1.3624809 1.5564425 1.3856406 1.0032233 0.5105511 ]]",
"successful": true
}
]
This shows that the client successfully connected to the model server and executed a scalar multiplication of a random input matrix by 2.
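If you want to see exactly which input and output tensors a loaded model exposes, you can ask the server instead of reading config.pbtxt by hand. A brief sketch using the same gRPC client library:
import tritonclient.grpc as grpcclient
client = grpcclient.InferenceServerClient(url="localhost:8001")
# Both calls return protobuf messages describing the model's inputs and
# outputs; the config additionally includes backend-specific settings.
print(client.get_model_metadata("onnxruntime"))
print(client.get_model_config("onnxruntime"))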
OpenVINO backend
Change your working directory to the openvino-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/openvino-backend
This example uses an onnx model that is fetched from the internet. Run the script in the current directory to download it to the model storage location for the openvino model:
./fetch-model.sh
+ mkdir -p openvino/1
+ curl -fSLo ./openvino/1/model.onnx https://github.com/onnx/models/raw/b1eeaa1ac722dcc1cd1a8284bde34393dab61c3d/validated/vision/classification/resnet/model/resnet50-caffe2-v1-9.onnx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 97.7M  100 97.7M    0     0  5391k      0  0:00:18  0:00:18 --:--:-- 10.1M
+ set +x
Model successfully fetched to openvino/1/model.onnx
Run the following command to mount the model repository and run the server:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver-fips:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the openvino model:
+----------+---------+--------+
| Model | Version | Status |
+----------+---------+--------+
| openvino | 1 | READY |
+----------+---------+--------+
Then run the client script:
python ../client.py openvino
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[[[0.76338285 0.46184912 0.92636037 ... 0.4257808 0.61404836\n 0.9067718 ]\n [0.65512913 0.74693495 0.07375129 ... 0.37925065 0.4888047\n 0.04267222]\n [0.04240799 0.08182416 0.69489807 ... 0.4103226 0.054923\n 0.0582601 ]\n ...\n [0.9834254 0.7005278 0.11914089 ... 0.29851222 0.14448294\n 0.65900624]\n [0.154907760.6532571 0.8287187 ... 0.36543208 0.12733477\n 0.3147746 ]\n [0.45976332 0.68108255 0.8520731 ... 0.99021596 0.9573471\n 0.7810805 ]]\n\n [[0.0842445 0.3005944 0.3265607 ... 0.6121345 0.5080284\n 0.85021585]\n [0.24282897 0.4927684 0.4689886 ... 0.99156994 0.75396144\n 0.4774928 ]\n [0.80796444 0.00248269 0.13700046 ... 0.14362834 0.8269185\n 0.28405726]\n ...\n [0.8429374 0.13909613 0.65293604 ... 0.04426242 0.19225791\n 0.33422643]\n [0.26046273 0.6121224 0.576417 ... 0.46340346 0.608027\n 0.39018032]\n [0.7119001 0.4588718 0.15979071... 0.3650059 0.83611363\n 0.6298459 ]]\n\n [[0.00699139 0.36632583 0.6074161 ... 0.08094972 0.55059016\n 0.0456534 ]\n [0.3950255 0.6318781 0.43853968 ... 0.09412231 0.06041615\n 0.84371537]\n [0.06924959 0.74535745 0.61118585 ... 0.07594369 0.4584373\n 0.41392347]\n ...\n [0.47875118 0.52679694 0.2972078 ... 0.40715238 0.58498055\n 0.6465085 ]\n [0.31188497 0.51325756 0.22442417 ... 0.31170854 0.8710871\n 0.2910038 ]\n [0.6793682 0.49418375 0.41446647 ... 0.6936627 0.9575656\n 0.14582857]]]]"
],
"expected": "(1,1000)",
"output": [
1,
1000
],
"successful": true
}
]
This shows that the client successfully connected to the model server and received an output tensor with the expected 1 by 1000 shape.
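After running the client, you can also query the server for per-model statistics, which is a quick way to confirm that requests actually reached the openvino model. A brief sketch:
import tritonclient.grpc as grpcclient
client = grpcclient.InferenceServerClient(url="localhost:8001")
# Confirm the model is loaded, then print its cumulative request counts
# and latency statistics.
print(client.is_model_ready("openvino"))
print(client.get_inference_statistics(model_name="openvino"))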
TensorRT backend
Change your working directory to the tensorrt-backend directory. This directory will be mounted into the container as the model repository:
cd ~/triton-examples/tensorrt-backend
This example requires translating an onnx model into a TensorRT plan engine. First, fetch the onnx model from the internet with the following command:
./fetch-model.sh
+ curl -fSLo ./model.onnx https://raw.githubusercontent.com/triton-inference-server/onnxruntime_backend/refs/heads/main/test/initializer_as_input/models/add_with_initializer/1/model.onnx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   269  100   269    0     0   1252      0 --:--:-- --:--:-- --:--:--  1257
+ set +x
Model successfully fetched to the current working directory
Then translate the model to a model.plan file with the following command, which places the file in your current directory:
docker run \
--gpus all \
--rm -it \
-u "$(id -u)" \
-e "LD_LIBRARY_PATH=/usr/local/tensorrt/lib" \
-v "${PWD}:/work" \
-w "/work" \
--entrypoint /usr/local/tensorrt/bin/trtexec \
cgr.dev/ORGANIZATION/tritonserver-fips:latest \
--onnx=model.onnx --saveEngine=model.plan --fp16
Move the model to the tensorrt model repository:
mv ./model.plan tensorrt/1/model.plan
Run the following command to mount the model repository and run the server:
docker run -it \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/models \
--gpus all \
cgr.dev/ORGANIZATION/tritonserver-fips:latest \
--model-repository=/models
You should see output detailing the running Triton Inference Server process. Included in this output should be the status of the tensorrt model:
+----------+---------+--------+
| Model | Version | Status |
+----------+---------+--------+
| tensorrt | 1 | READY |
+----------+---------+--------+
Then run the client script:
python ../client.py tensorrt
If the test is successful, you should receive output similar to the following:
[
{
"input": [
"[[0.5522413 0.64158934 0.19804768 0.87941355 0.5255043 ]\n [0.03742671 0.5047181 0.5687971 0.7528154 0.09557169]\n [0.8530532 0.3704309 0.11962368 0.2563551 0.7490047 ]\n [0.61212635 0.43093833 0.44432703 0.20261322 0.06146438]\n [0.24954486 0.0787174 0.1349516 0.717098 0.46025884]]"
],
"expected": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"output": "[[1.1044827 1.2831787 0.39609537 1.7588271 1.0510086 ]\n [0.07485342 1.0094362 1.1375942 1.5056309 0.19114338]\n [1.7061064 0.7408618 0.23924737 0.5127102 1.4980094 ]\n [1.2242527 0.86187667 0.88865405 0.40522644 0.12292876]\n [0.49908972 0.1574348 0.2699032 1.434196 0.9205177 ]]",
"successful": true
}
]
This shows that the client successfully connected to the model server and executed an element-wise operation that doubled each element of a random input matrix, this time through the TensorRT engine built above.
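Because the engine was built with --fp16, results can differ slightly from a full-precision reference, so comparisons in your own client code should use a tolerance rather than exact equality. A sketch of that check, reusing the placeholder tensor names from earlier (the real names come from tensorrt/config.pbtxt):
import numpy as np
import tritonclient.grpc as grpcclient
client = grpcclient.InferenceServerClient(url="localhost:8001")
data = np.random.rand(1, 5, 5).astype(np.float32)  # shape is an assumption
infer_input = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)
result = client.infer(
    model_name="tensorrt",
    inputs=[infer_input],
    outputs=[grpcclient.InferRequestedOutput("OUTPUT0")],
)
# For this example model the expected output is simply the input doubled;
# a small tolerance absorbs FP16 rounding introduced by the TensorRT engine.
print(np.allclose(result.as_numpy("OUTPUT0"), data * 2, rtol=1e-2, atol=1e-3))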
Documentation and Resources
What are Chainguard Containers?
Chainguard Containers are minimal container images that are secure by default.
In many cases, the Chainguard Containers tagged as :latest contain only an open-source application and its runtime dependencies. These minimal container images typically do not contain a shell or package manager. Chainguard Containers are built with Wolfi, our Linux undistro designed to produce container images that meet the requirements of a more secure software supply chain.
The main features of Chainguard Containers include:
For cases where you need container images with shells and package managers to build or debug, most Chainguard Containers come paired with a -dev variant.
Although the -dev container image variants have similar security features to their more minimal versions, they feature additional software that is typically not necessary in production environments. We recommend using multi-stage builds to leverage the -dev variants, copying application artifacts into a final minimal container that offers a reduced attack surface that won’t allow package installations or logins.
Learn More
To better understand how to work with Chainguard Containers, please visit Chainguard Academy and Chainguard Courses.
In addition to Containers, Chainguard offers VMs and Libraries. Contact Chainguard to access additional products.
Trademarks
This software listing is packaged by Chainguard. The trademarks set forth in this offering are owned by their respective companies, and use of them does not imply any affiliation, sponsorship, or endorsement by such companies.