Chainguard Container for tritonserver-trtllm-backend

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Chainguard Containers are regularly-updated, secure-by-default container images.

Download this Container Image

For those with access, this container image is available on cgr.dev:

docker pull cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest

Be sure to replace the ORGANIZATION placeholder with the name used for your organization's private repository within the Chainguard Registry.
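If your local Docker client is not yet authenticated to cgr.dev, one way to set up credentials is with chainctl, assuming you have chainctl installed and access to your organization's repository:

chainctl auth login
chainctl auth configure-docker

The second command registers a Docker credential helper for cgr.dev, after which the docker pull above should succeed.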

Compatibility Notes

The Chainguard tritonserver-trtllm-backend image is comparable to the official NVIDIA Triton Inference Server image. However, the Chainguard image contains only the minimum set of tools and dependencies needed to function.
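Because the runtime image ships without a shell or package manager (see the notes on minimal images below), you generally cannot docker exec into it to explore. A quick way to see how the image is meant to be invoked is to inspect its entrypoint and default command:

docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest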

Getting Started

The following steps serve a TensorRT-LLM model with the Triton TensorRT-LLM Backend using the GPT model from the TensorRT-LLM repository. These instructions are adapted from the official readme.

Begin by cloning the TensorRT-LLM Backend repository:

git clone https://github.com/triton-inference-server/tensorrtllm_backend.git

Navigate into the tensorrtllm_backend repository, then initialize its submodules and pull the Git LFS content:

cd tensorrtllm_backend
git submodule update --init --recursive
git lfs install
git lfs pull
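If the model conversion step later complains about missing or unexpectedly small weight files, the Git LFS content was probably not fetched. As a quick sanity check (not part of the upstream instructions), confirm that LFS is installed and see which files it manages:

git lfs version
git lfs ls-files | head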

Next, return to the directory that contains the tensorrtllm_backend clone and set up the GPT-2 medium model from Hugging Face. Running each step in a parenthesized subshell keeps your working directory at the top level between commands:

cd ..
GPT_DIR="tensorrtllm_backend/tensorrt_llm/examples/gpt"
(cd ${GPT_DIR} && git clone https://huggingface.co/gpt2-medium gpt2)
(cd ${GPT_DIR}/gpt2 && rm pytorch_model.bin model.safetensors)
(cd ${GPT_DIR}/gpt2 && wget https://huggingface.co/gpt2-medium/resolve/main/pytorch_model.bin)
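Before converting, you can confirm that the tokenizer files and the re-downloaded pytorch_model.bin are present; for GPT-2 medium the .bin file should be roughly 1.4 GB:

ls -lh ${GPT_DIR}/gpt2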

Following that, convert the model to the TensorRT format. The provided convert_checkpoint.py script and the trtllm-build tool turn the model checkpoint into a TensorRT engine:

(cd ${GPT_DIR} && \
    python3.10 convert_checkpoint.py --model_dir gpt2 --dtype float16 \
    --tp_size 1 --output_dir ./c-model/gpt2/fp16/1-gpu && \
    trtllm-build --checkpoint_dir ./c-model/gpt2/fp16/1-gpu \
    --gpt_attention_plugin float16 --remove_input_padding enable \
    --paged_kv_cache enable --gemm_plugin float16 \
    --output_dir /engines/fp16/1-gpu)
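If the build succeeds, the output directory should contain the serialized engine and its configuration; exact file names vary across TensorRT-LLM versions, but a rank0 engine plus a config.json is typical for a single-GPU build. Note that the docker run step below mounts $(pwd)/engines into the container at /engines, so make sure the --output_dir used here and that volume mount refer to the same location:

ls -lh /engines/fp16/1-gpu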

Then create a Triton model repository and populate it by copying the prebuilt inflight batcher LLM model files into it:

mkdir -p triton_model_repo
cp -r tensorrtllm_backend/all_models/inflight_batcher_llm/* triton_model_repo/
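After the copy, the model repository should contain one directory per model in the ensemble, each with a templated config.pbtxt that the next step fills in. The layout assumed by the commands below looks roughly like this:

triton_model_repo/
├── ensemble/
├── preprocessing/
├── tensorrt_llm/
├── tensorrt_llm_bls/
└── postprocessing/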

Use the provided fill_template.py script to customize model configuration files:

python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/ensemble/config.pbtxt triton_max_batch_size:1
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/preprocessing/config.pbtxt tokenizer_dir:tensorrtllm_backend/tensorrt_llm/examples/gpt/gpt2,triton_max_batch_size:1,preprocessing_instance_count:1
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt triton_backend:tensorrtllm,triton_max_batch_size:1,decoupled_mode:false,engine_dir:engines/fp16/1-gpu,max_queue_delay_microseconds:0,batching_strategy:inflight_fused_batching,max_queue_size:0,encoder_input_features_data_type:TYPE_FP16
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/postprocessing/config.pbtxt tokenizer_dir:tensorrtllm_backend/tensorrt_llm/examples/gpt/gpt2,triton_max_batch_size:1,postprocessing_instance_count:1,max_queue_size:0
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/tensorrt_llm_bls/config.pbtxt triton_max_batch_size:1,decoupled_mode:false,bls_instance_count:1
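The templates use ${parameter} placeholders, so a simple way to verify the substitutions took effect is to search for any placeholders that remain. A parameter left unfilled will typically prevent the corresponding model from loading and may need to be passed to fill_template.py explicitly (this check is a convenience, not part of the upstream instructions):

grep -rnF '${' triton_model_repo/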

Finally, you can run the Triton Inference Server with the prepared model repository and the TensorRT-LLM backend:

docker run --rm -t --gpus all \
  -v "$(pwd)/triton_model_repo:/triton_model_repo" \
  -v "$(pwd)/tensorrtllm_backend:/tensorrtllm_backend" \
  -v "$(pwd)/engines:/engines" \
  -p 8001:8001 \
  cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest \
  --model-repository=/triton_model_repo \
  --grpc-port=8001
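Once the server logs show the models in a READY state, you can send it a request. The example below assumes you also publish Triton's HTTP port by adding -p 8000:8000 to the docker run command above; the readiness endpoint is standard Triton, and the generate request mirrors the example in the upstream TensorRT-LLM backend readme:

# Readiness check: a 200 status means the server and models are ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready

curl -s -X POST localhost:8000/v2/models/ensemble/generate \
  -d '{"text_input": "What is machine learning?", "max_tokens": 32, "bad_words": "", "stop_words": ""}'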

What are Chainguard Containers?

Chainguard's free-tier Starter container images are built with Wolfi, our minimal Linux undistro.

All other Chainguard Containers are built with Chainguard OS, Chainguard's minimal Linux operating system designed to produce container images that meet the requirements of a more secure software supply chain.

The most important distinction to understand is between the minimal runtime images and their development (-dev) variants:

For cases where you need container images with shells and package managers to build or debug, most Chainguard Containers come paired with a development, or -dev, variant.

In all other cases, including Chainguard Containers tagged as :latest or with a specific version number, the container images include only an open-source application and its runtime dependencies. These minimal container images typically do not contain a shell or package manager.

Although the -dev container image variants have similar security features as their more minimal versions, they include additional software that is typically not necessary in production environments. We recommend using multi-stage builds to copy artifacts from the -dev variant into a more minimal production image.
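As an illustrative sketch only (the paths, the extra build step, and the availability of a -dev tag for this particular image are assumptions, not taken from this listing), a multi-stage Dockerfile might look like:

# Stage 1: use the -dev variant, which includes a shell and package manager,
# to prepare whatever extra artifacts you need.
FROM cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest-dev AS builder
RUN echo "build or fetch extra artifacts here" > /tmp/artifact.txt

# Stage 2: copy only the resulting artifacts into the minimal runtime image.
FROM cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest
COPY --from=builder /tmp/artifact.txt /opt/artifact.txt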

Need additional packages?

To improve security, Chainguard Containers include only essential dependencies. Need more packages? Chainguard customers can use Custom Assembly to add packages, either through the Console, chainctl, or API.

To use Custom Assembly in the Chainguard Console: navigate to the image you'd like to customize in your Organization's list of images, and click on the Customize image button at the top of the page.

Learn More

Refer to our Chainguard Containers documentation on Chainguard Academy. Chainguard also offers VMs and Libraries — contact us for access.

Trademarks

This software listing is packaged by Chainguard. The trademarks set forth in this offering are owned by their respective companies, and use of them does not imply any affiliation, sponsorship, or endorsement by such companies.

Licenses

Chainguard container images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" tag of this image:

  • Apache-2.0

  • BSD-2-Clause

  • BSD-3-Clause

  • BSD-3-Clause-Open-MPI

  • CC-BY-4.0

  • CC-PDDC

  • FTL

For a complete list of licenses, please refer to this image's SBOM.
