Chainguard Image for tritonserver-trtllm-backend

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Chainguard Images are regularly updated, minimal container images with low-to-zero CVEs.

Download this Image

This image is available on cgr.dev:

docker pull cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest

Be sure to replace the ORGANIZATION placeholder with the name used for your organization's private repository within the Chainguard registry.
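If your Docker client is not yet authenticated to the Chainguard registry, you can set up a credential helper with chainctl (this assumes chainctl is installed and you are logged in to your organization):

chainctl auth configure-docker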

Compatibility Notes

The Chainguard tritonserver-trtllm-backend image is comparable to the official NVIDIA tritonserver-trtllm-backend 24.04 image. However, the Chainguard image contains only the minimum set of tools and dependencies needed to function.

Getting Started

The following steps serve a TensorRT-LLM model with the Triton TensorRT-LLM Backend using the GPT model from the TensorRT-LLM repository. These instructions are adapted from the official readme.
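Before you begin, you may want to confirm the host prerequisites: an NVIDIA GPU with working drivers, Git LFS for pulling model weights, and Docker with the NVIDIA Container Toolkit. A minimal check, assuming all three are installed (the CUDA image tag below is illustrative):

# Driver and GPU visible on the host
nvidia-smi

# Git LFS available
git lfs version

# Docker can access the GPU
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi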

Begin by cloning the TensorRT-LLM Backend repository:

git clone https://github.com/triton-inference-server/tensorrtllm_backend.git

Navigate into the tensorrtllm_backend repository, initialize its submodules, and pull the Git LFS files:

cd tensorrtllm_backend
git submodule update --init --recursive
git lfs install
git lfs pull

Next, return to the parent directory and set up the GPT-2 medium model from Hugging Face. Each cd below runs in a subshell so the relative paths stay anchored to the same working directory:

cd ..
GPT_DIR="tensorrtllm_backend/tensorrt_llm/examples/gpt"
(cd "${GPT_DIR}" && git clone https://huggingface.co/gpt2-medium gpt2)
(cd "${GPT_DIR}/gpt2" && rm pytorch_model.bin model.safetensors)
(cd "${GPT_DIR}/gpt2" && wget https://huggingface.co/gpt2-medium/resolve/main/pytorch_model.bin)
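At this point the gpt2 directory should contain the tokenizer and configuration files along with the re-downloaded pytorch_model.bin; a quick way to confirm:

ls -lh "${GPT_DIR}/gpt2"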

Following that, you will need to convert the model to the TensorRT format. You can use the provided conversion script and the trtllm-build tool to turn the model checkpoint into a TensorRT engine (the command runs in a subshell so your working directory is unchanged afterward):

(cd "${GPT_DIR}" && \
  python3.10 convert_checkpoint.py --model_dir gpt2 --dtype float16 \
    --tp_size 1 --output_dir ./c-model/gpt2/fp16/1-gpu && \
  trtllm-build --checkpoint_dir ./c-model/gpt2/fp16/1-gpu \
    --gpt_attention_plugin float16 --remove_input_padding enable \
    --paged_kv_cache enable --gemm_plugin float16 \
    --output_dir /engines/fp16/1-gpu)
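Note that trtllm-build writes the serialized engine to /engines at the filesystem root, so that directory must exist and be writable by your user. If the build succeeds, the output directory should contain the engine and its configuration (exact file names can vary by TensorRT-LLM version):

ls -lh /engines/fp16/1-gpu
# Expect files along the lines of: config.json  rank0.engine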

Then create a Triton model repository and copy the prebuilt inflight batcher LLM model files into it:

mkdir -p triton_model_repo
cp -r tensorrtllm_backend/all_models/inflight_batcher_llm/* triton_model_repo/

Use the provided fill_template.py script to fill in the model configuration files. Note that tokenizer_dir and engine_dir are set to the absolute paths at which these directories will be mounted inside the container in the final step:

python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/ensemble/config.pbtxt triton_max_batch_size:1
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/preprocessing/config.pbtxt tokenizer_dir:/tensorrtllm_backend/tensorrt_llm/examples/gpt/gpt2,triton_max_batch_size:1,preprocessing_instance_count:1
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt triton_backend:tensorrtllm,triton_max_batch_size:1,decoupled_mode:false,engine_dir:/engines/fp16/1-gpu,max_queue_delay_microseconds:0,batching_strategy:inflight_fused_batching,max_queue_size:0,encoder_input_features_data_type:TYPE_FP16
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/postprocessing/config.pbtxt tokenizer_dir:/tensorrtllm_backend/tensorrt_llm/examples/gpt/gpt2,triton_max_batch_size:1,postprocessing_instance_count:1,max_queue_size:0
python3 tensorrtllm_backend/tools/fill_template.py -i triton_model_repo/tensorrt_llm_bls/config.pbtxt triton_max_batch_size:1,decoupled_mode:false,bls_instance_count:1
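You can spot-check that the placeholders were filled in as expected; for example, the ensemble configuration should now carry the batch size you supplied:

grep -n "max_batch_size" triton_model_repo/ensemble/config.pbtxt
# Expect a line like: max_batch_size: 1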

Finally, you can run the Triton Inference Server with the prepared model repository and the TensorRT-LLM backend:

docker run --rm -t --gpus all \
  -v "$(pwd)/triton_model_repo:/triton_model_repo" \
  -v "$(pwd)/tensorrtllm_backend:/tensorrtllm_backend" \
  -v "/engines:/engines" \
  -p 8001:8001 \
  cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest \
  --model-repository=/triton_model_repo \
  --grpc-port=8001
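Once the server logs show the models in a READY state, you can send a test request. The ensemble model exposes a generate endpoint over HTTP; the example below assumes you also publish the HTTP port by adding -p 8000:8000 to the docker run command above:

curl -s -X POST localhost:8000/v2/models/ensemble/generate -d \
  '{"text_input": "What is machine learning?", "max_tokens": 24, "bad_words": "", "stop_words": ""}'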

Contact Support

If you have a Zendesk account (typically set up for you by your Customer Success Manager) you can reach out to Chainguard's Customer Success team through our Zendesk portal.

What are Chainguard Images?

Chainguard Images are a collection of container images designed for security and minimalism.

Many Chainguard Images are distroless; they contain only an open-source application and its runtime dependencies. These images do not even contain a shell or package manager. Chainguard Images are built with Wolfi, our Linux undistro designed to produce container images that meet the requirements of a secure software supply chain.

The main features of Chainguard Images include minimal contents, frequent rebuilds, verifiable signatures, and complete SBOMs.

-dev Variants

As mentioned previously, Chainguard’s distroless Images have no shell or package manager by default. This is great for security, but sometimes you need these things, especially in builder images. For those cases, most (but not all) Chainguard Images come paired with a -dev variant which does include a shell and package manager.

Although the -dev image variants have similar security features as their distroless versions, such as complete SBOMs and signatures, they feature additional software that is typically not necessary in production environments. The general recommendation is to use the -dev variants only to build the application and then copy all application artifacts into a distroless image, which will result in a final container image that has a minimal attack surface and won’t allow package installations or logins.
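As a rough illustration of the difference, the -dev variant can be entered interactively while the distroless image cannot (the -dev tag here is an assumption; check the Versions tab for the tags actually published for this image):

# Works on a -dev variant, which ships a shell:
docker run --rm -it --entrypoint sh cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest-dev

# Fails on the distroless image, since there is no shell to invoke:
docker run --rm -it --entrypoint sh cgr.dev/ORGANIZATION/tritonserver-trtllm-backend:latest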

That being said, it’s worth noting that -dev variants of Chainguard Images are completely fine to run in production environments. After all, the -dev variants are still more secure than many popular container images based on fully-featured operating systems such as Debian and Ubuntu since they carry less software, follow a more frequent patch cadence, and offer attestations for what they include.

Learn More

To better understand how to work with Chainguard Images, we encourage you to visit Chainguard Academy, our documentation and education platform.

Licenses

Chainguard Images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" version of this image:

  • Apache-2.0

  • BSD-2-Clause

  • BSD-3-Clause

  • BSD-3-Clause-Open-MPI

  • CC-BY-4.0

  • FTL

  • GCC-exception-3.1

For a complete list of licenses, please refer to this Image's SBOM.
