tritonserver-vllm-backend

Chainguard Container for tritonserver-vllm-backend

The Triton Inference Server provides an optimized cloud and edge inferencing solution. This image packages it with the vLLM backend.

Chainguard Containers are regularly-updated, secure-by-default container images.

Download this Container Image

For those with access, this container image is available on cgr.dev:


docker pull cgr.dev/ORGANIZATION/tritonserver-vllm-backend:latest

Be sure to replace the ORGANIZATION placeholder with the name used for your organization's private repository within the Chainguard Registry.

Getting started

This container provides GPU-accelerated large language model (LLM) inference by running NVIDIA's Triton Inference Server with the vLLM backend.

Basic vLLM Model Serving

Set up a simple vLLM model repository and serve the facebook/opt-125m model:


# Create model repository structure
mkdir -p model_repository/vllm_model/1

# Download model configuration
wget -P model_repository/vllm_model/1 https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/samples/model_repository/vllm_model/1/model.json
wget -P model_repository/vllm_model https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/samples/model_repository/vllm_model/config.pbtxt

# Start Triton server with vLLM backend
docker run --gpus all -d \
  --name triton-vllm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$(pwd)":/workspace -w /workspace \
  cgr.dev/ORGANIZATION/tritonserver-vllm-backend:latest \
  --model-repository ./model_repository

The server will take 2-5 minutes to initialize as it downloads and loads the model.
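
Because startup can take several minutes, a script that launches the container may want to block until the server responds. The following is a minimal sketch, assuming curl is installed on the host and port 8000 is mapped as in the command above. The name wait_for_ready and its timeout and interval defaults are hypothetical choices for illustration, not part of Triton or this image.

```shell
# Poll a readiness URL until it returns success or a deadline passes.
# wait_for_ready is a hypothetical helper name used only for this sketch.
wait_for_ready() {
  url="$1"             # e.g. http://localhost:8000/v2/health/ready
  timeout="${2:-300}"  # total seconds to wait (assumed default)
  interval="${3:-5}"   # seconds between attempts (assumed default)
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -f makes curl exit non-zero on HTTP error codes; -s hides progress output
    if curl -fs -o /dev/null "$url"; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not ready after ${timeout}s" >&2
  return 1
}
```

For example, wait_for_ready http://localhost:8000/v2/models/vllm_model/ready 600 10 waits up to ten minutes for the model itself. If the call times out, docker logs triton-vllm shows the server's startup output.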

Health Check and Model Status

Check if the server and model are ready:


# Check server health
curl http://localhost:8000/v2/health/ready

# Check model status
curl http://localhost:8000/v2/models/vllm_model/ready

# Get model metadata
curl http://localhost:8000/v2/models/vllm_model
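
Once the readiness checks pass, sending a single prompt over HTTP is a quick end-to-end check. The sketch below assumes the model is named vllm_model as above and uses Triton's generate endpoint with the same request shape as the example in the upstream vllm_backend README (text_input plus a parameters object). The helper names build_generate_payload and generate are hypothetical, and the payload builder does no JSON escaping, so it only suits prompts without double quotes or backslashes.

```shell
# Build the JSON body for a generate request.
# build_generate_payload is a hypothetical helper; it performs no JSON
# escaping, so the prompt must not contain double quotes or backslashes.
build_generate_payload() {
  printf '{"text_input": "%s", "parameters": {"stream": false, "temperature": 0}}' "$1"
}

# Send one prompt to the model's generate endpoint over HTTP.
# Assumes the server is reachable on localhost:8000 and the model is vllm_model.
generate() {
  curl -s -X POST "http://localhost:8000/v2/models/vllm_model/generate" \
    -d "$(build_generate_payload "$1")"
}
```

Calling generate "What is Triton Inference Server?" should return a JSON response containing the generated text once the model is loaded.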

gRPC Client Inference

Test text generation using the gRPC interface:


# Download sample client and prompts
wget https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/samples/client.py
wget https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/samples/prompts.txt

# Run client (requires tritonclient[grpc])
# A shell entrypoint is needed so that pip and the client run as shell commands
docker run --rm --net=host \
  -v "$(pwd)":/workspace -w /workspace \
  --entrypoint sh \
  python:3.12-slim \
  -c "pip install 'tritonclient[grpc]' && python3 client.py -u localhost:8001"

Refer to the vLLM documentation for detailed configuration options.

What are Chainguard Containers?

Chainguard's free tier of Starter container images is built with Wolfi, our minimal Linux undistro.

All other Chainguard Containers are built with Chainguard OS, Chainguard's minimal Linux operating system designed to produce container images that meet the requirements of a more secure software supply chain.

The main features of Chainguard Containers include:

Minimal design, without unnecessary software bloat
Daily builds to ensure container images are up-to-date with available security patches
High-quality build-time SBOMs attesting to the provenance of all artifacts within the image
Verifiable signatures provided by Sigstore
Reproducible builds with Cosign and apko (read more about reproducibility)

For cases where you need container images with shells and package managers to build or debug, most Chainguard Containers come paired with a development, or -dev, variant.

In all other cases, including Chainguard Containers tagged as :latest or with a specific version number, the container images include only an open-source application and its runtime dependencies. These minimal container images typically do not contain a shell or package manager.

Although the -dev container image variants have security features similar to those of their more minimal counterparts, they include additional software that is typically not necessary in production environments. We recommend using multi-stage builds to copy artifacts from the -dev variant into a more minimal production image.

Need additional packages?

To improve security, Chainguard Containers include only essential dependencies. Need more packages? Chainguard customers can use Custom Assembly to add packages through the Console, chainctl, or the API.

To use Custom Assembly in the Chainguard Console: navigate to the image you'd like to customize in your Organization's list of images, and click on the Customize image button at the top of the page.

Learn More

Refer to our Chainguard Containers documentation on Chainguard Academy. Chainguard also offers VMs and Libraries; contact us for access.

Trademarks

This software listing is packaged by Chainguard. The trademarks set forth in this offering are owned by their respective companies, and use of them does not imply any affiliation, sponsorship, or endorsement by such companies.

Licenses

Chainguard's container images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" tag of this image:

Apache-2.0
BSD-1-Clause
BSD-2-Clause
BSD-3-Clause
BSD-4-Clause-UC
CC-BY-4.0
CC-PDDC

For a complete list of licenses, please refer to this Image's SBOM.

Software license agreement

Compliance

Chainguard Containers are SLSA Level 3 compliant, with detailed metadata and documentation about how they were built. We generate build provenance and a Software Bill of Materials (SBOM) for each release, with complete visibility into the software supply chain.

SLSA compliance at Chainguard

This image helps reduce time and effort in establishing PCI DSS 4.0 compliance with low-to-no CVEs.

PCI DSS at Chainguard

A FIPS-validated version of this image is available for FedRAMP compliance. A STIG is included with the FIPS image.

Related images

tritonserver-vllm-backend-fips