py3.10-vllm-cuda-12.6
Chainguard
0.8.4-r0
vLLM Allows Remote Code Execution via Mooncake Integration
When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP allows attackers to execute arbitrary code remotely on distributed hosts.
Only sender_socket and receiver_ack are allowed to be accessed publicly, while the data actually deserialized by pickle.loads() comes from recv_bytes. Its interface is defined as self.receiver_socket.connect(f"tcp://{d_host}:{d_rank_offset + 1}"), where d_host is decode_host, a locally defined address, 192.168.0.139, from mooncake.json (https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/vllm-integration-v0.2.md?plain=1#L36).
recv_tensor() calls _recv_impl, which passes the raw network bytes to pickle.loads(). Additionally, it does not appear that there are any controls (network, authentication, etc.) to prevent arbitrary users from sending this payload to the affected service.
This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts.
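To illustrate why passing raw network bytes to pickle.loads() is dangerous, here is a minimal, harmless sketch. It is not taken from vLLM or Mooncake; the Payload class and the ARG string are hypothetical names used only for illustration. A pickle stream can name any importable callable and its arguments, and pickle.loads() invokes that callable while decoding. The builtin len stands in for whatever function an attacker would choose.

```python
import pickle

ARG = "attacker-chosen call"  # hypothetical argument, for illustration only


class Payload:
    """Hypothetical object whose pickle form names a callable to invoke."""

    def __reduce__(self):
        # pickle records (callable, args); loads() will call callable(*args).
        # len is harmless, but any importable callable could be named here.
        return (len, (ARG,))


# Bytes as they might arrive over a socket from an untrusted sender.
raw_bytes = pickle.dumps(Payload())

# Decoding does not rebuild a Payload; it runs the callable named in the bytes
# and returns whatever that call returns.
result = pickle.loads(raw_bytes)
print(type(result).__name__, result)
```

The receiver never has to define Payload for this to work, which is why the sender fully controls what runs on the receiving host.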
This issue is resolved by https://github.com/vllm-project/vllm/pull/14228
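As a general illustration of the alternative to pickle for this kind of transfer, the sketch below uses a data-only frame format built from the Python standard library. It does not describe the contents of the pull request above; the frame layout and the names encode_frame, decode_frame, ITEM_SIZES, and MAX_HEADER are assumptions made for this example. The point is that the receiver only parses lengths, a JSON header, and raw bytes, so nothing in the payload can name code to run.

```python
import json
import math
import struct

# Hypothetical limits and dtype table, chosen only for this sketch.
MAX_HEADER = 4096
ITEM_SIZES = {"float16": 2, "bfloat16": 2, "float32": 4}


def encode_frame(dtype, shape, data):
    """Pack a 4-byte header length, a JSON header, then the raw buffer."""
    header = json.dumps({"dtype": dtype, "shape": list(shape)}).encode("utf-8")
    return struct.pack("!I", len(header)) + header + data


def decode_frame(frame):
    """Parse a frame, rejecting anything that does not match the layout."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    (header_len,) = struct.unpack("!I", frame[:4])
    if header_len > MAX_HEADER or 4 + header_len > len(frame):
        raise ValueError("bad header length")
    try:
        meta = json.loads(frame[4:4 + header_len].decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("bad header") from exc
    if not isinstance(meta, dict):
        raise ValueError("header must be a JSON object")
    dtype, shape = meta.get("dtype"), meta.get("shape")
    if dtype not in ITEM_SIZES:
        raise ValueError("unknown dtype")
    if not isinstance(shape, list) or not all(
        type(d) is int and d >= 0 for d in shape
    ):
        raise ValueError("bad shape")
    data = frame[4 + header_len:]
    # The buffer must be exactly as large as the header claims.
    if len(data) != ITEM_SIZES[dtype] * math.prod(shape):
        raise ValueError("payload size mismatch")
    return dtype, tuple(shape), data


frame = encode_frame("float32", (2, 3), bytes(24))
decoded = decode_frame(frame)
```

A format like this does not replace network-level controls or authentication; it only removes code execution from the decoding step.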