
Description

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs. This issue has been patched in version 0.8.5.
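The following Python sketch illustrates the general pattern the description refers to. It is not the vLLM code: the function names, the token IDs, and the `lengths` mapping are hypothetical and chosen only for illustration. The first function rebuilds the output list with `+` on every step, which copies everything accumulated so far and gives O(n²) total work; the second grows the list in place and stays linear in the output size.

```python
from typing import Dict, List


def expand_quadratic(tokens: List[int], lengths: Dict[int, int]) -> List[int]:
    """Hypothetical illustration of the slow pattern (not vLLM's code)."""
    out: List[int] = []
    for tok in tokens:
        repeat = lengths.get(tok, 1)  # placeholders expand, other tokens stay single
        # `out + ...` allocates a new list and copies all of `out` each iteration,
        # so total work grows quadratically with the output length.
        out = out + [tok] * repeat
    return out


def expand_linear(tokens: List[int], lengths: Dict[int, int]) -> List[int]:
    """Same result, but appends in place (amortized linear time)."""
    out: List[int] = []
    for tok in tokens:
        repeat = lengths.get(tok, 1)
        out.extend([tok] * repeat)  # no copy of the existing contents
    return out


# Assumed example: token 99 stands in for a placeholder that expands to 3 tokens.
sample_tokens = [1, 99, 2]
sample_lengths = {99: 3}
expanded = expand_linear(sample_tokens, sample_lengths)
```

Both functions return the same list; only the cost differs. An input containing many placeholders makes the first variant copy an ever-growing list once per placeholder, which is the kind of resource exhaustion the description mentions.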

PUBLISHED Reserved 2025-04-24 | Published 2025-04-30 | Updated 2025-04-30 | Assigner GitHub_M




MEDIUM: 6.5 | CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Problem types

CWE-1333: Inefficient Regular Expression Complexity

Product status

>= 0.8.0, < 0.8.5
affected

References

github.com/...t/vllm/security/advisories/GHSA-vc6m-hm49-g9qg exploit


github.com/...6e8744f4e/vllm/model_executor/models/phi4mm.py

cve.org (CVE-2025-46560)

nvd.nist.gov (CVE-2025-46560)
