vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs. This issue has been patched in version 0.8.5.
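To illustrate the class of bug described above (this is a minimal sketch, not the patched vLLM code; the function and variable names are hypothetical), repeatedly building a new list with `+` concatenation inside a loop copies the accumulated list on every iteration, giving quadratic total work, whereas in-place `extend`/`append` is amortized linear:

```python
# Minimal sketch of the quadratic pattern; NOT the actual vLLM implementation.
# Names (expand_placeholders_*, placeholder_id, fill_id) are illustrative only.
import time


def expand_placeholders_quadratic(token_ids, placeholder_id, repeat_len, fill_id):
    """Replace each placeholder with repeat_len fill tokens via list
    concatenation. Each '+' allocates a new list and copies the old one,
    so total work grows quadratically with the output length."""
    out = []
    for tok in token_ids:
        if tok == placeholder_id:
            out = out + [fill_id] * repeat_len   # copies len(out) items each time
        else:
            out = out + [tok]                    # copies len(out) items each time
    return out


def expand_placeholders_linear(token_ids, placeholder_id, repeat_len, fill_id):
    """Same expansion using in-place extend/append, amortized linear time."""
    out = []
    for tok in token_ids:
        if tok == placeholder_id:
            out.extend([fill_id] * repeat_len)
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    # A crafted input with many placeholders makes the quadratic version blow up.
    tokens = [0, 1] * 5000 + [999] * 2000    # 999 stands in for a placeholder token
    for fn in (expand_placeholders_quadratic, expand_placeholders_linear):
        start = time.perf_counter()
        fn(tokens, placeholder_id=999, repeat_len=100, fill_id=7)
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```

Upgrading to vLLM 0.8.5, which replaces the concatenation-heavy preprocessing path, removes the quadratic blow-up for such inputs.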
Reserved 2025-04-24 | Published 2025-04-30 | Updated 2025-04-30 | Assigner GitHub_M
CWE-1333: Inefficient Regular Expression Complexity
github.com/...t/vllm/security/advisories/GHSA-vc6m-hm49-g9qg
github.com/...6e8744f4e/vllm/model_executor/models/phi4mm.py