Description

vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.5.5 up to, but not including, 0.11.1, the /v1/chat/completions and /tokenize endpoints accept a chat_template_kwargs request parameter that the code uses before it is properly validated against the chat template. With suitably crafted chat_template_kwargs values, it is possible to block request processing on the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
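As an illustration of the class of check the description refers to, the following is a minimal sketch of validating request-supplied template keyword arguments against an allowlist before they are used. It is not the vLLM patch: the function name `validate_chat_template_kwargs` and the allowlist contents are hypothetical, and a real server would presumably derive the permitted names from the chat template itself.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Hypothetical allowlist for illustration only; these names are assumptions,
# not the set of keys that vLLM accepts.
ALLOWED_TEMPLATE_KWARGS = frozenset({"enable_thinking", "add_generation_prompt"})


def validate_chat_template_kwargs(
    kwargs: Optional[Mapping[str, Any]],
    allowed: frozenset = ALLOWED_TEMPLATE_KWARGS,
) -> dict:
    """Reject unknown keys before the kwargs reach any template code."""
    if kwargs is None:
        return {}
    if not isinstance(kwargs, Mapping):
        raise ValueError("chat_template_kwargs must be a JSON object")
    # Collect every key that is not explicitly permitted.
    unknown = sorted(str(key) for key in kwargs if key not in allowed)
    if unknown:
        raise ValueError(f"unsupported chat_template_kwargs: {unknown}")
    # Return a plain copy so later code cannot mutate the request payload.
    return dict(kwargs)
```

Under these assumptions, a request carrying an unexpected key is rejected with a `ValueError` up front instead of being passed through to template handling.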

PUBLISHED Reserved 2025-10-13 | Published 2025-11-21 | Updated 2025-11-21 | Assigner GitHub_M

MEDIUM: 6.5 | CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Problem types

CWE-770: Allocation of Resources Without Limits or Throttling

Product status

affected: >= 0.5.5, < 0.11.1
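The affected range can be checked mechanically. The sketch below compares a version string against the bounds stated in this record; the helper names `parse_release` and `is_affected` are made up for this example, and the parser deliberately ignores pre-release or local suffixes, so such builds should be reviewed manually.

```python
import re

# Bounds taken from the product status in this record.
AFFECTED_MIN = (0, 5, 5)   # first affected release (inclusive)
FIXED = (0, 11, 1)         # first patched release (exclusive upper bound)


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string such as '0.10.2'."""
    match = re.match(r"^v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0; suffixes like 'rc1' are ignored.
    return (int(major), int(minor), int(patch or 0))


def is_affected(version: str) -> bool:
    """Return True if the release falls in the range >= 0.5.5, < 0.11.1."""
    return AFFECTED_MIN <= parse_release(version) < FIXED
```

For example, `is_affected("0.10.2")` evaluates to `True`, while `is_affected("0.11.1")` evaluates to `False`.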

References

github.com/...t/vllm/security/advisories/GHSA-69j4-grxj-j64p

github.com/vllm-project/vllm/pull/27205

github.com/...ommit/3ada34f9cb4d1af763fdfa3b481862a93eb6bd2b

github.com/...5ed6fe6226997fb/vllm/entrypoints/chat_utils.py

github.com/...97fb/vllm/entrypoints/openai/serving_engine.py

cve.org (CVE-2025-62426)

nvd.nist.gov (CVE-2025-62426)