Description
Ray is an AI compute engine. In versions from 2.54.0 up to, but not including, 2.55.0, Ray Data registers custom Arrow extension types (ray.data.arrow_tensor, ray.data.arrow_tensor_v2, ray.data.arrow_variable_shaped_tensor) globally in PyArrow. When PyArrow reads a Parquet file containing one of these extension types, it calls __arrow_ext_deserialize__ on the field's metadata bytes. Ray's implementation passes these bytes directly to cloudpickle.loads(), so a crafted Parquet file can achieve arbitrary code execution during schema parsing, before any row data is read. This issue has been patched in version 2.55.0.
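The risk described above comes from the general behavior of pickle-style deserialization: loading attacker-controlled bytes can invoke an arbitrary callable. The following is a minimal, harmless sketch of that behavior using the standard-library pickle module as a stand-in for cloudpickle. It is not Ray's code and not the patch from 2.55.0. The names Payload and parse_shape_metadata, and the JSON metadata layout with a "shape" key, are hypothetical and exist only for illustration.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object itself.
        # A real attack would substitute a harmful callable and arguments.
        return (eval, ("6 * 7",))


# Bytes like these could be placed in a field's extension metadata.
untrusted_bytes = pickle.dumps(Payload())

# Loading the bytes runs the embedded call; no Payload instance comes back.
result = pickle.loads(untrusted_bytes)


def parse_shape_metadata(serialized: bytes) -> tuple:
    """Hypothetical data-only parser: accepts JSON such as {"shape": [2, 3]}.

    Assumption: the metadata only needs to carry a tensor shape. JSON parsing
    cannot invoke callables, so malformed input can only raise an error.
    """
    meta = json.loads(serialized.decode("utf-8"))
    shape = meta.get("shape") if isinstance(meta, dict) else None
    if not isinstance(shape, list):
        raise ValueError("metadata must be a JSON object with a 'shape' list")
    for dim in shape:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(dim, bool) or not isinstance(dim, int) or dim < 0:
            raise ValueError("shape entries must be non-negative integers")
    return tuple(shape)
```

In this sketch, the value returned by pickle.loads is the result of the embedded call rather than a deserialized object, which is why such bytes must never come from an untrusted file. The data-only parser rejects the same bytes with a ValueError instead of executing anything.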
Problem types
CWE-94: Improper Control of Generation of Code ('Code Injection')
CWE-502: Deserialization of Untrusted Data
Product status
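The affected range stated in the description (2.54.0 up to, but not including, 2.55.0) can be checked with a small helper. This is a rough sketch under stated assumptions: the function names parse_release and is_affected are hypothetical, and only the leading major.minor.patch numbers are compared, so pre-release or local version suffixes are ignored.

```python
import re

# Range taken from the description: affected from 2.54.0, fixed in 2.55.0.
AFFECTED_MIN = (2, 54, 0)
FIXED_IN = (2, 55, 0)


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string such as '2.54.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version falls in the affected half-open range."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN
```

With Ray installed, the helper could be called as is_affected(ray.__version__); a True result would indicate that an upgrade to 2.55.0 or later is needed.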
References
github.com/...ct/ray/security/advisories/GHSA-mw35-8rx3-xf9r
github.com/ray-project/ray/pull/62056
github.com/...ommit/c02bd31ae31996805868baa446a131a8d304525f
github.com/ray-project/ray/releases/tag/ray-2.55.0