Home

Description

In the Linux kernel, the following vulnerability has been resolved: netfs: fix reference leak Commit 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") modified netfs_alloc_request() to initialize the reference counter to 2 instead of 1. The rationale was that the requet's "work" would release the second reference after completion (via netfs_{read,write}_collection_worker()). That works most of the time if all goes well. However, it leaks this additional reference if the request is released before the I/O operation has been submitted: the error code path only decrements the reference counter once and the work item will never be queued because there will never be a completion. This has caused outages of our whole server cluster today because tasks were blocked in netfs_wait_for_outstanding_io(), leading to deadlocks in Ceph (another bug that I will address soon in another patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call which failed in fscache_begin_write_operation(). The leaked netfs_io_request was never completed, leaving `netfs_inode.io_count` with a positive value forever. All of this is super-fragile code. Finding out which code paths will lead to an eventual completion and which do not is hard to see: - Some functions like netfs_create_write_req() allocate a request, but will never submit any I/O. - netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read() and then netfs_put_request(); however, netfs_unbuffered_read() can also fail early before submitting the I/O request, therefore another netfs_put_request() call must be added there. A rule of thumb is that functions that return a `netfs_io_request` do not submit I/O, and all of their callers must be checked. For my taste, the whole netfs code needs an overhaul to make reference counting easier to understand and less fragile & obscure. But to fix this bug here and now and produce a patch that is adequate for a stable backport, I tried a minimal approach that quickly frees the request object upon early failure. I decided against adding a second netfs_put_request() each time because that would cause code duplication which obscures the code further. Instead, I added the function netfs_put_failed_request() which frees such a failed request synchronously under the assumption that the reference count is exactly 2 (as initially set by netfs_alloc_request() and never touched), verified by a WARN_ON_ONCE(). It then deinitializes the request object (without going through the "cleanup_work" indirection) and frees the allocation (with RCU protection to protect against concurrent access by netfs_requests_seq_start()). All code paths that fail early have been changed to call netfs_put_failed_request() instead of netfs_put_request(). Additionally, I have added a netfs_put_request() call to netfs_unbuffered_read() as explained above because the netfs_put_failed_request() approach does not work there.

PUBLISHED Reserved 2025-04-16 | Published 2025-10-20 | Updated 2025-10-20 | Assigner Linux

Product status

Default status
unaffected

20d72b00ca814d748f5663484e5c53bb2bf37a3a (git) before 8df142e93098b4531fadb5dfcf93087649f570b3
affected

20d72b00ca814d748f5663484e5c53bb2bf37a3a (git) before 4d428dca252c858bfac691c31fa95d26cd008706
affected

1a8360c2eed3b292ed654c2ac61b09de4a80e298 (git)
affected

Default status
affected

6.16
affected

Any version before 6.16
unaffected

6.16.10 (semver)
unaffected

6.17 (original_commit_for_fix)
unaffected

References

git.kernel.org/...c/8df142e93098b4531fadb5dfcf93087649f570b3

git.kernel.org/...c/4d428dca252c858bfac691c31fa95d26cd008706

cve.org (CVE-2025-40007)

nvd.nist.gov (CVE-2025-40007)

Download JSON