Description
In the Linux kernel, the following vulnerability has been resolved: nvmet: move async event work off nvmet-wq For target nvmet_ctrl_free() flushes ctrl->async_event_work. If nvmet_ctrl_free() runs on nvmet-wq, the flush re-enters workqueue completion for the same worker:- A. Async event work queued on nvmet-wq (prior to disconnect): nvmet_execute_async_event() queue_work(nvmet_wq, &ctrl->async_event_work) nvmet_add_async_event() queue_work(nvmet_wq, &ctrl->async_event_work) B. Full pre-work chain (RDMA CM path): nvmet_rdma_cm_handler() nvmet_rdma_queue_disconnect() __nvmet_rdma_queue_disconnect() queue_work(nvmet_wq, &queue->release_work) process_one_work() lock((wq_completion)nvmet-wq) <--------- 1st nvmet_rdma_release_queue_work() C. Recursive path (same worker): nvmet_rdma_release_queue_work() nvmet_rdma_free_queue() nvmet_sq_destroy() nvmet_ctrl_put() nvmet_ctrl_free() flush_work(&ctrl->async_event_work) __flush_work() touch_wq_lockdep_map() lock((wq_completion)nvmet-wq) <--------- 2nd Lockdep splat: ============================================ WARNING: possible recursive locking detected 6.19.0-rc3nvme+ #14 Tainted: G N -------------------------------------------- kworker/u192:42/44933 is trying to acquire lock: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90 but task is already holding lock: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660 3 locks held by kworker/u192:42/44933: #0: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660 #1: ffffc9000e6cbe28 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x660 #2: ffffffff82d4db60 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530 Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma] Call Trace: __flush_work+0x268/0x530 nvmet_ctrl_free+0x140/0x310 [nvmet] nvmet_cq_put+0x74/0x90 [nvmet] nvmet_rdma_free_queue+0x23/0xe0 [nvmet_rdma] nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma] process_one_work+0x206/0x660 worker_thread+0x184/0x320 kthread+0x10c/0x240 ret_from_fork+0x319/0x390 Move async event work to a dedicated nvmet-aen-wq to avoid reentrant flush on nvmet-wq.
Product status
8832cf922151e9dfa2821736beb0ae2dd3968b6e (git) before 49c7c50ee6325a084216e94395e067ecde8088fa
8832cf922151e9dfa2821736beb0ae2dd3968b6e (git) before ca111c9d8d6c9d5735878d933a1716c4be86c2d1
8832cf922151e9dfa2821736beb0ae2dd3968b6e (git) before 25ceffc1dabec3b93f458b437aae26f4da293f87
8832cf922151e9dfa2821736beb0ae2dd3968b6e (git) before 2922e3507f6d5caa7f1d07f145e186fc6f317a4e
d44ff3b100b94e9f23b1e8dbe688eee9bb867ac9 (git)
84026f8c9357c62a9d3e4c554ffd10ccab813654 (git)
5.18
Any version before 5.18
6.12.80 (semver)
6.18.21 (semver)
6.19.11 (semver)
7.0 (original_commit_for_fix)
References
git.kernel.org/...c/49c7c50ee6325a084216e94395e067ecde8088fa
git.kernel.org/...c/ca111c9d8d6c9d5735878d933a1716c4be86c2d1
git.kernel.org/...c/25ceffc1dabec3b93f458b437aae26f4da293f87
git.kernel.org/...c/2922e3507f6d5caa7f1d07f145e186fc6f317a4e