linux

Pierre-Eric Pelloux-Prayer 487df8b698 drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb

The Mesa issue referenced below pointed out a possible deadlock:

[ 1231.611031]  Possible interrupt unsafe locking scenario:
[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&xa->xa_lock#17);
[ 1231.611038]                                local_irq_disable();
[ 1231.611039]                                lock(&fence->lock);
[ 1231.611041]                                lock(&xa->xa_lock#17);
[ 1231.611044]   <Interrupt>
[ 1231.611045]     lock(&fence->lock);
[ 1231.611047]
                *** DEADLOCK ***

In this example, CPU0 would be any function accessing job->dependencies through the xa_* functions that don't disable interrupts (eg: drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).

CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling callback so in an interrupt context. It will deadlock when trying to grab the xa_lock which is already held by CPU0.

Replacing all xa_* usage by their xa_*_irq counterparts would fix this issue, but Christian pointed out another issue: dma_fence_signal takes fence.lock and so does dma_fence_add_callback.

  dma_fence_signal() // locks f1.lock
  -> drm_sched_entity_kill_jobs_cb()
  -> foreach dependencies
     -> dma_fence_add_callback() // locks f2.lock

This will deadlock if f1 and f2 share the same spinlock.

To fix both issues, the code iterating on dependencies and re-arming them is moved out to drm_sched_entity_kill_jobs_work().

Cc: stable@vger.kernel.org # v6.2+
Fixes: `2fdb8a8f07` ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
[phasta: commit message nits]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com

2025-11-05 12:29:52 +01:00
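
The second problem in the commit message (a signalling callback re-arming another fence that shares the same lock) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every toy_* name is hypothetical, the "lock" is a plain flag that reports a second acquisition instead of spinning, and everything runs single-threaded. It does not model the xa_lock / interrupt problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-recursive spinlock. A second acquisition
 * is reported as failure instead of hanging, so the sketch stays runnable. */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin forever here */
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

struct toy_fence;
typedef void (*toy_cb_t)(struct toy_fence *f, void *data);

struct toy_fence {
	struct toy_lock *lock;	/* may be shared between several fences */
	toy_cb_t cb;
	void *data;
};

/* Plays the role of dma_fence_add_callback(): needs the fence lock. */
static int toy_add_callback(struct toy_fence *f, toy_cb_t cb, void *data)
{
	if (!toy_trylock(f->lock))
		return -1;	/* lock already held: this is the deadlock case */
	f->cb = cb;
	f->data = data;
	toy_unlock(f->lock);
	return 0;
}

/* Plays the role of dma_fence_signal(): runs the callback with the lock held. */
static void toy_signal(struct toy_fence *f)
{
	bool ok = toy_trylock(f->lock);

	assert(ok);
	(void)ok;
	if (f->cb) {
		toy_cb_t cb = f->cb;

		f->cb = NULL;
		cb(f, f->data);
	}
	toy_unlock(f->lock);
}

struct toy_job {
	struct toy_fence *next_dep;	/* next dependency to wait on */
	bool work_queued;
	int rearm_result;
};

static void toy_noop_cb(struct toy_fence *f, void *data)
{
	(void)f;
	(void)data;
}

/* Problematic pattern: re-arm the next dependency inside the callback. */
static void toy_cb_inline(struct toy_fence *f, void *data)
{
	struct toy_job *job = data;

	(void)f;
	job->rearm_result = toy_add_callback(job->next_dep, toy_noop_cb, job);
}

/* Pattern described by the fix: the callback only queues work. */
static void toy_cb_deferred(struct toy_fence *f, void *data)
{
	struct toy_job *job = data;

	(void)f;
	job->work_queued = true;
}

/* Runs later, outside the signalling path, with no fence lock held. */
static void toy_run_work(struct toy_job *job)
{
	if (!job->work_queued)
		return;
	job->work_queued = false;
	job->rearm_result = toy_add_callback(job->next_dep, toy_noop_cb, job);
}

/* Returns 0 if re-arming succeeded, -1 if it hit the already-held lock. */
int toy_demo(bool deferred)
{
	struct toy_lock shared = { .held = false };
	struct toy_fence f1 = { .lock = &shared };
	struct toy_fence f2 = { .lock = &shared };	/* f1 and f2 share a lock */
	struct toy_job job = { .next_dep = &f2, .rearm_result = 1 };

	toy_add_callback(&f1, deferred ? toy_cb_deferred : toy_cb_inline, &job);
	toy_signal(&f1);
	toy_run_work(&job);
	return job.rearm_result;
}
```

Calling toy_demo(false) exercises the inline re-arm and returns -1 because the shared lock is still held by the signalling path; toy_demo(true) defers the re-arm until the lock has been dropped and returns 0. In the kernel the deferred step would run from a work item rather than a direct call.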
tests	drm/sched/tests: Remove redundant header files	2025-08-28 10:13:56 +02:00
.kunitconfig	drm/sched: Add scheduler unit testing infrastructure and some basic tests	2025-03-24 10:41:52 +01:00
Makefile	drm/sched: Add scheduler unit testing infrastructure and some basic tests	2025-03-24 10:41:52 +01:00
gpu_scheduler_trace.h	drm/doc: Document some tracepoints as uAPI	2025-05-28 16:16:18 +02:00
sched_entity.c	drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb	2025-11-05 12:29:52 +01:00
sched_fence.c	drm/scheduler: Include <linux/export.h>	2025-06-16 09:02:41 +02:00
sched_internal.h	drm/sched: Store the drm client_id in drm_sched_fence	2025-05-28 16:15:58 +02:00
sched_main.c	drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies	2025-10-16 14:26:05 +02:00