linux/include
Waiman Long 73ab05aa46 sched/core: Disable page allocation in task_tick_mm_cid()
With KASAN and PREEMPT_RT enabled, calling task_work_add() in
task_tick_mm_cid() may cause the following splat.

[   63.696416] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[   63.696416] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 610, name: modprobe
[   63.696416] preempt_count: 10001, expected: 0
[   63.696416] RCU nest depth: 1, expected: 1

This problem is caused by the following call trace.

  sched_tick() [ acquire rq->__lock ]
   -> task_tick_mm_cid()
    -> task_work_add()
     -> __kasan_record_aux_stack()
      -> kasan_save_stack()
       -> stack_depot_save_flags()
        -> alloc_pages_mpol_noprof()
         -> __alloc_pages_noprof()
          -> get_page_from_freelist()
           -> rmqueue()
            -> rmqueue_pcplist()
             -> __rmqueue_pcplist()
              -> rmqueue_bulk()
               -> rt_spin_lock()

The rq lock is a raw_spinlock_t, so we can't sleep while holding it.
With PREEMPT_RT, the lock taken in rmqueue_bulk() is a sleeping
rt_spin_lock(). IOW, we can't call alloc_pages() in
stack_depot_save_flags().

The task_tick_mm_cid() function with its task_work_add() call was
introduced by commit 223baf9d17 ("sched: Fix performance regression
introduced by mm_cid") in the v6.4 kernel.

Fortunately, there is a kasan_record_aux_stack_noalloc() variant that
calls stack_depot_save_flags() while not allowing it to allocate
new pages.  To allow task_tick_mm_cid() to use task_work without
page allocation, a new TWAF_NO_ALLOC flag is added. When the flag is
set, task_work_add() calls kasan_record_aux_stack_noalloc() instead of
kasan_record_aux_stack(). The task_tick_mm_cid() function is modified
to pass this new flag.
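
The flag handling described above can be pictured with a small userspace
model. This is only a sketch, not the kernel code: the enum values, the
counters, and the names model_task_work_add(), record_aux_stack() and
record_aux_stack_noalloc() are assumptions made for illustration.

```c
/* Hypothetical userspace model; the names mirror the commit text, but the
 * numeric values below are assumed and not taken from the kernel headers. */
enum notify_mode {
    TWA_NONE = 0,
    TWA_RESUME = 1,
    TWA_FLAGS = 0xff00,      /* assumed: upper bits carry modifier flags */
    TWAF_NO_ALLOC = 0x0100   /* assumed bit value */
};

static int alloc_records;    /* calls to the variant that may allocate */
static int noalloc_records;  /* calls to the variant that never allocates */

/* Stand-ins for kasan_record_aux_stack() and its _noalloc variant. */
static void record_aux_stack(void) { alloc_records++; }
static void record_aux_stack_noalloc(void) { noalloc_records++; }

/* Split the flag bits from the notify mode, then pick the recording
 * variant. Returns the notify mode with the flag bits stripped. */
static int model_task_work_add(int notify)
{
    int flags = notify & TWA_FLAGS;

    notify &= ~TWA_FLAGS;
    if (flags & TWAF_NO_ALLOC)
        record_aux_stack_noalloc();
    else
        record_aux_stack();
    return notify;
}
```

In this model, a caller that holds a raw spinlock would pass
TWA_RESUME | TWAF_NO_ALLOC, while all other callers keep passing the plain
notify mode and get the allocating variant.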

The possible downside is a missing stack trace in a KASAN report when
recording it would have required a new page allocation during a
task_work_add() call with TWAF_NO_ALLOC set, which should be rare.
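
The trade-off can be illustrated with a toy model of a stack store that
only grows when allocation is permitted. This is a simplified sketch and
not how the stack depot is implemented: struct depot_model, depot_save(),
and POOL_SLOTS are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define POOL_SLOTS 2   /* hypothetical: slots gained per newly allocated pool */

struct depot_model {
    size_t free_slots;    /* slots left in the current pool */
    int pools_allocated;  /* how many times the model had to grow */
};

/* Returns 1 when the trace was stored, 0 when it had to be dropped. */
static unsigned int depot_save(struct depot_model *d, int can_alloc)
{
    if (d->free_slots == 0) {
        if (!can_alloc)
            return 0;     /* no space and no allocation allowed: trace lost */
        d->pools_allocated++;
        d->free_slots = POOL_SLOTS;
    }
    d->free_slots--;
    return 1;
}
```

As long as a free slot exists, the no-alloc path stores the trace just like
the allocating path. The trace is dropped only when the store is full at
the moment of the call, which matches the "should be rare" expectation
stated above.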

Fixes: 223baf9d17 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241010014432.194742-1-longman@redhat.com
2024-10-11 10:49:32 +02:00
..
acpi Power management updates for 6.12-rc1 2024-09-16 07:47:50 +02:00
asm-generic asm-generic updates for 6.12 2024-09-26 11:54:40 -07:00
clocksource
crypto
cxl cxl: Move mailbox related bits to the same context 2024-09-12 08:38:01 -07:00
drm drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
dt-bindings soc: convert ep93xx to devicetree 2024-09-26 12:00:25 -07:00
keys KEYS: Remove unused declarations 2024-09-20 18:28:26 +03:00
kunit The core clk framework is left largely untouched this time around except for 2024-09-23 15:01:48 -07:00
kvm
linux sched/core: Disable page allocation in task_tick_mm_cid() 2024-10-11 10:49:32 +02:00
math-emu
media media: cec: move cec_get/put_device to header 2024-09-05 20:12:15 +02:00
memory
misc
net tcp: check skb is non-NULL in tcp_rto_delta_us() 2024-09-23 11:43:09 +01:00
pcmcia
ras
rdma RDMA/nldev: Add support for RDMA monitoring 2024-09-13 08:29:14 +03:00
rv
scsi SCSI misc on 20240919 2024-09-19 11:28:51 +02:00
soc soc: driver updates for 6.12 2024-09-17 10:48:09 +02:00
sound ASoC: Updates for v6.12 2024-09-14 09:09:59 +02:00
target
trace vfs-6.12-rc2.fixes 2024-09-30 10:59:44 -07:00
uapi bitmap-for-6.12 2024-09-27 12:10:45 -07:00
ufs Many singleton patches - please see the various changelogs for details. 2024-09-21 08:20:50 -07:00
vdso random: vDSO: add a __vdso_getrandom prototype for all architectures 2024-09-13 17:28:35 +02:00
video
xen xen: sync elfnote.h from xen tree 2024-09-25 14:15:04 +02:00