Commit Graph

1429340 Commits

Author SHA1 Message Date
Linus Torvalds 9147566d80 sched_ext: Fixes for v7.0-rc6
- Fix SCX_KICK_WAIT deadlock where multiple CPUs waiting for each other in
   hardirq context form a cycle. Move the wait to a balance callback which
   can drop the rq lock and process IPIs.
 
 - Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() where the
   waker_node used cpu_to_node() while prev_cpu used
   scx_cpu_node_if_enabled(), leading to undefined behavior when per-node
   idle tracking is disabled.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCacwiiQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGVILAP44s30JBpNyJ9JhAiCoTYzxzOXqqGbotnpQckMF
 +7WoJAD/Z9dJO/Sw/AH0fX6WVJDmO0QsQvFXLXJBxWy7A5XVAA0=
 =2DW5
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix SCX_KICK_WAIT deadlock where multiple CPUs waiting for each other
   in hardirq context form a cycle. Move the wait to a balance callback
   which can drop the rq lock and process IPIs.

 - Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() where
   the waker_node used cpu_to_node() while prev_cpu used
   scx_cpu_node_if_enabled(), leading to undefined behavior when
   per-node idle tracking is disabled.

* tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
  sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
2026-03-31 14:23:12 -07:00
Linus Torvalds 0958d657b4 workqueue: Fixes for v7.0-rc6
- Fix false positive stall reports on weakly ordered architectures where
   the lockless worklist/timestamp check in the watchdog can observe stale
   values due to memory reordering. Recheck under pool->lock to confirm.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCacwiew4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGX66AQCT5ZNUV46mQdQU8TXV4qMiAW1gdwoUT+rvcQ9Q
 u2jA/AD/S1uHycAWKqA+TTL8NfNvhgyhgq1AVbYUlTRXCz3iwgU=
 =BlI5
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:

 - Fix false positive stall reports on weakly ordered architectures
   where the lockless worklist/timestamp check in the watchdog can
   observe stale values due to memory reordering.

   Recheck under pool->lock to confirm.

* tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Better describe stall check
  workqueue: Fix false positive stall reports
2026-03-31 14:20:39 -07:00
Linus Torvalds 53d85a2056 cgroup: Fixes for v7.0-rc6
- Fix cgroup rmdir racing with dying tasks. Deferred task cgroup unlink
   introduced a window where cgroup.procs is empty but the cgroup is still
   populated, causing rmdir to fail with -EBUSY and selftest failures. Make
   rmdir wait for dying tasks to fully leave and fix selftests to not depend
   on synchronous populated updates.
 
 - Fix cpuset v1 task migration failure from empty cpusets under strict
   security policies. When CPU hotplug removes the last CPU from a v1
   cpuset, tasks must be migrated to an ancestor without a
   security_task_setscheduler() check that would block the migration.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCacwibg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGXHEAP98nVEKyl7c7+sXYtwOPn8KEhdHkdpHyPZwhpS2
 1wLhaQEAm8yO49s7IgvGPWSz0s/gQdmF5/x8RAee0sJsZALvGQg=
 =bUUt
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Fix cgroup rmdir racing with dying tasks.

   Deferred task cgroup unlink introduced a window where cgroup.procs
   is empty but the cgroup is still populated, causing rmdir to fail
   with -EBUSY and selftest failures.

   Make rmdir wait for dying tasks to fully leave and fix selftests to
   not depend on synchronous populated updates.

 - Fix cpuset v1 task migration failure from empty cpusets under strict
   security policies.

   When CPU hotplug removes the last CPU from a v1 cpuset, tasks must be
   migrated to an ancestor without a security_task_setscheduler() check
   that would block the migration.

* tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Skip security check for hotplug induced v1 task migration
  cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
  cgroup: Fix cgroup_drain_dying() testing the wrong condition
  selftests/cgroup: Don't require synchronous populated update on task exit
  cgroup: Wait for dying tasks to leave on rmdir
2026-03-31 13:59:51 -07:00
Waiman Long 089f3fcd69 cgroup/cpuset: Skip security check for hotplug induced v1 task migration
When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the
cpuset hotplug handler will schedule a work function to migrate tasks
in that cpuset with no CPU to its ancestor to enable those tasks to
continue running.

If a strict security policy is in place, however, the task migration
may fail when security_task_setscheduler() call in cpuset_can_attach()
returns a -EACCES error. That will mean that those tasks will have
no CPU to run on. The system administrators will have to explicitly
intervene to either add CPUs to that cpuset or move the tasks elsewhere
if they are aware of it.

This problem was found by a reported test failure in the LTP's
cpuset_hotplug_test.sh. Fix this problem by treating this special case as
an exception to skip the setsched security check in cpuset_can_attach()
when a v1 cpuset with tasks have no CPU left.

With that patch applied, the cpuset_hotplug_test.sh test can be run
successfully without failure.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-03-31 09:14:13 -10:00
Waiman Long bbe5ab8191 cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
Centralize the check required to run security_task_setscheduler() in
the task iteration loop of cpuset_can_attach() outside of the loop as
it has no dependency on the characteristics of the tasks themselves.

There is no functional change.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-03-31 09:14:13 -10:00
Linus Torvalds dbf00d8d23 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmnLsoQACgkQnJ2qBz9k
 QNmnIAgAmJtXFZu/uJiRR6RrxmYzIdznkaZ8aTGeApQfG432goHqPhh1j+TKpe7Q
 YwrB90YP6umL16HOyKvci/jNpv7+vFb8sr93FPu8+01AnMJqfvA3IJ+hrzPy6IcX
 dqpO4oKr9gqbmf2/9Eny5mv50xcNI3+M7y+xh0MFsKoZxnEb2j4pCKQsPO0qMOZw
 Aui1fFb4nXpcWUlIJtNFRuXXR3pQf4n8ZpfywhUCe4FV5RRrwiq6Y8If0Tj8lqgx
 2fEwbkkjZkFVlOijNhZSSWKnobwg3oUbpXStZE6Kyw60vZBcQgEKWOAke175v+qf
 XMuhiSrSawJDC45SZmb8QtGe8GEV6g==
 =gYPI
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf fix from Jan Kara:
 "Fix for a race in UDF that can lead to memory corruption"

* tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix race between file type conversion and writeback
  mpage: Provide variant of mpage_writepages() with own optional folio handler
2026-03-31 10:28:08 -07:00
Linus Torvalds d0c3bcd5b8 Crypto library fixes for v7.0-rc7
Fix missing zeroization of the ChaCha state
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCacrR+xQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK9IdAQDpZU7HYigXJJxdlLKi5BWqRdMS85Qz
 JYAlahqXGLclnQD+KlXc0FFZagy/EwTYNTyXh5Adl1hvhbUvx+hvDrgLwQo=
 =0HZN
 -----END PGP SIGNATURE-----

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fix from Eric Biggers:
 "Fix missing zeroization of the ChaCha state"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: chacha: Zeroize permuted_state before it leaves scope
2026-03-30 13:40:48 -07:00
Linus Torvalds f1b24d8bdd rtla fixes for 7.0:
- Fix build failure when libbpf does not exist
 
   RTLA supports building without BPF libraries, but a recent change
   added a libbpf.h include outside of the BPF protection which caused
   build failures when libbpf was not installed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCacqphhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjglAQDZFyZlJ5x13SbmxcIkA+pSy7zrWkxt
 3hB09dkdY2q2uAEA+PMALreOSF2A1dyH8c6/yuxf3ftcUZH+/XnkQeheows=
 =f4XK
 -----END PGP SIGNATURE-----

Merge tag 'trace-rtla-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rtla build fix from Steven Rostedt:

 - Fix build failure when libbpf does not exist

   RTLA supports building without BPF libraries, but a recent change
   added a libbpf.h include outside of the BPF protection which caused
   build failures when libbpf was not installed.

* tag 'trace-rtla-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla: Fix build without libbpf header
2026-03-30 13:12:00 -07:00
Tejun Heo 090d34f0f0 selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
Add a test that creates a 3-CPU kick_wait cycle (A->B->C->A). A BPF
scheduler kicks the next CPU in the ring with SCX_KICK_WAIT on every
enqueue while userspace workers generate continuous scheduling churn via
sched_yield(). Without the preceding fix, this hangs the machine within seconds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
2026-03-30 08:37:55 -10:00
Tejun Heo 415cb193bb sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
its own kick_sync never advances. If multiple CPUs form a wait cycle, all
CPUs deadlock.

Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
force the CPU through do_pick_task_scx(), which queues a balance callback
to perform the wait. The balance callback drops the rq lock and enables
IRQs following the sched_core_balance() pattern, so the CPU can process
IPIs while waiting. The local CPU's kick_sync is advanced on entry to
do_pick_task_scx() and continuously during the wait, ensuring any CPU that
starts waiting for us sees the advancement and cannot form cyclic
dependencies.

Fixes: 90e55164da ("sched_ext: Implement SCX_KICK_WAIT")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Christian Loehle <christian.loehle@arm.com>
Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
2026-03-30 08:37:27 -10:00
Tomas Glozar 2e8b1a1d12 rtla: Fix build without libbpf header
rtla supports building without libbpf. However, BPF actions
patchset [1] adds an include of bpf/libbpf.h into timerlat_bpf.h,
which breaks build on systems that don't have libbpf headers
installed.

This is a leftover from a draft version of the patchset where
timerlat_bpf_set_action() (which takes a struct bpf_program * argument)
was defined in the header. timerlat_bpf.c already includes bpf/libbpf.h
via timerlat.skel.h when libbpf is present.

Remove the redundant include to fix build on systems without libbpf
headers.

[1] https://lore.kernel.org/linux-trace-kernel/20251126144205.331954-1-tglozar@redhat.com/T/

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://patch.msgid.link/20260330091207.16184-1-tglozar@redhat.com
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/linux-trace-kernel/20260329122202.65a8b575@robin/
Fixes: 8cd0f08ac7 ("rtla/timerlat: Support tail call from BPF program")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-30 12:44:48 -04:00
Linus Torvalds 7aaa8047ea Linux 7.0-rc6 2026-03-29 15:40:00 -07:00
Linus Torvalds d1384f70b2 vfs-7.0-rc6.fixes
Please consider pulling these changes from the signed vfs-7.0-rc6.fixes tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCacmRjQAKCRCRxhvAZXjc
 olJnAQD2iiLqih8Y8nX3ESMkkIQWUoSikrfSVw/GqmuKTmlrDgEA/z+LRgDGnI/+
 6xzkEw4UNmJ9JoJsiPSlHq18yyga/ww=
 =DxTb
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix netfs_limit_iter() hitting BUG() when an ITER_KVEC iterator
   reaches it via core dump writes to 9P filesystems. Add ITER_KVEC
   handling following the same pattern as the existing ITER_BVEC code.

 - Fix a NULL pointer dereference in the netfs unbuffered write retry
   path when the filesystem (e.g., 9P) doesn't set the prepare_write
   operation.

 - Clear I_DIRTY_TIME in sync_lazytime for filesystems implementing
  ->sync_lazytime. Without this the flag stays set and may cause
   additional unnecessary calls during inode deactivation.

 - Increase tmpfs size in mount_setattr selftests. A recent commit
   bumped the ext4 image size to 2 GB but didn't adjust the tmpfs
   backing store, so mkfs.ext4 fails with ENOSPC writing metadata.

 - Fix an invalid folio access in iomap when i_blkbits matches the folio
   size but differs from the I/O granularity. The cur_folio pointer
   would not get invalidated and iomap_read_end() would still be called
   on it despite the IO helper owning it.

 - Fix hash_name() docstring.

 - Fix read abandonment during netfs retry where the subreq variable
   used for abandonment could be uninitialized on the first pass or
   point to a deleted subrequest on later passes.

 - Don't block sync for filesystems with no data integrity guarantees.
   Add a SB_I_NO_DATA_INTEGRITY superblock flag replacing the per-inode
   AS_NO_DATA_INTEGRITY mapping flag so sync kicks off writeback but
   doesn't wait for flusher threads. This fixes a suspend-to-RAM hang on
   fuse-overlayfs where the flusher thread blocks when the fuse daemon
   is frozen.

 - Fix a lockdep splat in iomap when reads fail. iomap_read_end_io()
   invokes fserror_report() which calls igrab() taking i_lock in hardirq
   context while i_lock is normally held with interrupts enabled. Kick
   failed read handling to a workqueue.

 - Remove the redundant netfs_io_stream::front member and use
   stream->subrequests.next instead, fixing a potential issue in the
   direct write code path.

* tag 'vfs-7.0-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix the handling of stream->front by removing it
  iomap: fix lockdep complaint when reads fail
  writeback: don't block sync for filesystems with no data integrity guarantees
  netfs: Fix read abandonment during retry
  vfs: fix docstring of hash_name()
  iomap: fix invalid folio access when i_blkbits differs from I/O granularity
  selftests/mount_setattr: increase tmpfs size for idmapped mount tests
  fs: clear I_DIRTY_TIME in sync_lazytime
  netfs: Fix NULL pointer dereference in netfs_unbuffered_write() on retry
  netfs: Fix kernel BUG in netfs_limit_iter() for ITER_KVEC iterators
2026-03-29 15:24:28 -07:00
Linus Torvalds fc9eae25ec phy fixes for 7.0
Couple of driver fixes
  - Qualcomm PCS table fix for ufs phy
  - TI device node reference fix
  - Common prop kconfig fix
  - lynx CDR lock workaround for lanes disabled
  - usb disconnect function fix of k1 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmnJEFoACgkQfBQHDyUj
 g0fL0RAAuRoXzQk70/6pGl0Ap5rndQ+ommgE+b9LXYfgcT18jJHAwiUr+14En+At
 NdI+Cw8nmpTbg4WRGfE8StRQPu6fRUjsEzxG0WbyZIccpXCvp926LgFVio1/tmNH
 YnUs6oT48SWiTT7EF5seYG6YtWCBPu1dFWjfg46czoPC399Ieyp38mX48Vsl1i/G
 t9g20T7eioYDS/wKOt1fDSEGG+kWVsWttqzXJ5w42hqUbAK0FGCuw9oRNMZSmNZK
 z9z7l5cPhGb5SN+N0fq3mHYhZ7BAzKKNrRwW6cf+N6gnPd+g3L19SGW80ocJx6AR
 74xK9Sfjl/bUw5/CwWGIIbFxVPfNap4i0qxABPtONFTy0463F2TSR58Wsnru17fX
 OSdY9IUMh527d8Bgj5Pyp6I3/VD7VHaXYg2XgAu+rzwv2tCw3JHwWn5tdPF8cnFe
 efgUr1xwLQM0ZIWZzuXuozNaldD5vUQB0XvSy2UETlI8vlkCmGtgxoxyvSB/CNKw
 mkUEfyZeXg4wuvivRPXvJOOU9pIWv+MkDQ475m7nE3VR/PwRon9ABSxfL2YkR2uk
 sOKqIsPQkYaEFRi5sfa/V5g6tAvhAI/G6qUdftiFzblQWgq3bLcwstH0cFQEoeku
 Tji3hQt1ihJEZyZmp6/Y04c/WMeoRmO9Q0c4XTkJAUKdo9p2MZY=
 =HcR7
 -----END PGP SIGNATURE-----

Merge tag 'phy-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

 - Qualcomm PCS table fix for ufs phy

 - TI device node reference fix

 - Common prop kconfig fix

 - lynx CDR lock workaround for lanes disabled

 - usb disconnect function fix of k1 driver

* tag 'phy-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: qcom: qmp-ufs: Fix SM8650 PCS table for Gear 4
  phy: ti: j721e-wiz: Fix device node reference leak in wiz_get_lane_phy_types()
  phy: k1-usb: add disconnect function support
  phy: lynx-28g: skip CDR lock workaround for lanes disabled in the device tree
  phy: make PHY_COMMON_PROPS Kconfig symbol conditionally user-selectable
2026-03-29 12:48:52 -07:00
Linus Torvalds a516c618a6 dmaengine fixes for v7.0
Bunch of driver fixes for:
  - Xilinx regmap init error handling, dma_device directions, residue
    calculation, reset related timeout fixes.
  - Renesas CHCTRL updates and driver list fixes
  - DW HDMA cycle bits and MSI data programming fix
  - IDXD pile of fixes for memeory leak and FLR fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmnJEbYACgkQfBQHDyUj
 g0fSUw//ciiTGIky88hy0QdvcTpFdlGwItBbAu6FuVWJamU15di2sFUlcVS/HKjd
 e+99FbPm7Wnrimv7/726U0FR8WDfNoaUtLZVFE3tsmnVxIubXxBooRjG36ciOlMC
 aEf5mhwnUxzHlBuq0q2eqWlyb5yfrkeDI/vp9i1NVawT64xe7DjOtimHZ0CVkdyF
 uNX0vCskBdnejGc6QzquWcYjk0jOhds++8yYJl2SPWUejJO5tvAcLdz9N6GMfWcC
 WoSmMom6ft/UgpVHp0NvslUC3hUzYvQBGBZ8QGxv9o7GUBJaTEf6J2AYzlWFk3M2
 rvgEpT4+yfIpIS0xWvxPwMAv0gm3ATBY97loQCl2FX2AbAUedYdALzCI6C/v1Cja
 bw6YVAhIqmEBVoUBb7Bf8z/t+722cx+0MMZP+EPB9j7vNhzFqcfSR6ULIZBCl3Uv
 Cm8zY69SpoXdAkIQc0iR+Ey2MufCaegQDKeZgLYMMqY1U4VRWNGP/IFzhkB7yZlu
 MgWP19+PeDgnXNjfoVj8VahCsMydXhTh1tNUHVEJGbZZ5KQZcpkC4PkiWoM59ZPb
 VjTuuDCdlQdd8hKqI07Zvqua0G1lkSf578I87Mh4dnPySEMZebYYusTnPp9abFk/
 7/te+c7X/bWDHby8vE+7cAlwrcpefERT9aLrSCbt+m2wp6BjsKk=
 =MxNf
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
 "A bunch of driver fixes with idxd ones being the biggest:

   - Xilinx regmap init error handling, dma_device directions, residue
     calculation, and reset related timeout fixes

   - Renesas CHCTRL updates and driver list fixes

   - DW HDMA cycle bits and MSI data programming fix

   - IDXD pile of fixes for memeory leak and FLR fixes"

* tag 'dmaengine-fix-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (21 commits)
  dmaengine: xilinx_dma: Fix reset related timeout with two-channel AXIDMA
  dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction
  dmaengine: xilinx: xilinx_dma: Fix residue calculation for cyclic DMA
  dmaengine: xilinx: xilinx_dma: Fix dma_device directions
  dmaengine: sh: rz-dmac: Move CHCTRL updates under spinlock
  dmaengine: sh: rz-dmac: Protect the driver specific lists
  dmaengine: idxd: fix possible wrong descriptor completion in llist_abort_desc()
  dmaengine: xilinx: xdma: Fix regmap init error handling
  dmaengine: dw-edma: Fix multiple times setting of the CYCLE_STATE and CYCLE_BIT bits for HDMA.
  dmaengine: idxd: Fix leaking event log memory
  dmaengine: idxd: Fix freeing the allocated ida too late
  dmaengine: idxd: Fix memory leak when a wq is reset
  dmaengine: idxd: Fix not releasing workqueue on .release()
  dmaengine: idxd: Wait for submitted operations on .device_synchronize()
  dmaengine: idxd: Flush all pending descriptors
  dmaengine: idxd: Flush kernel workqueues on Function Level Reset
  dmaengine: idxd: Fix possible invalid memory access after FLR
  dmaengine: idxd: Fix crash when the event log is disabled
  dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config()
  dmaengine: dw-edma: fix MSI data programming for multi-IRQ case
  ...
2026-03-29 12:42:31 -07:00
Linus Torvalds 32ee88daf7 i2c-for-7.0-rc6
designware: fix resume-probe race causing NULL-deref in amdisp
 imx: fix timeout on repeated reads and extra clock at end
 MAINTAINERS: drop outdated I2C website
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmnIwYAACgkQFA3kzBSg
 KbZWMw/+P/+jIvjGm9ySmj4cOZIKydpGxdauDPcyzmWwu2bfyjplmqvtUJxUmiRI
 jf+4uod4SJeibFUm6OVk+EgrXuUWEV+qIJ+IPlBvdBbhftVTQUN1o751rOUGxj6D
 X8uqmDRLnC0ujG6Y1QvZGNkYl2RUmDBVHYgCk0kY+mYeoV7+TihVvmuhX+ksRhga
 s+fMl9M18Z4H7m7muZzRJe/phvo8GK2RvM+jXIW/wD2sJCoRc6x3OZ6VIqiHZcZS
 PVfCIFxaSokAXx95eWpjqJhRkvwEFMG09EI6TASmxtnYbZKGX8wHaoY0bctUbOPu
 SQ/vHQsnkzNxyHBX7mVFot10nA90C88MqoD/oEpVYAPv0Xg5GuaIvV20+FMg/2HT
 KeNHBb3TTOa79Bl1tWiQZ9a8TcfyaVZ/TeTeuntFNYoiJJDhQrk0nbXsRuMz1zmf
 viMD0YjkuNrlvIiEbR/Y27AC/vUTk4elRaWb+inAvZonjvNi/bhjxFhxsC+Jb163
 cP9bt9zjfEqcIAnHTKwLcFtSRmZCiSFPeJ0VFCZZxUehybhjZJflSTSJeKaiw1ni
 mJqoMt3oXo4rIuwOouxrKT4WfptLu2mstOfQPlvyGVEulGl1p+Sx3JYzthhF0gVM
 OyR/UzLg4lRFlO84qlLjNZTPoxa3G8f2EdP0dSGiPz0kXuvl3DM=
 =3asa
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

 - designware: fix resume-probe race causing NULL-deref in amdisp

 - imx: fix timeout on repeated reads and extra clock at end

 - MAINTAINERS: drop outdated I2C website

* tag 'i2c-for-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: drop outdated I2C website
  i2c: designware: amdisp: Fix resume-probe race condition issue
  i2c: imx: ensure no clock is generated after last read
  i2c: imx: fix i2c issue when reading multiple messages
2026-03-29 12:27:13 -07:00
Linus Torvalds ac354b5cb0 s390:
* Lots of small and not-so-small fixes for the newly rewritten gmap,
   mostly affecting the handling of nested guests.
 
 x86:
 
 * Fix an issue with shadow paging, which causes KVM to install an MMIO PTE
   in the shadow page tables without first zapping a non-MMIO SPTE if KVM
   didn't see the write that modified the shadowed guest PTE.  While commit
   a54aa15c6b was right about it being impossible to miss such a write
   if it was coming from the guest, it failed to account for writes to
   guest memory that are outside the scope of KVM: if userspace modifies
   the guest PTE, and then the guest hits a relevant page fault, KVM will
   get confused.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnH3j8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOWRQf7BD1dgyO9Id+Y/QQJPzZ0z/zGbNWT
 jLDTpapxSB960AybvmkOl0pgr7AJrNN+iWQ5cbod/41NKEdJn++ME++NFQlt15oH
 gZAMdVr72qklyVFOq3BZhQRskleGo35A/YYznKf+re4tdvL5fynyYTLDwVkDR4NU
 tCwHCg+B6bVSNOLjxMm5eOpDXoboGiwohFYay7IclsXibjDlKyFaj9mZPJW1E6qy
 SUp+nuseUTf8RFFscNTsW6XRPa/Y7RctPBNQuGSiw3rxFXsq+VyD6Y/AOklbdeyz
 8u+25gdKm65sdXFmLWIN1Ogec0DcKMgdNpFrgEj+9PPWyHDHikqksv/vRw==
 =/YA7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "s390:

   - Lots of small and not-so-small fixes for the newly rewritten gmap,
     mostly affecting the handling of nested guests.

  x86:

   - Fix an issue with shadow paging, which causes KVM to install an
     MMIO PTE in the shadow page tables without first zapping a non-MMIO
     SPTE if KVM didn't see the write that modified the shadowed guest
     PTE.

     While commit a54aa15c6b ("KVM: x86/mmu: Handle MMIO SPTEs
     directly in mmu_set_spte()") was right about it being impossible to
     miss such a write if it was coming from the guest, it failed to
     account for writes to guest memory that are outside the scope of
     KVM: if userspace modifies the guest PTE, and then the guest hits a
     relevant page fault, KVM will get confused"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Only WARN in direct MMUs when overwriting shadow-present SPTE
  KVM: x86/mmu: Drop/zap existing present SPTE even when creating an MMIO SPTE
  KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl
  KVM: s390: vsie: Fix guest page tables protection
  KVM: s390: vsie: Fix unshadowing while shadowing
  KVM: s390: vsie: Fix refcount overflow for shadow gmaps
  KVM: s390: vsie: Fix nested guest memory shadowing
  KVM: s390: Correctly handle guest mappings without struct page
  KVM: s390: Fix gmap_link()
  KVM: s390: vsie: Fix check for pre-existing shadow mapping
  KVM: s390: Remove non-atomic dat_crstep_xchg()
  KVM: s390: vsie: Fix dat_split_ste()
2026-03-29 11:58:47 -07:00
Linus Torvalds b8a3bc8567 xen: branch for v7.0-rc6
-----BEGIN PGP SIGNATURE-----
 
 iJEEABYKADkWIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCackzVhsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMiwyLDIACgkQgFxhu0/YY75+AAEA54Z8n7PQ4NCdd3NxtfZY
 vopaMNY5NbBzMpzZpPo/uNkA/R/zfRogLx9pLiBh4LgJIzNe3nSK7NUsKBZVXxo6
 jcEL
 =8HbQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-7.0a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "A single fix for a very rare bug introduced in rc5"

* tag 'for-linus-7.0a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/privcmd: unregister xenstore notifier on module exit
2026-03-29 11:51:37 -07:00
Linus Torvalds f242ac4a09 Miscellaneous x86 fixes:
- Fix an early boot crash in AMD SEV-SNP guests, caused by incorrect
    FSGSBASE init ordering. (Nikunj A Dadhania)
 
  - Remove X86_CR4_FRED from the CR4 pinned bits mask, to fix a race
    window during the bootup of SEV-{ES,SNP} or TDX guests, which
    can crash them if they trigger exceptions in that window.
    (Borislav Petkov)
 
  - Fix early boot failures on SEV-ES/SNP guests, due to incorrect
    early GHCB access. (Nikunj A Dadhania)
 
  - Add clarifying comment to the CRn pinning logic, to avoid
    future confusion & bugs. (Peter Zijlstra)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnIudcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jDaA/9GAF9AwwF8Gle0/AbQFjgIG2lA4G1D00q
 bfToKQeCAqBmlro7SP/4h9Kr70/xtnPCRP/BwTWEHHY/2Z5fLJOXcra5FWWPjjA1
 HZE6HHKFDXmPV2BfbdE5ezX8pSwBdGZ/vIykZ+q/q3/P7P/1lmxfFA+rEvbkykFH
 CQwAA3fVQuwQUiBY5eRN+0lwoIKRX3gnREtDflkgoqhhntoJ/hpnoytQAQX5Sb5j
 +MNpy4iIDFYLqRz3GHepcMDUB9W9e+9YvWd1qxsdgxyys5zVW9l7Z1qm3SHXd2uw
 OWAFu/ySKjlage4f8hPUbZHwtllzxM6HfyHcVoxjLnP/sa/9g9kJPhCGz1R1CoyM
 s2/2PM43dKN72IsWThUppa7JUENpaVUqDwOm8mBD07BZVZ7THKS1d4nLqxSyGIaN
 yDcEoH+6sbv0PhqC6GOg3Oi7DW66Q7LPJcf1AsrzW9f7pUBZc+K9KnqvY+U2SewH
 5OS3tzXhTLnRRkvJxLB3EFFSpefOFyEQ6WruZfrbijNVupIZpKpnqrN8xfDvF4p0
 XJ1rw03F1ltV9OO8YDfJfEjav1eJWTZcqGA8ZMnlZTzrB1+oJGw7i4n1rAILJNEF
 77wARixDNiiOsFcvF7dTG7l7HWEx1mWBkn75n+lnMvWqWd1QN4B2z85CIm5PdsnW
 0D0wd6vxUtk=
 =I3Gh
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix an early boot crash in AMD SEV-SNP guests, caused by incorrect
   FSGSBASE init ordering (Nikunj A Dadhania)

 - Remove X86_CR4_FRED from the CR4 pinned bits mask, to fix a race
   window during the bootup of SEV-{ES,SNP} or TDX guests, which can
   crash them if they trigger exceptions in that window (Borislav
   Petkov)

 - Fix early boot failures on SEV-ES/SNP guests, due to incorrect early
   GHCB access (Nikunj A Dadhania)

 - Add clarifying comment to the CRn pinning logic, to avoid future
   confusion & bugs (Peter Zijlstra)

* tag 'x86-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add comment clarifying CRn pinning
  x86/fred: Fix early boot failures on SEV-ES/SNP guests
  x86/cpu: Remove X86_CR4_FRED from the CR4 pinned bits mask
  x86/cpu: Enable FSGSBASE early in cpu_init_exception_handling()
2026-03-29 10:04:37 -07:00
Linus Torvalds 47e3f23f0e Fix an argument order bug in the alarm timer forwarding logic,
which may cause missed expirations or incorrect overrun accounting.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnIt8IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hogw/+J8ekCjKdzZobEyGAM4fTc0jlOmPiXog3
 yCKEZHnRpTmWskMKhDIqt3TYr9ihzHSWLwgtUpcx5bXO2qyhqVlQtfmSjA0CXeQ9
 DHADtLAsZIg8qpkqur6zGd4icLX5ZoVfiwZdPX3QZMgwAmM0JsqOLOIy8TGvsB2Q
 PWnq4aiTqs3lqA00ehMDR2gSlPfgJKUNgHVyJa0/udHVU81A67vPMbu12sX+Iz25
 ITxOehF32JuEvJUxa/sipjswizYc/bt5qEywn8Xoob99xrkv5vmTQ31sehqovwq/
 xVMaWs+oRj0Ec20E1LWuawnV83b0XR7j8Z2XCVNJHHfys80Hj6Egw7HVctKQq1Tf
 X3bzQMbyaYHWn+Cc7DClexposL3xi76ZPW5VaWcsLs5DA15hGmgY7r2DrlJby7QZ
 isjqsW06NGLM1DFrTEYlwTDKm7vLukFnWX4TXokMHitkSQPcyLa24sgMPFIXY1Pe
 QoY+f9tmIZ5GmA8JA4dgORSQvQEiG8ElToNDfM2Dmk8b69eO+JwdehQfwwGhhemF
 S8VMUMylrk0WOoCxM6peuA0JUMPKsLtg5dWWxsWDOxNUUxs9ujoQBAMYXuRloPUv
 kGV8entjyBZ4Nz1dxxKM1r/J2c7a8IImbQ18AsdPFu4NaRIG6TvYkGP3MGZtMHLe
 iHW5kgvgB5c=
 =hDiV
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix an argument order bug in the alarm timer forwarding logic, which
  may cause missed expirations or incorrect overrun accounting"

* tag 'timers-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Fix argument order in alarm_timer_forward()
2026-03-29 10:02:38 -07:00
Linus Torvalds f087b0bad4 Miscellaneous futex fixes:
- Tighten up the sys_futex_requeue() ABI a bit, to disallow
    dissimilar futex flags and potential UaF access. (Peter Zijlstra)
 
  - Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
    (Hao-Yu Yang)
 
  - Clear stale exiting pointer in futex_lock_pi() retry path,
    which bug triggered a warning (and potential misbehavior)
    in stress-testing. (Davidlohr Bueso)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnItYsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h0Ow/+O4JG+XMdmTn2OLZQBFVqlt40RG+UAudW
 tLyop+xovbLXk0FbZZen233yrYLnD8Qea9+LSIi1KPOKdklT4fimgDrVhSdZVMcY
 mPJxkZ1NTRDpl3qpjgaT3yXslgpQQlzDLjAkyyC6SWQh7RHlZ9JB4UICei9sOeE8
 WgwBrCfDMJXgjG5sAvkIhQFJaNEb/5HBHJu2Nxldlkl0of+le1e0mGUdY7H0ntxx
 kJ4jCsrpCw+tzhZw7RmGe8ouEDkbt2f7EBT10pjZ1cEP22Rogn1AzgdD7pkbhcU3
 WxbBGjSfm3ziPADELf0IhVYE/s+UXKTK9LfKDt4Q5W7paTV+o4Fllr+sraBmOoxp
 Ova5RZzSdq5NCGAcmFJWg/rxEyZf+uC7GjwPx/80huzgJvBthdjOagdVKm/3BNv8
 4d4H9KX67zWXE/XQHMt3g/7KYlVD9tzcp+v+h/rs8fiwkCkElwbljLISEr1StyOv
 CDaCouu6FQE8MoVTNPPbH1DfEyIgNKHdXfkQbt/OkGVB9/dX71zFmxEXRCYAGMb1
 wZXJ0j/oZZYcapmcb/xk5YjcZEfEU9QnMHHtCdR+dDL56xbsT5cDIzrb09LF6H/9
 WiMxhx2V3RNBNUwbOKGX6vdqa2kXG+Pjp6sq9Af7VB9oy+nbWpfcVFqWbWwKAP1N
 7e/EmnoxPTs=
 =85AU
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull futex fixes from Ingo Molnar:

 - Tighten up the sys_futex_requeue() ABI a bit, to disallow dissimilar
   futex flags and potential UaF access (Peter Zijlstra)

 - Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
   (Hao-Yu Yang)

 - Clear stale exiting pointer in futex_lock_pi() retry path, which
   triggered a warning (and potential misbehavior) in stress-testing
   (Davidlohr Bueso)

* tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Clear stale exiting pointer in futex_lock_pi() retry path
  futex: Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
  futex: Require sys_futex_requeue() to have identical flags
2026-03-29 09:59:46 -07:00
Linus Torvalds 21047b17b3 Miscellaneous irqchip driver fixes:
- Fix TX completion signaling bug in the Qualcomm MPM irqchip driver
 
  - Fix probe error handling in the Renesas RZ/V2H(P) irqchip driver
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnIsbERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j41A//QgxGkveiGrcNIULJvDM8tDwi1KxbQUQT
 DyZXEeimDlbZZm09pU8hQzUV3g3MqFDUV43MURl5wD2iGt3oT29f+hB2hK5fLSv6
 GqXzG48s2u6o+bmIPI+sVhwJ4PoEfCzrnvW/RLhjtnt9FAxd8swGNbiN6owiEtWW
 /s8kY5n7Jf8gQG6hRM4cPv8V2qLOAHGxGqZ2HLu4sBLC7DcBElaaIpigXMzOouU8
 GoRX+b7UH3j/Knm/bdnmTY05bdN588U/WMaqKuPvU6Flk7w2PheA/n4jAIckDlmC
 AbelcFJnD+V/VpXxFV5cJQ3GOD1Tdcp8eWytyYGxYQxaZF5XX0fILdwQ8uDoWegE
 fzWPcqJZ4LTw02J+mtAx7IaQqoaVxQF4wmcQu2owubUJYPkH6tsNvpNinT+IpqI7
 Giz20nNyRrU7lYUx2ha4GjFwf3xldBOJZTssE8iFDx0SJR7DoaiyI1l30mxJ4PGI
 p6XtHrCFTe1yE3k3JcTwktbAnW7Znu5JB3WYP5F7HazMFGYu31duJ89Uc9i58axX
 aOCW27Zhv3oxKgYoKJGKX3Bn37Tldsa5VYN3kq2f53p0G6nbCFDspeXzqiAEd/TY
 tSFdGEKKydKeTCwqFrlHfDUE/XrKgb4sfMjpYewEJdbI/UOvb4ojSYgrChE65eDE
 0ovBNqTVYMA=
 =O4KY
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:

 - Fix TX completion signaling bug in the Qualcomm MPM irqchip driver

 - Fix probe error handling in the Renesas RZ/V2H(P) irqchip driver

* tag 'irq-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzv2h: Fix error path in rzv2h_icu_probe_common()
  irqchip/qcom-mpm: Add missing mailbox TX done acknowledgment
2026-03-29 09:53:01 -07:00
Linus Torvalds a3d97d1d3f overlayfs fixes for 7.0-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmnI69IACgkQEVvVyTe/
 1WqWmxAAn7Bbd9vh6botvvRoMwSlvPOE3oexRbpr+EMCE3Z9mulKi//KoiOk1xxW
 nzg41gXQ2ZvsCys8M/5tloo4U0YRZQthf3SyEyvl/c/wGRWvSZYdv86iV5/Qhkwf
 VaAT0Kh/V5Rffg1f1xYIVlt4TRbcbKTj9lr1bPnPGEbAY96Dav0gYMn2+29I1642
 l6CpWtM2OZCalDK/j0zG6Wxxeko84pT0iv6kxGba3roQVONzYWXycgW3caR4DJq2
 l4D81QCG1NhV984MP3UChHPoTa1gUepE6fOhyyTwXrWn8bEdkFXtHgKaC90PirP0
 OeYYK7sRx4yrsRcpBsuo83lpCP7aMIiGpU9hrbABdQ2qF9o1/hZyggU+FdKv+dri
 6SIw1w+wHpa2gDoSsze9uCsoc/jPflCzBgfo1w1jbGf3kctwmMSnWpkVBhCIwtD4
 ntoTWk2MzmLNXCDCrjb1r6SpYgnxrGDcRf6KikF89hkVmWO3wSYzscbaVQz1+R4b
 QKBF67zmYoubYPPZ503Y5n7S4L3O8kxGMCGCBx10FIvs8Lvf8dGwTBL+gOtmg1VK
 YO99lreo+BN3Q3iui1h2F4wT+XY1Iaq2ORaxi3czjblMfu/rkpcwHS2O6FlecvCQ
 Vfgve+6xGuYHMeziDkSiczA2m+8USIefA0zXzwq+2TA82hs95b4=
 =sSP0
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs fixes from Amir Goldstein:

 - Fix regression in 'xino' feature detection

   I clumsily introduced this regression myself when working on another
   subsystem (fsnotify). Both the regression and the fix have almost no
   visible impact on users except for some kmsg prints.

 - Fix to performance regression in v6.12.

   This regression was reported by Google COS developers.

   It is not uncommon these days for the year-old mature LTS to get
   adopted by distros and get exposed to many new workloads. We made a
   sub-smart move of making a behavior change in v6.12 which could
   impact performance, without making it opt-in. Fixing this mistake
   retroactively, to be picked by LTS.

* tag 'ovl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: make fsync after metadata copy-up opt-in mount option
  ovl: fix wrong detection of 32bit inode numbers
2026-03-29 09:34:50 -07:00
Linus Torvalds 241d4ca15d Update the MAINTAINERS file to add reviewers for the ext4 file system
Add a test issue an ext4 warning (not a WARN_ON) if there are still
 dirty pages attached to an evicted inode.
 
 A lot of ext4 bug fixes including:
    * Fix a number of Syzkaller issues.
    * Fix memory leaks on error paths.
    * Replace some BUG and WARN with EFSCORRUPTED reporting.
    * Fix a potential crash when disabling discard via remount followed
      by an immediate unmount.  (Found by Sashiko)
    * Fix a corner case which could lead to allocating blocks for an
      indirect-mapped inode block numbers > 2**32.
    * Fix a race when reallocating a freed inode that could result in
      a deadlock.
    * Fix a user-after-free in update_super_work when racing with umount.
    * Fix build issues when trying to build ext4's kunit tests as a module
    * Fix a bug where ext4_split_extent_zeroout() could fail to pass
      back an error from ext4_ext_dirty().
    * Avoid allocating blocks from a corrupted block group in
      ext4_mb_find_by_goal().
    * Fix a percpu_counters list corruption BUG triggered by an
      ext4 extents kunit.
    * Fix a potetial crash caused by the fast commit flush path potentially
      accessing the jinode structure before it is fully initialized.
    * Fix fsync(2) in no-journal mode to make sure the dirtied inode is
      write to storage.
    * Fix a bug when in no-journal mode, when ext4 tries to avoid using
      recently deleted inodes, if lazy itable initialization is enabled,
      can lead to an unitialized inode getting skipped and triggering
      an e2fsck complaint.
    * Fix journal credit calculation when setting an xattr when both
      the encryption and ea_inode feeatures are enabled.
    * Fix corner cases which could result in stale xarray tags after
      writeback.
    * Fix generic/475 failures caused by ENOSPC errors while creating
      a symlink when the system crashes resulting to a file system
      inconsistency when replaying the fast commit journal.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmnIo3cACgkQ8vlZVpUN
 gaNu+wf+PyZyRFar37GVcXiMWm8wiBrH7XYhcH1z+uEfjxtWuxUDtKZtF/UZ8FiW
 peRI785hiIW1xjj3efkNlrg4JFPlY+mLlyY8iUabYj6Gt2kddgy7h6gCf2ugM/V9
 1fk9Hf3+QQ38qM9PpBpBVZyhSS7WpafCcUvx622G/+WiQldWAohpfk/KrNQtSWiX
 ZFzJtJ+7jJD6Qo1Pe2Em4khsZK2fsLw1XH9GpUWIdauQV+oXnkp7+kOQIOoT+7RX
 azjAub5b6L3Bf9p/NNvlUCE9VINeqjKHVGI/gesRz9uiQd/zI3/pBQtddMUy8B3k
 buN3uii5iYcLQD8ntX5sTP35PQg4hg==
 =Ew20
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:

 - Update the MAINTAINERS file to add reviewers for the ext4 file system

 - Add a test issue an ext4 warning (not a WARN_ON) if there are still
   dirty pages attached to an evicted inode.

 - Fix a number of Syzkaller issues

 - Fix memory leaks on error paths

 - Replace some BUG and WARN with EFSCORRUPTED reporting

 - Fix a potential crash when disabling discard via remount followed by
   an immediate unmount. (Found by Sashiko)

 - Fix a corner case which could lead to allocating blocks for an
   indirect-mapped inode block numbers > 2**32

 - Fix a race when reallocating a freed inode that could result in a
   deadlock

 - Fix a user-after-free in update_super_work when racing with umount

 - Fix build issues when trying to build ext4's kunit tests as a module

 - Fix a bug where ext4_split_extent_zeroout() could fail to pass back
   an error from ext4_ext_dirty()

 - Avoid allocating blocks from a corrupted block group in
   ext4_mb_find_by_goal()

 - Fix a percpu_counters list corruption BUG triggered by an ext4
   extents kunit

 - Fix a potetial crash caused by the fast commit flush path potentially
   accessing the jinode structure before it is fully initialized

 - Fix fsync(2) in no-journal mode to make sure the dirtied inode is
   write to storage

 - Fix a bug when in no-journal mode, when ext4 tries to avoid using
   recently deleted inodes, if lazy itable initialization is enabled,
   can lead to an unitialized inode getting skipped and triggering an
   e2fsck complaint

 - Fix journal credit calculation when setting an xattr when both the
   encryption and ea_inode feeatures are enabled

 - Fix corner cases which could result in stale xarray tags after
   writeback

 - Fix generic/475 failures caused by ENOSPC errors while creating a
   symlink when the system crashes resulting to a file system
   inconsistency when replaying the fast commit journal

* tag 'ext4_for_linus-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (27 commits)
  ext4: always drain queued discard work in ext4_mb_release()
  ext4: handle wraparound when searching for blocks for indirect mapped blocks
  ext4: skip split extent recovery on corruption
  ext4: fix iloc.bh leak in ext4_fc_replay_inode() error paths
  ext4: fix deadlock on inode reallocation
  ext4: fix use-after-free in update_super_work when racing with umount
  ext4: fix the might_sleep() warnings in kvfree()
  ext4: reject mount if bigalloc with s_first_data_block != 0
  ext4: fix extents-test.c is not compiled when EXT4_KUNIT_TESTS=M
  ext4: fix mballoc-test.c is not compiled when EXT4_KUNIT_TESTS=M
  ext4: introduce EXPORT_SYMBOL_FOR_EXT4_TEST() helper
  jbd2: gracefully abort on checkpointing state corruptions
  ext4: avoid infinite loops caused by residual data
  ext4: validate p_idx bounds in ext4_ext_correct_indexes
  ext4: test if inode's all dirty pages are submitted to disk
  ext4: minor fix for ext4_split_extent_zeroout()
  ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  ext4: kunit: extents-test: lix percpu_counters list corruption
  ext4: publish jinode after initialization
  ext4: replace BUG_ON with proper error handling in ext4_read_inline_folio
  ...
2026-03-29 09:30:06 -07:00
Linus Torvalds b51ad67773 for-7.0-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmnIO1oACgkQxWXV+ddt
 WDuLDQ//eWEAvqyA7ch+aLNm0QVHqsdH3NdVFjqw/3NsZfC9A3txTpQmmWpfqIUl
 Yqb3AVIdRqwdytinn3xX17SxkJxzCWExcQ8LckoiJ2ODhLoi/vWGnaK8sUBzUDsm
 2etvLGbDmhEEpkTMtIFkbY8fhprqthGCQgRHNrn/cZ+AzwohaboinvACysQJSLlH
 Nqv0YxnAxsaGIaGD7wigjZ7a2BDoSZ/Q82+RvacrOkyqEThSOqzLmfl++fmHfxQo
 UDzVOt9RSUnZ5LU4MYjF3H5otLczqKqRuoQC66ulIkvCml07jaKxfUVJyjkQ3dzZ
 //UiylTXnC6XK/x1ayRrM73dKkXoN9mBmvH4b2FnOQJ3D3oyiCuytEF8s4nCrE+P
 Zml+OTvIvWMj7Xy2U7cTMWuQGeggZ1m4nxCRScZBEnububUzt13izODwE7C8Cx1y
 5FvfFIkiyeBrI03bWUO48H38lwWsfIKj6KnEk1SUeZIARClnZfM4NGZ7MbxjcntG
 SunXzuQAywOZCkpskUDWMASLSH6CEGG4zgHPmbzw1NSCE5ZCHFZYXG7znv4w3CTW
 TTeowG9r9cgdVBnMUTIToNWuD0jSMLmUfEPWvcwZF9ptfXdoABsPEQZTrKTLeiq7
 TVdunKWzU12ZLr2sl4iku8U4H4aNQFoAwV3XqKnG8VXmzN41+Zw=
 =I6JH
 -----END PGP SIGNATURE-----

Merge tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes. There's one that stands out in size as it fixes an
  edge case in fsync.

   - fix issue on fsync where file with zero size appears as a non-zero
     after log replay

   - in zlib compression, handle a crash when data alignment causes
     folio reference issues

   - fix possible crash with enabled tracepoints on a overlayfs mount

   - handle device stats update error

   - on zoned filesystems, fix kobject leak on sub-block groups

   - fix super block offset in an error message in validation"

* tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix lost error when running device stats on multiple devices fs
  btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()
  btrfs: zlib: handle page aligned compressed size correctly
  btrfs: fix leak of kobject name for sub-group space_info
  btrfs: fix zero size inode with non-zero size after log replay
  btrfs: fix super block offset in error message in btrfs_validate_super()
2026-03-28 15:23:03 -07:00
Linus Torvalds 0bcb517f0a 10 hotfixes. 8 are cc:stable. 9 are for MM.
There's a 3-patch series of DAMON fixes from Josh Law and SeongJae Park.
 The rest are singletons - please see the changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCacgT5AAKCRDdBJ7gKXxA
 jgYVAQDLam92lD7fD0IwdOFE36zlOcJNK4WjHSEf2y9fFFnUWQD/QPnVX+q7fR5t
 ZE8ZKhVkMa2w6A2VISdt9Msqm++5RQg=
 =oCV1
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2026-03-28-10-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "10 hotfixes.  8 are cc:stable.  9 are for MM.

  There's a 3-patch series of DAMON fixes from Josh Law and SeongJae
  Park. The rest are singletons - please see the changelogs for details"

* tag 'mm-hotfixes-stable-2026-03-28-10-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mseal: update VMA end correctly on merge
  bug: avoid format attribute warning for clang as well
  mm/pagewalk: fix race between concurrent split and refault
  mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  mm/damon/sysfs: check contexts->nr in repeat_call_fn
  mm/damon/sysfs: check contexts->nr before accessing contexts_arr[0]
  mm/damon/sysfs: fix param_ctx leak on damon_sysfs_new_test_ctx() failure
  mm/swap: fix swap cache memcg accounting
  MAINTAINERS, mailmap: update email address for Harry Yoo
  mm/huge_memory: fix folio isn't locked in softleaf_to_folio()
2026-03-28 14:19:55 -07:00
Wolfram Sang b0faf733fc MAINTAINERS: drop outdated I2C website
As stated on the website: "This wiki has been archived and the content
is no longer updated." No need to reference it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2026-03-28 20:31:33 +01:00
Linus Torvalds cbfffcca2b tracing fixes for v7.0:
- Fix potential deadlock in osnoise and hotplug
 
   The interface_lock can be called by a osnoise thread and the CPU shutdown
   logic of osnoise can wait for this thread to finish. But cpus_read_lock()
   can also be taken while holding the interface_lock. This produces a
   circular lock dependency and can cause a deadlock.
 
   Swap the ordering of cpus_read_lock() and the interface_lock to have
   interface_lock taken within the cpus_read_lock() context to prevent this
   circular dependency.
 
 - Fix freeing of event triggers in early boot up
 
   If the same trigger is added on the kernel command line, the second
   one will fail to be applied and the trigger created will be freed.
   This calls into the deferred logic and creates a kernel thread to do
   the freeing. But the command line logic is called before kernel
   threads can be created and this leads to a NULL pointer dereference.
 
   Delay freeing event triggers until late init.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCacf2GBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qooZAQDQgsnrkwkUHc2WjhAAsZunjWhqRQOp
 SIdOO95pDvQwewEAjjP6kG5UEppL8eNF9WmV8EMz9pjYo7i2/Wy59H6xnwQ=
 =6Rs0
 -----END PGP SIGNATURE-----

Merge tag 'trace-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix potential deadlock in osnoise and hotplug

   The interface_lock can be called by a osnoise thread and the CPU
   shutdown logic of osnoise can wait for this thread to finish. But
   cpus_read_lock() can also be taken while holding the interface_lock.
   This produces a circular lock dependency and can cause a deadlock.

   Swap the ordering of cpus_read_lock() and the interface_lock to have
   interface_lock taken within the cpus_read_lock() context to prevent
   this circular dependency.

 - Fix freeing of event triggers in early boot up

   If the same trigger is added on the kernel command line, the second
   one will fail to be applied and the trigger created will be freed.
   This calls into the deferred logic and creates a kernel thread to do
   the freeing. But the command line logic is called before kernel
   threads can be created and this leads to a NULL pointer dereference.

   Delay freeing event triggers until late init.

* tag 'trace-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Drain deferred trigger frees if kthread creation fails
  tracing: Fix potential deadlock in cpu hotplug with osnoise
2026-03-28 09:59:09 -07:00
Linus Torvalds e522b75c44 s390 fixes for 7.0-rc6
- Add array_index_nospec() to syscall dispatch table lookup to prevent
   limited speculative out-of-bounds access with user-controlled syscall
   number
 
 - Mark array_index_mask_nospec() __always_inline since GCC may emit an
   out-of-line call instead of the inline data dependency sequence the
   mitigation relies on
 
 - Clear r12 on kernel entry to prevent potential speculative use of user
   value in system_call, ext/io/mcck interrupt handlers
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmnHIWsACgkQjYWKoQLX
 FBgwnwgAhPc5rPBk6rG9BAnertIUC+f7NhXkepryLJctPTldaUvycx6aR+o9wizd
 7LEiur6duGAg7enWpaRa9FVthOk5tytbkGfU/MijJBoZovX5mZX7U0Ky4WcN+D7B
 nFo+CfhWt+jNC6DVZzqhQVrdCxES42olnadLbTbhq5t975lJFgCwJOCcciupawWt
 9Lx/YVHym9xlX4iE+sbc0yWGgicGn7JsPsHjfn5ci4WGgF2uhmF3FFfhVSzYJDZK
 b8TaodVBebzwREy0s0RKmbQAuT/R01sL16yAayHCL6smerAbNPN62oREEIDz2IS+
 fAC/4R7/+nsYQcVpGA/DxztbzK69Uw==
 =V6om
 -----END PGP SIGNATURE-----

Merge tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Add array_index_nospec() to syscall dispatch table lookup to prevent
   limited speculative out-of-bounds access with user-controlled syscall
   number

 - Mark array_index_mask_nospec() __always_inline since GCC may emit an
   out-of-line call instead of the inline data dependency sequence the
   mitigation relies on

 - Clear r12 on kernel entry to prevent potential speculative use of
   user value in system_call, ext/io/mcck interrupt handlers

* tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/entry: Scrub r12 register on kernel entry
  s390/syscalls: Add spectre boundary for syscall dispatch table
  s390/barrier: Make array_index_mask_nospec() __always_inline
2026-03-28 09:50:11 -07:00
Davidlohr Bueso 210d36d892 futex: Clear stale exiting pointer in futex_lock_pi() retry path
Fuzzying/stressing futexes triggered:

    WARNING: kernel/futex/core.c:825 at wait_for_owner_exiting+0x7a/0x80, CPU#11: futex_lock_pi_s/524

When futex_lock_pi_atomic() sees the owner is exiting, it returns -EBUSY
and stores a refcounted task pointer in 'exiting'.

After wait_for_owner_exiting() consumes that reference, the local pointer
is never reset to nil. Upon a retry, if futex_lock_pi_atomic() returns a
different error, the bogus pointer is passed to wait_for_owner_exiting().

  CPU0			     CPU1		       CPU2
  futex_lock_pi(uaddr)
  // acquires the PI futex
  exit()
    futex_cleanup_begin()
      futex_state = EXITING;
			     futex_lock_pi(uaddr)
			       futex_lock_pi_atomic()
				 attach_to_pi_owner()
				   // observes EXITING
				   *exiting = owner;  // takes ref
				   return -EBUSY
			       wait_for_owner_exiting(-EBUSY, owner)
				 put_task_struct();   // drops ref
			       // exiting still points to owner
			       goto retry;
			       futex_lock_pi_atomic()
				 lock_pi_update_atomic()
				   cmpxchg(uaddr)
					*uaddr ^= WAITERS // whatever
				   // value changed
				 return -EAGAIN;
			       wait_for_owner_exiting(-EAGAIN, exiting) // stale
				 WARN_ON_ONCE(exiting)

Fix this by resetting upon retry, essentially aligning it with requeue_pi.

Fixes: 3ef240eaff ("futex: Prevent exit livelock")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260326001759.4129680-1-dave@stgolabs.net
2026-03-28 13:54:02 +01:00
Wesley Atwell 250ab25391 tracing: Drain deferred trigger frees if kthread creation fails
Boot-time trigger registration can fail before the trigger-data cleanup
kthread exists. Deferring those frees until late init is fine, but the
post-boot fallback must still drain the deferred list if kthread
creation never succeeds.

Otherwise, boot-deferred nodes can accumulate on
trigger_data_free_list, later frees fall back to synchronously freeing
only the current object, and the older queued entries are leaked
forever.

To trigger this, add the following to the kernel command line:

  trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon

The second traceon trigger will fail and be freed. This triggers a NULL
pointer dereference and crashes the kernel.

Keep the deferred boot-time behavior, but when kthread creation fails,
drain the whole queued list synchronously. Do the same in the late-init
drain path so queued entries are not stranded there either.

Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com
Fixes: 61d445af0a ("tracing: Add bulk garbage collection of freeing event_trigger_data")
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-28 08:32:44 -04:00
Lorenzo Stoakes (Oracle) 2697dd8ae7 mm/mseal: update VMA end correctly on merge
Previously we stored the end of the current VMA in curr_end, and then upon
iterating to the next VMA updated curr_start to curr_end to advance to the
next VMA.

However, this doesn't take into account the fact that a VMA might be
updated due to a merge by vma_modify_flags(), which can result in curr_end
being stale and thus, upon setting curr_start to curr_end, ending up with
an incorrect curr_start on the next iteration.

Resolve the issue by setting curr_end to vma->vm_end unconditionally to
ensure this value remains updated should this occur.

While we're here, eliminate this entire class of bug by simply setting
const curr_[start/end] to be clamped to the input range and VMAs, which
also happens to simplify the logic.

Link: https://lkml.kernel.org/r/20260327173104.322405-1-ljs@kernel.org
Fixes: 6c2da14ae1 ("mm/mseal: rework mseal apply logic")
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reported-by: Antonius <antonius@bluedragonsec.com>
Closes: https://lore.kernel.org/linux-mm/CAK8a0jwWGj9-SgFk0yKFh7i8jMkwKm5b0ao9=kmXWjO54veX2g@mail.gmail.com/
Suggested-by: David Hildenbrand (ARM) <david@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:38 -07:00
Arnd Bergmann 2598ab9d63 bug: avoid format attribute warning for clang as well
Like gcc, clang-22 now also warns about a function that it incorrectly
identifies as a printf-style format:

lib/bug.c:190:22: error: diagnostic behavior may be improved by adding the 'format(printf, 1, 0)' attribute to the declaration of '__warn_printf' [-Werror,-Wmissing-format-attribute]
  179 | static void __warn_printf(const char *fmt, struct pt_regs *regs)
      | __attribute__((format(printf, 1, 0)))
  180 | {
  181 |         if (!fmt)
  182 |                 return;
  183 |
  184 | #ifdef HAVE_ARCH_BUG_FORMAT_ARGS
  185 |         if (regs) {
  186 |                 struct arch_va_list _args;
  187 |                 va_list *args = __warn_args(&_args, regs);
  188 |
  189 |                 if (args) {
  190 |                         vprintk(fmt, *args);
      |                                           ^

Revert the change that added a gcc-specific workaround, and instead add
the generic annotation that avoid the warning.

Link: https://lkml.kernel.org/r/20260323205534.1284284-1-arnd@kernel.org
Fixes: d36067d6ea ("bug: Hush suggest-attribute=format for __warn_printf()")
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/all/20251208141618.2805983-1-andriy.shevchenko@linux.intel.com/T/#u
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:38 -07:00
Max Boone 3b89863c3f mm/pagewalk: fix race between concurrent split and refault
The splitting of a PUD entry in walk_pud_range() can race with a
concurrent thread refaulting the PUD leaf entry causing it to try walking
a PMD range that has disappeared.

An example and reproduction of this is to try reading numa_maps of a
process while VFIO-PCI is setting up DMA (specifically the
vfio_pin_pages_remote call) on a large BAR for that process.

This will trigger a kernel BUG:
vfio-pci 0000:03:00.0: enabling device (0000 -> 0002)
BUG: unable to handle page fault for address: ffffa23980000000
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
...
RIP: 0010:walk_pgd_range+0x3b5/0x7a0
Code: 8d 43 ff 48 89 44 24 28 4d 89 ce 4d 8d a7 00 00 20 00 48 8b 4c 24
28 49 81 e4 00 00 e0 ff 49 8d 44 24 ff 48 39 c8 4c 0f 43 e3 <49> f7 06
   9f ff ff ff 75 3b 48 8b 44 24 20 48 8b 40 28 48 85 c0 74
RSP: 0018:ffffac23e1ecf808 EFLAGS: 00010287
RAX: 00007f44c01fffff RBX: 00007f4500000000 RCX: 00007f44ffffffff
RDX: 0000000000000000 RSI: 000ffffffffff000 RDI: ffffffff93378fe0
RBP: ffffac23e1ecf918 R08: 0000000000000004 R09: ffffa23980000000
R10: 0000000000000020 R11: 0000000000000004 R12: 00007f44c0200000
R13: 00007f44c0000000 R14: ffffa23980000000 R15: 00007f44c0000000
FS:  00007fe884739580(0000) GS:ffff9b7d7a9c0000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa23980000000 CR3: 000000c0650e2005 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 __walk_page_range+0x195/0x1b0
 walk_page_vma+0x62/0xc0
 show_numa_map+0x12b/0x3b0
 seq_read_iter+0x297/0x440
 seq_read+0x11d/0x140
 vfs_read+0xc2/0x340
 ksys_read+0x5f/0xe0
 do_syscall_64+0x68/0x130
 ? get_page_from_freelist+0x5c2/0x17e0
 ? mas_store_prealloc+0x17e/0x360
 ? vma_set_page_prot+0x4c/0xa0
 ? __alloc_pages_noprof+0x14e/0x2d0
 ? __mod_memcg_lruvec_state+0x8d/0x140
 ? __lruvec_stat_mod_folio+0x76/0xb0
 ? __folio_mod_stat+0x26/0x80
 ? do_anonymous_page+0x705/0x900
 ? __handle_mm_fault+0xa8d/0x1000
 ? __count_memcg_events+0x53/0xf0
 ? handle_mm_fault+0xa5/0x360
 ? do_user_addr_fault+0x342/0x640
 ? arch_exit_to_user_mode_prepare.constprop.0+0x16/0xa0
 ? irqentry_exit_to_user_mode+0x24/0x100
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fe88464f47e
Code: c0 e9 b6 fe ff ff 50 48 8d 3d be 07 0b 00 e8 69 01 02 00 66 0f 1f
84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00
   f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
RSP: 002b:00007ffe6cd9a9b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fe88464f47e
RDX: 0000000000020000 RSI: 00007fe884543000 RDI: 0000000000000003
RBP: 00007fe884543000 R08: 00007fe884542010 R09: 0000000000000000
R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
 </TASK>

Fix this by validating the PUD entry in walk_pmd_range() using a stable
snapshot (pudp_get()).  If the PUD is not present or is a leaf, retry the
walk via ACTION_AGAIN instead of descending further.  This mirrors the
retry logic in walk_pte_range(), which lets walk_pmd_range() retry if the
PTE is not being got by pte_offset_map_lock().

Link: https://lkml.kernel.org/r/20260325-pagewalk-check-pmd-refault-v2-1-707bff33bc60@akamai.com
Fixes: f9e54c3a2f ("vfio/pci: implement huge_fault support")
Co-developed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Max Boone <mboone@akamai.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:38 -07:00
David Hildenbrand (Arm) ffef67b93a mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
follow_pfnmap_start() suffers from two problems:

(1) We are not re-fetching the pmd/pud after taking the PTL

Therefore, we are not properly stabilizing what the lock actually
protects.  If there is concurrent zapping, we would indicate to the
caller that we found an entry, however, that entry might already have
been invalidated, or contain a different PFN after taking the lock.

Properly use pmdp_get() / pudp_get() after taking the lock.

(2) pmd_leaf() / pud_leaf() are not well defined on non-present entries

pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.

There is no real guarantee that pmd_leaf()/pud_leaf() returns something
reasonable on non-present entries.  Most architectures indeed either
perform a present check or make it work by smart use of flags.

However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd().  Whereby
pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not do
that.

Let's check pmd_present()/pud_present() before assuming "the is a present
PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page table
handling code that traverses user page tables does.

Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP, (1)
is likely more relevant than (2).  It is questionable how often (1) would
actually trigger, but let's CC stable to be sure.

This was found by code inspection.

Link: https://lkml.kernel.org/r/20260323-follow_pfnmap_fix-v1-1-5b0ec10872b3@kernel.org
Fixes: 6da8e9634b ("mm: new follow_pfnmap API")
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:38 -07:00
Josh Law 6557004a8b mm/damon/sysfs: check contexts->nr in repeat_call_fn
damon_sysfs_repeat_call_fn() calls damon_sysfs_upd_tuned_intervals(),
damon_sysfs_upd_schemes_stats(), and
damon_sysfs_upd_schemes_effective_quotas() without checking contexts->nr. 
If nr_contexts is set to 0 via sysfs while DAMON is running, these
functions dereference contexts_arr[0] and cause a NULL pointer
dereference.  Add the missing check.

For example, the issue can be reproduced using DAMON sysfs interface and
DAMON user-space tool (damo) [1] like below.

    $ sudo damo start --refresh_interval 1s
    $ echo 0 | sudo tee \
            /sys/kernel/mm/damon/admin/kdamonds/0/contexts/nr_contexts

Link: https://patch.msgid.link/20260320163559.178101-3-objecting@objecting.org
Link: https://lkml.kernel.org/r/20260321175427.86000-4-sj@kernel.org
Link: https://github.com/damonitor/damo [1]
Fixes: d809a7c64b ("mm/damon/sysfs: implement refresh_ms file internal work")
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[6.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:38 -07:00
Josh Law 1bfe9fb5ed mm/damon/sysfs: check contexts->nr before accessing contexts_arr[0]
Multiple sysfs command paths dereference contexts_arr[0] without first
verifying that kdamond->contexts->nr == 1.  A user can set nr_contexts to
0 via sysfs while DAMON is running, causing NULL pointer dereferences.

In more detail, the issue can be triggered by privileged users like
below.

First, start DAMON and make contexts directory empty
(kdamond->contexts->nr == 0).

    # damo start
    # cd /sys/kernel/mm/damon/admin/kdamonds/0
    # echo 0 > contexts/nr_contexts

Then, each of below commands will cause the NULL pointer dereference.

    # echo update_schemes_stats > state
    # echo update_schemes_tried_regions > state
    # echo update_schemes_tried_bytes > state
    # echo update_schemes_effective_quotas > state
    # echo update_tuned_intervals > state

Guard all commands (except OFF) at the entry point of
damon_sysfs_handle_cmd().

Link: https://lkml.kernel.org/r/20260321175427.86000-3-sj@kernel.org
Fixes: 0ac32b8aff ("mm/damon/sysfs: support DAMOS stats")
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[5.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:38 -07:00
Josh Law 7fe000eb32 mm/damon/sysfs: fix param_ctx leak on damon_sysfs_new_test_ctx() failure
Patch series "mm/damon/sysfs: fix memory leak and NULL dereference
issues", v4.

DAMON_SYSFS can leak memory under allocation failure, and do NULL pointer
dereference when a privileged user make wrong sequences of control.  Fix
those.


This patch (of 3):

When damon_sysfs_new_test_ctx() fails in damon_sysfs_commit_input(),
param_ctx is leaked because the early return skips the cleanup at the out
label.  Destroy param_ctx before returning.

Link: https://lkml.kernel.org/r/20260321175427.86000-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260321175427.86000-2-sj@kernel.org
Fixes: f0c5118ebb ("mm/damon/sysfs: catch commit test ctx alloc failure")
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[6.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:37 -07:00
Alexandre Ghiti 9e0d0ddfbc mm/swap: fix swap cache memcg accounting
The swap readahead path was recently refactored and while doing this, the
order between the charging of the folio in the memcg and the addition of
the folio in the swap cache was inverted.

Since the accounting of the folio is done while adding the folio to the
swap cache and the folio is not charged in the memcg yet, the accounting
is then done at the node level, which is wrong.

Fix this by charging the folio in the memcg before adding it to the swap cache.

Link: https://lkml.kernel.org/r/20260320050601.1833108-1-alex@ghiti.fr
Fixes: 2732acda82 ("mm, swap: use swap cache as the swap in synchronize layer")
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Acked-by: Kairui Song <kasong@tencent.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:37 -07:00
Harry Yoo (Oracle) 26d3dca201 MAINTAINERS, mailmap: update email address for Harry Yoo
Update my email address to harry@kernel.org.

Link: https://lkml.kernel.org/r/20260320125925.2259998-1-harry@kernel.org
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:37 -07:00
Jinjiang Tu 4c5e7f0fcd mm/huge_memory: fix folio isn't locked in softleaf_to_folio()
On arm64 server, we found folio that get from migration entry isn't locked
in softleaf_to_folio().  This issue triggers when mTHP splitting and
zap_nonpresent_ptes() races, and the root cause is lack of memory barrier
in softleaf_to_folio().  The race is as follows:

	CPU0                                             CPU1

deferred_split_scan()                              zap_nonpresent_ptes()
  lock folio
  split_folio()
    unmap_folio()
      change ptes to migration entries
    __split_folio_to_order()                         softleaf_to_folio()
      set flags(including PG_locked) for tail pages    folio = pfn_folio(softleaf_to_pfn(entry))
      smp_wmb()                                        VM_WARN_ON_ONCE(!folio_test_locked(folio))
      prep_compound_page() for tail pages

In __split_folio_to_order(), smp_wmb() guarantees page flags of tail pages
are visible before the tail page becomes non-compound.  smp_wmb() should
be paired with smp_rmb() in softleaf_to_folio(), which is missed.  As a
result, if zap_nonpresent_ptes() accesses migration entry that stores tail
pfn, softleaf_to_folio() may see the updated compound_head of tail page
before page->flags.

This issue will trigger VM_WARN_ON_ONCE() in pfn_swap_entry_folio()
because of the race between folio split and zap_nonpresent_ptes()
leading to a folio incorrectly undergoing modification without a folio
lock being held.

This is a BUG_ON() before commit 93976a2034 ("mm: eliminate further
swapops predicates"), which in merged in v6.19-rc1.

To fix it, add missing smp_rmb() if the softleaf entry is migration entry
in softleaf_to_folio() and softleaf_to_page().

[tujinjiang@huawei.com: update function name and comments]
  Link: https://lkml.kernel.org/r/20260321075214.3305564-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20260319012541.4158561-1-tujinjiang@huawei.com
Fixes: e9b61f1985 ("thp: reintroduce split_huge_page()")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 20:48:37 -07:00
Theodore Ts'o 9ee29d20aa ext4: always drain queued discard work in ext4_mb_release()
While reviewing recent ext4 patch[1], Sashiko raised the following
concern[2]:

> If the filesystem is initially mounted with the discard option,
> deleting files will populate sbi->s_discard_list and queue
> s_discard_work. If it is then remounted with nodiscard, the
> EXT4_MOUNT_DISCARD flag is cleared, but the pending s_discard_work is
> neither cancelled nor flushed.

[1] https://lore.kernel.org/r/20260319094545.19291-1-qiang.zhang@linux.dev/
[2] https://sashiko.dev/#/patchset/20260319094545.19291-1-qiang.zhang%40linux.dev

The concern was valid, but it had nothing to do with the patch[1].
One of the problems with Sashiko in its current (early) form is that
it will detect pre-existing issues and report it as a problem with the
patch that it is reviewing.

In practice, it would be hard to hit deliberately (unless you are a
malicious syzkaller fuzzer), since it would involve mounting the file
system with -o discard, and then deleting a large number of files,
remounting the file system with -o nodiscard, and then immediately
unmounting the file system before the queued discard work has a change
to drain on its own.

Fix it because it's a real bug, and to avoid Sashiko from raising this
concern when analyzing future patches to mballoc.c.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: 55cdd0af2b ("ext4: get discard out of jbd2 commit kthread contex")
Cc: stable@kernel.org
2026-03-27 23:39:10 -04:00
Theodore Ts'o bb81702370 ext4: handle wraparound when searching for blocks for indirect mapped blocks
Commit 4865c768b5 ("ext4: always allocate blocks only from groups
inode can use") restricts what blocks will be allocated for indirect
block based files to block numbers that fit within 32-bit block
numbers.

However, when using a review bot running on the latest Gemini LLM to
check this commit when backporting into an LTS based kernel, it raised
this concern:

   If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
   group was populated via stream allocation from s_mb_last_groups),
   then start will be >= ngroups.

   Does this allow allocating blocks beyond the 32-bit limit for
   indirect block mapped files? The commit message mentions that
   ext4_mb_scan_groups_linear() takes care to not select unsupported
   groups. However, its loop uses group = *start, and the very first
   iteration will call ext4_mb_scan_group() with this unsupported
   group because next_linear_group() is only called at the end of the
   iteration.

After reviewing the code paths involved and considering the LLM
review, I determined that this can happen when there is a file system
where some files/directories are extent-mapped and others are
indirect-block mapped.  To address this, add a safety clamp in
ext4_mb_scan_groups().

Fixes: 4865c768b5 ("ext4: always allocate blocks only from groups inode can use")
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20260326045834.1175822-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:39:02 -04:00
hongao 3ceda17325 ext4: skip split extent recovery on corruption
ext4_split_extent_at() retries after ext4_ext_insert_extent() fails by
refinding the original extent and restoring its length. That recovery is
only safe for transient resource failures such as -ENOSPC, -EDQUOT, and
-ENOMEM.

When ext4_ext_insert_extent() fails because the extent tree is already
corrupted, ext4_find_extent() can return a leaf path without p_ext.
ext4_split_extent_at() then dereferences path[depth].p_ext while trying to
fix up the original extent length, causing a NULL pointer dereference while
handling a pre-existing filesystem corruption.

Do not enter the recovery path for corruption errors, and validate p_ext
after refinding the extent before touching it. This keeps the recovery path
limited to cases it can actually repair and turns the syzbot-triggered crash
into a proper corruption report.

Fixes: 716b9c23b8 ("ext4: refactor split and convert extents")
Reported-by: syzbot+1ffa5d865557e51cb604@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1ffa5d865557e51cb604
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: hongao <hongao@uniontech.com>
Link: https://patch.msgid.link/EF77870F23FF9C90+20260324015815.35248-1-hongao@uniontech.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:38:52 -04:00
Baokun Li ec0a7500d8 ext4: fix iloc.bh leak in ext4_fc_replay_inode() error paths
During code review, Joseph found that ext4_fc_replay_inode() calls
ext4_get_fc_inode_loc() to get the inode location, which holds a
reference to iloc.bh that must be released via brelse().

However, several error paths jump to the 'out' label without
releasing iloc.bh:

 - ext4_handle_dirty_metadata() failure
 - sync_dirty_buffer() failure
 - ext4_mark_inode_used() failure
 - ext4_iget() failure

Fix this by introducing an 'out_brelse' label placed just before
the existing 'out' label to ensure iloc.bh is always released.

Additionally, make ext4_fc_replay_inode() propagate errors
properly instead of always returning 0.

Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260323060836.3452660-1-libaokun@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:38:01 -04:00
Jan Kara 0c90eed1b9 ext4: fix deadlock on inode reallocation
Currently there is a race in ext4 when reallocating freed inode
resulting in a deadlock:

Task1					Task2
ext4_evict_inode()
  handle = ext4_journal_start();
  ...
  if (IS_SYNC(inode))
    handle->h_sync = 1;
  ext4_free_inode()
					ext4_new_inode()
					  handle = ext4_journal_start()
					  finds the bit in inode bitmap
					    already clear
					  insert_inode_locked()
					    waits for inode to be
					      removed from the hash.
  ext4_journal_stop(handle)
    jbd2_journal_stop(handle)
      jbd2_log_wait_commit(journal, tid);
        - deadlocks waiting for transaction handle Task2 holds

Fix the problem by removing inode from the hash already in
ext4_clear_inode() by which time all IO for the inode is done so reuse
is already fine but we are still before possibly blocking on transaction
commit.

Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Link: https://lore.kernel.org/all/abNvb2PcrKj1FBeC@ly-workstation
Fixes: 88ec797c46 ("fs: make insert_inode_locked() wait for inode destruction")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260320090428.24899-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:37:48 -04:00
Jiayuan Chen d15e4b0a41 ext4: fix use-after-free in update_super_work when racing with umount
Commit b98535d091 ("ext4: fix bug_on in start_this_handle during umount
filesystem") moved ext4_unregister_sysfs() before flushing s_sb_upd_work
to prevent new error work from being queued via /proc/fs/ext4/xx/mb_groups
reads during unmount. However, this introduced a use-after-free because
update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
accesses the kobject's kernfs_node after it has been freed by kobject_del()
in ext4_unregister_sysfs():

  update_super_work                ext4_put_super
  -----------------                --------------
                                   ext4_unregister_sysfs(sb)
                                     kobject_del(&sbi->s_kobj)
                                       __kobject_del()
                                         sysfs_remove_dir()
                                           kobj->sd = NULL
                                         sysfs_put(sd)
                                           kernfs_put()  // RCU free
  ext4_notify_error_sysfs(sbi)
    sysfs_notify(&sbi->s_kobj)
      kn = kobj->sd              // stale pointer
      kernfs_get(kn)             // UAF on freed kernfs_node
                                   ext4_journal_destroy()
                                     flush_work(&sbi->s_sb_upd_work)

Instead of reordering the teardown sequence, fix this by making
ext4_notify_error_sysfs() detect that sysfs has already been torn down
by checking s_kobj.state_in_sysfs, and skipping the sysfs_notify() call
in that case. A dedicated mutex (s_error_notify_mutex) serializes
ext4_notify_error_sysfs() against kobject_del() in ext4_unregister_sysfs()
to prevent TOCTOU races where the kobject could be deleted between the
state_in_sysfs check and the sysfs_notify() call.

Fixes: b98535d091 ("ext4: fix bug_on in start_this_handle during umount filesystem")
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260319120336.157873-1-jiayuan.chen@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:37:39 -04:00
Zqiang 496bb99b7e ext4: fix the might_sleep() warnings in kvfree()
Use the kvfree() in the RCU read critical section can trigger
the following warnings:

EXT4-fs (vdb): unmounting filesystem cd983e5b-3c83-4f5a-a136-17b00eb9d018.

WARNING: suspicious RCU usage

./include/linux/rcupdate.h:409 Illegal context switch in RCU read-side critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1

Call Trace:
 <TASK>
 dump_stack_lvl+0xbb/0xd0
 dump_stack+0x14/0x20
 lockdep_rcu_suspicious+0x15a/0x1b0
 __might_resched+0x375/0x4d0
 ? put_object.part.0+0x2c/0x50
 __might_sleep+0x108/0x160
 vfree+0x58/0x910
 ? ext4_group_desc_free+0x27/0x270
 kvfree+0x23/0x40
 ext4_group_desc_free+0x111/0x270
 ext4_put_super+0x3c8/0xd40
 generic_shutdown_super+0x14c/0x4a0
 ? __pfx_shrinker_free+0x10/0x10
 kill_block_super+0x40/0x90
 ext4_kill_sb+0x6d/0xb0
 deactivate_locked_super+0xb4/0x180
 deactivate_super+0x7e/0xa0
 cleanup_mnt+0x296/0x3e0
 __cleanup_mnt+0x16/0x20
 task_work_run+0x157/0x250
 ? __pfx_task_work_run+0x10/0x10
 ? exit_to_user_mode_loop+0x6a/0x550
 exit_to_user_mode_loop+0x102/0x550
 do_syscall_64+0x44a/0x500
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

BUG: sleeping function called from invalid context at mm/vmalloc.c:3441
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 556, name: umount
preempt_count: 1, expected: 0
CPU: 3 UID: 0 PID: 556 Comm: umount
Call Trace:
 <TASK>
 dump_stack_lvl+0xbb/0xd0
 dump_stack+0x14/0x20
 __might_resched+0x275/0x4d0
 ? put_object.part.0+0x2c/0x50
 __might_sleep+0x108/0x160
 vfree+0x58/0x910
 ? ext4_group_desc_free+0x27/0x270
 kvfree+0x23/0x40
 ext4_group_desc_free+0x111/0x270
 ext4_put_super+0x3c8/0xd40
 generic_shutdown_super+0x14c/0x4a0
 ? __pfx_shrinker_free+0x10/0x10
 kill_block_super+0x40/0x90
 ext4_kill_sb+0x6d/0xb0
 deactivate_locked_super+0xb4/0x180
 deactivate_super+0x7e/0xa0
 cleanup_mnt+0x296/0x3e0
 __cleanup_mnt+0x16/0x20
 task_work_run+0x157/0x250
 ? __pfx_task_work_run+0x10/0x10
 ? exit_to_user_mode_loop+0x6a/0x550
 exit_to_user_mode_loop+0x102/0x550
 do_syscall_64+0x44a/0x500
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The above scenarios occur in initialization failures and teardown
paths, there are no parallel operations on the resources released
by kvfree(), this commit therefore remove rcu_read_lock/unlock() and
use rcu_access_pointer() instead of rcu_dereference() operations.

Fixes: 7c990728b9 ("ext4: fix potential race between s_flex_groups online resizing and access")
Fixes: df3da4ea5a ("ext4: fix potential race between s_group_info online resizing and access")
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260319094545.19291-1-qiang.zhang@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:36:20 -04:00
Helen Koike 3822743dc2 ext4: reject mount if bigalloc with s_first_data_block != 0
bigalloc with s_first_data_block != 0 is not supported, reject mounting
it.

Signed-off-by: Helen Koike <koike@igalia.com>
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: syzbot+b73703b873a33d8eb8f6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b73703b873a33d8eb8f6
Link: https://patch.msgid.link/20260317142325.135074-1-koike@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2026-03-27 23:36:11 -04:00
Ye Bin 9e1b14320b ext4: fix extents-test.c is not compiled when EXT4_KUNIT_TESTS=M
Now, only EXT4_KUNIT_TESTS=Y testcase will be compiled in 'extents.c'.
To solve this issue, the ext4 test code needs to be decoupled. The
'extents-test' module is compiled into 'ext4-test' module.

Fixes: cb1e0c1d1f ("ext4: kunit tests for extent splitting and conversion")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260314075258.1317579-4-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-03-27 23:36:06 -04:00