Commit Graph

47777 Commits

Author SHA1 Message Date
Breno Leitao 0f08335ade trace: tcp: Add tracepoint for tcp_sendmsg_locked()
Add a tracepoint to monitor TCP send operations, enabling detailed
visibility into TCP message transmission.

Create a new tracepoint within the tcp_sendmsg_locked function,
capturing traditional fields along with size_goal, which indicates the
optimal data size for a single TCP segment. Additionally, a reference to
the struct sock sk is passed, allowing direct access for BPF programs.
The implementation is largely based on David's patch[1] and suggestions.

Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250408-tcpsendmsg-v3-2-208b87064c28@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 18:34:05 -07:00
Jakub Kicinski cb7103298d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc2).

Conflict:

Documentation/networking/netdevices.rst
net/core/lock_debug.c
  04efcee6ef ("net: hold instance lock during NETDEV_CHANGE")
  03df156dd3 ("xdp: double protect netdev->xdp_flags with netdev->lock")

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 16:51:07 -07:00
Stanislav Fomichev 311920774c configs/debug: run and debug PREEMPT
Recent change [0] resulted in a "BUG: using __this_cpu_read() in
preemptible" splat [1]. PREEMPT kernels have additional requirements
on what can and can not run with/without preemption enabled.
Expose those constraints in the debug kernels.

0: https://lore.kernel.org/netdev/20250314120048.12569-2-justin.iurman@uliege.be/
1: https://lore.kernel.org/netdev/20250402094458.006ba2a7@kernel.org/T/#mbf72641e9d7d274daee9003ef5edf6833201f1bc

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250402172305.1775226-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 17:47:06 -07:00
Linus Torvalds bec7dcbc24 Probes fixes for v6.14:
- fprobe: Fix to remove fprobe_hlist_node when module unloading
 
   When a fprobe target module is removed, the fprobe_hlist_node
   should be removed from the fprobe's hash table to prevent reusing
   accidentally if another module is loaded at the same address.
 
 - fprobe: Fix to lock module while registering fprobe
 
   The module containing the function to be probed is locked using a
   reference counter until the fprobe registration is complete, which
   prevents use after free.
 
 - fprobe-events: Fix possible UAF on modules
 
   Basically the same as above, but in the fprobe-events layer we also
   need to take a reference on the module when we find the tracepoint
   in the module.

Merge tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - fprobe: remove fprobe_hlist_node when module unloading

   When a fprobe target module is removed, the fprobe_hlist_node should
   be removed from the fprobe's hash table to prevent reusing
   accidentally if another module is loaded at the same address.

 - fprobe: lock module while registering fprobe

   The module containing the function to be probed is locked using a
   reference counter until the fprobe registration is complete, which
   prevents use after free.

 - fprobe-events: fix possible UAF on modules

   Basically the same as above, but in the fprobe-events layer we also
   need to take a reference on the module when we find the tracepoint in
   the module.
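
The "lock module while registering" item above can be pictured with a small userspace model. This is a Python sketch, not kernel code: `Module`, `try_get`, `put`, `unload`, and `register_probe` are hypothetical names standing in for the kernel's module reference counting. It only illustrates why holding a reference for the duration of registration prevents the module from going away underneath the caller.

```python
class Module:
    """Hypothetical stand-in for a loadable module with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.live = True

    def try_get(self):
        # Taking a reference fails once the module is gone.
        if not self.live:
            return False
        self.refcount += 1
        return True

    def put(self):
        self.refcount -= 1

    def unload(self):
        # Unloading is refused while somebody still holds a reference.
        if self.refcount > 0:
            return False
        self.live = False
        return True


def register_probe(module, probes, during_registration=None):
    """Register a probe on `module`, pinning it until registration is done."""
    if not module.try_get():
        return False
    try:
        if during_registration is not None:
            during_registration()  # e.g. a concurrent unload attempt
        probes.append(module.name)
        return True
    finally:
        # Drop the reference only after registration has completed.
        module.put()
```

In this model an unload attempted in the middle of registration is refused, and registration against an already unloaded module fails up front instead of touching stale state.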

* tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fprobe: Cleanup fprobe hash when module unloading
  tracing: fprobe events: Fix possible UAF on modules
  tracing: fprobe: Fix to lock module while registering fprobe
2025-04-08 12:51:34 -07:00
Linus Torvalds e37f72b3b4 cgroup: Fixes for v6.15-rc1
- A number of cpuset remote partition related fixes and cleanups along with
   selftest updates.
 
 - A change from this merge window made cgroup_rstat_updated_list() called
   outside cgroup_rstat_lock leading to list corruptions. Fix it by
   relocating the call inside the lock.

Merge tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - A number of cpuset remote partition related fixes and cleanups along
   with selftest updates.

 - A change from this merge window made cgroup_rstat_updated_list()
   called outside cgroup_rstat_lock leading to list corruptions. Fix it
   by relocating the call inside the lock.
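
The second item above describes a familiar pattern: a helper that walks or detaches a shared list must run inside the locked region. The following Python sketch is a userspace analogy only; `rstat_lock`, `mark_updated`, `pop_updated_list`, and `flush` are hypothetical names, not the kernel's functions.

```python
import threading

rstat_lock = threading.Lock()   # stand-in for cgroup_rstat_lock
_updated = []                   # shared list of "updated" entries


def mark_updated(item):
    with rstat_lock:
        _updated.append(item)


def pop_updated_list():
    """Detach and return the pending entries. Caller must hold rstat_lock."""
    # locked() only tells us that *some* thread holds the lock, which is
    # good enough for a sketch but is not a real ownership check.
    assert rstat_lock.locked(), "caller must hold rstat_lock"
    pending = list(_updated)
    _updated.clear()
    return pending


def flush():
    # The list is detached inside the locked region, not before it.
    with rstat_lock:
        return pop_updated_list()
```

Calling `pop_updated_list()` without the lock trips the assertion, which mirrors the class of bug the fix addresses.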

* tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Fix race between newly created partition and dying one
  cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock
  selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
  selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
  selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
  cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it
  cgroup/cpuset: Code cleanup and comment update
  cgroup/cpuset: Don't allow creation of local partition over a remote one
  cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
  cgroup/cpuset: Fix error handling in remote_partition_disable()
  cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
2025-04-08 12:15:05 -07:00
Masami Hiramatsu (Google) a3dc2983ca tracing: fprobe: Cleanup fprobe hash when module unloading
Clean up the fprobe address hash table on module unloading, because the
target symbols disappear when the module is unloaded and there is no
guarantee that the same symbol will be mapped at the same address again.

Note that this at least disables the fprobe if some of its target
symbols are in the unloaded module. Unlike kprobes, fprobe does not
re-enable the probe point by itself. To do that, the caller should
take care of registering/unregistering the fprobe when loading/unloading
modules. This simplifies the fprobe state management related to module
loading/unloading.

Link: https://lore.kernel.org/all/174343534473.843280.13988101014957210732.stgit@devnote2/

Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-08 08:46:25 +09:00
Linus Torvalds dda8887894 Fix a perf events time accounting bug.
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fix from Ingo Molnar:
 "Fix a perf events time accounting bug"

* tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix child_total_time_enabled accounting bug at task exit
2025-04-06 10:48:12 -07:00
Linus Torvalds 302deb109d Miscellaneous scheduler fixes/updates:
- Fix a nonsensical Kconfig combination
  - Remove an unnecessary rseq-notification
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

 - Fix a nonsensical Kconfig combination

 - Remove an unnecessary rseq-notification

* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Eliminate useless task_work on execve
  sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-06 10:44:58 -07:00
Linus Torvalds 16cd1c2657 A set of final cleanups for the timer subsystem:
1) Convert all del_timer[_sync]() instances over to the new
      timer_delete[_sync]() API and remove the legacy wrappers.
 
      Conversion was done with coccinelle plus some manual fixups as
      coccinelle chokes on scoped_guard().
 
   2) The final cleanup of the hrtimer_init() to hrtimer_setup() conversion.
 
      This has been delayed to the end of the merge window, so that all
      patches which have been merged through other trees are in mainline and
      all new users are caught.
 
 Doing this right before rc1 ensures that new code which is merged post rc1
 is not introducing new instances of the original functionality.

Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
 "A set of final cleanups for the timer subsystem:

   - Convert all del_timer[_sync]() instances over to the new
     timer_delete[_sync]() API and remove the legacy wrappers.

     Conversion was done with coccinelle plus some manual fixups as
     coccinelle chokes on scoped_guard().

   - The final cleanup of the hrtimer_init() to hrtimer_setup()
     conversion.

     This has been delayed to the end of the merge window, so that all
     patches which have been merged through other trees are in mainline
     and all new users are caught.

  Doing this right before rc1 ensures that new code which is merged post
  rc1 is not introducing new instances of the original functionality"

* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/timers: Rename the hrtimer_init event to hrtimer_setup
  hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
  hrtimers: Rename debug_init() to debug_setup()
  hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
  hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
  hrtimers: Make callback function pointer private
  hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
  hrtimers: Switch to use __htimer_setup()
  hrtimers: Delete hrtimer_init()
  treewide: Convert new and leftover hrtimer_init() users
  treewide: Switch/rename to timer_delete[_sync]()
2025-04-06 08:35:37 -07:00
Linus Torvalds ff0c66685d A set of updates for the interrupt subsystem:
1) A treewide cleanup for the irq_domain code, which makes the naming
      consistent and gets rid of the original oddity of naming domains
      'host'.
 
      This is a trivial mechanical change and is done late to ensure that
      all instances have been caught and new code merged post rc1 won't
      reintroduce new instances.
 
   2) A trivial consistency fix in the migration code
 
      The recent introduction of irq_force_complete_move() in the core
      code, causes a problem for the nostalgia crowd who maintains ia64 out
      of tree.
 
      The code assumes that hierarchical interrupt domains are enabled and
      dereferences irq_data::parent_data unconditionally. That works in mainline
      because both architectures which enable that code have hierarchical domains
      enabled. Though it breaks the ia64 build, which enables the functionality,
      but does not have hierarchical domains.
 
      While it's not really a problem for mainline today, this
      unconditional dereference is inconsistent and trivially fixable by
      using the existing helper function irqd_get_parent_data(), which has
      the appropriate #ifdeffery in place.

Merge tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more irq updates from Thomas Gleixner:
 "A set of updates for the interrupt subsystem:

   - A treewide cleanup for the irq_domain code, which makes the naming
     consistent and gets rid of the original oddity of naming domains
     'host'.

     This is a trivial mechanical change and is done late to ensure that
     all instances have been caught and new code merged post rc1 won't
     reintroduce new instances.

   - A trivial consistency fix in the migration code

     The recent introduction of irq_force_complete_move() in the core
     code, causes a problem for the nostalgia crowd who maintains ia64
     out of tree.

     The code assumes that hierarchical interrupt domains are enabled
     and dereferences irq_data::parent_data unconditionally. That works
     in mainline because both architectures which enable that code have
     hierarchical domains enabled. Though it breaks the ia64 build,
     which enables the functionality, but does not have hierarchical
     domains.

     While it's not really a problem for mainline today, this
     unconditional dereference is inconsistent and trivially fixable by
     using the existing helper function irqd_get_parent_data(), which
     has the appropriate #ifdeffery in place"

* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
  irqdomain: Stop using 'host' for domain
  irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
  irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
2025-04-06 08:17:43 -07:00
Linus Torvalds a91c49517d A revert to fix an adjtimex() regression:
The recent change to prevent time from going backwards for the coarse time
 getters due to immediate multiplier adjustments via adjtimex() changed the
 way the timekeeping core treats that.
 
 That change results in a regression on the adjtimex() side, which is user
 space visible:
 
  1) The forwarding of the base time moves the update out of the original
     period and establishes a new one. That's changing the behaviour of the
     [PF]LL control, which user space expects to be applied periodically.
 
  2) The clearing of the accumulated NTP error due to #1, changes the
     behaviour as well.
 
 An attempt to delay the multiplier/frequency update to the next tick did
 not solve the problem, as userspace expects the multiplier or
 frequency updates to be in effect when the syscall returns.
 
 There is a different solution for the coarse time problem available, so
 revert the offending commit to restore the existing adjtimex() behaviour.

Merge tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A revert to fix an adjtimex() regression:

  The recent change to prevent time from going backwards for the coarse
  time getters due to immediate multiplier adjustments via adjtimex()
  changed the way the timekeeping core treats that.

  That change results in a regression on the adjtimex() side, which is
  user space visible:

   1) The forwarding of the base time moves the update out of the
      original period and establishes a new one. That's changing the
      behaviour of the [PF]LL control, which user space expects to be
      applied periodically.

   2) The clearing of the accumulated NTP error due to #1, changes the
      behaviour as well.

  An attempt to delay the multiplier/frequency update to the next tick
  did not solve the problem as userspace expects that the multiplier or
  frequency updates are in effect, when the syscall returns.

  There is a different solution for the coarse time problem available,
  so revert the offending commit to restore the existing adjtimex()
  behaviour"

* tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
2025-04-06 08:13:16 -07:00
Linus Torvalds f4d2ef4825 Kbuild updates for v6.15
- Improve performance in gendwarfksyms
 
  - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
 
  - Support CONFIG_HEADERS_INSTALL for ARCH=um
 
  - Use more relative paths to source files for better reproducibility
 
  - Support the loong64 Debian architecture
 
  - Add Kbuild bash completion
 
  - Introduce intermediate vmlinux.unstripped for architectures that need
    static relocations to be stripped from the final vmlinux
 
  - Fix versioning in Debian packages for -rc releases
 
  - Treat missing MODULE_DESCRIPTION() as an error
 
  - Convert Nios2 Makefiles to use the generic rule for built-in DTB
 
  - Add debuginfo support to the RPM package

Merge tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Improve performance in gendwarfksyms

 - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS

 - Support CONFIG_HEADERS_INSTALL for ARCH=um

 - Use more relative paths to source files for better reproducibility

 - Support the loong64 Debian architecture

 - Add Kbuild bash completion

 - Introduce intermediate vmlinux.unstripped for architectures that need
   static relocations to be stripped from the final vmlinux

 - Fix versioning in Debian packages for -rc releases

 - Treat missing MODULE_DESCRIPTION() as an error

 - Convert Nios2 Makefiles to use the generic rule for built-in DTB

 - Add debuginfo support to the RPM package

* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
  kbuild: rpm-pkg: build a debuginfo RPM
  kconfig: merge_config: use an empty file as initfile
  nios2: migrate to the generic rule for built-in DTB
  rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
  kbuild: pacman-pkg: hardcode module installation path
  kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
  modpost: require a MODULE_DESCRIPTION()
  kbuild: make all file references relative to source root
  x86: drop unnecessary prefix map configuration
  kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
  kbuild: Add a help message for "headers"
  kbuild: deb-pkg: remove "version" variable in mkdebian
  kbuild: deb-pkg: fix versioning for -rc releases
  Documentation/kbuild: Fix indentation in modules.rst example
  x86: Get rid of Makefile.postlink
  kbuild: Create intermediate vmlinux build with relocations preserved
  kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
  kbuild: link-vmlinux.sh: Make output file name configurable
  kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
  Revert "kheaders: Ignore silly-rename files"
  ...
2025-04-05 15:46:50 -07:00
Nam Cao 244132c4e5 tracing/timers: Rename the hrtimer_init event to hrtimer_setup
The function hrtimer_init() doesn't exist anymore. It was replaced by
hrtimer_setup().

Thus, rename the hrtimer_init trace event to hrtimer_setup to keep it
consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/cba84c3d853c5258aa3a262363a6eac08e2c7afc.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao 59c9edafc0 hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init_on_stack() to debug_setup_on_stack() as well, to keep the
names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/073cf6162779a2f5b12624677d4c49ee7eccc1ed.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao e9ef2093ad hrtimers: Rename debug_init() to debug_setup()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init() to debug_setup() as well, to keep the names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/4b730c1f79648b16a1c5413f928fdc2e138dfc43.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao fcea1ccf24 hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() as well, to
keep the names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/807694aedad9353421c4a7347629a30c5c31026f.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao 1cc24f2e76 hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
The struct hrtimer::function field can only be changed using
hrtimer_setup*() or hrtimer_update_function(), and both already null-check
'function'. Therefore, null-checking 'function' in hrtimer_start_range_ns()
is not necessary.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/4661c571ee87980c340ccc318fc1a473c0c8f6bc.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao 04257da0c9 hrtimers: Make callback function pointer private
Make the struct hrtimer::function field private, to prevent users from
changing this field in an unsafe way. hrtimer_update_function() should be
used if the callback function needs to be changed.
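
The idea of funnelling every callback change through one checked setter can be sketched in a few lines. This Python model is illustrative only; the `HrTimer` class and its methods are hypothetical and do not reproduce the kernel's struct hrtimer or the behaviour of hrtimer_update_function() beyond the null check mentioned in this series.

```python
class HrTimer:
    """Hypothetical model of a timer whose callback is not set directly."""

    def __init__(self, function):
        self._function = None
        self.update_function(function)

    def update_function(self, function):
        # Single choke point: reject a missing callback here, so later
        # code paths can rely on the callback being valid.
        if function is None or not callable(function):
            raise ValueError("timer callback must be callable")
        self._function = function

    def fire(self):
        # No None check needed: every write went through update_function().
        return self._function()
```

Because the field is only ever written through the setter, the consumer (`fire()` here) does not need its own null check, which is the same reasoning the series uses to drop the check in hrtimer_start_range_ns().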

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/7d0e6e0c5c59a64a9bea940051aac05d750bc0c2.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao 87d82cff38 hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
__hrtimer_init() is only called by __hrtimer_setup(). Simplify by merging
__hrtimer_init() into __hrtimer_setup().

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/8a0a847a35f711f66b2d05b57255aa44e7e61279.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao 50177a8b2e hrtimers: Switch to use __htimer_setup()
__hrtimer_init_sleeper() calls __hrtimer_init() and also sets up the
callback function. But there is already __hrtimer_setup() which does both
actions.

Switch to use __hrtimer_setup() to simplify the code.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/d9a45a51b6a8aa0045310d63f73753bf6b33f385.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Nam Cao 9779489a31 hrtimers: Delete hrtimer_init()
hrtimer_init() is now unused. Delete it.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/003722f60c7a2a4f8d4ed24fb741aa313b7e5136.1738746927.git.namcao@linutronix.de
2025-04-05 10:30:17 +02:00
Thomas Gleixner 8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Thomas Gleixner 324a2219ba Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
This reverts commit 757b000f7b.

Miroslav reported that the changes for handling the inconsistencies in the
coarse time getters result in a regression on the adjtimex() side.

There are two issues:

  1) The forwarding of the base time moves the update out of the original
     period and establishes a new one.

  2) The clearing of the accumulated NTP error is changing the behaviour as
     well.

Userspace expects that multiplier/frequency updates are in effect, when the
syscall returns, so delaying the update to the next tick is not solving the
problem either.

Revert the change, so that the established expectations of user space
implementations (ntpd, chronyd) are restored. The re-introduced
inconsistency of the coarse time getters will be addressed in a subsequent
fix.

Fixes: 757b000f7b ("timekeeping: Fix possible inconsistencies in _COARSE clockids")
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/Z-qsg6iDGlcIJulJ@localhost
2025-04-04 19:10:00 +02:00
Thomas Gleixner 9b305678c5 genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
Frank reported, that the common irq_force_complete_move() breaks the out of
tree build of ia64. The reason is that ia64 uses the migration code, but
does not have hierarchical interrupt domains enabled.

This went unnoticed in mainline as both x86 and RISC-V have hierarchical
domains enabled. Not that it matters for mainline, but it's still
inconsistent.

Use irqd_get_parent_data() instead of accessing the parent_data field
directly. The helper returns NULL when hierarchical domains are disabled;
otherwise it accesses the parent_data field of the domain.

No functional change.
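
Why an accessor helps here can be shown with a rough userspace analogy. The Python sketch below is not the kernel implementation: `FlatIrqData`, `HierIrqData`, and `get_parent_data` are hypothetical names that model a structure which only has a parent_data field in some build configurations, plus a helper that hides the difference from callers.

```python
class FlatIrqData:
    """Build without hierarchical domains: there is no parent_data field."""
    __slots__ = ("irq",)

    def __init__(self, irq):
        self.irq = irq


class HierIrqData:
    """Build with hierarchical domains: parent_data links to the parent."""
    __slots__ = ("irq", "parent_data")

    def __init__(self, irq, parent_data=None):
        self.irq = irq
        self.parent_data = parent_data


def get_parent_data(data):
    # Helper in the spirit of irqd_get_parent_data(): callers never touch
    # the field directly, so the same code works on both layouts.
    return getattr(data, "parent_data", None)
```

Direct access to `parent_data` fails on the flat layout, while code written against the helper behaves the same on both.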

Fixes: 751dc837da ("genirq: Introduce common irq_force_complete_move() implementation")
Reported-by: Frank Scheiner <frank.scheiner@web.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Frank Scheiner <frank.scheiner@web.de>
Link: https://lore.kernel.org/all/87h634ugig.ffs@tglx
2025-04-04 17:08:36 +02:00
Jiri Slaby (SUSE) 0a27ea384c irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.

Therefore rename irq_get_default_host() to irq_get_default_domain().

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-4-jirislaby@kernel.org
2025-04-04 16:39:10 +02:00
Jiri Slaby (SUSE) 825dfab23b irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
Naming interrupt domains "host" is confusing at best, and the irqdomain code
uses both "domain" and "host" inconsistently.

Therefore rename irq_set_default_host() to irq_set_default_domain().

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org
2025-04-04 16:39:10 +02:00
Linus Torvalds 6cb0bd94c0 Persistent buffer cleanups and simplifications for v6.15:
It was mistakenly assumed that the physical memory returned from "reserve_mem"
 had to be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel, so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this newfound knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt().
 
   Memory returned by "memmap", or by any other means where a physical
   address is given to the tracing infrastructure, still needs to be
   vmap()'d. Since this memory can never be returned to the buddy
   allocator, nor should it ever be memory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous physical memory and then passing that to
   vmap(), use vmap_page_range().
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where, if a subbuffer was bigger than PAGE_SIZE,
   only the PAGE_SIZE portion would be flushed.

Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.

  It was mistakenly assumed that the physical memory returned from
  "reserve_mem" had to be vmap()'d to get to it from a virtual address.
  But reserve_mem already maps the memory to the virtual address of the
  kernel, so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  newfound knowledge, the code can be cleaned up and simplified.

   - Enforce that the persistent memory is page aligned

     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.

   - Use phys_to_virt() to get the virtual address from reserve_mem

     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt().

     Memory returned by "memmap", or by any other means where a
     physical address is given to the tracing infrastructure, still
     needs to be vmap()'d. Since this memory can never be returned to
     the buddy allocator, nor should it ever be memory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.

   - Use vmap_page_range() for memmap virtual address mapping

     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous physical memory and then
     passing that to vmap(), use vmap_page_range().

   - Replace flush_dcache_folio() with flush_kernel_vmap_range()

     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where, if a subbuffer was bigger than
     PAGE_SIZE, only the PAGE_SIZE portion would be flushed"

* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
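
The page-alignment requirement from the first item above can be sketched as a
simple mask test. This is a user-space illustration under stated assumptions:
the 4096-byte page size, the SKETCH_PAGE_SIZE macro and the function name are
hypothetical, and checking both start and size is an assumption about what
"page aligned" covers here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/*
 * A range can only be mapped page by page if both its start address
 * and its size are multiples of the page size. OR-ing the two values
 * lets one mask test cover both.
 */
static bool sketch_range_page_aligned(uint64_t start, uint64_t size)
{
	return ((start | size) & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

A caller in the spirit of the description would print a warning and refuse to
map the buffer when this returns false.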
2025-04-03 16:09:29 -07:00
Linus Torvalds 8c7c1b5506 - The 2 patch series "mm: fixes for fallouts from mem_init() cleanup"
from Mike Rapoport fixes a couple of issues with the just-merged "arch,
   mm: reduce code duplication in mem_init()" series.
 
 - The 4 patch series "MAINTAINERS: add my isub-entries to MM part." from
   Mike Rapoport does some maintenance on MAINTAINERS.
 
 - The 6 patch series "remove tlb_remove_page_ptdesc()" from Qi Zheng
   does some cleanup work to the page mapping code.
 
 - The 7 patch series "mseal system mappings" from Jeff Xu permits
   sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors
   (arm compat-mode), sigpage (arm compat-mode).
 
 - Plus the usual shower of singleton patches.

Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

 - The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

 - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

 - The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

 - Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...
2025-04-03 11:10:00 -07:00
Linus Torvalds ea59cb7423 sched_ext: Fixes for v6.15-rc0
- Calling scx_bpf_create_dsq() with the same ID would succeed creating
   duplicate DSQs. Fix it to return -EEXIST.
 
 - scx_select_cpu_dfl() fixes and cleanups.
 
 - Synchronize tools/sched_ext with external scheduler repo. While this isn't
   a fix, there's no risk to the kernel and it's better if they stay closely
   synced.

Merge tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Calling scx_bpf_create_dsq() with the same ID would succeed creating
   duplicate DSQs. Fix it to return -EEXIST.

 - scx_select_cpu_dfl() fixes and cleanups.

 - Synchronize tools/sched_ext with external scheduler repo. While this
   isn't a fix, there's no risk to the kernel and it's better if they
   stay closely synced.

* tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  tools/sched_ext: Sync with scx repo
  sched_ext: initialize built-in idle state before ops.init()
  sched_ext: create_dsq: Return -EEXIST on duplicate request
  sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
  sched_ext: idle: Fix return code of scx_select_cpu_dfl()
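
The duplicate-ID behaviour described in the first item above can be sketched
with a tiny user-space registry. Everything here is hypothetical (the fixed
array, sketch_create_dsq(), the -ENOMEM case); it only illustrates rejecting a
second creation request for an ID that already exists.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DSQS 8	/* arbitrary capacity for the sketch */

static uint64_t sketch_dsq_ids[SKETCH_MAX_DSQS];
static size_t sketch_nr_dsqs;

/* Register a queue ID; refuse IDs that are already registered. */
static int sketch_create_dsq(uint64_t dsq_id)
{
	for (size_t i = 0; i < sketch_nr_dsqs; i++)
		if (sketch_dsq_ids[i] == dsq_id)
			return -EEXIST;	/* duplicate request */

	if (sketch_nr_dsqs == SKETCH_MAX_DSQS)
		return -ENOMEM;

	sketch_dsq_ids[sketch_nr_dsqs++] = dsq_id;
	return 0;
}
```

Returning an error rather than silently succeeding lets the caller notice that
it asked for the same ID twice.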
2025-04-03 10:03:38 -07:00
Linus Torvalds 41677970ad tracing fixes for 6.15
- Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled
 
   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is enabled.
   In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same configs
   as the function argument tracing requires. Just have the function
   argument tracing depend on PROBE_EVENTS_BTF_ARGS.
 
 - Free module_delta for persistent ring buffer instance
 
   When an instance holds the persistent ring buffer, it allocates
   a helper array to hold the deltas between where modules are loaded
   on the last boot and the current boot. This array needs to be freed
   when the instance is freed.
 
 - Add cond_resched() to loop in ftrace_graph_set_hash()
 
   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This
   loop is known to trigger soft lockup warnings and requires a
   cond_resched(). The loop in ftrace_graph_set_hash() was missing it.
 
 - Fix the event format verifier to include "%*p.." arguments
 
   To prevent events from dereferencing stale pointers that can
   happen if a trace event uses a dereference pointer to something
   that was not copied into the ring buffer and can be freed by the
   time the trace is read, a verifier is called. At boot or module
   load, the verifier scans the print format string for pointers
   that can be dereferenced and it checks the arguments to make sure
   they do not contain something that can be freed. The "%*p" was
   not handled, which would add another argument and cause the verifier
   to not only not verify this pointer, but it will look at the wrong
   argument for every pointer after that.
 
 - Fix mcount sorttable building for different endian type target
 
   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianness of the file and the host is used
   to determine if the bytes need to be swapped when calculations are
   done. A change was made to the sorting of the mcount_loc that read
   the values from the ELF file into an array and the swap happened
   on the filling of the array. But one of the calculations of the
   array still did the swap when it did not need to. This caused building
   on a little endian machine for a big endian target to not find
   the mcount function in the 'nm' table and it zeroed it out, causing
   there to be no functions available to trace.
 
 - Add goto out_unlock jump to rv_register_monitor() on error path
 
   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.
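
The endianness item in the list above comes down to swapping each value exactly
once. A minimal user-space sketch, assuming 64-bit table entries; the function
names and the need_swap parameter are hypothetical and this is not the
sorttable.c code:

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t sketch_bswap64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/*
 * Copy the table into host byte order. The swap happens here, while
 * filling the array, so later calculations on host_vals[] must use
 * the values as they are and must not swap them again.
 */
static void sketch_load_table(const uint64_t *file_vals, uint64_t *host_vals,
			      size_t n, int need_swap)
{
	for (size_t i = 0; i < n; i++)
		host_vals[i] = need_swap ? sketch_bswap64(file_vals[i])
					 : file_vals[i];
}
```

Swapping a second time in a later calculation would turn the value back into
the file's byte order, which is the kind of mismatch the fix removes.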

Merge tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled

   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is
   enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same
   configs as the function argument tracing requires. Just have the
   function argument tracing depend on PROBE_EVENTS_BTF_ARGS.

 - Free module_delta for persistent ring buffer instance

   When an instance holds the persistent ring buffer, it allocates a
   helper array to hold the deltas between where modules are loaded on
   the last boot and the current boot. This array needs to be freed when
   the instance is freed.

 - Add cond_resched() to loop in ftrace_graph_set_hash()

   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This loop is
   known to trigger soft lockup warnings and requires a cond_resched().
   The loop in ftrace_graph_set_hash() was missing it.

 - Fix the event format verifier to include "%*p.." arguments

   To prevent events from dereferencing stale pointers that can happen
   if a trace event uses a dereference pointer to something that was not
   copied into the ring buffer and can be freed by the time the trace is
   read, a verifier is called. At boot or module load, the verifier
   scans the print format string for pointers that can be dereferenced
   and it checks the arguments to make sure they do not contain
   something that can be freed. The "%*p" was not handled, which would
   add another argument and cause the verifier to not only not verify
   this pointer, but it will look at the wrong argument for every
   pointer after that.

 - Fix mcount sorttable building for different endian type target

   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianness of the file and the host is used to
   determine if the bytes need to be swapped when calculations are done.
   A change was made to the sorting of the mcount_loc that read the
   values from the ELF file into an array and the swap happened on the
   filling of the array. But one of the calculations of the array still
   did the swap when it did not need to. This caused building on a
   little endian machine for a big endian target to not find the mcount
   function in the 'nm' table and it zeroed it out, causing there to be
   no functions available to trace.

 - Add goto out_unlock jump to rv_register_monitor() on error path

   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.
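
The "%*p" item above is about argument counting: a "*" width consumes one extra
argument, so every later conversion shifts by one. The following user-space
sketch is a simplified scanner, not the kernel's verifier;
sketch_pointer_args() and its handling of flags and length modifiers are
assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Record, for each "%p" conversion in fmt, the index of the argument
 * it consumes. Returns the number of "%p" conversions recorded.
 */
static size_t sketch_pointer_args(const char *fmt, int *arg_idx, size_t max)
{
	size_t found = 0;
	int arg = 0;

	for (const char *p = fmt; *p; p++) {
		if (*p != '%')
			continue;
		p++;
		if (*p == '%')
			continue;	/* literal percent sign */

		/* Flags, width and precision; each '*' takes one argument. */
		while (*p && strchr("-+ #0123456789.*", *p)) {
			if (*p == '*')
				arg++;
			p++;
		}
		/* Length modifiers do not consume arguments. */
		while (*p && strchr("hlLzjt", *p))
			p++;
		if (!*p)
			break;

		if (*p == 'p' && found < max)
			arg_idx[found++] = arg;
		arg++;	/* the conversion itself takes one argument */
	}
	return found;
}
```

For a format such as "%d %*pbl %ps", the two pointers are arguments 2 and 3. A
scanner that ignored the "*" would look at arguments 1 and 2 instead, which is
the off-by-one effect described above.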

* tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix missing unlock on double nested monitors return path
  scripts/sorttable: Fix endianness handling in build-time mcount sort
  tracing: Verify event formats that have "%*p.."
  ftrace: Add cond_resched() to ftrace_graph_set_hash()
  tracing: Free module_delta on freeing of persistent ring buffer
  ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
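
The rv_register_monitor() fix follows the usual single-exit locking pattern. A
user-space sketch with a pthread mutex is shown here; sketch_register(), its
valid parameter and the counter are hypothetical, and only the goto out_unlock
shape corresponds to the description.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_registered;

static int sketch_register(int valid)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_lock);

	if (!valid) {
		ret = -EINVAL;
		/* A bare "return ret;" here would leave the mutex held. */
		goto out_unlock;
	}

	sketch_registered++;

out_unlock:
	pthread_mutex_unlock(&sketch_lock);
	return ret;
}
```

Because every path leaves through out_unlock, the mutex is free again after a
failed call and a later registration can still take it.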
2025-04-03 09:52:44 -07:00
Mathieu Desnoyers 169eae7711 rseq: Eliminate useless task_work on execve
Eliminate a useless task_work on execve by moving the call to
rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error
path of bprm_execve().

The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is
pointless in the success case, because rseq_execve() will clear the rseq
pointer before returning to userspace.

sched_mm_cid_after_execve() is called from both the success and error
paths of bprm_execve(). The call to rseq_set_notify_resume() is needed
on error because the mm_cid may have changed.

Also move the rseq_execve() to right after sched_mm_cid_after_execve()
in bprm_execve().

[ mingo: Merged to a recent upstream kernel, extended the changelog. ]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
2025-04-03 13:10:47 +02:00
Linus Torvalds ddd0172f18 TTY/Serial driver updates for 6.15-rc1
Here is the big set of serial and tty driver updates for 6.15-rc1.
 Included in here are the following:
   - more great tty layer cleanups from Jiri.  Someday this will be done,
     but that's not going to be any year soon...
   - kdb debug driver reverts to fix a reported issue
   - lots of .dts binding updates for different devices with serial
     devices
   - lots of tiny updates and tweaks and a few bugfixes for different
     serial drivers.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of serial and tty driver updates for 6.15-rc1.
  Included in here are the following:

   - more great tty layer cleanups from Jiri. Someday this will be done,
     but that's not going to be any year soon...

   - kdb debug driver reverts to fix a reported issue

   - lots of .dts binding updates for different devices with serial
     devices

   - lots of tiny updates and tweaks and a few bugfixes for different
     serial drivers.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  tty: serial: fsl_lpuart: Fix unused variable 'sport' build warning
  serial: stm32: do not deassert RS485 RTS GPIO prematurely
  serial: 8250: add driver for NI UARTs
  dt-bindings: serial: snps-dw-apb-uart: document RZ/N1 binding without DMA
  serial: icom: fix code format problems
  serial: sh-sci: Save and restore more registers
  tty: serial: pl011: remove incorrect of_match_ptr annotation
  dt-bindings: serial: snps-dw-apb-uart: Add support for rk3562
  tty: serial: lpuart: only disable CTS instead of overwriting the whole UARTMODIR register
  tty: caif: removed unused function debugfs_tx()
  serial: 8250_dma: terminate correct DMA in tx_dma_flush()
  tty: serial: fsl_lpuart: rename register variables more specifically
  tty: serial: fsl_lpuart: use port struct directly to simply code
  tty: serial: fsl_lpuart: Use u32 and u8 for register variables
  tty: serial: fsl_lpuart: disable transmitter before changing RS485 related registers
  tty: serial: 8250: Add Brainboxes XC devices
  dt-bindings: serial: fsl-lpuart: support i.MX94
  tty: serial: 8250: Add some more device IDs
  dt-bindings: serial: samsung: add exynos7870-uart compatible
  serial: 8250_dw: Comment possible corner cases in serial_out() implementation
  ...
2025-04-02 18:17:33 -07:00
Linus Torvalds 4b06c990c1 vfs-6.15-rc1.fixes

Merge tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Add a new maintainer for configfs

 - Fix exportfs module description

 - Place flexible array member at the end of an internal struct in the
   mount code

 - Add new maintainer for netfslib as Jeff Layton is stepping down as
   current co-maintainer

 - Fix error handling in cachefiles_get_directory()

 - Cleanup do_notify_pidfd()

 - Fix syscall number definitions in pidfd selftests

 - Fix racy usage of fs_struct->in_exec during multi-threaded exec

 - Ensure correct exit code is reported when pidfs_exit() is called from
   release_task() for a delayed thread-group leader exit

 - Fix conflicting iomap flag definitions

* tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Fix conflicting values of iomap flags
  fs: namespace: Avoid -Wflex-array-member-not-at-end warning
  MAINTAINERS: configfs: add Andreas Hindborg as maintainer
  exportfs: add module description
  exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
  netfs: add Paulo as maintainer and remove myself as Reviewer
  cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory
  exec: fix the racy usage of fs_struct->in_exec
  selftests/pidfd: fixes syscall number defines
  pidfs: cleanup the usage of do_notify_pidfd()
2025-04-02 16:05:21 -07:00
Linus Torvalds 92b71befc3 These are objtool fixes and updates by Josh Poimboeuf, centered
around the fallout from the new CONFIG_OBJTOOL_WERROR=y feature,
 which, despite its default-off nature, increased the profile/impact
 of objtool warnings:
 
  - Improve error handling and the presentation of warnings/errors.
 
  - Revert the new summary warning line that some test-bot tools
    interpreted as new regressions.
 
  - Fix a number of objtool warnings in various drivers, core kernel
    code and architecture code. About half of them are potential
    problems related to out-of-bounds accesses or potential undefined
    behavior, the other half are additional objtool annotations.
 
  - Update objtool to latest (known) compiler quirks and
    objtool bugs triggered by compiler code generation
 
  - Misc fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:
 "These are objtool fixes and updates by Josh Poimboeuf, centered around
  the fallout from the new CONFIG_OBJTOOL_WERROR=y feature, which,
  despite its default-off nature, increased the profile/impact of
  objtool warnings:

   - Improve error handling and the presentation of warnings/errors

   - Revert the new summary warning line that some test-bot tools
     interpreted as new regressions

   - Fix a number of objtool warnings in various drivers, core kernel
     code and architecture code. About half of them are potential
     problems related to out-of-bounds accesses or potential undefined
     behavior, the other half are additional objtool annotations

   - Update objtool to latest (known) compiler quirks and objtool bugs
     triggered by compiler code generation

   - Misc fixes"

* tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  objtool/loongarch: Add unwind hints in prepare_frametrace()
  rcu-tasks: Always inline rcu_irq_work_resched()
  context_tracking: Always inline ct_{nmi,irq}_{enter,exit}()
  sched/smt: Always inline sched_smt_active()
  objtool: Fix verbose disassembly if CROSS_COMPILE isn't set
  objtool: Change "warning:" to "error: " for fatal errors
  objtool: Always fail on fatal errors
  Revert "objtool: Increase per-function WARN_FUNC() rate limit"
  objtool: Append "()" to function name in "unexpected end of section" warning
  objtool: Ignore end-of-section jumps for KCOV/GCOV
  objtool: Silence more KCOV warnings, part 2
  objtool, drm/vmwgfx: Don't ignore vmw_send_msg() for ORC
  objtool: Fix STACK_FRAME_NON_STANDARD for cold subfunctions
  objtool: Fix segfault in ignore_unreachable_insn()
  objtool: Fix NULL printf() '%s' argument in builtin-check.c:save_argv()
  objtool, lkdtm: Obfuscate the do_nothing() pointer
  objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc()
  objtool, ASoC: codecs: wcd934x: Remove potential undefined behavior in wcd934x_slim_irq_handler()
  objtool, Input: cyapa - Remove undefined behavior in cyapa_update_fw_store()
  objtool, panic: Disable SMAP in __stack_chk_fail()
  ...
2025-04-02 10:30:10 -07:00
Linus Torvalds af54a3a151 more printk changes for 6.15

Merge tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull more printk updates from Petr Mladek:

 - Silence warnings about candidates for ‘gnu_printf’ format attribute

* tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsnprintf: Silence false positive GCC warning for va_format()
  vsnprintf: Drop unused const char fmt * in va_format()
  vsnprintf: Mark binary printing functions with __printf() attribute
  tracing: Mark binary printing functions with __printf() attribute
  seq_file: Mark binary printing functions with __printf() attribute
  seq_buf: Mark binary printing functions with __printf() attribute
2025-04-02 10:05:55 -07:00
Linus Torvalds da0512b2a3 RCU fixes for v6.15:
- srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT

Merge tag 'rcu-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fix from Boqun Feng:

 - srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT

* tag 'rcu-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  srcu: Make FORCE_NEED_SRCU_NMI_SAFE depend on RCU_EXPERT
2025-04-02 10:04:48 -07:00
Linus Torvalds 002dcfd057 kgdb patches for 6.15
Two cleanups this cycle. The larger of which is the removal of a private
 allocator within kdb and replacing it with regular memory allocation. The
 other adopts the simplified version of strscpy() in a couple of places in
 kdb.
 
 Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org>

Merge tag 'kgdb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "Two cleanups this cycle. The larger of which is the removal of a
  private allocator within kdb and replacing it with regular memory
  allocation. The other adopts the simplified version of strscpy() in a
  couple of places in kdb"

* tag 'kgdb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Remove optional size arguments from strscpy() calls
  kdb: remove usage of static environment buffer
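
The "simplified version of strscpy()" refers to omitting the size argument when
the destination is an array. A user-space imitation is sketched below;
sketch_strscpy() and the sketch_strscpy_array() macro are hypothetical
stand-ins written for this example, not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (size bytes), always NUL-terminating when size > 0.
 * Returns the copied length, or -E2BIG if src had to be truncated.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;

	while (len < size - 1 && src[len] != '\0')
		len++;

	memcpy(dst, src, len);
	dst[len] = '\0';
	return src[len] == '\0' ? (long)len : -E2BIG;
}

/* Two-argument form: only valid when dst is a real array, not a pointer. */
#define sketch_strscpy_array(dst, src) \
	sketch_strscpy((dst), (src), sizeof(dst))
```

Deriving the size from the destination array removes the chance of passing a
size that does not match the buffer.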
2025-04-02 09:55:51 -07:00
Steven Rostedt e4d4b8670c ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
Some architectures do not have data cache coherency between user and
kernel space. For these architectures, the cache needs to be flushed on
both the kernel and user addresses so that user space can see the updates
the kernel has made.

Instead of using flush_dcache_folio() and playing with virt_to_folio()
within the call to that function, use flush_kernel_vmap_range() which
takes the virtual address and does the work for those architectures that
need it.

This also fixes a bug where the flush of the reader page only flushed one
page. If the sub-buffer order is 1 or more, where the sub-buffer size
would be greater than a page, it would miss the rest of the sub-buffer
content, as the "reader page" is not just a page, but the size of a
sub-buffer.
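
The size mismatch behind this bug can be sketched with a little arithmetic,
assuming a sub-buffer spans PAGE_SIZE << order bytes. The 4096-byte page size
and both function names are hypothetical and used only for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Size of one sub-buffer for a given sub-buffer order. */
static size_t sketch_subbuf_size(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Bytes left unflushed if only the first page of a sub-buffer is flushed. */
static size_t sketch_unflushed_bytes(unsigned int order)
{
	return sketch_subbuf_size(order) - SKETCH_PAGE_SIZE;
}
```

With order 0 nothing is missed, which is why the problem only shows up once the
sub-buffer order is 1 or more. Flushing by virtual address and length covers
the whole sub-buffer.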

Link: https://lore.kernel.org/all/CAG48ez3w0my4Rwttbc5tEbNsme6tc0mrSN95thjXUFaJ3aQ6SA@mail.gmail.com/

Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/20250402144953.920792197@goodmis.org
Fixes: 117c39200d ("ring-buffer: Introducing ring-buffer mapping functions")
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:27 -04:00
Steven Rostedt 394f3f02de tracing: Use vmap_page_range() to map memmap ring buffer
The code to map the physical memory retrieved by memmap currently
allocates an array of pages to cover the physical memory and then calls
vmap() to map it to a virtual address. Instead of using this temporary
array of struct page descriptors, simply use vmap_page_range() that can
directly map the contiguous physical memory to a virtual address.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.754618481@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:27 -04:00
Steven Rostedt 34ea8fa084 tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
The reserve_mem kernel command line option may pass back a physical
address, but the memory is still part of the normal memory just like
using memblock_alloc() would be. This means that the physical memory
returned by the reserve_mem command line option can be converted directly
to virtual memory by simply using phys_to_virt().

When freeing the buffer there's no need to call vunmap() anymore as the
memory allocated by reserve_mem is freed by the call to
reserve_mem_release_by_name().

Because the persistent ring buffer can also be allocated via the memmap
option, which *is* different than normal memory as it cannot be added back
to the buddy system, it must be treated differently. It still needs to be
virtually mapped to have access to it. It also can not be freed nor can it
ever be memory mapped to user space.

Create a new trace_array flag called TRACE_ARRAY_FL_MEMMAP which gets set
if the buffer is created by the memmap option, and this will prevent the
buffer from being memory mapped by user space.

Also increment the ref count for memmap'ed buffers so that they can never
be freed.

Link: https://lore.kernel.org/all/Z-wFszhJ_9o4dc8O@kernel.org/

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.583750106@goodmis.org
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Steven Rostedt c44a14f216 tracing: Enforce the persistent ring buffer to be page aligned
Enforce that the address and the size of the memory used by the persistent
ring buffer is page aligned. Also update the documentation to reflect this
requirement.
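
A minimal sketch of such an alignment check, assuming a 4096-byte page
size. The function names are hypothetical and the zero-size rejection is
an extra assumption of this example, not something stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration only; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/* True if value is a multiple of the (power-of-two) page size. */
bool sketch_is_page_aligned(uint64_t value)
{
	return (value & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * Accept a region only if both its start address and its size are page
 * aligned. Rejecting an empty region is an assumption of this sketch.
 */
bool sketch_region_ok(uint64_t start, uint64_t size)
{
	return size != 0 &&
	       sketch_is_page_aligned(start) &&
	       sketch_is_page_aligned(size);
}
```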

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Masami Hiramatsu (Google) dd941507a9 tracing: fprobe events: Fix possible UAF on modules
Commit ac91052f0a ("tracing: tprobe-events: Fix leakage of module
refcount") moved try_module_get() from __find_tracepoint_module_cb()
to the find_tracepoint() caller, but that introduced a possible UAF
because the module can be unloaded before try_module_get() is called.
In that case the module object has been freed as well, so
try_module_get() may not simply fail but may access the freed object.

To avoid that, call try_module_get() in __find_tracepoint_module_cb()
again.

Link: https://lore.kernel.org/all/174342990779.781946.9138388479067729366.stgit@devnote2/

Fixes: ac91052f0a ("tracing: tprobe-events: Fix leakage of module refcount")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-02 23:18:58 +09:00
Masami Hiramatsu (Google) d24fa977ee tracing: fprobe: Fix to lock module while registering fprobe
Since register_fprobe() does not take a module reference while
registering the fgraph filter, if the target functions (symbols) are in
modules, those modules can be unloaded while the fprobe is being
registered to fgraph.

To avoid this issue, take a reference on the module for each symbol,
and put it after registering the fprobe.

Link: https://lore.kernel.org/all/174330568792.459674.16874380163991113156.stgit@devnote2/

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250325130628.3a9e234c@gandalf.local.home/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-02 23:18:57 +09:00
Gabriele Monaco fc0585c7fa rv: Fix missing unlock on double nested monitors return path
RV doesn't support nested monitors having children monitors themselves
and exits with the EINVAL code. However, it returns without unlocking
the rv_interface_lock.

Unlock the lock before returning from the initialisation function.
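
The usual shape of this kind of fix is a single unlock label that every
exit path goes through. The sketch below is a user-space toy, not the RV
code: the lock is a plain flag, and all type and function names are
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (hypothetical). */
struct sketch_lock {
	bool held;
};

void sketch_lock_acquire(struct sketch_lock *lock)
{
	lock->held = true;
}

void sketch_lock_release(struct sketch_lock *lock)
{
	lock->held = false;
}

/* Toy monitor: a non-NULL parent means this monitor is nested. */
struct sketch_monitor {
	const struct sketch_monitor *parent;
};

/*
 * Register a monitor under 'parent'. A parent that is itself nested
 * would mean double nesting, which is rejected with -EINVAL. Both the
 * error path and the success path leave through out_unlock.
 */
int sketch_register_monitor(struct sketch_lock *lock,
			    const struct sketch_monitor *parent)
{
	int ret = 0;

	sketch_lock_acquire(lock);

	if (parent && parent->parent) {
		ret = -EINVAL;
		goto out_unlock;
	}

	/* The actual registration work would go here. */

out_unlock:
	sketch_lock_release(lock);
	return ret;
}
```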

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250402071351.19864-2-gmonaco@redhat.com
Fixes: cb85c660fc ("rv: Add option for nested monitors and include sched")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202503310200.UBXGitB4-lkp@intel.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:26 -04:00
Steven Rostedt ea8d7647f9 tracing: Verify event formats that have "%*p.."
The trace event verifier checks the formats of trace events to make sure
that they do not point at memory that is not in the trace event itself or
in data that will never be freed. If an event references data that was
allocated when the event triggered and that same data is freed before the
event is read, then the kernel can crash by reading freed memory.

The verifier runs at boot up (or module load) and scans the print formats
of the events and checks their arguments to make sure that dereferenced
pointers are safe. If the format uses "%*p.." the verifier will ignore it,
and that could be dangerous. Cover this case as well.
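
To illustrate how a "*" field width can hide a pointer conversion from
a format scanner, here is a simplified user-space sketch. It is not the
verifier's code: the function name and the accept_star switch are
hypothetical, and it only understands "-", digits and "*" between the
"%" and the conversion character.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count "%p"-style conversions in fmt. When accept_star is false, a '*'
 * field width is not skipped, so a conversion such as "%*pbl" goes
 * unnoticed, mimicking the gap described above.
 */
size_t sketch_count_pointer_convs(const char *fmt, bool accept_star)
{
	size_t count = 0;

	for (const char *s = fmt; *s; s++) {
		if (*s != '%')
			continue;
		s++;
		if (*s == '%')
			continue;	/* literal "%%" */
		/* Skip a '-' flag and a numeric or '*' field width. */
		while (*s == '-' || isdigit((unsigned char)*s) ||
		       (accept_star && *s == '*'))
			s++;
		if (*s == 'p')
			count++;
		if (*s == '\0')
			break;		/* format ended inside a conversion */
	}
	return count;
}
```

Calling it on "mask=%*pbl" with accept_star set to false finds no
pointer conversion, while setting it to true finds one.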

Also add to the sample code a use case of "%*pbl".

Link: https://lore.kernel.org/all/bcba4d76-2c3f-4d11-baf0-02905db953dd@oracle.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Link: https://lore.kernel.org/20250327195311.2d89ec66@gandalf.local.home
Reported-by: Libo Chen <libo.chen@oracle.com>
Reviewed-by: Libo Chen <libo.chen@oracle.com>
Tested-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:26 -04:00
zhoumin 42ea22e754 ftrace: Add cond_resched() to ftrace_graph_set_hash()
When the kernel contains a large number of functions that can be traced,
the loop in ftrace_graph_set_hash() may take a lot of time to execute.
This may trigger the softlockup watchdog.

Add cond_resched() within the loop to allow the kernel to remain
responsive even when processing a large number of functions.

This matches the cond_resched() that is used in other locations of the
code that iterates over all functions that can be traced.

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Link: https://lore.kernel.org/tencent_3E06CE338692017B5809534B9C5C03DA7705@qq.com
Signed-off-by: zhoumin <teczm@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:25 -04:00
Steven Rostedt 2c9ee74a6d tracing: Free module_delta on freeing of persistent ring buffer
If a persistent ring buffer is created, a "module_delta" array is also
allocated to hold the module deltas of loaded modules that match modules
in the scratch area. If this buffer gets freed, the module_delta array is
not freed and causes a memory leak.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250401124525.1f9ac02a@gandalf.local.home
Fixes: 35a380ddbc ("tracing: Show last module text symbols in the stacktrace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:25 -04:00
Steven Rostedt b81ff11c21 ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
The option PROBE_EVENTS_BTF_ARGS enables the functions
btf_find_func_proto() and btf_get_func_param() which are used by the
function argument tracing code. The option FUNCTION_TRACE_ARGS was
dependent on the same configs that PROBE_EVENTS_BTF_ARGS was dependent on,
but it was also dependent on PROBE_EVENTS_BTF_ARGS. In fact, if
PROBE_EVENTS_BTF_ARGS is supported then FUNCTION_TRACE_ARGS is supported.

Just make FUNCTION_TRACE_ARGS depend on PROBE_EVENTS_BTF_ARGS.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250401113601.17fa1129@gandalf.local.home
Fixes: 533c20b062 ("ftrace: Add print_function_args()")
Closes: https://lore.kernel.org/all/DB9PR08MB75820599801BAD118D123D7D93AD2@DB9PR08MB7582.eurprd08.prod.outlook.com/
Reported-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:50:56 -04:00
Waiman Long a22b3d54de cgroup/cpuset: Fix race between newly created partition and dying one
There is a possible race between removing a cgroup directory that is
a partition root and the creation of a new partition.  The partition
to be removed can be dying but still online. It does not currently
participate in checking for exclusive CPU conflicts, but its exclusive
CPUs are still there in subpartitions_cpus and isolated_cpus. These
two cpumasks are global states that affect the operation of cpuset
partitions. The exclusive CPUs in dying cpusets will only be removed
when cpuset_css_offline() function is called after an RCU delay.

As a result, it is possible that a new partition can be created with
exclusive CPUs that overlap with those of a dying one. When that dying
partition is finally offlined, it removes those overlapping exclusive
CPUs from subpartitions_cpus and maybe isolated_cpus resulting in an
incorrect CPU configuration.

This bug was found when a warning was triggered in
remote_partition_disable() during testing because the subpartitions_cpus
mask was empty.

One possible way to fix this is to iterate the dying cpusets as well and
avoid using the exclusive CPUs in those dying cpusets. However, this
can still cause random partition creation failures or other anomalies
due to racing. A better way to fix this race is to reset the partition
state at the moment when a cpuset is being killed.

Introduce a new css_killed() CSS function pointer and call it, if
defined, before setting CSS_DYING flag in kill_css(). Also update the
css_is_dying() helper to use the CSS_DYING flag introduced by commit
33c35aa481 ("cgroup: Prevent kill_css() from being called more than
once") for proper synchronization.

Add a new cpuset_css_killed() function to reset the partition state of
a valid partition root if it is being killed.
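
The race and the chosen fix can be modelled with plain bitmasks. The
following is a toy user-space model only: the structures, the function
names and the reset_on_kill switch are hypothetical and do not mirror
the cpuset implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy partition: a bitmask of exclusive CPUs plus a dying flag. */
struct sketch_partition {
	uint64_t exclusive_cpus;
	bool dying;
};

/* Toy stand-in for the global subpartitions_cpus mask. */
struct sketch_state {
	uint64_t subpartitions_cpus;
};

/* Conflict check that, as described above, skips dying partitions. */
bool sketch_conflicts(const struct sketch_partition *parts, size_t n,
		      uint64_t cpus)
{
	for (size_t i = 0; i < n; i++)
		if (!parts[i].dying && (parts[i].exclusive_cpus & cpus))
			return true;
	return false;
}

/* Offline step: drop whatever exclusive CPUs the partition still holds. */
void sketch_offline(struct sketch_state *st, struct sketch_partition *p)
{
	st->subpartitions_cpus &= ~p->exclusive_cpus;
	p->exclusive_cpus = 0;
}

/*
 * Kill step. With reset_on_kill the partition state is reset right away
 * (the approach taken by the fix), so the later offline step has nothing
 * left to remove from the global mask.
 */
void sketch_kill(struct sketch_state *st, struct sketch_partition *p,
		 bool reset_on_kill)
{
	p->dying = true;
	if (reset_on_kill)
		sketch_offline(st, p);
}
```

Without the reset, a new partition that reuses the dying partition's
CPUs passes the conflict check, and the delayed offline step then clears
those CPUs from the global mask even though the new partition owns them.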

Fixes: ee8dde0cd2 ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-01 21:46:22 -10:00
Jeff Xu 3d38922abf mseal sysmap: uprobe mapping
Provide support to mseal the uprobe mapping.

Unlike other system mappings, the uprobe mapping is not established during
program startup.  However, its lifetime is the same as the process's
lifetime.  It could be sealed from creation.

Testing was done with the perf tool, and the uprobe mapping was observed
to be sealed.

Link: https://lkml.kernel.org/r/20250305021711.3867874-6-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliott Hughes <enh@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00