-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmmytygACgkQxWXV+ddt
WDuiKw/+IyoS2IE95Jd4G2hezcsaqJNAvv5q90vVyLepT/7jzRbYrHytk7z/0v56
+Pjc1JHgp+TidYKZa52E/R09eEvYCyuZvEq4bnClOnO3CAJDCixqTKWV70CcYiHX
HoSCuPML2CJMLZY3u6MxATgk46y1ey3UkbmQ4oufoUHrjAE+D3pDQsYQ9BWR/P6J
4LbyZ214uIUPvbp0wPJ14cVUMpNxs116JmWvK9dxWQNZSzFcq2IVuLwQUcZBPAdb
dursd0kt6HXXmNCITmUD8O+ipG4U1akn8FuCjaLqo4LF1AH2f6OzNjucsisfa1tE
4MD+ATzmNsew5q6dtyfcvv8Btl+olbTP4KGibLl9PCCM9vlkzl3EN5GPUllGi6S9
T8jqe2pILXZHEx1DIQjHaXJsnuHG9aEkUbkSHUxIa15QDj66omrZZkZG3EF5Buy9
TogJJgESYU19dt/y110Q/vD/LPOgJhB3NBXwIx1FDYA1OwaAjt6hUcVRVRI5PtpL
moAIG4nNrPz5Pa875NvguFmrXFIudpTqyANHCPVio4eqDR7LeSg9vVHj9zeOfeRD
gmn3WGuGNb6L+86TNet/nFQhjxhkKqsnSbzO2zoPQKhG6FS+HSLnUwvJYL4/YnkW
QEeveeI/6Duiei4DHuw7FXmZVNQxeBEWW6MTFDHA1UNV3HMKewA=
=TgSD
-----END PGP SIGNATURE-----
Merge tag 'for-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- detect possible file name hash collision earlier so it does not lead
to transaction abort
- handle b-tree leaf overflows when snapshotting a subvolume with set
received UUID, leading to transaction abort
- in zoned mode, reorder relocation block group initialization after
the transaction kthread start
- fix orphan cleanup state tracking of subvolume, this could lead to
invalid dentries under some conditions
- add locking around updates of dynamic reclain state update
- in subpage mode, add missing RCU unlock when trying to releae extent
buffer
- remap tree fixes:
- add missing description strings for the newly added remap tree
- properly update search key when iterating backrefs
* tag 'for-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: remove duplicated definition of btrfs_printk_in_rcu()
btrfs: remove unnecessary transaction abort in the received subvol ioctl
btrfs: abort transaction on failure to update root in the received subvol ioctl
btrfs: fix transaction abort on set received ioctl due to item overflow
btrfs: fix transaction abort when snapshotting received subvolumes
btrfs: fix transaction abort on file creation due to name hash collision
btrfs: read key again after incrementing slot in move_existing_remaps()
btrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer()
btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create
btrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread start
btrfs: hold space_info->lock when clearing periodic reclaim ready
btrfs: print-tree: add remap tree definitions
- fix race between freeing data and fs accessing it
- fix race on unreferenced rawdata dereference
- fix differential encoding verification
- fix unconfined unprivileged local user can do privileged policy management
- Fix double free of ns_name in aa_replace_profiles()
- fix missing bounds check on DEFAULT table in verify_dfa()
- fix side-effect bug in match_char() macro usage
- fix: limit the number of levels of policy namespaces
- replace recursive profile removal with iterative approach
- fix memory leak in verify_header
- validate DFA start states are in bounds in unpack_pdb
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAmmvWT8ACgkQBS82cBjV
w9h2NBAAhLieJZTWiut2bo3c2YUyjEWnb8EUReGu3e2bbMSlDADpsLVAjAZ1nBRZ
pIbsBLMtRnkIWLBHLYf8u0As3Rlt60JvlwpDeC9kkOB1/JNsqh/o4merVXvvPvbM
Ym+2cVeW05pfh5urRALVFUqKNQWfv+vNmaDSYvgLTjHSjUJf7TAVDQPnF18DOygi
1Ml2VeIc4eIjryuk1mS1AQyGCbuTrO2yOYoVncWU7RoWzf6vvEdrPK8rjesJq5zR
+T/I90ooT59QPB5LGhMCJEyXI1TUp4NdOCLHZ5/4lFj78RZ9GMDhZsqFmSixuAAy
oiSoPxa8ik4EiuxcPy73az28zlGP4RzNx0fsoE3XnBIMWSM9tkbD9qvgTlQptk3Q
UzPZAo4dHDSTuQpQIx4VYti5EjCNpLPp8WuyhMbHcFSFL3EQTc1NG2WWV99p7/a+
Pgg83N8jJOcTTyzd4d7p5ewgJLDQCDNoMQ+C6bhRwSY7oo6ZN/d7R9CUlwlyzYEX
86j7t19hi703lplyTBudQSNleunFgSa3v6/yboBAGP9RKXyvhrG6Txk6ahjv2Awe
zs5Uux92QedjDlrLi/DpDCfuBljZHyoeLw2As2lP5kJokS8cNAOFpf6d50Jmovxq
LiKmiT70ai2IIVkXDjG1n9690ZTeawJhkJNzcap6kF5FWhO0T7A=
=BS6j
-----END PGP SIGNATURE-----
Merge tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull AppArmor fixes from John Johansen:
- fix race between freeing data and fs accessing it
- fix race on unreferenced rawdata dereference
- fix differential encoding verification
- fix unconfined unprivileged local user can do privileged policy management
- Fix double free of ns_name in aa_replace_profiles()
- fix missing bounds check on DEFAULT table in verify_dfa()
- fix side-effect bug in match_char() macro usage
- fix: limit the number of levels of policy namespaces
- replace recursive profile removal with iterative approach
- fix memory leak in verify_header
- validate DFA start states are in bounds in unpack_pdb
* tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix race between freeing data and fs accessing it
apparmor: fix race on rawdata dereference
apparmor: fix differential encoding verification
apparmor: fix unprivileged local user can do privileged policy management
apparmor: Fix double free of ns_name in aa_replace_profiles()
apparmor: fix missing bounds check on DEFAULT table in verify_dfa()
apparmor: fix side-effect bug in match_char() macro usage
apparmor: fix: limit the number of levels of policy namespaces
apparmor: replace recursive profile removal with iterative approach
apparmor: fix memory leak in verify_header
apparmor: validate DFA start states are in bounds in unpack_pdb
Blamed commit missed that both functions can be called with dev == NULL.
Also add unlikely() hints for these conditions that only fuzzers can hit.
Fixes: 6f1a9140ec ("net: add xmit recursion limit to tunnel xmit functions")
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260312043908.2790803-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The NIX RAS health report path uses nix_af_rvu_err when handling the
NIX_AF_RVU_RAS case, so the report prints the ERR interrupt status rather
than the RAS interrupt status.
Use nix_af_rvu_ras for the NIX_AF_RVU_RAS report.
Fixes: 5ed66306ea ("octeontx2-af: Add devlink health reporters for NIX")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20260310184824.1183651-2-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NIX RAS health reporter recovery routine checks nix_af_rvu_int to
decide whether to re-enable NIX_AF_RAS interrupts. This is the RVU
interrupt status field and is unrelated to RAS events, so the recovery
flow may incorrectly skip re-enabling NIX_AF_RAS interrupts.
Check nix_af_rvu_ras instead before writing NIX_AF_RAS_ENA_W1S.
Fixes: 5ed66306ea ("octeontx2-af: Add devlink health reporters for NIX")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20260310184824.1183651-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "rx_filter" member of "hwtstamp_config" structure is an enum field and
does not support bitwise OR combination of multiple filter values. It
causes error while linuxptp application tries to match rx filter version.
Fix this by storing the requested filter type in a new port field.
Fixes: 97248adb5a ("net: ti: am65-cpsw: Update hw timestamping filter for PTPv1 RX packets")
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Link: https://patch.msgid.link/20260310160940.109822-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In mana_gd_setup() error path, set gc->service_wq to NULL after
destroy_workqueue() to match the cleanup in mana_gd_cleanup().
This prevents a use-after-free if the workqueue pointer is checked
after a failed setup.
Fixes: f975a09552 ("net: mana: Fix double destroy_workqueue on service rescan PCI path")
Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260309172443.688392-1-kotaranov@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJdBAABCABHFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmmwGJYbFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyDRxmd0BzdHJsZW4uZGUACgkQcJGo2a1f9gBpExAA
vbIa7R1p88Lt4nBEu3eB4gHhN9f8n25jF/bpzU3MTjB9iyEnVAdflD2dBBScO6fl
z0fMKSyBWcAJ0wxRbo114D7+xhuWMSOjzZhEX0cHqo5OCVmL/T314ufmxSKkCJM0
r1fpy7eI95FL7EBv5gg7x95ksKyFYiFJxCOEnqzMcuMZEuM4IWhQ+4cSqeU1/8R0
G3l0W/L3V5LibMtQ0YGcDNfuGFK2DJ+YgxmEOirVapWPTqojc6TiDkV2j2SAJHbL
kkpW+M3J0xIL+pivHPQ1z4NhVQS1bKhNjqqCVOE80a0ifhsW0qXfjZEGWbfuKaTs
LNc/lqrhjt681cVo/6ljYGdqIOlzXtUxfcU+nJLy26GMKHrP0C6iPpY87h11IvPZ
q1UNvBbK0r7fPFDc3lY0+AwD6WHpLtB+JjuITXWkZu9/zdRUcixQmpzNnW82TNXA
SKaX5KVopfvizozIlzuvPO/sahMumMsg6wIMmOZTKPhr7XMLmcHdOFLGSDB4tlhM
AQ87TCcrK2VUBNbLZpgsQMVqcYKwbvTsXZWqHJCYKHRh4aOn7rO5dC5vZKNibMzW
itthoLHOWJZLzqBR54u90Ri086SQLCG949neDUrbsA0i/svZkpl9HTm0aq9us430
csDY/oSlXAULJG0YlrCraBnvPJSLKg0dqNq6HDzGwP8=
=jtC+
-----END PGP SIGNATURE-----
Merge tag 'nf-26-03-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
Due to large volume of backlogged patches its unlikely I will make the
2nd planned PR this week, so several legit fixes will be pushed back
to next week. Sorry for the inconvenience but I am out of ideas and
alternatives.
1) syzbot managed to add/remove devices to a flowtable, due to a bug in
the flowtable netdevice notifier this gets us a double-add and
eventually UaF when device is removed again (we only expect one
entry, duplicate remains past net_device end-of-life).
From Phil Sutter, bug added in 6.16.
2) Yiming Qian reports another nf_tables transaction handling bug:
in some cases error unwind misses to undo certain set elements,
resulting in refcount underflow and use-after-free, bug added in 6.4.
3) Jenny Guanni Qu found out-of-bounds read in pipapo set type.
While the value is never used, it still rightfully triggers KASAN
splats. Bug exists since this set type was added in 5.6.
4) a few x_tables modules contain copypastry tcp option parsing code which
can read 1 byte past the option area. This bug is ancient, fix from
David Dull.
5) nfnetlink_queue leaks kernel memory if userspace provides bad
NFQA_VLAN/NFQA_L2HDR attributes. From Hyunwoo Kim, bug stems from
from 4.7 days.
6) nfnetlink_cthelper has incorrect loop restart logic which may result
in reading one pointer past end of array. From 3.6 days, fix also from
Hyunwoo Kim.
7) xt_IDLETIMER v0 extension must reject working with timers added
by revision v1, else we get list corruption. Bug added in v5.7.
From Yifan Wu, Juefei Pu and Yuan Tan via Xin Lu.
* tag 'nf-26-03-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: xt_IDLETIMER: reject rev0 reuse of ALARM timer labels
netfilter: nfnetlink_cthelper: fix OOB read in nfnl_cthelper_dump_table()
netfilter: nfnetlink_queue: fix entry leak in bridge verdict error path
netfilter: x_tables: guard option walkers against 1-byte tail reads
netfilter: nft_set_pipapo: fix stack out-of-bounds read in pipapo_drop()
netfilter: nf_tables: always walk all pending catchall elements
netfilter: nf_tables: Fix for duplicate device in netdev hooks
====================
Link: https://patch.msgid.link/20260310132050.630-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-03-10 (ice, iavf, i40e, e1000e, e1000)
Nikolay Aleksandrov changes return code of RDMA related ice devlink get
parameters when irdma is not enabled to -EOPNOTSUPP as current return
of -ENODEV causes issues with devlink output.
Petr Oros resolves a couple of issues in iavf; freeing PTP resources
before reset and disable. Fixing contention issues with the netdev lock
between reset and some ethtool operations.
Alok Tiwari corrects an incorrect comparison of cloud filter values and
adjust some passed arguments to sizeof() for consistency on i40e.
Matt Vollrath removes an incorrect decrement for DMA error on e1000 and
e1000e drivers.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000/e1000e: Fix leak in DMA error cleanup
i40e: fix src IP mask checks and memcpy argument names in cloud filter
iavf: fix incorrect reset handling in callbacks
iavf: fix PTP use-after-free during reset
drivers: net: ice: fix devlink parameters get without irdma
====================
Link: https://patch.msgid.link/20260310205654.4109072-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca says:
====================
neighbour: fix update of proxy neighbour
While re-reading some "old" patches I ran into a small change of
behavior in commit dc2a27e524 ("neighbour: Update pneigh_entry in
pneigh_create().").
The old behavior was not consistent between ->protocol and ->flags,
and didn't offer a way to clear protocol, so maybe it's better to
change that (7-years-old [1]) behavior. But then we should change
non-proxy neighbours as well to keep neigh/pneigh consistent.
[1] df9b0e30d4 ("neighbor: Add protocol attribute")
====================
Link: https://patch.msgid.link/cover.1772894876.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prior to commit dc2a27e524 ("neighbour: Update pneigh_entry in
pneigh_create()."), a pneigh's protocol was updated only when the
value of the NDA_PROTOCOL attribute was non-0. While moving the code,
that check was removed. This is a small change of user-visible
behavior, and inconsistent with the (non-proxy) neighbour behavior.
Fixes: dc2a27e524 ("neighbour: Update pneigh_entry in pneigh_create().")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/38c61de1bb032871a886aff9b9b52fe1cdd4cada.1772894876.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rtl8366rb_led_group_port_mask() function always returns LED port
bit in LED group 0; the switch statement returns the same thing in all
non-default cases.
This means that the driver does not currently support configuring LEDs
in non-zero LED groups.
Fix this.
Fixes: 32d6170054 ("net: dsa: realtek: add LED drivers for rtl8366rb")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260311111237.29002-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A user can set conn_timeout to any value via
setsockopt(TIPC_CONN_TIMEOUT), including values less than 4. When a
SYN is rejected with TIPC_ERR_OVERLOAD and the retry path in
tipc_sk_filter_connect() executes:
delay %= (tsk->conn_timeout / 4);
If conn_timeout is in the range [0, 3], the integer division yields 0,
and the modulo operation triggers a divide-by-zero exception, causing a
kernel oops/panic.
Fix this by clamping conn_timeout to a minimum of 4 at the point of use
in tipc_sk_filter_connect().
Oops: divide error: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 119 Comm: poc-F144 Not tainted 7.0.0-rc2+
RIP: 0010:tipc_sk_filter_rcv (net/tipc/socket.c:2236 net/tipc/socket.c:2362)
Call Trace:
tipc_sk_backlog_rcv (include/linux/instrumented.h:82 include/linux/atomic/atomic-instrumented.h:32 include/net/sock.h:2357 net/tipc/socket.c:2406)
__release_sock (include/net/sock.h:1185 net/core/sock.c:3213)
release_sock (net/core/sock.c:3797)
tipc_connect (net/tipc/socket.c:2570)
__sys_connect (include/linux/file.h:62 include/linux/file.h:83 net/socket.c:2098)
Fixes: 6787927475 ("tipc: buffer overflow handling in listener socket")
Cc: stable@vger.kernel.org
Signed-off-by: Mehul Rao <mehulrao@gmail.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20260310170730.28841-1-mehulrao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If request_threaded_irq() fails during the PTP message IRQ setup, the
newly created IRQ mapping is never disposed. Indeed, the
ksz_ptp_irq_setup()'s error path only frees the mappings that were
successfully set up.
Dispose the newly created mapping if the associated
request_threaded_irq() fails at setup.
Cc: stable@vger.kernel.org
Fixes: d0b8fec8ae ("net: dsa: microchip: Fix symetry in ksz_ptp_msg_irq_{setup/free}()")
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20260309-ksz-ptp-irq-fix-v1-1-757b3b985955@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called which
initializes it. If bpf_redirect_neigh() is called with explicit AF_INET6
nexthop parameters, __bpf_redirect_neigh_v6() can skip the IPv6 FIB lookup
and call bpf_out_neigh_v6() directly. bpf_out_neigh_v6() then calls
ip_neigh_gw6(), which uses ipv6_stub->nd_tbl.
BUG: kernel NULL pointer dereference, address: 0000000000000248
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:skb_do_redirect+0x44f/0xf40
Call Trace:
<TASK>
? srso_alias_return_thunk+0x5/0xfbef5
? __tcf_classify.constprop.0+0x83/0x160
? srso_alias_return_thunk+0x5/0xfbef5
? tcf_classify+0x2b/0x50
? srso_alias_return_thunk+0x5/0xfbef5
? tc_run+0xb8/0x120
? srso_alias_return_thunk+0x5/0xfbef5
__dev_queue_xmit+0x6fa/0x1000
? srso_alias_return_thunk+0x5/0xfbef5
packet_sendmsg+0x10da/0x1700
? srso_alias_return_thunk+0x5/0xfbef5
__sys_sendto+0x1f3/0x220
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x101/0xf80
? exc_page_fault+0x6e/0x170
? srso_alias_return_thunk+0x5/0xfbef5
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Fix this by adding an early check in bpf_out_neigh_v6(). If IPv6 is
disabled, drop the packet before neighbor lookup.
Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: ba452c9e99 ("bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-4-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called which
initializes it. If bpf_redirect_neigh() is called from tc with an explicit
nexthop of nh_family == AF_INET6, bpf_out_neigh_v4() takes the AF_INET6
branch and calls ip_neigh_gw6(), which relies on ipv6_stub->nd_tbl.
BUG: kernel NULL pointer dereference, address: 0000000000000248
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:skb_do_redirect+0xb93/0xf00
Call Trace:
<TASK>
? srso_alias_return_thunk+0x5/0xfbef5
? __tcf_classify.constprop.0+0x83/0x160
? srso_alias_return_thunk+0x5/0xfbef5
? tcf_classify+0x2b/0x50
? srso_alias_return_thunk+0x5/0xfbef5
? tc_run+0xb8/0x120
? srso_alias_return_thunk+0x5/0xfbef5
__dev_queue_xmit+0x6fa/0x1000
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? alloc_skb_with_frags+0x58/0x200
packet_sendmsg+0x10da/0x1700
? srso_alias_return_thunk+0x5/0xfbef5
__sys_sendto+0x1f3/0x220
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x101/0xf80
? exc_page_fault+0x6e/0x170
? srso_alias_return_thunk+0x5/0xfbef5
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Fix this by adding an early check in the AF_INET6 branch of
bpf_out_neigh_v4(). If IPv6 is disabled, unlock RCU and drop the packet.
Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: ba452c9e99 ("bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-3-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called
which initializes it. If bonding ARP/NS validation is enabled, an IPv6
NS/NA packet received on a slave can reach bond_validate_na(), which
calls bond_has_this_ip6(). That path calls ipv6_chk_addr() and can
crash in __ipv6_chk_addr_and_flags().
BUG: kernel NULL pointer dereference, address: 00000000000005d8
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:__ipv6_chk_addr_and_flags+0x69/0x170
Call Trace:
<IRQ>
ipv6_chk_addr+0x1f/0x30
bond_validate_na+0x12e/0x1d0 [bonding]
? __pfx_bond_handle_frame+0x10/0x10 [bonding]
bond_rcv_validate+0x1a0/0x450 [bonding]
bond_handle_frame+0x5e/0x290 [bonding]
? srso_alias_return_thunk+0x5/0xfbef5
__netif_receive_skb_core.constprop.0+0x3e8/0xe50
? srso_alias_return_thunk+0x5/0xfbef5
? update_cfs_rq_load_avg+0x1a/0x240
? srso_alias_return_thunk+0x5/0xfbef5
? __enqueue_entity+0x5e/0x240
__netif_receive_skb_one_core+0x39/0xa0
process_backlog+0x9c/0x150
__napi_poll+0x30/0x200
? srso_alias_return_thunk+0x5/0xfbef5
net_rx_action+0x338/0x3b0
handle_softirqs+0xc9/0x2a0
do_softirq+0x42/0x60
</IRQ>
<TASK>
__local_bh_enable_ip+0x62/0x70
__dev_queue_xmit+0x2d3/0x1000
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? packet_parse_headers+0x10a/0x1a0
packet_sendmsg+0x10da/0x1700
? kick_pool+0x5f/0x140
? srso_alias_return_thunk+0x5/0xfbef5
? __queue_work+0x12d/0x4f0
__sys_sendto+0x1f3/0x220
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x101/0xf80
? exc_page_fault+0x6e/0x170
? srso_alias_return_thunk+0x5/0xfbef5
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Fix this by checking ipv6_mod_enabled() before dispatching IPv6 packets to
bond_na_rcv(). If IPv6 is disabled, return early from bond_rcv_validate()
and avoid the path to ipv6_chk_addr().
Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-2-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
From: Jakub Kicinski <kuba@kernel.org>
Make sure disable_ipv6_mod itself is not part of the IPv6 module,
in case core code wants to refer to it. We will remove support
for IPv6=m soon, this change helps make fixes we commit before
that less messy.
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-1-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct the early return from the i.MX remoteproc prepare operation,
which prevented the platform-specific prepare function to be reached.
Ensure that the Mediatek SCP clock is released during system suspend,
after the recent refactoring to avoid issues with the clock framework's
prepare lock.
Correct the type of the subsys_name_len field in the sysmon event
QMI message, as the recent introduction of big endian support in the QMI
encoder highlighted the type mismatch and resulted in a failure to
encode the message.
Roll back the devm_ioremap_resource_wc() to a devm_ioremap_wc() in the
Qualcomm WCNSS remoteproc driver, after reports that requesting this
resource fails on some platforms.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmmw5noACgkQCx85Pw2Z
rcXcGhAAjx1XIq6DQIXHgmKICMTJNmvx6v4qeWd5Yb7F9BqJ3Y9vJYTdA124sHJR
okln8ilAGe/+t7Rid1TNoqpkCIm8fqBK62kMvVggtrpaSah2WnNp1fJp0OGFgCik
DRjVNuPvr3x4aFs7Zp3iBFt2Vf7XebW7Y58gal4/22nEXErLou0nAIwAZHT+ypEE
EGyb/TDV1KxjLjx7csk/LRTGlZMDFI0i5owIL5FUdGIStemt+RhrJiQUGR/kaIh6
3qjhSC6sf+jixQ61bTgGsPODWMpJBW+0Ir5GTQpSnirIikZAZdLCcbgVahfwK8Et
VJsadN8H0gREF3uis/IHRglNBPJ0TQtO2JHE0oqUSDevvHBaDLe8SeXYmLxlo8m/
hIRvpnvUaprr2jc3XMVOEzRr+XCmRP5HVSnzChApY43UZwGHln4ViEckcl8Qpzav
ccCmWc28s6BN1iXvEP7rwg15RfEClNiVXG5Oazuonkl8EN5NbZ0cBApEBFQhZ+1s
6V6XEdIQclAX0t9SF1H8slAENG0Xe2zpht6p6eFlks9SRxtIRKHVSloX0020qKoh
OiPB+K2HOHBVSqcCw+BSJlvpptu/aJNIID8edrlhQZOjg7sl1oxhP39gkLUJr+0a
TZfPTlOqnJc3NXce9RuuqCamYbLB3cC4zY4/x+JaEOePJp3wnmw=
=7ekE
-----END PGP SIGNATURE-----
Merge tag 'rproc-v7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc fixes from Bjorn Andersson:
- Correct the early return from the i.MX remoteproc prepare
operation, which prevented the platform-specific prepare
function from being reached
- Ensure that the Mediatek SCP clock is released during system
suspend after the recent refactoring to avoid issues with the
clock framework's prepare lock.
- Correct the type of the subsys_name_len field in the sysmon
event QMI message, as the recent introduction of big endian
support in the QMI encoder highlighted the type mismatch and
resulted in a failure to encode the message
- Roll back the devm_ioremap_resource_wc() to a devm_ioremap_wc()
in the Qualcomm WCNSS remoteproc driver, after reports that
requesting this resource fails on some platforms
* tag 'rproc-v7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
remoteproc: imx_rproc: Fix unreachable platform prepare_ops
remoteproc: mediatek: Unprepare SCP clock during system suspend
remoteproc: sysmon: Correct subsys_name_len type in QMI request
remoteproc: qcom_wcnss: Fix reserved region mapping failure
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmmwX+wACgkQiiy9cAdy
T1FWkQv9GHg6IWRG8PlbhGZG7NbQnHNndv2+hjR2v41+ZtB9zZADpC0bdoHFNGZV
KeYrLQa9sypss2w6hCynkRAtPobf7aROe7lscVQk+uCC2XVTI5eSyalpf8jiQHVi
0XXgvs3FuBcvLHNtzrCJeTUxAko0EGRkjLkSYDr7mijr31FKkV6Vwvy+JuT6JArR
M706MwIyWgWYWc7N6mrt3WrA5dT4KJc5KH2/4hCQMoJTgjKn1ExVwp/b/rogQEdC
yV+MkMinQJ3s+WvSYQR+WvG1HQeCs6UJiI5heDHCgu6husdvzeAD4LagVsAIYbFB
jXCjiIJunL5H1ISWsQVhPykfg3CUY6CT34trNPAwxEStziW0kAcJ6bou5v2IUwnK
hkztVeeEwSX3e4VFAPhYmqo5x5EDoSEG444rTSxiGdA4Ut3KNrpyxu9SJM4L4Fhi
7OCdR70p+DagzNz28dE1e6pM4S/47vNrVRcLPBy9WPpviO1G1C5elUXBNduwucUd
ca1sEA/u
=BNF+
-----END PGP SIGNATURE-----
Merge tag 'v7.0-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix potential use after free errors
- Fix refcount leak in smb2 open error path
- Prevent allowing logging signing or encryption keys
* tag 'v7.0-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Don't log keys in SMB3 signing and encryption key generation
smb: server: fix use-after-free in smb2_open()
ksmbd: fix use-after-free in smb_lazy_parent_lease_break_close()
ksmbd: fix use-after-free by using call_rcu() for oplock_info
ksmbd: fix use-after-free in proc_show_files due to early rcu_read_unlock
smb/server: Fix another refcount leak in smb2_open()
The bcmgenet EEE implementation is broken in several ways.
phy_support_eee() is never called, so the PHY never advertises EEE
and phylib never sets phydev->enable_tx_lpi. bcmgenet_mac_config()
checks priv->eee.eee_enabled to decide whether to enable the MAC
LPI logic, but that field is never initialised to true, so the MAC
never enters Low Power Idle even when EEE is negotiated - wasting
the power savings EEE is designed to provide. The only way to get
EEE working at all is a manual 'ethtool --set-eee eth0 eee on' after
every link-up, and even then bcmgenet_get_eee() immediately clobbers
the reported state because phy_ethtool_get_eee() overwrites
eee_enabled and tx_lpi_enabled with the uninitialised PHY eee_cfg
values. Finally, bcmgenet_mac_config() is only called on link-up,
so EEE is never disabled in hardware on link-down.
Fix all of this by removing the MAC-side EEE state tracking
(priv->eee) and aligning with the pattern used by other non-phylink
MAC drivers such as FEC.
Call phy_support_eee() in bcmgenet_mii_probe() so the PHY advertises
EEE link modes and phylib tracks negotiation state. Move the EEE
hardware control to bcmgenet_mii_setup(), which is called on every
link event, and drive it directly from phydev->enable_tx_lpi - the
flag phylib sets when EEE is negotiated and the user has not disabled
it. This enables EEE automatically once the link partner agrees and
disables it cleanly on link-down.
Make bcmgenet_get_eee() and bcmgenet_set_eee() pure passthroughs to
phy_ethtool_get_eee() and phy_ethtool_set_eee(), with the MAC
hardware register read/written for tx_lpi_timer. Drop struct
ethtool_keee eee from struct bcmgenet_priv.
Fixes: fe0d4fd928 ("net: phy: Keep track of EEE configuration")
Link: https://lore.kernel.org/netdev/d352039f-4cbb-41e6-9aeb-0b4f3941b54c@lunn.ch/
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260310054935.1238594-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
genlmsg_reply() hands the reply skb to netlink, and
netlink_unicast() consumes it on all return paths, whether the
skb is queued successfully or freed on an error path.
net_shaper_nl_get_doit() and net_shaper_nl_cap_get_doit()
currently jump to free_msg after genlmsg_reply() fails and call
nlmsg_free(msg), which can hit the same skb twice.
Return the genlmsg_reply() error directly and keep free_msg
only for pre-reply failures.
Fixes: 4b623f9f0f ("net-shapers: implement NL get operation")
Fixes: 553ea9f1ef ("net: shaper: implement introspection support")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moses <p@1g4.org>
Link: https://patch.msgid.link/20260309173450.538026-2-p@1g4.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PHY addresses in the MII bus are not equal to the port addresses,
so the bus cannot be assigned as user_mii_bus. Falling back on the
user_mii_bus in case a PHY isn't declared in device tree will result in
using the wrong (in this case: off-by-+1) PHY.
Remove the wrong assignment.
Fixes: 23794bec1c ("net: dsa: add basic initial driver for MxL862xx switches")
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/0f0df310fd8cab57e0e5e3d0831dd057fd05bcd5.1773103271.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Normal RX/TX interrupts are enabled later, in arc_emac_open(), so probe
should not see interrupt delivery in the usual case. However, hardware may
still present stale or latched interrupt status left by firmware or the
bootloader.
If probe later unwinds after devm_request_irq() has installed the handler,
such a stale interrupt can still reach arc_emac_intr() during teardown and
race with release of the associated net_device.
Avoid that window by putting the device into a known quiescent state before
requesting the IRQ: disable all EMAC interrupt sources and clear any
pending EMAC interrupt status bits. This keeps the change hardware-focused
and minimal, while preventing spurious IRQ delivery from leftover state.
Fixes: e4f2379db6 ("ethernet/arc/arc_emac - Add new driver")
Cc: stable@vger.kernel.org
Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
Link: https://patch.msgid.link/20260309132409.584966-1-fanwu01@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While testing other changes in vng I noticed that
nl_netdev.page_pool_check flakes. This never happens in real CI.
Turns out vng may boot and get to that test in less than a second.
page_pool_detached() records the detach time in seconds, so if
vng is fast enough detach time is set to 0. Other code treats
0 as "not detached". detach_time is only used to report the state
to the user, so it's not a huge deal in practice but let's fix it.
Store the raw ktime_t (nanoseconds) instead. A nanosecond value
of 0 is practically impossible.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Fixes: 69cb4952b6 ("net: page_pool: report when page pool was destroyed")
Link: https://patch.msgid.link/20260310003907.3540019-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board,
the rootfs may take an extended time to recover after a suspend.
Upon investigation, it was determined that the issue originates from a
problem in the macb driver.
According to the Zynq UltraScale TRM [1], when transmit is disabled,
the transmit buffer queue pointer resets to point to the address
specified by the transmit buffer queue base address register.
In the current implementation, the code merely resets `queue->tx_head`
and `queue->tx_tail` to '0'. This approach presents several issues:
- Packets already queued in the tx ring are silently lost,
leading to memory leaks since the associated skbs cannot be released.
- Concurrent write access to `queue->tx_head` and `queue->tx_tail` may
occur from `macb_tx_poll()` or `macb_start_xmit()` when these values
are reset to '0'.
- The transmission may become stuck on a packet that has already been sent
out, with its 'TX_USED' bit set, but has not yet been processed. However,
due to the manipulation of 'queue->tx_head' and 'queue->tx_tail',
`macb_tx_poll()` incorrectly assumes there are no packets to handle
because `queue->tx_head == queue->tx_tail`. This issue is only resolved
when a new packet is placed at this position. This is the root cause of
the prolonged recovery time observed for the NFS root filesystem.
To resolve this issue, shuffle the tx ring and tx skb array so that
the first unsent packet is positioned at the start of the tx ring.
Additionally, ensure that updates to `queue->tx_head` and
`queue->tx_tail` are properly protected with the appropriate lock.
[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm
Fixes: bf9cf80cab ("net: macb: Fix tx/rx malfunction after phy link down and up")
Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260307-zynqmp-v2-1-6ef98a70e1d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an error is encountered while mapping TX buffers, the driver should
unmap any buffers already mapped for that skb.
Because count is incremented after a successful mapping, it will always
match the correct number of unmappings needed when dma_error is reached.
Decrementing count before the while loop in dma_error causes an
off-by-one error. If any mapping was successful before an unsuccessful
mapping, exactly one DMA mapping would leak.
In these commits, a faulty while condition caused an infinite loop in
dma_error:
Commit 03b1320dfc ("e1000e: remove use of skb_dma_map from e1000e
driver")
Commit 602c0554d7 ("e1000: remove use of skb_dma_map from e1000 driver")
Commit c1fa347f20 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of
unsigned in *_tx_map()") fixed the infinite loop, but introduced the
off-by-one error.
This issue may still exist in the igbvf driver, but I did not address it
in this patch.
Fixes: c1fa347f20 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()")
Assisted-by: Claude:claude-4.6-opus
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix following issues in the IPv4 and IPv6 cloud filter handling logic in
both the add and delete paths:
- The source-IP mask check incorrectly compares mask.src_ip[0] against
tcf.dst_ip[0]. Update it to compare against tcf.src_ip[0]. This likely
goes unnoticed because the check is in an "else if" path that only
executes when dst_ip is not set, most cloud filter use cases focus on
destination-IP matching, and the buggy condition can accidentally
evaluate true in some cases.
- memcpy() for the IPv4 source address incorrectly uses
ARRAY_SIZE(tcf.dst_ip) instead of ARRAY_SIZE(tcf.src_ip), although
both arrays are the same size.
- The IPv4 memcpy operations used ARRAY_SIZE(tcf.dst_ip) and ARRAY_SIZE
(tcf.src_ip), Update these to use sizeof(cfilter->ip.v4.dst_ip) and
sizeof(cfilter->ip.v4.src_ip) to ensure correct and explicit copy size.
- In the IPv6 delete path, memcmp() uses sizeof(src_ip6) when comparing
dst_ip6 fields. Replace this with sizeof(dst_ip6) to make the intent
explicit, even though both fields are struct in6_addr.
Fixes: e284fc2804 ("i40e: Add and delete cloud filter")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Singletons, with one doubleton - please see the changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaa9ZkgAKCRDdBJ7gKXxA
jpfeAQDAj+Sn3bRdJWCUVNgWPdNLrmZnZRw5VdipioAnfmZHZQEAqBXnm7ehn1BA
JgQvx6zTlsyqy4FhW+Q6uLmBpKmMTQw=
=kbLH
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2026-03-09-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"15 hotfixes. 6 are cc:stable. 14 are for MM.
Singletons, with one doubleton - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2026-03-09-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS, mailmap: update email address for Lorenzo Stoakes
mm/mmu_notifier: clean up mmu_notifier.h kernel-doc
uaccess: correct kernel-doc parameter format
mm/huge_memory: fix a folio_split() race condition with folio_try_get()
MAINTAINERS: add co-maintainer and reviewer for SLAB ALLOCATOR
MAINTAINERS: add RELAY entry
memcg: fix slab accounting in refill_obj_stock() trylock path
mm/hugetlb.c: use __pa() instead of virt_to_phys() in early bootmem alloc code
zram: rename writeback_compressed device attr
tools/testing: fix testing/vma and testing/radix-tree build
Revert "ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor()"
mm/cma: move put_page_testzero() out of VM_WARN_ON in cma_release()
mm/damon/core: clear walk_control on inactive context in damos_walk()
mm: memfd_luo: always dirty all folios
mm: memfd_luo: always make all folios uptodate
Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f1 ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d5 ("iavf: kill "legacy-rx" for good").
Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.
Commit ef490bbb22 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.
Later, commit 7e4d784f58 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e4 ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.
Finally, commit 120f28a6f3 ("iavf: get rid of the crit lock")
replaced the driver's crit_lock with netdev_lock in reset_task, causing
incorrect behavior: the callback holds netdev_lock and waits for
reset_task, but reset_task needs the same lock:
Thread 1 (callback) Thread 2 (reset_task)
------------------- ---------------------
netdev_lock() [blocked on workqueue]
ndo_change_mtu() or ethtool op
iavf_schedule_reset()
iavf_wait_for_reset() iavf_reset_task()
waiting... netdev_lock() <- blocked
This does not strictly deadlock because iavf_wait_for_reset() uses
wait_event_interruptible_timeout() with a 5-second timeout. The wait
eventually times out, the callback returns an error to userspace, and
after the lock is released reset_task completes the reset. This leads to
incorrect behavior: userspace sees an error even though the configuration
change silently takes effect after the timeout.
Fix this by extracting the reset logic from iavf_reset_task() into a new
iavf_reset_step() function that expects netdev_lock to be already held.
The three callbacks now call iavf_reset_step() directly instead of
scheduling the work and waiting, performing the reset synchronously in
the caller's context which already holds netdev_lock. This eliminates
both the incorrect error reporting and the need for
iavf_wait_for_reset(), which is removed along with the now-unused
reset_waitqueue.
The workqueue-based iavf_reset_task() becomes a thin wrapper that
acquires netdev_lock and calls iavf_reset_step(), preserving its use
for PF-initiated resets.
The callbacks may block for several seconds while iavf_reset_step()
polls hardware registers, but this is acceptable since netdev_lock is a
per-device mutex and only serializes operations on the same interface.
v3:
- Remove netif_running() guard from iavf_set_channels(). Unlike
set_ringparam where descriptor counts are picked up by iavf_open()
directly, num_req_queues is only consumed during
iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset
on a down device would silently discard the channel count change.
- Remove dead reset_waitqueue code (struct field, init, and all
wake_up calls) since iavf_wait_for_reset() was the only consumer.
Fixes: 120f28a6f3 ("iavf: get rid of the crit lock")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 7c01dbfc8a ("iavf: periodically cache PHC time") introduced a
worker to cache PHC time, but failed to stop it during reset or disable.
This creates a race condition where `iavf_reset_task()` or
`iavf_disable_vf()` free adapter resources (AQ) while the worker is still
running. If the worker triggers `iavf_queue_ptp_cmd()` during teardown, it
accesses freed memory/locks, leading to a crash.
Fix this by calling `iavf_ptp_release()` before tearing down the adapter.
This ensures `ptp_clock_unregister()` synchronously cancels the worker and
cleans up the chardev before the backing resources are destroyed.
Fixes: 7c01dbfc8a ("iavf: periodically cache PHC time")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If CONFIG_IRDMA isn't enabled but there are ice NICs in the system, the
driver will prevent full devlink dev param show dump because its rdma get
callbacks return ENODEV and stop the dump. For example:
$ devlink dev param show
pci/0000:82:00.0:
name msix_vec_per_pf_max type generic
values:
cmode driverinit value 2
name msix_vec_per_pf_min type generic
values:
cmode driverinit value 2
kernel answers: No such device
Returning EOPNOTSUPP allows the dump to continue so we can see all devices'
devlink parameters.
Fixes: c24a65b6a2 ("iidc/ice/irdma: Update IDC to support multiple consumers")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iIkEABYKADEWIQSl+MghEFFAdY3pYJLMOmT6rpmt0gUCaa/u1RMcbWtsQHBlbmd1
dHJvbml4LmRlAAoJEMw6ZPquma3SmsABAJR3wGHrBuQ5QmhLPSmDKsNPMM2so9v9
AW4bGswHf+ReAP99CKH9aNvEIm43/Ap55lTLMhuBEKyYMS1e6SXFN572Dw==
=nBZ9
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-7.0-20260310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2026-03-10
this is a pull request of 2 patches for net/main.
Haibo Chen's patch fixes the maximum allowed bit rate error, which was
broken in v6.19.
Wenyuan Li contributes a patch for the hi311x driver that adds missing
error checking in the caller of the hi3110_power_enable() function,
hi3110_open().
linux-can-fixes-for-7.0-20260310
* tag 'linux-can-fixes-for-7.0-20260310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: hi311x: hi3110_open(): add check for hi3110_power_enable() return value
can: dev: keep the max bitrate error at 5%
====================
Link: https://patch.msgid.link/20260310103547.2299403-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
IDLETIMER revision 0 rules reuse existing timers by label and always call
mod_timer() on timer->timer.
If the label was created first by revision 1 with XT_IDLETIMER_ALARM,
the object uses alarm timer semantics and timer->timer is never initialized.
Reusing that object from revision 0 causes mod_timer() on an uninitialized
timer_list, triggering debugobjects warnings and possible panic when
panic_on_warn=1.
Fix this by rejecting revision 0 rule insertion when an existing timer with
the same label is of ALARM type.
Fixes: 68983a354a ("netfilter: xtables: Add snapshot of hardidletimer target")
Co-developed-by: Yifan Wu <yifanwucs@gmail.com>
Signed-off-by: Yifan Wu <yifanwucs@gmail.com>
Co-developed-by: Juefei Pu <tomapufckgml@gmail.com>
Signed-off-by: Juefei Pu <tomapufckgml@gmail.com>
Signed-off-by: Yuan Tan <tanyuan98@outlook.com>
Signed-off-by: Xin Liu <dstsmallbird@foxmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
nfnl_cthelper_dump_table() has a 'goto restart' that jumps to a label
inside the for loop body. When the "last" helper saved in cb->args[1]
is deleted between dump rounds, every entry fails the (cur != last)
check, so cb->args[1] is never cleared. The for loop finishes with
cb->args[0] == nf_ct_helper_hsize, and the 'goto restart' jumps back
into the loop body bypassing the bounds check, causing an 8-byte
out-of-bounds read on nf_ct_helper_hash[nf_ct_helper_hsize].
The 'goto restart' block was meant to re-traverse the current bucket
when "last" is no longer found, but it was placed after the for loop
instead of inside it. Move the block into the for loop body so that
the restart only occurs while cb->args[0] is still within bounds.
BUG: KASAN: slab-out-of-bounds in nfnl_cthelper_dump_table+0x9f/0x1b0
Read of size 8 at addr ffff888104ca3000 by task poc_cthelper/131
Call Trace:
nfnl_cthelper_dump_table+0x9f/0x1b0
netlink_dump+0x333/0x880
netlink_recvmsg+0x3e2/0x4b0
sock_recvmsg+0xde/0xf0
__sys_recvfrom+0x150/0x200
__x64_sys_recvfrom+0x76/0x90
do_syscall_64+0xc3/0x6e0
Allocated by task 1:
__kvmalloc_node_noprof+0x21b/0x700
nf_ct_alloc_hashtable+0x65/0xd0
nf_conntrack_helper_init+0x21/0x60
nf_conntrack_init_start+0x18d/0x300
nf_conntrack_standalone_init+0x12/0xc0
Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
nfqnl_recv_verdict() calls find_dequeue_entry() to remove the queue
entry from the queue data structures, taking ownership of the entry.
For PF_BRIDGE packets, it then calls nfqa_parse_bridge() to parse VLAN
attributes. If nfqa_parse_bridge() returns an error (e.g. NFQA_VLAN
present but NFQA_VLAN_TCI missing), the function returns immediately
without freeing the dequeued entry or its sk_buff.
This leaks the nf_queue_entry, its associated sk_buff, and all held
references (net_device refcounts, struct net refcount). Repeated
triggering exhausts kernel memory.
Fix this by dropping the entry via nfqnl_reinject() with NF_DROP verdict
on the error path, consistent with other error handling in this file.
Fixes: 8d45ff22f1 ("netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR")
Reviewed-by: David Dull <monderasdor@gmail.com>
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
When the last byte of options is a non-single-byte option kind, walkers
that advance with i += op[i + 1] ? : 1 can read op[i + 1] past the end
of the option area.
Add an explicit i == optlen - 1 check before dereferencing op[i + 1]
in xt_tcpudp and xt_dccp option walkers.
Fixes: 2e4e6a17af ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables")
Signed-off-by: David Dull <monderasdor@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
pipapo_drop() passes rulemap[i + 1].n to pipapo_unmap() as the
to_offset argument on every iteration, including the last one where
i == m->field_count - 1. This reads one element past the end of the
stack-allocated rulemap array (declared as rulemap[NFT_PIPAPO_MAX_FIELDS]
with NFT_PIPAPO_MAX_FIELDS == 16).
Although pipapo_unmap() returns early when is_last is true without
using the to_offset value, the argument is evaluated at the call site
before the function body executes, making this a genuine out-of-bounds
stack read confirmed by KASAN:
BUG: KASAN: stack-out-of-bounds in pipapo_drop+0x50c/0x57c [nf_tables]
Read of size 4 at addr ffff8000810e71a4
This frame has 1 object:
[32, 160) 'rulemap'
The buggy address is at offset 164 -- exactly 4 bytes past the end
of the rulemap array.
Pass 0 instead of rulemap[i + 1].n on the last iteration to avoid
the out-of-bounds read.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
During transaction processing we might have more than one catchall element:
1 live catchall element and 1 pending element that is coming as part of the
new batch.
If the map holding the catchall elements is also going away, its
required to toggle all catchall elements and not just the first viable
candidate.
Otherwise, we get:
WARNING: ./include/net/netfilter/nf_tables.h:1281 at nft_data_release+0xb7/0xe0 [nf_tables], CPU#2: nft/1404
RIP: 0010:nft_data_release+0xb7/0xe0 [nf_tables]
[..]
__nft_set_elem_destroy+0x106/0x380 [nf_tables]
nf_tables_abort_release+0x348/0x8d0 [nf_tables]
nf_tables_abort+0xcf2/0x3ac0 [nf_tables]
nfnetlink_rcv_batch+0x9c9/0x20e0 [..]
Fixes: 628bd3e49c ("netfilter: nf_tables: drop map element references from preparation phase")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
When handling NETDEV_REGISTER notification, duplicate device
registration must be avoided since the device may have been added by
nft_netdev_hook_alloc() already when creating the hook.
Suggested-by: Florian Westphal <fw@strlen.de>
Reported-by: syzbot+bb9127e278fa198e110c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bb9127e278fa198e110c
Fixes: a331b78a55 ("netfilter: nf_tables: Respect NETDEV_REGISTER events")
Tested-by: Helen Koike <koike@igalia.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tunnel xmit functions (iptunnel_xmit, ip6tunnel_xmit) lack their own
recursion limit. When a bond device in broadcast mode has GRE tap
interfaces as slaves, and those GRE tunnels route back through the
bond, multicast/broadcast traffic triggers infinite recursion between
bond_xmit_broadcast() and ip_tunnel_xmit()/ip6_tnl_xmit(), causing
kernel stack overflow.
The existing XMIT_RECURSION_LIMIT (8) in the no-qdisc path is not
sufficient because tunnel recursion involves route lookups and full IP
output, consuming much more stack per level. Use a lower limit of 4
(IP_TUNNEL_RECURSION_LIMIT) to prevent overflow.
Add recursion detection using dev_xmit_recursion helpers directly in
iptunnel_xmit() and ip6tunnel_xmit() to cover all IPv4/IPv6 tunnel
paths including UDP encapsulated tunnels (VXLAN, Geneve, etc.).
Move dev_xmit_recursion helpers from net/core/dev.h to public header
include/linux/netdevice.h so they can be used by tunnel code.
BUG: KASAN: stack-out-of-bounds in blake2s.constprop.0+0xe7/0x160
Write of size 32 at addr ffff88810033fed0 by task kworker/0:1/11
Workqueue: mld mld_ifc_work
Call Trace:
<TASK>
__build_flow_key.constprop.0 (net/ipv4/route.c:515)
ip_rt_update_pmtu (net/ipv4/route.c:1073)
iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84)
ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
dev_hard_start_xmit (net/core/dev.c:3887)
__dev_queue_xmit (net/core/dev.c:4841)
ip_finish_output2 (net/ipv4/ip_output.c:237)
ip_output (net/ipv4/ip_output.c:438)
iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
dev_hard_start_xmit (net/core/dev.c:3887)
__dev_queue_xmit (net/core/dev.c:4841)
ip_finish_output2 (net/ipv4/ip_output.c:237)
ip_output (net/ipv4/ip_output.c:438)
iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
dev_hard_start_xmit (net/core/dev.c:3887)
__dev_queue_xmit (net/core/dev.c:4841)
mld_sendpack
mld_ifc_work
process_one_work
worker_thread
</TASK>
Fixes: 745e20f1b6 ("net: add a recursion limit in xmit path")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260306160133.3852900-2-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju says:
====================
amd-xgbe: RX adaptation and PHY handling fixes
This series fixes several issues in the amd-xgbe driver related to RX
adaptation and PHY handling in 10GBASE-KR mode, particularly when
auto-negotiation is disabled.
Patch 1 fixes link status handling during RX adaptation by correctly
reading the latched link status bit so transient link drops are
detected without losing the current state.
Patch 2 prevents CRC errors that can occur when performing RX
adaptation with auto-negotiation turned off. The driver now stops
TX/RX before re-triggering RX adaptation and only re-enables traffic
once adaptation completes and the link is confirmed up, ensuring
packets are not corrupted during the adaptation window.
Patch 3 restores the intended ordering of PHY reset relative to
phy_start(), making sure PHY settings are reset before the PHY is
started instead of afterwards.
====================
Link: https://patch.msgid.link/20260306111629.1515676-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
commit f93505f357 ("amd-xgbe: let the MAC manage PHY PM") moved
xgbe_phy_reset() from xgbe_open() to xgbe_start(), placing it after
phy_start(). As a result, the PHY settings were being reset after the
PHY had already started.
Reorder the calls so that the PHY settings are reset before
phy_start() is invoked.
Fixes: f93505f357 ("amd-xgbe: let the MAC manage PHY PM")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260306111629.1515676-4-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When operating in 10GBASE-KR mode with auto-negotiation disabled and RX
adaptation enabled, CRC errors can occur during the RX adaptation
process. This happens because the driver continues transmitting and
receiving packets while adaptation is in progress.
Fix this by stopping TX/RX immediately when the link goes down and RX
adaptation needs to be re-triggered, and only re-enabling TX/RX after
adaptation completes and the link is confirmed up. Introduce a flag to
track whether TX/RX was disabled for adaptation so it can be restored
correctly.
This prevents packets from being transmitted or received during the RX
adaptation window and avoids CRC errors from corrupted frames.
The flag tracking the data path state is synchronized with hardware
state in xgbe_start() to prevent stale state after device restarts.
This ensures that after a restart cycle (where xgbe_stop disables
TX/RX and xgbe_start re-enables them), the flag correctly reflects
that the data path is active.
Fixes: 4f3b20bfbb ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260306111629.1515676-3-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The link status bit is latched low to allow detection of momentary
link drops. If the status indicates that the link is already down,
read it again to obtain the current state.
Fixes: 4f3b20bfbb ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260306111629.1515676-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
mctp_flow_prepare_output() checks key->dev and may call
mctp_dev_set_key(), but it does not hold key->lock while doing so.
mctp_dev_set_key() and mctp_dev_release_key() are annotated with
__must_hold(&key->lock), so key->dev access is intended to be
serialized by key->lock. The mctp_sendmsg() transmit path reaches
mctp_flow_prepare_output() via mctp_local_output() -> mctp_dst_output()
without holding key->lock, so the check-and-set sequence is racy.
Example interleaving:
CPU0 CPU1
---- ----
mctp_flow_prepare_output(key, devA)
if (!key->dev) // sees NULL
mctp_flow_prepare_output(
key, devB)
if (!key->dev) // still NULL
mctp_dev_set_key(devB, key)
mctp_dev_hold(devB)
key->dev = devB
mctp_dev_set_key(devA, key)
mctp_dev_hold(devA)
key->dev = devA // overwrites devB
Now both devA and devB references were acquired, but only the final
key->dev value is tracked for release. One reference can be lost,
causing a resource leak as mctp_dev_release_key() would only decrease
the reference on one dev.
Fix by taking key->lock around the key->dev check and
mctp_dev_set_key() call.
Fixes: 67737c4572 ("mctp: Pass flow data & flow release events to drivers")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://patch.msgid.link/20260306031402.857224-1-dg573847474@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>