linux

Commit Graph

Author	SHA1	Message	Date
Xiang Mei	fa6e249633	bridge: mrp: reject zero test interval to avoid OOM panic br_mrp_start_test() and br_mrp_start_in_test() accept the user-supplied interval value from netlink without validation. When interval is 0, usecs_to_jiffies(0) yields 0, causing the delayed work (br_mrp_test_work_expired / br_mrp_in_test_work_expired) to reschedule itself with zero delay. This creates a tight loop on system_percpu_wq that allocates and transmits MRP test frames at maximum rate, exhausting all system memory and causing a kernel panic via OOM deadlock. The same zero-interval issue applies to br_mrp_start_in_test_parse() for interconnect test frames. Use NLA_POLICY_MIN(NLA_U32, 1) in the nla_policy tables for both IFLA_BRIDGE_MRP_START_TEST_INTERVAL and IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL, so zero is rejected at the netlink attribute parsing layer before the value ever reaches the workqueue scheduling code. This is consistent with how other bridge subsystems (br_fdb, br_mst) enforce range constraints on netlink attributes. Fixes: `20f6a05ef6` ("bridge: mrp: Rework the MRP netlink interface") Fixes: `7ab1748e4c` ("bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260328063000.1845376-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-31 16:11:24 +02:00
Yang Yang	850837965a	bridge: br_nd_send: validate ND option lengths br_nd_send() walks ND options according to option-provided lengths. A malformed option can make the parser advance beyond the computed option span or use a too-short source LLADDR option payload. Validate option lengths against the remaining NS option area before advancing, and only read source LLADDR when the option is large enough for an Ethernet address. Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Tested-by: Ao Zhou <n05ec@lzu.edu.cn> Co-developed-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260326034441.2037420-3-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:37:14 -07:00
Yang Yang	a01aee7caf	bridge: br_nd_send: linearize skb before parsing ND options br_nd_send() parses neighbour discovery options from ns->opt[] and assumes that these options are in the linear part of request. Its callers only guarantee that the ICMPv6 header and target address are available, so the option area can still be non-linear. Parsing ns->opt[] in that case can access data past the linear buffer. Linearize request before option parsing and derive ns from the linear network header. Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Tested-by: Ao Zhou <n05ec@lzu.edu.cn> Co-developed-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260326034441.2037420-2-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:37:14 -07:00
Hyunwoo Kim	3715a00855	bridge: cfm: Fix race condition in peer_mep deletion When a peer MEP is being deleted, cancel_delayed_work_sync() is called on ccm_rx_dwork before freeing. However, br_cfm_frame_rx() runs in softirq context under rcu_read_lock (without RTNL) and can re-schedule ccm_rx_dwork via ccm_rx_timer_start() between cancel_delayed_work_sync() returning and kfree_rcu() being called. The following is a simple race scenario: cpu0 cpu1 mep_delete_implementation() cancel_delayed_work_sync(ccm_rx_dwork); br_cfm_frame_rx() // peer_mep still in hlist if (peer_mep->ccm_defect) ccm_rx_timer_start() queue_delayed_work(ccm_rx_dwork) hlist_del_rcu(&peer_mep->head); kfree_rcu(peer_mep, rcu); ccm_rx_work_expired() // on freed peer_mep To prevent this, cancel_delayed_work_sync() is replaced with disable_delayed_work_sync() in both peer MEP deletion paths, so that subsequent queue_delayed_work() calls from br_cfm_frame_rx() are silently rejected. The cc_peer_disable() helper retains cancel_delayed_work_sync() because it is also used for the CC enable/disable toggle path where the work must remain re-schedulable. Fixes: `dc32cbb3db` ("bridge: cfm: Kernel space implementation of CFM. CCM frame RX added.") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/abBgYT5K_FI9rD1a@v4bel Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-12 18:33:52 -07:00
Fernando Fernandez Mancera	e5e8906305	net: bridge: fix nd_tbl NULL dereference when IPv6 is disabled When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never initialized because inet6_init() exits before ndisc_init() is called which initializes it. Then, if neigh_suppress is enabled and an ICMPv6 Neighbor Discovery packet reaches the bridge, br_do_suppress_nd() will dereference ipv6_stub->nd_tbl which is NULL, passing it to neigh_lookup(). This causes a kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000268 Oops: 0000 [#1] PREEMPT SMP NOPTI [...] RIP: 0010:neigh_lookup+0x16/0xe0 [...] Call Trace: <IRQ> ? neigh_lookup+0x16/0xe0 br_do_suppress_nd+0x160/0x290 [bridge] br_handle_frame_finish+0x500/0x620 [bridge] br_handle_frame+0x353/0x440 [bridge] __netif_receive_skb_core.constprop.0+0x298/0x1110 __netif_receive_skb_one_core+0x3d/0xa0 process_backlog+0xa0/0x140 __napi_poll+0x2c/0x170 net_rx_action+0x2c4/0x3a0 handle_softirqs+0xd0/0x270 do_softirq+0x3f/0x60 Fix this by replacing IS_ENABLED(IPV6) call with ipv6_mod_enabled() in the callers. This is in essence disabling NS/NA suppression when IPv6 is disabled. Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Reported-by: Guruprasad C P <gurucp2005@gmail.com> Closes: https://lore.kernel.org/netdev/CAHXs0ORzd62QOG-Fttqa2Cx_A_VFp=utE2H2VTX5nqfgs7LDxQ@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260304120357.9778-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-05 07:52:56 -08:00
Danielle Ratson	93c9475c04	bridge: Check relevant per-VLAN options in VLAN range grouping The br_vlan_opts_eq_range() function determines if consecutive VLANs can be grouped together in a range for compact netlink notifications. It currently checks state, tunnel info, and multicast router configuration, but misses two categories of per-VLAN options that affect the output: 1. User-visible priv_flags (neigh_suppress, mcast_enabled) 2. Port multicast context (mcast_max_groups, mcast_n_groups) When VLANs have different settings for these options, they are incorrectly grouped into ranges, causing netlink notifications to report only one VLAN's settings for the entire range. Fix by checking priv_flags equality, but only for flags that affect netlink output (BR_VLFLAG_NEIGH_SUPPRESS_ENABLED and BR_VLFLAG_MCAST_ENABLED), and comparing multicast context (mcast_max_groups and mcast_n_groups). Example showing the bugs before the fix: $ bridge vlan set vid 10 dev dummy1 neigh_suppress on $ bridge vlan set vid 11 dev dummy1 neigh_suppress off $ bridge -d vlan show dev dummy1 port vlan-id dummy1 10-11 ... neigh_suppress on $ bridge vlan set vid 10 dev dummy1 mcast_max_groups 100 $ bridge vlan set vid 11 dev dummy1 mcast_max_groups 200 $ bridge -d vlan show dev dummy1 port vlan-id dummy1 10-11 ... mcast_max_groups 100 After the fix, VLANs 10 and 11 are shown as separate entries with their correct individual settings. Fixes: `a1aee20d5d` ("net: bridge: Add netlink knobs for number / maximum MDB entries") Fixes: `83f6d60079` ("bridge: vlan: Allow setting VLAN neighbor suppression state") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260225143956.3995415-2-danieller@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-26 19:24:29 -08:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	8bf22c33e7	Including fixes from Netfilter. Current release - new code bugs: - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT - eth: mlx5e: XSK, Fix unintended ICOSQ change - phy_port: correctly recompute the port's linkmodes - vsock: prevent child netns mode switch from local to global - couple of kconfig fixes for new symbols Previous releases - regressions: - nfc: nci: fix false-positive parameter validation for packet data - net: do not delay zero-copy skbs in skb_attempt_defer_free() Previous releases - always broken: - mctp: ensure our nlmsg responses to user space are zero-initialised - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() - fixes for ICMP rate limiting Misc: - intel: fix PCI device ID conflict between i40e and ipw2200 Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmmXUh8ACgkQMUZtbf5S IrufYA//ZVj+4gvegqKwKZYXNBndVW00GGTYqaILbaenK1olUVUelVB91eV2Klc/ dXCeKG/MgEPuT89IjkPzVr2Yg4x6uhjcQL1rsahORn+GuQfSI/P8y7ysDOPnHVeM Rtsg1m8z3EizJcHPeAJe7nEqFzfvZ2m+FCEGe++z8BYaUZUVApytgpIWOHO/aB+p t13bCNzd05XxPphMl610T00Fncj2jCVDHILMgTB5rmFmkeJuQwNrRGXQSoQame46 +g+yCZjT0eVTrBaH1EUssWfrOT3VJj3BEee6gSp7k9mxMkbW18i8shBgmxS+EHjk u19wwBzSrHK+JY1UExim+1E/rZisQVmEE1Gs0ALedxAu9zC/Julzfa2/+BFsc0j7 QTXd4jukG3aTPIX8v3TV2Igu0j+bAT4WdpzvnsXXBMVKy7wFYMd1+aSOLyFH2W9L qRbg50oUATcsz77bZt6YUTJEgua4HXNYGtn15FMZOR7HJVR2L44Q5TK5mQxGp5iM GabeKMzg6bsjE98STM3nbWks3pIb9ptIk++i0913eSqKgn84bDPtp3Gabfgle2SJ 8gjKS61K8rDt5x8StXVod7oGQ4asL8RJyOtE/avgbWUu9BNH8/oKqsE6TQrpXauv 1ndiyim/mPe4fBCxkVAi2+uq5/ph9z8XyleESz9VYwyL3Rl4nsg= =qSCj -----END PGP SIGNATURE----- Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter. Current release - new code bugs: - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT - eth: mlx5e: XSK, Fix unintended ICOSQ change - phy_port: correctly recompute the port's linkmodes - vsock: prevent child netns mode switch from local to global - couple of kconfig fixes for new symbols Previous releases - regressions: - nfc: nci: fix false-positive parameter validation for packet data - net: do not delay zero-copy skbs in skb_attempt_defer_free() Previous releases - always broken: - mctp: ensure our nlmsg responses to user space are zero-initialised - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() - fixes for ICMP rate limiting Misc: - intel: fix PCI device ID conflict between i40e and ipw2200" * tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits) net: nfc: nci: Fix parameter validation for packet data net/mlx5e: Use unsigned for mlx5e_get_max_num_channels net/mlx5e: Fix deadlocks between devlink and netdev instance locks net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event net/mlx5: Fix misidentification of write combining CQE during poll loop net/mlx5e: Fix misidentification of ASO CQE during poll loop net/mlx5: Fix multiport device check over light SFs bonding: alb: fix UAF in rlb_arp_recv during bond up/down bnge: fix reserving resources from FW eth: fbnic: Advertise supported XDP features. rds: tcp: fix uninit-value in __inet_bind net/rds: Fix NULL pointer dereference in rds_tcp_accept_one octeontx2-af: Fix default entries mcam entry action net/mlx5e: XSK, Fix unintended ICOSQ change ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow() inet: move icmp_global_{credit,stamp} to a separate cache line icmp: prevent possible overflow in icmp_global_allow() selftests/net: packetdrill: add ipv4-mapped-ipv6 tests ...	2026-02-19 10:39:08 -08:00
Nikolay Aleksandrov	8b769e311a	net: bridge: mcast: always update mdb_n_entries for vlan contexts syzbot triggered a warning[1] about the number of mdb entries in a context. It turned out that there are multiple ways to trigger that warning today (some got added during the years), the root cause of the problem is that the increase is done conditionally, and over the years these different conditions increased so there were new ways to trigger the warning, that is to do a decrease which wasn't paired with a previous increase. For example one way to trigger it is with flush: $ ip l add br0 up type bridge vlan_filtering 1 mcast_snooping 1 $ ip l add dumdum up master br0 type dummy $ bridge mdb add dev br0 port dumdum grp 239.0.0.1 permanent vid 1 $ ip link set dev br0 down $ ip link set dev br0 type bridge mcast_vlan_snooping 1 ^^^^ this will enable snooping, but will not update mdb_n_entries because in __br_multicast_enable_port_ctx() we check !netif_running $ bridge mdb flush dev br0 ^^^ this will trigger the warning because it will delete the pg which we added above, which will try to decrease mdb_n_entries Fix the problem by removing the conditional increase and always keep the count up-to-date while the vlan exists. In order to do that we have to first initialize it on port-vlan context creation, and then always increase or decrease the value regardless of mcast options. To keep the current behaviour we have to enforce the mdb limit only if the context is port's or if the port-vlan's mcast snooping is enabled. [1] ------------[ cut here ]------------ n == 0 WARNING: net/bridge/br_multicast.c:718 at br_multicast_port_ngroups_dec_one net/bridge/br_multicast.c:718 [inline], CPU#0: syz.4.4607/22043 WARNING: net/bridge/br_multicast.c:718 at br_multicast_port_ngroups_dec net/bridge/br_multicast.c:771 [inline], CPU#0: syz.4.4607/22043 WARNING: net/bridge/br_multicast.c:718 at br_multicast_del_pg+0x1bbe/0x1e20 net/bridge/br_multicast.c:825, CPU#0: syz.4.4607/22043 Modules linked in: CPU: 0 UID: 0 PID: 22043 Comm: syz.4.4607 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/24/2026 RIP: 0010:br_multicast_port_ngroups_dec_one net/bridge/br_multicast.c:718 [inline] RIP: 0010:br_multicast_port_ngroups_dec net/bridge/br_multicast.c:771 [inline] RIP: 0010:br_multicast_del_pg+0x1bbe/0x1e20 net/bridge/br_multicast.c:825 Code: 41 5f 5d e9 04 7a 48 f7 e8 3f 73 5c f7 90 0f 0b 90 e9 cf fd ff ff e8 31 73 5c f7 90 0f 0b 90 e9 16 fd ff ff e8 23 73 5c f7 90 <0f> 0b 90 e9 60 fd ff ff e8 15 73 5c f7 eb 05 e8 0e 73 5c f7 48 8b RSP: 0018:ffffc9000c207220 EFLAGS: 00010293 RAX: ffffffff8a68042d RBX: ffff88807c6f1800 RCX: ffff888066e90000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff888066e90000 R09: 000000000000000c R10: 000000000000000c R11: 0000000000000000 R12: ffff8880303ef800 R13: dffffc0000000000 R14: ffff888050eb11c4 R15: 1ffff1100a1d6238 FS: 00007fa45921b6c0(0000) GS:ffff8881256f5000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4591f9ff8 CR3: 0000000081df2000 CR4: 00000000003526f0 Call Trace: <TASK> br_mdb_flush_pgs net/bridge/br_mdb.c:1525 [inline] br_mdb_flush net/bridge/br_mdb.c:1544 [inline] br_mdb_del_bulk+0x5e2/0xb20 net/bridge/br_mdb.c:1561 rtnl_mdb_del+0x48a/0x640 net/core/rtnetlink.c:-1 rtnetlink_rcv_msg+0x77e/0xbe0 net/core/rtnetlink.c:6967 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa45839aeb9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fa45921b028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fa458615fa0 RCX: 00007fa45839aeb9 RDX: 0000000000000000 RSI: 00002000000000c0 RDI: 0000000000000004 RBP: 00007fa458408c1f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fa458616038 R14: 00007fa458615fa0 R15: 00007fff0b59fae8 </TASK> Fixes: `b57e8d870d` ("net: bridge: Maintain number of MDB entries in net_bridge_mcast_port") Reported-by: syzbot+d5d1b7343531d17bd3c5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/aYrWbRp83MQR1ife@debil/T/#t Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://patch.msgid.link/20260213070031.1400003-2-nikolay@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-17 13:00:14 +01:00
Linus Torvalds	136114e0ab	mm.git review status for linus..mm-nonmm-stable Total patches: 107 Reviews/patch: 1.07 Reviewed rate: 67% - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" from Heming Zhao saves disk space by teaching ocfs2 to reclaim suballocator block group space. - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in various places. - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size. - The 7 patch series "kallsyms: Prevent invalid access when showing module buildid" from Petr Mladek cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces. - The 3 patch series "Address page fault in ima_restore_measurement_list()" from Harshit Mogalapalli fixes a kexec-related crash that can occur when booting the second-stage kernel on x86. - The 6 patch series "kho: ABI headers and Documentation updates" from Mike Rapoport updates the kexec handover ABI documentation. - The 4 patch series "Align atomic storage" from Finn Thain adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh. - The 2 patch series "kho: clean up page initialization logic" from Pratyush Yadav simplifies the page initialization logic in kho_restore_page(). - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves several things out of kernel.h and into more appropriate places. - The 7 patch series "don't abuse task_struct.group_leader" from Oleg Nesterov removes the usage of ->group_leader when it is "obviously unnecessary". - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin adds some infrastructure improvements to the live update orchestrator. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww= =mmsH -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...	2026-02-12 12:13:01 -08:00
Alice Mikityanska	b2936b4fd5	net/ipv6: Introduce payload_len helpers The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:03 -08:00
David Corvaglia	6dfa3df797	net: bridge: use sysfs_emit instead of sprintf Replace sprintf with sysfs_emit in sysfs show() methods as outlined in Documentation/filesystems/sysfs.rst. sysfs_emit is preferred to sprintf in sysfs show() methods as it is safer with buffer handling. Signed-off-by: David Corvaglia <david@corvaglia.dev> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/0100019c1fc2bcc3-bc9ca2f1-22d7-4250-8441-91e4af57117b-000000@email.amazonses.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-03 19:19:42 -08:00
Jakub Kicinski	a010fe8d86	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc8). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c `2c84959167` ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()") `f66086798f` ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-29 17:28:54 -08:00
Martin Kaiser	cc0cf10fda	net: bridge: fix static key check Fix the check if netfilter's static keys are available. netfilter defines and exports static keys if CONFIG_JUMP_LABEL is enabled. (HAVE_JUMP_LABEL is never defined.) Fixes: `971502d77f` ("bridge: netfilter: unroll NF_HOOK helper in bridge input path") Signed-off-by: Martin Kaiser <martin@kaiser.cx> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260127101925.1754425-1-martin@kaiser.cx Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-28 19:34:01 -08:00
David Yang	dd7fca9f49	net: bridge: mcast: fix memcpy with u64_stats On 64bit arches, struct u64_stats_sync is empty and provides no help against load/store tearing. memcpy() should not be considered atomic against u64 values. Use u64_stats_copy() instead. Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260120092137.2161162-3-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-25 13:14:11 -08:00
Randy Dunlap	24c776355f	kernel.h: drop hex.h and update all hex.h users Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:19 -08:00
Florian Westphal	910d271227	netfilter: don't include xt and nftables.h in unrelated subsystems conntrack, xtables and nftables are distinct subsystems, don't use them in other subystems. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-20 16:23:37 +01:00
Eric Dumazet	b25a0b4a21	net: bridge: annotate data-races around fdb->{updated,used} fdb->updated and fdb->used are read and written locklessly. Add READ_ONCE()/WRITE_ONCE() annotations. Fixes: `31cbc39b63` ("net: bridge: add option to allow activity notifications for any fdb entries") Reported-by: syzbot+bfab43087ad57222ce96@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695e3d74.050a0220.1c677c.035f.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260108093806.834459-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-09 17:36:21 -08:00
Jakub Kicinski	d6f6c6d909	netfilter pull request nf-26-01-02 -----BEGIN PGP SIGNATURE----- iQJdBAABCABHFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmlXqEwbFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyDRxmd0BzdHJsZW4uZGUACgkQcJGo2a1f9gBFeg// awoFVzkRjgk5RgGnZstWU5QIDRwdl6G+Q0OkuQSh3KNQak50CxbrY6X6zSdVSJjG y///iM1zONM46k0TIvYovHklYp4adTTEPEXISuUwzARfbD6X40qgsVCkBNrmr5fO l1cu2RXAcYzNOm9DrC+744z8KVeduoL6LFon0Xf4ah/eqxM7o92Tcj8dtPV1TsA8 8C+wVxZIw11OaM7H1fKhUMhQ6CnVc5OZOveO/lJxonaTIuCVULxPezEZQjpXNcnc hp5uMrip2BlecyeNFiPqDhqnVeU34xN3Zxns6Nq9zOcz5yfQg/LB/XGTx39HVljn 4v7ziGS+7qUQ9zc9LNID1jgJY6RNlj2+SkbavTfKpPQagKOR+BEIw7b4KBAsOL9l b8uDXLlUaWKTjb/DIVSWks5viwg1tbtsdyoeplcHWfP7miATp59es2FvD+DxT8HV stXhTWf0CDCLxiUHW9E8+QoQcnktjw9khoCZY/YZwfbIYf60uvlLEzO2G1Dt9SO6 cYfdlHLPpFcmQKhEOAFuOSRURqpyMHpwRgbul1yvR/ItW0hNA8j4oepo6g6crU+l WQ+l18ZtV0Aa/CxZ4eGcUBvonjf/n+XR8Hshukh9yRblF5BFDuSFIKWY7pMbkRO2 Es+C4Q6WMufRmovYhWqUaR4pHJFFSVe7JxBZ85VBVjI= =g2yW -----END PGP SIGNATURE----- Merge tag 'nf-26-01-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net The following patchset contains Netfilter fixes for net: 1) Fix overlap detection for nf_tables with concatenated ranges. There are cases where element could not be added due to a conflict with existing range, while kernel reports success to userspace. 2) update selftest to cover this bug. 3) synproxy update path should use READ/WRITE once as we replace config struct while packet path might read it in parallel. This relies on said config struct to fit sizeof(long). From Fernando Fernandez Mancera. 4) Don't return -EEXIST from xtables in module load path, a pending patch to module infra will spot a warning if this happens. From Daniel Gomez. 5) Fix a memory leak in nf_tables when chain hits 2*32 users and rule is to be hw-offloaded, from Zilin Guan. 6) Avoid infinite list growth when insert rate is high in nf_conncount, also from Fernando. tag 'nf-26-01-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conncount: update last_gc only when GC has been performed netfilter: nf_tables: fix memory leak in nf_tables_newrule() netfilter: replace -EEXIST with -EBUSY netfilter: nft_synproxy: avoid possible data-race on update operation selftests: netfilter: nft_concat_range.sh: add check for overlap detection bug netfilter: nft_set_pipapo: fix range overlap detection ==================== Link: https://patch.msgid.link/20260102114128.7007-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 10:59:59 -08:00
Alexandre Knecht	3128df6be1	bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is incorrectly stripped from frames during egress processing. br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also moves any "next" VLAN from the payload into hwaccel: /* move next vlan tag to hw accel tag */ __skb_vlan_pop(skb, &vlan_tci); __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); For QinQ frames where the C-VLAN sits in the payload, this moves it to hwaccel where it gets lost during VXLAN encapsulation. Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only the hwaccel S-VLAN and leaves the payload untouched. This path is only taken when vlan_tunnel is enabled and tunnel_info is configured, so 802.1Q bridges are unaffected. Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN preserved in VXLAN payload via tcpdump. Fixes: `11538d039a` ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 09:45:35 -08:00
Daniel Gomez	2bafeb8d2f	netfilter: replace -EEXIST with -EBUSY The -EEXIST error code is reserved by the module loading infrastructure to indicate that a module is already loaded. When a module's init function returns -EEXIST, userspace tools like kmod interpret this as "module already loaded" and treat the operation as successful, returning 0 to the user even though the module initialization actually failed. Replace -EEXIST with -EBUSY to ensure correct error reporting in the module initialization path. Affected modules: * ebtable_broute ebtable_filter ebtable_nat arptable_filter * ip6table_filter ip6table_mangle ip6table_nat ip6table_raw * ip6table_security iptable_filter iptable_mangle iptable_nat * iptable_raw iptable_security Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-01 11:31:48 +01:00
Bagas Sanjaya	f79f9b7ace	net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct Sphinx reports kernel-doc warning: WARNING: ./net/bridge/br_private.h:267 struct member 'tunnel_hash' not described in 'net_bridge_vlan_group' Fix it by describing @tunnel_hash member. Fixes: `efa5356b0d` ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251218042936.24175-2-bagasdotme@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-28 10:17:14 +01:00
Jakub Kicinski	1ec9871fbb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c `9222582ec5` ("Revert "wifi: ath12k: Fix missing station power save configuration"") `6917e268c4` ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig `b1d16f7c00` ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") `93f53db9f9` ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 09:27:40 -08:00
Nikolay Aleksandrov	ee87c63f9b	net: bridge: fix MST static key usage As Ido pointed out, the static key usage in MST is buggy and should use inc/dec instead of enable/disable because we can have multiple bridges with MST enabled which means a single bridge can disable MST for all. Use static_branch_inc/dec to avoid that. When destroying a bridge decrement the key if MST was enabled. Fixes: `ec7328b591` ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Reported-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/netdev/20251104120313.1306566-1-razor@blackwall.org/T/#m6888d87658f94ed1725433940f4f4ebb00b5a68b Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251105111919.1499702-3-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 07:32:17 -08:00
Nikolay Aleksandrov	8dca36978a	net: bridge: fix use-after-free due to MST port state bypass syzbot reported[1] a use-after-free when deleting an expired fdb. It is due to a race condition between learning still happening and a port being deleted, after all its fdbs have been flushed. The port's state has been toggled to disabled so no learning should happen at that time, but if we have MST enabled, it will bypass the port's state, that together with VLAN filtering disabled can lead to fdb learning at a time when it shouldn't happen while the port is being deleted. VLAN filtering must be disabled because we flush the port VLANs when it's being deleted which will stop learning. This fix adds a check for the port's vlan group which is initialized to NULL when the port is getting deleted, that avoids the port state bypass. When MST is enabled there would be a minimal new overhead in the fast-path because the port's vlan group pointer is cache-hot. [1] https://syzkaller.appspot.com/bug?extid=dd280197f0f7ab3917be Fixes: `ec7328b591` ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Reported-by: syzbot+dd280197f0f7ab3917be@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69088ffa.050a0220.29fc44.003d.GAE@google.com/ Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251105111919.1499702-2-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 07:32:17 -08:00
Petr Machata	68800bbf58	net: bridge: Flush multicast groups when snooping is disabled When forwarding multicast packets, the bridge takes MDB into account when IGMP / MLD snooping is enabled. Currently, when snooping is disabled, the MDB is retained, even though it is not used anymore. At the same time, during the time that snooping is disabled, the IGMP / MLD control packets are obviously ignored, and after the snooping is reenabled, the administrator has to assume it is out of sync. In particular, missed join and leave messages would lead to traffic being forwarded to wrong interfaces. Keeping the MDB entries around thus serves no purpose, and just takes memory. Note also that disabling per-VLAN snooping does actually flush the relevant MDB entries. This patch flushes non-permanent MDB entries as global snooping is disabled. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/5e992df1bb93b88e19c0ea5819e23b669e3dde5d.1761228273.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-27 17:57:21 -07:00
Hangbin Liu	0152747a52	net: bridge: use common function to compute the features Previously, bridge ignored all features propagation and DST retention, only handling explicitly the GSO limits. By switching to the new helper netdev_compute_master_upper_features(), the bridge now expose additional features, depending on the lowers capabilities. Since br_set_gso_limits() is already covered by the helper, it can be removed safely. Bridge has it's own way to update needed_headroom. So we don't need to update it in the helper. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251017034155.61990-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-21 18:08:23 -07:00
Alok Tiwari	0513a3f97b	net: bridge: correct debug message function name in br_fill_ifinfo The debug message in br_fill_ifinfo() incorrectly refers to br_fill_info instead of the actual function name. Update it for clarity in debugging output. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251013100121.755899-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-14 12:13:36 -07:00
Eric Woudstra	bbf0c98b3a	bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu() net/bridge/br_private.h:1627 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 7 locks held by socat/410: #0: ffff88800d7a9c90 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_stream_connect+0x43/0xa0 #1: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x62/0x1830 [..] #6: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: nf_hook.constprop.0+0x8a/0x440 Call Trace: lockdep_rcu_suspicious.cold+0x4f/0xb1 br_vlan_fill_forward_path_pvid+0x32c/0x410 [bridge] br_fill_forward_path+0x7a/0x4d0 [bridge] Use to correct helper, non _rcu variant requires RTNL mutex. Fixes: `bcf2766b13` ("net: bridge: resolve forwarding path for VLAN tag actions in bridge devices") Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-10-08 13:17:31 +02:00
Petr Machata	cd9a9562b2	net: bridge: Install FDB for bridge MAC on VLAN 0 Currently, after the bridge is created, the FDB does not hold an FDB entry for the bridge MAC on VLAN 0: # ip link add name br up type bridge # ip -br link show dev br br UNKNOWN 92:19:8c:4e:01:ed <BROADCAST,MULTICAST,UP,LOWER_UP> # bridge fdb show \| grep 92:19:8c:4e:01:ed 92:19:8c:4e:01:ed dev br vlan 1 master br permanent Later when the bridge MAC is changed, or in fact when the address is given during netdevice creation, the entry appears: # ip link add name br up address 00:11:22:33:44:55 type bridge # bridge fdb show \| grep 00:11:22:33:44:55 00:11:22:33:44:55 dev br vlan 1 master br permanent 00:11:22:33:44:55 dev br master br permanent However when the bridge address is set by the user to the current bridge address before the first port is enslaved, none of the address handlers gets invoked, because the address is not actually changed. The address is however marked as NET_ADDR_SET. Then when a port is enslaved, the address is not changed, because it is NET_ADDR_SET. Thus the VLAN 0 entry is not added, and it has not been added previously either: # ip link add name br up type bridge # ip -br link show dev br br UNKNOWN 7e:f0:a8:1a:be:c2 <BROADCAST,MULTICAST,UP,LOWER_UP> # ip link set dev br addr 7e:f0:a8:1a:be:c2 # ip link add name v up type veth # ip link set dev v master br # ip -br link show dev br br UNKNOWN 7e:f0:a8:1a:be:c2 <BROADCAST,MULTICAST,UP,LOWER_UP> # bridge fdb \| grep 7e:f0:a8:1a:be:c2 7e:f0:a8:1a:be:c2 dev br vlan 1 master br permanent Then when the bridge MAC is used as DMAC, and br_handle_frame_finish() looks up an FDB entry with VLAN=0, it doesn't find any, and floods the traffic instead of passing it up. Fix this by simply adding the VLAN 0 FDB entry for the bridge itself always on netdevice creation. This also makes the behavior consistent with how ports are treated: ports always have an FDB entry for each member VLAN as well as VLAN 0. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/415202b2d1b9b0899479a502bbe2ba188678f192.1758550408.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-23 17:10:49 -07:00
Marco Crivellari	5fd8bb982e	net: replace use of system_wq with system_percpu_wq Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20250918142427.309519-3-marco.crivellari@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-22 17:40:30 -07:00
Jakub Kicinski	bd569dd935	netfilter pull request nf-next-25-09-11 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmjC0WsNHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AOC2D/982GOJGI79ArhsM9FzEiJDp2QLxMnz7cw0t1Vg pU469JYA/He6VHMbhiqCBopSAskl36kMMGLbflPe+Zc1DNjconFD8wrDRBlUI+EI lR1dERMUyafuRMuuvKaE7A0ePyR7AD1JFf1znpo6BrRDMX8mbumcIA/L7/T1Rz2F 0b9Phbj77EdKruRgD4eXCuDJibOr5j6FymNvDGR/bm3SovOsOM4gi2dQ3Y3B85ZK aUDDrss5acypBfu36w+340eMtr1GJKoaCKoK9ScnO58peCSo9C0b+o6xwUBPnHdw sQOhl6fErOfN+agGir357jLsBUb1H7Tg9decwv9qHp3jESei3kgAyoBF4SqH+48S Bz1E2M0A0Hxve3/I/Pp1Hz6X5r+V+gDtx71z4+acoZg5YjDMAKFJAfgiK6sFKqZP 73QsvnU2omu2C5r8TAbryv01C5xmkbe3YyU8j4taZOBGk4YkPd5WjJELbTcVM4If dxDE6vQ9+GrrlBl9aidrMNxIhJuS/qjVIb0tv7oZPVql995AgPJG3F3gSO78OPBj V4dN9p/wPN9lRl5tj/odWiOfsdQiY7fEkqdXlO79sACQCzXC5O6zAwcnK37s80mQ QFSt3oiPFcacJJyHHYeYISy13HXlxwUgdxWC6T/sXMU5Jpic+3bWFxC2YgRcIsqZ ZE1JbA== =FXmW -----END PGP SIGNATURE----- Merge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next 1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH error. 2) Support fetching the current bridge ethernet address. This allows a more flexible approach to packet redirection on bridges without need to use hardcoded addresses. From Fernando Fernandez Mancera. 3) Zap a few no-longer needed conditionals from ipvs packet path and convert to READ/WRITE_ONCE to avoid KCSAN warnings. From Zhang Tengfei. 4) Remove a no-longer-used macro argument in ipset, from Zhen Ni. * tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_reject: don't reply to icmp error messages ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support netfilter: ipset: Remove unused htable_bits in macro ahash_region selftest:net: fixed spelling mistakes ==================== Link: https://patch.msgid.link/20250911143819.14753-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-12 17:06:25 -07:00
Petr Machata	21446c06b4	net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0 The previous patches introduced a new option, BR_BOOLOPT_FDB_LOCAL_VLAN_0. When enabled, it has local FDB entries installed only on VLAN 0, instead of duplicating them across all VLANs. In this patch, add the corresponding UAPI toggle, and the code for turning the feature on and off. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/ea99bfb10f687fa58091e6e1c2f8acc33f47ca45.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:50 -07:00
Petr Machata	a29aba64e0	net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the member ports as well as the bridge itself should not be created per-VLAN, but instead only on VLAN 0. Thus when a VLAN is added for a port or the bridge itself, a local FDB entry with the corresponding address should not be added when in the VLAN-0 mode. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/bb13ba01d58ed6d5d700e012c519d38ee6806d22.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:50 -07:00
Petr Machata	40df3b8e90	net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the bridge itself should not be created per-VLAN, but instead only on VLAN 0. When the bridge address changes, the local FDB entries need to be updated, which is done in br_fdb_change_mac_address(). Bail out early when in VLAN-0 mode, so that the per-VLAN FDB entries are not created. The per-VLAN walk is only done afterwards. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/0bd432cf91921ef7c4ed0e129de1d1cd358c716b.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:50 -07:00
Petr Machata	4cf5fd8497	net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for member ports should not be created per-VLAN, but instead only on VLAN 0. When the member port address changes, the local FDB entries need to be updated, which is done in br_fdb_changeaddr(). Under the VLAN-0 mode, only one local FDB entry will ever be added for a port's address, and that on VLAN 0. Thus bail out of the delete loop early. For the same reason, also skip adding the per-VLAN entries. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/0cf9d41836d2a245b0ce07e1a16ee05ca506cbe9.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:50 -07:00
Petr Machata	60d6be0931	net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the member ports as well as the bridge itself should not be created per-VLAN, but instead only on VLAN 0. That means that br_handle_frame_finish() needs to make two lookups: the primary lookup on an appropriate VLAN, and when that misses, a lookup on VLAN 0. Have the second lookup only accept local MAC addresses. Turning this into a generic second-lookup feature is not the goal. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/8087475009dce360fb68d873b1ed9c80827da302.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:50 -07:00
Petr Machata	c1164178e9	net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0 The following patches will gradually introduce the ability of the bridge to look up local FDB entries on VLAN 0 instead of using the VLAN indicated by a packet. In this patch, just introduce the option itself, with which the feature will be linked. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/ab85e33ef41ed19a3deaef0ff7da26830da30642.1757004393.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 19:02:49 -07:00
Jakub Kicinski	fc3a281041	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc6). Conflicts: net/netfilter/nft_set_pipapo.c net/netfilter/nft_set_pipapo_avx2.c `c4eaca2e10` ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") `84c1da7b38` ("netfilter: nft_set_pipapo: use avx2 algorithm for insertions too") Only trivial adjacent changes (in a doc and a Makefile). Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 17:40:13 -07:00
Fernando Fernandez Mancera	cbd2257dc9	netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support Expose the input bridge interface ethernet address so it can be used to redirect the packet to the receiving physical device for processing. Tested with nft command line tool. table bridge nat { chain PREROUTING { type filter hook prerouting priority 0; policy accept; ether daddr de:ad:00:00:be:ef meta pkttype set host ether daddr set meta ibrhwdr accept } } Joint work with Pablo Neira. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-11 15:40:55 +02:00
Petr Machata	8625f5748f	net: bridge: Bounce invalid boolopts The bridge driver currently tolerates options that it does not recognize. Instead, it should bounce them. Fixes: `a428afe82f` ("net: bridge: add support for user-controlled bool options") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/e6fdca3b5a8d54183fbda075daffef38bdd7ddce.1757070067.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-08 18:23:40 -07:00
Jakub Kicinski	5ef04a7b06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc5). No conflicts. Adjacent changes: include/net/sock.h `c51613fa27` ("net: add sk->sk_drop_counters") `5d6b58c932` ("net: lockless sock_i_ino()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-04 13:33:00 -07:00
Qianfeng Rong	46015e6b3e	netfilter: ebtables: Use vmalloc_array() to improve code Remove array_size() calls and replace vmalloc() with vmalloc_array() to simplify the code. vmalloc_array() is also optimized better, uses fewer instructions, and handles overflow more concisely[1]. [1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/ Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Wang Liang	479a54ab92	netfilter: br_netfilter: do not check confirmed bit in br_nf_local_in() after confirm When send a broadcast packet to a tap device, which was added to a bridge, br_nf_local_in() is called to confirm the conntrack. If another conntrack with the same hash value is added to the hash table, which can be triggered by a normal packet to a non-bridge device, the below warning may happen. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 96 at net/bridge/br_netfilter_hooks.c:632 br_nf_local_in+0x168/0x200 CPU: 1 UID: 0 PID: 96 Comm: tap_send Not tainted 6.17.0-rc2-dirty #44 PREEMPT(voluntary) RIP: 0010:br_nf_local_in+0x168/0x200 Call Trace: <TASK> nf_hook_slow+0x3e/0xf0 br_pass_frame_up+0x103/0x180 br_handle_frame_finish+0x2de/0x5b0 br_nf_hook_thresh+0xc0/0x120 br_nf_pre_routing_finish+0x168/0x3a0 br_nf_pre_routing+0x237/0x5e0 br_handle_frame+0x1ec/0x3c0 __netif_receive_skb_core+0x225/0x1210 __netif_receive_skb_one_core+0x37/0xa0 netif_receive_skb+0x36/0x160 tun_get_user+0xa54/0x10c0 tun_chr_write_iter+0x65/0xb0 vfs_write+0x305/0x410 ksys_write+0x60/0xd0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> ---[ end trace 0000000000000000 ]--- To solve the hash conflict, nf_ct_resolve_clash() try to merge the conntracks, and update skb->_nfct. However, br_nf_local_in() still use the old ct from local variable 'nfct' after confirm(), which leads to this warning. If confirm() does not insert the conntrack entry and return NF_DROP, the warning may also occur. There is no need to reserve the WARN_ON_ONCE, just remove it. Link: https://lore.kernel.org/netdev/20250820043329.2902014-1-wangliang74@huawei.com/ Fixes: `62e7151ae3` ("netfilter: bridge: confirm multicast packets before passing them up the stack") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Wang Liang <wangliang74@huawei.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-08-27 11:53:37 +02:00
Jakub Kicinski	a9af709fda	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 11:33:15 -07:00
Wang Liang	7de0eebbb4	net: bridge: remove unused argument of br_multicast_query_expired() Since commit `67b746f94f` ("net: bridge: mcast: make sure querier port/address updates are consistent"), the argument 'querier' is unused, just get rid of it. Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250814042355.1720755-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-15 10:26:56 -07:00
Wang Liang	d1547bf460	net: bridge: fix soft lockup in br_multicast_query_expired() When set multicast_query_interval to a large value, the local variable 'time' in br_multicast_send_query() may overflow. If the time is smaller than jiffies, the timer will expire immediately, and then call mod_timer() again, which creates a loop and may trigger the following soft lockup issue. watchdog: BUG: soft lockup - CPU#1 stuck for 221s! [rb_consumer:66] CPU: 1 UID: 0 PID: 66 Comm: rb_consumer Not tainted 6.16.0+ #259 PREEMPT(none) Call Trace: <IRQ> __netdev_alloc_skb+0x2e/0x3a0 br_ip6_multicast_alloc_query+0x212/0x1b70 __br_multicast_send_query+0x376/0xac0 br_multicast_send_query+0x299/0x510 br_multicast_query_expired.constprop.0+0x16d/0x1b0 call_timer_fn+0x3b/0x2a0 __run_timers+0x619/0x950 run_timer_softirq+0x11c/0x220 handle_softirqs+0x18e/0x560 __irq_exit_rcu+0x158/0x1a0 sysvec_apic_timer_interrupt+0x76/0x90 </IRQ> This issue can be reproduced with: ip link add br0 type bridge echo 1 > /sys/class/net/br0/bridge/multicast_querier echo 0xffffffffffffffff > /sys/class/net/br0/bridge/multicast_query_interval ip link set dev br0 up The multicast_startup_query_interval can also cause this issue. Similar to the commit `99b4061095` ("net: bridge: mcast: add and enforce query interval minimum"), add check for the query interval maximum to fix this issue. Link: https://lore.kernel.org/netdev/20250806094941.1285944-1-wangliang74@huawei.com/ Link: https://lore.kernel.org/netdev/20250812091818.542238-1-wangliang74@huawei.com/ Fixes: `d902eee43f` ("bridge: Add multicast count/interval sysfs entries") Suggested-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250813021054.1643649-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-14 17:49:33 -07:00
Ido Schimmel	3d05b24429	bridge: Redirect to backup port when port is administratively down If a backup port is configured for a bridge port, the bridge will redirect known unicast traffic towards the backup port when the primary port is administratively up but without a carrier. This is useful, for example, in MLAG configurations where a system is connected to two switches and there is a peer link between both switches. The peer link serves as the backup port in case one of the switches loses its connection to the multi-homed system. In order to avoid flooding when the primary port loses its carrier, the bridge does not flush dynamic FDB entries pointing to the port upon STP disablement, if the port has a backup port. The above means that known unicast traffic destined to the primary port will be blackholed when the port is put administratively down, until the FDB entries pointing to it are aged-out. Given that the current behavior is quite weird and unlikely to be depended on by anyone, amend the bridge to redirect to the backup port also when the primary port is administratively down and not only when it does not have a carrier. The change is motivated by a report from a user who expected traffic to be redirected to the backup port when the primary port was put administratively down while debugging a network issue. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250812080213.325298-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-14 17:45:36 -07:00

1 2 3 4 5 ...

2679 Commits