linux

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	4e45337556	ipv6: avoid overflows in ip6_datagram_send_ctl() Yiming Qian reported : <quote> I believe I found a locally triggerable kernel bug in the IPv6 sendmsg ancillary-data path that can panic the kernel via `skb_under_panic()` (local DoS). The core issue is a mismatch between: - a 16-bit length accumulator (`struct ipv6_txoptions::opt_flen`, type `__u16`) and - a pointer to the last provided destination-options header (`opt->dst1opt`) when multiple `IPV6_DSTOPTS` control messages (cmsgs) are provided. - `include/net/ipv6.h`: - `struct ipv6_txoptions::opt_flen` is `__u16` (wrap possible). (lines 291-307, especially 298) - `net/ipv6/datagram.c:ip6_datagram_send_ctl()`: - Accepts repeated `IPV6_DSTOPTS` and accumulates into `opt_flen` without rejecting duplicates. (lines 909-933) - `net/ipv6/ip6_output.c:__ip6_append_data()`: - Uses `opt->opt_flen + opt->opt_nflen` to compute header sizes/headroom decisions. (lines 1448-1466, especially 1463-1465) - `net/ipv6/ip6_output.c:__ip6_make_skb()`: - Calls `ipv6_push_frag_opts()` if `opt->opt_flen` is non-zero. (lines 1930-1934) - `net/ipv6/exthdrs.c:ipv6_push_frag_opts()` / `ipv6_push_exthdr()`: - Push size comes from `ipv6_optlen(opt->dst1opt)` (based on the pointed-to header). (lines 1179-1185 and 1206-1211) 1. `opt_flen` is a 16-bit accumulator: - `include/net/ipv6.h:298` defines `__u16 opt_flen; /* after fragment hdr /`. 2. `ip6_datagram_send_ctl()` accepts repeated* `IPV6_DSTOPTS` cmsgs and increments `opt_flen` each time: - In `net/ipv6/datagram.c:909-933`, for `IPV6_DSTOPTS`: - It computes `len = ((hdr->hdrlen + 1) << 3);` - It checks `CAP_NET_RAW` using `ns_capable(net->user_ns, CAP_NET_RAW)`. (line 922) - Then it does: - `opt->opt_flen += len;` (line 927) - `opt->dst1opt = hdr;` (line 928) There is no duplicate rejection here (unlike the legacy `IPV6_2292DSTOPTS` path which rejects duplicates at `net/ipv6/datagram.c:901-904`). If enough large `IPV6_DSTOPTS` cmsgs are provided, `opt_flen` wraps while `dst1opt` still points to a large (2048-byte) destination-options header. In the attached PoC (`poc.c`): - 32 cmsgs with `hdrlen=255` => `len = (255+1)8 = 2048` - 1 cmsg with `hdrlen=0` => `len = 8` - Total increment: `322048 + 8 = 65544`, so `(__u16)opt_flen == 8` - The last cmsg is 2048 bytes, so `dst1opt` points to a 2048-byte header. 3. The transmit path sizes headers using the wrapped `opt_flen`: - In `net/ipv6/ip6_output.c:1463-1465`: - `headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen + opt->opt_nflen : 0) + ...;` With wrapped `opt_flen`, `headersize`/headroom decisions underestimate what will be pushed later. 4. When building the final skb, the actual push length comes from `dst1opt` and is not limited by wrapped `opt_flen`: - In `net/ipv6/ip6_output.c:1930-1934`: - `if (opt->opt_flen) proto = ipv6_push_frag_opts(skb, opt, proto);` - In `net/ipv6/exthdrs.c:1206-1211`, `ipv6_push_frag_opts()` pushes `dst1opt` via `ipv6_push_exthdr()`. - In `net/ipv6/exthdrs.c:1179-1184`, `ipv6_push_exthdr()` does: - `skb_push(skb, ipv6_optlen(opt));` - `memcpy(h, opt, ipv6_optlen(opt));` With insufficient headroom, `skb_push()` underflows and triggers `skb_under_panic()` -> `BUG()`: - `net/core/skbuff.c:2669-2675` (`skb_push()` calls `skb_under_panic()`) - `net/core/skbuff.c:207-214` (`skb_panic()` ends in `BUG()`) - The `IPV6_DSTOPTS` cmsg path requires `CAP_NET_RAW` in the target netns user namespace (`ns_capable(net->user_ns, CAP_NET_RAW)`). - Root (or any task with `CAP_NET_RAW`) can trigger this without user namespaces. - An unprivileged `uid=1000` user can trigger this if unprivileged user namespaces are enabled and it can create a userns+netns to obtain namespaced `CAP_NET_RAW` (the attached PoC does this). - Local denial of service: kernel BUG/panic (system crash). - Reproducible with a small userspace PoC. </quote> This patch does not reject duplicated options, as this might break some user applications. Instead, it makes sure to adjust opt_flen and opt_nflen to correctly reflect the size of the current option headers, preventing the overflows and the potential for panics. This applies to IPV6_DSTOPTS, IPV6_HOPOPTS, and IPV6_RTHDR. Specifically: When a new IPV6_DSTOPTS is processed, the length of the old opt->dst1opt is subtracted from opt->opt_flen before adding the new length. When a new IPV6_HOPOPTS is processed, the length of the old opt->dst0opt is subtracted from opt->opt_nflen. When a new Routing Header (IPV6_RTHDR or IPV6_2292RTHDR) is processed, the length of the old opt->srcrt is subtracted from opt->opt_nflen. In the special case within IPV6_2292RTHDR handling where dst1opt is moved to dst0opt, the length of the old opt->dst0opt is subtracted from opt->opt_nflen before the new one is added. Fixes: `333fad5364` ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8JNzawgr5OX5m+3jnQDHry2XxhQT5=jThW1zDPtUikRYA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401154721.3740056-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:25:22 -07:00
Hangbin Liu	ffb5a4843c	ipv6: fix data race in fib6_metric_set() using cmpxchg fib6_metric_set() may be called concurrently from softirq context without holding the FIB table lock. A typical path is: ndisc_router_discovery() spin_unlock_bh(&table->tb6_lock) <- lock released fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call When two CPUs process Router Advertisement packets for the same router simultaneously, they can both arrive at fib6_metric_set() with the same fib6_info pointer whose fib6_metrics still points to dst_default_metrics. if (f6i->fib6_metrics == &dst_default_metrics) { /* both CPUs: true / struct dst_metrics p = kzalloc_obj(p, GFP_ATOMIC); refcount_set(&p->refcnt, 1); f6i->fib6_metrics = p; / CPU1 overwrites CPU0's p -> p0 leaked */ } The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer to it anywhere in memory, producing a kmemleak report: unreferenced object 0xff1100025aca1400 (size 96): comm "softirq", pid 0, jiffies 4299271239 backtrace: kmalloc_trace+0x28a/0x380 fib6_metric_set+0xcd/0x180 ndisc_router_discovery+0x12dc/0x24b0 icmpv6_rcv+0xc16/0x1360 Fix this by: - Set val for p->metrics before published via cmpxchg() so the metrics value is ready before the pointer becomes visible to other CPUs. - Replace the plain pointer store with cmpxchg() and free the allocation safely when competition failed. - Add READ_ONCE()/WRITE_ONCE() for metrics[] setting in the non-default metrics path to prevent compiler-based data races. Fixes: `d4ead6b34b` ("net/ipv6: move metrics from dst to rt6_info") Reported-by: Fei Liu <feliu@redhat.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260331-b4-fib6_metric_set-kmemleak-v3-1-88d27f4d8825@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 17:44:35 -07:00
Zhengchuan Liang	9ca562bb8e	net: ipv6: flowlabel: defer exclusive option free until RCU teardown `ip6fl_seq_show()` walks the global flowlabel hash under the seq-file RCU read-side lock and prints `fl->opt->opt_nflen` when an option block is present. Exclusive flowlabels currently free `fl->opt` as soon as `fl->users` drops to zero in `fl_release()`. However, the surrounding `struct ip6_flowlabel` remains visible in the global hash table until later garbage collection removes it and `fl_free_rcu()` finally tears it down. A concurrent `/proc/net/ip6_flowlabel` reader can therefore race that early `kfree()` and dereference freed option state, triggering a crash in `ip6fl_seq_show()`. Fix this by keeping `fl->opt` alive until `fl_free_rcu()`. That matches the lifetime already required for the enclosing flowlabel while readers can still reach it under RCU. Fixes: `d3aedd5ebd` ("ipv6 flowlabel: Convert hash list to RCU.") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/07351f0ec47bcee289576f39f9354f4a64add6e4.1774855883.git.zcliangcn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-31 15:44:29 -07:00
Paolo Abeni	fd63f18597	ipv6: prevent possible UaF in addrconf_permanent_addr() The mentioned helper try to warn the user about an exceptional condition, but the message is delivered too late, accessing the ipv6 after its possible deletion. Reorder the statement to avoid the possible UaF; while at it, place the warning outside the idev->lock as it needs no protection. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://sashiko.dev/#/patchset/8c8bfe2e1a324e501f0e15fef404a77443fd8caf.1774365668.git.pabeni%40redhat.com Fixes: `f1705ec197` ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ef973c3a8cb4f8f1787ed469f3e5391b9fe95aa0.1774601542.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:25:10 -07:00
Eric Dumazet	2edfa31769	ip6_tunnel: clear skb2->cb[] in ip4ip6_err() Oskar Kjos reported the following problem. ip4ip6_err() calls icmp_send() on a cloned skb whose cb[] was written by the IPv6 receive path as struct inet6_skb_parm. icmp_send() passes IPCB(skb2) to __ip_options_echo(), which interprets that cb[] region as struct inet_skb_parm (IPv4). The layouts differ: inet6_skb_parm.nhoff at offset 14 overlaps inet_skb_parm.opt.rr, producing a non-zero rr value. __ip_options_echo() then reads optlen from attacker-controlled packet data at sptr[rr+1] and copies that many bytes into dopt->__data, a fixed 40-byte stack buffer (IP_OPTIONS_DATA_FIXED_SIZE). To fix this we clear skb2->cb[], as suggested by Oskar Kjos. Also add minimal IPv4 header validation (version == 4, ihl >= 5). Fixes: `c4d3efafcc` ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.") Reported-by: Oskar Kjos <oskar.kjos@hotmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260326155138.2429480-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:24:12 -07:00
Eric Dumazet	86ab3e5567	ipv6: icmp: clear skb2->cb[] in ip6_err_gen_icmpv6_unreach() Sashiko AI-review observed: In ip6_err_gen_icmpv6_unreach(), the skb is an outer IPv4 ICMP error packet where its cb contains an IPv4 inet_skb_parm. When skb is cloned into skb2 and passed to icmp6_send(), it uses IP6CB(skb2). IP6CB interprets the IPv4 inet_skb_parm as an inet6_skb_parm. The cipso offset in inet_skb_parm.opt directly overlaps with dsthao in inet6_skb_parm at offset 18. If an attacker sends a forged ICMPv4 error with a CIPSO IP option, dsthao would be a non-zero offset. Inside icmp6_send(), mip6_addr_swap() is called and uses ipv6_find_tlv(skb, opt->dsthao, IPV6_TLV_HAO). This would scan the inner, attacker-controlled IPv6 packet starting at that offset, potentially returning a fake TLV without checking if the remaining packet length can hold the full 18-byte struct ipv6_destopt_hao. Could mip6_addr_swap() then perform a 16-byte swap that extends past the end of the packet data into skb_shared_info? Should the cb array also be cleared in ip6_err_gen_icmpv6_unreach() and ip6ip6_err() to prevent this? This patch implements the first suggestion. I am not sure if ip6ip6_err() needs to be changed. A separate patch would be better anyway. Fixes: `ca15a078bd` ("sit: generate icmpv6 error when receiving icmpv4 error") Reported-by: Ido Schimmel <idosch@nvidia.com> Closes: https://sashiko.dev/#/patchset/20260326155138.2429480-1-edumazet%40google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Oskar Kjos <oskar.kjos@hotmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260326202608.2976021-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:22:46 -07:00
Pengpeng Hou	5e67ba9bb5	net/ipv6: ioam6: prevent schema length wraparound in trace fill ioam6_fill_trace_data() stores the schema contribution to the trace length in a u8. With bit 22 enabled and the largest schema payload, sclen becomes 1 + 1020 / 4, wraps from 256 to 0, and bypasses the remaining-space check. __ioam6_fill_trace_data() then positions the write cursor without reserving the schema area but still copies the 4-byte schema header and the full schema payload, overrunning the trace buffer. Keep sclen in an unsigned int so the remaining-space check and the write cursor calculation both see the full schema length. Fixes: `8c6f6fa677` ("ipv6: ioam: IOAM Generic Netlink API") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2026-03-27 12:05:36 +00:00
Yochai Eisenrich	ae05340cca	net: ipv6: ndisc: fix ndisc_ra_useropt to initialize nduseropt_padX fields to zero to prevent an info-leak When processing Router Advertisements with user options the kernel builds an RTM_NEWNDUSEROPT netlink message. The nduseroptmsg struct has three padding fields that are never zeroed and can leak kernel data The fix is simple, just zeroes the padding fields. Fixes: `31910575a9` ("[IPv6]: Export userland ND options through netlink (RDNSS support)") Signed-off-by: Yochai Eisenrich <echelonh@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324224925.2437775-1-echelonh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:38:35 -07:00
Ren Wei	9d3f027327	netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check() Reject rt match rules whose addrnr exceeds IP6T_RT_HOPS. rt_mt6() expects addrnr to stay within the bounds of rtinfo->addrs[]. Validate addrnr during rule installation so malformed rules are rejected before the match logic can use an out-of-range value. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Yuhang Zheng <z1652074432@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:31 +01:00
Paolo Abeni	51a209ee33	ipsec-2026-03-23 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmnA+cgACgkQrB3Eaf9P W7fJgBAAlZKkRki11NUIeI8IjOzEoMRShSsbOMjeCVBUDKc05krfWyln1FLuQbD/ BNSgRNFQ0uT653Cn88CbVRtxuebkmhde7bH29yEpfnsd/duVDlJaHkwjCEH15hvb zIeWrzdn+ct77Kg6i1EsJ5BfC7kADYWfgCFrSAAz2MEerCGNcLn2pKlopAEIGAD9 Ahd7XohBK9uxP8ZhF4GLQAjTImTDEQmJJek0QDdGp6sr+V0PuIh1MQ75SjW+9rZK 4p+rHhsOGCcjobljbksYTJd9/5hC2ThqsYBBbRsxS+g9ibvMvDoal2PCtBA7SnHZ F66PL8Lui555V4jL80Fi80Mu/uquizOX0iMiVjhAtepiqxn9IZleXutddPN/9yCg tHlk7IytBSovGBBT/AdL6F8hOVvwAFa/pnr/6pzjcjmiIkwSLMCU0ge/yjF01vGK tnltSGfuZ9+aF6XEjAmIZ2jMbA7mtKIoc9VOJB5/96yFS3G48/E7Aq6SNYIF8vyB N6xgdbhqp4PfIYuQ+zWcibj2XAGlXW9RF34i2CSbf7BlztetoctS8iuHlUWIlkS3 dcYAp7/ZQWRM779pg9pTKw7kGUwPlS0LbUBr4Z8nvcxdBUULuKc+9PAgRO3nX1v0 7EbIukGdhc+hvM8zC/aok8g6h8cPNvvaaL8CLL+wSYt28/xHrLs= =E39n -----END PGP SIGNATURE----- Merge tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-03-23 1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi. From Sabrina Dubroca. 2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the proper check. From Sabrina Dubroca. 3) Call xdo_dev_state_delete during state update to properly cleanup the xdo device state. From Sabrina Dubroca. 4) Fix a potential skb leak in espintcp when async crypto is used. From Sabrina Dubroca. 5) Validate inner IPv4 header length in IPTFS payload to avoid parsing malformed packets. From Roshan Kumar. 6) Fix skb_put() panic on non-linear skb during IPTFS reassembly. From Fernando Fernandez Mancera. 7) Silence various sparse warnings related to RCU, state, and policy handling. From Sabrina Dubroca. 8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini(). From Hyunwoo Kim. 9) Prevent policy_hthresh.work from racing with netns teardown by using a proper cleanup mechanism. From Minwoo Ra. 10) Validate that the family of the source and destination addresses match in pfkey_send_migrate(). From Eric Dumazet. 11) Only publish mode_data after the clone is setup in the IPTFS receive path. This prevents leaving x->mode_data pointing at freed memory on error. From Paul Moses. Please pull or let me know if there are problems. ipsec-2026-03-23 * tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: iptfs: only publish mode_data after clone setup af_key: validate families in pfkey_send_migrate() xfrm: prevent policy_hthresh.work from racing with netns teardown xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() xfrm: avoid RCU warnings around the per-netns netlink socket xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini} xfrm: state: silence sparse warnings during netns exit xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock xfrm: state: fix sparse warnings around XFRM_STATE_INSERT xfrm: state: fix sparse warnings in xfrm_state_init xfrm: state: fix sparse warnings on xfrm_state_hold_rcu xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly xfrm: iptfs: validate inner IPv4 header length in IPTFS payload esp: fix skb leak with espintcp and async crypto xfrm: call xdo_dev_state_delete during state update xfrm: fix the condition on x->pcpu_num in xfrm_sa_len xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi ==================== Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-24 15:16:28 +01:00
Kuniyuki Iwashima	4be7b99c25	ipv6: Don't remove permanent routes with exceptions from tb6_gc_hlist. The cited commit mechanically put fib6_remove_gc_list() just after every fib6_clean_expires() call. When a temporary route is promoted to a permanent route, there may already be exception routes tied to it. If fib6_remove_gc_list() removes the route from tb6_gc_hlist, such exception routes will no longer be aged. Let's replace fib6_remove_gc_list() with a new helper fib6_may_remove_gc_list() and use fib6_age_exceptions() there. Note that net->ipv6 is only compiled when CONFIG_IPV6 is enabled, so fib6_{add,remove,may_remove}_gc_list() are guarded. Fixes: `5eb902b8e7` ("net/ipv6: Remove expired routes with a separated list of routes.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 16:59:31 -07:00
Kuniyuki Iwashima	6af51e9f31	ipv6: Remove permanent routes from tb6_gc_hlist when all exceptions expire. Commit `5eb902b8e7` ("net/ipv6: Remove expired routes with a separated list of routes.") introduced a per-table GC list and changed GC to iterate over that list instead of traversing the entire route table. However, it forgot to add permanent routes to tb6_gc_hlist when exception routes are added. Commit `cfe82469a0` ("ipv6: add exception routes to GC list in rt6_insert_exception") fixed that issue but introduced another one. Even after all exception routes expire, the permanent routes remain in tb6_gc_hlist, potentially negating the performance benefits intended by the initial change. Let's count gc_args->more before and after rt6_age_exceptions() and remove the permanent route when the delta is 0. Note that the next patch will reuse fib6_age_exceptions(). Fixes: `cfe82469a0` ("ipv6: add exception routes to GC list in rt6_insert_exception") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 16:59:31 -07:00
Minhong He	0641379352	ipv6: add NULL checks for idev in SRv6 paths __in6_dev_get() can return NULL when the device has no IPv6 configuration (e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER). Add NULL checks for idev returned by __in6_dev_get() in both seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL pointer dereferences. Fixes: `1ababeba4a` ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Fixes: `bf355b8d2c` ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Minhong He <heminhong@kylinos.cn> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20260316073301.106643-1-heminhong@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 17:23:43 -07:00
Jakub Kicinski	94a4b1f959	ipv6: move the disable_ipv6_mod knob to core code From: Jakub Kicinski <kuba@kernel.org> Make sure disable_ipv6_mod itself is not part of the IPv6 module, in case core code wants to refer to it. We will remove support for IPv6=m soon, this change helps make fixes we commit before that less messy. Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-1-e2677e85628c@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 17:53:37 -07:00
Jiayuan Chen	21ec92774d	net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop When a standalone IPv6 nexthop object is created with a loopback device (e.g., "ip -6 nexthop add id 100 dev lo"), fib6_nh_init() misclassifies it as a reject route. This is because nexthop objects have no destination prefix (fc_dst=::), causing fib6_is_reject() to match any loopback nexthop. The reject path skips fib_nh_common_init(), leaving nhc_pcpu_rth_output unallocated. If an IPv4 route later references this nexthop, __mkroute_output() dereferences NULL nhc_pcpu_rth_output and panics. Simplify the check in fib6_nh_init() to only match explicit reject routes (RTF_REJECT) instead of using fib6_is_reject(). The loopback promotion heuristic in fib6_is_reject() is handled separately by ip6_route_info_create_nh(). After this change, the three cases behave as follows: 1. Explicit reject route ("ip -6 route add unreachable 2001:db8::/64"): RTF_REJECT is set, enters reject path, skips fib_nh_common_init(). No behavior change. 2. Implicit loopback reject route ("ip -6 route add 2001:db8::/32 dev lo"): RTF_REJECT is not set, takes normal path, fib_nh_common_init() is called. ip6_route_info_create_nh() still promotes it to reject afterward. nhc_pcpu_rth_output is allocated but unused, which is harmless. 3. Standalone nexthop object ("ip -6 nexthop add id 100 dev lo"): RTF_REJECT is not set, takes normal path, fib_nh_common_init() is called. nhc_pcpu_rth_output is properly allocated, fixing the crash when IPv4 routes reference this nexthop. Suggested-by: Ido Schimmel <idosch@nvidia.com> Fixes: `493ced1ac4` ("ipv4: Allow routes to use nexthop objects") Reported-by: syzbot+334190e097a98a1b81bb@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/698f8482.a70a0220.2c38d7.00ca.GAE@google.com/T/ Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260304113817.294966-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-05 07:53:17 -08:00
Eric Dumazet	165573e41f	tcp: secure_seq: add back ports to TS offset This reverts `28ee1b746f` ("secure_seq: downgrade to per-host timestamp offsets") tcp_tw_recycle went away in 2017. Zhouyan Deng reported off-path TCP source port leakage via SYN cookie side-channel that can be fixed in multiple ways. One of them is to bring back TCP ports in TS offset randomization. As a bonus, we perform a single siphash() computation to provide both an ISN and a TS offset. Fixes: `28ee1b746f` ("secure_seq: downgrade to per-host timestamp offsets") Reported-by: Zhouyan Deng <dengzhouyan_nwpu@163.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260302205527.1982836-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-04 17:44:35 -08:00
Eric Biggers	46d0d6f50d	net/tcp-md5: Fix MAC comparison to be constant-time To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: `cfb6eeb4c8` ("[TCP]: MD5 Signature Option (RFC2385) support.") Fixes: `658ddaaf66` ("tcp: md5: RST: getting md5 key from listener") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20260302203409.13388-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-03 18:39:43 -08:00
Jakub Kicinski	2ffb4f5c2c	ipv6: fix NULL pointer deref in ip6_rt_get_dev_rcu() l3mdev_master_dev_rcu() can return NULL when the slave device is being un-slaved from a VRF. All other callers deal with this, but we lost the fallback to loopback in ip6_rt_pcpu_alloc() -> ip6_rt_get_dev_rcu() with commit `4832c30d54` ("net: ipv6: put host and anycast routes on device with address"). KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f] RIP: 0010:ip6_rt_pcpu_alloc (net/ipv6/route.c:1418) Call Trace: ip6_pol_route (net/ipv6/route.c:2318) fib6_rule_lookup (net/ipv6/fib6_rules.c:115) ip6_route_output_flags (net/ipv6/route.c:2607) vrf_process_v6_outbound (drivers/net/vrf.c:437) I was tempted to rework the un-slaving code to clear the flag first and insert synchronize_rcu() before we remove the upper. But looks like the explicit fallback to loopback_dev is an established pattern. And I guess avoiding the synchronize_rcu() is nice, too. Fixes: `4832c30d54` ("net: ipv6: put host and anycast routes on device with address") Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260301194548.927324-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-03 17:14:48 -08:00
Eric Dumazet	29252397bc	inet: annotate data-races around isk->inet_num UDP/TCP lookups are using RCU, thus isk->inet_num accesses should use READ_ONCE() and WRITE_ONCE() where needed. Fixes: `3ab5aee7fe` ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260225203545.1512417-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-27 17:16:59 -08:00
Linus Torvalds	b9c8fc2cae	Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: bnxt_en: fix deleting of Ntuple filters - eth: wan: farsync: fix use-after-free bugs caused by unfinished tasklets - eth: xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - eth: gve: fix incorrect buffer cleanup for QPL - eth: team: avoid NETDEV_CHANGEMTU event when unregistering slave - eth: usb: validate USB endpoints Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmmgYU4SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkLBgQAINazHstJ0DoDkvmwXapRSN0Ffauyd46 oX6nfeWOT3BzZbAhZHtGgCSs4aULifJWMevtT7pq7a7PgZwMwfa47BugR1G/u5UE hCqalNjRTB/U2KmFk6eViKSacD4FvUIAyAMOotn1aEdRRAkBIJnIW/o/ZR9ZUkm0 5+UigO64aq57+FOc5EQdGjYDcTVdzW12iOZ8ZqwtSATdNd9aC+gn3voRomTEo+Fm kQinkFEPAy/YyHGmfpC/z87/RTgkYLpagmsT4ZvBJeNPrIRvFEibSpPNhuzTzg81 /BW5M8sJmm3XFiTiRp6Blv+0n6HIpKjAZMHn5c9hzX9cxPZQ24EjkXEex9ClaxLd OMef79rr1HBwqBTpIlK7xfLKCdT5Iex88s8HxXRB/Psqk9pVP469cSoK6cpyiGiP I+4WT0wn9ukTiu/yV2L2byVr1sanlu54P+UBYJpDwqq3lZ1ngWtkJ+SY369jhwAS FYIBmUSKhmWz3FEULaGpgPy4m9Fl/fzN8IFh2Buoc/Puq61HH7MAMjRty2ZSFTqj gbHrRhlkCRqubytgjsnCDPLoJF4ZYcXtpo/8ogG3641H1I+dN+DyGGVZ/ioswkks My1ds0rKqA3BHCmn+pN/qqkuopDCOB95dqOpgDqHG7GePrpa/FJ1guhxexsCd+nL Run2RcgDmd+d =HBOu -----END PGP SIGNATURE----- Merge tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...	2026-02-26 08:00:13 -08:00
Sabrina Dubroca	0c0eef8ccd	esp: fix skb leak with espintcp and async crypto When the TX queue for espintcp is full, esp_output_tail_tcp will return an error and not free the skb, because with synchronous crypto, the common xfrm output code will drop the packet for us. With async crypto (esp_output_done), we need to drop the skb when esp_output_tail_tcp returns an error. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2026-02-25 09:11:40 +01:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Kuniyuki Iwashima	470c7ca2b4	udplite: Fix null-ptr-deref in __udp_enqueue_schedule_skb(). syzbot reported null-ptr-deref of udp_sk(sk)->udp_prod_queue. [0] Since the cited commit, udp_lib_init_sock() can fail, as can udp_init_sock() and udpv6_init_sock(). Let's handle the error in udplite_sk_init() and udplitev6_sk_init(). [0]: BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:82 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline] BUG: KASAN: null-ptr-deref in __udp_enqueue_schedule_skb+0x151/0x1480 net/ipv4/udp.c:1719 Read of size 4 at addr 0000000000000008 by task syz.2.18/2944 CPU: 1 UID: 0 PID: 2944 Comm: syz.2.18 Not tainted syzkaller #0 PREEMPTLAZY Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 Call Trace: <IRQ> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 kasan_report+0xa2/0xe0 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:-1 [inline] kasan_check_range+0x264/0x2c0 mm/kasan/generic.c:200 instrument_atomic_read include/linux/instrumented.h:82 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline] __udp_enqueue_schedule_skb+0x151/0x1480 net/ipv4/udp.c:1719 __udpv6_queue_rcv_skb net/ipv6/udp.c:795 [inline] udpv6_queue_rcv_one_skb+0xa2e/0x1ad0 net/ipv6/udp.c:906 udp6_unicast_rcv_skb+0x227/0x380 net/ipv6/udp.c:1064 ip6_protocol_deliver_rcu+0xe17/0x1540 net/ipv6/ip6_input.c:438 ip6_input_finish+0x191/0x350 net/ipv6/ip6_input.c:489 NF_HOOK+0x354/0x3f0 include/linux/netfilter.h:318 ip6_input+0x16c/0x2b0 net/ipv6/ip6_input.c:500 NF_HOOK+0x354/0x3f0 include/linux/netfilter.h:318 __netif_receive_skb_one_core net/core/dev.c:6149 [inline] __netif_receive_skb+0xd3/0x370 net/core/dev.c:6262 process_backlog+0x4d6/0x1160 net/core/dev.c:6614 __napi_poll+0xae/0x320 net/core/dev.c:7678 napi_poll net/core/dev.c:7741 [inline] net_rx_action+0x60d/0xdc0 net/core/dev.c:7893 handle_softirqs+0x209/0x8d0 kernel/softirq.c:622 do_softirq+0x52/0x90 kernel/softirq.c:523 </IRQ> <TASK> __local_bh_enable_ip+0xe7/0x120 kernel/softirq.c:450 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline] __dev_queue_xmit+0x109c/0x2dc0 net/core/dev.c:4856 __ip6_finish_output net/ipv6/ip6_output.c:-1 [inline] ip6_finish_output+0x158/0x4e0 net/ipv6/ip6_output.c:219 NF_HOOK_COND include/linux/netfilter.h:307 [inline] ip6_output+0x342/0x580 net/ipv6/ip6_output.c:246 ip6_send_skb+0x1d7/0x3c0 net/ipv6/ip6_output.c:1984 udp_v6_send_skb+0x9a5/0x1770 net/ipv6/udp.c:1442 udp_v6_push_pending_frames+0xa2/0x140 net/ipv6/udp.c:1469 udpv6_sendmsg+0xfe0/0x2830 net/ipv6/udp.c:1759 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0xe5/0x270 net/socket.c:742 __sys_sendto+0x3eb/0x580 net/socket.c:2206 __do_sys_sendto net/socket.c:2213 [inline] __se_sys_sendto net/socket.c:2209 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2209 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd2/0xf20 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f67b4d9c629 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f67b5c98028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f67b5015fa0 RCX: 00007f67b4d9c629 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f67b4e32b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000040000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f67b5016038 R14: 00007f67b5015fa0 R15: 00007ffe3cb66dd8 </TASK> Fixes: `b650bf0977` ("udp: remove busylock and add per NUMA queues") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260219173142.310741-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-20 16:14:10 -08:00
Jakub Kicinski	0aebd81fa8	ipsec-2026-02-20 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmmYIsoACgkQrB3Eaf9P W7d4HQ/8DblZh6FcGi+/XR9jSfqWjFRTU37FHrgpveOtCcoXqYYC8OwKqYnqRABh rKgynAjrzKLKYUNFEOZVEYRb/ohKdZ9WlxAOK9ezSlFoEp7W3jfB8hkYv98vl45E 8gw6dqRZt2J9hKa0mcyBPosKZ43yShNWEVktQWpoFOL4fy6fCZVpgOwMSzEr8oRV 56AjvHjM8oFDP3BEDPeCGayzC8GFlER8fc79sUZNDpRr5OQtGo1NoceyUaGIJxZS d7g7WPgbewbfpx+IQavhmfiLYWXNwPal8aTtUNIZclPVB75+efkDNWf89O7ZGlZE 5LLo2Ix2oG/IP3EmKA42IqO6Rx7T6N89kK3AwXeEVP1BciwYhYch0L0ts5XdU6nG A9fQQ+qNukVK8F65dk32zSTStAsGUh/WxgAgY0jnbDwJlOsVwf4B9CEcTC3RavtS OvW2vIVtBYq3xdLh3DoUMxvLj+LIk6WOuicO4QHk+qDqHD0/gbkxVbb7hpXALOvc CCf5/+PG6s2uatIlsOJp+hg8BAQqG1s8vcvfHYpfBzLjJhTA4cem2pIFchMeIgei f25W5vzftMNm+sZejAhCzBwDkrEegNpjE6BbyQ4psYh44QIyRzveDVIHdVZmgpv6 nXCcL2K9jgkdUG4TLOj1FYTp/cWhNOGGyh6gVCVH+mupbdyTd6A= =Aa4a -----END PGP SIGNATURE----- Merge tag 'ipsec-2026-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-02-20 1) Check the value of ipv6_dev_get_saddr() to fix an uninitialized saddr in xfrm6_get_saddr(). From Jiayuan Chen. 2) Skip the templates check for packet offload in tunnel mode. Is was already done by the hardware and causes an unexpected XfrmInTmplMismatch increase. From Leon Romanovsky. 3) Fix a unregister_netdevice stall due to not dropped refcounts by always flushing xfrm state and policy on a NETDEV_UNREGISTER event. From Tetsuo Handa. * tag 'ipsec-2026-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: always flush state and policy upon NETDEV_UNREGISTER event xfrm: skip templates check for packet offload tunnel mode xfrm6: fix uninitialized saddr in xfrm6_get_saddr() ==================== Link: https://patch.msgid.link/20260220094133.14219-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-20 15:57:55 -08:00
Eric Dumazet	858d2a4f67	tcp: fix potential race in tcp_v6_syn_recv_sock() Code in tcp_v6_syn_recv_sock() after the call to tcp_v4_syn_recv_sock() is done too late. After tcp_v4_syn_recv_sock(), the child socket is already visible from TCP ehash table and other cpus might use it. Since newinet->pinet6 is still pointing to the listener ipv6_pinfo bad things can happen as syzbot found. Move the problematic code in tcp_v6_mapped_child_init() and call this new helper from tcp_v4_syn_recv_sock() before the ehash insertion. This allows the removal of one tcp_sync_mss(), since tcp_v4_syn_recv_sock() will call it with the correct context. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+937b5bbb6a815b3e5d0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69949275.050a0220.2eeac1.0145.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260217161205.2079883-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-19 14:02:19 -08:00
Eric Dumazet	9395b1bb1f	ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero If net.ipv6.icmp.ratelimit is zero we do not have to call inet_getpeer_v6() and inet_peer_xrlim_allow(). Both can be very expensive under DDOS. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260216142832.3834174-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-18 16:46:37 -08:00
Eric Dumazet	0201eedb69	ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow() Following part was needed before the blamed commit, because inet_getpeer_v6() second argument was the prefix. /* Give more bandwidth to wider prefixes. */ if (rt->rt6i_dst.plen < 128) tmo >>= ((128 - rt->rt6i_dst.plen)>>5); Now inet_getpeer_v6() retrieves hosts, we need to remove @tmo adjustement or wider prefixes likes /24 allow 8x more ICMP to be sent for a given ratelimit. As we had this issue for a while, this patch changes net.ipv6.icmp.ratelimit default value from 1000ms to 100ms to avoid potential regressions. Also add a READ_ONCE() when reading net->ipv6.sysctl.icmpv6_time. Fixes: `fd0273d793` ("ipv6: Remove external dependency on rt6i_dst and rt6i_src") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260216142832.3834174-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-18 16:46:36 -08:00
Kuniyuki Iwashima	8244f959e2	ipv6: Fix out-of-bound access in fib6_add_rt2node(). syzbot reported out-of-bound read in fib6_add_rt2node(). [0] When IPv6 route is created with RTA_NH_ID, struct fib6_info does not have the trailing struct fib6_nh. The cited commit started to check !iter->fib6_nh->fib_nh_gw_family to ensure that rt6_qualify_for_ecmp() will return false for iter. If iter->nh is not NULL, rt6_qualify_for_ecmp() returns false anyway. Let's check iter->nh before reading iter->fib6_nh and avoid OOB read. [0]: BUG: KASAN: slab-out-of-bounds in fib6_add_rt2node+0x349c/0x3500 net/ipv6/ip6_fib.c:1142 Read of size 1 at addr ffff8880384ba6de by task syz.0.18/5500 CPU: 0 UID: 0 PID: 5500 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xba/0x230 mm/kasan/report.c:482 kasan_report+0x117/0x150 mm/kasan/report.c:595 fib6_add_rt2node+0x349c/0x3500 net/ipv6/ip6_fib.c:1142 fib6_add_rt2node_nh net/ipv6/ip6_fib.c:1363 [inline] fib6_add+0x910/0x18c0 net/ipv6/ip6_fib.c:1531 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3957 inet6_rtm_newroute+0x268/0x19e0 net/ipv6/route.c:5660 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f9316b9aeb9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd8809b678 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f9316e15fa0 RCX: 00007f9316b9aeb9 RDX: 0000000000000000 RSI: 0000200000004380 RDI: 0000000000000003 RBP: 00007f9316c08c1f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f9316e15fac R14: 00007f9316e15fa0 R15: 00007f9316e15fa0 </TASK> Allocated by task 5499: kasan_save_stack mm/kasan/common.c:57 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:78 poison_kmalloc_redzone mm/kasan/common.c:398 [inline] __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415 kasan_kmalloc include/linux/kasan.h:263 [inline] __do_kmalloc_node mm/slub.c:5657 [inline] __kmalloc_noprof+0x40c/0x7e0 mm/slub.c:5669 kmalloc_noprof include/linux/slab.h:961 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] fib6_info_alloc+0x30/0xf0 net/ipv6/ip6_fib.c:155 ip6_route_info_create+0x142/0x860 net/ipv6/route.c:3820 ip6_route_add+0x49/0x1b0 net/ipv6/route.c:3949 inet6_rtm_newroute+0x268/0x19e0 net/ipv6/route.c:5660 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `bbf4a17ad9` ("ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF") Reported-by: syzbot+707d6a5da1ab9e0c6f9d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/698cbfba.050a0220.2eeac1.009d.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://patch.msgid.link/20260211175133.3657034-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-13 12:24:28 -08:00
Qanux	6db8b56eed	ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data() On the receive path, __ioam6_fill_trace_data() uses trace->nodelen to decide how much data to write for each node. It trusts this field as-is from the incoming packet, with no consistency check against trace->type (the 24-bit field that tells which data items are present). A crafted packet can set nodelen=0 while setting type bits 0-21, causing the function to write ~100 bytes past the allocated region (into skb_shared_info), which corrupts adjacent heap memory and leads to a kernel panic. Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to derive the expected nodelen from the type field, and use it: - in ioam6_iptunnel.c (send path, existing validation) to replace the open-coded computation; - in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose nodelen is inconsistent with the type field, before any data is written. Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to 0xff1ffc00). Fixes: `9ee11f0fff` ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-13 12:24:05 -08:00
Paolo Abeni	83310d6133	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge in late fixes in preparation for the net-next PR. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-11 15:14:35 +01:00
Eric Dumazet	97d7ae6e14	tcp: inet6_csk_xmit() optimization After prior patches, inet6_csk_xmit() can reuse inet->cork.fl.u.ip6 if __sk_dst_check() returns a valid dst. Otherwise call inet6_csk_route_socket() to refresh inet->cork.fl.u.ip6 content and get a new dst. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	a6eee39cc2	tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock() As explained in commit `85d05e2817` ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, passive TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	19bdb267f7	tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect() Instead of using private @fl6 and @final variables use respectively inet->cork.fl.u.ip6 and np->final. As explained in commit `85d05e2817` ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, active TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	969a20198b	ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6 Convert inet6_csk_route_socket() to use np->final instead of an automatic variable to get rid of a stack canary. Convert inet6_csk_xmit() and inet6_csk_update_pmtu() to use inet->cork.fl.u.ip6 instead of @fl6 automatic variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	4e6c91cf60	ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update() Get rid of @fl6 and &final variables in ip6_datagram_dst_update(). Use instead inet->cork.fl.u.ip6 and np->final so that a stack canary is no longer needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:49 -08:00
Eric Dumazet	3d3f075e80	ipv6: use np->final in inet6_sk_rebuild_header() Instead of using an automatic variable, use np->final to get rid of the stack canary in inet6_sk_rebuild_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:49 -08:00
Alice Mikityanska	1676ebba39	net/ipv6: Remove jumbo_remove step from TX path Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove unnecessary steps from the GSO TX path, that used to check and remove HBH. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-5-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:12 -08:00
Alice Mikityanska	81be30c1f5	net/ipv6: Drop HBH for BIG TCP on RX side Complementary to the previous commit, stop inserting HBH when building BIG TCP GRO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-4-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:12 -08:00
Alice Mikityanska	741d069aa4	net/ipv6: Drop HBH for BIG TCP on TX side BIG TCP IPv6 inserts a hop-by-hop extension header to indicate the real IPv6 payload length when it doesn't fit into the 16-bit field in the IPv6 header itself. While it helps tools parse the packet, it also requires every driver that supports TSO and BIG TCP to remove this 8-byte extension header. It might not sound that bad until we try to apply it to tunneled traffic. Currently, the drivers don't attempt to strip HBH if skb->encapsulation = 1. Moreover, trying to do so would require dissecting different tunnel protocols and making corresponding adjustments on case-by-case basis, which would slow down the fastpath (potentially also requiring adjusting checksums in outer headers). At the same time, BIG TCP IPv4 doesn't insert any extra headers and just calculates the payload length from skb->len, significantly simplifying implementing BIG TCP for tunnels. Stop inserting HBH when building BIG TCP GSO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-3-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:12 -08:00
Alice Mikityanska	b2936b4fd5	net/ipv6: Introduce payload_len helpers The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:03 -08:00
Eric Dumazet	a14d931790	ipv6: do not use skb_header_pointer() in icmpv6_filter() Prefer pskb_may_pull() to avoid a stack canary in raw6_local_deliver(). Note: skb->head can change, hence we reload ip6h pointer in ipv6_raw_deliver() $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-86 (-86) Function old new delta raw6_local_deliver 780 694 -86 Total: Before=24889784, After=24889698, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205211909.4115285-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:34:20 -08:00
Eric Dumazet	c89477ad79	inet: RAW sockets using IPPROTO_RAW MUST drop incoming ICMP Yizhou Zhao reported that simply having one RAW socket on protocol IPPROTO_RAW (255) was dangerous. socket(AF_INET, SOCK_RAW, 255); A malicious incoming ICMP packet can set the protocol field to 255 and match this socket, leading to FNHE cache changes. inner = IP(src="192.168.2.1", dst="8.8.8.8", proto=255)/Raw("TEST") pkt = IP(src="192.168.1.1", dst="192.168.2.1")/ICMP(type=3, code=4, nexthopmtu=576)/inner "man 7 raw" states: A protocol of IPPROTO_RAW implies enabled IP_HDRINCL and is able to send any IP protocol that is specified in the passed header. Receiving of all IP protocols via IPPROTO_RAW is not possible using raw sockets. Make sure we drop these malicious packets. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Link: https://lore.kernel.org/netdev/20251109134600.292125-1-zhaoyz24@mails.tsinghua.edu.cn/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260203192509.682208-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 12:36:49 -08:00
Jakub Kicinski	a182a62ff7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc9). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c `3125fc1701` ("net: spacemit: k1-emac: fix jumbo frame support") `f66086798f` ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:54:08 -08:00
Eric Dumazet	85d05e2817	ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6 TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. This patch is a first step converting IPv6 to the same strategy: Before this patch inet6_sk_rebuild_header() only validated/rebuilt a dst. Automatic variable @fl6 content was lost. After this patch inet6_sk_rebuild_header() also initializes inet->cork.fl.u.ip6, which can be reused in the future. This makes inet6_sk_rebuild_header() very similar to inet_sk_rebuild_header(). Also remove the EXPORT_SYMBOL_GPL(), inet6_sk_rebuild_header() is not called from any module. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260204163035.4123817-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:24:10 -08:00
Shigeru Yoshida	bbf4a17ad9	ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6 route. [0] Commit `f72514b3c5` ("ipv6: clear RA flags when adding a static route") introduced logic to clear RTF_ADDRCONF from existing routes when a static route with the same nexthop is added. However, this causes a problem when the existing route has a gateway. When RTF_ADDRCONF is cleared from a route that has a gateway, that route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns true. The issue is that this route was never added to the fib6_siblings list. This leads to a mismatch between the following counts: - The sibling count computed by iterating fib6_next chain, which includes the newly ECMP-eligible route - The actual siblings in fib6_siblings list, which does not include that route When a subsequent ECMP route is added, fib6_add_rt2node() hits BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the counts don't match. Fix this by only clearing RTF_ADDRCONF when the existing route does not have a gateway. Routes without a gateway cannot qualify for ECMP anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing RTF_ADDRCONF on them is safe and matches the original intent of the commit. [0]: kernel BUG at net/ipv6/ip6_fib.c:1217! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6010 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:fib6_add_rt2node+0x3433/0x3470 net/ipv6/ip6_fib.c:1217 [...] Call Trace: <TASK> fib6_add+0x8da/0x18a0 net/ipv6/ip6_fib.c:1532 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3946 ipv6_route_ioctl+0x35c/0x480 net/ipv6/route.c:4571 inet6_ioctl+0x219/0x280 net/ipv6/af_inet6.c:577 sock_do_ioctl+0xdc/0x300 net/socket.c:1245 sock_ioctl+0x576/0x790 net/socket.c:1366 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `f72514b3c5` ("ipv6: clear RA flags when adding a static route") Reported-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cb809def1baaac68ab92 Tested-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260204095837.1285552-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 08:38:40 -08:00
Eric Dumazet	b409a7f717	ipv6: colocate inet6_cork in inet_cork_full All inet6_cork users also use one inet_cork_full. Reduce number of parameters and increase data locality. This saves ~275 bytes of code on x86_64. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:30 -08:00
Eric Dumazet	fe8570186f	ipv4: use dst4_mtu() instead of dst_mtu() When we expect an IPv4 dst, use dst4_mtu() instead of dst_mtu() to save some code space. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00

1 2 3 4 5 ...

8903 Commits