Commit Graph

3086 Commits

Author SHA1 Message Date
Jason Xing 45e359be1c net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt
This patch provides a setsockopt method to let applications leverage to
adjust how many descs to be handled at most in one send syscall. It
mitigates the situation where the default value (32) that is too small
leads to higher frequency of triggering send syscall.

Considering the prosperity/complexity the applications have, there is no
absolutely ideal suggestion fitting all cases. So keep 32 as its default
value like before.

The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default in the initialization phase as a
  per-socket granular control.
- Set the range of max_tx_budget as [32, xs->tx->nentries].

The idea behind this comes out of real workloads in production. We use a
user-level stack with xsk support to accelerate sending packets and
minimize triggering syscalls. When the packets are aggregated, it's not
hard to hit the upper bound (namely, 32). The moment user-space stack
fetches the -EAGAIN error number passed from sendto(), it will loop to try
again until all the expected descs from tx ring are sent out to the driver.
Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of
sendto() and higher throughput/PPS.

Here is what I did in production, along with some numbers as follows:
For one application I saw lately, I suggested using 128 as max_tx_budget
because I saw two limitations without changing any default configuration:
1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by
net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind
this was I counted how many descs are transmitted to the driver at one
time of sendto() based on [1] patch and then I calculated the
possibility of hitting the upper bound. Finally I chose 128 as a
suitable value because 1) it covers most of the cases, 2) a higher
number would not bring evident results. After twisting the parameters,
a stable improvement of around 4% for both PPS and throughput and less
resources consumption were found to be observed by strace -c -p xxx:
1) %time was decreased by 7.8%
2) error counter was decreased from 18367 to 572

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-10 14:48:29 +02:00
Kamil Horák - 2N 67c0170566 net: phy: MII-Lite PHY interface mode
Some Broadcom PHYs are capable to operate in simplified MII mode,
without TXER, RXER, CRS and COL signals as defined for the MII.
The MII-Lite mode can be used on most Ethernet controllers with full
MII interface by just leaving the input signals (RXER, CRS, COL)
inactive. The absence of COL signal makes half-duplex link modes
impossible but does not interfere with BroadR-Reach link modes on
Broadcom PHYs, because they are all full-duplex only.

Add MII-Lite interface mode, especially for Broadcom two-wire PHYs.

Signed-off-by: Kamil Horák - 2N <kamilh@axis.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250708090140.61355-2-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09 19:32:30 -07:00
Jason Xing 819802e25a Documentation: xsk: correct the obsolete references and examples
The modified lines are mainly related to the following commits[1][2]
which remove those tests and examples. Since samples/bpf has been
deprecated, we can refer to more examples that are easily searched
in the various xdp-projects, like the following link:
https://github.com/xdp-project/bpf-examples/tree/main/AF_XDP-example

[1]
commit f366006342 ("libbpf: move xsk.{c,h} into selftests/bpf")
[2]
commit cfb5a2dbf1 ("bpf, samples: Remove AF_XDP samples")

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250708062907.11557-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09 19:29:32 -07:00
Ivan Vecera 2df8e64e01 dpll: Add basic Microchip ZL3073x support
Microchip Azurite ZL3073x represents chip family providing DPLL
and optionally PHC (PTP) functionality. The chips can be connected
be connected over I2C or SPI bus.

They have the following characteristics:
* up to 5 separate DPLL units (channels)
* 5 synthesizers
* 10 input pins (references)
* 10 outputs
* 20 output pins (output pin pair shares one output)
* Each reference and output can operate in either differential or
  single-ended mode (differential mode uses 2 pins)
* Each output is connected to one of the synthesizers
* Each synthesizer is driven by one of the DPLL unit

The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-,
32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit
addressing, the range is organized into pages of 128 bytes, with each
page containing a page selector register at address 0x7F.
For reading/writing multi-byte registers, the device supports bulk
transfers.

Add basic functionality to access device registers, probe functionality
both I2C and SPI cases and add devlink support to provide info and
to set clock ID parameter.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09 19:08:52 -07:00
Ivan Vecera de9ccf2296 devlink: Add new "clock_id" generic device param
Add a new device generic parameter to specify clock ID that should
be used by the device for registering DPLL devices and pins.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09 19:08:52 -07:00
Mengyuan Lou a0008a3658 net: wangxun: add ngbevf build
Add doc build infrastructure for ngbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 1G virtual
functions for Wangxun.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-10-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09 18:39:13 -07:00
Mengyuan Lou 377d180bd7 net: wangxun: add txgbevf build
Add doc build infrastructure for txgbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 10/25/40G virtual
functions for Wangxun.
Ioremap the space of bar0 and bar4 which will be used.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-5-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09 18:39:13 -07:00
Xin Guo 19c066f940 tcp: update the outdated ref draft-ietf-tcpm-rack
As RACK-TLP was published as a standards-track RFC8985,
so the outdated ref draft-ietf-tcpm-rack need to be updated.

Signed-off-by: Xin Guo <guoxin0309@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250705163647.301231-1-guoxin0309@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08 09:01:52 -07:00
Tonghao Zhang 2f9afffc39 net: bonding: send peer notify when failure recovery
In LACP mode with broadcast_neighbor enabled, after LACP protocol
recovery, the port can transmit packets. However, if the bond port
doesn't send gratuitous ARP/ND packets to the switch, the switch
won't return packets through the current interface. This causes
traffic imbalance. To resolve this issue, when LACP protocol recovers,
send ARP/ND packets if broadcast_neighbor is enabled.

Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/3993652dc093fffa9504ce1c2448fb9dea31d2d2.1751031306.git.tonghao@bamaicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08 10:59:42 +02:00
Tonghao Zhang ce7a381697 net: bonding: add broadcast_neighbor option for 802.3ad
Stacking technology is a type of technology used to expand ports on
Ethernet switches. It is widely used as a common access method in
large-scale Internet data center architectures. Years of practice
have proved that stacking technology has advantages and disadvantages
in high-reliability network architecture scenarios. For instance,
in stacking networking arch, conventional switch system upgrades
require multiple stacked devices to restart at the same time.
Therefore, it is inevitable that the business will be interrupted
for a while. It is for this reason that "no-stacking" in data centers
has become a trend. Additionally, when the stacking link connecting
the switches fails or is abnormal, the stack will split. Although it is
not common, it still happens in actual operation. The problem is that
after the split, it is equivalent to two switches with the same
configuration appearing in the network, causing network configuration
conflicts and ultimately interrupting the services carried by the
stacking system.

To improve network stability, "non-stacking" solutions have been
increasingly adopted, particularly by public cloud providers and
tech companies like Alibaba, Tencent, and Didi. "non-stacking" is
a method of mimicing switch stacking that convinces a LACP peer,
bonding in this case, connected to a set of "non-stacked" switches
that all of its ports are connected to a single switch
(i.e., LACP aggregator), as if those switches were stacked. This
enables the LACP peer's ports to aggregate together, and requires
(a) special switch configuration, described in the linked article,
and (b) modifications to the bonding 802.3ad (LACP) mode to send
all ARP/ND packets across all ports of the active aggregator.

Note that, with multiple aggregators, the current broadcast mode
logic will send only packets to the selected aggregator(s).

 +-----------+   +-----------+
 |  switch1  |   |  switch2  |
 +-----------+   +-----------+
         ^           ^
         |           |
      +-----------------+
      |   bond4 lacp    |
      +-----------------+
         |           |
         | NIC1      | NIC2
      +-----------------+
      |     server      |
      +-----------------+

- https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/

Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
Link: https://patch.msgid.link/84d0a044514157bb856a10b6d03a1028c4883561.1751031306.git.tonghao@bamaicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08 10:59:41 +02:00
Paolo Abeni 6b9fd8857b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc5).

No conflicts.

No adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-04 08:03:18 +02:00
Bagas Sanjaya 2f1fa26eef net: ip-sysctl: Add link to SCTP IPv4 scoping draft
addr_scope_policy description contains pointer to SCTP IPv4 scoping
draft but not its IETF Datatracker link. Add it.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-6-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03 15:51:45 +02:00
Bagas Sanjaya 82b0566000 net: ip-sysctl: Format SCTP-related memory parameters description as bullet list
The description for vector elements of SCTP-related memory usage
parameters (sctp{r,w,}mem) is formatted as normal paragraphs rather than
bullet list. Convert the description to the latter.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-5-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03 15:51:45 +02:00
Bagas Sanjaya 98bc1d41f2 net: ip-sysctl: Format pf_{enable,expose} boolean lists as bullet lists
These lists' items were separated by newlines but without bullet list
marker. Turn the lists into proper bullet list.

While at it, also reword values description for pf_expose to not repeat
mentioning SCTP_PEER_ADDR_CHANGE and SCTP_GET_PEER_ADDR_INFO sockopt.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-4-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03 15:51:45 +02:00
Bagas Sanjaya 2040058db3 net: ip-sysctl: Format possible value range of ioam6_id{,_wide} as bullet list
Format possible value range bounds of ioam6_id and ioam6_id_wide as
bullet list instead of running paragraph.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-3-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03 15:51:44 +02:00
Bagas Sanjaya 501aeb1ef4 net: ip-sysctl: Format Private VLAN proxy arp aliases as bullet list
Alias names list for private VLAN proxy arp technology is formatted as
indented paragraph instead. Make it bullet list as it is better fit for
this purpose.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-2-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03 15:51:44 +02:00
Pablo Neira Ayuso fd72f265bb netfilter: conntrack: remove DCCP protocol support
The DCCP socket family has now been removed from this tree, see:

  8bb3212be4 ("Merge branch 'net-retire-dccp-socket'")

Remove connection tracking and NAT support for this protocol, this
should not pose a problem because no DCCP traffic is expected to be seen
on the wire.

As for the code for matching on dccp header for iptables and nftables,
mark it as deprecated and keep it in place. Ruleset restoration is an
atomic operation. Without dccp matching support, an astray match on dccp
could break this operation leaving your computer with no policy in
place, so let's follow a more conservative approach for matches.

Add CONFIG_NFT_EXTHDR_DCCP which is set to 'n' by default to deprecate
dccp extension support. Similarly, label CONFIG_NETFILTER_XT_MATCH_DCCP
as deprecated too and also set it to 'n' by default.

Code to match on DCCP protocol from ebtables also remains in place, this
is just a few checks on IPPROTO_DCCP from _check() path which is
exercised when ruleset is loaded. There is another use of IPPROTO_DCCP
from the _check() path in the iptables multiport match. Another check
for IPPROTO_DCCP from the packet in the reject target is also removed.

So let's schedule removal of the dccp matching for a second stage, this
should not interfer with the dccp retirement since this is only matching
on the dccp header.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-03 13:51:39 +02:00
Carolina Jubran 566e8f108f devlink: Extend devlink rate API with traffic classes bandwidth management
Introduce support for specifying relative bandwidth shares between
traffic classes (TC) in the devlink-rate API. This new option allows
users to allocate bandwidth across multiple traffic classes in a
single command.

This feature provides a more granular control over traffic management,
especially for scenarios requiring Enhanced Transmission Selection.

Users can now define a relative bandwidth share for each traffic class.
For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5
(RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the
total bandwidth. The actual percentage each class receives depends on
the ratio of its share value to the sum of all shares.

Example:
DEV=pci/0000:08:00.0

$ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \
  tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0

$ devlink port function rate set $DEV/vfs_group \
  tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0

Example usage with ynl:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
  --do rate-set --json '{
  "bus-name": "pci",
  "dev-name": "0000:08:00.0",
  "port-index": 1,
  "rate-tc-bws": [
    {"rate-tc-index": 0, "rate-tc-bw": 50},
    {"rate-tc-index": 1, "rate-tc-bw": 50},
    {"rate-tc-index": 2, "rate-tc-bw": 0},
    {"rate-tc-index": 3, "rate-tc-bw": 0},
    {"rate-tc-index": 4, "rate-tc-bw": 0},
    {"rate-tc-index": 5, "rate-tc-bw": 0},
    {"rate-tc-index": 6, "rate-tc-bw": 0},
    {"rate-tc-index": 7, "rate-tc-bw": 0}
  ]
}'

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
  --do rate-get --json '{
  "bus-name": "pci",
  "dev-name": "0000:08:00.0",
  "port-index": 1
}'

output for rate-get:
{'bus-name': 'pci',
 'dev-name': '0000:08:00.0',
 'port-index': 1,
 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0},
                 {'rate-tc-bw': 50, 'rate-tc-index': 1},
                 {'rate-tc-bw': 0, 'rate-tc-index': 2},
                 {'rate-tc-bw': 0, 'rate-tc-index': 3},
                 {'rate-tc-bw': 0, 'rate-tc-index': 4},
                 {'rate-tc-bw': 0, 'rate-tc-index': 5},
                 {'rate-tc-bw': 0, 'rate-tc-index': 6},
                 {'rate-tc-bw': 0, 'rate-tc-index': 7}],
 'rate-tx-max': 0,
 'rate-tx-priority': 0,
 'rate-tx-share': 0,
 'rate-tx-weight': 0,
 'rate-type': 'leaf'}

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02 15:39:05 -07:00
Dave Marquardt 16f87fb243 docs: netdevsim: fixe typo in netdevsim documentation
Fixed a typographical error in "Rate objects" section

Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com>
Link: https://patch.msgid.link/20250630-netdevsim-typo-fix-v3-1-e1eae3a5f018@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 17:38:50 -07:00
Jakub Kicinski 8b79380dfe docs: fbnic: explain the ring config
fbnic takes 4 parameters to configure the Rx queues. The semantics
are similar to other existing NICs but confusing to newcomers.
Document it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250626191554.32343-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01 12:48:28 +02:00
Ulrich Weber ba2f83eecd doc: tls: socket needs to be established to enable ulp
To enable TLS ulp socket needs to be in established state.
This was added in commit d91c3e17f7 ("net/tls: Only attach
to sockets in ESTABLISHED state"), in 2018.

Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com>
Link: https://patch.msgid.link/20250626145618.15464-1-ulrich.weber@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27 15:33:01 -07:00
Jakub Kicinski 28aa52b618 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc4).

Conflicts:

Documentation/netlink/specs/mptcp_pm.yaml
  9e6dd4c256 ("netlink: specs: mptcp: replace underscores with dashes in names")
  ec362192aa ("netlink: specs: fix up indentation errors")
https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au

Adjacent changes:

Documentation/netlink/specs/fou.yaml
  791a9ed0a4 ("netlink: specs: fou: replace underscores with dashes in names")
  880d43ca9a ("netlink: specs: clean up spaces in brackets")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26 10:40:50 -07:00
Abdelrahman Fekry 5cfb2ac280 docs: net: sysctl documentation cleanup
Add missing default values for networking sysctl parameters and
standardize documentation:
- Use "0 (disabled)" / "1 (enabled)" format consistently
- Fix cipso_rbm_struct_valid -> cipso_rbm_strictvalid typo
- Convert fwmark_reflect description to enabled/disabled terminology
- Document possible values for tcp_autocorking

Also addresses formatting inconsistencies in touched parameters.

Signed-off-by: Abdelrahman Fekry <abdelrahmanfekry375@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20250624150923.40590-1-abdelrahmanfekry375@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-26 15:02:00 +02:00
Jakub Kicinski 47c3ed01af doc: ethtool: mark ETHTOOL_GRXFHINDIR as reimplemented
The ETHTOOL_GRXFHINDIR reimplementation has been completed around
a year ago. We have been tweaking it so a bit hard to point
to a single commit that completed it, but all the fields available
in IOCTL are reported via Netlink.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250623231720.3124717-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25 15:24:14 -07:00
Jakub Kicinski 46837be5af net: ethtool: rss: add notifications
In preparation for RSS_SET handling in ethnl introduce Netlink
notifications for RSS. Only cover modifications, not creation
and not removal of a context, because the latter may deserve
a different notification type. We should cross that bridge
when we add the support for context add / remove via Netlink.

Link: https://patch.msgid.link/20250623231720.3124717-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25 15:24:14 -07:00
Faisal Bukhari 302251f1fd Fix typo in marvell octeontx2 documentation
Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
Fixes a spelling mistake: "funcionality" → "functionality".

Signed-off-by: Faisal Bukhari <faisalbukhari523@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-22 18:16:08 +01:00
Bagas Sanjaya 5194439d70 Documentation: treewide: Replace remaining spinics links with lore
Long before introduction of lore.kernel.org, people would link
to LKML threads on third-party archives (here spinics.net), which
in some cases can be unreliable (as these were outside of
kernel.org control). Replace links to them with lore counterparts
(if any).

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250611065254.36608-2-bagasdotme@gmail.com
2025-06-21 14:20:51 -06:00
Kory Maincent (Dent Project) eeb0c8f72f net: ethtool: Add PSE port priority support feature
This patch expands the status information provided by ethtool for PSE c33
with current port priority and max port priority. It also adds a call to
pse_ethtool_set_prio() to configure the PSE port priority.

Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-8-78a1a645e2ee@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 19:00:17 -07:00
Kory Maincent (Dent Project) 1176978ed8 net: ethtool: Add support for new power domains index description
Report the index of the newly introduced PSE power domain to the user,
enabling improved management of the power budget for PSE devices.

Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-5-78a1a645e2ee@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 19:00:16 -07:00
Kory Maincent (Dent Project) fc0e6db309 net: pse-pd: Add support for reporting events
Add support for devm_pse_irq_helper() to register PSE interrupts and report
events such as over-current or over-temperature conditions. This follows a
similar approach to the regulator API but also sends notifications using a
dedicated PSE ethtool netlink socket.

Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-2-78a1a645e2ee@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 19:00:15 -07:00
David Arinzon c922302143 net: ena: Add PHC documentation
Provide the relevant information and guidelines
about the feature support in the ENA driver.

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20250617110545.5659-10-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 18:57:29 -07:00
David Arinzon 60e28350b1 net: ena: Add debugfs support to the ENA driver
Adding the base directory of debugfs to the driver.
In order for the folder to be unique per driver instantiation,
the chosen name is the device name.

This commit contains the initialization and the
base folder.

The creation of the base folder may fail, but is considered
non-fatal.

Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20250617110545.5659-8-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 18:57:29 -07:00
David Arinzon cea465a96a devlink: Add new "enable_phc" generic device param
Add a new device generic parameter to enable/disable the
PHC (PTP Hardware Clock) functionality in the device associated
with the devlink instance.

Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250617110545.5659-6-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 18:57:29 -07:00
David Arinzon 15115b1a25 net: ena: Add device reload capability through devlink
Adding basic devlink capability support of reloading the driver.
This capability is required to support driver init type
devlink params (DEVLINK_PARAM_CMODE_DRIVERINIT). Such params
require reloading of the driver (destroy/restore sequence).
The reloading is done by the devlink framework using the
hooks provided by the driver.

Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20250617110545.5659-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 18:57:28 -07:00
David Arinzon e0ea34158e net: ena: Add PHC support in the ENA driver
The ENA driver will be extended to support the new PHC feature using
ptp_clock interface [1]. this will provide timestamp reference for user
space to allow measuring time offset between the PHC and the system
clock in order to achieve nanosecond accuracy.

[1] - https://www.kernel.org/doc/html/latest/driver-api/ptp.html

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20250617110545.5659-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18 18:57:28 -07:00
Gustavo Luiz Duarte 8c587aa3fa docs: netconsole: document msgid feature
Add documentation explaining the msgid feature in netconsole.

This feature appends unique id to the userdata dictionary. The message
ID is populated from a per-target 32 bit counter which is incremented
for each message sent to the target. This allows a target to detect if
messages are dropped before reaching the target.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-18 10:46:10 +01:00
Neal Cardwell ba4618885b tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint
Now that obsolete RFC3517/RFC6675 TCP loss detection has been removed,
we can remove the somewhat complex and intrusive code to maintain its
hint state: lost_skb_hint and lost_cnt_hint.

This commit makes tcp_clear_retrans_hints_partial() empty. We will
remove tcp_clear_retrans_hints_partial() and its call sites in the
next commit.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-3-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 16:19:04 -07:00
Neal Cardwell 1c120191dc tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:

 commit b38a51fec1 ("tcp: disable RFC6675 loss detection")

In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.

In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.

RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.

Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.

RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.

It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This will allow a substantial simplification of the
Linux TCP code base, and removes 12 bytes of state in every tcp_sock
for 64-bit machines (8 bytes on 32-bit machines).

To arrange the commits in reasonable sizes, this patch series is split
into 3 commits. The following 2 commits remove bookkeeping state and
code that is no longer needed after this removal of RFC3517/RFC6675
loss recovery.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250615001435.2390793-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 16:18:48 -07:00
Jakub Kicinski 34355b6712 linux-can-next-for-6.17-20250610
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmhH/swTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAMdGXf+ZCRnONCCACa16bTW53gBzmiTxdEgUJ/h+gQuR8G
 Fj+yOYIWNZY/YOExa40ldApu3iB9UAB0D+FOly4Wv5zYDct6yNBxqtZjbkTFMaoi
 3i+SSrRLNtIxgGs1KgJKVPis8mhCqiBL0aGoJDGyRiye6hotECDyQWvlGM3lMGUr
 wdMDQW2xyKOWvm++jXijkUMyKThmI7czlSH8al+JU9KcAO9hiUlGzejdI56KUIMW
 TRlg2QSK9CfIzgUP4RQughbF59/8Xbq3LOidu50xMad2wiOJj0IUHB0h6LoAshnS
 jFy4Ox4Gw5hcmdaEKazjYEtq3nQeZ6wct7jThw02e4D9h0ac2MCVhphk
 =Pt9d
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.17-20250610' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-06-10

The first 4 patches are by Vincent Mailhol and prepare the CAN netlink
interface for the introduction of CAN XL configuration.

Geert Uytterhoeven's patch updates the CAN networking documentation.

The last 2 patched are by Davide Caratti and introduce skb drop
reasons in the receive path of several CAN protocols.

* tag 'linux-can-next-for-6.17-20250610' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: add drop reasons in CAN protocols receive path
  can: add drop reasons in the receive path of AF_CAN
  documentation: networking: can: Document alloc_candev_mqs()
  can: netlink: can_changelink(): rename tdc_mask into fd_tdc_flag_provided
  can: bittiming: rename can_tdc_is_enabled() into can_fd_tdc_is_enabled()
  can: bittiming: rename CAN_CTRLMODE_TDC_MASK into CAN_CTRLMODE_FD_TDC_MASK
  can: netlink: replace tabulation by space in assignment
====================

Link: https://patch.msgid.link/20250610094933.1593081-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-10 15:44:47 -07:00
Geert Uytterhoeven 9e97db3c07 documentation: networking: can: Document alloc_candev_mqs()
Since the introduction of alloc_candev_mqs() and friends, there is no
longer a need to allocate a generic network device and perform explicit
CAN-specific setup.  Remove the code showing this setup, and document
alloc_candev_mqs() instead.

Fixes: 39549eef35 ("can: CAN Network device driver and Netlink interface")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/c0f9a706ba31f1a49eb72e58526cd294d97a1ce9.1748865431.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-06-10 10:20:11 +02:00
Karol Kolacinski cb9e0de777 ice: add ice driver PTP pin documentation
Add a description of PTP pins support by the adapters to ice driver
documentation.

Reviewed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-09 09:56:18 -07:00
Linus Torvalds 00c010e130 - The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox
simplifies the act of creating a pte which addresses the first page in a
   folio and reduces the amount of plumbing which architecture must
   implement to provide this.
 
 - The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox
   is a shower of largely unrelated folio infrastructure changes which
   clean things up and better prepare us for future work.
 
 - The 3 patch series "memory,x86,acpi: hotplug memory alignment
   advisement" from Gregory Price adds early-init code to prevent x86 from
   leaving physical memory unused when physical address regions are not
   aligned to memory block size.
 
 - The 2 patch series "mm/compaction: allow more aggressive proactive
   compaction" from Michal Clapinski provides some tuning of the (sadly,
   hard-coded (more sadly, not auto-tuned)) thresholds for our invokation
   of proactive compaction.  In a simple test case, the reduction of a guest
   VM's memory consumption was dramatic.
 
 - The 8 patch series "Minor cleanups and improvements to swap freeing
   code" from Kemeng Shi provides some code cleaups and a small efficiency
   improvement to this part of our swap handling code.
 
 - The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API"
   from Dmitry Levin adds the ability for a ptracer to modify syscalls
   arguments.  At this time we can alter only "system call information that
   are used by strace system call tampering, namely, syscall number,
   syscall arguments, and syscall return value.
 
   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.
 
 - The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report
   guard regions" from Andrei Vagin extends the info returned by the
   PAGEMAP_SCAN ioctl against /proc/pid/pagemap.  This permits CRIU to more
   efficiently get at the info about guard regions.
 
 - The 2 patch series "Fix parameter passed to page_mapcount_is_type()"
   from Gavin Shan implements that fix.  No runtime effect is expected
   because validate_page_before_insert() happens to fix up this error.
 
 - The 3 patch series "kernel/events/uprobes: uprobe_write_opcode()
   rewrite" from David Hildenbrand basically brings uprobe text poking into
   the current decade.  Remove a bunch of hand-rolled implementation in
   favor of using more current facilities.
 
 - The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64"
   from Anshuman Khandual provides enhancements and generalizations to the
   pte dumping code.  This might be needed when 128-bit Page Table
   Descriptors are enabled for ARM.
 
 - The 12 patch series "Always call constructor for kernel page tables"
   from Kevin Brodsky "ensures that the ctor/dtor is always called for
   kernel pgtables, as it already is for user pgtables".  This permits the
   addition of more functionality such as "insert hooks to protect page
   tables".  This change does result in various architectures performing
   unnecesary work, but this is fixed up where it is anticipated to occur.
 
 - The 9 patch series "Rust support for mm_struct, vm_area_struct, and
   mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM
   structures.
 
 - The 3 patch series "fix incorrectly disallowed anonymous VMA merges"
   from Lorenzo Stoakes takes advantage of some VMA merging opportunities
   which we've been missing for 15 years.
 
 - The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED
   and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB
   flushing.  Instead of flushing each address range in the provided iovec,
   we batch the flushing across all the iovec entries.  The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.
 
 - The 6 patch series "Track node vacancy to reduce worst case allocation
   counts" from Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.  stress-ng mmap performance increased by single-digit
   percentages and the amount of unnecessarily preallocated memory was
   dramaticelly reduced.
 
 - The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from
   Baoquan He removes a few unnecessary things which Baoquan noted when
   reading the code.
 
 - The 3 patch series ""Enhance sysfs handling for memory hotplug in
   weighted interleave" from Rakie Kim "enhances the weighted interleave
   policy in the memory management subsystem by improving sysfs handling,
   fixing memory leaks, and introducing dynamic sysfs updates for memory
   hotplug support".  Fixes things on error paths which we are unlikely to
   hit.
 
 - The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups
   including tiered memory" from SeongJae Park introduces new DAMOS quota
   goal metrics which eliminate the manual tuning which is required when
   utilizing DAMON for memory tiering.
 
 - The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from
   Baoquan He provides cleanups and small efficiency improvements which
   Baoquan found via code inspection.
 
 - The 2 patch series "vmscan: enforce mems_effective during demotion"
   from Gregory Price "changes reclaim to respect cpuset.mems_effective
   during demotion when possible".  because "presently, reclaim explicitly
   ignores cpuset.mems_effective when demoting, which may cause the cpuset
   settings to violated." "This is useful for isolating workloads on a
   multi-tenant system from certain classes of memory more consistently."
 
 - The 2 patch series ""Clean up split_huge_pmd_locked() and remove
   unnecessary folio pointers" from Gavin Guo provides minor cleanups and
   efficiency gains in in the huge page splitting and migrating code.
 
 - The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang
   creates a slab cache for `struct mem_cgroup', yielding improved memory
   utilization.
 
 - The 4 patch series "add max arg to swappiness in memory.reclaim and
   lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness="
   argument for memory.reclaim MGLRU's lru_gen.  This directs proactive
   reclaim to reclaim from only anon folios rather than file-backed folios.
 
 - The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike
   Rapoport is the first step on the path to permitting the kernel to
   maintain existing VMs while replacing the host kernel via file-based
   kexec.  At this time only memblock's reserve_mem is preserved.
 
 - The 7 patch series "mm: Introduce for_each_valid_pfn()" from David
   Woodhouse provides and uses a smarter way of looping over a pfn range.
   By skipping ranges of invalid pfns.
 
 - The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to
   one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless
   VMA scanning when a task is pinned a single NUMA mode.  Dramatic
   performance benefits were seen in some real world cases.
 
 - The 2 patch series "JFS: Implement migrate_folio for
   jfs_metapage_aops" from Shivank Garg addresses a warning which occurs
   during memory compaction when using JFS.
 
 - The 4 patch series "move all VMA allocation, freeing and duplication
   logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c
   into the more appropriate mm/vma.c.
 
 - The 6 patch series "mm, swap: clean up swap cache mapping helper" from
   Kairui Song provides code consolidation and cleanups related to the
   folio_index() function.
 
 - The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal
   Moola does that.
 
 - The 8 patch series "memcg: Fix test_memcg_min/low test failures" from
   Waiman Long addresses some bogus failures which are being reported by
   the test_memcontrol selftest.
 
 - The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare
   hook" from Lorenzo Stoakes commences the deprecation of
   file_operations.mmap() in favor of the new
   file_operations.mmap_prepare().  The latter is more restrictive and
   prevents drivers from messing with things in ways which, amongst other
   problems, may defeat VMA merging.
 
 - The 4 patch series "memcg: decouple memcg and objcg stocks"" from
   Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's
   one.  This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.
 
 - The 6 patch series "mm/damon: minor fixups and improvements for code,
   tests, and documents" from SeongJae Park is "yet another batch of
   miscellaneous DAMON changes.  Fix and improve minor problems in code,
   tests and documents."
 
 - The 7 patch series "memcg: make memcg stats irq safe" from Shakeel
   Butt converts memcg stats to be irq safe.  Another step along the way to
   making memcg charging and stats updates NMI-safe, a BPF requirement.
 
 - The 4 patch series "Let unmap_hugepage_range() and several related
   functions take folio instead of page" from Fan Ni provides folio
   conversions in the hugetlb code.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA
 ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x
 Qqb/UGMEpilyre1PayQqOnct3aSL9Ao=
 =tYYm
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
   creating a pte which addresses the first page in a folio and reduces
   the amount of plumbing which architecture must implement to provide
   this.

 - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
   largely unrelated folio infrastructure changes which clean things up
   and better prepare us for future work.

 - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
   Price adds early-init code to prevent x86 from leaving physical
   memory unused when physical address regions are not aligned to memory
   block size.

 - "mm/compaction: allow more aggressive proactive compaction" from
   Michal Clapinski provides some tuning of the (sadly, hard-coded (more
   sadly, not auto-tuned)) thresholds for our invokation of proactive
   compaction. In a simple test case, the reduction of a guest VM's
   memory consumption was dramatic.

 - "Minor cleanups and improvements to swap freeing code" from Kemeng
   Shi provides some code cleaups and a small efficiency improvement to
   this part of our swap handling code.

 - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
   adds the ability for a ptracer to modify syscalls arguments. At this
   time we can alter only "system call information that are used by
   strace system call tampering, namely, syscall number, syscall
   arguments, and syscall return value.

   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.

 - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
   Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
   against /proc/pid/pagemap. This permits CRIU to more efficiently get
   at the info about guard regions.

 - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
   implements that fix. No runtime effect is expected because
   validate_page_before_insert() happens to fix up this error.

 - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
   Hildenbrand basically brings uprobe text poking into the current
   decade. Remove a bunch of hand-rolled implementation in favor of
   using more current facilities.

 - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
   Khandual provides enhancements and generalizations to the pte dumping
   code. This might be needed when 128-bit Page Table Descriptors are
   enabled for ARM.

 - "Always call constructor for kernel page tables" from Kevin Brodsky
   ensures that the ctor/dtor is always called for kernel pgtables, as
   it already is for user pgtables.

   This permits the addition of more functionality such as "insert hooks
   to protect page tables". This change does result in various
   architectures performing unnecesary work, but this is fixed up where
   it is anticipated to occur.

 - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
   Ryhl adds plumbing to permit Rust access to core MM structures.

 - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
   Stoakes takes advantage of some VMA merging opportunities which we've
   been missing for 15 years.

 - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
   SeongJae Park optimizes process_madvise()'s TLB flushing.

   Instead of flushing each address range in the provided iovec, we
   batch the flushing across all the iovec entries. The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.

 - "Track node vacancy to reduce worst case allocation counts" from
   Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.

   stress-ng mmap performance increased by single-digit percentages and
   the amount of unnecessarily preallocated memory was dramaticelly
   reduced.

 - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
   a few unnecessary things which Baoquan noted when reading the code.

 - ""Enhance sysfs handling for memory hotplug in weighted interleave"
   from Rakie Kim "enhances the weighted interleave policy in the memory
   management subsystem by improving sysfs handling, fixing memory
   leaks, and introducing dynamic sysfs updates for memory hotplug
   support". Fixes things on error paths which we are unlikely to hit.

 - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
   from SeongJae Park introduces new DAMOS quota goal metrics which
   eliminate the manual tuning which is required when utilizing DAMON
   for memory tiering.

 - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
   provides cleanups and small efficiency improvements which Baoquan
   found via code inspection.

 - "vmscan: enforce mems_effective during demotion" from Gregory Price
   changes reclaim to respect cpuset.mems_effective during demotion when
   possible. because presently, reclaim explicitly ignores
   cpuset.mems_effective when demoting, which may cause the cpuset
   settings to violated.

   This is useful for isolating workloads on a multi-tenant system from
   certain classes of memory more consistently.

 - "Clean up split_huge_pmd_locked() and remove unnecessary folio
   pointers" from Gavin Guo provides minor cleanups and efficiency gains
   in in the huge page splitting and migrating code.

 - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
   for `struct mem_cgroup', yielding improved memory utilization.

 - "add max arg to swappiness in memory.reclaim and lru_gen" from
   Zhongkun He adds a new "max" argument to the "swappiness=" argument
   for memory.reclaim MGLRU's lru_gen.

   This directs proactive reclaim to reclaim from only anon folios
   rather than file-backed folios.

 - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
   first step on the path to permitting the kernel to maintain existing
   VMs while replacing the host kernel via file-based kexec. At this
   time only memblock's reserve_mem is preserved.

 - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
   and uses a smarter way of looping over a pfn range. By skipping
   ranges of invalid pfns.

 - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
   cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
   when a task is pinned a single NUMA mode.

   Dramatic performance benefits were seen in some real world cases.

 - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
   Garg addresses a warning which occurs during memory compaction when
   using JFS.

 - "move all VMA allocation, freeing and duplication logic to mm" from
   Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
   appropriate mm/vma.c.

 - "mm, swap: clean up swap cache mapping helper" from Kairui Song
   provides code consolidation and cleanups related to the folio_index()
   function.

 - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.

 - "memcg: Fix test_memcg_min/low test failures" from Waiman Long
   addresses some bogus failures which are being reported by the
   test_memcontrol selftest.

 - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
   Stoakes commences the deprecation of file_operations.mmap() in favor
   of the new file_operations.mmap_prepare().

   The latter is more restrictive and prevents drivers from messing with
   things in ways which, amongst other problems, may defeat VMA merging.

 - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
   the per-cpu memcg charge cache from the objcg's one.

   This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.

 - "mm/damon: minor fixups and improvements for code, tests, and
   documents" from SeongJae Park is yet another batch of miscellaneous
   DAMON changes. Fix and improve minor problems in code, tests and
   documents.

 - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
   stats to be irq safe. Another step along the way to making memcg
   charging and stats updates NMI-safe, a BPF requirement.

 - "Let unmap_hugepage_range() and several related functions take folio
   instead of page" from Fan Ni provides folio conversions in the
   hugetlb code.

* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
  mm: pcp: increase pcp->free_count threshold to trigger free_high
  mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
  mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: pass folio instead of page to unmap_ref_private()
  memcg: objcg stock trylock without irq disabling
  memcg: no stock lock for cpu hot-unplug
  memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
  memcg: make count_memcg_events re-entrant safe against irqs
  memcg: make mod_memcg_state re-entrant safe against irqs
  memcg: move preempt disable to callers of memcg_rstat_updated
  memcg: memcg_rstat_updated re-entrant safe against irqs
  mm: khugepaged: decouple SHMEM and file folios' collapse
  selftests/eventfd: correct test name and improve messages
  alloc_tag: check mem_profiling_support in alloc_tag_init
  Docs/damon: update titles and brief introductions to explain DAMOS
  selftests/damon/_damon_sysfs: read tried regions directories in order
  mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
  mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
  mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
  ...
2025-05-31 15:44:16 -07:00
Alok Tiwari 6682bfc1b2 Doc: networking: Fix various typos in rds.rst
Corrected "sages" to "messages" in the bitmap allocation description.
Fixed "competed" to "completed" in the recv path datagram handling section.
Corrected "privatee" to "private" in the multipath RDS section.
Fixed "mutlipath" to "multipath" in the transport capabilities description.

These changes improve documentation clarity and maintain consistency.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250522074413.3634446-1-alok.a.tiwari@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27 12:17:39 +02:00
Paolo Abeni fdb061195f ipsec-next-2025-05-23
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmgwJa4ACgkQrB3Eaf9P
 W7d34A//V3NukN6UNAUKd+MbH80eXCEbNSNIuVUstfr0S71qTCxovLX58u+oQztb
 43mx/NsnF38TzNFWVyVzF4vcr/n0DS/Da3P5pJEjoewIYSDrz/WfOum6VpVIUsZ/
 kLCDZlIoX/fBPFZDPHMmsDXDemAdrtr8CuK72NUH10vKDuGKSUG0NElqDieDBEsA
 y/fqgBsyxQXi9cMdRxf+DLDK/hzqyaJmVj8B8WEcFtYXJ4RE6+jfLgAaTE6J7V5W
 fYACTu/IcdtgEEm2U7wlow66oIjqqGReuWUzV9zHGJNCB9+da6L4dbGtzlRmOPdn
 kI1PIALFWT2HbKnJOJJbaThO6zES1rMOm3PsWt7iVewCT8HuhAa9kDV0xzdcLQE1
 +REfo8dXW9f5hRUrSuqpJFUArkCHWHLhQEcmTHaF0b2RveC/hd9rOyKIfae+fgIP
 5uLU2DpwafDgw5UCjsQTLyQ5M6icO8wFgM7vKAUJWyI1Pck1ktf7Ic6+KQRNjWiv
 Q7ImwpSdLH2bZpIbIKDnIcyZg3CMBIQ88cdsYi0+ckgDQ0hMf6ZrXRseXKRe0P/M
 gKgBOoXIJBF7niJQTDqHjsmnYGvvhZysIJNQLf4BZFYOeF5L9OduP6ywqMe5pFKt
 QAsJSZw/+SibheLEYQAzvyLD6VdMXaxeOAHlPylRRpl9vEX0l04=
 =GRVJ
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Remove some unnecessary strscpy_pad() size arguments.
   From Thorsten Blum.

2) Correct use of xso.real_dev on bonding offloads.
   Patchset from Cosmin Ratiu.

3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
   From Chiachang Wang.

4) Refactor migration setup during cloning. This was
   done after the clone was created. Now it is done
   in the cloning function itself.
   From Chiachang Wang.

5) Validate assignment of maximal possible SEQ number.
   Prevent from setting to the maximum sequrnce number
   as this would cause for traffic drop.
   From Leon Romanovsky.

6) Prevent configuration of interface index when offload
   is used. Hardware can't handle this case.i
   From Leon Romanovsky.

7) Always use kfree_sensitive() for SA secret zeroization.
   From Zilin Guan.

ipsec-next-2025-05-23

* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: use kfree_sensitive() for SA secret zeroization
  xfrm: prevent configuration of interface index when offload is used
  xfrm: validate assignment of maximal possible SEQ number
  xfrm: Refactor migration setup during the cloning process
  xfrm: Migrate offload configuration
  bonding: Fix multiple long standing offload races
  bonding: Mark active offloaded xfrm_states
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  xfrm: Remove unneeded device check from validate_xmit_xfrm
  xfrm: Use xdo.dev instead of xdo.real_dev
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Remove unnecessary strscpy_pad() size arguments
====================

Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 18:32:48 +02:00
Fan Gong 17fcb3dc12 hinic3: module initialization and tx/rx logic
This is [1/3] part of hinic3 Ethernet driver initial submission.
With this patch hinic3 is a valid kernel module but non-functional
driver.

The driver parts contained in this patch:
Module initialization.
PCI driver registration but with empty id_table.
Auxiliary driver registration.
Net device_ops registration but open/stop are empty stubs.
tx/rx logic.

All major data structures of the driver are fully introduced with the
code that uses them but without their initialization code that requires
management interface with the hw.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Co-developed-by: Gur Stavi <gur.stavi@huawei.com>
Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Link: https://patch.msgid.link/76a137ffdfe115c737c2c224f0c93b60ba53cc16.1747736586.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21 20:31:42 -07:00
Eric Dumazet 572be9bf9d tcp: increase tcp_rmem[2] to 32 MB
Last change to tcp_rmem[2] happened in 2012, in commit b49960a05e
("tcp: change tcp_adv_win_scale and tcp_rmem[2]")

TCP performance on WAN is mostly limited by tcp_rmem[2] for receivers.

After this series improvements, it is time to increase the default.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250513193919.1089692-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15 11:30:09 -07:00
Eric Dumazet 9ea3bfa61b tcp: increase tcp_limit_output_bytes default value to 4MB
Last change happened in 2018 with commit c73e5807e4
("tcp: tsq: no longer use limit_output_bytes for paced flows")

Modern NIC speeds got a 4x increase since then.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250513193919.1089692-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15 11:30:09 -07:00
Jakub Kicinski bebd7b2626 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc7).

Conflicts:

tools/testing/selftests/drivers/net/hw/ncdevmem.c
  97c4e094a4 ("tests/ncdevmem: Fix double-free of queue array")
  2f1a805f32 ("selftests: ncdevmem: Implement devmem TCP TX")
https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au

Adjacent changes:

net/core/devmem.c
net/core/devmem.h
  0afc44d8cd ("net: devmem: fix kernel panic when netlink socket close after module unload")
  bd61848900 ("net: devmem: Implement TX path")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15 11:28:30 -07:00
Lee Trager 82534f446d eth: fbnic: Add devlink dev flash support
Add support to update the CMRT and control firmware as well as the UEFI
driver on fbnic using devlink dev flash.

Make sure the shutdown / quiescence paths like suspend take the devlink
lock to prevent them from interrupting the FW flashing process.

Signed-off-by: Lee Trager <lee@trager.us>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250512190109.2475614-6-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15 12:59:18 +02:00
Alper Ak 4abc1f14e2 documentation: networking: devlink: Fix a typo in devlink-trap.rst
Fix a typo in the documentation: "errorrs" -> "errors".

Signed-off-by: Alper Ak <alperyasinak1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250513092451.22387-1-alperyasinak1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14 18:51:58 -07:00
Vladimir Oltean d5c17e3654 docs: networking: timestamping: improve stacked PHC sentence
The first paragraph makes no grammatical sense. I suppose a portion of
the intended sentece is missing: "[The challenge with ] stacked PHCs
(...) is that they uncover bugs".

Rephrase, and at the same time simplify the structure of the sentence a
little bit, it is not easy to follow.

Fixes: 94d9f78f4d ("docs: networking: timestamping: add section for stacked PHC devices")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/20250512131751.320283-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-13 15:40:54 -07:00
Mina Almasry 383faec0fd net: enable driver support for netmem TX
Drivers need to make sure not to pass netmem dma-addrs to the
dma-mapping API in order to support netmem TX.

Add helpers and netmem_dma_*() helpers that enables special handling of
netmem dma-addrs that drivers can use.

Document in netmem.rst what drivers need to do to support netmem TX.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-7-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-13 11:12:49 +02:00
Mina Almasry 17af8cc06a net: add devmem TCP TX documentation
Add documentation outlining the usage and details of the devmem TCP TX
API.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-6-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-13 11:12:48 +02:00
Christoph Lameter (Ampere) 786d5cc2b9 Update Christoph's Email address and make it consistent
Use cl@gentwo.org throughout and remove the old email addresses.

Link: https://lkml.kernel.org/r/8b962f57-4d98-cbb0-cd82-b6ba456733e8@gentwo.org
Signed-off-by: Christoph Lameter <cl@gentwo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:31 -07:00
Jakub Kicinski 08906eacce Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf)

For igb:
Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and
adds persistent NAPI config. Lastly, he removes undesired IRQs that
occur while busy polling.

For igc:
Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the
current implementation with TAPRIO.

For ixgbe:
Jedrzej adds separate ethtool ops for E610 devices to account for device
differences.

Slawomir adds devlink region support for E610 devices.

For idpf:
Mateusz assigns and utilizes the ptype field out of libeth_rqe_info.

Michal removes unreachable code.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: remove unreachable code from setting mailbox
  idpf: assign extracted ptype to struct libeth_rqe_info field
  ixgbe: devlink: add devlink region support for E610
  ixgbe: add E610 .set_phys_id() callback implementation
  ixgbe: apply different rules for setting FC on E610
  ixgbe: add support for ACPI WOL for E610
  ixgbe: create E610 specific ethtool_ops structure
  igc: Change Tx mode for MQPRIO offloading
  igc: Limit netdev_tc calls to MQPRIO
  igb: Get rid of spurious interrupts
  igb: Add support for persistent NAPI config
  igb: Link queues to NAPI instances
  igb: Link IRQs to NAPI instances
====================

Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01 17:51:31 -07:00
Jakub Kicinski 1f773970a7 netfilter pull request 25-04-29
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmgP++oACgkQ1w0aZmrP
 KyHxVw/5ASwm5WPt7qRgSLKlbIzUbTQ5a2CEC7WS6ova5M0nvxFL6f/JACeK3tTt
 nC+2+Vx89PFYB1fcGE9MfrZBFf/Ccm9/leluKtYJ0l7VnhLAxAByn5EbExhTFPeP
 YA3neewU62x8Cm3JQ6pZrFQYjz7pEGfSgjJ6E2dncIDMcjQzFAnDxEAB8VDqzFde
 yAosgA3yJtvan2otuqAqg33i0Q32FM0rhFplIB0pvJpbq24YVs1vTupSp9xEAeDu
 Ms5dcm2fN1L2HEgt2xhDrhVLEIp3CDZpC0k9hf7eJkonm03nCBlfqUd6gPFwdz0j
 YoHziIDO4zdtefw4Z1Bd31L5BgRyEZkbf6E8qWqCb5hpauKMKNwrtaa2qAMYqWmF
 UQuGXCXkIJVTfwxrCk1zyYKRhc0ECXpah65lB7RwxlAayDWTvEGKdcWKs9Hk5jx3
 hHgxfqO+nRfUjzGap5bWpOt18mfwZn287XSWFK6Ko6m/6UgzORfEEby/YoDUELKL
 xZBWy50+fauSkhFpQXt+IRT5rirmfMYv9hBBweKBwK05txMR4JdTNcI+zZxLMwnK
 sDbRjUsE1Req2ZnJPF33PcbkbvHj/B8m3awPxt2C3/kDK5WpKL3986Bj0hIJ25iM
 12BD5iSfOTGIZJCdRMW7SAoI1h3Z7+cePF4UjxiWS7jCbKE6u2E=
 =tp12
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Replace msecs_to_jiffies() by secs_to_jiffies(), from Easwar Hariharan.

2) Allow to compile xt_cgroup with cgroupsv2 support only,
   from Michal Koutny.

3) Prepare for sock_cgroup_classid() removal by wrapping it around
   ifdef, also from Michal Koutny.

4) Remove redundant pointer fetch on conntrack template, from Xuanqiang Luo.

5) Re-format one block in the tproxy documentation for consistency,
   from Chen Linxuan.

6) Expose set element count and type via netlink attributes,
   from Florian Westphal.

* tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: export set count and backend name to userspace
  docs: tproxy: fix formatting for nft code block
  netfilter: conntrack: Remove redundant NFCT_ALIGN call
  net: cgroup: Guard users of sock_cgroup_classid()
  netfilter: xt_cgroup: Make it independent from net_cls
  netfilter: xt_IDLETIMER: convert timeouts to secs_to_jiffies()
====================

Link: https://patch.msgid.link/20250428221254.3853-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29 16:31:10 -07:00
Slawomir Mrozowicz fe259a1bb2 ixgbe: devlink: add devlink region support for E610
Provide support for the following devlink cmds:
 -DEVLINK_CMD_REGION_GET
 -DEVLINK_CMD_REGION_NEW
 -DEVLINK_CMD_REGION_DEL
 -DEVLINK_CMD_REGION_READ

ixgbe devlink region implementation, similarly to the ice one,
lets user to create snapshots of content of Non Volatile Memory,
content of Shadow RAM, and capabilities of the device.

For both NVM and SRAM regions provide .read() handler to let user
read their contents without the need to create full snapshots.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29 15:13:44 -07:00
MD Danish Anwar 0d15a26b24 net: ti: icssg-prueth: Add ICSSG FW Stats
The ICSSG firmware maintains set of stats called PA_STATS.
Currently the driver only dumps 4 stats. Add support for dumping more
stats.

The offset for different stats are defined as MACROs in icssg_switch_map.h
file. All the offsets are for Slice0. Slice1 offsets are slice0 + 4.
The offset calculation is taken care while reading the stats in
emac_update_hardware_stats().

The statistics are documented in
Documentation/networking/device_drivers/icssg_prueth.rst

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20250424095316.2643573-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28 17:20:53 -07:00
Chen Linxuan 149a133a54 docs: tproxy: fix formatting for nft code block
The nft command snippet for redirecting traffic isn't formatted
in a literal code block like the rest of snippets.
Fix the formatting inconsistency.

Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-29 00:00:26 +02:00
Dr. David Alan Gilbert 39144062ea rxrpc: Remove deadcode
Remove three functions that are no longer used.

rxrpc_get_txbuf() last use was removed by 2020's
commit 5e6ef4f101 ("rxrpc: Make the I/O thread take over the call and
local processor work")

rxrpc_kernel_get_epoch() last use was removed by 2020's
commit 44746355cc ("afs: Don't get epoch from a server because it may be
ambiguous")

rxrpc_kernel_set_max_life() last use was removed by 2023's
commit db099c625b ("rxrpc: Fix timeout of a call that hasn't yet been
granted a channel")

Both of the rxrpc_kernel_* functions were documented.  Remove that
documentation as well as the code.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20250422235147.146460-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-24 17:03:45 -07:00
Jakub Kicinski 22cbc1ee26 netdev: fix the locking for netdev notifications
Kuniyuki reports that the assert for netdev lock fires when
there are netdev event listeners (otherwise we skip the netlink
event generation).

Correct the locking when coming from the notifier.

The NETDEV_XDP_FEAT_CHANGE notifier is already fully locked,
it's the documentation that's incorrect.

Fixes: 99e44f39a8 ("netdev: depend on netdev->lock for xdp features")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/20250410171019.62128-1-kuniyu@amazon.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250416030447.1077551-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17 18:55:14 -07:00
Jakub Kicinski 4e34a84061 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
ixgbe: Add basic devlink support

Jedrzej Jagielski says:

Create devlink specific directory for more convenient future feature
development.

Flashing and reloading are supported only by E610 devices.

Introduce basic FW/NVM validation since devlink reload introduces
possibility of runtime NVM update. Check FW API version, FW recovery
mode and FW rollback mode. Introduce minimal recovery probe to let
user to reload the faulty FW when recovery mode is detected.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbe: add support for FW rollback mode
  ixgbe: add E610 implementation of FW recovery mode
  ixgbe: add FW API version check
  ixgbe: add support for devlink reload
  ixgbe: add device flash update via devlink
  ixgbe: extend .info_get() with stored versions
  ixgbe: add E610 functions getting PBA and FW ver info
  ixgbe: add .info_get extension specific for E610 devices
  ixgbe: read the netlist version information
  ixgbe: read the OROM version information
  ixgbe: add E610 functions for acquiring flash data
  ixgbe: add handler for devlink .info_get()
  ixgbe: add initial devlink support
  ixgbe: wrap netdev_priv() usage
  devlink: add value check to devlink_info_version_put()
====================

Link: https://patch.msgid.link/20250415221301.1633933-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16 19:05:21 -07:00
Jakub Kicinski 49593c298c docs: networking: clarify intended audience of netdevices.rst
The netdevices doc is dangerously broad. At least make it clear
that it's intended for developers, not for users.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250415172653.811147-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16 18:30:14 -07:00
Cosmin Ratiu 43eca05b6a xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
Previously, device driver IPSec offload implementations would fall into
two categories:
1. Those that used xso.dev to determine the offload device.
2. Those that used xso.real_dev to determine the offload device.

The first category didn't work with bonding while the second did.
In a non-bonding setup the two pointers are the same.

This commit adds explicit pointers for the offload netdevice to
.xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free()
which eliminates the confusion and allows drivers from the first
category to work with bonding.

xso.real_dev now becomes a private pointer managed by the bonding
driver.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16 11:01:41 +02:00
Jedrzej Jagielski c9e563cae1 ixgbe: add support for devlink reload
The E610 adapters contain an embedded chip with firmware which can be
updated using devlink flash. The firmware which runs on this chip is
referred to as the Embedded Management Processor firmware (EMP
firmware).

Activating the new firmware image currently requires that the system be
rebooted. This is not ideal as rebooting the system can cause unwanted
downtime.

The EMP firmware itself can be reloaded by issuing a special update
to the device called an Embedded Management Processor reset (EMP
reset). This reset causes the device to reset and reload the EMP
firmware.

Implement support for devlink reload with the "fw_activate" flag. This
allows user space to request the firmware be activated immediately.

Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Co-developed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Co-developed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15 07:36:33 -07:00
Jedrzej Jagielski a0f45672d5 ixgbe: add device flash update via devlink
Use the pldmfw library to implement device flash update for
the Intel ixgbe networking device driver specifically for E610 devices.
This support uses the devlink flash update interface.

Using the pldmfw library, the provided firmware file will be scanned for
the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for
the main NVM module containing the primary device firmware, and
"fw.netlist" containing the netlist module.

The flash is separated into two banks, the active bank containing the
running firmware, and the inactive bank which we use for update. Each
module is updated in a staged process. First, the inactive bank is
erased, preparing the device for update. Second, the contents of the
component are copied to the inactive portion of the flash. After all
components are updated, the driver signals the device to switch the
active bank during the next EMP reset.

With this implementation, basic flash update for the E610 hardware is
supported.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Co-developed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Co-developed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15 07:36:32 -07:00
Jedrzej Jagielski 8210ff7380 ixgbe: add .info_get extension specific for E610 devices
E610 devices give possibility to show more detailed info than the previous
boards.
Extend reporting NVM info with following pieces:
 fw.mgmt.api -> version number of the API
 fw.mgmt.build -> identifier of the source for the FW
 fw.mgmt.srev -> number defining FW's security revision
 fw.psid.api -> version defining the format of the flash contents
 fw.undi.srev -> number defining OROM's security revision
 fw.netlist -> version of the netlist module
 fw.netlist.build -> first 4 bytes of the netlist hash

Co-developed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Co-developed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15 07:36:32 -07:00
Jedrzej Jagielski f6b588af3d ixgbe: add handler for devlink .info_get()
Provide devlink .info_get() callback implementation to allow the
driver to report detailed version information. The following info
is reported:

 "serial_number" -> The PCI DSN of the adapter
 "fw.bundle_id" -> Unique identifier for the combined flash image
 "fw.undi" -> Version of the Option ROM containing the UEFI driver
 "board.id" -> The PBA ID string

Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15 07:36:32 -07:00
Jedrzej Jagielski a0285236ab ixgbe: add initial devlink support
Add an initial support for devlink interface to ixgbe driver.

Similarly to i40e driver the implementation doesn't enable
devlink to manage device-wide configuration. Devlink instance
is created for each physical function of PCIe device.

Create separate directory for devlink related ixgbe files
and use naming scheme similar to the one used in the ice driver.

Add a stub for Documentation, to be extended by further patches.

Change struct ixgbe_adapter allocation to be done by devlink (Przemek),
as suggested by Jiri.

Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Co-developed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15 07:36:32 -07:00
Mohsin Bashir f2957147ae eth: fbnic: add support for TTI HW stats
Add coverage for the TX Extension (TEI) Interface (TTI) stats. We are
tracking packets and control message drops because of credit exhaustion
on the TX interface.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15 11:23:13 +02:00
Mohsin Bashir 5f8bd2ce82 eth: fbnic: add support for TMI stats
This patch add coverage for TMI stats including PTP stats and drop
stats.

PTP stats include illegal requests, bad timestamp and good timestamps.
The bad timestamp and illegal request counters are reported under as
`error` via `ethtool -T` Both these counters are individually being
reported via `ethtool -S`

The good timestamp stats are being reported as `pkts` via `ethtool -T`

ethtool -S eth0 | grep "ptp"
     ptp_illegal_req: 0
     ptp_good_ts: 0
     ptp_bad_ts: 0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15 11:23:13 +02:00
Mohsin Bashir 986c63a029 eth: fbnic: add coverage for RXB stats
This patch provides coverage to the RXB (RX Buffer) stats. RXB stats
are divided into 3 sections: RXB enqueue, RXB FIFO, and RXB dequeue
stats.

The RXB enqueue/dequeue stats are indexed from 0-3 and cater for the
input/output counters whereas, the RXB fifo stats are indexed from 0-7.

The RXB also supports pause frame stats counters which we are leaving
for a later patch.

ethtool -S eth0 | grep rxb
     rxb_integrity_err0: 0
     rxb_mac_err0: 0
     rxb_parser_err0: 0
     rxb_frm_err0: 0
     rxb_drbo0_frames: 1433543
     rxb_drbo0_bytes: 775949081
     ---
     ---
     rxb_intf3_frames: 1195711
     rxb_intf3_bytes: 739650210
     rxb_pbuf3_frames: 1195711
     rxb_pbuf3_bytes: 765948092

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15 11:23:13 +02:00
Mohsin Bashir 8f20a2bfa4 eth: fbnic: add coverage for hw queue stats
This patch provides support for hardware queue stats and covers
packet errors for RX-DMA engine, RCQ drops and BDQ drops.

The packet errors are also aggregated with the `rx_errors` stats in the
`rtnl_link_stats` as well as with the `hw_drops` in the queue API.

The RCQ and BDQ drops are aggregated with `rx_over_errors` in the
`rtnl_link_stats` as well as with the `hw_drop_overruns` in the queue API.

ethtool -S eth0 | grep -E 'rde'
     rde_0_pkt_err: 0
     rde_0_pkt_cq_drop: 0
     rde_0_pkt_bdq_drop: 0
     ---
     ---
     rde_127_pkt_err: 0
     rde_127_pkt_cq_drop: 0
     rde_127_pkt_bdq_drop: 0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-3-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-15 11:23:13 +02:00
David Howells d03539d5c2 rxrpc: Display security params in the afs_cb_call tracepoint
Make the afs_cb_call tracepoint display some security parameters to make
debugging easier.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-12-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 17:36:42 -07:00
David Howells 9d1d2b5934 rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
Implement the basic parts of the yfs-rxgk security class (security index 6)
to support GSSAPI-negotiated security.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-9-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 17:36:42 -07:00
David Howells 5800b1cf3f rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE
Allow the app to request that CHALLENGEs be passed to it through an
out-of-band queue that allows recvmsg() to pick it up so that the app can
add data to it with sendmsg().

This will allow the application (AFS or userspace) to interact with the
process if it wants to and put values into user-defined fields.  This will
be used by AFS when talking to a fileserver to supply that fileserver with
a crypto key by which callback RPCs can be encrypted (ie. notifications
from the fileserver to the client).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-5-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 17:36:41 -07:00
David Howells 28a79fc9b0 rxrpc: kdoc: Update function descriptions and add link from rxrpc.rst
Update the kerneldoc function descriptions to add "Return:" sections for
AF_RXRPC exported functions that have return values to stop the kdoc
builder from throwing warnings.

Also add links from the rxrpc.rst API doc to add a function API reference
at the end.  (Note that the API doc really needs updating, but that's
beyond this patchset).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 17:36:40 -07:00
Jakub Kicinski 097f171f98 net: convert dev->rtnl_link_state to a bool
netdevice reg_state was split into two 16 bit enums back in 2010
in commit a2835763e1 ("rtnetlink: handle rtnl_link netlink
notifications manually"). Since the split the fields have been
moved apart, and last year we converted reg_state to a normal
u8 in commit 4d42b37def ("net: convert dev->reg_state to u8").

rtnl_link_state being a 16 bitfield makes no sense. Convert it
to a single bool, it seems very unlikely after 15 years that
we'll need more values in it.

We could drop dev->rtnl_link_ops from the conditions but feels
like having it there more clearly points at the reason for this
hack.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250410014246.780885-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 14:29:54 -07:00
Kuniyuki Iwashima 2a63dd0edf net: Retire DCCP socket.
DCCP was orphaned in 2021 by commit 054c4610bd ("MAINTAINERS: dccp:
move Gerrit Renker to CREDITS"), which noted that the last maintainer
had been inactive for five years.

In recent years, it has become a playground for syzbot, and most changes
to DCCP have been odd bug fixes triggered by syzbot.  Apart from that,
the only changes have been driven by treewide or networking API updates
or adjustments related to TCP.

Thus, in 2023, we announced we would remove DCCP in 2025 via commit
b144fcaf46 ("dccp: Print deprecation notice.").

Since then, only one individual has contacted the netdev mailing list. [0]

There is ongoing research for Multipath DCCP.  The repository is hosted
on GitHub [1], and development is not taking place through the upstream
community.  While the repository is published under the GPLv2 license,
the scheduling part remains proprietary, with a LICENSE file [2] stating:

  "This is not Open Source software."

The researcher mentioned a plan to address the licensing issue, upstream
the patches, and step up as a maintainer, but there has been no further
communication since then.

Maintaining DCCP for a decade without any real users has become a burden.

Therefore, it's time to remove it.

Removing DCCP will also provide significant benefits to TCP.  It allows
us to freely reorganize the layout of struct inet_connection_sock, which
is currently shared with DCCP, and optimize it to reduce the number of
cachelines accessed in the TCP fast path.

Note that we keep DCCP netfilter modules as requested.  [3]

Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0]
Link: https://github.com/telekom/mp-dccp #[1]
Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2]
Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11 18:58:10 -07:00
Jiayuan Chen c449d5f3a3 tcp: add LINUX_MIB_PAWS_TW_REJECTED counter
When TCP is in TIME_WAIT state, PAWS verification uses
LINUX_PAWSESTABREJECTED, which is ambiguous and cannot be distinguished
from other PAWS verification processes.

We added a new counter, like the existing PAWS_OLD_ACK one.

Also we update the doc with previously missing PAWS_OLD_ACK.

usage:
'''
nstat -az | grep PAWSTimewait
TcpExtPAWSTimewait              1                  0.0
'''

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250409112614.16153-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 18:29:26 -07:00
Jakub Kicinski cb7103298d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc2).

Conflict:

Documentation/networking/netdevices.rst
net/core/lock_debug.c
  04efcee6ef ("net: hold instance lock during NETDEV_CHANGE")
  03df156dd3 ("xdp: double protect netdev->xdp_flags with netdev->lock")

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 16:51:07 -07:00
Jakub Kicinski ce7b149474 netdev: depend on netdev->lock for qstats in ops locked drivers
We mostly needed rtnl_lock in qstat to make sure the queue count
is stable while we work. For "ops locked" drivers the instance
lock protects the queue count, so we don't have to take rtnl_lock.

For currently ops-locked drivers: netdevsim and bnxt need
the protection from netdev going down while we dump, which
instance lock provides. gve doesn't care.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 17:01:52 -07:00
Jakub Kicinski 87eba404f2 docs: netdev: break down the instance locking info per ops struct
Explicitly list all the ops structs and what locking they provide.
Use "ops locked" as a term for drivers which have ops called under
the instance lock.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250408195956.412733-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 17:01:52 -07:00
Jakub Kicinski 03df156dd3 xdp: double protect netdev->xdp_flags with netdev->lock
Protect xdp_features with netdev->lock. This way pure readers
no longer have to take rtnl_lock to access the field.

This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
Looks like that's fine for bonding, the only "real" listener,
it's the same as ethtool feature change.

In terms of normal drivers - only GVE need special consideration
(other drivers don't use instance lock or don't support XDP).
It calls xdp_set_features_flag() helper from gve_init_priv() which
in turn is called from gve_reset_recovery() (locked), or prior
to netdev registration. So switch to _locked.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250408195956.412733-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 17:01:52 -07:00
Stanislav Fomichev 04efcee6ef net: hold instance lock during NETDEV_CHANGE
Cosmin reports an issue with ipv6_add_dev being called from
NETDEV_CHANGE notifier:

[ 3455.008776]  ? ipv6_add_dev+0x370/0x620
[ 3455.010097]  ipv6_find_idev+0x96/0xe0
[ 3455.010725]  addrconf_add_dev+0x1e/0xa0
[ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
[ 3455.013537]  addrconf_notify+0x35f/0x8d0
[ 3455.014214]  notifier_call_chain+0x38/0xf0
[ 3455.014903]  netdev_state_change+0x65/0x90
[ 3455.015586]  linkwatch_do_dev+0x5a/0x70
[ 3455.016238]  rtnl_getlink+0x241/0x3e0
[ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0

Similarly, linkwatch might get to ipv6_add_dev without ops lock:
[ 3456.656261]  ? ipv6_add_dev+0x370/0x620
[ 3456.660039]  ipv6_find_idev+0x96/0xe0
[ 3456.660445]  addrconf_add_dev+0x1e/0xa0
[ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
[ 3456.661803]  addrconf_notify+0x35f/0x8d0
[ 3456.662236]  notifier_call_chain+0x38/0xf0
[ 3456.662676]  netdev_state_change+0x65/0x90
[ 3456.663112]  linkwatch_do_dev+0x5a/0x70
[ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
[ 3456.663990]  linkwatch_event+0x21/0x30
[ 3456.664399]  process_one_work+0x211/0x610
[ 3456.664828]  worker_thread+0x1cc/0x380
[ 3456.665691]  kthread+0xf4/0x210

Reclassify NETDEV_CHANGE as a notifier that consistently runs under the
instance lock.

Link: https://lore.kernel.org/netdev/aac073de8beec3e531c86c101b274d434741c28e.camel@nvidia.com/
Reported-by: Cosmin Ratiu <cratiu@nvidia.com>
Tested-by: Cosmin Ratiu <cratiu@nvidia.com>
Fixes: ad7c7b2172 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250404161122.3907628-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-07 11:13:39 -07:00
Stanislav Fomichev ee705fa21f docs: net: document netdev notifier expectations
We don't have a consistent state yet, but document where we think
we are and where we wanna be.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-8-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Linus Torvalds ca0b04ba0b for-6.15/io_uring-rx-zc-20250325
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTP8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm6oEACnpGL52FAKTVj14GDqFo6Pq0Jmnh07x8qj
 mpHFPwxfWAzRiuNyji2iS9ecS2cnlkixNyMWZipXRi4KJAUjJH6YDd7IofUI3Glf
 6v7b6srFSvsWJIJ8LdkJHLHAJuzYnJvFZ8apwgQczEDqgHq7BAunM1sVQ+mydjYk
 EXT4kN6DSBOPzwr9GAay52f8nXhbqdHfT+YTGHPHg+QToojL6gD7vvW57w/QqD/x
 91hJef1z01cSIsDZOxA0EUeD+9bBAHpoamr/e3IOOCVYCN6hy0dGa9g0QGbbpVyE
 AeU4FGZLV9J8OOfvHVraDt5Wn3IXxYaMu22dSn1S6tVinwjXhJR2LAA+t4fGHAkt
 i38LjOsIbopSQn/cNhzwC8UZcHLqnVsdDolHlHzsVFVfcpck2/4JFpUeP8QhWgrk
 f9tY12QUf/oEaWm0/sUCHZNFxpIGeFA5FFXf0Z92clnzBuiuWoesBNvxqY/2DeZn
 IDNXiv+Trxr6kFEjTpzPeuxbWrn4PJ7afQSAFcEmOCguk5riM+zJZNIKg0TxUHSS
 tt6sfxmTP1DhgDKad5kT3MLyzOcx47Kbjf4dj6KmRnD+3DGwwN2F7X7R1GJylPSp
 RLOzJ+Ouuy9UmBN6JMsT4BmR9+FJTVirADU926d/ZqCTtRV8Tnq/6HHmKmmr4CR0
 THJ0PJqQjg==
 =MOve
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux

Pull io_uring zero-copy receive support from Jens Axboe:
 "This adds support for zero-copy receive with io_uring, enabling fast
  bulk receive of data directly into application memory, rather than
  needing to copy the data out of kernel memory.

  While this version only supports host memory as that was the initial
  target, other memory types are planned as well, with notably GPU
  memory coming next.

  This work depends on some networking components which were queued up
  on the networking side, but have now landed in your tree.

  This is the work of Pavel Begunkov and David Wei. From the v14 posting:

    'We configure a page pool that a driver uses to fill a hw rx queue
     to hand out user pages instead of kernel pages. Any data that ends
     up hitting this hw rx queue will thus be dma'd into userspace
     memory directly, without needing to be bounced through kernel
     memory. 'Reading' data out of a socket instead becomes a
     _notification_ mechanism, where the kernel tells userspace where
     the data is. The overall approach is similar to the devmem TCP
     proposal

     This relies on hw header/data split, flow steering and RSS to
     ensure packet headers remain in kernel memory and only desired
     flows hit a hw rx queue configured for zero copy. Configuring this
     is outside of the scope of this patchset.

     We share netdev core infra with devmem TCP. The main difference is
     that io_uring is used for the uAPI and the lifetime of all objects
     are bound to an io_uring instance. Data is 'read' using a new
     io_uring request type. When done, data is returned via a new shared
     refill queue. A zero copy page pool refills a hw rx queue from this
     refill queue directly. Of course, the lifetime of these data
     buffers are managed by io_uring rather than the networking stack,
     with different refcounting rules.

     This patchset is the first step adding basic zero copy support. We
     will extend this iteratively with new features e.g. dynamically
     allocated zero copy areas, THP support, dmabuf support, improved
     copy fallback, general optimisations and more'

  In a local setup, I was able to saturate a 200G link with a single CPU
  core, and at netdev conf 0x19 earlier this month, Jamal reported
  188Gbit of bandwidth using a single core (no HT, including soft-irq).

  Safe to say the efficiency is there, as bigger links would be needed
  to find the per-core limit, and it's considerably more efficient and
  faster than the existing devmem solution"

* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
2025-03-28 13:45:52 -07:00
Linus Torvalds 7b667acd69 powerpc updates for 6.15
- Removal of support for IBM Cell Blades
 
  - SMP support for microwatt platform
 
  - Support for inline static calls on PPC32
 
  - Enable pmu selftests for power11 platform
 
  - Enable hardware trace macro (HTM) hcall support
 
  - Support for limited address mode capability
 
  - Changes to RMA size from 512 MB to 768 MB to handle fadump
 
  - Misc fixes and cleanups
 
 Thanks to: Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
 Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav
 Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar,
 Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy,
 Segher Boessenkool, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmfjZnYACgkQpnEsdPSH
 ZJTmKg/+NGW7wyFY9d8Iai9ncYY7GSzsMSDTaan7qg0QWOd5gjHbsdbava7TM/DW
 8p9XsC+17kSeftRNUjtc52bSN8Ei2gBdsXIagQG1alfB2X2e6wkNauifK+dz3Su6
 usMEZZTO5R/jFDotFXNM1nsUj+8dvjnPgOUrji/P8k7PT5295wpza0hz1fy5SrOA
 hM5cliBP36UgFe5Efvgm4OUX2gQIhbc3stt9MVfymW/k0Mit5f41UIPuVGiTWowY
 s0cUJGkhxUlGXT3VfOVKuZfn4u9KMha7UCl9afSceJzXOdnUIKIbskui1VEv6cD/
 iSIxi839uErAobFHlsLYprgYFciYLII3xe2qNZCA/ZxeIMS/Mm6xokESeWLhBnfa
 P7ke6l0z3GDtTvgI2eSeU9BdrVveF1NgbP9GYSKgT6gtw/kRRnxgHF8tzmLON5PT
 KXpQlzz8VuSBRtF2jnLFU89+FFwSA1bRUhDrp89HyYFqw1B5g4N7kFFTUJWHOuKS
 fwPGy+cveKehmCUBedeTRFqHvvqdwpD/WnPlQzCly3WxqdL8U/eTXYftMiAwuK28
 ovLuSs3vRThKRQ8DnUa5oB0UGsjMpRV5LdvYkhw+x8mZKUR59oj4fx2ae4TtPakg
 dbAYuPPkCORdaSga/nV6vQgsLprFpcGX3dq6E19+BVBAY5D+1PE=
 =GFcj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Madhavan Srinivasan:

 - Remove support for IBM Cell Blades

 - SMP support for microwatt platform

 - Support for inline static calls on PPC32

 - Enable pmu selftests for power11 platform

 - Enable hardware trace macro (HTM) hcall support

 - Support for limited address mode capability

 - Changes to RMA size from 512 MB to 768 MB to handle fadump

 - Misc fixes and cleanups

Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.

* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
  powerpc/kexec: fix physical address calculation in clear_utlb_entry()
  crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
  powerpc: Fix 'intra_function_call not a direct call' warning
  powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
  KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
  powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
  powerpc/microwatt: Add SMP support
  powerpc: Define config option for processors with broadcast TLBIE
  powerpc/microwatt: Define an idle power-save function
  powerpc/microwatt: Device-tree updates
  powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
  net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
  net: spider_net: Remove powerpc Cell driver
  cpufreq: ppc_cbe: Remove powerpc Cell driver
  genirq: Remove IRQ_EDGE_EOI_HANDLER
  docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
  powerpc: Remove UDBG_RTAS_CONSOLE
  powerpc/io: Use standard barrier macros in io.c
  powerpc/io: Rename _insw_ns() etc.
  powerpc/io: Use generic raw accessors
  ...
2025-03-27 19:39:08 -07:00
Linus Torvalds 1a9239bb42 Networking changes for 6.15.
Core & protocols
 ----------------
 
  - Continue Netlink conversions to per-namespace RTNL lock
    (IPv4 routing, routing rules, routing next hops, ARP ioctls).
 
  - Continue extending the use of netdev instance locks. As a driver
    opt-in protect queue operations and (in due course) ethtool
    operations with the instance lock and not RTNL lock.
 
  - Support collecting TCP timestamps (data submitted, sent, acked)
    in BPF, allowing for transparent (to the application) and lower
    overhead tracking of TCP RPC performance.
 
  - Tweak existing networking Rx zero-copy infra to support zero-copy
    Rx via io_uring.
 
  - Optimize MPTCP performance in single subflow mode by 29%.
 
  - Enable GRO on packets which went thru XDP CPU redirect (were queued
    for processing on a different CPU). Improving TCP stream performance
    up to 2x.
 
  - Improve performance of contended connect() by 200% by searching
    for an available 4-tuple under RCU rather than a spin lock.
    Bring an additional 229% improvement by tweaking hash distribution.
 
  - Avoid unconditionally touching sk_tsflags on RX, improving
    performance under UDP flood by as much as 10%.
 
  - Avoid skb_clone() dance in ping_rcv() to improve performance under
    ping flood.
 
  - Avoid FIB lookup in netfilter if socket is available, 20% perf win.
 
  - Rework network device creation (in-kernel) API to more clearly
    identify network namespaces and their roles.
    There are up to 4 namespace roles but we used to have just 2 netns
    pointer arguments, interpreted differently based on context.
 
  - Use sysfs_break_active_protection() instead of trylock to avoid
    deadlocks between unregistering objects and sysfs access.
 
  - Add a new sysctl and sockopt for capping max retransmit timeout
    in TCP.
 
  - Support masking port and DSCP in routing rule matches.
 
  - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
 
  - Support specifying at what time packet should be sent on AF_XDP
    sockets.
 
  - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
 
  - Add Netlink YAML spec for WiFi (nl80211) and conntrack.
 
  - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
    which only need to be exported when IPv6 support is built as a module.
 
  - Age FDB entries based on Rx not Tx traffic in VxLAN, similar
    to normal bridging.
 
  - Allow users to specify source port range for GENEVE tunnels.
 
  - netconsole: allow attaching kernel release, CPU ID and task name
    to messages as metadata
 
 Driver API
 ----------
 
  - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
    the SW layers. Delegate the responsibilities to phylink where possible.
    Improve its handling in phylib.
 
  - Support symmetric OR-XOR RSS hashing algorithm.
 
  - Support tracking and preserving IRQ affinity by NAPI itself.
 
  - Support loopback mode speed selection for interface selftests.
 
 Device drivers
 --------------
 
  - Remove the IBM LCS driver for s390.
 
  - Remove the sb1000 cable modem driver.
 
  - Add support for SFP module access over SMBus.
 
  - Add MCTP transport driver for MCTP-over-USB.
 
  - Enable XDP metadata support in multiple drivers.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - add PCIe TLP Processing Hints (TPH) support for new AMD platforms
      - support dumping RoCE queue state for debug
      - opt into instance locking
    - Intel (100G, ice, idpf):
      - ice: rework MSI-X IRQ management and distribution
      - ice: support for E830 devices
      - iavf: add support for Rx timestamping
      - iavf: opt into instance locking
    - nVidia/Mellanox:
      - mlx4: use page pool memory allocator for Rx
      - mlx5: support for one PTP device per hardware clock
      - mlx5: support for 200Gbps per-lane link modes
      - mlx5: move IPSec policy check after decryption
    - AMD/Solarflare:
      - support FW flashing via devlink
    - Cisco (enic):
      - use page pool memory allocator for Rx
      - enable 32, 64 byte CQEs
      - get max rx/tx ring size from the device
    - Meta (fbnic):
      - support flow steering and RSS configuration
      - report queue stats
      - support TCP segmentation
      - support IRQ coalescing
      - support ring size configuration
    - Marvell/Cavium:
      - support AF_XDP
    - Wangxun:
      - support for PTP clock and timestamping
    - Huawei (hibmcge):
      - checksum offload
      - add more statistics
 
  - Ethernet virtual:
    - VirtIO net:
      - aggressively suppress Tx completions, improve perf by 96% with
        1 CPU and 55% with 2 CPUs
      - expose NAPI to IRQ mapping and persist NAPI settings
    - Google (gve):
      - support XDP in DQO RDA Queue Format
      - opt into instance locking
    - Microsoft vNIC:
      - support BIG TCP
 
  - Ethernet NICs consumer, and embedded:
    - Synopsys (stmmac):
      - cleanup Tx and Tx clock setting and other link-focused cleanups
      - enable SGMII and 2500BASEX mode switching for Intel platforms
      - support Sophgo SG2044
    - Broadcom switches (b53):
      - support for BCM53101
    - TI:
      - iep: add perout configuration support
      - icssg: support XDP
    - Cadence (macb):
      - implement BQL
    - Xilinx (axinet):
      - support dynamic IRQ moderation and changing coalescing at runtime
      - implement BQL
      - report standard stats
    - MediaTek:
      - support phylink managed EEE
    - Intel:
      - igc: don't restart the interface on every XDP program change
    - RealTek (r8169):
      - support reading registers of internal PHYs directly
      - increase max jumbo packet size on RTL8125/RTL8126
    - Airoha:
      - support for RISC-V NPU packet processing unit
      - enable scatter-gather and support MTU up to 9kB
    - Tehuti (tn40xx):
      - support cards with TN4010 MAC and an Aquantia AQR105 PHY
 
  - Ethernet PHYs:
    - support for TJA1102S, TJA1121
    - dp83tg720: add randomized polling intervals for link detection
    - dp83822: support changing the transmit amplitude voltage
    - support for LEDs on 88q2xxx
 
  - CAN:
    - canxl: support Remote Request Substitution bit access
    - flexcan: add S32G2/S32G3 SoC
 
  - WiFi:
    - remove cooked monitor support
    - strict mode for better AP testing
    - basic EPCS support
    - OMI RX bandwidth reduction support
    - batman-adv: add support for jumbo frames
 
  - WiFi drivers:
    - RealTek (rtw88):
      - support RTL8814AE and RTL8814AU
    - RealTek (rtw89):
      - switch using wiphy_lock and wiphy_work
      - add BB context to manipulate two PHY as preparation of MLO
      - improve BT-coexistence mechanism to play A2DP smoothly
    - Intel (iwlwifi):
      - add new iwlmld sub-driver for latest HW/FW combinations
    - MediaTek (mt76):
      - preparation for mt7996 Multi-Link Operation (MLO) support
    - Qualcomm/Atheros (ath12k):
      - continued work on MLO
    - Silabs (wfx):
      - Wake-on-WLAN support
 
  - Bluetooth:
    - add support for skb TX SND/COMPLETION timestamping
    - hci_core: enable buffer flow control for SCO/eSCO
    - coredump: log devcd dumps into the monitor
 
  - Bluetooth drivers:
    - intel: add support to configure TX power
    - nxp: handle bootloader error during cmd5 and cmd7
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
 Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
 OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
 DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
 IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
 xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
 cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
 8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
 z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
 00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
 eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
 P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
 =XY0S
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Continue Netlink conversions to per-namespace RTNL lock
     (IPv4 routing, routing rules, routing next hops, ARP ioctls)

   - Continue extending the use of netdev instance locks. As a driver
     opt-in protect queue operations and (in due course) ethtool
     operations with the instance lock and not RTNL lock.

   - Support collecting TCP timestamps (data submitted, sent, acked) in
     BPF, allowing for transparent (to the application) and lower
     overhead tracking of TCP RPC performance.

   - Tweak existing networking Rx zero-copy infra to support zero-copy
     Rx via io_uring.

   - Optimize MPTCP performance in single subflow mode by 29%.

   - Enable GRO on packets which went thru XDP CPU redirect (were queued
     for processing on a different CPU). Improving TCP stream
     performance up to 2x.

   - Improve performance of contended connect() by 200% by searching for
     an available 4-tuple under RCU rather than a spin lock. Bring an
     additional 229% improvement by tweaking hash distribution.

   - Avoid unconditionally touching sk_tsflags on RX, improving
     performance under UDP flood by as much as 10%.

   - Avoid skb_clone() dance in ping_rcv() to improve performance under
     ping flood.

   - Avoid FIB lookup in netfilter if socket is available, 20% perf win.

   - Rework network device creation (in-kernel) API to more clearly
     identify network namespaces and their roles. There are up to 4
     namespace roles but we used to have just 2 netns pointer arguments,
     interpreted differently based on context.

   - Use sysfs_break_active_protection() instead of trylock to avoid
     deadlocks between unregistering objects and sysfs access.

   - Add a new sysctl and sockopt for capping max retransmit timeout in
     TCP.

   - Support masking port and DSCP in routing rule matches.

   - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.

   - Support specifying at what time packet should be sent on AF_XDP
     sockets.

   - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
     users.

   - Add Netlink YAML spec for WiFi (nl80211) and conntrack.

   - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
     which only need to be exported when IPv6 support is built as a
     module.

   - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
     normal bridging.

   - Allow users to specify source port range for GENEVE tunnels.

   - netconsole: allow attaching kernel release, CPU ID and task name to
     messages as metadata

  Driver API:

   - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
     the SW layers. Delegate the responsibilities to phylink where
     possible. Improve its handling in phylib.

   - Support symmetric OR-XOR RSS hashing algorithm.

   - Support tracking and preserving IRQ affinity by NAPI itself.

   - Support loopback mode speed selection for interface selftests.

  Device drivers:

   - Remove the IBM LCS driver for s390

   - Remove the sb1000 cable modem driver

   - Add support for SFP module access over SMBus

   - Add MCTP transport driver for MCTP-over-USB

   - Enable XDP metadata support in multiple drivers

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - add PCIe TLP Processing Hints (TPH) support for new AMD
           platforms
         - support dumping RoCE queue state for debug
         - opt into instance locking
      - Intel (100G, ice, idpf):
         - ice: rework MSI-X IRQ management and distribution
         - ice: support for E830 devices
         - iavf: add support for Rx timestamping
         - iavf: opt into instance locking
      - nVidia/Mellanox:
         - mlx4: use page pool memory allocator for Rx
         - mlx5: support for one PTP device per hardware clock
         - mlx5: support for 200Gbps per-lane link modes
         - mlx5: move IPSec policy check after decryption
      - AMD/Solarflare:
         - support FW flashing via devlink
      - Cisco (enic):
         - use page pool memory allocator for Rx
         - enable 32, 64 byte CQEs
         - get max rx/tx ring size from the device
      - Meta (fbnic):
         - support flow steering and RSS configuration
         - report queue stats
         - support TCP segmentation
         - support IRQ coalescing
         - support ring size configuration
      - Marvell/Cavium:
         - support AF_XDP
      - Wangxun:
         - support for PTP clock and timestamping
      - Huawei (hibmcge):
         - checksum offload
         - add more statistics

   - Ethernet virtual:
      - VirtIO net:
         - aggressively suppress Tx completions, improve perf by 96%
           with 1 CPU and 55% with 2 CPUs
         - expose NAPI to IRQ mapping and persist NAPI settings
      - Google (gve):
         - support XDP in DQO RDA Queue Format
         - opt into instance locking
      - Microsoft vNIC:
         - support BIG TCP

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - cleanup Tx and Tx clock setting and other link-focused
           cleanups
         - enable SGMII and 2500BASEX mode switching for Intel platforms
         - support Sophgo SG2044
      - Broadcom switches (b53):
         - support for BCM53101
      - TI:
         - iep: add perout configuration support
         - icssg: support XDP
      - Cadence (macb):
         - implement BQL
      - Xilinx (axinet):
         - support dynamic IRQ moderation and changing coalescing at
           runtime
         - implement BQL
         - report standard stats
      - MediaTek:
         - support phylink managed EEE
      - Intel:
         - igc: don't restart the interface on every XDP program change
      - RealTek (r8169):
         - support reading registers of internal PHYs directly
         - increase max jumbo packet size on RTL8125/RTL8126
      - Airoha:
         - support for RISC-V NPU packet processing unit
         - enable scatter-gather and support MTU up to 9kB
      - Tehuti (tn40xx):
         - support cards with TN4010 MAC and an Aquantia AQR105 PHY

   - Ethernet PHYs:
      - support for TJA1102S, TJA1121
      - dp83tg720: add randomized polling intervals for link detection
      - dp83822: support changing the transmit amplitude voltage
      - support for LEDs on 88q2xxx

   - CAN:
      - canxl: support Remote Request Substitution bit access
      - flexcan: add S32G2/S32G3 SoC

   - WiFi:
      - remove cooked monitor support
      - strict mode for better AP testing
      - basic EPCS support
      - OMI RX bandwidth reduction support
      - batman-adv: add support for jumbo frames

   - WiFi drivers:
      - RealTek (rtw88):
         - support RTL8814AE and RTL8814AU
      - RealTek (rtw89):
         - switch using wiphy_lock and wiphy_work
         - add BB context to manipulate two PHY as preparation of MLO
         - improve BT-coexistence mechanism to play A2DP smoothly
      - Intel (iwlwifi):
         - add new iwlmld sub-driver for latest HW/FW combinations
      - MediaTek (mt76):
         - preparation for mt7996 Multi-Link Operation (MLO) support
      - Qualcomm/Atheros (ath12k):
         - continued work on MLO
      - Silabs (wfx):
         - Wake-on-WLAN support

   - Bluetooth:
      - add support for skb TX SND/COMPLETION timestamping
      - hci_core: enable buffer flow control for SCO/eSCO
      - coredump: log devcd dumps into the monitor

   - Bluetooth drivers:
      - intel: add support to configure TX power
      - nxp: handle bootloader error during cmd5 and cmd7"

* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
  unix: fix up for "apparmor: add fine grained af_unix mediation"
  mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
  net: usb: asix: ax88772: Increase phy_name size
  net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
  net: libwx: fix Tx L4 checksum
  net: libwx: fix Tx descriptor content for some tunnel packets
  atm: Fix NULL pointer dereference
  net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
  net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
  net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
  net: phy: aquantia: add essential functions to aqr105 driver
  net: phy: aquantia: search for firmware-name in fwnode
  net: phy: aquantia: add probe function to aqr105 for firmware loading
  net: phy: Add swnode support to mdiobus_scan
  gve: add XDP DROP and PASS support for DQ
  gve: update XDP allocation path support RX buffer posting
  gve: merge packet buffer size fields
  gve: update GQ RX to use buf_size
  gve: introduce config-based allocation for XDP
  gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
  ...
2025-03-26 21:48:21 -07:00
Jakub Kicinski 4f74a45c6b bluetooth-next pull request for net-next:
core:
 
  - Add support for skb TX SND/COMPLETION timestamping
  - hci_core: Enable buffer flow control for SCO/eSCO
  - coredump: Log devcd dumps into the monitor
 
  drivers:
 
  - btusb: Add 2 HWIDs for MT7922
  - btusb: Fix regression in the initialization of fake Bluetooth controllers
  - btusb: Add 14 USB device IDs for Qualcomm WCN785x
  - btintel: Add support for Intel Scorpius Peak
  - btintel: Add support to configure TX power
  - btintel: Add DSBR support for ScP
  - btintel_pcie: Add device id of Whale Peak
  - btintel_pcie: Setup buffers for firmware traces
  - btintel_pcie: Read hardware exception data
  - btintel_pcie: Add support for device coredump
  - btintel_pcie: Trigger device coredump on hardware exception
  - btnxpuart: Support for controller wakeup gpio config
  - btnxpuart: Add support to set BD address
  - btnxpuart: Add correct bootloader error codes
  - btnxpuart: Handle bootloader error during cmd5 and cmd7
  - btnxpuart: Fix kernel panic during FW release
  - qca: add WCN3950 support
  - hci_qca: use the power sequencer for wcn6750
  - btmtksdio: Prevent enabling interrupts after IRQ handler removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmfjA3AZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeWWD/0QZYBrNP9QSTyYTeNlYhCC
 Lw/n7n3+LhxOqIu+tOGS7UplTqR3p4WQGu2g+XV9wSu4dvLplZGn/40XtiXJA0+r
 VsXivQ4IR/Vjd8sNLLixfmuH4g4CbMSblQvECD3/5wFTeSH8T6/gyts/WV/LDYOJ
 jp5kG6HCFHMv7RaiaHdZ0Pe4c1xJ/Ek8TnrW4G/kPBaLzm+lhjRkfzxCx8cO0k0H
 mpoheUohLtUSgfLf49826t7rp3HDuX9db2hiGXQfKSrL2milwufKNaMFTmbuU2Uq
 IyAwR1CEdSsKlcpbnVNF05r94sjf8NjuBD3YWxB9OfVXq9aymJYZdGh54XT5nF3f
 ccD1WNnOoTsXDbEAnuVx+EYrJAI70e7vE4m8XbpxJcxhFoSIQCmtDoXUY199LiEG
 El7DlwouNFESmOIj6gSK7ogUEhmcQA0AJxBVpdxnGBAcQMn1hrGSb75VGg7rul8K
 HcBtf5j4gxZum2fzrufeY1+eBxfSRjNFIsnutAr53gEtidPZYlmiRyAuqQJMo2sO
 0hlpC6gQGQJ5HiTR5Jbo9u+OC1oeHHXNN5IpNrd6J65n0pqCRldDsoXCerEJsOou
 cqWzb9lfQCrjRnWRxUWOLukdYRgBH15wuORU+Z7/qVhqOr49RvARnzq7AjGSenTS
 YKb2N+NGlqpxG+vQscFJUA==
 =XCdn
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

core:

 - Add support for skb TX SND/COMPLETION timestamping
 - hci_core: Enable buffer flow control for SCO/eSCO
 - coredump: Log devcd dumps into the monitor

 drivers:

 - btusb: Add 2 HWIDs for MT7922
 - btusb: Fix regression in the initialization of fake Bluetooth controllers
 - btusb: Add 14 USB device IDs for Qualcomm WCN785x
 - btintel: Add support for Intel Scorpius Peak
 - btintel: Add support to configure TX power
 - btintel: Add DSBR support for ScP
 - btintel_pcie: Add device id of Whale Peak
 - btintel_pcie: Setup buffers for firmware traces
 - btintel_pcie: Read hardware exception data
 - btintel_pcie: Add support for device coredump
 - btintel_pcie: Trigger device coredump on hardware exception
 - btnxpuart: Support for controller wakeup gpio config
 - btnxpuart: Add support to set BD address
 - btnxpuart: Add correct bootloader error codes
 - btnxpuart: Handle bootloader error during cmd5 and cmd7
 - btnxpuart: Fix kernel panic during FW release
 - qca: add WCN3950 support
 - hci_qca: use the power sequencer for wcn6750
 - btmtksdio: Prevent enabling interrupts after IRQ handler removal

* tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (53 commits)
  Bluetooth: MGMT: Add LL Privacy Setting
  Bluetooth: hci_event: Fix handling of HCI_EV_LE_DIRECT_ADV_REPORT
  Bluetooth: btnxpuart: Fix kernel panic during FW release
  Bluetooth: btnxpuart: Handle bootloader error during cmd5 and cmd7
  Bluetooth: btnxpuart: Add correct bootloader error codes
  t blameBluetooth: btintel: Fix leading white space
  Bluetooth: btintel: Add support to configure TX power
  Bluetooth: btmtksdio: Prevent enabling interrupts after IRQ handler removal
  Bluetooth: btmtk: Remove the resetting step before downloading the fw
  Bluetooth: SCO: add TX timestamping
  Bluetooth: L2CAP: add TX timestamping
  Bluetooth: ISO: add TX timestamping
  Bluetooth: add support for skb TX SND/COMPLETION timestamping
  net-timestamp: COMPLETION timestamp on packet tx completion
  HCI: coredump: Log devcd dumps into the monitor
  Bluetooth: HCI: Add definition of hci_rp_remote_name_req_cancel
  Bluetooth: hci_vhci: Mark Sync Flow Control as supported
  Bluetooth: hci_core: Enable buffer flow control for SCO/eSCO
  Bluetooth: btintel_pci: Fix build warning
  Bluetooth: btintel_pcie: Trigger device coredump on hardware exception
  ...
====================

Link: https://patch.msgid.link/20250325192925.2497890-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 14:00:48 -07:00
Eric Dumazet f1e30061e8 tcp/dccp: remove icsk->icsk_ack.timeout
icsk->icsk_ack.timeout can be replaced by icsk->csk_delack_timer.expires

This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 10:34:33 -07:00
Eric Dumazet a7c428ee8f tcp/dccp: remove icsk->icsk_timeout
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires

This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 10:34:33 -07:00
Pauli Virtanen 983e0e4e87 net-timestamp: COMPLETION timestamp on packet tx completion
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
when hardware reports a packet completed.

Completion tstamp is useful for Bluetooth, as hardware timestamps do not
exist in the HCI specification except for ISO packets, and the hardware
has a queue where packets may wait.  In this case the software SND
timestamp only reflects the kernel-side part of the total latency
(usually small) and queue length (usually 0 unless HW buffers
congested), whereas the completion report time is more informative of
the true latency.

It may also be useful in other cases where HW TX timestamps cannot be
obtained and user wants to estimate an upper bound to when the TX
probably happened.

Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-03-25 12:48:05 -04:00
Jakub Kicinski 586b7b3ebb ipsec-next-2025-03-24
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmfg9oUACgkQrB3Eaf9P
 W7d8eRAAhUdQztUoVjNfRBScD34EHEBq8ruFMgbHcXkkdhg223CUaTYKqz3dYE1E
 04OCSwe7yxFFSs7CKR4dRMSzcx7uH/ISPprk45g+wwoIzBCYOWjLBzS5qmamtSYu
 E3sI/QZaVUU7mFDg5n3sr5nkCNz+LtTL2dUbOgi5AOqY9h7LZRzLBEB1RCVOioVO
 14vDRBJ5RGIvmNx3pd0sRhB4BySUw1GQVTYLSFaL/V2JZcRu3yGhKT8q637fdKcS
 blvdf5q4CYyu5i3p9LHhsg33D8fYUotBVFtzw3QaM1G9MdyhMF4WUbeZCa2kkk7J
 N7j2N0WDjP7crKQkPbm9ATQiruF2zvPcYJqgIHKuNhWY35KNIFARWxbP8KGJJFUI
 EqUUKuT0lUi39hULmm5YEqCGWLNQldTBOOZL1fccRj2DZelx4uhQsnt5B2yimUCD
 7QJGH0Vx6NJAutuCgTXQtmD0qO2XYi50XLmh4K+HZ1M3DkSGohcYSz+aj8IIY/wT
 +yT55uRoONqDbh9kOLHab0WWE6R+XJyY5rKnz6XpzyTPh0OwoxrbVXJF045NYIpB
 tSl9ykVpymvWNT56nsfXEOm5THZhdvL+uDS/QV4rtF9En4TEJU0dGbhESvriuu2j
 g6EAQpYJqL9OwA5aKVZCGlHf7siji7Z/uc57DADZ8mtmDXVMqco=
 =9RMg
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2025-03-24

1) Prevent setting high order sequence number bits input in
   non-ESN mode. From Leon Romanovsky.

2) Support PMTU handling in tunnel mode for packet offload.
   From Leon Romanovsky.

3) Make xfrm_state_lookup_byaddr lockless.
   From Florian Westphal.

4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
   From Dan Carpenter.

* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
  xfrm: state: make xfrm_state_lookup_byaddr lockless
  xfrm: check for PMTU in tunnel mode for packet offload
  xfrm: provide common xdo_dev_offload_ok callback implementation
  xfrm: rely on XFRM offload
  xfrm: simplify SA initialization routine
  xfrm: delay initialization of offload path till its actually requested
  xfrm: prevent high SEQ input in non-ESN mode
====================

Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 08:50:10 -07:00
Jason Xing f38805c5d2 tcp: support TCP_RTO_MIN_US for set/getsockopt use
Support adjusting/reading RTO MIN for socket level by using set/getsockopt().

This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that
bpf option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. Then if application
calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then
icsk_rto_min will be overridden in jiffies unit.

This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 04:27:19 -07:00
Linus Torvalds f81c2b8150 It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
   current standards and raise the minimum Python required to 3.9.  Much of
   this is preparatory to replacing the ancient Perl scripts/kernel-doc
   horror with a slightly less horrifying Python implementation, expected
   for 6.16.
 
 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a
   bunch of older compatibility code.
 
 - Rework and improve the generation of the ABI documentation.
 
   (All of the above done by Mauro)
 
 - Lots of translation updates.  Alex Shi and Yanteng Si are taking on
   responsibility for the Chinese translations going forward; that work will
   still get to you via docs-next
 
 - Try to standardize the format for indicating a developer's affiliation in
   commit tags.
 
 - Clarify the TAB's role in CoC enforcement actions.
 
 - Try to spell out the rules for when a commit tag can name another
   developer without their explicit permission.
 
 Plus lots of other typo fixes and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
 PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
 wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
 FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
 Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
 gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
 =Gvdp
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.15' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a reasonably busy cycle for docs...

   - Significant changes throughout the tree to bring Python code up to
     current standards and raise the minimum Python required to 3.9

     Much of this is preparatory to replacing the ancient Perl
     scripts/kernel-doc horror with a slightly less horrifying Python
     implementation, expected for 6.16

   - Update the minimum Sphinx required to 3.4.3, allowing us to remove
     a bunch of older compatibility code

   - Rework and improve the generation of the ABI documentation

  (All of the above done by Mauro)

   - Lots of translation updates. Alex Shi and Yanteng Si are taking on
     responsibility for the Chinese translations going forward; that
     work will still get to you via docs-next

   - Try to standardize the format for indicating a developer's
     affiliation in commit tags

   - Clarify the TAB's role in CoC enforcement actions

   - Try to spell out the rules for when a commit tag can name another
     developer without their explicit permission

  Plus lots of other typo fixes and updates"

* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
  docs/zh_CN: fix spelling mistake
  docs/Chinese: change the disclaimer words
  docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
  docs: driver-api: firmware: clarify userspace requirements
  docs: clarify rules wrt tagging other people
  docs: Remove outdated highuid.rst documentation
  Documentation: dma-buf: heaps: Add heap name definitions
  docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
  docs: Correct installation instruction
  Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
  Documentation/CoC: Spell out the TAB role in enforcement decisions
  Documentation: ocxl.rst: Update consortium site
  scripts: get_feat.pl: substitute s390x with s390
  scripts/kernel-doc: drop dead code for Wcontents_before_sections
  scripts/kernel-doc: don't add not needed new lines
  docs: driver-api/infiniband.rst: fix Kerneldoc markup
  drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
  drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
  include/asm-generic/io.h: fix kerneldoc markup
  Docs/arch/arm64: Fix spelling in amu.rst
  ...
2025-03-24 18:42:27 -07:00
Ryohei Kinugawa 4f34c2b779 docs/kcm: Fix typo "BFP"
'BFP' should be 'BPF'.

Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:20:42 -07:00
WangYuli 6d1929475e docs: networking: strparser: Fix a typo
The context indicates that 'than' is the correct word instead of 'then',
as a comparison is being performed.

Given that 'then' is also a valid English word, checkpatch.pl wouldn't
have picked up on this spelling error.

This typo was caught by AI during code review.

Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:14:16 -07:00
Yui Washizu f8e1bcec62 docs: fix the path of example code and example commands for device memory TCP
This updates the old path and fixes the description of unavailable options.

Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:12:09 -07:00
Eric Dumazet 1937a0be28 tcp: move icsk_clean_acked to a better location
As a followup of my presentation in Zagreb for netdev 0x19:

icsk_clean_acked is only used by TCP when/if CONFIG_TLS_DEVICE
is enabled from tcp_ack().

Rename it to tcp_clean_acked, move it to tcp_sock structure
in the tcp_sock_read_rx for better cache locality in TCP
fast path.

Define this field only when CONFIG_TLS_DEVICE is enabled
saving 8 bytes on configs not using it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250317085313.2023214-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 09:55:18 -07:00
Yael Chemla c3b999cad7 net/mlx5e: Expose port reset cycle recovery counter via ethtool
Display recovery event of PPCNT recovery counters group. Counts (per
link) the number of total successful recovery events of any recovery
types during port reset cycle.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21 21:23:47 +01:00
Breno Leitao 56ad890de2 docs: netconsole: document release feature
Add documentation explaining the kernel release auto-population feature
in netconsole.

This feature appends kernel version information to the userdata
dictionary in every message sent when enabled via the `release_enabled`
file in the configfs hierarchy.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-6-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21 18:59:25 +01:00
Arnd Bergmann 3fed9fda15 net: remove sb1000 cable modem driver
This one is hilariously outdated, it provided a faster downlink over
TV cable for users of analog modems in the 1990s, through an ISA card.

The web page for the userspace tools has been broken for 25 years, and
the driver has only ever seen mechanical updates.

Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21 17:11:54 +01:00
Geliang Tang fa3ee9dd80 mptcp: sysctl: add available_path_managers
Similarly to net.mptcp.available_schedulers, this patch adds a new one
net.mptcp.available_path_managers to list the available path managers.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 10:14:49 +01:00
Geliang Tang 595c26d122 mptcp: sysctl: set path manager by name
Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob
is added to determine which path manager will be used by each newly
created MPTCP socket by setting the name of it.

Dealing with an explicit name is easier than with a number, especially
when more PMs will be introduced.

This sysctl knob makes the old one "pm_type" deprecated.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 10:14:48 +01:00
Uday Shankar f8a10bed32 netconsole: allow selection of egress interface via MAC address
Currently, netconsole has two methods of configuration - module
parameter and configfs. The former interface allows for netconsole
activation earlier during boot (by specifying the module parameter on
the kernel command line), so it is preferred for debugging issues which
arise before userspace is up/the configfs interface can be used. The
module parameter syntax requires specifying the egress interface name.
This requirement makes it hard to use for a couple reasons:
- The egress interface name can be hard or impossible to predict. For
  example, installing a new network card in a system can change the
  interface names assigned by the kernel.
- When constructing the module parameter, one may have trouble
  determining the original (kernel-assigned) name of the interface
  (which is the name that should be given to netconsole) if some stable
  interface naming scheme is in effect. A human can usually look at
  kernel logs to determine the original name, but this is very painful
  if automation is constructing the parameter.

For these reasons, allow selection of the egress interface via MAC
address when configuring netconsole using the module parameter. Update
the netconsole documentation with an example of the new syntax.
Selection of egress interface by MAC address via configfs is far less
interesting (since when this interface can be used, one should be able
to easily convert between MAC address and interface name), so it is left
unimplemented.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Tested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312-netconsole-v6-2-3437933e79b8@purestorage.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19 19:17:58 +01:00
Paolo Abeni ed6bcbe39e This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - drop batadv_priv_debug_log struct, by Sven Eckelmann
 
  - adopt netdev_hold() / netdev_put(), by Eric Dumazet
 
  - add support for jumbo frames, by Sven Eckelmann
 
  - use consistent name for mesh interface, by Sven Eckelmann
 
  - cleanup B.A.T.M.A.N. IV OGM aggregation handling,
    by Sven Eckelmann (4 patches)
 
  - add missing newlines for log macros, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmfTCE0WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoea+EACQ3fJp7kf6iWxhJLiwon3Aoo4u
 dg84xX5nc02KHT/zUC/ADSSN3FDdc3y8jgjpLfZv5TkWmRUqMNP9449PaoXpiIsW
 oI6HKrVaEKIEPpMAaBrv0dPtG5fBZPku4tZBLP7ekU/2N0Ri3eKTCph7IznO3Ma5
 cM4FN6H9Nxwnht2qN7Gx3nWoBn9LeaSyKfAFLR+hI3QxTh9PYLV26rGCLrUhbbvT
 St8TtgdeULSH/50rhMqC4wpT8verk+BMJdShNPLyse4QSRky6DD6YOj7zS/c7akP
 z8S77gvp0mcgzb/t49K4AFjUFEvK9l0rvPDx6DMrfJwQAZUPjFY93y1XI5QUOPRZ
 +kVj0nr2XRc7c1/wDNfn4131T9Twi41+TX6BI8Fan8WpWKVTo2a/O5VWk5pHKjy+
 a8aZqL41444wWyFhyCL+Io1y1mDc9TDa7644+fJjR5H4h47M9rGnLVEX3F9mVtBl
 LIOwpHpHXg9z3JDgpCwLWykUzW4lEuF1hQFUD3DPIbVKENjvibiiB+qPNZJG1icL
 n9Pt6843vF8gcw3aGbp0VGsESILiKr7EN7CMkhfcO1CaqAErJG4DinubH6DLQIjO
 uW5Ss2751BfqWPOakWVzvvWafnleVRdw4rysbil4ko/xONg0qdfTMWe5fS42bkbk
 CMmGPNeSiqVTXp5bIg==
 =hmiI
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - drop batadv_priv_debug_log struct, by Sven Eckelmann

 - adopt netdev_hold() / netdev_put(), by Eric Dumazet

 - add support for jumbo frames, by Sven Eckelmann

 - use consistent name for mesh interface, by Sven Eckelmann

 - cleanup B.A.T.M.A.N. IV OGM aggregation handling,
   by Sven Eckelmann (4 patches)

 - add missing newlines for log macros, by Sven Eckelmann

* tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge:
  batman-adv: add missing newlines for log macros
  batman-adv: Limit aggregation size to outgoing MTU
  batman-adv: Use actual packet count for aggregated packets
  batman-adv: Switch to bitmap helper for aggregation handling
  batman-adv: Limit number of aggregated packets directly
  batman-adv: Use consistent name for mesh interface
  batman-adv: Add support for jumbo frames
  batman-adv: adopt netdev_hold() / netdev_put()
  batman-adv: Drop batadv_priv_debug_log struct
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18 12:10:20 +01:00
Pavan Chebbi 2c4d376c3a bnxt_en: Add devlink support for ENABLE_ROCE nvm parameter
Add set/show support for the ENABLE_ROCE NVM parameter to
enable/disable RoCE for a PF.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-4-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18 10:25:22 +01:00
Stanislav Fomichev 0a13c1e0a4 net: revert to lockless TC_SETUP_BLOCK and TC_SETUP_FT
There is a couple of places from which we can arrive to ndo_setup_tc
with TC_SETUP_BLOCK/TC_SETUP_FT:
- netlink
- netlink notifier
- netdev notifier

Locking netdev too deep in this call chain seems to be problematic
(especially assuming some/all of the call_netdevice_notifiers
NETDEV_UNREGISTER) might soon be running with the instance lock).
Revert to lockless ndo_setup_tc for TC_SETUP_BLOCK/TC_SETUP_FT. NFT
framework already takes care of most of the locking. Document
the assumptions.

ndo_setup_tc TC_SETUP_BLOCK
  nft_block_offload_cmd
    nft_chain_offload_cmd
      nft_flow_block_chain
        nft_flow_offload_chain
	  nft_flow_rule_offload_abort
	    nft_flow_rule_offload_commit
	  nft_flow_rule_offload_commit
	    nf_tables_commit
	      nfnetlink_rcv_batch
	        nfnetlink_rcv_skb_batch
		  nfnetlink_rcv
	nft_offload_netdev_event
	  NETDEV_UNREGISTER notifier

ndo_setup_tc TC_SETUP_FT
  nf_flow_table_offload_cmd
    nf_flow_table_offload_setup
      nft_unregister_flowtable_hook
        nft_register_flowtable_net_hooks
	  nft_flowtable_update
	  nf_tables_newflowtable
	    nfnetlink_rcv_batch (.call NFNL_CB_BATCH)
	nft_flowtable_update
	  nf_tables_newflowtable
	nft_flowtable_event
	  nf_tables_flowtable_event
	    NETDEV_UNREGISTER notifier
      __nft_unregister_flowtable_net_hooks
        nft_unregister_flowtable_net_hooks
	  nf_tables_commit
	    nfnetlink_rcv_batch (.call NFNL_CB_BATCH)
	  __nf_tables_abort
	    nf_tables_abort
	      nfnetlink_rcv_batch
	__nft_release_hook
	  __nft_release_hooks
	    nf_tables_pre_exit_net -> module unload
	  nft_rcv_nl_event
	    netlink_register_notifier (oh boy)
      nft_register_flowtable_net_hooks
      	nft_flowtable_update
	  nf_tables_newflowtable
        nf_tables_newflowtable

Fixes: c4f0f30b42 ("net: hold netdev instance lock during nft ndo_setup_tc")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reported-by: syzbot+0afb4bcf91e5a1afdcad@syzkaller.appspotmail.com
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250308044726.1193222-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12 13:02:00 -07:00
Stanislav Fomichev cc34acd577 docs: net: document new locking reality
Also clarify ndo_get_stats (that read and write paths can run
concurrently) and mention only RCU.

Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-14-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 12:59:45 -08:00
Breno Leitao 7010b61983 netconsole: docs: document the task name feature
Add documentation for the netconsole task name feature in
Documentation/networking/netconsole.rst. This explains how to enable
task name via configfs and demonstrates the output format.

The documentation includes:
- How to enable/disable the feature via taskname_enabled
- The format of the task name in the output
- An example showing the task name appearing in messages

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04 15:28:28 +01:00
Nicolas Dichtel 0c493da863 net: rename netns_local to netns_immutable
The name 'netns_local' is confusing. A following commit will export it via
netlink, so let's use a more explicit name.

Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04 12:44:48 +01:00
Jakub Kicinski 357660d759 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc5).

Conflicts:

drivers/net/ethernet/cadence/macb_main.c
  fa52f15c74 ("net: cadence: macb: Synchronize stats calculations")
  75696dd0fd ("net: cadence: macb: Convert to get_stats64")
https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_sriov.c
  79990cf5e7 ("ice: Fix deinitializing VF in error path")
  a203163274 ("ice: simplify VF MSI-X managing")

net/ipv4/tcp.c
  18912c5206 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace")
  297d389e9e ("net: prefix devmem specific helpers")

net/mptcp/subflow.c
  8668860b0a ("mptcp: reset when MPTCP opts are dropped after join")
  c3349a22c2 ("mptcp: consolidate subflow cleanup")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 10:20:58 -08:00
Ahmed Zaki bd7c00605e net: move aRFS rmap management and CPU affinity to core
A common task for most drivers is to remember the user-set CPU affinity
to its IRQs. On each netdev reset, the driver should re-assign the user's
settings to the IRQs. Unify this task across all drivers by moving the CPU
affinity to napi->config.

However, to move the CPU affinity to core, we also need to move aRFS
rmap management since aRFS uses its own IRQ notifiers.

For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting
aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate
and manage the aRFS rmaps. Freeing the rmap is also done by core when the
netdev is freed. For better IRQ affinity management, move the IRQ rmap
notifier inside the napi_struct and add new notify.notify and
notify.release functions: netif_irq_cpu_rmap_notify() and
netif_napi_affinity_release().

Now we have the aRFS rmap management in core, add CPU affinity mask to
napi_config. To delegate the CPU affinity management to the core, drivers
must:
 1 - set the new netdev flag "irq_affinity_auto":
                                       netif_enable_irq_affinity(netdev)
 2 - create the napi with persistent config:
                                       netif_napi_add_config()
 3 - bind an IRQ to the napi instance: netif_napi_set_irq()

the core will then make sure to use re-assign affinity to the napi's
IRQ.

The default IRQ mask is set to one cpu starting from the closest NUMA.

Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://patch.msgid.link/20250224232228.990783-2-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 19:51:37 -08:00
Eric Dumazet 3ba075278c tcp: be less liberal in TSEcr received while in SYN_RECV state
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS,
for flows using TCP TS option (RFC 7323)

As hinted by an old comment in tcp_check_req(),
we can check the TSEcr value in the incoming packet corresponds
to one of the SYNACK TSval values we have sent.

In this patch, I record the oldest and most recent values
that SYNACK packets have used.

Send a challenge ACK if we receive a TSEcr outside
of this range, and increase a new SNMP counter.

nstat -az | grep TSEcrRejected
TcpExtTSEcrRejected            0                  0.0

Due to TCP fastopen implementation, do not apply yet these checks
for fastopen flows.

v2: No longer use req->num_timeout, but treq->snt_tsval_first
    to detect when first SYNACK is prepared. This means
    we make sure to not send an initial zero TSval.
    Make sure MPTCP and TCP selftests are passing.
    Change MIB name to TcpExtTSEcrRejected

v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/

Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 18:11:17 -08:00
Michael Ellerman d9fefcea81 net: spider_net: Remove powerpc Cell driver
This driver can no longer be built since support for IBM Cell Blades was
removed, in particular PPC_IBM_CELL_BLADE.

Remove the driver and the documentation.
Remove the MAINTAINERS entry, and add Ishizaki and Geoff to CREDITS.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241218105523.416573-24-mpe@ellerman.id.au
2025-02-26 21:15:18 +05:30
Gal Pressman ecdff89338 ethtool: Symmetric OR-XOR RSS hash
Add an additional type of symmetric RSS hash type: OR-XOR.
The "Symmetric-OR-XOR" algorithm transforms the input as follows:

(SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)

Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap
of supported RXH_XFRM_* types.

Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250224174416.499070-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25 18:31:04 -08:00
Sven Eckelmann 9443335502 batman-adv: Use consistent name for mesh interface
The way how the virtual interface is called inside the batman-adv source
code is not consistent. The genl headers call it meshif and the rest of the
code calls is (mostly) softif.

The genl definitions cannot be touched because they are part of the UAPI.
But the rest of the batman-adv code can be touched to have a consistent
name again.

The bulk of the renaming was done using

  sed -i -e 's/soft\(-\|\_\| \|\)i\([nf]\)/mesh\1i\2/g' \
         -e 's/SOFT\(-\|\_\| \|\)I\([NF]\)/MESH\1I\2/g'

and then it was adjusted slightly when proofreading the changes.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22 11:36:22 +01:00
Jakub Kicinski e87700965a bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ7ffOQAKCRAraS+Z+3y5
 EzVHAP9h/QkeYoOZW9gul08I8vFiZsFe/lbOSLJWxeVfxb9JhgD/cMqby3qAxQK6
 lsdNQ9jYG2232Wym89ag7fvTBK15Wg4=
 =gkN2
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-02-20

We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).

The main changes are:

1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing

2) Add network TX timestamping support to BPF sock_ops, from Jason Xing

3) Add TX metadata Launch Time support, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================

Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:59:47 -08:00
Leon Romanovsky cc18f482e8 xfrm: provide common xdo_dev_offload_ok callback implementation
Almost all drivers except bond and nsim had same check if device
can perform XFRM offload on that specific packet. The check was that
packet doesn't have IPv4 options and IPv6 extensions.

In NIC drivers, the IPv4 HELEN comparison was slightly different, but
the intent was to check for the same conditions. So let's chose more
strict variant as a common base.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21 08:08:15 +01:00
Linus Torvalds 319fc77f8f BPF fixes:
- Fix a soft-lockup in BPF arena_map_free on 64k page size
   kernels (Alan Maguire)
 
 - Fix a missing allocation failure check in BPF verifier's
   acquire_lock_state (Kumar Kartikeya Dwivedi)
 
 - Fix a NULL-pointer dereference in trace_kfree_skb by adding
   kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima)
 
 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
 
 - Fix a syzbot-reported deadlock when holding BPF map's
   freeze_mutex (Andrii Nakryiko)
 
 - Fix a use-after-free issue in bpf_test_init when
   eth_skb_pkt_type is accessing skb data not containing an
   Ethernet header (Shigeru Yoshida)
 
 - Fix skipping non-existing keys in generic_map_lookup_batch
   (Yan Zhai)
 
 - Several BPF sockmap fixes to address incorrect TCP copied_seq
   calculations, which prevented correct data reads from recv(2)
   in user space (Jiayuan Chen)
 
 - Two fixes for BPF map lookup nullness elision (Daniel Xu)
 
 - Fix a NULL-pointer dereference from vmlinux BTF lookup in
   bpf_sk_storage_tracing_allowed (Jared Kangas)
 
 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZ7evlRUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISIPwHgD/dTvM00Ck4Q73fPivyT7tcqxeXJlD
 D6ggzWl/SG9LAbwA/2/cSgAM9Jm1g7ddvn/S9QaDYOs5GmFl6urq6krs+tYD
 =FCs9
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull BPF fixes from Daniel Borkmann:

 - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
   (Alan Maguire)

 - Fix a missing allocation failure check in BPF verifier's
   acquire_lock_state (Kumar Kartikeya Dwivedi)

 - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
   to the raw_tp_null_args set (Kuniyuki Iwashima)

 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)

 - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
   (Andrii Nakryiko)

 - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
   accessing skb data not containing an Ethernet header (Shigeru
   Yoshida)

 - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)

 - Several BPF sockmap fixes to address incorrect TCP copied_seq
   calculations, which prevented correct data reads from recv(2) in user
   space (Jiayuan Chen)

 - Two fixes for BPF map lookup nullness elision (Daniel Xu)

 - Fix a NULL-pointer dereference from vmlinux BTF lookup in
   bpf_sk_storage_tracing_allowed (Jared Kangas)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests: bpf: test batch lookup on array of maps with holes
  bpf: skip non exist keys in generic_map_lookup_batch
  bpf: Handle allocation failure in acquire_lock_state
  bpf: verifier: Disambiguate get_constant_map_key() errors
  bpf: selftests: Test constant key extraction on irrelevant maps
  bpf: verifier: Do not extract constant map keys for irrelevant maps
  bpf: Fix softlockup in arena_map_free on 64k page kernel
  net: Add rx_skb of kfree_skb to raw_tp_null_args[].
  bpf: Fix deadlock when freeing cgroup storage
  selftests/bpf: Add strparser test for bpf
  selftests/bpf: Fix invalid flag of recv()
  bpf: Disable non stream socket for strparser
  bpf: Fix wrong copied_seq calculation
  strparser: Add read_sock callback
  bpf: avoid holding freeze_mutex during mmap operation
  bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
  selftests/bpf: Adjust data size to have ETH_HLEN
  bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
  bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20 15:37:17 -08:00
Song Yoong Siang ca4419f15a xsk: Add launch time hardware offload support to XDP Tx metadata
Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
2025-02-20 15:13:45 -08:00
Oleksij Rempel 6b89d89f21 can: j1939: Extend stack documentation with buffer size behavior
Extend the J1939 stack documentation to include information about how
buffer sizes influence stack behavior, detailing handling of simple
sessions, TP, and ETP transfers.

Additionally, describe various setsockopt(2) options, including their
usage  and potential error values that can be returned by the stack.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241013181715.3488980-1-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-19 11:10:53 +01:00
Mauro Carvalho Chehab f12583fd2b docs: networking: Allow creating cross-references statistics ABI
Now that Documentation/ABI is processed by automarkup, let it
generate cross-references for the corresponding ABI file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/a34ab9bef8f4e6b89dcb15098557fd3a7a9aa353.1739254867.git.mchehab+huawei@kernel.org
2025-02-18 13:42:46 -07:00
David Wei d9ac1d5fc9 net: add documentation for io_uring zcrx
Add documentation for io_uring zero copy Rx that explains requirements
and the user API.

Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250215000947.789731-11-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17 05:41:09 -07:00
Joe Damato bf1b8e0abc documentation: networking: Add NAPI config
Document the existence of persistent per-NAPI configuration space and
the API that drivers can opt into.

Update stale documentation which suggested that NAPI IDs cannot be
queried from userspace.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20250213191535.38792-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14 17:09:19 -08:00
Jakub Kicinski 7a7e019713 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc3).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13 12:43:30 -08:00
Ritvik Gupta c3a97ccaed Documentation: dpaa2 ethernet switch driver: Fix spelling
Corrected spelling mistake

Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com>
Link: https://patch.msgid.link/20250212021311.13257-1-ritvikfoss@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-13 16:13:00 +01:00
Akiva Goldberger b820864335 net/mlx5: Expose ICM consumption per function
ICM is a portion of the host's memory assigned to a function by the OS
through requests made by the NIC's firmware.

PF ICM consumption can be accessed directly, while VF/SF ICM consumption
can be accessed through their representors in switchdev mode.

The value is exposed to the user in granularity of 4KB through the vnic
health reporter as follows:

$ devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
     total_error_queues: 0 send_queue_priority_update_flow: 0
     comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
     invalid_command: 0 quota_exceeded_command: 0
     nic_receive_steering_discard: 0 icm_consumption: 1032

Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Edward Cree 5ea73bf3c4 sfc: document devlink flash support
Update the information in sfc's devlink documentation including
 support for firmware update with devlink flash.
Also update the help text for CONFIG_SFC_MTD, as it is no longer
 strictly required for firmware updates.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:12:17 -08:00
Eric Dumazet 1280c26228 tcp: add tcp_rto_max_ms sysctl
Previous patch added a TCP_RTO_MAX_MS socket option
to tune a TCP socket max RTO value.

Many setups prefer to change a per netns sysctl.

This patch adds /proc/sys/net/ipv4/tcp_rto_max_ms

Its initial value is 120000 (120 seconds).

Keep in mind that a decrease of tcp_rto_max_ms
means shorter overall timeouts, unless tcp_retries2
sysctl is increased.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:08:00 +01:00
Eric Dumazet 54a378f434 tcp: add the ability to control max RTO
Currently, TCP stack uses a constant (120 seconds)
to limit the RTO value exponential growth.

Some applications want to set a lower value.

Add TCP_RTO_MAX_MS socket option to set a value (in ms)
between 1 and 120 seconds.

It is discouraged to change the socket rto max on a live
socket, as it might lead to unexpected disconnects.

Following patch is adding a netns sysctl to control the
default value at socket creation time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:08:00 +01:00
Breno Leitao a7aec70a90 netconsole: docs: Add documentation for CPU number auto-population
Update the netconsole documentation to explain the new feature that
allows automatic population of the CPU number.

The key changes include introducing a new section titled "CPU number
auto population in userdata", explaining how to enable the CPU number
auto-population feature by writing to the "populate_cpu_nr" file in the
netconsole configfs hierarchy.

This documentation update ensures users are aware of the new CPU number
auto-population functionality and how to leverage it for better
demultiplexing and visibility of parallel netconsole output.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-10 15:04:18 +00:00
Reyders Morales d0b197b650 Documentation/networking: fix basic node example document ISO 15765-2
In the current struct sockaddr_can tp is member of can_addr. tp is not
member of struct sockaddr_can.

Signed-off-by: Reyders Morales <reyders1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250203224720.42530-1-reyders1@gmail.com
Fixes: 67711e0425 ("Documentation: networking: document ISO 15765-2")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-08 11:28:11 +01:00
Michal Swiatkowski b2657259fc ice: devlink PF MSI-X max and min parameter
Use generic devlink PF MSI-X parameter to allow user to change MSI-X
range.

Add notes about this parameters into ice devlink documentation.

Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05 09:02:44 -08:00
Matthieu Baerts (NGI0) 18da4b5d12 doc: mptcp: sysctl: blackhole_timeout is per-netns
All other sysctl entries mention it, and it is a per-namespace sysctl.

So mention it as well.

Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30 14:02:19 +01:00
Jiayuan Chen 0532a79efd strparser: Add read_sock callback
Added a new read_sock handler, allowing users to customize read operations
instead of relying on the native socket's read_sock.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://patch.msgid.link/20250122100917.49845-2-mrpre@163.com
2025-01-29 13:32:08 -08:00
Khaled Elnaggar cde5959913 documentation: networking: fix spelling mistakes
Fix a couple of typos/spelling mistakes in the documentation.

Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20250123082521.59997-1-khaledelnaggarlinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:26:06 -08:00
Jakub Kicinski 59372af69d This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - Reorder includes for distributed-arp-table.c, by Sven Eckelmann
 
  - Fix translation table change handling, by Remi Pommarel (2 patches)
 
  - Map VID 0 to untagged TT VLAN, by Sven Eckelmann
 
  - Update MAINTAINERS/mailmap e-mail addresses, by the respective authors
    (4 patches)
 
  - netlink: reduce duplicate code by returning interfaces,
    by Linus Lüssing
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmeKTuUWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoREQEACKKLbl6IGJ9xXTlaRhtUwr55KD
 JmvJT+y25IOmJBlRpZ6zHEjDxlsF/W+Se63k4tWrQSEscqBLXL9K0nY171cU+sJ0
 Nx6AF2F6GobfVuzJGoJWjOZ3eCk4+stiCsBPz0ZJmu5cN24nZRkQZWpOJCSl1KtY
 huRG+bum58hw4kvaK5IU+9iLxtN5ec3STVZjEe0GKVqB6vo19+XGnPJqNSlVDDiQ
 iJSl+6VjaYYG2zclyQDefo4CuILVTAxGi5vaVFhoLVkUj2QPo41BEpkfQxZ8P1HR
 bFXfnOITuhCeDgzF2uwocaeW0P06kyKa+cyH4PO5BvQOp5up667aDdXtgKPBvKzV
 5Fn+WN0A/SsM+tLdGkVdbU1ceNgsNYLyROJgorDNkiGUmRvrFOmhXVzhJMlcppkR
 O+qh7K7Kpoyuz3a2ZiktvG5sSG0aFhQ3GRJyZlqKGV5bLzJ8e2Z+0aCFKtVw0RlZ
 3lg2/iEb5//+aJKZToHH97U3PG2ahO76WPSiaTPVVfvGYS2gE7fihHWtI9Uf9FzB
 AHFhiOP5oPet5jSjH0jrybOEHcSvuxsOi+vIC0hCiPV4O/fN7tjeEj4XM3uPF5LQ
 FI26vvS5izhZxOWVIUPnz5pSWzTmfiRPNbckcESHt8CHjlSTUw3TPlC0jua0iXMF
 shhpONEPSjZvExybLg==
 =cv/9
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Reorder includes for distributed-arp-table.c, by Sven Eckelmann

 - Fix translation table change handling, by Remi Pommarel (2 patches)

 - Map VID 0 to untagged TT VLAN, by Sven Eckelmann

 - Update MAINTAINERS/mailmap e-mail addresses, by the respective authors
   (4 patches)

 - netlink: reduce duplicate code by returning interfaces,
   by Linus Lüssing

* tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge:
  batman-adv: netlink: reduce duplicate code by returning interfaces
  MAINTAINERS: mailmap: add entries for Antonio Quartulli
  mailmap: add entries for Sven Eckelmann
  mailmap: add entries for Simon Wunderlich
  MAINTAINERS: update email address of Marek Linder
  batman-adv: Map VID 0 to untagged TT VLAN
  batman-adv: Don't keep redundant TT change events
  batman-adv: Remove atomic usage for tt.local_changes
  batman-adv: Reorder includes for distributed-arp-table.c
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20250117123910.219278-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18 17:57:31 -08:00
Matthieu Baerts (NGI0) 5b4fd35343 mptcp: sysctl: add syn_retrans_before_tcp_fallback
The number of SYN + MPC retransmissions before falling back to TCP was
fixed to 2. This is certainly a good default value, but having a fixed
number can be a problem in some environments.

The current behaviour means that if all packets are dropped, there will
be:

- The initial SYN + MPC

- 2 retransmissions with MPC

- The next ones will be without MPTCP.

So typically ~3 seconds before falling back to TCP. In some networks
where some temporally blackholes are unfortunately frequent, or when a
client tries to initiate connections while the network is not ready yet,
this can cause new connections not to have MPTCP connections.

In such environments, it is now possible to increase the number of SYN
retransmissions with MPTCP options to make sure MPTCP is used.

Interesting values are:

- 0: the first retransmission will be done without MPTCP options: quite
     aggressive, but also a higher risk of detecting false-positive
     MPTCP blackholes.

- >= 128: all SYN retransmissions will keep the MPTCP options: back to
          the < 6.12 behaviour.

The default behaviour is not changed here.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18 17:32:01 -08:00
Vladimir Oltean 6a128cdf19 net: ethtool: ts: add separate counter for unconfirmed one-step TX timestamps
For packets with two-step timestamp requests, the hardware timestamp
comes back to the driver through a confirmation mechanism of sorts,
which allows the driver to confidently bump the successful "pkts"
counter.

For one-step PTP, the NIC is supposed to autonomously insert its
hardware TX timestamp in the packet headers while simultaneously
transmitting it. There may be a confirmation that this was done
successfully, or there may not.

None of the current drivers which implement ethtool_ops :: get_ts_stats()
also support HWTSTAMP_TX_ONESTEP_SYNC or HWTSTAMP_TX_ONESTEP_SYNC, so it
is a bit unclear which model to follow. But there are NICs, such as DSA,
where there is no transmit confirmation at all. Here, it would be wrong /
misleading to increment the successful "pkts" counter, because one-step
PTP packets can be dropped on TX just like any other packets.

So introduce a special counter which signifies "yes, an attempt was made,
but we don't know whether it also exited the port or not". I expect that
for one-step PTP packets where a confirmation is available, the "pkts"
counter would be bumped.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250116104628.123555-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17 20:01:09 -08:00
Taehee Yoo eec8359f07 net: ethtool: add support for configuring hds-thresh
The hds-thresh option configures the threshold value of
the header-data-split.
If a received packet size is larger than this threshold value, a packet
will be split into header and payload.
The header indicates TCP and UDP header, but it depends on driver spec.
The bnxt_en driver supports HDS(Header-Data-Split) configuration at
FW level, affecting TCP and UDP too.
So, If hds-thresh is set, it affects UDP and TCP packets.

Example:
   # ethtool -G <interface name> hds-thresh <value>

   # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256
   # ethtool -g enp14s0f0np0
   Ring parameters for enp14s0f0np0:
   Pre-set maximums:
   ...
   HDS thresh:  1023
   Current hardware settings:
   ...
   TCP data split:         on
   HDS thresh:  256

The default/min/max values are not defined in the ethtool so the drivers
should define themself.
The 0 value means that all TCP/UDP packets' header and payload
will be split.

Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250114142852.3364986-3-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 14:42:11 -08:00
Oleksij Rempel 7d66c74a17 Documentation: networking: update PHY error counter diagnostics in twisted pair guide
Replace generic instructions for monitoring error counters with a
procedure using the unified PHY statistics interface (`--all-groups`).

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14 11:44:19 +01:00
Jakub Kicinski 6167c0b6e8 net: ethtool: add support for structured PHY statistics
Introduce a new way to report PHY statistics in a structured and
standardized format using the netlink API. This new method does not
replace the old driver-specific stats, which can still be accessed with
`ethtool -S <eth name>`. The structured stats are available with
`ethtool -S <eth name> --all-groups`.

This new method makes it easier to diagnose problems by organizing stats
in a consistent and documented way.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14 11:44:19 +01:00
Moshe Shemesh 9fc43b5e39 net/mlx5: fs, add HWS to steering mode options
Add HW Steering mode to mlx5 devlink param of steering mode options.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250109160546.1733647-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13 19:21:09 -08:00
David S. Miller 7b24f164cf ipsec-next-2025-01-09
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmd/mNkACgkQrB3Eaf9P
 W7cQww/9Hnv7+wosuBxFW2o2ptgZThiwj/Ovz0oiWGWvctpN7VlpwmrDOsXQ+XMn
 xF6JBFEjnJanYoDBb78D0dMffJcMcUKZopTUU/ZMitSNr8aIHYiuB4SWPG1tqxl4
 Ete7Mr2m3tS96YePQNnAaRZzEuGsx3BQb28VLTWl9So81MByD2OK4fsAbYz22Gg8
 7A6tDHn1mUd9b2VG+LeeBZaDDFG8C0O2x4E/8Z3DX3z1N8y3LABPwZ38jcgTviKO
 1ZldGrJT+PownBydu23bWDassKE2TuVvGH9e/SOPeQj8DJ4Lmd0bafMZTY6xwfNT
 RJCwhlzZUpYRXFzvcf3+U3egsqEWEemV7/LzAapdT0V9OqfLWMUh3b1jMA4KcblZ
 qmlm/MhZyXutDoeuASwtM4jgM3wGwovOofrKKsb13hD9VLBs5jFZmFSw5TlbmwE3
 sjv7V4pFwNyfJnwQtyMmfuuHiy8w+fzqAA2GCg8mF3OosHABH/FOvEBP8xg1Vqu1
 iKlLkByfyaCFD+GxTPzqSDvSB8nDzeZBgBM/ILGuwH0OWr+0gxBzl1sxrhAkw9hC
 Gf+4J3wg7EknTxfrJk4LyqfyS50GvUIzpLSkHYxDtQRd4zv75bxRzMvoZvuapTtN
 GGpWYftKO8n2kgLORQ0dTtFee3c4w/No+KXWsFdtRVNpyk3N0Yw=
 =U79q
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2025-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
ipsec-next-2025-01-09

1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality.
   From Christian Hopps.

2) Support ESN context update to hardware for TX.
   From Jianbo Liu.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-10 09:15:17 +00:00
Jan Stancek ab88c2b373 tools: ynl: move python code to separate sub-directory
Move python code to a separate directory so it can be
packaged as a python module. Updates existing references
in selftests and docs.

Also rename ynl-gen-[c|rst] to ynl_gen_[c|rst], avoid
dashes as these prevent easy imports for entrypoints.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/a4151bad0e6984e7164d395125ce87fd2e048bf1.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 12:53:27 -08:00
Jakub Kicinski 0b7bdc7fab netdev: define NETDEV_INTERNAL
Linus suggested during one of past maintainer summits (in context of
a DMA_BUF discussion) that symbol namespaces can be used to prevent
unwelcome but in-tree code from using all exported functions.
Create a namespace for netdev.

Export netdev_rx_queue_restart(), drivers may want to use it since
it gives them a simple and safe way to restart a queue to apply
config changes. But it's both too low level and too actively developed
to be used outside netdev.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Aleksandr Loktionov 07af482e64 i40e: add ability to reset VF for Tx and Rx MDD events
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs.
This flag is also used in other network adapters like ICE.

Usage:
- "on"  - The problematic VF will be automatically reset
	  if a malformed descriptor is detected.
- "off" - The problematic VF will be disabled.

In cases where a VF sends malformed packets classified as malicious, it can
cause the Tx queue to freeze, rendering it unusable for several minutes. When
an MDD event occurs, this new implementation allows for a graceful VF reset to
quickly restore operational state.

Currently, VF queues are disabled if an MDD event occurs. This patch adds the
ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD
event logging throttling to avoid dmesg pollution and unifies the format of
Tx and Rx MDD messages.

Note: Standard message rate limiting functions like dev_info_ratelimited()
do not meet our requirements. Custom rate limiting is implemented,
please see the code for details.

Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 18:16:02 -08:00
Jianbo Liu 373b79af3a xfrm: Support ESN context update to hardware for TX
Previously xfrm_dev_state_advance_esn() was added for RX only. But
it's possible that ESN context also need to be synced to hardware for
TX, so call it for outbound in this patch.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-01-07 13:12:11 +01:00
Jakub Kicinski 3e5908172c Merge tag 'ieee802154-for-net-next-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next
Stefan Schmidt says:

====================
pull-request: ieee802154-next 2025-01-03

Leo Stone provided a documatation fix to improve the grammar.

David Gilbert spotted a non-used fucntion we can safely remove.

* tag 'ieee802154-for-net-next-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next:
  net: mac802154: Remove unused ieee802154_mlme_tx_one
  Documentation: ieee802154: fix grammar
====================

Link: https://patch.msgid.link/20250103154605.440478-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-04 17:02:55 -08:00
Marek Lindner 7bce3f7518 MAINTAINERS: update email address of Marek Linder
Signed-off-by: Marek Lindner <marek.lindner@mailbox.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2024-12-31 09:20:19 +01:00
Mina Almasry f6038d913b net: Document netmem driver support
Document expectations from drivers looking to add support for device
memory tcp or other netmem based features.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20241217201206.2360389-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 19:12:36 -08:00
shunlizhou 65c233d8e3 docs: net: bonding: fix typos
The bonding documentation had several "insure" which is not
properly used in the context. Suggest to change to "ensure"
to improve readability.

Signed-off-by: shunlizhou <shunlizhou@aliyun.com>
Link: https://patch.msgid.link/20241216135447.57681-1-shunlizhou@aliyun.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-18 17:27:35 -08:00
Kory Maincent 6e9e2eed4f net: ethtool: Add support for tsconfig command to get/set hwtstamp config
Introduce support for ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink socket
to read and configure hwtstamp configuration of a PHC provider. Note that
simultaneous hwtstamp isn't supported; configuring a new one disables the
previous setting.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16 12:51:41 +00:00
Kory Maincent b9e3f7dc9e net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology
Either the MAC or the PHY can provide hwtstamp, so we should be able to
read the tsinfo for any hwtstamp provider.

Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a
network topology.

Add support for a specific dump command to retrieve all hwtstamp
providers within the network topology, with added functionality for
filtered dump to target a single interface.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16 12:51:41 +00:00
Sabrina Dubroca 5aa97a43d0 docs: tls: document TLS1.3 key updates
Document the kernel's behavior and userspace expectations.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16 12:47:30 +00:00
Jakub Kicinski 5098462fba Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc3).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12 14:19:05 -08:00
Jakub Sitnicki ca6a6f9386 tcp: Add sysctl to configure TIME-WAIT reuse delay
Today we have a hardcoded delay of 1 sec before a TIME-WAIT socket can be
reused by reopening a connection. This is a safe choice based on an
assumption that the other TCP timestamp clock frequency, which is unknown
to us, may be as low as 1 Hz (RFC 7323, section 5.4).

However, this means that in the presence of short lived connections with an
RTT of couple of milliseconds, the time during which a 4-tuple is blocked
from reuse can be orders of magnitude longer that the connection lifetime.
Combined with a reduced pool of ephemeral ports, when using
IP_LOCAL_PORT_RANGE to share an egress IP address between hosts [1], the
long TIME-WAIT reuse delay can lead to port exhaustion, where all available
4-tuples are tied up in TIME-WAIT state.

Turn the reuse delay into a per-netns setting so that sysadmins can make
more aggressive assumptions about remote TCP timestamp clock frequency and
shorten the delay in order to allow connections to reincarnate faster.

Note that applications can completely bypass the TIME-WAIT delay protection
already today by locking the local port with bind() before connecting. Such
immediate connection reuse may result in PAWS failing to detect old
duplicate segments, leaving us with just the sequence number check as a
safety net.

This new configurable offers a trade off where the sysadmin can balance
between the risk of PAWS detection failing to act versus exhausting ports
by having sockets tied up in TIME-WAIT state for too long.

[1] https://lpc.events/event/16/contributions/1349/

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20241209-jakub-krn-909-poc-msec-tw-tstamp-v2-2-66aca0eed03e@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11 20:17:33 -08:00
Petr Machata bbe4b41259 Documentation: networking: Add a caveat to nexthop_compat_mode sysctl
net.ipv4.nexthop_compat_mode was added when nexthop objects were added to
provide the view of nexthop objects through the usual lens of the route
UAPI. As nexthop objects evolved, the information provided through this
lens became incomplete. For example, details of resilient nexthop groups
are obviously omitted.

Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot
convey the >8-bit weight accurately. Instead of inventing workarounds for
an obsolete interface, just document the expectations of inaccuracy.

Fixes: b72a6a7ab9 ("net: nexthop: Increase weight to u16")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/b575e32399ccacd09079b2a218255164535123bd.1733740749.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:26:24 -08:00
Jakub Kicinski 302cc446cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc2).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-05 11:50:14 -08:00
Maksym Kutsevol 36de47bfd0 netcons: Add udp send fail statistics to netconsole
Enhance observability of netconsole. Packet sends can fail.
Start tracking at least two failure possibilities: ENOMEM and
NET_XMIT_DROP for every target. Stats are exposed via an additional
attribute in CONFIGFS.

The exposed statistics allows easier debugging of cases when netconsole
messages were not seen by receivers, eliminating the guesswork if the
sender thinks that messages in question were sent out.

Stats are not reset on enable/disable/change remote ip/etc, they
belong to the netcons target itself.

Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/
Signed-off-by: Maksym Kutsevol <max@kutsevol.com>
Link: https://patch.msgid.link/20241202-netcons-add-udp-send-fail-statistics-to-netconsole-v5-2-70e82239f922@kutsevol.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04 19:16:59 -08:00
Vyshnav Ajith 16ed454515 docs: net: bareudp: fix spelling and grammar mistakes
The BareUDP documentation had several grammar and spelling mistakes,
making it harder to read. This patch fixes those errors to improve
clarity and readability for developers.

Signed-off-by: Vyshnav Ajith <puthen1977@gmail.com>
Link: https://patch.msgid.link/20241121224827.12293-1-puthen1977@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-30 13:54:28 -08:00
Leo Stone 04f5cb4899 Documentation: tls_offload: fix typos and grammar
Fix typos and grammar where it improves readability.

Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20241124230002.56058-1-leocstone@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-28 12:09:06 +01:00
Vyshnav Ajith 6e33123a18 Fix spelling mistake
Changed from reequires to require. A minute typo.

Signed-off-by: Vyshnav Ajith <puthen1977@gmail.com>
Link: https://patch.msgid.link/20241121221852.10754-1-puthen1977@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-28 12:08:32 +01:00
tuqiang a202e60147 Documentation: tipc: fix formatting issue in tipc.rst
The hyphen is removed to have the same style as the others.

Fixes: 09ef17863f ("Documentation: add more details in tipc.rst")
Signed-off-by: tuqiang <tu.qiang35@zte.com.cn>
Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20241115104731435HuFjZOmkaqumuFhB7mrwe@zte.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-19 11:17:13 +01:00
Sanman Pradhan 79da2aaa08 eth: fbnic: add RPC hardware statistics
Report Rx parser statistics via ethtool -S.

The parser stats are 32b, so we need to add refresh to the service
task to make sure we don't miss overflows.

Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Sanman Pradhan 25ba596d13 eth: fbnic: add PCIe hardware statistics
Add PCIe hardware statistics support to the fbnic driver. These stats
provide insight into PCIe transaction performance and error conditions.

Which includes, read/write and completion TLP counts and DWORD counts and
debug counters for tag, completion credit and NP credit exhaustion

The stats are exposed via debugfs and can be used to monitor PCIe
performance and debug PCIe issues.

Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski a79993b5fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc8).

Conflicts:

tools/testing/selftests/net/.gitignore
  252e01e682 ("selftests: net: add netlink-dumps to .gitignore")
  be43a6b238 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw")
https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/

drivers/net/phy/phylink.c
  671154f174 ("net: phylink: ensure PHY momentary link-fails are handled")
  7530ea26c8 ("net: phylink: remove "using_mac_select_pcs"")

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c
  5b366eae71 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines")
  e96321fad3 ("net: ethernet: Switch back to struct platform_driver::remove()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14 11:29:15 -08:00
Geetha sowjanya 6050b04dca Documentation: octeontx2: Add Documentation for RVU representors
Adds documentation for creating and configuring rvu port representors

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-13 11:57:12 +00:00
Joe Damato a90a91e24b docs: networking: Describe irq suspension
Describe irq suspension, the epoll ioctls, and the tradeoffs of using
different gro_flush_timeout values.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20241109050245.191288-7-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 18:45:06 -08:00
Mina Almasry 102d1404c3 net: clarify SO_DEVMEM_DONTNEED behavior in documentation
Document new behavior when the number of frags passed is too big.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20241107210331.3044434-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 18:11:46 -08:00
Linu Cherian 46799a41d2 devlink: Add documentation for OcteonTx2 AF
Add documentation for the following devlink params
- npc_mcam_high_zone_percent
- npc_def_rule_cntr
- nix_maxlf

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Link: https://patch.msgid.link/20241105125620.2114301-4-lcherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 14:15:58 -08:00
Gilad Naaman f7f5273863 neighbour: Create netdev->neighbour association
Create a mapping between a netdev and its neighoburs,
allowing for much cheaper flushes.

Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241107160444.2913124-7-gnaaman@drivenets.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09 13:22:57 -08:00
Jakub Kicinski 2696e451df Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc7).

Conflicts:

drivers/net/ethernet/freescale/enetc/enetc_pf.c
  e15c5506dd ("net: enetc: allocate vf_state during PF probes")
  3774409fd4 ("net: enetc: build enetc_pf_common.c as a separate module")
https://lore.kernel.org/20241105114100.118bd35e@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/ti/am65-cpsw-nuss.c
  de794169cf ("net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7")
  4a7b2ba94a ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07 13:44:16 -08:00
Jinjian Song 238f2ca1e6 net: wwan: t7xx: Unify documentation column width
Unify the column width of the document to comply with specifications.

Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07 13:33:45 +01:00
Jinjian Song 61329a1152 net: wwan: t7xx: Add debug ports
Add support for userspace to enable/disable the debug ports(ADB,MIPC).
 - ADB port: /dev/wwan0adb0
 - MIPC port: /dev/wwan0mipc0

Application can use ADB (Android Debug Bridge) port to implement
functions (shell, pull, push ...) by ADB protocol commands.
E.g., ADB commands:
 - A_OPEN: OPEN(local-id, 0, "destination")
 - A_WRTE: WRITE(local-id, remote-id, "data")
 - A_OKEY: READY(local-id, remote-id, "")
 - A_CLSE: CLOSE(local-id, remote-id, "")

Link: https://android.googlesource.com/platform/packages/modules/adb/+/refs/heads/main/README.md

Application can use MIPC (Modem Information Process Center) port
to debug antenna tuner or noise profiling through this MTK modem
diagnostic interface.

By default, debug ports are not exposed, so using the command
to enable or disable debug ports.

Enable debug ports:
 - enable: 'echo 1 > /sys/bus/pci/devices/${bdf}/t7xx_debug_ports

Disable debug ports:
 - disable: 'echo 0 > /sys/bus/pci/devices/${bdf}/t7xx_debug_ports

Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07 13:33:45 +01:00
Alexander Hölzl b6ec62e01a can: j1939: fix error in J1939 documentation.
The description of PDU1 format usage mistakenly referred to PDU2 format.

Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20241023145257.82709-1-alexander.hoelzl@gmx.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-11-04 17:45:55 +01:00
Caleb Sander Mateos 61bf0009a7 dim: pass dim_sample to net_dim() by reference
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.

In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
  0.26 |4ead2:   call   4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
       |4ead7:   lea    0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
       |4eade:   mov    %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
       |4eae3:   mov    %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
  0.15 |4eae8:   mov    %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
 94.16 |4eaed:   push   0x10(%rsp)
  2.79 |4eaf1:   push   0x10(%rsp)
  0.07 |4eaf5:   push   %rax
// Call net_dim()
  0.21 |4eaf6:   call   4eafb <mlx5e_handle_rx_dim+0x6b>

To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 12:36:54 -08:00
Jakub Kicinski 5b1c965956 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc6).

Conflicts:

drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
  cbe84e9ad5 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
  188a1bf894 ("wifi: mac80211: re-order assigning channel in activate links")
https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/

net/mac80211/cfg.c
  c4382d5ca1 ("wifi: mac80211: update the right link for tx power")
  8dd0498983 ("wifi: mac80211: Fix setting txpower with emulate_chanctx")

drivers/net/ethernet/intel/ice/ice_ptp_hw.h
  6e58c33106 ("ice: fix crash on probe for DPLL enabled E810 LOM")
  e4291b64e1 ("ice: Align E810T GPIO to other products")
  ebb2693f8f ("ice: Read SDP section from NVM for pin definitions")
  ac532f4f42 ("ice: Cleanup unused declarations")
https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31 18:10:07 -07:00
Kory Maincent 2b1d193a5a Documentation: networking: Add missing PHY_GET command in the message list
ETHTOOL_MSG_PHY_GET/GET_REPLY/NTF is missing in the ethtool message list.
Add it to the ethool netlink documentation.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241028132351.75922-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 20:31:03 -07:00
Jakub Kicinski 71e0ad3451 wireless-next patches for v6.13
The first -next "new features" pull request for v6.13. This is a big
 one as we have not been able to send one earlier. We have also some
 patches affecting other subsystems: in staging we deleted the rtl8192e
 driver and in debugfs added a new interface to save struct
 file_operations memory; both were acked by GregKH.
 
 Because of the lib80211/libipw move there were quite a lot of
 conflicts and to solve those we decided to merge net-next into
 wireless-next.
 
 Currently there's one conflict in
 Documentation/networking/net_cachelines/net_device.rst. To fix that
 just remove the iw_public_data line:
 
 https://lore.kernel.org/all/20241011121014.674661a0@canb.auug.org.au/
 
 And when net is merged to net-next there will be another simple
 conflict in in net/mac80211/cfg.c:
 
 https://lore.kernel.org/all/20241024115523.4cd35dde@canb.auug.org.au/
 
 Major changes:
 
 cfg80211/mac80211
 
 * stop exporting wext symbols
 
 * new mac80211 op to indicate that a new interface is to be added
 
 * support radio separation of multi-band devices
 
 Wireless Extensions
 
 * move wext spy implementation to libiw
 
 * remove iw_public_data from struct net_device
 
 brcmfmac
 
 * optional LPO clock support
 
 ipw2x00
 
 * move remaining lib80211 code into libiw
 
 wilc1000
 
 * WILC3000 support
 
 rtw89
 
 * RTL8852BE and RTL8852BE-VT BT-coexistence improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmcbz9YRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZsabQf8CWJ/kyonw/Z8hRxgfE/7D6Jiqoq7R+ML
 8W8lbc6F5wra4eCBq/oo6UVV36Ss6mxQYcRcmLq+nCkXa4qdMpg/z55QECMHxx5Z
 YnIBbD2vBrIj7W21gfCKH1WJ+b5IQFZl3zuxuCgXjxD9TJM2CjUfOkvrhrqqzrPn
 clfUx5f01vfv2jdvClPR5977gFE5One/ANeRQNs7uDS0TeeD2P+61DEB1//htIJo
 7GwwCyUJCeOcfWRMzQwhpoppWKcPAV70kSVJrl/fRstS68vQGSQbcx9yiNeWkSFw
 JXjQGdc8eYLPzLqECwS0KwFkta6AXbafAYYXe1wdlAzr+kmJ9x5oqA==
 =x+mr
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.13

The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.

Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.

Major changes:

cfg80211/mac80211
 * stop exporting wext symbols
 * new mac80211 op to indicate that a new interface is to be added
 * support radio separation of multi-band devices

Wireless Extensions
 * move wext spy implementation to libiw
 * remove iw_public_data from struct net_device

brcmfmac
 * optional LPO clock support

ipw2x00
 * move remaining lib80211 code into libiw

wilc1000
 * WILC3000 support

rtw89
 * RTL8852BE and RTL8852BE-VT BT-coexistence improvements

* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
  mac80211: Remove NOP call to ieee80211_hw_config
  wifi: iwlwifi: work around -Wenum-compare-conditional warning
  wifi: mac80211: re-order assigning channel in activate links
  wifi: mac80211: convert debugfs files to short fops
  debugfs: add small file operations for most files
  wifi: mac80211: remove misleading j_0 construction parts
  wifi: mac80211_hwsim: use hrtimer_active()
  wifi: mac80211: refactor BW limitation check for CSA parsing
  wifi: mac80211: filter on monitor interfaces based on configured channel
  wifi: mac80211: refactor ieee80211_rx_monitor
  wifi: mac80211: add support for the monitor SKIP_TX flag
  wifi: cfg80211: add monitor SKIP_TX flag
  wifi: mac80211: add flag to opt out of virtual monitor support
  wifi: cfg80211: pass net_device to .set_monitor_channel
  wifi: mac80211: remove status->ampdu_delimiter_crc
  wifi: cfg80211: report per wiphy radio antenna mask
  wifi: mac80211: use vif radio mask to limit creating chanctx
  wifi: mac80211: use vif radio mask to limit ibss scan frequencies
  wifi: cfg80211: add option for vif allowed radios
  wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
  ...

====================

Link: https://patch.msgid.link/20241025170705.5F6B2C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 18:50:58 -07:00
Hangbin Liu 9f59eccd9d Documentation: bonding: add XDP support explanation
Add document about which modes have native XDP support.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241021031211.814-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 16:09:42 -07:00
Levi Zim b935252cc2 docs: networking: packet_mmap: replace dead links with archive.org links
The original link returns 404 now. This commit replaces the dead google
site link with archive.org link.

Signed-off-by: Levi Zim <rsworktech@outlook.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241021-packet_mmap_fix_link-v1-1-dffae4a174c0@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:47:10 -07:00
Leo Stone aad4b41a34 Documentation: ieee802154: fix grammar
Fix grammar where it improves readability.

Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/20241023041203.35313-1-leocstone@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-10-28 21:14:27 +01:00
Joe Damato 86e25f40aa net: napi: Add napi_config
Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.

napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).

Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.

Per-NAPI settings are saved in napi_disable and restored in napi_enable.

Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato acb8d4ed56 net: napi: Make gro_flush_timeout per-NAPI
Allow per-NAPI gro_flush_timeout setting.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.

The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato f15e3b3ddb net: napi: Make napi_defer_hard_irqs per-NAPI
Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.

The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:28 -07:00
Eric Dumazet d677aebd66 tcp: move sysctl_tcp_l3mdev_accept to netns_ipv4_read_rx
sysctl_tcp_l3mdev_accept is read from TCP receive fast path from
tcp_v6_early_demux(),
 __inet6_lookup_established,
  inet_request_bound_dev_if().

Move it to netns_ipv4_read_rx.

Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding
its definition.

Note this adds a hole of three bytes that could be filled later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11 08:45:24 -07:00
Jakub Kicinski 9c0fc36ec4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc3).

No conflicts and no adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 13:13:33 -07:00
Jakub Kicinski bdb5d2481a Merge branch 'net-introduce-tx-h-w-shaping-api'
Paolo Abeni says:

====================
net: introduce TX H/W shaping API

We have a plurality of shaping-related drivers API, but none flexible
enough to meet existing demand from vendors[1].

This series introduces new device APIs to configure in a flexible way
TX H/W shaping. The new functionalities are exposed via a newly
defined generic netlink interface and include introspection
capabilities. Some self-tests are included, on top of a dummy
netdevsim implementation. Finally a basic implementation for the iavf
driver is provided.

Some usage examples:

* Configure shaping on a given queue:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
	--do set --json '{"ifindex": '$IFINDEX',
			  "shaper": {"handle":
				     {"scope": "queue", "id":'$QUEUEID'},
			  "bw-max": 2000000}}'

* Container B/W sharing

The orchestration infrastructure wants to group the
container-related queues under a RR scheduling and limit the aggregate
bandwidth:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
	--do group --json '{"ifindex": '$IFINDEX',
			"leaves": [
			  {"handle": {"scope": "queue", "id":'$QID1'},
			   "weight": '$W1'},
			  {"handle": {"scope": "queue", "id":'$QID2'},
			   "weight": '$W2'}],
			  {"handle": {"scope": "queue", "id":'$QID3'},
			   "weight": '$W3'}],
			"handle": {"scope":"node"},
			"bw-max": 10000000}'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}}

Q1 \
    \
Q2 -- node 0 -------  netdev
    / (bw-max: 10M)
Q3 /

* Delegation

A containers wants to limit the aggregate B/W bandwidth of 2 of the 3
queues it owns - the starting configuration is the one from the
previous point:

SPEC=Documentation/netlink/specs/net_shaper.yaml
./tools/net/ynl/cli.py --spec $SPEC \
	--do group --json '{"ifindex": '$IFINDEX',
			"leaves": [
			  {"handle": {"scope": "queue", "id":'$QID1'},
			   "weight": '$W1'},
			  {"handle": {"scope": "queue", "id":'$QID2'},
			   "weight": '$W2'}],
			"handle": {"scope": "node"},
			"bw-max": 5000000 }'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}}

Q1 -- node 1 --------\
    / (bw-max: 5M)    \
Q2 /                   node 0 -------  netdev
                      /(bw-max: 10M)
Q3 ------------------/

In a group operation, when prior to the op itself, the leaves have
different parents, the user must specify the parent handle for the
group. I.e., starting from the previous config:

./tools/net/ynl/cli.py --spec $SPEC \
	--do group --json '{"ifindex": '$IFINDEX',
			"leaves": [
			  {"handle": {"scope": "queue", "id":'$QID1'},
			   "weight": '$W1'},
			  {"handle": {"scope": "queue", "id":'$QID3'},
			   "weight": '$W3'}],
			"handle": {"scope": "node"},
			"bw-max": 3000000 }'
Netlink error: Invalid argument
nl_len = 96 (80) nl_flags = 0x300 nl_type = 2
	error: -22
	extack: {'msg': 'All the leaves shapers must have the same old parent'}

./tools/net/ynl/cli.py --spec $SPEC \
	--do group --json '{"ifindex": '$IFINDEX',
			"leaves": [
			  {"handle": {"scope": "queue", "id":'$QID1'},
			   "weight": '$W1'},
			  {"handle": {"scope": "queue", "id":'$QID3'},
			   "weight": '$W3'}],
			"handle": {"scope": "node"},
			"parent": {"scope": "node", "id": 1},
			"bw-max": 3000000 }
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}}

Q1 -- node 2 ---
    /(bw-max:3M)\
Q3 /             \
         ---- node 1 \
        / (bw-max: 5M)\
      Q2              node 0 -------  netdev
                      (bw-max: 10M)

* Cleanup:

Still starting from config 1To delete a single queue shaper

./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
	'{"ifindex": '$IFINDEX',
	  "handle": {"scope": "queue", "id":'$QID3'}}'

Q1 -- node 2 ---
     (bw-max:3M)\
                 \
         ---- node 1 \
        / (bw-max: 5M)\
      Q2              node 0 -------  netdev
                      (bw-max: 10M)

Deleting a node shaper relinks all its leaves to the node's parent:

./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
	'{"ifindex": '$IFINDEX',
	  "handle": {"scope": "node", "id":2}}'

Q1 ---\
       \
        node 1----- \
       / (bw-max: 5M)\
Q2----/              node 0 -------  netdev
                     (bw-max: 10M)

Deleting the last shaper under a node shaper deletes the node, too:

./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
	'{"ifindex": '$IFINDEX',
	  "handle": {"scope": "queue", "id":'$QID1'}}'
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
	'{"ifindex": '$IFINDEX',
	  "handle": {"scope": "queue", "id":'$QID2'}}'
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
	'{"ifindex": '$IFINDEX',
	  "handle": {"scope": "node", "id": 1}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
	error: -2
	extack: {'bad-attr': '.handle'}

Such delete recurses on parents that are left over with no leaves:

./tools/net/ynl/cli.py --spec $SPEC --do get --json \
	'{"ifindex": '$IFINDEX',
	  "handle": {"scope": "node", "id": 0}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
	error: -2
	extack: {'bad-attr': '.handle'}

v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com
v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com
v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com
v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com
v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com
v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com
RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com
RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com
====================

Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:32:46 -07:00
Paolo Abeni 4b623f9f0f net-shapers: implement NL get operation
Introduce the basic infrastructure to implement the net-shaper
core functionality. Each network devices carries a net-shaper cache,
the NL get() operation fetches the data from such cache.

The cache is initially empty, will be fill by the set()/group()
operation implemented later and is destroyed at device cleanup time.

The net_shaper_fill_handle(), net_shaper_ctx_init(), and
net_shaper_generic_pre() implementations handle generic index type
attributes, despite the current caller always pass a constant value
to avoid more noise in later patches using them with different
attributes.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Jakub Kicinski 22ee378eb6 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-10-08 (ice, iavf, igb, e1000e, e1000)

This series contains updates to ice, iavf, igb, e1000e, and e1000
drivers.

For ice:

Wojciech adds support for ethtool reset.

Paul adds support for hardware based VF mailbox limits for E830 devices.

Jake adjusts to a common iterator in ice_vc_cfg_qs_msg() and moves
storing of max_frame and rx_buf_len from VSI struct to the ring
structure.

Hongbo Li uses assign_bit() to replace an open-coded instance.

Markus Elfring adjusts a couple of PTP error paths to use a common,
shared exit point.

Yue Haibing removes unused declarations.

For iavf:

Yue Haibing removes unused declarations.

For igb:

Yue Haibing removes unused declarations.

For e1000e:

Takamitsu Iwai removes unneccessary writel() calls.

Joe Damato adds support for netdev-genl support to query IRQ, NAPI,
and queue information.

For e1000:

Joe Damato adds support for netdev-genl support to query IRQ, NAPI,
and queue information.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  e1000: Link NAPI instances to queues and IRQs
  e1000e: Link NAPI instances to queues and IRQs
  e1000e: Remove duplicated writel() in e1000_configure_tx/rx()
  igb: Cleanup unused declarations
  iavf: Remove unused declarations
  ice: Cleanup unused declarations
  ice: Use common error handling code in two functions
  ice: Make use of assign_bit() API
  ice: store max_frame and rx_buf_len only in ice_rx_ring
  ice: consistently use q_idx in ice_vc_cfg_qs_msg()
  ice: add E830 HW VF mailbox message limit support
  ice: Implement ethtool reset support
====================

Link: https://patch.msgid.link/20241008233441.928802-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 20:04:43 -07:00
Donald Hunter 54b771e6c6 doc: net: Fix .rst rendering of net_cachelines pages
The doc pages under /networking/net_cachelines are unreadable because
they lack .rst formatting for the tabular text.

Add simple table markup and tidy up the table contents:

- remove dashes that represent empty cells because they render
  as bullets and are not needed
- replace 'struct_*' with 'struct *' in the first column so that
  sphinx can render links for any structs that appear in the docs

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008165329.45647-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:34:49 -07:00
Johannes Berg a0efa2f362 Merge net-next/main to resolve conflicts
The wireless-next tree was based on something older, and there
are now conflicts between -rc2 and work here. Merge net-next,
which has enough of -rc2 for the conflicts to happen, resolving
them in the process.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-09 08:59:22 +02:00
Wojciech Drewek b699c81af0 ice: Implement ethtool reset support
Enable ethtool reset support. Ethtool reset flags are mapped to the
E810 reset type:
PF reset:
  $ ethtool --reset <ethX> irq dma filter offload
CORE reset:
  $ ethtool --reset <ethX> irq-shared dma-shared filter-shared \
    offload-shared ram-shared
GLOBAL reset:
  $ ethtool --reset <ethX> irq-shared dma-shared filter-shared \
    offload-shared mac-shared phy-shared ram-shared

Calling the same set of flags as in PF reset case on port representor
triggers VF reset.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08 14:37:15 -07:00
Johannes Berg 836265d316 wifi: remove iw_public_data from struct net_device
Given the previous patches, we no longer need the
struct iw_public_data etc., it's only used by the
old Intel drivers (and ps3_gelic creates it but
then doesn't use it). Remove all of that, including
the pointer in struct net_device.

Link: https://patch.msgid.link/20241007213525.8b2d52b60531.I6a27aaf30bded9a0977f07f47fba2bd31a3b3330@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08 21:53:40 +02:00
Oleksij Rempel e793b86ae4 Documentation: networking: add Twisted Pair Ethernet diagnostics at OSI Layer 1
This patch introduces a diagnostic guide for troubleshooting Twisted
Pair  Ethernet variants at OSI Layer 1. It provides detailed steps for
detecting  and resolving common link issues, such as incorrect wiring,
cable damage,  and power delivery problems. The guide also includes
interface verification  steps and PHY-specific diagnostics.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241004121824.1716303-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 11:45:35 +02:00
Eric Dumazet a3f5f4c2f9 ipv4: remove fib_info_devhash[]
Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.

fib_info_devhash[] is one of them.

It is unclear why we used a hash table, because
a single hlist_head per net device was cheaper and scalable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07 16:46:27 -07:00
Eric Dumazet f858cc9eed net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ
packet scheduler.

Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.

This patch adds dev->max_pacing_offload_horizon expressing
this timing wheel horizon in nsec units.

This is a read-only attribute.

Unless a driver sets it, dev->max_pacing_offload_horizon
is zero.

v2: addressed Jakub feedback ( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 )
v3: added yaml doc (also per Jakub feedback)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241003121219.2396589-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04 15:37:53 -07:00