]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
5 months agopldmfw: Don't require send_package_data or send_component_table to be defined
Lee Trager [Mon, 12 May 2025 18:53:57 +0000 (11:53 -0700)]
pldmfw: Don't require send_package_data or send_component_table to be defined

Not all drivers require send_package_data or send_component_table when
updating firmware. Instead of forcing drivers to implement a stub allow
these functions to go undefined.

Signed-off-by: Lee Trager <lee@trager.us>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250512190109.2475614-2-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: phy: marvell-88q2xxx: Enable temperature measurement in probe again
Dimitri Fedrau [Mon, 12 May 2025 12:03:42 +0000 (14:03 +0200)]
net: phy: marvell-88q2xxx: Enable temperature measurement in probe again

Enabling of the temperature sensor was moved from mv88q2xxx_hwmon_probe to
mv88q222x_config_init with the consequence that the sensor is only
usable when the PHY is configured. Enable the sensor in
mv88q2xxx_hwmon_probe as well to fix this.

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://patch.msgid.link/20250512-marvell-88q2xxx-hwmon-enable-at-probe-v4-1-9256a5c8f603@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: cpsw: isolate cpsw_ndo_ioctl() to just the old driver
Vladimir Oltean [Mon, 12 May 2025 11:44:22 +0000 (14:44 +0300)]
net: cpsw: isolate cpsw_ndo_ioctl() to just the old driver

cpsw->slaves[slave_no].phy should be equal to netdev->phydev, because it
is assigned from phy_attach_direct(). The latter is indirectly called
from the two identically named cpsw_slave_open() functions, one in
cpsw.c and another in cpsw_new.c.

Thus, the driver should not need custom logic to find the PHY, the core
can find it, and phy_do_ioctl_running() achieves exactly that.

However, that is only the case for cpsw_new and for the cpsw driver in
dual EMAC mode. This is explained in more detail in the previous commit.
Thus, allow the simpler core logic to execute for cpsw_new, and move
cpsw_ndo_ioctl() to cpsw.c.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250512114422.4176010-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: cpsw: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Mon, 12 May 2025 11:44:21 +0000 (14:44 +0300)]
net: cpsw: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the two cpsw drivers to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

The cpsw_hwtstamp_get() and cpsw_hwtstamp_set() methods (and their shim
definitions, for the case where CONFIG_TI_CPTS is not enabled) must have
their prototypes adjusted.

These methods are used by two drivers (cpsw and cpsw_new), with vastly
different configurations:
- cpsw has two operating modes:
  - "dual EMAC" - enabled through the "dual_emac" device tree property -
    creates one net_device per EMAC / slave interface (but there is no
    bridging offload)
  - "switch mode" - default - there is a single net_device, with two
    EMACs/slaves behind it (and switching between them happens
    unbeknownst to the network stack).
- cpsw_new always registers one net_device for each EMAC which doesn't
  have status = "disabled". In terms of switching, it has two modes:
  - "dual EMAC": default, no switching between ports, no switchdev
    offload.
  - "switch mode": enabled through the "switch_mode" devlink parameter,
    offloads the Linux bridge through switchdev

Essentially, in 3 out of 4 operating modes, there is a bijective
relation between the net_device and the slave. Timestamping can thus be
configured on individual slaves. But in the "switch mode" of the cpsw
driver, ndo_eth_ioctl() targets a single slave, designated using the
"active_slave" device tree property.

To deal with these different cases, the common portion of the drivers,
cpsw_priv.c, has the cpsw_slave_index() function pointer, set to
separate, identically named cpsw_slave_index_priv() by the 2 drivers.

This is all relevant because cpsw_ndo_ioctl() has the old-style
phy_has_hwtstamp() logic which lets the PHY handle the timestamping
ioctls. Normally, that logic should be obsoleted by the more complex
logic in the core, which permits dynamically selecting the timestamp
provider - see dev_set_hwtstamp_phylib().

But I have doubts as to how this works for the "switch mode" of the dual
EMAC driver, because the core logic only engages if the PHY is visible
through ndev->phydev (this is set by phy_attach_direct()).

In cpsw.c, we have:
cpsw_ndo_open()
-> for_each_slave(priv, cpsw_slave_open, priv); // continues on errors
   -> of_phy_connect()
      -> phy_connect_direct()
         -> phy_attach_direct()
   OR
   -> phy_connect()
      -> phy_connect_direct()
         -> phy_attach_direct()

The problem for "switch mode" is that the behavior of phy_attach_direct()
called twice in a row for the same net_device (once for each slave) is
probably undefined.

For sure it will overwrite dev->phydev. I don't see any explicit error
checks for this case, and even if there were, the for_each_slave() call
makes them non-fatal to cpsw_ndo_open() anyway.

I have no idea what is the extent to which this provides a usable
result, but the point is: only the last attached PHY will be visible
in dev->phydev, and this may well be a different PHY than
cpsw->slaves[slave_no].phy for the "active_slave".

In dual EMAC mode, as well as in cpsw_new, this should not be a problem.
I don't know whether PHY timestamping is a use case for the cpsw "switch
mode" as well, and I hope that there isn't, because for the sake of
simplicity, I've decided to deliberately break that functionality, by
refusing all PHY timestamping. Keeping it would mean blocking the old
API from ever being removed. In the new dev_set_hwtstamp_phylib() API,
it is not possible to operate on a phylib PHY other than dev->phydev,
and I would very much prefer not adding that much complexity for bizarre
driver decisions.

Final point about the cpsw_hwtstamp_get() conversion: we don't need to
propagate the unnecessary "config.flags = 0;", because dev_get_hwtstamp()
provides a zero-initialized struct kernel_hwtstamp_config.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250512114422.4176010-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'misc-drivers-sw-timestamp-changes'
Jakub Kicinski [Thu, 15 May 2025 02:32:55 +0000 (19:32 -0700)]
Merge branch 'misc-drivers-sw-timestamp-changes'

Jason Xing says:

====================
misc drivers' sw timestamp changes

This series modified three outstanding drivers among more than 100 drivers
because the software timestamp generation is too early. The idea of this
series is derived from the brief talk[1] with Willem. In conclusion, this
series makes the generation of software timestamp near/before kicking the
doorbell for drivers.

[1]: https://lore.kernel.org/all/681b9d2210879_1f6aad294bc@willemb.c.googlers.com.notmuch/

v2: https://lore.kernel.org/20250508033328.12507-1-kerneljasonxing@gmail.com
====================

Link: https://patch.msgid.link/20250510134812.48199-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: stmmac: generate software timestamp just before the doorbell
Jason Xing [Sat, 10 May 2025 13:48:12 +0000 (21:48 +0800)]
net: stmmac: generate software timestamp just before the doorbell

Make sure the call of skb_tx_timestamp is as close as possbile to the
doorbell.

The patch also adjusts the order of setting SKBTX_IN_PROGRESS and
generate software timestamp so that without SOF_TIMESTAMPING_OPT_TX_SWHW
being set the software and hardware timestamps will not appear in the
error queue of socket nearly at the same time (Please see __skb_tstamp_tx()).

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20250510134812.48199-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: cxgb4: generate software timestamp just before the doorbell
Jason Xing [Sat, 10 May 2025 13:48:11 +0000 (21:48 +0800)]
net: cxgb4: generate software timestamp just before the doorbell

Make sure the call of skb_tx_timestamp is as close as possible to the
doorbell.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20250510134812.48199-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: atlantic: generate software timestamp just before the doorbell
Jason Xing [Sat, 10 May 2025 13:48:10 +0000 (21:48 +0800)]
net: atlantic: generate software timestamp just before the doorbell

Make sure the call of skb_tx_timestamp is as close as possible to the
doorbell.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20250510134812.48199-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: apple: bmac: use crc32() instead of hand-rolled equivalent
Eric Biggers [Tue, 13 May 2025 05:01:42 +0000 (22:01 -0700)]
net: apple: bmac: use crc32() instead of hand-rolled equivalent

The calculation done by bmac_crc(addr) followed by taking the low 6 bits
and reversing them is equivalent to taking the high 6 bits from
crc32(~0, addr, ETH_ALEN).  Just do that instead.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20250513050142.635391-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoopenvswitch: Stricter validation for the userspace action
Eelco Chaudron [Mon, 12 May 2025 08:08:24 +0000 (10:08 +0200)]
openvswitch: Stricter validation for the userspace action

This change enhances the robustness of validate_userspace() by ensuring
that all Netlink attributes are fully contained within the parent
attribute. The previous use of nla_parse_nested_deprecated() could
silently skip trailing or malformed attributes, as it stops parsing at
the first invalid entry.

By switching to nla_parse_deprecated_strict(), we make sure only fully
validated attributes are copied for later use.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/67eb414e2d250e8408bb8afeb982deca2ff2b10b.1747037304.git.echaudro@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: phy: remove Kconfig symbol MDIO_DEVRES
Heiner Kallweit [Sun, 11 May 2025 21:13:25 +0000 (23:13 +0200)]
net: phy: remove Kconfig symbol MDIO_DEVRES

MDIO_DEVRES is only set where PHYLIB/PHYLINK are set which
select MDIO_DEVRES. So we can remove this symbol.

Note: Due to circular module dependencies we can't simply
      make mdio_devres.c part of phylib.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/27cba535-f507-4b32-84a3-0744c783a465@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/tg3: use crc32() instead of hand-rolled equivalent
Eric Biggers [Tue, 13 May 2025 04:14:02 +0000 (21:14 -0700)]
net/tg3: use crc32() instead of hand-rolled equivalent

The calculation done by calc_crc() is equivalent to
~crc32(~0, buf, len), so just use that instead.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20250513041402.541527-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodt-bindings: net: snps,dwmac: Align mdio node in example with bindings
Geert Uytterhoeven [Tue, 13 May 2025 07:33:56 +0000 (09:33 +0200)]
dt-bindings: net: snps,dwmac: Align mdio node in example with bindings

According to the bindings, the MDIO subnode should be called "mdio".
Update the example to match this.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/308d72c2fe8e575e6e137b99743329c2d53eceea.1747121550.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodocumentation: networking: devlink: Fix a typo in devlink-trap.rst
Alper Ak [Tue, 13 May 2025 09:24:51 +0000 (12:24 +0300)]
documentation: networking: devlink: Fix a typo in devlink-trap.rst

Fix a typo in the documentation: "errorrs" -> "errors".

Signed-off-by: Alper Ak <alperyasinak1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250513092451.22387-1-alperyasinak1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: fix implicit declaration of function FIELD_PREP
Wei Fang [Mon, 12 May 2025 06:17:01 +0000 (14:17 +0800)]
net: enetc: fix implicit declaration of function FIELD_PREP

The kernel test robot reported the following error:

drivers/net/ethernet/freescale/enetc/ntmp.c: In function 'ntmp_fill_request_hdr':
drivers/net/ethernet/freescale/enetc/ntmp.c:203:38: error: implicit
declaration of function 'FIELD_PREP' [-Wimplicit-function-declaration]
203 |         cbd->req_hdr.access_method = FIELD_PREP(NTMP_ACCESS_METHOD,
    |                                      ^~~~~~~~~~

Therefore, add "bitfield.h" to ntmp_private.h to fix this issue.

Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505101047.NTMcerZE-lkp@intel.com/
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: wangxun: Correct clerical errors in comments
Jiawen Wu [Mon, 12 May 2025 02:03:33 +0000 (10:03 +0800)]
net: wangxun: Correct clerical errors in comments

There are wrong "#endif" comments in .h files need to be corrected.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: phy: remove stub for mdiobus_register_board_info
Heiner Kallweit [Mon, 12 May 2025 20:20:59 +0000 (22:20 +0200)]
net: phy: remove stub for mdiobus_register_board_info

The functionality of mdiobus_register_board_info() typically isn't
optional for the caller. Therefore remove the stub.

Note: Currently we have only one caller of mdiobus_register_board_info(),
in a DSA/PHYLINK context. Therefore CONFIG_MDIO_DEVICE is selected anyway.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/410a2222-c4e8-45b0-9091-d49674caeb00@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: mlxsw: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Mon, 12 May 2025 15:44:11 +0000 (18:44 +0300)]
net: mlxsw: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the mlxsw driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

The UAPI is still ioctl-only, but it's best to remove the "ioctl"
mentions from the driver in case a netlink variant appears.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250512154411.848614-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ipa: Make the SMEM item ID constant
Konrad Dybcio [Mon, 12 May 2025 18:07:39 +0000 (20:07 +0200)]
net: ipa: Make the SMEM item ID constant

It can't vary, stop storing the same magic number everywhere.

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Alex Elder <elder@kernel.org>
Link: https://patch.msgid.link/20250512-topic-ipa_smem-v1-1-302679514a0d@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Mon, 12 May 2025 11:24:02 +0000 (14:24 +0300)]
net: enetc: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the ENETC driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

Move the enetc_hwtstamp_get() and enetc_hwtstamp_set() calls away from
enetc_ioctl() to dedicated net_device_ops for the LS1028A PF and VF
(NETC v4 does not yet implement enetc_ioctl()), adapt the prototypes and
export these symbols (enetc_ioctl() is also exported).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250512112402.4100618-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: txgbe: Fix pending interrupt
Jiawen Wu [Mon, 12 May 2025 10:06:52 +0000 (18:06 +0800)]
net: txgbe: Fix pending interrupt

For unknown reasons, sometimes the value of MISC interrupt is 0 in the
IRQ handle function. In this case, wx_intr_enable() is also should be
invoked to clear the interrupt. Otherwise, the next interrupt would
never be reported.

Fixes: a9843689e2de ("net: txgbe: add sriov function support")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/F4F708403CE7090B+20250512100652.139510-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'net-mlx5-hws-complex-matchers-and-rehash-mechanism-fixes'
Jakub Kicinski [Tue, 13 May 2025 22:26:32 +0000 (15:26 -0700)]
Merge branch 'net-mlx5-hws-complex-matchers-and-rehash-mechanism-fixes'

Tariq Toukan says:

====================
net/mlx5: HWS, Complex Matchers and rehash mechanism fixes

Motivation:
----------

A matcher can match a certain set of match parameters. However,
the number and size of match params for a single matcher are
limited — all the parameters must fit within a single definer.

A common example of this limitation is IPv6 address matching, where
matching both source and destination IPs requires more bits than a
single definer can support.

SW Steering addresses this limitation by chaining multiple Steering
Table Entries (STEs) within the same matcher, where each STE matches
on a subset of the parameters.

In HW Steering, such chaining is not possible — the matcher's STEs
are managed in a hash table, and a single definer is used to calculate
the hash index for STEs.

Overview:
--------

To address this limitation in HW Steering, we introduce
*Complex Matchers*, which consist of two chained matchers. This allows
matching on twice as many parameters. Complex Matchers are filled with
*Complex Rules* — rules that are split into two parts and inserted into
their respective matchers.

The first half of the Complex Matcher is a regular matcher and points
to the second half, which is an *Isolated Matcher*. An Isolated Matcher
has its own isolated table and is accessible only by traffic coming
from the first half of the Complex Matcher.

This splitting of matchers/rules into multiple parts is transparent to
users. It is hidden behind the BWC HWS API. It becomes visible only
when dumping steering debug information, where the Complex Matcher
appears as two separate matchers: one in the user-created table and
another in its isolated table.

Implementation Details:
----------------------

All user actions are performed on the second part of the rules only.
The first part handles matching and applies two actions: modify header
(set metadata, see details below) and go-to-table (directing traffic
to the isolated table containing the isolated matcher).

Rule updates (updating rule actions) are applied to the second part
of the rule since user-provided actions are not executed in the first
matcher.

We use REG_C_6 metadata register to set and match on unique per-rule
tag (see details below).

Splitting rules into two parts introduces new challenges:

1. Invalid Combinations

   Consider two rules with different matching values:
   - Rule 1: A+B
   - Rule 2: C+D

   Let's split the rules into two parts as follows:

   |-----Complex Matcher-------|
   |                           |
   | 1st matcher   2nd matcher |
   |    |---|        |---|     |
   |    | A |        | B |     |
   |    |---| -----> |---|     |
   |    | C |        | D |     |
   |    |---|        |---|     |
   |                           |
   |---------------------------|

   Splitting these rules results in invalid combinations: A+D and C+B:
   any packet that matched on A will be forwarded to the 2nd matcher,
   where it will try to match on B (which is legal, and it is what the
   user asked for), but it will also try to match on D (which is not
   what the user asked for). To resolve this, we assign unique tags
   to each rule on the first matcher and match on these tags on the
   second matcher:

   |----------|     |---------|
   |     A    |     | B, TagA |
   | action:  |     |         |
   | set TagA |     |         |
   |----------| --> |---------|
   |     C    |     | D, TagB |
   | action:  |     |         |
   | set TagB |     |         |
   |----------|     |---------|

2. Duplicated Entries:

   Consider two rules with overlapping values:
   - Rule 1: A+B
   - Rule 2: A+D

   Let's split the rules into two parts as follows:

    |---|     |---|
    | A |     | B |
    |---| --> |---|
    |   |     | D |
    |---|     |---|

   This leads to the duplicated entries on the first matcher, which HWS
   doesn't allow: subsequent delete of either of the rules will delete
   the only entry in the first matcher, leaving the remaining rule
   broken. To address this, we use a reference count for entries in the
   first matcher and delete STEs only when their refcount reaches zero.

Both challenges are resolved by having a per-matcher data structure
(implemented with rhashtable) that manages refcounts for the first part
of the rules and holds unique tags (managed via IDA) for these rules to
set and to match on the second matcher.

Limitations:
-----------

We utilize metadata register REG_C_6 in this implementation, so its
usage anywhere along the flow that might include the need for Complex
Matcher is prohibited.

The number and size of match parameters remain limited — now
constrained by what can be represented by two definers instead of one.
This architectural limitation arises from the structure of Complex
Matchers. If future requirements demand more parameters, Complex
Matchers can be extended beyond two matchers.

Additionally, there is an implementation limit of 32 match parameters
per matcher (disregarding parameter size). This limit can be lifted
if needed.

Patches:
-------

 - Patches 1-3: small additions/refactoring in preparation for
   Complex Matcher: exposed mlx5hws_table_ft_set_next_ft() in header,
   added definer function to convert field name enum to string,
   expose the polling function mlx5hws_bwc_queue_poll() in a header.
 - Patch 4: in preparation for Complex Matcher, this patch adds
   support for Isolated Matcher.
 - Patch 5: the main patch - Complex Matchers implementation.

[2]

Patch 6: fixing the usecase where rule insertion was failing,
but rehash couldn't be initiated if the number of rules in
the table is below the rehash threshold.

Patch 7: fixing the usecase where many rules in parallel
would require rehash, due to the way the counting of rules
was done.

Patch 8: fixing the case where rules were requiring action
template extension in parallel, leading to unneeded extensions
with the same templates.

Patch 9: refactor and simplify the rehash loop.

Patch 10: dump error completion details, which helps a lot
in trying to understand what went wrong, especially during
rehash.
====================

Link: https://patch.msgid.link/1746992290-568936-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, dump bad completion details
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:10 +0000 (22:38 +0300)]
net/mlx5: HWS, dump bad completion details

Failing to insert/delete a rule should not happen. If it does happen,
it would be good to know at which stage it happened and what was the
failure. This patch adds printing of bad CQE details.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-11-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, rework rehash loop
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:09 +0000 (22:38 +0300)]
net/mlx5: HWS, rework rehash loop

Reworking the rehash loop - simplifying the code and making it less
error prone:
 - Instead of doing round-robin on all the queues with batch of rules in
   each cycle, just go over all the queues and move all the rules that
   belong to this queue.
 - If at some stage of moving the rule we get a failure (which should
   not happen), this can't be rolled back. So instead of aborting
   rehash and leaving the matcher in a broken state, allow the loop
   to continue: attempt to move the rest of the rules and delete the
   old matcher. A rule that failed to move to a new matcher will loose
   its match STE once the rehash is completed and the old matcher is
   deleted, so the rule won't match any traffic any more. This rule's
   packets will fall back to the steering pipeline w/o HW offload.
   Rehash procedure will return an error, which will cause the rule
   insertion to fail for the rule that started this whole rehash.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-10-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, fix redundant extension of action templates
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:08 +0000 (22:38 +0300)]
net/mlx5: HWS, fix redundant extension of action templates

When a rule is inserted into a matcher, we search for the suitable
action template. If such template is not found, action template array
is extended with the new template. However, when several threads are
performing this in parallel, there is a race - we can end up with
extending the action templates array with the same template.

This patch is doing the following:
 - refactor the code to find action template index in rule create and
   update, have the common code in an auxiliary function
 - after locking all the queues, check again if the action template
   array still needs to be extended

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-9-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, fix counting of rules in the matcher
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:07 +0000 (22:38 +0300)]
net/mlx5: HWS, fix counting of rules in the matcher

Currently the counter that counts number of rules in a matcher is
increased only when rule insertion is completed. In a multi-threaded
usecase this can lead to a scenario that many rules can be in process
of insertion in the same matcher, while none of them has completed
the insertion and the rule counter is not updated. This results in
a rule insertion failure for many of them at first attempt, which
leads to all of them requiring rehash and requiring locking of all
the queue locks.

This patch fixes the case by increasing the rule counter in the
beginning of insertion process and decreasing in case of any failure.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, force rehash when rule insertion failed
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:06 +0000 (22:38 +0300)]
net/mlx5: HWS, force rehash when rule insertion failed

Rules are inserted into hash table in accordance with their hash index.
When a certain number of rules is reached, the table is rehashed:
a bigger new table is allocated and all the rules are moved there.
But sometimes a new rule can't be inserted into the hash table
because its index is full, even though the number of rules in the
table is well below the threshold. The hash function is not perfect,
so such cases are not rare. When that happens, we want to do the same
rehash, in order to increase the table size and lower the probability
for such cases.

This patch fixes the usecase where rule insertion was failing, but
rehash couldn't be initiated due to low number of rules: it adds flag
that denotes that rehash is required, even if the number of rules in
the table is below the rehash threshold.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, support complex matchers
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:05 +0000 (22:38 +0300)]
net/mlx5: HWS, support complex matchers

This patch adds support for Complex Matchers/Rules

Overview:
--------

A matcher can match on a certain set of match parameters. However, the
number and size of match params for a single matcher are limited: all
the parameters must fit within a single definer.

A common example of this limitation is IPv6 address matching, where
matching both source and destination IPs requires more bits than a
single definer can support.

SW Steering addresses this limitation by chaining multiple Steering
Table Entries (STEs) within the same matcher, where each STE matches
on a subset of the parameters.

In HW Steering, such chaining is not possible — the matcher's STEs
are managed in a hash table, and a single definer is used to calculate
the hash index for STEs.

To address this limitation in HW Steering, we introduce Complex
Matchers, which consist of two chained matchers. This allows matching
on twice as many parameters. Complex Matchers are filled with Complex
Rules — rules that are split into two parts and inserted into their
respective matchers.

The first half of the Complex Matcher is a regular matcher and points
to the second half, which is an Isolated Matcher. An Isolated Matcher
has its own isolated table and is accessible only by traffic coming
from the first half of the Complex Matcher.

This splitting of matchers/rules into multiple parts is transparent to
users. It is hidden under the BWC HWS API. It becomes visible only when
dumping steering debug information, where the Complex Matcher appears
as two separate matchers: one in the user-created table and another
in its isolated table.

Some implementation details:
---------------------------

All user actions are performed on the second part of the rules only.
The first part handles matching and applies two actions: modify header
(set metadata, see details below) and go-to-table (directing traffic to
the isolated table containing the isolated matcher).

Rule updates (updating rule actions) are applied to the second part of
the rule since user-provided actions are not executed in the first
matcher.

We use REG_C_6 metadata register to set and match on unique per-rule
tag (see details below).

Splitting rules into two parts introduces new challenges:

1. Invalid Combinations

   Consider two rules with different matching values:
   - Rule 1: A+B
   - Rule 2: C+D

   Let's split the rules into two parts as follows:

   |---|     |---|
   | A |     | B |
   |---| --> |---|
   | C |     | D |
   |---|     |---|

   Splitting these rules results in invalid combinations like A+D
   and C+B.

   To resolve this, we assign unique tags to each rule on the first
   matcher and match these tags on the second matcher (the tag is
   implemented through modify_hdr action that sets value to metadata
   register REG_C_6):

   |----------|     |---------|
   |     A    |     | B, TagA |
   | action:  |     |         |
   | set TagA |     |         |
   |----------| --> |---------|
   |     C    |     | D, TagB |
   | action:  |     |         |
   | set TagB |     |         |
   |----------|     |---------|

2. Duplicated Entries:

   Consider two rules with overlapping values:
   - Rule 1: A+B
   - Rule 2: A+D

   Let's split the rules into two parts as follows:

    |---|     |---|
    | A |     | B |
    |---| --> |---|
    |   |     | D |
    |---|     |---|

   This leads to the duplicated entries on the first matcher, which HWS
   doesn't allow: subsequent delete of either of the rules will delete
   the only entry in the first matcher, leaving the remaining rule
   broken.

   To address this, we use a reference count for entries in the first
   matcher and delete STEs only when their refcount reaches zero.

Both challenges are resolved by having a per-matcher data structure
(implemented with rhashtable) that manages refcounts for the first part
of the rules and holds unique tags (managed via IDA) for these rules to
set and to match on the second matcher.

Limitations:
-----------

We utilize metadata register REG_C_6 in this implementation, so its
usage anywhere along the steering of the flow that might include the
need for Complex Matcher is prohibited.

The number and size of match parameters remain limited — now it is
constrained by what can be represented by two definers instead of one.
This architectural limitation arises from the structure of Complex
Matchers. If future requirements demand more parameters,
Complex Matchers can be extended beyond two matchers.

Additionally, there is an implementation limit of 32 match parameters
per rule (disregarding parameter size). This limit can be lifted if
needed.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, introduce isolated matchers
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:04 +0000 (22:38 +0300)]
net/mlx5: HWS, introduce isolated matchers

In preparation for complex matcher support, introduce the isolated
matcher.

Isolated matcher is a matcher that has its own isolated table.
It is used as the second half of the complex matcher: when the rule
is split into two parts (complex rule), then matching on the first
part will send the packet to the isolated matcher that will try to
match on the second part. In case of miss, the packet goes back to
the matcher's end flow table.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, expose polling function in header file
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:03 +0000 (22:38 +0300)]
net/mlx5: HWS, expose polling function in header file

In preparation for complex matcher, expose the function that is
polling queue for completion (mlx5hws_bwc_queue_poll) in header
file, so that it will be used by complex matcher code.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, add definer function to get field name str
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:02 +0000 (22:38 +0300)]
net/mlx5: HWS, add definer function to get field name str

In preparation for complex matcher support, add function for
converting definer fname to str, which will be used in following
patches.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft in header
Yevgeny Kliteynik [Sun, 11 May 2025 19:38:01 +0000 (22:38 +0300)]
net/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft in header

In preparation for complex matcher support, make function
mlx5hws_table_ft_set_next_ft() non-static and expose it in header.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1746992290-568936-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'amd-xgbe-add-support-for-amd-renoir'
Paolo Abeni [Tue, 13 May 2025 11:29:43 +0000 (13:29 +0200)]
Merge branch 'amd-xgbe-add-support-for-amd-renoir'

Raju Rangoju says:

====================
amd-xgbe: add support for AMD Renoir

Add support for a new AMD Ethernet device called "Renoir". It has a new
PCI ID, add this to the current list of supported devices in the
amd-xgbe devices. Also, the BAR1 addresses cannot be used to access the
PCS registers on Renoir platform, use the indirect addressing via SMN
instead.
====================

Link: https://patch.msgid.link/20250509155325.720499-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoamd-xgbe: add support for new pci device id 0x1641
Raju Rangoju [Fri, 9 May 2025 15:53:25 +0000 (21:23 +0530)]
amd-xgbe: add support for new pci device id 0x1641

Add support for new pci device id 0x1641 to register
Renoir device with PCIe.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-6-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoamd-xgbe: Add XGBE_XPCS_ACCESS_V3 support to xgbe_pci_probe()
Raju Rangoju [Fri, 9 May 2025 15:53:24 +0000 (21:23 +0530)]
amd-xgbe: Add XGBE_XPCS_ACCESS_V3 support to xgbe_pci_probe()

A new version of XPCS access routines have been introduced, add the
support to xgbe_pci_probe() to use these routines.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-5-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoamd-xgbe: add support for new XPCS routines
Raju Rangoju [Fri, 9 May 2025 15:53:23 +0000 (21:23 +0530)]
amd-xgbe: add support for new XPCS routines

Add the necessary support to enable Renoir ethernet device. Since the
BAR1 address cannot be used to access the XPCS registers on Renoir, use
the smn functions.

Some of the ethernet add-in-cards have dual PHY but share a single MDIO
line (between the ports). In such cases, link inconsistencies are
noticed during the heavy traffic and during reboot stress tests. Using
smn calls helps avoid such race conditions.

Suggested-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-4-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoamd-xgbe: reorganize the xgbe_pci_probe() code path
Raju Rangoju [Fri, 9 May 2025 15:53:22 +0000 (21:23 +0530)]
amd-xgbe: reorganize the xgbe_pci_probe() code path

Reorganize the xgbe_pci_probe() code path to convert if/else statements
to switch case to help add future code. This helps code look cleaner.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-3-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoamd-xgbe: reorganize the code of XPCS access
Raju Rangoju [Fri, 9 May 2025 15:53:21 +0000 (21:23 +0530)]
amd-xgbe: reorganize the code of XPCS access

The xgbe_{read/write}_mmd_regs_v* functions have common code which can
be moved to helper functions. Add new helper functions to calculate the
mmd_address for v1/v2 of xpcs access.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch 'tools-ynl-gen-support-sub-types-for-binary-attributes'
Paolo Abeni [Tue, 13 May 2025 11:22:34 +0000 (13:22 +0200)]
Merge branch 'tools-ynl-gen-support-sub-types-for-binary-attributes'

Jakub Kicinski says:

====================
tools: ynl-gen: support sub-types for binary attributes

Binary attributes have sub-type annotations which either indicate
that the binary object should be interpreted as a raw / C array of
a simple type (e.g. u32), or that it's a struct.

Use this information in the C codegen instead of outputting void *
for all binary attrs. It doesn't make a huge difference in the genl
families, but in classic Netlink there is a lot more structs.

v1: https://lore.kernel.org/20250508022839.1256059-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250509154213.1747885-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agotools: ynl-gen: support struct for binary attributes
Jakub Kicinski [Fri, 9 May 2025 15:42:13 +0000 (08:42 -0700)]
tools: ynl-gen: support struct for binary attributes

Support using a struct pointer for binary attrs. Len field is maintained
because the structs may grow with newer kernel versions. Or, which matters
more, be shorter if the binary is built against newer uAPI than kernel
against which it's executed. Since we are storing a pointer to a struct
type - always allocate at least the amount of memory needed by the struct
per current uAPI headers (unused mem is zeroed). Technically users should
check the length field but per modern ASAN checks storing a short object
under a pointer seems like a bad idea.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250509154213.1747885-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agotools: ynl-gen: auto-indent else
Jakub Kicinski [Fri, 9 May 2025 15:42:12 +0000 (08:42 -0700)]
tools: ynl-gen: auto-indent else

We auto-indent if statements (increase the indent of the subsequent
line by 1), do the same thing for else branches without a block.
There hasn't been any else branches before but we're about to add one.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250509154213.1747885-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agotools: ynl-gen: support sub-type for binary attributes
Jakub Kicinski [Fri, 9 May 2025 15:42:11 +0000 (08:42 -0700)]
tools: ynl-gen: support sub-type for binary attributes

Sub-type annotation on binary attributes may indicate that the attribute
carries an array of simple types (also referred to as "C array" in docs).
Support rendering them as such in the C user code. For example for u32,
instead of:

  struct {
    u32 arr;
  } _len;

  void *arr;

render:

  struct {
    u32 arr;
  } _count;

  __u32 *arr;

Note that count is the number of elements while len was the length in bytes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250509154213.1747885-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch 'device-memory-tcp-tx'
Paolo Abeni [Tue, 13 May 2025 09:13:05 +0000 (11:13 +0200)]
Merge branch 'device-memory-tcp-tx'

Mina Almasry says:

====================
Device memory TCP TX

The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].

Full outline on usage of the TX path is detailed in the documentation
included with this series.

Test example is available via the kselftest included in the series as well.

The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.

Patch Overview:
---------------

1. Documentation & tests to give high level overview of the feature
   being added.

1. Add netmem refcounting needed for the TX path.

2. Devmem TX netlink API.

3. Devmem TX net stack implementation.

4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
   freed from contexts where we can't sleep.

5. Add devmem TX documentation.

6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.

7. Guard netmem_tx against being enabled against drivers that don't
   support it.

8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
   devmem.py.

Testing:
--------

Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.

* Test Setup:

Kernel: net-next with this RFC and memory provider API cherry-picked
locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.

Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.

[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.com/
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#m066dd407fbed108828e2c40ae50e3f4376ef57fd

Cc: sdf@fomichev.me
Cc: asml.silence@gmail.com
Cc: dw@davidwei.uk
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Victor Nogueira <victor@mojatatu.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
v14: https://lore.kernel.org/netdev/20250429032645.363766-1-almasrymina@google.com/
v13: https://lore.kernel.org/netdev/20250425204743.617260-1-almasrymina@google.com/
v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.com/
v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.com/
v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.com/
v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.com/
v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.com/
v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.com/
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.com/
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.com/
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.com/
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
====================

Link: https://patch.msgid.link/20250508004830.4100853-1-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: ncdevmem: Implement devmem TCP TX
Mina Almasry [Thu, 8 May 2025 00:48:29 +0000 (00:48 +0000)]
selftests: ncdevmem: Implement devmem TCP TX

Add support for devmem TX in ncdevmem.

This is a combination of the ncdevmem from the devmem TCP series RFCv1
which included the TX path, and work by Stan to include the netlink API
and refactored on top of his generic memory_provider support.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-10-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: check for driver support in netmem TX
Mina Almasry [Thu, 8 May 2025 00:48:28 +0000 (00:48 +0000)]
net: check for driver support in netmem TX

We should not enable netmem TX for drivers that don't declare support.

Check for driver netmem TX support during devmem TX binding and fail if
the driver does not have the functionality.

Check for driver support in validate_xmit_skb as well.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-9-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agogve: add netmem TX support to GVE DQO-RDA mode
Mina Almasry [Thu, 8 May 2025 00:48:27 +0000 (00:48 +0000)]
gve: add netmem TX support to GVE DQO-RDA mode

Use netmem_dma_*() helpers in gve_tx_dqo.c DQO-RDA paths to
enable netmem TX support in that mode.

Declare support for netmem TX in GVE DQO-RDA mode.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-8-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: enable driver support for netmem TX
Mina Almasry [Thu, 8 May 2025 00:48:26 +0000 (00:48 +0000)]
net: enable driver support for netmem TX

Drivers need to make sure not to pass netmem dma-addrs to the
dma-mapping API in order to support netmem TX.

Add helpers and netmem_dma_*() helpers that enables special handling of
netmem dma-addrs that drivers can use.

Document in netmem.rst what drivers need to do to support netmem TX.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-7-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: add devmem TCP TX documentation
Mina Almasry [Thu, 8 May 2025 00:48:25 +0000 (00:48 +0000)]
net: add devmem TCP TX documentation

Add documentation outlining the usage and details of the devmem TCP TX
API.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-6-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: devmem: Implement TX path
Mina Almasry [Thu, 8 May 2025 00:48:24 +0000 (00:48 +0000)]
net: devmem: Implement TX path

Augment dmabuf binding to be able to handle TX. Additional to all the RX
binding, we also create tx_vec needed for the TX path.

Provide API for sendmsg to be able to send dmabufs bound to this device:

- Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from.
- MSG_ZEROCOPY with SCM_DEVMEM_DMABUF cmsg indicates send from dma-buf.

Devmem is uncopyable, so piggyback off the existing MSG_ZEROCOPY
implementation, while disabling instances where MSG_ZEROCOPY falls back
to copying.

We additionally pipe the binding down to the new
zerocopy_fill_skb_from_devmem which fills a TX skb with net_iov netmems
instead of the traditional page netmems.

We also special case skb_frag_dma_map to return the dma-address of these
dmabuf net_iovs instead of attempting to map pages.

The TX path may release the dmabuf in a context where we cannot wait.
This happens when the user unbinds a TX dmabuf while there are still
references to its netmems in the TX path. In that case, the netmems will
be put_netmem'd from a context where we can't unmap the dmabuf, Resolve
this by making __net_devmem_dmabuf_binding_free schedule_work'd.

Based on work by Stanislav Fomichev <sdf@fomichev.me>. A lot of the meat
of the implementation came from devmem TCP RFC v1[1], which included the
TX path, but Stan did all the rebasing on top of netmem/net_iov.

Cc: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-5-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: devmem: TCP tx netlink api
Stanislav Fomichev [Thu, 8 May 2025 00:48:23 +0000 (00:48 +0000)]
net: devmem: TCP tx netlink api

Add bind-tx netlink call to attach dmabuf for TX; queue is not
required, only ifindex and dmabuf fd for attachment.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: add get_netmem/put_netmem support
Mina Almasry [Thu, 8 May 2025 00:48:22 +0000 (00:48 +0000)]
net: add get_netmem/put_netmem support

Currently net_iovs support only pp ref counts, and do not support a
page ref equivalent.

This is fine for the RX path as net_iovs are used exclusively with the
pp and only pp refcounting is needed there. The TX path however does not
use pp ref counts, thus, support for get_page/put_page equivalent is
needed for netmem.

Support get_netmem/put_netmem. Check the type of the netmem before
passing it to page or net_iov specific code to obtain a page ref
equivalent.

For dmabuf net_iovs, we obtain a ref on the underlying binding. This
ensures the entire binding doesn't disappear until all the net_iovs have
been put_netmem'ed. We do not need to track the refcount of individual
dmabuf net_iovs as we don't allocate/free them from a pool similar to
what the buddy allocator does for pages.

This code is written to be extensible by other net_iov implementers.
get_netmem/put_netmem will check the type of the netmem and route it to
the correct helper:

pages -> [get|put]_page()
dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
new net_iovs -> new helpers

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-3-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonetmem: add niov->type attribute to distinguish different net_iov types
Mina Almasry [Thu, 8 May 2025 00:48:21 +0000 (00:48 +0000)]
netmem: add niov->type attribute to distinguish different net_iov types

Later patches in the series adds TX net_iovs where there is no pp
associated, so we can't rely on niov->pp->mp_ops to tell what is the
type of the net_iov.

Add a type enum to the net_iov which tells us the net_iov type.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-2-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: dsa: b53: implement setting ageing time
Jonas Gorski [Sat, 10 May 2025 09:22:11 +0000 (11:22 +0200)]
net: dsa: b53: implement setting ageing time

b53 supported switches support configuring ageing time between 1 and
1,048,575 seconds, so add an appropriate setter.

This allows b53 to pass the FDB learning test for both vlan aware and
vlan unaware bridges.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250510092211.276541-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info
Jason Xing [Sat, 10 May 2025 09:34:42 +0000 (17:34 +0800)]
net: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info

As mlx4 has implemented skb_tx_timestamp() in mlx4_en_xmit(), the
SOFTWARE flag is surely needed when users are trying to get timestamp
information.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250510093442.79711-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonetlink: fix policy dump for int with validation callback
Jakub Kicinski [Fri, 9 May 2025 21:27:51 +0000 (14:27 -0700)]
netlink: fix policy dump for int with validation callback

Recent devlink change added validation of an integer value
via NLA_POLICY_VALIDATE_FN, for sparse enums. Handle this
in policy dump. We can't extract any info out of the callback,
so report only the type.

Fixes: 429ac6211494 ("devlink: define enum for attr types of dynamic attributes")
Reported-by: syzbot+01eb26848144516e7f0a@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20250509212751.1905149-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux
Jakub Kicinski [Tue, 13 May 2025 01:48:26 +0000 (18:48 -0700)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux

Tony Nguyen says:

====================
Prepare for Intel IPU E2000 (GEN3)

This is the first part in introducing RDMA support for idpf.

----------------------------------------------------------------
Tatyana Nikolova says:

To align with review comments, the patch series introducing RDMA
RoCEv2 support for the Intel Infrastructure Processing Unit (IPU)
E2000 line of products is going to be submitted in three parts:

1. Modify ice to use specific and common IIDC definitions and
   pass a core device info to irdma.

2. Add RDMA support to idpf and modify idpf to use specific and
   common IIDC definitions and pass a core device info to irdma.

3. Add RDMA RoCEv2 support for the E2000 products, referred to as
   GEN3 to irdma.

This first part is a 5 patch series based on the original
"iidc/ice/irdma: Update IDC to support multiple consumers" patch
to allow for multiple CORE PCI drivers, using the auxbus.

Patches:
1) Move header file to new name for clarity and replace ice
   specific DSCP define with a kernel equivalent one in irdma
2) Unify naming convention
3) Separate header file into common and driver specific info
4) Replace ice specific DSCP define with a kernel equivalent
   one in ice
5) Implement core device info struct and update drivers to use it
----------------------------------------------------------------

v1: https://lore.kernel.org/20250505212037.2092288-1-anthony.l.nguyen@intel.com

IWL reviews:
[v5] https://lore.kernel.org/20250416021549.606-1-tatyana.e.nikolova@intel.com
[v4] https://lore.kernel.org/20250225050428.2166-1-tatyana.e.nikolova@intel.com
[v3] https://lore.kernel.org/20250207194931.1569-1-tatyana.e.nikolova@intel.com
[v2] https://lore.kernel.org/20240824031924.421-1-tatyana.e.nikolova@intel.com
[v1] https://lore.kernel.org/20240724233917.704-1-tatyana.e.nikolova@intel.com

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux:
  iidc/ice/irdma: Update IDC to support multiple consumers
  ice: Replace ice specific DSCP mapping num with a kernel define
  iidc/ice/irdma: Break iidc.h into two headers
  iidc/ice/irdma: Rename to iidc_* convention
  iidc/ice/irdma: Rename IDC header file

====================

Link: https://patch.msgid.link/20250509200712.2911060-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'net-vertexcom-mse102x-improve-rx-handling'
Jakub Kicinski [Tue, 13 May 2025 01:46:46 +0000 (18:46 -0700)]
Merge branch 'net-vertexcom-mse102x-improve-rx-handling'

Stefan Wahren says:

====================
net: vertexcom: mse102x: Improve RX handling

This series is the second part of two series for the Vertexcom driver.
It contains some improvements for the RX handling of the Vertexcom MSE102x.
====================

Link: https://patch.msgid.link/20250509120435.43646-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Simplify mse102x_rx_pkt_spi
Stefan Wahren [Fri, 9 May 2025 12:04:35 +0000 (14:04 +0200)]
net: vertexcom: mse102x: Simplify mse102x_rx_pkt_spi

The function mse102x_rx_pkt_spi is used in two cases:
* initial polling to re-arm RX interrupt
* level based RX interrupt handler

Both of them doesn't need an open-coded retry mechanism.
In the first case the function can be called again, if the return code
is IRQ_NONE. This keeps the error behavior during netdev open.

In the second case the proper retry would be handled implicit by
the SPI interrupt. So drop the retry code and simplify the receive path.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20250509120435.43646-7-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Return code for mse102x_rx_pkt_spi
Stefan Wahren [Fri, 9 May 2025 12:04:34 +0000 (14:04 +0200)]
net: vertexcom: mse102x: Return code for mse102x_rx_pkt_spi

The MSE102x doesn't provide any interrupt register, so the only way
to handle the level interrupt is to fetch the whole packet from
the MSE102x internal buffer via SPI. So in cases the interrupt
handler fails to do this, it should return IRQ_NONE. This allows
the core to disable the interrupt in case the issue persists
and prevent an interrupt storm.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20250509120435.43646-6-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Implement flag for valid CMD
Stefan Wahren [Fri, 9 May 2025 12:04:33 +0000 (14:04 +0200)]
net: vertexcom: mse102x: Implement flag for valid CMD

After removal of the invalid command counter only a relevant debug
message is left, which can be cumbersome. So add a new flag to debugfs,
which indicates whether the driver has ever received a valid CMD.
This helps to differentiate between general and temporary receive
issues.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250509120435.43646-5-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Drop invalid cmd stats
Stefan Wahren [Fri, 9 May 2025 12:04:32 +0000 (14:04 +0200)]
net: vertexcom: mse102x: Drop invalid cmd stats

There are several reasons for an invalid command response
by the MSE102x:
* SPI line interferences
* MSE102x is in reset or has no firmware
* MSE102x is busy
* no packet in MSE102x receive buffer

So the counter for invalid command isn't very helpful without
further context. So drop the confusing statistics counter,
but keep the debug messages about "unexpected response" in order
to debug possible hardware issues.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250509120435.43646-4-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Add warning about IRQ trigger type
Stefan Wahren [Fri, 9 May 2025 12:04:31 +0000 (14:04 +0200)]
net: vertexcom: mse102x: Add warning about IRQ trigger type

The example of the initial DT binding of the Vertexcom MSE 102x suggested
a IRQ_TYPE_EDGE_RISING, which is wrong. So warn everyone to fix their
device tree to level based IRQ.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250509120435.43646-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodt-bindings: vertexcom-mse102x: Fix IRQ type in example
Stefan Wahren [Fri, 9 May 2025 12:04:30 +0000 (14:04 +0200)]
dt-bindings: vertexcom-mse102x: Fix IRQ type in example

According to the MSE102x documentation the trigger type is a
high level.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250509120435.43646-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: phy: dp83867: use 2ns delay if not specified in DTB
Matthias Schiffer [Wed, 7 May 2025 10:13:21 +0000 (12:13 +0200)]
net: phy: dp83867: use 2ns delay if not specified in DTB

Most PHY drivers default to a 2ns delay if internal delay is requested
and no value is specified. Having a default value makes sense, as it
allows a Device Tree to only care about board design (whether there are
delays on the PCB or not), and not whether the delay is added on the MAC
or the PHY side when needed.

Whether the delays are actually applied is controlled by the
DP83867_RGMII_*_CLK_DELAY_EN flags, so the behavior is only changed in
configurations that would previously be rejected with -EINVAL.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://patch.msgid.link/e2509b248a11ee29ea408a50c231da4c1fa0863b.1746612711.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: phy: dp83867: remove check of delay strap configuration
Matthias Schiffer [Wed, 7 May 2025 10:13:20 +0000 (12:13 +0200)]
net: phy: dp83867: remove check of delay strap configuration

The check that intended to handle "rgmii" PHY mode differently to the
RGMII modes with internal delay never worked as intended:

- added in commit 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy"):
  logic error caused the condition to always evaluate to true
- changed in commit a46fa260f6f5 ("net: phy: dp83867: Fix warning check
  for setting the internal delay"): now the condition incorrectly
  evaluates to false for rgmii-txid
- removed in commit 2b892649254f ("net: phy: dp83867: Set up RGMII TX
  delay")

Around the time of the removal, commit c11669a2757e ("net: phy: dp83867:
Rework delay rgmii delay handling") started clearing the delay enable
flags in RGMIICTL. The change attempted to preserve the historical
behavior of not disabling internal delays with "rgmii" PHY mode and also
documented this in a comment, but due to a conflict between "Set up
RGMII TX delay" and "Rework delay rgmii delay handling", the behavior
dp83867_verify_rgmii_cfg() warned about (and that was also described in
a comment in dp83867_config_init()) disappeared in the following merge
of net into net-next in commit b4b12b0d2f02
("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").

While is doesn't appear that this breaking change was intentional, it
has been like this since 2019, and the new behavior to disable the delays
with "rgmii" PHY mode is generally desirable - in particular with MAC
drivers that have to fix up the delay mode, resulting in the PHY driver
not even seeing the same mode that was specified in the Device Tree.

Remove the obsolete check and comment.

Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://patch.msgid.link/8a286207cd11b460bb0dbd27931de3626b9d7575.1746612711.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodt-bindings: net: renesas-gbeth: Add support for RZ/V2N (R9A09G056) SoC
Lad Prabhakar [Wed, 7 May 2025 17:35:50 +0000 (18:35 +0100)]
dt-bindings: net: renesas-gbeth: Add support for RZ/V2N (R9A09G056) SoC

Document support for the GBETH IP found on the Renesas RZ/V2N (R9A09G056)
SoC. The GBETH controller on the RZ/V2N SoC is functionally identical to
the one found on the RZ/V2H(P) (R9A09G057) SoC, so `renesas,rzv2h-gbeth`
will be used as a fallback compatible.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250507173551.100280-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'selftests-net-configure-rp_filter-in-setup_ns'
Jakub Kicinski [Tue, 13 May 2025 01:10:57 +0000 (18:10 -0700)]
Merge branch 'selftests-net-configure-rp_filter-in-setup_ns'

Hangbin Liu says:

====================
selftests: net: configure rp_filter in setup_ns

Some distributions enable rp_filter globally by default, which can interfere
with various test cases. To address this, many tests explicitly disable
rp_filter within their scripts.

To avoid duplication and ensure consistent behavior across tests, this patch
moves the rp_filter configuration into setup_ns, applied immediately after a
new namespace is created. This change ensures that all namespace-based tests
inherit the appropriate rp_filter settings, simplifying individual test
scripts and improving maintainability.
====================

Link: https://patch.msgid.link/20250508081910.84216-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: mptcp: remove rp_filter configuration
Hangbin Liu [Thu, 8 May 2025 08:19:10 +0000 (08:19 +0000)]
selftests: mptcp: remove rp_filter configuration

Remove the rp_filter configuration from MPTCP tests, as it is now handled
by setup_ns.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250508081910.84216-7-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: remove rp_filter configuration
Hangbin Liu [Thu, 8 May 2025 08:19:09 +0000 (08:19 +0000)]
selftests: netfilter: remove rp_filter configuration

Remove the rp_filter configuration in netfilter lib, as setup_ns already
sets it appropriately by default

Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250508081910.84216-6-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: use setup_ns for SRv6 tests and remove rp_filter configuration
Hangbin Liu [Thu, 8 May 2025 08:19:08 +0000 (08:19 +0000)]
selftests: net: use setup_ns for SRv6 tests and remove rp_filter configuration

Some SRv6 tests manually set up network namespaces and disable rp_filter.
Since the setup_ns library function already handles rp_filter configuration,
convert these SRv6 tests to use setup_ns and remove the redundant rp_filter
settings.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250508081910.84216-5-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: use setup_ns for bareudp testing
Hangbin Liu [Thu, 8 May 2025 08:19:07 +0000 (08:19 +0000)]
selftests: net: use setup_ns for bareudp testing

Switch bareudp testing to use setup_ns, which sets up rp_filter by default.
This allows us to remove the manual rp_filter configuration from the script.

Additionally, since setup_ns handles namespace naming and cleanup, we no
longer need a separate cleanup function. We also move the trap setup earlier
in the script, before the test setup begins.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508081910.84216-4-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: remove redundant rp_filter configuration
Hangbin Liu [Thu, 8 May 2025 08:19:06 +0000 (08:19 +0000)]
selftests: net: remove redundant rp_filter configuration

The following tests use setup_ns to create a network namespace, which
will disables rp_filter immediately after namespace creation. Therefore,
it is no longer necessary to disable rp_filter again within these individual
tests.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508081910.84216-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: disable rp_filter after namespace initialization
Hangbin Liu [Thu, 8 May 2025 08:19:05 +0000 (08:19 +0000)]
selftests: net: disable rp_filter after namespace initialization

Some distributions enable rp_filter globally by default. To ensure consistent
behavior across environments, we explicitly disable it in several test cases.

This patch moves the rp_filter disabling logic to immediately after the
network namespace is initialized. With this change, individual test cases
with creating namespace via setup_ns no longer need to disable rp_filter
again.

This helps avoid redundancy and ensures test consistency.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508081910.84216-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ixp4xx_eth: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 21:10:42 +0000 (00:10 +0300)]
net: ixp4xx_eth: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the intel ixp4xx ethernet driver to the new API, so that
the ndo_eth_ioctl() path can be removed completely.

hwtstamp_get() and hwtstamp_set() are only called if netif_running()
when the code path is engaged through the legacy ioctl. As I don't
want to make an unnecessary functional change which I can't test,
preserve that restriction when going through the new operations.

When cpu_is_ixp46x() is false, the execution of SIOCGHWTSTAMP and
SIOCSHWTSTAMP falls through to phy_mii_ioctl(), which may process it in
case of a timestamping PHY, or may return -EOPNOTSUPP. In the new API,
the core handles timestamping PHYs directly and does not call the netdev
driver, so just return -EOPNOTSUPP directly for equivalent logic.

A gratuitous change I chose to do anyway is prefixing hwtstamp_get() and
hwtstamp_set() with the driver name, ipx4xx. This reflects modern coding
sensibilities, and we are touching the involved lines anyway.

The remainder of eth_ioctl() is exactly equivalent to
phy_do_ioctl_running(), so use that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20250508211043.3388702-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: drv-net: ping: make sure the ping test restores checksum offload
Jakub Kicinski [Thu, 8 May 2025 21:40:05 +0000 (14:40 -0700)]
selftests: drv-net: ping: make sure the ping test restores checksum offload

The ping test flips checksum offload on and off.
Make sure the original value is restored if test fails.

Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250508214005.1518013-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx5: support software TX timestamp
Stanislav Fomichev [Thu, 8 May 2025 23:51:09 +0000 (16:51 -0700)]
net/mlx5: support software TX timestamp

Having a software timestamp (along with existing hardware one) is
useful to trace how the packets flow through the stack.
mlx5e_tx_skb_update_hwts_flags is called from tx paths
to setup HW timestamp; extend it to add software one as well.

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250508235109.585096-1-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'refactoring-designware-vlan-code'
Jakub Kicinski [Sat, 10 May 2025 00:29:45 +0000 (17:29 -0700)]
Merge branch 'refactoring-designware-vlan-code'

Boon Khai Ng says:

====================
Refactoring designware VLAN code.

Refactoring designware VLAN code and introducing support for
hardware-accelerated VLAN stripping for dwxgmac2 IP,
the current patch set consists of two key changes:

1) Refactoring VLAN Functions:
The first change involves moving common VLAN-related functions
of the DesignWare Ethernet MAC into a dedicated file, stmmac_vlan.c.
This refactoring aims to improve code organization and maintainability
by centralizing VLAN handling logic.

2) Renaming all the functions, symbols and macro into more
generic name and consolidate the same function pointer into one.

3) Enabling VLAN for 10G Ethernet MAC IP:
The second change enables VLAN support specifically
for the 10G Ethernet MAC IP. This enhancement leverages
the hardware capabilities of the to perform VLAN stripping,

Changes from previous submmited patches.
v5:
a) Divided the refactor patch in to two patches,
first patch is to move the code into the separate file
and second patch to update the symbol name

b) get the dwmac4 vlan function up to date and port to
stmmac_vlan.c

c) remove the inline function in function pointer and
use only static function defeination.

d) Remove the outer parenthese that is not needed
on the 1 line return statement.

v4:
a) Updated the commit message to explain the descriptors
behaviour on different hardware.

b) Updated the perfect_match variable with the correct
byte order.

Link: https://lore.kernel.org/20250421162930.10237-1-boon.khai.ng@altera.com
v3: https://lore.kernel.org/20250408081354.25881-1-boon.khai.ng@altera.com
v2: https://lore.kernel.org/BL3PR11MB5748AC693D9D61FB56DB7313C1F32@BL3PR11MB5748.namprd11.prod.outlook.com
v1: https://lore.kernel.org/DM8PR11MB5751E5388AEFCFB80BCB483FC13FA@DM8PR11MB5751.namprd11.prod.outlook.com
====================

Link: https://patch.msgid.link/20250507063812.34000-1-boon.khai.ng@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: stmmac: dwxgmac2: Add support for HW-accelerated VLAN stripping
Boon Khai Ng [Wed, 7 May 2025 06:38:12 +0000 (14:38 +0800)]
net: stmmac: dwxgmac2: Add support for HW-accelerated VLAN stripping

This patch adds support for MAC level VLAN tag stripping for the
dwxgmac2 IP. This feature can be configured through ethtool.
If the rx-vlan-offload is off, then the VLAN tag will be stripped
by the driver. If the rx-vlan-offload is on then the VLAN tag
will be stripped by the MAC.

Signed-off-by: Boon Khai Ng <boon.khai.ng@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Link: https://patch.msgid.link/20250507063812.34000-4-boon.khai.ng@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: stmmac: stmmac_vlan: rename VLAN functions and symbol to generic symbol.
Boon Khai Ng [Wed, 7 May 2025 06:38:11 +0000 (14:38 +0800)]
net: stmmac: stmmac_vlan: rename VLAN functions and symbol to generic symbol.

With the VLAN handling code decoupled from dwmac4 and dwxgmac2 and
consolidated in stmmac_vlan.c, functions and symbols are renamed to
use a generic prefix. This change improves code clarity and
maintainability by reflecting the shared nature of the VLAN logic,
facilitating future enhancements or reuse without being tied to
specific MAC implementations.

No functional changes are introduced in this patch.

Note: The dwxgmac2_update_vlan_hash function is not combined due
to minor differences in setting the VTFE bit. A separate fix patch
will be submitted to align its behavior with the dwmac4 driver.

Signed-off-by: Boon Khai Ng <boon.khai.ng@altera.com>
Link: https://patch.msgid.link/20250507063812.34000-3-boon.khai.ng@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: stmmac: Refactor VLAN implementation
Boon Khai Ng [Wed, 7 May 2025 06:38:10 +0000 (14:38 +0800)]
net: stmmac: Refactor VLAN implementation

Refactor VLAN implementation by moving common code for DWMAC4 and
DWXGMAC IPs into a separate VLAN module. VLAN implementation for
DWMAC4 and DWXGMAC differs only for CSR base address, the descriptor
for the VLAN ID and VLAN VALID bit field.

The descriptor format for VLAN is not moved to the common code due
to hardware-specific differences between DWMAC4 and DWXGMAC.

For the DWMAC4 IP, the Receive Normal Descriptor 0 (RDES0) is
formatted as follows:
    31                                                0
      ------------------------ -----------------------
RDES0| Inner VLAN TAG [31:16] | Outer VLAN TAG [15:0] |
      ------------------------ -----------------------

For the DWXGMAC IP, the RDES0 format varies based on the
Tunneled Frame bit (TNP):

a) For Non-Tunneled Frame (TNP=0)

    31                                                0
      ------------------------ -----------------------
RDES0| Inner VLAN TAG [31:16] | Outer VLAN TAG [15:0] |
      ------------------------ -----------------------

b) For Tunneled Frame (TNP=1)

     31                   8 7                3 2      0
      --------------------- ------------------ -------
RDES0| VNID/VSID           | Reserved         | OL2L3 |
      --------------------- ------------------ ------

The logic for handling tunneled frames is not yet implemented
in the dwxgmac2_wrback_get_rx_vlan_tci() function. Therefore,
it is prudent to maintain separate functions within their
respective descriptor driver files
(dwxgmac2_descs.c and dwmac4_descs.c).

Signed-off-by: Boon Khai Ng <boon.khai.ng@altera.com>
Link: https://patch.msgid.link/20250507063812.34000-2-boon.khai.ng@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'batadv-next-pullrequest-20250509' of git://git.open-mesh.org/linux-merge
Jakub Kicinski [Sat, 10 May 2025 00:09:37 +0000 (17:09 -0700)]
Merge tag 'batadv-next-pullrequest-20250509' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - constify and move broadcast addr definition, Matthias Schiffer

 - remove start/stop queue function for mesh-iface, by Antonio Quartulli

 - switch to crc32 header for crc32c, by Sven Eckelmann

 - drop unused net_namespace.h include, by Sven Eckelmann

* tag 'batadv-next-pullrequest-20250509' of git://git.open-mesh.org/linux-merge:
  batman-adv: Drop unused net_namespace.h include
  batman-adv: Switch to crc32 header for crc32c
  batman-adv: no need to start/stop queue on mesh-iface
  batman-adv: constify and move broadcast addr definition
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20250509091041.108416-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: mvpp2: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 14:46:30 +0000 (17:46 +0300)]
net: mvpp2: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the mvpp2 driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

Note that on the !port->hwtstamp condition, the old code used to fall
through in mvpp2_ioctl(), and return either -ENOTSUPP if !port->phylink,
or -EOPNOTSUPP, in phylink_mii_ioctl(). Keep the test for port->hwtstamp
in the newly introduced net_device_ops, but consolidate the error code
to just -EOPNOTSUPP. The other one is documented as NFS-specific, it's
best to avoid it anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508144630.1979215-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: gianfar: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 14:36:59 +0000 (17:36 +0300)]
net: gianfar: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the gianfar driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

Don't propagate the unnecessary "config.flags = 0;" assignment to
gfar_hwtstamp_get(), because dev_get_hwtstamp() provides a zero
initialized struct kernel_hwtstamp_config.

After removing timestamping logic from gfar_ioctl(), the rest is
equivalent to phy_do_ioctl_running(), so provide that directly as our
ndo_eth_ioctl() implementation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # LS1021A
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508143659.1944220-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa2-eth: add ndo_hwtstamp_get() implementation
Vladimir Oltean [Thu, 8 May 2025 13:41:02 +0000 (16:41 +0300)]
net: dpaa2-eth: add ndo_hwtstamp_get() implementation

Allow SIOCGHWTSTAMP to retrieve the current timestamping settings on
DPNIs. This reflects what has been done in dpaa2_eth_hwtstamp_set().

Tested with hwstamp_ctl -i endpmac17. Before the change, it prints
"Device driver does not have support for non-destructive SIOCGHWTSTAMP."
After the change, it retrieves the previously set configuration.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # LX2160A
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508134102.1747075-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa2-eth: convert to ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 13:41:01 +0000 (16:41 +0300)]
net: dpaa2-eth: convert to ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the DPAA2 Ethernet driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

This driver only responds to SIOCSHWTSTAMP (not SIOCGHWTSTAMP) so
convert just that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # LX2160A
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508134102.1747075-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'dpaa_eth-conversion-to-ndo_hwtstamp_get-and-ndo_hwtstamp_set'
Jakub Kicinski [Fri, 9 May 2025 23:40:24 +0000 (16:40 -0700)]
Merge branch 'dpaa_eth-conversion-to-ndo_hwtstamp_get-and-ndo_hwtstamp_set'

Vladimir Oltean says:

====================
dpaa_eth conversion to ndo_hwtstamp_get() and ndo_hwtstamp_set()

This is part of the effort to finalize the conversion of drivers to the
dedicated hardware timestamping API.

In the case of the DPAA1 Ethernet driver, a bit more care is needed,
because dpaa_ioctl() looks a bit strange. It handles the "set" IOCTL but
not the "get", and even the phylink_mii_ioctl() portion could do with
some cleanup.
====================

Link: https://patch.msgid.link/20250508124753.1492742-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa_eth: simplify dpaa_ioctl()
Vladimir Oltean [Thu, 8 May 2025 12:47:53 +0000 (15:47 +0300)]
net: dpaa_eth: simplify dpaa_ioctl()

phylink_mii_ioctl() handles multiple ioctls in addition to just
SIOCGMIIREG: SIOCGMIIPHY, SIOCSMIIREG. Don't filter these out.

Also, phylink can handle the case where net_dev->phydev is NULL (like
optical SFP module, fixed-link). Be like other drivers and let phylink
do so without any driver-side call filtering.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20250508124753.1492742-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa_eth: add ndo_hwtstamp_get() implementation
Vladimir Oltean [Thu, 8 May 2025 12:47:52 +0000 (15:47 +0300)]
net: dpaa_eth: add ndo_hwtstamp_get() implementation

Allow SIOCGHWTSTAMP to retrieve the current timestamping settings
on DPAA1 interfaces. This reflects what has been done in
dpaa_hwtstamp_set().

Tested with hwstamp_ctl -i fm1-mac9.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20250508124753.1492742-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dpaa_eth: convert to ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 12:47:51 +0000 (15:47 +0300)]
net: dpaa_eth: convert to ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert the DPAA1 driver to the new API, so that the
ndo_eth_ioctl() path can be removed completely.

This driver only responds to SIOCSHWTSTAMP (not SIOCGHWTSTAMP) so
convert just that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20250508124753.1492742-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Thu, 8 May 2025 09:52:36 +0000 (12:52 +0300)]
net: dsa: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

New timestamping API was introduced in commit 66f7223039c0 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6. It is
time to convert DSA to the new API, so that the ndo_eth_ioctl() path can
be removed completely.

Move the ds->ops->port_hwtstamp_get() and ds->ops->port_hwtstamp_set()
calls from dsa_user_ioctl() to dsa_user_hwtstamp_get() and
dsa_user_hwtstamp_set().

Due to the fact that the underlying ifreq type changes to
kernel_hwtstamp_config, the drivers and the Ocelot switchdev front-end,
all hooked up directly or indirectly, must also be converted all at once.

The conversion also updates the comment from dsa_port_supports_hwtstamp(),
which is no longer true because kernel_hwtstamp_config is kernel memory
and does not need copy_to_user(). I've deliberated whether it is
necessary to also update "err != -EOPNOTSUPP" to a more general "!err",
but all drivers now either return 0 or -EOPNOTSUPP.

The existing logic from the ocelot_ioctl() function, to avoid
configuring timestamping if the PHY supports the operation, is obsoleted
by more advanced core logic in dev_set_hwtstamp_phylib().

This is only a partial preparation for proper PHY timestamping support.
None of these switch driver currently sets up PTP traps for PHY
timestamping, so setting dev->see_all_hwtstamp_requests is not yet
necessary and the conversion is relatively trivial.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # felix, sja1105, mv88e6xxx
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250508095236.887789-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl: handle broken pipe gracefully in CLI
Donald Hunter [Thu, 8 May 2025 11:21:02 +0000 (12:21 +0100)]
tools: ynl: handle broken pipe gracefully in CLI

When sending YNL CLI output into a pipe, closing the pipe causes a
BrokenPipeError. E.g. running the following and quitting less:

./tools/net/ynl/pyynl/cli.py --family rt-link --dump getlink | less
Traceback (most recent call last):
  File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 160, in <module>
    main()
    ~~~~^^
  File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 142, in main
    output(reply)
    ~~~~~~^^^^^^^
  File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 97, in output
    pprint.PrettyPrinter().pprint(msg)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
[...]
BrokenPipeError: [Errno 32] Broken pipe

Consolidate the try block for ops and notifications, and gracefully
handle the BrokenPipeError by adding an exception handler to the
consolidated try block.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250508112102.63539-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoethtool: Block setting of symmetric RSS when non-symmetric rx-flow-hash is requested
Gal Pressman [Thu, 8 May 2025 10:30:34 +0000 (13:30 +0300)]
ethtool: Block setting of symmetric RSS when non-symmetric rx-flow-hash is requested

Symmetric RSS hash requires that:
* No other fields besides IP src/dst and/or L4 src/dst are set
* If src is set, dst must also be set

This restriction was only enforced when RXNFC was configured after
symmetric hash was enabled. In the opposite order of operations (RXNFC
then symmetric enablement) the check was not performed.

Perform the sanity check on set_rxfh as well, by iterating over all flow
types hash fields and making sure they are all symmetric.

Introduce a function that returns whether a flow type is hashable (not
spec only) and needs to be iterated over. To make sure that no one
forgets to update the list of hashable flow types when adding new flow
types, a static assert is added to draw the developer's attention.

The conversion of uapi #defines to enum is not ideal, but as Jakub
mentioned [1], we have precedent for that.

[1] https://lore.kernel.org/netdev/20250324073509.6571ade3@kernel.org/

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508103034.885536-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: thunder: make tx software timestamp independent
Jason Xing [Thu, 8 May 2025 03:44:33 +0000 (11:44 +0800)]
net: thunder: make tx software timestamp independent

skb_tx_timestamp() is used for tx software timestamp enabled by
SOF_TIMESTAMPING_TX_SOFTWARE while SKBTX_HW_TSTAMP is used for
SOF_TIMESTAMPING_TX_HARDWARE. As it clearly shows they are different
timestamps in two dimensions, it's not appropriate to group these two
together in the if-statement.

This patch completes three things:
1. make the software one standalone. Users are able to set both
timestamps together with SOF_TIMESTAMPING_OPT_TX_SWHW flag.
2. make the software one generated after the hardware timestamp logic to
avoid generating sw and hw timestamps at one time without
SOF_TIMESTAMPING_OPT_TX_SWHW being set.
3. move the software timestamp call as close to the door bell.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250508034433.14408-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoiidc/ice/irdma: Update IDC to support multiple consumers
Dave Ertman [Wed, 16 Apr 2025 02:15:49 +0000 (21:15 -0500)]
iidc/ice/irdma: Update IDC to support multiple consumers

In preparation of supporting more than a single core PCI driver
for RDMA, move ice specific structs like qset_params, qos_info
and qos_params from iidc_rdma.h to iidc_rdma_ice.h.

Previously, the ice driver was just exporting its entire PF struct
to the auxiliary driver, but since each core driver will have its own
different PF struct, implement a universal struct that all core drivers
can provide to the auxiliary driver through the probe call.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Co-developed-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Co-developed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Co-developed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoMerge branch 'add-more-features-for-enetc-v4-round-2'
Jakub Kicinski [Fri, 9 May 2025 02:43:54 +0000 (19:43 -0700)]
Merge branch 'add-more-features-for-enetc-v4-round-2'

Wei Fang says:

====================
Add more features for ENETC v4 - round 2

This patch set adds the following features.
1. Compared with ENETC v1, the formats of tables and command BD of ENETC
v4 have changed significantly, and the two are not compatible. Therefore,
in order to support the NETC Table Management Protocol (NTMP) v2.0, we
introduced the netc-lib driver and added support for MAC address filter
table and RSS table.
2. Add MAC filter and VLAN filter support for i.MX95 ENETC PF.
3. Add RSS support for i.MX95 ENETC PF.
4. Add loopback support for i.MX95 ENETC PF.

Link: https://lore.kernel.org/20250103060610.2233908-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250113082245.2332775-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250304072201.1332603-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250311053830.1516523-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250411095752.3072696-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/20250428105657.3283130-1-wei.fang@nxp.com/
====================

Link: https://patch.msgid.link/20250506080735.3444381-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add loopback support for i.MX95 ENETC PF
Wei Fang [Tue, 6 May 2025 08:07:35 +0000 (16:07 +0800)]
net: enetc: add loopback support for i.MX95 ENETC PF

Add internal loopback support for i.MX95 ENETC PF, the default loopback
mode is MAC level loopback, the MAC Tx data is looped back onto the Rx.
The MAC interface runs at a fixed 1:8 ratio of NETC clock in MAC-level
loopback mode, with no dependency on Tx clock.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-15-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: add VLAN filtering support for i.MX95 ENETC PF
Wei Fang [Tue, 6 May 2025 08:07:34 +0000 (16:07 +0800)]
net: enetc: add VLAN filtering support for i.MX95 ENETC PF

Since the offsets of the VLAN hash filter registers of ENETC v4 are
different from ENETC v1. Therefore, enetc_set_si_vlan_ht_filter() is
added to set the correct VLAN hash filter based on the SI ID and ENETC
revision, so that ENETC v4 PF driver can reuse enetc_vlan_rx_add_vid()
and enetc_vlan_rx_del_vid(). In addition, the VLAN promiscuous mode will
be enabled if VLAN filtering is disabled, which means that PF qualifies
for reception of all VLAN tags.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-14-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: move generic VLAN hash filter functions to enetc_pf_common.c
Wei Fang [Tue, 6 May 2025 08:07:33 +0000 (16:07 +0800)]
net: enetc: move generic VLAN hash filter functions to enetc_pf_common.c

The VLAN hash filters of ENETC v1 and v4 are basically the same, the only
difference is that the offset of the VLAN hash filter registers has been
changed in ENETC v4. So some functions like enetc_vlan_rx_add_vid() and
enetc_vlan_rx_del_vid() only need to be slightly modified to be reused
by ENETC v4. Currently, we just move these functions from enetc_pf.c to
enetc_pf_common.c. Appropriate modifications will be made for ENETC4 in
a subsequent patch.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-13-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: extract enetc_refresh_vlan_ht_filter()
Wei Fang [Tue, 6 May 2025 08:07:32 +0000 (16:07 +0800)]
net: enetc: extract enetc_refresh_vlan_ht_filter()

Extract the common function enetc_refresh_vlan_ht_filter() from
enetc_sync_vlan_ht_filter() so that it can be reused by the ENETC
v4 PF and VF drivers in the future.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-12-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: enetc: enable RSS feature by default
Wei Fang [Tue, 6 May 2025 08:07:31 +0000 (16:07 +0800)]
net: enetc: enable RSS feature by default

Receive side scaling (RSS) is a network driver technology that enables
the efficient distribution of network receive processing across multiple
CPUs in multiprocessor systems. Therefore, it is better to enable RSS by
default so that the CPU load can be balanced and network performance can
be improved when then network is enabled.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-11-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>