====================
thunderbolt: net: Enable full end-to-end flow control
Thunderbolt/USB4 host controllers support full end-to-end flow control
that prevents dropping packets if there are not enough hardware receive
buffers. So far it has not been enabled for the networking driver yet
but this series changes that. There is one snag though: the second
generation (Intel Falcon Ridge) had a bug that needs special quirk to
get it working. We had that in the early stages of the Thunderbolt/USB4
driver but it got dropped because it was not needed at the time. Now we
add it back as a quirk for the host controller (NHI).
The first patch of this series is a bugfix that I'm planning to push for
v6.0-rc. Rest are v6.1 material. This also includes a patch that shows
the XDomain link type in sysfs the same way we do for USB4 routers and
updates the networking driver module description.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mika Westerberg [Tue, 30 Aug 2022 15:32:49 +0000 (18:32 +0300)]
net: thunderbolt: Enable full end-to-end flow control
USB4NET protocol allows the networking drivers to take advantage of
end-to-end flow control supported by the USB4 host interface. This
should prevent the receiving side from dropping network packets.
In adddition add a module parameter that can be used to turn this off
just in case it causes problems.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mika Westerberg [Tue, 30 Aug 2022 15:32:48 +0000 (18:32 +0300)]
thunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround
As we are now enabling full end-to-end flow control to the Thunderbolt
networking driver, in order for it to work properly on second generation
Thunderbolt hardware (Falcon Ridge), we need to add back the workaround
that was removed with commit 53f13319d131 ("thunderbolt: Get rid of E2E
workaround"). However, this time we only apply it for Falcon Ridge
controllers as a form of an additional quirk. For non-Falcon Ridge this
does nothing.
While there fix a typo 'reqister' -> 'register' in the comment.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mika Westerberg [Tue, 30 Aug 2022 15:32:47 +0000 (18:32 +0300)]
thunderbolt: Show link type for XDomain connections too
Following what we do for routers already, extend this to XDomain
connections as well. This will show in sysfs whether the link is in USB4
or Thunderbolt mode.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mika Westerberg [Tue, 30 Aug 2022 15:32:46 +0000 (18:32 +0300)]
net: thunderbolt: Enable DMA paths only after rings are enabled
If the other host starts sending packets early on it is possible that we
are still in the middle of populating the initial Rx ring packets to the
ring. This causes the tbnet_poll() to mess over the queue and causes
list corruption. This happens specifically when connected with macOS as
it seems start sending various IP discovery packets as soon as its side
of the paths are configured.
To prevent this we move the DMA path enabling to happen after we have
primed the Rx ring. This makes sure no incoming packets can arrive
before we are ready to handle them.
Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable") Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
changes v4:
- add Reviewed-by: Vladimir Oltean <olteanv@gmail.com> to all patches
- fix checkpatch warnings.
changes v3:
- fix build error in the middle of the patch stack.
changes v2:
- add regmap_ranges for KSZ9477
- drop output clock devicetree in driver validation patches. DTs need
some more refactoring and can be done in a separate patch set.
- remove some unused variables.
This patch series adds error handling for the PHY read/write path and optional
register access validation.
After adding regmap_ranges for KSZ8563 some bugs was detected, so
critical bug fixes are sorted before ragmap_range patch.
Potentially this bug fixes can be ported to stable kernels, but need to be
reworked.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:34 +0000 (12:56 +0200)]
net: dsa: microchip: remove IS_9893 flag
Use chip_id as other places of this code do it
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:33 +0000 (12:56 +0200)]
net: dsa: microchip: remove unused sgmii variable
This variable is not used. So, remove it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This variable is not used on ksz9477 side. Remove it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:31 +0000 (12:56 +0200)]
net: dsa: microchip: remove unused port phy variable
This variable is unused. So, drop it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:30 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: use internal_phy instead of phy_port_cnt
With code refactoring was introduced new variable internal_phy. Let's
use it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:29 +0000 (12:56 +0200)]
net: dsa: microchip: add regmap_range for KSZ9477 chip
Add register validation for KSZ9477
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:28 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: remove MII_CTRL1000 check from ksz9477_w_phy()
The reason why PHYlib may access MII_CTRL1000 on the chip without GBit
support is only if chip provides wrong information about extended caps
register. This issue is now handled by ksz9477_r_phy_quirks()
With proper regmap_ranges provided for all chips we will be able to
catch this kind of bugs any way. So, remove this sanity check.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:27 +0000 (12:56 +0200)]
net: dsa: microchip: add regmap_range for KSZ8563 chip
Add register validation for KSZ8563.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:26 +0000 (12:56 +0200)]
net: dsa: microchip: add support for regmap_access_tables
This is complex driver with support for different chips with different
layouts. To detect at least some bugs earlier, we should validate register
accesses by using regmap_access_table support.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:25 +0000 (12:56 +0200)]
net: dsa: microchip: KSZ9893: do not write to not supported Output Clock Control Register
This issue was detected after adding regmap register access validation.
KSZ9893 compatible chips do not have "Output Clock Control Register
0x0103". So, avoid writing to it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:24 +0000 (12:56 +0200)]
net: dsa: microchip: ksz8795: add error handling to ksz8_r/w_phy
Now ksz_pread/ksz_pwrite can return error value. So, make use of it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:23 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: add error handling to ksz9477_r/w_phy
Now ksz_pread/ksz_pwrite can return error value. So, make use of it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:22 +0000 (12:56 +0200)]
net: dsa: microchip: forward error value on all ksz_pread/ksz_pwrite functions
ksz_read*/ksz_write* are able to return errors, so forward it.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:21 +0000 (12:56 +0200)]
net: dsa: microchip: allow to pass return values for PHY read/write accesses
PHY access may end with errors on different levels. So, allow to forward
return values where possible.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:20 +0000 (12:56 +0200)]
net: dsa: microchip: don't announce extended register support on non Gbit chips
This issue was detected after adding support of regmap_ranges for KSZ8563R
chip. This chip is reporting extended registers support without having
actual extended registers. This made PHYlib request not existing
registers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:19 +0000 (12:56 +0200)]
net: dsa: microchip: do per-port Gbit detection instead of per-chip
KSZ8563 has two 100Mbit PHYs and CPU port with RGMII support. Since
1000Mbit configuration for the RGMII capable MAC is present, we should
use per port validation.
As main part of migration to per-port validation we need to rework
ksz9477_switch_init() function. Which is using undocumented
REG_GLOBAL_OPTIONS register to detect per-chip Gbit support. So, it is
related to some sort of risk for regressions.
To reduce this risk I compared the code with publicly available
documentations. This function will executed on following currently
supported chips:
struct ksz_chip_data OF compatible
KSZ9477 KSZ9477
KSZ9897 KSZ9897
KSZ9893 KSZ9893, KSZ9563
KSZ8563 KSZ8563
KSZ9567 KSZ9567
Only KSZ9893, KSZ9563, KSZ8563 document existence of 0xf ==
REG_GLOBAL_OPTIONS register with bit field description "SKU ID":
KSZ9893 0x0C
KSZ9563 0x1C
KSZ8563 0x3C
The existence of hidden flags is not documented.
KSZ9477, KSZ9897, KSZ9567 do not document this register at all.
Only KSZ8563 is documented as non Gbit chip: 100Mbit PHYs and RGMII CPU
port. So, this change should not introduce a regression for
configurations with properly used OF compatibles.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 26 Aug 2022 10:56:18 +0000 (12:56 +0200)]
net: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip
Add separate entry for the KSZ8563 chip. According to the documentation
it can support Gbit only on RGMII port. So, we will need to be able to
describe in the followup patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The assertion was intentionally removed in commit 043b8413e8c0 ("net:
devlink: remove redundant rtnl lock assert") and, contrary what is
described in the commit message, the comment reflects that: "Caller must
hold RTNL mutex or reference to dev...".
====================
mlxsw: Configure max LAG ID for Spectrum-4
Amit Cohen writes:
In the device, LAG identifiers are stored in the port group table (PGT).
During initialization, firmware reserves a certain amount of entries at
the beginning of this table for LAG identifiers.
In Spectrum-4, the size of the PGT table did not increase, but the
maximum number of LAG identifiers was doubled, leaving less room for
others entries (e.g., flood entries) that also reside in the PGT.
Therefore, in order to avoid a regression and as long as there is no
explicit requirement to support 256 LAGs, configure the firmware to
allocate the same amount of LAG entries (128) as in Spectrum-{2,3}.
This can be done via the 'max_lag' field in CONFIG_PROFILE command.
Patch set overview:
Patch #1 edits the comment of the existing 'max_lag' field.
Patch #2 adds support for configuring 'max_lag' field via CONFIG_PROFILE
command.
Patch #3 adds an helper function to get the actual 'max_lag' in the
device.
Patch #4 adjusts Spectrum-4 to configure 'max_lag' field.
====================
Amit Cohen [Fri, 26 Aug 2022 16:06:52 +0000 (18:06 +0200)]
mlxsw: spectrum: Add a copy of 'struct mlxsw_config_profile' for Spectrum-4
Starting from Spectrum-4, the maximum number of LAG IDs can be configured
by software via CONFIG_PROFILE command during driver initialization.
Add a dedicated instance of 'struct mlxsw_config_profile' for Spectrum-4
and set the 'max_lag' field to 128, which is the same amount of LAG entries
as in Spectrum-{2,3}. Without this configuration, firmware reserves 256
(the value of 'cap_max_lag' resource) entries at beginning of PGT table for
LAG identifiers, which means that less entries in PGT will be available.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amit Cohen [Fri, 26 Aug 2022 16:06:51 +0000 (18:06 +0200)]
mlxsw: Add a helper function for getting maximum LAG ID
Currently the driver queries the maximum supported LAG ID from firmware.
This will not be accurate anymore once the driver will configure 'max_lag'
via CONFIG_PROFILE command.
For resource query, firmware returns the maximum LAG ID which is supported
by hardware. Software can configure firmware to do not allocate entries for
all the supported LAGs, and to limit LAG IDs. In this case, the resource
query will not return the actual maximum LAG ID.
Add a helper function for getting this value. In case that 'max_lag' field
was set during initialization, return the value which was used, otherwise,
query firmware for the maximum supported ID.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amit Cohen [Fri, 26 Aug 2022 16:06:50 +0000 (18:06 +0200)]
mlxsw: Support configuring 'max_lag' via CONFIG_PROFILE
In the device, LAG identifiers are stored in the port group table (PGT).
During initialization, firmware reserves a certain amount of entries at
the beginning of this table for LAG identifiers.
In Spectrum-4, the size of the PGT table did not increase, but the maximum
number of LAG identifiers was doubled, leaving less room for others entries
(e.g., flood entries) that also reside in the PGT.
Therefore, in order to avoid a regression and as long as there is no
explicit requirement to support 256 LAGs, mlxsw driver will configure the
firmware to allocate the same amount of LAG entries (128) as in
Spectrum-{2,3}. This configuration is done using 'max_lag' field in
CONFIG_PROFILE command. Extend 'struct mlxsw_config_profile' to support
'max_lag' field and configure firmware accordingly.
A next patch will adjust Spectrum-4 to configure 'max_lag' field.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amit Cohen [Fri, 26 Aug 2022 16:06:49 +0000 (18:06 +0200)]
mlxsw: cmd: Edit the comment of 'max_lag' field in CONFIG_PROFILE
Starting from Spectrum-4, the maximum number of LAG IDs can be configured
by software via CONFIG_PROFILE command during driver initialization.
Edit the comment of 'max_lag' field to mention that this field is reserved
in Spectrum-1/2/3 and describe firmware behavior.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Documentation: bonding: clarify supported modes for tlb_dynamic_lb
tlb_dynamic_lb bonding option is compatible with balance-tlb and balance-alb
modes. In order to be consistent with other option documentation, it should
mention both modes not only balance-tlb.
Dan Carpenter [Fri, 26 Aug 2022 15:03:30 +0000 (18:03 +0300)]
mlxsw: minimal: Return -ENOMEM on allocation failure
These error paths return success but they should return -ENOMEM.
Fixes: 01328e23a476 ("mlxsw: minimal: Extend module to port mapping with slot index") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/YwjgwoJ3M7Kdq9VK@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/mlx5e: Do not use err uninitialized in mlx5e_rep_add_meta_tunnel_rule()
Clang warns:
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:481:6: error: variable 'err' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (IS_ERR(flow_rule)) {
^~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:489:9: note: uninitialized use occurs here
return err;
^~~
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:481:2: note: remove the 'if' if its condition is always true
if (IS_ERR(flow_rule)) {
^~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:474:9: note: initialize the variable 'err' to silence this warning
int err;
^
= 0
1 error generated.
There is little reason to have the 'goto + error variable' construct in
this function. Get rid of it and just return the PTR_ERR value in the if
statement and 0 at the end.
Jiri Pirko [Fri, 26 Aug 2022 11:04:11 +0000 (13:04 +0200)]
funeth: remove pointless check of devlink pointer in create/destroy_netdev() flows
Once devlink port is successfully registered, the devlink pointer is not
NULL. Therefore, the check is going to be always true and therefore
pointless. Remove it.
- Describe the switches, which SoCs they are in, or if they are standalone.
- Explain the various ways of configuring MT7530's port 5.
- Remove phy-mode = "rgmii-txid" from description. Same code path is
followed for delayed rgmii and rgmii phy-mode on mtk_eth_soc.c.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Add examples which include a wide variation of configurations.
- Make example comments YAML comment instead of DT binding comment.
- Add interrupt controller to the examples. Include header file for
interrupt.
- Change reset line for MT7621 examples.
- Pretty formatting for the examples.
- Change switch reg to 0.
- Change port labels to fit the example, change port 4 label to wan.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Add description for reset-gpios.
- Invalidate reset-gpios if mediatek,mcm is used. We cannot use multiple
reset lines at the same time.
- Invalidate mediatek,mcm if the compatible device is mediatek,mt7531.
There is no multi-chip module version of mediatek,mt7531.
- Require mediatek,mcm for mediatek,mt7621 as the compatible string is only
used for the multi-chip module version of MT7530.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arınç ÜNAL [Thu, 25 Aug 2022 08:22:56 +0000 (11:22 +0300)]
dt-bindings: net: dsa: mediatek,mt7530: make trivial changes
Make trivial changes on the binding.
- Update title to include MT7531 switch.
- Add me as a maintainer. List maintainers in alphabetical order by first
name.
- Add description to compatible strings.
- Stretch descriptions up to the 80 character limit.
- Remove lists for single items.
- Remove requiring reg as it's already required by dsa-port.yaml.
- Define acceptable reg values for the CPU ports.
- Remove quotes from $ref: "dsa.yaml#".
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Fri, 26 Aug 2022 08:27:30 +0000 (10:27 +0200)]
net: devlink: stub port params cmds for they are unused internally
Follow-up the removal of unused internal api of port params made by
commit 42ded61aa75e ("devlink: Delete not used port parameters APIs")
and stub the commands and add extack message to tell the user what is
going on.
If later on port params are needed, could be easily re-introduced,
but until then it is a dead code.
====================
netlink: support reporting missing attributes
This series adds support for reporting missing attributes
in a structured way. We communicate the type of the missing
attribute and if it was missing inside a nest the offset
of that nest.
Example of (YAML-based) user space reporting ethtool header
missing:
Kernel error: missing attribute: .header
I was tempted to integrate the check with the policy
but it seems tricky without doing a full scan, and there
may be a ton of attrs in the policy. So leaving that
for later.
====================
Jakub Kicinski [Fri, 26 Aug 2022 03:09:35 +0000 (20:09 -0700)]
ethtool: report missing header via ext_ack in the default handler
The actual presence check for the header is in
ethnl_parse_header_dev_get() but it's a few layers in,
and already has a ton of arguments so let's just pick
the low hanging fruit and check for missing header in
the default request handler.
Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 26 Aug 2022 03:09:31 +0000 (20:09 -0700)]
netlink: add support for ext_ack missing attributes
There is currently no way to report via extack in a structured way
that an attribute is missing. This leads to families resorting to
string messages.
Add a pair of attributes - @offset and @type for machine-readable
way of reporting missing attributes. The @offset points to the
nest which should have contained the attribute, @type is the
expected nla_type. The offset will be skipped if the attribute
is missing at the message level rather than inside a nest.
User space should be able to figure out which attribute enum
(AKA attribute space AKA attribute set) the nest pointed to by
@offset is using.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 26 Aug 2022 03:09:30 +0000 (20:09 -0700)]
netlink: factor out extack composition
The ext_ack writing code looks very "organically grown".
Move the calculation of the size and writing out to helpers.
This is more idiomatic and gives us the ability to return early
avoiding the long (and randomly ordered) "if" conditions.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Thu, 25 Aug 2022 12:06:31 +0000 (13:06 +0100)]
net: unify alloclen calculation for paged requests
Consolidate alloclen and pagedlen calculation for zerocopy and normal
paged requests. The current non-zerocopy paged version can a bit
overallocate and unnecessary copy a small chunk of data into the linear
part.
====================
nfp: port speed and eeprom get/set updates
this short series is the initial updates for the NFP driver for the v6.1
Kernel. It covers two enhancements:
1. Patches 1/3 and 2/3:
- Support cases where application firmware does not know port speeds
a priori by relaying this information from the management firmware
to the application firmware.
This allows the existing mechanism, whereby the driver reports port
speeds to user-space as provided by the application firmware, to work
in this case.
2. Patch 2/3:
- Add support for eeprom get and set command
====================
Yinjun Zhang [Thu, 25 Aug 2022 14:12:21 +0000 (16:12 +0200)]
nfp: propagate port speed from management firmware
In future releases the NIC application firmware may be indifferent to port
speeds - not built for specific port speeds - and consequently it will not
be able to report VF port speeds to the driver without first learning them.
With this change, the driver will pass the speed of physical ports from
management firmware to application firmware, and the latter will copy the
speed of port 0 to all the active VFs. So that the driver can get VF port
speed as before.
The port speed of a VF may be requested from userspace using:
ethtool <vf-intf>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
David S. Miller [Mon, 29 Aug 2022 11:57:38 +0000 (12:57 +0100)]
Merge branch 'sparx5-mrouter'
Casper Andersson says:
====================
net: sparx5: add mrouter support
This series adds support for multicast router ports to SparX5. To manage
mrouter ports the driver must keep track of mdb entries. When adding an
mrouter port the driver has to iterate over all mdb entries and modify
them accordingly.
Casper Andersson [Thu, 25 Aug 2022 09:28:37 +0000 (11:28 +0200)]
net: sparx5: add support for mrouter ports
All multicast should be forwarded to mrouter ports. Mrouter ports must
therefore be part of all active multicast groups, and override flooding
from being disabled.
Signed-off-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Casper Andersson [Thu, 25 Aug 2022 09:28:35 +0000 (11:28 +0200)]
ethernet: Add helpers to recognize addresses mapped to IP multicast
IP multicast must sometimes be discriminated from non-IP multicast,
e.g. when determining the forwarding behavior of a given group in the
presence of multicast router ports on an offloaded bridge. Therefore,
provide helpers to identify these groups.
Signed-off-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 25 Aug 2022 00:18:30 +0000 (17:18 -0700)]
genetlink: start to validate reserved header bytes
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.
One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.
To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 25 Aug 2022 13:17:19 +0000 (16:17 +0300)]
net: fman: memac: Uninitialized variable on error path
The "fixed_link" is only allocated sometimes but it's freed
unconditionally in the error handling. Set it to NULL so we don't free
uninitialized data.
Fixes: 9ea4742a55ca ("net: fman: Configure fixed link in memac_initialization") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/Ywd2X6gdKmTfYBxD@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
openvswitch: allow specifying ifindex of new interfaces
CRIU currently do not support checkpoint/restore of OVS configurations, but
there was several requests for it. For example,
https://github.com/lxc/lxc/issues/2909
The main problem is ifindexes of newly created interfaces. We realy need to
preserve them after restore. Current openvswitch API does not allow to
specify ifindex. Most of the time we can just create an interface via
generic netlink requests and plug it into ovs but datapaths (generally any
OVS_VPORT_TYPE_INTERNAL) can only be created via openvswitch requests which
do not support selecting ifindex.
This patch allows to do so.
For new datapaths I decided to use dp_infindex in header as infindex
because it control ifindex for other requests too.
For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
The only concern I have is that previously dp_ifindex was not used for
OVS_DP_VMD_NEW requests and some software may not set it to zero. However
we have been running this patch at Virtuozzo for 2 years and have not
encountered this problem. Not sure if it is worth to add new
ovs_datapath_attr instead.
====================
openvswitch: allow specifying ifindex of new interfaces
CRIU is preserving ifindexes of net devices after restoration. However,
current Open vSwitch API does not allow to target ifindex, so we cannot
correctly restore OVS configuration.
Add new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as desired
ifindex.
Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
ifindex.
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sergei Antonov [Wed, 24 Aug 2022 15:17:24 +0000 (18:17 +0300)]
net: ftmac100: add an opportunity to get ethaddr from the platform
This driver always generated a random ethernet address. Leave it as a
fallback solution, but add a call to platform_get_ethdev_address().
Handle EPROBE_DEFER returned from platform_get_ethdev_address() to
retry when EEPROM is ready.
Marcus Carlberg [Wed, 24 Aug 2022 09:37:06 +0000 (11:37 +0200)]
net: dsa: mv88e6xxx: Allow external SMI if serial
p0_mode set to one of the supported serial mode should not prevent
configuring the external SMI interface in
mv88e6xxx_g2_scratch_gpio_set_smi. The current masking of the p0_mode
only checks the first 2 bits. This results in switches supporting
serial mode cannot setup external SMI on certain serial modes
(Ex: 1000BASE-X and SGMII).
Extend the mask of the p0_mode to include the reduced modes and
serial modes as allowed modes for the external SMI interface.
Jiri Pirko [Thu, 25 Aug 2022 08:19:40 +0000 (10:19 +0200)]
genetlink: hold read cb_lock during iteration of genl_fam_idr in genl_bind()
In genl_bind(), currently genl_lock and write cb_lock are taken
for iteration of genl_fam_idr and processing of static values
stored in struct genl_family. Take just read cb_lock for this task
as it is sufficient to guard the idr and the struct against
concurrent genl_register/unregister_family() calls.
This will allow to run genl command processing in genl_rcv() and
mnl_socket_setsockopt(.., NETLINK_ADD_MEMBERSHIP, ..) in parallel.
Jiri Pirko [Thu, 25 Aug 2022 11:29:23 +0000 (13:29 +0200)]
net: devlink: add RNLT lock assertion to devlink_compat_switch_id_get()
Similar to devlink_compat_phys_port_name_get(), make sure that
devlink_compat_switch_id_get() is called with RTNL lock held. Comment
already says so, so put this in code as well.
Marcus Carlberg [Mon, 22 Aug 2022 14:41:36 +0000 (16:41 +0200)]
net: dsa: mv88e6xxx: support RGMII cmode
Since the probe defaults all interfaces to the highest speed possible
(10GBASE-X in mv88e6393x) before the phy mode configuration from the
devicetree is considered it is currently impossible to use port 0 in
RGMII mode.
This change will allow RGMII modes to be configurable for port 0
enabling port 0 to be configured as RGMII as well as serial depending
on configuration.
Zhengchao Shao [Wed, 24 Aug 2022 09:10:03 +0000 (17:10 +0800)]
net: sched: remove unnecessary init of qdisc skb head
The memory allocated by using kzallloc_node and kcalloc has been cleared.
Therefore, the structure members of the new qdisc are 0. So there's no
need to explicitly assign a value of 0.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 26 Aug 2022 10:56:55 +0000 (11:56 +0100)]
Merge tag 'wireless-next-2022-08-26-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes berg says:
====================
Various updates:
* rtw88: operation, locking, warning, and code style fixes
* rtw89: small updates
* cfg80211/mac80211: more EHT/MLO (802.11be, WiFi 7) work
* brcmfmac: a couple of fixes
* misc cleanups etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as
a broken CDC device by default. Add the custom Lenovo PID to the r8152
driver to support it properly.
Also, systems compatible with this dock provide a BIOS option to enable
MAC address passthrough (as per Lenovo document "ThinkPad Docking
Solutions 2017"). Add the custom PID to the MAC passthrough list too.
Tested on a ThinkPad 13 1st gen with the expected results:
passthrough disabled: Invalid header when reading pass-thru MAC addr
passthrough enabled: Using pass-thru MAC addr XX:XX:XX:XX:XX:XX
Signed-off-by: Jean-Francois Le Fillatre <jflf_kernel@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ping-Ke Shih [Mon, 15 Aug 2022 06:20:04 +0000 (14:20 +0800)]
wifi: rtw88: fix uninitialized use of primary channel index
clang reports uninitialized use:
>> drivers/net/wireless/realtek/rtw88/main.c:731:2: warning: variable
'primary_channel_idx' is used uninitialized whenever switch default is
taken [-Wsometimes-uninitialized]
default:
^~~~~~~
drivers/net/wireless/realtek/rtw88/main.c:754:39: note: uninitialized
use occurs here
hal->current_primary_channel_index = primary_channel_idx;
^~~~~~~~~~~~~~~~~~~
drivers/net/wireless/realtek/rtw88/main.c:687:24: note: initialize the
variable 'primary_channel_idx' to silence this warning
u8 primary_channel_idx;
^
= '\0'
This situation could not happen, because possible channel bandwidth
20/40/80MHz are enumerated.
Fixes: 341dd1f7de4c ("wifi: rtw88: add the update channel flow to support setting by parameters") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20220815062004.22920-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Qingfang DENG [Wed, 24 Aug 2022 06:10:34 +0000 (14:10 +0800)]
net: phylink: allow RGMII/RTBI in-band status
As per RGMII specification v2.0, section 3.4.1, RGMII/RTBI has an
optional in-band status feature where the PHY's link status, speed and
duplex mode can be passed to the MAC.
Allow RGMII/RTBI to use in-band status.
Signed-off-by: Qingfang DENG <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 26 Aug 2022 09:04:55 +0000 (10:04 +0100)]
Merge branch 'prestera-matchall'
Maksym Glubokiy says:
====================
net: prestera: matchall features
This patch series extracts matchall rules management out of SPAN API
implementation and adds 2 features on top of that:
- support for egress traffic (mirred egress action)
- proper rule priorities management between matchall and flower
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maksym Glubokiy [Tue, 23 Aug 2022 11:39:58 +0000 (14:39 +0300)]
net: prestera: manage matchall and flower priorities
matchall rules can be added only to chain 0 and their priorities have
limitations:
- new matchall ingress rule's priority must be higher (lower value)
than any existing flower rule;
- new matchall egress rule's priority must be lower (higher value)
than any existing flower rule.
The opposite works for flower rule adding:
- new flower ingress rule's priority must be lower (higher value)
than any existing matchall rule;
- new flower egress rule's priority must be higher (lower value)
than any existing matchall rule.
This is a hardware limitation and thus must be properly handled in
driver by reporting errors to the user when newly added rule has such a
priority that cannot be installed into the hardware.
To achieve this, the driver must maintain both min/max matchall
priorities for every flower block when user adds/deletes a matchall
rule, as well as both min/max flower priorities for chain 0 for every
adding/deletion of flower rules for chain 0.
Cc: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
Serhiy Boiko [Tue, 23 Aug 2022 11:39:56 +0000 (14:39 +0300)]
net: prestera: acl: extract matchall logic into a separate file
This commit adds more clarity to handling of TC_CLSMATCHALL_REPLACE and
TC_CLSMATCHALL_DESTROY events by calling newly added *_mall_*() handlers
instead of directly calling SPAN API.
This also extracts matchall rules management out of SPAN API since SPAN
is a hardware module which is used to implement 'matchall egress mirred'
action only.
Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Mon, 22 Aug 2022 12:39:42 +0000 (14:39 +0200)]
net: asix: ax88772: migrate to phylink
There are some exotic ax88772 based devices which may require
functionality provide by the phylink framework. For example:
- US100A20SFP, USB 2.0 auf LWL Converter with SFP Cage
- AX88772B USB to 100Base-TX Ethernet (with RMII) demo board, where it
is possible to switch between internal PHY and external RMII based
connection.
So, convert this driver to phylink as soon as possible.
Wolfram Sang [Thu, 18 Aug 2022 21:02:23 +0000 (23:02 +0200)]
wifi: mac80211: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Johannes Berg [Thu, 25 Aug 2022 18:57:32 +0000 (20:57 +0200)]
wifi: mac80211: correct SMPS mode in HE 6 GHz capability
If we add 6 GHz capability in MLO, we cannot use the SMPS
mode from the deflink. Pass it separately instead since on
a second link we don't even have a link data struct yet.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rob Herring [Thu, 25 Aug 2022 19:26:07 +0000 (14:26 -0500)]
dt-bindings: net: Add missing (unevaluated|additional)Properties on child nodes
In order to ensure only documented properties are present, node schemas
must have unevaluatedProperties or additionalProperties set to false
(typically). Add missing properties/$refs as exposed by this addition.
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/
Uros Bizjak [Mon, 22 Aug 2022 14:32:43 +0000 (16:32 +0200)]
netdev: Use try_cmpxchg in napi_if_scheduled_mark_missed
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
napi_if_scheduled_mark_missed. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).
Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails, enabling further code simplifications.
Linus Torvalds [Thu, 25 Aug 2022 21:03:58 +0000 (14:03 -0700)]
Merge tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from ipsec and netfilter (with one broken Fixes tag).
Current release - new code bugs:
- dsa: don't dereference NULL extack in dsa_slave_changeupper()
- dpaa: fix <1G ethernet on LS1046ARDB
- neigh: don't call kfree_skb() under spin_lock_irqsave()
Previous releases - regressions:
- r8152: fix the RX FIFO settings when suspending
- dsa: microchip: keep compatibility with device tree blobs with no
phy-mode
- Revert "net: macsec: update SCI upon MAC address change."
- Revert "xfrm: update SA curlft.use_time", comply with RFC 2367
Previous releases - always broken:
- netfilter: conntrack: work around exceeded TCP receive window
- ipsec: fix a null pointer dereference of dst->dev on a metadata dst
in xfrm_lookup_with_ifid
- moxa: get rid of asymmetry in DMA mapping/unmapping
- dsa: microchip: make learning configurable and keep it off while
standalone
- ice: xsk: prohibit usage of non-balanced queue id
- rxrpc: fix locking in rxrpc's sendmsg
Misc:
- another chunk of sysctl data race silencing"
* tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
net: lantiq_xrx200: restore buffer if memory allocation failed
net: lantiq_xrx200: fix lock under memory pressure
net: lantiq_xrx200: confirm skb is allocated before using
net: stmmac: work around sporadic tx issue on link-up
ionic: VF initial random MAC address if no assigned mac
ionic: fix up issues with handling EAGAIN on FW cmds
ionic: clear broken state on generation change
rxrpc: Fix locking in rxrpc's sendmsg
net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
MAINTAINERS: rectify file entry in BONDING DRIVER
i40e: Fix incorrect address type for IPv6 flow rules
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
net: Fix a data-race around sysctl_somaxconn.
net: Fix a data-race around netdev_unregister_timeout_secs.
net: Fix a data-race around gro_normal_batch.
net: Fix data-races around sysctl_devconf_inherit_init_net.
net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
net: Fix a data-race around netdev_budget_usecs.
net: Fix data-races around sysctl_max_skb_frags.
net: Fix a data-race around netdev_budget.
...
====================
net: devlink: sync flash and dev info commands
Purpose of this patchset is to introduce consistency between two devlink
commands:
devlink dev info
Shows versions of running default flash target and components.
devlink dev flash
Flashes default flash target or component name (if specified
on cmdline).
Currently it is up to the driver what versions to expose and what flash
update component names to accept. This is inconsistent. Thankfully, only
netdevsim currently using components so it is still time
to sanitize this.
This patchset makes sure, that devlink.c calls into driver for
component flash update only in case the driver exposes the same version
name.
Example:
$ devlink dev info
netdevsim/netdevsim10:
driver netdevsim
versions:
running:
fw.mgmt 10.20.30
stored:
fw.mgmt 10.20.30
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin
[fw.mgmt] Preparing to flash
[fw.mgmt] Flashing 100%
[fw.mgmt] Flash select
[fw.mgmt] Flashing done
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin component fw.mgmt
[fw.mgmt] Preparing to flash
[fw.mgmt] Flashing 100%
[fw.mgmt] Flash select
[fw.mgmt] Flashing done
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin component dummy
Error: selected component is not supported by this device.
====================