Chris Packham [Mon, 16 May 2022 22:48:01 +0000 (10:48 +1200)]
dt-bindings: net: marvell,orion-mdio: Set unevaluatedProperties to false
When the binding was converted it appeared necessary to set
'unevaluatedProperties: true' because of the switch devices on the
turris-mox board. Actually the error was because of the reg property
being incorrect causing the rest of the properties to be unevaluated.
After the reg properties are fixed for turris-mox we can set
'unevaluatedProperties: false' as is generally expected.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Packham [Mon, 16 May 2022 22:48:00 +0000 (10:48 +1200)]
arm64: dts: armada-3720-turris-mox: Correct reg property for mdio devices
MDIO devices have #address-cells = <1>, #size-cells = <0>. Now that we
have a schema enforcing this for marvell,orion-mdio we can see that the
turris-mox has a unnecessary 2nd cell for the switch nodes reg property
of it's switch devices. Remove the unnecessary 2nd cell from the
switches reg property.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Marek BehĂșn <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 11:51:00 +0000 (12:51 +0100)]
Merge branch 'dsa-microchip-ksz_switch-refactor'
Arun Ramadoss says:
====================
net: dsa: microchip: refactor the ksz switch init function
During the ksz_switch_register function, it calls the individual switches init
functions (ksz8795.c and ksz9477.c). Both these functions have few things in
common like, copying the chip specific data to struct ksz_dev, allocating
ksz_port memory and mib_names memory & cnt. And to add the new LAN937x series
switch, these allocations has to be replicated.
Based on the review feedback of LAN937x part support patch, refactored the
switch init function to move allocations to switch register.
Arun Ramadoss [Tue, 17 May 2022 09:43:33 +0000 (15:13 +0530)]
net: dsa: microchip: remove unused members in ksz_device
The name, regs_size and overrides members in struct ksz_device are
unused. Hence remove it.
And host_mask is used in only place of ksz8795.c file, which can be
replaced by dev->info->cpu_ports
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:32 +0000 (15:13 +0530)]
net: dsa: microchip: add the phylink get_caps
This patch add the support for phylink_get_caps for ksz8795 and ksz9477
series switch. It updates the struct ksz_switch_chip with the details of
the internal phys and xmii interface. Then during the get_caps based on
the bits set in the structure, corresponding phy mode is set.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: move mib->cnt_ptr reset code to ksz_common.c
mib->cnt_ptr resetting is handled in multiple places as part of
port_init_cnt(). Hence moved mib->cnt_ptr code to ksz common layer
and removed from individual product files.
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:30 +0000 (15:13 +0530)]
net: dsa: microchip: move get_strings to ksz_common
ksz8795 and ksz9477 uses the same algorithm for copying the ethtool
strings. Hence moved to ksz_common to remove the redundant code.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:29 +0000 (15:13 +0530)]
net: dsa: microchip: move port memory allocation to ksz_common
ksz8795 and ksz9477 init function initializes the memory to dev->ports,
mib counters and assigns the ds real number of ports. Since both the
routines are same, moved the allocation of port memory to
ksz_switch_register after init.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:28 +0000 (15:13 +0530)]
net: dsa: microchip: move struct mib_names to ksz_chip_data
The ksz88xx family has one set of mib_names. The ksz87xx, ksz9477,
LAN937x based switches has one set of mib_names. In order to remove
redundant declaration, moved the struct mib_names to ksz_chip_data
structure.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:27 +0000 (15:13 +0530)]
net: dsa: microchip: perform the compatibility check for dev probed
This patch perform the compatibility check for the device after the chip
detect is done. It is to prevent the mismatch between the device
compatible specified in the device tree and actual device found during
the detect. The ksz9477 device doesn't use any .data in the
of_device_id. But the ksz8795_spi uses .data for assigning the regmap
between 8830 family and 87xx family switch. Changed the regmap
assignment based on the chip_id from the .data.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:26 +0000 (15:13 +0530)]
net: dsa: microchip: move ksz_chip_data to ksz_common
This patch moves the ksz_chip_data in ksz8795 and ksz9477 to ksz_common.
At present, the dev->chip_id is iterated with the ksz_chip_data and then
copy its value to the ksz_dev structure. These values are declared as
constant.
Instead of copying the values and referencing it, this patch update the
dev->info to the ksz_chip_data based on the chip_id in the init
function. And also update the ksz_chip_data values for the LAN937x based
switches.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Tue, 17 May 2022 09:43:25 +0000 (15:13 +0530)]
net: dsa: microchip: ksz8795: update the port_cnt value in ksz_chip_data
The port_cnt value in the structure is not used in the switch_init.
Instead it uses the fls(chip->cpu_port), this is due to one of port in
the ksz8794 unavailable. The cpu_port for the 8794 is 0x10, fls(0x10) =
5, hence updating it directly in the ksz_chip_data structure in order to
same with all the other switches in ksz8795.c and ksz9477.c files.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 10:35:27 +0000 (11:35 +0100)]
Merge tag 'mlx5-updates-2022-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-05-17
MISC updates to mlx5 dirver
1) Aya Levin allows relaxed ordering over VFs
2) Gal Pressman Adds support XDP SQs for uplink representors in switchdev mode
3) Add debugfs TC stats and command failure syndrome for debuggability
4) Tariq uses variants of vzalloc where it could help
5) Multiport eswitch support from Elic Cohen:
Eli Cohen Says:
===============
The multiport eswitch feature allows to forward traffic from a
representor net device to the uplink port of an associated eswitch's
uplink port.
This feature requires creating a LAG object. Since LAG can be created
only once for a function, the feature is mutual exclusive with either
bonding or multipath.
Multipath eswitch mode is entered automatically these conditions are
met:
1. No other LAG related mode is active.
2. A rule that explicitly forwards to an uplink port is inserted.
The implementation maintains a reference count on such rules. When the
reference count reaches zero, the LAG is released and other modes may be
used.
When an explicit rule that explicitly forwards to an uplink port is
inserted while another LAG mode is active, that rule will not be
offloaded by the hardware since the hardware cannot guarantee that the
rule will actually be forwarded to that port.
Example rules that forwards to an uplink port is:
$ tc filter add dev rep0 root flower action mirred egress \
redirect dev uplinkrep0
$ tc filter add dev rep0 root flower action mirred egress \
redirect dev uplinkrep1
This feature is supported only if LAG_RESOURCE_ALLOCATION firmware
configuration parameter is set to true.
The series consists of three patches:
1. Lag state machine refactor
This patch does not add new functionality but rather changes the way
the state of the LAG is maintained.
2. Small fix to remove unused argument.
3. The actual implementation of the feature.
===============
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Mon, 31 Jan 2022 05:49:51 +0000 (07:49 +0200)]
net/mlx5: Support multiport eswitch mode
Multiport eswitch mode is a LAG mode that allows to add rules that
forward traffic to a specific physical port without being affected by LAG
affinity configuration.
This mode of operation is mutual exclusive with the other LAG modes used
by multipath and bonding.
To make the transition between the modes, we maintain a counter on the
number of rules specifying one of the uplink representors as the target
of mirred egress redirect action.
An example of such rule would be:
$ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0
If the reference count just grows to one and LAG is not in use, we
create the LAG in multiport eswitch mode. Other mode changes are not
allowed while in this mode. When the reference count reaches zero, we
destroy the LAG and let other modes be used if needed.
logic also changed such that if forwarding to some uplink destination
cannot be guaranteed, we fail the operation so the rule will eventually
be in software and not in hardware.
Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eli Cohen [Mon, 24 Jan 2022 09:30:46 +0000 (11:30 +0200)]
net/mlx5: Lag, refactor lag state machine
LAG state machine is implemented using bit flags. However, all these bit
flags, except for MLX5_LAG_FLAG_HASH_BASED, are really mutual exclusive.
In addition, MLX5_LAG_FLAG_READY is used by bonding to mark if we have
our netdevices successfully added to lag and does not really belong in
the same flags variable as the other flags.
Rename MLX5_LAG_FLAG_READY to MLX5_LAG_FLAG_NDEVS_READY to better
reflect its purpose and put it in a new flags variable.
For the rest of the flags, we introduce a mode enum to hold the state
of the LAG.
Remove the shared fdb boolean flag from struct mlx5_lag and store this
configuration as a mode flag.
Change all flag related operations to use standard Linux APIs.
Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Tal [Wed, 27 Apr 2022 15:26:52 +0000 (18:26 +0300)]
net/mlx5e: Correct the calculation of max channels for rep
Correct the calculation of maximum channels of rep to better utilize
the hardware resources and allow a larger scale of reps.
This will allow creation of all virtual ports configured.
Fixes: 473baf2e9e8c ("net/mlx5e: Allow profile-specific limitation on max num of channels") Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Wed, 18 May 2022 03:13:09 +0000 (20:13 -0700)]
net/mlx5e: CT: Add ct driver counters
Connection offload is translated to multiple rules over several
hardware flow tables. Unhandled end-cases may cause a hardware
resource leak causing multiple system symptoms such as a host
memory leak, decreased performance and other scale related issues.
Export the current number of firmware FTEs related to the CT table
as a debugfs counter. Also add a dropped packets counter to help
debug packets dropped on restore failure.
To show the offloaded count:
cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/offloaded
To show the dropped count:
cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/rx_dropped
Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Roi Dayan <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com>
Aya Levin [Mon, 11 Apr 2022 06:27:39 +0000 (09:27 +0300)]
net/mlx5e: Allow relaxed ordering over VFs
By PCI spec, the config space of the VF always report relaxed ordering
not supported while it inherits this property from its PF. Hence using
pcie_relaxed_ordering_enable(), always disables the relaxed ordering on
all VFs. Remove this check and rely on the firmware which queries the
config space of the PF and set the capability bit accordingly.
Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marina Varshaver <marinav@nvidia.com> Reviewed-by: Gal Shalom <galshalom@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Gal Pressman [Thu, 31 Mar 2022 06:26:00 +0000 (09:26 +0300)]
net/mlx5e: Support partial GSO for tunnels over vlans
Offloading outer checksum on tunnels requires GSO partial, add it to
'vlan_features' to allow offloading tunnels over vlans.
For example, running GENEVE over vlan & ipv6 (mandatory UDP checksum)
now allows for hardware TSO instead of software segmentation in GSO
only.
Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Gal Pressman [Mon, 11 Apr 2022 12:44:01 +0000 (15:44 +0300)]
net/mlx5e: IPoIB, Improve ethtool rxnfc callback structure in IPoIB
Followup commit 79ce39be1d63 ("net/mlx5e: Improve ethtool rxnfc callback structure")
and handle CONFIG_MLX5_EN_RXNFC enabled/disabled inside the fs layer so
the ethtool callbacks are always available. The fs layer will provide
stubs when CONFIG_MLX5_EN_RXNFC is compiled out.
Moshe Shemesh [Fri, 13 May 2022 03:19:31 +0000 (06:19 +0300)]
net/mlx5: Add last command failure syndrome to debugfs
Add syndrome of last command failure per command type to debugfs to ease
debugging of such failure.
last_failed_syndrome - last command failed syndrome returned by FW.
Min Li [Mon, 16 May 2022 14:47:06 +0000 (10:47 -0400)]
ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS support
Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY
for gettime and settime exclusively. Before this change,
TOD_READ_PRIMARY was used for both extts and gettime/settime,
which would result in changing TOD read/write triggers between
operations. Using TOD_READ_SECONDARY would make extts
independent of gettime/settime operation
====================
net/smc: send and write inline optimization for smc
Send cdc msgs and write data inline if qp has sufficent inline
space, helps latency reducing.
In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-1.3us improvement in latency.
The results shown below:
msgsize before after
1B 11.9 us 10.6 us (-1.3 us)
2B 11.7 us 10.7 us (-1.0 us)
4B 11.7 us 10.7 us (-1.0 us)
8B 11.6 us 10.6 us (-1.0 us)
16B 11.7 us 10.7 us (-1.0 us)
32B 11.7 us 10.6 us (-1.1 us)
64B 11.7 us 11.2 us (-0.5 us)
128B 11.6 us 11.2 us (-0.4 us)
256B 11.8 us 11.2 us (-0.6 us)
512B 11.8 us 11.3 us (-0.5 us)
1KB 11.9 us 11.5 us (-0.4 us)
2KB 12.1 us 11.5 us (-0.6 us)
====================
Guangguan Wang [Mon, 16 May 2022 05:51:37 +0000 (13:51 +0800)]
net/smc: rdma write inline if qp has sufficient inline space
Rdma write with inline flag when sending small packages,
whose length is shorter than the qp's max_inline_data, can
help reducing latency.
In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.5us-0.7us improvement in latency.
The results shown below:
msgsize before after
1B 11.2 us 10.6 us (-0.6 us)
2B 11.2 us 10.7 us (-0.5 us)
4B 11.3 us 10.7 us (-0.6 us)
8B 11.2 us 10.6 us (-0.6 us)
16B 11.3 us 10.7 us (-0.6 us)
32B 11.3 us 10.6 us (-0.7 us)
64B 11.2 us 11.2 us (0 us)
128B 11.2 us 11.2 us (0 us)
256B 11.2 us 11.2 us (0 us)
512B 11.4 us 11.3 us (-0.1 us)
1KB 11.4 us 11.5 us (0.1 us)
2KB 11.5 us 11.5 us (0 us)
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Tested-by: kernel test robot <lkp@intel.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guangguan Wang [Mon, 16 May 2022 05:51:36 +0000 (13:51 +0800)]
net/smc: send cdc msg inline if qp has sufficient inline space
As cdc msg's length is 44B, cdc msgs can be sent inline in
most rdma devices, which can help reducing sending latency.
In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-0.7us improvement in latency.
The results shown below:
msgsize before after
1B 11.9 us 11.2 us (-0.7 us)
2B 11.7 us 11.2 us (-0.5 us)
4B 11.7 us 11.3 us (-0.4 us)
8B 11.6 us 11.2 us (-0.4 us)
16B 11.7 us 11.3 us (-0.4 us)
32B 11.7 us 11.3 us (-0.4 us)
64B 11.7 us 11.2 us (-0.5 us)
128B 11.6 us 11.2 us (-0.4 us)
256B 11.8 us 11.2 us (-0.6 us)
512B 11.8 us 11.4 us (-0.4 us)
1KB 11.9 us 11.4 us (-0.5 us)
2KB 12.1 us 11.5 us (-0.6 us)
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Tested-by: kernel test robot <lkp@intel.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leszek Polak [Mon, 16 May 2022 07:08:59 +0000 (09:08 +0200)]
net: phy: marvell: Add errata section 5.1 for Alaska PHY
As per Errata Section 5.1, if EEE is intended to be used, some register
writes must be done once after every hardware reset. This patch now adds
the necessary register writes as listed in the Marvell errata.
Without this fix we experience ethernet problems on some of our boards
equipped with a new version of this ethernet PHY (different supplier).
Signed-off-by: Leszek Polak <lpolak@arri.de> Signed-off-by: Stefan Roese <sr@denx.de> Cc: Marek BehĂșn <kabel@kernel.org> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Marek BehĂșn <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220516070859.549170-1-sr@denx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Minghao Chi [Mon, 16 May 2022 08:22:51 +0000 (08:22 +0000)]
net: qede: Remove unnecessary synchronize_irq() before free_irq()
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Minghao Chi [Mon, 16 May 2022 08:19:14 +0000 (08:19 +0000)]
net: vxge: Remove unnecessary synchronize_irq() before free_irq()
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Minghao Chi [Mon, 16 May 2022 07:26:46 +0000 (07:26 +0000)]
qed: Remove unnecessary synchronize_irq() before free_irq()
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
the first 2 patches are by me and target the CAN raw protocol. The 1st
removes an unneeded assignment, the other one adds support for
SO_TXTIME/SCM_TXTIME.
Oliver Hartkopp contributes 2 patches for the ISOTP protocol. The 1st
adds support for transmission without flow control, the other let's
bind() return an error on incorrect CAN ID formatting.
Geert Uytterhoeven contributes a patch to clean up ctucanfd's Kconfig
file.
Vincent Mailhol's patch for the slcan driver uses the proper function
to check for invalid CAN frames in the xmit callback.
The next patch is by Geert Uytterhoeven and makes the interrupt-names
of the renesas,rcar-canfd dt bindings mandatory.
A patch by my update the ctucanfd dt bindings to include the common
CAN controller bindings.
The last patch is by Akira Yokosawa and fixes a breakage the
ctucanfd's documentation.
* tag 'linux-can-next-for-5.19-20220516' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure'
dt-bindings: can: ctucanfd: include common CAN controller bindings
dt-bindings: can: renesas,rcar-canfd: Make interrupt-names required
can: slcan: slc_xmit(): use can_dropped_invalid_skb() instead of manual check
can: ctucanfd: Let users select instead of depend on CAN_CTUCANFD
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
can: isotp: add support for transmission without flow control
can: raw: add support for SO_TXTIME/SCM_TXTIME
can: raw: raw_sendmsg(): remove not needed setting of skb->sk
====================
Ricardo Martinez [Fri, 13 May 2022 17:34:00 +0000 (10:34 -0700)]
net: skb: Remove skb_data_area_size()
skb_data_area_size() is not needed. As Jakub pointed out [1]:
For Rx, drivers can use the size passed during skb allocation or
use skb_tailroom().
For Tx, drivers should use skb_headlen().
Ricardo Martinez [Fri, 13 May 2022 17:33:59 +0000 (10:33 -0700)]
net: wwan: t7xx: Avoid calls to skb_data_area_size()
skb_data_area_size() helper was used to calculate the size of the
DMA mapped buffer passed to the HW. Instead of doing this, use the
size passed to allocate the skbs.
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Additional locks are not needed, all the touched sections
are already under mptcp socket lock protection.
Fixes: 4293248c6704 ("mptcp: add data lock for sk timers") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 14 May 2022 00:21:13 +0000 (17:21 -0700)]
selftests: mptcp: fix a mp_fail test warning
Old tc versions (iproute2 5.3) show actions in multiple lines, not a
single line. Then the following unexpected MP_FAIL selftest output
occurs:
file received by server has inverted byte at 169
./mptcp_join.sh: line 1277: [: [{"total acts":1},{"actions":[{"order":0 pedit ,"control_action":{"type":"pipe"}keys 1
index 1 ref 1 bind 1,"installed":0,"last_used":0
key #0 at 148: val ff000000 mask ffffffff
5: integer expression expected
001 Infinite map syn[ ok ] - synack[ ok ] - ack[ ok ]
sum[ ok ] - csum [ ok ]
ftx[ ok ] - failrx[ ok ]
rtx[ ok ] - rstrx [ ok ]
itx[ ok ] - infirx[ ok ]
ftx[ ok ] - failrx[ ok ] invert
This patch adds a 'grep' before 'sed' to fix this.
Akira Yokosawa [Tue, 10 May 2022 23:45:43 +0000 (08:45 +0900)]
docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure'
Two issues were observed in the ReST doc added by commit c3a0addefbde
("docs: ctucanfd: CTU CAN FD open-source IP core documentation.")
with Sphinx versions 2.4.4 and 4.5.0.
The plain "figure" directive broke "make pdfdocs" due to a missing
PDF figure. For conversion of SVG -> PDF to work, the "kernel-figure"
directive, which is an extension for kernel documentation, should
be used instead.
The directive of "code:: raw" causes a warning from both
"make htmldocs" and "make pdfdocs", which reads:
[...]/can/ctu/ctucanfd-driver.rst:75: WARNING: Pygments lexer name
'raw' is not known
A plain literal-block marker should suffice where no syntax
highlighting is intended.
Fix the issues by using suitable directive and marker.
Fixes: c3a0addefbde ("docs: ctucanfd: CTU CAN FD open-source IP core documentation.") Link: https://lore.kernel.org/all/5986752a-1c2a-5d64-f91d-58b1e6decd17@gmail.com Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Cc: Martin Jerabek <martin.jerabek01@gmail.com> Cc: Ondrej Ille <ondrej.ille@gmail.com> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
there is a common CAN controller binding. Add this to the ctucanfd
binding.
Cc: Ondrej Ille <ondrej.ille@gmail.com> Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
dt-bindings: can: renesas,rcar-canfd: Make interrupt-names required
The Renesas R-Car CAN FD Controller always uses two or more interrupts.
Make the interrupt-names properties a required property, to make it
easier to identify the individual interrupts.
can: ctucanfd: Let users select instead of depend on CAN_CTUCANFD
The CTU CAN-FD IP core is only useful when used with one of the
corresponding PCI/PCIe or platform (FPGA, SoC) drivers, which depend on
PCI resp. OF.
Hence make the users select the core driver code, instead of letting
then depend on it. Keep the core code config option visible when
compile-testing, to maintain compile-coverage.
Oliver Hartkopp [Sun, 15 May 2022 18:16:33 +0000 (20:16 +0200)]
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
Commit 3ea566422cbd ("can: isotp: sanitize CAN ID checks in
isotp_bind()") checks the given CAN ID address information by
sanitizing the input values.
This check (silently) removes obsolete bits by masking the given CAN
IDs.
Derek Will suggested to give a feedback to the application programmer
when the 'sanitizing' was actually needed which means the programmer
provided CAN ID content in a wrong format (e.g. SFF CAN IDs with a CAN
ID > 0x7FF).
Oliver Hartkopp [Sat, 7 May 2022 11:55:58 +0000 (13:55 +0200)]
can: isotp: add support for transmission without flow control
Usually the ISO 15765-2 protocol is a point-to-point protocol to transfer
segmented PDUs to a dedicated receiver. This receiver sends a flow control
message to specify protocol options and timings (e.g. block size / STmin).
The so called functional addressing communication allows a 1:N
communication but is limited to a single frame length.
This new CAN_ISOTP_CF_BROADCAST allows an unconfirmed 1:N communication
with PDU length that would not fit into a single frame. This feature is
not covered by the ISO 15765-2 standard.
This patch calls into sock_cmsg_send() to parse the user supplied
control information into a struct sockcm_cookie. Then assign the
requested transmit time to the skb.
This makes it possible to use the Earliest TXTIME First (ETF) packet
scheduler with the CAN_RAW protocol. The user can send a CAN_RAW frame
with a TXTIME and the kernel (with the ETF scheduler) will take care
of sending it to the network interface.
can: raw: raw_sendmsg(): remove not needed setting of skb->sk
The skb in raw_sendmsg() is allocated with sock_alloc_send_skb(),
which subsequently calls sock_alloc_send_pskb() -> skb_set_owner_w(),
which assigns "skb->sk = sk".
This patch removes the not needed setting of skb->sk.
Fabio Estevam [Fri, 13 May 2022 11:46:13 +0000 (08:46 -0300)]
net: phy: micrel: Use the kszphy probe/suspend/resume
Now that it is possible to use .probe without having .driver_data, let
KSZ8061 use the kszphy specific hooks for probe,suspend and resume,
which is preferred.
Switch to using the dedicated kszphy probe/suspend/resume functions.
Minghao Chi [Fri, 13 May 2022 08:19:18 +0000 (08:19 +0000)]
octeontx2-pf: Remove unnecessary synchronize_irq() before free_irq()
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Bin [Fri, 13 May 2022 07:10:18 +0000 (15:10 +0800)]
octeon_ep: add missing destroy_workqueue in octep_init_module
octep_init_module misses destroy_workqueue in error path,
this patch fixes that.
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
While testing this recently added feature on a variety
of platforms/configurations, I found the following issues:
1) A race leading to concurrent calls to smp_call_function_single_async()
2) Missed opportunity to use napi_consume_skb()
3) Need to limit the max length of the per-cpu lists.
4) Process the per-cpu list more frequently, for the
(unusual) case where net_rx_action() has mutiple
napi_poll() to process per round.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 16 May 2022 04:24:55 +0000 (21:24 -0700)]
net: add skb_defer_max sysctl
commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") added another per-cpu
cache of skbs. It was expected to be small,
and an IPI was forced whenever the list reached 128
skbs.
We might need to be able to control more precisely
queue capacity and added latency.
An IPI is generated whenever queue reaches half capacity.
Default value of the new limit is 64.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 16 May 2022 04:24:54 +0000 (21:24 -0700)]
net: use napi_consume_skb() in skb_defer_free_flush()
skb_defer_free_flush() runs from softirq context,
we have the opportunity to refill the napi_alloc_cache,
and/or use kmem_cache_free_bulk() when this cache is full.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 16 May 2022 04:24:53 +0000 (21:24 -0700)]
net: fix possible race in skb_attempt_defer_free()
A cpu can observe sd->defer_count reaching 128,
and call smp_call_function_single_async()
Problem is that the remote CPU can clear sd->defer_count
before the IPI is run/acknowledged.
Other cpus can queue more packets and also decide
to call smp_call_function_single_async() while the pending
IPI was not yet delivered.
This is a common issue with smp_call_function_single_async().
Callers must ensure correct synchronization and serialization.
I triggered this issue while experimenting smaller threshold.
Performing the call to smp_call_function_single_async()
under sd->defer_lock protection did not solve the problem.
Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
to insert locked csd") replaced an informative WARN_ON_ONCE()
with a return of -EBUSY, which is often ignored.
Test of CSD_FLAG_LOCK presence is racy anyway.
Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ 274.452394] tulip0: no phy info, aborting mtable build
[ 274.499041] tulip0: MII transceiver #1 config 1000 status 782d advertising 01e1
[ 274.750691] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xf4008000, 00:30:6e:08:7d:21, IRQ 17
[ 283.104520] net eth0: Setting full-duplex based on MII#1 link partner capability of c1e1
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Bin [Fri, 13 May 2022 07:09:22 +0000 (15:09 +0800)]
net: hinic: add missing destroy_workqueue in hinic_pf_to_mgmt_init
hinic_pf_to_mgmt_init misses destroy_workqueue in error path,
this patch fixes that.
Fixes: 6dbb89014dc3 ("hinic: fix sending mailbox timeout in aeq event work") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 May 2022 09:47:44 +0000 (10:47 +0100)]
Merge branch 'skb-drop-reason-boundary'
Menglong Dong says:
====================
net: skb: check the boundrary of skb drop reason
In the commit 1330b6ef3313 ("skb: make drop reason booleanable"),
SKB_NOT_DROPPED_YET is added to the enum skb_drop_reason, which makes
the invalid drop reason SKB_NOT_DROPPED_YET can leak to the kfree_skb
tracepoint. Once this happen (it happened, as 4th patch says), it can
cause NULL pointer in drop monitor and result in kernel panic.
Therefore, check the boundrary of drop reason in both kfree_skb_reason
(2th patch) and drop monitor (1th patch) to prevent such case happens
again.
Meanwhile, fix the invalid drop reason passed to kfree_skb_reason() in
tcp_v4_rcv() and tcp_v6_rcv().
Changes since v2:
1/4 - don't reset the reason and print the debug warning only (Jakub
Kicinski)
4/4 - remove new lines between tags
Changes since v1:
- consider tcp_v6_rcv() in the 4th patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 13 May 2022 03:03:39 +0000 (11:03 +0800)]
net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv()
The 'drop_reason' that passed to kfree_skb_reason() in tcp_v4_rcv()
and tcp_v6_rcv() can be SKB_NOT_DROPPED_YET(0), as it is used as the
return value of tcp_inbound_md5_hash().
And it can panic the kernel with NULL pointer in
net_dm_packet_report_size() if the reason is 0, as drop_reasons[0]
is NULL.
Fixes: 1330b6ef3313 ("skb: make drop reason booleanable") Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 13 May 2022 03:03:38 +0000 (11:03 +0800)]
net: skb: change the definition SKB_DR_SET()
The SKB_DR_OR() is used to set the drop reason to a value when it is
not set or specified yet. SKB_NOT_DROPPED_YET should also be considered
as not set.
Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 13 May 2022 03:03:37 +0000 (11:03 +0800)]
net: skb: check the boundrary of drop reason in kfree_skb_reason()
Sometimes, we may forget to reset skb drop reason to NOT_SPECIFIED after
we make it the return value of the functions with return type of enum
skb_drop_reason, such as tcp_inbound_md5_hash. Therefore, its value can
be SKB_NOT_DROPPED_YET(0), which is invalid for kfree_skb_reason().
So we check the range of drop reason in kfree_skb_reason() with
DEBUG_NET_WARN_ON_ONCE().
Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 13 May 2022 03:03:36 +0000 (11:03 +0800)]
net: dm: check the boundary of skb drop reasons
The 'reason' will be set to 'SKB_DROP_REASON_NOT_SPECIFIED' if it not
small that SKB_DROP_REASON_MAX in net_dm_packet_trace_kfree_skb_hit(),
but it can't avoid it to be 0, which is invalid and can cause NULL
pointer in drop_reasons.
Therefore, reset it to SKB_DROP_REASON_NOT_SPECIFIED when 'reason <= 0'.
Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guangguan Wang [Fri, 13 May 2022 02:24:53 +0000 (10:24 +0800)]
net/smc: align the connect behaviour with TCP
Connect with O_NONBLOCK will not be completed immediately
and returns -EINPROGRESS. It is possible to use selector/poll
for completion by selecting the socket for writing. After select
indicates writability, a second connect function call will return
0 to indicate connected successfully as TCP does, but smc returns
-EISCONN. Use socket state for smc to indicate connect state, which
can help smc aligning the connect behaviour with TCP.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 May 2022 09:31:06 +0000 (10:31 +0100)]
Merge branch 'sk_bound_dev_if-annotations'
Eric Dumazet says:
====================
net: add annotations for sk->sk_bound_dev_if
While writes on sk->sk_bound_dev_if are protected by socket lock,
we have many lockless reads all over the places.
This is based on syzbot report found in the first patch changelog.
v2: inline ipv6 function only defined if IS_ENABLED(CONFIG_IPV6) (kernel bots)
Change the INET6_MATCH() to inet6_match(), this is no longer a macro.
Change INET_MATCH() to inet_match() (Olivier Hartkopp & Jakub Kicinski)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 May 2022 18:55:50 +0000 (11:55 -0700)]
inet: rename INET_MATCH()
This is no longer a macro, but an inlined function.
INET_MATCH() -> inet_match()
Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Olivier Hartkopp <socketcan@hartkopp.net> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 May 2022 18:55:41 +0000 (11:55 -0700)]
net: annotate races around sk->sk_bound_dev_if
UDP sendmsg() is lockless, and reads sk->sk_bound_dev_if while
this field can be changed by another thread.
Adds minimal annotations to avoid KCSAN splats for UDP.
Following patches will add more annotations to potential lockless readers.
BUG: KCSAN: data-race in __ip6_datagram_connect / udpv6_sendmsg
write to 0xffff888136d47a94 of 4 bytes by task 7681 on cpu 0:
__ip6_datagram_connect+0x6e2/0x930 net/ipv6/datagram.c:221
ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272
inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
__sys_connect_file net/socket.c:1900 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1917
__do_sys_connect net/socket.c:1927 [inline]
__se_sys_connect net/socket.c:1924 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1924
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888136d47a94 of 4 bytes by task 7670 on cpu 1:
udpv6_sendmsg+0xc60/0x16e0 net/ipv6/udp.c:1436
inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:652
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0xffffff9b
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7670 Comm: syz-executor.3 Tainted: G W 5.18.0-rc1-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
I chose to not add Fixes: tag because race has minor consequences
and stable teams busy enough.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 May 2022 18:34:08 +0000 (11:34 -0700)]
mlx5: support BIG TCP packets
mlx5 supports LSOv2.
IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
with JUMBO TLV for big packets.
We need to ignore/skip this HBH header when populating TX descriptor.
Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
v7: adopt unsafe_memcpy() and MLX5_UNSAFE_MEMCPY_DISCLAIMER
v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
Signed-off-by: Coco Li <lixiaoyan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 May 2022 18:34:07 +0000 (11:34 -0700)]
mlx4: support BIG TCP packets
mlx4 supports LSOv2 just fine.
IPv6 stack inserts a temporary Hop-by-Hop header
with JUMBO TLV for big packets.
We need to ignore the HBH header when populating TX descriptor.
Tested:
Before: (not enabling bigger TSO/GRO packets)
ip link set dev eth0 gso_max_size 65536 gro_max_size 65536
netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size Request Resp. Elapsed Trans. CPU CPU S.dem S.dem
Send Recv Size Size Time Rate local remote local remote
bytes bytes bytes bytes secs. per sec % S % S us/Tr us/Tr
ip link set dev eth0 gso_max_size 185000 gro_max_size 185000
netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size Request Resp. Elapsed Trans. CPU CPU S.dem S.dem
Send Recv Size Size Time Rate local remote local remote
bytes bytes bytes bytes secs. per sec % S % S us/Tr us/Tr
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 May 2022 18:34:06 +0000 (11:34 -0700)]
veth: enable BIG TCP packets
Set the TSO driver limit to GSO_MAX_SIZE (512 KB).
This allows the admin/user to set a GSO limit up to this value.
ip link set dev veth10 gso_max_size 200000
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>