Bluetooth: hci_core: Make hci_conn_hash_add append to the list
This makes hci_conn_hash_add append to the tail of the conn_hash so it
matches the order they are created, this is required if the controller
attempts to match the order of ACL with CIS which uses append logic
when programming the CIS ids on the CIG.
The result of this change affects Create CIS:
Before:
< HCI Command: LE Create Connected Isochronous Stream (0x08|0x0064) plen 9
Number of CIS: 2
CIS Handle: 2560
ACL Handle: 3586
CIS Handle: 2561
ACL Handle: 3585
After:
< HCI Command: LE Create Connected Isochronous Stream (0x08|0x0064) plen 9
Number of CIS: 2
CIS Handle: 2560
ACL Handle: 3585
CIS Handle: 2561
ACL Handle: 3586
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_mrvl: Add serdev support for 88W8997
Add serdev support for the 88W8997 from NXP (previously Marvell). It
includes support for changing the baud rate. The command to change the
baud rate is taken from the user manual UM11483 Rev. 9 in section 7
(Bring-up of Bluetooth interfaces) from NXP.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_mrvl: use maybe_unused macro for device tree ids
Use the maybe_unused macro for the device tree ids instead of #ifdef
CONFIG_OF. This makes it easier to add support for new devices.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The 88W8997 bluetooth module supports setting the max-speed property.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Update the documentation with the device tree binding for the Marvell
88W8997 bluetooth device.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tomasz Moń [Tue, 7 Feb 2023 11:57:41 +0000 (12:57 +0100)]
Bluetooth: btusb: Do not require hardcoded interface numbers
Remove hardcoded interface number check because Bluetooth specification
since version 4.0 only recommends and no longer requires specific
interface numbers.
While earlier Bluetooth versions, i.e. 2.1 + EDR and 3.0 + HS, contain
required configuration table in Volume 4 - Host Controller Interface
Part B - USB Transport Layer, Bluetooth Core Specification Addendum 2
changes the table from required to recommended configuration.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Unbounded info messages in the pedit datapath can flood the printk
ring buffer quite easily depending on the action created.
As these messages are informational, usually printing some, not all,
is enough to bring attention to the real issue.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pedro Tammela [Fri, 21 Apr 2023 21:25:16 +0000 (18:25 -0300)]
net/sched: act_pedit: remove extra check for key type
The netlink parsing already validates the key 'htype'.
Remove the datapath check as it's redundant.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pedro Tammela [Fri, 21 Apr 2023 21:25:15 +0000 (18:25 -0300)]
net/sched: act_pedit: check static offsets a priori
Static key offsets should always be on 32 bit boundaries. Validate them on
create/update time for static offsets and move the datapath validation
for runtime offsets only.
iproute2 already errors out if a given offset and data size cannot be
packed to a 32 bit boundary. This change will make sure users which
create/update pedit instances directly via netlink also error out,
instead of finding out when packets are traversing.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pedro Tammela [Fri, 21 Apr 2023 21:25:14 +0000 (18:25 -0300)]
net/sched: act_pedit: use extack in 'ex' parsing errors
We have extack available when parsing 'ex' keys, so pass it to
tcf_pedit_keys_ex_parse and add more detailed error messages.
Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pedro Tammela [Fri, 21 Apr 2023 21:25:13 +0000 (18:25 -0300)]
net/sched: act_pedit: use NLA_POLICY for parsing 'ex' keys
Transform two checks in the 'ex' key parsing into netlink policies
removing extra if checks.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yajun Deng [Fri, 21 Apr 2023 08:26:06 +0000 (16:26 +0800)]
net: sched: Print msecs when transmit queue time out
The kernel will print several warnings in a short period of time
when it stalls. Like this:
First warning:
[ 7100.097547] ------------[ cut here ]------------
[ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out
[ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467
dev_watchdog+0x260/0x270
...
Second warning:
[ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000
softirq=367 3137/3673146 fqs=13844
[ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
[ 7147.756962] NMI backtrace for cpu 24
...
We calculate that the transmit queue start stall should occur before
7095s according to watchdog_timeo, the rcu start stall at 7087s.
These two times are close together, it is difficult to confirm which
happened first.
To let users know the exact time the stall started, print msecs when
the transmit queue time out.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 23 Apr 2023 13:16:45 +0000 (14:16 +0100)]
Merge branch 'dsa-skb_mac_header'
Vladimir Oltean says:
====================
Remove skb_mac_header() dependency in DSA xmit path
Eric started working on removing skb_mac_header() assumptions from the
networking xmit path, and I offered to help for DSA:
https://lore.kernel.org/netdev/20230321164519.1286357-1-edumazet@google.com/
The majority of this patch set is a straightforward replacement of
skb_mac_header() with skb->data (hidden either behind skb_eth_hdr(), or
behind skb_vlan_eth_hdr()). The only patch which is more "interesting"
is 9/9.
Another potential caller of __skb_vlan_pop() on xmit (and therefore
also of skb_mac_header()) is tcf_vlan_act(), but I haven't had the time
to investigate that (enough to submit changes other than what's here).
v1->v2:
- 09/09: document the vlan_tci argument of vlan_remove_tag() in the kdoc
Vladimir Oltean [Thu, 20 Apr 2023 22:56:01 +0000 (01:56 +0300)]
net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TX
ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most
appropriate helper I could find which strips away a VLAN header.
That's all I need it to do, but __skb_vlan_pop() has more logic, which
will become incompatible with the future revert of commit 6d1ccff62780
("net: reset mac header in dev_start_xmit()").
Namely, it performs a sanity check on skb_mac_header(), which will stop
being set after the above revert, so it will return an error instead of
removing the VLAN tag.
ocelot_xmit_get_vlan_info() gets called in 2 circumstances:
(1) the port is under a VLAN-aware bridge and the bridge sends
VLAN-tagged packets
(2) the port is under a VLAN-aware bridge and somebody else (an 8021q
upper) sends VLAN-tagged packets (using a VID that isn't in the
bridge vlan tables)
In case (1), there is actually no bug to defend against, because
br_dev_xmit() calls skb_reset_mac_header() and things continue to work.
However, in case (2), illustrated using the commands below, it can be
seen that our intervention is needed, since __skb_vlan_pop() complains:
$ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set $eth master br0 && ip link set $eth up
$ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up
$ ip addr add 192.168.100.1/24 dev $eth.100
I could fend off the checks in __skb_vlan_pop() with some
skb_mac_header_was_set() calls, but seeing how few callers of
__skb_vlan_pop() there are from TX paths, that seems rather
unproductive.
As an alternative solution, extract the bare minimum logic to strip a
VLAN header, and move it to a new helper named vlan_remove_tag(), close
to the definition of vlan_insert_tag(). Document it appropriately and
make ocelot_xmit_get_vlan_info() call this smaller helper instead.
Seeing that it doesn't appear illegal to test skb->protocol in the TX
path, I guess it would be a good for vlan_remove_tag() to also absorb
the vlan_set_encap_proto() function call.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:56:00 +0000 (01:56 +0300)]
net: dsa: update TX path comments to not mention skb_mac_header()
Once commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()")
will be reverted, it will no longer be true that skb->data points at
skb_mac_header(skb) - since the skb->mac_header will not be set - so
stop saying that, and just say that it points to the MAC header.
I've reviewed vlan_insert_tag() and it does not *actually* depend on
skb_mac_header(), so reword that to avoid the confusion.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:59 +0000 (01:55 +0300)]
net: dsa: tag_sja1105: replace skb_mac_header() with vlan_eth_hdr()
This is a cosmetic patch which consolidates the code to use the helper
function offered by if_vlan.h.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:58 +0000 (01:55 +0300)]
net: dsa: tag_sja1105: don't rely on skb_mac_header() in TX paths
skb_mac_header() will no longer be available in the TX path when
reverting commit 6d1ccff62780 ("net: reset mac header in
dev_start_xmit()"). As preparation for that, let's use
skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's
located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:57 +0000 (01:55 +0300)]
net: dsa: tag_ksz: do not rely on skb_mac_header() in TX paths
skb_mac_header() will no longer be available in the TX path when
reverting commit 6d1ccff62780 ("net: reset mac header in
dev_start_xmit()"). As preparation for that, let's use skb_eth_hdr() to
get to the Ethernet header's MAC DA instead, helper which assumes this
header is located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:56 +0000 (01:55 +0300)]
net: dsa: tag_ocelot: do not rely on skb_mac_header() for VLAN xmit
skb_mac_header() will no longer be available in the TX path when
reverting commit 6d1ccff62780 ("net: reset mac header in
dev_start_xmit()"). As preparation for that, let's use
skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's
located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:55 +0000 (01:55 +0300)]
net: dpaa: avoid one skb_reset_mac_header() in dpaa_enable_tx_csum()
It appears that dpaa_enable_tx_csum() only calls skb_reset_mac_header()
to get to the VLAN header using skb_mac_header().
We can use skb_vlan_eth_hdr() to get to the VLAN header based on
skb->data directly. This avoids spending a few cycles to set
skb->mac_header.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:54 +0000 (01:55 +0300)]
net: vlan: introduce skb_vlan_eth_hdr()
Similar to skb_eth_hdr() introduced in commit 96cc4b69581d ("macvlan: do
not assume mac_header is set in macvlan_broadcast()"), let's introduce a
skb_vlan_eth_hdr() helper which can be used in TX-only code paths to get
to the VLAN header based on skb->data rather than based on the
skb_mac_header(skb).
We also consolidate the drivers that dereference skb->data to go through
this helper.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 20 Apr 2023 22:55:53 +0000 (01:55 +0300)]
net: vlan: don't adjust MAC header in __vlan_insert_inner_tag() unless set
This is a preparatory change for the deletion of skb_reset_mac_header(skb)
from __dev_queue_xmit(). After that deletion, skb_mac_header(skb) will
no longer be set in TX paths, from which __vlan_insert_inner_tag() can
still be called (perhaps indirectly).
If we don't make this change, then an unset MAC header (equal to ~0U)
will become set after the adjustment with VLAN_HLEN.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/phy: add driver for Microchip LAN867x 10BASE-T1S PHY
This patch adds support for the Microchip LAN867x 10BASE-T1S family
(LAN8670/1/2). The driver supports P2MP with PLCA.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se> Signed-off-by: David S. Miller <davem@davemloft.net>
rxrpc: Replace fake flex-array with flexible-array member
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.
Transform zero-length array into flexible-array member in struct
rxrpc_ackpacket.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
net/rxrpc/call_event.c:149:38: warning: array subscript i is outside array bounds of ‘uint8_t[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=]
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21 Link: https://github.com/KSPP/linux/issues/263 Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/ZAZT11n4q5bBttW0@work/ Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Apr 2023 09:43:57 +0000 (09:43 +0000)]
net: optimize napi_threaded_poll() vs RPS/RFS
We use napi_threaded_poll() in order to reduce our softirq dependency.
We can add a followup of 821eba962d95 ("net: optimize napi_schedule_rps()")
to further remove the need of firing NET_RX_SOFTIRQ whenever
RPS/RFS are used.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Apr 2023 09:43:54 +0000 (09:43 +0000)]
net: do not provide hard irq safety for sd->defer_lock
kfree_skb() can be called from hard irq handlers,
but skb_attempt_defer_free() is meant to be used
from process or BH contexts, and skb_defer_free_flush()
is meant to be called from BH contexts.
Not having to mask hard irq can save some cycles.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Thu, 20 Apr 2023 21:06:42 +0000 (22:06 +0100)]
net: mtk_eth_soc: mediatek: fix ppe flow accounting for v1 hardware
Older chips (like MT7622) use a different bit in ib2 to enable hardware
counter support. Add macros for both and select the appropriate bit.
Fixes: 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 22 Apr 2023 03:47:04 +0000 (20:47 -0700)]
Merge tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-04-20
1) Dragos Improves RX page pool, and provides some fixes to his previous
series:
1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case
1.2) Hook NAPIs to page pools to gain more performance.
2) From Roi, Some cleanups to TC and eswitch modules.
3) Maher migrates vnic diagnostic counters reporting from debugfs to a
dedicated devlink health reporter
Maher Says:
===========
net/mlx5: Expose vnic diagnostic counters using devlink
Currently, vnic diagnostic counters are exposed through the following
debugfs:
$ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
cq_overrun
quota_exceeded_command
total_q_under_processor_handle
invalid_command
send_queue_priority_update_flow
nic_receive_steering_discard
The current design does not allow the hypervisor to view the diagnostic
counters of its VFs, in case the VFs get bound to a VM. In other words,
the counters are not exposed for representor interfaces.
Furthermore, the debugfs design is inconvenient future-wise, in case more
counters need to be reported by the driver in the future.
As these counters pertain to vNIC health, it is more appropriate to
utilize the devlink health reporter to expose them.
Thus, this patchest includes the following changes:
* Drop the current vnic diagnostic counters debugfs interface.
* Add a vnic devlink health reporter for PFs/VFs core devices, which
when diagnosed will dump vnic diagnostic counter values that are
queried from FW.
* Add a vnic devlink health reporter for the representor interface, which
serves the same purpose listed in the previous point, in addition to
allowing the hypervisor to view its VFs diagnostic counters, even when
the VFs are bounded to external VMs.
Example of devlink health reporter usage is:
$devlink health diagnose pci/0000:08:00.0 reporter vnic
vNIC env counters:
total_error_queues: 0 send_queue_priority_update_flow: 0
comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
invalid_command: 0 quota_exceeded_command: 0
nic_receive_steering_discard: 0
===========
4) SW steering fixes and improvements
Yevgeny Kliteynik Says:
=======================
These short patch series are just small fixes / improvements for
SW steering:
- Patch 1: Fix dumping of legacy modify_hdr in debug dump to
align to what is expected by parser
- Patch 2: Have separate threshold for ICM sync per ICM type
- Patch 3: Add more info to the steering debug dump - Linux
version and device name
- Patch 4: Keep track of number of buddies that are currently
in use per domain per buddy type
=======================
* tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Update op_mode to op_mod for port selection
net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set()
net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()
net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn()
net/mlx5e: RX, Hook NAPIs to page pools
net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
net/mlx5e: Add vnic devlink health reporter to representors
net/mlx5: Add vnic devlink health reporter to PFs/VFs
Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"
Revert "net/mlx5: Expose steering dropped packets counter"
net/mlx5: DR, Add memory statistics for domain object
net/mlx5: DR, Add more info in domain dbg dump
net/mlx5: DR, Calculate sync threshold of each pool according to its type
net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump
====================
We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).
The main changes are:
1) Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward,
from Florian Westphal.
2) Fix race between btf_put and btf_idr walk which caused a deadlock,
from Alexei Starovoitov.
3) Second big batch to migrate test_verifier unit tests into test_progs
for ease of readability and debugging, from Eduard Zingerman.
4) Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree, from Dave Marchevsky.
5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
selftests into libbpf-provided bpf_helpers.h header and improve
kfunc handling, from Andrii Nakryiko.
6) Support 64-bit pointers to kfuncs needed for archs like s390x,
from Ilya Leoshkevich.
7) Support BPF progs under getsockopt with a NULL optval,
from Stanislav Fomichev.
8) Improve verifier u32 scalar equality checking in order to enable
LLVM transformations which earlier had to be disabled specifically
for BPF backend, from Yonghong Song.
9) Extend bpftool's struct_ops object loading to support links,
from Kui-Feng Lee.
10) Add xsk selftest follow-up fixes for hugepage allocated umem,
from Magnus Karlsson.
11) Support BPF redirects from tc BPF to ifb devices,
from Daniel Borkmann.
12) Add BPF support for integer type when accessing variable length
arrays, from Feng Zhou.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
selftests/bpf: verifier/value_ptr_arith converted to inline assembly
selftests/bpf: verifier/value_illegal_alu converted to inline assembly
selftests/bpf: verifier/unpriv converted to inline assembly
selftests/bpf: verifier/subreg converted to inline assembly
selftests/bpf: verifier/spin_lock converted to inline assembly
selftests/bpf: verifier/sock converted to inline assembly
selftests/bpf: verifier/search_pruning converted to inline assembly
selftests/bpf: verifier/runtime_jit converted to inline assembly
selftests/bpf: verifier/regalloc converted to inline assembly
selftests/bpf: verifier/ref_tracking converted to inline assembly
selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
selftests/bpf: verifier/map_in_map converted to inline assembly
selftests/bpf: verifier/lwt converted to inline assembly
selftests/bpf: verifier/loops1 converted to inline assembly
selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
selftests/bpf: verifier/direct_packet_access converted to inline assembly
selftests/bpf: verifier/d_path converted to inline assembly
selftests/bpf: verifier/ctx converted to inline assembly
selftests/bpf: verifier/btf_ctx_access converted to inline assembly
selftests/bpf: verifier/bpf_get_stack converted to inline assembly
...
====================
Maxime Bizon [Thu, 20 Apr 2023 18:25:08 +0000 (20:25 +0200)]
net: dst: fix missing initialization of rt_uncached
xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without a
xfrm4_fill_dst() call in between, causes the following BUG:
BUG: spinlock bad magic on CPU#0, fbxhostapd/732
lock: 0x890b7668, .magic: 890b7668, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 PID: 732 Comm: fbxhostapd Not tainted 6.3.0-rc6-next-20230414-00613-ge8de66369925-dirty #9
Hardware name: Marvell Kirkwood (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x28/0x30
dump_stack_lvl from do_raw_spin_lock+0x20/0x80
do_raw_spin_lock from rt_del_uncached_list+0x30/0x64
rt_del_uncached_list from xfrm4_dst_destroy+0x3c/0xbc
xfrm4_dst_destroy from dst_destroy+0x5c/0xb0
dst_destroy from rcu_process_callbacks+0xc4/0xec
rcu_process_callbacks from __do_softirq+0xb4/0x22c
__do_softirq from call_with_stack+0x1c/0x24
call_with_stack from do_softirq+0x60/0x6c
do_softirq from __local_bh_enable_ip+0xa0/0xcc
Patch "net: dst: Prevent false sharing vs. dst_entry:: __refcnt" moved
rt_uncached and rt_uncached_list fields from rtable struct to dst
struct, so they are more zeroed by memset_after(xdst, 0, u.dst) in
xfrm_alloc_dst().
Note that rt_uncached (list_head) was never properly initialized at
alloc time, but xfrm[46]_dst_destroy() is written in such a way that
it was not an issue thanks to the memset:
if (xdst->u.rt.dst.rt_uncached_list)
rt_del_uncached_list(&xdst->u.rt);
The route code does it the other way around: rt_uncached_list is
assumed to be valid IIF rt_uncached list_head is not empty:
This patch adds mandatory rt_uncached list_head initialization in
generic dst_init(), and adapt xfrm[46]_dst_destroy logic to match the
rest of the code.
With LEDS_CLASS=m, a built-in qca8k driver fails to link:
arm-linux-gnueabi-ld: drivers/net/dsa/qca/qca8k-leds.o: in function `qca8k_setup_led_ctrl':
qca8k-leds.c:(.text+0x1ea): undefined reference to `devm_led_classdev_register_ext'
Change the dependency to avoid the broken configuration.
Fixes: 1e264f9d2918 ("net: dsa: qca8k: add LEDs basic support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230420213639.2243388-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 18 Apr 2023 19:01:41 +0000 (22:01 +0300)]
net: phy: add basic driver for NXP CBTX PHY
The CBTX PHY is a Fast Ethernet PHY integrated into the SJA1110 A/B/C
automotive Ethernet switches.
It was hoped it would work with the Generic PHY driver, but alas, it
doesn't. The most important reason why is that the PHY is powered down
by default, and it needs a vendor register to power it on.
It has a linear memory map that is accessed over SPI by the SJA1110
switch driver, which exposes a fake MDIO controller. It has the
following (and only the following) standard clause 22 registers:
0x0: MII_BMCR
0x1: MII_BMSR
0x2: MII_PHYSID1
0x3: MII_PHYSID2
0x4: MII_ADVERTISE
0x5: MII_LPA
0x6: MII_EXPANSION
0x7: the missing MII_NPAGE for Next Page Transmit Register
Every other register is vendor-defined.
The register map expands the standard clause 22 5-bit address space of
0x20 registers, however the driver does not need to access the extra
registers for now (and hopefully never). If it ever needs to do that, it
is possible to implement a fake (software) page switching mechanism
between the PHY driver and the SJA1110 MDIO controller driver.
Also, Auto-MDIX is turned off by default in hardware, the driver turns
it on by default and reports the current status. I've tested this with a
VSC8514 link partner and a crossover cable, by forcing the mode on the
link partner, and seeing that the CBTX PHY always sees the reverse of
the mode forced on the VSC8514 (and that traffic works). The link
doesn't come up (as expected) if MDI modes are forced on both ends in
the same way (with the cross-over cable, that is).
Eduard Zingerman [Fri, 21 Apr 2023 17:42:34 +0000 (20:42 +0300)]
selftests/bpf: verifier/value_ptr_arith converted to inline assembly
Test verifier/value_ptr_arith automatically converted to use inline assembly.
Test cases "sanitation: alu with different scalars 2" and
"sanitation: alu with different scalars 3" are updated to
avoid -ENOENT as return value, as __retval() annotation
only supports numeric literals.
Eduard Zingerman [Fri, 21 Apr 2023 17:42:32 +0000 (20:42 +0300)]
selftests/bpf: verifier/unpriv converted to inline assembly
Test verifier/unpriv semi-automatically converted to use inline assembly.
The verifier/unpriv.c had to be split in two parts:
- the bulk of the tests is in the progs/verifier_unpriv.c;
- the single test that needs `struct bpf_perf_event_data`
definition is in the progs/verifier_unpriv_perf.c.
The tests above can't be in a single file because:
- first requires inclusion of the filter.h header
(to get access to BPF_ST_MEM macro, inline assembler does
not support this isntruction);
- the second requires vmlinux.h, which contains definitions
conflicting with filter.h.
Eduard Zingerman [Fri, 21 Apr 2023 17:42:19 +0000 (20:42 +0300)]
selftests/bpf: verifier/loops1 converted to inline assembly
Test verifier/loops1 automatically converted to use inline assembly.
There are a few modifications for the converted tests.
"tracepoint" programs do not support test execution, change program
type to "xdp" (which supports test execution) for the following tests
that have __retval tags:
- bounded loop, count to 4
- bonded loop containing forward jump
Also, remove the __retval tag for test:
- bounded loop, count from positive unknown to 4
Eduard Zingerman [Fri, 21 Apr 2023 17:42:11 +0000 (20:42 +0300)]
selftests/bpf: Add notion of auxiliary programs for test_loader
In order to express test cases that use bpf_tail_call() intrinsic it
is necessary to have several programs to be loaded at a time.
This commit adds __auxiliary annotation to the set of annotations
supported by test_loader.c. Programs marked as auxiliary are always
loaded but are not treated as a separate test.
====================
Changes since last version:
- rework test case in last patch wrt. ctx->skb dereference etc (Alexei)
- pacify bpf ci tests, netfilter program type missed string translation
in libbpf helper.
This still uses runtime btf walk rather than extending
the btf trace array as Alexei suggested, I would do this later (or someone else can).
v1 cover letter:
Add minimal support to hook bpf programs to netfilter hooks, e.g.
PREROUTING or FORWARD.
For this the most relevant parts for registering a netfilter
hook via the in-kernel api are exposed to userspace via bpf_link.
The new program type is 'tracing style', i.e. there is no context
access rewrite done by verifier, the function argument (struct bpf_nf_ctx)
isn't stable.
There is no support for direct packet access, dynptr api should be used
instead.
With this its possible to build a small test program such as:
Changes since v3:
- uapi: remove 'reserved' struct member, s/prio/priority (Alexei)
- add ctx access test cases (Alexei, see last patch)
- some arm32 can only handle cmpxchg on u32 (build bot)
- Fix kdoc annotations (Simon Horman)
- bpftool: prefer p_err, not fprintf (Quentin)
- add test cases in separate patch
Changes since v2:
1. don't WARN when user calls 'bpftool loink detach' twice
restrict attachment to ip+ip6 families, lets relax this
later in case arp/bridge/netdev are needed too.
2. show netfilter links in 'bpftool net' output as well.
Changes since v1:
1. Don't fail to link when CONFIG_NETFILTER=n (build bot)
2. Use test_progs instead of test_verifier (Alexei)
Changes since last RFC version:
1. extend 'bpftool link show' to print prio/hooknum etc
2. extend 'nft list hooks' so it can print the bpf program id
3. Add an extra patch to artificially restrict bpf progs with
same priority. Its fine from a technical pov but it will
cause ordering issues (most recent one comes first).
Can be removed later.
4. Add test_run support for netfilter prog type and a small
extension to verifier tests to make sure we can't return
verdicts like NF_STOLEN.
5. Alter the netfilter part of the bpf_link uapi struct:
- add flags/reserved members.
Not used here except returning errors when they are nonzero.
Plan is to allow the bpf_link users to enable netfilter
defrag or conntrack engine by setting feature flags at
link create time in the future.
====================
selftests/bpf: add missing netfilter return value and ctx access tests
Extend prog_tests with two test cases:
# ./test_progs --allow=verifier_netfilter_retcode
#278/1 verifier_netfilter_retcode/bpf_exit with invalid return code. test1:OK
#278/2 verifier_netfilter_retcode/bpf_exit with valid return code. test2:OK
#278/3 verifier_netfilter_retcode/bpf_exit with valid return code. test3:OK
#278/4 verifier_netfilter_retcode/bpf_exit with invalid return code. test4:OK
#278 verifier_netfilter_retcode:OK
This checks that only accept and drop (0,1) are permitted.
NF_QUEUE could be implemented later if we can guarantee that attachment
of such programs can be rejected if they get attached to a pf/hook that
doesn't support async reinjection.
NF_STOLEN could be implemented via trusted helpers that can guarantee
that the skb will eventually be free'd.
v4: test case for bpf_nf_ctx access checks, requested by Alexei Starovoitov.
v5: also check ctx->{state,skb} can be dereferenced (Alexei).
# ./test_progs --allow=verifier_netfilter_ctx
#281/1 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
#281/2 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
#281/3 verifier_netfilter_ctx/netfilter invalid context access, past end of ctx:OK
#281/4 verifier_netfilter_ctx/netfilter invalid context, write:OK
#281/5 verifier_netfilter_ctx/netfilter valid context read and invalid write:OK
#281/6 verifier_netfilter_ctx/netfilter test prog with skb and state read access:OK
#281/7 verifier_netfilter_ctx/netfilter test prog with skb and state read access @unpriv:OK
#281 verifier_netfilter_ctx:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
This checks:
1/2: partial reads of ctx->{skb,state} are rejected
3. read access past sizeof(ctx) is rejected
4. write to ctx content, e.g. 'ctx->skb = NULL;' is rejected
5. ctx->state content cannot be altered
6. ctx->state and ctx->skb can be dereferenced
7. ... same program fails for unpriv (CAP_NET_ADMIN needed).
bpf: add test_run support for netfilter program type
add glue code so a bpf program can be run using userspace-provided
netfilter state and packet/skb.
Default is to use ipv4:output hook point, but this can be overridden by
userspace. Userspace provided netfilter state is restricted, only hook and
protocol families can be overridden and only to ipv4/ipv6.
Dump protocol family, hook and priority value:
$ bpftool link
2: netfilter prog 14
ip input prio -128
pids install(3264)
5: netfilter prog 14
ip6 forward prio 21
pids a.out(3387)
9: netfilter prog 14
ip prerouting prio 123
pids a.out(5700)
10: netfilter prog 14
ip input prio 21
pids test2(5701)
v2: Quentin Monnet suggested to also add 'bpftool net' support:
$ bpftool net
xdp:
tc:
flow_dissector:
netfilter:
ip prerouting prio 21 prog_id 14
ip input prio -128 prog_id 14
ip input prio 21 prog_id 14
ip forward prio 21 prog_id 14
ip output prio 21 prog_id 14
ip postrouting prio 21 prog_id 14
'bpftool net' only dumps netfilter link type, links are sorted by protocol
family, hook and priority.
v5: fix bpf ci failure: libbpf needs small update to prog_type_name[]
and probe_prog_load helper.
v4: don't fail with -EOPNOTSUPP in libbpf probe_prog_load, update
prog_type_name[] with "netfilter" entry (bpf ci)
v3: fix bpf.h copy, 'reserved' member was removed (Alexei)
use p_err, not fprintf (Quentin)
netfilter: disallow bpf hook attachment at same priority
This is just to avoid ordering issues between multiple bpf programs,
this could be removed later in case it turns out to be too cautious.
bpf prog could still be shared with non-bpf hook, otherwise we'd have to
make conntrack hook registration fail just because a bpf program has
same priority.
bpf: minimal support for programs hooked into netfilter framework
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs
that will be invoked via the NF_HOOK() points in the ip stack.
Invocation incurs an indirect call. This is not a necessity: Its
possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the
program invocation with the same method already done for xdp progs.
This isn't done here to keep the size of this chunk down.
Verifier restricts verdicts to either DROP or ACCEPT.
bpf: add bpf_link support for BPF_NETFILTER programs
Add bpf_link support skeleton. To keep this reviewable, no bpf program
can be invoked yet, if a program is attached only a c-stub is called and
not the actual bpf program.
Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.
Such hook gets removed automatically if the calling program exits.
BPF_NETFILTER program invocation is added in followup change.
NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it
allows to tell userspace which program is attached at the given hook
when user runs 'nft hook list' command rather than just the priority
and not-very-helpful 'this hook runs a bpf prog but I can't tell which
one'.
Will also be used to disallow registration of two bpf programs with
same priority in a followup patch.
v4: arm32 cmpxchg only supports 32bit operand
s/prio/priority/
v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if
more use cases pop up (arptables, ebtables, netdev ingress/egress etc).
Kui-Feng Lee [Thu, 20 Apr 2023 00:28:22 +0000 (17:28 -0700)]
bpftool: Update doc to explain struct_ops register subcommand.
The "struct_ops register" subcommand now allows for an optional *LINK_DIR*
to be included. This specifies the directory path where bpftool will pin
struct_ops links with the same name as their corresponding map names.
Kui-Feng Lee [Thu, 20 Apr 2023 00:28:21 +0000 (17:28 -0700)]
bpftool: Register struct_ops with a link.
You can include an optional path after specifying the object name for the
'struct_ops register' subcommand.
Since the commit 226bc6ae6405 ("Merge branch 'Transit between BPF TCP
congestion controls.'") has been accepted, it is now possible to create a
link for a struct_ops. This can be done by defining a struct_ops in
SEC(".struct_ops.link") to make libbpf returns a real link. If we don't pin
the links before leaving bpftool, they will disappear. To instruct bpftool
to pin the links in a directory with the names of the maps, we need to
provide the path of that directory.
Some socket options do getsockopt with optval=NULL to estimate the size
of the final buffer (which is returned via optlen). This breaks BPF
getsockopt assumptions about permitted optval buffer size. Let's enforce
these assumptions only when non-NULL optval is provided.
Jakub Kicinski [Fri, 21 Apr 2023 14:35:51 +0000 (07:35 -0700)]
Merge tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.4
Most likely the last -next pull request for v6.4. We have changes all
over. rtw88 now supports SDIO bus and iwlwifi continues to work on
Wi-Fi 7 support. Not much stack changes this time.
Major changes:
cfg80211/mac80211
- fix some Fine Time Measurement (FTM) frames not being bufferable
- flush frames before key removal to avoid potential unencrypted
transmission depending on the hardware design
iwlwifi
- preparation for Wi-Fi 7 EHT and multi-link support
rtw88
- SDIO bus support
- RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support
mt76
- mt7921 P2P support
- mt7996 mesh A-MSDU support
- mt7996 EHT support
- mt7996 coredump support
wcn36xx
- support for pronto v3 hardware
ath11k
- PCIe DeviceTree bindings
- WCN6750: enable SAR support
ath10k
- convert DeviceTree bindings to YAML
* tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits)
wifi: rtw88: Update spelling in main.h
wifi: airo: remove ISA_DMA_API dependency
wifi: rtl8xxxu: Simplify setting the initial gain
wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear}
wifi: rtl8xxxu: Don't print the vendor/product/serial
wifi: rtw88: Fix memory leak in rtw88_usb
wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant
wifi: rtw88: set pkg_type correctly for specific rtw8821c variants
wifi: rtw88: rtw8821c: Fix rfe_option field width
wifi: rtw88: usb: fix priority queue to endpoint mapping
wifi: rtw88: 8822c: add iface combination
wifi: rtw88: handle station mode concurrent scan with AP mode
wifi: rtw88: prevent scan abort with other VIFs
wifi: rtw88: refine reserved page flow for AP mode
wifi: rtw88: disallow PS during AP mode
wifi: rtw88: 8822c: extend reserved page number
wifi: rtw88: add port switch for AP mode
wifi: rtw88: add bitmap for dynamic port settings
wifi: rtw89: mac: use regular int as return type of DLE buffer request
wifi: mac80211: remove return value check of debugfs_create_dir()
...
====================
When calculating the address of the refcount_t struct within a local
kptr, bpf_refcount_acquire_impl should add refcount_off bytes to the
address of the local kptr. Due to some missing parens, the function is
incorrectly adding sizeof(refcount_t) * refcount_off bytes. This patch
fixes the calculation.
Due to the incorrect calculation, bpf_refcount_acquire_impl was trying
to refcount_inc some memory well past the end of local kptrs, resulting
in kasan and refcount complaints, as reported in [0]. In that thread,
Florian and Eduard discovered that bpf selftests written in the new
style - with __success and an expected __retval, specifically - were
not actually being run. As a result, selftests added in bpf_refcount
series weren't really exercising this behavior, and thus didn't unearth
the bug.
With this fixed behavior it's safe to revert commit 7c4b96c00043
("selftests/bpf: disable program test run for progs/refcounted_kptr.c"),
this patch does so.
Florian and Eduard reported hard dead lock:
[ 58.433327] _raw_spin_lock_irqsave+0x40/0x50
[ 58.433334] btf_put+0x43/0x90
[ 58.433338] bpf_find_btf_id+0x157/0x240
[ 58.433353] btf_parse_fields+0x921/0x11c0
This happens since btf->refcount can be 1 at the time of btf_put() and
btf_put() will call btf_free_id() which will try to grab btf_idr_lock
and will dead lock.
Avoid the issue by doing btf_put() without locking.
Fixes: 3d78417b60fb ("bpf: Add bpf_btf_find_by_name_kind() helper.") Fixes: 1e89106da253 ("bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().") Reported-by: Florian Westphal <fw@strlen.de> Reported-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20230421014901.70908-1-alexei.starovoitov@gmail.com
Jianfeng Tan [Wed, 19 Apr 2023 07:24:16 +0000 (15:24 +0800)]
net/packet: support mergeable feature of virtio
Packet sockets, like tap, can be used as the backend for kernel vhost.
In packet sockets, virtio net header size is currently hardcoded to be
the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
always the case: some virtio features, such as mrg_rxbuf, need virtio
net header to be 12-byte long.
Mergeable buffers, as a virtio feature, is worthy of supporting: packets
that are larger than one-mbuf size will be dropped in vhost worker's
handle_rx if mrg_rxbuf feature is not used, but large packets
cannot be avoided and increasing mbuf's size is not economical.
With this virtio feature enabled by virtio-user, packet sockets with
hardcoded 10-byte virtio net header will parse mac head incorrectly in
packet_snd by taking the last two bytes of virtio net header as part of
mac header.
This incorrect mac header parsing will cause packet to be dropped due to
invalid ether head checking in later under-layer device packet receiving.
By adding extra field vnet_hdr_sz with utilizing holes in struct
packet_sock to record currently used virtio net header size and supporting
extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
sockets can know the exact length of virtio net header that virtio user
gives.
In packet_snd, tpacket_snd and packet_recvmsg, instead of using
hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
corresponding packet_sock, and parse mac header correctly based on this
information to avoid the packets being mistakenly dropped.
Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com> Co-developed-by: Anqi Shen <amy.saq@antgroup.com> Signed-off-by: Anqi Shen <amy.saq@antgroup.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2023 10:49:47 +0000 (11:49 +0100)]
Merge branch 'mlx5-ipsec-fixes'
Leon Romanovsky says:
====================
Fixes to mlx5 IPsec implementation
This small patchset includes various fixes and one refactoring patch
which I collected for the features sent in this cycle, with one exception -
first patch.
First patch fixes code which was introduced in previous cycle, however I
was able to trigger FW error only in custom debug code, so don't see a
need to send it to net-rc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 20 Apr 2023 08:02:51 +0000 (11:02 +0300)]
net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macs
ARP discovery code has same logic for RX and TX flows, but with
different source and destination fields. Instead of duplicating
same code in mlx5e_ipsec_init_macs, let's refactor.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 4562116f8a56 ("net/mlx5e: Generalize IPsec work structs") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 20 Apr 2023 08:02:49 +0000 (11:02 +0300)]
net/mlx5e: Compare all fields in IPv6 address
Fix size argument in memcmp to compare whole IPv6 address.
Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 20 Apr 2023 08:02:48 +0000 (11:02 +0300)]
net/mlx5e: Don't overwrite extack message returned from IPsec SA validator
Addition of new err_xfrm label caused to error messages be overwritten.
Fix it by using proper NL_SET_ERR_MSG_WEAK_MOD macro together with change
in a default message.
Fixes: aa8bd0c9518c ("net/mlx5e: Support IPsec acquire default SA") Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Thu, 20 Apr 2023 08:02:47 +0000 (11:02 +0300)]
net/mlx5e: Fix FW error while setting IPsec policy block action
When trying to set IPsec policy block action the following error is
generated:
mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x8708c3), err(-22)
This error means that drop action is not allowed when modify action is
set, so update the code to skip modify header for XFRM_POLICY_BLOCK action.
Fixes: 6721239672fe ("net/mlx5e: Skip IPsec encryption for TX path without matching policy") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Wang [Wed, 19 Apr 2023 14:13:46 +0000 (22:13 +0800)]
net: stmmac:fix system hang when setting up tag_8021q VLAN for DSA ports
The system hang because of dsa_tag_8021q_port_setup()->
stmmac_vlan_rx_add_vid().
I found in stmmac_drv_probe() that cailing pm_runtime_put()
disabled the clock.
First, when the kernel is compiled with CONFIG_PM=y,The stmmac's
resume/suspend is active.
Secondly,stmmac as DSA master,the dsa_tag_8021q_port_setup() function
will callback stmmac_vlan_rx_add_vid when DSA dirver starts. However,
The system is hanged for the stmmac_vlan_rx_add_vid() accesses its
registers after stmmac's clock is closed.
I would suggest adding the pm_runtime_resume_and_get() to the
stmmac_vlan_rx_add_vid().This guarantees that resuming clock output
while in use.
Fixes: b3dcb3127786 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Yan Wang <rk.code@outlook.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2023 07:29:14 +0000 (08:29 +0100)]
Merge branch 'pds_core'
Shannon Nelson says:
====================
pds_core driver
Summary:
--------
This patchset implements a new driver for use with the AMD/Pensando
Distributed Services Card (DSC), intended to provide core configuration
services through the auxiliary_bus and through a couple of EXPORTed
functions for use initially in VFio and vDPA feature specific drivers.
To keep this patchset to a manageable size, the pds_vdpa and pds_vfio
drivers have been split out into their own patchsets to be reviewed
separately.
Detail:
-------
AMD/Pensando is making available a new set of devices for supporting vDPA,
VFio, and potentially other features in the Distributed Services Card
(DSC). These features are implemented through a PF that serves as a Core
device for controlling and configuring its VF devices. These VF devices
have separate drivers that use the auxiliary_bus to work through the Core
device as the control path.
Currently, the DSC supports standard ethernet operations using the
ionic driver. This is not replaced by the Core-based devices - these
new devices are in addition to the existing Ethernet device. Typical DSC
configurations will include both PDS devices and Ionic Eth devices.
However, there is a potential future path for ethernet services to come
through this device as well.
The Core device is a new PCI PF/VF device managed by a new driver
'pds_core'. The PF device has access to an admin queue for configuring
the services used by the VFs, and sets up auxiliary_bus devices for each
vDPA VF for communicating with the drivers for the vDPA devices. The VFs
may be for VFio or vDPA, and other services in the future; these VF types
are selected as part of the DSC internal FW configurations, which is out
of the scope of this patchset.
When the vDPA support set is enabled in the core PF through its devlink
param, auxiliary_bus devices are created for each VF that supports the
feature. The vDPA driver then connects to and uses this auxiliary_device
to do control path configuration through the PF device. This can then be
used with the vdpa kernel module to provide devices for virtio_vdpa kernel
module for host interfaces, or vhost_vdpa kernel module for interfaces
exported into your favorite VM.
A cheap ASCII diagram of a vDPA instance looks something like this:
Changes:
v11:
- change strncpy to strscpy Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304181137.WaZTYyAa-lkp@intel.com/
v10: Link: https://lore.kernel.org/netdev/20230418003228.28234-1-shannon.nelson@amd.com/
- remove CONFIG_DEBUG_FS guard static inline stuff
- remove unnecessary 0 and null initializations
- verify in driver load that PDS_CORE_DRV_NAME matches KBUILD_MODNAME
- remove debugfs irqs_show(), redundant with /proc
- return -ENOMEM if intr_info = kcalloc() fails
- move the status code enum into pds_core_if.h as part of API definition
- fix up one place in pdsc_devcmd_wait() we're using the status codes where we could use the errno
- remove redundant calls to flush_workqueue()
- grab config_lock before testing state bits in pdsc_fw_reporter_diagnose()
- change pdsc_color_match() to return bool
- remove useless VIF setup loop and just setup vDPA services for now
- remove pf pointer from struct padev and have clients use pci_physfn()
- drop use of "vf" in auxdev.c function names, make more generic
- remove last of client ops struct and simply export the functions
- drop drivers@pensando.io from MAINTAINERS and add new include dir
- include dynamic_debug.h in adminq.c to protect dynamic_hex_dump()
- fixed fw_slot type from u8 to int for handling error returns
- fixed comment spelling
- changed void arg in pdsc_adminq_post() to struct pdsc *
v9: Link: https://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/
- change pdsc field name id to uid to clarify the unique id used for aux device
- remove unnecessary pf->state and other checks in aux device creation
- hardcode fw slotnames for devlink info, don't use strings from FW
- handle errors from PDS_CORE_CMD_INIT devcmd call
- tighten up health thread use of config_lock
- remove pdsc_queue_health_check() layer over queuing health check
- start pds_core.rst file in first patch, add to it incrementally
- give more user interaction info in commit messages
- removed a few more extraneous includes
v5: Link: https://lore.kernel.org/netdev/20230322185626.38758-1-shannon.nelson@amd.com/
- added devlink health reporter for FW issues
- removed asic_type, asic_rev, serial_num, fw_version from debugfs as
they are available through other means
- trimed OS info in pdsc_identify(), we don't need to send that much info to the FW
- removed reg/unreg from auxbus client API, they are now in the core when VF
is started
- removed need for pdsc definition in client by simplifying the padev to only carry
struct pci_dev pointers rather than full struct pdsc to the pf and vf
- removed the unused pdsc argument in pdsc_notify()
- moved include/linux/pds/pds_core.h to driver/../pds_core/core.h
- restored a few pds_core_if.h interface values and structs that are shared
with FW source
- moved final config_lock unlock to before tear down of timer and workqueue
to be sure there are no deadlocks while waiting for any stragglers
- changed use of PAGE_SIZE to local PDS_PAGE_SIZE to keep with FW layout needs
without regard to kernel PAGE_SIZE configuration
- removed the redundant *adminqcq argument from pdsc_adminq_post()
v4: Link: https://lore.kernel.org/netdev/20230308051310.12544-1-shannon.nelson@amd.com/
- reworked to attach to both Core PF and vDPA VF PCI devices
- now creates auxiliary_device as part of each VF PCI probe, removes them on PCI remove
- auxiliary devices now use simple unique id rather than PCI address for identifier
- replaced home-grown event publishing with kernel-based notifier service
- dropped live_migration parameter, not needed when not creating aux device for it
- replaced devm_* functions with traditional interfaces
- added MAINTAINERS entry
- removed lingering traces of set/get_vf attribute adminq commands
- trimmed some include lists
- cleaned a kernel test robot complaint about a stray unused variable Link: https://lore.kernel.org/oe-kbuild-all/202302181049.yeUQMeWY-lkp@intel.com/
v3: Link: https://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/
- changed names from "pensando" to "amd" and updated copyright strings
- dropped the DEVLINK_PARAM_GENERIC_ID_FW_BANK for future development
- changed the auxiliary device creation to be triggered by the
PCI bus event BOUND_DRIVER, and torn down at UNBIND_DRIVER in order
to properly handle users using the sysfs bind/unbind functions
- dropped some noisy log messages
- rebased to current net-next
RFC to v2: Link: https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
- added separate devlink param patches for DEVLINK_PARAM_GENERIC_ID_ENABLE_MIGRATION
and DEVLINK_PARAM_GENERIC_ID_FW_BANK, and dropped the driver specific implementations
- updated descriptions for the new devlink parameters
- dropped netdev support
- dropped vDPA patches, will followup later
- separated fw update and fw bank select into their own patches
Shannon Nelson [Wed, 19 Apr 2023 17:04:26 +0000 (10:04 -0700)]
pds_core: publish events to the clients
When the Core device gets an event from the device, or notices
the device FW to be up or down, it needs to send those events
on to the clients that have an event handler. Add the code to
pass along the events to the clients.
The entry points pdsc_register_notify() and pdsc_unregister_notify()
are EXPORTed for other drivers that want to listen for these events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 19 Apr 2023 17:04:25 +0000 (10:04 -0700)]
pds_core: add the aux client API
Add the client API operations for running adminq commands.
The core registers the client with the FW, then the client
has a context for requesting adminq services. We expect
to add additional operations for other clients, including
requesting additional private adminqs and IRQs, but don't have
the need yet.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 19 Apr 2023 17:04:23 +0000 (10:04 -0700)]
pds_core: add auxiliary_bus devices
An auxiliary_bus device is created for each vDPA type VF at VF
probe and destroyed at VF remove. The aux device name comes
from the driver name + VIF type + the unique id assigned at PCI
probe. The VFs are always removed on PF remove, so there should
be no issues with VFs trying to access missing PF structures.
The auxiliary_device names will look like "pds_core.vDPA.nn"
where 'nn' is the VF's uid.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 19 Apr 2023 17:04:22 +0000 (10:04 -0700)]
pds_core: add initial VF device handling
This is the initial VF PCI driver framework for the new
pds_vdpa VF device, which will work in conjunction with an
auxiliary_bus client of the pds_core driver. This does the
very basics of registering for the new VF device, setting
up debugfs entries, and registering with devlink.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 19 Apr 2023 17:04:21 +0000 (10:04 -0700)]
pds_core: set up the VIF definitions and defaults
The Virtual Interfaces (VIFs) supported by the DSC's
configuration (vDPA, Eth, RDMA, etc) are reported in the
dev_ident struct and made visible in debugfs. At this point
only vDPA is supported in this driver so we only setup
devices for that feature.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 19 Apr 2023 17:04:20 +0000 (10:04 -0700)]
pds_core: add FW update feature to devlink
Add in the support for doing firmware updates. Of the two
main banks available, a and b, this updates the one not in
use and then selects it for the next boot.
Example:
devlink dev flash pci/0000:b2:00.0 \
file pensando/dsc_fw_1.63.0-22.tar
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 19 Apr 2023 17:04:18 +0000 (10:04 -0700)]
pds_core: set up device and adminq
Set up the basic adminq and notifyq queue structures. These are
used mostly by the client drivers for feature configuration.
These are essentially the same adminq and notifyq as in the
ionic driver.
Part of this includes querying for device identity and FW
information, so we can make that available to devlink dev info.