]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
19 months agortnetlink: make rtnl_fill_link_ifmap() RCU ready
Eric Dumazet [Thu, 22 Feb 2024 10:50:20 +0000 (10:50 +0000)]
rtnetlink: make rtnl_fill_link_ifmap() RCU ready

Use READ_ONCE() to read the following device fields:

dev->mem_start
dev->mem_end
dev->base_addr
dev->irq
dev->dma
dev->if_port

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoinet: switch inet_dump_fib() to RCU protection
Eric Dumazet [Thu, 22 Feb 2024 10:50:19 +0000 (10:50 +0000)]
inet: switch inet_dump_fib() to RCU protection

No longer hold RTNL while calling inet_dump_fib().

Also change return value for a completed dump:

Returning 0 instead of skb->len allows NLMSG_DONE
to be appended to the skb. User space does not have
to call us again to get a standalone NLMSG_DONE marker.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonexthop: allow nexthop_mpath_fill_node() to be called without RTNL
Eric Dumazet [Thu, 22 Feb 2024 10:50:18 +0000 (10:50 +0000)]
nexthop: allow nexthop_mpath_fill_node() to be called without RTNL

nexthop_mpath_fill_node() will be potentially called
from contexts holding rcu_lock instead of RTNL.

Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoinet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
Eric Dumazet [Thu, 22 Feb 2024 10:50:17 +0000 (10:50 +0000)]
inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU

Add a new field into struct fib_dump_filter, to let callers
tell if they use RTNL locking or RCU.

This is used in the following patch, when inet_dump_fib()
no longer holds RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: switch inet6_dump_ifinfo() to RCU protection
Eric Dumazet [Thu, 22 Feb 2024 10:50:16 +0000 (10:50 +0000)]
ipv6: switch inet6_dump_ifinfo() to RCU protection

No longer hold RTNL while calling inet6_dump_ifinfo()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
Eric Dumazet [Thu, 22 Feb 2024 10:50:15 +0000 (10:50 +0000)]
rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag

Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt-out from RTNL protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: change nlk->cb_mutex role
Eric Dumazet [Thu, 22 Feb 2024 10:50:14 +0000 (10:50 +0000)]
rtnetlink: change nlk->cb_mutex role

In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and the dump() operations.

The override is used for rtnl dumps, registered with
rntl_register() and rntl_register_module().

We want to be able to opt-out some dump() operations
to not acquire RTNL, so we need to protect nlk->cb
with a per socket mutex.

This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex

The optional pointer to the mutex used to protect dump()
call is stored in nlk->dump_cb_mutex

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonetlink: hold nlk->cb_mutex longer in __netlink_dump_start()
Eric Dumazet [Thu, 22 Feb 2024 10:50:13 +0000 (10:50 +0000)]
netlink: hold nlk->cb_mutex longer in __netlink_dump_start()

__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.

This seems dangerous, even if KASAN did not bother yet.

Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonetlink: fix netlink_diag_dump() return value
Eric Dumazet [Thu, 22 Feb 2024 10:50:12 +0000 (10:50 +0000)]
netlink: fix netlink_diag_dump() return value

__netlink_diag_dump() returns 1 if the dump is not complete,
zero if no error occurred.

If err variable is zero, this means the dump is complete:
We should not return skb->len in this case, but 0.

This allows NLMSG_DONE to be appended to the skb.
User space does not have to call us again only to get NLMSG_DONE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: use xarray iterator to implement inet6_dump_ifinfo()
Eric Dumazet [Thu, 22 Feb 2024 10:50:11 +0000 (10:50 +0000)]
ipv6: use xarray iterator to implement inet6_dump_ifinfo()

Prepare inet6_dump_ifinfo() to run with RCU protection
instead of RTNL and use for_each_netdev_dump() interface.

Also properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call and RTNL acquisition.

Note that RTNL-less dumps need core changes, coming later
in the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: prepare inet6_fill_ifinfo() for RCU protection
Eric Dumazet [Thu, 22 Feb 2024 10:50:10 +0000 (10:50 +0000)]
ipv6: prepare inet6_fill_ifinfo() for RCU protection

We want to use RCU protection instead of RTNL
for inet6_fill_ifinfo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: prepare inet6_fill_ifla6_attrs() for RCU
Eric Dumazet [Thu, 22 Feb 2024 10:50:09 +0000 (10:50 +0000)]
ipv6: prepare inet6_fill_ifla6_attrs() for RCU

We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: prepare nla_put_iflink() to run under RCU
Eric Dumazet [Thu, 22 Feb 2024 10:50:08 +0000 (10:50 +0000)]
rtnetlink: prepare nla_put_iflink() to run under RCU

We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'dp83826'
David S. Miller [Mon, 26 Feb 2024 11:38:45 +0000 (11:38 +0000)]
Merge branch 'dp83826'

Jérémie Dautheribes says:

====================
Add support for TI DP83826 configuration

This short patch series introduces the possibility of overriding
some parameters which are latched by default by hardware straps on the
TI DP83826 PHY.

The settings that can be overridden include:
  - Configuring the PHY in either MII mode or RMII mode.
  - When in RMII mode, configuring the PHY in RMII slave mode or RMII
  master mode.

The RMII master/slave mode is TI-specific and determines whether the PHY
operates from a 25MHz reference clock (master mode) or from a 50MHz
reference clock (slave mode).

While these features should be supported by all the TI DP8382x family,
I have only been able to test them on TI DP83826 hardware.  Therefore,
support has been added specifically for this PHY in this patch series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: phy: dp83826: support configuring RMII master/slave operation mode
Jérémie Dautheribes [Thu, 22 Feb 2024 10:31:17 +0000 (11:31 +0100)]
net: phy: dp83826: support configuring RMII master/slave operation mode

The TI DP83826 PHY can operate between two RMII modes:
- master mode (PHY operates from a 25MHz clock reference)
        - slave mode (PHY operates from a 50MHz clock reference)

By default, the operation mode is configured by hardware straps.

Add support to configure the operation mode from within the driver.

Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: phy: dp83826: Add support for phy-mode configuration
Jérémie Dautheribes [Thu, 22 Feb 2024 10:31:16 +0000 (11:31 +0100)]
net: phy: dp83826: Add support for phy-mode configuration

The TI DP83826 PHY can operate in either MII mode or RMII mode.
By default, it is configured by straps.
It can also be configured by writing to the bit 5 of register 0x17 - RMII
and Status Register (RCSR).

When phydev->interface is rmii, rmii mode must be enabled, otherwise
mii mode must be set.
This prevents misconfiguration of hw straps.

Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agodt-bindings: net: dp83822: support configuring RMII master/slave mode
Jérémie Dautheribes [Thu, 22 Feb 2024 10:31:15 +0000 (11:31 +0100)]
dt-bindings: net: dp83822: support configuring RMII master/slave mode

Add property ti,rmii-mode to support selecting the RMII operation mode
between:
- master mode (PHY operates from a 25MHz clock reference)
- slave mode (PHY operates from a 50MHz clock reference)

If not set, the operation mode is configured by hardware straps.

Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: dsa: microchip: Add support for bridge port isolation
Oleksij Rempel [Thu, 22 Feb 2024 07:51:13 +0000 (08:51 +0100)]
net: dsa: microchip: Add support for bridge port isolation

Implement bridge port isolation for KSZ switches. Enabling the isolation
of switch ports from each other while maintaining connectivity with the
CPU and other forwarding ports. For instance, to isolate swp1 and swp2
from each other, use the following commands:
- bridge link set dev swp1 isolated on
- bridge link set dev swp2 isolated on

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agotools: ynl: fix header guards
Jakub Kicinski [Thu, 22 Feb 2024 23:48:31 +0000 (15:48 -0800)]
tools: ynl: fix header guards

devlink and ethtool have a trailing _ in the header guard. I must have
copy/pasted it into new guards, assuming it's a headers_install artifact.

This fixes build if system headers are old.

Fixes: 8f109e91b852 ("tools: ynl: include dpll and mptcp_pm in C codegen")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240222234831.179181-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agogenetlink: make info in GENL_REQ_ATTR_CHECK() const
Jakub Kicinski [Thu, 22 Feb 2024 22:28:19 +0000 (14:28 -0800)]
genetlink: make info in GENL_REQ_ATTR_CHECK() const

Make the local variable in GENL_REQ_ATTR_CHECK() const.
genl_info_dump() returns a const pointer, so the macro
is currently hard to use in genl dumps.

Link: https://lore.kernel.org/r/20240222222819.156320-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'tools-ynl-couple-of-cmdline-enhancements'
Jakub Kicinski [Sat, 24 Feb 2024 02:16:45 +0000 (18:16 -0800)]
Merge branch 'tools-ynl-couple-of-cmdline-enhancements'

Jiri Pirko says:

====================
tools: ynl: couple of cmdline enhancements

This is part of the original "netlink: specs: devlink: add the rest of
missing attribute definitions" set which was rejected [1]. These three
patches enhances the cmdline user comfort, allowing to pass flag
attribute with bool values and enum names instead of scalars.

[1] https://lore.kernel.org/all/20240220181004.639af931@kernel.org/
====================

Link: https://lore.kernel.org/r/20240222134351.224704-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agotools: ynl: allow user to pass enum string instead of scalar value
Jiri Pirko [Thu, 22 Feb 2024 13:43:51 +0000 (14:43 +0100)]
tools: ynl: allow user to pass enum string instead of scalar value

During decoding of messages coming from kernel, attribute values are
converted to enum names in case the attribute type is enum of bitfield32.

However, when user constructs json message, he has to pass plain scalar
values. See "state" "selector" and "value" attributes in following
examples:

$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-set --json '{"id": 0, "parent-device": {"parent-id": 0, "state": 1}}'
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml --do port-set --json '{"bus-name": "pci", "dev-name": "0000:08:00.1", "port-index": 98304, "port-function": {"caps": {"selector": 1, "value": 1 }}}'

Allow user to pass strings containing enum names, convert them to scalar
values to be encoded into Netlink message:

$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-set --json '{"id": 0, "parent-device": {"parent-id": 0, "state": "connected"}}'
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml --do port-set --json '{"bus-name": "pci", "dev-name": "0000:08:00.1", "port-index": 98304, "port-function": {"caps": {"selector": ["roce-bit"], "value": ["roce-bit"] }}}'

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240222134351.224704-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agotools: ynl: process all scalar types encoding in single elif statement
Jiri Pirko [Thu, 22 Feb 2024 13:43:50 +0000 (14:43 +0100)]
tools: ynl: process all scalar types encoding in single elif statement

As a preparation to handle enums for scalar values, unify the processing
of all scalar types in a single elif statement.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240222134351.224704-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agotools: ynl: allow user to specify flag attr with bool values
Jiri Pirko [Thu, 22 Feb 2024 13:43:49 +0000 (14:43 +0100)]
tools: ynl: allow user to specify flag attr with bool values

The flag attr presence in Netlink message indicates value "true",
if it is missing in the message it means "false".

Allow user to specify attrname with value "true"/"false"
in json for flag attrs, treat "false" value properly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240222134351.224704-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'net-staging-don-t-bother-filling-in-ethtool-driver-version'
Jakub Kicinski [Sat, 24 Feb 2024 02:04:15 +0000 (18:04 -0800)]
Merge branch 'net-staging-don-t-bother-filling-in-ethtool-driver-version'

John Garry says:

====================
net: Don't bother filling in ethtool driver version

The drivers included in this series set the ethtool driver version to the
same as the default, UTS_RELEASE, so don't both doing this.

As noted by Masahiro in [0], with CONFIG_MODVERSIONS=y, some drivers could
be built as modules against a different kernel tree with differing
UTS_RELEASE. As such, these changes could lead to a change in behaviour.
However, defaulting to the core kernel UTS_RELEASE would be expected
behaviour.

[0] https://lore.kernel.org/all/CAK7LNASfTW+OMk1cJJWb4E6P+=k0FEsm_=6FDfDF_mTrxJCSMQ@mail.gmail.com/
====================

Link: https://lore.kernel.org/r/20240222090042.12609-1-john.g.garry@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: team: Don't bother filling in ethtool driver version
John Garry [Thu, 22 Feb 2024 09:00:41 +0000 (09:00 +0000)]
net: team: Don't bother filling in ethtool driver version

The version is same as the default, so don't bother filling it in.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240222090042.12609-3-john.g.garry@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agorocker: Don't bother filling in ethtool driver version
John Garry [Thu, 22 Feb 2024 09:00:40 +0000 (09:00 +0000)]
rocker: Don't bother filling in ethtool driver version

The version is same as the default, so don't bother filling it in.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240222090042.12609-2-john.g.garry@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agops3/gelic: minor Kernel Doc corrections
Simon Horman [Wed, 21 Feb 2024 17:46:21 +0000 (17:46 +0000)]
ps3/gelic: minor Kernel Doc corrections

* Update the Kernel Doc for gelic_descr_set_tx_cmdstat()
  and gelic_net_setup_netdev() so that documented name
  and the actual name of the function match.

* Move define of GELIC_ALIGN() so that it is no longer
  between gelic_alloc_card_net() and it's Kernel Doc.

* Document netdev parameter of gelic_alloc_card_net()
  in a way consistent to the documentation of other netdev parameters
  in this file.

Addresses the following warnings flagged by ./scripts/kernel-doc -none:

  .../ps3_gelic_net.c:711: warning: expecting prototype for gelic_net_set_txdescr_cmdstat(). Prototype was for gelic_descr_set_tx_cmdstat() instead
  .../ps3_gelic_net.c:1474: warning: expecting prototype for gelic_ether_setup_netdev(). Prototype was for gelic_net_setup_netdev() instead
  .../ps3_gelic_net.c:1528: warning: expecting prototype for gelic_alloc_card_net(). Prototype was for GELIC_ALIGN() instead
  .../ps3_gelic_net.c:1531: warning: Function parameter or struct member 'netdev' not described in 'gelic_alloc_card_net'

Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Geoff Levand <geoff@infradead.org>
Link: https://lore.kernel.org/r/20240221-ps3-gelic-kdoc-v1-1-7629216d1340@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: mpls: error out if inner headers are not set
Florian Westphal [Thu, 22 Feb 2024 14:03:10 +0000 (15:03 +0100)]
net: mpls: error out if inner headers are not set

mpls_gso_segment() assumes skb_inner_network_header() returns
a valid result:

  mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
  if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN))
        goto out;
  if (unlikely(!pskb_may_pull(skb, mpls_hlen)))

With syzbot reproducer, skb_inner_network_header() yields 0,
skb_network_header() returns 108, so this will
"pskb_may_pull(skb, -108)))" which triggers a newly added
DEBUG_NET_WARN_ON_ONCE() check:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull_reason include/linux/skbuff.h:2723 [inline]
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull include/linux/skbuff.h:2739 [inline]
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34
[..]
 skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53
 nsh_gso_segment+0x40a/0xad0 net/nsh/nsh.c:108
 skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53
 __skb_gso_segment+0x324/0x4c0 net/core/gso.c:124
 skb_gso_segment include/net/gso.h:83 [inline]
 [..]
 sch_direct_xmit+0x11a/0x5f0 net/sched/sch_generic.c:327
 [..]
 packet_sendmsg+0x46a9/0x6130 net/packet/af_packet.c:3113
 [..]

First iteration of this patch made mpls_hlen signed and changed
test to error out to "mpls_hlen <= 0 || ..".

Eric Dumazet said:
 > I was thinking about adding a debug check in skb_inner_network_header()
 > if inner_network_header is zero (that would mean it is not 'set' yet),
 > but this would trigger even after your patch.

So add new skb_inner_network_header_was_set() helper and use that.

The syzbot reproducer injects data via packet socket. The skb that gets
allocated and passed down the stack has ->protocol set to NSH (0x894f)
and gso_type set to SKB_GSO_UDP | SKB_GSO_DODGY.

This gets passed to skb_mac_gso_segment(), which sees NSH as ptype to
find a callback for.  nsh_gso_segment() retrieves next type:

        proto = tun_p_to_eth_p(nsh_hdr(skb)->np);

... which is MPLS (TUN_P_MPLS_UC). It updates skb->protocol and then
calls mpls_gso_segment().  Inner offsets are all 0, so mpls_gso_segment()
ends up with a negative header size.

In case more callers rely on silent handling of such large may_pull values
we could also 'legalize' this behaviour, either replacing the debug check
with (len > INT_MAX) test or removing it and instead adding a comment
before existing

 if (unlikely(len > skb->len))
    return SKB_DROP_REASON_PKT_TOO_SMALL;

test in pskb_may_pull_reason(), saying that this check also implicitly
takes care of callers that miscompute header sizes.

Cc: Simon Horman <horms@kernel.org>
Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers")
Reported-by: syzbot+99d15fcdb0132a1e1a82@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/00000000000043b1310611e388aa@google.com/raw
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240222140321.14080-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: ethtool: avoid rebuilds on UTS_RELEASE change
Jann Horn [Tue, 20 Feb 2024 19:42:44 +0000 (20:42 +0100)]
net: ethtool: avoid rebuilds on UTS_RELEASE change

Currently, when you switch between branches or something like that and
rebuild, net/ethtool/ioctl.c has to be built again because it depends
on UTS_RELEASE.

By instead referencing a string variable stored in another object file,
this can be avoided.

Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240220194244.2056384-1-jannh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: stmmac: dwmac-qcom-ethqos: Add support for 2.5G SGMII
Sneh Shah [Tue, 20 Feb 2024 05:07:35 +0000 (10:37 +0530)]
net: stmmac: dwmac-qcom-ethqos: Add support for 2.5G SGMII

Serdes phy needs to operate at 2500 mode for 2.5G speed and 1000
mode for 1G/100M/10M speed.
Added changes to configure serdes phy and mac based on link speed.
Changing serdes phy speed involves multiple register writes for
serdes block. To avoid redundant write operations only update serdes
phy when new speed is different.
For 2500 speed MAC PCS autoneg needs to disabled. Added changes to
disable MAC PCS autoneg if ANE parameter is not set.

Signed-off-by: Sneh Shah <quic_snehshah@quicinc.com>
Tested-by: Abhishek Chauhan <quic_abchauha@quicinc.com> # sa8775p-ride
Reviewed-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agobonding: rate-limit bonding driver inspect messages
Praveen Kumar Kannoju [Wed, 21 Feb 2024 08:27:52 +0000 (13:57 +0530)]
bonding: rate-limit bonding driver inspect messages

Through the routine bond_mii_monitor(), bonding driver inspects and commits
the slave state changes. During the times when slave state change and
failure in aqcuiring rtnl lock happen at the same time, the routine
bond_mii_monitor() reschedules itself to come around after 1 msec to commit
the new state.

During this, it executes the routine bond_miimon_inspect() to re-inspect
the state chane and prints the corresponding slave state on to the console.
Hence we do see a message at every 1 msec till the rtnl lock is acquired
and state chage is committed.

This patch doesn't change how bond functions. It only simply limits this
kind of log flood.

Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240221082752.4660-1-praveen.kannoju@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfi...
Jakub Kicinski [Fri, 23 Feb 2024 03:06:20 +0000 (19:06 -0800)]
Merge tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

1. Prefer KMEM_CACHE() macro to create kmem caches, from Kunwu Chan.

Patches 2 and 3 consolidate nf_log NULL checks and introduces
extra boundary checks on family and type to make it clear that no out
of bounds access will happen.  No in-tree user currently passes such
values, but thats not clear from looking at the function.
From Pablo Neira Ayuso.

Patch 4, also from Pablo, gets rid of unneeded conditional in
nft_osf init function.

Patch 5, from myself, fixes erroneous Kconfig dependencies that
came in an earlier net-next pull request. This should get rid
of the xtables related build failure reports.

Patches 6 to 10 are an update to nftables' concatenated-ranges
set type to speed up element insertions.  This series also
compacts a few data structures and cleans up a few oddities such
as reliance on ZERO_SIZE_PTR when asking to allocate a set with
no elements. From myself.

Patches 11 moves the nf_reinject function from the netfilter core
(vmlinux) into the nfnetlink_queue backend, the only location where
this is called from. Also from myself.

Patch 12, from Kees Cook, switches xtables' compat layer to use
unsafe_memcpy because xt_entry_target cannot easily get converted
to a real flexible array (its UAPI and used inside other structs).

* tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: x_tables: Use unsafe_memcpy() for 0-sized destination
  netfilter: move nf_reinject into nfnetlink_queue modules
  netfilter: nft_set_pipapo: use GFP_KERNEL for insertions
  netfilter: nft_set_pipapo: speed up bulk element insertions
  netfilter: nft_set_pipapo: shrink data structures
  netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR
  netfilter: nft_set_pipapo: constify lookup fn args where possible
  netfilter: xtables: fix up kconfig dependencies
  netfilter: nft_osf: simplify init path
  netfilter: nf_log: validate nf_logger_find_get()
  netfilter: nf_log: consolidate check for NULL logger in lookup function
  netfilter: expect: Simplify the allocation of slab caches in nf_conntrack_expect_init
====================

Link: https://lore.kernel.org/r/20240221112637.5396-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoipv6/sit: Do not allocate stats in the driver
Breno Leitao [Wed, 21 Feb 2024 16:17:32 +0000 (08:17 -0800)]
ipv6/sit: Do not allocate stats in the driver

With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the ipv6/sit driver and leverage the network
core allocation.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240221161732.3026127-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoocteon_ep_vf: Improve help text grammar
Geert Uytterhoeven [Wed, 21 Feb 2024 10:52:41 +0000 (11:52 +0100)]
octeon_ep_vf: Improve help text grammar

Add missing articles.
Fix plural vs. singular.
Fix present vs. future.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sathesh B Edara <sedara@marvell.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/b3b97462c3d9eba2ec03dd6d597e63bf49a7365a.1708512706.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: microchip: lan743x: Fix spelling mistake "erro" -> "error"
Colin Ian King [Tue, 20 Feb 2024 09:17:37 +0000 (09:17 +0000)]
net: microchip: lan743x: Fix spelling mistake "erro" -> "error"

There is a spelling mistake in a netif_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220091737.2676984-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet/af_iucv: fix virtual vs physical address confusion
Alexander Gordeev [Thu, 15 Feb 2024 08:05:00 +0000 (09:05 +0100)]
net/af_iucv: fix virtual vs physical address confusion

Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240215080500.2616848-1-agordeev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 22 Feb 2024 23:24:56 +0000 (15:24 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/udp.c
  f796feabb9f5 ("udp: add local "peek offset enabled" flag")
  56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)")

Adjacent changes:

net/unix/garbage.c
  aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.")
  11498715f266 ("af_unix: Remove io_uring code for GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Thu, 22 Feb 2024 23:11:18 +0000 (15:11 -0800)]
Merge tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The third "new features" pull request for v6.9. This is a quick
followup to send commit 04edb5dc68f4 ("wifi: ath12k: Fix uninitialized
use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning
introduced in the previous pull request.

We also have support for QCA2066 in ath11k, several new features in
ath12k and few other changes in drivers. In stack it's mostly cleanup
and refactoring.

Major changes:

ath12k
 * firmware-2.bin support
 * support having multiple identical PCI devices (firmware needs to
   have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
 * QCN9274: support split-PHY devices
 * WCN7850: enable Power Save Mode in station mode
 * WCN7850: P2P support

ath11k:
 * QCA6390 & WCN6855: support 2 concurrent station interfaces
 * QCA2066 support

iwlwifi
 * mvm: support wider-bandwidth OFDMA
 * bump firmware API to 90 for BZ/SC devices

brcmfmac
 * DMI nvram filename quirk for ACEPC W5 Pro

* tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (75 commits)
  wifi: wilc1000: revert reset line logic flip
  wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro
  wifi: rtlwifi: set initial values for unexpected cases of USB endpoint priority
  wifi: rtl8xxxu: check vif before using in rtl8xxxu_tx()
  wifi: rtlwifi: rtl8192cu: Fix TX aggregation
  wifi: wilc1000: remove AKM suite be32 conversion for external auth request
  wifi: nl80211: refactor parsing CSA offsets
  wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH
  wifi: iwlwifi: load b0 version of ucode for HR1/HR2
  wifi: iwlwifi: handle per-phy statistics from fw
  wifi: iwlwifi: iwl-fh.h: fix kernel-doc issues
  wifi: iwlwifi: api: fix kernel-doc reference
  wifi: iwlwifi: mvm: unlock mvm if there is no primary link
  wifi: iwlwifi: bump FW API to 90 for BZ/SC devices
  wifi: iwlwifi: mvm: support PHY context version 6
  wifi: iwlwifi: mvm: partially support PHY context version 6
  wifi: iwlwifi: mvm: support wider-bandwidth OFDMA
  wifi: cfg80211: use ML element parsing helpers
  wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt()
  wifi: cfg80211: refactor RNR parsing
  ...
====================

Link: https://lore.kernel.org/r/20240222105205.CEC54C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 22 Feb 2024 17:57:58 +0000 (09:57 -0800)]
Merge tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - af_unix: fix another unix GC hangup

  Previous releases - regressions:

   - core: fix a possible AF_UNIX deadlock

   - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()

   - netfilter: nft_flow_offload: release dst in case direct xmit path
     is used

   - bridge: switchdev: ensure MDB events are delivered exactly once

   - l2tp: pass correct message length to ip6_append_data

   - dccp/tcp: unhash sk from ehash for tb2 alloc failure after
     check_estalblished()

   - tls: fixes for record type handling with PEEK

   - devlink: fix possible use-after-free and memory leaks in
     devlink_init()

  Previous releases - always broken:

   - bpf: fix an oops when attempting to read the vsyscall page through
     bpf_probe_read_kernel

   - sched: act_mirred: use the backlog for mirred ingress

   - netfilter: nft_flow_offload: fix dst refcount underflow

   - ipv6: sr: fix possible use-after-free and null-ptr-deref

   - mptcp: fix several data races

   - phonet: take correct lock to peek at the RX queue

  Misc:

   - handful of fixes and reliability improvements for selftests"

* tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  l2tp: pass correct message length to ip6_append_data
  net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
  selftests: ioam: refactoring to align with the fix
  Fix write to cloned skb in ipv6_hop_ioam()
  phonet/pep: fix racy skb_queue_empty() use
  phonet: take correct lock to peek at the RX queue
  net: sparx5: Add spinlock for frame transmission from CPU
  net/sched: flower: Add lock protection when remove filter handle
  devlink: fix port dump cmd type
  net: stmmac: Fix EST offset for dwmac 5.10
  tools: ynl: don't leak mcast_groups on init error
  tools: ynl: make sure we always pass yarg to mnl_cb_run
  net: mctp: put sock on tag allocation failure
  netfilter: nf_tables: use kzalloc for hook allocation
  netfilter: nf_tables: register hooks last when adding new chain/flowtable
  netfilter: nft_flow_offload: release dst in case direct xmit path is used
  netfilter: nft_flow_offload: reset dst in route object after setting up flow
  netfilter: nf_tables: set dormant flag on hook register failure
  selftests: tls: add test for peeking past a record of a different type
  selftests: tls: add test for merging of same-type control messages
  ...

20 months agoMerge tag 'trace-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Thu, 22 Feb 2024 17:23:22 +0000 (09:23 -0800)]
Merge tag 'trace-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

 - While working on the ring buffer I noticed that the counter used for
   knowing where the end of the data is on a sub-buffer was not a full
   "int" but just 20 bits. It was masked out to 0xfffff.

   With the new code that allows the user to change the size of the
   sub-buffer, it is theoretically possible to ask for a size bigger
   than 2^20. If that happens, unexpected results may occur as there's
   no code checking if the counter overflowed the 20 bits of the write
   mask. There are other checks to make sure events fit in the
   sub-buffer, but if the sub-buffer itself is too big, that is not
   checked.

   Add a check in the resize of the sub-buffer to make sure that it
   never goes beyond the size of the counter that holds how much data is
   on it.

* tag 'trace-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not let subbuf be bigger than write mask

20 months agoMerge branch 'bnxt_en-ntuple-filter-improvements'
Paolo Abeni [Thu, 22 Feb 2024 14:31:25 +0000 (15:31 +0100)]
Merge branch 'bnxt_en-ntuple-filter-improvements'

Michael Chan says:

====================
bnxt_en: Ntuple filter improvements

The current Ntuple filter implementation has a limitation on 5750X (P5)
and newer chips.  The destination ring of the ntuple filter must be
a valid ring in the RSS indirection table.  Ntuple filters may not work
if the RSS indirection table is modified by the user to only contain a
subset of the rings.  If an ntuple filter is set to a ring destination
that is not in the RSS indirection table, the packet matching that
filter will be placed in a random ring instead of the specified
destination ring.

This series of patches will fix the problem by using a separate VNIC
for ntuple filters.  The default VNIC will be dedicated for RSS and
so the indirection table can be setup in any way and will not affect
ntuple filters using the separate VNIC.

Quite a bit of refactoring is needed to do the the VNIC and RSS
context accounting in the first few patches.  This is technically a
bug fix, but I think the changes are too big for -net.
====================

Link: https://lore.kernel.org/r/20240220230317.96341-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Use the new VNIC to create ntuple filters
Pavan Chebbi [Tue, 20 Feb 2024 23:03:17 +0000 (15:03 -0800)]
bnxt_en: Use the new VNIC to create ntuple filters

The newly created vnic (BNXT_VNIC_NTUPLE) is ready to be used to create
ntuple filters when supported by firmware.  All RX rings can be used
regardless of the RSS indirection setting on the default VNIC.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Create and setup the additional VNIC for adding ntuple filters
Pavan Chebbi [Tue, 20 Feb 2024 23:03:16 +0000 (15:03 -0800)]
bnxt_en: Create and setup the additional VNIC for adding ntuple filters

Allocate and setup the additional VNIC for ntuple filters if this
new method is supported by the firmware.  Even though this VNIC is
only used for ntuple filters with direct ring destinations, we still
setup the RSS hash to be identical to the default VNIC so that each
RX packet will have the correct hash in the RX completion.  This
VNIC is always at VNIC index BNXT_VNIC_NTUPLE.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Provision for an additional VNIC for ntuple filters
Pavan Chebbi [Tue, 20 Feb 2024 23:03:15 +0000 (15:03 -0800)]
bnxt_en: Provision for an additional VNIC for ntuple filters

On newer chips that support the ring table index method for
ntuple filters, the current scheme of using the same VNIC for
both RSS and ntuple filters will not work in all cases.  An
ntuple filter can only be directed to a destination ring if
that destination ring is also in the RSS indirection table.

To support ntuple filters with any arbitratry RSS indirection
table that may only include a subset of the rings, we need to
use a separate VNIC for ntuple filters.

This patch provisions the additional VNIC.  The next patch will
allocate additional VNIC from firmware and set it up.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index
Pavan Chebbi [Tue, 20 Feb 2024 23:03:14 +0000 (15:03 -0800)]
bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index

Replace hard coded 0 index with more meaningful BNXT_VNIC_DEFAULT.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Refactor bnxt_set_features()
Pavan Chebbi [Tue, 20 Feb 2024 23:03:13 +0000 (15:03 -0800)]
bnxt_en: Refactor bnxt_set_features()

Refactor bnxt_set_features() function to have a common
function to re-init.  We'll need this to reinitialize when
ntuple configuration changes.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Add bnxt_get_total_vnics() to calculate number of VNICs
Venkat Duvvuru [Tue, 20 Feb 2024 23:03:12 +0000 (15:03 -0800)]
bnxt_en: Add bnxt_get_total_vnics() to calculate number of VNICs

Refactor the code by adding a new function to calculate the number of
required VNICs.  This is used in multiple places when reserving or
checking resources.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Check additional resources in bnxt_check_rings()
Michael Chan [Tue, 20 Feb 2024 23:03:11 +0000 (15:03 -0800)]
bnxt_en: Check additional resources in bnxt_check_rings()

bnxt_check_rings() is called to check if we have enough resource
assets to satisfy the new number of ethtool channels.  If the asset
test fails, the ethtool operation will fail gracefully.  Otherwise
we will proceed and commit to use the new number of channels.  If it
fails to allocate any resources, the chip will fail to come up.

For completeness, check all possible resources before committing to
the new settings.  Add the missing ring group and RSS context asset
tests in bnxt_check_rings().

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Improve RSS context reservation infrastructure
Pavan Chebbi [Tue, 20 Feb 2024 23:03:10 +0000 (15:03 -0800)]
bnxt_en: Improve RSS context reservation infrastructure

Add RSS context fields to struct bnxt_hw_rings and struct bnxt_hw_resc.
With these, we can now specific the exact number of RSS contexts to
reserve and store the reserved value.  The original code relies on
other resources to infer the number of RSS contexts to reserve and the
reserved value is not stored.  This improved infrastructure will make
the RSS context accounting more complete and is needed by later
patches.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Explicitly specify P5 completion rings to reserve
Michael Chan [Tue, 20 Feb 2024 23:03:09 +0000 (15:03 -0800)]
bnxt_en: Explicitly specify P5 completion rings to reserve

The current code assumes that every RX ring group and every TX ring
requires a completion ring on P5_PLUS chips.  Now that we have the
bnxt_hw_rings structure, add the cp_p5 field so that it can
be explicitly specified.  This makes the logic more clear.

Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agobnxt_en: Refactor ring reservation functions
Michael Chan [Tue, 20 Feb 2024 23:03:08 +0000 (15:03 -0800)]
bnxt_en: Refactor ring reservation functions

The current functions to reserve hardware rings pass in 6 different ring
or resource types as parameters.  Add a structure bnxt_hw_rings to
consolidate all these parameters and pass the structure pointer instead
to these functions.  Add 2 related helper functions also.  This makes
the code cleaner and makes it easier to add new resources to be
reserved.

Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge branch 'mctp-core-protocol-updates-minor-fixes-tests'
Paolo Abeni [Thu, 22 Feb 2024 12:32:57 +0000 (13:32 +0100)]
Merge branch 'mctp-core-protocol-updates-minor-fixes-tests'

Jeremy Kerr says:

====================
MCTP core protocol updates, minor fixes & tests

This series implements some procotol improvements for AF_MCTP,
particularly for systems with multiple MCTP networks defined. For those,
we need to add the network ID to the tag lookups, which then suggests an
updated version of the tag allocate / drop ioctl to allow the net ID to
be specified there too.

The ioctl change affects uabi, so might warrant some extra attention.

There are also a couple of new kunit tests for multiple-net
configurations.

We have a fix for populating the flow data when fragmenting, and a
testcase for that too.

Of course, any queries/comments/etc., please let me know!
====================

Link: https://lore.kernel.org/r/cover.1708335994.git.jk@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: tests: Add a test for proper tag creation on local output
Jeremy Kerr [Mon, 19 Feb 2024 09:51:56 +0000 (17:51 +0800)]
net: mctp: tests: Add a test for proper tag creation on local output

Ensure we have the correct key parameters on sending a message.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: tests: Test that outgoing skbs have flow data populated
Jeremy Kerr [Mon, 19 Feb 2024 09:51:55 +0000 (17:51 +0800)]
net: mctp: tests: Test that outgoing skbs have flow data populated

When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their
SKB_EXT_MCTP extension set for drivers to consume.

Add two tests for local-to-output routing that check for the flow
extensions: one for the simple single-packet case, and one for
fragmentation.

We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of
these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise)
disabled, but that would need manual config tweaking.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: copy skb ext data when fragmenting
Jeremy Kerr [Mon, 19 Feb 2024 09:51:54 +0000 (17:51 +0800)]
net: mctp: copy skb ext data when fragmenting

If we're fragmenting on local output, the original packet may contain
ext data for the MCTP flows. We'll want this in the resulting fragment
skbs too.

So, do a skb_ext_copy() in the fragmentation path, and implement the
MCTP-specific parts of an ext copy operation.

Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers")
Reported-by: Jian Zhang <zhangjian.3032@bytedance.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: tests: Add MCTP net isolation tests
Jeremy Kerr [Mon, 19 Feb 2024 09:51:53 +0000 (17:51 +0800)]
net: mctp: tests: Add MCTP net isolation tests

Add a couple of tests that excersise the new net-specific sk_key and
bind lookups

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: tests: Add netid argument to __mctp_route_test_init
Jeremy Kerr [Mon, 19 Feb 2024 09:51:52 +0000 (17:51 +0800)]
net: mctp: tests: Add netid argument to __mctp_route_test_init

We'll want to create net-specific test setups in an upcoming change, so
allow the caller to provide a non-default netid.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: provide a more specific tag allocation ioctl
Jeremy Kerr [Mon, 19 Feb 2024 09:51:51 +0000 (17:51 +0800)]
net: mctp: provide a more specific tag allocation ioctl

Now that we have net-specific tags, extend the tag allocation ioctls
(SIOCMCTPALLOCTAG / SIOCMCTPDROPTAG) to allow a network parameter to be
passed to the tag allocation.

We also add a local_addr member to the ioc struct, to allow for a future
finer-grained tag allocation using local EIDs too. We don't add any
specific support for that now though, so require MCTP_ADDR_ANY or
MCTP_ADDR_NULL for those at present.

The old ioctls will still work, but allocate for the default MCTP net.
These are now marked as deprecated in the header.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: separate key correlation across nets
Jeremy Kerr [Mon, 19 Feb 2024 09:51:50 +0000 (17:51 +0800)]
net: mctp: separate key correlation across nets

Currently, we lookup sk_keys from the entire struct net_namespace, which
may contain multiple MCTP net IDs. In those cases we want to distinguish
between endpoints with the same EID but different net ID.

Add the net ID data to the struct mctp_sk_key, populate on add and
filter on this during route lookup.

For the ioctl interface, we use a default net of
MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net
configurations), but we'll extend the ioctl interface to provide
net-specific tag allocation in an upcoming change.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: tests: create test skbs with the correct net and device
Jeremy Kerr [Mon, 19 Feb 2024 09:51:49 +0000 (17:51 +0800)]
net: mctp: tests: create test skbs with the correct net and device

In our test skb creation functions, we're not setting up the net and
device data. This doesn't matter at the moment, but we will want to add
support for distinct net IDs in future.

Set the ->net identifier on the test MCTP device, and ensure that test
skbs are set up with the correct device-related data on creation. Create
a helper for setting skb->dev and mctp_skb_cb->net.

We have a few cases where we're calling __mctp_cb() to initialise the cb
(which we need for the above) separately, so integrate this into the skb
creation helpers.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: make key lookups match the ANY address on either local or peer
Jeremy Kerr [Mon, 19 Feb 2024 09:51:48 +0000 (17:51 +0800)]
net: mctp: make key lookups match the ANY address on either local or peer

We may have an ANY address in either the local or peer address of a
sk_key, and may want to match on an incoming daddr or saddr being ANY.

Do this by altering the conflicting-tag lookup to also accept ANY as
the local/peer address.

We don't want mctp_address_matches to match on the requested EID being
ANY, as that is a specific lookup case on packet input.

Reported-by: Eric Chuang <echuang@google.com>
Reported-by: Anthony <anthonyhkf@google.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: Add some detail on the key allocation implementation
Jeremy Kerr [Mon, 19 Feb 2024 09:51:47 +0000 (17:51 +0800)]
net: mctp: Add some detail on the key allocation implementation

We could do with a little more comment on where MCTP_ADDR_ANY will match
in the key allocations.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: mctp: avoid confusion over local/peer dest/source addresses
Jeremy Kerr [Mon, 19 Feb 2024 09:51:46 +0000 (17:51 +0800)]
net: mctp: avoid confusion over local/peer dest/source addresses

We have a double-swap of local and peer addresses in
mctp_alloc_local_tag; the arguments in both call sites are swapped, but
there is also a swap in the implementation of alloc_local_tag. This is
opaque because we're using source/dest address references, which don't
match the local/peer semantics.

Avoid this confusion by naming the arguments as 'local' and 'peer', and
remove the double swap. The calling order now matches mctp_key_alloc.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge tag 'ath-next-20240222' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath
Kalle Valo [Thu, 22 Feb 2024 10:41:45 +0000 (12:41 +0200)]
Merge tag 'ath-next-20240222' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath

ath.git patches for v6.9

We have support for QCA2066 now and also several new features in ath12k.

Major changes:

ath12k

* firmware-2.bin support

* support having multiple identical PCI devices (firmware needs to
  have ATH12K_FW_FEATURE_MULTI_QRTR_ID)

* QCN9274: support split-PHY devices

* WCN7850: enable Power Save Mode in station mode

* WCN7850: P2P support

ath11k:

* QCA6390 & WCN6855: support 2 concurrent station interfaces

* QCA2066 support

20 months agol2tp: pass correct message length to ip6_append_data
Tom Parkin [Tue, 20 Feb 2024 12:21:56 +0000 (12:21 +0000)]
l2tp: pass correct message length to ip6_append_data

l2tp_ip6_sendmsg needs to avoid accounting for the transport header
twice when splicing more data into an already partially-occupied skbuff.

To manage this, we check whether the skbuff contains data using
skb_queue_empty when deciding how much data to append using
ip6_append_data.

However, the code which performed the calculation was incorrect:

     ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;

...due to C operator precedence, this ends up setting ulen to
transhdrlen for messages with a non-zero length, which results in
corrupted packets on the wire.

Add parentheses to correct the calculation in line with the original
intent.

Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()")
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 22 Feb 2024 09:20:50 +0000 (10:20 +0100)]
Merge tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) If user requests to wake up a table and hook fails, restore the
   dormant flag from the error path, from Florian Westphal.

2) Reset dst after transferring it to the flow object, otherwise dst
   gets released twice from the error path.

3) Release dst in case the flowtable selects a direct xmit path, eg.
   transmission to bridge port. Otherwise, dst is memleaked.

4) Register basechain and flowtable hooks at the end of the command.
   Error path releases these datastructure without waiting for the
   rcu grace period.

5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report
   on access to hook type, also from Florian Westphal.

netfilter pull request 24-02-22

* tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: use kzalloc for hook allocation
  netfilter: nf_tables: register hooks last when adding new chain/flowtable
  netfilter: nft_flow_offload: release dst in case direct xmit path is used
  netfilter: nft_flow_offload: reset dst in route object after setting up flow
  netfilter: nf_tables: set dormant flag on hook register failure
====================

Link: https://lore.kernel.org/r/20240222000843.146665-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Paolo Abeni [Thu, 22 Feb 2024 09:04:46 +0000 (10:04 +0100)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-02-22

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).

The main changes are:

1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
   page through bpf_probe_read_kernel and friends, from Hou Tao.

2) Fix a kernel panic due to uninitialized iter position pointer in
   bpf_iter_task, from Yafang Shao.

3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
   from Martin KaFai Lau.

4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
   due to incorrect truesize accounting, from Sebastian Andrzej Siewior.

5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
   from Shigeru Yoshida.

6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
   resolved, from Hari Bathini.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
  selftests/bpf: Add negtive test cases for task iter
  bpf: Fix an issue due to uninitialized bpf_iter_task
  selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  selftest/bpf: Test the read of vsyscall page under x86-64
  x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  bpf, scripts: Correct GPL license name
  xsk: Add truesize to skb_add_rx_frag().
  bpf: Fix warning for bpf_cpumask in verifier
====================

Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agonet: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
Siddharth Vadapalli [Tue, 20 Feb 2024 07:00:07 +0000 (12:30 +0530)]
net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY

Commit bb726b753f75 ("net: phy: realtek: add support for
RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing
support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY.

While that commit indicated that the RTL8211F_PHYCR2 register is not
supported by the "VD-CG" PHY model and therefore updated the corresponding
section in rtl8211f_config_init() to be invoked conditionally, the call to
"genphy_soft_reset()" was left as-is, when it should have also been invoked
conditionally. This is because the call to "genphy_soft_reset()" was first
introduced by the commit 0a4355c2b7f8 ("net: phy: realtek: add dt property
to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY
reset should be issued after setting bits in the PHYCR2 register.

As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the
rtl8211f_config_init() function by invoking "genphy_soft_reset()"
conditionally based on the presence of the "PHYCR2" register.

Fixes: bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220070007.968762-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoMerge branch 'ioam6-fix-write-to-cloned-skb-s'
Paolo Abeni [Thu, 22 Feb 2024 08:28:06 +0000 (09:28 +0100)]
Merge branch 'ioam6-fix-write-to-cloned-skb-s'

Justin Iurman says:

====================
ioam6: fix write to cloned skb's

Make sure the IOAM data insertion is not applied on cloned skb's. As a
consequence, ioam selftests needed a refactoring.
====================

Link: https://lore.kernel.org/r/20240219135255.15429-1-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoselftests: ioam: refactoring to align with the fix
Justin Iurman [Mon, 19 Feb 2024 13:52:55 +0000 (14:52 +0100)]
selftests: ioam: refactoring to align with the fix

ioam6_parser uses a packet socket. After the fix to prevent writing to
cloned skb's, the receiver does not see its IOAM data anymore, which
makes input/forward ioam-selftests to fail. As a workaround,
ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to
get hop-by-hop options. As a consequence, the hook is "after" the IOAM
data insertion by the receiver and all tests are working again.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoFix write to cloned skb in ipv6_hop_ioam()
Justin Iurman [Mon, 19 Feb 2024 13:52:54 +0000 (14:52 +0100)]
Fix write to cloned skb in ipv6_hop_ioam()

ioam6_fill_trace_data() writes inside the skb payload without ensuring
it's writeable (e.g., not cloned). This function is called both from the
input and output path. The output path (ioam6_iptunnel) already does the
check. This commit provides a fix for the input path, inside
ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network
header pointer ("nh") when returning from ipv6_hop_ioam().

Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agophonet/pep: fix racy skb_queue_empty() use
Rémi Denis-Courmont [Sun, 18 Feb 2024 08:12:14 +0000 (10:12 +0200)]
phonet/pep: fix racy skb_queue_empty() use

The receive queues are protected by their respective spin-lock, not
the socket lock. This could lead to skb_peek() unexpectedly
returning NULL or a pointer to an already dequeued socket buffer.

Fixes: 9641458d3ec4 ("Phonet: Pipe End Point for Phonet Pipes protocol")
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agophonet: take correct lock to peek at the RX queue
Rémi Denis-Courmont [Sun, 18 Feb 2024 08:12:13 +0000 (10:12 +0200)]
phonet: take correct lock to peek at the RX queue

The receive queue is protected by its embedded spin-lock, not the
socket lock, so we need the former lock here (and only that one).

Fixes: 107d0d9b8d9a ("Phonet: Phonet datagram transport protocol")
Reported-by: Luosili <rootlab@huawei.com>
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240218081214.4806-1-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
20 months agoPPPoL2TP: Add more code snippets
Samuel Thibault [Sat, 17 Feb 2024 21:14:25 +0000 (22:14 +0100)]
PPPoL2TP: Add more code snippets

The existing documentation was not telling that one has to create a PPP
channel and a PPP interface to get PPPoL2TP data offloading working.

Also, tunnel switching was not mentioned, so that people were thinking
it was not supported, while it actually is.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Tom Parkin <tparkin@katalix.com>
Link: https://lore.kernel.org/r/20240217211425.qj576u3jmaa6yidf@begin
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: sparx5: Add spinlock for frame transmission from CPU
Horatiu Vultur [Mon, 19 Feb 2024 08:00:43 +0000 (09:00 +0100)]
net: sparx5: Add spinlock for frame transmission from CPU

Both registers used when doing manual injection or fdma injection are
shared between all the net devices of the switch. It was noticed that
when having two process which each of them trying to inject frames on
different ethernet ports, that the HW started to behave strange, by
sending out more frames then expected. When doing fdma injection it is
required to set the frame in the DCB and then make sure that the next
pointer of the last DCB is invalid. But because there is no locks for
this, then easily this pointer between the DCB can be broken and then it
would create a loop of DCBs. And that means that the HW will
continuously transmit these frames in a loop. Until the SW will break
this loop.
Therefore to fix this issue, add a spin lock for when accessing the
registers for manual or fdma injection.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Fixes: f3cad2611a77 ("net: sparx5: add hostmode with phylink support")
Link: https://lore.kernel.org/r/20240219080043.1561014-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet/sched: flower: Add lock protection when remove filter handle
Jianbo Liu [Tue, 20 Feb 2024 08:59:28 +0000 (08:59 +0000)]
net/sched: flower: Add lock protection when remove filter handle

As IDR can't protect itself from the concurrent modification, place
idr_remove() under the protection of tp->lock.

Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240220085928.9161-1-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agodevlink: fix port dump cmd type
Jiri Pirko [Tue, 20 Feb 2024 07:52:45 +0000 (08:52 +0100)]
devlink: fix port dump cmd type

Unlike other commands, due to a c&p error, port dump fills-up cmd with
wrong value, different from port-get request cmd, port-get doit reply
and port notification.

Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.

Skimmed through devlink userspace implementations, none of them cares
about this cmd value. Only ynl, for which, this is actually a fix, as it
expects doit and dumpit ops rsp_value to be the same.

Omit the fixes tag, even thought this is fix, better to target this for
next release.

Fixes: bfcd3a466172 ("Introduce devlink infrastructure")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: stmmac: Fix EST offset for dwmac 5.10
Kurt Kanzenbach [Tue, 20 Feb 2024 08:22:46 +0000 (09:22 +0100)]
net: stmmac: Fix EST offset for dwmac 5.10

Fix EST offset for dwmac 5.10.

Currently configuring Qbv doesn't work as expected. The schedule is
configured, but never confirmed:

|[  128.250219] imx-dwmac 428a0000.ethernet eth1: configured EST

The reason seems to be the refactoring of the EST code which set the wrong
EST offset for the dwmac 5.10. After fixing this it works as before:

|[  106.359577] imx-dwmac 428a0000.ethernet eth1: configured EST
|[  128.430715] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched

Tested on imx93.

Fixes: c3f3b97238f6 ("net: stmmac: Refactor EST implementation")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20240220-stmmac_est-v1-1-c41f9ae2e7b7@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoudp: add local "peek offset enabled" flag
Paolo Abeni [Tue, 20 Feb 2024 11:00:01 +0000 (12:00 +0100)]
udp: add local "peek offset enabled" flag

We want to re-organize the struct sock layout. The sk_peek_off
field location is problematic, as most protocols want it in the
RX read area, while UDP wants it on a cacheline different from
sk_receive_queue.

Create a local (inside udp_sock) copy of the 'peek offset is enabled'
flag and place it inside the same cacheline of reader_queue.

Check such flag before reading sk_peek_off. This will save potential
false sharing and cache misses in the fast-path.

Tested under UDP flood with small packets. The struct sock layout
update causes a 4% performance drop, and this patch restores completely
the original tput.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'tools-ynl-fix-impossible-errors'
Jakub Kicinski [Thu, 22 Feb 2024 01:02:30 +0000 (17:02 -0800)]
Merge branch 'tools-ynl-fix-impossible-errors'

Jakub Kicinski says:

====================
tools: ynl: fix impossible errors

Fix bugs discovered while I was hacking in low level stuff in YNL
and kept breaking the socket, exercising the "impossible" error paths.

v1: https://lore.kernel.org/all/20240217001742.2466993-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20240220161112.2735195-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agotools: ynl: don't leak mcast_groups on init error
Jakub Kicinski [Tue, 20 Feb 2024 16:11:12 +0000 (08:11 -0800)]
tools: ynl: don't leak mcast_groups on init error

Make sure to free the already-parsed mcast_groups if
we don't get an ack from the kernel when reading family info.
This is part of the ynl_sock_create() error path, so we won't
get a call to ynl_sock_destroy() to free them later.

Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agotools: ynl: make sure we always pass yarg to mnl_cb_run
Jakub Kicinski [Tue, 20 Feb 2024 16:11:11 +0000 (08:11 -0800)]
tools: ynl: make sure we always pass yarg to mnl_cb_run

There is one common error handler in ynl - ynl_cb_error().
It expects priv to be a pointer to struct ynl_parse_arg AKA yarg.
To avoid potential crashes if we encounter a stray NLMSG_ERROR
always pass yarg as priv (or a struct which has it as the first
member).

ynl_cb_null() has a similar problem directly - it expects yarg
but priv passed by the caller is ys.

Found by code inspection.

Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: mctp: put sock on tag allocation failure
Jeremy Kerr [Thu, 15 Feb 2024 07:53:08 +0000 (15:53 +0800)]
net: mctp: put sock on tag allocation failure

We may hold an extra reference on a socket if a tag allocation fails: we
optimistically allocate the sk_key, and take a ref there, but do not
drop if we end up not using the allocated key.

Ensure we're dropping the sock on this failure by doing a proper unref
rather than directly kfree()ing.

Fixes: de8a6b15d965 ("net: mctp: add an explicit reference from a mctp_sk_key to sock")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ce9b61e44d1cdae7797be0c5e3141baf582d23a0.1707983487.git.jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonetfilter: nf_tables: use kzalloc for hook allocation
Florian Westphal [Wed, 21 Feb 2024 17:38:45 +0000 (18:38 +0100)]
netfilter: nf_tables: use kzalloc for hook allocation

KMSAN reports unitialized variable when registering the hook,
   reg->hook_ops_type == NF_HOOK_OP_BPF)
        ~~~~~~~~~~~ undefined

This is a small structure, just use kzalloc to make sure this
won't happen again when new fields get added to nf_hook_ops.

Fixes: 7b4b2fa37587 ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agonetfilter: nf_tables: register hooks last when adding new chain/flowtable
Pablo Neira Ayuso [Mon, 19 Feb 2024 18:43:53 +0000 (19:43 +0100)]
netfilter: nf_tables: register hooks last when adding new chain/flowtable

Register hooks last when adding chain/flowtable to ensure that packets do
not walk over datastructure that is being released in the error path
without waiting for the rcu grace period.

Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain")
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agonetfilter: nft_flow_offload: release dst in case direct xmit path is used
Pablo Neira Ayuso [Tue, 20 Feb 2024 20:36:39 +0000 (21:36 +0100)]
netfilter: nft_flow_offload: release dst in case direct xmit path is used

Direct xmit does not use it since it calls dev_queue_xmit() to send
packets, hence it calls dst_release().

kmemleak reports:

unreferenced object 0xffff88814f440900 (size 184):
  comm "softirq", pid 0, jiffies 4294951896
  hex dump (first 32 bytes):
    00 60 5b 04 81 88 ff ff 00 e6 e8 82 ff ff ff ff  .`[.............
    21 0b 50 82 ff ff ff ff 00 00 00 00 00 00 00 00  !.P.............
  backtrace (crc cb2bf5d6):
    [<000000003ee17107>] kmem_cache_alloc+0x286/0x340
    [<0000000021a5de2c>] dst_alloc+0x43/0xb0
    [<00000000f0671159>] rt_dst_alloc+0x2e/0x190
    [<00000000fe5092c9>] __mkroute_output+0x244/0x980
    [<000000005fb96fb0>] ip_route_output_flow+0xc0/0x160
    [<0000000045367433>] nf_ip_route+0xf/0x30
    [<0000000085da1d8e>] nf_route+0x2d/0x60
    [<00000000d1ecd1cb>] nft_flow_route+0x171/0x6a0 [nft_flow_offload]
    [<00000000d9b2fb60>] nft_flow_offload_eval+0x4e8/0x700 [nft_flow_offload]
    [<000000009f447dbb>] expr_call_ops_eval+0x53/0x330 [nf_tables]
    [<00000000072e1be6>] nft_do_chain+0x17c/0x840 [nf_tables]
    [<00000000d0551029>] nft_do_chain_inet+0xa1/0x210 [nf_tables]
    [<0000000097c9d5c6>] nf_hook_slow+0x5b/0x160
    [<0000000005eccab1>] ip_forward+0x8b6/0x9b0
    [<00000000553a269b>] ip_rcv+0x221/0x230
    [<00000000412872e5>] __netif_receive_skb_one_core+0xfe/0x110

Fixes: fa502c865666 ("netfilter: flowtable: simplify route logic")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agonetfilter: nft_flow_offload: reset dst in route object after setting up flow
Pablo Neira Ayuso [Wed, 21 Feb 2024 11:32:58 +0000 (12:32 +0100)]
netfilter: nft_flow_offload: reset dst in route object after setting up flow

dst is transferred to the flow object, route object does not own it
anymore.  Reset dst in route object, otherwise if flow_offload_add()
fails, error path releases dst twice, leading to a refcount underflow.

Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agonetfilter: nf_tables: set dormant flag on hook register failure
Florian Westphal [Mon, 19 Feb 2024 15:58:04 +0000 (16:58 +0100)]
netfilter: nf_tables: set dormant flag on hook register failure

We need to set the dormant flag again if we fail to register
the hooks.

During memory pressure hook registration can fail and we end up
with a table marked as active but no registered hooks.

On table/base chain deletion, nf_tables will attempt to unregister
the hook again which yields a warn splat from the nftables core.

Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months agoMerge branch 'net-phy-marvell-88q2xxx-add-driver-for-the-marvell-88q2220-phy'
Jakub Kicinski [Wed, 21 Feb 2024 22:57:03 +0000 (14:57 -0800)]
Merge branch 'net-phy-marvell-88q2xxx-add-driver-for-the-marvell-88q2220-phy'

Dimitri Fedrau says:

====================
net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHY
====================

Link: https://lore.kernel.org/r/20240218075753.18067-1-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: move interrupt configuration
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:51 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: move interrupt configuration

Move interrupt configuration from mv88q222x_revb0_config_init to
mv88q2xxx_config_init. Same register and bits are used for the 88q2xxx
devices.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-15-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: remove duplicated assignment of pma_extable
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:50 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: remove duplicated assignment of pma_extable

Remove assignment of phydev->pma_extable in mv88q222x_revb0_config_init.
It is already done in mv88q2xxx_config_init, just call
mv88q2xxx_config_init.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-14-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: cleanup mv88q2xxx_config_init
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:49 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: cleanup mv88q2xxx_config_init

mv88q2xxx_config_init calls genphy_c45_read_pma which is done by
mv88q2xxx_read_status, it calls also mv88q2xxx_config_aneg which is
also called by the PHY state machine. Let the PHY state machine handle
the phydriver ops in their intendend way.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Stefan Eichenberger <eichest@gmail.com>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-13-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: switch to mv88q2xxx_config_aneg
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:48 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: switch to mv88q2xxx_config_aneg

Switch to mv88q2xxx_config_aneg for Marvell 88Q2220 devices and remove
the mv88q222x_config_aneg function which is basically a copy of the
mv88q2xxx_config_aneg function.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-12-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: make mv88q2xxx_config_aneg generic
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:47 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: make mv88q2xxx_config_aneg generic

Marvell 88Q2xxx devices follow the same scheme, after configuration they
need a soft reset. Soft resets differ between devices, so we use the
.soft_reset callback instead of creating .config_aneg callbacks for each
device.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-11-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: add cable test support
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:46 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: add cable test support

Add cable test support for Marvell 88Q222x devices. Reported distance
granularity is 1m.

1m cable, open:
  Cable test started for device eth0.
  Cable test completed for device eth0.
  Pair A code Open Circuit
  Pair A, fault length: 1.00m

1m cable, shorted:
  Cable test started for device eth0.
  Cable test completed for device eth0.
  Pair A code Short within Pair
  Pair A, fault length: 1.00m

6m cable, open:
  Cable test started for device eth0.
  Cable test completed for device eth0.
  Pair A code Open Circuit
  Pair A, fault length: 6.00m

6m cable, shorted:
  Cable test started for device eth0.
  Cable test completed for device eth0.
  Pair A code Short within Pair
  Pair A, fault length: 6.00m

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240218075753.18067-10-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: add support for temperature sensor
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:45 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: add support for temperature sensor

Marvell 88q2xxx devices have an inbuilt temperature sensor. Add hwmon
support for this sensor.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-9-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: add suspend / resume ops
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:44 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: add suspend / resume ops

Add suspend/resume ops for Marvell 88Q2xxx devices.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-8-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: add interrupt support for link detection
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:43 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: add interrupt support for link detection

Added .config_intr and .handle_interrupt callbacks. Whenever the link
goes up or down an interrupt will be triggered. Interrupts are configured
separately for 100/1000BASET1.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-7-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHY
Dimitri Fedrau [Sun, 18 Feb 2024 07:57:42 +0000 (08:57 +0100)]
net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHY

Add a driver for the Marvell 88Q2220. This driver allows to detect the
link, switch between 100BASE-T1 and 1000BASE-T1 and switch between
master and slave mode. Autonegotiation is supported.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://lore.kernel.org/r/20240218075753.18067-6-dima.fedrau@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>