]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
8 months agobpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
Jason Xing [Thu, 20 Feb 2025 07:29:35 +0000 (15:29 +0800)]
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback

Support sw SCM_TSTAMP_SND case for bpf timestamping.

Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This
callback will occur at the same timestamping point as the user
space's software SCM_TSTAMP_SND. The BPF program can use it to
get the same SCM_TSTAMP_SND timestamp without modifying the
user-space application.

Based on this patch, BPF program will get the software
timestamp when the driver is ready to send the skb. In the
sebsequent patch, the hardware timestamp will be supported.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com
8 months agobpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
Jason Xing [Thu, 20 Feb 2025 07:29:34 +0000 (15:29 +0800)]
bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback

Support SCM_TSTAMP_SCHED case for bpf timestamping.

Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This
callback will occur at the same timestamping point as the user
space's SCM_TSTAMP_SCHED. The BPF program can use it to get the
same SCM_TSTAMP_SCHED timestamp without modifying the user-space
application.

A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags,
ensuring that the new BPF timestamping and the current user
space's SO_TIMESTAMPING do not interfere with each other.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com
8 months agonet-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
Jason Xing [Thu, 20 Feb 2025 07:29:33 +0000 (15:29 +0800)]
net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING

No functional changes here. Only add test to see if the orig_skb
matches the usage of application SO_TIMESTAMPING.

In this series, bpf timestamping and previous socket timestamping
are implemented in the same function __skb_tstamp_tx(). To test
the socket enables socket timestamping feature, this function
skb_tstamp_tx_report_so_timestamping() is added.

In the next patch, another check for bpf timestamping feature
will be introduced just like the above report function, namely,
skb_tstamp_tx_report_bpf_timestamping(). Then users will be able
to know the socket enables either or both of features.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-6-kerneljasonxing@gmail.com
8 months agobpf: Disable unsafe helpers in TX timestamping callbacks
Jason Xing [Thu, 20 Feb 2025 07:29:32 +0000 (15:29 +0800)]
bpf: Disable unsafe helpers in TX timestamping callbacks

New TX timestamping sock_ops callbacks will be added in the
subsequent patch. Some of the existing BPF helpers will not
be safe to be used in the TX timestamping callbacks.

The bpf_sock_ops_setsockopt, bpf_sock_ops_getsockopt, and
bpf_sock_ops_cb_flags_set require owning the sock lock. TX
timestamping callbacks will not own the lock.

The bpf_sock_ops_load_hdr_opt needs the skb->data pointing
to the TCP header. This will not be true in the TX timestamping
callbacks.

At the beginning of these helpers, this patch checks the
bpf_sock->op to ensure these helpers are used by the existing
sock_ops callbacks only.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-5-kerneljasonxing@gmail.com
8 months agobpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
Jason Xing [Thu, 20 Feb 2025 07:29:31 +0000 (15:29 +0800)]
bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback

The subsequent patch will implement BPF TX timestamping. It will
call the sockops BPF program without holding the sock lock.

This breaks the current assumption that all sock ops programs will
hold the sock lock. The sock's fields of the uapi's bpf_sock_ops
requires this assumption.

To address this, a new "u8 is_locked_tcp_sock;" field is added. This
patch sets it in the current sock_ops callbacks. The "is_fullsock"
test is then replaced by the "is_locked_tcp_sock" test during
sock_ops_convert_ctx_access().

The new TX timestamping callbacks added in the subsequent patch will
not have this set. This will prevent unsafe access from the new
timestamping callbacks.

Potentially, we could allow read-only access. However, this would
require identifying which callback is read-safe-only and also requires
additional BPF instruction rewrites in the covert_ctx. Since the BPF
program can always read everything from a socket (e.g., by using
bpf_core_cast), this patch keeps it simple and disables all read
and write access to any socket fields through the bpf_sock_ops
UAPI from the new TX timestamping callback.

Moreover, note that some of the fields in bpf_sock_ops are specific
to tcp_sock, and sock_ops currently only supports tcp_sock. In
the future, UDP timestamping will be added, which will also break
this assumption. The same idea used in this patch will be reused.
Considering that the current sock_ops only supports tcp_sock, the
variable is named is_locked_"tcp"_sock.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
8 months agobpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
Jason Xing [Thu, 20 Feb 2025 07:29:30 +0000 (15:29 +0800)]
bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping

This patch introduces a new bpf_skops_tx_timestamping() function
that prepares the "struct bpf_sock_ops" ctx and then executes the
sockops BPF program.

The subsequent patch will utilize bpf_skops_tx_timestamping() at
the existing TX timestamping kernel callbacks (__sk_tstamp_tx
specifically) to call the sockops BPF program. Later, four callback
points to report information to user space based on this patch will
be introduced.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-3-kerneljasonxing@gmail.com
8 months agobpf: Add networking timestamping support to bpf_get/setsockopt()
Jason Xing [Thu, 20 Feb 2025 07:29:29 +0000 (15:29 +0800)]
bpf: Add networking timestamping support to bpf_get/setsockopt()

The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are
added to bpf_get/setsockopt. The later patches will implement the
BPF networking timestamping. The BPF program will use
bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to
enable the BPF networking timestamping on a socket.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com
8 months agoMerge branch 'bpf-support-setting-max-rto-for-bpf_setsockopt'
Martin KaFai Lau [Wed, 19 Feb 2025 20:30:52 +0000 (12:30 -0800)]
Merge branch 'bpf-support-setting-max-rto-for-bpf_setsockopt'

Jason Xing says:

====================
Support max RTO set by BPF program calling bpf_setsockopt().
Add corresponding selftests.
====================

Link: https://patch.msgid.link/20250219081333.56378-1-kerneljasonxing@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
8 months agoselftests/bpf: Add rto max for bpf_setsockopt test
Jason Xing [Wed, 19 Feb 2025 08:13:32 +0000 (16:13 +0800)]
selftests/bpf: Add rto max for bpf_setsockopt test

Test the TCP_RTO_MAX_MS optname in the existing setget_sockopt test.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250219081333.56378-3-kerneljasonxing@gmail.com
8 months agobpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
Jason Xing [Wed, 19 Feb 2025 08:13:31 +0000 (16:13 +0800)]
bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt

Some applications don't want to wait for too long because the
time of retransmission increases exponentially and can reach more
than 10 seconds, for example. Eric implements the core logic
on supporting rto max feature in the stack previously. Based on that,
we can support it for BPF use.

This patch reuses the same logic of TCP_RTO_MAX_MS in do_tcp_setsockopt()
and do_tcp_getsockopt(). BPF program can call bpf_{set/get}sockopt()
to set/get the maximum value of RTO.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219081333.56378-2-kerneljasonxing@gmail.com
8 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 13 Feb 2025 20:43:01 +0000 (12:43 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.14-rc3).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 13 Feb 2025 20:17:04 +0000 (12:17 -0800)]
Merge tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, wireless and bluetooth.

  Kalle Valo steps down after serving as the WiFi driver maintainer for
  over a decade.

  Current release - fix to a fix:

   - vsock: orphan socket after transport release, avoid null-deref

   - Bluetooth: L2CAP: fix corrupted list in hci_chan_del

  Current release - regressions:

   - eth:
      - stmmac: correct Rx buffer layout when SPH is enabled
      - iavf: fix a locking bug in an error path

   - rxrpc: fix alteration of headers whilst zerocopy pending

   - s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH

   - Revert "netfilter: flowtable: teardown flow if cached mtu is stale"

  Current release - new code bugs:

   - rxrpc: fix ipv6 path MTU discovery, only ipv4 worked

   - pse-pd: fix deadlock in current limit functions

  Previous releases - regressions:

   - rtnetlink: fix netns refleak with rtnl_setlink()

   - wifi: brcmfmac: use random seed flag for BCM4355 and BCM4364
     firmware

  Previous releases - always broken:

   - add missing RCU protection of struct net throughout the stack

   - can: rockchip: bail out if skb cannot be allocated

   - eth: ti: am65-cpsw: base XDP support fixes

  Misc:

   - ethtool: tsconfig: update the format of hwtstamp flags, changes the
     uAPI but this uAPI was not in any release yet"

* tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  net: pse-pd: Fix deadlock in current limit functions
  rxrpc: Fix ipv6 path MTU discovery
  Reapply "net: skb: introduce and use a single page frag cache"
  s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH
  mlxsw: Add return value check for mlxsw_sp_port_get_stats_raw()
  ipv6: mcast: add RCU protection to mld_newpack()
  team: better TEAM_OPTION_TYPE_STRING validation
  Bluetooth: L2CAP: Fix corrupted list in hci_chan_del
  Bluetooth: btintel_pcie: Fix a potential race condition
  Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd
  net: ethernet: ti: am65_cpsw: fix tx_cleanup for XDP case
  net: ethernet: ti: am65-cpsw: fix RX & TX statistics for XDP_TX case
  net: ethernet: ti: am65-cpsw: fix memleak in certain XDP cases
  vsock/test: Add test for SO_LINGER null ptr deref
  vsock: Orphan socket after transport release
  MAINTAINERS: Add sctp headers to the general netdev entry
  Revert "netfilter: flowtable: teardown flow if cached mtu is stale"
  iavf: Fix a locking bug in an error path
  rxrpc: Fix alteration of headers whilst zerocopy pending
  net: phylink: make configuring clock-stop dependent on MAC support
  ...

8 months agoMerge tag 'for-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 13 Feb 2025 20:06:29 +0000 (12:06 -0800)]
Merge tag 'for-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix stale page cache after race between readahead and direct IO write

 - fix hole expansion when writing at an offset beyond EOF, the range
   will not be zeroed

 - use proper way to calculate offsets in folio ranges

* tag 'for-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix hole expansion when writing at an offset beyond EOF
  btrfs: fix stale page cache after race between readahead and direct IO write
  btrfs: fix two misuses of folio_shift()

8 months agoMerge tag 'bcachefs-2025-02-12' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Thu, 13 Feb 2025 19:58:11 +0000 (11:58 -0800)]
Merge tag 'bcachefs-2025-02-12' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Just small stuff.

  As a general announcement, on disk format is now frozen in my master
  branch - future on disk format changes will be optional, not required.

   - More fixes for going read-only: the previous fix was insufficient,
     but with more work on ordering journal reclaim flushing (and a
     btree node accounting fix so we don't split until we have to) the
     tiering_replication test now consistently goes read-only in less
     than a second.

   - fix for fsck when we have reflink pointers to missing indirect
     extents

   - some transaction restart handling fixes from Alan; the "Pass
     _orig_restart_count to trans_was_restarted" likely fixes some rare
     undefined behaviour heisenbugs"

* tag 'bcachefs-2025-02-12' of git://evilpiepirate.org/bcachefs:
  bcachefs: Reuse transaction
  bcachefs: Pass _orig_restart_count to trans_was_restarted
  bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTS
  bcachefs: Fix want_new_bset() so we write until the end of the btree node
  bcachefs: Split out journal pins by btree level
  bcachefs: Fix use after free
  bcachefs: Fix marking reflink pointers to missing indirect extents

8 months agonet: pse-pd: Fix deadlock in current limit functions
Kory Maincent [Wed, 12 Feb 2025 15:17:51 +0000 (16:17 +0100)]
net: pse-pd: Fix deadlock in current limit functions

Fix a deadlock in pse_pi_get_current_limit and pse_pi_set_current_limit
caused by consecutive mutex_lock calls. One in the function itself and
another in pse_pi_get_voltage.

Resolve the issue by using the unlocked version of pse_pi_get_voltage
instead.

Fixes: e0a5e2bba38a ("net: pse-pd: Use power limit at driver side instead of current limit")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250212151751.1515008-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agorxrpc: Fix ipv6 path MTU discovery
David Howells [Wed, 12 Feb 2025 11:21:24 +0000 (11:21 +0000)]
rxrpc: Fix ipv6 path MTU discovery

rxrpc path MTU discovery currently only makes use of ICMPv4, but not
ICMPv6, which means that pmtud for IPv6 doesn't work correctly.  Fix it to
check for ICMPv6 messages also.

Fixes: eeaedc5449d9 ("rxrpc: Implement path-MTU probing using padded PING ACKs (RFC8899)")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/3517283.1739359284@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge tag 'for-net-2025-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Thu, 13 Feb 2025 17:41:33 +0000 (09:41 -0800)]
Merge tag 'for-net-2025-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - btintel_pcie: Fix a potential race condition
 - L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd
 - L2CAP: Fix corrupted list in hci_chan_del

* tag 'for-net-2025-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix corrupted list in hci_chan_del
  Bluetooth: btintel_pcie: Fix a potential race condition
  Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd
====================

Link: https://patch.msgid.link/20250213162446.617632-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge tag 'nf-25-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 13 Feb 2025 17:38:50 +0000 (09:38 -0800)]
Merge tag 'nf-25-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains one revert for:

1) Revert flowtable entry teardown cycle when skbuff exceeds mtu to
   deal with DF flag unset scenarios. This is reverts a patch coming
   in the previous merge window (available in 6.14-rc releases).

* tag 'nf-25-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  Revert "netfilter: flowtable: teardown flow if cached mtu is stale"
====================

Link: https://patch.msgid.link/20250213100502.3983-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoReapply "net: skb: introduce and use a single page frag cache"
Jakub Kicinski [Thu, 13 Feb 2025 16:49:44 +0000 (08:49 -0800)]
Reapply "net: skb: introduce and use a single page frag cache"

This reverts commit 011b0335903832facca86cd8ed05d7d8d94c9c76.

Sabrina reports that the revert may trigger warnings due to intervening
changes, especially the ability to rise MAX_SKB_FRAGS. Let's drop it
and revisit once that part is also ironed out.

Fixes: 011b03359038 ("Revert "net: skb: introduce and use a single page frag cache"")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/6bf54579233038bc0e76056c5ea459872ce362ab.1739375933.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge tag 'loongarch-fixes-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 13 Feb 2025 16:43:46 +0000 (08:43 -0800)]
Merge tag 'loongarch-fixes-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix bugs about idle, kernel_page_present(), IP checksum and KVM, plus
  some trival cleanups"

* tag 'loongarch-fixes-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Set host with kernel mode when switch to VM mode
  LoongArch: KVM: Remove duplicated cache attribute setting
  LoongArch: KVM: Fix typo issue about GCFG feature detection
  LoongArch: csum: Fix OoB access in IP checksum code for negative lengths
  LoongArch: Remove the deprecated notifier hook mechanism
  LoongArch: Use str_yes_no() helper function for /proc/cpuinfo
  LoongArch: Fix kernel_page_present() for KPRANGE/XKPRANGE
  LoongArch: Fix idle VS timer enqueue

8 months agoMerge tag 'platform-drivers-x86-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 13 Feb 2025 16:41:48 +0000 (08:41 -0800)]
Merge tag 'platform-drivers-x86-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

 - thinkpad_acpi:
     - Fix registration of tpacpi platform driver
     - Support fan speed in ticks per revolution (Thinkpad X120e)
     - Support V9 DYTC profiles (new Thinkpad AMD platforms)

 - int3472: Handle GPIO "enable" vs "reset" variation (ov7251)

* tag 'platform-drivers-x86-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: thinkpad_acpi: Fix registration of tpacpi platform driver
  platform/x86: int3472: Call "reset" GPIO "enable" for INT347E
  platform/x86: int3472: Use correct type for "polarity", call it gpio_flags
  platform/x86: thinkpad_acpi: Support for V9 DYTC platform profiles
  platform/x86: thinkpad_acpi: Fix invalid fan speed on ThinkPad X120e

8 months agos390/qeth: move netif_napi_add_tx() and napi_enable() from under BH
Alexandra Winter [Wed, 12 Feb 2025 16:36:59 +0000 (17:36 +0100)]
s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH

Like other drivers qeth is calling local_bh_enable() after napi_schedule()
to kick-start softirqs [0].
Since netif_napi_add_tx() and napi_enable() now take the netdev_lock()
mutex [1], move them out from under the BH protection. Same solution as in
commit a60558644e20 ("wifi: mt76: move napi_enable() from under BH")

Fixes: 1b23cdbd2bbc ("net: protect netdev->napi_list with netdev_lock()")
Link: https://lore.kernel.org/netdev/20240612181900.4d9d18d0@kernel.org/
Link: https://lore.kernel.org/netdev/20250115035319.559603-1-kuba@kernel.org/
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250212163659.2287292-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: usb: asix_devices: add FiberGecko DeviceID
Max Schulze [Wed, 12 Feb 2025 15:09:51 +0000 (16:09 +0100)]
net: usb: asix_devices: add FiberGecko DeviceID

The FiberGecko is a small USB module that connects a 100 Mbit/s SFP

Signed-off-by: Max Schulze <max.schulze@online.de>
Tested-by: Max Schulze <max.schulze@online.de>
Suggested-by: David Hollis <dhollis@davehollis.com>
Reported-by: Sven Kreiensen <s.kreiensen@lyconsys.com>
Link: https://patch.msgid.link/20250212150957.43900-2-max.schulze@online.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agomlxsw: Add return value check for mlxsw_sp_port_get_stats_raw()
Wentao Liang [Wed, 12 Feb 2025 15:23:11 +0000 (23:23 +0800)]
mlxsw: Add return value check for mlxsw_sp_port_get_stats_raw()

Add a check for the return value of mlxsw_sp_port_get_stats_raw()
in __mlxsw_sp_port_get_stats(). If mlxsw_sp_port_get_stats_raw()
returns an error, exit the function to prevent further processing
with potentially invalid data.

Fixes: 614d509aa1e7 ("mlxsw: Move ethtool_ops to spectrum_ethtool.c")
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250212152311.1332-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoipv6: mcast: add RCU protection to mld_newpack()
Eric Dumazet [Wed, 12 Feb 2025 14:10:21 +0000 (14:10 +0000)]
ipv6: mcast: add RCU protection to mld_newpack()

mld_newpack() can be called without RTNL or RCU being held.

Note that we no longer can use sock_alloc_send_skb() because
ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep.

Instead use alloc_skb() and charge the net->ipv6.igmp_sk
socket under RCU protection.

Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250212141021.1663666-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoteam: better TEAM_OPTION_TYPE_STRING validation
Eric Dumazet [Wed, 12 Feb 2025 13:49:28 +0000 (13:49 +0000)]
team: better TEAM_OPTION_TYPE_STRING validation

syzbot reported following splat [1]

Make sure user-provided data contains one nul byte.

[1]
 BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:633 [inline]
 BUG: KMSAN: uninit-value in string+0x3ec/0x5f0 lib/vsprintf.c:714
  string_nocheck lib/vsprintf.c:633 [inline]
  string+0x3ec/0x5f0 lib/vsprintf.c:714
  vsnprintf+0xa5d/0x1960 lib/vsprintf.c:2843
  __request_module+0x252/0x9f0 kernel/module/kmod.c:149
  team_mode_get drivers/net/team/team_core.c:480 [inline]
  team_change_mode drivers/net/team/team_core.c:607 [inline]
  team_mode_option_set+0x437/0x970 drivers/net/team/team_core.c:1401
  team_option_set drivers/net/team/team_core.c:375 [inline]
  team_nl_options_set_doit+0x1339/0x1f90 drivers/net/team/team_core.c:2662
  genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
  genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
  genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
  netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2543
  genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
  netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
  netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1348
  netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1892
  sock_sendmsg_nosec net/socket.c:718 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:733
  ____sys_sendmsg+0x877/0xb60 net/socket.c:2573
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2627
  __sys_sendmsg net/socket.c:2659 [inline]
  __do_sys_sendmsg net/socket.c:2664 [inline]
  __se_sys_sendmsg net/socket.c:2662 [inline]
  __x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2662
  x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Reported-by: syzbot+1fcd957a82e3a1baa94d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1fcd957a82e3a1baa94d
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250212134928.1541609-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agor8169: add support for Intel Killer E5000
Heiner Kallweit [Wed, 12 Feb 2025 07:03:56 +0000 (08:03 +0100)]
r8169: add support for Intel Killer E5000

This adds support for the Intel Killer E5000 which seems to be a
rebranded RTL8126. Copied from r8126 vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/9db73e9b-e2e8-45de-97a5-041c5f71d774@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoixgene-v2: prepare for phylib stop exporting phy_10_100_features_array
Heiner Kallweit [Wed, 12 Feb 2025 06:32:52 +0000 (07:32 +0100)]
ixgene-v2: prepare for phylib stop exporting phy_10_100_features_array

As part of phylib cleanup we plan to stop exporting the feature arrays.
So explicitly remove the modes not supported by the MAC. The media type
bits don't have any impact on kernel behavior, so don't touch them.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/be356a21-5a1a-45b3-9407-3a97f3af4600@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agosctp: Remove commented out code
Thorsten Blum [Tue, 11 Feb 2025 10:20:56 +0000 (11:20 +0100)]
sctp: Remove commented out code

Remove commented out code.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250211102057.587182-1-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoBluetooth: L2CAP: Fix corrupted list in hci_chan_del
Luiz Augusto von Dentz [Thu, 6 Feb 2025 20:54:45 +0000 (15:54 -0500)]
Bluetooth: L2CAP: Fix corrupted list in hci_chan_del

This fixes the following trace by reworking the locking of l2cap_conn
so instead of only locking when changing the chan_l list this promotes
chan_lock to a general lock of l2cap_conn so whenever it is being held
it would prevents the likes of l2cap_conn_del to run:

list_del corruption, ffff888021297e00->prev is LIST_POISON2 (dead000000000122)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:61!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5896 Comm: syz-executor213 Not tainted 6.14.0-rc1-next-20250204-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59
Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb
RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0
R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122
R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00
FS:  00007f7ace6686c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7aceeeb1d0 CR3: 000000003527c000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __list_del_entry_valid include/linux/list.h:124 [inline]
 __list_del_entry include/linux/list.h:215 [inline]
 list_del_rcu include/linux/rculist.h:168 [inline]
 hci_chan_del+0x70/0x1b0 net/bluetooth/hci_conn.c:2858
 l2cap_conn_free net/bluetooth/l2cap_core.c:1816 [inline]
 kref_put include/linux/kref.h:65 [inline]
 l2cap_conn_put+0x70/0xe0 net/bluetooth/l2cap_core.c:1830
 l2cap_sock_shutdown+0xa8a/0x1020 net/bluetooth/l2cap_sock.c:1377
 l2cap_sock_release+0x79/0x1d0 net/bluetooth/l2cap_sock.c:1416
 __sock_release net/socket.c:642 [inline]
 sock_close+0xbc/0x240 net/socket.c:1393
 __fput+0x3e9/0x9f0 fs/file_table.c:448
 task_work_run+0x24f/0x310 kernel/task_work.c:227
 ptrace_notify+0x2d2/0x380 kernel/signal.c:2522
 ptrace_report_syscall include/linux/ptrace.h:415 [inline]
 ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
 syscall_exit_work+0xc7/0x1d0 kernel/entry/common.c:173
 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
 syscall_exit_to_user_mode+0x24a/0x340 kernel/entry/common.c:218
 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7aceeaf449
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7ace668218 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: fffffffffffffffc RBX: 00007f7acef39328 RCX: 00007f7aceeaf449
RDX: 000000000000000e RSI: 0000000020000100 RDI: 0000000000000004
RBP: 00007f7acef39320 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000004 R14: 00007f7ace668670 R15: 000000000000000b
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59
Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb
RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0
R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122
R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00
FS:  00007f7ace6686c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7acef05b08 CR3: 000000003527c000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Reported-by: syzbot+10bd8fe6741eedd2be2e@syzkaller.appspotmail.com
Tested-by: syzbot+10bd8fe6741eedd2be2e@syzkaller.appspotmail.com
Fixes: b4f82f9ed43a ("Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
8 months agoBluetooth: btintel_pcie: Fix a potential race condition
Kiran K [Fri, 31 Jan 2025 13:00:19 +0000 (18:30 +0530)]
Bluetooth: btintel_pcie: Fix a potential race condition

On HCI_OP_RESET command, firmware raises alive interrupt. Driver needs
to wait for this before sending other command. This patch fixes the potential
miss of alive interrupt due to which HCI_OP_RESET can timeout.

Expected flow:
If tx command is HCI_OP_RESET,
  1. set data->gp0_received = false
  2. send HCI_OP_RESET
  3. wait for alive interrupt

Actual flow having potential race:
If tx command is HCI_OP_RESET,
 1. send HCI_OP_RESET
   1a. Firmware raises alive interrupt here and in ISR
       data->gp0_received  is set to true
 2. set data->gp0_received = false
 3. wait for alive interrupt

Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes: 05c200c8f029 ("Bluetooth: btintel_pcie: Add handshake between driver and firmware")
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Closes: https://patchwork.kernel.org/project/bluetooth/patch/20241001104451.626964-1-kiran.k@intel.com/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
8 months agoBluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd
Luiz Augusto von Dentz [Thu, 16 Jan 2025 15:35:03 +0000 (10:35 -0500)]
Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd

After the hci sync command releases l2cap_conn, the hci receive data work
queue references the released l2cap_conn when sending to the upper layer.
Add hci dev lock to the hci receive data work queue to synchronize the two.

[1]
BUG: KASAN: slab-use-after-free in l2cap_send_cmd+0x187/0x8d0 net/bluetooth/l2cap_core.c:954
Read of size 8 at addr ffff8880271a4000 by task kworker/u9:2/5837

CPU: 0 UID: 0 PID: 5837 Comm: kworker/u9:2 Not tainted 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: hci1 hci_rx_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:489
 kasan_report+0x143/0x180 mm/kasan/report.c:602
 l2cap_build_cmd net/bluetooth/l2cap_core.c:2964 [inline]
 l2cap_send_cmd+0x187/0x8d0 net/bluetooth/l2cap_core.c:954
 l2cap_sig_send_rej net/bluetooth/l2cap_core.c:5502 [inline]
 l2cap_sig_channel net/bluetooth/l2cap_core.c:5538 [inline]
 l2cap_recv_frame+0x221f/0x10db0 net/bluetooth/l2cap_core.c:6817
 hci_acldata_packet net/bluetooth/hci_core.c:3797 [inline]
 hci_rx_work+0x508/0xdb0 net/bluetooth/hci_core.c:4040
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Allocated by task 5837:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
 __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
 kasan_kmalloc include/linux/kasan.h:260 [inline]
 __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4329
 kmalloc_noprof include/linux/slab.h:901 [inline]
 kzalloc_noprof include/linux/slab.h:1037 [inline]
 l2cap_conn_add+0xa9/0x8e0 net/bluetooth/l2cap_core.c:6860
 l2cap_connect_cfm+0x115/0x1090 net/bluetooth/l2cap_core.c:7239
 hci_connect_cfm include/net/bluetooth/hci_core.h:2057 [inline]
 hci_remote_features_evt+0x68e/0xac0 net/bluetooth/hci_event.c:3726
 hci_event_func net/bluetooth/hci_event.c:7473 [inline]
 hci_event_packet+0xac2/0x1540 net/bluetooth/hci_event.c:7525
 hci_rx_work+0x3f3/0xdb0 net/bluetooth/hci_core.c:4035
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Freed by task 54:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2353 [inline]
 slab_free mm/slub.c:4613 [inline]
 kfree+0x196/0x430 mm/slub.c:4761
 l2cap_connect_cfm+0xcc/0x1090 net/bluetooth/l2cap_core.c:7235
 hci_connect_cfm include/net/bluetooth/hci_core.h:2057 [inline]
 hci_conn_failed+0x287/0x400 net/bluetooth/hci_conn.c:1266
 hci_abort_conn_sync+0x56c/0x11f0 net/bluetooth/hci_sync.c:5603
 hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:332
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Reported-by: syzbot+31c2f641b850a348a734@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=31c2f641b850a348a734
Tested-by: syzbot+31c2f641b850a348a734@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
8 months agoDocumentation: dpaa2 ethernet switch driver: Fix spelling
Ritvik Gupta [Wed, 12 Feb 2025 02:13:11 +0000 (02:13 +0000)]
Documentation: dpaa2 ethernet switch driver: Fix spelling

Corrected spelling mistake

Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com>
Link: https://patch.msgid.link/20250212021311.13257-1-ritvikfoss@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agoarp: Convert SIOCDARP and SIOCSARP to per-netns RTNL.
Kuniyuki Iwashima [Tue, 11 Feb 2025 04:50:57 +0000 (13:50 +0900)]
arp: Convert SIOCDARP and SIOCSARP to per-netns RTNL.

ioctl(SIOCDARP/SIOCSARP) operates on a single netns fetched from
an AF_INET socket in inet_ioctl().

Let's hold rtnl_net_lock() for SIOCDARP and SIOCSARP.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250211045057.10419-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agonet: phy: marvell-88q2xxx: Add support for PHY LEDs on 88q2xxx
Dimitri Fedrau [Mon, 10 Feb 2025 14:53:40 +0000 (15:53 +0100)]
net: phy: marvell-88q2xxx: Add support for PHY LEDs on 88q2xxx

Marvell 88Q2XXX devices support up to two configurable Light Emitting
Diode (LED). Add minimal LED controller driver supporting the most common
uses with the 'netdev' trigger.

Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Link: https://patch.msgid.link/20250210-marvell-88q2xxx-leds-v4-1-3a0900dc121f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 months agoMerge branch 'net-ethernet-ti-am65-cpsw-xdp-fixes'
Jakub Kicinski [Thu, 13 Feb 2025 04:08:47 +0000 (20:08 -0800)]
Merge branch 'net-ethernet-ti-am65-cpsw-xdp-fixes'

Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: XDP fixes

This series fixes memleak and statistics for XDP cases.
====================

Link: https://patch.msgid.link/20250210-am65-cpsw-xdp-fixes-v1-0-ec6b1f7f1aca@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: ethernet: ti: am65_cpsw: fix tx_cleanup for XDP case
Roger Quadros [Mon, 10 Feb 2025 14:52:17 +0000 (16:52 +0200)]
net: ethernet: ti: am65_cpsw: fix tx_cleanup for XDP case

For XDP transmit case, swdata doesn't contain SKB but the
XDP Frame. Infer the correct swdata based on buffer type
and return the XDP Frame for XDP transmit case.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Link: https://patch.msgid.link/20250210-am65-cpsw-xdp-fixes-v1-3-ec6b1f7f1aca@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: ethernet: ti: am65-cpsw: fix RX & TX statistics for XDP_TX case
Roger Quadros [Mon, 10 Feb 2025 14:52:16 +0000 (16:52 +0200)]
net: ethernet: ti: am65-cpsw: fix RX & TX statistics for XDP_TX case

For successful XDP_TX and XDP_REDIRECT cases, the packet was received
successfully so update RX statistics. Use original received
packet length for that.

TX packets statistics are incremented on TX completion so don't
update it while TX queueing.

If xdp_convert_buff_to_frame() fails, increment tx_dropped.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Link: https://patch.msgid.link/20250210-am65-cpsw-xdp-fixes-v1-2-ec6b1f7f1aca@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: ethernet: ti: am65-cpsw: fix memleak in certain XDP cases
Roger Quadros [Mon, 10 Feb 2025 14:52:15 +0000 (16:52 +0200)]
net: ethernet: ti: am65-cpsw: fix memleak in certain XDP cases

If the XDP program doesn't result in XDP_PASS then we leak the
memory allocated by am65_cpsw_build_skb().

It is pointless to allocate SKB memory before running the XDP
program as we would be wasting CPU cycles for cases other than XDP_PASS.
Move the SKB allocation after evaluating the XDP program result.

This fixes the memleak. A performance boost is seen for XDP_DROP test.

XDP_DROP test:
Before: 460256 rx/s                  0 err/s
After:  784130 rx/s                  0 err/s

Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Link: https://patch.msgid.link/20250210-am65-cpsw-xdp-fixes-v1-1-ec6b1f7f1aca@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size
Huacai Chen [Mon, 10 Feb 2025 13:43:28 +0000 (21:43 +0800)]
net: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size

Now for dwmac-loongson {tx,rx}_fifo_size are uninitialised, which means
zero. This means dwmac-loongson doesn't support changing MTU because in
stmmac_change_mtu() it requires the fifo size be no less than MTU. Thus,
set the correct tx_fifo_size and rx_fifo_size for it (16KB multiplied by
queue counts).

Here {tx,rx}_fifo_size is initialised with the initial value (also the
maximum value) of {tx,rx}_queues_to_use. So it will keep as 16KB if we
don't change the queue count, and will be larger than 16KB if we change
(decrease) the queue count. However stmmac_change_mtu() still work well
with current logic (MTU cannot be larger than 16KB for stmmac).

Note: the Fixes tag picked here is the oldest commit and key commit of
the dwmac-loongson series "stmmac: Add Loongson platform support".

Acked-by: Yanteng Si <si.yanteng@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://patch.msgid.link/20250210134328.2755328-1-chenhuacai@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoLoongArch: KVM: Set host with kernel mode when switch to VM mode
Bibo Mao [Thu, 13 Feb 2025 04:02:56 +0000 (12:02 +0800)]
LoongArch: KVM: Set host with kernel mode when switch to VM mode

PRMD register is only meaningful on the beginning stage of exception
entry, and it is overwritten with nested irq or exception.

When CPU runs in VM mode, interrupt need be enabled on host. And the
mode for host had better be kernel mode rather than random or user mode.

When VM is running, the running mode with top command comes from CRMD
register, and running mode should be kernel mode since kernel function
is executing with perf command. It needs be consistent with both top and
perf command.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: KVM: Remove duplicated cache attribute setting
Bibo Mao [Thu, 13 Feb 2025 04:02:56 +0000 (12:02 +0800)]
LoongArch: KVM: Remove duplicated cache attribute setting

Cache attribute comes from GPA->HPA secondary mmu page table and is
configured when kvm is enabled. It is the same for all VMs, so remove
duplicated cache attribute setting on vCPU context switch.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: KVM: Fix typo issue about GCFG feature detection
Bibo Mao [Thu, 13 Feb 2025 04:02:56 +0000 (12:02 +0800)]
LoongArch: KVM: Fix typo issue about GCFG feature detection

This is typo issue and misusage about GCFG feature macro. The code
is wrong, only that it does not cause obvious problem since GCFG is
set again on vCPU context switch.

Fixes: 0d0df3c99d4f ("LoongArch: KVM: Implement kvm hardware enable, disable interface")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: csum: Fix OoB access in IP checksum code for negative lengths
Yuli Wang [Thu, 13 Feb 2025 04:02:40 +0000 (12:02 +0800)]
LoongArch: csum: Fix OoB access in IP checksum code for negative lengths

Commit 69e3a6aa6be2 ("LoongArch: Add checksum optimization for 64-bit
system") would cause an undefined shift and an out-of-bounds read.

Commit 8bd795fedb84 ("arm64: csum: Fix OoB access in IP checksum code
for negative lengths") fixes the same issue on ARM64.

Fixes: 69e3a6aa6be2 ("LoongArch: Add checksum optimization for 64-bit system")
Co-developed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: Remove the deprecated notifier hook mechanism
Yuli Wang [Thu, 13 Feb 2025 04:02:40 +0000 (12:02 +0800)]
LoongArch: Remove the deprecated notifier hook mechanism

The notifier hook mechanism in proc and cpuinfo is actually unnecessary
for LoongArch because it's not used anywhere.

It was originally added to the MIPS code in commit d6d3c9afaab4 ("MIPS:
MT: proc: Add support for printing VPE and TC ids"), and LoongArch then
inherited it.

But as the kernel code stands now, this notifier hook mechanism doesn't
really make sense for either LoongArch or MIPS.

In addition, the seq_file forward declaration needs to be moved to its
proper place, as only the show_ipi_list() function in smp.c requires it.

Co-developed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: Use str_yes_no() helper function for /proc/cpuinfo
Yuli Wang [Thu, 13 Feb 2025 04:02:35 +0000 (12:02 +0800)]
LoongArch: Use str_yes_no() helper function for /proc/cpuinfo

Remove hard-coded strings by using the str_yes_no() helper function.

Similar to commit c4a0a4a45a45 ("MIPS: kernel: proc: Use str_yes_no()
helper function").

Co-developed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: Fix kernel_page_present() for KPRANGE/XKPRANGE
Huacai Chen [Thu, 13 Feb 2025 04:02:35 +0000 (12:02 +0800)]
LoongArch: Fix kernel_page_present() for KPRANGE/XKPRANGE

Now kernel_page_present() always return true for KPRANGE/XKPRANGE
addresses, this isn't correct because hibernation (ACPI S4) use it
to distinguish whether a page is saveable. If all KPRANGE/XKPRANGE
addresses are considered as saveable, then reserved memory such as
EFI_RUNTIME_SERVICES_CODE / EFI_RUNTIME_SERVICES_DATA will also be
saved and restored.

Fix this by returning true only if the KPRANGE/XKPRANGE address is in
memblock.memory.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoLoongArch: Fix idle VS timer enqueue
Marco Crivellari [Thu, 13 Feb 2025 04:02:35 +0000 (12:02 +0800)]
LoongArch: Fix idle VS timer enqueue

LoongArch re-enables interrupts on its idle routine and performs a
TIF_NEED_RESCHED check afterwards before putting the CPU to sleep.

The IRQs firing between the check and the idle instruction may set the
TIF_NEED_RESCHED flag. In order to deal with such a race, IRQs
interrupting __arch_cpu_idle() rollback their return address to the
beginning of __arch_cpu_idle() so that TIF_NEED_RESCHED is checked
again before going back to sleep.

However idle IRQs can also queue timers that may require a tick
reprogramming through a new generic idle loop iteration but those timers
would go unnoticed here because __arch_cpu_idle() only checks
TIF_NEED_RESCHED. It doesn't check for pending timers.

Fix this with fast-forwarding idle IRQs return address to the end of the
idle routine instead of the beginning, so that the generic idle loop can
handle both TIF_NEED_RESCHED and pending timers.

Fixes: 0603839b18f4 ("LoongArch: Add exception/interrupt handling")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
8 months agoMerge branch 'vsock-null-ptr-deref-when-so_linger-enabled'
Jakub Kicinski [Thu, 13 Feb 2025 04:01:30 +0000 (20:01 -0800)]
Merge branch 'vsock-null-ptr-deref-when-so_linger-enabled'

Michal Luczaj says:

====================
vsock: null-ptr-deref when SO_LINGER enabled

syzbot pointed out that a recent patching of a use-after-free introduced a
null-ptr-deref. This series fixes the problem and adds a test.

v2: https://lore.kernel.org/20250206-vsock-linger-nullderef-v2-0-f8a1f19146f8@rbox.co
v1: https://lore.kernel.org/20250204-vsock-linger-nullderef-v1-0-6eb1760fa93e@rbox.co
====================

Link: https://patch.msgid.link/20250210-vsock-linger-nullderef-v3-0-ef6244d02b54@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agovsock/test: Add test for SO_LINGER null ptr deref
Michal Luczaj [Mon, 10 Feb 2025 12:15:01 +0000 (13:15 +0100)]
vsock/test: Add test for SO_LINGER null ptr deref

Explicitly close() a TCP_ESTABLISHED (connectible) socket with SO_LINGER
enabled.

As for now, test does not verify if close() actually lingers.
On an unpatched machine, may trigger a null pointer dereference.

Tested-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250210-vsock-linger-nullderef-v3-2-ef6244d02b54@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agovsock: Orphan socket after transport release
Michal Luczaj [Mon, 10 Feb 2025 12:15:00 +0000 (13:15 +0100)]
vsock: Orphan socket after transport release

During socket release, sock_orphan() is called without considering that it
sets sk->sk_wq to NULL. Later, if SO_LINGER is enabled, this leads to a
null pointer dereferenced in virtio_transport_wait_close().

Orphan the socket only after transport release.

Partially reverts the 'Fixes:' commit.

KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
 lock_acquire+0x19e/0x500
 _raw_spin_lock_irqsave+0x47/0x70
 add_wait_queue+0x46/0x230
 virtio_transport_release+0x4e7/0x7f0
 __vsock_release+0xfd/0x490
 vsock_release+0x90/0x120
 __sock_release+0xa3/0x250
 sock_close+0x14/0x20
 __fput+0x35e/0xa90
 __x64_sys_close+0x78/0xd0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Reported-by: syzbot+9d55b199192a4be7d02c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
Fixes: fcdd2242c023 ("vsock: Keep the binding until socket destruction")
Tested-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250210-vsock-linger-nullderef-v3-1-ef6244d02b54@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMAINTAINERS: Add sctp headers to the general netdev entry
Marcelo Ricardo Leitner [Mon, 10 Feb 2025 13:24:55 +0000 (10:24 -0300)]
MAINTAINERS: Add sctp headers to the general netdev entry

All SCTP patches are picked up by netdev maintainers. Two headers were
missing to be listed there.

Reported-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b3c2dc3a102eb89bd155abca2503ebd015f50ee0.1739193671.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 13 Feb 2025 03:53:03 +0000 (19:53 -0800)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-02-11 (idpf, ixgbe, igc)

For idpf:

Sridhar fixes a couple issues in handling of RSC packets.

Josh adds a call to set_real_num_queues() to keep queue count in sync.

For ixgbe:

Piotr removes missed IS_ERR() removal when ERR_PTR usage was removed.

For igc:

Zdenek Bouska fixes reporting of Rx timestamp with AF_XDP.

Siang sets buffer type on empty frame to ensure proper handling.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: Set buffer type for empty frames in igc_init_empty_frame
  igc: Fix HW RX timestamp when passed by ZC XDP
  ixgbe: Fix possible skb NULL pointer dereference
  idpf: call set_real_num_queues in idpf_open
  idpf: record rx queue in skb for RSC packets
  idpf: fix handling rsc packet with a single segment
====================

Link: https://patch.msgid.link/20250211214343.4092496-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonetlink: specs: add conntrack dump and stats dump support
Florian Westphal [Mon, 10 Feb 2025 15:21:52 +0000 (16:21 +0100)]
netlink: specs: add conntrack dump and stats dump support

This adds support to dump the connection tracking table
("conntrack -L") and the conntrack statistics, ("conntrack -S").

Example conntrack dump:
tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/conntrack.yaml --dump get
[{'id': 59489769,
  'mark': 0,
  'nfgen-family': 2,
  'protoinfo': {'protoinfo-tcp': {'tcp-flags-original': {'flags': {'maxack',
                                                                   'sack-perm',
                                                                   'window-scale'},
                                                         'mask': set()},
                                  'tcp-flags-reply': {'flags': {'maxack',
                                                                'sack-perm',
                                                                'window-scale'},
                                                      'mask': set()},
                                  'tcp-state': 'established',
                                  'tcp-wscale-original': 7,
                                  'tcp-wscale-reply': 8}},
  'res-id': 0,
  'secctx': {'secctx-name': 'system_u:object_r:unlabeled_t:s0'},
  'status': {'assured',
             'confirmed',
             'dst-nat-done',
             'seen-reply',
             'src-nat-done'},
  'timeout': 431949,
  'tuple-orig': {'tuple-ip': {'ip-v4-dst': '34.107.243.93',
                              'ip-v4-src': '192.168.0.114'},
                 'tuple-proto': {'proto-dst-port': 443,
                                 'proto-num': 6,
                                 'proto-src-port': 37104}},
  'tuple-reply': {'tuple-ip': {'ip-v4-dst': '192.168.0.114',
                               'ip-v4-src': '34.107.243.93'},
                  'tuple-proto': {'proto-dst-port': 37104,
                                  'proto-num': 6,
                                  'proto-src-port': 443}},
  'use': 1,
  'version': 0},
 {'id': 3402229480,

Example stats dump:
tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/conntrack.yaml --dump get-stats
[{'chain-toolong': 0,
  'clash-resolve': 3,
  'drop': 0,
 ....

Changes since last iteration:
 - Address comments from Donald Hunter, in particular, fixup "get" and
   "get-stats" descriptions, the former operation supports both dump
   and normal request (returns a single entry, if found), the latter
   only supports dumps.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250210152159.41077-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: avoid unconditionally touching sk_tsflags on RX
Paolo Abeni [Tue, 11 Feb 2025 17:17:31 +0000 (18:17 +0100)]
net: avoid unconditionally touching sk_tsflags on RX

After commit 5d4cc87414c5 ("net: reorganize "struct sock" fields"),
the sk_tsflags field shares the same cacheline with sk_forward_alloc.

The UDP protocol does not acquire the sock lock in the RX path;
forward allocations are protected via the receive queue spinlock;
additionally udp_recvmsg() calls sock_recv_cmsgs() unconditionally
touching sk_tsflags on each packet reception.

Due to the above, under high packet rate traffic, when the BH and the
user-space process run on different CPUs, UDP packet reception
experiences a cache miss while accessing sk_tsflags.

The receive path doesn't strictly need to access the problematic field;
change sock_set_timestamping() to maintain the relevant information
in a newly allocated sk_flags bit, so that sock_recv_cmsgs() can
take decisions accessing the latter field only.

With this patch applied, on an AMD epic server with i40e NICs, I
measured a 10% performance improvement for small packets UDP flood
performance tests - possibly a larger delta could be observed with more
recent H/W.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/dbd18c8a1171549f8249ac5a8b30b1b5ec88a425.1739294057.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch 'netlink-specs-add-a-spec-for-nl80211-wiphy'
Jakub Kicinski [Thu, 13 Feb 2025 03:32:26 +0000 (19:32 -0800)]
Merge branch 'netlink-specs-add-a-spec-for-nl80211-wiphy'

Donald Hunter says:

====================
netlink: specs: add a spec for nl80211 wiphy

Add a rudimentary YNL spec for nl80211 that includes get-wiphy and
get-interface, along with some required enhancements to YNL and the
netlink schemas.

Patch 1 is a minor cleanup to prepare for patch 2
Patches 2-4 are new features for YNL
Patches 5-7 are updates to ynl_gen_c
Patches 8-9 are schema updates for feature parity
Patch 10 is the new nl80211 spec
====================

Link: https://patch.msgid.link/20250211120127.84858-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonetlink: specs: wireless: add a spec for nl80211
Donald Hunter [Tue, 11 Feb 2025 12:01:27 +0000 (12:01 +0000)]
netlink: specs: wireless: add a spec for nl80211

Add a rudimentary YNL spec for nl80211 that covers get-wiphy,
get-interface and get-protocol-features.

./tools/net/ynl/pyynl/cli.py --family nl80211 \
    --do get-protocol-features
{'protocol-features': {'split-wiphy-dump'}}

./tools/net/ynl/pyynl/cli.py --family nl80211 \
    --dump get-wiphy --json '{ "split-wiphy-dump": true }'

./tools/net/ynl/pyynl/cli.py --family nl80211 \
    --dump get-interface

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://patch.msgid.link/20250211120127.84858-11-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonetlink: specs: add s8, s16 to genetlink schemas
Donald Hunter [Tue, 11 Feb 2025 12:01:26 +0000 (12:01 +0000)]
netlink: specs: add s8, s16 to genetlink schemas

Add s8 and s16 types to the genetlink schemas to align scalar types
across all schemas.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-10-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonetlink: specs: support nested structs in genetlink legacy
Donald Hunter [Tue, 11 Feb 2025 12:01:25 +0000 (12:01 +0000)]
netlink: specs: support nested structs in genetlink legacy

Nested structs are already supported in netlink-raw. Add the same
capability to the genetlink legacy schema.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-9-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: add indexed-array scalar support to ynl-gen-c
Donald Hunter [Tue, 11 Feb 2025 12:01:24 +0000 (12:01 +0000)]
tools/net/ynl: add indexed-array scalar support to ynl-gen-c

Extend ynl-gen-c.py with support for indexed-array that has a scalar
sub-type.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-8-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: sanitise enums with leading digits in ynl-gen-c
Donald Hunter [Tue, 11 Feb 2025 12:01:23 +0000 (12:01 +0000)]
tools/net/ynl: sanitise enums with leading digits in ynl-gen-c

Turn attribute names with leading digits into valid C names by
prepending an underscore, e.g. 5ghz -> _5ghz

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-7-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: add s8, s16 to valid scalars in ynl-gen-c
Donald Hunter [Tue, 11 Feb 2025 12:01:22 +0000 (12:01 +0000)]
tools/net/ynl: add s8, s16 to valid scalars in ynl-gen-c

Add the missing s8 and s16 scalar types to the list of recognised
scalars in ynl-gen-c.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-6-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: accept IP string inputs
Donald Hunter [Tue, 11 Feb 2025 12:01:21 +0000 (12:01 +0000)]
tools/net/ynl: accept IP string inputs

The ynl tool uses display-hint to know when to format IP addresses in
printed output, but not to parse IP addresses from --json input. Add
support for parsing ipv4 and ipv6 strings.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: support rendering C array members to strings
Donald Hunter [Tue, 11 Feb 2025 12:01:20 +0000 (12:01 +0000)]
tools/net/ynl: support rendering C array members to strings

The nl80211 family encodes the list of supported ciphers as a C array of
u32 values. Add support for translating arrays of scalars into strings
for enum names and display hints.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: support decoding indexed arrays as enums
Donald Hunter [Tue, 11 Feb 2025 12:01:19 +0000 (12:01 +0000)]
tools/net/ynl: support decoding indexed arrays as enums

When decoding an indexed-array with a scalar subtype, it is currently
only possible to add a display-hint. Add support for decoding each value
as an enum.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agotools/net/ynl: remove extraneous plural from variable names
Donald Hunter [Tue, 11 Feb 2025 12:01:18 +0000 (12:01 +0000)]
tools/net/ynl: remove extraneous plural from variable names

_decode_array_attr() uses variable subattrs in every branch when only
one branch decodes more than a single attribute.

Change the variable name to subattr in the branches that only decode a
single attribute so that the intent is more obvious.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250211120127.84858-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch 'net: dsa: add support for phylink managed EEE'
Jakub Kicinski [Thu, 13 Feb 2025 02:20:06 +0000 (18:20 -0800)]
Merge branch 'net: dsa: add support for phylink managed EEE'

Russell King says:

====================
net: dsa: add support for phylink managed EEE

This series adds support for phylink managed EEE to DSA, and converts
mt753x to make use of this feature.

Patch 1 implements a helper to indicate whether the MAC LPI operations
are populated (suggested by Vladimir)

Patch 2 makes the necessary changes to the core code - we retain calling
set_mac_eee(), but this method now becomes a way to merely validate the
arguments when using phylink managed EEE rather than performing any
configuration.

Patch 3 converts the mt7530 driver to use phylink managed EEE.
====================

Link: https://patch.msgid.link/Z6nWujbjxlkzK_3P@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: dsa: mt7530: convert to phylink managed EEE
Russell King (Oracle) [Mon, 10 Feb 2025 10:36:54 +0000 (10:36 +0000)]
net: dsa: mt7530: convert to phylink managed EEE

Convert mt7530 to use phylink managed EEE. When enabling EEE, we set
both PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 irrespective of the speed,
and clear them both when disabling.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thR9q-003vXI-Cp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: dsa: allow use of phylink managed EEE support
Russell King (Oracle) [Mon, 10 Feb 2025 10:36:49 +0000 (10:36 +0000)]
net: dsa: allow use of phylink managed EEE support

In order to allow DSA drivers to use phylink managed EEE, we need to
change the behaviour of the DSA's .set_eee() ethtool method.
Implementation of the DSA .set_mac_eee() method becomes optional with
phylink managed EEE as it is only used to validate the EEE parameters
supplied from userspace. The rest of the EEE state management should
be left to phylink.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thR9l-003vXC-9F@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: phylink: provide phylink_mac_implements_lpi()
Russell King (Oracle) [Mon, 10 Feb 2025 10:36:44 +0000 (10:36 +0000)]
net: phylink: provide phylink_mac_implements_lpi()

Provide a helper to determine whether the MAC operations structure
implements the LPI operations, which will be used by both phylink and
DSA.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thR9g-003vX6-4s@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch 'eth-fbnic-report-software-queue-stats'
Jakub Kicinski [Thu, 13 Feb 2025 00:38:03 +0000 (16:38 -0800)]
Merge branch 'eth-fbnic-report-software-queue-stats'

Jakub Kicinski says:

====================
eth: fbnic: report software queue stats

Fill in typical software queue stats.

  # ./pyynl/cli.py --spec netlink/specs/netdev.yaml --dump qstats-get
  [{'ifindex': 2,
    'rx-alloc-fail': 0,
    'rx-bytes': 398064076,
    'rx-csum-complete': 271,
    'rx-csum-none': 0,
    'rx-packets': 276044,
    'tx-bytes': 7223770,
    'tx-needs-csum': 28148,
    'tx-packets': 28449,
    'tx-stop': 0,
    'tx-wake': 0}]

Note that we don't collect csum-unnecessary, just the uncommon
cases (and unnecessary is all the rest of the packets). There
is no programatic use for these stats AFAIK, just manual debug.
====================

Link: https://patch.msgid.link/20250211181356.580800-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoeth: fbnic: re-sort the objects in the Makefile
Jakub Kicinski [Tue, 11 Feb 2025 18:13:56 +0000 (10:13 -0800)]
eth: fbnic: re-sort the objects in the Makefile

Looks like recent commit broke the sort order, fix it.

Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoeth: fbnic: report software Tx queue stats
Jakub Kicinski [Tue, 11 Feb 2025 18:13:55 +0000 (10:13 -0800)]
eth: fbnic: report software Tx queue stats

Gather and report software Tx queue stats - checksum stats
and queue stop / start.

Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoeth: fbnic: report software Rx queue stats
Jakub Kicinski [Tue, 11 Feb 2025 18:13:54 +0000 (10:13 -0800)]
eth: fbnic: report software Rx queue stats

Gather and report software Rx queue stats - checksum stats
and allocation failures.

Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoeth: fbnic: wrap tx queue stats in a struct
Jakub Kicinski [Tue, 11 Feb 2025 18:13:53 +0000 (10:13 -0800)]
eth: fbnic: wrap tx queue stats in a struct

The queue stats struct is used for Rx and Tx queues. Wrap
the Tx stats in a struct and a union, so that we can reuse
the same space for Rx stats on Rx queues.

This also makes it easy to add an assert to the stat handling
code to catch new stats not being aggregated on shutdown.

Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: report csum_complete via qstats
Jakub Kicinski [Tue, 11 Feb 2025 18:13:52 +0000 (10:13 -0800)]
net: report csum_complete via qstats

Commit 13c7c941e729 ("netdev: add qstat for csum complete") reserved
the entry for csum complete in the qstats uAPI. Start reporting this
value now that we have a driver which needs it.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agobcachefs: Reuse transaction
Alan Huang [Wed, 12 Feb 2025 09:27:51 +0000 (17:27 +0800)]
bcachefs: Reuse transaction

bch2_nocow_write_convert_unwritten is already in transaction context:

00191 ========= TEST   generic/648
00242 kernel BUG at fs/bcachefs/btree_iter.c:3332!
00242 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
00242 Modules linked in:
00242 CPU: 4 UID: 0 PID: 2593 Comm: fsstress Not tainted 6.13.0-rc3-ktest-g345af8f855b7 #14403
00242 Hardware name: linux,dummy-virt (DT)
00242 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00242 pc : __bch2_trans_get+0x120/0x410
00242 lr : __bch2_trans_get+0xcc/0x410
00242 sp : ffffff80d89af600
00242 x29: ffffff80d89af600 x28: ffffff80ddb23000 x27: 00000000fffff705
00242 x26: ffffff80ddb23028 x25: ffffff80d8903fe0 x24: ffffff80ebb30168
00242 x23: ffffff80c8aeb500 x22: 000000000000005d x21: ffffff80d8904078
00242 x20: ffffff80d8900000 x19: ffffff80da9e8000 x18: 0000000000000000
00242 x17: 64747568735f6c61 x16: 6e72756f6a20726f x15: 0000000000000028
00242 x14: 0000000000000004 x13: 000000000000f787 x12: ffffffc081bbcdc8
00242 x11: 0000000000000000 x10: 0000000000000003 x9 : ffffffc08094efbc
00242 x8 : 000000001092c111 x7 : 000000000000000c x6 : ffffffc083c31fc4
00242 x5 : ffffffc083c31f28 x4 : ffffff80c8aeb500 x3 : ffffff80ebb30000
00242 x2 : 0000000000000001 x1 : 0000000000000a21 x0 : 000000000000028e
00242 Call trace:
00242  __bch2_trans_get+0x120/0x410 (P)
00242  bch2_inum_offset_err_msg+0x48/0xb0
00242  bch2_nocow_write_convert_unwritten+0x3d0/0x530
00242  bch2_nocow_write+0xeb0/0x1000
00242  __bch2_write+0x330/0x4e8
00242  bch2_write+0x1f0/0x530
00242  bch2_direct_write+0x530/0xc00
00242  bch2_write_iter+0x160/0xbe0
00242  vfs_write+0x1cc/0x360
00242  ksys_write+0x5c/0xf0
00242  __arm64_sys_write+0x20/0x30
00242  invoke_syscall.constprop.0+0x54/0xe8
00242  do_el0_svc+0x44/0xc0
00242  el0_svc+0x34/0xa0
00242  el0t_64_sync_handler+0x104/0x130
00242  el0t_64_sync+0x154/0x158
00242 Code: 6b01001f 54ffff01 79408460 3617fec0 (d4210000)
00242 ---[ end trace 0000000000000000 ]---
00242 Kernel panic - not syncing: Oops - BUG: Fatal exception

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 months agobcachefs: Pass _orig_restart_count to trans_was_restarted
Alan Huang [Wed, 12 Feb 2025 18:11:01 +0000 (02:11 +0800)]
bcachefs: Pass _orig_restart_count to trans_was_restarted

_orig_restart_count is unused now, according to the logic, trans_was_restarted
should be using _orig_restart_count.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 months agobcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTS
Kent Overstreet [Tue, 24 Sep 2024 02:12:31 +0000 (22:12 -0400)]
bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTS

Incorrectly handled transaction restarts can be a source of heisenbugs;
add a mode where we randomly inject them to shake them out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 months agoMerge tag 'mfd-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Wed, 12 Feb 2025 22:22:37 +0000 (14:22 -0800)]
Merge tag 'mfd-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD fix from Lee Jones:

 - Fix syscon users not specifying the "syscon" compatible

* tag 'mfd-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: syscon: Restore device_node_to_regmap() for non-syscon nodes

8 months agoMerge branch 'use-phylib-for-reset-randomization-and-adjustable-polling'
Jakub Kicinski [Wed, 12 Feb 2025 18:49:15 +0000 (10:49 -0800)]
Merge branch 'use-phylib-for-reset-randomization-and-adjustable-polling'

Oleksij Rempel says:

====================
Use PHYlib for reset randomization and adjustable polling

This patch set tackles a DP83TG720 reset lock issue and improves PHY
polling. Rather than adding a separate polling worker to randomize PHY
resets, I chose to extend the PHYlib framework - which already handles
most of the needed functionality - with adjustable polling. This
approach not only addresses the DP83TG720-specific problem (where
synchronized resets can lock the link) but also lays the groundwork for
optimizing PHY stats polling across all PHY drivers. With generic PHY
stats coming in, we can adjust the polling interval based on hardware
characteristics, such as using longer intervals for PHYs with stable HW
counters or shorter ones for high-speed links prone to counter
overflows.

Patch version changes are tracked in separate patches.
====================

Link: https://patch.msgid.link/20250210082358.200751-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: phy: dp83tg720: Add randomized polling intervals for link detection
Oleksij Rempel [Mon, 10 Feb 2025 08:23:58 +0000 (09:23 +0100)]
net: phy: dp83tg720: Add randomized polling intervals for link detection

Address the limitations of the DP83TG720 PHY, which cannot reliably
detect or report a stable link state. To handle this, the PHY must be
periodically reset when the link is down. However, synchronized reset
intervals between the PHY and its link partner can result in a deadlock,
preventing the link from re-establishing.

This change introduces a randomized polling interval when the link is
down to desynchronize resets between link partners.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250210082358.200751-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet: phy: Add support for driver-specific next update time
Oleksij Rempel [Mon, 10 Feb 2025 08:23:57 +0000 (09:23 +0100)]
net: phy: Add support for driver-specific next update time

Introduce the `phy_get_next_update_time` function to allow PHY drivers
to dynamically determine the time (in jiffies) until the next state
update event. This enables more flexible and adaptive polling intervals
based on the link state or other conditions.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250210082358.200751-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch 'rate-management-on-traffic-classes-misc'
Jakub Kicinski [Wed, 12 Feb 2025 18:46:18 +0000 (10:46 -0800)]
Merge branch 'rate-management-on-traffic-classes-misc'

Tariq Toukan says:

====================
mlx5 misc

Patches 1-3 by William reduce the memory consumption for representors to
achieve better scalability.

Patches 4-5 by Akiva expose ICM memory consumption per function.

Patches 6-8 expose helpful information on RSS resources in devlink RX
reporter diagnose.

Patches 9-10 are simple enhancements by Alex Lazar.
====================

Link: https://patch.msgid.link/20250209101716.112774-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: XDP, Enable TX side XDP multi-buffer support
Alexei Lazar [Sun, 9 Feb 2025 10:17:16 +0000 (12:17 +0200)]
net/mlx5: XDP, Enable TX side XDP multi-buffer support

In XDP scenarios, fragmented packets can occur if the MTU is larger
than the page size, even when the packet size fits within the linear
part.
If XDP multi-buffer support is disabled, the fragmented part won't be
handled in the TX flow, leading to packet drops.

Since XDP multi-buffer support is always available, this commit removes
the conditional check for enabling it.
This ensures that XDP multi-buffer support is always enabled,
regardless of the `is_xdp_mb` parameter, and guarantees the handling of
fragmented packets in such scenarios.

Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-16-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
Alexei Lazar [Sun, 9 Feb 2025 10:17:15 +0000 (12:17 +0200)]
net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB

Current loopback test validation ignores non-linear SKB case in
the SKB access, which can lead to failures in scenarios such as
when HW GRO is enabled.
Linearize the SKB so both cases will be handled.

Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: Expose RSS via devlink rx reporter diagnose
Amir Tzin [Sun, 9 Feb 2025 10:17:14 +0000 (12:17 +0200)]
net/mlx5e: Expose RSS via devlink rx reporter diagnose

Underneath "rx resources" tag expose RSS diagnostic information. For
each RSS expose its rqtn, TIRs and inner TIRs.

$ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx
 .......
RSS:
    Index: 0 rqtn: 0
    TIRs Numbers:
        tt: TT_IPV4_TCP tirn: 0
        tt: TT_IPV6_TCP tirn: 1
        tt: TT_IPV4_UDP tirn: 2
        tt: TT_IPV6_UDP tirn: 3
        tt: TT_IPV4_IPSEC_AH tirn: 4
        tt: TT_IPV6_IPSEC_AH tirn: 5
        tt: TT_IPV4_IPSEC_ESP tirn: 6
        tt: TT_IPV6_IPSEC_ESP tirn: 7
        tt: TT_IPV4 tirn: 8
        tt: TT_IPV6 tirn: 9
    Inner TIRs Numbers:
        tt: TT_IPV4_TCP tirn: 10
        tt: TT_IPV6_TCP tirn: 11
        tt: TT_IPV4_UDP tirn: 12
        tt: TT_IPV6_UDP tirn: 13
        tt: TT_IPV4_IPSEC_AH tirn: 14
        tt: TT_IPV6_IPSEC_AH tirn: 15
        tt: TT_IPV4_IPSEC_ESP tirn: 16
        tt: TT_IPV6_IPSEC_ESP tirn: 17
        tt: TT_IPV4 tirn: 18
        tt: TT_IPV6 tirn: 19
    Index: 2 rqtn: 27
    TIRs Numbers:
        tt: TT_IPV6_TCP tirn: 46

Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: Add direct TIRs to devlink rx reporter diagnose
Amir Tzin [Sun, 9 Feb 2025 10:17:13 +0000 (12:17 +0200)]
net/mlx5e: Add direct TIRs to devlink rx reporter diagnose

Add "RX resources" tag to the output of rx reporter diagnose callback.
Underneath add tag for direct TIRs, for each TIR expose its tirn and
the corresponding rqtn.

$ sudo devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx
 ....
 rx resources:
   Direct TIRs:
       ix: 0 tirn: 20 rqtn: 1
       ix: 1 tirn: 21 rqtn: 2
       ix: 2 tirn: 22 rqtn: 3
       ix: 3 tirn: 23 rqtn: 4
       ix: 4 tirn: 24 rqtn: 5
       ix: 5 tirn: 25 rqtn: 6
       ix: 6 tirn: 26 rqtn: 7
       ix: 7 tirn: 27 rqtn: 8
       ix: 8 tirn: 28 rqtn: 9
       ix: 9 tirn: 29 rqtn: 10
       ix: 10 tirn: 30 rqtn: 11
       ix: 11 tirn: 31 rqtn: 12
       ix: 12 tirn: 32 rqtn: 13
       ix: 13 tirn: 33 rqtn: 14
       ix: 14 tirn: 34 rqtn: 15
       ix: 15 tirn: 35 rqtn: 16
       ix: 16 tirn: 36 rqtn: 17
       ix: 17 tirn: 37 rqtn: 18
       ix: 18 tirn: 38 rqtn: 19
       ix: 19 tirn: 39 rqtn: 20
       ix: 20 tirn: 40 rqtn: 21
       ix: 21 tirn: 41 rqtn: 22
       ix: 22 tirn: 42 rqtn: 23
       ix: 23 tirn: 43 rqtn: 24

Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: Move RQs diagnose to a dedicated function
Amir Tzin [Sun, 9 Feb 2025 10:17:12 +0000 (12:17 +0200)]
net/mlx5e: Move RQs diagnose to a dedicated function

Move rx reporter RQs diagnose from mlx5e_rx_reporter_diagnose() to a
dedicated function. This change is a preparation for the following
series which extends diagnose output for the rx reporter. While at it,
also pass a mlx5e_priv pointer to
mlx5e_rx_reporter_diagnose_common_config() as this is the argument the
latter actually needs.

Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Expose ICM consumption per function
Akiva Goldberger [Sun, 9 Feb 2025 10:17:11 +0000 (12:17 +0200)]
net/mlx5: Expose ICM consumption per function

ICM is a portion of the host's memory assigned to a function by the OS
through requests made by the NIC's firmware.

PF ICM consumption can be accessed directly, while VF/SF ICM consumption
can be accessed through their representors in switchdev mode.

The value is exposed to the user in granularity of 4KB through the vnic
health reporter as follows:

$ devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
     total_error_queues: 0 send_queue_priority_update_flow: 0
     comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
     invalid_command: 0 quota_exceeded_command: 0
     nic_receive_steering_discard: 0 icm_consumption: 1032

Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5: Rename and move mlx5_esw_query_vport_vhca_id
Akiva Goldberger [Sun, 9 Feb 2025 10:17:10 +0000 (12:17 +0200)]
net/mlx5: Rename and move mlx5_esw_query_vport_vhca_id

Rename mlx5_esw_query_vport_vhca_id to mlx5_vport_get_vhca_id and move
it to vport file. Also, add function declaration to mlx5_core header
file. This better represents the function's usage and allows for it to
be called from other parts of the mlx5_core driver.

Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: set the tx_queue_len for pfifo_fast
William Tu [Sun, 9 Feb 2025 10:17:09 +0000 (12:17 +0200)]
net/mlx5e: set the tx_queue_len for pfifo_fast

By default, the mq netdev creates a pfifo_fast qdisc. On a
system with 16 core, the pfifo_fast with 3 bands consumes
16 * 3 * 8 (size of pointer) * 1024 (default tx queue len)
= 393KB. The patch sets the tx qlen to representor default
value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which
consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each
representor at ECPF.

Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: reduce rep rxq depth to 256 for ECPF
William Tu [Sun, 9 Feb 2025 10:17:08 +0000 (12:17 +0200)]
net/mlx5e: reduce rep rxq depth to 256 for ECPF

By experiments, a single queue representor netdev consumes kernel
memory around 2.8MB, and 1.8MB out of the 2.8MB is due to page
pool for the RXQ. Scaling to a thousand representors consumes 2.8GB,
which becomes a memory pressure issue for embedded devices such as
BlueField-2 16GB / BlueField-3 32GB memory.

Since representor netdevs mostly handles miss traffic, and ideally,
most of the traffic will be offloaded, reduce the default non-uplink
rep netdev's RXQ default depth from 1024 to 256 if mdev is ecpf eswitch
manager. This saves around 1MB of memory per regular RQ,
(1024 - 256) * 2KB, allocated from page pool.

With rxq depth of 256, the netlink page pool tool reports
$./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
 --dump page-pool-get
 {'id': 277,
  'ifindex': 9,
  'inflight': 128,
  'inflight-mem': 786432,
  'napi-id': 775}]

This is due to mtu 1500 + headroom consumes half pages, so 256 rxq
entries consumes around 128 pages (thus create a page pool with
size 128), shown above at inflight.

Note that each netdev has multiple types of RQs, including
Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representor
only supports regular rq, this patch only changes the regular RQ's
default depth.

Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agonet/mlx5e: reduce the max log mpwrq sz for ECPF and reps
William Tu [Sun, 9 Feb 2025 10:17:07 +0000 (12:17 +0200)]
net/mlx5e: reduce the max log mpwrq sz for ECPF and reps

For the ECPF and representors, reduce the max MPWRQ size from 256KB (18)
to 128KB (17). This prepares the later patch for saving representor
memory.

With Striding RQ, there is a minimum of 4 MPWQEs. So with 128KB of max
MPWRQ size, the minimal memory is 4 * 128KB = 512KB. When creating page
pool, consider 1500 mtu, the minimal page pool size will be 512KB/4KB =
128 pages = 256 rx ring entries (2 entries per page).

Before this patch, setting RX ringsize (ethtool -G rx) to 256 causes
driver to allocate page pool size more than it needs due to max MPWRQ
is 256KB (18). Ex: 4 * 256KB = 1MB, 1MB/4KB = 256 pages, but actually
128 pages is good enough. Reducing the max MPWRQ to 128KB fixes the
limitation.

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoplatform/x86: thinkpad_acpi: Fix registration of tpacpi platform driver
Mark Pearson [Tue, 11 Feb 2025 17:36:11 +0000 (12:36 -0500)]
platform/x86: thinkpad_acpi: Fix registration of tpacpi platform driver

The recent platform profile changes prevent the tpacpi platform driver
from registering. This error is seen in the kernel logs, and the
various tpacpi entries are not created:

[ 7550.642171] platform thinkpad_acpi: Resources present before probing

This happens because devm_platform_profile_register() is called before
tpacpi_pdev probes (thanks to Kurt Borja for identifying the root
cause).

For now revert back to the old platform_profile_register to fix the
issue. This is quick fix and will be re-implemented later as more
testing is needed for full solution.

Tested on X1 Carbon G12.

Fixes: 31658c916fa6 ("platform/x86: thinkpad_acpi: Use devm_platform_profile_register()")
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Reviewed-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250211173620.16522-1-mpearson-lenovo@squebb.ca
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
8 months agoRevert "netfilter: flowtable: teardown flow if cached mtu is stale"
Pablo Neira Ayuso [Fri, 7 Feb 2025 12:25:57 +0000 (13:25 +0100)]
Revert "netfilter: flowtable: teardown flow if cached mtu is stale"

This reverts commit b8baac3b9c5cc4b261454ff87d75ae8306016ffd.

IPv4 packets with no DF flag set on result in frequent flow entry
teardown cycles, this is visible in the network topology that is used in
the nft_flowtable.sh test.

nft_flowtable.sh test ocassionally fails reporting that the dscp_fwd
test sees no packets going through the flowtable path.

Fixes: b8baac3b9c5c ("netfilter: flowtable: teardown flow if cached mtu is stale")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Wed, 12 Feb 2025 03:51:16 +0000 (19:51 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-02-10 (ice, igc, e1000e)

For ice:

Karol, Jake, and Michal add PTP support for E830 devices. Karol
refactors and cleans up PTP code. Jake allows for a common
cross-timestamp implementation to be shared for all devices and
Michal adds E830 support.

Mateusz cleans up initial Flow Director rule creation to loop rather
than duplicate repeated similar calls.

For igc:

Siang adjust calls to remove need for close and open calls on loading
XDP program.

For e1000e:

Gerhard Engleder batches register writes for writing multicast table
on real-time kernels.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  e1000e: Fix real-time violations on link up
  igc: Avoid unnecessary link down event in XDP_SETUP_PROG process
  ice: refactor ice_fdir_create_dflt_rules() function
  ice: Implement PTP support for E830 devices
  ice: Refactor ice_ptp_init_tx_*
  ice: Add unified ice_capture_crosststamp
  ice: Process TSYN IRQ in a separate function
  ice: Use FIELD_PREP for timestamp values
  ice: Remove unnecessary ice_is_e8xx() functions
  ice: Don't check device type when checking GNSS presence
====================

Link: https://patch.msgid.link/20250210192352.3799673-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoiavf: Fix a locking bug in an error path
Bart Van Assche [Thu, 6 Feb 2025 17:51:08 +0000 (09:51 -0800)]
iavf: Fix a locking bug in an error path

If the netdev lock has been obtained, unlock it before returning.
This bug has been detected by the Clang thread-safety analyzer.

Fixes: afc664987ab3 ("eth: iavf: extend the netdev_lock usage")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20250206175114.1974171-28-bvanassche@acm.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agoMerge branch 'sfc-support-devlink-flash'
Jakub Kicinski [Wed, 12 Feb 2025 01:45:45 +0000 (17:45 -0800)]
Merge branch 'sfc-support-devlink-flash'

Edward Cree says:

====================
sfc: support devlink flash

Allow upgrading device firmware on Solarflare NICs through standard tools.
====================

Link: https://patch.msgid.link/cover.1739186252.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 months agosfc: document devlink flash support
Edward Cree [Mon, 10 Feb 2025 11:25:45 +0000 (11:25 +0000)]
sfc: document devlink flash support

Update the information in sfc's devlink documentation including
 support for firmware update with devlink flash.
Also update the help text for CONFIG_SFC_MTD, as it is no longer
 strictly required for firmware updates.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>