Each sample script sources functions.sh before parameters.sh
which makes $APPEND undefined when trapping EXIT no matter in
append mode or not. Due to this when sample scripts finished
they always do "pgctrl reset" which resets pktgen config.
So move trap to each script after sourcing parameters.sh
and trap EXIT explicitly.
Signed-off-by: J.J. Martzki <mars14850@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Díaz [Sat, 1 Jul 2023 04:41:03 +0000 (22:41 -0600)]
selftests/net: Add xt_policy config for xfrm_policy test
When running Kselftests with the current selftests/net/config
the following problem can be seen with the net:xfrm_policy.sh
selftest:
# selftests: net: xfrm_policy.sh
[ 41.076721] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 41.094787] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 41.107635] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
# modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
# iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
# Perhaps iptables or your kernel needs to be upgraded.
# modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
# iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
# Perhaps iptables or your kernel needs to be upgraded.
# SKIP: Could not insert iptables rule
ok 1 selftests: net: xfrm_policy.sh # SKIP
This is because IPsec "policy" match support is not available
to the kernel.
This patch adds CONFIG_NETFILTER_XT_MATCH_POLICY as a module
to the selftests/net/config file, so that `make
kselftest-merge` can take this into consideration.
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 66e4c8d95008 ("net: warn if transport header was not set") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 30 Jun 2023 22:20:10 +0000 (01:20 +0300)]
net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode
There was a regression introduced by the blamed commit, where pinging to
a VLAN-unaware bridge would fail with the repeated message "Couldn't
decode source port" coming from the tagging protocol driver.
When receiving packets with a bridge_vid as determined by
dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode:
- source_port = 0 (which isn't really valid, more like "don't know")
- switch_id = 0 (which isn't really valid, more like "don't know")
- vbid = value in range 1-7
Since the blamed patch has reversed the order of the checks, we are now
going to believe that source_port != -1 and switch_id != -1, so they're
valid, but they aren't.
The minimal solution to the problem is to only populate source_port and
switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero,
i.e. the source port information is trustworthy.
Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 30 Jun 2023 16:41:18 +0000 (19:41 +0300)]
net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
According to the synchronization rules for .ndo_get_stats() as seen in
Documentation/networking/netdevices.rst, acquiring a plain spin_lock()
should not be illegal, but the bridge driver implementation makes it so.
After running these commands, I am being faced with the following
lockdep splat:
$ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up
$ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set macsec0 master br0 && ip link set macsec0 up
========================================================
WARNING: possible irq lock inversion dependency detected 6.4.0-04295-g31b577b4bd4a #603 Not tainted
--------------------------------------------------------
swapper/1/0 just changed the state of lock: ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&ocelot->stats_lock){+.+.}-{3:3}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&br->lock --> &br->hash_lock --> &ocelot->stats_lock
swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this
only matters to the extent that its .ndo_get_stats64() method calls
spin_lock(&ocelot->stats_lock).
Documentation/locking/lockdep-design.rst says:
| A lock is irq-safe means it was ever used in an irq context, while a lock
| is irq-unsafe means it was ever acquired with irq enabled.
(...)
| Furthermore, the following usage based lock dependencies are not allowed
| between any two lock-classes::
|
| <hardirq-safe> -> <hardirq-unsafe>
| <softirq-safe> -> <softirq-unsafe>
Lockdep marks br->hash_lock as softirq-safe, because it is sometimes
taken in softirq context (for example br_fdb_update() which runs in
NET_RX softirq), and when it's not in softirq context it blocks softirqs
by using spin_lock_bh().
Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never
blocks softirqs from running, and it is never taken from softirq
context. So it can always be interrupted by softirqs.
There is a call path through which a function that holds br->hash_lock:
fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock:
ocelot_port_get_stats64(). This can be seen below:
The macsec0 bridge port is created without p->flags & BR_PROMISC,
because it is what br_manage_promisc() decides for a VLAN filtering
bridge with a single auto port.
As part of the br_add_if() procedure, br_fdb_add_local() is called for
the MAC address of the device, and this results in a call to
dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken.
Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up
calling __dev_set_promiscuity() for macsec0, which is propagated by its
implementation, macsec_dev_change_rx_flags(), to the lower device: swp0.
This triggers the call path:
with a calling context that lockdep doesn't like (br->hash_lock held).
Normally we don't see this, because even though many drivers that can be
bridge ports don't support IFF_UNICAST_FLT, we need a driver that
(a) doesn't support IFF_UNICAST_FLT, *and*
(b) it forwards the IFF_PROMISC flag to another driver, and
(c) *that* driver implements ndo_get_stats64() using a softirq-unsafe
spinlock.
Condition (b) is necessary because the first __dev_set_rx_mode() calls
__dev_set_promiscuity() with "bool notify=false", and thus, the
rtmsg_ifinfo() code path won't be entered.
The same criteria also hold true for DSA switches which don't report
IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its
ndo_get_stats64() method, the same lockdep splat can be seen.
I think the deadlock possibility is real, even though I didn't reproduce
it, and I'm thinking of the following situation to support that claim:
fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally
disabled and br->hash_lock held, and may end up attempting to acquire
ocelot->stats_lock.
In parallel, ocelot->stats_lock is currently held by a thread B (say,
ocelot_check_stats_work()), which is interrupted while holding it by a
softirq which attempts to lock br->hash_lock.
Thread B cannot make progress because br->hash_lock is held by A. Whereas
thread A cannot make progress because ocelot->stats_lock is held by B.
When taking the issue at face value, the bridge can avoid that problem
by simply making the ports promiscuous from a code path with a saner
calling context (br->hash_lock not held). A bridge port without
IFF_UNICAST_FLT is going to become promiscuous as soon as we call
dev_uc_add() on it (which we do unconditionally), so why not be
preemptive and make it promiscuous right from the beginning, so as to
not be taken by surprise.
With this, we've broken the links between code that holds br->hash_lock
or br->lock and code that calls into the ndo_change_rx_flags() or
ndo_get_stats64() ops of the bridge port.
Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:45 +0000 (11:58 +0530)]
octeontx2-af: Reset MAC features in FLR
AF driver configures MAC features like internal loopback and PFC upon
receiving the request from PF and its VF netdev. But these
features are not getting reset in FLR. This patch fixes the issue by
resetting the same.
Fixes: 23999b30ae67 ("octeontx2-af: Enable or disable CGX internal loopback") Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:44 +0000 (11:58 +0530)]
octeontx2-af: Add validation before accessing cgx and lmac
with the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous and CGX blocks are also
noncontiguous. But during RVU driver initialization, the driver
is assuming they are contiguous and trying to access
cgx or lmac with their id which is resulting in kernel panic.
This patch fixes the issue by adding proper checks.
Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:43 +0000 (11:58 +0530)]
octeontx2-af: Fix mapping for NIX block from CGX connection
Firmware configures NIX block mapping for all MAC blocks.
The current implementation reads the configuration and
creates the mapping between RVU PF and NIX blocks. But
this configuration is only valid for silicons that support
multiple blocks. For all other silicons, all MAC blocks
map to NIX0.
This patch corrects the mapping by adding a check for the same.
Fixes: c5a73b632b90 ("octeontx2-af: Map NIX block from CGX connection") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:42 +0000 (11:58 +0530)]
octeontx2-af: cn10kb: fix interrupt csr addresses
The current design is that, for asynchronous events like link_up and
link_down firmware raises the interrupt to kernel. The previous patch
which added RPM_USX driver has a bug where it uses old csr addresses
for configuring interrupts. Which is resulting in losing interrupts
from source firmware.
This patch fixes the issue by correcting csr addresses.
Fixes: b9d0fedc6234 ("octeontx2-af: cn10kb: Add RPM_USX MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 29 Jun 2023 21:47:53 +0000 (22:47 +0100)]
nvme-tcp: Fix comma-related oops
Fix a comma that should be a semicolon. The comma is at the end of an
if-body and thus makes the statement after (a bvec_set_page()) conditional
too, resulting in an oops because we didn't fill out the bio_vec[]:
docs: networking: Update codeaurora references for rmnet
source.codeaurora.org is no longer accessible and so the reference link
in the documentation is not useful. Use iproute2 instead as it has a
rmnet module for configuration.
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 30 Jun 2023 16:00:25 +0000 (09:00 -0700)]
docs: netdev: broaden mailbot to all MAINTAINERS
Reword slightly now that all MAINTAINERS have access to the commands.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The new LARA-R6 product variant identified by the "01B" string can be
configured (by AT interface) in three different USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial
interface and 1 CDC-ECM virtual network interface
The first 4 interfaces of all the 3 configurations (default, RmNet, ECM)
are the same.
In CDC-ECM mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: CDC-ECM interface
Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao [Fri, 30 Jun 2023 01:26:47 +0000 (09:26 +0800)]
mlxsw: minimal: fix potential memory leak in mlxsw_m_linecards_init
The line cards array is not freed in the error path of
mlxsw_m_linecards_init(), which can lead to a memory leak. Fix by
freeing the array in the error path, thereby making the error path
identical to mlxsw_m_linecards_fini().
Fixes: 01328e23a476 ("mlxsw: minimal: Extend module to port mapping with slot index") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230630012647.1078002-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nick Child [Wed, 28 Jun 2023 18:22:44 +0000 (13:22 -0500)]
ibmvnic: Do not reset dql stats on NON_FATAL err
All ibmvnic resets, make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_completed byte counters. These stats are used in Byte Queue
Limit (BQL) algorithms. The difference between these two stats tracks
the number of bytes currently sitting on the physical NIC. ibmvnic
increases the number of queued bytes though calls to
netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports
that it is done transmitting bytes, the ibmvnic device increases the
number of completed bytes through calls to netdev_tx_completed_queue().
It is important to note that the driver batches its transmit calls and
num_queued is increased every time that an skb is added to the next
batch, not necessarily when the batch is sent to VIOS for transmission.
Unlike other reset types, a NON FATAL reset will not flush the sub crq
tx buffers. Therefore, it is possible for the batched skb array to be
partially full. So if there is call to netdev_tx_reset_queue() when
re-opening the device, the value of num_queued (0) would not account
for the skb's that are currently batched. Eventually, when the batch
is sent to VIOS, the call to netdev_tx_completed_queue() would increase
num_completed to a value greater than the num_queued. This causes a
BUG_ON crash:
Therefore, do not reset the dql stats when performing a NON_FATAL reset.
Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Habets [Wed, 28 Jun 2023 12:32:20 +0000 (13:32 +0100)]
sfc: support for devlink port requires MAE access
On systems without MAE permission efx->mae is not initialised,
and trying to lookup an mport results in a NULL pointer
dereference.
Fixes: 25414b2a64ae ("sfc: add devlink port support for ef100") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthew Anderson [Sat, 24 Jun 2023 17:08:10 +0000 (12:08 -0500)]
Bluetooth: btusb: Add MT7922 bluetooth ID for the Asus Ally
Adding the device ID from the Asus Ally gets the bluetooth working
on the device.
Signed-off-by: Matthew Anderson <ruinairas1992@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Orlov [Tue, 20 Jun 2023 14:40:52 +0000 (16:40 +0200)]
Bluetooth: hci_sysfs: make bt_class a static const structure
Now that the driver core allows for struct class to be in read-only
memory, move the bt_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at load time.
Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: linux-bluetooth@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bluetooth: ISO: Rework sync_interval to be sync_factor
This rework sync_interval to be sync_factor as having sync_interval in
the order of seconds is sometimes not disarable.
Wit sync_factor the application can tell how many SDU intervals it wants
to send an announcement with PA, the EA interval is set to 2 times that
so a factor of 24 of BIG SDU interval of 10ms would look like the
following:
< HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25
Handle: 0x01
Properties: 0x0000
Min advertising interval: 480.000 msec (0x0300)
Max advertising interval: 480.000 msec (0x0300)
Channel map: 37, 38, 39 (0x07)
Own address type: Random (0x01)
Peer address type: Public (0x00)
Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00)
TX power: Host has no preference (0x7f)
Primary PHY: LE 1M (0x01)
Secondary max skip: 0x00
Secondary PHY: LE 2M (0x02)
SID: 0x00
Scan request notifications: Disabled (0x00)
< HCI Command: LE Set Periodic Advertising Parameters (0x08|0x003e) plen 7
Handle: 1
Min interval: 240.00 msec (0x00c0)
Max interval: 240.00 msec (0x00c0)
Properties: 0x0000
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bluetooth: MGMT: Fix marking SCAN_RSP as not connectable
When receiving a scan response there is no way to know if the remote
device is connectable or not, so when it cannot be merged don't
make any assumption and instead just mark it with a new flag defined as
MGMT_DEV_FOUND_SCAN_RSP so userspace can tell it is a standalone
SCAN_RSP.
Pauli Virtanen [Fri, 2 Jun 2023 21:28:12 +0000 (00:28 +0300)]
Bluetooth: hci_event: fix Set CIG Parameters error status handling
If the event has error status, return right error code and don't show
incorrect "response malformed" messages.
Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pauli Virtanen [Thu, 1 Jun 2023 06:34:43 +0000 (09:34 +0300)]
Bluetooth: ISO: use hci_sync for setting CIG parameters
When reconfiguring CIG after disconnection of the last CIS, LE Remove
CIG shall be sent before LE Set CIG Parameters. Otherwise, it fails
because CIG is in the inactive state and not configurable (Core v5.3
Vol 6 Part B Sec. 4.5.14.3). This ordering is currently wrong under
suitable timing conditions, because LE Remove CIG is sent via the
hci_sync queue and may be delayed, but Set CIG Parameters is via
hci_send_cmd.
Make the ordering well-defined by sending also Set CIG Parameters via
hci_sync.
Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johan Hovold [Fri, 2 Jun 2023 08:19:12 +0000 (10:19 +0200)]
Bluetooth: hci_bcm: do not mark valid bd_addr as invalid
A recent commit restored the original (and still documented) semantics
for the HCI_QUIRK_USE_BDADDR_PROPERTY quirk so that the device address
is considered invalid unless an address is provided by firmware.
This specifically means that this flag must only be set for devices with
invalid addresses, but the Broadcom driver has so far been setting this
flag unconditionally.
Fortunately the driver already checks for invalid addresses during setup
and sets the HCI_QUIRK_INVALID_BDADDR flag. Use this flag to indicate
when the address can be overridden by firmware (long term, this should
probably just always be allowed).
Fixes: 6945795bc81a ("Bluetooth: fix use-bdaddr-property quirk") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/lkml/ecef83c8-497f-4011-607b-a63c24764867@samsung.com Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sungwoo Kim [Wed, 31 May 2023 05:39:56 +0000 (01:39 -0400)]
Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
l2cap_sock_release(sk) frees sk. However, sk's children are still alive
and point to the already free'd sk's address.
To fix this, l2cap_sock_release(sk) also cleans sk's children.
==================================================================
BUG: KASAN: use-after-free in l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650
Read of size 8 at addr ffff888104617aa8 by task kworker/u3:0/276
The buggy address belongs to the object at ffff888104617800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 680 bytes inside of
1024-byte region [ffff888104617800, ffff888104617c00)
Ack: This bug is found by FuzzBT with a modified Syzkaller. Other
contributors are Ruoyu Wu and Hui Peng. Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johan Hovold [Wed, 31 May 2023 09:04:24 +0000 (11:04 +0200)]
Bluetooth: fix use-bdaddr-property quirk
Devices that lack persistent storage for the device address can indicate
this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller
to be marked as unconfigured until user space has set a valid address.
The related HCI_QUIRK_USE_BDADDR_PROPERTY was later added to similarly
indicate that the device lacks a valid address but that one may be
specified in the devicetree.
As is clear from commit 7a0e5b15ca45 ("Bluetooth: Add quirk for reading
BD_ADDR from fwnode property") that added and documented this quirk and
commits like de79a9df1692 ("Bluetooth: btqcomsmd: use
HCI_QUIRK_USE_BDADDR_PROPERTY"), the device address of controllers with
this flag should be treated as invalid until user space has had a chance
to configure the controller in case the devicetree property is missing.
As it does not make sense to allow controllers with invalid addresses,
restore the original semantics, which also makes sure that the
implementation is consistent (e.g. get_missing_options() indicates that
the address must be set) and matches the documentation (including
comments in the code, such as, "In case any of them is set, the
controller has to start up as unconfigured.").
Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johan Hovold [Wed, 31 May 2023 09:04:23 +0000 (11:04 +0200)]
Bluetooth: fix invalid-bdaddr quirk for non-persistent setup
Devices that lack persistent storage for the device address can indicate
this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller
to be marked as unconfigured until user space has set a valid address.
Once configured, the device address must be set on every setup for
controllers with HCI_QUIRK_NON_PERSISTENT_SETUP to avoid marking the
controller as unconfigured and requiring the address to be set again.
Fixes: 740011cfe948 ("Bluetooth: Add new quirk for non-persistent setup settings") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhengping Jiang [Thu, 25 May 2023 00:04:15 +0000 (17:04 -0700)]
Bluetooth: L2CAP: Fix use-after-free
Fix potential use-after-free in l2cap_le_command_rej.
Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Min-Hua Chen [Fri, 19 May 2023 10:43:23 +0000 (18:43 +0800)]
Bluetooth: btqca: use le32_to_cpu for ver.soc_id
Use le32_to_cpu for ver.soc_id to fix the following
sparse warning.
drivers/bluetooth/btqca.c:640:24: sparse: warning: restricted
__le32 degrades to integer
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dan Gora <dan.gora@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add missing MODULE_FIRMWARE declarations for firmware referenced in
btrtl.c.
Signed-off-by: Dan Gora <dan.gora@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Fix PTP received on wrong port with bridged SJA1105 DSA
Since the changes were made to tag_8021q to support imprecise RX for
bridged ports, the tag_sja1105 driver still prefers the source port
information deduced from the VLAN headers for link-local traffic, even
though the switch can theoretically do better and report the precise
source port.
The problem is that the tagger doesn't know when to trust one source of
information over another, because the INCL_SRCPT option (to "tag" link
local frames) is sometimes enabled and sometimes it isn't.
The first patch makes the switch provide the hardware tag for link local
traffic under all circumstances, and the second patch makes the tagger
always use that hardware tag as primary source of information for link
local packets.
====================
Vladimir Oltean [Tue, 27 Jun 2023 09:42:07 +0000 (12:42 +0300)]
net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT
Currently the sja1105 tagging protocol prefers using the source port
information from the VLAN header if that is available, falling back to
the INCL_SRCPT option if it isn't. The VLAN header is available for all
frames except for META frames initiated by the switch (containing RX
timestamps), and thus, the "if (is_link_local)" branch is practically
dead.
The tag_8021q source port identification has become more loose
("imprecise") and will report a plausible rather than exact bridge port,
when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local
traffic always needs to know the precise source port. With incorrect
source port reporting, for example PTP traffic over 2 bridged ports will
all be seen on sockets opened on the first such port, which is incorrect.
Now that the tagging protocol has been changed to make link-local frames
always contain source port information, we can reverse the order of the
checks so that we always give precedence to that information (which is
always precise) in lieu of the tag_8021q VID which is only precise for a
standalone port.
Fixes: d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21fcec ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 27 Jun 2023 09:42:06 +0000 (12:42 +0300)]
net: dsa: sja1105: always enable the INCL_SRCPT option
Link-local traffic on bridged SJA1105 ports is sometimes tagged by the
hardware with source port information (when the port is under a VLAN
aware bridge).
The tag_8021q source port identification has become more loose
("imprecise") and will report a plausible rather than exact bridge port,
when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local
traffic always needs to know the precise source port.
Modify the driver logic (and therefore: the tagging protocol itself) to
always include the source port information with link-local packets,
regardless of whether the port is standalone, under a VLAN-aware or
VLAN-unaware bridge. This makes it possible for the tagging driver to
give priority to that information over the tag_8021q VLAN header.
The big drawback with INCL_SRCPT is that it makes it impossible to
distinguish between an original MAC DA of 01:80:C2:XX:YY:ZZ and
01:80:C2:AA:BB:ZZ, because the tagger just patches MAC DA bytes 3 and 4
with zeroes. Only if PTP RX timestamping is enabled, the switch will
generate a META follow-up frame containing the RX timestamp and the
original bytes 3 and 4 of the MAC DA. Those will be used to patch up the
original packet. Nonetheless, in the absence of PTP RX timestamping, we
have to live with this limitation, since it is more important to have
the more precise source port information for link-local traffic.
Fixes: d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21fcec ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Fix PTP packet drops with ocelot-8021q DSA tag protocol
Changes in v2:
- Distinguish between L2 and L4 PTP packets
v1 at:
https://lore.kernel.org/netdev/20230626154003.3153076-1-vladimir.oltean@nxp.com/
Patch 3/3 fixes an issue with the ocelot/felix driver, where it would
drop PTP traffic on RX unless hardware timestamping for that packet type
was enabled.
Fixing that requires the driver to know whether it had previously
configured the hardware to timestamp PTP packets on that port. But it
cannot correctly determine that today using the existing code structure,
so patches 1/3 and 2/3 fix the control path of the code such that
ocelot->ports[port]->trap_proto faithfully reflects whether that
configuration took place.
====================
Vladimir Oltean [Tue, 27 Jun 2023 16:31:14 +0000 (19:31 +0300)]
net: dsa: felix: don't drop PTP frames with tag_8021q when RX timestamping is disabled
The driver implements a workaround for the fact that it doesn't have an
IRQ source to tell it whether PTP frames are available through the
extraction registers, for those frames to be processed and passed
towards the network stack. That workaround is to configure the switch,
through felix_hwtstamp_set() -> felix_update_trapping_destinations(),
to create two copies of PTP packets: one sent over Ethernet to the DSA
master, and one to be consumed through the aforementioned CPU extraction
queue registers.
The reason why we want PTP packets to be consumed through the CPU
extraction registers in the first place is because we want to see their
hardware RX timestamp. With tag_8021q, that is only visible that way,
and it isn't visible with the copy of the packet that's transmitted over
Ethernet.
The problem with the workaround implementation is that it drops the
packet received over Ethernet, in expectation of its copy being present
in the CPU extraction registers. However, if felix_hwtstamp_set() hasn't
run (aka PTP RX timestamping is disabled), the driver will drop the
original PTP frame and there will be no copy of it in the CPU extraction
registers. So, the network stack will simply not see any PTP frame.
Look at the port's trapping configuration to see whether the driver has
previously enabled the CPU extraction registers. If it hasn't, just
don't RX timestamp the frame and let it be passed up the stack by DSA,
which is perfectly fine.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 27 Jun 2023 16:31:13 +0000 (19:31 +0300)]
net: mscc: ocelot: don't keep PTP configuration of all ports in single structure
In a future change, the driver will need to determine whether PTP RX
timestamping is enabled on a port (including whether traps were set up
on that port in particular) and that is currently not possible.
The driver supports different RX filters (L2, L4) and kinds of TX
timestamping (one-step, two-step) on its ports, but it saves all
configuration in a single struct hwtstamp_config that is global to the
switch. So, the latest timestamping configuration on one port
(including a request to disable timestamping) affects what gets reported
for all ports, even though the configuration itself is still individual
to each port.
The port timestamping configurations are only coupled because of the
common structure, so replace the hwtstamp_config with a mask of trapped
protocols saved per port. We also have the ptp_cmd to distinguish
between one-step and two-step PTP timestamping, so with those 2 bits of
information we can fully reconstruct a descriptive struct
hwtstamp_config for each port, during the SIOCGHWTSTAMP ioctl.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 27 Jun 2023 16:31:12 +0000 (19:31 +0300)]
net: mscc: ocelot: don't report that RX timestamping is enabled by default
PTP RX timestamping should be enabled when the user requests it, not by
default. If it is enabled by default, it can be problematic when the
ocelot driver is a DSA master, and it sidesteps what DSA tries to avoid
through __dsa_master_hwtstamp_validate().
Additionally, after the change which made ocelot trap PTP packets only
to the CPU at ocelot_hwtstamp_set() time, it is no longer even true that
RX timestamping is enabled by default, because until ocelot_hwtstamp_set()
is called, the PTP traps are actually not set up. So the rx_filter field
of ocelot->hwtstamp_config reflects an incorrect reality.
Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 29 Jun 2023 10:10:39 +0000 (12:10 +0200)]
Merge branch 'net-sched-act_ipt-bug-fixes'
Florian Westphal says:
====================
net/sched: act_ipt bug fixes
v3: prefer skb_header() helper in patch 2. No other changes.
I've retained Acks and RvB-Tags of v2.
While checking if netfilter could be updated to replace selected
instances of NF_DROP with kfree_skb_reason+NF_STOLEN to improve
debugging info via drop monitor I found that act_ipt is incompatible
with such an approach. Moreover, it lacks multiple sanity checks
to avoid certain code paths that make assumptions that the tc layer
doesn't meet, such as header sanity checks, availability of skb_dst,
skb_nfct() and so on.
act_ipt test in the tc selftest still pass with this applied.
I think that we should consider removal of this module, while
this should take care of all problems, its ipv4 only and I don't
think there are any netfilter targets that lack a native tc
equivalent, even when ignoring bpf.
====================
Florian Westphal [Tue, 27 Jun 2023 12:38:13 +0000 (14:38 +0200)]
net/sched: act_ipt: zero skb->cb before calling target
xtables relies on skb being owned by ip stack, i.e. with ipv4
check in place skb->cb is supposed to be IPCB.
I don't see an immediate problem (REJECT target cannot be used anymore
now that PRE/POSTROUTING hook validation has been fixed), but better be
safe than sorry.
A much better patch would be to either mark act_ipt as
"depends on BROKEN" or remove it altogether. I plan to do this
for -next in the near future.
This tc extension is broken in the sense that tc lacks an
equivalent of NF_STOLEN verdict.
With NF_STOLEN, target function takes complete ownership of skb, caller
cannot dereference it anymore.
ACT_STOLEN cannot be used for this: it has a different meaning, caller
is allowed to dereference the skb.
At this time NF_STOLEN won't be returned by any targets as far as I can
see, but this may change in the future.
It might be possible to work around this via list of allowed
target extensions known to only return DROP or ACCEPT verdicts, but this
is error prone/fragile.
Existing selftest only validates xt_LOG and act_ipt is restricted
to ipv4 so I don't think this action is used widely.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Florian Westphal [Tue, 27 Jun 2023 12:38:12 +0000 (14:38 +0200)]
net/sched: act_ipt: add sanity checks on skb before calling target
Netfilter targets make assumptions on the skb state, for example
iphdr is supposed to be in the linear area.
This is normally done by IP stack, but in act_ipt case no
such checks are made.
Some targets can even assume that skb_dst will be valid.
Make a minimum effort to check for this:
- Don't call the targets eval function for non-ipv4 skbs.
- Don't call the targets eval function for POSTROUTING
emulation when the skb has no dst set.
v3: use skb_protocol helper (Davide Caratti)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Florian Westphal [Tue, 27 Jun 2023 12:38:11 +0000 (14:38 +0200)]
net/sched: act_ipt: add sanity checks on table name and hook locations
Looks like "tc" hard-codes "mangle" as the only supported table
name, but on kernel side there are no checks.
This is wrong. Not all xtables targets are safe to call from tc.
E.g. "nat" targets assume skb has a conntrack object assigned to it.
Normally those get called from netfilter nat core which consults the
nat table to obtain the address mapping.
"tc" userspace either sets PRE or POSTROUTING as hook number, but there
is no validation of this on kernel side, so update netlink policy to
reject bogus numbers. Some targets may assume skb_dst is set for
input/forward hooks, so prevent those from being used.
act_ipt uses the hook number in two places:
1. the state hook number, this is fine as-is
2. to set par.hook_mask
The latter is a bit mask, so update the assignment to make
xt_check_target() to the right thing.
Followup patch adds required checks for the skb/packet headers before
calling the targets evaluation function.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chengfeng Ye [Tue, 27 Jun 2023 12:03:40 +0000 (12:03 +0000)]
sctp: fix potential deadlock on &net->sctp.addr_wq_lock
As &net->sctp.addr_wq_lock is also acquired by the timer
sctp_addr_wq_timeout_handler() in protocal.c, the same lock acquisition
at sctp_auto_asconf_init() seems should disable irq since it is called
from sctp_accept() under process context.
This flaw was found using an experimental static analysis tool we are
developing for irq-related deadlock.
The tentative patch fix the potential deadlock by spin_lock_bh().
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Fixes: 34e5b0118685 ("sctp: delay auto_asconf init until binding the first addr") Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230627120340.19432-1-dg573847474@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Wed, 28 Jun 2023 23:43:10 +0000 (16:43 -0700)]
Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
Linus Torvalds [Wed, 28 Jun 2023 23:05:21 +0000 (16:05 -0700)]
Merge tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"The changes for sysctl are in line with prior efforts to stop usage of
deprecated routines which incur recursion and also make it hard to
remove the empty array element in each sysctl array declaration.
The most difficult user to modify was parport which required a bit of
re-thinking of how to declare shared sysctls there, Joel Granados has
stepped up to the plate to do most of this work and eventual removal
of register_sysctl_table(). That work ended up saving us about 1465
bytes according to bloat-o-meter. Since we gained a few bloat-o-meter
karma points I moved two rather small sysctl arrays from
kernel/sysctl.c leaving us only two more sysctl arrays to move left.
Most changes have been tested on linux-next for about a month. The
last straggler patches are a minor parport fix, changes to the sysctl
kernel selftest so to verify correctness and prevent regressions for
the future change he made to provide an alternative solution for the
special sysctl mount point target which was using the now deprecated
sysctl child element.
This is all prep work to now finally be able to remove the empty array
element in all sysctl declarations / registrations which is expected
to save us a bit of bytes all over the kernel. That work will be
tested early after v6.5-rc1 is out"
* tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: replace child with an enumeration
sysctl: Remove debugging dump_stack
test_sysclt: Test for registering a mount point
test_sysctl: Add an option to prevent test skip
test_sysctl: Add an unregister sysctl test
test_sysctl: Group node sysctl test under one func
test_sysctl: Fix test metadata getters
parport: plug a sysctl register leak
sysctl: move security keys sysctl registration to its own file
sysctl: move umh sysctl registration to its own file
signal: move show_unhandled_signals sysctl to its own file
sysctl: remove empty dev table
sysctl: Remove register_sysctl_table
sysctl: Refactor base paths registrations
sysctl: stop exporting register_sysctl_table
parport: Removed sysctl related defines
parport: Remove register_sysctl_table from parport_default_proc_register
parport: Remove register_sysctl_table from parport_device_proc_register
parport: Remove register_sysctl_table from parport_proc_register
parport: Move magic number "15" to a define
Linus Torvalds [Wed, 28 Jun 2023 22:51:08 +0000 (15:51 -0700)]
Merge tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"The changes queued up for modules are pretty tame, mostly code removal
of moving of code.
Only two minor functional changes are made, the only one which stands
out is Sebastian Andrzej Siewior's simplification of module reference
counting by removing preempt_disable() and that has been tested on
linux-next for well over a month without no regressions.
I'm now, I guess, also a kitchen sink for some kallsyms changes"
[ There was a mis-communication about the concurrent module load changes
that I had expected to come through Luis despite me authoring the
patch. So some of the module updates were left hanging in the email
ether, and I just committed them separately.
It's my bad - I should have made it more clear that I expected my
own patches to come through the module tree too. Now they missed
linux-next, but hopefully that won't cause any issues - Linus ]
* tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kallsyms: make kallsyms_show_value() as generic function
kallsyms: move kallsyms_show_value() out of kallsyms.c
kallsyms: remove unsed API lookup_symbol_attrs
kallsyms: remove unused arch_get_kallsym() helper
module: Remove preempt_disable() from module reference counting.
Linus Torvalds [Tue, 30 May 2023 01:39:51 +0000 (21:39 -0400)]
modules: catch concurrent module loads, treat them as idempotent
This is the new-and-improved attempt at avoiding huge memory load spikes
when the user space boot sequence tries to load hundreds (or even
thousands) of redundant duplicate modules in parallel.
See commit 9828ed3f695a ("module: error out early on concurrent load of
the same module file") for background and an earlier failed attempt that
was reverted.
That earlier attempt just said "concurrently loading the same module is
silly, just open the module file exclusively and return -ETXTBSY if
somebody else is already loading it".
While it is true that concurrent module loads of the same module is
silly, the reason that earlier attempt then failed was that the
concurrently loaded module would often be a prerequisite for another
module.
Thus failing to load the prerequisite would then cause cascading
failures of the other modules, rather than just short-circuiting that
one unnecessary module load.
At the same time, we still really don't want to load the contents of the
same module file hundreds of times, only to then wait for an eventually
successful load, and have everybody else return -EEXIST.
As a result, this takes another approach, and treats concurrent module
loads from the same file as "idempotent" in the inode. So if one module
load is ongoing, we don't start a new one, but instead just wait for the
first one to complete and return the same return value as it did.
So unlike the first attempt, this does not return early: the intent is
not to speed up the boot, but to avoid a thundering herd problem in
allocating memory (both physical and virtual) for a module more than
once.
Also note that this does change behavior: it used to be that when you
had concurrent loads, you'd have one "winner" that would return success,
and everybody else would return -EEXIST.
In contrast, this idempotent logic goes all Oprah on the problem, and
says "You are a winner! And you are a winner! We are ALL winners". But
since there's no possible actual real semantic difference between "you
loaded the module" and "somebody else already loaded the module", this
is more of a feel-good change than an actual honest-to-goodness semantic
change.
Of course, any true Johnny-come-latelies that don't get caught in the
concurrency filter will still return -EEXIST. It's no different from
not even getting a seat at an Oprah taping. That's life.
See the long thread on the kernel mailing list about this all, which
includes some numbers for memory use before and after the patch.
Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/ Reviewed-by: Johan Hovold <johan@kernel.org> Tested-by: Johan Hovold <johan@kernel.org> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Rudi Heitbaum <rudi@heitbaum..com> Tested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 28 Jun 2023 21:06:39 +0000 (14:06 -0700)]
Merge tag 'mmc-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Allow synchronous detection of (e)MMC/SD/SDIO cards
- Fixup error check for ioctls for SPI hosts
- Disable broken SD-Cache support for Kingston Canvas Go Plus from 2019
- Disable broken eMMC-Trim support for Kingston EMMC04G-M627
- Disable broken eMMC-Trim support for Micron MTFC4GACAJCN-1M
MMC host:
- bcm2835: Convert DT bindings to YAML
- mmci:
- Enable asynchronous probe
- Transform the ux500 HW-busy detection into a proper state machine
- Add support for SW busy-end timeouts for the ux500 variants
- mmci_stm32:
- Add support for sdm32 variant revision v3.0 used on STM32MP25
- Improve the tuning sequence
- mtk-sd: Tune polling-period to improve performance
- sdhci: Fixup DMA configuration for 64-bit DMA mode
- sdhci-bcm-kona: Convert DT bindings to YAML
- sdhci-msm:
- Switch to use the new ICE API
- Add support for the SC8280XP/IPQ6018/QDU1000/QRU1000 variants
- sdhci-pci-gli:
- Add support SD Express cards for GL9767
- Add support for the Genesys Logic GL9767 variant"
* tag 'mmc-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (42 commits)
dt-bindings: mmc: fsl-imx-esdhc: Add imx6ul support
mmc: mmci: Add support for SW busy-end timeouts
mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019
mmc: core: disable TRIM on Kingston EMMC04G-M627
mmc: mmci: stm32: add delay block support for STM32MP25
mmc: mmci: stm32: prepare other delay block support
mmc: mmci: stm32: manage block gap hardware flow control
mmc: mmci: Add support for sdmmc variant revision v3.0
mmc: mmci: add stm32_idmabsize_align parameter
dt-bindings: mmc: mmci: Add st,stm32mp25-sdmmc2 compatible
mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
mmc: mmci: Break out a helper function
mmc: mmci: Use a switch statement machine
mmc: mmci: Use state machine state as exit condition
mmc: mmci: Retry the busy start condition
mmc: mmci: Make busy complete state machine explicit
mmc: mmci: Break out error check in busy detect
mmc: mmci: Stash status while waiting for busy
mmc: mmci: Unwind big if() clause
mmc: mmci: Clear busy_status when starting command
...
Linus Torvalds [Wed, 28 Jun 2023 21:02:03 +0000 (14:02 -0700)]
Merge tag 'mtd/for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd updates from
"Core MTD changes:
- otp:
- Put factory OTP/NVRAM into the entropy pool
- Clean up on error in mtd_otp_nvmem_add()
MTD devices changes:
- sm_ftl: Fix typos in comments
- Use SPDX license headers
- pismo: Switch back to use i2c_driver's .probe()
- mtdpart: Drop useless LIST_HEAD
- st_spi_fsm: Use the devm_clk_get_enabled() helper function
DT binding changes:
- partitions:
- Include TP-Link SafeLoader in allowed list
- Add missing type for "linux,rootfs"
- Extend the nand node names filter
- Create a file for raw NAND chip properties
- Mark nand-ecc-placement deprecated
- Describe nand-ecc-mode
- Prevent NAND chip unevaluated properties in all NAND bindings with
a NAND chip reference.
- Qcom: Fix a property position
- Marvell: Convert to YAML DT schema
Raw NAND chip drivers changes:
- Macronix: OTP access for MX30LFxG18AC
- Add basic Sandisk manufacturer ops
- Add support for Sandisk SDTNQGAMA
Raw NAND controller driver changes:
- Meson:
- Replace integer consts with proper defines
- Allow waiting w/o wired ready/busy pin
- Check buffer length validity
- Fix unaligned DMA buffers handling
- dt-bindings: Fix 'nand-rb' property
- Arasan: Revert "mtd: rawnand: arasan: Prevent an unsupported
configuration" as this limitation is no longer true thanks to the
recent efforts in improving the clocks support in this driver
SPI-NAND changes:
- Gigadevice: add support for GD5F2GQ5xExxH
- Macronix: Add support for serial NAND flashes"
Linus Torvalds [Wed, 28 Jun 2023 20:48:42 +0000 (13:48 -0700)]
Merge tag 'spi-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"One small core feature this time around but mostly driver improvements
and additions for SPI:
- Add support for controlling the idle state of MOSI, some systems
can support this and depending on the system integration may need
it to avoid glitching in some situations
- Support for polling mode in the S3C64xx driver and DMA on the
Qualcomm QSPI driver
- Support for several Allwinner SoCs, AMD Pensando Elba, Intel Mount
Evans, Renesas RZ/V2M, and ST STM32H7"
* tag 'spi-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (66 commits)
spi: dt-bindings: atmel,at91rm9200-spi: fix broken sam9x7 compatible
spi: dt-bindings: atmel,at91rm9200-spi: add sam9x7 compatible
spi: Add support for Renesas CSI
spi: dt-bindings: Add bindings for RZ/V2M CSI
spi: sun6i: Use the new helper to derive the xfer timeout value
spi: atmel: Prevent false timeouts on long transfers
spi: dt-bindings: stm32: do not disable spi-slave property for stm32f4-f7
spi: Create a helper to derive adaptive timeouts
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
spi: stm32: disable spi-slave property for stm32f4-f7
spi: stm32: introduction of stm32h7 SPI device mode support
spi: stm32: use dmaengine_terminate_{a}sync instead of _all
spi: stm32: renaming of spi_master into spi_controller
spi: dw: Remove misleading comment for Mount Evans SoC
spi: dt-bindings: snps,dw-apb-ssi: Add compatible for Intel Mount Evans SoC
spi: dw: Add compatible for Intel Mount Evans SoC
spi: s3c64xx: Use dev_err_probe()
spi: s3c64xx: Use the managed spi master allocation function
spi: spl022: Probe defer is no error
spi: spi-imx: fix mixing of native and gpio chipselects for imx51/imx53/imx6 variants
...
Linus Torvalds [Wed, 28 Jun 2023 20:32:47 +0000 (13:32 -0700)]
Merge tag 'regulator-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This release is almost all drivers, there's some small improvements in
the core but otherwise everything is updates to drivers, mostly the
addition of new ones.
There's also a bunch of changes pulled in from the MFD subsystem as
dependencies, Rockchip and TI core MFD code that the regulator drivers
depend on.
I've also yet again managed to put a SPI commit in the regulator tree,
I don't know what it is about those two trees (this for
spi-geni-qcom).
Summary:
- Support for Renesas RAA215300, Rockchip RK808, Texas Instruments
TPS6594 and TPS6287x, and X-Powers AXP15060 and AXP313a"
* tag 'regulator-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (43 commits)
regulator: Add Renesas PMIC RAA215300 driver
regulator: dt-bindings: Add Renesas RAA215300 PMIC bindings
regulator: ltc3676: Use maple tree register cache
regulator: ltc3589: Use maple tree register cache
regulator: helper: Document ramp_delay parameter of regulator_set_ramp_delay_regmap()
regulator: mt6358: Use linear voltage helpers for single range regulators
regulator: mt6358: Const-ify mt6358_regulator_info data structures
regulator: mt6358: Drop *_SSHUB regulators
regulator: mt6358: Merge VCN33_* regulators
regulator: dt-bindings: mt6358: Drop *_sshub regulators
regulator: dt-bindings: mt6358: Merge ldo_vcn33_* regulators
regulator: dt-bindings: pwm-regulator: Add missing type for "pwm-dutycycle-unit"
regulator: Switch two more i2c drivers back to use .probe()
spi: spi-geni-qcom: Do not do DMA map/unmap inside driver, use framework instead
soc: qcom: geni-se: Add interfaces geni_se_tx_init_dma() and geni_se_rx_init_dma()
regulator: tps6594-regulator: Add driver for TI TPS6594 regulators
regulator: axp20x: Add AXP15060 support
regulator: axp20x: Add support for AXP313a variant
dt-bindings: pfuze100.yaml: Add an entry for interrupts
regulator: stm32-pwr: Fix regulator disabling
...
Linus Torvalds [Wed, 28 Jun 2023 20:26:19 +0000 (13:26 -0700)]
Merge tag 'regmap-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Another busy release for regmap with the second half of the maple tree
register cache implementation, there's some smaller optimisations that
could be done but this should now be able to replace the rbtree cache
for most devices.
We also had a followup from Aidan MacDonald's refactoring of some of
the regmap-irq interfaces, the conversion is complete so the old
interfaces are removed. This means that even with the new features for
the maple tree cache we'd have a nice negative diffstat were it not
for the addition of a bunch more KUnit coverage.
There's one GPIO patch in here, it was a dependency for a cleanup of
an API in the regmap-irq code for which the gpio-104-dio-48e driver
was the only user.
Highlights:
- The maple tree cache can now load in default values more
efficiently, and is capabale of syncing multiple registers
in a single write during cache sync
- More KUnit coverage, including some coverage for raw I/O
and a dummy RAM backed cache to support it
- Removal of several old interfaces in regmap-irq now all
users have been modernised"
* tag 'regmap-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits)
regmap: Allow reads from write only registers with the flat cache
regmap: Drop early readability check
regmap: Check for register readability before checking cache during read
regmap: Add test to make sure we don't sync to read only registers
regmap: Add a test case for write only registers
regmap: Add test that writes to write only registers are prevented
regmap: Add debugfs file for forcing field writes
regmap: Don't check for changes in regcache_set_val()
regmap: maple: Implement block sync for the maple tree cache
regmap: Provide basic KUnit coverage for the raw register I/O
regmap: Provide a ram backed regmap with raw support
regmap: Add missing cache_only checks
regmap: regmap-irq: Move handle_post_irq to before pm_runtime_put
regmap: Load register defaults in blocks rather than register by register
regmap: mmio: Allow passing an empty config->reg_stride
regmap-irq: Drop backward compatibility for inverted mask/unmask
regmap-irq: Minor adjustments to .handle_mask_sync()
regmap-irq: Remove support for not_fixed_stride
regmap-irq: Remove type registers
regmap-irq: Remove virtual registers
...
Linus Torvalds [Wed, 28 Jun 2023 19:47:30 +0000 (12:47 -0700)]
x86/mem_encrypt: Remove stale mem_encrypt_init() declaration
The memory encryption initialization logic was moved from init/main.c
into arch_cpu_finalize_init() in commit 439e17576eb4 ("init, x86: Move
mem_encrypt_init() into arch_cpu_finalize_init()"), but a stale
declaration for the init function was left in <linux/init.h>.
And didn't cause any problems if you had X86_MEM_ENCRYPT enabled, which
apparently everybody involved did have. See also commit 0a9567ac5e6a
("x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build") in this whole
sad saga of conflicting declarations for different situations.
Reported-by: Matthew Wilcox <willy@infradead.org> Fixes: 439e17576eb4 init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init() Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 28 Jun 2023 19:20:24 +0000 (12:20 -0700)]
mm: fix __access_remote_vm() GUP failure case
Commit ca5e863233e8 ("mm/gup: remove vmas parameter from
get_user_pages_remote()") removed the vma argument from GUP handling,
and instead added a helper function (get_user_page_vma_remote()) that
looks it up separately using 'vma_lookup()'. And then converted
existing users that needed a vma to use the helper instead.
However, the helper function intentionally acts exactly like the old
get_user_pages_remote() did, and only fills in 'vma' on successful page
lookup. Fine so far.
However, __access_remote_vm() wants the vma even for the unsuccessful
case, and used to do a
vma = vma_lookup(mm, addr);
explicitly to look it up when the get_user_page() failed.
However, that conversion commit incorrectly removed that vma lookup,
thinking that get_user_page_vma_remote() would have done it. Not so.
So add the vma_lookup() back in.
Fixes: ca5e863233e8 ("mm/gup: remove vmas parameter from get_user_pages_remote()") Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 28 Jun 2023 17:59:38 +0000 (10:59 -0700)]
Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
- Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
directories
- Douglas Anderson has added a new "buddy" mode to the hardlockup
detector. It permits the detector to work on architectures which
cannot provide the required interrupts, by having CPUs periodically
perform checks on other CPUs
- Zhen Lei has enhanced kexec's ability to support two crash regions
- Petr Mladek has done a lot of cleanup on the hard lockup detector's
Kconfig entries
- And the usual bunch of singleton patches in various places
* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
kernel/time/posix-stubs.c: remove duplicated include
ocfs2: remove redundant assignment to variable bit_off
watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
devres: show which resource was invalid in __devm_ioremap_resource()
watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
watchdog/hardlockup: make the config checks more straightforward
watchdog/hardlockup: sort hardlockup detector related config values a logical way
watchdog/hardlockup: move SMP barriers from common code to buddy code
watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
...
Linus Torvalds [Wed, 28 Jun 2023 17:28:11 +0000 (10:28 -0700)]
Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
Linus Torvalds [Wed, 28 Jun 2023 04:52:15 +0000 (21:52 -0700)]
Merge tag 'docs-arm64-move' of git://git.lwn.net/linux
Pull arm64 documentation move from Jonathan Corbet:
"Move the arm64 architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm64-move' of git://git.lwn.net/linux:
perf arm-spe: Fix a dangling Documentation/arm64 reference
mm: Fix a dangling Documentation/arm64 reference
arm64: Fix dangling references to Documentation/arm64
dt-bindings: fix dangling Documentation/arm64 reference
docs: arm64: Move arm64 documentation under Documentation/arch/
Linus Torvalds [Wed, 28 Jun 2023 04:24:18 +0000 (21:24 -0700)]
Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"There are three areas of note:
A bunch of strlcpy()->strscpy() conversions ended up living in my tree
since they were either Acked by maintainers for me to carry, or got
ignored for multiple weeks (and were trivial changes).
The compiler option '-fstrict-flex-arrays=3' has been enabled
globally, and has been in -next for the entire devel cycle. This
changes compiler diagnostics (though mainly just -Warray-bounds which
is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
coverage. In other words, there are no new restrictions, just
potentially new warnings. Any new FORTIFY warnings we've seen have
been fixed (usually in their respective subsystem trees). For more
details, see commit df8fc4e934c12b.
The under-development compiler attribute __counted_by has been added
so that we can start annotating flexible array members with their
associated structure member that tracks the count of flexible array
elements at run-time. It is possible (likely?) that the exact syntax
of the attribute will change before it is finalized, but GCC and Clang
are working together to sort it out. Any changes can be made to the
macro while we continue to add annotations.
As an example of that last case, I have a treewide commit waiting with
such annotations found via Coccinelle:
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kerndoc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex
arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members"
* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
netfilter: ipset: Replace strlcpy with strscpy
uml: Replace strlcpy with strscpy
um: Use HOST_DIR for mrproper
kallsyms: Replace all non-returning strlcpy with strscpy
sh: Replace all non-returning strlcpy with strscpy
of/flattree: Replace all non-returning strlcpy with strscpy
sparc64: Replace all non-returning strlcpy with strscpy
Hexagon: Replace all non-returning strlcpy with strscpy
kobject: Use return value of strreplace()
lib/string_helpers: Change returned value of the strreplace()
jbd2: Avoid printing outside the boundary of the buffer
checkpatch: Check for 0-length and 1-element arrays
riscv/purgatory: Do not use fortified string functions
s390/purgatory: Do not use fortified string functions
x86/purgatory: Do not use fortified string functions
acpi: Replace struct acpi_table_slit 1-element array with flex-array
clocksource: Replace all non-returning strlcpy with strscpy
string: use __builtin_memcpy() in strlcpy/strlcat
staging: most: Replace all non-returning strlcpy with strscpy
drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
...
Linus Torvalds [Wed, 28 Jun 2023 04:21:32 +0000 (21:21 -0700)]
Merge tag 'pstore-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Check for out-of-memory condition (Jiasheng Jiang)
- Convert to platform remove callback returning void (Uwe Kleine-König)
* tag 'pstore-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Add check for kstrdup
pstore/ram: Convert to platform remove callback returning void
Linus Torvalds [Wed, 28 Jun 2023 04:12:41 +0000 (21:12 -0700)]
Merge tag 'execve-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Fix a few comments for correctness and typos (Baruch Siach)
- Small simplifications for binfmt (Christophe JAILLET)
- Set p_align to 4 for PT_NOTE in core dump (Fangrui Song)
* tag 'execve-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: fix comment typo s/reset/regset/
elf: correct note name comment
binfmt: Slightly simplify elf_fdpic_map_file()
binfmt: Use struct_size()
coredump, vmcore: Set p_align to 4 for PT_NOTE
Linus Torvalds [Wed, 28 Jun 2023 00:58:06 +0000 (17:58 -0700)]
Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
"There are two patches, both of which change how Smack initializes the
SMACK64TRANSMUTE extended attribute.
The first corrects the behavior of overlayfs, which creates inodes
differently from other filesystems. The second ensures that transmute
attributes specified by mount options are correctly assigned"
* tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next:
smack: Record transmuting in smk_transmuted
smack: Retrieve transmuting information in smack_inode_getsecurity()
Linus Torvalds [Wed, 28 Jun 2023 00:32:34 +0000 (17:32 -0700)]
Merge tag 'integrity-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity subsystem updates from Mimi Zohar:
"An i_version change, one bug fix, and three kernel doc fixes:
- instead of IMA detecting file change by directly accesssing
i_version, it now calls vfs_getattr_nosec().
- fix a race condition when inserting a new node in the iint rb-tree"
* tag 'integrity-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Fix build warnings
evm: Fix build warnings
evm: Complete description of evm_inode_setattr()
integrity: Fix possible multiple allocation in integrity_inode_get()
IMA: use vfs_getattr_nosec to get the i_version
Linus Torvalds [Wed, 28 Jun 2023 00:24:26 +0000 (17:24 -0700)]
Merge tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- A SafeSetID patch to correct what appears to be a cut-n-paste typo in
the code causing a UID to be printed where a GID was desired.
This is coming via the LSM tree because we haven't been able to get a
response from the SafeSetID maintainer (Micah Morton) in several
months. Hopefully we are able to get in touch with Micah, but until
we do I'm going to pick them up in the LSM tree.
- A small fix to the reiserfs LSM xattr code.
We're continuing to work through some issues with the reiserfs code
as we try to fixup the LSM xattr handling, but in the process we're
uncovering some ugly problems in reiserfs and we may just end up
removing the LSM xattr support in reiserfs prior to reiserfs'
removal.
For better or worse, this shouldn't impact any of the reiserfs users,
as we discovered that LSM xattrs on reiserfs were completely broken,
meaning no one is currently using the combo of reiserfs and a file
labeling LSM.
- A tweak to how the cap_user_data_t struct/typedef is declared in the
header file to appease the Sparse gods.
- In the process of trying to sort out the SafeSetID lost-maintainer
problem I realized that I needed to update the labeled networking
entry to "Supported".
- Minor comment/documentation and spelling fixes.
* tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
device_cgroup: Fix kernel-doc warnings in device_cgroup
SafeSetID: fix UID printed instead of GID
MAINTAINERS: move labeled networking to "supported"
capability: erase checker warnings about struct __user_cap_data_struct
lsm: fix a number of misspellings
reiserfs: Initialize sec->length in reiserfs_security_init().
capability: fix kernel-doc warnings in capability.c
Linus Torvalds [Wed, 28 Jun 2023 00:18:48 +0000 (17:18 -0700)]
Merge tag 'selinux-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Thanks to help from the MPTCP folks, it looks like we have finally
sorted out a proper solution to the MPTCP socket labeling issue, see
the new security_mptcp_add_subflow() LSM hook.
- Fix the labeled NFS handling such that a labeled NFS share mounted
prior to the initial SELinux policy load is properly labeled once a
policy is loaded; more information in the commit description.
- Two patches to security/selinux/Makefile, the first took the cleanups
in v6.4 a bit further and the second removed the grouped targets
support as that functionality doesn't appear to be properly supported
prior to make v4.3.
- Deprecate the "fs" object context type in SELinux policies. The fs
object context type was an old vestige that was introduced back in
v2.6.12-rc2 but never really used.
- A number of small changes that remove dead code, clean up some
awkward bits, and generally improve the quality of the code. See the
individual commit descriptions for more information.
* tag 'selinux-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: avoid bool as identifier name
selinux: fix Makefile for versions of make < v4.3
selinux: make labeled NFS work when mounted before policy load
selinux: cleanup exit_sel_fs() declaration
selinux: deprecated fs ocon
selinux: make header files self-including
selinux: keep context struct members in sync
selinux: Implement mptcp_add_subflow hook
security, lsm: Introduce security_mptcp_add_subflow()
selinux: small cleanups in selinux_audit_rule_init()
selinux: declare read-only data arrays const
selinux: retain const qualifier on string literal in avtab_hash_eval()
selinux: drop return at end of void function avc_insert()
selinux: avc: drop unused function avc_disable()
selinux: adjust typos in comments
selinux: do not leave dangling pointer behind
selinux: more Makefile tweaks
Linus Torvalds [Wed, 28 Jun 2023 00:10:27 +0000 (17:10 -0700)]
Merge tag 'landlock-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"Add support for Landlock to UML.
To do this, this fixes the way hostfs manages inodes according to the
underlying filesystem [1]. They are now properly handled as for other
filesystems, which enables Landlock support (and probably other
features).
This also extends Landlock's tests with 6 pseudo filesystems,
including hostfs"
Linus Torvalds [Tue, 27 Jun 2023 23:54:21 +0000 (16:54 -0700)]
Merge tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Whenever cpuset needs to rebuild sched_domain, it walked all tasks
looking for DEADLINE tasks as they need to be accounted on the new
domain. Walking all tasks can be expensive and there may not be any
DEADLINE tasks at all. Task iteration is now omitted if there are no
DEADLINE tasks
- Fixes DEADLINE bandwidth misaccounting after task migration failures
- When no controller is enabled, -Wstringop-overflow warning is
triggered. The fix patch added an early exit which is too eager and
got reverted for now. Will fix later
- Everything else is minor cleanups
* tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: Avoid -Wstringop-overflow warnings"
cgroup/misc: Expose misc.current on cgroup v2 root
cgroup: Avoid -Wstringop-overflow warnings
cgroup: remove obsolete comment on cgroup_on_dfl()
cgroup: remove unused task_cgroup_path()
cgroup/cpuset: remove unneeded header files
cgroup: make cgroup_is_threaded() and cgroup_is_thread_root() static
rdmacg: fix kernel-doc warnings in rdmacg
cgroup: Replace the css_set call with cgroup_get
cgroup: remove unused macro for_each_e_css()
cgroup: Update out-of-date comment in cgroup_migrate()
cgroup: Replace all non-returning strlcpy with strscpy
cgroup/cpuset: remove unneeded header files
cgroup/cpuset: Free DL BW in case can_attach() fails
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Iterate only if DEADLINE tasks are present
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
sched/cpuset: Bring back cpuset_mutex
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Linus Torvalds [Tue, 27 Jun 2023 23:46:06 +0000 (16:46 -0700)]
Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull ordered workqueue creation updates from Tejun Heo:
"For historical reasons, unbound workqueues with max concurrency limit
of 1 are considered ordered, even though the concurrency limit hasn't
been system-wide for a long time.
This creates ambiguity around whether ordered execution is actually
required for correctness, which was actually confusing for e.g. btrfs
(btrfs updates are being routed through the btrfs tree).
There aren't that many users in the tree which use the combination and
there are pending improvements to unbound workqueue affinity handling
which will make inadvertent use of ordered workqueue a bigger loss.
This clarifies the situation for most of them by updating the ones
which require ordered execution to use alloc_ordered_workqueue().
There are some conversions being routed through subsystem-specific
trees and likely a few stragglers. Once they're all converted,
workqueue can trigger a warning on unbound + @max_active==1 usages and
eventually drop the implicit ordered behavior"
* tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues
net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues
net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues
dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues
scsi: NCR5380: Use default @max_active for hostdata->work_q
media: coda: Use alloc_ordered_workqueue() to create ordered workqueues
crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: mwifiex: Use default @max_active for workqueues
wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq
xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues
net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues
greybus: Use alloc_ordered_workqueue() to create ordered workqueues
powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
Linus Torvalds [Tue, 27 Jun 2023 23:32:52 +0000 (16:32 -0700)]
Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Concurrency-managed per-cpu work items that hog CPUs and delay the
execution of other work items are now automatically detected and
excluded from concurrency management. Reporting on such work items
can also be enabled through a config option.
- Added tools/workqueue/wq_monitor.py which improves visibility into
workqueue usages and behaviors.
- Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
superseded by commit afa4bb778e48 in mainline.
* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
workqueue: fix enum type for gcc-13
workqueue: Track and monitor per-workqueue CPU time usage
workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
workqueue: Improve locking rule description for worker fields
workqueue: Move worker_set/clr_flags() upwards
workqueue: Re-order struct worker fields
workqueue: Add pwq->stats[] and a monitoring script
Further upgrade queue_work_on() comment
Linus Torvalds [Tue, 27 Jun 2023 23:03:20 +0000 (16:03 -0700)]
Merge tag 'for-linus-6.5-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- three patches adding missing prototypes
- a fix for finding the iBFT in a Xen dom0 for supporting diskless
iSCSI boot
* tag 'for-linus-6.5-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86: xen: add missing prototypes
x86/xen: add prototypes for paravirt mmu functions
iscsi_ibft: Fix finding the iBFT under Xen Dom 0
xen: xen_debug_interrupt prototype to global header
Linus Torvalds [Tue, 27 Jun 2023 22:49:10 +0000 (15:49 -0700)]
Merge tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Fix the style of protected key API driver source: use x-mas tree for
all local variable declarations
- Rework protected key API driver to not use the struct pkey_protkey
and pkey_clrkey anymore. Both structures have a fixed size buffer,
but with the support of ECC protected key these buffers are not big
enough. Use dynamic buffers internally and transparently for
userspace
- Add support for a new 'non CCA clear key token' with ECC clear keys
supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448.
This makes it possible to derive a protected key from the ECC clear
key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way
to derive is via PCKMO instruction
- The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t
for reference counting. Replace this with the proper data type
refcount_t
- Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since
gcc generates inefficient code, which may lead to stack overflows
- Replace one-element array with flexible-array member in struct
vfio_ccw_parent and refactor the rest of the code accordingly. Also,
prefer struct_size() over sizeof() open- coded versions
- Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and
OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the
system memory should be cleared or not once dumped
- Fix a hang when a user attempts to remove a VFIO-AP mediated device
attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and
VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback
to request a release of the device
- Fix calculation for R_390_GOTENT relocations for modules
- Allow any user space process with CAP_PERFMON capability read and
display the CPU Measurement facility counter sets
- Rework large statically-defined per-CPU cpu_cf_events data structure
and replace it with dynamically allocated structures created when a
perf_event_open() system call is invoked or /dev/hwctr device is
accessed
* tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events
s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process
s390/module: fix rela calculation for R_390_GOTENT
s390/vfio-ap: wire in the vfio_device_ops request callback
s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl
s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl
s390/pkey: add support for ecc clear key
s390/pkey: do not use struct pkey_protkey
s390/pkey: introduce reverse x-mas trees
s390/zcore: conditionally clear memory on reipl
s390/ipl: add REIPL_CLEAR flag to os_info
vfio/ccw: use struct_size() helper
vfio/ccw: replace one-element array with flexible-array member
s390: select ARCH_SUPPORTS_INT128
s390/pai_ext: replace atomic_t with refcount_t
s390/pai_crypto: replace atomic_t with refcount_t
Linus Torvalds [Tue, 27 Jun 2023 22:44:11 +0000 (15:44 -0700)]
Merge tag 'xtensa-20230627' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa updates from Max Filippov:
- clean up platform_* interface of the xtensa architecture
- enable HAVE_ASM_MODVERSIONS
- drop ARCH_WANT_FRAME_POINTERS
- clean up unaligned access exception handler
- provide handler for load/store exceptions
- various small fixes and cleanups
* tag 'xtensa-20230627' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: dump userspace code around the exception PC
xtensa: rearrange show_stack output
xtensa: add load/store exception handler
xtensa: rearrange unaligned exception handler
xtensa: always install slow handler for unaligned access exception
xtensa: move early_trap_init from kasan_early_init to init_arch
xtensa: drop ARCH_WANT_FRAME_POINTERS
xtensa: report trax and perf counters in cpuinfo
xtensa: add asm-prototypes.h
xtensa: only build __strncpy_user with CONFIG_ARCH_HAS_STRNCPY_FROM_USER
xtensa: drop bcopy implementation
xtensa: drop EXPORT_SYMBOL for common_exception_return
xtensa: boot-redboot: clean up Makefile
xtensa: clean up default platform functions
xtensa: drop platform_halt and platform_power_off
xtensa: drop platform_restart
xtensa: drop platform_heartbeat
xtensa: xt2000: drop empty platform_init
The patch "nios2: Convert __pte_free_tlb() to use ptdescs" was supposed
to go together with a patchset that Vishal Moola had planned taking it
through the mm tree. By just having this patch, all NIOS2 builds are
broken.
In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel,
DWARF creates almost 200 million relocations, ballooning objtool's
peak heap usage to 53GB. These patches reduce that to 25GB.
On a distro-type kernel with kernel IBT enabled, they reduce
objtool's peak heap usage from 4.2GB to 2.8GB.
These changes also improve the runtime significantly.
Debuggability improvements:
- Add the unwind_debug command-line option, for more extend unwinding
debugging output
- Limit unreachable warnings to once per function
- Add verbose option for disassembling affected functions
- Include backtrace in verbose mode
- Detect missing __noreturn annotations
- Ignore exc_double_fault() __noreturn warnings
- Remove superfluous global_noreturns entries
- Move noreturn function list to separate file
- Add __kunit_abort() to noreturns
Unwinder improvements:
- Allow stack operations in UNWIND_HINT_UNDEFINED regions
- drm/vmwgfx: Add unwind hints around RBP clobber
Cleanups:
- Move the x86 entry thunk restore code into thunk functions
- x86/unwind/orc: Use swap() instead of open coding it
- Remove unnecessary/unused variables
Fixes for modern stack canary handling"
* tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/orc: Make the is_callthunk() definition depend on CONFIG_BPF_JIT=y
objtool: Skip reading DWARF section data
objtool: Free insns when done
objtool: Get rid of reloc->rel[a]
objtool: Shrink elf hash nodes
objtool: Shrink reloc->sym_reloc_entry
objtool: Get rid of reloc->jump_table_start
objtool: Get rid of reloc->addend
objtool: Get rid of reloc->type
objtool: Get rid of reloc->offset
objtool: Get rid of reloc->idx
objtool: Get rid of reloc->list
objtool: Allocate relocs in advance for new rela sections
objtool: Add for_each_reloc()
objtool: Don't free memory in elf_close()
objtool: Keep GElf_Rel[a] structs synced
objtool: Add elf_create_section_pair()
objtool: Add mark_sec_changed()
objtool: Fix reloc_hash size
objtool: Consolidate rel/rela handling
...
Linus Torvalds [Tue, 27 Jun 2023 21:47:17 +0000 (14:47 -0700)]
Merge tag 'x86-mm-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
- Remove Xen-PV leftovers from init_32.c
- Fix __swp_entry_to_pte() warning splat for Xen PV guests, triggered
on CONFIG_DEBUG_VM_PGTABLE=y
* tag 'x86-mm-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove Xen-PV leftovers from init_32.c
x86/mm: Fix __swp_entry_to_pte() for Xen PV guests
Linus Torvalds [Tue, 27 Jun 2023 21:43:02 +0000 (14:43 -0700)]
Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Rework & fix the event forwarding logic by extending the core
interface.
This fixes AMD PMU events that have to be forwarded from the
core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Re-instate the linear PMU search
perf/x86/intel: Define bit macros for FixCntrCtl MSR
perf test: Add selftest to test IBS invocation via core pmu events
perf/core: Remove pmu linear searching code
perf/ibs: Fix interface via core pmu events
perf/core: Rework forwarding of {task|cpu}-clock events
Linus Torvalds [Tue, 27 Jun 2023 21:14:30 +0000 (14:14 -0700)]
Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()
The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.
Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.
- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.
The generated definitions are much cleaner now, and come with
documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.
This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...
Linus Torvalds [Tue, 27 Jun 2023 21:03:21 +0000 (14:03 -0700)]
Merge tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Scheduler SMP load-balancer improvements:
- Avoid unnecessary migrations within SMT domains on hybrid systems.
Problem:
On hybrid CPU systems, (processors with a mixture of
higher-frequency SMT cores and lower-frequency non-SMT cores),
under the old code lower-priority CPUs pulled tasks from the
higher-priority cores if more than one SMT sibling was busy -
resulting in many unnecessary task migrations.
Solution:
The new code improves the load balancer to recognize SMT cores
with more than one busy sibling and allows lower-priority CPUs
to pull tasks, which avoids superfluous migrations and lets
lower-priority cores inspect all SMT siblings for the busiest
queue.
- Implement the 'runnable boosting' feature in the EAS balancer:
consider CPU contention in frequency, EAS max util & load-balance
busiest CPU selection.
This improves CPU utilization for certain workloads, while leaves
other key workloads unchanged.
Scheduler infrastructure improvements:
- Rewrite the scheduler topology setup code by consolidating it into
the build_sched_topology() helper function and building it
dynamically on the fly.
- Resolve the local_clock() vs. noinstr complications by rewriting
the code: provide separate sched_clock_noinstr() and
local_clock_noinstr() functions to be used in instrumentation code,
and make sure it is all instrumentation-safe.
Fixes:
- Fix a kthread_park() race with wait_woken()
- Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
- Fix UP PREEMPT bug by unifying the SMP and UP implementations
- Fix task_struct::saved_state handling
- Fix various rq clock update bugs, unearthed by turning on the rq
clock debugging code.
- Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger
by creating enough cgroups, by removing the warnign and restricting
window size triggers to PSI file write-permission or
CAP_SYS_RESOURCE.
- Propagate SMT flags in the topology when removing degenerate domain
- Fix grub_reclaim() calculation bug in the deadline scheduler code
- Avoid resetting the min update period when it is unnecessary, in
psi_trigger_destroy().
- Don't balance a task to its current running CPU in load_balance(),
which was possible on certain NUMA topologies with overlapping
groups.
- Fix the sched-debug printing of rq->nr_uninterruptible
Cleanups:
- Address various -Wmissing-prototype warnings, as a preparation to
(maybe) enable this warning in the future.
- Remove unused code
- Mark more functions __init
- Fix shadow-variable warnings"
* tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
sched/core: Fixed missing rq clock update before calling set_rq_offline()
sched/deadline: Update GRUB description in the documentation
sched/deadline: Fix bandwidth reclaim equation in GRUB
sched/wait: Fix a kthread_park race with wait_woken()
sched/topology: Mark set_sched_topology() __init
sched/fair: Rename variable cpu_util eff_util
arm64/arch_timer: Fix MMIO byteswap
sched/fair, cpufreq: Introduce 'runnable boosting'
sched/fair: Refactor CPU utilization functions
cpuidle: Use local_clock_noinstr()
sched/clock: Provide local_clock_noinstr()
x86/tsc: Provide sched_clock_noinstr()
clocksource: hyper-v: Provide noinstr sched_clock()
clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
x86/vdso: Fix gettimeofday masking
math64: Always inline u128 version of mul_u64_u64_shr()
s390/time: Provide sched_clock_noinstr()
loongarch: Provide noinstr sched_clock_read()
...
Linus Torvalds [Tue, 27 Jun 2023 20:49:33 +0000 (13:49 -0700)]
Merge tag 'x86_sgx_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SGX update from Borislav Petkov:
- A fix to avoid using a list iterator variable after the loop it is
used in
* tag 'x86_sgx_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Avoid using iterator after loop in sgx_mmu_notifier_release()
Linus Torvalds [Tue, 27 Jun 2023 20:26:30 +0000 (13:26 -0700)]
Merge tag 'x86_sev_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Some SEV and CC platform helpers cleanup and simplifications now that
the usage patterns are becoming apparent
[ I'm sure I'm the only one that has gets confused by all the TLAs, but
in case there are others: here SEV is AMD's "Secure Encrypted
Virtualization" and CC is generic "Confidential Computing".
There's also Intel SGX (Software Guard Extensions) and TDX (Trust
Domain Extensions), along with all the vendor memory encryption
extensions (SME, TSME, TME, and WTF).
And then we have arm64 with RMA and CCA, and I probably forgot another
dozen or so related acronyms - Linus ]
* tag 'x86_sev_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/coco: Get rid of accessor functions
x86/sev: Get rid of special sev_es_enable_key
x86/coco: Mark cc_platform_has() and descendants noinstr
Linus Torvalds [Tue, 27 Jun 2023 20:11:32 +0000 (13:11 -0700)]
Merge tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mtrr updates from Borislav Petkov:
"A serious scrubbing of the MTRR code including adding a new map
mechanism in order to look up the memory type of a region easily.
Also address memory range lookup issues like returning an invalid
memory type. Furthermore, this handles the decoupling of PAT from MTRR
more naturally.
All work by Juergen Gross"
* tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Set default memory type for PV guests to WB
x86/mtrr: Unify debugging printing
x86/mtrr: Remove unused code
x86/mm: Only check uniform after calling mtrr_type_lookup()
x86/mtrr: Don't let mtrr_type_lookup() return MTRR_TYPE_INVALID
x86/mtrr: Use new cache_map in mtrr_type_lookup()
x86/mtrr: Add mtrr=debug command line option
x86/mtrr: Construct a memory map with cache modes
x86/mtrr: Add get_effective_type() service function
x86/mtrr: Allocate mtrr_value array dynamically
x86/mtrr: Move 32-bit code from mtrr.c to legacy.c
x86/mtrr: Have only one set_mtrr() variant
x86/mtrr: Replace vendor tests in MTRR code
x86/xen: Set MTRR state when running as Xen PV initial domain
x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest
x86/mtrr: Support setting MTRR state for software defined MTRRs
x86/mtrr: Replace size_or_mask and size_and_mask with a much easier concept
x86/mtrr: Remove physical address size calculation
Linus Torvalds [Tue, 27 Jun 2023 19:25:42 +0000 (12:25 -0700)]
Merge tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Remove the local symbols prefix of the get/put_user() exception
handling symbols so that tools do not get confused by the presence of
code belonging to the wrong symbol/not belonging to any symbol
- Improve csum_partial()'s performance
- Some improvements to the kcpuid tool
* tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Make get/put_user() exception handling a visible symbol
x86/csum: Fix clang -Wuninitialized in csum_partial()
x86/csum: Improve performance of `csum_partial`
tools/x86/kcpuid: Add .gitignore
tools/x86/kcpuid: Dump the correct CPUID function in error
Linus Torvalds [Tue, 27 Jun 2023 19:03:44 +0000 (12:03 -0700)]
Merge tag 'x86_microcode_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader updates from Borislav Petkov:
- Load late on both SMT threads on AMD, just like it is being done in
the early loading procedure
- Cleanups
* tag 'x86_microcode_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Load late on both threads too
x86/microcode/amd: Remove unneeded pointer arithmetic
x86/microcode/AMD: Get rid of __find_equiv_id()
Linus Torvalds [Tue, 27 Jun 2023 18:58:16 +0000 (11:58 -0700)]
Merge tag 'docs-arm-move' of git://git.lwn.net/linux
Pull arm documentation move from Jonathan Corbet:
"Move the Arm architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm-move' of git://git.lwn.net/linux:
dt-bindings: Update Documentation/arm references
docs: update some straggling Documentation/arm references
crypto: update some Arm documentation references
mips: update a reference to a moved Arm Document
arm64: Update Documentation/arm references
arm: update in-source documentation references
arm: docs: Move Arm documentation to Documentation/arch/
Linus Torvalds [Tue, 27 Jun 2023 18:33:47 +0000 (11:33 -0700)]
Merge tag 'docs-6.5' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a relatively calm cycle in docsland. We do have:
- Some initial page-table documentation from Linus (the other Linus)
- Regression-handling documentation improvements from Thorsten
- Addition of kerneldoc documentation for the ERR_PTR() and related
macros from James Seo
... and the usual collection of fixes and updates"
* tag 'docs-6.5' of git://git.lwn.net/linux:
docs: consolidate storage interfaces
Documentation: update git configuration for Link: tag
Documentation: KVM: make corrections to vcpu-requests.rst
Documentation: KVM: make corrections to ppc-pv.rst
Documentation: KVM: make corrections to locking.rst
Documentation: KVM: make corrections to halt-polling.rst
Documentation: virt: correct location of haltpoll module params
Documentation/mm: Initial page table documentation
docs: crypto: async-tx-api: fix typo in struct name
docs/doc-guide: Clarify how to write tables
docs: handling-regressions: rework section about fixing procedures
docs: process: fix a typoed cross-reference
docs: submitting-patches: Discuss interleaved replies
MAINTAINERS: direct process doc changes to a dedicated ML
Documentation: core-api: Add error pointer functions to kernel-api
err.h: Add missing kerneldocs for error pointer functions
Documentation: conf.py: Add __force to c_id_attributes
docs: clarify KVM related kernel parameters' descriptions
docs: consolidate human interface subsystems
docs: admin-guide: Add information about intel_pstate active mode
Linus Torvalds [Tue, 27 Jun 2023 18:28:56 +0000 (11:28 -0700)]
Merge tag 'linux-kselftest-next-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- allow runners to override the timeout
This change is made to avoid future increases of long timeouts
- several other spelling and cleanups
- a new subtest to video_device_test
- enhancements to test coverage in clone3 test
- other fixes to ftrace and cpufreq tests
* tag 'linux-kselftest-next-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftace: Fix KTAP output ordering
selftests/cpufreq: Don't enable generic lock debugging options
kselftests: Sort the collections list to avoid duplicate tests
selftest: pidfd: Omit long and repeating outputs
selftests: allow runners to override the timeout
selftests/ftrace: Add new test case which checks for optimized probes
selftests/clone3: test clone3 with exit signal in flags
kselftest: vDSO: Fix accumulation of uninitialized ret when CLOCK_REALTIME is undefined
selftests: prctl: Fix spelling mistake "anonynous" -> "anonymous"
selftests: media_tests: Add new subtest to video_device_test
Linus Torvalds [Tue, 27 Jun 2023 18:12:55 +0000 (11:12 -0700)]
Merge tag 'linux-kselftest-kunit-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- kunit_add_action() API to defer a call until test exit
- Update document to add kunit_add_action() usage notes
- Changes to always run cleanup from a test kthread
- Documentation updates to clarify cleanup usage (assertions should not
be used in cleanup)
- Documentation update to clearly indicate that exit functions should
run even if init fails
- Several fixes and enhancements to existing tests
* tag 'linux-kselftest-kunit-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
MAINTAINERS: Add source tree entry for kunit
Documentation: kunit: Rename references to kunit_abort()
kunit: Move kunit_abort() call out of kunit_do_failed_assertion()
kunit: Fix obsolete name in documentation headers (func->action)
Documentation: Kunit: add MODULE_LICENSE to sample code
kunit: Update kunit_print_ok_not_ok function
kunit: Fix reporting of the skipped parameterized tests
kunit/test: Add example test showing parameterized testing
Documentation: kunit: Add usage notes for kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
kunit: executor_test: Use kunit_add_action()
kunit: Add kunit_add_action() to defer a call until test exit
kunit: example: Provide example exit functions
Documentation: kunit: Warn that exit functions run even if init fails
Documentation: kunit: Note that assertions should not be used in cleanup
kunit: Always run cleanup from a test kthread
Documentation: kunit: Modular tests should not depend on KUNIT=y
kunit: tool: undo type subscripts for subprocess.Popen
Linus Torvalds [Tue, 27 Jun 2023 17:56:41 +0000 (10:56 -0700)]
Merge tag 'nolibc.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Add stackprotector support
- Fix RISC-V load-store instruction syntax to support 32-bit binaries,
plus fixes for generic 32-bit support
- Fix use of s390 sys_fork()
- Add my_syscall6() for ARM
- Support different platforms having different errno definitions
- Fix ppoll/ppoll_time64 arguments (add the fifth argument)
- Force use of little endian on MIPS
- Improved testing, for example, better handling of different compilers
and compiler versions, comparing nolibc behavior to that of libc, and
additional test cases
- Improve syntax and header ordering
- Use existing <linux/reboot.h> instead of redefining constants
- Add syscall()
* tag 'nolibc.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (53 commits)
selftests/nolibc: make sure gcc always use little endian on MIPS
selftests/nolibc: also count skipped and failed tests in output
selftests/nolibc: add new gettimeofday test cases
selftests/nolibc: remove gettimeofday_bad1/2 completely
selftests/nolibc: support two errnos with EXPECT_SYSER2()
tools/nolibc: open: fix up compile warning for arm
tools/nolibc: arm: add missing my_syscall6
selftests/nolibc: use INT_MAX instead of __INT_MAX__
selftests/nolibc: not include limits.h for nolibc
selftests/nolibc: fix up compile warning with glibc on x86_64
selftests/nolibc: allow specify extra arguments for qemu
selftests/nolibc: remove test gettimeofday_null
tools/nolibc: ensure fast64 integer types have 64 bits
selftests/nolibc: test_fork: fix up duplicated print
tools/nolibc: ppoll/ppoll_time64: add a missing argument
selftests/nolibc: remove the duplicated gettimeofday_bad2
selftests/nolibc: print name instead of number for EOVERFLOW
tools/nolibc: support nanoseconds in stat()
selftests/nolibc: prevent coredumps during test execution
tools/nolibc: add support for prctl()
...
====================
af_unix: Followup fixes for SO_PASSPIDFD.
This series fixes 2 issues introduced by commit 5e2ff6704a27 ("scm: add
SO_PASSPIDFD and SCM_PIDFD").
The 1st patch fixes a warning in scm_pidfd_recv() reported by syzkaller.
The 2nd patch fixes a regression that bluetooth can't be built as module.
====================
Recently, our friends from bluetooth subsystem reported [1] that after
commit 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD") scm_recv()
helper become unusable in kernel modules (because it uses unexported
pidfd_prepare() API).
We were aware of this issue and workarounded it in a hard way
by commit 97154bcf4d1b ("af_unix: Kconfig: make CONFIG_UNIX bool").
But recently a new functionality was added in the scope of commit 817efd3cad74 ("Bluetooth: hci_sock: Forward credentials to monitor")
and after that bluetooth can't be compiled as a kernel module.
After some discussion in [1] we decided to split scm_recv() into
two helpers, one won't support SCM_PIDFD (used for unix sockets),
and another one will be completely the same as it was before commit 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD").
syzkaller hit a WARN_ON_ONCE(!scm->pid) in scm_pidfd_recv().
In unix_stream_read_generic(), if there is no skb in the queue, we could
bail out the do-while loop without calling scm_set_cred():
1. No skb in the queue
2. sk is non-blocking
or
shutdown(sk, RCV_SHUTDOWN) is called concurrently
or
peer calls close()
If the socket is configured with SO_PASSPIDFD, scm_pidfd_recv() would
populate cmsg with garbage emitting the warning.
Let's skip SCM_PIDFD if scm->pid is NULL in scm_pidfd_recv().
Note another way would be skip calling scm_recv() in such cases, but this
caused a regression resulting in commit 9d797ee2dce1 ("Revert "af_unix:
Call scm_recv() only after scm_set_cred()."").
Linus Torvalds [Tue, 27 Jun 2023 17:37:01 +0000 (10:37 -0700)]
Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
"Documentation updates
Miscellaneous fixes, perhaps most notably:
- Remove RCU_NONIDLE(). The new visibility of most of the idle loop
to RCU has obsoleted this API.
- Make the RCU_SOFTIRQ callback-invocation time limit also apply to
the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.
- Add a jiffies-based callback-invocation time limit to handle
long-running callbacks. (The local_clock() function is only invoked
once per 32 callbacks due to its high overhead.)
- Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which
fixes a bug that can occur on systems with non-contiguous CPU
numbering.
kvfree_rcu updates:
- Eliminate the single-argument variant of k[v]free_rcu() now that
all uses have been converted to k[v]free_rcu_mightsleep().
- Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too
soon. Yes, this is closing the barn door after the horse has
escaped, but Murphy says that there will be more horses.
Callback-offloading updates:
- Fix a number of bugs involving the shrinker and lazy callbacks.
Tasks RCU updates
Torture-test updates"
* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
torture: Remove duplicated argument -enable-kvm for ppc64
doc/rcutorture: Add description of rcutorture.stall_cpu_block
rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
rcutorture: Correct name of use_softirq module parameter
locktorture: Add long_hold to adjust lock-hold delays
rcu/nocb: Make shrinker iterate only over NOCB CPUs
rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
rcu: Make rcu_cpu_starting() rely on interrupts being disabled
rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
rcu: Employ jiffies-based backstop to callback time limit
rcu: Check callback-invocation time limit for rcuc kthreads
rcu: Remove RCU_NONIDLE()
rcu: Add more RCU files to kernel-api.rst
rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
rcu/nocb: Fix shrinker race against callback enqueuer
rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
...