Mark Harmstone [Wed, 25 Feb 2026 10:36:06 +0000 (10:36 +0000)]
btrfs: read key again after incrementing slot in move_existing_remaps()
Fix move_existing_remaps() so that if we increment the slot because the
key we encounter isn't a REMAP_BACKREF, we don't reuse the objectid and
offset of the old item.
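
The pattern behind this fix can be illustrated outside the kernel. The following is a minimal userspace sketch, not the btrfs code: struct demo_key, enum demo_type and demo_next_backref() are hypothetical names invented for the illustration. It only shows that the local key copy has to be refreshed whenever the slot advances, so that objectid and offset always belong to the item being examined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the btrfs on-disk structures. */
enum demo_type { DEMO_OTHER, DEMO_BACKREF };

struct demo_key {
	uint64_t objectid;
	enum demo_type type;
	uint64_t offset;
};

/*
 * Find the first backref item at or after *slot and copy its key to *out.
 * Returns 0 on success, -1 if no backref item remains.
 */
int demo_next_backref(const struct demo_key *items, size_t nr,
		      size_t *slot, struct demo_key *out)
{
	struct demo_key key;

	while (*slot < nr) {
		/* Refresh the local copy every time the slot changes. */
		key = items[*slot];
		if (key.type == DEMO_BACKREF) {
			*out = key;
			return 0;
		}
		(*slot)++;
	}
	return -1;
}
```

Reading the key at the top of the loop, rather than once before it, is what keeps a stale objectid/offset pair from leaking into later processing.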
Link: https://lore.kernel.org/linux-btrfs/20260125123908.2096548-1-clm@meta.com/
Reported-by: Chris Mason <clm@fb.com>
Fixes: bbea42dfb91f ("btrfs: move existing remaps before relocating block group")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The __stackleak_poison() inline assembly comes with a "count" operand where
the "d" constraint is used. "count" is used with the exrl instruction and
"d" means that the compiler may allocate any register from 0 to 15.
If the compiler allocated register 0, the exrl instruction would
not OR the value of "count" into the executed instruction, resulting in a
stack frame that is only partially poisoned.
Use the correct "a" constraint, which excludes register 0 from register
allocation.
Heiko Carstens [Mon, 2 Mar 2026 13:34:59 +0000 (14:34 +0100)]
s390/xor: Improve inline assembly constraints
The inline assembly constraint for the "bytes" operand is "d" for all xor()
inline assemblies. "d" means that any register from 0 to 15 can be used. If
the compiler used register 0, the exrl instruction would not OR the value
of "bytes" into the executed instruction, resulting in an incorrect
result.
However all the xor() inline assemblies make hard-coded use of register 0,
and it is correctly listed in the clobber list, so that this cannot happen.
Given that this is quite subtle, use the better "a" constraint, which
excludes register 0 from register allocation in any case.
The inline assembly constraints for xor_xc_2() are incorrect. "bytes",
"p1", and "p2" are input operands, while all three of them are modified
within the inline assembly. Given that the function consists only of this
inline assembly, it seems unlikely that this causes any problems. Fix it
in any case.
Vasily Gorbik [Mon, 2 Mar 2026 18:03:34 +0000 (19:03 +0100)]
s390/xor: Fix xor_xc_5() inline assembly
xor_xc_5() contains a larl 1,2f that is not used by the asm and is not
declared as a clobber. This can corrupt a compiler-allocated value in %r1
and lead to miscompilation. Remove the instruction.
Fixes: 745600ed6965 ("s390/lib: Use exrl instead of ex in xor functions")
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In this state, deleting the subvol fails with ENOENT, but attempting to
create a new file or subvol over it errors out with EEXIST and even
aborts the fs. Which leaves us a bit stuck.
dmesg contains a single notable error message reading:
"could not do orphan cleanup -2"
2 is ENOENT and the error comes from the failure handling path of
btrfs_orphan_cleanup(), with the stack leading back up to
btrfs_lookup().
btrfs_lookup
  btrfs_lookup_dentry
    btrfs_orphan_cleanup // prints that message and returns -ENOENT
After some detailed inspection of the internal state, it became clear
that:
- there are no orphan items for the subvol
- the subvol is otherwise healthy looking, it is not half-deleted or
anything, there is no drop progress, etc.
- the subvol was created a while ago and does the meaningful first
btrfs_orphan_cleanup() call that sets BTRFS_ROOT_ORPHAN_CLEANUP much
later.
- after btrfs_orphan_cleanup() fails, btrfs_lookup_dentry() returns -ENOENT,
which results in a negative dentry for the subvolume via
d_splice_alias(NULL, dentry), leading to the observed behavior. The
bug can be mitigated by dropping the dentry cache, at which point we
can successfully delete the subvolume if we want.
i.e.,
btrfs_lookup()
  btrfs_lookup_dentry()
    if (!sb_rdonly(inode->vfs_inode.i_sb))
      btrfs_orphan_cleanup(sub_root)
        test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP)
        btrfs_search_slot() // finds orphan item for inode N
        ...
        prints "could not do orphan cleanup -2"
  if (inode == ERR_PTR(-ENOENT))
    inode = NULL;
  return d_splice_alias(NULL, dentry) // NEGATIVE DENTRY for valid subvolume
btrfs_orphan_cleanup() does test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP)
on the root when it runs, so it cannot run more than once on a given
root, so something else must run concurrently. However, the obvious
routes to deleting an orphan when nlinks goes to 0 should not be able to
run without first doing a lookup into the subvolume, which should run
btrfs_orphan_cleanup() and set the bit.
The final important observation is that create_subvol() calls
d_instantiate_new() but does not set BTRFS_ROOT_ORPHAN_CLEANUP, so if
the dentry cache gets dropped, the next lookup into the subvolume will
make a real call into btrfs_orphan_cleanup() for the first time. This
opens up the possibility of concurrently deleting the inode/orphan items
but most typical evict() paths will be holding a reference on the parent
dentry (child dentry holds parent->d_lockref.count via dget in
d_alloc(), released in __dentry_kill()) and prevent the parent from
being removed from the dentry cache.
The one exception is delayed iputs. Ordered extent creation calls
igrab() on the inode. If the file is unlinked and closed while those
refs are held, iput() in __dentry_kill() decrements i_count but does
not trigger eviction (i_count > 0). The child dentry is freed and the
subvol dentry's d_lockref.count drops to 0, making it evictable while
the inode is still alive.
Since there are two races (the race between writeback and unlink and
the race between lookup and delayed iputs), and there are too many moving
parts, the following three diagrams show the complete picture.
(Only the second and third are races)
Phase 1:
Create Subvol in dentry cache without BTRFS_ROOT_ORPHAN_CLEANUP set
btrfs_mksubvol()
  lookup_one_len()
    __lookup_slow()
      d_alloc_parallel()
        __d_alloc() // d_lockref.count = 1
  create_subvol(dentry)
    // doesn't touch the bit..
    d_instantiate_new(dentry, inode) // dentry in cache with d_lockref.count == 1
Phase 2:
Create a delayed iput for a file in the subvol but leave the subvol in
a state where its dentry can be evicted (d_lockref.count == 0)
Phase 3:
Once the delayed iput is pending and the subvol dentry is evictable,
the shrinker can free it, causing the next lookup to go through
btrfs_lookup() and call btrfs_orphan_cleanup() for the first time.
If the cleaner kthread processes the delayed iput concurrently, the
two race:
btrfs_orphan_del()
// inode freed
// returns -ENOENT
btrfs_del_orphan_item()
// -ENOENT
// "could not do orphan cleanup -2"
d_splice_alias(NULL, dentry)
// negative dentry for valid subvol
The most straightforward fix is to ensure the invariant that a dentry
for a subvolume can exist if and only if that subvolume has
BTRFS_ROOT_ORPHAN_CLEANUP set on its root (and is known to have no
orphans or ran btrfs_orphan_cleanup()).
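
As an illustration of that invariant, here is a minimal userspace sketch. It is a model only, not the btrfs implementation: struct demo_root and the demo_*() helpers are hypothetical, a C11 atomic_bool stands in for the BTRFS_ROOT_ORPHAN_CLEANUP bit, and atomic_exchange() stands in for test_and_set_bit(). The assumption being illustrated is that a freshly created subvolume cannot have orphans, so marking it at creation time means a later lookup never starts a first cleanup pass.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_root {
	atomic_bool orphan_cleanup;	/* models BTRFS_ROOT_ORPHAN_CLEANUP */
	int cleanup_runs;		/* how often the orphan scan ran */
};

/* A root read from disk: cleanup has not run yet. */
void demo_read_root(struct demo_root *root)
{
	atomic_init(&root->orphan_cleanup, false);
	root->cleanup_runs = 0;
}

/* A freshly created subvolume cannot have orphans, so mark it up front. */
void demo_create_subvol(struct demo_root *root)
{
	atomic_init(&root->orphan_cleanup, true);
	root->cleanup_runs = 0;
}

/* Models the lookup path: only the first caller on an unmarked root scans. */
void demo_lookup(struct demo_root *root)
{
	if (atomic_exchange(&root->orphan_cleanup, true))
		return;		/* bit was already set, nothing to do */
	root->cleanup_runs++;
}
```

With the bit set at creation, a lookup that happens after the dentry was evicted is a no-op as far as orphan cleanup is concerned, so it has nothing to race with.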
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread start
btrfs_zoned_reserve_data_reloc_bg() is called on each mount of a file
system and allocates a new block-group, to assign it to be the dedicated
relocation target, if no pre-existing usable block-group for this task is
found.
If for some reason the transaction is aborted, btrfs_end_transaction()
will wake up the transaction kthread. But the transaction kthread is not
yet initialized at the time btrfs_zoned_reserve_data_reloc_bg() is
called, leading to a NULL-pointer dereference.
Move the call to btrfs_zoned_reserve_data_reloc_bg() after the
transaction_kthread has been initialized to fix this problem.
Fixes: 694ce5e143d6 ("btrfs: zoned: reserve data_reloc block group on mount")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sun YangKai [Mon, 9 Feb 2026 12:53:39 +0000 (20:53 +0800)]
btrfs: hold space_info->lock when clearing periodic reclaim ready
btrfs_set_periodic_reclaim_ready() requires space_info->lock to be held,
as enforced by lockdep_assert_held(). However, btrfs_reclaim_sweep() was
calling it after do_reclaim_sweep() returns, at which point
space_info->lock is no longer held.
Fix this by explicitly acquiring space_info->lock before clearing the
periodic reclaim ready flag in btrfs_reclaim_sweep().
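
The locking rule can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the btrfs code: struct demo_lock is a hypothetical single-threaded stand-in for the spinlock, the assert() in demo_set_ready() plays the role of lockdep_assert_held(), and demo_reclaim_sweep() only shows the shape of the fixed call site.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock; just tracks whether it is held. */
struct demo_lock { bool held; };

struct demo_space_info {
	struct demo_lock lock;
	bool periodic_reclaim_ready;
};

void demo_lock_acquire(struct demo_lock *l) { assert(!l->held); l->held = true; }
void demo_lock_release(struct demo_lock *l) { assert(l->held); l->held = false; }

/* The setter requires the caller to hold the lock. */
void demo_set_ready(struct demo_space_info *si, bool ready)
{
	assert(si->lock.held);	/* models lockdep_assert_held() */
	si->periodic_reclaim_ready = ready;
}

void demo_reclaim_sweep(struct demo_space_info *si)
{
	/* ... sweep work that takes and drops the lock on its own ... */

	/* The lock is no longer held here, so take it explicitly. */
	demo_lock_acquire(&si->lock);
	demo_set_ready(si, false);
	demo_lock_release(&si->lock);
}
```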
Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-btrfs/20260208182556.891815-1-clm@meta.com/
Fixes: 19eff93dc738 ("btrfs: fix periodic reclaim condition")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Mark Harmstone [Mon, 9 Feb 2026 18:00:14 +0000 (18:00 +0000)]
btrfs: print-tree: add remap tree definitions
Add the definitions for the remap tree to print-tree.c, so that we get
more useful information if a tree is dumped to dmesg.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Revert "ACPI: PM: Let acpi_dev_pm_attach() skip devices without ACPI PM"
Revert commit 88fad6ce090b ("ACPI: PM: Let acpi_dev_pm_attach() skip
devices without ACPI PM") that introduced a SoundWire suspend regression
[1].
It is actually not true that the commit above doesn't make a functional
difference because acpi_subsys_suspend(), for example, may resume
devices in runtime-suspend which affects the subsequent handling of
those devices during the suspend transition. For this reason, the
devices that were handled by the ACPI PM domain before that commit may
be handled differently now which may lead to suspend-resume issues.
Fixes: 88fad6ce090b ("ACPI: PM: Let acpi_dev_pm_attach() skip devices without ACPI PM")
Reported-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Closes: https://github.com/thesofproject/linux/pull/5677#issuecomment-3984375077 [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2829615.mvXUDI8C0e@rafael.j.wysocki
Charles Keepax [Tue, 3 Mar 2026 14:17:07 +0000 (14:17 +0000)]
ASoC: SDCA: Add allocation failure check for Entity name
Currently find_sdca_entity_iot() can allocate a string for the
Entity name but it doesn't check if that allocation succeeded.
Add the missing NULL check after the allocation.
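
For readers less familiar with the pattern, a generic userspace sketch follows. It does not reproduce find_sdca_entity_iot(): struct demo_entity and demo_entity_set_label() are hypothetical, and malloc()/snprintf() stand in for whatever allocator the driver uses. The point is only that the result of a string allocation is checked before it is stored or used.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical entity with a dynamically allocated label. */
struct demo_entity { char *label; };

/* Build "<base> <index>" on the heap; returns 0 or a negative errno. */
int demo_entity_set_label(struct demo_entity *e, const char *base, int index)
{
	int len = snprintf(NULL, 0, "%s %d", base, index);
	char *label;

	if (len < 0)
		return -EINVAL;

	label = malloc((size_t)len + 1);
	if (!label)
		return -ENOMEM;	/* the kind of check the commit adds */

	snprintf(label, (size_t)len + 1, "%s %d", base, index);
	e->label = label;
	return 0;
}
```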
Hou Wenlong [Sun, 1 Mar 2026 05:04:52 +0000 (13:04 +0800)]
x86/PVH: Use boot params to pass RSDP address in start_info page
After commit e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from
boot params if available"), the RSDP address can be passed in boot
params. Therefore, store the RSDP address from the start_info page in the
boot params during PVH entry instead of registering a different callback.
This removes an absolute reference during the PVH entry and is more
standardized.
kexinsun [Tue, 24 Feb 2026 02:24:24 +0000 (10:24 +0800)]
x86/xen: update outdated comment
The function xen_flush_tlb_others() was renamed xen_flush_tlb_multi()
by commit 4ce94eabac16 ("x86/mm/tlb: Flush remote and local TLBs
concurrently"). Update the comment accordingly.
David Thomson [Tue, 24 Feb 2026 09:37:11 +0000 (09:37 +0000)]
xen/acpi-processor: fix _CST detection using undersized evaluation buffer
read_acpi_id() attempts to evaluate _CST using a stack buffer of
sizeof(union acpi_object) (48 bytes), but _CST returns a nested Package
of sub-Packages (one per C-state, each containing a register descriptor,
type, latency, and power) requiring hundreds of bytes. The evaluation
always fails with AE_BUFFER_OVERFLOW.
On modern systems using FFH/MWAIT entry (where pblk is zero), this
causes the function to return before setting the acpi_id_cst_present
bit. In check_acpi_ids(), flags.power is then zero for all Phase 2 CPUs
(physical CPUs beyond dom0's vCPU count), so push_cxx_to_hypervisor() is
never called for them.
On a system with dom0_max_vcpus=2 and 8 physical CPUs, only PCPUs 0-1
receive C-state data. PCPUs 2-7 are stuck in C0/C1 idle, unable to
enter C2/C3. This costs measurable wall power (4W observed on an Intel
Core Ultra 7 265K with Xen 4.20).
The function never uses the _CST return value -- it only needs to know
whether _CST exists. Replace the broken acpi_evaluate_object() call with
acpi_has_method(), which correctly detects _CST presence using
acpi_get_handle() without any buffer allocation. This brings C-state
detection to parity with the P-state path, which already works correctly
for Phase 2 CPUs.
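
The difference between evaluating an object and merely probing for it can be modelled in a few lines. The sketch below is a userspace illustration with hypothetical names (demo_evaluate(), demo_has_method(), demo_namespace); it is not ACPICA code, and the 400-byte result size is an assumed figure standing in for "hundreds of bytes". It shows why an existence check must not depend on the caller's buffer size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_status { DEMO_OK, DEMO_NOT_FOUND, DEMO_BUFFER_OVERFLOW };

/* Hypothetical namespace entry: a method and the size of its result. */
struct demo_method { const char *name; size_t result_size; };

static const struct demo_method demo_namespace[] = {
	{ "_CST", 400 },	/* assumed size, for illustration only */
};

static const struct demo_method *demo_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_namespace) / sizeof(demo_namespace[0]); i++)
		if (strcmp(demo_namespace[i].name, name) == 0)
			return &demo_namespace[i];
	return NULL;
}

/* Evaluation copies the result out, so a small buffer makes it fail. */
enum demo_status demo_evaluate(const char *name, void *buf, size_t len)
{
	const struct demo_method *m = demo_find(name);

	if (!m)
		return DEMO_NOT_FOUND;
	if (len < m->result_size)
		return DEMO_BUFFER_OVERFLOW;
	memset(buf, 0, m->result_size);
	return DEMO_OK;
}

/* Probing only looks the name up; no buffer is involved. */
bool demo_has_method(const char *name)
{
	return demo_find(name) != NULL;
}
```

In this model, a 48-byte buffer makes demo_evaluate() fail even though the method exists, while demo_has_method() answers the question the caller actually had.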
Fixes: 59a568029181 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
Signed-off-by: David Thomson <dt@linux-mail.net>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20260224093707.19679-1-dt@linux-mail.net>
Hou Wenlong [Thu, 22 Jan 2026 10:06:14 +0000 (18:06 +0800)]
x86/xen: Build identity mapping page tables dynamically for XENPV
After commit 47ffe0578aee ("x86/pvh: Add 64bit relocation page tables"),
the PVH entry uses a new set of page tables instead of the
preconstructed page tables in head64.S. Since those preconstructed page
tables are only used in XENPV now and XENPV does not actually need the
preconstructed identity page tables directly, they can be filled in
xen_setup_kernel_pagetable(). Therefore, build the identity mapping page
table dynamically to remove the preconstructed page tables and make the
code cleaner.
Thorsten Blum [Tue, 3 Mar 2026 11:30:51 +0000 (12:30 +0100)]
platform/x86: dell-wmi-sysman: Don't hex dump plaintext password data
set_new_password() hex dumps the entire buffer, which contains plaintext
password data, including current and new passwords. Remove the hex dump
to avoid leaking credentials.
Fixes: e8a60aa7404b ("platform/x86: Introduce support for Systems Management Driver over WMI for Dell Systems")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20260303113050.58127-2-thorsten.blum@linux.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
YiFei Zhu [Fri, 27 Feb 2026 22:19:37 +0000 (22:19 +0000)]
net: Fix rcu_tasks stall in threaded busypoll
I was debugging a NIC driver when I noticed that when I enable
threaded busypoll, bpftrace hangs when starting up. dmesg showed:
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 10658 jiffies old.
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 40793 jiffies old.
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 131273 jiffies old.
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 402058 jiffies old.
INFO: rcu_tasks detected stalls on tasks: 00000000769f52cd: .N nvcsw: 2/2 holdout: 1 idle_cpu: -1/64
task:napi/eth2-8265 state:R running task stack:0 pid:48300 tgid:48300 ppid:2 task_flags:0x208040 flags:0x00004000
Call Trace:
<TASK>
? napi_threaded_poll_loop+0x27c/0x2c0
? __pfx_napi_threaded_poll+0x10/0x10
? napi_threaded_poll+0x26/0x80
? kthread+0xfa/0x240
? __pfx_kthread+0x10/0x10
? ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
? ret_from_fork_asm+0x1a/0x30
</TASK>
The cause is that in threaded busypoll, the main loop is in
napi_threaded_poll rather than napi_threaded_poll_loop, where the
latter rarely iterates more than once within its loop. For
rcu_softirq_qs_periodic inside napi_threaded_poll_loop to report its
qs state, the last_qs must be 100ms behind, and this can't happen
because napi_threaded_poll_loop rarely iterates in threaded busypoll,
and each time napi_threaded_poll_loop is called last_qs is reset to
latest jiffies.
Change this so that in threaded busypoll, last_qs is saved
in the outer napi_threaded_poll, and whether busy_poll_last_qs
is NULL indicates whether napi_threaded_poll_loop is called for
busypoll. This way last_qs is not reset to the latest jiffies on
each invocation of napi_threaded_poll_loop.
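
The effect of where the timestamp lives can be reproduced with a small userspace model. Everything in it is a simplification and an assumption: demo_jiffies is a fake clock, DEMO_QS_INTERVAL is an arbitrary 100-tick threshold, and demo_qs_periodic(), demo_poll_loop() and demo_busy_poll() are hypothetical names that only mirror the roles of rcu_softirq_qs_periodic(), napi_threaded_poll_loop() and napi_threaded_poll().

```c
#include <stddef.h>

#define DEMO_QS_INTERVAL 100UL	/* assumed ticks between reports */

static unsigned long demo_jiffies;	/* fake clock */
static unsigned int demo_qs_reports;	/* quiescent states reported */

/* Report a quiescent state once the stamp is old enough. */
static void demo_qs_periodic(unsigned long *last_qs)
{
	if (demo_jiffies - *last_qs >= DEMO_QS_INTERVAL) {
		demo_qs_reports++;
		*last_qs = demo_jiffies;
	}
}

/* Inner loop: in the busy-poll case it iterates only once per call. */
static void demo_poll_loop(unsigned long *busy_poll_last_qs)
{
	unsigned long local_last_qs = demo_jiffies;
	unsigned long *last_qs = busy_poll_last_qs ? busy_poll_last_qs
						   : &local_last_qs;

	demo_jiffies += 10;	/* pretend one poll takes 10 ticks */
	demo_qs_periodic(last_qs);
}

/* Outer loop: returns how many quiescent states were reported. */
unsigned int demo_busy_poll(int calls, int keep_stamp)
{
	unsigned long outer_last_qs;
	int i;

	demo_jiffies = 0;
	demo_qs_reports = 0;
	outer_last_qs = demo_jiffies;

	for (i = 0; i < calls; i++)
		demo_poll_loop(keep_stamp ? &outer_last_qs : NULL);
	return demo_qs_reports;
}
```

In this model, a stamp that is re-created on every call never ages past the threshold, whereas a stamp owned by the outer loop ages across calls and lets the report fire periodically.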
Fixes: c18d4b190a46 ("net: Extend NAPI threaded polling to allow kthread based busy polling")
Cc: stable@vger.kernel.org
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20260227221937.1060857-1-zhuyifei@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Revert "driver core: enforce device_lock for driver_match_device()"
This reverts commit dc23806a7c47 ("driver core: enforce device_lock for
driver_match_device()") and commit 289b14592cef ("driver core: fix
inverted "locked" suffix of driver_match_device()").
While technically correct, there is a major downside to this approach:
When a device is already present in the system and a driver is
registered on the same bus, we iterate over all devices registered on
this bus to see if one of them matches. If we come across an already
bound one where the corresponding driver crashed while holding the
device lock (e.g. in probe()) we can't make any progress anymore.
However, drivers are typically the least tested code in the kernel and
hence it is a case that is likely to happen regularly. Besides hurting
developer ergonomics, it potentially decreases chances of shutting
things down cleanly and obtaining logs in production environments as
well [1].
This came up in the context of a firewire bug, which only in combination
with the reverted commit, caused the machine to hang [2]. Additionally,
it was observed in [3].
Thus, revert commit dc23806a7c47 ("driver core: enforce device_lock for
driver_match_device()") and add a brief note clarifying that an
implementer of struct bus_type must not expect match() to be called with
the device lock held.
net/rds: Fix circular locking dependency in rds_tcp_tune
syzbot reported a circular locking dependency in rds_tcp_tune() where
sk_net_refcnt_upgrade() is called while holding the socket lock:
======================================================
WARNING: possible circular locking dependency detected
======================================================
kworker/u10:8/15040 is trying to acquire lock: ffffffff8e9aaf80 (fs_reclaim){+.+.}-{0:0},
at: __kmalloc_cache_noprof+0x4b/0x6f0
but task is already holding lock: ffff88805a3c1ce0 (k-sk_lock-AF_INET6){+.+.}-{0:0},
at: rds_tcp_tune+0xd7/0x930
The issue occurs because sk_net_refcnt_upgrade() performs memory
allocation (via get_net_track() -> ref_tracker_alloc()) while the
socket lock is held, creating a circular dependency with fs_reclaim.
Fix this by moving sk_net_refcnt_upgrade() outside the socket lock
critical section. This is safe because the fields modified by the
sk_net_refcnt_upgrade() call (sk_net_refcnt, ns_tracker) are not
accessed by any concurrent code path at this point.
v2:
- Corrected fixes tag
- check patch line wrap nits
- ai commentary nits
Lorenzo Bianconi [Thu, 26 Feb 2026 19:11:16 +0000 (20:11 +0100)]
wifi: mt76: Fix possible oob access in mt76_connac2_mac_write_txwi_80211()
Check frame length before accessing the mgmt fields in
mt76_connac2_mac_write_txwi_80211 in order to avoid a possible oob
access.
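
A generic version of such a length check is sketched below. It is not the mt76 code: demo_read_capab() and the DEMO_* offsets are hypothetical, and the layout (a 24-byte header followed by category, action code, dialog token and a little-endian 16-bit capability field) is an assumption made for the example. The idea is that the frame must be long enough to cover the last byte of the field before the field is read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDR_LEN	24			/* assumed header length */
#define DEMO_CAPAB_OFF	(DEMO_HDR_LEN + 3)	/* after 3 one-byte fields */

/*
 * Read the 16-bit capability field from a raw frame.
 * Returns false, without touching the buffer, if the frame is too short.
 */
bool demo_read_capab(const uint8_t *frame, size_t len, uint16_t *capab)
{
	uint8_t raw[2];

	/* The check must cover the whole field, not just its first byte. */
	if (len < DEMO_CAPAB_OFF + sizeof(raw))
		return false;

	memcpy(raw, frame + DEMO_CAPAB_OFF, sizeof(raw));
	*capab = (uint16_t)(raw[0] | (raw[1] << 8));	/* little endian */
	return true;
}
```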
Fixes: 577dbc6c656d ("mt76: mt7915: enable offloading of sequence number assignment")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260226-mt76-addba-req-oob-access-v1-3-b0f6d1ad4850@kernel.org
[fix check to also cover mgmt->u.action.u.addba_req.capab,
 correct Fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bart Van Assche [Mon, 23 Feb 2026 22:00:24 +0000 (14:00 -0800)]
wifi: cw1200: Fix locking in error paths
cw1200_wow_suspend() must only return with priv->conf_mutex locked if it
returns zero. This mutex must be unlocked if an error is returned. Add
mutex_unlock() calls to the error paths from which that call is missing.
This has been detected by the Clang thread-safety analyzer.
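
The locking contract described above ("locked on success, unlocked on error") can be sketched as follows. This is a userspace model, not the cw1200 driver: struct demo_mutex is a hypothetical stand-in that only records its state, and demo_suspend() with its busy parameter is invented to provide one error path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mutex; just records whether it is locked. */
struct demo_mutex { bool locked; };

struct demo_priv { struct demo_mutex conf_mutex; };

static void demo_mutex_lock(struct demo_mutex *m)   { m->locked = true; }
static void demo_mutex_unlock(struct demo_mutex *m) { m->locked = false; }

/*
 * Contract: returns 0 with conf_mutex held (the resume path unlocks it),
 * or a negative errno with conf_mutex released.
 */
int demo_suspend(struct demo_priv *priv, bool busy)
{
	int ret;

	demo_mutex_lock(&priv->conf_mutex);

	if (busy) {
		ret = -EBUSY;
		goto err_unlock;
	}

	return 0;	/* success: mutex intentionally stays locked */

err_unlock:
	demo_mutex_unlock(&priv->conf_mutex);
	return ret;
}
```

Routing every failure through a single unlock label makes it harder to add a new error return that forgets to release the mutex.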
====================
avoid compiler and IQ/OQ reordering
Utilize READ_ONCE and WRITE_ONCE APIs to prevent compiler
optimization and reordering. Ensure IO queue OUT/IN_CNT
registers are flushed. Relocate IQ/OQ IN/OUT_CNTS updates
to occur before NAPI completion, and replace napi_complete
with napi_complete_done.
====================
Vimlesh Kumar [Fri, 27 Feb 2026 09:14:00 +0000 (09:14 +0000)]
octeon_ep_vf: avoid compiler and IQ/OQ reordering
Utilize READ_ONCE and WRITE_ONCE APIs for IO queue Tx/Rx
variable access to prevent compiler optimization and reordering.
Additionally, ensure IO queue OUT/IN_CNT registers are flushed
by performing a read-back after writing.
The compiler could reorder reads/writes to pkts_pending, last_pkt_count,
etc., causing stale values to be used when calculating packets to process
or register updates to send to hardware. The Octeon hardware requires a
read-back after writing to OUT_CNT/IN_CNT registers to ensure the write
has been flushed through any posted write buffers before the interrupt
resend bit is set. Without this, we have observed cases where the hardware
didn't properly update its internal state.
wmb()/rmb() only provide ordering guarantees but do not prevent the compiler
from performing optimizations such as caching in registers, load tearing, etc.
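
To make the two techniques concrete, here is a minimal userspace sketch. It is an approximation under stated assumptions: DEMO_READ_ONCE()/DEMO_WRITE_ONCE() are simplified volatile-cast macros in the spirit of the kernel's READ_ONCE()/WRITE_ONCE() (they rely on the __typeof__ extension of GCC and clang), struct demo_oq is hypothetical, and a volatile struct member stands in for a memory-mapped register that the driver would access with writel()/readl().

```c
#include <stdint.h>

/* Simplified stand-ins: force exactly one access through a volatile cast. */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical output queue state shared between contexts. */
struct demo_oq {
	uint32_t pkts_pending;
	uint32_t last_pkt_count;
	volatile uint32_t out_cnt_reg;	/* models a device register */
};

/* Account for packets the hardware reported since the last check. */
uint32_t demo_oq_check(struct demo_oq *oq, uint32_t hw_count)
{
	uint32_t last = DEMO_READ_ONCE(oq->last_pkt_count);
	uint32_t new_pkts = hw_count - last;

	DEMO_WRITE_ONCE(oq->pkts_pending,
			DEMO_READ_ONCE(oq->pkts_pending) + new_pkts);
	DEMO_WRITE_ONCE(oq->last_pkt_count, hw_count);
	return new_pkts;
}

/* Write the counter register, then read it back to flush the write. */
uint32_t demo_oq_ack(struct demo_oq *oq, uint32_t processed)
{
	oq->out_cnt_reg = processed;	/* models the register write */
	return oq->out_cnt_reg;		/* models the read-back */
}
```

The macros keep the compiler from caching or splitting the shared accesses; they do not order accesses between CPUs, which remains the job of the barriers.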
Vimlesh Kumar [Fri, 27 Feb 2026 09:13:59 +0000 (09:13 +0000)]
octeon_ep_vf: Relocate counter updates before NAPI
Relocate IQ/OQ IN/OUT_CNTS updates to occur before NAPI completion.
Moving the IQ/OQ counter updates before napi_complete_done ensures that:
1. Counter registers are updated before re-enabling interrupts.
2. A race where new packets arrive but counters aren't properly
synchronized is prevented.
Vimlesh Kumar [Fri, 27 Feb 2026 09:13:58 +0000 (09:13 +0000)]
octeon_ep: avoid compiler and IQ/OQ reordering
Utilize READ_ONCE and WRITE_ONCE APIs for IO queue Tx/Rx
variable access to prevent compiler optimization and reordering.
Additionally, ensure IO queue OUT/IN_CNT registers are flushed
by performing a read-back after writing.
The compiler could reorder reads/writes to pkts_pending, last_pkt_count,
etc., causing stale values to be used when calculating packets to process
or register updates to send to hardware. The Octeon hardware requires a
read-back after writing to OUT_CNT/IN_CNT registers to ensure the write
has been flushed through any posted write buffers before the interrupt
resend bit is set. Without this, we have observed cases where the hardware
didn't properly update its internal state.
wmb()/rmb() only provide ordering guarantees but do not prevent the compiler
from performing optimizations such as caching in registers, load tearing, etc.
Vimlesh Kumar [Fri, 27 Feb 2026 09:13:57 +0000 (09:13 +0000)]
octeon_ep: Relocate counter updates before NAPI
Relocate IQ/OQ IN/OUT_CNTS updates to occur before NAPI completion,
and replace napi_complete with napi_complete_done.
Moving the IQ/OQ counter updates before napi_complete_done ensures that:
1. Counter registers are updated before re-enabling interrupts.
2. A race where new packets arrive but counters aren't properly
synchronized is prevented.
napi_complete_done (vs napi_complete) allows for better
interrupt coalescing.
====================
bonding: fix missing XDP compat check on xmit_hash_policy change
syzkaller reported a bug: https://syzkaller.appspot.com/bug?extid=5a287bcdc08104bc3132
When a bond device is in 802.3ad or balance-xor mode, XDP is supported
only when xmit_hash_policy != vlan+srcmac. This constraint is enforced
in bond_option_mode_set() via bond_xdp_check(), which prevents switching
to an XDP-incompatible mode while a program is loaded. However, the
symmetric path -- changing xmit_hash_policy while XDP is loaded -- had
no such guard in bond_option_xmit_hash_policy_set().
This means the following sequence silently creates an inconsistent state:
1. Create a bond in 802.3ad mode with xmit_hash_policy=layer2+3.
2. Attach a native XDP program to the bond.
3. Change xmit_hash_policy to vlan+srcmac (no error, not checked).
Now bond->xdp_prog is set but bond_xdp_check() returns false for the
same device. When the bond is later torn down (e.g. netns deletion),
dev_xdp_uninstall() calls bond_xdp_set(dev, NULL, NULL) to remove the
program, which hits the bond_xdp_check() guard and returns -EOPNOTSUPP,
triggering a kernel WARNING.
Beyond the WARNING itself, when dev_xdp_install() fails during
dev_xdp_uninstall(), bond_xdp_set() returns early without calling
bpf_prog_put() on the old program. dev_xdp_uninstall() then releases
only the reference held by dev->xdp_state[], while the reference held
by bond->xdp_prog is never dropped, leaking the struct bpf_prog.
The fix refactors the core logic of bond_xdp_check() into a new helper
__bond_xdp_check_mode(mode, xmit_policy) that takes both parameters
explicitly, avoiding the need to read them from the bond struct.
bond_xdp_check() becomes a thin wrapper around it.
bond_option_xmit_hash_policy_set() then uses __bond_xdp_check_mode()
directly, passing the candidate xmit_policy before it is committed,
mirroring exactly what bond_option_mode_set() already does for mode
changes.
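
The shape of that refactoring can be sketched in userspace. The code below is a model with hypothetical names (struct demo_bond, demo_xdp_check_mode(), demo_set_xmit_policy()) and a reduced set of modes and policies; it is not the bonding driver. It shows a check that takes mode and policy as explicit parameters, a thin wrapper that reads them from the device, and a setter that validates the candidate policy before committing it.

```c
#include <errno.h>
#include <stdbool.h>

enum demo_mode {
	DEMO_MODE_ROUNDROBIN, DEMO_MODE_ACTIVEBACKUP,
	DEMO_MODE_XOR, DEMO_MODE_8023AD,
};

enum demo_policy {
	DEMO_POLICY_LAYER2, DEMO_POLICY_LAYER23, DEMO_POLICY_VLAN_SRCMAC,
};

struct demo_bond {
	enum demo_mode mode;
	enum demo_policy xmit_policy;
	bool xdp_loaded;
};

/* Core check on explicit parameters, usable before anything is committed. */
bool demo_xdp_check_mode(enum demo_mode mode, enum demo_policy policy)
{
	switch (mode) {
	case DEMO_MODE_ROUNDROBIN:
	case DEMO_MODE_ACTIVEBACKUP:
		return true;
	case DEMO_MODE_8023AD:
	case DEMO_MODE_XOR:
		return policy != DEMO_POLICY_VLAN_SRCMAC;
	}
	return false;
}

/* Thin wrapper for callers that want the current device state checked. */
bool demo_xdp_check(const struct demo_bond *bond)
{
	return demo_xdp_check_mode(bond->mode, bond->xmit_policy);
}

/* Reject a policy change that would strand a loaded XDP program. */
int demo_set_xmit_policy(struct demo_bond *bond, enum demo_policy candidate)
{
	if (bond->xdp_loaded && !demo_xdp_check_mode(bond->mode, candidate))
		return -EOPNOTSUPP;

	bond->xmit_policy = candidate;
	return 0;
}
```

Checking the candidate value rather than the stored one is what allows the rejection to happen before the device ever enters the inconsistent state.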
Patch 1 adds the kernel fix.
Patch 2 adds a selftest that reproduces the WARNING by attaching native
XDP to a bond in 802.3ad mode, then attempting to change xmit_hash_policy
to vlan+srcmac -- verifying the change is rejected with the fix applied.
====================
Jiayuan Chen [Thu, 26 Feb 2026 08:03:02 +0000 (16:03 +0800)]
selftests/bpf: add test for xdp_bonding xmit_hash_policy compat
Add a selftest to verify that changing xmit_hash_policy to vlan+srcmac
is rejected when a native XDP program is loaded on a bond in 802.3ad
mode. Without the fix in bond_option_xmit_hash_policy_set(), the change
succeeds silently, creating an inconsistent state that triggers a kernel
WARNING in dev_xdp_uninstall() when the bond is torn down.
The test attaches native XDP to a bond0 (802.3ad, layer2+3), then
attempts to switch xmit_hash_policy to vlan+srcmac and asserts the
operation fails. It also verifies the change succeeds after XDP is
detached, confirming the rejection is specific to the XDP-loaded state.
Jiayuan Chen [Thu, 26 Feb 2026 08:03:01 +0000 (16:03 +0800)]
bpf/bonding: reject vlan+srcmac xmit_hash_policy change when XDP is loaded
bond_option_mode_set() already rejects mode changes that would make a
loaded XDP program incompatible via bond_xdp_check(). However,
bond_option_xmit_hash_policy_set() has no such guard.
For 802.3ad and balance-xor modes, bond_xdp_check() returns false when
xmit_hash_policy is vlan+srcmac, because the 802.1q payload is usually
absent due to hardware offload. This means a user can:
1. Attach a native XDP program to a bond in 802.3ad/balance-xor mode
with a compatible xmit_hash_policy (e.g. layer2+3).
2. Change xmit_hash_policy to vlan+srcmac while XDP remains loaded.
This leaves bond->xdp_prog set but bond_xdp_check() now returning false
for the same device. When the bond is later destroyed, dev_xdp_uninstall()
calls bond_xdp_set(dev, NULL, NULL) to remove the program, which hits
the bond_xdp_check() guard and returns -EOPNOTSUPP, triggering a kernel WARNING.
Fix this by rejecting xmit_hash_policy changes to vlan+srcmac when an
XDP program is loaded on a bond in 802.3ad or balance-xor mode.
commit 39a0876d595b ("net, bonding: Disallow vlan+srcmac with XDP")
introduced bond_xdp_check() which returns false for 802.3ad/balance-xor
modes when xmit_hash_policy is vlan+srcmac. The check was wired into
bond_xdp_set() to reject XDP attachment with an incompatible policy, but
the symmetric path -- preventing xmit_hash_policy from being changed to an
incompatible value after XDP is already loaded -- was left unguarded in
bond_option_xmit_hash_policy_set().
Note:
commit 094ee6017ea0 ("bonding: check xdp prog when set bond mode")
later added a similar guard to bond_option_mode_set(), but
bond_option_xmit_hash_policy_set() remained unprotected.
wangdicheng [Tue, 3 Mar 2026 08:15:16 +0000 (16:15 +0800)]
ALSA: hda/senary: Ensure EAPD is enabled during init
The driver sets spec->gen.own_eapd_ctl to take manual control of the
EAPD (External Amplifier). However, senary_init does not turn on the
EAPD, while senary_shutdown turns it off.
Since the generic driver skips EAPD handling when own_eapd_ctl is set,
the EAPD remains off after initialization (e.g., after resume), leaving
the codec in a non-functional state.
Explicitly call senary_auto_turn_eapd in senary_init to ensure the EAPD
is enabled and the codec is functional.
Matthew Wilcox [Fri, 20 Feb 2026 08:49:59 +0000 (14:19 +0530)]
tee: shm: Remove refcounting of kernel pages
Earlier, the TEE subsystem assumed that all memory pages shared with the
TEE implementation had to be refcounted. However, slab allocations within
the kernel do not allow refcounting kernel pages.
It is better to trust kernel clients not to free pages while they are
shared with the TEE implementation. Hence, remove refcounting of kernel
pages from the register_shm_helper() API.
Fixes: b9c0e49abfca ("mm: decline to manipulate the refcount on a slab page")
Reported-by: Marco Felsch <m.felsch@pengutronix.de>
Reported-by: Sven Püschel <s.pueschel@pengutronix.de>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Co-developed-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
Signed-off-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
Tested-by: Sven Püschel <s.pueschel@pengutronix.de>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Zhao Mengmeng [Tue, 3 Mar 2026 07:23:17 +0000 (15:23 +0800)]
selftests/sched_ext: Fix peek_dsq.bpf.c compile error for clang 17
When compiling the sched_ext selftests with clang 17.0.6, the build fails
with a compiler crash:
Error at line 68: Unsupport signed division for DAG: 0x55b2f9a60240:
i64 = sdiv 0x55b2f9a609b0, Constant:i64<100>, peek_dsq.bpf.c:68:25 @[
peek_dsq.bpf.c:95:4 @[ peek_dsq.bpf.c:169:8 @[
peek_dsq.bpf.c:140:6 ] ] ]Please convert to unsigned div/mod
After some digging, this turns out not to be a compiler bug: clang supports
signed division only with -mcpu=v4, while we currently build with -mcpu=v3.
The better fix is to use unsigned division; see [1] for details.
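
As a rough illustration of the kind of change involved, the sketch below keeps a percentage computation entirely in unsigned arithmetic. It is not the code from peek_dsq.bpf.c: demo_scale_percent() is a hypothetical helper, and it assumes the operands are known to be non-negative, which is the precondition for switching to unsigned types. A host compiler accepts either form; the difference only matters when targeting BPF with -mcpu=v3.

```c
#include <stdint.h>

/*
 * Scale a non-negative value by a percentage.
 *
 * Using uint64_t for every operand makes the compiler emit an unsigned
 * division. Had the operands been int64_t, the same expression would
 * have required a signed division instead.
 */
uint64_t demo_scale_percent(uint64_t value, uint64_t percent)
{
	return value * percent / 100;
}
```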
Zhao Mengmeng [Tue, 3 Mar 2026 07:23:16 +0000 (15:23 +0800)]
selftests/sched_ext: Add -fms-extensions to bpf build flags
Similar to commit 835a50753579 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f480 ("bpftool: Fix build warnings
due to MS extensions")
Fix "declaration does not declare anything" warning by using
-fms-extensions and -Wno-microsoft-anon-tag flags to build bpf programs
that #include "vmlinux.h"
Zhao Mengmeng [Tue, 3 Mar 2026 07:23:15 +0000 (15:23 +0800)]
tools/sched_ext: Add -fms-extensions to bpf build flags
Similar to commit 835a50753579 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f480 ("bpftool: Fix build warnings
due to MS extensions")
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct aes_key {
        struct aes_enckey;
        union aes_invkey_arch inv_k;
};
This raises warnings like the one below when building the scx schedulers:
tools/sched_ext/build/include/vmlinux.h:50533:3: warning:
declaration does not declare anything [-Wmissing-declarations]
50533 | struct ns_tree;
| ^
Fix it by using -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"
The KCSAN documentation requires that if one accessor uses WRITE_ONCE()
to annotate lock-free access, all other accesses must also use the
appropriate accessor. Plain reads alongside WRITE_ONCE() leave the pair
incomplete and can trigger KCSAN warnings.
Note that scx_tick() already uses the correct READ_ONCE() annotation:
last_check + READ_ONCE(scx_watchdog_timeout)
Fix the three remaining plain reads to match, making all accesses to
scx_watchdog_timeout consistently annotated and KCSAN-clean.
Signed-off-by: zhidao su <suzhidao@xiaomi.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Frank Li [Mon, 2 Mar 2026 21:59:55 +0000 (16:59 -0500)]
dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning
Change additionalProperties to unevaluatedProperties because the binding
references /schemas/input/matrix-keymap.yaml.
Fix the CHECK_DTBS warnings below:
arch/arm/boot/dts/nxp/imx/imx6dl-victgo.dtb: keypad@70 (holtek,ht16k33): 'keypad,num-columns', 'keypad,num-rows' do not match any of the regexes: '^pinctrl-[0-9]+$'
from schema $id: http://devicetree.org/schemas/auxdisplay/holtek,ht16k33.yaml#
Fixes: f12b457c6b25c ("dt-bindings: auxdisplay: ht16k33: Convert to json-schema") Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
include/uapi/linux/dma-buf.h uses several macros from ioctl.h to define
its ioctl commands. However, it does not include ioctl.h itself. So,
if userspace source code tries to include the dma-buf.h file without
including ioctl.h, it can result in build failures.
Therefore, include ioctl.h in the dma-buf UAPI header.
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Reviewed-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260303002309.1401849-1-isaacmanjarres@google.com
wangdicheng [Tue, 3 Mar 2026 05:42:42 +0000 (13:42 +0800)]
ALSA: hda/senary: Use codec->core.afg for GPIO access
Replace the hardcoded GPIO node ID (0x01) with codec->core.afg.
This follows the standard HDA driver practice and makes the driver
more robust against different hardware configurations.
Eric Biggers [Sat, 21 Feb 2026 20:45:25 +0000 (12:45 -0800)]
fsverity: add dependency on 64K or smaller pages
Currently, all filesystems that support fsverity (ext4, f2fs, and btrfs)
cache the Merkle tree in the pagecache at a 64K aligned offset after the
end of the file data. This offset needs to be a multiple of the page
size, which is guaranteed only when the page size is 64K or smaller.
64K was chosen to be the "largest reasonable page size". But it isn't
the largest *possible* page size: the hexagon and powerpc ports of Linux
support 256K pages, though that configuration is rarely used.
For now, just disable support for FS_VERITY in these odd configurations
to ensure it isn't used in cases where it would have incorrect behavior.
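The alignment argument above can be checked with a little arithmetic. The
following is only a sketch under a stated assumption: the Merkle tree offset is
modeled as the file size rounded up to the next 64K boundary, and the helper
names (round_up, merkle_tree_offset, offset_is_page_aligned) and the sample file
size are illustrative, not taken from the kernel sources.

```python
SZ_64K = 64 * 1024


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def merkle_tree_offset(file_size: int) -> int:
    # Assumed model: the tree starts at the first 64K boundary at or after
    # the end of the file data.
    return round_up(file_size, SZ_64K)


def offset_is_page_aligned(file_size: int, page_size: int) -> bool:
    # The pagecache needs the offset to be a multiple of the page size.
    return merkle_tree_offset(file_size) % page_size == 0


file_size = 100_000  # arbitrary example size in bytes
results = {
    page_size: offset_is_page_aligned(file_size, page_size)
    for page_size in (4096, 16384, 65536, 262144)
}
print(results)
```

Any page size that divides 64K keeps the offset page-aligned; with 256K pages
the same offset can land in the middle of a page, which is the case the Kconfig
dependency rules out.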
Fixes: 671e67b47e9f ("fs-verity: add Kconfig and the helper functions for hashing") Reported-by: Christoph Hellwig <hch@lst.de> Closes: https://lore.kernel.org/r/20260119063349.GA643@lst.de Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20260221204525.30426-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Commit 1cc93c48b5d7 ("selftests/net: packetdrill: remove tests for
tcp_rcv_*big") removed the test for the reverted commit 1d2fbaad7cd8
("tcp: stronger sk_rcvbuf checks") but also the one for commit 9ca48d616ed7 ("tcp: do not accept packets beyond window").
Restore the test with the necessary adaptation: expect a delayed ACK
instead of an immediate one, since tcp_can_ingest() does not fail
anymore for the last data packet.
net: dsa: realtek: rtl8365mb: fix rtl8365mb_phy_ocp_write return value
Function rtl8365mb_phy_ocp_write() always returns 0, even when an error
occurs during register access. This patch fixes the return value to
propagate the actual error code from regmap operations.
F2fs and RoCEv2 stopped using this CRC32 implementation in commits 3ca4bec40ee211cd ("f2fs: switch to using the crc32 library") and ccca5e8aa1457231 ("RDMA/rxe: switch to using the crc32 library").
Ext4, jbd2, iSCSI, NVMeoF/TCP, and Btrfs stopped using this CRC32c
implementation in commits f2b4fa19647e18a2 ("ext4: switch to using the
crc32c library"), dd348f054b24a3f5 ("jbd2: switch to using the crc32c
library"), 92186c1455a2d356 ("scsi: iscsi_tcp: Switch to using the
crc32c library"), 427fff9aff295e2c ("nvme-tcp: use crc32c() and
skb_copy_and_crc32c_datagram_iter()"), and fe11ac191ce0ad91 ("btrfs:
switch to library APIs for checksums").
NFS, Ceph, SMB, and Btrfs stopped using this SHA-256 implementation in
commits c2c90a8b2620626c ("nfsd: use SHA-256 library API instead of
crypto_shash API"), 27c0a7b05d13a0dc ("libceph: Use HMAC-SHA256 library
instead of crypto_shash"), 924067ef183bd17f ("ksmbd: Use HMAC-SHA256
library for message signing and key generation"), and fe11ac191ce0ad91
("btrfs: switch to library APIs for checksums").
Dillon Varone [Wed, 18 Feb 2026 19:34:28 +0000 (14:34 -0500)]
drm/amd/display: Fallback to boot snapshot for dispclk
[WHY & HOW]
If the dentist is unavailable, fall back to reading CLKIP via the boot
snapshot to get the current dispclk.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2ab77600d1e55a042c02437326d3c7563e853c6c) Cc: stable@vger.kernel.org
Alex Hung [Fri, 27 Feb 2026 19:30:38 +0000 (12:30 -0700)]
drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT
[WHAT]
Create DEGAMMA properties even if color pipeline is enabled, and enforce
the mutual exclusion in atomic check by rejecting any commit that
attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on
the CRTC simultaneously.
Fixes: 18a4127e9315 ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963 Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 196a6aa727f1f15eb54dda5e60a41543ea9397ee)
kbuild: Leave objtool binary around with 'make clean'
The difference between 'make clean' and 'make mrproper' is documented in
'make help' as:
clean - Remove most generated files but keep the config and
enough build support to build external modules
mrproper - Remove all generated files + config + various backup files
After commit 68b4fe32d737 ("kbuild: Add objtool to top-level clean
target"), running 'make clean' then attempting to build an external
module with the resulting build directory fails with
$ make ARCH=x86_64 O=build clean
$ make -C build M=... MO=...
...
/bin/sh: line 1: .../build/tools/objtool/objtool: No such file or directory
as 'make clean' removes the objtool binary.
Split the objtool clean target into mrproper and clean like Kbuild does
and remove all generated artifacts with 'make clean' except for the
objtool binary, which is removed with 'make mrproper'. To avoid a small
race when running the objtool clean target through both objtool_mrproper
and objtool_clean when running 'make mrproper', modify objtool's clean
up find command to avoid using find's '-delete' command by piping the
files into 'xargs rm -f' like the rest of Kbuild does.
Cc: stable@vger.kernel.org Fixes: 68b4fe32d737 ("kbuild: Add objtool to top-level clean target") Reported-by: Michal Suchanek <msuchanek@suse.de> Closes: https://lore.kernel.org/20260225112633.6123-1-msuchanek@suse.de/ Reported-by: Rainer Fiebig <jrf@mailbox.org> Closes: https://lore.kernel.org/62d12399-76e5-3d40-126a-7490b4795b17@mailbox.org/ Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nicolas Schier <nsc@kernel.org> Tested-by: Nicolas Schier <nsc@kernel.org> Link: https://patch.msgid.link/20260227-avoid-objtool-binary-removal-clean-v1-1-122f3e55eae9@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Restore the alignment of sampled_vals to 16 bytes by using
IIO_DECLARE_QUATERNION(). This field contains a quaternion value which
has scan_type.repeat = 4 and storagebits = 32. So the alignment must
be 16 bytes to match the assumptions of iio_storage_bytes_for_si() and
also to not break userspace.
David Lechner [Sat, 28 Feb 2026 20:02:22 +0000 (14:02 -0600)]
iio: add IIO_DECLARE_QUATERNION() macro
Add a new IIO_DECLARE_QUATERNION() macro that is used to declare the
field in an IIO buffer struct that contains a quaternion vector.
Quaternions are currently the only IIO data type that uses the .repeat
feature of struct iio_scan_type. This has an implicit rule that the
element in the buffer must be aligned to the entire size of the repeated
element. This macro will make that requirement explicit. Since this is
the only user, we just call the macro IIO_DECLARE_QUATERNION() instead
of something more generic.
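The implicit alignment rule can be illustrated with a small calculation. This
is a sketch only: the helper names and the example buffer layout (one 32-bit
channel followed by the quaternion) are hypothetical and are not taken from the
IIO core or from any driver.

```python
def element_bytes(storagebits: int, repeat: int) -> int:
    """Size in bytes of one repeated scan element."""
    return storagebits // 8 * repeat


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# A quaternion has repeat = 4 and storagebits = 32.
quaternion_size = element_bytes(storagebits=32, repeat=4)

# The element must be aligned to the size of the whole repeated element,
# not to the size of a single 32-bit component.
offset_after_first_channel = 4  # hypothetical: one 32-bit channel precedes it
quaternion_offset = align_up(offset_after_first_channel, quaternion_size)

print(quaternion_size, quaternion_offset)
```

Under these assumptions the quaternion occupies 16 bytes and starts at offset
16 rather than offset 4, which is the requirement the macro makes explicit.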
Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Felix Gu [Mon, 2 Mar 2026 16:00:04 +0000 (00:00 +0800)]
iio: adc: ti-ads1119: Replace IRQF_ONESHOT with IRQF_NO_THREAD
As there is no threaded handler, replace devm_request_threaded_irq()
with devm_request_irq(). As the handler calls iio_trigger_poll(),
which may not be called from a threaded handler, also replace
IRQF_ONESHOT with IRQF_NO_THREAD.
Since commit aef30c8d569c ("genirq: Warn about using IRQF_ONESHOT
without a threaded handler"), the IRQ core checks IRQF_ONESHOT flag
in IRQ request and gives a warning if there is no threaded handler.
Fixes: a9306887eba4 ("iio: adc: ti-ads1119: Add driver") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Documentation: KVM: Formalizing taking vcpu->mutex *outside* of kvm->slots_lock
Explicitly document the ordering of vcpu->mutex being taken *outside* of
kvm->slots_lock. While somewhat unintuitive since vCPUs conceptually have
narrower scope than VMs, the scope of the owning object (vCPU versus VM)
doesn't automatically carry over to the lock. In this case, vcpu->mutex
has far broader scope than kvm->slots_lock. As Paolo put it, it's a
"don't worry about multiple ioctls at the same time" mutex that's intended
to be taken at the outer edges of KVM.
More importantly, arm64 and x86 have gained flows that take kvm->slots_lock
inside of vcpu->mutex. x86's kvm_inhibit_apic_access_page() is particularly
nasty, as slots_lock is taken quite deep within KVM_RUN, i.e. simply
swapping the ordering isn't an option.
Commit to the vcpu->mutex => kvm->slots_lock ordering, as vcpu->mutex
really is intended to be a "top-level" lock, whereas kvm->slots_lock is
"just" a helper lock.
Opportunistically document that vcpu->mutex is also taken outside of
slots_arch_lock, e.g. when allocating shadow roots on x86 (which is the
entire reason slots_arch_lock exists, as shadow roots must be allocated
while holding kvm->srcu)
Jiri Olsa [Mon, 2 Mar 2026 08:16:22 +0000 (09:16 +0100)]
ftrace: Add missing ftrace_lock to update_ftrace_direct_add/del
Ihor and Kumar reported a splat from ftrace_get_addr_curr [1], which happened
because the update_ftrace_direct_add/del functions were missing ftrace_lock,
allowing concurrent access to ftrace internals.
The ftrace_update_ops function must be guarded by ftrace_lock, so add it.
Lizhi Hou [Thu, 26 Feb 2026 21:38:57 +0000 (13:38 -0800)]
accel/amdxdna: Fix NULL pointer dereference of mgmt_chann
mgmt_chann may be set to NULL if the firmware returns an unexpected
error in aie2_send_mgmt_msg_wait(). This can later lead to a NULL
pointer dereference in aie2_hw_stop().
Fix this by introducing a dedicated helper to destroy mgmt_chann
and by adding proper NULL checks before accessing it.
zhidao su [Mon, 2 Mar 2026 09:14:40 +0000 (17:14 +0800)]
sched_ext: Replace naked scx_root dereferences in kobject callbacks
scx_attr_ops_show() and scx_uevent() access scx_root->ops.name directly.
This is problematic for two reasons:
1. The file-level comment explicitly identifies naked scx_root
dereferences as a temporary measure that needs to be replaced
with proper per-instance access.
2. scx_attr_events_show(), the neighboring sysfs show function in
the same group, already uses the correct pattern:
so container_of(kobj, struct scx_sched, kobj) correctly retrieves
the owning scx_sched instance in both callbacks.
Replace the naked scx_root dereferences with container_of()-based
access, consistent with scx_attr_events_show() and in preparation
for proper multi-instance scx_sched support.
Signed-off-by: zhidao su <suzhidao@xiaomi.com> Signed-off-by: Tejun Heo <tj@kernel.org>
zhidao su [Mon, 2 Mar 2026 09:14:39 +0000 (17:14 +0800)]
sched_ext: Use READ_ONCE() for the read side of dsq->nr update
scx_bpf_dsq_nr_queued() reads dsq->nr via READ_ONCE() without holding
any lock, making dsq->nr a lock-free concurrently accessed variable.
However, dsq_mod_nr(), the sole writer of dsq->nr, only uses
WRITE_ONCE() on the write side without the matching READ_ONCE() on the
read side:
The KCSAN documentation requires that if one accessor uses READ_ONCE()
or WRITE_ONCE() on a variable to annotate lock-free access, all other
accesses must also use the appropriate accessor. A plain read on the
right-hand side of WRITE_ONCE() leaves the pair incomplete and will
trigger KCSAN warnings.
Fix by using READ_ONCE() for the read side of the update:
WRITE_ONCE(dsq->nr, READ_ONCE(dsq->nr) + delta);
This is consistent with scx_bpf_dsq_nr_queued() and makes the
concurrent access annotation complete and KCSAN-clean.
Signed-off-by: zhidao su <suzhidao@xiaomi.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Arnd Bergmann [Mon, 2 Feb 2026 09:48:53 +0000 (10:48 +0100)]
kunit: reduce stack usage in kunit_run_tests()
Some of the recent changes to the kunit framework caused the stack usage
for kunit_run_tests() to grow higher than most other kernel functions,
which triggers a warning when CONFIG_FRAME_WARN is set to a relatively
low value:
lib/kunit/test.c: In function 'kunit_run_tests':
lib/kunit/test.c:801:1: error: the frame size of 1312 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Split out the inner loop into a separate function to ensure that each
function remains under the limit, and pass the kunit_result_stats
structures by reference to avoid excessive copies.
Fixed checkpatch warnings at commit time:
Shuah Khan <skhan@linuxfoundation.org>
Cc: Carlos Llamas <cmllamas@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
- Fix credential reference leaks in the NFSD netlink admin protocol
* tag 'nfsd-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: report the requested maximum number of threads instead of number running
nfsd: Fix cred ref leak in nfsd_nl_listener_set_doit().
nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit().
Shuvam Pandey [Thu, 26 Feb 2026 15:29:10 +0000 (21:14 +0545)]
kunit: tool: copy caller args in run_kernel to prevent mutation
run_kernel() appended KUnit flags directly to the caller-provided args
list. When exec_tests() calls run_kernel() repeatedly (e.g. with
--run_isolated), each call mutated the same list, causing later runs
to inherit stale filter_glob values and duplicate kunit.enable flags.
Fix this by copying args at the start of run_kernel(). Add a regression
test that calls run_kernel() twice with the same list and verifies the
original remains unchanged.
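The mutation problem can be reproduced in isolation. The functions below are
simplified, hypothetical stand-ins for run_kernel(), not the actual kunit tool
code; the flag strings and the "mem=1G" argument are used only for
illustration.

```python
from typing import List, Optional


def run_kernel_buggy(args: Optional[List[str]] = None,
                     filter_glob: str = "") -> List[str]:
    """Appends directly to the caller's list (the behavior being fixed)."""
    if args is None:
        args = []
    if filter_glob:
        args.append("kunit.filter_glob=" + filter_glob)
    args.append("kunit.enable=1")
    return args


def run_kernel_fixed(args: Optional[List[str]] = None,
                     filter_glob: str = "") -> List[str]:
    """Copies the caller's list first, so repeated calls stay independent."""
    args = list(args) if args else []
    if filter_glob:
        args.append("kunit.filter_glob=" + filter_glob)
    args.append("kunit.enable=1")
    return args


# Buggy variant: the second run inherits flags from the first run.
caller_args = ["mem=1G"]
run_kernel_buggy(caller_args, "suite_a")
second = run_kernel_buggy(caller_args, "suite_b")

# Fixed variant: the caller's list is left untouched between runs.
clean_args = ["mem=1G"]
run_kernel_fixed(clean_args, "suite_a")
fixed_second = run_kernel_fixed(clean_args, "suite_b")

print(second)
print(fixed_second)
```

In the buggy variant the second call sees the stale suite_a glob and a
duplicate enable flag, while the fixed variant builds each argument list from
an unmodified copy of the caller's input.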
Fixes: ff9e09a3762f ("kunit: tool: support running each suite/test separately") Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com> Reviewed-by: David Gow <david@davidgow.net> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
If `CONFIG_PRINTK` is not set, then the following warnings are issued
during build:
warning: unused variable: `args`
--> ../rust/kernel/kunit.rs:16:12
|
16 | pub fn err(args: fmt::Arguments<'_>) {
| ^^^^ help: if this is intentional, prefix it with an underscore: `_args`
|
= note: `#[warn(unused_variables)]` (part of `#[warn(unused)]`) on by default
warning: unused variable: `args`
--> ../rust/kernel/kunit.rs:32:13
|
32 | pub fn info(args: fmt::Arguments<'_>) {
| ^^^^ help: if this is intentional, prefix it with an underscore: `_args`
Fix this by adding a no-op assignment using `args` when `CONFIG_PRINTK`
is not set.
Fixes: a66d733da801 ("rust: support running Rust documentation tests as KUnit ones") Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: David Gow <david@davidgow.net> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Thomas Hellström [Tue, 10 Feb 2026 11:56:53 +0000 (12:56 +0100)]
mm: Fix a hmm_range_fault() livelock / starvation problem
If hmm_range_fault() fails a folio_trylock() in do_swap_page(),
trying to acquire the lock of a device-private folio for migration
to RAM, the function will spin until it succeeds in grabbing the lock.
However, if the process holding the lock is depending on a work
item to be completed, which is scheduled on the same CPU as the
spinning hmm_range_fault(), that work item might be starved and
we end up in a livelock / starvation situation which is never
resolved.
This can happen, for example, if the process holding the
device-private folio lock is stuck in
migrate_device_unmap()->lru_add_drain_all(),
since lru_add_drain_all() requires a short work item
to be run on all online CPUs to complete.
The prerequisites for this to happen are:
a) Both zone device and system memory folios are considered in
migrate_device_unmap(), so that there is a reason to call
lru_add_drain_all() for a system memory folio while a
folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
at least one migration PTE entry insertion to be deferred to
try_to_migrate(), which can happen after the call to
lru_add_drain_all().
c) No preemption or voluntary-only preemption.
This all seems pretty unlikely to happen, but indeed is hit by
the "xe_exec_system_allocator" igt test.
Resolve this by waiting for the folio to be unlocked if the
folio_trylock() fails in do_swap_page().
Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update its documentation to
indicate the new use-case.
Future code improvements might consider moving
the lru_add_drain_all() call in migrate_device_unmap() to be
called *after* all pages have migration entries inserted.
That would also eliminate b) above.
v2:
- Instead of a cond_resched() in hmm_range_fault(),
eliminate the problem by waiting for the folio to be unlocked
in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
!CONFIG_MIGRATION case. (Kernel Test Robot)
v4:
- Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
v5:
- Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
version of softleaf_entry_wait_on_locked().
- Modify wording around function names in the commit message
(Andrew Morton)
Suggested-by: Alistair Popple <apopple@nvidia.com> Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page") Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: linux-mm@kvack.org Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3 Reviewed-by: Alistair Popple <apopple@nvidia.com> Link: https://patch.msgid.link/20260210115653.92413-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit a69d1ab971a624c6f112cea61536569d579c3215) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Nilay Shroff [Sun, 1 Mar 2026 12:59:43 +0000 (18:29 +0530)]
block: break pcpu_alloc_mutex dependency on freeze_lock
While the nr_hw_queue update allocates tagset tags, it acquires
->pcpu_alloc_mutex after ->freeze_lock is acquired or the queue is frozen.
This potentially creates a circular dependency involving ->fs_reclaim if
reclaim is triggered simultaneously in a code path which first acquires
->pcpu_alloc_mutex. As the queue is already frozen while the nr_hw_queue
update allocates tagsets, reclaim can't make forward progress, which could
cause a deadlock as reported in the lockdep splat [1].
Fix this by pre-allocating tagset tags before we freeze queue during
nr_hw_queue update. Later the allocated tagset tags could be safely
installed and used after queue is frozen.
Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs8F=OV9s3La2kEQ34YndgfZP-B5PHS4Z8_b9euKG6J4mw@mail.gmail.com/ [1] Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com>
[axboe: fix brace style issue] Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 2 Mar 2026 14:32:04 +0000 (14:32 +0000)]
io_uring/net: reject SEND_VECTORIZED when unsupported
IORING_SEND_VECTORIZED with registered buffers is not implemented but
could be. Don't silently ignore the flag in this case but reject it with
an error. It only affects sendzc as normal sends don't support
registered buffers.
Fixes: 6f02527729bd3 ("io_uring/net: Allow to do vectorized send") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
blktrace: fix __this_cpu_read/write in preemptible context
tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_cmdline_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are only safe
when preemption is off, as they were designed to be called from the
scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).
__blk_add_trace() was calling tracing_record_cmdline(current) early in
the blk_tracer path, before ring buffer reservation, from process
context where preemption is fully enabled. This triggers the following
using blktests/blktrace/002:
blktrace/002 (blktrace ftrace corruption with sysfs trace) [failed]
runtime 0.367s ... 0.437s
something found in dmesg:
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
...
(See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)
The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.
The purpose of tracing_record_cmdline() is to cache the task->comm for
a given PID so that the trace can later resolve it. It is only
meaningful when a trace event is actually being recorded. Ring buffer
reservation via ring_buffer_lock_reserve() disables preemption, and
preemption remains disabled until the event is committed :-
Tomasz Lis [Thu, 26 Feb 2026 21:26:58 +0000 (22:26 +0100)]
drm/xe/queue: Call fini on exec queue creation fail
Every call to queue init should have a corresponding fini call.
Skipping this would mean skipping removal of the queue from GuC list
(which is part of guc_id allocation). A damaged queue stored in
exec_queue_lookup list would lead to invalid memory reference,
sooner or later.
Call fini to free guc_id. This must be done before any internal
LRCs are freed.
Since the finalization with this extra call became very similar to
__xe_exec_queue_fini(), reuse that. To make this reuse possible,
alter xe_lrc_put() so it can survive NULL parameters, like other
similar functions.
v2: Reuse __xe_exec_queue_fini(). Make xe_lrc_put() aware of NULLs.
Matthew Brost [Thu, 15 Jan 2026 00:45:46 +0000 (16:45 -0800)]
drm/xe: Do not preempt fence signaling CS instructions
If a batch buffer is complete, it makes little sense to preempt the
fence signaling instructions in the ring, as the largest portion of the
work (the batch buffer) is already done and fence signaling consists of
only a few instructions. If these instructions are preempted, the GuC
would need to perform a context switch just to signal the fence, which
is costly and delays fence signaling. Avoid this scenario by disabling
preemption immediately after the BB start instruction and re-enabling it
after executing the fence signaling instructions.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Carlos Santa <carlos.santa@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260115004546.58060-1-matthew.brost@intel.com
(cherry picked from commit 2bcbf2dcde0c839a73af664a3c77d4e77d58a3eb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/drm.h> // uapi DRM header; the install path may differ

int main(void) {
    int fd = open("/dev/dri/renderD128", O_RDWR);
    struct drm_syncobj_create arg1 = { 0 }; // zeroed so the create flags are valid
    ioctl(fd, DRM_IOCTL_SYNCOBJ_CREATE, &arg1);
    struct drm_syncobj_handle arg2;
    memset(&arg2, 1, sizeof(arg2)); // simulate dirty stack
    arg2.handle = arg1.handle;
    arg2.flags = 0;
    arg2.fd = 0;
    arg2.pad = 0;
    // arg2.point = 0; // userspace is required to set point to 0
    ioctl(fd, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, &arg2);
    return 0;
}
The last ioctl returns EINVAL because args->point is not 0. However,
userspace developed against older kernel versions is not aware of the
new point field and might therefore not initialize it.
The correct check would be
if (args->flags & DRM_SYNCOBJ_FD_TO_HANDLE_FLAGS_TIMELINE)
return -EINVAL;
However, there might already be userspace that relies on this not
returning an error as long as point == 0. Therefore use the more lenient
check.
Fixes: c2d3a7300695 ("drm/syncobj: Extend EXPORT_SYNC_FILE for timeline syncobjs") Signed-off-by: Julian Orth <ju.orth@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260301-point-v1-1-21fc5fd98614@gmail.com
Michael Walle [Mon, 2 Mar 2026 12:24:52 +0000 (13:24 +0100)]
dt-bindings: hwmon: sl28cpld: Drop sa67mcu compatible
I was just informed that this product is discontinued (without ever
being released to the market). Pull the plug, let's not waste any more
maintainers' time, and revert commit 0f6eae86e626 ("dt-bindings: hwmon:
sl28cpld: add sa67mcu compatible").
Franz Schnyder [Wed, 18 Feb 2026 10:25:14 +0000 (11:25 +0100)]
regulator: pf9453: Respect IRQ trigger settings from firmware
The datasheet specifies that the IRQ_B pin is pulled low when any
unmasked interrupt bit status is changed, and it is released high once
the application processor reads the INT1 register. As it specifies a
level-low behavior, it should not force a falling-edge interrupt.
Remove the IRQF_TRIGGER_FALLING to not force the falling-edge interrupt
and instead rely on the flag from the device tree.
Sheetal [Mon, 2 Mar 2026 08:53:22 +0000 (14:23 +0530)]
ASoC: dt-bindings: tegra: Add compatible for Tegra238 sound card
Tegra238 requires different PLLA and PLLA_OUT0 clock rates compared to
other Tegra platforms. Add Tegra238 compatible string to the APE
tegra-audio-graph-card bindings.
Frank Li [Thu, 12 Feb 2026 16:30:00 +0000 (11:30 -0500)]
dt-bindings: net: can: nxp,sja1000: add reference to mc-peripheral-props.yaml
Add a reference to mc-peripheral-props.yaml to allow vendor-specific
properties for memory access timings.
Fix the CHECK_DTBS warnings below:
arch/arm/boot/dts/nxp/imx/imx27-phytec-phycore-rdk.dtb: can@4,0 (nxp,sja1000): Unevaluated properties are not allowed ('fsl,weim-cs-timing' was unexpected)
from schema $id: http://devicetree.org/schemas/net/can/nxp,sja1000.yaml
can: gs_usb: gs_can_open(): always configure bitrates before starting device
So far the driver populated the struct can_priv::do_set_bittiming() and
struct can_priv::fd::do_set_data_bittiming() callbacks.
Before bringing up the interface, user space has to configure the bitrates.
With these callbacks the configuration is directly forwarded into the CAN
hardware. Then the interface can be brought up.
An ifdown-ifup cycle (without changing the bit rates) doesn't re-configure
the bitrates in the CAN hardware. This leads to a problem with the
CANable-2.5 [1] firmware, which resets the configured bit rates during
ifdown.
To fix the problem remove both bit timing callbacks and always configure
the bitrates in the struct net_device_ops::ndo_open() callback.
Kim Phillips [Tue, 3 Feb 2026 22:24:03 +0000 (16:24 -0600)]
x86/sev: Allow IBPB-on-Entry feature for SNP guests
The SEV-SNP IBPB-on-Entry feature does not require a guest-side
implementation. It was added in Zen5 h/w, after the first SNP Zen
implementation, and thus was not accounted for when the initial set of SNP
features was added to the kernel.
Tom Lendacky [Wed, 4 Feb 2026 15:01:00 +0000 (09:01 -0600)]
x86/boot/sev: Move SEV decompressor variables into the .data section
As part of the work to remove the dependency on calling into the decompressor
code (startup_64()) for a UEFI boot, a call to rmpadjust() was removed from
sev_enable() in favor of checking the value of the snp_vmpl variable.
When booting through a non-UEFI path and calling startup_64(), the call to
sev_enable() is performed before the BSS section is zeroed. With the removal
of the rmpadjust() call and the corresponding check of the return code, the
snp_vmpl variable is checked.
Since the kernel is running at VMPL0, the snp_vmpl variable will not have been
set and should be the default value of 0. However, since the call occurs
before the BSS is zeroed, the snp_vmpl variable may not actually be zero,
which will cause the guest boot to fail.
Since the decompressor relocates itself, the BSS would need to be cleared both
before and after the relocation, but this would, in effect, cause all of the
changes to BSS variables before relocation to be lost after relocation.
Instead, move the snp_vmpl variable into the .data section so that it is
initialized and the value made safe during relocation. As a precaution
against future changes, move other SEV-related decompressor variables into the
.data section, too.
Fixes: 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Tested-by: Kevin Hui <kevinhui@meta.com> Tested-by: Changyuan Lyu <changyuanl@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/5648b7de5b0a5d0dfef3785f9582b718678c6448.1770217260.git.thomas.lendacky@amd.com
can: usb: f81604: correctly anchor the urb in the read bulk callback
When submitting an urb that uses the anchor pattern, it needs to be
anchored before it is submitted; otherwise it could be leaked if
usb_kill_anchored_urbs() is called. This is done correctly
elsewhere in the driver, except in the read bulk callback, so do that
here as well.
Cc: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Vincent Mailhol <mailhol@kernel.org> Cc: stable@kernel.org Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026022334-starlight-scaling-2cea@gregkh Fixes: 88da17436973 ("can: usb: f81604: add Fintek F81604 support") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If a write urb fails, more needs to be done than just logging
the message; otherwise the transmission could be stalled. Properly
increment the error counters and wake up the queues so that data will
continue to flow.
Cc: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Vincent Mailhol <mailhol@kernel.org> Cc: stable@kernel.org Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026022334-slackness-dynamic-9195@gregkh Fixes: 88da17436973 ("can: usb: f81604: add Fintek F81604 support") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: usb: etas_es58x: correctly anchor the urb in the read bulk callback
When submitting an urb that uses the anchor pattern, it needs to be
anchored before it is submitted; otherwise it could be leaked if
usb_kill_anchored_urbs() is called. This is done correctly
elsewhere in the driver, except in the read bulk callback, so do that
here as well.
Cc: Vincent Mailhol <mailhol@kernel.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: stable@kernel.org Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Vincent Mailhol <mailhol@kernel.org> Tested-by: Vincent Mailhol <mailhol@kernel.org> Link: https://patch.msgid.link/2026022320-poser-stiffly-9d84@gregkh Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>