]> git-server-git.apps.pok.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
2 months agoidpf: skip deallocating txq group's txqs if it is NULL
Li Li [Mon, 12 Jan 2026 23:09:44 +0000 (23:09 +0000)]
idpf: skip deallocating txq group's txqs if it is NULL

In idpf_txq_group_alloc(), if any txq group's txqs failed to
allocate memory:

for (j = 0; j < tx_qgrp->num_txq; j++) {
tx_qgrp->txqs[j] = kzalloc(sizeof(*tx_qgrp->txqs[j]),
   GFP_KERNEL);
if (!tx_qgrp->txqs[j])
goto err_alloc;
}

It would cause a NULL ptr kernel panic in idpf_txq_group_rel():

for (j = 0; j < txq_grp->num_txq; j++) {
if (flow_sch_en) {
kfree(txq_grp->txqs[j]->refillq);
txq_grp->txqs[j]->refillq = NULL;
}

kfree(txq_grp->txqs[j]);
txq_grp->txqs[j] = NULL;
}

[    6.532461] BUG: kernel NULL pointer dereference, address: 0000000000000058
...
[    6.534433] RIP: 0010:idpf_txq_group_rel+0xc9/0x110
...
[    6.538513] Call Trace:
[    6.538639]  <TASK>
[    6.538760]  idpf_vport_queues_alloc+0x75/0x550
[    6.538978]  idpf_vport_open+0x4d/0x3f0
[    6.539164]  idpf_open+0x71/0xb0
[    6.539324]  __dev_open+0x142/0x260
[    6.539506]  netif_open+0x2f/0xe0
[    6.539670]  dev_open+0x3d/0x70
[    6.539827]  bond_enslave+0x5ed/0xf50
[    6.540005]  ? rcutree_enqueue+0x1f/0xb0
[    6.540193]  ? call_rcu+0xde/0x2a0
[    6.540375]  ? barn_get_empty_sheaf+0x5c/0x80
[    6.540594]  ? __kfree_rcu_sheaf+0xb6/0x1a0
[    6.540793]  ? nla_put_ifalias+0x3d/0x90
[    6.540981]  ? kvfree_call_rcu+0xb5/0x3b0
[    6.541173]  ? kvfree_call_rcu+0xb5/0x3b0
[    6.541365]  do_set_master+0x114/0x160
[    6.541547]  do_setlink+0x412/0xfb0
[    6.541717]  ? security_sock_rcv_skb+0x2a/0x50
[    6.541931]  ? sk_filter_trim_cap+0x7c/0x320
[    6.542136]  ? skb_queue_tail+0x20/0x50
[    6.542322]  ? __nla_validate_parse+0x92/0xe50
ro[o t   t o6 .d5e4f2a5u4l0t]-  ? security_capable+0x35/0x60
[    6.542792]  rtnl_newlink+0x95c/0xa00
[    6.542972]  ? __rtnl_unlock+0x37/0x70
[    6.543152]  ? netdev_run_todo+0x63/0x530
[    6.543343]  ? allocate_slab+0x280/0x870
[    6.543531]  ? security_capable+0x35/0x60
[    6.543722]  rtnetlink_rcv_msg+0x2e6/0x340
[    6.543918]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[    6.544138]  netlink_rcv_skb+0x16a/0x1a0
[    6.544328]  netlink_unicast+0x20a/0x320
[    6.544516]  netlink_sendmsg+0x304/0x3b0
[    6.544748]  __sock_sendmsg+0x89/0xb0
[    6.544928]  ____sys_sendmsg+0x167/0x1c0
[    6.545116]  ? ____sys_recvmsg+0xed/0x150
[    6.545308]  ___sys_sendmsg+0xdd/0x120
[    6.545489]  ? ___sys_recvmsg+0x124/0x1e0
[    6.545680]  ? rcutree_enqueue+0x1f/0xb0
[    6.545867]  ? rcutree_enqueue+0x1f/0xb0
[    6.546055]  ? call_rcu+0xde/0x2a0
[    6.546222]  ? evict+0x286/0x2d0
[    6.546389]  ? rcutree_enqueue+0x1f/0xb0
[    6.546577]  ? kmem_cache_free+0x2c/0x350
[    6.546784]  __x64_sys_sendmsg+0x72/0xc0
[    6.546972]  do_syscall_64+0x6f/0x890
[    6.547150]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    6.547393] RIP: 0033:0x7fc1a3347bd0
...
[    6.551375] RIP: 0010:idpf_txq_group_rel+0xc9/0x110
...
[    6.578856] Rebooting in 10 seconds..

We should skip deallocating txqs[j] if it is NULL in the first place.

Tested: with this patch, the kernel panic no longer appears.

Fixes: 1c325aac10a8 ("idpf: configure resources for TX queues")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoidpf: skip deallocating bufq_sets from rx_qgrp if it is NULL
Li Li [Mon, 12 Jan 2026 23:09:43 +0000 (23:09 +0000)]
idpf: skip deallocating bufq_sets from rx_qgrp if it is NULL

In idpf_rxq_group_alloc(), if rx_qgrp->splitq.bufq_sets failed to get
allocated:

rx_qgrp->splitq.bufq_sets = kcalloc(vport->num_bufqs_per_qgrp,
    sizeof(struct idpf_bufq_set),
    GFP_KERNEL);
if (!rx_qgrp->splitq.bufq_sets) {
err = -ENOMEM;
goto err_alloc;
}

idpf_rxq_group_rel() would attempt to deallocate it in
idpf_rxq_sw_queue_rel(), causing a kernel panic:

```
[    7.967242] early-network-sshd-n-rexd[3148]: knetbase: Info: [    8.127804] BUG: kernel NULL pointer dereference, address: 00000000000000c0
...
[    8.129779] RIP: 0010:idpf_rxq_group_rel+0x101/0x170
...
[    8.133854] Call Trace:
[    8.133980]  <TASK>
[    8.134092]  idpf_vport_queues_alloc+0x286/0x500
[    8.134313]  idpf_vport_open+0x4d/0x3f0
[    8.134498]  idpf_open+0x71/0xb0
[    8.134668]  __dev_open+0x142/0x260
[    8.134840]  netif_open+0x2f/0xe0
[    8.135004]  dev_open+0x3d/0x70
[    8.135166]  bond_enslave+0x5ed/0xf50
[    8.135345]  ? nla_put_ifalias+0x3d/0x90
[    8.135533]  ? kvfree_call_rcu+0xb5/0x3b0
[    8.135725]  ? kvfree_call_rcu+0xb5/0x3b0
[    8.135916]  do_set_master+0x114/0x160
[    8.136098]  do_setlink+0x412/0xfb0
[    8.136269]  ? security_sock_rcv_skb+0x2a/0x50
[    8.136509]  ? sk_filter_trim_cap+0x7c/0x320
[    8.136714]  ? skb_queue_tail+0x20/0x50
[    8.136899]  ? __nla_validate_parse+0x92/0xe50
[    8.137112]  ? security_capable+0x35/0x60
[    8.137304]  rtnl_newlink+0x95c/0xa00
[    8.137483]  ? __rtnl_unlock+0x37/0x70
[    8.137664]  ? netdev_run_todo+0x63/0x530
[    8.137855]  ? allocate_slab+0x280/0x870
[    8.138044]  ? security_capable+0x35/0x60
[    8.138235]  rtnetlink_rcv_msg+0x2e6/0x340
[    8.138431]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[    8.138650]  netlink_rcv_skb+0x16a/0x1a0
[    8.138840]  netlink_unicast+0x20a/0x320
[    8.139028]  netlink_sendmsg+0x304/0x3b0
[    8.139217]  __sock_sendmsg+0x89/0xb0
[    8.139399]  ____sys_sendmsg+0x167/0x1c0
[    8.139588]  ? ____sys_recvmsg+0xed/0x150
[    8.139780]  ___sys_sendmsg+0xdd/0x120
[    8.139960]  ? ___sys_recvmsg+0x124/0x1e0
[    8.140152]  ? rcutree_enqueue+0x1f/0xb0
[    8.140341]  ? rcutree_enqueue+0x1f/0xb0
[    8.140528]  ? call_rcu+0xde/0x2a0
[    8.140695]  ? evict+0x286/0x2d0
[    8.140856]  ? rcutree_enqueue+0x1f/0xb0
[    8.141043]  ? kmem_cache_free+0x2c/0x350
[    8.141236]  __x64_sys_sendmsg+0x72/0xc0
[    8.141424]  do_syscall_64+0x6f/0x890
[    8.141603]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    8.141841] RIP: 0033:0x7f2799d21bd0
...
[    8.149905] Kernel panic - not syncing: Fatal exception
[    8.175940] Kernel Offset: 0xf800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    8.176425] Rebooting in 10 seconds..
```

Tested: With this patch, the kernel panic no longer appears.

Fixes: 95af467d9a4e ("idpf: configure resources for RX queues")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoidpf: increment completion queue next_to_clean in sw marker wait routine
Li Li [Mon, 5 Jan 2026 06:47:28 +0000 (06:47 +0000)]
idpf: increment completion queue next_to_clean in sw marker wait routine

Currently, in idpf_wait_for_sw_marker_completion(), when an
IDPF_TXD_COMPLT_SW_MARKER packet is found, the routine breaks out of
the for loop and does not increment the next_to_clean counter. This
causes the subsequent NAPI polls to run into the same
IDPF_TXD_COMPLT_SW_MARKER packet again and print out the following:

    [   23.261341] idpf 0000:05:00.0 eth1: Unknown TX completion type: 5

Instead, we should increment next_to_clean regardless when an
IDPF_TXD_COMPLT_SW_MARKER packet is found.

Tested: with the patch applied, we do not see the errors above from NAPI
polls anymore.

Fixes: 9d39447051a0 ("idpf: remove SW marker handling from NAPI")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agomshv: add arm64 support for doorbell & intercept SINTs
Anirudh Rayabharam (Microsoft) [Wed, 25 Feb 2026 12:44:03 +0000 (12:44 +0000)]
mshv: add arm64 support for doorbell & intercept SINTs

On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic
interrupts (SINTs) from the hypervisor for doorbells and intercepts.
There is no such vector reserved for arm64.

On arm64, the hypervisor exposes a synthetic register that can be read
to find the INTID that should be used for SINTs. This INTID is in the
PPI range.

To better unify the code paths, introduce mshv_sint_vector_init() that
either reads the synthetic register and obtains the INTID (arm64) or
just uses HYPERVISOR_CALLBACK_VECTOR as the interrupt vector (x86).

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 months agomshv: refactor synic init and cleanup
Anirudh Rayabharam (Microsoft) [Wed, 25 Feb 2026 12:44:02 +0000 (12:44 +0000)]
mshv: refactor synic init and cleanup

Rename mshv_synic_init() to mshv_synic_cpu_init() and
mshv_synic_cleanup() to mshv_synic_cpu_exit() to better reflect that
these functions handle per-cpu synic setup and teardown.

Use mshv_synic_init/cleanup() to perform init/cleanup that is not per-cpu.
Move all the synic related setup from mshv_parent_partition_init.

Move the reboot notifier to mshv_synic.c because it currently only
operates on the synic cpuhp state.

Move out synic_pages from the global mshv_root since its use is now
completely local to mshv_synic.c.

This is in preparation for adding more stuff to mshv_synic_init().

No functional change.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2 months agoMerge tag 'ata-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Linus Torvalds [Wed, 25 Feb 2026 18:41:14 +0000 (10:41 -0800)]
Merge tag 'ata-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Niklas Cassel:

 - The newly introduced feature that issues a deferred (non-NCQ) command
   from a workqueue, forgot to consider the case where the deferred QC
   times out. Fix the code to take timeouts into consideration, which
   avoids a use after free (Damien)

 - The newly introduced feature that issues a deferred (non-NCQ) command
   from a workqueue, when unloading the module, calls cancel_work_sync(),
   a function that can sleep, while holding a spin lock. Move the function
   call outside the lock (Damien)

* tag 'ata-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: fix cancellation of a port deferred qc work
  ata: libata-eh: correctly handle deferred qc timeouts

2 months agoMerge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Wed, 25 Feb 2026 18:34:23 +0000 (10:34 -0800)]
Merge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix an uninitialized variable in file_getattr().

   The flags_valid field wasn't initialized before calling
   vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse

 - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is
   not enabled.

   sysctl_hung_task_timeout_secs is 0 in that case causing spurious
   "waiting for writeback completion for more than 1 seconds" warnings

 - Fix a null-ptr-deref in do_statmount() when the mount is internal

 - Add missing kernel-doc description for the @private parameter in
   iomap_readahead()

 - Fix mount namespace creation to hold namespace_sem across the mount
   copy in create_new_namespace().

   The previous drop-and-reacquire pattern was fragile and failed to
   clean up mount propagation links if the real rootfs was a shared or
   dependent mount

 - Fix /proc mount iteration where m->index wasn't updated when
   m->show() overflows, causing a restart to repeatedly show the same
   mount entry in a rapidly expanding mount table

 - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the
   inode number is out of range

 - Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared.

   copy_mnt_ns() received the live fs_struct so if a subsequent
   namespace creation failed the rollback would leave pwd and root
   pointing to detached mounts. Always allocate a new fs_struct when
   CLONE_NEWNS is requested

 - fserror bug fixes:

    - Remove the unused fsnotify_sb_error() helper now that all callers
      have been converted to fserror_report_metadata

    - Fix a lockdep splat in fserror_report() where igrab() takes
      inode::i_lock which can be held in IRQ context.

      Replace igrab() with a direct i_count bump since filesystems
      should not report inodes that are about to be freed or not yet
      exposed

 - Handle error pointer in procfs for try_lookup_noperm()

 - Fix an integer overflow in ep_loop_check_proc() where recursive calls
   returning INT_MAX would overflow when +1 is added, breaking the
   recursion depth check

 - Fix a misleading break in pidfs

* tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: avoid misleading break
  eventpoll: Fix integer overflow in ep_loop_check_proc()
  proc: Fix pointer error dereference
  fserror: fix lockdep complaint when igrabbing inode
  fsnotify: drop unused helper
  unshare: fix unshare_fs() handling
  minix: Correct errno in minix_new_inode
  namespace: fix proc mount iteration
  mount: hold namespace_sem across copy in create_new_namespace()
  iomap: Describe @private in iomap_readahead()
  statmount: Fix the null-ptr-deref in do_statmount()
  writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK
  fs: init flags_valid before calling vfs_fileattr_get

2 months agousb: image: mdc800: kill download URB on timeout
Ziyi Guo [Mon, 9 Feb 2026 15:19:37 +0000 (15:19 +0000)]
usb: image: mdc800: kill download URB on timeout

mdc800_device_read() submits download_urb and waits for completion.
If the timeout fires and the device has not responded, the function
returns without killing the URB, leaving it active.

A subsequent read() resubmits the same URB while it is still
in-flight, triggering the WARN in usb_submit_urb():

  "URB submitted while active"

Check the return value of wait_event_timeout() and kill the URB if
it indicates timeout, ensuring the URB is complete before its status
is inspected or the URB is resubmitted.

Similar to
- commit 372c93131998 ("USB: yurex: fix control-URB timeout handling")
- commit b98d5000c505 ("media: rc: iguanair: handle timeouts")

Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/20260209151937.2247202-1-n7l8m4@u.northwestern.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 months agousb: mdc800: handle signal and read racing
Oliver Neukum [Mon, 9 Feb 2026 14:20:48 +0000 (15:20 +0100)]
usb: mdc800: handle signal and read racing

If a signal arrives after a read has partially completed,
we need to return the number of bytes read. -EINTR is correct
only if that number is zero.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/20260209142048.1503791-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 months agocgroup/cpuset: fix null-ptr-deref in rebuild_sched_domains_cpuslocked
Chen Ridong [Wed, 25 Feb 2026 01:15:23 +0000 (01:15 +0000)]
cgroup/cpuset: fix null-ptr-deref in rebuild_sched_domains_cpuslocked

A null-pointer-dereference bug was reported by syzbot:

Oops: general protection fault, probably for address 0xdffffc0000000000:
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:bitmap_subset include/linux/bitmap.h:433 [inline]
RIP: 0010:cpumask_subset include/linux/cpumask.h:836 [inline]
RIP: 0010:rebuild_sched_domains_locked kernel/cgroup/cpuset.c:967
RSP: 0018:ffffc90003ecfbc0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000020
RDX: ffff888028de0000 RSI: ffffffff8200f003 RDI: ffffffff8df14f28
RBP: 0000000000000000 R08: 0000000000000cc0 R09: 00000000ffffffff
R10: ffffffff8e7d95b3 R11: 0000000000000001 R12: 0000000000000000
R13: 00000000000f4240 R14: dffffc0000000000 R15: 0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f463fff CR3: 000000003704c000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 rebuild_sched_domains_cpuslocked kernel/cgroup/cpuset.c:983 [inline]
 rebuild_sched_domains+0x21/0x40 kernel/cgroup/cpuset.c:990
 sched_rt_handler+0xb5/0xe0 kernel/sched/rt.c:2911
 proc_sys_call_handler+0x47f/0x5a0 fs/proc/proc_sysctl.c:600
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x6ac/0x1070 fs/read_write.c:688
 ksys_write+0x12a/0x250 fs/read_write.c:740
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The issue occurs when generate_sched_domains() returns ndoms = 1 and
doms = NULL due to a kmalloc failure. This leads to a null-pointer
dereference when accessing doms in rebuild_sched_domains_locked().

Fix this by adding a NULL check for doms before accessing it.

Fixes: 6ee43047e8ad ("cpuset: Remove unnecessary checks in rebuild_sched_domains_locked")
Reported-by: syzbot+460792609a79c085f79f@syzkaller.appspotmail.com
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 months agoMAINTAINERS: Update contact with the kernel.org address
Daniel Lezcano [Tue, 24 Feb 2026 12:41:20 +0000 (13:41 +0100)]
MAINTAINERS: Update contact with the kernel.org address

Use the kernel.org address as a unified single entry to send patches
to. At the same time, update mailmap to group all past contributions.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2 months agomtd: rawnand: cadence: Fix error check for dma_alloc_coherent() in cadence_nand_init()
Chen Ni [Mon, 9 Feb 2026 07:56:18 +0000 (15:56 +0800)]
mtd: rawnand: cadence: Fix error check for dma_alloc_coherent() in cadence_nand_init()

Fix wrong variable used for error checking after dma_alloc_coherent()
call. The function checks cdns_ctrl->dma_cdma_desc instead of
cdns_ctrl->cdma_desc, which could lead to incorrect error handling.

Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: stable@vger.kernel.org
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2 months agomtd: Avoid boot crash in RedBoot partition table parser
Finn Thain [Mon, 16 Feb 2026 07:01:30 +0000 (18:01 +1100)]
mtd: Avoid boot crash in RedBoot partition table parser

Given CONFIG_FORTIFY_SOURCE=y and a recent compiler,
commit 439a1bcac648 ("fortify: Use __builtin_dynamic_object_size() when
available") produces the warning below and an oops.

    Searching for RedBoot partition table in 50000000.flash at offset 0x7e0000
    ------------[ cut here ]------------
    WARNING: lib/string_helpers.c:1035 at 0xc029e04c, CPU#0: swapper/0/1
    memcmp: detected buffer overflow: 15 byte read of buffer size 14
    Modules linked in:
    CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0 #1 NONE

As Kees said, "'names' is pointing to the final 'namelen' many bytes
of the allocation ... 'namelen' could be basically any length at all.
This fortify warning looks legit to me -- this code used to be reading
beyond the end of the allocation."

Since the size of the dynamic allocation is calculated with strlen()
we can use strcmp() instead of memcmp() and remain within bounds.

Cc: Kees Cook <kees@kernel.org>
Cc: stable@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/all/202602151911.AD092DFFCD@keescook/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Kees Cook <kees@kernel.org>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2 months agoMAINTAINERS: add Takahiro Kuwano as SPI NOR reviewer
Tudor Ambarus [Fri, 13 Feb 2026 17:02:34 +0000 (17:02 +0000)]
MAINTAINERS: add Takahiro Kuwano as SPI NOR reviewer

Takahiro has been an active contributor to the SPI NOR subsystem,
providing valuable patches and reviews. Add him as a designated
reviewer to help facilitate patch processing and maintenance.

Cc: Takahiro Kuwano <takahiro.kuwano@infineon.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Takahiro Kuwano <takahiro.kuwano@infineon.com>
Acked-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2 months agoMAINTAINERS: remove Tudor Ambarus as SPI NOR maintainer
Tudor Ambarus [Fri, 13 Feb 2026 17:02:33 +0000 (17:02 +0000)]
MAINTAINERS: remove Tudor Ambarus as SPI NOR maintainer

I have not been actively involved in SPI NOR development recently and
would like to step down to focus on my current day-to-day work.

The subsystem remains in good hands with Pratyush and Michael.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2 months agos390/pfault: Fix virtual vs physical address confusion
Alexander Gordeev [Tue, 24 Feb 2026 06:41:07 +0000 (07:41 +0100)]
s390/pfault: Fix virtual vs physical address confusion

When Linux is running as guest, runs a user space process and the
user space process accesses a page that the host has paged out,
the guest gets a pfault interrupt and schedules a different process.
Without this mechanism the host would have to suspend the whole
virtual CPU until the page has been paged in.

To setup the pfault interrupt the real address of parameter list
should be passed to DIAGNOSE 0x258, but a virtual address is passed
instead.

That has a performance impact, since the pfault setup never succeeds,
the interrupt is never delivered to a guest and the whole virtual CPU
is suspended as result.

Cc: stable@vger.kernel.org
Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/kexec: Disable stack protector in s390_reset_system()
Vasily Gorbik [Mon, 23 Feb 2026 22:33:52 +0000 (23:33 +0100)]
s390/kexec: Disable stack protector in s390_reset_system()

s390_reset_system() calls set_prefix(0), which switches back to the
absolute lowcore. At that point the stack protector canary no longer
matches the canary from the lowcore the function was entered with, so
the stack check fails.

Mark s390_reset_system() __no_stack_protector. This is safe here since
its callers (__do_machine_kdump() and __do_machine_kexec()) are
effectively no-return and fall back to disabled_wait() on failure.

Fixes: f5730d44e05e ("s390: Add stackprotector support")
Reported-by: Nikita Dubrovskii <nikita@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agoMerge branch 'idle-vtime-fixes-cleanups' into fixes
Vasily Gorbik [Wed, 25 Feb 2026 15:59:16 +0000 (16:59 +0100)]
Merge branch 'idle-vtime-fixes-cleanups' into fixes

Pull a small idle/vtime series with two cputime accounting fixes plus
related cleanups and minor improvements.

* idle-vtime-fixes-cleanups:
  s390/idle: Remove psw_idle() prototype
  s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()
  s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()
  s390/irq/idle: Remove psw bits early
  s390/idle: Inline update_timer_idle()
  s390/idle: Slightly optimize idle time accounting
  s390/idle: Add comment for non obvious code
  s390/vtime: Fix virtual timer forwarding
  s390/idle: Fix cpu idle exit cpu time accounting

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/idle: Remove psw_idle() prototype
Heiko Carstens [Wed, 18 Feb 2026 14:20:12 +0000 (15:20 +0100)]
s390/idle: Remove psw_idle() prototype

psw_idle() does not exist anymore. Remove its prototype.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()
Heiko Carstens [Wed, 18 Feb 2026 14:20:11 +0000 (15:20 +0100)]
s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()

Use lockdep_assert_irqs_disabled() instead of BUG_ON(). This avoids
crashing the kernel, and generates better code if CONFIG_PROVE_LOCKING
is disabled.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()
Heiko Carstens [Wed, 18 Feb 2026 14:20:10 +0000 (15:20 +0100)]
s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()

do_account_vtime() runs always with interrupts disabled, therefore use
__this_cpu_read() instead of this_cpu_read() to get rid of a pointless
preempt_disable() / preempt_enable() pair.

Also there are no concurrent writers to the cpu time accounting fields
in lowcore. Therefore get rid of READ_ONCE() usages.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/irq/idle: Remove psw bits early
Heiko Carstens [Wed, 18 Feb 2026 14:20:09 +0000 (15:20 +0100)]
s390/irq/idle: Remove psw bits early

Remove wait, io, external interrupt bits early in do_io_irq()/do_ext_irq()
when previous context was idle. This saves one conditional branch and is
closer to the original old assembly code.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/idle: Inline update_timer_idle()
Heiko Carstens [Wed, 18 Feb 2026 14:20:08 +0000 (15:20 +0100)]
s390/idle: Inline update_timer_idle()

Inline update_timer_idle() again to avoid an extra function call. This
way the generated code is close to old assembler version again.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/idle: Slightly optimize idle time accounting
Heiko Carstens [Wed, 18 Feb 2026 14:20:07 +0000 (15:20 +0100)]
s390/idle: Slightly optimize idle time accounting

Slightly optimize account_idle_time_irq() and update_timer_idle():

- Use fast single instruction __atomic64() primitives to update per
  cpu idle_time and idle_count, instead of READ_ONCE() / WRITE_ONCE()
  pairs

- stcctm() is an inline assembly with a full memory barrier. This
  leads to a not necessary extra dereference of smp_cpu_mtid in
  update_timer_idle(). Avoid this and read smp_cpu_mtid into a
  variable

- Use __this_cpu_add() instead of this_cpu_add() to avoid disabling /
  enabling of preemption several times in a loop in update_timer_idle().

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/idle: Add comment for non obvious code
Heiko Carstens [Wed, 18 Feb 2026 14:20:06 +0000 (15:20 +0100)]
s390/idle: Add comment for non obvious code

Add a comment to update_timer_idle() which describes why wall time (not
steal time) is added to steal_timer. This is not obvious and was reported
by Frederic Weisbecker.

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/all/aXEVM-04lj0lntMr@localhost.localdomain/
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/vtime: Fix virtual timer forwarding
Heiko Carstens [Wed, 18 Feb 2026 14:20:05 +0000 (15:20 +0100)]
s390/vtime: Fix virtual timer forwarding

Since delayed accounting of system time [1] the virtual timer is
forwarded by do_account_vtime() but also vtime_account_kernel(),
vtime_account_softirq(), and vtime_account_hardirq(). This leads
to double accounting of system, guest, softirq, and hardirq time.

Remove accounting from the vtime_account*() family to restore old behavior.

There is only one user of the vtimer interface, which might explain
why nobody noticed this so far.

Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agos390/idle: Fix cpu idle exit cpu time accounting
Heiko Carstens [Wed, 18 Feb 2026 14:20:04 +0000 (15:20 +0100)]
s390/idle: Fix cpu idle exit cpu time accounting

With the conversion to generic entry [1] cpu idle exit cpu time accounting
was converted from assembly to C. This introduced an reversed order of cpu
time accounting.

On cpu idle exit the current accounting happens with the following call
chain:

-> do_io_irq()/do_ext_irq()
 -> irq_enter_rcu()
  -> account_hardirq_enter()
   -> vtime_account_irq()
    -> vtime_account_kernel()

vtime_account_kernel() accounts the passed cpu time since last_update_timer
as system time, and updates last_update_timer to the current cpu timer
value.

However the subsequent call of

 -> account_idle_time_irq()

will incorrectly subtract passed cpu time from timer_idle_enter to the
updated last_update_timer value from system_timer. Then last_update_timer
is updated to a sys_enter_timer, which means that last_update_timer goes
back in time.

Subsequently account_hardirq_exit() will account too much cpu time as
hardirq time. The sum of all accounted cpu times is still correct, however
some cpu time which was previously accounted as system time is now
accounted as hardirq time, plus there is the oddity that last_update_timer
goes back in time.

Restore previous behavior by extracting cpu time accounting code from
account_idle_time_irq() into a new update_timer_idle() function and call it
before irq_enter_rcu().

Fixes: 56e62a737028 ("s390: convert to generic entry") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2 months agoio_uring/timeout: READ_ONCE sqe->addr
Pavel Begunkov [Wed, 25 Feb 2026 10:35:57 +0000 (10:35 +0000)]
io_uring/timeout: READ_ONCE sqe->addr

We should use READ_ONCE when reading from a SQE, make sure timeout gets
a stable timespec address.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 months agoALSA: hda/intel: increase default bdl_pos_adj for Nvidia controllers
Panagiotis Foliadis [Wed, 25 Feb 2026 14:53:43 +0000 (14:53 +0000)]
ALSA: hda/intel: increase default bdl_pos_adj for Nvidia controllers

The default bdl_pos_adj of 32 for Nvidia HDA controllers is
insufficient on GA102 (and likely other recent Nvidia GPUs) after S3
suspend/resume. The controller's DMA timing degrades after resume,
causing premature IRQ detection in azx_position_ok() which results in
silent HDMI/DP audio output despite userspace reporting a valid
playback state and correct ELD data.

Increase bdl_pos_adj to 64 for AZX_DRIVER_NVIDIA, matching the value
already used by Intel Apollo Lake for the same class of timing issue.

Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221069
Suggested-by: Charalampos Mitrodimas <charmitro@posteo.net>
Signed-off-by: Panagiotis Foliadis <pfoliadis@posteo.net>
Link: https://patch.msgid.link/20260225-nvidia-audio-fix-v1-1-b1383c37ec49@posteo.net
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agolib/Kconfig.debug: Require a release version of LLVM 22 for context analysis
Nathan Chancellor [Tue, 24 Feb 2026 23:16:30 +0000 (16:16 -0700)]
lib/Kconfig.debug: Require a release version of LLVM 22 for context analysis

Using a prerelease version as a minimum supported version for
CONFIG_WARN_CONTEXT_ANALYSIS was reasonable to do while LLVM 22 was the
development version so that people could immediately build from main and
start testing and validating this in their own code. However, it can be
problematic when using prerelease versions of LLVM 22, such as Android
clang 22.0.1 (the current android mainline compiler) or when bisecting
LLVM between llvmorg-22-init and llvmorg-23-init, to build the kernel,
as all compiler fixes for the context analysis may not be present,
potentially resulting in warnings that can easily turn into errors.

Now that LLVM 22 is released as 22.1.0, upgrade the check to require at
least this version to ensure that a user's toolchain actually has all
the changes needed for a smooth experience with context analysis.

Fixes: 3269701cb256 ("compiler-context-analysis: Add infrastructure for Context Analysis with Clang")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://patch.msgid.link/20260224-bump-clang-ver-context-analysis-v1-1-16cc7a90a040@kernel.org
2 months agoperf: Fix __perf_event_overflow() vs perf_remove_from_context() race
Peter Zijlstra [Tue, 24 Feb 2026 12:29:09 +0000 (13:29 +0100)]
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race

Make sure that __perf_event_overflow() runs with IRQs disabled for all
possible callchains. Specifically the software events can end up running
it with only preemption disabled.

This opens up a race vs perf_event_exit_event() and friends that will go
and free various things the overflow path expects to be present, like
the BPF program.

Fixes: 592903cdcbf6 ("perf_counter: add an event_list")
Reported-by: Simond Hu <cmdhh1767@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Simond Hu <cmdhh1767@gmail.com>
Link: https://patch.msgid.link/20260224122909.GV1395416@noisy.programming.kicks-ass.net
2 months agocpufreq: intel_pstate: Fix crash during turbo disable
Srinivas Pandruvada [Wed, 25 Feb 2026 00:17:52 +0000 (16:17 -0800)]
cpufreq: intel_pstate: Fix crash during turbo disable

When the system is booted with kernel command line argument "nosmt" or
"maxcpus" to limit the number of CPUs, disabling turbo via:

 echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

results in a crash:

 PF: supervisor read access in kernel mode
 PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP PTI
 ...
 RIP: 0010:store_no_turbo+0x100/0x1f0
 ...

This occurs because for_each_possible_cpu() returns CPUs even if they
are not online. For those CPUs, all_cpu_data[] will be NULL. Since
commit 973207ae3d7c ("cpufreq: intel_pstate: Rearrange max frequency
updates handling code"), all_cpu_data[] is dereferenced even for CPUs
which are not online, causing the NULL pointer dereference.

To fix that, pass CPU number to intel_pstate_update_max_freq() and use
all_cpu_data[] for those CPUs for which there is a valid cpufreq policy.

Fixes: 973207ae3d7c ("cpufreq: intel_pstate: Rearrange max frequency updates handling code")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221068
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Link: https://patch.msgid.link/20260225001752.890164-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2 months agoxfs: add static size checks for ioctl UABI
Wilfred Mallawa [Wed, 11 Feb 2026 03:29:04 +0000 (13:29 +1000)]
xfs: add static size checks for ioctl UABI

The ioctl structures in libxfs/xfs_fs.h are missing static size checks.
It is useful to have static size checks for these structures as adding
new fields to them could cause issues (e.g. extra padding that may be
inserted by the compiler). So add these checks to xfs/xfs_ondisk.h.

Due to different padding/alignment requirements across different
architectures, to avoid build failures, some structures are ommited from
the size checks. For example, structures with "compat_" definitions in
xfs/xfs_ioctl32.h are ommited.

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: remove duplicate static size checks
Wilfred Mallawa [Wed, 11 Feb 2026 03:29:02 +0000 (13:29 +1000)]
xfs: remove duplicate static size checks

In libxfs/xfs_ondisk.h, remove some duplicate entries of
XFS_CHECK_STRUCT_SIZE().

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Add comments for usages of some macros.
Nirjhar Roy (IBM) [Fri, 20 Feb 2026 06:54:01 +0000 (12:24 +0530)]
xfs: Add comments for usages of some macros.

Add comments explaining when to use XFS_IS_CORRUPT() and ASSERT()

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Update lazy counters in xfs_growfs_rt_bmblock()
Nirjhar Roy (IBM) [Fri, 20 Feb 2026 06:54:00 +0000 (12:24 +0530)]
xfs: Update lazy counters in xfs_growfs_rt_bmblock()

Update lazy counters in xfs_growfs_rt_bmblock() similar to the way it
is done xfs_growfs_data_private(). This is because the lazy counters are
not always updated and synching the counters will avoid inconsistencies
between frexents and rtextents(total realtime extent count). This will
be more useful once realtime shrink is implemented as this will prevent
some transient state to occur where frexents might be greater than
total rtextents.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Add a comment in xfs_log_sb()
Nirjhar Roy (IBM) [Fri, 20 Feb 2026 06:53:59 +0000 (12:23 +0530)]
xfs: Add a comment in xfs_log_sb()

Add a comment explaining why the sb_frextents are updated outside the
if (xfs_has_lazycount(mp) check even though it is a lazycounter.
RT groups are supported only in v5 filesystems which always have
lazycounter enabled - so putting it inside the if(xfs_has_lazycount(mp)
check is redundant.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Fix xfs_last_rt_bmblock()
Nirjhar Roy (IBM) [Fri, 20 Feb 2026 06:53:58 +0000 (12:23 +0530)]
xfs: Fix xfs_last_rt_bmblock()

Bug description:

If the size of the last rtgroup i.e, the rtg passed to
xfs_last_rt_bmblock() is such that the last rtextent falls in 0th word
offset of a bmblock of the bitmap file tracking this (last) rtgroup,
then in that case xfs_last_rt_bmblock() incorrectly returns the next
bmblock number instead of the current/last used bmblock number.
When xfs_last_rt_bmblock() incorrectly returns the next bmblock,
the loop to grow/modify the bmblocks in xfs_growfs_rtg() doesn't
execute and xfs_growfs basically does a nop in certain cases.

xfs_growfs will do a nop when the new size of the fs will have the same
number of rtgroups i.e, we are only growing the last rtgroup.

Reproduce:
$ mkfs.xfs -m metadir=0 -r rtdev=/dev/loop1 /dev/loop0 \
-r size=32769b -f
$ mount -o rtdev=/dev/loop1 /dev/loop0 /mnt/scratch
$ xfs_growfs -R $(( 32769 + 1 )) /mnt/scratch
$ xfs_info /mnt/scratch | grep rtextents
$ # We can see that rtextents hasn't changed

Fix:
Fix this by returning the current/last used bmblock when the last
rtgroup size is not a multiple xfs_rtbitmap_rtx_per_rbmblock()
and the next bmblock when the rtgroup size is a multiple of
xfs_rtbitmap_rtx_per_rbmblock() i.e, the existing blocks are
completely used up.
Also, I have renamed xfs_last_rt_bmblock() to
xfs_last_rt_bmblock_to_extend() to signify that this function
returns the bmblock number to extend and NOT always the last used
bmblock number.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: don't report half-built inodes to fserror
Darrick J. Wong [Wed, 18 Feb 2026 23:25:38 +0000 (15:25 -0800)]
xfs: don't report half-built inodes to fserror

Sam Sun apparently found a syzbot way to fuzz a filesystem such that
xfs_iget_cache_miss would free the inode before the fserror code could
catch up.  Frustratingly he doesn't use the syzbot dashboard so there's
no C reproducer and not even a full error report, so I'm guessing that:

Inodes that are being constructed or torn down inside XFS are not
visible to the VFS.  They should never be reported to fserror.
Also, any inode that has been freshly allocated in _cache_miss should be
marked INEW immediately because, well, it's an incompletely constructed
inode that isn't yet visible to the VFS.

Reported-by: Sam Sun <samsun1006219@gmail.com>
Fixes: 5eb4cb18e445d0 ("xfs: convey metadata health events to the health monitor")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: don't report metadata inodes to fserror
Darrick J. Wong [Wed, 18 Feb 2026 23:25:38 +0000 (15:25 -0800)]
xfs: don't report metadata inodes to fserror

Internal metadata inodes are not exposed to userspace programs, so it
makes no sense to pass them to the fserror functions (aka fsnotify).
Instead, report metadata file problems as general filesystem corruption.

Fixes: 5eb4cb18e445d0 ("xfs: convey metadata health events to the health monitor")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix potential pointer access race in xfs_healthmon_get
Darrick J. Wong [Wed, 18 Feb 2026 23:25:37 +0000 (15:25 -0800)]
xfs: fix potential pointer access race in xfs_healthmon_get

Pankaj Raghav asks about this code in xfs_healthmon_get:

  hm = mp->m_healthmon;
  if (hm && !refcount_inc_not_zero(&hm->ref))
    hm = NULL;
  rcu_read_unlock();
  return hm;

(slightly edited to compress a mailing list thread)

"Nit: Should we do a READ_ONCE(mp->m_healthmon) here to avoid any
compiler tricks that can result in an undefined behaviour? I am not sure
if I am being paranoid here.

"So this is my understanding: RCU guarantees that we get a valid object
(actual data of m_healthmon) but does not guarantee the compiler will
not reread the pointer between checking if hm is !NULL and accessing the
pointer as we are doing it lockless.

"So just a barrier() call in rcu_read_lock is enough to make sure this
doesn't happen and probably adding a READ_ONCE() is not needed?"

After some initial confusion I concluded that he's correct.  The
compiler could very well eliminate the hm variable in favor of walking
the pointers twice, turning the code into:

  if (mp->m_healthmon && !refcount_inc_not_zero(&mp->m_healthmon->ref))

If this happens, then xfs_healthmon_detach can sneak in between the
two sides of the && expression and set mp->m_healthmon to NULL, and
thereby cause a null pointer dereference crash.  Fix this by using the
rcu pointer assignment and dereference functions, which ensure that the
proper reordering barriers are in place.

Practically speaking, gcc seems to allocate an actual variable for hm
and only reads mp->m_healthmon once (as intended), but we ought to be
more explicit about requiring this.

Reported-by: Pankaj Raghav <pankaj.raghav@linux.dev>
Fixes: a48373e7d35a89f6f ("xfs: start creating infrastructure for health monitoring")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix xfs_group release bug in xfs_dax_notify_dev_failure
Darrick J. Wong [Wed, 18 Feb 2026 23:25:36 +0000 (15:25 -0800)]
xfs: fix xfs_group release bug in xfs_dax_notify_dev_failure

Chris Mason reports that his AI tools noticed that we were using
xfs_perag_put and xfs_group_put to release the group reference returned
by xfs_group_next_range.  However, the iterator function returns an
object with an active refcount, which means that we must use the correct
function to release the active refcount, which is _rele.

Cc: <stable@vger.kernel.org> # v6.0
Fixes: 6f643c57d57c56 ("xfs: implement ->notify_failure() for XFS")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix xfs_group release bug in xfs_verify_report_losses
Darrick J. Wong [Wed, 18 Feb 2026 23:25:35 +0000 (15:25 -0800)]
xfs: fix xfs_group release bug in xfs_verify_report_losses

Chris Mason reports that his AI tools noticed that we were using
xfs_perag_put and xfs_group_put to release the group reference returned
by xfs_group_next_range.  However, the iterator function returns an
object with an active refcount, which means that we must use the correct
function to release the active refcount, which is _rele.

Fixes: b8accfd65d31f2 ("xfs: add media verification ioctl")
Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-xfs/20260206030527.2506821-1-clm@meta.com/
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix copy-paste error in previous fix
Darrick J. Wong [Wed, 18 Feb 2026 23:25:35 +0000 (15:25 -0800)]
xfs: fix copy-paste error in previous fix

Chris Mason noticed that there is a copy-paste error in a recent change
to xrep_dir_teardown that nulls out pointers after freeing the
resources.

Fixes: ba408d299a3bb3c ("xfs: only call xf{array,blob}_destroy if we have a valid pointer")
Link: https://lore.kernel.org/linux-xfs/20260205194211.2307232-1-clm@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Fix error pointer dereference
Ethan Tidmore [Fri, 20 Feb 2026 03:38:25 +0000 (21:38 -0600)]
xfs: Fix error pointer dereference

The function try_lookup_noperm() can return an error pointer and is not
checked for one.

Add checks for error pointer in xrep_adoption_check_dcache() and
xrep_adoption_zap_dcache().

Detected by Smatch:
fs/xfs/scrub/orphanage.c:449 xrep_adoption_check_dcache() error:
'd_child' dereferencing possible ERR_PTR()

fs/xfs/scrub/orphanage.c:485 xrep_adoption_zap_dcache() error:
'd_child' dereferencing possible ERR_PTR()

Fixes: 73597e3e42b4 ("xfs: ensure dentry consistency when the orphanage adopts a file")
Cc: stable@vger.kernel.org # v6.16
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: remove metafile inodes from the active inode stat
Christoph Hellwig [Mon, 2 Feb 2026 14:14:32 +0000 (15:14 +0100)]
xfs: remove metafile inodes from the active inode stat

The active inode (or active vnode until recently) stat can get much larger
than expected on file systems with a lot of metafile inodes like zoned
file systems on SMR hard disks with 10.000s of rtg rmap inodes.

Remove all metafile inodes from the active counter to make it more useful
to track actual workloads and add a separate counter for active metafile
inodes.

This fixes xfs/177 on SMR hard drives.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: cleanup inode counter stats
Christoph Hellwig [Mon, 2 Feb 2026 14:14:31 +0000 (15:14 +0100)]
xfs: cleanup inode counter stats

Most of them are unused, so mark them as such.  Give the remaining ones
names that match their use instead of the historic IRIX ones based on
vnodes.  Note that the names are purely internal to the XFS code, the
user interface is based on section names and arrays of counters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix code alignment issues in xfs_ondisk.c
Wilfred Mallawa [Thu, 12 Feb 2026 22:50:06 +0000 (08:50 +1000)]
xfs: fix code alignment issues in xfs_ondisk.c

Fixup some code alignment issues in xfs_ondisk.c

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Replace &rtg->rtg_group with rtg_group()
Nirjhar Roy (IBM) [Wed, 11 Feb 2026 13:55:14 +0000 (19:25 +0530)]
xfs: Replace &rtg->rtg_group with rtg_group()

Use the already existing rtg_group() wrapper instead of directly
accessing the struct xfs_group member in struct xfs_rtgroup.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
[cem: Conflict resolution against 06873dbd940d]
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Refactoring the nagcount and delta calculation
Nirjhar Roy (IBM) [Wed, 11 Feb 2026 13:55:13 +0000 (19:25 +0530)]
xfs: Refactoring the nagcount and delta calculation

Introduce xfs_growfs_compute_delta() to calculate the nagcount
and delta blocks and refactor the code from xfs_growfs_data_private().
No functional changes.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary()
Nirjhar Roy (IBM) [Wed, 4 Feb 2026 15:06:26 +0000 (20:36 +0530)]
xfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary()

Replace ASSERT(sum > 0) with an XFS_IS_CORRUPT() and place it just
after the call to xfs_rtget_summary() so that we don't end up using
an illegal value of sum.

Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoRDMA/umem: Fix double dma_buf_unpin in failure path
Jacob Moroni [Tue, 24 Feb 2026 23:41:53 +0000 (23:41 +0000)]
RDMA/umem: Fix double dma_buf_unpin in failure path

In ib_umem_dmabuf_get_pinned_with_dma_device(), the call to
ib_umem_dmabuf_map_pages() can fail. If this occurs, the dmabuf
is immediately unpinned but the umem_dmabuf->pinned flag is still
set. Then, when ib_umem_release() is called, it calls
ib_umem_dmabuf_revoke() which will call dma_buf_unpin() again.

Fix this by removing the immediate unpin upon failure and just let
the ib_umem_release/revoke path handle it. This also ensures the
proper unmap-unpin unwind ordering if the dmabuf_map_pages call
happened to fail due to dma_resv_wait_timeout (and therefore has
a non-NULL umem_dmabuf->sgt).

Fixes: 1e4df4a21c5a ("RDMA/umem: Allow pinned dmabuf umem usage")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260224234153.1207849-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoRDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev()
Stefan Metzmacher [Tue, 24 Feb 2026 16:59:52 +0000 (17:59 +0100)]
RDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev()

When listening on wildcard addresses we have a global list for the application
layer rdma_cm_id and for any existing device or any device added in future we
try to listen on any wildcard listener.

When the listener has a restricted_node_type we should prevent listening on
devices with a different node type.

While there fix the documentation comment of rdma_restrict_node_type()
to include rdma_resolve_addr() instead of having rdma_bind_addr() twice.

Fixes: a760e80e90f5 ("RDMA/core: introduce rdma_restrict_node_type()")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://patch.msgid.link/20260224165951.3582093-2-metze@samba.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoKVM: arm64: Deduplicate ASID retrieval code
Marc Zyngier [Wed, 25 Feb 2026 10:47:18 +0000 (10:47 +0000)]
KVM: arm64: Deduplicate ASID retrieval code

We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).

Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agodmaengine: idxd: Fix leaking event log memory
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:36 +0000 (10:34 -0800)]
dmaengine: idxd: Fix leaking event log memory

During the device remove process, the device is reset, causing the
configuration registers to go back to their default state, which is
zero. As the driver is checking if the event log support was enabled
before deallocating, it will fail if a reset happened before.

Do not check if the support was enabled, the check for 'idxd->evl'
being valid (only allocated if the HW capability is available) is
enough.

Fixes: 244da66cda35 ("dmaengine: idxd: setup event log configuration")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-10-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Fix freeing the allocated ida too late
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:35 +0000 (10:34 -0800)]
dmaengine: idxd: Fix freeing the allocated ida too late

It can happen that when the cdev .release() is called, the driver
already called ida_destroy(). Move ida_free() to the _del() path.

We see with DEBUG_KOBJECT_RELEASE enabled and forcing an early PCI
unbind.

Fixes: 04922b7445a1 ("dmaengine: idxd: fix cdev setup and free device lifetime issues")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-9-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Fix memory leak when a wq is reset
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:34 +0000 (10:34 -0800)]
dmaengine: idxd: Fix memory leak when a wq is reset

idxd_wq_disable_cleanup() which is called from the reset path for a
workqueue, sets the wq type to NONE, which for other parts of the
driver mean that the wq is empty (all its resources were released).

Only set the wq type to NONE after its resources are released.

Fixes: da32b28c95a7 ("dmaengine: idxd: cleanup workqueue config after disabling")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-8-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Fix not releasing workqueue on .release()
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:33 +0000 (10:34 -0800)]
dmaengine: idxd: Fix not releasing workqueue on .release()

The workqueue associated with an DSA/IAA device is not released when
the object is freed.

Fixes: 47c16ac27d4c ("dmaengine: idxd: fix idxd conf_dev 'struct device' lifetime")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-7-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Wait for submitted operations on .device_synchronize()
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:32 +0000 (10:34 -0800)]
dmaengine: idxd: Wait for submitted operations on .device_synchronize()

When the dmaengine "core" asks the driver to synchronize, send a Drain
operation to the device workqueue, which will wait for the already
submitted operations to finish.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-6-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Flush all pending descriptors
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:31 +0000 (10:34 -0800)]
dmaengine: idxd: Flush all pending descriptors

When used as a dmaengine, the DMA "core" might ask the driver to
terminate all pending requests, when that happens, flush all pending
descriptors.

In this context, flush means removing the requests from the pending
lists, so even if they are completed after, the user is not notified.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-5-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Flush kernel workqueues on Function Level Reset
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:30 +0000 (10:34 -0800)]
dmaengine: idxd: Flush kernel workqueues on Function Level Reset

When a Function Level Reset (FLR) happens, terminate the pending
descriptors that were issued by in-kernel users and disable the
interrupts associated with those. They will be re-enabled after FLR
finishes.

idxd_wq_flush_desc() is declared on idxd.h because it's going to be
used in by the DMA backend in a future patch.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-4-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Fix possible invalid memory access after FLR
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:29 +0000 (10:34 -0800)]
dmaengine: idxd: Fix possible invalid memory access after FLR

In the case that the first Function Level Reset (FLR) concludes
correctly, but in the second FLR the scratch area for the saved
configuration cannot be allocated, it's possible for a invalid memory
access to happen.

Always set the deallocated scratch area to NULL after FLR completes.

Fixes: 98d187a98903 ("dmaengine: idxd: Enable Function Level Reset (FLR) for halt")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-3-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Fix crash when the event log is disabled
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:28 +0000 (10:34 -0800)]
dmaengine: idxd: Fix crash when the event log is disabled

If reporting errors to the event log is not supported by the hardware,
and an error that causes Function Level Reset (FLR) is received, the
driver will try to restore the event log even if it was not allocated.

Also, only try to free the event log if it was properly allocated.

Fixes: 6078a315aec1 ("dmaengine: idxd: Add idxd_device_config_save() and idxd_device_config_restore() helpers")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-2-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agodmaengine: idxd: Fix lockdep warnings when calling idxd_device_config()
Vinicius Costa Gomes [Wed, 21 Jan 2026 18:34:27 +0000 (10:34 -0800)]
dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config()

Move the check for IDXD_FLAG_CONFIGURABLE and the locking to "inside"
idxd_device_config(), as this is common to all callers, and the one
that wasn't holding the lock was an error (that was causing the
lockdep warning).

Suggested-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-1-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agox86/efi: defer freeing of boot services memory
Mike Rapoport (Microsoft) [Wed, 25 Feb 2026 06:55:55 +0000 (08:55 +0200)]
x86/efi: defer freeing of boot services memory

efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().

There are two issue with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().

More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.

Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB or RAM.

If the freed memory resides in the areas that memory map for them is
still uninitialized, they won't be actually freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.

Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in uninitialized part of the memory map.

Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.

More robust approach is to only defer freeing of the EFI boot services
memory.

Split efi_free_boot_services() in two. First efi_unmap_boot_services()
collects ranges that should be freed into an array then
efi_free_boot_services() later frees them after deferred init is complete.

Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org
Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2 months agodmaengine: dw-edma: fix MSI data programming for multi-IRQ case
Shenghui Shi [Mon, 9 Feb 2026 10:37:25 +0000 (18:37 +0800)]
dmaengine: dw-edma: fix MSI data programming for multi-IRQ case

When using MSI (not MSI-X) with multiple IRQs, the MSI data value
must be unique per vector to ensure correct interrupt delivery.
Currently, the driver fails to increment the MSI data per vector,
causing interrupts to be misrouted.

Fix this by caching the base MSI data and adjusting each vector's
data accordingly during IRQ setup.

Fixes: e63d79d1ff04 ("dmaengine: dw-edma: Add Synopsys DesignWare eDMA IP core driver")
Signed-off-by: Shenghui Shi <brody.shi@m2semi.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260209103726.414-1-brody.shi@m2semi.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agoerofs: fix interlaced plain identification for encoded extents
Gao Xiang [Tue, 24 Feb 2026 10:31:25 +0000 (18:31 +0800)]
erofs: fix interlaced plain identification for encoded extents

Only plain data whose start position and on-disk physical length are
both aligned to the block size should be classified as interlaced
plain extents. Otherwise, it must be treated as shifted plain extents.

This issue was found by syzbot using a crafted compressed image
containing plain extents with unaligned physical lengths, which can
cause OOB read in z_erofs_transform_plain().

Reported-and-tested-by: syzbot+d988dc155e740d76a331@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/699d5714.050a0220.cdd3c.03e7.GAE@google.com
Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2 months agodmaengine: fsl-edma: fix channel parameter config for fixed channel requests
Joy Zou [Wed, 17 Sep 2025 09:53:42 +0000 (17:53 +0800)]
dmaengine: fsl-edma: fix channel parameter config for fixed channel requests

Configure only the requested channel when a fixed channel is specified
to avoid modifying other channels unintentionally.

Fix parameter configuration when a fixed DMA channel is requested on
i.MX9 AON domain and i.MX8QM/QXP/DXL platforms. When a client requests
a fixed channel (e.g., channel 6), the driver traverses channels 0-5
and may unintentionally modify their configuration if they are unused.

This leads to issues such as setting the `is_multi_fifo` flag unexpectedly,
causing memcpy tests to fail when using the dmatest tool.

Only affect edma memcpy test when the channel is fixed.

Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support")
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250917-b4-edma-chanconf-v1-1-886486e02e91@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2 months agoALSA: usb-audio: Use inclusive terms
Takashi Iwai [Wed, 25 Feb 2026 08:52:31 +0000 (09:52 +0100)]
ALSA: usb-audio: Use inclusive terms

Replace the remaining with inclusive terms; it's only this function
name we overlooked at the previous conversion.

Fixes: 53837b4ac2bd ("ALSA: usb-audio: Replace slave/master terms")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260225085233.316306-5-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agoALSA: usb-audio: Avoid implicit feedback mode on DIYINHK USB Audio 2.0
Takashi Iwai [Wed, 25 Feb 2026 08:52:30 +0000 (09:52 +0100)]
ALSA: usb-audio: Avoid implicit feedback mode on DIYINHK USB Audio 2.0

Although DIYINHK USB Audio 2.0 (ID 20b1:2009) shows the implicit
feedback source for the capture stream, this would cause several
problems for the playback.  Namely, the device can get wMaxPackSize
1024 for 24/32 bit format with 6 channels, and when a high sample rate
like 352.8kHz or 384kHz is played, the packet size overflows the max
limit.  Also, the device has another two playback altsets, and those
aren't properly handled with the implicit feedback.

Since the device has been working well even before introducing the
implicit feedback, we can assume that it works fine in the async mode.
This patch adds the explicit skip of the implicit fb detection to make
the playback running in the async mode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=221076
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260225085233.316306-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agoALSA: usb-audio: Check max frame size for implicit feedback mode, too
Takashi Iwai [Wed, 25 Feb 2026 08:52:29 +0000 (09:52 +0100)]
ALSA: usb-audio: Check max frame size for implicit feedback mode, too

When the packet sizes are taken from the capture stream in the
implicit feedback mode, the sizes might be larger than the upper
boundary defined by the descriptor.  As already done for other
transfer modes, we have to cap the sizes accordingly at sending,
otherwise this would lead to an error in USB core at submission of
URBs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=221076
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260225085233.316306-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agoALSA: usb-audio: Cap the packet size pre-calculations
Takashi Iwai [Wed, 25 Feb 2026 08:52:28 +0000 (09:52 +0100)]
ALSA: usb-audio: Cap the packet size pre-calculations

We calculate the possible packet sizes beforehand for adaptive and
synchronous endpoints, but we didn't take care of the max frame size
for those pre-calculated values.  When a device or a bus limits the
packet size, a high sample rate or a high number of channels may lead
to the packet sizes that are larger than the given limit, which
results in an error from the USB core at submitting URBs.

As a simple workaround, just add the sanity checks of pre-calculated
packet sizes to have the upper boundary of ep->maxframesize.

Fixes: f0bd62b64016 ("ALSA: usb-audio: Improve frames size computation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221076
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260225085233.316306-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 months agoirqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
Sascha Bischoff [Wed, 25 Feb 2026 08:31:40 +0000 (08:31 +0000)]
irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag

It appears that a !! became ! during a cleanup, resulting in inverted
logic when detecting if a host GICv5 implementation is capable of
virtualization.

Re-add the missing !, fixing the behaviour.

Fixes: 3227c3a89d65f ("irqchip/gic-v5: Check if impl is virt capable")
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260225083130.3378490-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoesp: fix skb leak with espintcp and async crypto
Sabrina Dubroca [Mon, 23 Feb 2026 23:05:14 +0000 (00:05 +0100)]
esp: fix skb leak with espintcp and async crypto

When the TX queue for espintcp is full, esp_output_tail_tcp will
return an error and not free the skb, because with synchronous crypto,
the common xfrm output code will drop the packet for us.

With async crypto (esp_output_done), we need to drop the skb when
esp_output_tail_tcp returns an error.

Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 months agoxfrm: call xdo_dev_state_delete during state update
Sabrina Dubroca [Mon, 23 Feb 2026 23:05:13 +0000 (00:05 +0100)]
xfrm: call xdo_dev_state_delete during state update

When we update an SA, we construct a new state and call
xdo_dev_state_add, but never insert it. The existing state is updated,
then we immediately destroy the new state. Since we haven't added it,
we don't go through the standard state delete code, and we're skipping
removing it from the device (but xdo_dev_state_free will get called
when we destroy the temporary state).

This is similar to commit c5d4d7d83165 ("xfrm: Fix deletion of
offloaded SAs on failure.").

Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 months agoxfrm: fix the condition on x->pcpu_num in xfrm_sa_len
Sabrina Dubroca [Mon, 23 Feb 2026 23:05:12 +0000 (00:05 +0100)]
xfrm: fix the condition on x->pcpu_num in xfrm_sa_len

pcpu_num = 0 is a valid value. The marker for "unset pcpu_num" which
makes copy_to_user_state_extra not add the XFRMA_SA_PCPU attribute is
UINT_MAX.

Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 months agoxfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi
Sabrina Dubroca [Mon, 23 Feb 2026 23:05:11 +0000 (00:05 +0100)]
xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi

We're returning an error caused by invalid user input without setting
an extack. Add one.

Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2 months agosched_ext: Disable preemption between scx_claim_exit() and kicking helper work
Tejun Heo [Wed, 25 Feb 2026 07:39:58 +0000 (21:39 -1000)]
sched_ext: Disable preemption between scx_claim_exit() and kicking helper work

scx_claim_exit() atomically sets exit_kind, which prevents scx_error() from
triggering further error handling. After claiming exit, the caller must kick
the helper kthread work which initiates bypass mode and teardown.

If the calling task gets preempted between claiming exit and kicking the
helper work, and the BPF scheduler fails to schedule it back (since error
handling is now disabled), the helper work is never queued, bypass mode
never activates, tasks stop being dispatched, and the system wedges.

Disable preemption across scx_claim_exit() and the subsequent work kicking
in all callers - scx_disable() and scx_vexit(). Add
lockdep_assert_preemption_disabled() to scx_claim_exit() to enforce the
requirement.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 months agodrm/client: Do not destroy NULL modes
Jonathan Cavitt [Tue, 24 Feb 2026 22:12:28 +0000 (22:12 +0000)]
drm/client: Do not destroy NULL modes

'modes' in drm_client_modeset_probe may fail to kcalloc.  If this
occurs, we jump to 'out', calling modes_destroy on it, which
dereferences it.  This may result in a NULL pointer dereference in the
error case.  Prevent that.

Fixes: 3039cc0c0653 ("drm/client: Make copies of modes")
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260224221227.69126-2-jonathan.cavitt@intel.com
2 months agofirmware: stratix10-rsu: Fix NULL pointer dereference when RSU is disabled
Liwei Song [Thu, 12 Feb 2026 04:00:35 +0000 (12:00 +0800)]
firmware: stratix10-rsu: Fix NULL pointer dereference when RSU is disabled

When the Remote System Update (RSU) isn't enabled in the First Stage
Boot Loader (FSBL), the driver encounters a NULL pointer dereference when
excute svc_normal_to_secure_thread() thread, resulting in a kernel panic:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Mem abort info:
...
Data abort info:
...
[0000000000000008] user address but active_mm is swapper
Internal error: Oops: 0000000096000004 [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 79 Comm: svc_smc_hvc_thr Not tainted 6.19.0-rc8-yocto-standard+ #59 PREEMPT
Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : svc_normal_to_secure_thread+0x38c/0x990
lr : svc_normal_to_secure_thread+0x144/0x990
...
Call trace:
 svc_normal_to_secure_thread+0x38c/0x990 (P)
 kthread+0x150/0x210
 ret_from_fork+0x10/0x20
Code: 97cfc113 f9400260 aa1403e1 f9400400 (f9400402)
---[ end trace 0000000000000000 ]---

The issue occurs because rsu_send_async_msg() fails when RSU is not enabled
in firmware, causing the channel to be freed via stratix10_svc_free_channel().
However, the probe function continues execution and registers
svc_normal_to_secure_thread(), which subsequently attempts to access the
already-freed channel, triggering the NULL pointer dereference.

Fix this by properly cleaning up the async client and returning early on
failure, preventing the thread from being used with an invalid channel.

Fixes: 15847537b623 ("firmware: stratix10-rsu: Migrate RSU driver to use stratix10 asynchronous framework.")
Cc: stable@kernel.org
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2 months agonet: stmmac: fix timestamping configuration after suspend/resume
Russell King (Oracle) [Mon, 23 Feb 2026 12:19:08 +0000 (12:19 +0000)]
net: stmmac: fix timestamping configuration after suspend/resume

When stmmac_init_timestamping() is called, it clears the receive and
transmit path booleans that allow timestamps to be read. These are
never re-initialised until after userspace requests timestamping
features to be enabled.

However, our copy of the timestamp configuration is not cleared, which
means we return the old configuration to userspace when requested.
This is inconsistent. Fix this by clearing the timestamp configuration.

Fixes: d6228b7cdd6e ("net: stmmac: implement the SIOCGHWTSTAMP ioctl")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1vuUu4-0000000Afea-0j9B@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoarm64: dts: imx93-tqma9352: improve eMMC pad configuration
Markus Niebel [Mon, 9 Feb 2026 15:50:14 +0000 (16:50 +0100)]
arm64: dts: imx93-tqma9352: improve eMMC pad configuration

Use DSE x4 an PullUp for CMD an DAT, DSE x4 and PullDown for CLK to improve
stability and detection at low temperatures under -25°C.

Fixes: 0b5fdfaa8e45 ("arm64: dts: freescale: imx93-tqma9352: set SION for cmd and data pad of USDHC")
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
2 months agoarm64: dts: imx91-tqma9131: improve eMMC pad configuration
Markus Niebel [Mon, 9 Feb 2026 15:50:13 +0000 (16:50 +0100)]
arm64: dts: imx91-tqma9131: improve eMMC pad configuration

Use DSE x4 an PullUp for CMD an DAT, DSE x4 and PullDown for CLK to improve
stability and detection at low temperatures under -25°C.

Fixes: e71db39f0c7c ("arm64: dts: freescale: add initial device tree for TQMa91xx/MBa91xxCA")
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
2 months agooverflow: Make sure size helpers are always inlined
Kees Cook [Tue, 24 Feb 2026 23:24:52 +0000 (15:24 -0800)]
overflow: Make sure size helpers are always inlined

With kmalloc_obj() performing implicit size calculations, the embedded
size_mul() calls, while marked inline, were not always being inlined.
I noticed a couple places where allocations were making a call out for
things that would otherwise be compile-time calculated. Force the
compilers to always inline these calculations.

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20260224232451.work.614-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2 months agokunit: irq: Ensure timer doesn't fire too frequently
Eric Biggers [Tue, 24 Feb 2026 03:37:51 +0000 (19:37 -0800)]
kunit: irq: Ensure timer doesn't fire too frequently

Fix a bug where kunit_run_irq_test() could hang if the system is too
slow.  This was noticed with the crypto library tests in certain VMs.

Specifically, if kunit_irq_test_timer_func() and the associated hrtimer
code took over 5us to run, then the CPU would spend all its time
executing that code in hardirq context.  As a result, the task executing
kunit_run_irq_test() never had a chance to run, exit the loop, and
cancel the timer.

To fix it, make kunit_irq_test_timer_func() increase the timer interval
when the other contexts aren't having a chance to run.

Fixes: 950a81224e8b ("lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py")
Cc: stable@vger.kernel.org
Reviewed-by: David Gow <david@davidgow.net>
Link: https://lore.kernel.org/r/20260224033751.97615-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2 months agozloop: check for spurious options passed to remove
Christoph Hellwig [Tue, 24 Feb 2026 14:21:45 +0000 (06:21 -0800)]
zloop: check for spurious options passed to remove

Zloop uses a command option parser for all control commands,
but most options are only valid for adding a new device.  Check
for incorrectly specified options in the remove handler.

Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 months agozloop: advertise a volatile write cache
Christoph Hellwig [Tue, 24 Feb 2026 14:21:44 +0000 (06:21 -0800)]
zloop: advertise a volatile write cache

Zloop is file system backed and thus needs to sync the underlying file
system to persist data.  Set BLK_FEAT_WRITE_CACHE so that the block
layer actually send flush commands, and fix the flush implementation
as sync_filesystem requires s_umount to be held and the code currently
misses that.

Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 months agomedia: dvb-core: fix wrong reinitialization of ringbuffer on reopen
Jens Axboe [Tue, 24 Feb 2026 18:51:16 +0000 (11:51 -0700)]
media: dvb-core: fix wrong reinitialization of ringbuffer on reopen

dvb_dvr_open() calls dvb_ringbuffer_init() when a new reader opens the
DVR device.  dvb_ringbuffer_init() calls init_waitqueue_head(), which
reinitializes the waitqueue list head to empty.

Since dmxdev->dvr_buffer.queue is a shared waitqueue (all opens of the
same DVR device share it), this orphans any existing waitqueue entries
from io_uring poll or epoll, leaving them with stale prev/next pointers
while the list head is reset to {self, self}.

The waitqueue and spinlock in dvr_buffer are already properly
initialized once in dvb_dmxdev_init().  The open path only needs to
reset the buffer data pointer, size, and read/write positions.

Replace the dvb_ringbuffer_init() call in dvb_dvr_open() with direct
assignment of data/size and a call to dvb_ringbuffer_reset(), which
properly resets pread, pwrite, and error with correct memory ordering
without touching the waitqueue or spinlock.

Cc: stable@vger.kernel.org
Fixes: 34731df288a5f ("V4L/DVB (3501): Dmxdev: use dvb_ringbuffer")
Reported-by: syzbot+ab12f0c08dd7ab8d057c@syzkaller.appspotmail.com
Tested-by: syzbot+ab12f0c08dd7ab8d057c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/698a26d3.050a0220.3b3015.007d.GAE@google.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoarm64: dts: imx93-9x9-qsb: change usdhc tuning step for eMMC and SD
Luke Wang [Tue, 3 Feb 2026 11:23:08 +0000 (19:23 +0800)]
arm64: dts: imx93-9x9-qsb: change usdhc tuning step for eMMC and SD

During system resume, the following errors occurred:

  [  430.638625] mmc1: error -84 writing Cache Enable bit
  [  430.643618] mmc1: error -84 doing runtime resume

For eMMC and SD, there are two tuning pass windows and the gap between
those two windows may only have one cell. If tuning step > 1, the gap may
just be skipped and host assumes those two windows as a continuous
windows. This will cause a wrong delay cell near the gap to be selected.

Set the tuning step to 1 to avoid selecting the wrong delay cell.

For SDIO, the gap is sufficiently large, so the default tuning step does
not cause this issue.

Fixes: 0565d20cd8c2 ("arm64: dts: freescale: Support i.MX93 9x9 Quick Start Board")
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
2 months agoarm64: dts: imx8mq: Set the correct gpu_ahb clock frequency
Sebastian Krzyszkowiak [Tue, 27 Jan 2026 23:28:28 +0000 (00:28 +0100)]
arm64: dts: imx8mq: Set the correct gpu_ahb clock frequency

According to i.MX 8M Quad Reference Manual, GPU_AHB_CLK_ROOT's maximum
frequency is 400MHz.

Fixes: 45d2c84eb3a2 ("arm64: dts: imx8mq: add GPU node")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
2 months agoscsi: ufs: core: Fix shift out of bounds when MAXQ=32
wangshuaiwei [Tue, 24 Feb 2026 06:32:28 +0000 (14:32 +0800)]
scsi: ufs: core: Fix shift out of bounds when MAXQ=32

According to JESD223F, the maximum number of queues (MAXQ) is 32. When MCQ
is enabled and ESI is disabled, nr_hw_queues=32 causes a shift overflow
problem.

Fix this by using 64-bit intermediate values to handle the nr_hw_queues=32
case safely.

Signed-off-by: wangshuaiwei <wangshuaiwei1@xiaomi.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260224063228.50112-1-wangshuaiwei1@xiaomi.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2 months agoMAINTAINERS: update Yosry Ahmed's email address
Yosry Ahmed [Mon, 23 Feb 2026 16:00:26 +0000 (16:00 +0000)]
MAINTAINERS: update Yosry Ahmed's email address

Use my kernel.org email address.

Link: https://lkml.kernel.org/r/20260223160027.122307-1-yosry@kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomailmap: add entry for Daniele Alessandrelli
Daniele Alessandrelli [Mon, 23 Feb 2026 17:09:05 +0000 (17:09 +0000)]
mailmap: add entry for Daniele Alessandrelli

My Intel email is going to bounce soon.  Map it to my personal Gmail
address.

Link: https://lkml.kernel.org/r/20260223170905.278956-1-daniele.alessandrelli@intel.com
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Cc: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: fix NULL NODE_DATA dereference for memoryless nodes on boot
Ming Lei [Sun, 22 Feb 2026 11:57:02 +0000 (19:57 +0800)]
mm: fix NULL NODE_DATA dereference for memoryless nodes on boot

Commit d49004c5f0c1 ("arch, mm: consolidate initialization of nodes, zones
and memory map") moved free_area_init() from setup_arch() to
mm_core_init_early(), which runs after setup_arch() returns.

This changed the ordering relative to init_cpu_to_node() on x86.  Before
the commit, free_area_init() ran during paging_init() (called from
setup_arch()) *before* init_cpu_to_node().  After the commit, it runs
*after* init_cpu_to_node().

On machines with memoryless NUMA nodes (e.g., node 0 has CPUs but no
memory), this causes a NULL pointer dereference:

 1. numa_register_nodes() skips memoryless nodes: no alloc_node_data()
    and no node_set_online() for them.
 2. init_cpu_to_node() sets memoryless nodes online (they have CPUs)
    but does not allocate NODE_DATA.
 3. free_area_init() checks "if (!node_online(nid))" to decide whether
    to call alloc_offline_node_data(). Since the memoryless node is now
    online, the allocation is skipped, leaving NODE_DATA(nid) == NULL.
 4. The immediate "pgdat = NODE_DATA(nid)" dereferences NULL.

The crash happens before console_init(), so no output is visible without
earlyprintk.  With earlyprintk enabled, the following panic is observed:

 BUG: unable to handle page fault for address: 000000000002a1e0
 Oops: Oops: 0000 [#1] SMP NOPTI
 RIP: 0010:free_area_init_node+0x3a/0x540
 Call Trace:
  <TASK>
  free_area_init+0x331/0x4e0
  start_kernel+0x69/0x4a0
  x86_64_start_reservations+0x24/0x30
  x86_64_start_kernel+0x125/0x130
  common_startup_64+0x13e/0x148
  </TASK>
 Kernel panic - not syncing: Attempted to kill the idle task!

Fix this by checking "if (!NODE_DATA(nid))" instead of "if
(!node_online(nid))".  This directly tests whether the per-node data
structure needs to be allocated, regardless of the node's online status.
This change is also safe for non-x86 architectures as they all allocate
NODE_DATA for every node including memoryless ones, so the check simply
evaluates to false with no change in behavior.

Link: https://lkml.kernel.org/r/20260222115702.3659-1-ming.lei@redhat.com
Fixes: d49004c5f0c1 ("arch, mm: consolidate initialization of nodes, zones and memory map")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/tracing: rss_stat: ensure curr is false from kthread context
Kalesh Singh [Thu, 19 Feb 2026 23:36:56 +0000 (15:36 -0800)]
mm/tracing: rss_stat: ensure curr is false from kthread context

The rss_stat trace event allows userspace tools, like Perfetto [1], to
inspect per-process RSS metric changes over time.

The curr field was introduced to rss_stat in commit e4dcad204d3a
("rss_stat: add support to detect RSS updates of external mm").  Its
intent is to indicate whether the RSS update is for the mm_struct of the
current execution context; and is set to false when operating on a remote
mm_struct (e.g., via kswapd or a direct reclaimer).

However, an issue arises when a kernel thread temporarily adopts a user
process's mm_struct.  Kernel threads do not have their own mm_struct and
normally have current->mm set to NULL.  To operate on user memory, they
can "borrow" a memory context using kthread_use_mm(), which sets
current->mm to the user process's mm.

This can be observed, for example, in the USB Function Filesystem (FFS)
driver.  The ffs_user_copy_worker() handles AIO completions and uses
kthread_use_mm() to copy data to a user-space buffer.  If a page fault
occurs during this copy, the fault handler executes in the kthread's
context.

At this point, current is the kthread, but current->mm points to the user
process's mm.  Since the rss_stat event (from the page fault) is for that
same mm, the condition current->mm == mm becomes true, causing curr to be
incorrectly set to true when the trace event is emitted.

This is misleading because it suggests the mm belongs to the kthread,
confusing userspace tools that track per-process RSS changes and
corrupting their mm_id-to-process association.

Fix this by ensuring curr is always false when the trace event is emitted
from a kthread context by checking for the PF_KTHREAD flag.

Link: https://lkml.kernel.org/r/20260219233708.1971199-1-kaleshsingh@google.com
Link: https://perfetto.dev/
Fixes: e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/kfence: fix KASAN hardware tag faults during late enablement
Alexander Potapenko [Fri, 20 Feb 2026 14:49:40 +0000 (15:49 +0100)]
mm/kfence: fix KASAN hardware tag faults during late enablement

When KASAN hardware tags are enabled, re-enabling KFENCE late (via
/sys/module/kfence/parameters/sample_interval) causes KASAN faults.

This happens because the KFENCE pool and metadata are allocated via the
page allocator, which tags the memory, while KFENCE continues to access it
using untagged pointers during initialization.

Use __GFP_SKIP_KASAN for late KFENCE pool and metadata allocations to
ensure the memory remains untagged, consistent with early allocations from
memblock.  To support this, add __GFP_SKIP_KASAN to the allowlist in
__alloc_contig_verify_gfp_mask().

Link: https://lkml.kernel.org/r/20260220144940.2779209-1-glider@google.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Ernesto Martinez Garcia <ernesto.martinezgarcia@tugraz.at>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/damon/core: disallow non-power of two min_region_sz
SeongJae Park [Sat, 14 Feb 2026 21:41:21 +0000 (13:41 -0800)]
mm/damon/core: disallow non-power of two min_region_sz

DAMON core uses min_region_sz parameter value as the DAMON region
alignment.  The alignment is made using ALIGN() and ALIGN_DOWN(), which
support only the power of two alignments.  But DAMON core API callers can
set min_region_sz to an arbitrary number.  Users can also set it
indirectly, using addr_unit.

When the alignment is not properly set, DAMON behavior becomes difficult
to expect and understand, makes it effectively broken.  It doesn't cause a
kernel crash-like significant issue, though.

Fix the issue by disallowing min_region_sz input that is not a power of
two.  Add the check to damon_commit_ctx(), as all DAMON API callers who
set min_region_sz uses the function.

This can be a sort of behavioral change, but it does not break users, for
the following reasons.  As the symptom is making DAMON effectively broken,
it is not reasonable to believe there are real use cases of non-power of
two min_region_sz.  There is no known use case or issue reports from the
setup, either.

In future, if we find real use cases of non-power of two alignments and we
can support it with low enough overhead, we can consider moving the
restriction.  But, for now, simply disallowing the corner case should be
good enough as a hot fix.

Link: https://lkml.kernel.org/r/20260214214124.87689-1-sj@kernel.org
Fixes: d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Quanmin Yan <yanquanmin1@huawei.com>
Cc: <stable@vger.kernel.org> [6.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoSquashfs: check metadata block offset is within range
Phillip Lougher [Tue, 17 Feb 2026 05:09:55 +0000 (05:09 +0000)]
Squashfs: check metadata block offset is within range

Syzkaller reports a "general protection fault in squashfs_copy_data"

This is ultimately caused by a corrupted index look-up table, which
produces a negative metadata block offset.

This is subsequently passed to squashfs_copy_data (via
squashfs_read_metadata) where the negative offset causes an out of bounds
access.

The fix is to check that the offset is within range in
squashfs_read_metadata.  This will trap this and other cases.

Link: https://lkml.kernel.org/r/20260217050955.138351-1-phillip@squashfs.org.uk
Fixes: f400e12656ab ("Squashfs: cache operations")
Reported-by: syzbot+a9747fe1c35a5b115d3f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/699234e2.a70a0220.2c38d7.00e2.GAE@google.com/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMAINTAINERS, mailmap: update e-mail address for Vlastimil Babka
Vlastimil Babka (SUSE) [Tue, 17 Feb 2026 10:21:52 +0000 (11:21 +0100)]
MAINTAINERS, mailmap: update e-mail address for Vlastimil Babka

Hopefully improve e-mail performance.

Link: https://lkml.kernel.org/r/20260217102151.10425-2-vbabka@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoliveupdate: luo_file: remember retrieve() status
Pratyush Yadav (Google) [Mon, 16 Feb 2026 13:22:19 +0000 (14:22 +0100)]
liveupdate: luo_file: remember retrieve() status

LUO keeps track of successful retrieve attempts on a LUO file.  It does so
to avoid multiple retrievals of the same file.  Multiple retrievals cause
problems because once the file is retrieved, the serialized data
structures are likely freed and the file is likely in a very different
state from what the code expects.

The retrieve boolean in struct luo_file keeps track of this, and is passed
to the finish callback so it knows what work was already done and what it
has left to do.

All this works well when retrieve succeeds.  When it fails,
luo_retrieve_file() returns the error immediately, without ever storing
anywhere that a retrieve was attempted or what its error code was.  This
results in an errored LIVEUPDATE_SESSION_RETRIEVE_FD ioctl to userspace,
but nothing prevents it from trying this again.

The retry is problematic for much of the same reasons listed above.  The
file is likely in a very different state than what the retrieve logic
normally expects, and it might even have freed some serialization data
structures.  Attempting to access them or free them again is going to
break things.

For example, if memfd managed to restore 8 of its 10 folios, but fails on
the 9th, a subsequent retrieve attempt will try to call
kho_restore_folio() on the first folio again, and that will fail with a
warning since it is an invalid operation.

Apart from the retry, finish() also breaks.  Since on failure the
retrieved bool in luo_file is never touched, the finish() call on session
close will tell the file handler that retrieve was never attempted, and it
will try to access or free the data structures that might not exist, much
in the same way as the retry attempt.

There is no sane way of attempting the retrieve again.  Remember the error
retrieve returned and directly return it on a retry.  Also pass this
status code to finish() so it can make the right decision on the work it
needs to do.

This is done by changing the bool to an integer.  A value of 0 means
retrieve was never attempted, a positive value means it succeeded, and a
negative value means it failed and the error code is the value.

Link: https://lkml.kernel.org/r/20260216132221.987987-1-pratyush@kernel.org
Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks")
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>