git-server-git.apps.pok.os.sepia.ceph.com Git

fs/ceph/dir: fix `i_nlink` underrun during async unlink

During async unlink, we drop the `i_nlink` counter before we receive
the completion (that will eventually update the `i_nlink`) because "we
assume that the unlink will succeed".  That is not a bad idea, but it
races against deletions by other clients (or against the completion of
our own unlink) and can lead to an underrun which emits a WARNING like
this one:

WARNING: CPU: 85 PID: 25093 at fs/inode.c:407 drop_nlink+0x50/0x68
Modules linked in:
CPU: 85 UID: 3221252029 PID: 25093 Comm: php-cgi8.1 Not tainted 6.14.11-cm4all1-ampere #655
Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : drop_nlink+0x50/0x68
lr : ceph_unlink+0x6c4/0x720
sp : ffff80012173bc90
x29: ffff80012173bc90 x28: ffff086d0a45aaf8 x27: ffff0871d0eb5680
x26: ffff087f2a64a718 x25: 0000020000000180 x24: 0000000061c88647
x23: 0000000000000002 x22: ffff07ff9236d800 x21: 0000000000001203
x20: ffff07ff9237b000 x19: ffff088b8296afc0 x18: 00000000f3c93365
x17: 0000000000070000 x16: ffff08faffcbdfe8 x15: ffff08faffcbdfec
x14: 0000000000000000 x13: 45445f65645f3037 x12: 34385f6369706f74
x11: 0000a2653104bb20 x10: ffffd85f26d73290 x9 : ffffd85f25664f94
x8 : 00000000000000c0 x7 : 0000000000000000 x6 : 0000000000000002
x5 : 0000000000000081 x4 : 0000000000000481 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff08727d3f91e8
Call trace:
  drop_nlink+0x50/0x68 (P)
  vfs_unlink+0xb0/0x2e8
  do_unlinkat+0x204/0x288
  __arm64_sys_unlinkat+0x3c/0x80
  invoke_syscall.constprop.0+0x54/0xe8
  do_el0_svc+0xa4/0xc8
  el0_svc+0x18/0x58
  el0t_64_sync_handler+0x104/0x130
  el0t_64_sync+0x154/0x158

In ceph_unlink(), a call to ceph_mdsc_submit_request() submits the
CEPH_MDS_OP_UNLINK to the MDS, but does not wait for completion.

Meanwhile, between this call and the following drop_nlink() call, a
worker thread may process a CEPH_CAP_OP_IMPORT, CEPH_CAP_OP_GRANT or
just a CEPH_MSG_CLIENT_REPLY (the latter of which could be our own
completion).  These will lead to a set_nlink() call, updating the
`i_nlink` counter to the value received from the MDS.  If that new
`i_nlink` value happens to be zero, it is illegal to decrement it
further.  But that is exactly what ceph_unlink() will do then.

The WARNING can be reproduced this way:

1. Force async unlink; only the async code path is affected.  Having
   no real clue about Ceph internals, I was unable to find out why the
   MDS wouldn't give me the "Fxr" capabilities, so I patched
   get_caps_for_async_unlink() to always succeed.

   (Note that the WARNING dump above was found on an unpatched kernel,
   without this kludge - this is not a theoretical bug.)

2. Add a sleep call after ceph_mdsc_submit_request() so the unlink
   completion gets handled by a worker thread before drop_nlink() is
   called.  This guarantees that the `i_nlink` is already zero before
   drop_nlink() runs.

The solution is to skip the counter decrement when it is already zero,
but doing so without a lock is still racy (TOCTOU).  Since
ceph_fill_inode() and handle_cap_grant() both hold the
`ceph_inode_info.i_ceph_lock` spinlock while set_nlink() runs, this
seems like the proper lock to protect the `i_nlink` updates.

I found prior art in NFS and SMB (using `inode.i_lock`) and AFS (using
`afs_vnode.cb_lock`).  All three have the zero check as well.

Fixes: 2ccb45462aea ("ceph: perform asynchronous unlink if we have sufficient caps")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

fs/ceph: add a bunch of missing ceph_path_info initializers

ceph_mdsc_build_path() must be called with a zero-initialized
ceph_path_info parameter, or else the following
ceph_mdsc_free_path_info() may crash.

Example crash (on Linux 6.18.12):

  virt_to_cache: Object is not a Slab page!
  WARNING: CPU: 184 PID: 2871736 at mm/slub.c:6732 kmem_cache_free+0x316/0x400
  [...]
  Call Trace:
   [...]
   ceph_open+0x13d/0x3e0
   do_dentry_open+0x134/0x480
   vfs_open+0x2a/0xe0
   path_openat+0x9a3/0x1160
  [...]
  cache_from_obj: Wrong slab cache. names_cache but object is from ceph_inode_info
  WARNING: CPU: 184 PID: 2871736 at mm/slub.c:6746 kmem_cache_free+0x2dd/0x400
  [...]
  kernel BUG at mm/slub.c:634!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  RIP: 0010:__slab_free+0x1a4/0x350

Some of the ceph_mdsc_build_path() callers had initializers, but
others had not, even though they were all added by
commit 15f519e9f883 ("ceph: fix race condition validating r_parent
before applying state").
The ones without initializer are suspectible to random
crashes.  (I can imagine it could even be possible to exploit this bug
to elevate privileges.)

Unfortunately, these Ceph functions are undocumented and its semantics
can only be derived from the code.  I see that ceph_mdsc_build_path()
initializes the structure only on success, but not on error.

Calling ceph_mdsc_free_path_info() after a failed
ceph_mdsc_build_path() call does not even make sense, but that's what
all callers do, and for it to be safe, the structure must be
zero-initialized.  The least intrusive approach to fix this is
therefore to add initializers everywhere.

Fixes: 15f519e9f883 ("ceph: fix race condition validating r_parent before applying state")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package

Commit 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package
manually") effectively reverted commit a7c699d090a1 ("kbuild: rpm-pkg:
build a debuginfo RPM") but the approach it took is not safe for older
RPM releases. Restore commit a7c699d090a1 ("kbuild: rpm-pkg: build a
debuginfo RPM") for the !CONFIG_MODULE_SIG case to allow more
environments and configurations to take advantage of the separate debug
information package process.

Cc: stable@vger.kernel.org
Fixes: 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package manually")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>

kbuild: rpm-pkg: Restrict manual debug package creation

Commit 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package
manually") moved away from the built-in RPM machinery for generating
-debuginfo packages to a more manual way to be compatible with module
signing, as the built-in machinery strips the modules after the
installation process, breaking the signatures.

Unfortunately, prior to rpm 4.20.0, there is a bug where a custom %files
directive is ignored for a -debuginfo subpackage [1], meaning builds
using older versions of RPM (such as on RHEL9 or RHEL10) fail with:

  Checking for unpackaged file(s): /usr/lib/rpm/check-files .../rpmbuild/BUILDROOT/kernel-6.19.0_dirty-1.x86_64
  error: Installed (but unpackaged) file(s) found:
     /debuginfo.list
     /usr/lib/debug/.build-id/09/748c214974bfba1522d434a7e0a02e2fd7f29b.debug
     /usr/lib/debug/.build-id/0b/b96dd9c7d3689d82e56d2e73b46f53103cc6c7.debug
     /usr/lib/debug/.build-id/0e/979a2f34967c7437fd30aabb41de1f0c8b6a66.debug
    ...

To workaround this, restrict the manual debug info package creation
process to when it is necessary (CONFIG_MODULE_SIG=y) and possible (when
using RPM >= 4.20.0). A follow up change will restore the RPM debuginfo
creation process using a separate internal flag to allow the package to
be built in more situations, as RPM 4.20.0 is a fairly recent version
and the built-in -debuginfo generation works fine when module signing is
disabled.

Cc: stable@vger.kernel.org
Fixes: 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package manually")
Link: https://github.com/rpm-software-management/rpm/commit/49f906998f3cf1f4152162ca61ac0869251c380f
Reported-by: Steve French <smfrench@gmail.com>
Closes: https://lore.kernel.org/CAH2r5mugbrHTwnaQwQiYEUVwbtqmvFYf0WZiLrrJWpgT8iwftw@mail.gmail.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>

ceph: minor cleanup in ceph_fscrypt_decrypt_extents()

The Coverity Scan service has reported a potential issue
in ceph_fscrypt_decrypt_extents() method [1]. The function
ceph_fscrypt_decrypt_page() can return the negative value as
an error code. Logic of ceph_fscrypt_decrypt_extents()
process this case in correct way. However, it makes sense
to make the minor cleanup of the function logic.

This patch adds several unlikely macros to conditions checks
and it reworks fret variable check by adding else statement
to the condition check.

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1662519

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Ceph Development <ceph-devel@vger.kernel.org>

ceph: fix potential overflow in parse_reply_info_dir()

The parse_reply_info_dir() logic tries to parse
a dir fragment:

struct ceph_mds_reply_dirfrag {
__le32 frag;            /* fragment */
__le32 auth;            /* auth mds, if this is a delegation point */
__le32 ndist;           /* number of mds' this is replicated on */
__le32 dist[];
} __attribute__ ((packed));

Potentially, ndist field could be corrupted or to have
invalid or malicious value. As a result, this logic
could result in overflow:

*p += sizeof(**dirfrag) + sizeof(u32) * le32_to_cpu((*dirfrag)->ndist);

Al Viro suggested the initial vision of the fix.
The suggested fix was partially reworked.

This patch adds the checking that ndist is not bigger
than (U32_MAX / sizeof(u32)) and to check that we have
enough space in memory buffer by means of ceph_decode_need().

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Ceph Development <ceph-devel@vger.kernel.org>

ceph: cleanup in check_new_map() method

The Coverity Scan service has reported a potential issue
in check_new_map() method [1]. The check_new_map() executes
checking of newmap->m_info on NULL in the beginning of
the method. However, it operates by newmap->m_info later
in the method without any check on NULL. Analysis of the code
flow shows that ceph_mdsmap_decode() guarantees the allocation
of m_info array. And check_new_map() never will be called
with newmap->m_info not allocated.

This patch exchanges checking of newmap->m_info on BUG_ON()
pattern because the situation of having NULL in newmap->m_info
during check_new_map() is not expecting event. Also, this patch
reworks logic of __open_export_target_sessions(),
ceph_mdsmap_get_addr(), ceph_mdsmap_get_state(), and
ceph_mdsmap_is_laggy() by checking mdsmap->m_info on NULL value.

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1491799

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Ceph Development <ceph-devel@vger.kernel.org>

ceph: cleanup of processing ci->i_ceph_flags bits

The Coverity Scan service has detected potential
race condition in ceph_check_delayed_caps() [1].

The CID 1590633 contains explanation: "Accessing
ci->i_ceph_flags without holding lock
ceph_inode_info.i_ceph_lock. The value of the shared data
will be determined by the interleaving of thread execution.
Thread shared data is accessed without holding an appropriate
lock, possibly causing a race condition (CWE-366)".

The patch reworks the logic of accessing ci->i_ceph_flags.
At first, it removes ci item from a mdsc->cap_delay_list.
Then it unlocks mdsc->cap_delay_lock and it locks
ci->i_ceph_lock. Then, it calls smp_mb__before_atomic()
to be sure that ci->i_ceph_flags has consistent state of
the bits. The is_metadata_under_flush variable stores
the state of CEPH_I_FLUSH_BIT. Finally, it unlocks
the ci->i_ceph_lock and it locks the mdsc->cap_delay_lock.
The is_metadata_under_flush is used to check the condition
that ci needs to be removed from mdsc->cap_delay_list.
If it is not the case, then ci will be added into the head of
mdsc->cap_delay_list.

This patch reworks the logic of checking the CEPH_I_FLUSH_BIT,
CEPH_I_FLUSH_SNAPS_BIT, CEPH_I_KICK_FLUSH_BIT,
CEPH_ASYNC_CREATE_BIT, CEPH_I_ERROR_FILELOCK_BIT by test_bit()
method and calling smp_mb__before_atomic() to ensure that
bit state is consistent. It switches on calling the set_bit(),
clear_bit() for these bits, and calling smp_mb__after_atomic()
after these methods to ensure that modified bit is visible.

Additionally, __must_hold() has been added for
__cap_delay_requeue(), __cap_delay_requeue_front(), and
__prep_cap() to help the sparse with lock checking and
it was commented that caller of __cap_delay_requeue_front()
and __prep_cap() must lock the ci->i_ceph_lock.

v.2
Alex Markuze suggested to rework all Ceph inode's flags.
Now, every declaration has CEPH_I_<*> and CEPH_I_<*>_BIT pair.

v.3
The logic of operating by ci->i_ceph_flags bits on using
test_bit(), clear_bit(), set_bit() and smp_mb__before_atomic(),
smp_mb__after_atomic() has been reworked in addr.c, inode.c,
locks.c, mds_client.c, snap.c, super.h, xattr.c additionally
to caps.c.

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1590633

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Ceph Development <ceph-devel@vger.kernel.org>

ceph: cleanup in __ceph_do_pending_vmtruncate() method

The Coverity Scan service has detected an unchecked return
value in __ceph_do_pending_vmtruncate() method [1].

The CID 114041 contains explanation: " Calling
filemap_write_and_wait_range without checking return value.
If the function returns an error value, the error value
may be mistaken for a normal value. Value returned from
a function is not checked for errors before being used.
(CWE-252)".

The patch adds the checking of returned value of
filemap_write_and_wait_range() and reporting the error
message if something wrong is happened during the call.
It reworks the logic by removing the jump to retry
label because it could be the reason of potential
infinite loop in the case of error condition during
the filemap_write_and_wait_range() call. It was removed
the check to == ci->i_truncate_pagecache_size because
the to variable is set by ci->i_truncate_pagecache_size
in the above code logic. The uneccessary finish variable
has been removed because the to variable always has
ci->i_truncate_pagecache_size value. Now if the condition
ci->i_truncate_pending == 0 is true then logic will jump
to the end of the function and wake_up_all(&ci->i_cap_wq)
will be called.

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=114041

cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Ceph Development <ceph-devel@vger.kernel.org>
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

ceph: fix potential race condition of i_cap_delay_list access

The Coverity Scan service has detected potential
race condition of i_cap_delay_list access [1].
The CID 1596363 contains explanation: "Accessing
ci->i_cap_delay_list without holding lock
ceph_mds_client.cap_delay_lock. Elsewhere,
ceph_inode_info.i_cap_delay_list is written to with
ceph_mds_client.cap_delay_lock held 9 out of 9 times.
The value of the shared data will be determined
by the interleaving of thread execution. In ceph_check_caps:
Thread shared data is accessed without holding an appropriate
lock, possibly causing a race condition (CWE-366)".

The patch reworks __cap_delay_cancel() logic by means
moving list_empty(&ci->i_cap_delay_list) under
mdsc->cap_delay_lock protection. Patch introduces
is_cap_delay_list_empty_safe() function that checks
the emptiness of i_cap_delay_list under
mdsc->cap_delay_lock protection. This function is used
in ceph_check_caps() and __ceph_touch_fmode() methods
to resolve the race condition issue.

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1596363

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

ceph: fix overflowed value issue in ceph_submit_write()

The Coverity Scan service has detected overflowed value
issue in ceph_submit_write() [1]. The CID 1646339 defect
contains explanation: "The overflowed value due to
arithmetic on constants is too small or unexpectedly
negative, causing incorrect computations.
In ceph_submit_write: Integer overflow occurs in
arithmetic on constant operands (CWE-190)".

This patch adds a check ceph_wbc->locked_pages on
equality to zero and it exits function if it has
zero value. Also, it introduces a processed_pages
variable with the goal of assigning the number of
processed pages and checking this number on
equality to zero. The check of processed_pages
variable on equality to zero should protect from
overflowed value of index that selects page in
ceph_wbc->pages[index] array.

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1646339

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

ceph: fix ceph_fallocate() ignoring of FALLOC_FL_ALLOCATE_RANGE mode

The fio test reveals the issue for the case of file size
is not aligned on 4K (for example, 4122, 8600, 10K etc).
The reproducing path:

target_dir=/mnt/cephfs
report_dir=/report
size=100ki
nrfiles=10
pattern=0x74657374

fio --runtime=5M --rw=write --bs=4k --size=$size \
--nrfiles=$nrfiles --numjobs=16 --buffer_pattern=0x74657374 \
--iodepth=1 --direct=0 --ioengine=libaio --group_reporting \
--name=fiotest --directory=$target_dir \
--output $report_dir/sequential_write.log

fio --runtime=5M --verify_only --verify=pattern \
--verify_pattern=0x74657374 --size=$size --nrfiles=$nrfiles \
--numjobs=16 --bs=4k --iodepth=1 --direct=0 --name=fiotest \
--ioengine=libaio --group_reporting --verify_fatal=1 \
--verify_state_save=0 --directory=$target_dir \
--output $report_dir/verify_sequential_write.log

The essence of the issue that the write phase calls
the fallocate() to pre-allocate 10K of file size and, then,
it writes only 8KB of data. However, CephFS code
in ceph_fallocate() ignores the FALLOC_FL_ALLOCATE_RANGE
mode and, finally, file is 8K in size only. As a result,
verification phase initiates wierd behaviour of CephFS code.
CephFS code calls ceph_fallocate() again and completely
re-write the file content by some garbage. Finally,
verification phase fails because file contains unexpected
data pattern.

fio: got pattern 'd0', wanted '74'. Bad bits 3
fio: bad pattern block offset 0
pattern: verify failed at file /mnt/cephfs/fiotest.3.0 offset 0, length 2631490270 (requested block: offset=0, length=4096, flags=8)
fio: verify type mismatch (36969 media, 18 given)
fio: got pattern '25', wanted '74'. Bad bits 3
fio: bad pattern block offset 0
pattern: verify failed at file /mnt/cephfs/fiotest.4.0 offset 0, length 1694436820 (requested block: offset=0, length=4096, flags=8)
fio: verify type mismatch (6714 media, 18 given)

Expected state ot the file:

hexdump -C ./fiotest.0.0
00000000 74 65 73 74 74 65 73 74 74 65 73 74 74 65 73 74 |testtesttesttest| *
00002000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| *
00002190 00 00 00 00 00 00 00 00 |........|
00002198

Real state of the file:

head -n 2 ./fiotest.0.0
00000000 35 e0 28 cc 38 a0 99 16 06 9c 6a a9 f2 cd e9 0a |5.(.8.....j.....|
00000010 80 53 2a 07 09 e5 0d 15 70 4a 25 f7 0b 39 9d 18 |.S*.....pJ%..9..|

The patch reworks ceph_fallocate() method by means of adding
support of FALLOC_FL_ALLOCATE_RANGE mode. Also, it adds the checking
that new size can be allocated by means of checking inode_newsize_ok(),
fsc->max_file_size, and ceph_quota_is_max_bytes_exceeded().
Invalidation and making dirty logic is moved into dedicated
methods.

There is one peculiarity for the case of generic/103 test.
CephFS logic receives max_file_size from MDS server and it's 1TB
by default. As a result, generic/103 can fail if max_file_size
is smaller than volume size:

generic/103 6s ... - output mismatch (see /home/slavad/XFSTESTS/xfstests-dev/results//generic/103.out.bad)

ceph: add process/thread ID into debug output

Process/Thread ID (pid) is crucial and essential info
during the debug and bug fix. It is really hard
to analyze the debug output without these details.
This patch addes PID info into the debug output.

Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

ceph: cleanup the sessions when peer reset

The reconnect feature never been supported by MDS in mds non-RECONNECT
state. This reconnect requests will incorrectly close the just reopened
sessions when the MDS kills them during the "mds_session_blocklist_on_evict"
option is disabled.

Fixes: 7e70f0ed9f3e ("ceph: attempt mds reconnect if mds closes our session")
URL: https://tracker.ceph.com/issues/65647
Signed-off-by: Xiubo Li <xiubli@redhat.com>

ceph: defer clearing the CEPH_I_FLUSH_SNAPS flag

Clear the flag just after the capsnap request being sent out. Else the
ceph_check_caps() will race with it and send the cap update request
just before this capsnap request. Which will cause the cap update request
to miss setting the CEPH_CLIENT_CAPS_PENDING_CAPSNAP flag and finally
the mds will drop the capsnap request to floor.

URL: https://tracker.ceph.com/issues/64209
URL: https://tracker.ceph.com/issues/65705
Signed-off-by: Xiubo Li <xiubli@redhat.com>

ceph: make the ceph-cap workqueue UNBOUND

There is not harm to mark the ceph-cap workqueue unbounded, just
like we do in ceph-inode workqueue.

URL: https://www.spinics.net/lists/ceph-users/msg78775.html
URL: https://tracker.ceph.com/issues/64977
Reported-by: Stefan Kooman <stefan@bit.nl>
Signed-off-by: Xiubo Li <xiubli@redhat.com>

ceph: return -ENODATA when xattr doesn't exist for removexattr

The POSIX says we should return -ENODATA when the corresponding
attribute doesn't exist when removing it. While there is one
exception for the acl ones in the local filesystems, for exmaple
for xfs, which will treat it as success.

While in the MDS side there have two ways to remove the xattr:
sending a CEPH_MDS_OP_SETXATTR request by setting the 'flags' with
CEPH_XATTR_REMOVE and just issued a CEPH_MDS_OP_RMXATTR request
directly.

For the first one it will always return 0 when the corresponding
xattr doesn't exist, while for the later one it will return
-ENODATA instead, this should be fixed in MDS to make them to be
consistent.

And at the same time added a new flags CEPH_XATTR_REMOVE2 and in
MDS side it will return -ENODATA when the xattr doesn't exist.
While the CEPH_XATTR_REMOVE will be kept to be compatible with
old cephs.

Please note this commit also fixed a bug, which is that even when
the ACL xattrs don't exist the ctime/mode still will be updated.

URL: https://tracker.ceph.com/issues/64679
Signed-off-by: Xiubo Li <xiubli@redhat.com>

[DO NOT MERGE]ceph: add more debug log when we hitting no inode or caps

It's so strange that the caps in client side is removed but still exists
in MDS.

URL: https://tracker.ceph.com/issues/64977
Signed-off-by: Xiubo Li <xiubli@redhat.com>

[DO NOT MERGE] ceph: BUG if MDS changed truncate_seq with client caps still outstanding

We need to trigger to crash the kernel and fail the qa tests to
get more infomation about the bug.

URL: https://tracker.ceph.com/issues/56693
Signed-off-by: Xiubo Li <xiubli@redhat.com>

[DO NOT MERGE] ceph: make sure all the files successfully put before unmounting

When close a file it will be deferred to call the fput(), which
will hold the inode's i_count. And when unmounting the mountpoint
the evict_inodes() may skip evicting some inodes.

If encrypt is enabled the kernel generate a warning when removing
the encrypt keys when the skipped inodes still hold the keyring:

WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0
CPU: 4 PID: 168846 Comm: umount Tainted: G S  6.1.0-rc5-ceph-g72ead199864c #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0
RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00
RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000
RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000
R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40
R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000
FS:  00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
generic_shutdown_super+0x47/0x120
kill_anon_super+0x14/0x30
ceph_kill_sb+0x36/0x90 [ceph]
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x67/0xb0
exit_to_user_mode_prepare+0x23d/0x240
syscall_exit_to_user_mode+0x25/0x60
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd83dc39e9b

URL: https://tracker.ceph.com/issues/58126
Signed-off-by: Xiubo Li <xiubli@redhat.com>

[DO NOT MERGE] mm: BUG if filemap_alloc_folio gives us a folio with a non-NULL ->private

We've seen some instances where we call __filemap_get_folio and get back
one with a ->private value that is non-NULL. Let's have the allocator
bug if that happens.

For now, let's just put this into the testing kernel. We can let Willy
decide if he wants it in mainline.

URL: https://tracker.ceph.com/issues/55421
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>

[DO NOT MERGE] ceph: dump info about cap flushes when we're waiting too long for them

We've had some cases of hung umounts in teuthology testing. It looks
like client is waiting for cap flushes to complete, but they aren't.

Add a field to the inode to track the highest cap flush tid seen for
that inode. Also, add a backpointer to the inode to the ceph_cap_flush
struct.

Change wait_caps_flush to wait 60s, and then dump info about the
condition of the list.

Also, print pr_info messages if we end up dropping a FLUSH_ACK for an
inode onto the floor, or if we get a message on an unregistered
session.

Reported-by: Patrick Donnelly <pdonnell@redhat.com>
URL: https://tracker.ceph.com/issues/51279
Signed-off-by: Jeff Layton <jlayton@kernel.org>

[DO NOT MERGE] rbd: bump RBD_MAX_PARENT_CHAIN_LEN to 128

Bump RBD_MAX_PARENT_CHAIN_LEN from 16 to 128 to avoid fsx failures.

(The alternative is changing fsx to flatten unconditionally when the
limit of 16 is reached, which is ugly and not needed for librbd.)

ceph: assert loop invariants in ceph_writepages_start()

If `locked_pages` is zero, the page array must not be allocated:
ceph_process_folio_batch() uses `locked_pages` to decide when to
allocate `pages`, and redundant allocations trigger
ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
writeback stall) or even a kernel panic. Consequently, the main loop in
ceph_writepages_start() assumes that the lifetime of `pages` is confined
to a single iteration.

This expectation is currently not clear enough, as evidenced by the
recent patch which fixed an oops caused by `pages` persisting into
the next loop iteration:
- "ceph: do not propagate page array emplacement errors as batch errors"

Use an explicit BUG_ON() at the top of the loop to assert the loop's
preexisting expectation that `pages` is cleaned up by the previous
iteration. Because this is closely tied to `locked_pages`, also make it
the previous iteration's responsibility to guarantee its reset, and
verify with a second new BUG_ON() instead of handling (and masking)
failures to do so.

This patch does not change invariants, behavior, or failure modes.
The added BUG_ON() lines catch conditions that would already trigger oops,
but do so earlier for easier debugging and programmer clarity.

Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove error return from ceph_process_folio_batch()

Following an earlier commit, ceph_process_folio_batch() no longer
returns errors because the writeback loop cannot handle them.

Since this function already indicates failure to lock any pages by
leaving `ceph_wbc.locked_pages == 0`, and the writeback loop has no way
to handle abandonment of a locked batch, change the return type of
ceph_process_folio_batch() to `void` and remove the pathological goto in
the writeback loop. The lack of a return code emphasizes that
ceph_process_folio_batch() is designed to be abort-free: that is, once
it commits a folio for writeback, it will not later abandon it or
propagate an error for that folio. Any future changes requiring "abort"
logic should follow this invariant by cleaning up its array and
resetting ceph_wbc.locked_pages appropriately.

Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix write storm on fscrypted files

CephFS stores file data across multiple RADOS objects. An object is the
atomic unit of storage, so the writeback code must clean only folios
that belong to the same object with each OSD request.

CephFS also supports RAID0-style striping of file contents: if enabled,
each object stores multiple unbroken "stripe units" covering different
portions of the file; if disabled, a "stripe unit" is simply the whole
object. The stripe unit is (usually) reported as the inode's block size.

Though the writeback logic could, in principle, lock all dirty folios
belonging to the same object, its current design is to lock only a
single stripe unit at a time. Ever since this code was first written,
it has determined this size by checking the inode's block size.
However, the relatively-new fscrypt support needed to reduce the block
size for encrypted inodes to the crypto block size (see 'fixes' commit),
which causes an unnecessarily high number of write operations (~1024x as
many, with 4MiB objects) and correspondingly degraded performance.

Fix this (and clarify intent) by using i_layout.stripe_unit directly in
ceph_define_write_size() so that encrypted inodes are written back with
the same number of operations as if they were unencrypted.

This patch depends on the preceding commit ("ceph: do not propagate page
array emplacement errors as batch errors") for correctness. While it
applies cleanly on its own, applying it alone will introduce a
regression. This dependency is only relevant for kernels where
ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") has
been applied; stable kernels without that commit are unaffected.

Cc: stable@vger.kernel.org
Fixes: 94af0470924c ("ceph: add some fscrypt guardrails")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: do not propagate page array emplacement errors as batch errors

When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
because it needs to allocate bounce buffers to store the encrypted
versions of each folio. Each folio beyond the first allocates its bounce
buffer with GFP_NOWAIT. Failures are common (and expected) under this
allocation mode; they should flush (not abort) the batch.

However, ceph_process_folio_batch() uses the same `rc` variable for its
own return code and for capturing the return codes of its routine calls;
failing to reset `rc` back to 0 results in the error being propagated
out to the main writeback loop, which cannot actually tolerate any
errors here: once `ceph_wbc.pages` is allocated, it must be passed to
ceph_submit_write() to be freed. If it survives until the next iteration
(e.g. due to the goto being followed), ceph_allocate_page_array()'s
BUG_ON() will oops the worker.

Note that this failure mode is currently masked due to another bug
(addressed next in this series) that prevents multiple encrypted folios
from being selected for the same write.

For now, just reset `rc` when redirtying the folio to prevent errors in
move_dirty_folio_in_page_array() from propagating. Note that
move_dirty_folio_in_page_array() is careful never to return errors on
the first folio, so there is no need to check for that. After this
change, ceph_process_folio_batch() no longer returns errors; its only
remaining failure indicator is `locked_pages == 0`, which the caller
already handles correctly.

Cc: stable@vger.kernel.org
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: supply snapshot context in ceph_uninline_data()

The ceph_uninline_data function was missing proper snapshot context
handling for its OSD write operations. Both CEPH_OSD_OP_CREATE and
CEPH_OSD_OP_WRITE requests were passing NULL instead of the appropriate
snapshot context, which could lead to unnecessary object clone.

Reproducer:
../src/vstart.sh --new -x --localhost --bluestore
// turn on cephfs inline data
./bin/ceph fs set a inline_data true --yes-i-really-really-mean-it
// allow fs_a client to take snapshot
./bin/ceph auth caps client.fs_a mds 'allow rwps fsname=a' mon 'allow r fsname=a' osd 'allow rw tag cephfs data=a'
// mount cephfs with fuse, since kernel cephfs doesn't support inline write
ceph-fuse --id fs_a -m 127.0.0.1:40318 --conf ceph.conf -d /mnt/mycephfs/
// bump snapshot seq
mkdir /mnt/mycephfs/.snap/snap1
echo "foo" > /mnt/mycephfs/test
// umount and mount it again using kernel cephfs client
umount /mnt/mycephfs
mount -t ceph fs_a@.a=/ /mnt/mycephfs/ -o conf=./ceph.conf
echo "bar" >> /mnt/mycephfs/test
./bin/rados listsnaps -p cephfs.a.data $(printf "%x\n" $(stat -c %i /mnt/mycephfs/test)).00000000

will see this object does unnecessary clone
1000000000a.00000000 (seq:2):
cloneid snaps   size    overlap
2       2       4       []
head    -       8

but it's expected to see
10000000000.00000000 (seq:2):
cloneid snaps   size    overlap
head    -       8

since there's no snapshot between these 2 writes

clone happened because the first osd request CEPH_OSD_OP_CREATE doesn't
pass snap context so object is created with snap seq 0, but later data
writeback is equipped with snapshot context.
snap.seq(1) > object snap seq(0), so osd does object clone.

This fix properly acquiring the snapshot context before performing
write operations.

Signed-off-by: ethanwu <ethanwu@synology.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: supply snapshot context in ceph_zero_partial_object()

The ceph_zero_partial_object function was missing proper snapshot
context for its OSD write operations, which could lead to data
inconsistencies in snapshots.

Reproducer:
../src/vstart.sh --new -x --localhost --bluestore
./bin/ceph auth caps client.fs_a mds 'allow rwps fsname=a' mon 'allow r fsname=a' osd 'allow rw tag cephfs data=a'
mount -t ceph fs_a@.a=/ /mnt/mycephfs/ -o conf=./ceph.conf
dd if=/dev/urandom of=/mnt/mycephfs/foo bs=64K count=1
mkdir /mnt/mycephfs/.snap/snap1
md5sum /mnt/mycephfs/.snap/snap1/foo
fallocate -p -o 0 -l 4096 /mnt/mycephfs/foo
echo 3 > /proc/sys/vm/drop/caches
md5sum /mnt/mycephfs/.snap/snap1/foo # get different md5sum!!

Cc: stable@vger.kernel.org
Fixes: ad7a60de882ac ("ceph: punch hole support")
Signed-off-by: ethanwu <ethanwu@synology.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: adapt ceph_x_challenge_blob hashing and msgr1 message signing

The existing approach where ceph_x_challenge_blob is encrypted with the
client's secret key and then the digest derived from the ciphertext is
used for the test doesn't work with CEPH_CRYPTO_AES256KRB5 because the
confounder randomizes the ciphertext: the client and the server get two
different ciphertexts and therefore two different digests.

msgr1 signatures are affected the same way: a digest derived from the
ciphertext for the message's "sigblock" is what becomes a signature and
the two sides disagree on the expected value.

For CEPH_CRYPTO_AES256KRB5 (and potential future encryption schemes),
switch to HMAC-SHA256 function keyed in the same way as the existing
encryption. For CEPH_CRYPTO_AES, everything is preserved as is.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: add support for CEPH_CRYPTO_AES256KRB5

This is based on AES256-CTS-HMAC384-192 crypto algorithm per RFC 8009
(i.e. Kerberos 5, hence the name) with custom-defined key usage numbers.
The implementation allows a given key to have/be linked to between one
and three usage numbers.

The existing CEPH_CRYPTO_AES remains in place and unchanged. The
usage_slot parameter that needed to be added to ceph_crypt() and its
wrappers is simply ignored there.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: introduce ceph_crypto_key_prepare()

In preparation for bringing in a new encryption scheme/key type,
decouple decoding or cloning the key from allocating required crypto
API objects and setting them up. The rationale is that a) in some
cases a shallow clone is sufficient and b) ceph_crypto_key_prepare()
may grow additional parameters that would be inconvenient to provide
at the point the key is originally decoded.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: generalize ceph_x_encrypt_offset() and ceph_x_encrypt_buflen()

- introduce the notion of a data offset for ceph_x_encrypt_offset()
  to allow for e.g. confounder to be prepended before the encryption
  header in the future.  For CEPH_CRYPTO_AES, the data offset is 0
  (i.e. nothing is prepended).

- adjust ceph_x_encrypt_buflen() accordingly and make it account for
  PKCS#7 padding that is used by CEPH_CRYPTO_AES precisely instead of
  just always adding 16.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: define and enforce CEPH_MAX_KEY_LEN

When decoding the key, verify that the key material would fit into
a fixed-size buffer in process_auth_done() and generally has a sane
length.

The new CEPH_MAX_KEY_LEN check replaces the existing check for a key
with no key material which is a) not universal since CEPH_CRYPTO_NONE
has to be excluded and b) doesn't provide much value since a smaller
than needed key is just as invalid as no key -- this has to be handled
elsewhere anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Linux 6.19

Merge tag 'i2c-for-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:

- imx: preserve error state during SMBus block read length handling

* tag 'i2c-for-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: preserve error state in block data length handler

Merge tag 'spi-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"One final batch of fixes for the Tegra SPI drivers, the main one is a
  batch of fixes for races with the interrupts in the Tegra210 QSPI
  driver that Breno has been working on for a while"

* tag 'spi-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: tegra114: Preserve SPI mode bits in def_command1_reg
  spi: tegra: Fix a memory leak in tegra_slink_probe()
  spi: tegra210-quad: Protect curr_xfer check in IRQ handler
  spi: tegra210-quad: Protect curr_xfer clearing in tegra_qspi_non_combined_seq_xfer
  spi: tegra210-quad: Protect curr_xfer in tegra_qspi_combined_seq_xfer
  spi: tegra210-quad: Protect curr_xfer assignment in tegra_qspi_setup_transfer_one
  spi: tegra210-quad: Move curr_xfer read inside spinlock
  spi: tegra210-quad: Return IRQ_HANDLED when timeout already processed transfer

Merge tag 'regulator-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"One last fix for v6.19: the voltages for the SpaceMIT P1 were not
described correctly"

* tag 'regulator-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: spacemit-p1: Fix n_voltages for BUCK and LDO regulators

Merge tag 'char-misc-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull binder fixes from Greg KH:
"Here are some small, last-minute binder C and Rust driver fixes for
  reported issues. They include a number of fixes for reported crashes
  and other problems.

  All of these have been in linux-next this week, and longer"

* tag 'char-misc-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  binderfs: fix ida_alloc_max() upper bound
  rust_binderfs: fix ida_alloc_max() upper bound
  binder: fix BR_FROZEN_REPLY error log
  rust_binder: add additional alignment checks
  binder: fix UAF in binder_netlink_report()
  rust_binder: correctly handle FDA objects of length zero

Merge tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Miscellaneous MMCID fixes to address bugs and performance regressions
  in the recent rewrite of the SCHED_MM_CID management code:

   - Fix livelock triggered by BPF CI testing

   - Fix hard lockup on weakly ordered systems

   - Simplify the dropping of CIDs in the exit path by removing an
     unintended transition phase

   - Fix performance/scalability regression on a thread-pool benchmark
     by optimizing transitional CIDs when scheduling out"

* tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mmcid: Optimize transitional CIDs when scheduling out
  sched/mmcid: Drop per CPU CID immediately when switching to per task mode
  sched/mmcid: Protect transition on weakly ordered systems
  sched/mmcid: Prevent live lock on task to CPU mode transition

Merge tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar::

- Bump up the Clang minimum version requirements for livepatch
   builds, due to Clang assembler section handling bugs causing
   silent miscompilations

- Strip livepatching symbol artifacts from non-livepatch modules

- Fix livepatch build warnings when certain Clang LTO options
   are enabled

- Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y

* tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool/klp: Fix unexported static call key access for manually built livepatch modules
  objtool/klp: Fix symbol correlation for orphaned local symbols
  livepatch: Free klp_{object,func}_ext data after initialization
  livepatch: Fix having __klp_objects relics in non-livepatch modules
  livepatch/klp-build: Require Clang assembler >= 20

x86/vmware: Fix hypercall clobbers

Fedora QA reported the following panic:

  BUG: unable to handle page fault for address: 0000000040003e54
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20251119-3.fc43 11/19/2025
  RIP: 0010:vmware_hypercall4.constprop.0+0x52/0x90
  ..
  Call Trace:
   vmmouse_report_events+0x13e/0x1b0
   psmouse_handle_byte+0x15/0x60
   ps2_interrupt+0x8a/0xd0
   ...

because the QEMU VMware mouse emulation is buggy, and clears the top 32
bits of %rdi that the kernel kept a pointer in.

The QEMU vmmouse driver saves and restores the register state in a
"uint32_t data[6];" and as a result restores the state with the high
bits all cleared.

RDI originally contained the value of a valid kernel stack address
(0xff5eeb3240003e54).  After the vmware hypercall it now contains
0x40003e54, and we get a page fault as a result when it is dereferenced.

The proper fix would be in QEMU, but this works around the issue in the
kernel to keep old setups working, when old kernels had not happened to
keep any state in %rdi over the hypercall.

In theory this same issue exists for all the hypercalls in the vmmouse
driver; in practice it has only been seen with vmware_hypercall3() and
vmware_hypercall4().  For now, just mark RDI/RSI as clobbered for those
two calls.  This should have a minimal effect on code generation overall
as it should be rare for the compiler to want to make RDI/RSI live
across hypercalls.

Reported-by: Justin Forbes <jforbes@fedoraproject.org>
Link: https://lore.kernel.org/all/99a9c69a-fc1a-43b7-8d1e-c42d6493b41f@broadcom.com/
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
"A couple of late-breaking MM fixes. One against a new-in-this-cycle
  patch and the other addresses a locking issue which has been there for
  over a year"

* tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/memory-failure: reject unsupported non-folio compound page
  procfs: avoid fetching build ID while holding VMA lock

Merge tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

- Fix event format field alignments for 32 bit architectures

   The fields in the event format files are used to parse the raw binary
   buffer data by applications. If they are incorrect, then the
   application produces garbage.

   On 32 bit architectures, the function graph 64bit calltime and
   rettime were off by 4bytes. That's because the actual fields are in a
   packed structure but the macros used by the ftrace events did not
   mark them as packed, and instead, gave them their natural alignment
   which made their offsets off by 4 bytes.

   There are macros to have a packed field within an embedded structure
   of an event, but there's no macro for normal fields within a packed
   structure of the event. The macro __field_packed() was used for the
   packed embedded structure field. Rename that to __field_desc_packed()
   (to match the non-packed embedded field macro __field_desc()), and
   make __field_packed() for fields that are in a packed event structure
   (which matches the unpacked __field() macro).

   Switch the calltime and rettime fields of the function graph event to
   use the new __field_packed() and this makes the offsets correct.

* tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix ftrace event field alignments

Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"One RBD and two CephFS fixes which address potential oopses.

  The RBD thing is more of a rare edge case that pops up in our CI,
  while the two CephFS scenarios are regressions that were reported by
  users and can be triggered trivially in normal operation. All marked
  for stable"

* tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client:
  ceph: fix NULL pointer dereference in ceph_mds_auth_match()
  ceph: fix oops due to invalid pointer for kfree() in parse_longname()
  rbd: check for EOD after exclusive lock is ensured to be held

Merge tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:
"Two minor fixes for the DMA-mapping subsystem:

   - check for the rare case of the allocation failure of the global CMA
     pool (Shanker Donthineni)

   - avoid perf buffer overflow when tracing large scatter-gather lists
     (Deepanshu Kartikey)"

* tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma: contiguous: Check return value of dma_contiguous_reserve_area()
  tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow

Merge tag 'iommu-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fix from Joerg Roedel:

- Fix wrong definition of PASID_FLAG_PWSNP bit. This caused DMAR errors
on Arrow Lake platforms.

* tag 'iommu-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/vt-d: Treat PAGE_SNOOP and PWSNP separately

Merge tag 'pmdomain-v6.19-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:

- imx:
     - Fix system wakeup support for imx8mp power domains
     - Fix potential out-of-range access for imx8m power domains
     - Fix the imx8mm gpu hang

- qcom: Fix off-by-one error for highest state in rpmpd

* tag 'pmdomain-v6.19-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: imx8mp-blk-ctrl: Keep usb phy power domain on for system wakeup
  pmdomain: imx8mp-blk-ctrl: Keep gpc power domain on for system wakeup
  pmdomain: imx8m-blk-ctrl: fix out-of-range access of bc->domains
  pmdomain: imx: gpcv2: Fix the imx8mm gpu hang due to wrong adb400 reset
  pmdomain: qcom: rpmpd: fix off-by-one error in clamping to the highest state

Merge tag 'gpio-fixes-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix incorrect retval check in gpio-loongson-64bit

- fix GPIO counting with ACPI

* tag 'gpio-fixes-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: loongson-64bit: Fix incorrect NULL check after devm_kcalloc()
gpiolib: acpi: Fix gpio count with string references

Merge tag 'sound-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes. It became a bit larger than wished, but
  all of them are device-specific small fixes, and it should be still
  fairly safe to take at the last minute.

  Included are a few quirks and fixes for Intel, AMD, HD-audio, and
  USB-audio, as well as a race fix in aloop driver and corrections of
  Cirrus firmware kunit test"

* tag 'sound-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Enable headset mic for Acer Nitro 5
  ASoC: fsl_xcvr: fix missing lock in fsl_xcvr_mode_put()
  ASoC: dt-bindings: ti,tlv320aic3x: Add compatible string ti,tlv320aic23
  ASoC: amd: fix memory leak in acp3x pdm dma ops
  ALSA: usb-audio: fix broken logic in snd_audigy2nx_led_update()
  ALSA: aloop: Fix racy access at PCM trigger
  ASoC: rt1320: fix intermittent no-sound issue
  ASoC: SOF: Intel: use hdev->info.link_mask directly
  firmware: cs_dsp: rate-limit log messages in KUnit builds
  ASoC: amd: yc: Add quirk for HP 200 G2a 16
  ASoC: cs42l43: Correct handling of 3-pole jack load detection
  ASoC: Intel: sof_es8336: Add DMI quirk for Huawei BOD-WXX9
  ASoC: sof_sdw: Add a quirk for Lenovo laptop using sidecar amps with cs42l43

Merge tag 'slab-for-6.19-rc8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:
"A stable fix for memory allocation profiling tag not being cleared
when aborting an allocation due to memcg charge failure (Hao Ge)"

* tag 'slab-for-6.19-rc8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: Add alloc_tagging_slab_free_hook for memcg_alloc_abort_single

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM fix from Russell King:
"Just one fix for memset64() on big endian 32-bit ARM systems"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9468/1: fix memset64() on big-endian

iommu/vt-d: Treat PAGE_SNOOP and PWSNP separately

The PASID_FLAG_PAGE_SNOOP and PASID_FLAG_PWSNP constants are identical.
This will cause the pasid code to always set both or neither of the
PGSNP and PWSNP bits in PASID table entries. However, PWSNP is a
reserved bit if SMPWC is not set in the IOMMU's extended capability
register, even if SC is supported.

This has resulted in DMAR errors when testing the iommufd code on an
Arrow Lake platform. With this patch, those errors disappear and the
PASID table entries look correct.

Fixes: 101a2854110fa ("iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry")
Cc: stable@vger.kernel.org
Signed-off-by: Viktor Kleen <viktor@kleen.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260202192109.1665799-1-viktor@kleen.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

mm/slab: Add alloc_tagging_slab_free_hook for memcg_alloc_abort_single

When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[ 3959.023862] ------------[ cut here ]------------
[ 3959.023891] alloc_tag was not cleared (got tag for lib/xarray.c:378)
[ 3959.023947] WARNING: ./include/linux/alloc_tag.h:155 at alloc_tag_add+0x128/0x178, CPU#6: mkfs.ntfs/113998
[ 3959.023978] Modules linked in: dns_resolver tun brd overlay exfat btrfs blake2b libblake2b xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel ext4 crc16 mbcache jbd2 rfkill sunrpc vfat fat sg fuse nfnetlink sr_mod virtio_gpu cdrom drm_client_lib virtio_dma_buf drm_shmem_helper drm_kms_helper ghash_ce drm sm4 backlight virtio_net net_failover virtio_scsi failover virtio_console virtio_blk virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod i2c_dev aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
[ 3959.024170] CPU: 6 UID: 0 PID: 113998 Comm: mkfs.ntfs Kdump: loaded Tainted: G        W           6.19.0-rc7+ #7 PREEMPT(voluntary)
[ 3959.024182] Tainted: [W]=WARN
[ 3959.024186] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 3959.024192] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3959.024199] pc : alloc_tag_add+0x128/0x178
[ 3959.024207] lr : alloc_tag_add+0x128/0x178
[ 3959.024214] sp : ffff80008b696d60
[ 3959.024219] x29: ffff80008b696d60 x28: 0000000000000000 x27: 0000000000000240
[ 3959.024232] x26: 0000000000000000 x25: 0000000000000240 x24: ffff800085d17860
[ 3959.024245] x23: 0000000000402800 x22: ffff0000c0012dc0 x21: 00000000000002d0
[ 3959.024257] x20: ffff0000e6ef3318 x19: ffff800085ae0410 x18: 0000000000000000
[ 3959.024269] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 3959.024281] x14: 0000000000000000 x13: 0000000000000001 x12: ffff600064101293
[ 3959.024292] x11: 1fffe00064101292 x10: ffff600064101292 x9 : dfff800000000000
[ 3959.024305] x8 : 00009fff9befed6e x7 : ffff000320809493 x6 : 0000000000000001
[ 3959.024316] x5 : ffff000320809490 x4 : ffff600064101293 x3 : ffff800080691838
[ 3959.024328] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000d5bcd640
[ 3959.024340] Call trace:
[ 3959.024346]  alloc_tag_add+0x128/0x178 (P)
[ 3959.024355]  __alloc_tagging_slab_alloc_hook+0x11c/0x1a8
[ 3959.024362]  kmem_cache_alloc_lru_noprof+0x1b8/0x5e8
[ 3959.024369]  xas_alloc+0x304/0x4f0
[ 3959.024381]  xas_create+0x1e0/0x4a0
[ 3959.024388]  xas_store+0x68/0xda8
[ 3959.024395]  __filemap_add_folio+0x5b0/0xbd8
[ 3959.024409]  filemap_add_folio+0x16c/0x7e0
[ 3959.024416]  __filemap_get_folio_mpol+0x2dc/0x9e8
[ 3959.024424]  iomap_get_folio+0xfc/0x180
[ 3959.024435]  __iomap_get_folio+0x2f8/0x4b8
[ 3959.024441]  iomap_write_begin+0x198/0xc18
[ 3959.024448]  iomap_write_iter+0x2ec/0x8f8
[ 3959.024454]  iomap_file_buffered_write+0x19c/0x290
[ 3959.024461]  blkdev_write_iter+0x38c/0x978
[ 3959.024470]  vfs_write+0x4d4/0x928
[ 3959.024482]  ksys_write+0xfc/0x1f8
[ 3959.024489]  __arm64_sys_write+0x74/0xb0
[ 3959.024496]  invoke_syscall+0xd4/0x258
[ 3959.024507]  el0_svc_common.constprop.0+0xb4/0x240
[ 3959.024514]  do_el0_svc+0x48/0x68
[ 3959.024520]  el0_svc+0x40/0xf8
[ 3959.024526]  el0t_64_sync_handler+0xa0/0xe8
[ 3959.024533]  el0t_64_sync+0x1ac/0x1b0
[ 3959.024540] ---[ end trace 0000000000000000 ]---

When __memcg_slab_post_alloc_hook() fails, there are two different
free paths depending on whether size == 1 or size != 1. In the
kmem_cache_free_bulk() path, we do call alloc_tagging_slab_free_hook().
However, in memcg_alloc_abort_single() we don't, the above warning will be
triggered on the next allocation.

Therefore, add alloc_tagging_slab_free_hook() to the
memcg_alloc_abort_single() path.

Fixes: 9f9796b413d3 ("mm, slab: move memcg charging to post-alloc hook")
Cc: stable@vger.kernel.org
Suggested-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Hao Ge <hao.ge@linux.dev>
Reviewed-by: Hao Li <hao.li@linux.dev>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260204101401.202762-1-hao.ge@linux.dev
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Merge tag 'hwmon-for-v6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- occ: Mark occ_init_attribute() as __printf to avoid build failure due
   to '-Werror=suggest-attribute=format'

- gpio-fan: Allow to stop fans when CONFIG_PM is disabled, and fix
   set_rpm() return value

- acpi_power_meter: Fix deadlocks related to acpi_power_meter_notify()

- dell-smm: Add Dell G15 5510 to fan control whitelist

* tag 'hwmon-for-v6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (occ) Mark occ_init_attribute() as __printf
  hwmon: (gpio-fan) Allow to stop FANs when CONFIG_PM is disabled
  hwmon: (gpio-fan) Fix set_rpm() return value
  hwmon: (acpi_power_meter) Fix deadlocks related to acpi_power_meter_notify()
  hwmon: (dell-smm) Add Dell G15 5510 to fan control whitelist

Merge tag 'drm-fixes-2026-02-06' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"The usual xe/amdgpu selection, and a couple of misc changes for
  gma500, mgag200 and bridge. There is a nouveau revert, and also a set
  of changes that fix a regression since we moved to 570 firmware.
  Suspend/resume was broken on a bunch of GPUs. The fix looks big, but
  it's mostly just refactoring to pass an extra bit down the nouveau
  abstractions to the firmware command.

  amdgpu:
   - MES 11 old firmware compatibility fix
   - ASPM fix
   - DC LUT fixes

  amdkfd:
   - Fix possible double deletion of validate list

  xe:
   - Fix topology query pointer advance
   - A couple of kerneldoc fixes
   - Disable D3Cold for BMG only on specific platforms
   - Fix CFI violation in debugfs access

  nouveau:
   - Revert adding atomic commit functions as it regresses pre-nv50
   - Fix suspend/resume bugs exposed by enabling 570 firmware

  gma500:
   - Revert a regression caused by vblank changes

  mgag200:
   - Replace a busy loop with a polling loop to fix that blocking 1 cpu
     for 300 ms roughly every 20 minutes

bridge:
   - imx8mp-hdmi-pa: Use runtime pm to fix a bug in channel ordering"

* tag 'drm-fixes-2026-02-06' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/guc: Fix CFI violation in debugfs access.
  drm/bridge: imx8mp-hdmi-pai: enable PM runtime
  drm/xe/pm: Disable D3Cold for BMG only on specific platforms
  drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep
  drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early
  drm/xe: Fix kerneldoc for xe_migrate_exec_queue
  drm/xe/query: Fix topology query pointer advance
  drm/mgag200: fix mgag200_bmc_stop_scanout()
  nouveau/gsp: fix suspend/resume regression on r570 firmware
  nouveau: add a third state to the fini handler.
  nouveau/gsp: use rpc sequence numbers properly.
  drm/amdgpu: Fix double deletion of validate_list
  drm/amd/display: remove assert around dpp_base replacement
  drm/amd/display: extend delta clamping logic to CM3 LUT helper
  drm/amd/display: fix wrong color value mapping on MCM shaper LUT
  Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem"
  drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52
  Revert "drm/gma500: use drm_crtc_vblank_crtc()"
  Revert "drm/nouveau/disp: Set drm_mode_config_funcs.atomic_(check|commit)"

Merge tag 'amd-drm-fixes-6.19-2026-02-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.19-2026-02-05:

amdgpu:
- MES 11 old firmware compatibility fix
- ASPM fix
- DC LUT fixes

amdkfd:
- Fix possible double deletion of validate list

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260205182017.2409773-1-alexander.deucher@amd.com

Merge tag 'drm-xe-fixes-2026-02-05' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix topology query pointer advance (Shuicheng)
- A couple of kerneldoc fixes (Shuicheng)
- Disable D3Cold for BMG only on specific platforms (Karthik)
- Fix CFI violation in debugfs access (Daniele)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aYS2v12R8ELQoTiZ@fedora

Merge tag 'drm-misc-fixes-2026-02-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.19 final:

nouveau
-------
Revert adding atomic commit functions as it regresses pre-nv50.
Fix bugs exposed by enabling 570 firmware.

gma500
------
Revert a regression caused by vblank changes.

mgag200
-------
Replace a busy loop with a polling loop to fix that blocking 1 cpu for 300 ms roughly every 20 minutes.

bridge
------
imx8mp-hdmi-pa: Use runtime pm to fix a bug in channel ordering.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/c0077ea5-faeb-4b0c-bd4a-ea2384d6dc0c@linux.intel.com

Merge tag 'block-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Revert of a change for loop, which caused regressions for some users
   (Actually revert of two commits, where one is just an existing fix
   for the offending commit)

- NVMe pull via Keith:
      - Fix NULL pointer access setting up dma mappings
      - Fix invalid memory access from malformed TCP PDU

* tag 'block-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  loop: revert exclusive opener loop status change
  nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovec
  nvme-pci: handle changing device dma map requirements

Merge tag 'io_uring-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Two small fixes for zcrx

- Two small fixes for fdinfo - one is just killing a superflous newline

* tag 'io_uring-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs
  io_uring/fdinfo: kill unnecessary newline feed in CQE32 printing
  io_uring/zcrx: fix rq flush locking
  io_uring/zcrx: fix page array leak

mm/memory-failure: reject unsupported non-folio compound page

When !CONFIG_TRANSPARENT_HUGEPAGE, a non-folio compound page can appear in
a userspace mapping via either vm_insert_*() functions or
vm_operatios_struct->fault(). They are not folios, thus should not be
considered for folio operations like split. To reject these pages, make
sure get_hwpoison_page() is always called as HWPoisonHandlable() will do
the right work.

[Some commit log borrowed from Zi Yan. Thanks.]

Link: https://lkml.kernel.org/r/20260205075328.523211-1-linmiaohe@huawei.com
Fixes: 689b8986776c ("mm/memory-failure: improve large block size folio handling")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: 是参差 <shicenci@gmail.com>
Closes: https://lore.kernel.org/all/PS1PPF7E1D7501F1E4F4441E7ECD056DEADAB98A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

procfs: avoid fetching build ID while holding VMA lock

Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock
or per-VMA lock, whichever was used to lock VMA under question, to avoid
deadlock reported by syzbot:

-> #1 (&mm->mmap_lock){++++}-{4:4}:
        __might_fault+0xed/0x170
        _copy_to_iter+0x118/0x1720
        copy_page_to_iter+0x12d/0x1e0
        filemap_read+0x720/0x10a0
        blkdev_read_iter+0x2b5/0x4e0
        vfs_read+0x7f4/0xae0
        ksys_read+0x12a/0x250
        do_syscall_64+0xcb/0xf80
        entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
        __lock_acquire+0x1509/0x26d0
        lock_acquire+0x185/0x340
        down_read+0x98/0x490
        blkdev_read_iter+0x2a7/0x4e0
        __kernel_read+0x39a/0xa90
        freader_fetch+0x1d5/0xa80
        __build_id_parse.isra.0+0xea/0x6a0
        do_procmap_query+0xd75/0x1050
        procfs_procmap_ioctl+0x7a/0xb0
        __x64_sys_ioctl+0x18e/0x210
        do_syscall_64+0xcb/0xf80
        entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(&mm->mmap_lock);
                                lock(&sb->s_type->i_mutex_key#8);
                                lock(&mm->mmap_lock);
   rlock(&sb->s_type->i_mutex_key#8);

  *** DEADLOCK ***

This seems to be exacerbated (as we haven't seen these syzbot reports
before that) by the recent:

777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context")

To make this safe, we need to grab file refcount while VMA is still locked, but
other than that everything is pretty straightforward. Internal build_id_parse()
API assumes VMA is passed, but it only needs the underlying file reference, so
just add another variant build_id_parse_file() that expects file passed
directly.

[akpm@linux-foundation.org: fix up kerneldoc]
Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org
Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

spi: tegra114: Preserve SPI mode bits in def_command1_reg

The COMMAND1 register bits [29:28] set the SPI mode, which controls
the clock idle level. When a transfer ends, tegra_spi_transfer_end()
writes def_command1_reg back to restore the default state, but this
register value currently lacks the mode bits. This results in the
clock always being configured as idle low, breaking devices that
need it high.

Fix this by storing the mode bits in def_command1_reg during setup,
to prevent this field from always being cleared.

Fixes: f333a331adfa ("spi/tegra114: add spi driver")
Signed-off-by: Vishwaroop A <va@nvidia.com>
Link: https://patch.msgid.link/20260204141212.1540382-1-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dcache fixes from Al Viro:
"A couple of regression fixes for the tree-in-dcache series this cycle"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
functionfs: use spinlock for FFS_DEACTIVATED/FFS_CLOSING transitions
rust_binderfs: fix a dentry leak

functionfs: use spinlock for FFS_DEACTIVATED/FFS_CLOSING transitions

When all files are closed, functionfs needs ffs_data_reset() to be
done before any further opens are allowed.

During that time we have ffs->state set to FFS_CLOSING; that makes
->open() fail with -EBUSY.  Once ffs_data_reset() is done, it
switches state (to FFS_READ_DESCRIPTORS) indicating that opening
that thing is allowed again.  There's a couple of additional twists:
* mounting with -o no_disconnect delays ffs_data_reset()
from doing that at the final ->release() to the first subsequent
open().  That's indicated by ffs->state set to FFS_DEACTIVATED;
if open() sees that, it immediately switches to FFS_CLOSING and
proceeds with doing ffs_data_reset() before returning to userland.
* a couple of usb callbacks need to force the delayed
transition; unfortunately, they are done in locking environment
that does not allow blocking and ffs_data_reset() can block.
As the result, if these callbacks see FFS_DEACTIVATED, they change
state to FFS_CLOSING and use schedule_work() to get ffs_data_reset()
executed asynchronously.

Unfortunately, the locking is rather insufficient.  A fix attempted
in e5bf5ee26663 ("functionfs: fix the open/removal races") had closed
a bunch of UAF, but it didn't do anything to the callbacks, lacked
barriers in transition from FFS_CLOSING to FFS_READ_DESCRIPTORS
_and_ it had been too heavy-handed in open()/open() serialization -
I've used ffs->mutex for that, and it's being held over actual IO on
ep0, complete with copy_from_user(), etc.

Even more unfortunately, the userland side is apparently racy enough
to have the resulting timing changes (no failures, just a delayed
return of open(2)) disrupt the things quite badly.  Userland bugs
or not, it's a clear regression that needs to be dealt with.

Solution is to use a spinlock for serializing these state checks and
transitions - unlike ffs->mutex it can be taken in these callbacks
and it doesn't disrupt the timings in open().

We could introduce a new spinlock, but it's easier to use the one
that is already there (ffs->eps_lock) instead - the locking
environment is safe for it in all affected places.

Since now it is held over all places that alter or check the
open count (ffs->opened), there's no need to keep that atomic_t -
int would serve just fine and it's simpler that way.

Fixes: e5bf5ee26663 ("functionfs: fix the open/removal races")
Fixes: 18d6b32fca38 ("usb: gadget: f_fs: add "no_disconnect" mode") # v4.0
Tested-by: Samuel Wu <wusamuel@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

rust_binderfs: fix a dentry leak

Parallel to binderfs patches - 02da8d2c0965 "binderfs_binder_ctl_create():
kill a bogus check" and the bit of b89aa544821d "convert binderfs" that
got lost when making 4433d8e25d73 "convert rust_binderfs"; the former is
a cleanup, the latter is about marking /binder-control persistent, so that
it would be taken out on umount.

Fixes: 4433d8e25d73 ("convert rust_binderfs")
Acked-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and Netfilter.

  Previous releases - regressions:

   - eth: stmmac: fix stm32 (and potentially others) resume regression

   - nf_tables: fix inverted genmask check in nft_map_catchall_activate()

   - usb: r8152: fix resume reset deadlock

   - fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts

  Previous releases - always broken:

   - sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads
     with malicious u32 rules

   - eth: ice: timestamping related fixes"

* tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF
  netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()
  net: cpsw: Execute ndo_set_rx_mode callback in a work queue
  net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue
  gve: Correct ethtool rx_dropped calculation
  gve: Fix stats report corruption on queue count change
  selftest: net: add a test-case for encap segmentation after GRO
  net: gro: fix outer network offset
  net: add proper RCU protection to /proc/net/ptype
  net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi()
  wifi: iwlwifi: mvm: pause TCM on fast resume
  wifi: iwlwifi: mld: cancel mlo_scan_start_wk
  net: spacemit: k1-emac: fix jumbo frame support
  net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4
  net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4
  net: enetc: Remove CBDR cacheability AXI settings for ENETC v4
  net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4
  tipc: use kfree_sensitive() for session key material
  net: stmmac: fix stm32 (and potentially others) resume regression
  net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts
  ...

gpio: loongson-64bit: Fix incorrect NULL check after devm_kcalloc()

Fix incorrect NULL check in loongson_gpio_init_irqchip().
The function checks chip->parent instead of chip->irq.parents.

Fixes: 03c146cb6cd1 ("gpio: loongson-64bit: Add support for Loongson-2K0300 SoC")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Link: https://patch.msgid.link/20260205072649.3271158-1-nichen@iscas.ac.cn
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF

syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6
route. [0]

Commit f72514b3c569 ("ipv6: clear RA flags when adding a static
route") introduced logic to clear RTF_ADDRCONF from existing routes
when a static route with the same nexthop is added. However, this
causes a problem when the existing route has a gateway.

When RTF_ADDRCONF is cleared from a route that has a gateway, that
route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns
true. The issue is that this route was never added to the
fib6_siblings list.

This leads to a mismatch between the following counts:

- The sibling count computed by iterating fib6_next chain, which
includes the newly ECMP-eligible route

- The actual siblings in fib6_siblings list, which does not include
that route

When a subsequent ECMP route is added, fib6_add_rt2node() hits
BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the
counts don't match.

Fix this by only clearing RTF_ADDRCONF when the existing route does
not have a gateway. Routes without a gateway cannot qualify for ECMP
anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing
RTF_ADDRCONF on them is safe and matches the original intent of the
commit.

[0]:
kernel BUG at net/ipv6/ip6_fib.c:1217!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6010 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:fib6_add_rt2node+0x3433/0x3470 net/ipv6/ip6_fib.c:1217
[...]
Call Trace:
<TASK>
fib6_add+0x8da/0x18a0 net/ipv6/ip6_fib.c:1532
__ip6_ins_rt net/ipv6/route.c:1351 [inline]
ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3946
ipv6_route_ioctl+0x35c/0x480 net/ipv6/route.c:4571
inet6_ioctl+0x219/0x280 net/ipv6/af_inet6.c:577
sock_do_ioctl+0xdc/0x300 net/socket.c:1245
sock_ioctl+0x576/0x790 net/socket.c:1366
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: f72514b3c569 ("ipv6: clear RA flags when adding a static route")
Reported-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cb809def1baaac68ab92
Tested-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260204095837.1285552-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-02-05' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: update for net

This is one last-minute crash fix for nf_tables, from Andrew Fasano:

Logical check is inverted, this makes kernel fail to correctly undo
the transaction, leading to a use-after-free.

* tag 'nf-26-02-05' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()
====================

Link: https://patch.msgid.link/20260205074450.3187-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

loop: revert exclusive opener loop status change

This commit effectively reverts the following two commits:

2704024d83fa ("loop: add missing bd_abort_claiming in loop_set_status")
08e136ebd193 ("loop: don't change loop device under exclusive opener in loop_set_status")

as there are reports of them causing issues with unmounting. As we're
close to the 6.19 kernel release and the original author hasn't taken a
closer look at this yet, revert them for release.

Reported-by: nokangaroo <nokangaroo@aon.at>
Link: https://lore.kernel.org/all/62de4453-17e8-47f6-a10b-39bf5a49fdee@leemhuis.info/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

objtool/klp: Fix unexported static call key access for manually built livepatch modules

Enabling CONFIG_MEM_ALLOC_PROFILING_DEBUG with CONFIG_SAMPLE_LIVEPATCH
results in the following error:

  samples/livepatch/livepatch-shadow-fix1.o: error: objtool: static_call: can't find static_call_key symbol: __SCK__WARN_trap

This is caused an extra file->klp sanity check which was added by commit
164c9201e1da ("objtool: Add base objtool support for livepatch
modules").  That check was intended to ensure that livepatch modules
built with klp-build always have full access to their static call keys.

However, it failed to account for the fact that manually built livepatch
modules (i.e., not built with klp-build) might need access to unexported
static call keys, for which read-only access is typically allowed for
modules.

While the livepatch-shadow-fix1 module doesn't explicitly use any static
calls, it does have a memory allocation, which can cause
CONFIG_MEM_ALLOC_PROFILING_DEBUG to insert a WARN() call.  And WARN() is
now an unexported static call as of commit 860238af7a33 ("x86_64/bug:
Inline the UD1").

Fix it by removing the overzealous file->klp check, restoring the
original behavior for manually built livepatch modules.

Fixes: 164c9201e1da ("objtool: Add base objtool support for livepatch modules")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Song Liu <song@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/0bd3ae9a53c3d743417fe842b740a7720e2bcd1c.1770058775.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix symbol correlation for orphaned local symbols

When compiling with CONFIG_LTO_CLANG_THIN, vmlinux.o has
__irf_[start|end] before the first FILE entry:

  $ readelf -sW vmlinux.o
  Symbol table '.symtab' contains 597706 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   18 __irf_start
       2: 0000000000000200     0 NOTYPE  LOCAL  DEFAULT   18 __irf_end
       3: 0000000000000000     0 SECTION LOCAL  DEFAULT   17 .text
       4: 0000000000000000     0 SECTION LOCAL  DEFAULT   18 .init.ramfs

This causes klp-build warnings like:

  vmlinux.o: warning: objtool: no correlation: __irf_start
  vmlinux.o: warning: objtool: no correlation: __irf_end

The problem is that Clang LTO is stripping the initramfs_data.o FILE
symbol, causing those two symbols to be orphaned and not noticed by
klp-diff's correlation logic.  Add a loop to correlate any symbols found
before the first FILE symbol.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Reported-by: Song Liu <song@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/e21ec1141fc749b5f538d7329b531c1ab63a6d1a.1770055235.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

livepatch: Free klp_{object,func}_ext data after initialization

The klp_object_ext and klp_func_ext data, which are stored in the
__klp_objects and __klp_funcs sections, respectively, are not needed
after they are used to create the actual klp_object and klp_func
instances. This operation is implemented by the init function in
scripts/livepatch/init.c.

Prefix the two sections with ".init" so they are freed after the module
is initializated.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Link: https://patch.msgid.link/20260123102825.3521961-3-petr.pavlu@suse.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

livepatch: Fix having __klp_objects relics in non-livepatch modules

The linker script scripts/module.lds.S specifies that all input
__klp_objects sections should be consolidated into an output section of
the same name, and start/stop symbols should be created to enable
scripts/livepatch/init.c to locate this data.

This start/stop pattern is not ideal for modules because the symbols are
created even if no __klp_objects input sections are present.
Consequently, a dummy __klp_objects section also appears in the
resulting module. This unnecessarily pollutes non-livepatch modules.

Instead, since modules are relocatable files, the usual method for
locating consolidated data in a module is to read its section table.
This approach avoids the aforementioned problem.

The klp_modinfo already stores a copy of the entire section table with
the final addresses. Introduce a helper function that
scripts/livepatch/init.c can call to obtain the location of the
__klp_objects section from this data.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

Merge tag 'nvme-6.19-2026-02-05' of git://git.infradead.org/nvme into block-6.19

Pull NVMe fixes from Keith:

"- Fix NULL pointer access setting up dma mappings (Keith)
- Fix invalid memory access from malformed TCP PDU (YunJe)"

* tag 'nvme-6.19-2026-02-05' of git://git.infradead.org/nvme:
nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovec
nvme-pci: handle changing device dma map requirements

nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovec

nvmet_tcp_build_pdu_iovec() could walk past cmd->req.sg when a PDU
length or offset exceeds sg_cnt and then use bogus sg->length/offset
values, leading to _copy_to_iter() GPF/KASAN. Guard sg_idx, remaining
entries, and sg->length/offset before building the bvec.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: YunJe Shin <ioerts@kookmin.ac.kr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Joonkyo Jung <joonkyoj@yonsei.ac.kr>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme-pci: handle changing device dma map requirements

The initial state of dma_needs_unmap may be false, but change to true
while mapping the data iterator. Enabling swiotlb is one such case that
can change the result. The nvme driver needs to save the mapped dma
vectors to be unmapped later, so allocate as needed during iteration
rather than assume it was always allocated at the beginning. This fixes
a NULL dereference from accessing an uninitialized dma_vecs when the
device dma unmapping requirements change mid-iteration.

Fixes: b8b7570a7ec8 ("nvme-pci: fix dma unmapping when using PRPs and not using the IOVA mapping")
Link: https://lore.kernel.org/linux-nvme/20260202125738.1194899-1-pradeep.pragallapati@oss.qualcomm.com/
Reported-by: Pradeep P V K <pradeep.pragallapati@oss.qualcomm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

tracing: Fix ftrace event field alignments

The fields of ftrace specific events (events used to save ftrace internal
events like function traces and trace_printk) are generated similarly to
how normal trace event fields are generated. That is, the fields are added
to a trace_events_fields array that saves the name, offset, size,
alignment and signness of the field. It is used to produce the output in
the format file in tracefs so that tooling knows how to parse the binary
data of the trace events.

The issue is that some of the ftrace event structures are packed. The
function graph exit event structures are one of them. The 64 bit calltime
and rettime fields end up 4 byte aligned, but the algorithm to show to
userspace shows them as 8 byte aligned.

The macros that create the ftrace events has one for embedded structure
fields. There's two macros for theses fields:

  __field_desc() and __field_packed()

The difference of the latter macro is that it treats the field as packed.

Rename that field to __field_desc_packed() and create replace the
__field_packed() to be a normal field that is packed and have the calltime
and rettime use those.

This showed up on 32bit architectures for function graph time fields. It
had:

~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format
[..]
        field:unsigned long func;       offset:8;       size:4; signed:0;
        field:unsigned int depth;       offset:12;      size:4; signed:0;
        field:unsigned int overrun;     offset:16;      size:4; signed:0;
        field:unsigned long long calltime;      offset:24;      size:8; signed:0;
        field:unsigned long long rettime;       offset:32;      size:8; signed:0;

Notice that overrun is at offset 16 with size 4, where in the structure
calltime is at offset 20 (16 + 4), but it shows the offset at 24. That's
because it used the alignment of unsigned long long when used as a
declaration and not as a member of a structure where it would be aligned
by word size (in this case 4).

By using the proper structure alignment, the format has it at the correct
offset:

~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format
[..]
        field:unsigned long func;       offset:8;       size:4; signed:0;
        field:unsigned int depth;       offset:12;      size:4; signed:0;
        field:unsigned int overrun;     offset:16;      size:4; signed:0;
        field:unsigned long long calltime;      offset:20;      size:8; signed:0;
        field:unsigned long long rettime;       offset:28;      size:8; signed:0;

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: "jempty.liang" <imntjempty@163.com>
Link: https://patch.msgid.link/20260204113628.53faec78@gandalf.local.home
Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()")
Closes: https://lore.kernel.org/all/20260130015740.212343-1-imntjempty@163.com/
Closes: https://lore.kernel.org/all/20260202123342.2544795-1-imntjempty@163.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

pmdomain: imx8mp-blk-ctrl: Keep usb phy power domain on for system wakeup

USB system wakeup need its PHY on, so add the GENPD_FLAG_ACTIVE_WAKEUP
flags to USB PHY genpd configuration.

Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Fixes: 556f5cf9568a ("soc: imx: add i.MX8MP HSIO blk-ctrl")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

pmdomain: imx8mp-blk-ctrl: Keep gpc power domain on for system wakeup

Current design will power off all dependent GPC power domains in
imx8mp_blk_ctrl_suspend(), even though the user device has enabled
wakeup capability. The result is that wakeup function never works
for such device.

An example will be USB wakeup on i.MX8MP. PHY device '382f0040.usb-phy'
is attached to power domain 'hsioblk-usb-phy2' which is spawned by hsio
block control. A virtual power domain device 'genpd:3:32f10000.blk-ctrl'
is created to build connection with 'hsioblk-usb-phy2' and it depends on
GPC power domain 'usb-otg2'. If device '382f0040.usb-phy' enable wakeup,
only power domain 'hsioblk-usb-phy2' keeps on during system suspend,
power domain 'usb-otg2' is off all the time. So the wakeup event can't
happen.

In order to further establish a connection between the power domains
related to GPC and block control during system suspend, register a genpd
power on/off notifier for the power_dev. This allows us to prevent the GPC
power domain from being powered off, in case the block control power
domain is kept on to serve system wakeup.

Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Fixes: 556f5cf9568a ("soc: imx: add i.MX8MP HSIO blk-ctrl")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

drm/xe/guc: Fix CFI violation in debugfs access.

xe_guc_print_info is void-returning, but the function pointer it is
assigned to expects an int-returning function, leading to the following
CFI error:

[ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe]
(target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a)

Fix this by updating xe_guc_print_info to return an integer.

Fixes: e15826bb3c2c ("drm/xe/guc: Refactor GuC debugfs initialization")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: George D Sworo <george.d.sworo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com
(cherry picked from commit dd8ea2f2ab71b98887fdc426b0651dbb1d1ea760)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/bridge: imx8mp-hdmi-pai: enable PM runtime

There is an audio channel shift issue with multi channel case - the
channel order is correct for the first run, but the channel order is
shifted for the second run. The fix method is to reset the PAI interface
at the end of playback.

The reset can be handled by PM runtime, so enable PM runtime.

Fixes: 0205fae6327a ("drm/bridge: imx: add driver for HDMI TX Parallel Audio Interface")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Link: https://lore.kernel.org/r/20260130080910.3532724-1-shengjiu.wang@nxp.com

ALSA: hda/realtek: Enable headset mic for Acer Nitro 5

Add quirk to support microphone input through headphone jack on Acer Nitro 5 AN515-57 (ALC295).

Signed-off-by: Breno Baptista <brenomb07@gmail.com>
Link: https://patch.msgid.link/20260205024341.26694-1-brenomb07@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()

nft_map_catchall_activate() has an inverted element activity check
compared to its non-catchall counterpart nft_mapelem_activate() and
compared to what is logically required.

nft_map_catchall_activate() is called from the abort path to re-activate
catchall map elements that were deactivated during a failed transaction.
It should skip elements that are already active (they don't need
re-activation) and process elements that are inactive (they need to be
restored). Instead, the current code does the opposite: it skips inactive
elements and processes active ones.

Compare the non-catchall activate callback, which is correct:

  nft_mapelem_activate():
    if (nft_set_elem_active(ext, iter->genmask))
        return 0;   /* skip active, process inactive */

With the buggy catchall version:

  nft_map_catchall_activate():
    if (!nft_set_elem_active(ext, genmask))
        continue;   /* skip inactive, process active */

The consequence is that when a DELSET operation is aborted,
nft_setelem_data_activate() is never called for the catchall element.
For NFT_GOTO verdict elements, this means nft_data_hold() is never
called to restore the chain->use reference count. Each abort cycle
permanently decrements chain->use. Once chain->use reaches zero,
DELCHAIN succeeds and frees the chain while catchall verdict elements
still reference it, resulting in a use-after-free.

This is exploitable for local privilege escalation from an unprivileged
user via user namespaces + nftables on distributions that enable
CONFIG_USER_NS and CONFIG_NF_TABLES.

Fix by removing the negation so the check matches nft_mapelem_activate():
skip active elements, process inactive ones.

Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Signed-off-by: Andrew Fasano <andrew.fasano@nist.gov>
Signed-off-by: Florian Westphal <fw@strlen.de>

Merge tag 'wireless-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Two last-minute iwlwifi fixes:
- cancel mlo_scan_work on disassoc to avoid
   use-after-free/init-after-queue issues
- pause TCM work on suspend to avoid crashing
   the FW (and sometimes the host) on resume
   with traffic

* tag 'wireless-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: pause TCM on fast resume
  wifi: iwlwifi: mld: cancel mlo_scan_start_wk
====================

Link: https://patch.msgid.link/20260204113547.159742-4-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mm-hotfixes-stable-2026-02-04-15-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"Five hotfixes.  Two are cc:stable, two are for MM.

  All are singletons - please see the changelogs for details"

* tag 'mm-hotfixes-stable-2026-02-04-15-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  Documentation: document liveupdate cmdline parameter
  mm, shmem: prevent infinite loop on truncate race
  mailmap: update Alexander Mikhalitsyn's emails
  liveupdate: luo_file: do not clear serialized_data on unfreeze
  x86/kfence: fix booting on 32bit non-PAE systems

Merge tag 'tsm-fixes-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm

Pull TSM (TEE security Manager) fixes from Dan Williams:
"The largest change is reverting part of an ABI that never shipped in a
  released kernel (Documentation/ABI/testing/sysfs-class-tsm). The fix /
  replacement for that is too large to squeeze in at this late date.

  The rest is a collection of small fixups:

   - Fix multiple streams per host bridge for SEV-TIO

   - Drop the TSM ABI for reporting IDE streams (to be replaced)

   - Fix virtual function enumeration

   - Fix reserved stream ID initialization

   - Fix unused variable compiler warning"

* tag 'tsm-fixes-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm:
  crypto/ccp: Allow multiple streams on the same root bridge
  crypto/ccp: Use PCI bridge defaults for IDE
  coco/tsm: Remove unused variable tsm_rwsem
  PCI/IDE: Fix reading a wrong reg for unused sel stream initialization
  PCI/IDE: Fix off by one error calculating VF RID range
  Revert "PCI/TSM: Report active IDE streams"

Merge tag 'sched_ext-for-6.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fix from Tejun Heo:

- Fix race where sched_class operations (sched_setscheduler() and
   friends) could be invoked on dead tasks after sched_ext_dead()
   already ran, causing invalid SCX task state transitions and NULL
   pointer dereferences.

   This was a regression from the cgroup exit ordering fix which
   moved sched_ext_free() to finish_task_switch().

* tag 'sched_ext-for-6.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Short-circuit sched_class operations on dead tasks

hwmon: (occ) Mark occ_init_attribute() as __printf

This is a printf-style function, which gcc -Werror=suggest-attribute=format
correctly points out:

drivers/hwmon/occ/common.c: In function 'occ_init_attribute':
drivers/hwmon/occ/common.c:761:9: error: function 'occ_init_attribute' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]

Add the attribute to avoid this warning and ensure any incorrect
format strings are detected here.

Fixes: 744c2fe950e9 ("hwmon: (occ) Rework attribute registration for stack usage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20260203163440.2674340-1-arnd@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

sched_ext: Short-circuit sched_class operations on dead tasks

7900aa699c34 ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free()
to finish_task_switch()") moved sched_ext_free() to finish_task_switch() and
renamed it to sched_ext_dead() to fix cgroup exit ordering issues. However,
this created a race window where certain sched_class ops may be invoked on
dead tasks leading to failures - e.g. sched_setscheduler() may try to switch a
task which finished sched_ext_dead() back into SCX triggering invalid SCX task
state transitions.

Add task_dead_and_done() which tests whether a task is TASK_DEAD and has
completed its final context switch, and use it to short-circuit sched_class
operations which may be called on dead tasks.

Fixes: 7900aa699c34 ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch()")
Reported-by: Andrea Righi <arighi@nvidia.com>
Link: http://lkml.kernel.org/r/20260202151341.796959-1-arighi@nvidia.com
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

ceph: fix NULL pointer dereference in ceph_mds_auth_match()

The CephFS kernel client has regression starting from 6.18-rc1.
We have issue in ceph_mds_auth_match() if fs_name == NULL:

    const char fs_name = mdsc->fsc->mount_options->mds_namespace;
    ...
    if (auth->match.fs_name && strcmp(auth->match.fs_name, fs_name)) {
            / fsname mismatch, try next one */
            return 0;
    }

Patrick Donnelly suggested that: In summary, we should definitely start
decoding `fs_name` from the MDSMap and do strict authorizations checks
against it. Note that the `-o mds_namespace=foo` should only be used for
selecting the file system to mount and nothing else. It's possible
no mds_namespace is specified but the kernel will mount the only
file system that exists which may have name "foo".

This patch reworks ceph_mdsmap_decode() and namespace_equals() with
the goal of supporting the suggested concept. Now struct ceph_mdsmap
contains m_fs_name field that receives copy of extracted FS name
by ceph_extract_encoded_string(). For the case of "old" CephFS file
systems, it is used "cephfs" name.

[ idryomov: replace redundant %*pE with %s in ceph_mdsmap_decode(),
  get rid of a series of strlen() calls in ceph_namespace_match(),
  drop changes to namespace_equals() body to avoid treating empty
  mds_namespace as equal, drop changes to ceph_mdsc_handle_fsmap()
  as namespace_equals() isn't an equivalent substitution there ]

Cc: stable@vger.kernel.org
Fixes: 22c73d52a6d0 ("ceph: fix multifs mds auth caps issue")
Link: https://tracker.ceph.com/issues/73886
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Tested-by: Patrick Donnelly <pdonnell@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

- Fix a bug where AVIC is incorrectly inhibited when running with
   x2AVIC disabled via module param (or on a system without x2AVIC)

- Fix a dangling device posted IRQs bug by explicitly checking if the
   irqfd is still active (on the list) when handling an eventfd signal,
   instead of zeroing the irqfd's routing information when the irqfd is
   deassigned.

   Zeroing the irqfd's routing info causes arm64 and x86's to not
   disable posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks
   for an MSI), incorrectly leaving the IRQ in posted mode (and leading
   to use-after-free and memory leaks on AMD in particular).

   This is both the most pressing and scariest, but it's been in -next
   for a while.

- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
   generating calls to the checked versions of memset() and friends,
   which leads to unexpected page faults in guest code due e.g.
   __memset_chk@plt not being resolved.

- Explicitly configure the supported XSS capabilities from within
   {svm,vmx}_set_cpu_caps() to fix a bug where VMX will compute the
   reference VMCS configuration with SHSTK and IBT enabled, but then
   compute each CPUs local config with SHSTK and IBT disabled if not all
   CET xfeatures are enabled, e.g. if the kernel is built with
   X86_KERNEL_IBT=n.

   The mismatch in features results in differing nVMX setting, and
   ultimately causes kvm-intel.ko to refuse to load with nested=1.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Explicitly configure supported XSS from {svm,vmx}_set_cpu_caps()
  KVM: selftests: Add -U_FORTIFY_SOURCE to avoid some unpredictable test failures
  KVM: x86: Assert that non-MSI doesn't have bypass vCPU when deleting producer
  KVM: Don't clobber irqfd routing type when deassigning irqfd
  KVM: SVM: Check vCPU ID against max x2AVIC ID if and only if x2AVIC is enabled

Merge tag 'kvm-x86-fixes-6.19-rc8' of https://github.com/kvm-x86/linux into HEAD

Final KVM fixes for 6.19:

- Fix a bug where AVIC is incorrectly inhibited when running with x2AVIC
   disabled via module param (or on a system without x2AVIC).

- Fix a dangling device posted IRQs bug by explicitly checking if the irqfd is
   still active (on the list) when handling an eventfd signal, instead of
   zeroing the irqfd's routing information when the irqfd is deassigned.
   Zeroing the irqfd's routing info causes arm64 and x86's to not disable
   posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks for an MSI),
   incorrectly leaving the IRQ in posted mode (and leading to use-after-free
   and memory leaks on AMD in particular).

   This is both the most pressing and scariest, but it's been in -next for
   a while.

- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
   generating calls to the checked versions of memset() and friends, which
   leads to unexpected page faults in guest code due e.g. __memset_chk@plt
   not being resolved.

- Explicitly configure the support XSS from within {svm,vmx}_set_cpu_caps() to
   fix a bug where VMX will compute the reference VMCS configuration with SHSTK
   and IBT enabled, but then compute each CPUs local config with SHSTK and IBT
   disabled if not all CET xfeatures are enabled, e.g. if the kernel is built
   with X86_KERNEL_IBT=n.  The mismatch in features results in differing nVMX
   setting, and ultimately causes kvm-intel.ko to refuse to load with nested=1.

Merge tag 'soc-fixes-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"Shawn Guo is moving on from maintaining the NXP i.MX platform and
  hands over to Frank Li. Shawn has maintained the platform for 15 years
  after initially upstreaming support for i.MX6 and i.MX23/28, and his
  work has helped make this the most important industrial embedded Linux
  platform. Roughly one out of five devicetree files in mainline kernels
  are for the wider i.MX platform. Many thanks to Shawn for the taking
  care of the platform all these years!

  There are also two additional updates for the MAINTAINERS file, and a
  fix for error handling in the qualcomm smem driver"

* tag 'soc-fixes-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  MAINTAINERS: Change Sudeep Holla's email address
  MAINTAINERS: Add myself as maintainer of hisi_soc_hha
  soc: qcom: smem: fix qcom_smem_is_available and check if __smem is valid
  MAINTAINERS: Replace Shawn with Frank as i.MX platform maintainer

Merge tag 'asoc-fix-v6.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.19

A bunch more small fixes here, plus some more of the constant stream of
quirks. The most notable change here is Richard's change to the cs_dsp
code for the KUnit tests which is relatively large, mostly due to
boilerplate. The tests were triggering large numbers of error messages
as part of verifying that problems with input data are appropriately
detected which in turn caused runtime issues for the framework due to
the performance impact of pushing the logging out, while the logging is
valuable in normal operation it's basically useless while doing tests
designed to trigger it so rate limiting is an appropriate fix.

drm/xe/pm: Disable D3Cold for BMG only on specific platforms

Restrict D3Cold disablement for BMG to unsupported NUC platforms,
instead of disabling it on all platforms.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 3e331a6715ee ("drm/xe/pm: Temporarily disable D3Cold on BMG")
Link: https://patch.msgid.link/20260123173238.1642383-1-karthik.poosa@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 39125eaf8863ab09d70c4b493f58639b08d5a897)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep

Correct the function name in the kerneldoc.
It is for below warning:
"Warning: drivers/gpu/drm/xe/xe_tlb_inval_job.c:210 expecting prototype for
xe_tlb_inval_alloc_dep(). Prototype was for xe_tlb_inval_job_alloc_dep()
instead"

Fixes: 15366239e2130 ("drm/xe: Decouple TLB invalidations from GT")
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129233834.419977-8-shuicheng.lin@intel.com
(cherry picked from commit 9f9c117ac566cb567dd56cc5b7564c45653f7a2a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early

Correct the function name in the kerneldoc.
It is for below warning:
"Warning: drivers/gpu/drm/xe/xe_tlb_inval.c:136 expecting prototype for
xe_gt_tlb_inval_init(). Prototype was for xe_gt_tlb_inval_init_early()
instead"

v2: add () for the function. (Michal)

Fixes: db16f9d90c1d9 ("drm/xe: Split TLB invalidation code in frontend and backend")
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129233834.419977-7-shuicheng.lin@intel.com
(cherry picked from commit 0651dbb9d6a72e99569576fbec4681fd8160d161)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>