]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
12 months agofs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
Brahmajit Das [Sat, 5 Oct 2024 06:37:00 +0000 (12:07 +0530)]
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization

show show_smap_vma_flags() has been a using misspelled initializer in
mnemonics[] - it needed to initialize 2 element array of char and it used
NUL-padded 2 character string literals (i.e.  3-element initializer).

This has been spotted by gcc-15[*]; prior to that gcc quietly dropped the
3rd eleemnt of initializers.  To fix this we are increasing the size of
mnemonics[] (from mnemonics[BITS_PER_LONG][2] to
mnemonics[BITS_PER_LONG][3]) to accomodate the NUL-padded string literals.

This also helps us in simplyfying the logic for printing of the flags as
instead of printing each character from the mnemonics[], we can just print
the mnemonics[] using seq_printf.

[*]: fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
  917 |                 [0 ... (BITS_PER_LONG-1)] = "??",
      |                                                 ^~~~
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
...

Stephen pointed out:

: The C standard explicitly allows for a string initializer to be too long
: due to the NUL byte at the end ...  so this warning may be overzealous.

but let's make the warning go away anwyay.

Link: https://lkml.kernel.org/r/20241005063700.2241027-1-brahmajit.xyz@gmail.com
Link: https://lkml.kernel.org/r/20241003093040.47c08382@canb.auug.org.au
Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agolib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Florian Westphal [Mon, 7 Oct 2024 20:52:24 +0000 (22:52 +0200)]
lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Ben Greear reports following splat:
 ------------[ cut here ]------------
 net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
 WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
 Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
 Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
 RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
  codetag_unload_module+0x19b/0x2a0
  ? codetag_load_module+0x80/0x80

nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.

Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.

Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat                # will WARN.

[akpm@linux-foundation.org: add comment]
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agomm/mremap: fix move_normal_pmd/retract_page_tables race
Jann Horn [Mon, 7 Oct 2024 21:42:04 +0000 (23:42 +0200)]
mm/mremap: fix move_normal_pmd/retract_page_tables race

In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.

At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet.  For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.

The problem is: The rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:

process A                      process B
=========                      =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                      *** PREEMPT ***
                               madvise(MADV_COLLAPSE)
                                 do_madvise
                                   madvise_walk_vmas
                                     madvise_vma_behavior
                                       madvise_collapse
                                         hpage_collapse_scan_file
                                           collapse_file
                                             retract_page_tables
                                               i_mmap_lock_read(mapping)
                                               pmdp_collapse_flush
                                               i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
          move_normal_pmd
          drop_rmap_locks

When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`.  The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.

Fix the race by letting process B recheck that the PMD still points to a
page table after the rmap locks have been taken.  Otherwise, we bail and
let the caller fall back to the PTE-level copying path, which will then
bail immediately at the pmd_none() check.

Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks.  File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug.  As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know if some distros maybe enable shmem THP by default or something like
that.

Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.

Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Closes: https://project-zero.issues.chromium.org/371047675
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agomm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on certain builds.
Sebastian Andrzej Siewior [Mon, 7 Oct 2024 14:30:49 +0000 (16:30 +0200)]
mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on certain builds.

Arnd reported a build failure due to the BUILD_BUG_ON() statement in
alloc_kmem_cache_cpus().  The test

  PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)

The factors that increase the right side of the equation:
- PAGE_SIZE > 4KiB increases KMALLOC_SHIFT_HIGH
- For the local_lock_t in kmem_cache_cpu:
  - PREEMPT_RT adds an actual lock.
  - LOCKDEP increases the size of the lock.
  - LOCK_STAT adds additional bytes plus padding to the lockdep
    structure.

The net difference with and without PREEMPT_RT is 88 bytes for the
lock_lock_t, 96 bytes for kmem_cache_cpu due to additional padding.  This
is enough to exceed the 80KiB limit with 16KiB page size - the 8KiB page
size is fine.

Increase PERCPU_DYNAMIC_SIZE_SHIFT to 13 on configs with PAGE_SIZE larger
than 4KiB and LOCKDEP enabled.

Link: https://lkml.kernel.org/r/20241007143049.gyMpEu89@linutronix.de
Fixes: d8fccd9ca5f9 ("arm64: Allow to enable PREEMPT_RT.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410020326.iaZIteIx-lkp@intel.com/
Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/20241004095702.637528-1-arnd@kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoselftests/mm: fix deadlock for fork after pthread_create on ARM
Edward Liaw [Thu, 3 Oct 2024 21:17:11 +0000 (21:17 +0000)]
selftests/mm: fix deadlock for fork after pthread_create on ARM

On Android with arm, there is some synchronization needed to avoid a
deadlock when forking after pthread_create.

Link: https://lkml.kernel.org/r/20241003211716.371786-3-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoselftests/mm: replace atomic_bool with pthread_barrier_t
Edward Liaw [Thu, 3 Oct 2024 21:17:10 +0000 (21:17 +0000)]
selftests/mm: replace atomic_bool with pthread_barrier_t

Patch series "selftests/mm: fix deadlock after pthread_create".

On Android arm, pthread_create followed by a fork caused a deadlock in the
case where the fork required work to be completed by the created thread.

Update the synchronization primitive to use pthread_barrier instead of
atomic_bool.

Apply the same fix to the wp-fork-with-event test.

This patch (of 2):

Swap synchronization primitive with pthread_barrier, so that stdatomic.h
does not need to be included.

The synchronization is needed on Android ARM64; we see a deadlock with
pthread_create when the parent thread races forward before the child has a
chance to start doing work.

Link: https://lkml.kernel.org/r/20241003211716.371786-1-edliaw@google.com
Link: https://lkml.kernel.org/r/20241003211716.371786-2-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agofat: fix uninitialized variable
OGAWA Hirofumi [Fri, 4 Oct 2024 06:03:49 +0000 (15:03 +0900)]
fat: fix uninitialized variable

syszbot produced this with a corrupted fs image.  In theory, however an IO
error would trigger this also.

This affects just an error report, so should not be a serious error.

Link: https://lkml.kernel.org/r/87r08wjsnh.fsf@mail.parknet.co.jp
Link: https://lkml.kernel.org/r/66ff2c95.050a0220.49194.03e9.GAE@google.com
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: syzbot+ef0d7bc412553291aa86@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agonilfs2: propagate directory read errors from nilfs_find_entry()
Ryusuke Konishi [Fri, 4 Oct 2024 03:35:31 +0000 (12:35 +0900)]
nilfs2: propagate directory read errors from nilfs_find_entry()

Syzbot reported that a task hang occurs in vcs_open() during a fuzzing
test for nilfs2.

The root cause of this problem is that in nilfs_find_entry(), which
searches for directory entries, ignores errors when loading a directory
page/folio via nilfs_get_folio() fails.

If the filesystem images is corrupted, and the i_size of the directory
inode is large, and the directory page/folio is successfully read but
fails the sanity check, for example when it is zero-filled,
nilfs_check_folio() may continue to spit out error messages in bursts.

Fix this issue by propagating the error to the callers when loading a
page/folio fails in nilfs_find_entry().

The current interface of nilfs_find_entry() and its callers is outdated
and cannot propagate error codes such as -EIO and -ENOMEM returned via
nilfs_find_entry(), so fix it together.

Link: https://lkml.kernel.org/r/20241004033640.6841-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Closes: https://lkml.kernel.org/r/20240927013806.3577931-1-lizhi.xu@windriver.com
Reported-by: syzbot+8a192e8d090fa9a31135@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8a192e8d090fa9a31135
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agomm/mmap: correct error handling in mmap_region()
Lorenzo Stoakes [Wed, 2 Oct 2024 07:39:32 +0000 (08:39 +0100)]
mm/mmap: correct error handling in mmap_region()

Commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
changed how error handling is performed in mmap_region().

The error value defaults to -ENOMEM, but then gets reassigned immediately
to the result of vms_gather_munmap_vmas() if we are performing a MAP_FIXED
mapping over existing VMAs (and thus unmapping them).

This overwrites the error value, potentially clearing it.

After this, we invoke may_expand_vm() and possibly vm_area_alloc(), and
check to see if they failed. If they do so, then we perform error-handling
logic, but importantly, we do NOT update the error code.

This means that, if vms_gather_munmap_vmas() succeeds, but one of these
calls does not, the function will return indicating no error, but rather an
address value of zero, which is entirely incorrect.

Correct this and avoid future confusion by strictly setting error on each
and every occasion we jump to the error handling logic, and set the error
code immediately prior to doing so.

This way we can see at a glance that the error code is always correct.

Many thanks to Vegard Nossum who spotted this issue in discussion around
this problem.

Link: https://lkml.kernel.org/r/20241002073932.13482-1-lorenzo.stoakes@oracle.com
Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoLinux 6.12-rc3
Linus Torvalds [Sun, 13 Oct 2024 21:33:32 +0000 (14:33 -0700)]
Linux 6.12-rc3

12 months agoMerge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 13 Oct 2024 17:52:39 +0000 (10:52 -0700)]
Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Two fixes for Windows symlink handling"

* tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix creating native symlinks pointing to current or parent directory
  cifs: Improve creating native symlinks pointing to directory

12 months agoMerge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 13 Oct 2024 16:21:36 +0000 (09:21 -0700)]
Merge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for some reported problems for 6.12-rc3.
  Include in here is:

   - fix for yurex driver that was caused in -rc1

   - build error fix for usbg network filesystem code

   - onboard_usb_dev build fix

   - dwc3 driver fixes for reported errors

   - gadget driver fix

   - new USB storage driver quirk

   - xhci resume bugfix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  net/9p/usbg: Fix build error
  USB: yurex: kill needless initialization in yurex_read
  Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
  usb: xhci: Fix problem with xhci resume from suspend
  usb: misc: onboard_usb_dev: introduce new config symbol for usb5744 SMBus support
  usb: dwc3: core: Stop processing of pending events if controller is halted
  usb: dwc3: re-enable runtime PM after failed resume
  usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
  usb: gadget: core: force synchronous registration

12 months agoMerge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 13 Oct 2024 16:10:52 +0000 (09:10 -0700)]
Merge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here is a single driver core fix, and a .mailmap update.

  The fix is for the rust driver core bindings, turned out that the
  from_raw binding wasn't a good idea (don't want to pass a pointer to a
  reference counted object without actually incrementing the pointer.)
  So this change fixes it up as the from_raw binding came in in -rc1.

  The other change is a .mailmap update.

  Both have been in linux-next for a while with no reported issues"

* tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  mailmap: update mail for Fiona Behrens
  rust: device: change the from_raw() function

12 months agoMerge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 13 Oct 2024 00:16:21 +0000 (17:16 -0700)]
Merge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

 - Fix crash in memcpy on 8xx due to dcbz workaround since recent
   changes

Thanks to Christophe Leroy.

* tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Fix kernel DTLB miss on dcbz

12 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 12 Oct 2024 16:24:13 +0000 (09:24 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes, three in drivers and one in the FC transport class
  to add idempotence to state setting"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_fc: Allow setting rport state to current state
  scsi: wd33c93: Don't use stale scsi_pointer value
  scsi: fnic: Move flush_work initialization out of if block
  scsi: ufs: Use pre-calculated offsets in ufshcd_init_lrb()

12 months agoMerge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 12 Oct 2024 16:09:04 +0000 (09:09 -0700)]
Merge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Add missing dependencies on REGMAP_I2C for several drivers

 - Fix memory leak in adt7475 driver

 - Relabel Columbiaville temperature sensor in intel-m10-bmc-hwmon
   driver to match other sensor labels

* tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (max1668) Add missing dependency on REGMAP_I2C
  hwmon: (ltc2991) Add missing dependency on REGMAP_I2C
  hwmon: (adt7470) Add missing dependency on REGMAP_I2C
  hwmon: (adm9240) Add missing dependency on REGMAP_I2C
  hwmon: (mc34vr500) Add missing dependency on REGMAP_I2C
  hwmon: (tmp513) Add missing dependency on REGMAP_I2C
  hwmon: (adt7475) Fix memory leak in adt7475_fan_pwm_config()
  hwmon: intel-m10-bmc-hwmon: relabel Columbiaville to CVL Die Temperature

12 months agoMerge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 11 Oct 2024 23:12:45 +0000 (16:12 -0700)]
Merge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Fixes for build, run-time errors, and reporting errors:

   - ftrace: regression test for a kernel crash when running function
     graph tracing and then enabling function profiler.

   - rseq: fix for mm_cid test failure.

   - vDSO:
      - fixes to reporting skip and other error conditions
      - changes unconditionally build chacha and getrandom tests on all
        architectures to make it easier for them to run in CIs
      - build error when sched.h to bring in CLONE_NEWTIME define"

* tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  ftrace/selftest: Test combination of function_graph tracer and function profiler
  selftests/rseq: Fix mm_cid test failure
  selftests: vDSO: Explicitly include sched.h
  selftests: vDSO: improve getrandom and chacha error messages
  selftests: vDSO: unconditionally build getrandom test
  selftests: vDSO: unconditionally build chacha test

12 months agoMerge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Oct 2024 23:07:15 +0000 (16:07 -0700)]
Merge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Disable kunit tests for arm64+ACPI

 - Fix refcount issue in kunit tests

 - Drop constraints on non-conformant 'interrupt-map' in fsl,ls-extirq

 - Drop type ref on 'msi-parent in fsl,qoriq-mc binding

 - Move elgin,jg10309-01 to its own binding from trivial-devices

* tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Skip kunit tests when arm64+ACPI doesn't populate root node
  of: Fix unbalanced of node refcount and memory leaks
  dt-bindings: interrupt-controller: fsl,ls-extirq: workaround wrong interrupt-map number
  dt-bindings: misc: fsl,qoriq-mc: remove ref for msi-parent
  dt-bindings: display: elgin,jg10309-01: Add own binding

12 months agoMerge tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/delle...
Linus Torvalds [Fri, 11 Oct 2024 22:56:02 +0000 (15:56 -0700)]
Merge tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev platform driver fix from Helge Deller:
 "Switch fbdev drivers back to struct platform_driver::remove()

  Now that 'remove()' has been converted to the sane new API, there's
  no reason for the 'remove_new()' use, so this converts back to the
  traditional and simpler name.

  See commits

     5c5a7680e67b ("platform: Provide a remove callback that returns no value")
     0edb555a65d1 ("platform: Make platform_driver::remove() return void")

  for background to this all"

* tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: Switch back to struct platform_driver::remove()

12 months agoMerge tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Oct 2024 22:42:26 +0000 (15:42 -0700)]
Merge tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix clock handle leak in probe() error path in gpio-aspeed

 - add a dummy register read to ensure the write actually completed

* tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: aspeed: Use devm_clk api to manage clock source
  gpio: aspeed: Add the flush write to ensure the write complete.

12 months agoMerge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 11 Oct 2024 22:37:15 +0000 (15:37 -0700)]
Merge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Localio Bugfixes:
   - remove duplicated include in localio.c
   - fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
   - fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
   - fix nfsd_file tracepoints to handle NULL rqstp pointers

  Other Bugfixes:
   - fix program selection loop in svc_process_common
   - fix integer overflow in decode_rc_list()
   - prevent NULL-pointer dereference in nfs42_complete_copies()
   - fix CB_RECALL performance issues when using a large number of
     delegations"

* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: remove revoked delegation from server's delegation list
  nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
  nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
  nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
  NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
  SUNRPC: Fix integer overflow in decode_rc_list()
  sunrpc: fix prog selection loop in svc_process_common
  nfs: Remove duplicated include in localio.c

12 months agoMerge tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu...
Linus Torvalds [Fri, 11 Oct 2024 21:42:27 +0000 (14:42 -0700)]
Merge tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fix from Neeraj Upadhyay:
 "Fix rcuog kthread wakeup invocation from softirq context on a CPU
  which has been marked offline.

  This can happen when new callbacks are enqueued from a softirq on an
  offline CPU before it calls rcutree_report_cpu_dead(). When this
  happens on NOCB configuration, the rcuog wake-up is deferred through
  an IPI to an online CPU. This is done to avoid call into the scheduler
  which can risk arming the RT-bandwidth after hrtimers have been
  migrated out and disabled.

  However, doing IPI call from softirq is not allowed: Fix this by
  forcing deferred rcuog wakeup through the NOCB timer when the CPU is
  offline"

* tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  rcu/nocb: Fix rcuog wake-up from offline softirq

12 months agoMerge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Oct 2024 21:34:18 +0000 (14:34 -0700)]
Merge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "A fix for topology information of Xen PV guests"

* tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE

12 months agoftrace/selftest: Test combination of function_graph tracer and function profiler
Steven Rostedt [Thu, 10 Oct 2024 20:52:35 +0000 (16:52 -0400)]
ftrace/selftest: Test combination of function_graph tracer and function profiler

Masami reported a bug when running function graph tracing then the
function profiler. The following commands would cause a kernel crash:

  # cd /sys/kernel/tracing/
  # echo function_graph > current_tracer
  # echo 1 > function_profile_enabled

In that order. Create a test to test this two to make sure this does not
come back as a regression.

Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2
Link: https://lore.kernel.org/all/20241010165235.35122877@gandalf.local.home/
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
12 months agoselftests/rseq: Fix mm_cid test failure
Mathieu Desnoyers [Wed, 9 Oct 2024 01:28:01 +0000 (21:28 -0400)]
selftests/rseq: Fix mm_cid test failure

Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by:

glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)")

Without this fix, rseq selftests for mm_cid fail:

./run_param_test.sh
Default parameters
Running test spinlock
Running compare-twice test spinlock
Running mm_cid test spinlock
Error: cpu id getter unavailable

Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: linux-kselftest@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
12 months agoMerge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 11 Oct 2024 19:00:21 +0000 (12:00 -0700)]
Merge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Explicitly have a mshot_finished condition for IORING_OP_RECV in
   multishot mode, similarly to what IORING_OP_RECVMSG has. This doesn't
   fix a bug right now, but it makes it harder to actually have a bug
   here if a request takes multiple iterations to finish.

 - Fix handling of retry of read/write of !FMODE_NOWAIT files. If they
   are pollable, that's all we need.

* tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux:
  io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT
  io_uring/rw: fix cflags posting for single issue multishot read

12 months agoMerge tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 11 Oct 2024 18:41:20 +0000 (11:41 -0700)]
Merge tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These address two issues in the TPMI module of the Intel RAPL power
  capping driver and one issue in the processor part of the Intel
  int340x thermal driver, update a CPU ID list and register definitions
  needed for RAPL PL4 support and remove some unused code.

  Specifics:

   - Fix the TPMI_RAPL_REG_DOMAIN_INFO register offset in the TPMI part
     of the Intel RAPL power capping driver, make it ignore minor
     hardware version mismatches (which only indicate exposing
     additional features) and update register definitions in it to
     enable PL4 support (Zhang Rui)

   - Add Arrow Lake-U to the list of processors supporting PL4 in the
     MSR part of the Intel RAPL power capping driver (Sumeet Pawnikar)

   - Remove excess pci_disable_device() calls from the processor part of
     the int340x thermal driver to address a warning triggered during
     module unload and remove unused CPU hotplug code related to RAPL
     support from it (Zhang Rui)"

* tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: int340x: processor: Add MMIO RAPL PL4 support
  thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug support
  powercap: intel_rapl_msr: Add PL4 support for Arrowlake-U
  powercap: intel_rapl_tpmi: Ignore minor version change
  thermal: intel: int340x: processor: Fix warning during module unload
  powercap: intel_rapl_tpmi: Fix bogus register reading

12 months agoMerge tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 11 Oct 2024 18:35:30 +0000 (11:35 -0700)]
Merge tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "Address possible use-after-free scenarios during the processing of
  thermal netlink commands and during thermal zone removal (Rafael
  Wysocki)"

* tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Free tzp copy along with the thermal zone
  thermal: core: Reference count the zone in thermal_zone_get_by_id()

12 months agoMerge tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 11 Oct 2024 18:32:10 +0000 (11:32 -0700)]
Merge tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "Reduce the number of ACPI IRQ override DMI quirks by combining quirks
  that cover similar systems while making them cover additional models
  at the same time (Hans de Goede)"

* tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Fold Asus Vivobook Pro N6506M* DMI quirks together
  ACPI: resource: Fold Asus ExpertBook B1402C* and B1502C* DMI quirks together
  ACPI: resource: Make Asus ExpertBook B2502 matches cover more models
  ACPI: resource: Make Asus ExpertBook B2402 matches cover more models

12 months agoMerge tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh...
Linus Torvalds [Fri, 11 Oct 2024 18:26:15 +0000 (11:26 -0700)]
Merge tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:
 "pmdomain core:
   - Fix alloc/free in dev_pm_domain_attach|detach_list()

  pmdomain providers:
   - qcom: Fix the return of uninitialized variable

  pmdomain consumers:
   - drm/tegra/gr3d: Revert conversion to dev_pm_domain_attach|detach_list()

  OPP core:
   - Fix error code in dev_pm_opp_set_config()"

* tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
  Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
  pmdomain: qcom-cpr: Fix the return of uninitialized variable
  OPP: fix error code in dev_pm_opp_set_config()

12 months agoMerge tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 11 Oct 2024 18:23:21 +0000 (11:23 -0700)]
Merge tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Prevent splat from warning when setting maximum DMA segment

  MMC host:
   - mvsdio: Drop sg_miter support for PIO as it didn't work
   - sdhci-of-dwcmshc: Prevent stale interrupt for the T-Head 1520
     variant"

* tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
  Revert "mmc: mvsdio: Use sg_miter for PIO"
  mmc: core: Only set maximum DMA segment size if DMA is supported

12 months agoMerge tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata...
Linus Torvalds [Fri, 11 Oct 2024 18:18:31 +0000 (11:18 -0700)]
Merge tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Niklas Cassel:

 - Fix a hibernate regression where the disk was needlessly spun down
   and then immediately spun up both when entering and when resuming
   from hibernation (me)

 - Update the MAINTAINERS file to remove remnants from Jens
   maintainership of libata (Damien)

* tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata: Update MAINTAINERS file
  ata: libata: avoid superfluous disk spin down + spin up during hibernation

12 months agoMerge tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 11 Oct 2024 18:13:05 +0000 (11:13 -0700)]
Merge tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes haul for drm, lots of small fixes all over, amdgpu, xe
  lead the way, some minor nouveau and radeon fixes, and then a bunch of
  misc all over.

  Nothing too scary or out of the unusual.

  sched:
   - Avoid leaking lockdep map

  fbdev-dma:
   - Only clean up deferred I/O if instanciated

  amdgpu:
   - Fix invalid UBSAN warnings
   - Fix artifacts in MPO transitions
   - Hibernation fix

  amdkfd:
   - Fix an eviction fence leak

  radeon:
   - Add late register for connectors
   - Always set GEM function pointers

  i915:
   - HDCP refcount fix

  nouveau:
   - dmem: Fix privileged error in copy engine channel; Fix possible
     data leak in migrate_to_ram()
   - gsp: Fix coding style

  v3d:
   - Stop active perfmon before destroying it

  vc4:
   - Stop active perfmon before destroying it

  xe:
   - Drop GuC submit_wq pool
   - Fix error checking with xa_store()
   - Fix missing freq restore on GSC load error
   - Fix wedged_mode file permission
   - Fix use-after-free in ct communication"

* tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel:
  drm/fbdev-dma: Only cleanup deferred I/O if necessary
  drm/xe: Make wedged_mode debugfs writable
  drm/xe: Restore GT freq on GSC load error
  drm/xe/guc_submit: fix xa_store() error checking
  drm/xe/ct: fix xa_store() error checking
  drm/xe/ct: prevent UAF in send_recv()
  drm/radeon: always set GEM function pointer
  nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error
  nouveau/dmem: Fix privileged error in copy engine channel
  drm/amd/display: fix hibernate entry for DCN35+
  drm/amd/display: Clear update flags after update has been applied
  drm/amdgpu: partially revert powerplay `__counted_by` changes
  drm/radeon: add late_register for connector
  drm/amdkfd: Fix an eviction fence leak
  drm/vc4: Stop the active perfmon before being destroyed
  drm/v3d: Stop the active perfmon before being destroyed
  drm/i915/hdcp: fix connector refcounting
  drm/nouveau/gsp: remove extraneous ; after mutex
  drm/xe: Drop GuC submit_wq pool
  drm/sched: Use drm sched lockdep map for submit_wq

12 months agopowerpc/8xx: Fix kernel DTLB miss on dcbz
Christophe Leroy [Sat, 5 Oct 2024 08:53:29 +0000 (10:53 +0200)]
powerpc/8xx: Fix kernel DTLB miss on dcbz

Following OOPS is encountered while loading test_bpf module
on powerpc 8xx:

[  218.835567] BUG: Unable to handle kernel data access on write at 0xcb000000
[  218.842473] Faulting instruction address: 0xc0017a80
[  218.847451] Oops: Kernel access of bad area, sig: 11 [#1]
[  218.852854] BE PAGE_SIZE=16K PREEMPT CMPC885
[  218.857207] SAF3000 DIE NOTIFICATION
[  218.860713] Modules linked in: test_bpf(+) test_module
[  218.865867] CPU: 0 UID: 0 PID: 527 Comm: insmod Not tainted 6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty #1280
[  218.875546] Hardware name: MIAE 8xx 0x500000 CMPC885
[  218.880521] NIP:  c0017a80 LR: beab859c CTR: 000101d4
[  218.885584] REGS: cac2bc90 TRAP: 0300   Not tainted  (6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty)
[  218.894308] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 55005555  XER: a0007100
[  218.901290] DAR: cb000000 DSISR: c2000000
[  218.901290] GPR00: 000185d1 cac2bd50 c21b9580 caf7c030 c3883fcc 00000008 cafffffc 00000000
[  218.901290] GPR08: 00040000 18300000 20000000 00000004 99005555 100d815e ca669d08 00000369
[  218.901290] GPR16: ca730000 00000000 ca2c004c 00000000 00000000 0000035d 00000311 00000369
[  218.901290] GPR24: ca732240 00000001 00030ba3 c3800000 00000000 00185d48 caf7c000 ca2c004c
[  218.941087] NIP [c0017a80] memcpy+0x88/0xec
[  218.945277] LR [beab859c] test_bpf_init+0x22c/0x3c90 [test_bpf]
[  218.951476] Call Trace:
[  218.953916] [cac2bd50] [beab8570] test_bpf_init+0x200/0x3c90 [test_bpf] (unreliable)
[  218.962034] [cac2bde0] [c0004c04] do_one_initcall+0x4c/0x1fc
[  218.967706] [cac2be40] [c00a2ec4] do_init_module+0x68/0x360
[  218.973292] [cac2be60] [c00a5194] init_module_from_file+0x8c/0xc0
[  218.979401] [cac2bed0] [c00a5568] sys_finit_module+0x250/0x3f0
[  218.985248] [cac2bf20] [c000e390] system_call_exception+0x8c/0x15c
[  218.991444] [cac2bf30] [c00120a8] ret_from_syscall+0x0/0x28

This happens in the main loop of memcpy()

  ==> c0017a80: 7c 0b 37 ec  dcbz    r11,r6
c0017a84: 80 e4 00 04  lwz     r7,4(r4)
c0017a88: 81 04 00 08  lwz     r8,8(r4)
c0017a8c: 81 24 00 0c  lwz     r9,12(r4)
c0017a90: 85 44 00 10  lwzu    r10,16(r4)
c0017a94: 90 e6 00 04  stw     r7,4(r6)
c0017a98: 91 06 00 08  stw     r8,8(r6)
c0017a9c: 91 26 00 0c  stw     r9,12(r6)
c0017aa0: 95 46 00 10  stwu    r10,16(r6)
c0017aa4: 42 00 ff dc  bdnz    c0017a80 <memcpy+0x88>

Commit ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in
DTLB misses") relies on re-reading DAR register to know if an error is
due to a missing copy of a PMD entry in task's PGDIR, allthough DAR
was already read in the exception prolog and copied into thread
struct. This is because is it done very early in the exception and
there are not enough registers available to keep a pointer to thread
struct.

However, dcbz instruction is buggy and doesn't update DAR register on
fault. That is detected and generates a call to FixupDAR workaround
which updates DAR copy in thread struct but doesn't fix DAR register.

Let's fix DAR in addition to the update of DAR copy in thread struct.

Fixes: ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2b851399bd87e81c6ccb87ea3a7a6b32c7aa04d7.1728118396.git.christophe.leroy@csgroup.eu
12 months agoMerge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 11 Oct 2024 03:54:05 +0000 (13:54 +1000)]
Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix error checking with xa_store() (Matthe Auld)
- Fix missing freq restore on GSC load error (Vinay)
- Fix wedged_mode file permission (Matt Roper)
- Fix use-after-free in ct communication (Matthew Auld)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/jri65tmv3bjbhqhxs5smv45nazssxzhtwphojem4uufwtjuliy@gsdhlh6kzsdy
12 months agoMerge tag 'drm-misc-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 10 Oct 2024 23:03:20 +0000 (09:03 +1000)]
Merge tag 'drm-misc-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

fbdev-dma:
- Only clean up deferred I/O if instanciated

nouveau:
- dmem: Fix privileged error in copy engine channel; Fix possible
data leak in migrate_to_ram()
- gsp: Fix coding style

sched:
- Avoid leaking lockdep map

v3d:
- Stop active perfmon before destroying it

vc4:
- Stop active perfmon before destroying it

xe:
- Drop GuC submit_wq pool

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241010133708.GA461532@localhost.localdomain
12 months agoMerge tag 'drm-intel-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 10 Oct 2024 22:55:26 +0000 (08:55 +1000)]
Merge tag 'drm-intel-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- HDCP refcount fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zwd78Tnw8t3w9F16@jlahtine-mobl.ger.corp.intel.com
12 months agoMerge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 10 Oct 2024 19:36:35 +0000 (12:36 -0700)]
Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - dsa: sja1105: fix reception from VLAN-unaware bridges

   - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
     enabled"

   - eth: fec: don't save PTP state if PTP is unsupported

  Current release - new code bugs:

   - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref

   - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop

   - phy: aquantia: AQR115c fix up PMA capabilities

  Previous releases - regressions:

   - tcp: 3 fixes for retrans_stamp and undo logic

  Previous releases - always broken:

   - net: do not delay dst_entries_add() in dst_release()

   - netfilter: restrict xtables extensions to families that are safe,
     syzbot found a way to combine ebtables with extensions that are
     never used by userspace tools

   - sctp: ensure sk_state is set to CLOSED if hashing fails in
     sctp_listen_start

   - mptcp: handle consistently DSS corruption, and prevent corruption
     due to large pmtu xmit"

* tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  MAINTAINERS: Add headers and mailing list to UDP section
  MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
  slip: make slhc_remember() more robust against malicious packets
  net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
  ppp: fix ppp_async_encode() illegal access
  docs: netdev: document guidance on cleanup patches
  phonet: Handle error of rtnl_register_module().
  mpls: Handle error of rtnl_register_module().
  mctp: Handle error of rtnl_register_module().
  bridge: Handle error of rtnl_register_module().
  vxlan: Handle error of rtnl_register_module().
  rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
  net: do not delay dst_entries_add() in dst_release()
  mptcp: pm: do not remove closing subflows
  mptcp: fallback when MPTCP opts are dropped after 1st data
  tcp: fix mptcp DSS corruption due to large pmtu xmit
  mptcp: handle consistently DSS corruption
  net: netconsole: fix wrong warning
  net: dsa: refuse cross-chip mirroring operations
  net: fec: don't save PTP state if PTP is unsupported
  ...

12 months agoMerge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 10 Oct 2024 19:25:32 +0000 (12:25 -0700)]
Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
  callbacks

  When a ring buffer is mapped to memory assigned at boot, it also
  splits it up evenly between the possible CPUs. But the allocation code
  still attached a CPU notifier callback to this ring buffer. When a CPU
  is added, the callback will happen and another per-cpu buffer is
  created for the ring buffer.

  But for boot mapped buffers, there is no room to add another one (as
  they were all created already). The result of calling the CPU hotplug
  notifier on a boot mapped ring buffer is unpredictable and could lead
  to a system crash.

  If the ring buffer is boot mapped simply do not attach the CPU
  notifier to it"

* tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not have boot mapped buffers hook to CPU hotplug

12 months agoof: Skip kunit tests when arm64+ACPI doesn't populate root node
Stephen Boyd [Wed, 9 Oct 2024 20:41:31 +0000 (13:41 -0700)]
of: Skip kunit tests when arm64+ACPI doesn't populate root node

A root node is required to apply DT overlays. A root node is usually
present after commit 7b937cc243e5 ("of: Create of_root if no dtb
provided by firmware"), except for on arm64 systems booted with ACPI
tables. In that case, the root node is intentionally not populated
because it would "allow DT devices to be instantiated atop an ACPI base
system"[1].

Introduce an OF function that skips the kunit test if the root node
isn't populated. Limit the test to when both CONFIG_ARM64 and
CONFIG_ACPI are set, because otherwise the lack of a root node is a bug.
Make the function private and take a kunit test parameter so that it
can't be abused to test for the presence of the root node in non-test
code.

Use this function to skip tests that require the root node. Currently
that's the DT tests and any tests that apply overlays.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/r/6cd337fb-38f0-41cb-b942-5844b84433db@roeck-us.net
Link: https://lore.kernel.org/r/Zd4dQpHO7em1ji67@FVFF77S0Q05N.cambridge.arm.com
Fixes: 893ecc6d2d61 ("of: Add KUnit test to confirm DTB is loaded")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241009204133.1169931-1-sboyd@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
12 months agoMerge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 10 Oct 2024 17:02:59 +0000 (10:02 -0700)]
Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - update fstrim loop and add more cancellation points, fix reported
   delayed or blocked suspend if there's a huge chunk queued

 - fix error handling in recent qgroup xarray conversion

 - in zoned mode, fix warning printing device path without RCU
   protection

 - again fix invalid extent xarray state (6252690f7e1b), lost due to
   refactoring

* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
  btrfs: zoned: fix missing RCU locking in error message when loading zone info
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: add cancellation points to trim loops
  btrfs: split remaining space to discard in chunks

12 months agoMerge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Thu, 10 Oct 2024 16:52:49 +0000 (09:52 -0700)]
Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix NFSD bring-up / shutdown

 - Fix a UAF when releasing a stateid

* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix possible badness in FREE_STATEID
  nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
  NFSD: Mark filecache "down" if init fails

12 months agorcu/nocb: Fix rcuog wake-up from offline softirq
Frederic Weisbecker [Thu, 10 Oct 2024 16:36:09 +0000 (18:36 +0200)]
rcu/nocb: Fix rcuog wake-up from offline softirq

After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from a softirq. When that happens on NOCB,
the rcuog wake-up is deferred through an IPI to an online CPU in order
not to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from a softirq is buggy as reported in
the following scenario:

        WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
        Modules linked in: rcutorture torture
        CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
        Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
        RIP: 0010:smp_call_function_single
        <IRQ>
        swake_up_one_online
        __call_rcu_nocb_wake
        __call_rcu_common
        ? rcu_torture_one_read
        call_timer_fn
        __run_timers
        run_timer_softirq
        handle_softirqs
        irq_exit_rcu
        ? tick_handle_periodic
        sysvec_apic_timer_interrupt
        </IRQ>

Fix this with forcing deferred rcuog wake up through the NOCB timer when
the CPU is offline. The actual wake up will happen from
rcutree_report_cpu_dead().

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com
Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Reviewed-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
12 months agoMerge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 10 Oct 2024 16:45:45 +0000 (09:45 -0700)]
Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

 - A few small typo fixes

 - fstests xfs/538 DEBUG-only fix

 - Performance fix on blockgc on COW'ed files, by skipping trims on
   cowblock inodes currently opened for write

 - Prevent cowblocks to be freed under dirty pagecache during unshare

 - Update MAINTAINERS file to quote the new maintainer

* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix a typo
  xfs: don't free cowblocks from under dirty pagecache on unshare
  xfs: skip background cowblock trims on inodes open for write
  xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
  xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
  xfs: don't ifdef around the exact minlen allocations
  xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
  xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
  xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
  xfs: return bool from xfs_attr3_leaf_add
  xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
  xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
  xfs: scrub: convert comma to semicolon
  xfs: Remove empty declartion in header file
  MAINTAINERS: add Carlos Maiolino as XFS release manager

12 months agoMerge branch 'maintainers-networking-file-coverage-updates'
Jakub Kicinski [Thu, 10 Oct 2024 16:35:50 +0000 (09:35 -0700)]
Merge branch 'maintainers-networking-file-coverage-updates'

Simon Horman says:

====================
MAINTAINERS: Networking file coverage updates

The aim of this proposal is to make the handling of some files,
related to Networking and Wireless, more consistently. It does so by:

1. Adding some more headers to the UDP section, making it consistent
   with the TCP section.

2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
   making their handling consistent with other files related to
   Wireless.

The aim of this is to make things more consistent.  And for MAINTAINERS
to better reflect the situation on the ground.  I am more than happy to
be told that the current state of affairs is fine. Or for other ideas to
be discussed.

v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org
====================

Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMAINTAINERS: Add headers and mailing list to UDP section
Simon Horman [Wed, 9 Oct 2024 08:47:23 +0000 (09:47 +0100)]
MAINTAINERS: Add headers and mailing list to UDP section

Add netdev mailing list and some more udp.h headers to the UDP section.
This is now more consistent with the TCP section.

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
Simon Horman [Wed, 9 Oct 2024 08:47:22 +0000 (09:47 +0100)]
MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]

We already exclude wireless drivers from the netdev@ traffic, to
delegate it to linux-wireless@, and avoid overwhelming netdev@.

Many of the following wireless-related sections MAINTAINERS
are already not included in the NETWORKING [GENERAL] section.
For consistency, exclude those that are.

* 802.11 (including CFG80211/NL80211)
* MAC80211
* RFKILL

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoslip: make slhc_remember() more robust against malicious packets
Eric Dumazet [Wed, 9 Oct 2024 09:11:32 +0000 (09:11 +0000)]
slip: make slhc_remember() more robust against malicious packets

syzbot found that slhc_remember() was missing checks against
malicious packets [1].

slhc_remember() only checked the size of the packet was at least 20,
which is not good enough.

We need to make sure the packet includes the IPv4 and TCP header
that are supposed to be carried.

Add iph and th pointers to make the code more readable.

[1]

BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
  slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
  ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
  ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
  ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
  ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4091 [inline]
  slab_alloc_node mm/slub.c:4134 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
D. Wythe [Wed, 9 Oct 2024 06:55:16 +0000 (14:55 +0800)]
net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC

Eric report a panic on IPPROTO_SMC, and give the facts
that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too.

Bug: Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
ESR = 0x0000000086000005
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000
[0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003,
pud=0000000000000000
Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
6.11.0-rc7-syzkaller-g5f5673607153 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/06/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
sp : ffff80009b887a90
x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000
x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00
x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000
x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee
x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001
x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003
x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000
Call trace:
0x0
netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
security_socket_post_create+0x94/0xd4 security/security.c:4425
__sock_create+0x4c8/0x884 net/socket.c:1587
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x134/0x340 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1718
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: ???????? ???????? ???????? ???????? (????????)
---[ end trace 0000000000000000 ]---

This patch add a toy implementation that performs a simple return to
prevent such panic. This is because MSS can be set in sock_create_kern
or smc_setsockopt, similar to how it's done in AF_SMC. However, for
AF_SMC, there is currently no way to synchronize MSS within
__sys_connect_file. This toy implementation lays the groundwork for us
to support such feature for IPPROTO_SMC in the future.

Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoppp: fix ppp_async_encode() illegal access
Eric Dumazet [Wed, 9 Oct 2024 18:58:02 +0000 (18:58 +0000)]
ppp: fix ppp_async_encode() illegal access

syzbot reported an issue in ppp_async_encode() [1]

In this case, pppoe_sendmsg() is called with a zero size.
Then ppp_async_encode() is called with an empty skb.

BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
 BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
  ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
  ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
  ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4092 [inline]
  slab_alloc_node mm/slub.c:4135 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agodocs: netdev: document guidance on cleanup patches
Simon Horman [Wed, 9 Oct 2024 09:12:19 +0000 (10:12 +0100)]
docs: netdev: document guidance on cleanup patches

The purpose of this section is to document what is the current practice
regarding clean-up patches which address checkpatch warnings and similar
problems. I feel there is a value in having this documented so others
can easily refer to it.

Clearly this topic is subjective. And to some extent the current
practice discourages a wider range of patches than is described here.
But I feel it is best to start somewhere, with the most well established
part of the current practice.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge branch 'rtnetlink-handle-error-of-rtnl_register_module'
Paolo Abeni [Thu, 10 Oct 2024 13:39:37 +0000 (15:39 +0200)]
Merge branch 'rtnetlink-handle-error-of-rtnl_register_module'

Kuniyuki Iwashima says:

====================
rtnetlink: Handle error of rtnl_register_module().

While converting phonet to per-netns RTNL, I found a weird comment

  /* Further rtnl_register_module() cannot fail */

that was true but no longer true after commit addf9b90de22 ("net:
rtnetlink: use rcu to free rtnl message handlers").

Many callers of rtnl_register_module() just ignore the returned
value but should handle them properly.

This series introduces two helpers, rtnl_register_many() and
rtnl_unregister_many(), to do that easily and fix such callers.

All rtnl_register() and rtnl_register_module() will be converted
to _many() variant and some rtnl_lock() will be saved in _many()
later in net-next.

Changes:
  v4:
    * Add more context in changelog of each patch

  v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/
    * Move module *owner to struct rtnl_msg_handler
    * Make struct rtnl_msg_handler args/vars const
    * Update mctp goto labels

  v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/
    * Remove __exit from mctp_neigh_exit().

  v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agophonet: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:37 +0000 (11:47 -0700)]
phonet: Handle error of rtnl_register_module().

Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl
message handlers"), once the first rtnl_register_module() allocated
rtnl_msg_handlers[PF_PHONET], the following calls never failed.

However, after the commit, rtnl_register_module() could fail silently
to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error
handling for each call.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's use rtnl_register_many() to handle the errors easily.

Fixes: addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agompls: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:36 +0000 (11:47 -0700)]
mpls: Handle error of rtnl_register_module().

Since introduced, mpls_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: 03c0566542f4 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agomctp: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:35 +0000 (11:47 -0700)]
mctp: Handle error of rtnl_register_module().

Since introduced, mctp has been ignoring the returned value of
rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: 583be982d934 ("mctp: Add device handling and netlink interface")
Fixes: 831119f88781 ("mctp: Add neighbour netlink interface")
Fixes: 06d2f4c583a7 ("mctp: Add netlink route management")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agobridge: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:34 +0000 (11:47 -0700)]
bridge: Handle error of rtnl_register_module().

Since introduced, br_vlan_rtnl_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support")
Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support")
Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agovxlan: Handle error of rtnl_register_module().
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:33 +0000 (11:47 -0700)]
vxlan: Handle error of rtnl_register_module().

Since introduced, vxlan_vnifilter_init() has been ignoring the
returned value of rtnl_register_module(), which could fail silently.

Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality.  This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.

Let's handle the errors by rtnl_register_many().

Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agortnetlink: Add bulk registration helpers for rtnetlink message handlers.
Kuniyuki Iwashima [Tue, 8 Oct 2024 18:47:32 +0000 (11:47 -0700)]
rtnetlink: Add bulk registration helpers for rtnetlink message handlers.

Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message
handlers"), once rtnl_msg_handlers[protocol] was allocated, the following
rtnl_register_module() for the same protocol never failed.

However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to
be allocated in each rtnl_register_module(), so each call could fail.

Many callers of rtnl_register_module() do not handle the returned error,
and we need to add many error handlings.

To handle that easily, let's add wrapper functions for bulk registration
of rtnetlink message handlers.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agoPM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
Ulf Hansson [Wed, 2 Oct 2024 12:22:23 +0000 (14:22 +0200)]
PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()

The dev_pm_domain_attach|detach_list() functions are not resource managed,
hence they should not use devm_* helpers to manage allocation/freeing of
data. Let's fix this by converting to the traditional alloc/free functions.

Fixes: 161e16a5e50a ("PM: domains: Add helper functions to attach/detach multiple PM domains")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20241002122232.194245-3-ulf.hansson@linaro.org
12 months agoRevert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
Ulf Hansson [Wed, 2 Oct 2024 12:22:22 +0000 (14:22 +0200)]
Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"

This reverts commit f790b5c09665cab0d51dfcc84832d79d2b1e6c0e.

The reverted commit was not ready to be applied due to dependency on other
OPP/pmdomain changes that didn't make it for the last release cycle. Let's
revert it to fix the behaviour.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20241002122232.194245-2-ulf.hansson@linaro.org
12 months agoMerge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 10 Oct 2024 11:50:55 +0000 (13:50 +0200)]
Merge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Restrict xtables extensions to families that are safe, syzbot found
   a way to combine ebtables with extensions that are never used by
   userspace tools. From Florian Westphal.

2) Set l3mdev inconditionally whenever possible in nft_fib to fix lookup
   mismatch, also from Florian.

netfilter pull request 24-10-09

* tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: conntrack_vrf.sh: add fib test case
  netfilter: fib: check correct rtable in vrf setups
  netfilter: xtables: avoid NFPROTO_UNSPEC where needed
====================

Link: https://patch.msgid.link/20241009213858.3565808-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agommc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
Michal Wilczynski [Tue, 8 Oct 2024 10:03:27 +0000 (12:03 +0200)]
mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling

While working with the T-Head 1520 LicheePi4A SoC, certain conditions
arose that allowed me to reproduce a race issue in the sdhci code.

To reproduce the bug, you need to enable the sdio1 controller in the
device tree file
`arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows:

&sdio1 {
bus-width = <4>;
max-frequency = <100000000>;
no-sd;
no-mmc;
broken-cd;
cap-sd-highspeed;
post-power-on-delay-ms = <50>;
status = "okay";
wakeup-source;
keep-power-in-suspend;
};

When resetting the SoC using the reset button, the following messages
appear in the dmesg log:

[    8.164898] mmc2: Got command interrupt 0x00000001 even though no
command operation was in progress.
[    8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[    8.180503] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00000005
[    8.186950] mmc2: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[    8.193395] mmc2: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
[    8.199841] mmc2: sdhci: Present:   0x03da0000 | Host ctl: 0x00000000
[    8.206287] mmc2: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    8.212733] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x0000decf
[    8.219178] mmc2: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000
[    8.225622] mmc2: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
[    8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[    8.238513] mmc2: sdhci: Caps:      0x3f69c881 | Caps_1:   0x08008177
[    8.244959] mmc2: sdhci: Cmd:       0x00000502 | Max curr: 0x00191919
[    8.254115] mmc2: sdhci: Resp[0]:   0x00001009 | Resp[1]:  0x00000000
[    8.260561] mmc2: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[    8.267005] mmc2: sdhci: Host ctl2: 0x00001000
[    8.271453] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
0x0000000000000000
[    8.278594] mmc2: sdhci: ============================================

I also enabled some traces to better understand the problem:

     kworker/3:1-62      [003] .....     8.163538: mmc_request_start:
mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0
stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0
sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0
can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1
retune_period=0
          <idle>-0       [000] d.h2.     8.164816: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000
intmask_p=0x18000
     irq/24-mmc2-96      [000] .....     8.164840: sdhci_thread_irq:
msg=
     irq/24-mmc2-96      [000] d.h2.     8.164896: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1
intmask_p=0x1
     irq/24-mmc2-96      [000] .....     8.285142: mmc_request_done:
mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0
stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0
sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0
data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0
hold_retune=1 retune_period=0

Here's what happens: the __mmc_start_request function is called with
opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO
bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is
triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE
is triggered. Depending on the exact timing, these conditions can
trigger the following race problem:

1) The sdhci_cmd_irq top half handles the command as an error. It sets
   host->cmd to NULL and host->pending_reset to true.
2) The sdhci_thread_irq bottom half is scheduled next and executes faster
   than the second interrupt handler for SDHCI_INT_RESPONSE. It clears
   host->pending_reset before the SDHCI_INT_RESPONSE handler runs.
3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering
   a code path that prints: "mmc2: Got command interrupt 0x00000001 even
   though no command operation was in progress."

To solve this issue, we need to clear pending interrupts when resetting
host->pending_reset. This ensures that after sdhci_threaded_irq restores
interrupts, there are no pending stale interrupts.

The behavior observed here is non-compliant with the SDHCI standard.
Place the code in the sdhci-of-dwcmshc driver to account for a
hardware-specific quirk instead of the core SDHCI code.

Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 43658a542ebf ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241008100327.4108895-1-m.wilczynski@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
12 months agonet: do not delay dst_entries_add() in dst_release()
Eric Dumazet [Tue, 8 Oct 2024 14:31:10 +0000 (14:31 +0000)]
net: do not delay dst_entries_add() in dst_release()

dst_entries_add() uses per-cpu data that might be freed at netns
dismantle from ip6_route_net_exit() calling dst_entries_destroy()

Before ip6_route_net_exit() can be called, we release all
the dsts associated with this netns, via calls to dst_release(),
which waits an rcu grace period before calling dst_destroy()

dst_entries_add() use in dst_destroy() is racy, because
dst_entries_destroy() could have been called already.

Decrementing the number of dsts must happen sooner.

Notes:

1) in CONFIG_XFRM case, dst_destroy() can call
   dst_release_immediate(child), this might also cause UAF
   if the child does not have DST_NOCOUNT set.
   IPSEC maintainers might take a look and see how to address this.

2) There is also discussion about removing this count of dst,
   which might happen in future kernels.

Fixes: f88649721268 ("ipv4: fix dst race in sk_dst_get()")
Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agoata: libata: Update MAINTAINERS file
Damien Le Moal [Thu, 10 Oct 2024 02:01:17 +0000 (11:01 +0900)]
ata: libata: Update MAINTAINERS file

Modify the entry for the ahci_platform driver (LIBATA SATA
AHCI PLATFORM devices support) in the MAINTAINERS file to remove Jens
as maintainer. Also remove all references to Jens block tree from the
various LIBATA driver entries as the tree reference for these is defined
by the LIBATA SUBSYSTEM entry.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20241010020117.416333-1-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
12 months agodrm/fbdev-dma: Only cleanup deferred I/O if necessary
Janne Grunau [Sun, 6 Oct 2024 17:49:45 +0000 (19:49 +0200)]
drm/fbdev-dma: Only cleanup deferred I/O if necessary

Commit 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if
necessary") initializes deferred I/O only if it is used.
drm_fbdev_dma_fb_destroy() however calls fb_deferred_io_cleanup()
unconditionally with struct fb_info.fbdefio == NULL. KASAN with the
out-of-tree Apple silicon display driver posts following warning from
__flush_work() of a random struct work_struct instead of the expected
NULL pointer derefs.

[   22.053799] ------------[ cut here ]------------
[   22.054832] WARNING: CPU: 2 PID: 1 at kernel/workqueue.c:4177 __flush_work+0x4d8/0x580
[   22.056597] Modules linked in: uhid bnep uinput nls_ascii ip6_tables ip_tables i2c_dev loop fuse dm_multipath nfnetlink zram hid_magicmouse btrfs xor xor_neon brcmfmac_wcc raid6_pq hci_bcm4377 bluetooth brcmfmac hid_apple brcmutil nvmem_spmi_mfd simple_mfd_spmi dockchannel_hid cfg80211 joydev regmap_spmi nvme_apple ecdh_generic ecc macsmc_hid rfkill dwc3 appledrm snd_soc_macaudio macsmc_power nvme_core apple_isp phy_apple_atc apple_sart apple_rtkit_helper apple_dockchannel tps6598x macsmc_hwmon snd_soc_cs42l84 videobuf2_v4l2 spmi_apple_controller nvmem_apple_efuses videobuf2_dma_sg apple_z2 videobuf2_memops spi_nor panel_summit videobuf2_common asahi videodev pwm_apple apple_dcp snd_soc_apple_mca apple_admac spi_apple clk_apple_nco i2c_pasemi_platform snd_pcm_dmaengine mc i2c_pasemi_core mux_core ofpart adpdrm drm_dma_helper apple_dart apple_soc_cpufreq leds_pwm phram
[   22.073768] CPU: 2 UID: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.11.2-asahi+ #asahi-dev
[   22.075612] Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
[   22.077032] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   22.078567] pc : __flush_work+0x4d8/0x580
[   22.079471] lr : __flush_work+0x54/0x580
[   22.080345] sp : ffffc000836ef820
[   22.081089] x29: ffffc000836ef880 x28: 0000000000000000 x27: ffff80002ddb7128
[   22.082678] x26: dfffc00000000000 x25: 1ffff000096f0c57 x24: ffffc00082d3e358
[   22.084263] x23: ffff80004b7862b8 x22: dfffc00000000000 x21: ffff80005aa1d470
[   22.085855] x20: ffff80004b786000 x19: ffff80004b7862a0 x18: 0000000000000000
[   22.087439] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000005
[   22.089030] x14: 1ffff800106ddf0a x13: 0000000000000000 x12: 0000000000000000
[   22.090618] x11: ffffb800106ddf0f x10: dfffc00000000000 x9 : 1ffff800106ddf0e
[   22.092206] x8 : 0000000000000000 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000001
[   22.093790] x5 : ffffc000836ef728 x4 : 0000000000000000 x3 : 0000000000000020
[   22.095368] x2 : 0000000000000008 x1 : 00000000000000aa x0 : 0000000000000000
[   22.096955] Call trace:
[   22.097505]  __flush_work+0x4d8/0x580
[   22.098330]  flush_delayed_work+0x80/0xb8
[   22.099231]  fb_deferred_io_cleanup+0x3c/0x130
[   22.100217]  drm_fbdev_dma_fb_destroy+0x6c/0xe0 [drm_dma_helper]
[   22.101559]  unregister_framebuffer+0x210/0x2f0
[   22.102575]  drm_fb_helper_unregister_info+0x48/0x60
[   22.103683]  drm_fbdev_dma_client_unregister+0x4c/0x80 [drm_dma_helper]
[   22.105147]  drm_client_dev_unregister+0x1cc/0x230
[   22.106217]  drm_dev_unregister+0x58/0x570
[   22.107125]  apple_drm_unbind+0x50/0x98 [appledrm]
[   22.108199]  component_del+0x1f8/0x3a8
[   22.109042]  dcp_platform_shutdown+0x24/0x38 [apple_dcp]
[   22.110357]  platform_shutdown+0x70/0x90
[   22.111219]  device_shutdown+0x368/0x4d8
[   22.112095]  kernel_restart+0x6c/0x1d0
[   22.112946]  __arm64_sys_reboot+0x1c8/0x328
[   22.113868]  invoke_syscall+0x78/0x1a8
[   22.114703]  do_el0_svc+0x124/0x1a0
[   22.115498]  el0_svc+0x3c/0xe0
[   22.116181]  el0t_64_sync_handler+0x70/0xc0
[   22.117110]  el0t_64_sync+0x190/0x198
[   22.117931] ---[ end trace 0000000000000000 ]---

Signed-off-by: Janne Grunau <j@jannau.net>
Fixes: 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/ZwLNuZL-8Gh5UUQb@robin
12 months agoof: Fix unbalanced of node refcount and memory leaks
Jinjie Ruan [Thu, 10 Oct 2024 03:44:16 +0000 (11:44 +0800)]
of: Fix unbalanced of node refcount and memory leaks

Got following report when doing overlay_test:

OF: ERROR: memory leak, expected refcount 1 instead of 2,
of_node_get()/of_node_put() unbalanced - destroy cset entry:
attach overlay node            /kunit-test

OF: ERROR: memory leak before free overlay changeset,  /kunit-test

In of_overlay_apply_kunit_cleanup(), the "np" should be associated with
fake instead of test to call of_node_put(), so the node is put before
the overlay is removed.

It also fix the following memory leaks:

unreferenced object 0xffffff80c7d22800 (size 256):
  comm "kunit_try_catch", pid 236, jiffies 4294894764
  hex dump (first 32 bytes):
    d0 26 d4 c2 80 ff ff ff 00 00 00 00 00 00 00 00  .&..............
    60 19 75 c1 80 ff ff ff 00 00 00 00 00 00 00 00  `.u.............
  backtrace (crc ee0a471c):
    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    [<00000000c538ac7e>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<00000000119f34f3>] __of_node_dup+0x4c/0x328
    [<00000000b212ca39>] build_changeset_next_level+0x2cc/0x4c0
    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<000000000b296be1>] kthread+0x2e8/0x374
    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c1751960 (size 16):
  comm "kunit_try_catch", pid 236, jiffies 4294894764
  hex dump (first 16 bytes):
    6b 75 6e 69 74 2d 74 65 73 74 00 c1 80 ff ff ff  kunit-test......
  backtrace (crc 18196259):
    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    [<0000000071006e2c>] __kmalloc_node_track_caller_noprof+0x300/0x3e0
    [<00000000b16ac6cb>] kstrdup+0x48/0x84
    [<0000000050e3373b>] __of_node_dup+0x60/0x328
    [<00000000b212ca39>] build_changeset_next_level+0x2cc/0x4c0
    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<000000000b296be1>] kthread+0x2e8/0x374
    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c2e96e00 (size 192):
  comm "kunit_try_catch", pid 236, jiffies 4294894764
  hex dump (first 32 bytes):
    80 19 75 c1 80 ff ff ff 0b 00 00 00 00 00 00 00  ..u.............
    a0 19 75 c1 80 ff ff ff 00 6f e9 c2 80 ff ff ff  ..u......o......
  backtrace (crc 1924cba4):
    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    [<00000000c538ac7e>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000009fdd35ad>] __of_prop_dup+0x7c/0x2ec
    [<00000000aa4e0111>] add_changeset_property+0x548/0x9e0
    [<000000004777e25b>] build_changeset_next_level+0xd4/0x4c0
    [<00000000a9c93f8a>] build_changeset_next_level+0x3a8/0x4c0
    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<000000000b296be1>] kthread+0x2e8/0x374
    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80c1751980 (size 16):
  comm "kunit_try_catch", pid 236, jiffies 4294894764
  hex dump (first 16 bytes):
    63 6f 6d 70 61 74 69 62 6c 65 00 c1 80 ff ff ff  compatible......
  backtrace (crc 42df3c87):
    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    [<0000000071006e2c>] __kmalloc_node_track_caller_noprof+0x300/0x3e0
    [<00000000b16ac6cb>] kstrdup+0x48/0x84
    [<00000000a8888fd8>] __of_prop_dup+0xb0/0x2ec
    [<00000000aa4e0111>] add_changeset_property+0x548/0x9e0
    [<000000004777e25b>] build_changeset_next_level+0xd4/0x4c0
    [<00000000a9c93f8a>] build_changeset_next_level+0x3a8/0x4c0
    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<000000000b296be1>] kthread+0x2e8/0x374
unreferenced object 0xffffff80c2e96f00 (size 192):
  comm "kunit_try_catch", pid 236, jiffies 4294894764
  hex dump (first 32 bytes):
    40 f7 bb c6 80 ff ff ff 0b 00 00 00 00 00 00 00  @...............
    c0 19 75 c1 80 ff ff ff 00 00 00 00 00 00 00 00  ..u.............
  backtrace (crc f2f57ea7):
    [<0000000058ea1340>] kmemleak_alloc+0x34/0x40
    [<00000000c538ac7e>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000009fdd35ad>] __of_prop_dup+0x7c/0x2ec
    [<00000000aa4e0111>] add_changeset_property+0x548/0x9e0
    [<000000004777e25b>] build_changeset_next_level+0xd4/0x4c0
    [<00000000a9c93f8a>] build_changeset_next_level+0x3a8/0x4c0
    [<00000000eb208e87>] of_overlay_fdt_apply+0x930/0x1334
    [<000000005bdc53a3>] of_overlay_fdt_apply_kunit+0x54/0x10c
    [<00000000143acd5d>] of_overlay_apply_kunit_cleanup+0x12c/0x524
    [<00000000a813abc8>] kunit_try_run_case+0x13c/0x3ac
    [<00000000d77ab00c>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<000000000b296be1>] kthread+0x2e8/0x374
    [<0000000007bd1c51>] ret_from_fork+0x10/0x20
......

How to reproduce:
CONFIG_OF_OVERLAY_KUNIT_TEST=y, CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, launch the kernel.

Fixes: 5c9dd72d8385 ("of: Add a KUnit test for overlays and test managed APIs")
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20241010034416.2324196-1-ruanjinjie@huawei.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
12 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 10 Oct 2024 03:01:20 +0000 (20:01 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-10-08 (ice, i40e, igb, e1000e)

This series contains updates to ice, i40e, igb, and e1000e drivers.

For ice:

Marcin allows driver to load, into safe mode, when DDP package is
missing or corrupted and adjusts the netif_is_ice() check to
account for when the device is in safe mode. He also fixes an
out-of-bounds issue when MSI-X are increased for VFs.

Wojciech clears FDB entries on reset to match the hardware state.

For i40e:

Aleksandr adds locking around MACVLAN filters to prevent memory leaks
due to concurrency issues.

For igb:

Mohamed Khalfella adds a check to not attempt to bring up an already
running interface on non-fatal PCIe errors.

For e1000e:

Vitaly changes board type for I219 to more closely match the hardware
and stop PHY issues.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000e: change I219 (19) devices to ADP
  igb: Do not bring the device up after non-fatal error
  i40e: Fix macvlan leak by synchronizing access to mac_filter_hash
  ice: Fix increasing MSI-X on VF
  ice: Flush FDB entries before reset
  ice: Fix netif_is_ice() in Safe Mode
  ice: Fix entering Safe Mode
====================

Link: https://patch.msgid.link/20241008230050.928245-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge branch 'mptcp-misc-fixes-involving-fallback-to-tcp'
Jakub Kicinski [Thu, 10 Oct 2024 02:43:46 +0000 (19:43 -0700)]
Merge branch 'mptcp-misc-fixes-involving-fallback-to-tcp'

Matthieu Baerts says:

====================
mptcp: misc. fixes involving fallback to TCP

- Patch 1: better handle DSS corruptions from a bugged peer: reducing
  warnings, doing a fallback or a reset depending on the subflow state.
  For >= v5.7.

- Patch 2: fix DSS corruption due to large pmtu xmit, where MPTCP was
  not taken into account. For >= v5.6.

- Patch 3: fallback when MPTCP opts are dropped after the first data
  packet, instead of resetting the connection. For >= v5.6.

- Patch 4: restrict the removal of a subflow to other closing states, a
  better fix, for a recent one. For >= v5.10.
====================

Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-0-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agomptcp: pm: do not remove closing subflows
Matthieu Baerts (NGI0) [Tue, 8 Oct 2024 11:04:55 +0000 (13:04 +0200)]
mptcp: pm: do not remove closing subflows

In a previous fix, the in-kernel path-manager has been modified not to
retrigger the removal of a subflow if it was already closed, e.g. when
the initial subflow is removed, but kept in the subflows list.

To be complete, this fix should also skip the subflows that are in any
closing state: mptcp_close_ssk() will initiate the closure, but the
switch to the TCP_CLOSE state depends on the other peer.

Fixes: 58e1b66b4e4b ("mptcp: pm: do not remove already closed subflows")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-4-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agomptcp: fallback when MPTCP opts are dropped after 1st data
Matthieu Baerts (NGI0) [Tue, 8 Oct 2024 11:04:54 +0000 (13:04 +0200)]
mptcp: fallback when MPTCP opts are dropped after 1st data

As reported by Christoph [1], before this patch, an MPTCP connection was
wrongly reset when a host received a first data packet with MPTCP
options after the 3wHS, but got the next ones without.

According to the MPTCP v1 specs [2], a fallback should happen in this
case, because the host didn't receive a DATA_ACK from the other peer,
nor receive data for more than the initial window which implies a
DATA_ACK being received by the other peer.

The patch here re-uses the same logic as the one used in other places:
by looking at allow_infinite_fallback, which is disabled at the creation
of an additional subflow. It's not looking at the first DATA_ACK (or
implying one received from the other side) as suggested by the RFC, but
it is in continuation with what was already done, which is safer, and it
fixes the reported issue. The next step, looking at this first DATA_ACK,
is tracked in [4].

This patch has been validated using the following Packetdrill script:

   0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
  +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
  +0 bind(3, ..., ...) = 0
  +0 listen(3, 1) = 0

  // 3WHS is OK
  +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
  +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
  +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
  +0 accept(3, ..., ...) = 4

  // Data from the client with valid MPTCP options (no DATA_ACK: normal)
  +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
  // From here, the MPTCP options will be dropped by a middlebox
  +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>

  +0.1 read(4, ..., 500) = 500
  +0   write(4, ..., 100) = 100

  // The server replies with data, still thinking MPTCP is being used
  +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
  // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
  +0.1 <  . 501:501(0)   ack 101 win 2048

  +0.0 < P. 501:601(100) ack 101 win 2048
  // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
  +0.0 >  . 101:101(0)   ack 601

Note that this script requires Packetdrill with MPTCP support, see [3].

Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback
Link: https://github.com/multipath-tcp/packetdrill
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-3-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agotcp: fix mptcp DSS corruption due to large pmtu xmit
Paolo Abeni [Tue, 8 Oct 2024 11:04:53 +0000 (13:04 +0200)]
tcp: fix mptcp DSS corruption due to large pmtu xmit

Syzkaller was able to trigger a DSS corruption:

  TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies.
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
  Modules linked in:
  CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191ac2a0 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
  RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
  Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff
  RSP: 0018:ffffc90000006db8 EFLAGS: 00010246
  RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00
  RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0
  RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8
  R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000
  R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5
  FS:  000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   move_skbs_to_msk net/mptcp/protocol.c:811 [inline]
   mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854
   subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490
   tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283
   tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237
   tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
   tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350
   ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   __netif_receive_skb_one_core net/core/dev.c:5662 [inline]
   __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
   process_backlog+0x662/0x15b0 net/core/dev.c:6107
   __napi_poll+0xcb/0x490 net/core/dev.c:6771
   napi_poll net/core/dev.c:6840 [inline]
   net_rx_action+0x89b/0x1240 net/core/dev.c:6962
   handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
   do_softirq+0x11b/0x1e0 kernel/softirq.c:455
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
   local_bh_enable include/linux/bottom_half.h:33 [inline]
   rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
   __dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451
   dev_queue_xmit include/linux/netdevice.h:3094 [inline]
   neigh_hh_output include/net/neighbour.h:526 [inline]
   neigh_output include/net/neighbour.h:540 [inline]
   ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
   ip_local_out net/ipv4/ip_output.c:130 [inline]
   __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536
   __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
   tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
   tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline]
   tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752
   __tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015
   tcp_push_pending_frames include/net/tcp.h:2107 [inline]
   tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline]
   tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239
   tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
   sk_backlog_rcv include/net/sock.h:1113 [inline]
   __release_sock+0x214/0x350 net/core/sock.c:3072
   release_sock+0x61/0x1f0 net/core/sock.c:3626
   mptcp_push_release net/mptcp/protocol.c:1486 [inline]
   __mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625
   mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x1a6/0x270 net/socket.c:745
   ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603
   ___sys_sendmsg net/socket.c:2657 [inline]
   __sys_sendmsg+0x2aa/0x390 net/socket.c:2686
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fb06e9317f9
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
  RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9
  RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005
  RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000
  R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406
  R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003
   </TASK>

Additionally syzkaller provided a nice reproducer. The repro enables
pmtu on the loopback device, leading to tcp_mtu_probe() generating
very large probe packets.

tcp_can_coalesce_send_queue_head() currently does not check for
mptcp-level invariants, and allowed the creation of cross-DSS probes,
leading to the mentioned corruption.

Address the issue teaching tcp_can_coalesce_send_queue_head() about
mptcp using the tcp_skb_can_collapse(), also reducing the code
duplication.

Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions")
Cc: stable@vger.kernel.org
Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agomptcp: handle consistently DSS corruption
Paolo Abeni [Tue, 8 Oct 2024 11:04:52 +0000 (13:04 +0200)]
mptcp: handle consistently DSS corruption

Bugged peer implementation can send corrupted DSS options, consistently
hitting a few warning in the data path. Use DEBUG_NET assertions, to
avoid the splat on some builds and handle consistently the error, dumping
related MIBs and performing fallback and/or reset according to the
subflow type.

Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-1-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: netconsole: fix wrong warning
Breno Leitao [Tue, 8 Oct 2024 09:43:24 +0000 (02:43 -0700)]
net: netconsole: fix wrong warning

A warning is triggered when there is insufficient space in the buffer
for userdata. However, this is not an issue since userdata will be sent
in the next iteration.

Current warning message:

    ------------[ cut here ]------------
     WARNING: CPU: 13 PID: 3013042 at drivers/net/netconsole.c:1122 write_ext_msg+0x3b6/0x3d0
      ? write_ext_msg+0x3b6/0x3d0
      console_flush_all+0x1e9/0x330

The code incorrectly issues a warning when this_chunk is zero, which is
a valid scenario. The warning should only be triggered when this_chunk
is negative.

Fixes: 1ec9daf95093 ("net: netconsole: append userdata to fragmented netconsole messages")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008094325.896208-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: dsa: refuse cross-chip mirroring operations
Vladimir Oltean [Tue, 8 Oct 2024 09:43:20 +0000 (12:43 +0300)]
net: dsa: refuse cross-chip mirroring operations

In case of a tc mirred action from one switch to another, the behavior
is not correct. We simply tell the source switch driver to program a
mirroring entry towards mirror->to_local_port = to_dp->index, but it is
not even guaranteed that the to_dp belongs to the same switch as dp.

For proper cross-chip support, we would need to go through the
cross-chip notifier layer in switch.c, program the entry on cascade
ports, and introduce new, explicit API for cross-chip mirroring, given
that intermediary switches should have introspection into the DSA tags
passed through the cascade port (and not just program a port mirror on
the entire cascade port). None of that exists today.

Reject what is not implemented so that user space is not misled into
thinking it works.

Fixes: f50f212749e8 ("net: dsa: Add plumbing for port mirroring")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241008094320.3340980-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: fec: don't save PTP state if PTP is unsupported
Wei Fang [Tue, 8 Oct 2024 06:11:53 +0000 (14:11 +0800)]
net: fec: don't save PTP state if PTP is unsupported

Some platforms (such as i.MX25 and i.MX27) do not support PTP, so on
these platforms fec_ptp_init() is not called and the related members
in fep are not initialized. However, fec_ptp_save_state() is called
unconditionally, which causes the kernel to panic. Therefore, add a
condition so that fec_ptp_save_state() is not called if PTP is not
supported.

Fixes: a1477dc87dc4 ("net: fec: Restart PPS after link state change")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/353e41fe-6bb4-4ee9-9980-2da2a9c1c508@roeck-us.net/
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Csókás, Bence <csokas.bence@prolan.hu>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20241008061153.1977930-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ibm: emac: mal: add dcr_unmap to _remove
Rosen Penev [Tue, 8 Oct 2024 23:30:50 +0000 (16:30 -0700)]
net: ibm: emac: mal: add dcr_unmap to _remove

It's done in probe so it should be undone here.

Fixes: 1d3bb996481e ("Device tree aware EMAC driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241008233050.9422-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ftgmac100: fixed not check status from fixed phy
Jacky Chou [Mon, 7 Oct 2024 03:24:35 +0000 (11:24 +0800)]
net: ftgmac100: fixed not check status from fixed phy

Add error handling from calling fixed_phy_register.
It may return some error, therefore, need to check the status.

And fixed_phy_register needs to bind a device node for mdio.
Add the mac device node for fixed_phy_register function.
This is a reference to this function, of_phy_register_fixed_link().

Fixes: e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI")
Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Link: https://patch.msgid.link/20241007032435.787892-1-jacky_chou@aspeedtech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Wed, 9 Oct 2024 23:01:40 +0000 (16:01 -0700)]
Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "12 hotfixes, 5 of which are c:stable. All singletons, about half of
  which are MM"

* tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: zswap: delete comments for "value" member of 'struct zswap_entry'.
  CREDITS: sort alphabetically by name
  secretmem: disable memfd_secret() if arch cannot set direct map
  .mailmap: update Fangrui's email
  mm/huge_memory: check pmd_special() only after pmd_present()
  resource, kunit: fix user-after-free in resource_test_region_intersects()
  fs/proc/kcore.c: allow translation of physical memory addresses
  selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
  device-dax: correct pgoff align in dax_set_mapping()
  kthread: unpark only parked kthread
  Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"
  bcachefs: do not use PF_MEMALLOC_NORECLAIM

12 months agoselftests: netfilter: conntrack_vrf.sh: add fib test case
Florian Westphal [Wed, 9 Oct 2024 07:19:03 +0000 (09:19 +0200)]
selftests: netfilter: conntrack_vrf.sh: add fib test case

meta iifname veth0 ip daddr ... fib daddr oif

... is expected to return "dummy0" interface which is part of same vrf
as veth0.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 months agonetfilter: fib: check correct rtable in vrf setups
Florian Westphal [Wed, 9 Oct 2024 07:19:02 +0000 (09:19 +0200)]
netfilter: fib: check correct rtable in vrf setups

We need to init l3mdev unconditionally, else main routing table is searched
and incorrect result is returned unless strict (iif keyword) matching is
requested.

Next patch adds a selftest for this.

Fixes: 2a8a7c0eaa87 ("netfilter: nft_fib: Fix for rpath check with VRF devices")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 months agonetfilter: xtables: avoid NFPROTO_UNSPEC where needed
Florian Westphal [Mon, 7 Oct 2024 09:28:16 +0000 (11:28 +0200)]
netfilter: xtables: avoid NFPROTO_UNSPEC where needed

syzbot managed to call xt_cluster match via ebtables:

 WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780
 [..]
 ebt_do_table+0x174b/0x2a40

Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet
processing.  As this is only useful to restrict locally terminating
TCP/UDP traffic, register this for ipv4 and ipv6 family only.

Pablo points out that this is a general issue, direct users of the
set/getsockopt interface can call into targets/matches that were only
intended for use with ip(6)tables.

Check all UNSPEC matches and targets for similar issues:

- matches and targets are fine except if they assume skb_network_header()
  is valid -- this is only true when called from inet layer: ip(6) stack
  pulls the ip/ipv6 header into linear data area.
- targets that return XT_CONTINUE or other xtables verdicts must be
  restricted too, they are incompatbile with the ebtables traverser, e.g.
  EBT_CONTINUE is a completely different value than XT_CONTINUE.

Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as
they are provided for use by ip(6)tables.

The MARK target is also used by arptables, so register for NFPROTO_ARP too.

While at it, bail out if connbytes fails to enable the corresponding
conntrack family.

This change passes the selftests in iptables.git.

Reported-by: syzbot+256c348558aa5cf611a9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netfilter-devel/66fec2e2.050a0220.9ec68.0047.GAE@google.com/
Fixes: 0269ea493734 ("netfilter: xtables: add cluster match")
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 months agomm: zswap: delete comments for "value" member of 'struct zswap_entry'.
Kanchana P Sridhar [Wed, 2 Oct 2024 19:42:13 +0000 (12:42 -0700)]
mm: zswap: delete comments for "value" member of 'struct zswap_entry'.

Made a minor edit in the comments for 'struct zswap_entry' to delete the
description of the 'value' member that was deleted in commit 20a5532ffa53
("mm: remove code to handle same filled pages").

Link: https://lkml.kernel.org/r/20241002194213.30041-1-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Fixes: 20a5532ffa53 ("mm: remove code to handle same filled pages")
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Usama Arif <usamaarif642@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoCREDITS: sort alphabetically by name
Krzysztof Kozlowski [Wed, 2 Oct 2024 11:19:32 +0000 (13:19 +0200)]
CREDITS: sort alphabetically by name

Re-sort few misplaced entries in the CREDITS file.

Link: https://lkml.kernel.org/r/20241002111932.46012-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agosecretmem: disable memfd_secret() if arch cannot set direct map
Patrick Roy [Tue, 1 Oct 2024 08:00:41 +0000 (09:00 +0100)]
secretmem: disable memfd_secret() if arch cannot set direct map

Return -ENOSYS from memfd_secret() syscall if !can_set_direct_map().  This
is the case for example on some arm64 configurations, where marking 4k
PTEs in the direct map not present can only be done if the direct map is
set up at 4k granularity in the first place (as ARM's break-before-make
semantics do not easily allow breaking apart large/gigantic pages).

More precisely, on arm64 systems with !can_set_direct_map(),
set_direct_map_invalid_noflush() is a no-op, however it returns success
(0) instead of an error.  This means that memfd_secret will seemingly
"work" (e.g.  syscall succeeds, you can mmap the fd and fault in pages),
but it does not actually achieve its goal of removing its memory from the
direct map.

Note that with this patch, memfd_secret() will start erroring on systems
where can_set_direct_map() returns false (arm64 with
CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and
CONFIG_KFENCE=n), but that still seems better than the current silent
failure.  Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most
arm64 systems actually have a working memfd_secret() and aren't be
affected.

From going through the iterations of the original memfd_secret patch
series, it seems that disabling the syscall in these scenarios was the
intended behavior [1] (preferred over having
set_direct_map_invalid_noflush return an error as that would result in
SIGBUSes at page-fault time), however the check for it got dropped between
v16 [2] and v17 [3], when secretmem moved away from CMA allocations.

[1]: https://lore.kernel.org/lkml/20201124164930.GK8537@kernel.org/
[2]: https://lore.kernel.org/lkml/20210121122723.3446-11-rppt@kernel.org/#t
[3]: https://lore.kernel.org/lkml/20201125092208.12544-10-rppt@kernel.org/

Link: https://lkml.kernel.org/r/20241001080056.784735-1-roypat@amazon.co.uk
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months ago.mailmap: update Fangrui's email
Fangrui Song [Fri, 27 Sep 2024 19:29:12 +0000 (12:29 -0700)]
.mailmap: update Fangrui's email

I'm leaving Google.

Link: https://lkml.kernel.org/r/20240927192912.31532-1-i@maskray.me
Signed-off-by: Fangrui Song <i@maskray.me>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agomm/huge_memory: check pmd_special() only after pmd_present()
David Hildenbrand [Thu, 26 Sep 2024 15:42:34 +0000 (17:42 +0200)]
mm/huge_memory: check pmd_special() only after pmd_present()

We should only check for pmd_special() after we made sure that we have a
present PMD.  For example, if we have a migration PMD, pmd_special() might
indicate that we have a special PMD although we really don't.

This fixes confusing migration entries as PFN mappings, and not doing what
we are supposed to do in the "is_swap_pmd()" case further down in the
function -- including messing up COW, page table handling and accounting.

Link: https://lkml.kernel.org/r/20240926154234.2247217-1-david@redhat.com
Fixes: bc02afbd4d73 ("mm/fork: accept huge pfnmap entries")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+bf2c35fa302ebe3c7471@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/66f15c8d.050a0220.c23dd.000f.GAE@google.com/
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoresource, kunit: fix user-after-free in resource_test_region_intersects()
Huang Ying [Mon, 30 Sep 2024 07:06:11 +0000 (15:06 +0800)]
resource, kunit: fix user-after-free in resource_test_region_intersects()

In resource_test_insert_resource(), the pointer is used in error message
after kfree().  This is user-after-free.  To fix this, we need to call
kunit_add_action_or_reset() to schedule memory freeing after usage.  But
kunit_add_action_or_reset() itself may fail and free the memory.  So, its
return value should be checked and abort the test for failure.  Then, we
found that other usage of kunit_add_action_or_reset() in
resource_test_region_intersects() needs to be fixed too.  We fix all these
user-after-free bugs in this patch.

Link: https://lkml.kernel.org/r/20240930070611.353338-1-ying.huang@intel.com
Fixes: 99185c10d5d9 ("resource, kunit: add test case for region_intersects()")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Kees Bakker <kees@ijzerbout.nl>
Closes: https://lore.kernel.org/lkml/87ldzaotcg.fsf@yhuang6-desk2.ccr.corp.intel.com/
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agofs/proc/kcore.c: allow translation of physical memory addresses
Alexander Gordeev [Mon, 30 Sep 2024 12:21:19 +0000 (14:21 +0200)]
fs/proc/kcore.c: allow translation of physical memory addresses

When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead.  That leads to a wrong read.

Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks.  That
way an architecture can deal with specific physical memory ranges.

Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.

For other architectures the default callback is basically NOP.  It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.

Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoselftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
Donet Tom [Fri, 27 Sep 2024 05:07:52 +0000 (00:07 -0500)]
selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test

The hmm2 double_map test was failing due to an incorrect buffer->mirror
size.  The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE.  The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror.  Since the
size of buffer->mirror was incorrect, copy_to_user failed.

This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.

Test Result without this patch
==============================
 #  RUN           hmm2.hmm2_device_private.double_map ...
 # hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
 # double_map: Test terminated by assertion
 #          FAIL  hmm2.hmm2_device_private.double_map
 not ok 53 hmm2.hmm2_device_private.double_map

Test Result with this patch
===========================
 #  RUN           hmm2.hmm2_device_private.double_map ...
 #            OK  hmm2.hmm2_device_private.double_map
 ok 53 hmm2.hmm2_device_private.double_map

Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agodevice-dax: correct pgoff align in dax_set_mapping()
Kun(llfl) [Fri, 27 Sep 2024 07:45:09 +0000 (15:45 +0800)]
device-dax: correct pgoff align in dax_set_mapping()

pgoff should be aligned using ALIGN_DOWN() instead of ALIGN().  Otherwise,
vmf->address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.

It's a subtle situation that only can be observed in
page_mapped_in_vma() after the page is page fault handled by
dev_dax_huge_fault.  Generally, there is little chance to perform
page_mapped_in_vma in dev-dax's page unless in specific error injection
to the dax device to trigger an MCE - memory-failure.  In that case,
page_mapped_in_vma() will be triggered to determine which task is
accessing the failure address and kill that task in the end.

We used self-developed dax device (which is 2M aligned mapping) , to
perform error injection to random address.  It turned out that error
injected to non-2M-aligned address was causing endless MCE until panic.
Because page_mapped_in_vma() kept resulting wrong address and the task
accessing the failure address was never killed properly:

[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.049006] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.448042] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.792026] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.162502] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.461116] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.764730] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.042128] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.464293] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.818090] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3787.085297] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page:
Recovered

It took us several weeks to pinpoint this problem,  but we eventually
used bpftrace to trace the page fault and mce address and successfully
identified the issue.

Joao added:

; Likely we never reproduce in production because we always pin
: device-dax regions in the region align they provide (Qemu does
: similarly with prealloc in hugetlb/file backed memory).  I think this
: bug requires that we touch *unpinned* device-dax regions unaligned to
: the device-dax selected alignment (page size i.e.  4K/2M/1G)

Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) <llfl@linux.alibaba.com>
Tested-by: JianXiong Zhao <zhaojianxiong.zjx@alibaba-inc.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agokthread: unpark only parked kthread
Frederic Weisbecker [Fri, 13 Sep 2024 21:46:34 +0000 (23:46 +0200)]
kthread: unpark only parked kthread

Calling into kthread unparking unconditionally is mostly harmless when
the kthread is already unparked. The wake up is then simply ignored
because the target is not in TASK_PARKED state.

However if the kthread is per CPU, the wake up is preceded by a call
to kthread_bind() which expects the task to be inactive and in
TASK_PARKED state, which obviously isn't the case if it is unparked.

As a result, calling kthread_stop() on an unparked per-cpu kthread
triggers such a warning:

WARNING: CPU: 0 PID: 11 at kernel/kthread.c:525 __kthread_bind_mask kernel/kthread.c:525
 <TASK>
 kthread_stop+0x17a/0x630 kernel/kthread.c:707
 destroy_workqueue+0x136/0xc40 kernel/workqueue.c:5810
 wg_destruct+0x1e2/0x2e0 drivers/net/wireguard/device.c:257
 netdev_run_todo+0xe1a/0x1000 net/core/dev.c:10693
 default_device_exit_batch+0xa14/0xa90 net/core/dev.c:11769
 ops_exit_list net/core/net_namespace.c:178 [inline]
 cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:640
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Fix this with skipping unecessary unparking while stopping a kthread.

Link: https://lkml.kernel.org/r/20240913214634.12557-1-frederic@kernel.org
Fixes: 5c25b5ff89f0 ("workqueue: Tag bound workers with KTHREAD_IS_PER_CPU")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com
Tested-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agoRevert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"
Michal Hocko [Thu, 26 Sep 2024 17:11:51 +0000 (19:11 +0200)]
Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"

This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4.

There is no existing user of those flags.  PF_MEMALLOC_NOWARN is dangerous
because a nested allocation context can use GFP_NOFAIL which could cause
unexpected failure.  Such a code would be hard to maintain because it
could be deeper in the call chain.

PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1] that
such a allocation contex is inherently unsafe if the context doesn't fully
control all allocations called from this context.

While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM is
it doesn't have any user and as Matthew has pointed out we are running out
of those flags so better reclaim it without any real users.

[1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/

Link: https://lkml.kernel.org/r/20240926172940.167084-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agobcachefs: do not use PF_MEMALLOC_NORECLAIM
Michal Hocko [Thu, 26 Sep 2024 17:11:50 +0000 (19:11 +0200)]
bcachefs: do not use PF_MEMALLOC_NORECLAIM

Patch series "remove PF_MEMALLOC_NORECLAIM" v3.

This patch (of 2):

bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
inode to achieve GFP_NOWAIT semantic while holding locks. If this
allocation fails it will drop locks and use GFP_NOFS allocation context.

We would like to drop PF_MEMALLOC_NORECLAIM because it is really
dangerous to use if the caller doesn't control the full call chain with
this flag set. E.g. if any of the function down the chain needed
GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
cause unexpected failure.

While this is not the case in this particular case using the scoped gfp
semantic is not really needed bacause we can easily pus the allocation
context down the chain without too much clutter.

[akpm@linux-foundation.org: fix kerneldoc warnings]
Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org
Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz> # For vfs changes
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 months agomisc: sgi-gru: Don't disable preemption in GRU driver
Dimitri Sivanich [Thu, 19 Sep 2024 12:34:50 +0000 (07:34 -0500)]
misc: sgi-gru: Don't disable preemption in GRU driver

Disabling preemption in the GRU driver is unnecessary, and clashes with
sleeping locks in several code paths.  Remove preempt_disable and
preempt_enable from the GRU driver.

Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 months agoNFS: remove revoked delegation from server's delegation list
Dai Ngo [Tue, 8 Oct 2024 22:58:07 +0000 (15:58 -0700)]
NFS: remove revoked delegation from server's delegation list

After the delegation is returned to the NFS server remove it
from the server's delegations list to reduce the time it takes
to scan this list.

Network trace captured while running the below script shows the
time taken to service the CB_RECALL increases gradually due to
the overhead of traversing the delegation list in
nfs_delegation_find_inode_server.

The NFS server in this test is a Solaris server which issues
CB_RECALL when receiving the all-zero stateid in the SETATTR.

mount=/mnt/data
for i in $(seq 1 20)
do
   echo $i
   mkdir $mount/testtarfile$i
   time  tar -C $mount/testtarfile$i -xf 5000_files.tar
done

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
12 months agoMerge tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 9 Oct 2024 19:22:02 +0000 (12:22 -0700)]
Merge tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode fix from Gabriel Krisman Bertazi:

 - Handle code-points with the Ignorable property as regular character
   instead of treating them as an empty string (me)

* tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: Don't special case ignorable code points

12 months agounicode: Don't special case ignorable code points
Gabriel Krisman Bertazi [Tue, 8 Oct 2024 22:43:16 +0000 (18:43 -0400)]
unicode: Don't special case ignorable code points

We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
12 months agoata: libata: avoid superfluous disk spin down + spin up during hibernation
Niklas Cassel [Tue, 8 Oct 2024 13:58:44 +0000 (15:58 +0200)]
ata: libata: avoid superfluous disk spin down + spin up during hibernation

A user reported that commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi
device manage_system_start_stop") introduced a spin down + immediate spin
up of the disk both when entering and when resuming from hibernation.
This behavior was not there before, and causes an increased latency both
when entering and when resuming from hibernation.

Hibernation is done by three consecutive PM events, in the following order:
1) PM_EVENT_FREEZE
2) PM_EVENT_THAW
3) PM_EVENT_HIBERNATE

Commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") modified ata_eh_handle_port_suspend() to call
ata_dev_power_set_standby() (which spins down the disk), for both event
PM_EVENT_FREEZE and event PM_EVENT_HIBERNATE.

Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
explicitly mentions that PM_EVENT_FREEZE does not have to be put the device
in a low-power state, and actually recommends not doing so. Thus, let's not
spin down the disk on PM_EVENT_FREEZE. (The disk will instead be spun down
during the subsequent PM_EVENT_HIBERNATE event.)

This way, PM_EVENT_FREEZE will behave as it did before commit aa3998dbeb3a
("ata: libata-scsi: Disable scsi device manage_system_start_stop"), while
PM_EVENT_HIBERNATE will continue to spin down the disk.

This will avoid the superfluous spin down + spin up when entering and
resuming from hibernation, while still making sure that the disk is spun
down before actually entering hibernation.

Cc: stable@vger.kernel.org # v6.6+
Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241008135843.1266244-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
12 months agoring-buffer: Do not have boot mapped buffers hook to CPU hotplug
Steven Rostedt [Tue, 8 Oct 2024 18:32:42 +0000 (14:32 -0400)]
ring-buffer: Do not have boot mapped buffers hook to CPU hotplug

The boot mapped ring buffer has its buffer mapped at a fixed location
found at boot up. It is not dynamic. It cannot grow or be expanded when
new CPUs come online.

Do not hook fixed memory mapped ring buffers to the CPU hotplug callback,
otherwise it can cause a crash when it tries to add the buffer to the
memory that is already fully occupied.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241008143242.25e20801@gandalf.local.home
Fixes: be68d63a139bd ("ring-buffer: Add ring_buffer_alloc_range()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
12 months agonet: hns3/hns: Update the maintainer for the HNS3/HNS ethernet driver
Jijie Shao [Tue, 8 Oct 2024 02:48:36 +0000 (10:48 +0800)]
net: hns3/hns: Update the maintainer for the HNS3/HNS ethernet driver

Yisen Zhuang has left the company in September.
Jian Shen will be responsible for maintaining the
hns3/hns driver's code in the future,
so add Jian Shen to the hns3/hns driver's matainer list.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>