]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
3 years agoMerge tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 22 Oct 2022 01:08:30 +0000 (18:08 -0700)]
Merge tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix issues introduced during this merge window (ACPI/PCI, device
  enumeration and documentation) and some other ones found recently.

  Specifics:

   - Add missing device reference counting to acpi_get_pci_dev() after
     changing it recently (Rafael Wysocki)

   - Fix resource list walk in acpi_dma_get_range() (Robin Murphy)

   - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
     override warning message (Jiri Slaby)

   - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra)

   - Fix multiple error records handling in one of the ACPI extlog
     driver code paths (Tony Luck)

   - Prune DSDT override documentation from index after dropping it
     (Bagas Sanjaya)"

* tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: scan: Fix DMA range assignment
  ACPI: PCI: Fix device reference counting in acpi_get_pci_dev()
  ACPI: resource: note more about IRQ override
  ACPI: resource: do IRQ override on LENOVO IdeaPad
  ACPI: extlog: Handle multiple records
  ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
  Documentation: ACPI: Prune DSDT override documentation from index

3 years agoMerge tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 22 Oct 2022 01:02:36 +0000 (18:02 -0700)]
Merge tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - fixes for the EFI variable store refactor that landed in v6.0

 - fixes for issues that were introduced during the merge window

 - back out some changes related to EFI zboot signing - we'll add a
   better solution for this during the next cycle

* tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0
  efi: libstub: Fix incorrect payload size in zboot header
  efi: libstub: Give efi_main() asmlinkage qualification
  efi: efivars: Fix variable writes without query_variable_store()
  efi: ssdt: Don't free memory if ACPI table was loaded successfully
  efi: libstub: Remove zboot signing from build options

3 years agoMerge tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 22 Oct 2022 00:47:39 +0000 (17:47 -0700)]
Merge tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "Intel VT-d fixes:

   - Fix a lockdep splat issue in intel_iommu_init()

   - Allow NVS regions to pass RMRR check

   - Domain cleanup in error path"

* tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Clean up si_domain in the init_dmars() error path
  iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()
  iommu/vt-d: Use rcu_lock in get_resv_regions
  iommu: Add gfp parameter to iommu_alloc_resv_region

3 years agoMerge tag 'for-linus-2022102101' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 22 Oct 2022 00:41:57 +0000 (17:41 -0700)]
Merge tag 'for-linus-2022102101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Benjamin Tissoires:

 - a 12 year old bug fix for the Apple Magic Trackpad v1 (José Expósito)

 - a fix for a potential crash on removal of the Playstation controllers
   (Roderick Colenbrander)

 - a few new device IDs and device-specific quirks, most notably support
   of the new Playstation DualSense Edge controller

* tag 'for-linus-2022102101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: lenovo: Make array tp10ubkbd_led static const
  HID: saitek: add madcatz variant of MMO7 mouse device ID
  HID: playstation: support updated DualSense rumble mode.
  HID: playstation: add initial DualSense Edge controller support
  HID: playstation: stop DualSense output work on remove.
  HID: magicmouse: Do not set BTN_MOUSE on double report

3 years agoMerge tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 21 Oct 2022 23:01:53 +0000 (16:01 -0700)]
Merge tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:

 - memory leak fixes

 - fixes for directory leases, including an important one which fixes a
   problem noticed by git functional tests

 - fixes relating to missing free_xid calls (helpful for
   tracing/debugging of entry/exit into cifs.ko)

 - a multichannel fix

 - a small cleanup fix (use of list_move instead of list_del/list_add)

* tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  cifs: fix memory leaks in session setup
  cifs: drop the lease for cached directories on rmdir or rename
  smb3: interface count displayed incorrectly
  cifs: Fix memory leak when build ntlmssp negotiate blob failed
  cifs: set rc to -ENOENT if we can not get a dentry for the cached dir
  cifs: use LIST_HEAD() and list_move() to simplify code
  cifs: Fix xid leak in cifs_get_file_info_unix()
  cifs: Fix xid leak in cifs_ses_add_channel()
  cifs: Fix xid leak in cifs_flock()
  cifs: Fix xid leak in cifs_copy_file_range()
  cifs: Fix xid leak in cifs_create()

3 years agoMerge tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Fri, 21 Oct 2022 22:51:30 +0000 (15:51 -0700)]
Merge tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Fixes for patches merged in v6.1"

* tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: ensure we always call fh_verify_error tracepoint
  NFSD: unregister shrinker when nfsd_init_net() fails

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 21 Oct 2022 22:19:43 +0000 (15:19 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Two small changes, one in the lpfc driver and the other in the core.

  The core change is an additional footgun guard which prevents users
  from writing the wrong state to sysfs and causing a hang"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Fix memory leak in lpfc_create_port()
  scsi: core: Restrict legal sdev_state transitions via sysfs

3 years agoMerge tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 21 Oct 2022 22:14:14 +0000 (15:14 -0700)]
Merge tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - fix nvme-hwmon for DMA non-cohehrent architectures (Serge Semin)
      - add a nvme-hwmong maintainer (Christoph Hellwig)
      - fix error pointer dereference in error handling (Dan Carpenter)
      - fix invalid memory reference in nvmet_subsys_attr_qid_max_show
        (Daniel Wagner)
      - don't limit the DMA segment size in nvme-apple (Russell King)
      - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg)
      - disable write zeroes on various Kingston SSDs (Xander Li)

 - fix a memory leak with block device tracing (Ye)

 - flexible-array fix for ublk (Yushan)

 - document the ublk recovery feature from this merge window
   (ZiyangZhang)

 - remove dead bfq variable in struct (Yuwei)

 - error handling rq clearing fix (Yu)

 - add an IRQ safety check for the cached bio freeing (Pavel)

 - drbd bio cloning fix (Christoph)

* tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux:
  blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'
  blktrace: fix possible memleak in '__blk_trace_remove'
  blktrace: introduce 'blk_trace_{start,stop}' helper
  bio: safeguard REQ_ALLOC_CACHE bio put
  block, bfq: remove unused variable for bfq_queue
  drbd: only clone bio if we have a backing device
  ublk_drv: use flexible-array member instead of zero-length array
  nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
  nvmet: fix workqueue MEM_RECLAIM flushing dependency
  nvme-hwmon: kmalloc the NVME SMART log buffer
  nvme-hwmon: consistently ignore errors from nvme_hwmon_init
  nvme: add Guenther as nvme-hwmon maintainer
  nvme-apple: don't limit DMA segement size
  nvme-pci: disable write zeroes on various Kingston SSD
  nvme: fix error pointer dereference in error handling
  Documentation: document ublk user recovery feature
  blk-mq: fix null pointer dereference in blk_mq_clear_rq_mapping()

3 years agoMerge tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 21 Oct 2022 22:09:10 +0000 (15:09 -0700)]
Merge tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix a potential memory leak in the error handling path of io-wq setup
   (Rafael)

 - Kill an errant debug statement that got added in this release (me)

 - Fix an oops with an invalid direct descriptor with IORING_OP_MSG_RING
   (Harshit)

 - Remove unneeded FFS_SCM flagging (Pavel)

 - Remove polling off the exit path (Pavel)

 - Move out direct descriptor debug check to the cleanup path (Pavel)

 - Use the proper helper rather than open-coding cached request get
   (Pavel)

* tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux:
  io-wq: Fix memory leak in worker creation
  io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd()
  io_uring/rw: remove leftover debug statement
  io_uring: don't iopoll from io_ring_ctx_wait_and_kill()
  io_uring: reuse io_alloc_req()
  io_uring: kill hot path fixed file bitmap debug checks
  io_uring: remove FFS_SCM

3 years agoMerge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 21 Oct 2022 21:43:09 +0000 (14:43 -0700)]
Merge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Just two fixes for the new 'virtio with grants' feature"

* tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/virtio: Convert PAGE_SIZE/PAGE_SHIFT/PFN_UP to Xen counterparts
  xen/virtio: Handle cases when page offset > PAGE_SIZE properly

3 years agoMerge tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 21 Oct 2022 21:33:36 +0000 (14:33 -0700)]
Merge tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "A small SELinux fix for a GFP_KERNEL allocation while a spinlock is
  held.

  The patch, while still fairly small, is a bit larger than one might
  expect from a simple s/GFP_KERNEL/GFP_ATOMIC/ conversion because we
  added support for the function to be called with different gfp flags
  depending on the context, preserving GFP_KERNEL for those cases that
  can safely sleep"

* tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()

3 years agoMerge tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Fri, 21 Oct 2022 19:33:03 +0000 (12:33 -0700)]
Merge tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morron:
 "Seventeen hotfixes, mainly for MM.

  Five are cc:stable and the remainder address post-6.0 issues"

* tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nouveau: fix migrate_to_ram() for faulting page
  mm/huge_memory: do not clobber swp_entry_t during THP split
  hugetlb: fix memory leak associated with vma_lock structure
  mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
  mm: /proc/pid/smaps_rollup: fix maple tree search
  mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
  mm/mmap: fix MAP_FIXED address return on VMA merge
  mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
  mm/mmap: undo ->mmap() when mas_preallocate() fails
  init: Kconfig: fix spelling mistake "satify" -> "satisfy"
  ocfs2: clear dinode links count in case of error
  ocfs2: fix BUG when iput after ocfs2_mknod fails
  gcov: support GCC 12.1 and newer compilers
  zsmalloc: zs_destroy_pool: add size_class NULL check
  mm/mempolicy: fix mbind_range() arguments to vma_merge()
  mailmap: update email for Qais Yousef
  mailmap: update Dan Carpenter's email address

3 years agoMerge tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 21 Oct 2022 19:29:52 +0000 (12:29 -0700)]
Merge tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tool update from Steven Rostedt:

 - Make dot2c generate monitor's automata definition static

* tag 'trace-tools-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv/dot2c: Make automaton definition static

3 years agoMerge tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Fri, 21 Oct 2022 19:25:39 +0000 (12:25 -0700)]
Merge tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - Add tracing events for the most common watchdog events

* tag 'linux-watchdog-6.1-rc2' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: Add tracing events for the most usual watchdog events

3 years agoMerge branches 'acpi-scan', 'acpi-resource', 'acpi-apei', 'acpi-extlog' and 'acpi...
Rafael J. Wysocki [Fri, 21 Oct 2022 18:07:41 +0000 (20:07 +0200)]
Merge branches 'acpi-scan', 'acpi-resource', 'acpi-apei', 'acpi-extlog' and 'acpi-docs'

Merge assorted ACPI fixes for 6.1-rc2:

 - Fix resource list walk in acpi_dma_get_range() (Robin Murphy).

 - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
   override warning message (Jiri Slaby).

 - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra).

 - Fix multiple error records handling in one of the ACPI extlog driver
   code paths (Tony Luck).

 - Prune DSDT override documentation from index after dropping it (Bagas
   Sanjaya).

* acpi-scan:
  ACPI: scan: Fix DMA range assignment

* acpi-resource:
  ACPI: resource: note more about IRQ override
  ACPI: resource: do IRQ override on LENOVO IdeaPad

* acpi-apei:
  ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()

* acpi-extlog:
  ACPI: extlog: Handle multiple records

* acpi-docs:
  Documentation: ACPI: Prune DSDT override documentation from index

3 years agoefi: runtime: Don't assume virtual mappings are missing if VA == PA == 0
Ard Biesheuvel [Thu, 20 Oct 2022 13:16:09 +0000 (15:16 +0200)]
efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0

The generic EFI stub can be instructed to avoid SetVirtualAddressMap(),
and simply run with the firmware's 1:1 mapping. In this case, it
populates the virtual address fields of the runtime regions in the
memory map with the physical address of each region, so that the mapping
code has to be none the wiser. Only if SetVirtualAddressMap() fails, the
virtual addresses are wiped and the kernel code knows that the regions
cannot be mapped.

However, wiping amounts to setting it to zero, and if a runtime region
happens to live at physical address 0, its valid 1:1 mapped virtual
address could be mistaken for a wiped field, resulting on loss of access
to the EFI services at runtime.

So let's only assume that VA == 0 means 'no runtime services' if the
region in question does not live at PA 0x0.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
3 years agoefi: libstub: Fix incorrect payload size in zboot header
Ard Biesheuvel [Thu, 20 Oct 2022 09:26:42 +0000 (11:26 +0200)]
efi: libstub: Fix incorrect payload size in zboot header

The linker script symbol definition that captures the size of the
compressed payload inside the zboot decompressor (which is exposed via
the image header) refers to '.' for the end of the region, which does
not give the correct result as the expression is not placed at the end
of the payload. So use the symbol name explicitly.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
3 years agoefi: libstub: Give efi_main() asmlinkage qualification
Ard Biesheuvel [Fri, 14 Oct 2022 17:29:57 +0000 (19:29 +0200)]
efi: libstub: Give efi_main() asmlinkage qualification

To stop the bots from sending sparse warnings to me and the list about
efi_main() not having a prototype, decorate it with asmlinkage so that
it is clear that it is called from assembly, and therefore needs to
remain external, even if it is never declared in a header file.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
3 years agoefi: efivars: Fix variable writes without query_variable_store()
Ard Biesheuvel [Wed, 19 Oct 2022 21:29:58 +0000 (23:29 +0200)]
efi: efivars: Fix variable writes without query_variable_store()

Commit bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer")
refactored the efivars layer so that the 'business logic' related to
which UEFI variables affect the boot flow in which way could be moved
out of it, and into the efivarfs driver.

This inadvertently broke setting variables on firmware implementations
that lack the QueryVariableInfo() boot service, because we no longer
tolerate a EFI_UNSUPPORTED result from check_var_size() when calling
efivar_entry_set_get_size(), which now ends up calling check_var_size()
a second time inadvertently.

If QueryVariableInfo() is missing, we support writes of up to 64k -
let's move that logic into check_var_size(), and drop the redundant
call.

Cc: <stable@vger.kernel.org> # v6.0
Fixes: bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
3 years agoefi: ssdt: Don't free memory if ACPI table was loaded successfully
Ard Biesheuvel [Fri, 14 Oct 2022 10:25:52 +0000 (12:25 +0200)]
efi: ssdt: Don't free memory if ACPI table was loaded successfully

Amadeusz reports KASAN use-after-free errors introduced by commit
3881ee0b1edc ("efi: avoid efivars layer when loading SSDTs from
variables"). The problem appears to be that the memory that holds the
new ACPI table is now freed unconditionally, instead of only when the
ACPI core reported a failure to load the table.

So let's fix this, by omitting the kfree() on success.

Cc: <stable@vger.kernel.org> # v6.0
Link: https://lore.kernel.org/all/a101a10a-4fbb-5fae-2e3c-76cf96ed8fbd@linux.intel.com/
Fixes: 3881ee0b1edc ("efi: avoid efivars layer when loading SSDTs from variables")
Reported-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
3 years agoefi: libstub: Remove zboot signing from build options
Ard Biesheuvel [Mon, 17 Oct 2022 10:48:46 +0000 (12:48 +0200)]
efi: libstub: Remove zboot signing from build options

The zboot decompressor series introduced a feature to sign the PE/COFF
kernel image for secure boot as part of the kernel build. This was
necessary because there are actually two images that need to be signed:
the kernel with the EFI stub attached, and the decompressor application.

This is a bit of a burden, because it means that the images must be
signed on the the same system that performs the build, and this is not
realistic for distros.

During the next cycle, we will introduce changes to the zboot code so
that the inner image no longer needs to be signed. This means that the
outer PE/COFF image can be handled as usual, and be signed later in the
release process.

Let's remove the associated Kconfig options now so that they don't end
up in a LTS release while already being deprecated.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
3 years agoiommu/vt-d: Clean up si_domain in the init_dmars() error path
Jerry Snitselaar [Wed, 19 Oct 2022 00:44:47 +0000 (08:44 +0800)]
iommu/vt-d: Clean up si_domain in the init_dmars() error path

A splat from kmem_cache_destroy() was seen with a kernel prior to
commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool")
when there was a failure in init_dmars(), because the iommu_domain
cache still had objects. While the mempool code is now gone, there
still is a leak of the si_domain memory if init_dmars() fails. So
clean up si_domain in the init_dmars() error path.

Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
3 years agoiommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()
Charlotte Tan [Wed, 19 Oct 2022 00:44:46 +0000 (08:44 +0800)]
iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()

arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI
Reserved region, but it seems like it should accept an NVS region as
well. The ACPI spec
https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html
uses similar wording for "Reserved" and "NVS" region types; for NVS
regions it says "This range of addresses is in use or reserved by the
system and must not be used by the operating system."

There is an old comment on this mailing list that also suggests NVS
regions should pass the arch_rmrr_sanity_check() test:

 The warnings come from arch_rmrr_sanity_check() since it checks whether
 the region is E820_TYPE_RESERVED. However, if the purpose of the check
 is to detect RMRR has regions that may be used by OS as free memory,
 isn't  E820_TYPE_NVS safe, too?

This patch overlaps with another proposed patch that would add the region
type to the log since sometimes the bug reporter sees this log on the
console but doesn't know to include the kernel log:

https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/

Here's an example of the "Firmware Bug" apparent false positive (wrapped
for line length):

 DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR
       [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for
       fixes
 DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR
       [0x000000006f760000-0x000000006f762fff]

This is the snippet from the e820 table:

 BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved
 BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS
 BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data

Fixes: f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
Cc: Will Mortensen <will@extrahop.com>
Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443
Signed-off-by: Charlotte Tan <charlotte@extrahop.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
3 years agoiommu/vt-d: Use rcu_lock in get_resv_regions
Lu Baolu [Wed, 19 Oct 2022 00:44:45 +0000 (08:44 +0800)]
iommu/vt-d: Use rcu_lock in get_resv_regions

Commit 5f64ce5411b46 ("iommu/vt-d: Duplicate iommu_resv_region objects
per device list") converted rcu_lock in get_resv_regions to
dmar_global_lock to allow sleeping in iommu_alloc_resv_region(). This
introduced possible recursive locking if get_resv_regions is called from
within a section where intel_iommu_init() already holds dmar_global_lock.

Especially, after commit 57365a04c921 ("iommu: Move bus setup to IOMMU
device registration"), below lockdep splats could always be seen.

 ============================================
 WARNING: possible recursive locking detected
 6.0.0-rc4+ #325 Tainted: G          I
 --------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
 intel_iommu_get_resv_regions+0x25/0x270

 but task is already holding lock:
 ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
 intel_iommu_init+0x36d/0x6ea

 ...

 Call Trace:
  <TASK>
  dump_stack_lvl+0x48/0x5f
  __lock_acquire.cold.73+0xad/0x2bb
  lock_acquire+0xc2/0x2e0
  ? intel_iommu_get_resv_regions+0x25/0x270
  ? lock_is_held_type+0x9d/0x110
  down_read+0x42/0x150
  ? intel_iommu_get_resv_regions+0x25/0x270
  intel_iommu_get_resv_regions+0x25/0x270
  iommu_create_device_direct_mappings.isra.28+0x8d/0x1c0
  ? iommu_get_dma_cookie+0x6d/0x90
  bus_iommu_probe+0x19f/0x2e0
  iommu_device_register+0xd4/0x130
  intel_iommu_init+0x3e1/0x6ea
  ? iommu_setup+0x289/0x289
  ? rdinit_setup+0x34/0x34
  pci_iommu_init+0x12/0x3a
  do_one_initcall+0x65/0x320
  ? rdinit_setup+0x34/0x34
  ? rcu_read_lock_sched_held+0x5a/0x80
  kernel_init_freeable+0x28a/0x2f3
  ? rest_init+0x1b0/0x1b0
  kernel_init+0x1a/0x130
  ret_from_fork+0x1f/0x30
  </TASK>

This rolls back dmar_global_lock to rcu_lock in get_resv_regions to avoid
the lockdep splat.

Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
3 years agoiommu: Add gfp parameter to iommu_alloc_resv_region
Lu Baolu [Wed, 19 Oct 2022 00:44:44 +0000 (08:44 +0800)]
iommu: Add gfp parameter to iommu_alloc_resv_region

Add gfp parameter to iommu_alloc_resv_region() for the callers to specify
the memory allocation behavior. Thus iommu_alloc_resv_region() could also
be available in critical contexts.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
3 years agonouveau: fix migrate_to_ram() for faulting page
Alistair Popple [Wed, 19 Oct 2022 12:29:34 +0000 (23:29 +1100)]
nouveau: fix migrate_to_ram() for faulting page

Commit 16ce101db85d ("mm/memory.c: fix race when faulting a device private
page") changed the migrate_to_ram() callback to take a reference on the
device page to ensure it can't be freed while handling the fault.
Unfortunately the corresponding update to Nouveau to accommodate this
change was inadvertently dropped from that patch causing GPU to CPU
migration to fail so add it here.

Link: https://lkml.kernel.org/r/20221019122934.866205-1-apopple@nvidia.com
Fixes: 16ce101db85d ("mm/memory.c: fix race when faulting a device private page")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/huge_memory: do not clobber swp_entry_t during THP split
Mel Gorman [Wed, 19 Oct 2022 13:41:56 +0000 (14:41 +0100)]
mm/huge_memory: do not clobber swp_entry_t during THP split

The following has been observed when running stressng mmap since commit
b653db77350c ("mm: Clear page->private when splitting or migrating a page")

   watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
   CPU: 75 PID: 9546 Comm: stress-ng Tainted: G            E      6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
   Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
   RIP: 0010:xas_descend+0x28/0x80
   Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
   RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
   RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
   RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
   RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
   R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
   R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
   FS:  00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
    <TASK>
    xas_load+0x3a/0x50
    __filemap_get_folio+0x80/0x370
    ? put_swap_page+0x163/0x360
    pagecache_get_page+0x13/0x90
    __try_to_reclaim_swap+0x50/0x190
    scan_swap_map_slots+0x31e/0x670
    get_swap_pages+0x226/0x3c0
    folio_alloc_swap+0x1cc/0x240
    add_to_swap+0x14/0x70
    shrink_page_list+0x968/0xbc0
    reclaim_page_list+0x70/0xf0
    reclaim_pages+0xdd/0x120
    madvise_cold_or_pageout_pte_range+0x814/0xf30
    walk_pgd_range+0x637/0xa30
    __walk_page_range+0x142/0x170
    walk_page_range+0x146/0x170
    madvise_pageout+0xb7/0x280
    ? asm_common_interrupt+0x22/0x40
    madvise_vma_behavior+0x3b7/0xac0
    ? find_vma+0x4a/0x70
    ? find_vma+0x64/0x70
    ? madvise_vma_anon_name+0x40/0x40
    madvise_walk_vmas+0xa6/0x130
    do_madvise+0x2f4/0x360
    __x64_sys_madvise+0x26/0x30
    do_syscall_64+0x5b/0x80
    ? do_syscall_64+0x67/0x80
    ? syscall_exit_to_user_mode+0x17/0x40
    ? do_syscall_64+0x67/0x80
    ? syscall_exit_to_user_mode+0x17/0x40
    ? do_syscall_64+0x67/0x80
    ? do_syscall_64+0x67/0x80
    ? common_interrupt+0x8b/0xa0
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

The problem can be reproduced with the mmtests config
config-workload-stressng-mmap.  It does not always happen and when it
triggers is variable but it has happened on multiple machines.

The intent of commit b653db77350c patch was to avoid the case where
PG_private is clear but folio->private is not-NULL.  However, THP tail
pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
stated in the documentation for struct folio.  This patch only clobbers
page->private for tail pages if the head page was not in swapcache and
warns once if page->private had an unexpected value.

Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agohugetlb: fix memory leak associated with vma_lock structure
Mike Kravetz [Wed, 19 Oct 2022 20:19:57 +0000 (13:19 -0700)]
hugetlb: fix memory leak associated with vma_lock structure

The hugetlb vma_lock structure hangs off the vm_private_data pointer of
sharable hugetlb vmas.  The structure is vma specific and can not be
shared between vmas.  At fork and various other times, vmas are duplicated
via vm_area_dup().  When this happens, the pointer in the newly created
vma must be cleared and the structure reallocated.  Two hugetlb specific
routines deal with this hugetlb_dup_vma_private and hugetlb_vm_op_open.
Both routines are called for newly created vmas.  hugetlb_dup_vma_private
would always clear the pointer and hugetlb_vm_op_open would allocate the
new vms_lock structure.  This did not work in the case of this calling
sequence pointed out in [1].

  move_vma
    copy_vma
      new_vma = vm_area_dup(vma);
      new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
    is_vm_hugetlb_page(vma)
      clear_vma_resv_huge_pages
        hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL

When clearing hugetlb_dup_vma_private we actually leak the associated
vma_lock structure.

The vma_lock structure contains a pointer to the associated vma.  This
information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open
to ensure we only clear the vm_private_data of newly created (copied)
vmas.  In such cases, the vma->vma_lock->vma field will not point to the
vma.

Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear
vm_private_data if vma->vma_lock->vma == vma.  Also, log a warning if
hugetlb_vm_op_open ever encounters the case where vma_lock has already
been correctly allocated for the vma.

[1] https://lore.kernel.org/linux-mm/5154292a-4c55-28cd-0935-82441e512fc3@huawei.com/

Link: https://lkml.kernel.org/r/20221019201957.34607-1-mike.kravetz@oracle.com
Fixes: 131a79b474e9 ("hugetlb: fix vma lock handling during split vma and range unmapping")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/page_alloc: reduce potential fragmentation in make_alloc_exact()
Liam R. Howlett [Tue, 31 May 2022 13:20:51 +0000 (09:20 -0400)]
mm/page_alloc: reduce potential fragmentation in make_alloc_exact()

Try to avoid using the left over split page on the next request for a page
by calling __free_pages_ok() with FPI_TO_TAIL.  This increases the
potential of defragmenting memory when it's used for a short period of
time.

Link: https://lkml.kernel.org/r/20220531185626.yvlmymbxyoe5vags@revolver
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm: /proc/pid/smaps_rollup: fix maple tree search
Hugh Dickins [Wed, 19 Oct 2022 03:18:38 +0000 (20:18 -0700)]
mm: /proc/pid/smaps_rollup: fix maple tree search

/proc/pid/smaps_rollup showed 0 kB for everything: now find first vma.

Link: https://lkml.kernel.org/r/3011bee7-182-97a2-1083-d5f5b688e54b@google.com
Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
Rik van Riel [Tue, 18 Oct 2022 00:25:05 +0000 (20:25 -0400)]
mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages

The h->*_huge_pages counters are protected by the hugetlb_lock, but
alloc_huge_page has a corner case where it can decrement the counter
outside of the lock.

This could lead to a corrupted value of h->resv_huge_pages, which we have
observed on our systems.

Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
potential race.

Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glen McCready <gkmccready@meta.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/mmap: fix MAP_FIXED address return on VMA merge
Liam Howlett [Tue, 18 Oct 2022 19:17:12 +0000 (19:17 +0000)]
mm/mmap: fix MAP_FIXED address return on VMA merge

mmap should return the start address of newly mapped area when successful.
On a successful merge of a VMA, the return address was changed and thus
was violating that expectation from userspace.

This is a restoration of functionality provided by 309d08d9b3a3
(mm/mmap.c: fix mmap return value when vma is merged after call_mmap()).
For completeness of fixing MAP_FIXED, implement the comments from the
previous discussion to never update the address and fail if the address
changes.  Leaving the error as a WARN_ON() to avoid crashing the kernel.

Link: https://lkml.kernel.org/r/20221018191613.4133459-1-Liam.Howlett@oracle.com
Link: https://lore.kernel.org/all/Y06yk66SKxlrwwfb@lakrids/
Link: https://lore.kernel.org/all/20201203085350.22624-1-liuzixian4@huawei.com/
Fixes: 4dd1b84140c1 ("mm/mmap: use advanced maple tree API for mmap_region()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Cc: Liu Zixian <liuzixian4@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/mmap.c: __vma_adjust(): suppress uninitialized var warning
Andrew Morton [Tue, 18 Oct 2022 20:57:37 +0000 (13:57 -0700)]
mm/mmap.c: __vma_adjust(): suppress uninitialized var warning

The code is OK, but it fools gcc.

mm/mmap.c:802 __vma_adjust() error: uninitialized symbol 'next_next'.

Fixes: 524e00b36e8c5 ("mm: remove rb tree.")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/mmap: undo ->mmap() when mas_preallocate() fails
Mike Kravetz [Tue, 18 Oct 2022 02:49:45 +0000 (19:49 -0700)]
mm/mmap: undo ->mmap() when mas_preallocate() fails

A memory leak in hugetlb_reserve_pages was reported in [1].  The root
cause was traced to an error path in mmap_region when mas_preallocate()
fails.  In this case, the vma is freed after a successful call to
filesystem specific mmap.  The hugetlbfs mmap routine may allocate data
structures pointed to by m_private_data.  These need to be cleaned up by
the hugetlb vm_ops->close() routine.

The same issue was addressed by commit deb0f6562884 ("mm/mmap: undo
->mmap() when arch_validate_flags() fails") for the arch_validate_flags()
test.  Go to the same close_and_free_vma label if mas_preallocate() fails.

[1] https://lore.kernel.org/linux-mm/CAKXUXMxf7OiCwbxib7MwfR4M1b5+b3cNTU7n5NV9Zm4967=FPQ@mail.gmail.com/

Link: https://lkml.kernel.org/r/20221018024945.415036-1-mike.kravetz@oracle.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoinit: Kconfig: fix spelling mistake "satify" -> "satisfy"
Colin Ian King [Fri, 7 Oct 2022 20:43:39 +0000 (21:43 +0100)]
init: Kconfig: fix spelling mistake "satify" -> "satisfy"

There is a spelling mistake in a Kconfig description.  Fix it.

Link: https://lkml.kernel.org/r/20221007204339.2757753-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoocfs2: clear dinode links count in case of error
Joseph Qi [Mon, 17 Oct 2022 13:02:27 +0000 (21:02 +0800)]
ocfs2: clear dinode links count in case of error

In ocfs2_mknod(), if error occurs after dinode successfully allocated,
ocfs2 i_links_count will not be 0.

So even though we clear inode i_nlink before iput in error handling, it
still won't wipe inode since we'll refresh inode from dinode during inode
lock.  So just like clear inode i_nlink, we clear ocfs2 i_links_count as
well.  Also do the same change for ocfs2_symlink().

Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoocfs2: fix BUG when iput after ocfs2_mknod fails
Joseph Qi [Mon, 17 Oct 2022 13:02:26 +0000 (21:02 +0800)]
ocfs2: fix BUG when iput after ocfs2_mknod fails

Commit b1529a41f777 "ocfs2: should reclaim the inode if
'__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed
inode if __ocfs2_mknod_locked() fails later.  But this introduce a race,
the freed bit may be reused immediately by another thread, which will
update dinode, e.g.  i_generation.  Then iput this inode will lead to BUG:
inode->i_generation != le32_to_cpu(fe->i_generation)

We could make this inode as bad, but we did want to do operations like
wipe in some cases.  Since the claimed inode bit can only affect that an
dinode is missing and will return back after fsck, it seems not a big
problem.  So just leave it as is by revert the reclaim logic.

Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com
Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agogcov: support GCC 12.1 and newer compilers
Martin Liska [Thu, 13 Oct 2022 07:40:59 +0000 (09:40 +0200)]
gcov: support GCC 12.1 and newer compilers

Starting with GCC 12.1, the created .gcda format can't be read by gcov
tool.  There are 2 significant changes to the .gcda file format that
need to be supported:

a) [gcov: Use system IO buffering]
   (23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05) changed that all sizes in
   the format are in bytes and not in words (4B)

b) [gcov: make profile merging smarter]
   (72e0c742bd01f8e7e6dcca64042b9ad7e75979de) add a new checksum to the
   file header.

Tested with GCC 7.5, 10.4, 12.2 and the current master.

Link: https://lkml.kernel.org/r/624bda92-f307-30e9-9aaa-8cc678b2dfb2@suse.cz
Signed-off-by: Martin Liska <mliska@suse.cz>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agozsmalloc: zs_destroy_pool: add size_class NULL check
Alexey Romanov [Thu, 13 Oct 2022 11:28:25 +0000 (14:28 +0300)]
zsmalloc: zs_destroy_pool: add size_class NULL check

Inside the zs_destroy_pool() function, there can still be NULL size_class
pointers: if when the next size_class is allocated, inside
zs_create_pool() function, kzalloc will return NULL and handling the error
condition, zs_create_pool() will call zs_destroy_pool().

Link: https://lkml.kernel.org/r/20221013112825.61869-1-avromanov@sberdevices.ru
Fixes: f24263a5a076 ("zsmalloc: remove unnecessary size_class NULL check")
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomm/mempolicy: fix mbind_range() arguments to vma_merge()
Liam Howlett [Sat, 15 Oct 2022 02:12:33 +0000 (02:12 +0000)]
mm/mempolicy: fix mbind_range() arguments to vma_merge()

Fuzzing produced an invalid argument to vma_merge() which was caught by
the newly added verification of the number of VMAs being removed on
process exit.  Analyzing the failure eventually resulted in finding an
issue with the search of a VMA that started at address 0, which caused an
underflow and thus the loss of many VMAs being tracked in the tree.  Fix
the underflow by changing the search of the maple tree to use the start
address directly.

Link: https://lkml.kernel.org/r/20221015021135.2816178-1-Liam.Howlett@oracle.com
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/r/202210052318.5ad10912-oliver.sang@intel.com
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomailmap: update email for Qais Yousef
Qais Yousef [Fri, 14 Oct 2022 14:10:16 +0000 (15:10 +0100)]
mailmap: update email for Qais Yousef

Update my email address for old entry and add a new entry for my
contribution while working with arm to continue support that work.

Link: https://lkml.kernel.org/r/20221014141016.539625-1-qyousef@layalina.io
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Acked-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Qais Yousef <qsyousef@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agomailmap: update Dan Carpenter's email address
Dan Carpenter [Wed, 12 Oct 2022 13:19:39 +0000 (16:19 +0300)]
mailmap: update Dan Carpenter's email address

My time at Oracle is ending at the end of the month.  Update my email
address accordingly.

Link: https://lkml.kernel.org/r/Y0a+6+5SHMdvUnpg@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 years agoMerge tag 'drm-fixes-2022-10-21' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 21 Oct 2022 01:28:32 +0000 (18:28 -0700)]
Merge tag 'drm-fixes-2022-10-21' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Usual fixes for the week.

  The amdgpu contains fixes for two regressions, one reported in
  response to rc1 which broke on SI GPUs, and one gfx9 APU regression.

  Otherwise it's mostly fixes for new IP, and some GPU reset fixes. vc4
  is just HDMI fixes, and panfrost has some mnor types fixes.

  Core:
   - fix connector DDC pointer
   - fix buffer overflow in format_helper_test

  amdgpu:
   - Mode2 reset fixes for Sienna Cichlid
   - Revert broken fan speed sensor fix
   - SMU 13.x fixes
   - GC 11.x fixes
   - RAS fixes
   - SR-IOV fixes
   - Fix BO move breakage on SI
   - Misc compiler fixes
   - Fix gfx9 APU regression caused by PCI AER fix

  vc4:
   - HDMI fixes

  panfrost:
   - compiler fixes"

* tag 'drm-fixes-2022-10-21' of git://anongit.freedesktop.org/drm/drm: (35 commits)
  drm/amdgpu: fix sdma doorbell init ordering on APUs
  drm/panfrost: replace endian-specific types with native ones
  drm/panfrost: Remove type name from internal structs
  drm/connector: Set DDC pointer in drmm_connector_init
  drm: tests: Fix a buffer overflow in format_helper_test
  drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates
  drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
  drm/amdgpu: Fix for BO move issue
  drm/amdgpu: dequeue mes scheduler during fini
  drm/amd/pm: enable thermal alert on smu_v13_0_10
  drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
  drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
  drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
  drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
  drm/amd/pm: update SMU IP v13.0.4 driver interface version
  drm/amd/pm: Init pm_attr_list when dpm is disabled
  drm/amd/pm: disable cstate feature for gpu reset scenario
  drm/amd/pm: fulfill SMU13.0.7 cstate control interface
  drm/amd/pm: fulfill SMU13.0.0 cstate control interface
  drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
  ...

3 years agoMerge tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 21 Oct 2022 00:24:59 +0000 (17:24 -0700)]
Merge tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - revert "net: fix cpu_max_bits_warn() usage in
     netif_attrmask_next{,_and}"

   - revert "net: sched: fq_codel: remove redundant resource cleanup in
     fq_codel_init()"

   - dsa: uninitialized variable in dsa_slave_netdevice_event()

   - eth: sunhme: uninitialized variable in happy_meal_init()

  Current release - new code bugs:

   - eth: octeontx2: fix resource not freed after malloc

  Previous releases - regressions:

   - sched: fix return value of qdisc ingress handling on success

   - sched: fix race condition in qdisc_graft()

   - udp: update reuse->has_conns under reuseport_lock.

   - tls: strp: make sure the TCP skbs do not have overlapping data

   - hsr: avoid possible NULL deref in skb_clone()

   - tipc: fix an information leak in tipc_topsrv_kern_subscr

   - phylink: add mac_managed_pm in phylink_config structure

   - eth: i40e: fix DMA mappings leak

   - eth: hyperv: fix a RX-path warning

   - eth: mtk: fix memory leaks

  Previous releases - always broken:

   - sched: cake: fix null pointer access issue when cake_init() fails"

* tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
  net: phy: dp83822: disable MDI crossover status change interrupt
  net: sched: fix race condition in qdisc_graft()
  net: hns: fix possible memory leak in hnae_ae_register()
  wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
  sfc: include vport_id in filter spec hash and equal()
  genetlink: fix kdoc warnings
  selftests: add selftest for chaining of tc ingress handling to egress
  net: Fix return value of qdisc ingress handling on success
  net: sched: sfb: fix null pointer access issue when sfb_init() fails
  Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"
  net: sched: cake: fix null pointer access issue when cake_init() fails
  ethernet: marvell: octeontx2 Fix resource not freed after malloc
  netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
  netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
  ionic: catch NULL pointer issue on reconfig
  net: hsr: avoid possible NULL deref in skb_clone()
  bnxt_en: fix memory leak in bnxt_nvm_test()
  ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed
  udp: Update reuse->has_conns under reuseport_lock.
  net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable()
  ...

3 years agoMerge tag 'ata-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Fri, 21 Oct 2022 00:07:54 +0000 (17:07 -0700)]
Merge tag 'ata-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:
 "Several minor fixes:

   - Fix the module alias for the ahci_imx driver to get autoloading to
     work (Alexander)

   - Fix a potential array-index-out-of-bounds problem with the
     enclosure managment support in the ahci driver (Kai-Heng)

   - Several patches to fix compilation warnings thrown by clang in the
     ahci_st, sata_rcar, ahci_brcm, ahci_xgene, ahci_imx and ahci_qoriq
     drivers (me)"

* tag 'ata-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: ahci_qoriq: Fix compilation warning
  ata: ahci_imx: Fix compilation warning
  ata: ahci_xgene: Fix compilation warning
  ata: ahci_brcm: Fix compilation warning
  ata: sata_rcar: Fix compilation warning
  ata: ahci_st: Fix compilation warning
  ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS
  ata: ahci-imx: Fix MODULE_ALIAS

3 years agoMerge tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 21 Oct 2022 00:00:54 +0000 (17:00 -0700)]
Merge tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Fix dm-bufio to use test_bit_acquire to properly test_bit on arches
   with weaker memory ordering.

 - DM core replace DMWARN with DMERR or DMCRIT for fatal errors.

 - Enable WQ_HIGHPRI on DM verity target's verify_wq.

 - Add documentation for DM verity's try_verify_in_tasklet option.

 - Various typo and redundant word fixes in code and/or comments.

* tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm clone: Fix typo in block_device format specifier
  dm: remove unnecessary assignment statement in alloc_dev()
  dm verity: Add documentation for try_verify_in_tasklet option
  dm cache: delete the redundant word 'each' in comment
  dm raid: fix typo in analyse_superblocks code comment
  dm verity: enable WQ_HIGHPRI on verify_wq
  dm raid: delete the redundant word 'that' in comment
  dm: change from DMWARN to DMERR or DMCRIT for fatal errors
  dm bufio: use the acquire memory barrier when testing for B_READING

3 years agoMerge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 20 Oct 2022 23:56:07 +0000 (09:56 +1000)]
Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v6.1-rc2:
- Fix a buffer overflow in format_helper_test.
- Set DDC pointer in drmm_connector_init.
- Compiler fixes for panfrost.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c4d05683-8ebe-93b8-d24c-d1d2c68f12c4@linux.intel.com
3 years agoMerge tag 'amd-drm-fixes-6.1-2022-10-20' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 20 Oct 2022 22:10:31 +0000 (08:10 +1000)]
Merge tag 'amd-drm-fixes-6.1-2022-10-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.1-2022-10-20:

amdgpu:
- Fix gfx9 APU regression caused by PCI AER fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020135225.562807-1-alexander.deucher@amd.com
3 years agoMerge tag 'amd-drm-fixes-6.1-2022-10-19' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 20 Oct 2022 22:10:15 +0000 (08:10 +1000)]
Merge tag 'amd-drm-fixes-6.1-2022-10-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.1-2022-10-19:

amdgpu:
- Mode2 reset fixes for Sienna Cichlid
- Revert broken fan speed sensor fix
- SMU 13.x fixes
- GC 11.x fixes
- RAS fixes
- SR-IOV fixes
- Fix BO move breakage on SI
- Misc compiler fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019191357.6208-1-alexander.deucher@amd.com
3 years agoMerge tag 'drm-misc-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 20 Oct 2022 22:08:25 +0000 (08:08 +1000)]
Merge tag 'drm-misc-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

 * vc4: HDMI fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Y0gGdlujszCstDeP@linux-uq9g
3 years agorv/dot2c: Make automaton definition static
Daniel Bristot de Oliveira [Tue, 23 Aug 2022 15:20:28 +0000 (17:20 +0200)]
rv/dot2c: Make automaton definition static

Monitor's automata definition is only used locally, so make dot2c generate
a static definition.

Link: https://lore.kernel.org/all/202208210332.gtHXje45-lkp@intel.com
Link: https://lore.kernel.org/all/202208210358.6HH3OrVs-lkp@intel.com
Link: https://lkml.kernel.org/r/ffbb92010f643307766c9307fd42f416e5b85fa0.1661266564.git.bristot@kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Fixes: e3c9fc78f096 ("tools/rv: Add dot2c")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
3 years agodrm/amdgpu: fix sdma doorbell init ordering on APUs
Alex Deucher [Wed, 19 Oct 2022 20:57:42 +0000 (16:57 -0400)]
drm/amdgpu: fix sdma doorbell init ordering on APUs

Commit 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up leading to an PCI AER error.  This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.

This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.

Fixes: e3163bc8ffdfdb ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: skhan@linuxfoundation.org
Cc: stable@vger.kernel.org
3 years agoblktrace: remove unnessary stop block trace in 'blk_trace_shutdown'
Ye Bin [Wed, 19 Oct 2022 03:36:02 +0000 (11:36 +0800)]
blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'

As previous commit, 'blk_trace_cleanup' will stop block trace if
block trace's state is 'Blktrace_running'.
So remove unnessary stop block trace in 'blk_trace_shutdown'.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019033602.752383-4-yebin@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblktrace: fix possible memleak in '__blk_trace_remove'
Ye Bin [Wed, 19 Oct 2022 03:36:01 +0000 (11:36 +0800)]
blktrace: fix possible memleak in '__blk_trace_remove'

When test as follows:
step1: ioctl(sda, BLKTRACESETUP, &arg)
step2: ioctl(sda, BLKTRACESTART, NULL)
step3: ioctl(sda, BLKTRACETEARDOWN, NULL)
step4: ioctl(sda, BLKTRACESETUP, &arg)
Got issue as follows:
debugfs: File 'dropped' in directory 'sda' already present!
debugfs: File 'msg' in directory 'sda' already present!
debugfs: File 'trace0' in directory 'sda' already present!

And also find syzkaller report issue like "KASAN: use-after-free Read in relay_switch_subbuf"
"https://syzkaller.appspot.com/bug?id=13849f0d9b1b818b087341691be6cc3ac6a6bfb7"

If remove block trace without stop(BLKTRACESTOP) block trace, '__blk_trace_remove'
will just set 'q->blk_trace' with NULL. However, debugfs file isn't removed, so
will report file already present when call BLKTRACESETUP.
static int __blk_trace_remove(struct request_queue *q)
{
        struct blk_trace *bt;

        bt = rcu_replace_pointer(q->blk_trace, NULL,
                                 lockdep_is_held(&q->debugfs_mutex));
        if (!bt)
                return -EINVAL;

if (bt->trace_state != Blktrace_running)
         blk_trace_cleanup(q, bt);

        return 0;
}

If do test as follows:
step1: ioctl(sda, BLKTRACESETUP, &arg)
step2: ioctl(sda, BLKTRACESTART, NULL)
step3: ioctl(sda, BLKTRACETEARDOWN, NULL)
step4: remove sda

There will remove debugfs directory which will remove recursively all file
under directory.
>> blk_release_queue
>> debugfs_remove_recursive(q->debugfs_dir)
So all files which created in 'do_blk_trace_setup' are removed, and
'dentry->d_inode' is NULL. But 'q->blk_trace' is still in 'running_trace_lock',
'trace_note_tsk' will traverse 'running_trace_lock' all nodes.
>>trace_note_tsk
>>  trace_note
>>    relay_reserve
>>       relay_switch_subbuf
>>        d_inode(buf->dentry)->i_size

To solve above issues, reference commit '5afedf670caf', call 'blk_trace_cleanup'
unconditionally in '__blk_trace_remove' and first stop block trace in
'blk_trace_cleanup'.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019033602.752383-3-yebin@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblktrace: introduce 'blk_trace_{start,stop}' helper
Ye Bin [Wed, 19 Oct 2022 03:36:00 +0000 (11:36 +0800)]
blktrace: introduce 'blk_trace_{start,stop}' helper

Introduce 'blk_trace_{start,stop}' helper. No functional changed.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019033602.752383-2-yebin@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agobio: safeguard REQ_ALLOC_CACHE bio put
Pavel Begunkov [Tue, 18 Oct 2022 19:50:55 +0000 (20:50 +0100)]
bio: safeguard REQ_ALLOC_CACHE bio put

bio_put() with REQ_ALLOC_CACHE assumes that it's executed not from
an irq context. Let's add a warning if the invariant is not respected,
especially since there is a couple of places removing REQ_POLLED by hand
without also clearing REQ_ALLOC_CACHE.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/558d78313476c4e9c233902efa0092644c3d420a.1666122465.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: Fix memory leak in worker creation
Rafael Mendonca [Thu, 20 Oct 2022 01:47:09 +0000 (22:47 -0300)]
io-wq: Fix memory leak in worker creation

If the CPU mask allocation for a node fails, then the memory allocated for
the 'io_wqe' struct of the current node doesn't get freed on the error
handling path, since it has not yet been added to the 'wqes' array.

This was spotted when fuzzing v6.1-rc1 with Syzkaller:
BUG: memory leak
unreferenced object 0xffff8880093d5000 (size 1024):
  comm "syz-executor.2", pid 7701, jiffies 4295048595 (age 13.900s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000cb463369>] __kmem_cache_alloc_node+0x18e/0x720
    [<00000000147a3f9c>] kmalloc_node_trace+0x2a/0x130
    [<000000004e107011>] io_wq_create+0x7b9/0xdc0
    [<00000000c38b2018>] io_uring_alloc_task_context+0x31e/0x59d
    [<00000000867399da>] __io_uring_add_tctx_node.cold+0x19/0x1ba
    [<000000007e0e7a79>] io_uring_setup.cold+0x1b80/0x1dce
    [<00000000b545e9f6>] __x64_sys_io_uring_setup+0x5d/0x80
    [<000000008a8a7508>] do_syscall_64+0x5d/0x90
    [<000000004ac08bec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 0e03496d1967 ("io-wq: use private CPU mask")
Cc: stable@vger.kernel.org
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20221020014710.902201-1-rafaelmendsr@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock, bfq: remove unused variable for bfq_queue
Yuwei Guan [Tue, 18 Oct 2022 03:01:39 +0000 (11:01 +0800)]
block, bfq: remove unused variable for bfq_queue

it defined in d0edc2473be9d, but there's nowhere to use it,
so remove it.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20221018030139.159-1-Yuwei.Guan@zeekrlife.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agodrbd: only clone bio if we have a backing device
Christoph Böhmwalder [Thu, 20 Oct 2022 08:52:05 +0000 (10:52 +0200)]
drbd: only clone bio if we have a backing device

Commit c347a787e34cb (drbd: set ->bi_bdev in drbd_req_new) moved a
bio_set_dev call (which has since been removed) to "earlier", from
drbd_request_prepare to drbd_req_new.

The problem is that this accesses device->ldev->backing_bdev, which is
not NULL-checked at this point. When we don't have an ldev (i.e. when
the DRBD device is diskless), this leads to a null pointer deref.

So, only allocate the private_bio if we actually have a disk. This is
also a small optimization, since we don't clone the bio to only to
immediately free it again in the diskless case.

Fixes: c347a787e34cb ("drbd: set ->bi_bdev in drbd_req_new")
Co-developed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Co-developed-by: Joel Colledge <joel.colledge@linbit.com>
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020085205.129090-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'nvme-6.1-2022-10-22' of git://git.infradead.org/nvme into block-6.1
Jens Axboe [Thu, 20 Oct 2022 12:43:58 +0000 (05:43 -0700)]
Merge tag 'nvme-6.1-2022-10-22' of git://git.infradead.org/nvme into block-6.1

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.1

 - fix nvme-hwmon for DMA non-cohehrent architectures (Serge Semin)
 - add a nvme-hwmong maintainer (Christoph Hellwig)
 - fix error pointer dereference in error handling (Dan Carpenter)
 - fix invalid memory reference in nvmet_subsys_attr_qid_max_show
   (Daniel Wagner)
 - don't limit the DMA segment size in nvme-apple (Russell King)
 - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg)
 - disable write zeroes on various Kingston SSDs (Xander Li)"

* tag 'nvme-6.1-2022-10-22' of git://git.infradead.org/nvme:
  nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
  nvmet: fix workqueue MEM_RECLAIM flushing dependency
  nvme-hwmon: kmalloc the NVME SMART log buffer
  nvme-hwmon: consistently ignore errors from nvme_hwmon_init
  nvme: add Guenther as nvme-hwmon maintainer
  nvme-apple: don't limit DMA segement size
  nvme-pci: disable write zeroes on various Kingston SSD
  nvme: fix error pointer dereference in error handling

3 years agodrm/panfrost: replace endian-specific types with native ones
Steven Price [Mon, 17 Oct 2022 10:46:02 +0000 (11:46 +0100)]
drm/panfrost: replace endian-specific types with native ones

__le32 and __le64 types aren't portable and are not available on
FreeBSD (which uses the same uAPI).

Instead of attempting to always output little endian, just use native
endianness in the dumps. Tools can detect the endianness in use by
looking at the 'magic' field, but equally we don't expect big-endian to
be used with Mali (there are no known implementations out there).

Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7252
Fixes: 730c2bf4ad39 ("drm/panfrost: Add support for devcoredump")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017104602.142992-3-steven.price@arm.com
3 years agodrm/panfrost: Remove type name from internal structs
Steven Price [Mon, 17 Oct 2022 10:46:01 +0000 (11:46 +0100)]
drm/panfrost: Remove type name from internal structs

The two structs internal to struct panfrost_dump_object_header were
named, but sadly that is incompatible with C++, causing an error: "an
anonymous union may only have public non-static data members".

However nothing refers to struct pan_reg_hdr and struct pan_bomap_hdr
and there's no need to export these definitions, so lets drop them. This
fixes the C++ build error with the minimum change in userspace API.

Reported-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: 730c2bf4ad39 ("drm/panfrost: Add support for devcoredump")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017104602.142992-2-steven.price@arm.com
3 years agodrm/connector: Set DDC pointer in drmm_connector_init
Maxime Ripard [Wed, 19 Oct 2022 14:34:42 +0000 (16:34 +0200)]
drm/connector: Set DDC pointer in drmm_connector_init

Commit 35a3b82f1bdd ("drm/connector: Introduce drmm_connector_init")
introduced the function drmm_connector_init() with a parameter for an
optional ddc pointer to the i2c controller used to access the DDC bus.

However, the underlying call to __drm_connector_init() was always
setting it to NULL instead of passing the ddc argument around.

This resulted in unexpected null pointer dereference on platforms
expecting to get a DDC controller.

Fixes: 35a3b82f1bdd ("drm/connector: Introduce drmm_connector_init")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20221019143442.1798964-1-maxime@cerno.tech
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
3 years agodrm: tests: Fix a buffer overflow in format_helper_test
David Gow [Wed, 19 Oct 2022 07:32:40 +0000 (15:32 +0800)]
drm: tests: Fix a buffer overflow in format_helper_test

The xrgb2101010 format conversion test (unlike for other formats) does
an endianness conversion on the results. However, it always converts
TEST_BUF_SIZE 32-bit integers, which results in reading from (and
writing to) more memory than in present in the result buffer. Instead,
use the buffer size, divided by sizeof(u32).

The issue could be reproduced with KASAN:
./tools/testing/kunit/kunit.py run --kunitconfig drivers/gpu/drm/tests \
--kconfig_add CONFIG_KASAN=y --kconfig_add CONFIG_KASAN_VMALLOC=y \
--kconfig_add CONFIG_KASAN_KUNIT_TEST=y \
drm_format_helper_test.*xrgb2101010

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Fixes: 453114319699 ("drm/format-helper: Add KUnit tests for drm_fb_xrgb8888_to_xrgb2101010()")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019073239.3779180-1-davidgow@google.com
3 years agoMerge drm/drm-fixes into drm-misc-fixes
Thomas Zimmermann [Thu, 20 Oct 2022 07:09:00 +0000 (09:09 +0200)]
Merge drm/drm-fixes into drm-misc-fixes

Backmerging to get v6.1-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
3 years agonet: phy: dp83822: disable MDI crossover status change interrupt
Felix Riemann [Tue, 18 Oct 2022 10:47:54 +0000 (12:47 +0200)]
net: phy: dp83822: disable MDI crossover status change interrupt

If the cable is disconnected the PHY seems to toggle between MDI and
MDI-X modes. With the MDI crossover status interrupt active this causes
roughly 10 interrupts per second.

As the crossover status isn't checked by the driver, the interrupt can
be disabled to reduce the interrupt load.

Fixes: 87461f7a58ab ("net: phy: DP83822 initial driver submission")
Signed-off-by: Felix Riemann <felix.riemann@sma.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoublk_drv: use flexible-array member instead of zero-length array
Yushan Zhou [Tue, 18 Oct 2022 10:01:32 +0000 (18:01 +0800)]
ublk_drv: use flexible-array member instead of zero-length array

Eliminate the following coccicheck warning:
./drivers/block/ublk_drv.c:127:16-19: WARNING use flexible-array member instead

Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Link: https://lore.kernel.org/r/20221018100132.355393-1-zys.zljxml@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agonet: sched: fix race condition in qdisc_graft()
Eric Dumazet [Tue, 18 Oct 2022 20:32:58 +0000 (20:32 +0000)]
net: sched: fix race condition in qdisc_graft()

We had one syzbot report [1] in syzbot queue for a while.
I was waiting for more occurrences and/or a repro but
Dmitry Vyukov spotted the issue right away.

<quoting Dmitry>
qdisc_graft() drops reference to qdisc in notify_and_destroy
while it's still assigned to dev->qdisc
</quoting>

Indeed, RCU rules are clear when replacing a data structure.
The visible pointer (dev->qdisc in this case) must be updated
to the new object _before_ RCU grace period is started
(qdisc_put(old) in this case).

[1]
BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027

CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
__tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
__tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5efaa89279
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
</TASK>

Allocated by task 21027:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_node include/linux/slab.h:623 [inline]
kzalloc_node include/linux/slab.h:744 [inline]
qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
__dev_open+0x393/0x4d0 net/core/dev.c:1441
__dev_change_flags+0x583/0x750 net/core/dev.c:8556
rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
__rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 21020:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1754 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
slab_free mm/slub.c:3534 [inline]
kfree+0xe2/0x580 mm/slub.c:4562
rcu_do_batch kernel/rcu/tree.c:2245 [inline]
rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
__do_softirq+0x1d3/0x9c6 kernel/softirq.c:571

Last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
notify_and_destroy net/sched/sch_api.c:1012 [inline]
qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
neigh_destroy+0x431/0x630 net/core/neighbour.c:912
neigh_release include/net/neighbour.h:454 [inline]
neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
neigh_del net/core/neighbour.c:225 [inline]
neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
neigh_forced_gc net/core/neighbour.c:276 [inline]
neigh_alloc net/core/neighbour.c:447 [inline]
___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

The buggy address belongs to the object at ffff88802065e000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 56 bytes inside of
1024-byte region [ffff88802065e000ffff88802065e400)

The buggy address belongs to the physical page:
page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
prep_new_page mm/page_alloc.c:2532 [inline]
get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
__alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
alloc_slab_page mm/slub.c:1824 [inline]
allocate_slab+0x27e/0x3d0 mm/slub.c:1969
new_slab mm/slub.c:2029 [inline]
___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
slab_alloc_node mm/slub.c:3209 [inline]
__kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
kmalloc_reserve net/core/skbuff.c:358 [inline]
__alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
sock_write_iter+0x291/0x3d0 net/socket.c:1108
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:578
ksys_write+0x1e8/0x250 fs/read_write.c:631
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1449 [inline]
free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
free_unref_page_prepare mm/page_alloc.c:3380 [inline]
free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
__unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
qlink_free mm/kasan/quarantine.c:168 [inline]
qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
kasan_slab_alloc include/linux/kasan.h:224 [inline]
slab_post_alloc_hook mm/slab.h:727 [inline]
slab_alloc_node mm/slub.c:3243 [inline]
slab_alloc mm/slub.c:3251 [inline]
__kmem_cache_alloc_lru mm/slub.c:3258 [inline]
kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
kmem_cache_zalloc include/linux/slab.h:723 [inline]
alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
alloc_page_buffers+0x280/0x790 fs/buffer.c:829
create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
generic_perform_write+0x246/0x560 mm/filemap.c:3738
ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:578

Fixes: af356afa010f ("net_sched: reintroduce dev->qdisc for use by sch_api")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221018203258.2793282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: hns: fix possible memory leak in hnae_ae_register()
Yang Yingliang [Tue, 18 Oct 2022 12:24:51 +0000 (20:24 +0800)]
net: hns: fix possible memory leak in hnae_ae_register()

Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().

unreferenced object 0xffff00c01aba2100 (size 128):
  comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s)
  hex dump (first 32 bytes):
    68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff  hnae0....!......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0
    [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0
    [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390
    [<000000006c0ffb13>] kvasprintf+0x8c/0x118
    [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8
    [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0
    [<000000000b87affc>] dev_set_name+0x7c/0xa0
    [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae]
    [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf]
    [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf]

Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agowwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
Yang Yingliang [Tue, 18 Oct 2022 13:16:07 +0000 (21:16 +0800)]
wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()

Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().

unreferenced object 0xffff88810152ad20 (size 8):
  comm "modprobe", pid 252, jiffies 4294849206 (age 22.713s)
  hex dump (first 8 bytes):
    68 77 73 69 6d 30 00 ff                          hwsim0..
  backtrace:
    [<000000009c3504ed>] __kmalloc_node_track_caller+0x44/0x1b0
    [<00000000c0228a5e>] kvasprintf+0xb5/0x140
    [<00000000cff8c21f>] kvasprintf_const+0x55/0x180
    [<0000000055a1e073>] kobject_set_name_vargs+0x56/0x150
    [<000000000a80b139>] dev_set_name+0xab/0xe0

Fixes: f36a111a74e7 ("wwan_hwsim: WWAN device simulator")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://lore.kernel.org/r/20221018131607.1901641-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agosfc: include vport_id in filter spec hash and equal()
Pieter Jansen van Vuuren [Tue, 18 Oct 2022 09:28:41 +0000 (10:28 +0100)]
sfc: include vport_id in filter spec hash and equal()

Filters on different vports are qualified by different implicit MACs and/or
VLANs, so shouldn't be considered equal even if their other match fields
are identical.

Fixes: 7c460d9be610 ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10")
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agocifs: update internal module number
Steve French [Wed, 19 Oct 2022 05:30:04 +0000 (00:30 -0500)]
cifs: update internal module number

To 2.40

Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: fix memory leaks in session setup
Paulo Alcantara [Wed, 19 Oct 2022 14:25:37 +0000 (11:25 -0300)]
cifs: fix memory leaks in session setup

We were only zeroing out the ntlmssp blob but forgot to free the
allocated buffer in the end of SMB2_sess_auth_rawntlmssp_negotiate()
and SMB2_sess_auth_rawntlmssp_authenticate() functions.

This fixes below kmemleak reports:

unreferenced object 0xffff88800ddcfc60 (size 96):
  comm "mount.cifs", pid 758, jiffies 4294696066 (age 42.967s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000d0beeb29>] __kmalloc+0x39/0xa0
    [<00000000e3834047>] build_ntlmssp_smb3_negotiate_blob+0x2c/0x110 [cifs]
    [<00000000e85f5ab2>] SMB2_sess_auth_rawntlmssp_negotiate+0xd3/0x230 [cifs]
    [<0000000080fdb897>] SMB2_sess_setup+0x16c/0x2a0 [cifs]
    [<000000009af320a8>] cifs_setup_session+0x13b/0x370 [cifs]
    [<00000000f15d5982>] cifs_get_smb_ses+0x643/0xb90 [cifs]
    [<00000000fe15eb90>] mount_get_conns+0x63/0x3e0 [cifs]
    [<00000000768aba03>] mount_get_dfs_conns+0x16/0xa0 [cifs]
    [<00000000cf1cf146>] cifs_mount+0x1c2/0x9a0 [cifs]
    [<000000000d66b51e>] cifs_smb3_do_mount+0x10e/0x710 [cifs]
    [<0000000077a996c5>] smb3_get_tree+0xf4/0x200 [cifs]
    [<0000000094dbd041>] vfs_get_tree+0x23/0xc0
    [<000000003a8561de>] path_mount+0x2d3/0xb50
    [<00000000ed5c86d6>] __x64_sys_mount+0x102/0x140
    [<00000000142142f3>] do_syscall_64+0x3b/0x90
    [<00000000e2b89731>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
unreferenced object 0xffff88801437f000 (size 512):
  comm "mount.cifs", pid 758, jiffies 4294696067 (age 42.970s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000d0beeb29>] __kmalloc+0x39/0xa0
    [<00000000004f53d2>] build_ntlmssp_auth_blob+0x4f/0x340 [cifs]
    [<000000005f333084>] SMB2_sess_auth_rawntlmssp_authenticate+0xd4/0x250 [cifs]
    [<0000000080fdb897>] SMB2_sess_setup+0x16c/0x2a0 [cifs]
    [<000000009af320a8>] cifs_setup_session+0x13b/0x370 [cifs]
    [<00000000f15d5982>] cifs_get_smb_ses+0x643/0xb90 [cifs]
    [<00000000fe15eb90>] mount_get_conns+0x63/0x3e0 [cifs]
    [<00000000768aba03>] mount_get_dfs_conns+0x16/0xa0 [cifs]
    [<00000000cf1cf146>] cifs_mount+0x1c2/0x9a0 [cifs]
    [<000000000d66b51e>] cifs_smb3_do_mount+0x10e/0x710 [cifs]
    [<0000000077a996c5>] smb3_get_tree+0xf4/0x200 [cifs]
    [<0000000094dbd041>] vfs_get_tree+0x23/0xc0
    [<000000003a8561de>] path_mount+0x2d3/0xb50
    [<00000000ed5c86d6>] __x64_sys_mount+0x102/0x140
    [<00000000142142f3>] do_syscall_64+0x3b/0x90
    [<00000000e2b89731>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: a4e430c8c8ba ("cifs: replace kfree() with kfree_sensitive() for sensitive data")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: drop the lease for cached directories on rmdir or rename
Ronnie Sahlberg [Tue, 18 Oct 2022 07:39:10 +0000 (17:39 +1000)]
cifs: drop the lease for cached directories on rmdir or rename

When we delete or rename a directory we must also drop any cached lease we have
on the directory.

Fixes: a350d6e73f5e ("cifs: enable caching of directories for which a lease is held")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Wed, 19 Oct 2022 22:45:54 +0000 (15:45 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Missing flowi uid field in nft_fib expression, from Guillaume Nault.
   This is broken since the creation of the fib expression.

2) Relax sanity check to fix bogus EINVAL error when deleting elements
   belonging set intervals. Broken since 6.0-rc.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
  netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
====================

Link: https://lore.kernel.org/r/20221019065225.1006344-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agogenetlink: fix kdoc warnings
Jakub Kicinski [Tue, 18 Oct 2022 23:13:10 +0000 (16:13 -0700)]
genetlink: fix kdoc warnings

Address a bunch of kdoc warnings:

include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family'
include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead
include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast'
include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead
include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info'

Link: https://lore.kernel.org/r/20221018231310.1040482-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoio_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd()
Harshit Mogalapalli [Wed, 19 Oct 2022 17:12:18 +0000 (10:12 -0700)]
io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd()

Syzkaller produced the below call trace:

 BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0
 Write of size 8 at addr 0000000000000070 by task repro/16399

 CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7
 Call Trace:
  <TASK>
  dump_stack_lvl+0xcd/0x134
  ? io_msg_ring+0x3cb/0x9f0
  kasan_report+0xbc/0xf0
  ? io_msg_ring+0x3cb/0x9f0
  kasan_check_range+0x140/0x190
  io_msg_ring+0x3cb/0x9f0
  ? io_msg_ring_prep+0x300/0x300
  io_issue_sqe+0x698/0xca0
  io_submit_sqes+0x92f/0x1c30
  __do_sys_io_uring_enter+0xae4/0x24b0
....
 RIP: 0033:0x7f2eaf8f8289
 RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289
 RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004
 RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0
 R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000
  </TASK>
 Kernel panic - not syncing: panic_on_warn set ...

We don't have a NULL check on file_ptr in io_msg_send_fd() function,
so when file_ptr is NUL src_file is also NULL and get_file()
dereferences a NULL pointer and leads to above crash.

Add a NULL check to fix this issue.

Fixes: e6130eba8a84 ("io_uring: add support for passing fixed file descriptors")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20221019171218.1337614-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoACPI: scan: Fix DMA range assignment
Robin Murphy [Tue, 18 Oct 2022 13:14:04 +0000 (14:14 +0100)]
ACPI: scan: Fix DMA range assignment

Assigning the device's dma_range_map from the iterator variable after
the loop means it always points to the empty terminator at the end of
the map, which is not what we want. Similarly, freeing the iterator on
error when it points to somwhere in the middle of the allocated array
won't work either. Fix this.

Fixes: bf2ee8d0c385 ("ACPI: scan: Support multiple DMA windows with different offsets")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jianmin Lv <lvjianmin@loongson.cn>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agosmb3: interface count displayed incorrectly
Steve French [Sat, 15 Oct 2022 22:02:30 +0000 (17:02 -0500)]
smb3: interface count displayed incorrectly

The "Server interfaces" count in /proc/fs/cifs/DebugData increases
as the interfaces are requeried, rather than being reset to the new
value.  This could cause a problem if the server disabled
multichannel as the iface_count is checked in try_adding_channels
to see if multichannel still supported.

Also fixes a coverity warning:

Addresses-Coverity: 1526374 ("Concurrent data access violations  (MISSING_LOCK)")
Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agoselinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
GONG, Ruiqi [Wed, 19 Oct 2022 02:57:10 +0000 (10:57 +0800)]
selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()

The following warning was triggered on a hardware environment:

  SELinux: Converting 162 SID table entries...
  BUG: sleeping function called from invalid context at
       __might_sleep+0x60/0x74 0x0
  in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 5943, name: tar
  CPU: 7 PID: 5943 Comm: tar Tainted: P O 5.10.0 #1
  Call trace:
   dump_backtrace+0x0/0x1c8
   show_stack+0x18/0x28
   dump_stack+0xe8/0x15c
   ___might_sleep+0x168/0x17c
   __might_sleep+0x60/0x74
   __kmalloc_track_caller+0xa0/0x7dc
   kstrdup+0x54/0xac
   convert_context+0x48/0x2e4
   sidtab_context_to_sid+0x1c4/0x36c
   security_context_to_sid_core+0x168/0x238
   security_context_to_sid_default+0x14/0x24
   inode_doinit_use_xattr+0x164/0x1e4
   inode_doinit_with_dentry+0x1c0/0x488
   selinux_d_instantiate+0x20/0x34
   security_d_instantiate+0x70/0xbc
   d_splice_alias+0x4c/0x3c0
   ext4_lookup+0x1d8/0x200 [ext4]
   __lookup_slow+0x12c/0x1e4
   walk_component+0x100/0x200
   path_lookupat+0x88/0x118
   filename_lookup+0x98/0x130
   user_path_at_empty+0x48/0x60
   vfs_statx+0x84/0x140
   vfs_fstatat+0x20/0x30
   __se_sys_newfstatat+0x30/0x74
   __arm64_sys_newfstatat+0x1c/0x2c
   el0_svc_common.constprop.0+0x100/0x184
   do_el0_svc+0x1c/0x2c
   el0_svc+0x20/0x34
   el0_sync_handler+0x80/0x17c
   el0_sync+0x13c/0x140
  SELinux: Context system_u:object_r:pssp_rsyslog_log_t:s0:c0 is
           not valid (left unmapped).

It was found that within a critical section of spin_lock_irqsave in
sidtab_context_to_sid(), convert_context() (hooked by
sidtab_convert_params.func) might cause the process to sleep via
allocating memory with GFP_KERNEL, which is problematic.

As Ondrej pointed out [1], convert_context()/sidtab_convert_params.func
has another caller sidtab_convert_tree(), which is okay with GFP_KERNEL.
Therefore, fix this problem by adding a gfp_t argument for
convert_context()/sidtab_convert_params.func and pass GFP_KERNEL/_ATOMIC
properly in individual callers.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20221018120111.1474581-1-gongruiqi1@huawei.com/
Reported-by: Tan Ninghao <tanninghao1@huawei.com>
Fixes: ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: wrap long BUG() output lines, tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
3 years agoMerge branch 'qdisc-ingress-success'
David S. Miller [Wed, 19 Oct 2022 13:04:36 +0000 (14:04 +0100)]
Merge branch 'qdisc-ingress-success'

Paul Blakey says:

====================
net: Fix return value of qdisc ingress handling on success

Fix patch + self-test with the currently broken scenario.

v4->v3:
  Removed new line in self test and rebase (Paolo).

v2->v3:
  Added DROP return to TC_ACT_SHOT case (Cong).

v1->v2:
  Changed blamed commit
  Added self-test
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: add selftest for chaining of tc ingress handling to egress
Paul Blakey [Tue, 18 Oct 2022 07:34:39 +0000 (10:34 +0300)]
selftests: add selftest for chaining of tc ingress handling to egress

This test runs a simple ingress tc setup between two veth pairs,
then adds a egress->ingress rule to test the chaining of tc ingress
pipeline to tc egress piepline.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: Fix return value of qdisc ingress handling on success
Paul Blakey [Tue, 18 Oct 2022 07:34:38 +0000 (10:34 +0300)]
net: Fix return value of qdisc ingress handling on success

Currently qdisc ingress handling (sch_handle_ingress()) doesn't
set a return value and it is left to the old return value of
the caller (__netif_receive_skb_core()) which is RX drop, so if
the packet is consumed, caller will stop and return this value
as if the packet was dropped.

This causes a problem in the kernel tcp stack when having a
egress tc rule forwarding to a ingress tc rule.
The tcp stack sending packets on the device having the egress rule
will see the packets as not successfully transmitted (although they
actually were), will not advance it's internal state of sent data,
and packets returning on such tcp stream will be dropped by the tcp
stack with reason ack-of-unsent-data. See reproduction in [0] below.

Fix that by setting the return value to RX success if
the packet was handled successfully.

[0] Reproduction steps:
 $ ip link add veth1 type veth peer name peer1
 $ ip link add veth2 type veth peer name peer2
 $ ifconfig peer1 5.5.5.6/24 up
 $ ip netns add ns0
 $ ip link set dev peer2 netns ns0
 $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
 $ ifconfig veth2 0 up
 $ ifconfig veth1 0 up

 #ingress forwarding veth1 <-> veth2
 $ tc qdisc add dev veth2 ingress
 $ tc qdisc add dev veth1 ingress
 $ tc filter add dev veth2 ingress prio 1 proto all flower \
   action mirred egress redirect dev veth1
 $ tc filter add dev veth1 ingress prio 1 proto all flower \
   action mirred egress redirect dev veth2

 #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
 $ tc qdisc add dev peer1 clsact
 $ tc filter add dev peer1 egress prio 20 proto ip flower \
   action mirred ingress redirect dev veth1

 #run iperf and see connection not running
 $ iperf3 -s&
 $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

 #delete egress rule, and run again, now should work
 $ tc filter del dev peer1 egress
 $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

Fixes: f697c3e8b35c ("[NET]: Avoid unnecessary cloning for ingress filtering")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'qdisc-null-deref'
David S. Miller [Wed, 19 Oct 2022 12:47:09 +0000 (13:47 +0100)]
Merge branch 'qdisc-null-deref'

Zhengchao Shao says:

====================
net: fix null pointer access issue in qdisc

These three patches fix the same type of problem. Set the default qdisc,
and then construct an init failure scenario when the dev qdisc is
configured on mqprio to trigger the reset process. NULL pointer access
may occur during the reset process.

---
v2: for fq_codel, revert the patch
---
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sched: sfb: fix null pointer access issue when sfb_init() fails
Zhengchao Shao [Tue, 18 Oct 2022 06:32:01 +0000 (14:32 +0800)]
net: sched: sfb: fix null pointer access issue when sfb_init() fails

When the default qdisc is sfb, if the qdisc of dev_queue fails to be
inited during mqprio_init(), sfb_reset() is invoked to clear resources.
In this case, the q->qdisc is NULL, and it will cause gpf issue.

The process is as follows:
qdisc_create_dflt()
sfb_init()
tcf_block_get()          --->failed, q->qdisc is NULL
...
qdisc_put()
...
sfb_reset()
qdisc_reset(q->qdisc)    --->q->qdisc is NULL
ops = qdisc->ops

The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
RIP: 0010:qdisc_reset+0x2b/0x6f0
Call Trace:
<TASK>
sfb_reset+0x37/0xd0
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f2164122d04
</TASK>

Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoRevert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"
Zhengchao Shao [Tue, 18 Oct 2022 06:32:00 +0000 (14:32 +0800)]
Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"

This reverts commit 494f5063b86cd6e972cb41a27e083c9a3664319d.

When the default qdisc is fq_codel, if the qdisc of dev_queue fails to be
inited during mqprio_init(), fq_codel_reset() is invoked to clear
resources. In this case, the flow is NULL, and it will cause gpf issue.

The process is as follows:
qdisc_create_dflt()
fq_codel_init()
...
q->flows_cnt = 1024;
...
q->flows = kvcalloc(...)      --->failed, q->flows is NULL
...
qdisc_put()
...
fq_codel_reset()
...
flow = q->flows + i   --->q->flows is NULL

The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
RIP: 0010:fq_codel_reset+0x14d/0x350
Call Trace:
<TASK>
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fd272b22d04
</TASK>

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sched: cake: fix null pointer access issue when cake_init() fails
Zhengchao Shao [Tue, 18 Oct 2022 06:31:59 +0000 (14:31 +0800)]
net: sched: cake: fix null pointer access issue when cake_init() fails

When the default qdisc is cake, if the qdisc of dev_queue fails to be
inited during mqprio_init(), cake_reset() is invoked to clear
resources. In this case, the tins is NULL, and it will cause gpf issue.

The process is as follows:
qdisc_create_dflt()
cake_init()
q->tins = kvcalloc(...)        --->failed, q->tins is NULL
...
qdisc_put()
...
cake_reset()
...
cake_dequeue_one()
b = &q->tins[...]   --->q->tins is NULL

The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:cake_dequeue_one+0xc9/0x3c0
Call Trace:
<TASK>
cake_reset+0xb1/0x140
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f89e5122d04
</TASK>

Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethernet: marvell: octeontx2 Fix resource not freed after malloc
Manank Patel [Tue, 18 Oct 2022 05:33:18 +0000 (11:03 +0530)]
ethernet: marvell: octeontx2 Fix resource not freed after malloc

fix rxsc and txsc not getting freed before going out of scope

Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Manank Patel <pmanank200502@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoACPI: PCI: Fix device reference counting in acpi_get_pci_dev()
Rafael J. Wysocki [Tue, 18 Oct 2022 17:34:03 +0000 (19:34 +0200)]
ACPI: PCI: Fix device reference counting in acpi_get_pci_dev()

Commit 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()") failed
to reference count the device returned by acpi_get_pci_dev() as
expected by its callers which in some cases may cause device objects
to be dropped prematurely.

Add the missing get_device() to acpi_get_pci_dev().

Fixes: 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agodrm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates
Christian König [Fri, 7 Oct 2022 08:59:58 +0000 (10:59 +0200)]
drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates

Make sure that we always have a CPU round trip to let the submission
code correctly decide if a TLB flush is necessary or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-2-christian.koenig@amd.com
3 years agonvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
Daniel Wagner [Fri, 7 Oct 2022 07:29:34 +0000 (09:29 +0200)]
nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show

The item passed into nvmet_subsys_attr_qid_max_show is not a member of
struct nvmet_port, it is part of nvmet_subsys.  Hence, don't try to
dereference it as struct nvme_ctrl pointer.

Fixes: 3e980f5995e0 ("nvmet: Expose max queues to configfs")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20220913064203.133536-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: fix workqueue MEM_RECLAIM flushing dependency
Sagi Grimberg [Wed, 28 Sep 2022 06:39:10 +0000 (09:39 +0300)]
nvmet: fix workqueue MEM_RECLAIM flushing dependency

The keep alive timer needs to stay on nvmet_wq, and not
modified to reschedule on the system_wq.

This fixes a warning:
------------[ cut here ]------------
workqueue: WQ_MEM_RECLAIM
nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing
!WQ_MEM_RECLAIM events:nvmet_keep_alive_timer [nvmet]
WARNING: CPU: 3 PID: 1086 at kernel/workqueue.c:2628
check_flush_dependency+0x16c/0x1e0

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-hwmon: kmalloc the NVME SMART log buffer
Serge Semin [Tue, 18 Oct 2022 15:33:52 +0000 (17:33 +0200)]
nvme-hwmon: kmalloc the NVME SMART log buffer

Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has
caused a regression on our platform.

It turned out that the nvme_get_log() method invocation caused the
nvme_hwmon_data structure instance corruption.  In particular the
nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
garbage.  After some research we discovered that the problem happened
even before the actual NVME DMA execution, but during the buffer mapping.
Since our platform is DMA-noncoherent, the mapping implied the cache-line
invalidations or write-backs depending on the DMA-direction parameter.
In case of the NVME SMART log getting the DMA was performed
from-device-to-memory, thus the cache-invalidation was activated during
the buffer mapping.  Since the log-buffer isn't cache-line aligned, the
cache-invalidation caused the neighbour data to be discarded.  The
neighbouring data turned to be the data surrounding the buffer in the
framework of the nvme_hwmon_data structure.

In order to fix that we need to make sure that the whole log-buffer is
defined within the cache-line-aligned memory region so the
cache-invalidation procedure wouldn't involve the adjacent data. One of
the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the
rest of the NVME core driver prefer that method it has been chosen to fix
this problem too.

Note after a deeper researches we found out that the denoted commit wasn't
a root cause of the problem. It just revealed the invalidity by activating
the DMA-based NVME SMART log getting performed in the framework of the
NVME hwmon driver. The problem was here since the initial commit of the
driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-hwmon: consistently ignore errors from nvme_hwmon_init
Christoph Hellwig [Tue, 18 Oct 2022 14:55:55 +0000 (16:55 +0200)]
nvme-hwmon: consistently ignore errors from nvme_hwmon_init

An NVMe controller works perfectly fine even when the hwmon
initialization fails.  Stop returning errors that do not come from a
controller reset from nvme_hwmon_init to handle this case consistently.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
3 years agodrm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
Christian König [Fri, 7 Oct 2022 07:51:13 +0000 (09:51 +0200)]
drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag

Setting this flag on a scheduler fence prevents pipelining of jobs
depending on this fence. In other words we always insert a full CPU
round trip before dependent jobs are pushed to the pipeline.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
3 years agonvme: add Guenther as nvme-hwmon maintainer
Christoph Hellwig [Tue, 18 Oct 2022 14:59:16 +0000 (16:59 +0200)]
nvme: add Guenther as nvme-hwmon maintainer

Given that non of the overall NVMe maintainers knows this code very
deeply it probably makes sense to add Guenther as an additional
MAINTAINER for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Guenter Roeck <linux@roeck-us.net>
3 years agonvme-apple: don't limit DMA segement size
Russell King (Oracle) [Wed, 12 Oct 2022 11:46:06 +0000 (12:46 +0100)]
nvme-apple: don't limit DMA segement size

NVMe uses PRPs for data transfers and has no specific limit for a single
DMA segement.  Limiting the size will cause problems because the block
layer assumes PRP-ish devices using a virt boundary mask don't have a
segment limit.  And while this is true, we also really need to tell the
DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
3 years agonvme-pci: disable write zeroes on various Kingston SSD
Xander Li [Tue, 11 Oct 2022 11:06:42 +0000 (04:06 -0700)]
nvme-pci: disable write zeroes on various Kingston SSD

Kingston SSDs do support NVMe Write_Zeroes cmd but take long time to
process.  The firmware version is locked by these SSDs, we can not expect
firmware improvement, so disable Write_Zeroes cmd.

Signed-off-by: Xander Li <xander_li@kingston.com.tw>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: fix error pointer dereference in error handling
Dan Carpenter [Sat, 15 Oct 2022 08:25:56 +0000 (11:25 +0300)]
nvme: fix error pointer dereference in error handling

There is typo here so it releases the wrong variable.  "ctrl->admin_q"
was intended instead of "ctrl->fabrics_q".

Fixes: fe60e8c53411 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonetfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
Pablo Neira Ayuso [Mon, 17 Oct 2022 12:12:58 +0000 (14:12 +0200)]
netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements

Otherwise EINVAL is bogusly reported to userspace when deleting a set
element. NFTA_SET_ELEM_KEY_END does not need to be set in case of:

- insertion: if not present, start key is used as end key.
- deletion: only start key needs to be specified, end key is ignored.

Hence, relax the sanity check.

Fixes: 88cccd908d51 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>