Myeonghun Pak [Fri, 24 Apr 2026 13:50:51 +0000 (22:50 +0900)]
hwmon: (corsair-psu) Close HID device on probe errors
corsairpsu_probe() opens the HID device before sending the device init
and firmware-info commands. If either command fails, the error path jumps
directly to fail_and_stop and skips hid_hw_close().
Use the existing fail_and_close label for those post-open failures so the
open count and low-level close callback are balanced before hid_hw_stop().
hwmon: Remove stale CONFIG_SENSORS_SBRMI Makefile reference
kconfiglint reports:
X001: CONFIG_SENSORS_SBRMI referenced in Makefile but not defined
in any Kconfig
The SB-RMI hardware monitoring driver was originally introduced in
commit 5a0f50d110b3 ("hwmon: Add support for SB-RMI power module") with
both a Kconfig entry (CONFIG_SENSORS_SBRMI) and a Makefile line
(obj-$(CONFIG_SENSORS_SBRMI) += sbrmi.o) in drivers/hwmon/.
Commit e156586764050 ("hwmon/misc: amd-sbi: Move core sbrmi from hwmon to
misc")
moved the driver to drivers/misc/amd-sbi/ to support additional
functionality beyond hardware monitoring. That commit correctly removed the
Kconfig entry from drivers/hwmon/Kconfig, moved the source file
drivers/hwmon/sbrmi.c to drivers/misc/amd-sbi/sbrmi.c, and created new
Kconfig/Makefile entries in drivers/misc/amd-sbi/ with a renamed symbol
(CONFIG_AMD_SBRMI_I2C).
However, the Makefile line in drivers/hwmon/Makefile was not removed in
that commit. The orphaned line references a CONFIG symbol that no longer
exists and a source file that is no longer present, so it has no effect
on the build — but it is dead code that should be cleaned up.
hwmon: (ltc2992) Fix u32 overflow in power read path
ltc2992_get_power() computes the divisor for mul_u64_u32_div() as
r_sense_uohm * 1000. This multiplication overflows u32 when
r_sense_uohm exceeds about 4.29 ohms (4294967 micro-ohms), producing
a truncated divisor and an incorrect power reading.
Cancel the factor of 1000 from both the numerator
(VADC_UV_LSB * IADC_NANOV_LSB = 312500000) and the divisor
(r_sense_uohm * 1000), giving (VADC_UV_LSB / 1000) * IADC_NANOV_LSB
= 312500 as the numerator and plain r_sense_uohm as the divisor.
The cancellation is exact because LTC2992_VADC_UV_LSB (25000) is
divisible by 1000.
This is the read-path counterpart of the write-path fix applied in
the preceding patch.
hwmon: (ltc2992) Clamp threshold writes to hardware range
ltc2992_set_voltage(), ltc2992_set_current(), and ltc2992_set_power()
do not validate the user-supplied value before converting it to a
register value. This can result in:
1. Negative input values wrapping to large positive register values.
For power, the negative long is implicitly cast to u64 in
mul_u64_u32_div(), producing an incorrect value. For voltage and
current, the negative converted value wraps when passed to
ltc2992_write_reg() as a u32.
2. Intermediate arithmetic exceeding the range representable in u64 on
64-bit platforms. In ltc2992_set_voltage(), (u64)val * 1000 can
exceed U64_MAX when val is a large positive long. In
ltc2992_set_current(), (u64)val * r_sense_uohm can overflow
similarly. In ltc2992_set_power(), the computed value may not fit
in u64.
3. Register values exceeding the hardware field width. Voltage and
current threshold registers are 12-bit (stored left-justified in
16 bits), and power threshold registers are 24-bit. Without
clamping, bits above the field width are truncated in
ltc2992_write_reg().
Fix by clamping negative values to zero, clamping positive values to
the rounded hardware-representable maximum (the value returned by the
read path for a full-scale register) to prevent intermediate overflow,
and clamping the converted register value to the hardware field width
before writing. The existing conversion formula and rounding behavior
are preserved.
In the power write path, cancel the factor of 1000 from both the
numerator (r_sense_uohm * 1000) and the denominator
(VADC_UV_LSB * IADC_NANOV_LSB) to also eliminate a u32 overflow of
r_sense_uohm * 1000 when r_sense_uohm exceeds about 4.29 ohms.
netfilter: flowtable: ensure sufficient headroom in xmit path
Check for headroom and call skb_expand_head() like in the IP output
path to ensure there is sufficient headroom for the mac header when
forwarding this packet as suggested by sashiko.
netfilter: xtables: fix L4 header parsing for non-first fragments
Multiple targets and matches relies on L4 header to operate. For
fragmented packets, every fragment carries the transport protocol
identifier, but only the first fragment contains the L4 header.
As the 'raw' table can be configured to run at priority -450 (before
defragmentation at -400), the target/match can be reached before
reassembly. In this case, non-first fragments have their payload
incorrectly parsed as a TCP/UDP header. This would be of course a
misconfiguration scenario. In most of the cases this just lead to a
unreliable behavior for fragmented traffic.
Add a fragment check to ensure target/match only evaluates unfragmented
packets or the first fragment in the stream.
Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: skip L4 header parsing for non-first fragments
The tproxy, osf and exthdr (SCTP) expressions rely on the presence of
transport layer headers to perform socket lookups, fingerprint matching,
or chunk extraction. For fragmented packets, while the IP protocol
remains constant across all fragments, only the first fragment contains
the actual L4 header.
The expressions could be attached to a chain with a priority lower than
-400, bypassing defragmentation. Or could be used in stateless
environments where defragmentation is not happening at all. This could
result in garbage data being used for the matching.
Add a check for pkt->fragoff so only unfragmented packets or the first
fragment is processed.
Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks") Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_socket: skip socket lookup for non-first fragments
Both nft_socket and xt_socket relies on L4 headers to perform socket
lookup in the slow path. For fragmented packets, while the IP protocol
remains constant across all fragments, only the first fragment contains
the actual L4 header.
As the expression/match could be attached to a chain with a priority
lower than -400, it could bypass defragmentation.
Add a check for fragmentation in the lookup functions directly so the
problem is handled for both nft_socket and xt_socket at the same time.
In addition, future users of the functions would not need to care about
this.
Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- eth:
- stmmac: prevent NULL deref when RX memory exhausted
- airoha: do not read uninitialized fragment address
- rtl8150: fix use-after-free in rtl8150_start_xmit()
Misc:
- add Ido Schimmel as IPv4/IPv6 maintainer
- add David Heidelberg as NFC subsystem maintainer"
* tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net/sched: cls_flower: revert unintended changes
sfc: fix error code in efx_devlink_info_running_versions()
net: tls: fix strparser anchor skb leak on offload RX setup failure
ice: add dpll peer notification for paired SMA and U.FL pins
ice: fix missing dpll notifications for SW pins
dpll: export __dpll_pin_change_ntf() for use under dpll_lock
ice: fix SMA and U.FL pin state changes affecting paired pin
ice: fix missing SMA pin initialization in DPLL subsystem
ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
ice: fix NULL pointer dereference in ice_reset_all_vfs()
iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
iavf: wait for PF confirmation before removing VLAN filters
iavf: stop removing VLAN filters from PF on interface down
iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
page_pool: fix memory-provider leak in page_pool_create_percpu() error path
bonding: 3ad: implement proper RCU rules for port->aggregator
net: airoha: Do not return err in ndo_stop() callback
hv_sock: fix ARM64 support
MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
selftests: drv-net: clarify linters and frameworks in README
...
Merge tag 'sound-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A bunch of small fixes. One minor fix is found in the core side for
data race in PCM OSS layer, while remaining changes are various
device-specific fixes and quirks.
- Core: PCM OSS data race fix
- HD-audio: Fixes for TAS2781, CS35L56, and Realtek/Conexant quirks;
avoidance of a WARN_ON for HDMI channel mapping
- USB-audio: Improvements in UAC3 parsing robustness (leaks, size
checks) and fixes for potential endless loops
- ASoC: Driver-specific fixes for CS35L56, Intel bytcr_wm5102,
Spacemit, AW88395, and others, plus a new quirk for Steam Deck
OLED
- Misc: A UAF fix in aloop driver, division by zero fix in ua101
driver and leak fixes in caiaq driver"
* tag 'sound-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
ALSA: hda/tas2781: Fix incorrect bit update for non-book-zero or book 0 pages >1
ALSA: hda: cs35l56: Fix uninitialized value in cs35l56_hda_read_acpi()
ALSA: hda/conexant: Fix missing error check for jack detection
ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks
ALSA: usb-audio: Fix quirk entry placement for PreSonus AudioBox USB
ASoC: spacemit: adjust FIFO trigger threshold to half FIFO size
ASoC: spacemit: move hw constraints from hw_params to startup
ASoC: codecs: ab8500: Fix casting of private data
ASoC: cs35l56: Fix illegal writes to OTP_MEM registers
ASoC: Intel: bytcr_wm5102: Fix MCLK leak on platform_clock_control error
ALSA: usb-audio: Avoid potential endless loop in convert_chmap_v3()
ALSA: usb-audio: Fix potential leak of pd at parsing UAC3 streams
ALSA: caiaq: Don't abort when no input device is available
ALSA: caiaq: Fix potentially leftover ep1_in_urb at error path
ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer
ALSA: caiaq: fix usb_dev refcount leak on probe failure
sound: ua101: fix division by zero at probe
ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster
ALSA: hda: Remove duplicate cmedia entries in codecs Makefile
ALSA: hda/realtek: Add micmute LED quirk for Acer Aspire A315-44P
...
ALSA: hda/realtek: Fix speaker silence after S3 resume on Xiaomi Mi Laptop Pro 15
The Xiaomi Mi Laptop Pro 15 (TM1905, subsystem 1d72:1905) ships with the
Realtek ALC256 codec on Intel Comet Lake PCH-LP. After S3 resume the
codec sets coefficient register 0x10 to 0x0220 instead of 0x0020 — bit 9
is erroneously set, which silences the internal speaker. Bluetooth and
HDMI audio are unaffected because they use different paths.
This is the same mechanism fixed for Clevo NJ51CU by commit edca7cc4b0ac
("ALSA: hda/realtek: Fix quirk for Clevo NJ51CU"), but the existing
ALC256_FIXUP_MIC_NO_PRESENCE_AND_RESUME also reconfigures pin 0x19 as a
front mic, which is wrong for this Xiaomi where pin 0x19 default is
0x411111f0 (disabled). Add a minimal fixup that only clears the stuck
coef bit, and add the Xiaomi SSID to the quirk table.
Verified by reading coef 0x10 with hda-verb after resume (returns
0x0220), writing 0x0020, and confirming the internal speaker resumes
output. With this fixup applied the bit is cleared on every codec init,
including post-resume.
mm: memcontrol: fix rcu unbalance in get_non_dying_memcg_end()
Currently, get_non_dying_memcg_start() and get_non_dying_memcg_end() both
evaluate cgroup_subsys_on_dfl(memory_cgrp_subsys) independently to
determine whether to acquire or release the RCU read lock.
However, the result of cgroup_subsys_on_dfl() can change dynamically at
runtime due to cgroup hierarchy rebinding (e.g., when the memory
controller is moved between cgroup v1 and v2 hierarchies). This can cause
the following warning:
=====================================
WARNING: bad unlock balance detected!
7.0.0-next-20260420+ #83 Tainted: G W
-------------------------------------
memcg-repro/270 is trying to release lock (rcu_read_lock) at:
[<ffffffff815f57f7>] rcu_read_unlock+0x17/0x60
but there are no more locks to release!
other info that might help us debug this:
1 lock held by memcg-repro/270:
#0: ffff888102fa2088 (vm_lock){++++}-{0:0}, at: do_user_addr_fault+0x285/0x880
Fix this by explicitly tracking the RCU lock state, ensuring that
rcu_read_unlock() in get_non_dying_memcg_end() is strictly paired with the
lock acquisition, regardless of any runtime rebinding events.
Link: https://lore.kernel.org/20260429073105.44472-1-qi.zheng@linux.dev Fixes: 8285917d6f38 ("mm: memcontrol: prepare for reparenting non-hierarchical stats") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
io_uring/tw: serialize ctx->retry_llist with ->uring_lock
The DEFER_TASKRUN local task work paths all run under ctx->uring_lock,
which serializes them with each other and with the rest of the ring's
hot paths. io_move_task_work_from_local() is the exception - it's called
from io_ring_exit_work() on a kworker without holding the lock and from
the iopoll cancelation side right after dropping it.
->work_llist is fine with this, as it's only ever updated via the
expected paths. But the ->retry_llist is updated while runing, and hence
it could potentially race between normal task_work running and the
task-has-exited shutdown path.
Simply grab ->uring_lock while moving the local work to the fallback
list for exit purposes, which nicely serializes it across both the
normal additions and the exit prune path.
Cc: stable@vger.kernel.org Fixes: f46b9cdb22f7 ("io_uring: limit local tw done") Reported-by: Robert Femmer <robert.femmer@x41-dsec.de> Reported-by: Christian Reitter <invd@inhq.net> Reported-by: Michael Rodler <michael.rodler@x41-dsec.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
platform/x86: lenovo: wmi-other: Fix uninitialized variable in lwmi_om_hwmon_write()
When the flag relax_fan_constraint is set, local variable 'raw'
is never assigned, and lwmi_om_hwmon_write() will pass uninitialized
value to lwmi_om_fan_get_set() resulting in undefined behavior.
This flag allows user to bypass minimum fan RPM divisor rounding,
but assignment to 'raw' only happens in the non-relaxed path.
Fix by defaulting 'raw' to user provided 'val' in the else branch.
Fixes: 51ed34282f63 ("platform/x86: lenovo-wmi-other: Add HWMON for fan reporting/tuning") Reviewed-by: Rong Zhang <i@rong.moe> Signed-off-by: Yufei CHENG <cd345al@gmail.com> Link: https://patch.msgid.link/20260426165034.9073-1-cd345al@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
platform/x86: hp-wmi: silence unknown board warning for 8D41
The HP Omen Max 16-ah0xxx, DMI board ID 8D41 is currently marked with
victus_s_thermal_params in the victus_s_thermal_profile_boards[] list.
This disables thermal profile readback and adds a dmesg warning during
driver init for "unknown board".
After testing we know that (similar to another HP Omen Max 16 device,
board ID 8D87), the embedded controller on this board does not expose
thermal profile which means we have to intentionally disable EC
readback.
Changing its driver_data to omen_v1_no_ec_thermal_params is sufficient
to silence the warning.
Tested-by: Benjamin Y <hectare.plains1h@icloud.com> Signed-off-by: Krishna Chomal <krishna.chomal108@gmail.com> Link: https://patch.msgid.link/20260429180953.129885-1-krishna.chomal108@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Kurt Borja [Wed, 29 Apr 2026 13:20:56 +0000 (08:20 -0500)]
platform/wmi: Fix unchecked min_size in wmidev_invoke_method()
After calling wmidev_evaluate_method(), if the ACPI core does not return
an out object, then wmidev_invoke_method() bypasses the min_size check
and returns 0. Add a check for min_size if there is not an out object.
Fixes: 1aeded2f55f0 ("platform/wmi: Extend wmidev_query_block() to reject undersized data") Closes: https://sashiko.dev/#/patchset/20260406203237.2970-1-W_Armin%40gmx.de Signed-off-by: Kurt Borja <kuurtb@gmail.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20260429-invoke-fix-v1-1-ce938eb80cd3@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Prevent re-exporting of imported GEM buffers by adding a custom
prime_handle_to_fd callback that checks if the object is imported
and returns -EOPNOTSUPP if so.
Re-exporting imported GEM buffers causes loss of buffer flags settings,
leading to incorrect device access and data corruption.
Reported-by: Yametsu <yam3tsu@gmail.com> Fixes: 57557964b582 ("accel/ivpu: Add support for userptr buffer objects") Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.19+
Paolo Abeni [Wed, 29 Apr 2026 07:39:11 +0000 (09:39 +0200)]
net/sched: cls_flower: revert unintended changes
While applying the blamed commit 4ca07b9239bd ("net: mctp i2c: check
length before marking flow active"), I unintentionally included
unrelated and unacceptable changes.
Revert them.
Fixes: 4ca07b9239bd ("net: mctp i2c: check length before marking flow active") Reported-by: Jeremy Kerr <jk@codeconstruct.com.au> Closes: https://lore.kernel.org/netdev/bd8704fe0bd53e278add5cde4873256656623e2e.camel@codeconstruct.com.au/ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/043026a53ff84da88b17648c4b0d17f0331749cb.1777447863.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Wed, 29 Apr 2026 06:48:17 +0000 (09:48 +0300)]
sfc: fix error code in efx_devlink_info_running_versions()
Return -EIO if efx_mcdi_rpc() doesn't return enough space.
Fixes: 14743ddd2495 ("sfc: add devlink info support for ef100") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/afGpsbLRHL4_H0KS@stanley.mountain Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When tls_set_device_offload_rx() fails at tls_dev_add(), the error path
calls tls_sw_free_resources_rx() to clean up the SW context that was
initialized by tls_set_sw_offload(). This function calls
tls_sw_release_resources_rx() (which stops the strparser via
tls_strp_stop()) and tls_sw_free_ctx_rx() (which kfrees the context),
but never frees the anchor skb that was allocated by alloc_skb(0) in
tls_strp_init().
Note that tls_sw_free_resources_rx() is exclusively used for this
"failed to start offload" code path, there's no other caller.
The leak did not exist before commit 84c61fe1a75b ("tls: rx: do not use
the standard strparser"), because the standard strparser doesn't try
to pre-allocate an skb.
The normal close path in tls_sk_proto_close() handles cleanup by calling
tls_sw_strparser_done() (which calls tls_strp_done()) after dropping
the socket lock, because tls_strp_done() does cancel_work_sync() and
the strparser work handler takes the socket lock.
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260428231559.1358502-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Intel Wired LAN Update 2026-04-27 (ice, iavf)
Petr Oros from RedHat has accumulated a number of fixes for the Intel ice
and iavf drivers, bundled together in this series.
First, a series of 4 fixes to resolve issues with the iavf driver logic for
handling VLAN filters. This includes keeping VLAN filters while the
interface is brought down, waiting for confirmation on filter deletion
before deleting filters from the driver tracking structures, and handling
the VIRTCHNL_OP_ADD_VLAN for the old v1 VLAN_ADD command.
A fix for a crash in ice_reset_all_vfs(), properly checking for errors when
ice_vf_rebuild_vsi() fails.
A fix for a possible infinite recursion in ice_cfg_tx_topo() that occurs
when trying to apply invalid Tx topology configuration.
A fix to initialize the SMA pins in the DPLL subsystem properly.
A fix to change the SMA and U.FL pin state for paired pins, ensuring that
all flows changing one pin will also update its shared pin appropriately.
A preparatory patch to export __dpll_pin_change_ntf() so that drivers can
notify pin changes while already holding the dpll_lock.
A fix to ensure DPLL notifications are sent for the software-controlled
pins which wrap the physical CGU input/output pins.
A fix to add DPLL notifications for peer pins when changing the SMA or U.FL
pins, ensuring DPLL subsystem is notified about the paired connected pins.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================
Petr Oros [Tue, 28 Apr 2026 05:22:23 +0000 (22:22 -0700)]
ice: add dpll peer notification for paired SMA and U.FL pins
SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1 and
SMA2/U.FL2). When one pin's state changes via a PCA9575 GPIO write,
the paired pin's state also changes, but no notification is sent for
the peer pin. Userspace consumers monitoring the peer via dpll netlink
subscribe never learn about the update.
Add ice_dpll_sw_pin_notify_peer() which sends a change notification for
the paired SW pin. Call it from ice_dpll_pin_sma_direction_set(),
ice_dpll_sma_pin_state_set(), and ice_dpll_ufl_pin_state_set() after
pf->dplls.lock is released. Use __dpll_pin_change_ntf() because
dpll_lock is still held by the dpll netlink layer (dpll_pin_pre_doit).
Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control") Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-11-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:22 +0000 (22:22 -0700)]
ice: fix missing dpll notifications for SW pins
The SMA/U.FL pin redesign (commit 2dd5d03c77e2 ("ice: redesign dpll
sma/u.fl pins control")) introduced software-controlled pins that wrap
backing CGU input/output pins, but never updated the notification and
data paths to propagate pin events to these SW wrappers.
The periodic work sends dpll_pin_change_ntf() only for direct CGU input
pins. SW pins that wrap these inputs never receive change or phase
offset notifications, so userspace consumers such as synce4l monitoring
SMA pins via dpll netlink never learn about state transitions or phase
offset updates. Similarly, ice_dpll_phase_offset_get() reads the SW
pin's own phase_offset field which is never updated; the PPS monitor
writes to the backing CGU input's field instead.
Fix by introducing ice_dpll_pin_ntf(), a wrapper around
dpll_pin_change_ntf() that also notifies any registered SMA/U.FL pin
whose backing CGU input matches. Replace all direct
dpll_pin_change_ntf() calls in the periodic notification paths with
this wrapper. Fix ice_dpll_phase_offset_get() to return the backing
CGU input's phase_offset for input-direction SW pins.
Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control") Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-10-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ivan Vecera [Tue, 28 Apr 2026 05:22:21 +0000 (22:22 -0700)]
dpll: export __dpll_pin_change_ntf() for use under dpll_lock
Export __dpll_pin_change_ntf() so that drivers can send pin change
notifications from within pin callbacks, which are already called
under dpll_lock. Using dpll_pin_change_ntf() in that context would
deadlock.
Add lockdep_assert_held() to catch misuse without the lock held.
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:20 +0000 (22:22 -0700)]
ice: fix SMA and U.FL pin state changes affecting paired pin
SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1 and
SMA2/U.FL2) controlled by the PCA9575 GPIO expander. Each pair can
only have one active pin at a time: SMA1 output and U.FL1 output share
the same CGU output, SMA2 input and U.FL2 input share the same CGU
input. The PCA9575 register bits determine which connector in each
pair owns the signal path.
The driver does not account for this pairing in two places:
ice_dpll_ufl_pin_state_set() modifies PCA9575 bits and disables the
backing CGU pin without checking whether the U.FL pin is currently
active. Disconnecting an already inactive U.FL pin flips bits that
the paired SMA pin relies on, breaking its connection.
ice_dpll_sma_direction_set() does not propagate direction changes to
the paired U.FL pin. For SMA2/U.FL2 the ICE_SMA2_UFL2_RX_DIS bit is
never managed, so U.FL2 stays disconnected after SMA2 switches to
output. For both pairs the backing CGU pin of the U.FL side is never
enabled when a direction change activates it, so userspace sees the
pin as disconnected even though the routing is correct.
Fix by guarding the U.FL disconnect path against inactive pins and by
updating the paired U.FL pin fully on SMA direction changes: manage
ICE_SMA2_UFL2_RX_DIS for the SMA2/U.FL2 pair and enable the backing
CGU pin whenever the peer becomes active.
Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control") Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-8-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:19 +0000 (22:22 -0700)]
ice: fix missing SMA pin initialization in DPLL subsystem
The DPLL SMA/U.FL pin redesign introduced ice_dpll_sw_pin_frequency_get()
which gates frequency reporting on the pin's active flag. This flag is
determined by ice_dpll_sw_pins_update() from the PCA9575 GPIO expander
state. Before the redesign, SMA pins were exposed as direct HW
input/output pins and ice_dpll_frequency_get() returned the CGU
frequency unconditionally — the PCA9575 state was never consulted.
The PCA9575 powers on with all outputs high, setting ICE_SMA1_DIR_EN,
ICE_SMA1_TX_EN, ICE_SMA2_DIR_EN and ICE_SMA2_TX_EN. Nothing in the
driver writes the register during initialization, so
ice_dpll_sw_pins_update() sees all pins as inactive and
ice_dpll_sw_pin_frequency_get() permanently returns 0 Hz for every
SW pin.
Fix this by writing a default SMA configuration in
ice_dpll_init_info_sw_pins(): clear all SMA bits, then set SMA1 and
SMA2 as active inputs (DIR_EN=0) with U.FL1 output and U.FL2 input
disabled. Each SMA/U.FL pair shares a physical signal path so only
one pin per pair can be active at a time. U.FL pins still report
frequency 0 after this fix: U.FL1 (output-only) is disabled by
ICE_SMA1_TX_EN which keeps the TX output buffer off, and U.FL2
(input-only) is disabled by ICE_SMA2_UFL2_RX_DIS. They can be
activated by changing the corresponding SMA pin direction via dpll
netlink.
Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-7-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:18 +0000 (22:22 -0700)]
ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
On certain E810 configurations where firmware supports Tx scheduler
topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo()
may need to apply a new 5-layer or 9-layer topology from the DDP
package. If the AQ command to set the topology fails (e.g. due to
invalid DDP data or firmware limitations), the global configuration
lock must still be cleared via a CORER reset.
Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
scheduler config fails") correctly fixed this by refactoring
ice_cfg_tx_topo() to always trigger CORER after acquiring the global
lock and re-initialize hardware via ice_init_hw() afterwards.
However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
breaking the reinit path introduced by 86aae43f21cf. This creates an
infinite recursive call chain:
Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
caller, ice_cfg_tx_topo(), intentionally does not need ice_init_dev_hw()
during its reinit, it only needs the core HW reinitialization. This
breaks the recursion cleanly without adding flags or guards.
The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
ice_deinit_dev() to the end of deinit paths") which fixed slow rmmod
are preserved, only the init-side placement of ice_init_dev_hw() is
reverted.
Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit paths") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-6-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:17 +0000 (22:22 -0700)]
ice: fix NULL pointer dereference in ice_reset_all_vfs()
ice_reset_all_vfs() ignores the return value of ice_vf_rebuild_vsi().
When the VSI rebuild fails (e.g. during NVM firmware update via
nvmupdate64e), ice_vsi_rebuild() tears down the VSI on its error path,
leaving txq_map and rxq_map as NULL. The subsequent unconditional call
to ice_vf_post_vsi_rebuild() leads to a NULL pointer dereference in
ice_ena_vf_q_mappings() when it accesses vsi->txq_map[0].
The single-VF reset path in ice_reset_vf() already handles this
correctly by checking the return value of ice_vf_reconfig_vsi() and
skipping ice_vf_post_vsi_rebuild() on failure.
Apply the same pattern to ice_reset_all_vfs(): check the return value
of ice_vf_rebuild_vsi() and skip ice_vf_post_vsi_rebuild() and
ice_eswitch_attach_vf() on failure. The VF is left safely disabled
(ICE_VF_STATE_INIT not set, VFGEN_RSTAT not set to VFACTIVE) and can
be recovered via a VFLR triggered by a PCI reset of the VF
(sysfs reset or driver rebind).
Note that this patch does not prevent the VF VSI rebuild from failing
during NVM update — the underlying cause is firmware being in a
transitional state while the EMP reset is processed, which can cause
Admin Queue commands (ice_add_vsi, ice_cfg_vsi_lan) to fail. This
patch only prevents the subsequent NULL pointer dereference that
crashes the kernel when the rebuild does fail.
dmesg:
ice 0000:13:00.0: firmware recommends not updating fw.mgmt, as it
may result in a downgrade. continuing anyways
ice 0000:13:00.1: ice_init_nvm failed -5
ice 0000:13:00.1: Rebuild failed, unload and reload driver
Petr Oros [Tue, 28 Apr 2026 05:22:16 +0000 (22:22 -0700)]
iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
The V1 ADD_VLAN opcode had no success handler; filters sent via V1
stayed in ADDING state permanently. Add a fallthrough case so V1
filters also transition ADDING -> ACTIVE on PF confirmation.
Critically, add an `if (v_retval) break` guard: the error switch in
iavf_virtchnl_completion() does NOT return after handling errors,
it falls through to the success switch. Without this guard, a
PF-rejected ADD would incorrectly mark ADDING filters as ACTIVE,
creating a driver/HW mismatch where the driver believes the filter
is installed but the PF never accepted it.
For V2, this is harmless: iavf_vlan_add_reject() in the error
block already kfree'd all ADDING filters, so the success handler
finds nothing to transition.
Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-4-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:15 +0000 (22:22 -0700)]
iavf: wait for PF confirmation before removing VLAN filters
The VLAN filter DELETE path was asymmetric with the ADD path: ADD
waits for PF confirmation (ADD -> ADDING -> ACTIVE), but DELETE
immediately frees the filter struct after sending the DEL message
without waiting for the PF response.
This is problematic because:
- If the PF rejects the DEL, the filter remains in HW but the driver
has already freed the tracking structure, losing sync.
- Race conditions between DEL pending and other operations
(add, reset) cannot be properly resolved if the filter struct
is already gone.
Add IAVF_VLAN_REMOVING state to make the DELETE path symmetric:
In iavf_del_vlans(), transition filters from REMOVE to REMOVING
instead of immediately freeing them. The new DEL completion handler
in iavf_virtchnl_completion() frees filters on success or reverts
them to ACTIVE on error.
Update iavf_add_vlan() to handle the REMOVING state: if a DEL is
pending and the user re-adds the same VLAN, queue it for ADD so
it gets re-programmed after the PF processes the DEL.
The !VLAN_FILTERING_ALLOWED early-exit path still frees filters
directly since no PF message is sent in that case.
Also update iavf_del_vlan() to skip filters already in REMOVING
state: DEL has been sent to PF and the completion handler will
free the filter when PF confirms. Without this guard, the sequence
DEL(pending) -> user-del -> second DEL could cause the PF to return
an error for the second DEL (filter already gone), causing the
completion handler to incorrectly revert a deleted filter back to
ACTIVE.
Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-3-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:14 +0000 (22:22 -0700)]
iavf: stop removing VLAN filters from PF on interface down
When a VF goes down, the driver currently sends DEL_VLAN to the PF for
every VLAN filter (ACTIVE -> DISABLE -> send DEL -> INACTIVE), then
re-adds them all on UP (INACTIVE -> ADD -> send ADD -> ADDING ->
ACTIVE). This round-trip is unnecessary because:
1. The PF disables the VF's queues via VIRTCHNL_OP_DISABLE_QUEUES,
which already prevents all RX/TX traffic regardless of VLAN filter
state.
2. The VLAN filters remaining in PF HW while the VF is down is
harmless - packets matching those filters have nowhere to go with
queues disabled.
3. The DEL+ADD cycle during down/up creates race windows where the
VLAN filter list is incomplete. With spoofcheck enabled, the PF
enables TX VLAN filtering on the first non-zero VLAN add, blocking
traffic for any VLANs not yet re-added.
Remove the entire DISABLE/INACTIVE state machinery:
- Remove IAVF_VLAN_DISABLE and IAVF_VLAN_INACTIVE enum values
- Remove iavf_restore_filters() and its call from iavf_open()
- Remove VLAN filter handling from iavf_clear_mac_vlan_filters(),
rename it to iavf_clear_mac_filters()
- Remove DEL_VLAN_FILTER scheduling from iavf_down()
- Remove all DISABLE/INACTIVE handling from iavf_del_vlans()
VLAN filters now stay ACTIVE across down/up cycles. Only explicit
user removal (ndo_vlan_rx_kill_vid) or PF/VF reset triggers VLAN
filter deletion/re-addition.
Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-2-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:13 +0000 (22:22 -0700)]
iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
Rename the IAVF_VLAN_IS_NEW state to IAVF_VLAN_ADDING to better
describe what the state represents: an ADD request has been sent to
the PF and is waiting for a response.
This is a pure rename with no behavioral change, preparing for a
cleanup of the VLAN filter state machine.
Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-1-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set
The CONFIG_PA11 option can not be used as a reliable check if we build a
32-bit kernel which needs the 32-bit VDSO.
Instead depend on CONFIG_64BIT and CONFIG_COMPAT only.
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: Helge Deller <deller@gmx.de>
netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
sashiko says:
could the related code in __nf_tables_abort() leak the struct nft_hook objects when the table is dormant?
In __nf_tables_abort(), when rolling back a NEWCHAIN transaction that
updates hooks, the code conditionally unregisters and frees the hooks only
if the table is not dormant [..]
if (!(table->flags & NFT_TABLE_F_DORMANT)) {
nft_netdev_unregister_hooks(net,
&nft_trans_chain_hooks(trans),
true);
}
...
nft_trans_destroy(trans);
Unfortunately netdev family mixes hook registration and allocation.
Push table struct down and only check for the flag to unregister.
Fixes: 216e7bf7402c ("netfilter: nf_tables: skip netdev hook unregistration if table is dormant") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: xt_CT: fix usersize for v1 and v2 revision
While resurrecting the conntrack-tool test cases I found following bug:
In:
iptables -I OUTPUT -t raw -p 13 -j CT --timeout test-generic
Out:
[0:0] -A OUTPUT -p 13 -j CT --timeout test
Data after first four bytes of the timeout policy name is never
copied to userspace because its treated as kernel-only.
Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Merge tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix inverted check of registering the stats for branch tracing
When calling register_stat_tracer() which returns zero on success and
negative on error, the callers were checking the return of zero as an
error and printing a warning message. Because this was just a normal
printk() message and not a WARN(), it wasn't caught in any testing.
Fix the check to print the warning message when an error actually
happens.
- Fix a typo in a comment in tracepoint.h
- Limit the size of event probes to 3K in size
It is possible to create a dynamic event probe via the tracefs system
that is greater than the max size of an event that the ring buffer
can hold. This basically causes the event to become useless.
Limit the size of an event probe to be 3K as that should be large
enough to handle any dynamic events being created, and fits within
the PAGE_SIZE sub-buffers of the ring buffer.
* tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Limit size of event probe to 3K
tracepoint: Fix typo in tracepoint.h comment
tracing: branch: Fix inverted check on stat tracer registration
regulator: rpi-panel-attiny: add back GPIOLIB dependency
This driver provides a gpio chip, which is only possible when GPIOLIB
is enabled, which was previously guaranteed by the CONFIG_OF_GPIO
dependency that is now gone:
riscv: Define __riscv_copy_{,vec_}{words,bytes}_unaligned() using SYM_TYPED_FUNC_START
After commit 67bdd7b01387 ("riscv: Split out measure_cycles() for
reuse") and commit c03ad15f7cf6 ("riscv: Reuse measure_cycles() in
check_vector_unaligned_access()"), there are CFI failure when booting
kernels with CONFIG_CFI=y:
CFI failure at measure_cycles+0x38/0xe0 (target: __riscv_copy_words_unaligned+0x0/0x50; expected type: ...)
CFI failure at measure_cycles+0x38/0xe0 (target: __riscv_copy_vec_words_unaligned+0x0/0x24; expected type: ...)
The __riscv_copy_*_unaligned() functions are now called indirectly but
they are not defined with SYM_TYPED_FUNC_START, which is required for
assembly functions called indirectly from C to pass CFI checking. Switch
to SYM_TYPED_FUNC_START to clear up the CFI failures.
Fixes: 67bdd7b01387 ("riscv: Split out measure_cycles() for reuse") Fixes: c03ad15f7cf6 ("riscv: Reuse measure_cycles() in check_vector_unaligned_access()") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://patch.msgid.link/20260406-measure_cycles-cfi-failure-v1-1-03e0234ae02f@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>
Hasan Basbunar [Tue, 28 Apr 2026 17:07:39 +0000 (19:07 +0200)]
page_pool: fix memory-provider leak in page_pool_create_percpu() error path
When page_pool_create_percpu() fails on page_pool_list(), it falls
through to its err_uninit: label, which calls page_pool_uninit().
At that point page_pool_init() has already taken two references
when the user requested PP_FLAG_ALLOW_UNREADABLE_NETMEM:
Neither is undone by page_pool_uninit(); both are only undone by
__page_pool_destroy() (success-side teardown). The error path
therefore leaks the per-provider reference taken by mp_ops->init
(io_zcrx_ifq->refs in the io_uring zcrx provider, the dmabuf
binding refcount in the devmem provider) plus one increment of
the page_pool_mem_providers static branch on every failure of
xa_alloc_cyclic() inside page_pool_list().
The leaked io_zcrx_ifq->refs in turn pins everything
io_zcrx_ifq_free() would release on cleanup: ifq->user (uid),
ifq->mm_account (mmdrop), ifq->dev (device refcount),
ifq->netdev_tracker (netdev refcount), and the rbuf region.
The leaked static branch increment forces all subsequent
page_pool_alloc_netmems() and page_pool_return_page() callers to
take the slow mp_ops branch for the lifetime of the kernel.
The same shape applies to the devmem dmabuf provider via
mp_dmabuf_devmem_init()/mp_dmabuf_devmem_destroy().
Restore the cleanup symmetry by moving the mp_ops->destroy() and
static_branch_dec() calls out of __page_pool_destroy() and into
page_pool_uninit(), so page_pool_uninit() is again the strict
inverse of page_pool_init(). page_pool_uninit() has only two
callers (the err_uninit: path and __page_pool_destroy()), so this
preserves the single-call invariant on the success path while
fixing the err path. The error path of page_pool_init() itself
still skips the mp_ops cleanup correctly: mp_ops->init is the
last action that takes a reference before page_pool_init() returns
0, so when it returns an error neither the refcount nor the static
branch has been touched.
Triggering the bug requires xa_alloc_cyclic() to fail with -ENOMEM,
which under normal GFP_KERNEL retry behaviour is rare. It is
deterministic under CONFIG_FAULT_INJECTION with fail_page_alloc /
xa fault injection, or under sustained memory pressure. The leak
is silent: there is no warning, and the released kernel build
continues running with a permanently-incremented static branch.
value changed: 0x0000000000000000 -> 0xffff88813cf5c400
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 22063 Comm: syz.0.31122 Tainted: G W syzkaller #0 PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Fixes: 47e91f56008b ("bonding: use RCU protection for 3ad xmit path") Reported-by: syzbot+9bb2ff2a4ab9e17307e1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69f0a82f.050a0220.3aadc4.0000.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Link: https://patch.msgid.link/20260428123207.3809211-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Tue, 28 Apr 2026 06:53:16 +0000 (08:53 +0200)]
net: airoha: Do not return err in ndo_stop() callback
Always complete the airoha_dev_stop() routine regardless of the
airoha_set_vip_for_gdm_port() return value, since errors from
ndo_stop() are ignored by the networking stack and the interface is
always considered down after the call.
Hamza Mahfooz [Tue, 28 Apr 2026 12:53:39 +0000 (08:53 -0400)]
hv_sock: fix ARM64 support
VMBUS ring buffers must be page aligned. Therefore, the current value of
24K presents a challenge on ARM64 kernels (with 64K pages). So, use
VMBUS_RING_SIZE() to ensure they are always aligned and large enough to
hold all of the relevant data.
Cc: stable@vger.kernel.org Fixes: 77ffe33363c0 ("hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication") Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260428125339.13963-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 28 Apr 2026 20:39:24 +0000 (13:39 -0700)]
MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
The IPv4/IPv6 and routing code is not very well separated from
the TCP/UDP code. Scope it down properly by providing a more
accurate file list, instead of net/ipv4/ and net/ipv6/
Now that the entry is more accurately representing layer 3
and routing merge in the nexthop entry into it.
Add Ido Schimmel as a co-maintainer, Ido's git history speaks
for itself.
Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260428203924.1229169-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 28 Apr 2026 20:33:57 +0000 (13:33 -0700)]
selftests: drv-net: clarify linters and frameworks in README
Minor clarifications in the README:
- call out what linters we expect to be clean
- make it clear that by "frameworks" we mean code under lib/
not just factoring code out in the same file
Jakub Kicinski [Tue, 28 Apr 2026 02:53:20 +0000 (19:53 -0700)]
net: add net_iov_init() and use it to initialize ->page_type
Commit db359fccf212 ("mm: introduce a new page type for page pool in
page type") added a page_type field to struct net_iov at the same
offset as struct page::page_type, so that page_pool_set_pp_info() can
call __SetPageNetpp() uniformly on both pages and net_iovs.
The page-type API requires the field to hold the UINT_MAX "no type"
sentinel before a type can be set; for real struct page that invariant
is established by the page allocator on free. struct net_iov is not
allocated through the page allocator, so the field is left as zero
(io_uring zcrx, which uses __GFP_ZERO) or as slab garbage (devmem,
which uses kvmalloc_objs() without zeroing). When the page pool then
calls page_pool_set_pp_info() on a freshly-bound niov,
__SetPageNetpp()'s VM_BUG_ON_PAGE(page->page_type != UINT_MAX) fires
and the kernel BUGs. Triggered in selftests by io_uring zcrx setup
through the fbnic queue restart path:
The same path is reachable through devmem dmabuf binding via
netdev_nl_bind_rx_doit() -> net_devmem_bind_dmabuf_to_queue().
Add a net_iov_init() helper that stamps ->owner, ->type and the
->page_type sentinel, and use it from both the devmem and io_uring
zcrx niov init loops.
Fixes: db359fccf212 ("mm: introduce a new page type for page pool in page type") Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/20260428025320.853452-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Weiming Shi [Mon, 27 Apr 2026 12:34:50 +0000 (14:34 +0200)]
netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
nft_fwd_neigh can be used in egress chains (NF_NETDEV_EGRESS). When the
forwarding rule targets the same device or two devices forward to each
other, neigh_xmit() triggers dev_queue_xmit() which re-enters
nf_hook_egress(), causing infinite recursion and stack overflow.
Move the nf_get_nf_dup_skb_recursion() accessor and NF_RECURSION_LIMIT
to the shared header nf_dup_netdev.h as a static inline, so that
nft_fwd_netdev can use the recursion counter directly without exported
function call overhead. Guard neigh_xmit() with the same recursion
limit already used in nf_do_netdev_egress().
[ Updated to cache the nf_get_nf_dup_skb_recursion pointer. --pablo ]
Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
The ttl field has been decremented already and evaluation of this rule
would proceed, just drop this packet instead if there is no destination
device to forwards this packet. This is exactly what nf_dup already does
in this case.
Moreover, check for headroom and call skb_expand_head() like in the IP
output path to ensure there is sufficient headroom when forwarding this
via neigh_xmit().
Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: replace skb_try_make_writable() by skb_ensure_writable()
skb_try_make_writable() only works on clones and uncloned packets might
have their network header in paged fragments.
nft_fwd needs to work for the ingress and egress hooks, but the egress
hook where skb->data points to the mac header, use skb_network_offset()
to include the mac header. The flowtable is fine since it already uses
the transport offset.
Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Shyam Prasad N [Mon, 30 Mar 2026 10:49:59 +0000 (16:19 +0530)]
cifs: change_conf needs to be called for session setup
Today we skip calling change_conf for negotiates and session setup
requests. This can be a problem for mchan as the immediate next call
after session setup could be due to an I/O that is made on the
mount point. For single channel, this is not a problem as
there will be several calls after setting up session.
This change enforces calling change_conf when the total credits contain
enough for reservations for echoes and oplocks. We expect this to happen
during the last session setup response. This way, echoes and oplocks are
not disabled before the first request to the server. So if that first
request is an open, it does not need to disable requesting leases.
Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: change allocation requirements in smb2_compound_op
Currently, smb2_compound_op() allocates
struct smb2_compound_vars *vars using GFP_ATOMIC, although
smb2_compound_op() can sleep when it calls compound_send_recv()
before vars is freed.
Allocate vars using GFP_KERNEL.
Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Hyunchul Lee [Tue, 28 Apr 2026 01:34:10 +0000 (10:34 +0900)]
ntfs: drop nlink once for WIN32/DOS aliases
NTFS could store a filename as paired WIN32 and DOS $FILE_NAME attributes
for directories. But ntfs_delete() deleted both attributes for unlinking
a directory, but it also called drop_nlink() for each attributes.
This could trigger warnings when unlinking directories.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Steven Rostedt [Tue, 28 Apr 2026 16:23:02 +0000 (12:23 -0400)]
tracing/probes: Limit size of event probe to 3K
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.
A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 93ccae7a22274 ("tracing/kprobes: Support basic types on dynamic events") Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
alloc_workqueue_va() forwards its va_list to __alloc_workqueue() which
ultimately feeds vsnprintf(). __alloc_workqueue() already carries
__printf(1, 0); the new wrapper needs the same annotation so format
string checking propagates through the forwarding.
Michael Guralnik [Mon, 27 Apr 2026 11:02:36 +0000 (14:02 +0300)]
RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
Raw Packet QPs are unique in that they support separate send and receive
queues, using 2 different user-provided buffers.
They can also be created with one of the queues having size 0, allowing
a send-only or receive-only QP.
The Raw Packet RQ umem is created in the common user QP creation path,
which allows zero-length queues. Add a later validation of the RQ umem
in Raw Packet QP creation path when an RQ was requested.
This prevents possible null-ptr dereference crashes, as seen in the
below trace:
Fixes: 0fb2ed66a14c ("IB/mlx5: Add create and destroy functionality for Raw Packet QP") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-5-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Michael Guralnik [Mon, 27 Apr 2026 11:02:35 +0000 (14:02 +0300)]
RDMA/core: Fix rereg_mr use-after-free race
When a driver creates a new MR during rereg_user_mr, a race window
exists between rdma_alloc_commit_uobject() for the new MR and the point
where the code reads that MR to populate the response keys.
A concurrent rereg_mr or destroy_mr could destroy the MR in this window
and cause UAF in the first thread.
Racing flow between two rereg_mr calls:
CPU0 CPU1
---- ----
rereg_user_mr(mr_handle)
uobj_get_write(mr_handle) -> mr0
mr1 = driver→rereg()
rdma_alloc_commit_uobject(mr1)
// mr1 replaced mr0 and is unlocked
uobj_put_destroy(mr0)
rereg_user_mr(mr_handle)
uobj_get_write(mr_handle) -> mr1
mr2 = driver→rereg()
rdma_alloc_commit_uobject(mr2)
// mr2 replaced mr1 and is unlocked
uobj_put_destroy(mr1)
// Destroys mr1!
resp.lkey = mr1->lkey; // UAF - mr1 was freed!
resp.rkey = mr1->rkey; // UAF - mr1 was freed!
Fix by storing lkey/rkey in local variables before the new MR is
unlocked and using the local variables to set the user response.
Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-4-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a
netlink request to the userspace daemon to perform IP-to-GID
resolution in certain cases. The function allocates the netlink message
buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the
size of size_t) instead of 16 bytes (the size of an IPv6 address).
This results in an 8-byte under-allocation.
This is currently masked by nlmsg_new() over-allocation of the skb
in its internal logic. However, the code remains incorrect.
Fix the issue by supplying the proper IPv6 address length to
nla_total_size().
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-3-4621fa52de0e@nvidia.com Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Edward Srouji [Mon, 27 Apr 2026 11:02:33 +0000 (14:02 +0300)]
RDMA/mlx5: Fix UAF in DCT destroy due to race with create
A potential race condition exists between mlx5_core_destroy_dct() and
mlx5_core_create_dct() that can lead to a use-after-free.
After _mlx5_core_destroy_dct() releases the DCT to firmware, the DCTN
can be immediately reallocated for a new DCT being created concurrently.
If the create path stores the new DCT in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new DCT's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created DCT.
Fixes: afff24899846 ("RDMA/mlx5: Handle DCT QP logic separately from low level QP interface") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-2-4621fa52de0e@nvidia.com Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Edward Srouji [Mon, 27 Apr 2026 11:02:32 +0000 (14:02 +0300)]
RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
A race condition exists between mlx5_cmd_destroy_srq() and
mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1].
After destroy_srq_split() releases the SRQ to firmware, the SRQN can be
immediately reallocated for a new SRQ being created concurrently. If the
create path stores the new SRQ in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new SRQ's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created SRQ.
Jia Yao [Fri, 17 Apr 2026 05:59:17 +0000 (05:59 +0000)]
drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR
Add validation in xe_vm_bind_ioctl() to reject PAT indices
with XE_COH_NONE coherency mode when used with
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR.
CPU address mirror mappings use system memory that is CPU
cached, which makes them incompatible with COH_NONE PAT
indices. Allowing COH_NONE with CPU cached buffers is a
security risk, as the GPU may bypass CPU caches and read
stale sensitive data from DRAM.
Although CPU_ADDR_MIRROR does not create an immediate
mapping, the backing system memory is still CPU cached.
Apply the same PAT coherency restrictions as
DRM_XE_VM_BIND_OP_MAP_USERPTR.
v2:
- Correct fix tag
v6:
- No change
v7:
- Correct fix tag
v8:
- Rebase
v9:
- Limit the restrictions to iGPU
v10:
- Just add the iGPU logic but keep dGPU logic
Fixes: b43e864af0d4 ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR") Cc: <stable@vger.kernel.org> # v6.15+ Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260417055917.2027459-3-jia.yao@intel.com
(cherry picked from commit 4d58d7535e826a3175527b6174502f0db319d7f6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Jia Yao [Fri, 17 Apr 2026 05:59:16 +0000 (05:59 +0000)]
drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise
Add validation in xe_vm_madvise_ioctl() to reject PAT indices with
XE_COH_NONE coherency mode when applied to CPU cached memory.
Using coh_none with CPU cached buffers is a security issue. When the
kernel clears pages before reallocation, the clear operation stays in
CPU cache (dirty). GPU with coh_none can bypass CPU caches and read
stale sensitive data directly from DRAM, potentially leaking data from
previously freed pages of other processes.
This aligns with the existing validation in vm_bind path
(xe_vm_bind_ioctl_validate_bo).
v2(Matthew brost)
- Add fixes
- Move one debug print to better place
v3(Matthew Auld)
- Should be drm/xe/uapi
- More Cc
v4(Shuicheng Lin)
- Fix kmem leak issues by the way
v5
- Remove kmem leak because it has been merged by another patch
v6
- Remove the fix which is not related to current fix
v7
- No change
v8
- Rebase
v9
- Limit the restrictions to iGPU
v10
- No change
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Cc: <stable@vger.kernel.org> # v6.18+ Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260417055917.2027459-2-jia.yao@intel.com
(cherry picked from commit 016ccdb674b8c899940b3944952c96a6a490d10a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Shuicheng Lin [Fri, 17 Apr 2026 16:33:08 +0000 (16:33 +0000)]
drm/xe/gsc: Fix BO leak on error in query_compatibility_version()
When xe_gsc_read_out_header() fails, query_compatibility_version()
returns directly instead of jumping to the out_bo label. This skips
the xe_bo_unpin_map_no_vm() call, leaving the BO pinned and mapped
with no remaining reference to free it.
Fix by using goto out_bo so the error path properly cleans up the BO,
consistent with the other error handling in the same function.
Shuicheng Lin [Wed, 15 Apr 2026 22:54:28 +0000 (22:54 +0000)]
drm/xe/eustall: Fix drm_dev_put called before stream disable in close
In xe_eu_stall_stream_close(), drm_dev_put() is called before the
stream is disabled and its resources are freed. If this drops the
last reference, the device structures could be freed while the
subsequent cleanup code still accesses them, leading to a
use-after-free.
Fix this by moving drm_dev_put() after all device accesses are
complete. This matches the ordering in xe_oa_release().
Fixes: 9a0b11d4cf3b ("drm/xe/eustall: Add support to init, enable and disable EU stall sampling") Cc: Harish Chegondi <harish.chegondi@intel.com> Assisted-by: Claude:claude-opus-4.6 Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20260415225428.3399934-1-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 35aff528f7297e949e5e19c9cd7fd748cf1cf21c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Shuicheng Lin [Wed, 8 Apr 2026 02:06:47 +0000 (02:06 +0000)]
drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()
Two error handling issues exist in xe_exec_queue_create_ioctl():
1. When xe_hw_engine_group_add_exec_queue() fails, the error path jumps
to put_exec_queue which skips xe_exec_queue_kill(). If the VM is in
preempt fence mode, xe_vm_add_compute_exec_queue() has already added
the queue to the VM's compute exec queue list. Skipping the kill
leaves the queue on that list, leading to a dangling pointer after
the queue is freed.
2. When xa_alloc() fails after xe_hw_engine_group_add_exec_queue() has
succeeded, the error path does not call
xe_hw_engine_group_del_exec_queue() to remove the queue from the hw
engine group list. The queue is then freed while still linked into
the hw engine group, causing a use-after-free.
Fix both by:
- Changing the xe_hw_engine_group_add_exec_queue() failure path to jump
to kill_exec_queue so that xe_exec_queue_kill() properly removes the
queue from the VM's compute list.
- Adding a del_hw_engine_group label before kill_exec_queue for the
xa_alloc() failure path, which removes the queue from the hw engine
group before proceeding with the rest of the cleanup.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:55 +0000 (17:52 +0000)]
drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()
When xe_dma_buf_init_obj() fails, the attachment from
dma_buf_dynamic_attach() is not detached. Add dma_buf_detach() before
returning the error. Note: we cannot use goto out_err here because
xe_dma_buf_init_obj() already frees bo on failure, and out_err would
double-free it.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:54 +0000 (17:52 +0000)]
drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure
When drm_gpuvm_resv_object_alloc() fails, the pre-allocated storage bo
is not freed. Add xe_bo_free(storage) before returning the error.
xe_dma_buf_init_obj() calls xe_bo_init_locked(), which frees the bo on
error. Therefore, xe_dma_buf_init_obj() must also free the bo on its own
error paths. Otherwise, since xe_gem_prime_import() cannot distinguish
whether the failure originated from xe_dma_buf_init_obj() or from
xe_bo_init_locked(), it cannot safely decide whether the bo should be
freed.
Add comments documenting the ownership semantics: on success, ownership
of storage is transferred to the returned drm_gem_object; on failure,
storage is freed before returning.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:53 +0000 (17:52 +0000)]
drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()
When XE_BO_FLAG_GGTT_ALL is set without XE_BO_FLAG_GGTT, the function
returns an error without freeing a caller-provided bo, violating the
documented contract that bo is freed on failure.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:52 +0000 (17:52 +0000)]
drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()
When type is ttm_bo_type_device and aligned_size != size, the function
returns an error without freeing a caller-provided bo, violating the
documented contract that bo is freed on failure.
Shuicheng Lin [Thu, 9 Apr 2026 00:34:49 +0000 (00:34 +0000)]
drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked
xe_exec_queue_tlb_inval_last_fence_put_unlocked() uses q->vm->xe as the
first argument to xe_assert(). This function is called unconditionally
from xe_exec_queue_destroy() for all queues, including kernel queues
that have q->vm == NULL (e.g., queues created during GT init in
xe_gt_record_default_lrcs() with vm=NULL).
While current compilers optimize away the q->vm->xe dereference (even
in CONFIG_DRM_XE_DEBUG=y builds, the compiler pushes the dereference
into the WARN branch that is only taken when the assert condition is
false), the code is semantically incorrect and constitutes undefined
behavior in the C abstract machine for the NULL pointer case.
Use gt_to_xe(q->gt) instead, which is always valid for any exec queue.
This is consistent with how xe_exec_queue_destroy() itself obtains the
xe_device pointer in its own xe_assert at the top of the function.
drm/xe/vf: Use drm mm instead of drm sa for CCS read/write
The suballocator algorithm tracks a hole cursor at the last allocation
and tries to allocate after it. This is optimized for fence-ordered
progress, where older allocations are expected to become reusable first.
In fence-enabled mode, that ordering assumption holds. In fence-disabled
mode, allocations may be freed in arbitrary order, so limiting allocation
to the current hole window can miss valid free space and fail allocations
despite sufficient total space.
Use DRM memory manager instead of sub-allocator to get rid of this issue
as CCS read/write operations do not use fences.
Fixes: 864690cf4dd6 ("drm/xe/vf: Attach and detach CCS copy commands with BO") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-6-satyanarayana.k.v.p@intel.com
(cherry picked from commit 6c84b493012aeb05dec29c709377bf0e17ac6815) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add a memory pool to allocate sub-ranges from a BO-backed pool
using drm_mm.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-5-satyanarayana.k.v.p@intel.com
(cherry picked from commit 1ce3229f8f269a245ff3b8c65ffae36b4d6afb93) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Wed, 8 Apr 2026 22:27:44 +0000 (15:27 -0700)]
drm/xe/debugfs: Correct printing of register whitelist ranges
The register-save-restore debugfs prints whitelist entries as offset
ranges. E.g.,
REG[0x39319c-0x39319f]: allow read access
for a single dword-sized register. However the GENMASK value used to
set the lower bits to '1' for the upper bound of the whitelist range
incorrectly included one more bit than it should have, causing the
whitelist ranges to sometimes appear twice as large as they really were.
For example,
REG[0x6210-0x6217]: allow rw access
was also intended to be a single dword-sized register whitelist (with a
range 0x6210-0x6213) but was printed incorrectly as a qword-sized range
because one too many bits was flipped on. Similar 'off by one' logic
was applied when printing 4-dword register ranges and 64-dword register
ranges as well.
Correct the GENMASK logic to print these ranges in debugfs correctly.
No impact outside of correcting the misleading debugfs output.
Matt Roper [Fri, 10 Apr 2026 22:50:30 +0000 (15:50 -0700)]
drm/xe: Mark ROW_CHICKEN5 as a masked register
ROW_CHICKEN5 is a masked register (i.e., to adjust the value of any of
the lower 16 bits, the corresponding bit in the upper 16 bits must also
be set). Add the XE_REG_OPTION_MASKED to its definition; failure to do
so will cause workaround updates of this register to not apply properly.
Matt Roper [Fri, 10 Apr 2026 22:50:29 +0000 (15:50 -0700)]
drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL
From Xe2 onward (i.e., all platforms officially supported by the Xe
driver), the GAMSTLB_CTRL register is located at offset 0x477C and
represented by the macro "GAMSTLB_CTRL" in code. However the register
formerly resided at offset 0xCF4C on Xe1-era platforms, and we also have
macro XEHP_GAMSTLB_CTRL that represents this old offset in the
unofficial/developer-only Xe1 code. When tuning for the register was
added for Xe3p_LPG, the old Xe1-era macro was accidentally used instead
of the proper macro for Xe2 and beyond, causing the tuning to not be
applied properly. Use the proper definition so that the correct offset
is written to.
drm/xe/xe3p_lpg: Add missing indirect ring state feature flag
Even though commit 8fcb7dfb8bbf ("drm/xe/xe3p_lpg: Add support for
graphics IP 35.10") mentions that the support for Indirect Ring State
exists for Xe3p_LPG, it missed actually setting the feature flag in
graphics_xe3p_lpg. Fix that by adding the missing member.
Matt Roper [Wed, 1 Apr 2026 20:12:44 +0000 (13:12 -0700)]
drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138
There appears to have been a silent merge conflict between some commits
updating the workaround tables on Xe's -fixes and -next branches:
- Commit bc6387a2e0c1 ("drm/xe/xe2_hpg: Fix handling of Wa_14019988906
& Wa_14019877138") from the fixes branch moved the Xe2_HPG instance
of two workarounds touching the PSS_CHICKEN register from the
engine_was[] table to the lrc_was[] table; the equivalent
implementation for all other platforms/IPs were already properly
located on lrc_was[]. This commit on the fixes branch is a
cherry-pick of commit e04c609eedf4 ("drm/xe/xe2_hpg: Fix handling of
Wa_14019988906 & Wa_14019877138") that already existed on the next
branch.
- Commit 55b19abb6c44 ("drm/xe: Consolidate workaround entries for
Wa_14019877138") and commit c2142a1a8415 ("drm/xe: Consolidate
workaround entries for Wa_14019988906") consolidated the individual
entries per IP generation for each workaround into single, larger
range-based entries.
During merge conflict resolution the Xe2_HPG-specific entries (i.e.,
those with rule "GRAPHICS_VERSION_RANGE(2001, 2002)") were accidentally
resurrected, even though the table already contains the consolidated
entries that match a superset of thse ranges. These redundant entries
don't cause any build failures but do trigger a dmesg error during probe
on BMG-G21 devices:
Matthew Brost [Thu, 26 Mar 2026 21:01:16 +0000 (14:01 -0700)]
drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
xe_guc_submit_wedge() runs in the DMA-fence signaling path, where
GFP_KERNEL memory allocations are not permitted. However, registering
guc_submit_wedged_fini via drmm_add_action_or_reset() triggers such an
allocation.
Avoid this by moving the logic from guc_submit_wedged_fini() into
guc_submit_fini(), where wedged exec queue references are dropped during
normal teardown.
DaeMyung Kang [Sat, 25 Apr 2026 09:38:28 +0000 (18:38 +0900)]
ksmbd: rewrite stop_sessions() with restartable iteration
stop_sessions() walks conn_list with hash_for_each() and, for every
entry, drops conn_list_lock across the transport ->shutdown() call
before re-acquiring the read lock to continue the loop. The hash
walk relies on cross-iteration state (the current bucket and the
hlist position), which is not preserved across unlock/relock: if
another thread performs a list mutation during the unlocked window,
the ongoing iteration becomes unreliable and can re-visit
connections that have already been handled or skip connections that
have not. The outer `if (!hash_empty(conn_list)) goto again;` retry
masks the symptom in the common case but does not address the
unsafe iteration itself.
Reframe the loop so it never relies on iterator state across
unlock/relock. Under conn_list_lock held for read, pick the first
connection whose ->shutdown() has not yet been issued by this path,
pin it by taking an extra reference, record that fact on the
connection and mark it EXITING while still inside the locked walk,
then drop the lock. Then call ->shutdown() outside the lock, drop
the pin (freeing the connection if the handler already released its
reference), and restart from the top.
Use a new per-connection flag, conn->stop_called, as the "shutdown
issued from stop_sessions()" marker rather than reusing the status
state. ksmbd_conn_set_exiting() is also invoked by
ksmbd_sessions_deregister() on sibling channels of a multichannel
session without issuing a transport shutdown, so treating
KSMBD_SESS_EXITING as "already handled here" would skip connections
that still need shutdown() to wake their handler out of recv(),
leaving the outer retry waiting indefinitely for the hash to drain.
stop_sessions() is serialised by init_lock in
ksmbd_conn_transport_destroy(), so writing stop_called under the
read lock has no other writer.
Set EXITING inside the locked walk so the selection, the stop_called
marker, and the status transition all happen together, and guard
against regressing a connection that has already advanced to
KSMBD_SESS_RELEASING on its own (for example, if the handler exited
its receive loop for an unrelated reason between teardown steps).
When the pin drop is the last put, release the transport and pair
ida_destroy(&target->async_ida) with the ida_init() done in
ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its
own does not leak the xarray backing of the embedded async_ida.
The outer retry with msleep() is kept to wait for handler threads to
reach ksmbd_conn_free() and drain the hash.
Observed with an instrumented build that logs one line per visit and
widens the unlocked window before ->shutdown() by 200 ms, under
five concurrent cifs mounts (nosharesock, one connection each):
* Current code: the same connection address is revisited many
times during a single stop_sessions() call and ->shutdown() is
invoked well beyond the number of live connections before the
hash finally drains.
* Rewritten code: each live connection produces exactly one
->shutdown() call; the function returns as soon as the hash is
empty.
Functional teardown via `ksmbd.control --shutdown` with the same
five mounts completes cleanly on the rewritten path.
Performance is observably unchanged. Tearing down N concurrent
nosharesock cifs connections with `ksmbd.control --shutdown` +
`rmmod ksmbd` takes essentially the same wall time before and after
the rewrite:
N before after
10 4.93s 5.34s
30 7.34s 7.03s
50 7.31s 7.01s (3-run avg: 7.04s vs 7.25s)
100 6.98s 6.78s
200 6.77s 6.89s
and the number of ->shutdown() calls equals the number of live
connections on both paths when the race is not widened. The
teardown is dominated by the msleep(100)-based outer retry waiting
for handler threads to run ksmbd_conn_free(), not by the iteration
itself; the restartable loop's worst-case O(N^2) visit cost is in
the microseconds even at N=200 and sits far below the msleep(100)
granularity.
Applied alone on top of ksmbd-for-next-next, this patch does not
introduce a new leak site. Under the same reproducer (10x
concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the
tree still shows the pre-existing per-connection transport leak
count that arises when the last refcount drop lands in one of
ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() -
all of which end with a bare kfree() today. kmemleak backtraces
for the unreferenced objects point into the TCP accept path
(sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none
involve stop_sessions(). Plugging those bare-kfree sites is the
responsibility of the follow-up patch.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"On top of a lot of Arm fixes, this includes a massive rename of types
and variables in tools/testing/selftests/kvm - these were
unnecessarily different from what the kernel uses, so they're being
made consistent.
arm64:
- Allow tracing for non-pKVM, which was accidentally disabled when
the series was merged
- Rationalise the way the pKVM hypercall ranges are defined by using
the same mechanism as already used for the vcpu_sysreg enum
- Enforce that SMCCC function numbers relayed by the pKVM proxy are
actually compliant with the specification
- Fix a couple of feature to idreg mappings which resulted in the
wrong sanitisation being applied
- Fix the GICD_IIDR revision number field that could never been
written correctly by userspace
- Make kvm_vcpu_initialized() correctly use its parameter instead of
relying on the surrounding context
- Enforce correct ordering in __pkvm_init_vcpu(), plugging a
potential pin leak at the same time
- Move __pkvm_init_finalise() to a less dangerous spot, avoiding
future problems
- Restore functional userspace irqchip support after a four year
breakage (last functional kernel was 5.18...)
- Spelling fixes
Selftests:
- Rename types across all KVM selftests to more closely align with
types used in the kernel:
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits)
KVM: selftests: Add check_steal_time_uapi() implementation for LoongArch
KVM: arm64: Wake-up from WFI when iqrchip is in userspace
KVM: arm64: Fix initialisation order in __pkvm_init_finalise()
KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu()
KVM: arm64: Fix kvm_vcpu_initialized() macro parameter
KVM: arm64: Fix FEAT_SPE_FnE to use PMSIDR_EL1.FnE, not PMSVer
KVM: arm64: Fix typo in feature check comments
KVM: arm64: Fix FEAT_Debugv8p9 to check DebugVer, not PMUVer
KVM: arm64: Reject non compliant SMCCC function calls in pKVM
KVM: arm64: vgic: Fix IIDR revision field extracted from wrong value
KVM: selftests: Replace "paddr" with "gpa" throughout
KVM: selftests: Replace "u64 nested_paddr" with "gpa_t l2_gpa"
KVM: selftests: Replace "u64 gpa" with "gpa_t" throughout
KVM: selftests: Replace "vaddr" with "gva" throughout
KVM: selftests: Clarify that arm64's inject_uer() takes a host PA, not a guest PA
KVM: selftests: Rename translate_to_host_paddr() => translate_hva_to_hpa()
KVM: selftests: Rename vm_vaddr_populate_bitmap() => vm_populate_gva_bitmap()
KVM: selftests: Rename vm_vaddr_unused_gap() => vm_unused_gva_gap()
KVM: selftests: Drop "vaddr_" from APIs that allocate memory for a given VM
KVM: selftests: Use u8 instead of uint8_t
...
Michal Kosiorek [Wed, 29 Apr 2026 08:54:51 +0000 (10:54 +0200)]
xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s
hlist_del_rcu calls under syzkaller load on linux-6.12.y stable
(reproduced on 6.12.47, also reachable via the same code path on
torvalds/master and on the ipsec tree). Nine unique signatures cluster
in the xfrm_state lifecycle, the load-bearing one being:
BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline]
BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c
Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435
The other observed signatures hit the same slab object from
__xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB
write variant of __xfrm_state_delete, all on the byseq/byspi
hash chains.
__xfrm_state_delete() guards its byseq and byspi unhashes with
value-based predicates:
if (x->km.seq)
hlist_del_rcu(&x->byseq);
if (x->id.spi)
hlist_del_rcu(&x->byspi);
while everywhere else in the file (e.g. state_cache, state_cache_input)
the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets
x->id.spi = newspi inside xfrm_state_lock and then immediately inserts
into byspi, but a path that observes x->id.spi != 0 outside of
xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently
with whether x is actually on the list. The same holds for x->km.seq
versus byseq, and the bydst/bysrc unhashes have no predicate at all,
so a second __xfrm_state_delete() on the same object writes through
LIST_POISON pprev.
The defensive change here:
- Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst,
bysrc, byseq and byspi so a second deletion is a no-op rather
than a write through LIST_POISON pprev. The byseq/byspi nodes
are already initialised in xfrm_state_alloc().
- Test hlist_unhashed() rather than the value predicate for
byseq/byspi, so the unhash decision tracks list state rather than
mutable scalar fields.
Empirical verification: applied this patch on top of v6.12.47, rebuilt,
and re-ran the same syzkaller harness for 1h16m on a previously-crashy
configuration that produced ~100 hits each of slab-use-after-free
Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in
__xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at
~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo
confirms the xfrm_state slab is actively allocated and freed during
the run (~143 KiB resident), so the fuzzer is still exercising those
code paths -- they just no longer crash.
Reproduction:
- Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV
- syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db
- 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal
- 9 unique signatures collected in ~9h, all within xfrm_state
lifecycle
Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") Fixes: 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI is zero to the SPI hash.") Reported-by: Michal Kosiorek <mkosiorek121@gmail.com> Tested-by: Michal Kosiorek <mkosiorek121@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Michal Kosiorek <mkosiorek121@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Task B acquires both hb locks and attempts to acquire the PI-lock of the
top most waiter (task B). Task A is leaving early due to a signal/
timeout and started removing itself from the queue. It updates its
requeue_state but can not remove it from the list because this requires
the hb lock which is owned by task B.
Usually task A is able to swoop the lock after task B unlocked it.
However if task B is of higher priority then task A may not be able to
wake up in time and acquire the lock before task B gets it again.
Especially on a UP system where A is never scheduled.
As a result task A blocks on the lock and task B busy loops, trying to
make progress but live locks the system instead. Tragic.
This can be fixed by removing the top most waiter from the list in this
case. This allows task B to grab the next top waiter (if any) in the
next iteration and make progress.
Remove the top most waiter if futex_requeue_pi_prepare() fails.
Let the waiter conditionally remove itself from the list in
handle_early_requeue_pi_wakeup().
Fixes: 07d91ef510fb1 ("futex: Prevent requeue_pi() lock nesting issue on RT") Reported-by: Moritz Klammler <Moritz.Klammler@ferchau.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260428103425.dywXyPd3@linutronix.de Closes: https://lore.kernel.org/all/VE1PR06MB6894BE61C173D802365BE19DFF4CA@VE1PR06MB6894.eurprd06.prod.outlook.com
WANG Rui [Mon, 27 Apr 2026 08:47:21 +0000 (16:47 +0800)]
efi/libstub: Synchronize instruction cache after kernel relocation
The relocated kernel image is copied to its new location using memcpy().
On architectures with separate instruction and data caches, the copied
instructions may remain stale in the instruction cache, leading to the
execution of outdated contents.
Call efi_cache_sync_image() after the relocation copy to ensure the
instruction cache is synchronized with the updated memory contents before
control is transferred to the relocated kernel.