]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
7 months agodrm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c
Victor Lu [Thu, 13 Feb 2025 23:38:28 +0000 (18:38 -0500)]
drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c

SRIOV VF does not have write access to AGP BAR regs.
Skip the writes to avoid a dmesg warning.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Fix core reset sequence for JPEG4_0_3
Sathishkumar S [Wed, 26 Feb 2025 10:18:39 +0000 (15:48 +0530)]
drm/amdgpu: Fix core reset sequence for JPEG4_0_3

For cores 1 through 7 repair the core reset sequence by
adjusting offsets to access the expected registers.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Add support for CPERs on virtualization
Tony Yi [Wed, 26 Feb 2025 22:03:10 +0000 (17:03 -0500)]
drm/amdgpu: Add support for CPERs on virtualization

Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi <Tony.Yi@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: remove unnecessary cpu domain validation
James Zhu [Mon, 3 Mar 2025 18:10:33 +0000 (13:10 -0500)]
drm/amdkfd: remove unnecessary cpu domain validation

before move to GTT domain.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: use drm_* instead of DRM_ in apply_edid_quirks()
Aurabindo Pillai [Mon, 24 Feb 2025 19:27:54 +0000 (14:27 -0500)]
drm/amd/display: use drm_* instead of DRM_ in apply_edid_quirks()

drm_* macros are more helpful that DRM_* macros since the former
indicates the associated DRM device that prints the error, which maybe
helpful when debugging.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Add workaround for a panel
Aurabindo Pillai [Mon, 24 Feb 2025 19:24:45 +0000 (14:24 -0500)]
drm/amd/display: Add workaround for a panel

Implement w/a for a panel which requires 10s delay after link detect.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Update headers for CPER support on SRIOV
Tony Yi [Wed, 26 Feb 2025 21:56:02 +0000 (16:56 -0500)]
drm/amdgpu: Update headers for CPER support on SRIOV

Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for
CPER support on VFs. PMFW ACA messages are not available on VFs,
and VFs must query CPERs from host.

Signed-off-by: Tony Yi <Tony.Yi@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/pm: always allow ih interrupt from fw
Kenneth Feng [Fri, 28 Feb 2025 09:02:11 +0000 (17:02 +0800)]
drm/amd/pm: always allow ih interrupt from fw

always allow ih interrupt from fw on smu v14 based on
the interface requirement

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Reinit FW shared flags on VCN v5.0.1
Lijo Lazar [Wed, 26 Feb 2025 07:23:48 +0000 (12:53 +0530)]
drm/amdgpu: Reinit FW shared flags on VCN v5.0.1

After a full device reset, shared memory region will clear out and it's
not possible to reliably save the region in case of RAS errors.
Reinitialize the flags if required.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Use the right struct for VCN v5.0.1
Lijo Lazar [Wed, 26 Feb 2025 06:54:05 +0000 (12:24 +0530)]
drm/amdgpu: Use the right struct for VCN v5.0.1

VCN IP versions >= 5.0 uses VCN5 fw shared struct.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: Fix NULL Pointer Dereference in KFD queue
Andrew Martin [Fri, 28 Feb 2025 16:26:48 +0000 (11:26 -0500)]
drm/amdkfd: Fix NULL Pointer Dereference in KFD queue

Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence
when calling kfd_queue_acquire_buffers.

Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size")
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: add dce_v6_0_soft_reset() to DCE6
Alexandre Demers [Sat, 1 Mar 2025 04:11:20 +0000 (23:11 -0500)]
drm/amdgpu: add dce_v6_0_soft_reset() to DCE6

DCE6 was missing soft reset, but it was easily identifiable under radeon.
This should be it, pretty much as it is done under DCE8 and DCE10.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: fix style in DCE6
Alexandre Demers [Sat, 1 Mar 2025 02:17:47 +0000 (21:17 -0500)]
drm/amdgpu: fix style in DCE6

Whitespace cleanups.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: add some comments in DCE6
Alexandre Demers [Sat, 1 Mar 2025 02:17:46 +0000 (21:17 -0500)]
drm/amdgpu: add some comments in DCE6

Add some comments.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/pm: Fix indentation issue
Asad Kamal [Thu, 27 Feb 2025 14:58:33 +0000 (22:58 +0800)]
drm/amd/pm: Fix indentation issue

Fix indentation issue for smu_v_13_0_12 get_gpu_metrics

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502272246.OISqUnC1-lkp@intel.com
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Set PG state to gating for vcn_v_5_0_1
Asad Kamal [Thu, 27 Feb 2025 12:09:16 +0000 (20:09 +0800)]
drm/amdgpu: Set PG state to gating for vcn_v_5_0_1

For vcn_v_5_0_1, set power state to gating during hw fini. Also there may
be scenario where VCN engine hangs during a job execution, then it's not
safe to assume that set_pg_state works fine during hw_fini to put the state
to gated. After a reset, we can assume that it's in the default state,
therefore reset the driver maintained state. Put the default state as gated
during reset as per this assumption.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Remove unused pqm_get_kernel_queue
Dr. David Alan Gilbert [Mon, 3 Mar 2025 14:39:22 +0000 (14:39 +0000)]
drm/amdgpu: Remove unused pqm_get_kernel_queue

pqm_get_kernel_queue() has been unused since 2022's
commit 5bdd3eb25354 ("drm/amdkfd: Remove unused old debugger
implementation")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Remove unused print__rq_dlg_params_st
Dr. David Alan Gilbert [Mon, 3 Mar 2025 14:39:21 +0000 (14:39 +0000)]
drm/amdgpu: Remove unused print__rq_dlg_params_st

print__rq_dlg_params_st() was added in 2017 by
commit 061bfa06a42a ("drm/amdgpu/display: Add dml support for DCN")
but has remained unused.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Remove unused pre_surface_trace
Dr. David Alan Gilbert [Mon, 3 Mar 2025 14:39:20 +0000 (14:39 +0000)]
drm/amdgpu: Remove unused pre_surface_trace

pre_surface_trace() has been unused since 2017's
commit 745cc746da42 ("drm/amd/display: remove
dc_pre_update_surfaces_to_stream from dc use")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Remove powerdown_uvd member
Dr. David Alan Gilbert [Mon, 3 Mar 2025 14:39:19 +0000 (14:39 +0000)]
drm/amdgpu: Remove powerdown_uvd member

With phm_powerdown_uvd() gone in the previous patch, there's
now no longer anything that reads the powerdown_uvd member of the
pp_hwmgr_func.

Remove it.

There are a few assignments to it; a boring NULL which can just go,
and two functions, but those functions are called explicitly anyway
so the assignments to the member go.

One of those (smu7_powerdown_uvd) wasn't static previously;
make it static.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Remove phm_powerdown_uvd
Dr. David Alan Gilbert [Mon, 3 Mar 2025 14:39:18 +0000 (14:39 +0000)]
drm/amdgpu: Remove phm_powerdown_uvd

phm_powerdown_uvd() has been unused since 2017's
commit 47047263c527 ("drm/amd/powerplay: delete eventmgr related files.")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Remove ppatomfwctrl deadcode
Dr. David Alan Gilbert [Mon, 3 Mar 2025 14:39:17 +0000 (14:39 +0000)]
drm/amdgpu: Remove ppatomfwctrl deadcode

pp_atomfwctrl_get_pp_assign_pin() and pp_atomfwctrl_get_pp_assign_pin()
were added in 2017 by
commit 0d2c7569e196 ("drm/amdgpu: add new atomfirmware based helpers for
powerplay")
but have remained unused.

Remove them, and the helper functions they used.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/radeon: Fix rs400_gpu_init for ATI mobility radeon Xpress 200M
Richard Thier [Mon, 17 Jun 2019 21:46:27 +0000 (23:46 +0200)]
drm/radeon: Fix rs400_gpu_init for ATI mobility radeon Xpress 200M

num_gb_pipes was set to a wrong value using r420_pipe_config

This have lead to HyperZ glitches on fast Z clearing.

Closes: https://bugs.freedesktop.org/show_bug.cgi?id=110897
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Richard Thier <u9vata@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Add a new dcdebugmask to allow turning off brightness curve
Mario Limonciello [Fri, 28 Feb 2025 18:51:45 +0000 (12:51 -0600)]
drm/amd/display: Add a new dcdebugmask to allow turning off brightness curve

Upgrading the kernel may cause some systems that were previously not using
a firmware specified brightness curve to use one.

In the event of problems with this curve (for example an interpolation
error) add a new dcdebugmask value that can be used to turn it off.  Also
add an info message to show that custom brightness curves are currently in
use.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-6-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Add support for custom brightness curve
Mario Limonciello [Fri, 28 Feb 2025 18:51:44 +0000 (12:51 -0600)]
drm/amd/display: Add support for custom brightness curve

Some systems specify in the firmware a brightness curve that better
reflects the characteristics of the panel used. This is done in the
form of data points and matching luminance percentage.

When converting a userspace requested brightness value use that curve
to convert to a firmware intended brightness value.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-5-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Avoid operating on copies of backlight caps
Mario Limonciello [Fri, 28 Feb 2025 18:51:43 +0000 (12:51 -0600)]
drm/amd/display: Avoid operating on copies of backlight caps

Making a copy of the backlight caps structure between uses is unnecessary.
Refer to pointers to the same structure when using it.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-4-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd: Pass luminance data to amdgpu_dm_backlight_caps
Mario Limonciello [Fri, 28 Feb 2025 18:51:42 +0000 (12:51 -0600)]
drm/amd: Pass luminance data to amdgpu_dm_backlight_caps

The ATIF method on some systems will provide a backlight curve. Pass
this curve into amdgpu_dm add it to the structures.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-3-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd: Copy entire structure in amdgpu_acpi_get_backlight_caps()
Mario Limonciello [Fri, 28 Feb 2025 18:51:41 +0000 (12:51 -0600)]
drm/amd: Copy entire structure in amdgpu_acpi_get_backlight_caps()

As new members are introduced to the structure copying the entire
structure will help avoid missing them.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Promote DAL to 3.2.323
Taimur Hassan [Mon, 24 Feb 2025 01:39:00 +0000 (20:39 -0500)]
drm/amd/display: Promote DAL to 3.2.323

This version brings along following fixes:
- Various cleanups to amdgpu dm
- Add DP tunneling IRQ handler
- Fix display corruption for dcn35
- Fix dmcub reset problem
- Adjust BW determination for PCON
- DIO encoder refactor
- Fix performance with SubVP under gaming

Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use drm_err() for handle_hpd_irq_helper()
Mario Limonciello [Tue, 18 Feb 2025 04:58:40 +0000 (22:58 -0600)]
drm/amd/display: Use drm_err() for handle_hpd_irq_helper()

drm_err() will show which device has the error.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use scoped guards for handle_hpd_irq_helper()
Mario Limonciello [Tue, 18 Feb 2025 04:58:39 +0000 (22:58 -0600)]
drm/amd/display: Use scoped guards for handle_hpd_irq_helper()

Scoped guards will release the mutex when they go out of scope.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use _free() macro for amdgpu_dm_update_connector_after_detect()
Mario Limonciello [Tue, 18 Feb 2025 04:58:38 +0000 (22:58 -0600)]
drm/amd/display: Use _free() macro for amdgpu_dm_update_connector_after_detect()

By using a _free() macro multiple duplicated snippets of code to free
the sink can be dropped. The sink will be released when leaving scope.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use scoped guard for amdgpu_dm_update_connector_after_detect()
Mario Limonciello [Tue, 18 Feb 2025 04:58:37 +0000 (22:58 -0600)]
drm/amd/display: Use scoped guard for amdgpu_dm_update_connector_after_detect()

A scoped guard will release the mutex when it goes out of scope.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use _free(kfree) for dm_gpureset_commit_state()
Mario Limonciello [Tue, 18 Feb 2025 04:58:36 +0000 (22:58 -0600)]
drm/amd/display: Use _free(kfree) for dm_gpureset_commit_state()

Using a _free(kfree) macro drops the need for a goto statement
as it will be freed when it goes out of scope.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Change amdgpu_dm_irq_resume_*() to void
Mario Limonciello [Tue, 18 Feb 2025 04:58:35 +0000 (22:58 -0600)]
drm/amd/display: Change amdgpu_dm_irq_resume_*() to void

amdgpu_dm_irq_resume_early() and amdgpu_dm_irq_resume_late() don't
have any error flows. Change the return type from integer to void.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Change amdgpu_dm_irq_resume_*() to use drm_dbg()
Mario Limonciello [Tue, 18 Feb 2025 04:58:34 +0000 (22:58 -0600)]
drm/amd/display: Change amdgpu_dm_irq_resume_*() to use drm_dbg()

drm_dbg() is helpful to show which device had the debug statement.
Adjust to using this instead for debug messages.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use scoped guard for dm_resume()
Mario Limonciello [Tue, 18 Feb 2025 04:58:33 +0000 (22:58 -0600)]
drm/amd/display: Use scoped guard for dm_resume()

Scoped guards will release the mutex when they go out of scope.
Adjust the code to use these instead.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use drm_err() instead of DRM_ERROR in dm_resume()
Mario Limonciello [Tue, 18 Feb 2025 04:58:32 +0000 (22:58 -0600)]
drm/amd/display: Use drm_err() instead of DRM_ERROR in dm_resume()

drm_err() is helpful to show which device had the error. Adjust to
using this instead for error messages.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Use _free() macro for amdgpu_dm_commit_zero_streams()
Mario Limonciello [Tue, 18 Feb 2025 04:58:31 +0000 (22:58 -0600)]
drm/amd/display: Use _free() macro for amdgpu_dm_commit_zero_streams()

All cases except a failure to create a copy of the current context will
call dc_state_release() on the copied context.

Use a _free() macro to free the context and then adjust the error handling
flow to drop the unnecessary use of goto statements.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Catch failures for amdgpu_dm_commit_zero_streams()
Mario Limonciello [Tue, 18 Feb 2025 04:58:30 +0000 (22:58 -0600)]
drm/amd/display: Catch failures for amdgpu_dm_commit_zero_streams()

amdgpu_dm_commit_zero_streams() returns a DC error code that isn't
checked. Add an explicit check to this and fail dm_suspend() if it
is not DC_OK.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Drop `ret` variable from dm_suspend()
Mario Limonciello [Tue, 18 Feb 2025 04:58:29 +0000 (22:58 -0600)]
drm/amd/display: Drop `ret` variable from dm_suspend()

The `ret` variable in dm_suspend() doesn't get set and is just used
to return 0.  Drop the needless declaration.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Change amdgpu_dm_irq_suspend() to void
Mario Limonciello [Tue, 18 Feb 2025 04:58:28 +0000 (22:58 -0600)]
drm/amd/display: Change amdgpu_dm_irq_suspend() to void

amdgpu_dm_irq_suspend() doesn't have any error flows and always
returns zero.

Change the function to void.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Add tunneling IRQ handler
Cruise Hung [Thu, 20 Feb 2025 03:29:50 +0000 (11:29 +0800)]
drm/amd/display: Add tunneling IRQ handler

USB4 DP BW Allocation uses DP_TUNNELING_IRQ to indicate the status update.
The DP_TUNNELING_IRQ is defined in LINK_SERVICE_IRQ_VECTOR_ESI0. When
receiving DP HPD IRQ in USB4, read the LINK_SERVICE_IRQ_VECTOR_ESI0.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Added visual confirm for DCC
Leo Zeng [Wed, 19 Feb 2025 15:29:39 +0000 (10:29 -0500)]
drm/amd/display: Added visual confirm for DCC

[WHY]
We want to add a visual confirm mode for DCC and MCache for
debugging purpose.

[HOW]
color pipes based on whether DCC is enabled and what MCache id
is used.
black - DCC disabled
red - DCC enabled
grey - 2 different MCaches used
other colors - 1 MCache used

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Leo Zeng <Leo.Zeng@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Ensure DMCUB idle before reset on DCN31/DCN35
Nicholas Kazlauskas [Wed, 19 Feb 2025 14:56:53 +0000 (09:56 -0500)]
drm/amd/display: Ensure DMCUB idle before reset on DCN31/DCN35

[Why]
If we soft reset before halt finishes and there are outstanding
memory transactions then the memory interface may produce unexpected
results, such as out of order transactions when the firmware next runs.

These can manifest as random or unexpected load/store violations.

[How]
Increase the timeout before soft reset to ensure the DMCUB has quiesced.
This is effectively 1s maximum based on experimentation.

Use the enable bit check on DCN31 like we're doing on DCN35 and reorder
the reset writes to follow the HW programming guide.

Ensure we're reading SCRATCH7 instead of SCRATCH8 for the HALT code.
No current versions of DMCUB firmware use the SCRATCH8 boot bit to
dynamically switch where the HALT code goes to maintain backwards
compatibility with PSP.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Revert "Increase halt timeout for DMCUB to 1s"
Nicholas Kazlauskas [Wed, 19 Feb 2025 14:46:55 +0000 (09:46 -0500)]
drm/amd/display: Revert "Increase halt timeout for DMCUB to 1s"

This reverts commit 50f040c53ea9 ("drm/amd/display: Increase halt
timeout for DMCUB to 1s")

There's two issues here:
1. Each poll is closer to 10us than 1us so it stalls for 15s on PNP.
2. We're reading the wrong scratch register to check for the HALT code.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Check NULL connector before it is used
Alex Hung [Wed, 19 Feb 2025 04:14:47 +0000 (21:14 -0700)]
drm/amd/display: Check NULL connector before it is used

[Why & How]
amdgpu_dm_find_first_crtc_matching_connector can return NULL.
It is necessary to the returned connector before passing it
drm_atomic_get_new_connector_state which always assumes connector is
not NULL.

Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Remove unused struct definition
George Shen [Fri, 14 Feb 2025 15:57:08 +0000 (10:57 -0500)]
drm/amd/display: Remove unused struct definition

[Why/How]
The struct is not and will not be used, as it is no longer relevant nor
supported.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Skip checking FRL_MODE bit for PCON BW determination
George Shen [Sat, 15 Feb 2025 03:00:13 +0000 (22:00 -0500)]
drm/amd/display: Skip checking FRL_MODE bit for PCON BW determination

[Why/How]
Certain PCON will clear the FRL_MODE bit despite supporting the link BW
indicated in the other bits.

Thus, skip checking the FRL_MODE bit when interpreting the
hdmi_encoded_link_bw struct.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: misc for dio encoder refactor
Peichen Huang [Tue, 11 Feb 2025 06:41:11 +0000 (14:41 +0800)]
drm/amd/display: misc for dio encoder refactor

[WHY]
These are left required changes for dio encoder refactor.

[HOW]
1. original logic is separated by config option
2. new link encoder dp enable/disable code for dcn35
3. process fec only for DP 8b10b encoding

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: read mso dpcd caps
Hansen Dsouza [Fri, 24 Jan 2025 20:12:37 +0000 (15:12 -0500)]
drm/amd/display: read mso dpcd caps

[Why & How]
Read if panel support multi-sst links

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Fix DMUB reset sequence for DCN401
Dillon Varone [Thu, 13 Feb 2025 18:10:41 +0000 (13:10 -0500)]
drm/amd/display: Fix DMUB reset sequence for DCN401

[WHY]
It should no longer use DMCUB_SOFT_RESET as it can result
in the memory request path becoming desynchronized.

[HOW]
To ensure robustness in the reset sequence:
1) Extend timeout on the "halt" command sent via gpint, and check for
controller to enter "wait" as a stronger guarantee that there are no
requests to memory still in flight.
2) Remove usage of DMCUB_SOFT_RESET
3) Rely on PSP to reset the controller safely

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Fix p-state type when p-state is unsupported
Dillon Varone [Wed, 12 Feb 2025 22:06:42 +0000 (17:06 -0500)]
drm/amd/display: Fix p-state type when p-state is unsupported

[WHY&HOW]
P-state type would remain on previously used when unsupported which
causes confusion in logging and visual confirm, so set back to zero
when unsupported.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Request HW cursor on DCN3.2 with SubVP
Aric Cyr [Thu, 23 Jan 2025 21:39:52 +0000 (16:39 -0500)]
drm/amd/display: Request HW cursor on DCN3.2 with SubVP

[why]
When SubVP is active the HW cursor size is limited to 64x64, and
anything larger will force composition which is bad for gaming on
DCN3.2 if the game uses a larger cursor.

[how]
If HW cursor is requested, typically by a fullscreen game, do not
enable SubVP so that up to 256x256 cursor sizes are available for
DCN3.2.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Aric Cyr <Aric.Cyr@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: fix type mismatch in CalculateDynamicMetadataParameters()
Vitaliy Shevtsov [Wed, 26 Feb 2025 20:28:51 +0000 (01:28 +0500)]
drm/amd/display: fix type mismatch in CalculateDynamicMetadataParameters()

There is a type mismatch between what CalculateDynamicMetadataParameters()
takes and what is passed to it. Currently this function accepts several
args as signed long but it's called with unsigned integers and integer. On
some systems where long is 32 bits and one of these unsigned int params is
greater than INT_MAX it may cause passing input params as negative values.

Fix this by changing these argument types from long to unsigned int and to
int respectively. Also this will align the function's definition with
similar functions in other dcn* drivers.

Found by Linux Verification Center (linuxtesting.org) with Svace.

Fixes: 6725a88f88a7 ("drm/amd/display: Add DCN3 DML")
Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Avoid HDP flush on JPEG v5.0.1
Lijo Lazar [Thu, 20 Feb 2025 08:08:06 +0000 (13:38 +0530)]
drm/amdgpu: Avoid HDP flush on JPEG v5.0.1

Similar to JPEG v4.0.3, HDP flush shouldn't be performed by JPEG engine.
Keep it empty.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Initialize RRMT status on JPEG v5.0.1
Lijo Lazar [Thu, 20 Feb 2025 09:55:53 +0000 (15:25 +0530)]
drm/amdgpu: Initialize RRMT status on JPEG v5.0.1

Initialize RRMT enablement status from register.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Update SDMA scheduler mask handling to include page queue
Jesse.zhang@amd.com [Thu, 13 Feb 2025 05:33:34 +0000 (13:33 +0800)]
drm/amdgpu: Update SDMA scheduler mask handling to include page queue

This patch updates the SDMA scheduler mask handling to include the page queue
if it exists. The scheduler mask is calculated based on the number of SDMA
instances and the presence of the page queue. The mask is updated to reflect
the state of both the SDMA gfx ring and the page queue.

Changes:
- Add handling for the SDMA page queue in `amdgpu_debugfs_sdma_sched_mask_set`.
- Update scheduler mask calculations to include the page queue.
- Modify `amdgpu_debugfs_sdma_sched_mask_get` to return the correct mask value.

This change is necessary to verify multiple queues (SDMA gfx queue + page queue)
and ensure proper scheduling and state management for SDMA instances.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Add offset normalization in VCN v5.0.1
Lijo Lazar [Thu, 20 Feb 2025 08:10:31 +0000 (13:40 +0530)]
drm/amdgpu: Add offset normalization in VCN v5.0.1

VCN v5.0.1 also will need register offset normalization. Reuse the logic
from VCN v4.0.3. Also, avoid HDP flush similar to VCN v4.0.3

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Initialize RRMT status on VCN v5.0.1
Lijo Lazar [Thu, 20 Feb 2025 10:03:06 +0000 (15:33 +0530)]
drm/amdgpu: Initialize RRMT status on VCN v5.0.1

Initialize RRMT status from register.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Free CPER entry after committing to ring
Xiang Liu [Fri, 28 Feb 2025 03:11:08 +0000 (11:11 +0800)]
drm/amdgpu: Free CPER entry after committing to ring

Free CPER entry when it's committed to CPER ring to avoid memory leak.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: fix spelling typos in SI
Alexandre Demers [Thu, 27 Feb 2025 05:05:06 +0000 (00:05 -0500)]
drm/amdgpu: fix spelling typos in SI

Fix typos

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/radeon: fix spelling typos
Alexandre Demers [Thu, 27 Feb 2025 05:05:05 +0000 (00:05 -0500)]
drm/radeon: fix spelling typos

Found some typos while exploring radeon code.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: fix spelling typos
Alexandre Demers [Thu, 27 Feb 2025 05:05:04 +0000 (00:05 -0500)]
drm/amdgpu: fix spelling typos

Found some typos while exploring amdgpu code.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idle
Srinivasan Shanmugam [Wed, 26 Feb 2025 01:50:51 +0000 (07:20 +0530)]
drm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idle

Update parameter description in the vcn_v5_0_0_is_idle function

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: debugfs hang_hws skip GPU with MES
Philip Yang [Mon, 10 Feb 2025 14:42:31 +0000 (09:42 -0500)]
drm/amdkfd: debugfs hang_hws skip GPU with MES

debugfs hang_hws is used by GPU reset test with HWS, for MES this crash
the kernel with NULL pointer access because dqm->packet_mgr is not setup
for MES path.

Skip GPU with MES for now, MES hang_hws debugfs interface will be
supported later.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: Fix pqm_destroy_queue race with GPU reset
Philip Yang [Thu, 20 Feb 2025 21:02:13 +0000 (16:02 -0500)]
drm/amdkfd: Fix pqm_destroy_queue race with GPU reset

If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should
delete the queue from process_queue_list and free the resource.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Fix parameter annotations for VCN clock gating functions
Srinivasan Shanmugam [Wed, 26 Feb 2025 01:40:38 +0000 (07:10 +0530)]
drm/amdgpu: Fix parameter annotations for VCN clock gating functions

The previous references to a non-existent `adev` parameter have been
removed & corrected to reflect the use of the `vinst` pointer, which
points to the VCN instance structure, in the below files:

- vcn_v1_0.c
- vcn_v2_0.c
- vcn_v3_0.c

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume'
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: Fix mode1 reset crash issue
Philip Yang [Thu, 6 Feb 2025 22:50:13 +0000 (17:50 -0500)]
drm/amdkfd: Fix mode1 reset crash issue

If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal
user space to abort the processes. After process abort exit, user queues
still use the GPU to access system memory before h/w is reset while KFD
cleanup worker free system memory and free VRAM.

There is use-after-free race bug that KFD allocate and reuse the freed
system memory, and user queue write to the same system memory to corrupt
the data structure and cause driver crash.

To fix this race, KFD cleanup worker terminate user queues, then flush
reset_domain wq to wait for any GPU ongoing reset complete, and then
free outstanding BOs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: KFD release_work possible circular locking
Philip Yang [Tue, 18 Feb 2025 01:08:29 +0000 (20:08 -0500)]
drm/amdkfd: KFD release_work possible circular locking

If waiting for gpu reset done in KFD release_work, thers is WARNING:
possible circular locking dependency detected

  #2  kfd_create_process
        kfd_process_mutex
          flush kfd release work

  #1  kfd release work
        wait for amdgpu reset work

  #0  amdgpu_device_gpu_reset
        kgd2kfd_pre_reset
          kfd_process_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock((work_completion)(&p->release_work));
                  lock((wq_completion)kfd_process_wq);
                  lock((work_completion)(&p->release_work));
   lock((wq_completion)amdgpu-reset-dev);

To fix this, KFD create process move flush release work outside
kfd_process_mutex.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: Remove kfd_process_hw_exception worker
Philip Yang [Tue, 25 Feb 2025 15:04:06 +0000 (10:04 -0500)]
drm/amdkfd: Remove kfd_process_hw_exception worker

With GPU reset-domain worker implemented, KFD hw_exception worker is not
needed any more, just call amdgpu_amdkfd_gpu_reset directly from
kfd_hws_hang.

Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/amdgpu: Add support for xgmi_v6_4_1
Asad Kamal [Wed, 26 Feb 2025 04:52:50 +0000 (12:52 +0800)]
drm/amd/amdgpu: Add support for xgmi_v6_4_1

Add support for xgmi_v6_4_1 and use it appropriate places

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Add xgmi speed/width related info
Lijo Lazar [Thu, 6 Feb 2025 11:54:51 +0000 (17:24 +0530)]
drm/amdgpu: Add xgmi speed/width related info

Add APIs to initialize XGMI speed, width details and get to max
bandwidth supported. It is assumed that a device only supports same
generation of XGMI links with uniform width.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Move xgmi definitions to xgmi header
Lijo Lazar [Thu, 6 Feb 2025 10:46:43 +0000 (16:16 +0530)]
drm/amdgpu: Move xgmi definitions to xgmi header

Move definitions related to xgmi to amdgpu_xgmi header

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/pm: add fan abnormal detection
Kenneth Feng [Thu, 27 Feb 2025 02:13:53 +0000 (10:13 +0800)]
drm/amd/pm: add fan abnormal detection

add fan abnormal detection on smu v14.0.2&smu v14.0.3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: remove kfd_pasid.c from amdgpu driver build
Xiaogang Chen [Mon, 24 Feb 2025 22:50:50 +0000 (16:50 -0600)]
drm/amdkfd: remove kfd_pasid.c from amdgpu driver build

Since kfd uses pasid values from graphic driver now do not need use kfd pasid
fucntions.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: clamp queue size to minimum
David Yat Sin [Tue, 25 Feb 2025 23:08:02 +0000 (18:08 -0500)]
drm/amdkfd: clamp queue size to minimum

If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Create a debug option to disable ring reset
André Almeida [Wed, 26 Feb 2025 13:11:18 +0000 (10:11 -0300)]
drm/amdgpu: Create a debug option to disable ring reset

Prior to the addition of ring reset, the debug option
`debug_disable_soft_recovery` could be used to force a full device
reset. Now that we have ring reset, create a debug option to disable
them in amdgpu, forcing the driver to go with the full device
reset path again when both options are combined.

This option is useful for testing and debugging purposes when one wants
to test the full reset from userspace.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_p...
Ma Ke [Wed, 26 Feb 2025 08:37:31 +0000 (16:37 +0800)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_params

Null pointer dereference issue could occur when pipe_ctx->plane_state
is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not
null before accessing. This prevents a null pointer dereference.

Found by code review.

Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agoDocumentation/gpu: remove duplicate entries in different glossaries
Alex Deucher [Wed, 26 Feb 2025 14:25:49 +0000 (09:25 -0500)]
Documentation/gpu: remove duplicate entries in different glossaries

Some items were defined in both the general and DC glossaries.
Remove the duplicate entries.

Fixes: 2df30ae0ba0b ("Documentation/gpu: Add acronyms for some firmware components")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
Alex Deucher [Thu, 20 Feb 2025 15:42:30 +0000 (10:42 -0500)]
drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls

They are noops on GFX11 for most firmware versions. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar
Colin Ian King [Wed, 26 Feb 2025 08:57:33 +0000 (08:57 +0000)]
drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar

There is a spelling mistake and a grammatical error in a dev_err
message. Fix it.

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Decode deferred error type in aca bank parser
Xiang Liu [Wed, 26 Feb 2025 03:36:55 +0000 (11:36 +0800)]
drm/amdgpu: Decode deferred error type in aca bank parser

In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.

v2: check both deferred and poison bit

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: add sdma page queue irq processing for sdma442
Le Ma [Tue, 11 Feb 2025 06:06:54 +0000 (14:06 +0800)]
drm/amdgpu: add sdma page queue irq processing for sdma442

Add the trap irq processing for page queue of sdma442

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/pm: disable gfxoff on the specific sku
Kenneth Feng [Wed, 26 Feb 2025 08:05:57 +0000 (16:05 +0800)]
drm/amd/pm: disable gfxoff on the specific sku

disable gfxoff on the specific sku based on the requirement

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Report generic instead of unknown boot time errors
Xiang Liu [Wed, 26 Feb 2025 06:27:27 +0000 (14:27 +0800)]
drm/amdgpu: Report generic instead of unknown boot time errors

Change the DMESG reporting of unknown errors to "Boot Controller
Generic Error" to align with the RAS SPEC and provide more clarity
to customers.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Fix logic to fetch supported NPS modes
Lijo Lazar [Tue, 25 Feb 2025 10:51:51 +0000 (16:21 +0530)]
drm/amdgpu: Fix logic to fetch supported NPS modes

Correct the logic to find supported NPS modes from firmware.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Ava Zhang <niandong.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Fixes: 30eb41f5d1a7 ("drm/amdgpu: Use firmware supported NPS modes")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Disable fru_id field in CPER section
Xiang Liu [Mon, 24 Feb 2025 07:13:40 +0000 (15:13 +0800)]
drm/amdgpu: Disable fru_id field in CPER section

The fru_id field is disabled cause of mis-matching defination
between CPER spec and driver.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdkfd: Fix Circular Locking Dependency in 'svm_range_cpu_invalidate_pagetables'
Srinivasan Shanmugam [Mon, 24 Feb 2025 08:16:32 +0000 (13:46 +0530)]
drm/amdkfd: Fix Circular Locking Dependency in 'svm_range_cpu_invalidate_pagetables'

This commit addresses a circular locking dependency in the
svm_range_cpu_invalidate_pagetables function. The function previously
held a lock while determining whether to perform an unmap or eviction
operation, which could lead to deadlocks.

Fixes the below:

[  223.418794] ======================================================
[  223.418820] WARNING: possible circular locking dependency detected
[  223.418845] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  223.418869] ------------------------------------------------------
[  223.418889] kfdtest/3939 is trying to acquire lock:
[  223.418906] ffff8957552eae38 (&dqm->lock_hidden){+.+.}-{3:3}, at: evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[  223.419302]
               but task is already holding lock:
[  223.419303] ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[  223.419447] Console: switching to colour dummy device 80x25
[  223.419477] [IGT] amd_basic: executing
[  223.419599]
               which lock already depends on the new lock.

[  223.419611]
               the existing dependency chain (in reverse order) is:
[  223.419621]
               -> #2 (&prange->lock){+.+.}-{3:3}:
[  223.419636]        __mutex_lock+0x85/0xe20
[  223.419647]        mutex_lock_nested+0x1b/0x30
[  223.419656]        svm_range_validate_and_map+0x2f1/0x15b0 [amdgpu]
[  223.419954]        svm_range_set_attr+0xe8c/0x1710 [amdgpu]
[  223.420236]        svm_ioctl+0x46/0x50 [amdgpu]
[  223.420503]        kfd_ioctl_svm+0x50/0x90 [amdgpu]
[  223.420763]        kfd_ioctl+0x409/0x6d0 [amdgpu]
[  223.421024]        __x64_sys_ioctl+0x95/0xd0
[  223.421036]        x64_sys_call+0x1205/0x20d0
[  223.421047]        do_syscall_64+0x87/0x140
[  223.421056]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  223.421068]
               -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[  223.421084]        __ww_mutex_lock.constprop.0+0xab/0x1560
[  223.421095]        ww_mutex_lock+0x2b/0x90
[  223.421103]        amdgpu_amdkfd_alloc_gtt_mem+0xcc/0x2b0 [amdgpu]
[  223.421361]        add_queue_mes+0x3bc/0x440 [amdgpu]
[  223.421623]        unhalt_cpsch+0x1ae/0x240 [amdgpu]
[  223.421888]        kgd2kfd_start_sched+0x5e/0xd0 [amdgpu]
[  223.422148]        amdgpu_amdkfd_start_sched+0x3d/0x50 [amdgpu]
[  223.422414]        amdgpu_gfx_enforce_isolation_handler+0x132/0x270 [amdgpu]
[  223.422662]        process_one_work+0x21e/0x680
[  223.422673]        worker_thread+0x190/0x330
[  223.422682]        kthread+0xe7/0x120
[  223.422690]        ret_from_fork+0x3c/0x60
[  223.422699]        ret_from_fork_asm+0x1a/0x30
[  223.422708]
               -> #0 (&dqm->lock_hidden){+.+.}-{3:3}:
[  223.422723]        __lock_acquire+0x16f4/0x2810
[  223.422734]        lock_acquire+0xd1/0x300
[  223.422742]        __mutex_lock+0x85/0xe20
[  223.422751]        mutex_lock_nested+0x1b/0x30
[  223.422760]        evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[  223.423025]        kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[  223.423285]        kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[  223.423540]        svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[  223.423807]        __mmu_notifier_invalidate_range_start+0x1f5/0x250
[  223.423819]        copy_page_range+0x1e94/0x1ea0
[  223.423829]        copy_process+0x172f/0x2ad0
[  223.423839]        kernel_clone+0x9c/0x3f0
[  223.423847]        __do_sys_clone+0x66/0x90
[  223.423856]        __x64_sys_clone+0x25/0x30
[  223.423864]        x64_sys_call+0x1d7c/0x20d0
[  223.423872]        do_syscall_64+0x87/0x140
[  223.423880]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  223.423891]
               other info that might help us debug this:

[  223.423903] Chain exists of:
                 &dqm->lock_hidden --> reservation_ww_class_mutex --> &prange->lock

[  223.423926]  Possible unsafe locking scenario:

[  223.423935]        CPU0                    CPU1
[  223.423942]        ----                    ----
[  223.423949]   lock(&prange->lock);
[  223.423958]                                lock(reservation_ww_class_mutex);
[  223.423970]                                lock(&prange->lock);
[  223.423981]   lock(&dqm->lock_hidden);
[  223.423990]
                *** DEADLOCK ***

[  223.423999] 5 locks held by kfdtest/3939:
[  223.424006]  #0: ffffffffb82b4fc0 (dup_mmap_sem){.+.+}-{0:0}, at: copy_process+0x1387/0x2ad0
[  223.424026]  #1: ffff89575eda81b0 (&mm->mmap_lock){++++}-{3:3}, at: copy_process+0x13a8/0x2ad0
[  223.424046]  #2: ffff89575edaf3b0 (&mm->mmap_lock/1){+.+.}-{3:3}, at: copy_process+0x13e4/0x2ad0
[  223.424066]  #3: ffffffffb82e76e0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: copy_page_range+0x1cea/0x1ea0
[  223.424088]  #4: ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[  223.424365]
               stack backtrace:
[  223.424374] CPU: 0 UID: 0 PID: 3939 Comm: kfdtest Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  223.424392] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  223.424401] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F36a 02/16/2022
[  223.424416] Call Trace:
[  223.424423]  <TASK>
[  223.424430]  dump_stack_lvl+0x9b/0xf0
[  223.424441]  dump_stack+0x10/0x20
[  223.424449]  print_circular_bug+0x275/0x350
[  223.424460]  check_noncircular+0x157/0x170
[  223.424469]  ? __bfs+0xfd/0x2c0
[  223.424481]  __lock_acquire+0x16f4/0x2810
[  223.424490]  ? srso_return_thunk+0x5/0x5f
[  223.424505]  lock_acquire+0xd1/0x300
[  223.424514]  ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[  223.424783]  __mutex_lock+0x85/0xe20
[  223.424792]  ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[  223.425058]  ? srso_return_thunk+0x5/0x5f
[  223.425067]  ? mark_held_locks+0x54/0x90
[  223.425076]  ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[  223.425339]  ? srso_return_thunk+0x5/0x5f
[  223.425350]  mutex_lock_nested+0x1b/0x30
[  223.425358]  ? mutex_lock_nested+0x1b/0x30
[  223.425367]  evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[  223.425631]  kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[  223.425893]  kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[  223.426156]  svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[  223.426423]  ? srso_return_thunk+0x5/0x5f
[  223.426436]  __mmu_notifier_invalidate_range_start+0x1f5/0x250
[  223.426450]  copy_page_range+0x1e94/0x1ea0
[  223.426461]  ? srso_return_thunk+0x5/0x5f
[  223.426474]  ? srso_return_thunk+0x5/0x5f
[  223.426484]  ? lock_acquire+0xd1/0x300
[  223.426494]  ? copy_process+0x1718/0x2ad0
[  223.426502]  ? srso_return_thunk+0x5/0x5f
[  223.426510]  ? sched_clock_noinstr+0x9/0x10
[  223.426519]  ? local_clock_noinstr+0xe/0xc0
[  223.426528]  ? copy_process+0x1718/0x2ad0
[  223.426537]  ? srso_return_thunk+0x5/0x5f
[  223.426550]  copy_process+0x172f/0x2ad0
[  223.426569]  kernel_clone+0x9c/0x3f0
[  223.426577]  ? __schedule+0x4c9/0x1b00
[  223.426586]  ? srso_return_thunk+0x5/0x5f
[  223.426594]  ? sched_clock_noinstr+0x9/0x10
[  223.426602]  ? srso_return_thunk+0x5/0x5f
[  223.426610]  ? local_clock_noinstr+0xe/0xc0
[  223.426619]  ? schedule+0x107/0x1a0
[  223.426629]  __do_sys_clone+0x66/0x90
[  223.426643]  __x64_sys_clone+0x25/0x30
[  223.426652]  x64_sys_call+0x1d7c/0x20d0
[  223.426661]  do_syscall_64+0x87/0x140
[  223.426671]  ? srso_return_thunk+0x5/0x5f
[  223.426679]  ? common_nsleep+0x44/0x50
[  223.426690]  ? srso_return_thunk+0x5/0x5f
[  223.426698]  ? trace_hardirqs_off+0x52/0xd0
[  223.426709]  ? srso_return_thunk+0x5/0x5f
[  223.426717]  ? syscall_exit_to_user_mode+0xcc/0x200
[  223.426727]  ? srso_return_thunk+0x5/0x5f
[  223.426736]  ? do_syscall_64+0x93/0x140
[  223.426748]  ? srso_return_thunk+0x5/0x5f
[  223.426756]  ? up_write+0x1c/0x1e0
[  223.426765]  ? srso_return_thunk+0x5/0x5f
[  223.426775]  ? srso_return_thunk+0x5/0x5f
[  223.426783]  ? trace_hardirqs_off+0x52/0xd0
[  223.426792]  ? srso_return_thunk+0x5/0x5f
[  223.426800]  ? syscall_exit_to_user_mode+0xcc/0x200
[  223.426810]  ? srso_return_thunk+0x5/0x5f
[  223.426818]  ? do_syscall_64+0x93/0x140
[  223.426826]  ? syscall_exit_to_user_mode+0xcc/0x200
[  223.426836]  ? srso_return_thunk+0x5/0x5f
[  223.426844]  ? do_syscall_64+0x93/0x140
[  223.426853]  ? srso_return_thunk+0x5/0x5f
[  223.426861]  ? irqentry_exit+0x6b/0x90
[  223.426869]  ? srso_return_thunk+0x5/0x5f
[  223.426877]  ? exc_page_fault+0xa7/0x2c0
[  223.426888]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  223.426898] RIP: 0033:0x7f46758eab57
[  223.426906] Code: ba 04 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 41 89 c0 85 c0 75 2c 64 48 8b 04 25 10 00
[  223.426930] RSP: 002b:00007fff5c3e5188 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  223.426943] RAX: ffffffffffffffda RBX: 00007f4675f8c040 RCX: 00007f46758eab57
[  223.426954] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  223.426965] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  223.426975] R10: 00007f4675e81a50 R11: 0000000000000246 R12: 0000000000000001
[  223.426986] R13: 00007fff5c3e5470 R14: 00007fff5c3e53e0 R15: 00007fff5c3e5410
[  223.427004]  </TASK>

v2: To resolve this issue, the allocation of the process context buffer
(`proc_ctx_bo`) has been moved from the `add_queue_mes` function to the
`pqm_create_queue` function. This change ensures that the buffer is
allocated only when the first queue for a process is created and only if
the Micro Engine Scheduler (MES) is enabled. (Felix)

v3: Fix typo s/Memory Execution Scheduler (MES)/Micro Engine Scheduler
in commit message. (Lijo)

Fixes: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Cc: Jesse Zhang <jesse.zhang@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Add amdisp pinctrl MFD resource
Benjamin Chan [Fri, 31 Jan 2025 19:03:46 +0000 (14:03 -0500)]
drm/amdgpu: Add amdisp pinctrl MFD resource

AMDISP GPIO control uses a dedicated pinctrl driver,
and requires MFD hotadd GPIO resources.

Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Benjamin Chan <benjamin.chan@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
Alex Deucher [Thu, 20 Feb 2025 14:58:25 +0000 (09:58 -0500)]
drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls

They are noops on GFX12.  There is no suspend/resume all support
in firmware so the function doesn't do anything.  KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display: Remove unused optc3_fpu_set_vrr_m_const
Dr. David Alan Gilbert [Mon, 24 Feb 2025 01:49:42 +0000 (01:49 +0000)]
drm/amd/display: Remove unused optc3_fpu_set_vrr_m_const

The last use of optc3_fpu_set_vrr_m_const() was removed in 2022's
commit 64f991590ff4 ("drm/amd/display: Fix a compilation failure on PowerPC
caused by FPU code")
which removed the only caller (with a similar) name.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu: Replace DRM_ERROR() with drm_err()
Pratap Nirujogi [Wed, 19 Feb 2025 22:01:26 +0000 (17:01 -0500)]
drm/amdgpu: Replace DRM_ERROR() with drm_err()

DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage
with drm_err() in isp driver.

Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amd/display/dc: Refactor remove duplications
Luan Arcanjo [Tue, 25 Feb 2025 01:55:29 +0000 (22:55 -0300)]
drm/amd/display/dc: Refactor remove duplications

All dce command_table_helper's shares a copy-pasted collection
of copy-pasted functions, which are: phy_id_to_atom,
clock_source_id_to_atom_phy_clk_src_id, and engine_bp_to_atom.

This patch removes the multiple copy-pasted by moving them to
the command_table_helper.c and make the command_table_helper's
calls the functions implemented by the command_table_helper.c
instead.

The changes were not tested on actual hardware. I am only able
to verify that the changes keep the code compileable and do my
best to to look repeatedly if I am not actually changing any code.

This is the version 4 of the PATCH, fixed comments about
licence in the new files and the matches From email to
Signed-off-by email. Fixed comments about using
command_table_helper instead of creating a dce_common

Signed-off-by: Luan Icaro Pinto Arcanjo <luanicaro@usp.br>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/vcn: use dev_info() for firmware information
Alex Deucher [Tue, 7 Jan 2025 17:26:46 +0000 (12:26 -0500)]
drm/amdgpu/vcn: use dev_info() for firmware information

To properly handle multiple GPUs.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/vcn: optimize firmware storage
Alex Deucher [Tue, 7 Jan 2025 17:16:28 +0000 (12:16 -0500)]
drm/amdgpu/vcn: optimize firmware storage

If each instance uses the same fw image, only store one
copy in the driver.

Acked-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper
Alex Deucher [Tue, 10 Dec 2024 19:25:46 +0000 (14:25 -0500)]
drm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper

No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper
Alex Deucher [Tue, 10 Dec 2024 19:22:53 +0000 (14:22 -0500)]
drm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper

No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper
Alex Deucher [Tue, 10 Dec 2024 19:22:25 +0000 (14:22 -0500)]
drm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper

No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 months agodrm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper
Alex Deucher [Tue, 26 Nov 2024 17:41:45 +0000 (12:41 -0500)]
drm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper

No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>