Commit Graph

118468 Commits

Author SHA1 Message Date
Simona Vetter 18b1ce0b29 UAPI Changes:
- Make madvise autoreset an explicit behavior requested by userspace
    (Thomas Hellström)
 
 Driver Changes:
  - Drop XE_VMA flag conversion and ensure GPUVA flags are passed around
    (homas Hellström)
  - Fix missing wq allocation error checking (Matthew Brost)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6rM8lpABPHM5FqyDm6KlpjDL6lMFAmj6+SEACgkQm6KlpjDL
 6lNJuA/+Nh1l9Heu1CbLxjj85hHq4EDiu0t6o+txKexczy9oCet7aSAueVPEAv9l
 DG/jAvM2Oa+dKLlhw7S8YEZSY+Pnqlq8ex9ASdV5RRIvwc/mZEPxRqJOOBDgB44D
 DAz3ify02zJ64siQNlBmyJi33lV7p1xqRzSTTaiQ6UrlKpebSke+SqY6H2NtQSNw
 6lyYH+YQOzDi4MLIRBbVJgKkw3cBRFvTTYcFUrIjNbehSYOGVTUoPj1AO/ufjjhI
 af+Rgxdw48EbJ9i2Nz8qYM564iQWtpt9GHv9/wcXAB9WA2rCMhykirFilpYG4aua
 K9eB4dgtN8rgouxkBG6gLdJ9+BVOuCH/Y80qOFn8dh8/ZATg/zCCtP6xBsUcD1J5
 79u8RZtvT4eAHAsYKPYpezrF/1+GGBA/gNsVlfLGDmEsOxdXv2/PNqaGX6KpEpaE
 wK/DVDPQCJkSbr5EsVAfvmZQopQ6OG5a8ehbdhRjwewZVe0w2IAQSyEO4wQVOvfK
 ZuNUk0iDEMJhZwdRCva1aMwi9pPRN0oY9QriuUWynWespyxmgmcT3xTOCkJdxrAO
 QmzBPYUmShGL1nvLCEjjvpDRsiv1Dp/TMfTzHvrO1qbpAxmHobG3j/33fpgBnNOm
 wWDfYAQqu62/KrKD6OQ+X4tpQtLc7e7EaRWHuN/XIr0zXeSh5UM=
 =/yIj
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-fixes-2025-10-23' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

UAPI Changes:
 - Make madvise autoreset an explicit behavior requested by userspace
   (Thomas Hellström)

Driver Changes:
 - Drop XE_VMA flag conversion and ensure GPUVA flags are passed around
   (homas Hellström)
 - Fix missing wq allocation error checking (Matthew Brost)

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/4p2glnvgifc6osjlvzv23xhsyqhw4diqlfxz54lmg7robv44bi@nwd37zpqfa2l
2025-10-24 13:39:21 +02:00
Simona Vetter adb0971a1a - Fix panic structure allocation memory leak (Jani)
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmj6I2AACgkQ+mJfZA7r
 E8qykQf9HoJq6rEFWhGE9QcI/lkxa8fquz7D7gjm3BIQBvWlcaLyGy9rF2pmw9VB
 p+vyDpI+xaDhwn9tTrMTkghM3dw6pQhfeTsbX/ll2T9ErooSfwso4ErX1hsl2za5
 u1AvgpXARlt3fHbUYQPye/4JWWpoRPXpP2G8juz/Tcgh+2bvRg0iteTtH5eauZlh
 /qia9ZTpZokQTsw0aA01tdklNvSfc1zfE+SdrOYxSu29sS87IK7xyqAAfxZZglMR
 Ms9bXOy829odd9Gb3D649OtMcT1IhcMnAJcAt1lJcmiZRN2fNn0HL6mk5R/bpc8s
 GQJ4LbOH0zNAQgBvYwCPLHFVfNkOBQ==
 =EbYt
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2025-10-23' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix panic structure allocation memory leak (Jani)

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aPojgsvNYOU0tN4U@intel.com
2025-10-24 13:36:48 +02:00
Simona Vetter 0cdf7f6fa6 Short summary of fixes pull:
panic:
 - Fix several issues in size calculations
 
 panthor:
 - Fix kernel panic on partial unmap of GPU VA region
 
 rockchip:
 - hdmi: Fix HDP setup
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmj56G8bFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEGgNwR1TC3oj8tYIAI/0c0nQugunz7MbiFYn
 to8kG9yzRZoGMoiiWLn4a7vIg8dw3KpZkVWXCMW1eIWK+TUsCiRIF7+XCthuxyxI
 boXPUQNUCTgVR74II7jeYFsEra9ODpr2V2RfnZHsosEgo+v0sn1iPUxpZ7dJBj8h
 qTl8PjvO9SNUr7f3wnBnWy4Fm6yFn/fbsXHgD2r8SahSeEzYGhQMUiPPkGT2PdH1
 dsh8IRkRBzMFGkXX7jZ9HwF8WskJ2w/bCA7B/DoXfYy/GYlth31SDih/gGbZqkLh
 HjPr70ljUVh9etwu72pRAK5u14jIwQlrKJMbMfp7WjPxwc9ARrWTnLnH1w+i4Spz
 Pvw=
 =IY81
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2025-10-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

panic:
- Fix several issues in size calculations

panthor:
- Fix kernel panic on partial unmap of GPU VA region

rockchip:
- hdmi: Fix HDP setup

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20251023083449.GA13190@linux-2.fritz.box
2025-10-24 13:35:26 +02:00
Matthew Brost ce29214ada drm/xe: Check return value of GGTT workqueue allocation
Workqueue allocation can fail, so check the return value of the GGTT
workqueue allocation and fail driver initialization if the allocation
fails.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20251022005538.828980-2-matthew.brost@intel.com
(cherry picked from commit 1f1314e8e7)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-23 20:09:30 -07:00
Aurabindo Pillai 72a1eb3cf5 drm/amd/display: use GFP_NOWAIT for allocation in interrupt handler
schedule_dc_vmin_vmax() is called by dm_crtc_high_irq(). Hence, we
cannot have the former sleep. Use GFP_NOWAIT for allocation in this
function.

Fixes: c210b757b4 ("drm/amd/display: fix dmub access race condition")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c04812cbe2)
Cc: stable@vger.kernel.org
2025-10-21 09:52:06 -04:00
Charlene Liu bec947cbe9 drm/amd/display: increase max link count and fix link->enc NULL pointer access
[why]
1.) dc->links[MAX_LINKS] array size smaller than actual requested.
max_connector + max_dpia + 4 virtual = 14.
increase from 12 to 14.

2.) hw_init() access null LINK_ENC for dpia non display_endpoint.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d7f5a61e1b)
Cc: stable@vger.kernel.org
2025-10-21 09:50:27 -04:00
Meenakshikumar Somasundaram 89939cf252 drm/amd/display: Fix NULL pointer dereference
[Why]
On a mst branch with multi display setup, dc context is obselete
after updating the first stream. Referencing the same dc context
for the next stream update to fetch dc pointer leads to NULL
pointer dereference.

[How]
Get the dc pointer from the link rather than context.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dc69b48988)
Cc: stable@vger.kernel.org
2025-10-21 09:45:33 -04:00
Jocelyn Falempe 23437509a6 drm/panic: Fix 24bit pixel crossing page boundaries
When using page list framebuffer, and using RGB888 format, some
pixels can cross the page boundaries, and this case was not handled,
leading to writing 1 or 2 bytes on the next virtual address.

Add a check and a specific function to handle this case.

Fixes: c9ff280879 ("drm/panic: Add support to scanout buffer as array of pages")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20251009122955.562888-7-jfalempe@redhat.com
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21 11:28:03 +02:00
Jocelyn Falempe 2e337dd278 drm/panic: Fix divide by 0 if the screen width < font width
In the unlikely case that the screen is tiny, and smaller than the
font width, it leads to a divide by 0:

draw_line_with_wrap()
chars_per_row = sb->width / font->width = 0
line_wrap.len = line->len % chars_per_row;

This will trigger a divide by 0

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20251009122955.562888-6-jfalempe@redhat.com
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21 11:28:03 +02:00
Jocelyn Falempe e9b36fe063 drm/panic: Fix kmsg text drawing rectangle
The rectangle height was larger than the screen size. This has no
real impact.

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20251009122955.562888-5-jfalempe@redhat.com
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21 11:28:03 +02:00
Jocelyn Falempe 4fcffb5e5c drm/panic: Fix qr_code, ensure vmargin is positive
Depending on qr_code size and screen size, the vertical margin can
be negative, that means there is not enough room to draw the qr_code.

So abort early, to avoid a segfault by trying to draw at negative
coordinates.

Fixes: cb5164ac43 ("drm/panic: Add a QR code panic screen")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20251009122955.562888-4-jfalempe@redhat.com
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21 11:28:03 +02:00
Jocelyn Falempe cfa56e0a0e drm/panic: Fix overlap between qr code and logo
The borders of the qr code was not taken into account to check if it
overlap with the logo, leading to the logo being partially covered.

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20251009122955.562888-3-jfalempe@redhat.com
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21 11:28:02 +02:00
Jocelyn Falempe 179753aa5b drm/panic: Fix drawing the logo on a small narrow screen
If the logo width is bigger than the framebuffer width, and the
height is big enough to hold the logo and the message, it will draw
at x coordinate that are higher than the width, and ends up in a
corrupted image.

Fixes: 4b570ac2eb ("drm/rect: Add drm_rect_overlap()")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20251009122955.562888-2-jfalempe@redhat.com
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21 11:28:02 +02:00
Thomas Hellström ce831bffce drm/xe/uapi: Hide the madvise autoreset behind a VM_BIND flag
The madvise implementation currently resets the SVM madvise if the
underlying CPU map is unmapped. This is in an attempt to mimic the
CPU madvise behaviour. However, it's not clear that this is a desired
behaviour since if the end app user relies on it for malloc()ed
objects or stack objects, it may not work as intended.

Instead of having the autoreset functionality being a direct
application-facing implicit UAPI, make the UMD explicitly choose
this behaviour if it wants to expose it by introducing
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, and add a semantics
description.

v2:
- Kerneldoc fixes. Fix a commit log message.

Fixes: a2eb8aec3e ("drm/xe: Reset VMA attributes to default in SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: "Falkowski, John" <john.falkowski@intel.com>
Cc: "Mrozek, Michal" <michal.mrozek@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251015170726.178685-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit 59a2d3f38a)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-20 17:03:44 -07:00
Thomas Hellström 9a3c0d6834 drm/xe: Retain vma flags when recreating and splitting vmas for madvise
When splitting and restoring vmas for madvise, we only copied the
XE_VMA_SYSTEM_ALLOCATOR flag. That meant we lost flags for read_only,
dumpable and sparse (in case anyone would call madvise for the latter).

Instead, define a mask of relevant flags and ensure all are replicated,
To simplify this and make the code a bit less fragile, remove the
conversion to VMA_CREATE flags and instead just pass around the
gpuva flags after initial conversion from user-space.

Fixes: a2eb8aec3e ("drm/xe: Reset VMA attributes to default in SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251015170726.178685-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit b3af8658ec)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-20 17:03:39 -07:00
Jani Nikula 789e46fbfc
drm/i915/panic: fix panic structure allocation memory leak
Separating the panic allocation from framebuffer allocation in commit
729c5f7ffa ("drm/{i915,xe}/panic: move framebuffer allocation where it
belongs") failed to deallocate the panic structure anywhere.

The fix is two-fold. First, free the panic structure in
intel_user_framebuffer_destroy() in the general case. Second, move the
panic allocation later to intel_framebuffer_init() to not leak the panic
structure in error paths (if any, now or later) between
intel_framebuffer_alloc() and intel_framebuffer_init().

v2: Rebase

Fixes: 729c5f7ffa ("drm/{i915,xe}/panic: move framebuffer allocation where it belongs")
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reported-by: Michał Grzelak <michal.grzelak@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Michał Grzelak <michal.grzelak@intel.com> # v1
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20251015095135.2183415-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 8f8ef09fcf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-20 12:53:57 -04:00
Akash Goel 4eabd0d879 drm/panthor: Fix kernel panic on partial unmap of a GPU VA region
This commit address a kernel panic issue that can happen if Userspace
tries to partially unmap a GPU virtual region (aka drm_gpuva).
The VM_BIND interface allows partial unmapping of a BO.

Panthor driver pre-allocates memory for the new drm_gpuva structures
that would be needed for the map/unmap operation, done using drm_gpuvm
layer. It expected that only one new drm_gpuva would be needed on umap
but a partial unmap can require 2 new drm_gpuva and that's why it
ended up doing a NULL pointer dereference causing a kernel panic.

Following dump was seen when partial unmap was exercised.
 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
 Mem abort info:
   ESR = 0x0000000096000046
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x06: level 2 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=000000088a863000
 [000000000000078] pgd=080000088a842003, p4d=080000088a842003, pud=0800000884bf5003, pmd=0000000000000000
 Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP
 <snip>
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : panthor_gpuva_sm_step_remap+0xe4/0x330 [panthor]
 lr : panthor_gpuva_sm_step_remap+0x6c/0x330 [panthor]
 sp : ffff800085d43970
 x29: ffff800085d43970 x28: ffff00080363e440 x27: ffff0008090c6000
 x26: 0000000000000030 x25: ffff800085d439f8 x24: ffff00080d402000
 x23: ffff800085d43b60 x22: ffff800085d439e0 x21: ffff00080abdb180
 x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000010
 x17: 6e656c202c303030 x16: 3666666666646466 x15: 393d61766f69202c
 x14: 312d3d7361203a70 x13: 303030323d6e656c x12: ffff80008324bf58
 x11: 0000000000000003 x10: 0000000000000002 x9 : ffff8000801a6a9c
 x8 : ffff00080360b300 x7 : 0000000000000000 x6 : 000000088aa35fc7
 x5 : fff1000080000000 x4 : ffff8000842ddd30 x3 : 0000000000000001
 x2 : 0000000100000000 x1 : 0000000000000001 x0 : 0000000000000078
 Call trace:
  panthor_gpuva_sm_step_remap+0xe4/0x330 [panthor]
  op_remap_cb.isra.22+0x50/0x80
  __drm_gpuvm_sm_unmap+0x10c/0x1c8
  drm_gpuvm_sm_unmap+0x40/0x60
  panthor_vm_exec_op+0xb4/0x3d0 [panthor]
  panthor_vm_bind_exec_sync_op+0x154/0x278 [panthor]
  panthor_ioctl_vm_bind+0x160/0x4a0 [panthor]
  drm_ioctl_kernel+0xbc/0x138
  drm_ioctl+0x240/0x500
  __arm64_sys_ioctl+0xb0/0xf8
  invoke_syscall+0x4c/0x110
  el0_svc_common.constprop.1+0x98/0xf8
  do_el0_svc+0x24/0x38
  el0_svc+0x40/0xf8
  el0t_64_sync_handler+0xa0/0xc8
  el0t_64_sync+0x174/0x178

Signed-off-by: Akash Goel <akash.goel@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Fixes: 647810ec24 ("drm/panthor: Add the MMU/VM logical block")
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20251017102922.670084-1-akash.goel@arm.com
2025-10-17 13:48:56 +01:00
Dave Airlie 62cab426d0 Driver Changes:
- Increase global invalidation timeout to handle some workloads
    (Kenneth Graunke)
  - Fix NPD while evicting BOs in an array of VM binds (Matthew Brost)
  - Fix resizable BAR to account for possibly needing to move BARs other
    than the LMEMBAR (Lucas De Marchi)
  - Fix error handling in xe_migrate_init() (Thomas Hellström)
  - Fix atomic fault handling with mixed mappings or if the page is
    already in VRAM (Matthew Brost)
  - Enable media samplers power gating for platforms before Xe2 (Vinay
    Belgaumkar)
  - Fix de-registering exec queue from GuC when unbinding (Matthew Brost)
  - Ensure data migration to system if indicated by madvise with SVM
    (Thomas Hellström)
  - Fix kerneldoc for kunit change (Matt Roper)
  - Always account for cacheline alignment on migration (Matthew Auld)
  - Drop bogus assertion on eviction (Matthew Auld)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6rM8lpABPHM5FqyDm6KlpjDL6lMFAmjxG4kACgkQm6KlpjDL
 6lMVSxAAi89fShCW8/H7TJUDfaQdX1qTg574x+4kmsxAe5IdZLKR17iKagGwhXYt
 pGPOHLL6s13trhSDT9RHxrQ/iUhlUMAf3HGZyeC0/X86QuKA0qGbrXoJTdexaA/V
 AyaXmyPCh4CsDP7o/QNfkmaH9Ze3tYniYPxKmQXIsbJbG6hK8jgREpE3UC0ilveX
 9rgA8t66W08CbPsHX8bLEgpQ6dchSZHOvHSaXvW3X1xDIi9P5kd2A3JPW9q+T15M
 84xtbxan6JDZx+xguIKimlUti6ihTSksxkAV6nKyg0I3n56iLarf0HN5MDM6ZExU
 8uS1ZmocaKqLji51LroIL+0X31H4VnQZlT/eehheBukW8SF6/jXEnq2PtxNy01Yi
 NJTCcwvvA0jMhK02tc9gcpHgJcmjp08lbymlZ0QdEp4gIQn5dpXubhcvdNeOmUK9
 NJMD8aE+9JnQ6iD8GFVjvdTSHKMpKtsNl2kUShOU3oK1KNHAqn/v3r4iM8VbGBff
 TaCxusNeVqFCcWkh4R58ppKdKiwLzitjc0xP9kjNFtGDVtPS11fluxQ+BhhrzFKk
 84wnhG8Lry7Ss5TpCAWjirxQOANx/q4Nef7uby6QAF9SLuon7Q2XU7ShLOlWIeTH
 AmtX57A8TxTrXa0Smn0rIP7/sYAdGfWDAdTDdJjAoJ36w8T2rgo=
 =BqxC
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
 - Increase global invalidation timeout to handle some workloads
   (Kenneth Graunke)
 - Fix NPD while evicting BOs in an array of VM binds (Matthew Brost)
 - Fix resizable BAR to account for possibly needing to move BARs other
   than the LMEMBAR (Lucas De Marchi)
 - Fix error handling in xe_migrate_init() (Thomas Hellström)
 - Fix atomic fault handling with mixed mappings or if the page is
   already in VRAM (Matthew Brost)
 - Enable media samplers power gating for platforms before Xe2 (Vinay
   Belgaumkar)
 - Fix de-registering exec queue from GuC when unbinding (Matthew Brost)
 - Ensure data migration to system if indicated by madvise with SVM
   (Thomas Hellström)
 - Fix kerneldoc for kunit change (Matt Roper)
 - Always account for cacheline alignment on migration (Matthew Auld)
 - Drop bogus assertion on eviction (Matthew Auld)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/rch735eqkmprfyutk3ux2fsqa3e5ve4p77w7a5j66qdpgyquxr@ao3wzcqtpn6s
2025-10-17 09:39:53 +10:00
Dave Airlie d6dd930a6b Short summary of fixes pull:
ast:
 - Fix display output after reboot
 
 bridge:
 - lt9211: Fix version check
 
 core:
 - draw: Avoid color truncation
 - gpuvm: Avoid kernel-doc warning
 - sched: Avoid double free
 
 panthor:
 - Fix MCU suspend
 
 qaic:
 - Init bootlog in correct order
 - Treat remaining == 0 as error in find_and_map_user_pages()
 - Lock access to DBC request queue
 
 rockchip:
 - vop2: Fix destination size in atomic check
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmjw/fgbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEGgNwR1TC3ojCAAH/Aso0c3Zhiy2t6v+ZkWA
 lm9biRjMA82aPqHODQMiuBcqMjNel8zY1mRAwEMWs88MV/fw3BTdUjg7Oh9cDPBa
 IbeYBqUefyMs5I+4J4o3KjvmmAkr9qGiFjz+IYScLyFgy5/zZBRiK33LCXMfwQl9
 dD6mgVkn78o6i15v4IaVpkrGzXi8dPqhVG3l3qONvyIgJa8yzCoyTq2dpdI1aHL4
 Czv6NtZ05bkfZFIFois1YKVKPaAkipQhmxYtulVfFOaah09VSO20HPfhlSlONRMz
 IiXVNisE0os1LP7J7/sg2CKbnTTJ0G1ObJ8d5ZBtM6HlBBe0B1n4Vz5hDKjc4ObL
 /yI=
 =JFwD
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

ast:
- Fix display output after reboot

bridge:
- lt9211: Fix version check

core:
- draw: Avoid color truncation
- gpuvm: Avoid kernel-doc warning
- sched: Avoid double free

panthor:
- Fix MCU suspend

qaic:
- Init bootlog in correct order
- Treat remaining == 0 as error in find_and_map_user_pages()
- Lock access to DBC request queue

rockchip:
- vop2: Fix destination size in atomic check

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20251016141607.GA73919@linux.fritz.box
2025-10-17 09:14:06 +10:00
Dave Airlie 520133b0ba amd-drm-fixes-6.18-2025-10-16:
amdgpu:
 - Backlight fix
 - SI fixes
 - CIK fix
 - Make CE support debug only
 - IP discovery fix
 - Ring reset fixes
 - GPUVM fault memory barrier fix
 - Drop unused structures in amdgpu_drm.h
 - JPEG debugfs fix
 - VRAM handling fixes for GPUs without VRAM
 - GC 12 MES fixes
 
 amdkfd:
 - MES fix
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaPDwuAAKCRC93/aFa7yZ
 2CyaAP9Xwuar8fw2+CaoL7zdvo7MkqQpwOVBkyoQkKOQlZK9gQEAzotisG8jHkls
 hAfx8eqFoNSQevdKkShgVubIHna5hAw=
 =nbWU
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.18-2025-10-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.18-2025-10-16:

amdgpu:
- Backlight fix
- SI fixes
- CIK fix
- Make CE support debug only
- IP discovery fix
- Ring reset fixes
- GPUVM fault memory barrier fix
- Drop unused structures in amdgpu_drm.h
- JPEG debugfs fix
- VRAM handling fixes for GPUs without VRAM
- GC 12 MES fixes

amdkfd:
- MES fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20251016132224.2534946-1-alexander.deucher@amd.com
2025-10-17 06:58:40 +10:00
Alok Tiwari c700e7279b drm/rockchip: dw_hdmi: use correct SCLIN mask for RK3228
In dw_hdmi_rk3228_setup_hpd(), the SCLIN mask incorrectly references
the RK3328 variant. This change updates it to the RK3228-specific mask
RK3228_HDMI_SCLIN_MSK using FIELD_PREP_WM16, ensuring proper HPD and
I2C pin configuration for RK3228.

Change: RK3328_HDMI_SCLIN_MSK -> RK3228_HDMI_SCLIN_MSK

Fixes: 63df37f3fc ("drm/rockchip: dw_hdmi: switch to FIELD_PREP_WM16* macros")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20251010173143.72733-1-alok.a.tiwari@oracle.com
2025-10-16 17:57:50 +02:00
Tvrtko Ursulin 5801e65206 drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies
When adding dependencies with drm_sched_job_add_dependency(), that
function consumes the fence reference both on success and failure, so in
the latter case the dma_fence_put() on the error path (xarray failed to
expand) is a double free.

Interestingly this bug appears to have been present ever since
commit ebd5f74255 ("drm/sched: Add dependency tracking"), since the code
back then looked like this:

drm_sched_job_add_implicit_dependencies():
...
       for (i = 0; i < fence_count; i++) {
               ret = drm_sched_job_add_dependency(job, fences[i]);
               if (ret)
                       break;
       }

       for (; i < fence_count; i++)
               dma_fence_put(fences[i]);

Which means for the failing 'i' the dma_fence_put was already a double
free. Possibly there were no users at that time, or the test cases were
insufficient to hit it.

The bug was then only noticed and fixed after
commit 9c2ba26535 ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
landed, with its fixup of
commit 4eaf02d607 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").

At that point it was a slightly different flavour of a double free, which
commit 963d0b3569 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
noticed and attempted to fix.

But it only moved the double free from happening inside the
drm_sched_job_add_dependency(), when releasing the reference not yet
obtained, to the caller, when releasing the reference already released by
the former in the failure case.

As such it is not easy to identify the right target for the fixes tag so
lets keep it simple and just continue the chain.

While fixing we also improve the comment and explain the reason for taking
the reference and not dropping it.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 963d0b3569 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/
Cc: Christian König <christian.koenig@amd.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20251015084015.6273-1-tvrtko.ursulin@igalia.com
2025-10-16 14:26:05 +02:00
Matthew Auld 225bc03d85 drm/xe/evict: drop bogus assert
This assert can trigger here with non pin_map users that select
LATE_RESTORE, since the vmap is allowed to be NULL given that
save/restore can now use the blitter instead. The check here doesn't
seem to have much value anymore given that we no longer move pinned
memory, so any existing vmap is left well alone, and doesn't need to be
recreated upon restore, so just drop the assert here.

Fixes: 86f69c2611 ("drm/xe: use backup object for pinned save/restore")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com
(cherry picked from commit a10b4a69c7)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15 22:48:54 -07:00
Matthew Auld 6a91af25cd drm/xe/migrate: don't misalign current bytes
If current bytes exceeds the max copy size, ensure the clamped size
still accounts for the XE_CACHELINE_BYTES alignment, otherwise we
trigger the assert in xe_migrate_vram with the size now being out of
alignment.

Fixes: 8c2d61e0e9 ("drm/xe/migrate: don't overflow max copy size")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com
(cherry picked from commit 641bcf8731)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15 22:48:48 -07:00
Matt Roper 6d36f65ba5 drm/xe/kunit: Fix kerneldoc for parameterized tests
Kunit's generate_params() was recently updated to take an additional
test context parameter.  Xe's IP and platform parameter generators were
updated accordingly at the same time, but the new parameter was not
added to the functions' kerneldoc, resulting in the following warnings:

   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param'
   5 warnings as errors

Document the new parameter to eliminate the warnings and make CI happy.

Fixes: b9a214b5f6 ("kunit: Pass parameterized test context to generate_params()")
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 89e347f8a7)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15 22:48:43 -07:00
Thomas Hellström 7987b93e3a drm/xe/svm: Ensure data will be migrated to system if indicated by madvise.
If the location madvise() is set to
DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, the drm_pagemap in the
SVM gpu fault handler will be set to NULL. However there is nothing
that explicitly migrates the data to system if it is already present
in device memory.

In that case, set the device memory owner to NULL to ensure
data gets properly migrated to system on page-fault.

v2:
- Remove redundant dpagemap assignment (Himal Prasad Ghimiray)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251010104149.72783-2-thomas.hellstrom@linux.intel.com
Fixes: 10aa5c8060 ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages")
(cherry picked from commit 2cfcea7a74)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15 22:48:37 -07:00
Jouni Högander 95355766e5
drm/i915/psr: Deactivate PSR only on LNL and when selective fetch enabled
Using intel_psr_exit in frontbuffer flush on older platforms seems to be
causing problems.

Sending single full frame update using intel_psr_force_update is anyways
more optimal compared to psr deactivate/activate -> move back to this
approach on PSR1, PSR HW tracking and Panel Replay full frame update and
use deactivate/activate only on LunarLake and only when selective fetch is
enabled.

Tested-by: Lemen <lemen@lemen.xyz>
Tested-by: Koos Vriezen <koos.vriezen@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14946
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://lore.kernel.org/r/20250922102725.2752742-1-jouni.hogander@intel.com
(cherry picked from commit 924adb0bbd)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-15 10:12:43 -04:00
Thomas Zimmermann 6f719373b9 drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off
Blank the display by disabling sync pulses with VGACR17<7>. Unblank
by reenabling them. This VGA setting should be supported by all Aspeed
hardware.

Ast currently blanks via sync-off bits in VGACRB6. Not all BMCs handle
VGACRB6 correctly. After disabling sync during a reboot, some BMCs do
not reenable it after the soft reset. The display output remains dark.
When the display is off during boot, some BMCs set the sync-off bits in
VGACRB6, so the display remains dark. Observed with  Blackbird AST2500
BMCs. Clearing the sync-off bits unconditionally fixes these issues.

Also do not modify VGASR1's SD bit for blanking, as it only disables GPU
access to video memory.

v2:
- init vgacrb6 correctly (Jocelyn)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: ce3d99c834 ("drm: Call drm_atomic_helper_shutdown() at shutdown time for misc drivers")
Tested-by: Nick Bowler <nbowler@draconx.ca>
Reported-by: Nick Bowler <nbowler@draconx.ca>
Closes: https://lore.kernel.org/dri-devel/wpwd7rit6t4mnu6kdqbtsnk5bhftgslio6e2jgkz6kgw6cuvvr@xbfswsczfqsi/
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.7+
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20251014084743.18242-1-tzimmermann@suse.de
2025-10-15 09:55:35 +02:00
Thomas Zimmermann 48a710760e Merge drm/drm-fixes into drm-misc-fixes
Updating drm-misc-fixes to the state of v6.18-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-10-14 10:59:58 +02:00
Alok Tiwari 7f38a14875 drm/rockchip: vop2: use correct destination rectangle height check
The vop2_plane_atomic_check() function incorrectly checks
drm_rect_width(dest) twice instead of verifying both width and height.
Fix the second condition to use drm_rect_height(dest) so that invalid
destination rectangles with height < 4 are correctly rejected.

Fixes: 604be85547 ("drm/rockchip: Add VOP2 driver")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Andy Yan <andy.yan@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20251012142005.660727-1-alok.a.tiwari@oracle.com
2025-10-14 10:32:17 +02:00
Francesco Valla 095232711f drm/draw: fix color truncation in drm_draw_fill24
The color parameter passed to drm_draw_fill24() was truncated to 16
bits, leading to an incorrect color drawn to the target iosys_map.
Fix this behavior, widening the parameter to 32 bits.

Fixes: 31fa2c1ca0 ("drm/panic: Move drawing functions to drm_draw")

Signed-off-by: Francesco Valla <francesco@valla.it>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20251003-drm_draw_fill24_fix-v1-1-8fb7c1c2a893@valla.it
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-14 09:25:10 +02:00
Shuicheng Lin 9f64b3cd05 drm/xe/guc: Check GuC running state before deregistering exec queue
In normal operation, a registered exec queue is disabled and
deregistered through the GuC, and freed only after the GuC confirms
completion. However, if the driver is forced to unbind while the exec
queue is still running, the user may call exec_destroy() after the GuC
has already been stopped and CT communication disabled.

In this case, the driver cannot receive a response from the GuC,
preventing proper cleanup of exec queue resources. Fix this by directly
releasing the resources when GuC is not running.

Here is the failure dmesg log:
"
[  468.089581] ---[ end trace 0000000000000000 ]---
[  468.089608] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[  468.090558] pci 0000:03:00.0: [drm] GT0:     total 65535
[  468.090562] pci 0000:03:00.0: [drm] GT0:     used 1
[  468.090564] pci 0000:03:00.0: [drm] GT0:     range 1..1 (1)
[  468.092716] ------------[ cut here ]------------
[  468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe]
"

v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled().
    As CT may go down and come back during VF migration.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com
(cherry picked from commit 9b42321a02)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:03:26 -07:00
Vinay Belgaumkar 1852d27aa9 drm/xe: Enable media sampler power gating
Where applicable, enable media sampler power gating. Also, add
it to the powergate_info debugfs.

v2: Remove the sampler powergate status since it is cleared quickly anyway.
v3: Use vcs mask (Rodrigo) and fix the version check for media
v4: Remove extra spaces
v5: Media samplers are independent of vcs mask,
    use Media version 1255 (Matt Roper)

Fixes: 38e8c4184e ("drm/xe: Enable Coarse Power Gating")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 4cbc08649a)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:03:20 -07:00
Matthew Brost 7413e9f2be drm/xe: Handle mixed mappings and existing VRAM on atomic faults
Moving to VRAM will fail if mixed mappings are present or if the page is
already located in VRAM. Atomic faults that require a move to VRAM
currently retry without attempting to evict mixed mappings or locate
existing VRAM mappings.

This patch fixes the issue by attempting to evict mixed mappings or find
existing VRAM pages when a move to VRAM fails during atomic fault
handling.

Fixes: a9ac0fa455 ("drm/xe: Strict migration policy for atomic SVM faults")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com
(cherry picked from commit 75188605c5)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:03:15 -07:00
Thomas Hellström 1117e7d1e8 drm/xe/migrate: Fix an error path
The exhaustive eviction accidently changed an error path goto to
a return. Fix this.

Fixes: 59eabff2a3 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 381f1ed151)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:03:08 -07:00
Lucas De Marchi d30203739b drm/xe: Move rebar to be done earlier
There may be cases in which the BAR0 also needs to move to accommodate
the bigger BAR2. However if it's not released, the BAR2 resize fails.
During the vram probe it can't be released as it's already in use by
xe_mmio for early register access.

Add a new function in xe_vram and let xe_pci call it directly before
even early device probe. This allows the BAR2 to resize in cases BAR0
also needs to move, assuming there aren't other reasons to hold that
move:

	[] xe 0000:03:00.0: vgaarb: deactivate vga console
	[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB
	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
	[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
	[] pcieport 0000:00:01.0:   bridge window [mem 0x83000000-0x840fffff]
	[] pcieport 0000:00:01.0:   bridge window [mem 0x4000000000-0x44007fffff 64bit pref]
	[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
	[] pcieport 0000:01:00.0:   bridge window [mem 0x83000000-0x840fffff]
	[] pcieport 0000:01:00.0:   bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
	[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
	[] pcieport 0000:02:01.0:   bridge window [mem 0x83000000-0x83ffffff]
	[] pcieport 0000:02:01.0:   bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
	[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
	[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE  e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ...

For BMG there are additional fix needed in the PCI side, but this
helps getting it to a working resize.

All the rebar logic is more pci-specific than xe-specific and can be
done very early in the probe sequence. In future it would be good to
move it out of xe_vram.c, but this refactor is left for later.

Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org # 6.12+
Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 45e33f220f)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:03:03 -07:00
Matthew Brost 7ac74613e5 drm/xe: Don't allow evicting of BOs in same VM in array of VM binds
An array of VM binds can potentially evict other buffer objects (BOs)
within the same VM under certain conditions, which may lead to NULL
pointer dereferences later in the bind pipeline. To prevent this, clear
the allow_res_evict flag in the xe_bo_validate call.

v2:
 - Invert polarity of no_res_evict (Thomas)
 - Add comment in code explaining issue (Thomas)

Cc: stable@vger.kernel.org
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268
Fixes: 774b5fa509 ("drm/xe: Avoid evicting object of the same vm in none fault mode")
Fixes: 77f2ef3f16 ("drm/xe: Lock all gpuva ops during VM bind IOCTL")
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com
(cherry picked from commit 8b9ba8d6d9)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:02:58 -07:00
Kenneth Graunke e5ae8d1eb0 drm/xe: Increase global invalidation timeout to 1000us
The previous timeout of 500us seems to be too small; panning the map in
the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered
timeouts within a few seconds of usage, causing the monitor to freeze
and the following to be printed to dmesg:

[Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout
[Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out

I haven't hit a single timeout since increasing it to 1000us even after
several multi-hour testing sessions.

Fixes: 0dd2dd0182 ("drm/xe: Move DSB l2 flush to a more sensible place")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 146046907b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:02:50 -07:00
Jonathan Kim 079ae5118e drm/amdkfd: fix suspend/resume all calls in mes based eviction path
Suspend/resume all gangs should be done with the device lock is held.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
Jonathan Kim 277bb0f83e drm/amdgpu: enable suspend/resume all for gfx 12
Suspend/resume all gangs has been available for GFX12 for a while now
so enable it.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
Jonathan Kim 0ef930e1fa drm/amdgpu: fix hung reset queue array memory allocation
By design the MES will return an array result that is twice the number
of hung doorbells it can report.

i.e. if up k reported doorbells are supported, then the
second half of the array, also of length k, holds the HQD information
(type/queue/pipe) where queue 1 corresponds to index 0 and k,
queue 2 corresponds to index 1 and k + 1 etc ...

The driver will use the HDQ info to target queue/pipe reset for
hardware scheduled user compute queues.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
Jonathan Kim 8745ca5efb drm/amdgpu: fix initialization of doorbell array for detect and hang
Initialized doorbells should be set to invalid rather than 0 to prevent
driver from over counting hung doorbells since it checks against the
invalid value to begin with.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
Jonathan Kim d0de79f66a drm/amdgpu: fix gfx12 mes packet status return check
GFX12 MES uses low 32 bits of status return for success (1 or 0)
and high bits for debug information if low bits are 0.

GFX11 MES doesn't do this so checking full 64-bit status return
for 1 or 0 is still valid.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-10-13 14:14:16 -04:00
Jesse.Zhang 883f309add drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices
Previously, APU platforms (and other scenarios with uninitialized VRAM managers)
triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root
cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL,
but that `man->bdev` (the backing device pointer within the manager) remains
uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully
set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to
acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to
a kernel OOPS.

1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
   `amdgpu_cs_get_threshold_for_moves()` to include a check for
   `ttm_resource_manager_used()`. If the manager is not used (uninitialized
   `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific
   logic that would trigger the NULL dereference.

2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
   reporting to use a conditional: if the manager is used, return the real VRAM
   usage; otherwise, return 0. This avoids accessing `man->bdev` when it is
   NULL.

3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function)
   data write path. Use `ttm_resource_manager_used()` to check validity: if the
   manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set
   `fb_usage` to 0 (APUs have no discrete framebuffer to report).

This approach is more robust than APU-specific checks because it:
- Works for all scenarios where the VRAM manager is uninitialized (not just APUs),
- Aligns with TTM's design by using its native helper function,
- Preserves correct behavior for discrete GPUs (which have fully initialized
  `man->bdev` and pass the `ttm_resource_manager_used()` check).

v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00
Christian König 33cc891b56 drm/amdgpu: hide VRAM sysfs attributes on GPUs without VRAM
Otherwise accessing them can cause a crash.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00
Sathishkumar S 74de0eaa00 drm/amdgpu: fix bit shift logic
BIT_ULL(n) sets nth bit, remove explicit shift and set the position

Fixes: a7a411e246 ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set")
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00
Timur Kristóf 6917112af2 drm/amd/powerplay: Fix CIK shutdown temperature
Remove extra multiplication.

CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case
the shutdown temperature is hardcoded in smu7_init_dpm_defaults
and is already multiplied by 1000. The value was mistakenly
multiplied another time by smu7_get_thermal_temperature_range.

Fixes: 4ba082572a ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00
Gui-Dong Han 6df8e84aa6 drm/amdgpu: use atomic functions with memory barriers for vm fault info
The atomic variable vm_fault_info_updated is used to synchronize access to
adev->gmc.vm_fault_info between the interrupt handler and
get_vm_fault_info().

The default atomic functions like atomic_set() and atomic_read() do not
provide memory barriers. This allows for CPU instruction reordering,
meaning the memory accesses to vm_fault_info and the vm_fault_info_updated
flag are not guaranteed to occur in the intended order. This creates a
race condition that can lead to inconsistent or stale data being used.

The previous implementation, which used an explicit mb(), was incomplete
and inefficient. It failed to account for all potential CPU reorderings,
such as the access of vm_fault_info being reordered before the atomic_read
of the flag. This approach is also more verbose and less performant than
using the proper atomic functions with acquire/release semantics.

Fix this by switching to atomic_set_release() and atomic_read_acquire().
These functions provide the necessary acquire and release semantics,
which act as memory barriers to ensure the correct order of operations.
It is also more efficient and idiomatic than using explicit full memory
barriers.

Fixes: b97dfa27ef ("drm/amdgpu: save vm fault information for amdkfd")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00
Alex Deucher ff780f4f80 drm/amdgpu: set an error on all fences from a bad context
When we backup ring contents to reemit after a queue reset,
we don't backup ring contents from the bad context.  When
we signal the fences, we should set an error on those
fences as well.

v2: misc cleanups
v3: add locking for fence error, fix comment (Christian)
v4: fix wrap around, locking (Christian)

Fixes: 77cc0da39c ("drm/amdgpu: track ring state associated with a fence")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00
Alex Deucher 1f22fcb88b drm/amdgpu: handle wrap around in reemit handling
Compare the sequence numbers directly.

Fixes: 77cc0da39c ("drm/amdgpu: track ring state associated with a fence")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:15 -04:00