linux

Commit Graph

Author	SHA1	Message	Date
Simona Vetter	18b1ce0b29	Merge tag 'drm-xe-fixes-2025-10-23' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - Make madvise autoreset an explicit behavior requested by userspace (Thomas Hellström) Driver Changes: - Drop XE_VMA flag conversion and ensure GPUVA flags are passed around (Thomas Hellström) - Fix missing wq allocation error checking (Matthew Brost) Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/4p2glnvgifc6osjlvzv23xhsyqhw4diqlfxz54lmg7robv44bi@nwd37zpqfa2l	2025-10-24 13:39:21 +02:00
Simona Vetter	adb0971a1a	Merge tag 'drm-intel-fixes-2025-10-23' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix panic structure allocation memory leak (Jani) Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/aPojgsvNYOU0tN4U@intel.com	2025-10-24 13:36:48 +02:00
Simona Vetter	0cdf7f6fa6	Merge tag 'drm-misc-fixes-2025-10-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: panic: - Fix several issues in size calculations panthor: - Fix kernel panic on partial unmap of GPU VA region rockchip: - hdmi: Fix HPD setup Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251023083449.GA13190@linux-2.fritz.box	2025-10-24 13:35:26 +02:00
Matthew Brost	ce29214ada	drm/xe: Check return value of GGTT workqueue allocation Workqueue allocation can fail, so check the return value of the GGTT workqueue allocation and fail driver initialization if the allocation fails. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251022005538.828980-2-matthew.brost@intel.com (cherry picked from commit `1f1314e8e7`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-23 20:09:30 -07:00
Aurabindo Pillai	72a1eb3cf5	drm/amd/display: use GFP_NOWAIT for allocation in interrupt handler schedule_dc_vmin_vmax() is called by dm_crtc_high_irq(). Hence, we cannot have the former sleep. Use GFP_NOWAIT for allocation in this function. Fixes: `c210b757b4` ("drm/amd/display: fix dmub access race condition") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c04812cbe2`) Cc: stable@vger.kernel.org	2025-10-21 09:52:06 -04:00
Charlene Liu	bec947cbe9	drm/amd/display: increase max link count and fix link->enc NULL pointer access [why] 1.) The dc->links[MAX_LINKS] array is smaller than actually required: max_connector + max_dpia + 4 virtual = 14, so increase it from 12 to 14. 2.) hw_init() accesses a NULL LINK_ENC for a DPIA that is not a display endpoint. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d7f5a61e1b`) Cc: stable@vger.kernel.org	2025-10-21 09:50:27 -04:00
Meenakshikumar Somasundaram	89939cf252	drm/amd/display: Fix NULL pointer dereference [Why] On an MST branch with a multi-display setup, the dc context is obsolete after updating the first stream. Referencing the same dc context for the next stream update to fetch the dc pointer leads to a NULL pointer dereference. [How] Get the dc pointer from the link rather than the context. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `dc69b48988`) Cc: stable@vger.kernel.org	2025-10-21 09:45:33 -04:00
Jocelyn Falempe	23437509a6	drm/panic: Fix 24bit pixel crossing page boundaries When using page list framebuffer, and using RGB888 format, some pixels can cross the page boundaries, and this case was not handled, leading to writing 1 or 2 bytes on the next virtual address. Add a check and a specific function to handle this case. Fixes: `c9ff280879` ("drm/panic: Add support to scanout buffer as array of pages") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-7-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-21 11:28:03 +02:00
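A minimal user-space sketch of the idea, assuming a framebuffer made of separate pages of `SKETCH_PAGE_SIZE` bytes (an assumed 4 KiB) and the DRM RGB888 byte order (blue, green, red in memory). `write_pixel24()` and `put_byte()` are hypothetical names, not the drm_panic functions added by the commit:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Write one byte at a linear framebuffer offset, resolving which page holds it. */
static void put_byte(uint8_t **pages, size_t offset, uint8_t value)
{
	pages[offset / SKETCH_PAGE_SIZE][offset % SKETCH_PAGE_SIZE] = value;
}

/*
 * Write a 24-bit pixel at a linear offset. The fast path handles a pixel that
 * lies fully inside one page; the slow path resolves the page for every byte,
 * so a pixel straddling a page boundary is split across the two pages instead
 * of spilling past the end of the first page's mapping.
 */
static void write_pixel24(uint8_t **pages, size_t offset, uint32_t color)
{
	size_t in_page = offset % SKETCH_PAGE_SIZE;

	if (in_page + 3 <= SKETCH_PAGE_SIZE) {
		uint8_t *p = pages[offset / SKETCH_PAGE_SIZE] + in_page;

		p[0] = color & 0xff;         /* blue */
		p[1] = (color >> 8) & 0xff;  /* green */
		p[2] = (color >> 16) & 0xff; /* red */
		return;
	}

	for (size_t i = 0; i < 3; i++)
		put_byte(pages, offset + i, (color >> (8 * i)) & 0xff);
}
```

With 3-byte pixels and a page size that is not a multiple of 3, some pixel offsets inevitably start 1 or 2 bytes before a page end, which is exactly the case the slow path covers.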
Jocelyn Falempe	2e337dd278	drm/panic: Fix divide by 0 if the screen width < font width In the unlikely case that the screen is tiny and narrower than the font width, draw_line_with_wrap() computes chars_per_row = sb->width / font->width = 0, so the following line_wrap.len = line->len % chars_per_row; triggers a divide by 0. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-6-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-21 11:28:03 +02:00
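A hedged sketch of the guard, reusing the arithmetic quoted in the commit message. `wrapped_tail_len()` is a hypothetical helper, and returning 0 when no glyph fits is an assumption about how a caller might bail out, not necessarily what the upstream fix does:

```c
#include <stddef.h>

/*
 * Number of characters of a line that land on its last wrapped row.
 * Returns 0 when the screen is narrower than one glyph (chars_per_row == 0)
 * instead of evaluating line_len % 0, which is undefined behaviour in C and
 * traps on most CPUs.
 */
static size_t wrapped_tail_len(unsigned int screen_width,
			       unsigned int font_width, size_t line_len)
{
	unsigned int chars_per_row;

	if (font_width == 0)
		return 0;
	chars_per_row = screen_width / font_width;
	if (chars_per_row == 0)
		return 0; /* nothing fits on a row: skip rather than divide by 0 */
	return line_len % chars_per_row;
}
```

For example, an 80-pixel-wide screen with an 8-pixel font fits 10 characters per row, so a 25-character line leaves 5 on its last row.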
Jocelyn Falempe	e9b36fe063	drm/panic: Fix kmsg text drawing rectangle The rectangle height was larger than the screen size. This has no real impact. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-5-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-21 11:28:03 +02:00
Jocelyn Falempe	4fcffb5e5c	drm/panic: Fix qr_code, ensure vmargin is positive Depending on qr_code size and screen size, the vertical margin can be negative, that means there is not enough room to draw the qr_code. So abort early, to avoid a segfault by trying to draw at negative coordinates. Fixes: `cb5164ac43` ("drm/panic: Add a QR code panic screen") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-4-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-21 11:28:03 +02:00
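A minimal sketch of the early-abort check, assuming (for illustration only) that the QR code is stacked above a text block and the leftover height is split evenly above and below. `qr_vmargin()` and its parameters are hypothetical, not drm_panic's:

```c
#include <stdbool.h>

/*
 * Compute the vertical margin around a QR code plus a text block.
 * Returns false when they do not fit, so the caller can skip drawing
 * instead of using negative y coordinates.
 */
static bool qr_vmargin(int screen_height, int qr_height, int text_height,
		       int *vmargin)
{
	/* Check before halving: in C, -1 / 2 == 0, which would hide a 1px overflow. */
	int room = screen_height - qr_height - text_height;

	if (room < 0)
		return false;
	*vmargin = room / 2;
	return true;
}
```

Testing the sign of the undivided remainder is the design choice worth noting: C integer division truncates toward zero, so checking `room / 2 < 0` would miss a 1-pixel shortfall.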
Jocelyn Falempe	cfa56e0a0e	drm/panic: Fix overlap between qr code and logo The borders of the qr code were not taken into account when checking whether it overlaps with the logo, leading to the logo being partially covered. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-3-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-21 11:28:02 +02:00
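To illustrate why the border matters, here is a sketch that inflates the QR code rectangle by an assumed border width before an overlap test written in the same half-open style as drm_rect_overlap(). `struct rect` and `qr_overlaps_logo()` are hypothetical stand-ins:

```c
#include <stdbool.h>

/* Half-open rectangle: x2/y2 are exclusive. */
struct rect { int x1, y1, x2, y2; };

static bool rect_overlap(const struct rect *a, const struct rect *b)
{
	return a->x1 < b->x2 && b->x1 < a->x2 &&
	       a->y1 < b->y2 && b->y1 < a->y2;
}

/*
 * Grow the QR code's module area by its border before testing, so a logo
 * that only touches the border is still reported as overlapping.
 */
static bool qr_overlaps_logo(struct rect qr, int border, const struct rect *logo)
{
	qr.x1 -= border;
	qr.y1 -= border;
	qr.x2 += border;
	qr.y2 += border;
	return rect_overlap(&qr, logo);
}
```

A QR code starting 4 pixels to the right of a logo does not overlap it on its own, but does once an 8-pixel border is included.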
Jocelyn Falempe	179753aa5b	drm/panic: Fix drawing the logo on a small narrow screen If the logo width is bigger than the framebuffer width, and the height is big enough to hold the logo and the message, it will draw at x coordinates that are higher than the width, and end up with a corrupted image. Fixes: `4b570ac2eb` ("drm/rect: Add drm_rect_overlap()") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-2-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-21 11:28:02 +02:00
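A minimal sketch of the missing condition, assuming the logo is simply skipped when it does not fit; `logo_fits()` is a hypothetical helper, and the real fix may be structured differently:

```c
#include <stdbool.h>

/*
 * Checking only the height (room for logo + message) is not enough:
 * a logo wider than the framebuffer must also be rejected, otherwise
 * pixels are written at x >= fb_width.
 */
static bool logo_fits(unsigned int fb_width, unsigned int fb_height,
		      unsigned int logo_width, unsigned int logo_height)
{
	return logo_width <= fb_width && logo_height <= fb_height;
}
```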
Thomas Hellström	ce831bffce	drm/xe/uapi: Hide the madvise autoreset behind a VM_BIND flag The madvise implementation currently resets the SVM madvise if the underlying CPU map is unmapped. This is in an attempt to mimic the CPU madvise behaviour. However, it's not clear that this is a desired behaviour since if the end app user relies on it for malloc()ed objects or stack objects, it may not work as intended. Instead of having the autoreset functionality being a direct application-facing implicit UAPI, make the UMD explicitly choose this behaviour if it wants to expose it by introducing DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, and add a semantics description. v2: - Kerneldoc fixes. Fix a commit log message. Fixes: `a2eb8aec3e` ("drm/xe: Reset VMA attributes to default in SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Falkowski, John" <john.falkowski@intel.com> Cc: "Mrozek, Michal" <michal.mrozek@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251015170726.178685-2-thomas.hellstrom@linux.intel.com (cherry picked from commit `59a2d3f38a`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-20 17:03:44 -07:00
Thomas Hellström	9a3c0d6834	drm/xe: Retain vma flags when recreating and splitting vmas for madvise When splitting and restoring vmas for madvise, we only copied the XE_VMA_SYSTEM_ALLOCATOR flag. That meant we lost flags for read_only, dumpable and sparse (in case anyone would call madvise for the latter). Instead, define a mask of relevant flags and ensure all are replicated. To simplify this and make the code a bit less fragile, remove the conversion to VMA_CREATE flags and instead just pass around the gpuva flags after initial conversion from user-space. Fixes: `a2eb8aec3e` ("drm/xe: Reset VMA attributes to default in SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251015170726.178685-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `b3af8658ec`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-20 17:03:39 -07:00
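The "mask of relevant flags" approach can be sketched as follows. The flag names and bit positions are hypothetical stand-ins for the xe VMA flags mentioned above, not the driver's definitions:

```c
#include <stdint.h>

/* Hypothetical flag bits standing in for the xe VMA flags named above. */
#define VMA_FLAG_READ_ONLY        (1u << 0)
#define VMA_FLAG_DUMPABLE         (1u << 1)
#define VMA_FLAG_SPARSE           (1u << 2)
#define VMA_FLAG_SYSTEM_ALLOCATOR (1u << 3)
#define VMA_FLAG_INTERNAL_STATE   (1u << 8) /* per-VMA state, not inherited */

/* A single mask lists every flag a split or recreated VMA must inherit. */
#define VMA_FLAGS_TO_RETAIN (VMA_FLAG_READ_ONLY | VMA_FLAG_DUMPABLE | \
			     VMA_FLAG_SPARSE | VMA_FLAG_SYSTEM_ALLOCATOR)

static uint32_t inherit_vma_flags(uint32_t old_flags)
{
	return old_flags & VMA_FLAGS_TO_RETAIN;
}
```

Compared with copying one flag by hand, a new inheritable flag only needs to be added to the mask, which is the "less fragile" property the commit message describes.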
Jani Nikula	789e46fbfc	drm/i915/panic: fix panic structure allocation memory leak Separating the panic allocation from framebuffer allocation in commit `729c5f7ffa` ("drm/{i915,xe}/panic: move framebuffer allocation where it belongs") failed to deallocate the panic structure anywhere. The fix is two-fold. First, free the panic structure in intel_user_framebuffer_destroy() in the general case. Second, move the panic allocation later to intel_framebuffer_init() to not leak the panic structure in error paths (if any, now or later) between intel_framebuffer_alloc() and intel_framebuffer_init(). v2: Rebase Fixes: `729c5f7ffa` ("drm/{i915,xe}/panic: move framebuffer allocation where it belongs") Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Reported-by: Michał Grzelak <michal.grzelak@intel.com> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Michał Grzelak <michal.grzelak@intel.com> # v1 Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251015095135.2183415-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit `8f8ef09fcf`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-20 12:53:57 -04:00
Akash Goel	4eabd0d879	drm/panthor: Fix kernel panic on partial unmap of a GPU VA region This commit addresses a kernel panic that can happen if userspace tries to partially unmap a GPU virtual region (aka drm_gpuva). The VM_BIND interface allows partial unmapping of a BO. The Panthor driver pre-allocates memory for the new drm_gpuva structures that would be needed for the map/unmap operation, done using the drm_gpuvm layer. It expected that only one new drm_gpuva would be needed on unmap, but a partial unmap can require 2 new drm_gpuva, which is why it ended up doing a NULL pointer dereference, causing a kernel panic. The following dump was seen when partial unmap was exercised. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078 Mem abort info: ESR = 0x0000000096000046 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000088a863000 [000000000000078] pgd=080000088a842003, p4d=080000088a842003, pud=0800000884bf5003, pmd=0000000000000000 Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP <snip> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : panthor_gpuva_sm_step_remap+0xe4/0x330 [panthor] lr : panthor_gpuva_sm_step_remap+0x6c/0x330 [panthor] sp : ffff800085d43970 x29: ffff800085d43970 x28: ffff00080363e440 x27: ffff0008090c6000 x26: 0000000000000030 x25: ffff800085d439f8 x24: ffff00080d402000 x23: ffff800085d43b60 x22: ffff800085d439e0 x21: ffff00080abdb180 x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000010 x17: 6e656c202c303030 x16: 3666666666646466 x15: 393d61766f69202c x14: 312d3d7361203a70 x13: 303030323d6e656c x12: ffff80008324bf58 x11: 0000000000000003 x10: 0000000000000002 x9 : ffff8000801a6a9c x8 : ffff00080360b300 x7 : 0000000000000000 x6 : 000000088aa35fc7 x5 : fff1000080000000 x4 : ffff8000842ddd30 x3 : 0000000000000001 x2 : 0000000100000000 x1 : 0000000000000001 x0 : 0000000000000078 Call trace: panthor_gpuva_sm_step_remap+0xe4/0x330 [panthor] op_remap_cb.isra.22+0x50/0x80 __drm_gpuvm_sm_unmap+0x10c/0x1c8 drm_gpuvm_sm_unmap+0x40/0x60 panthor_vm_exec_op+0xb4/0x3d0 [panthor] panthor_vm_bind_exec_sync_op+0x154/0x278 [panthor] panthor_ioctl_vm_bind+0x160/0x4a0 [panthor] drm_ioctl_kernel+0xbc/0x138 drm_ioctl+0x240/0x500 __arm64_sys_ioctl+0xb0/0xf8 invoke_syscall+0x4c/0x110 el0_svc_common.constprop.1+0x98/0xf8 do_el0_svc+0x24/0x38 el0_svc+0x40/0xf8 el0t_64_sync_handler+0xa0/0xc8 el0t_64_sync+0x174/0x178 Signed-off-by: Akash Goel <akash.goel@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Fixes: `647810ec24` ("drm/panthor: Add the MMU/VM logical block") Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20251017102922.670084-1-akash.goel@arm.com	2025-10-17 13:48:56 +01:00
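Why a partial unmap can need two new regions is easiest to see with plain interval arithmetic. This is a hypothetical sketch, not drm_gpuvm code; it assumes the unmap range lies within the existing mapping:

```c
#include <stdint.h>

/* Hypothetical half-open VA range [start, end). */
struct va_range { uint64_t start, end; };

/*
 * Punching [ustart, uend) out of an existing mapping leaves a piece before
 * the hole ("prev"), a piece after it ("next"), both, or neither. Code that
 * pre-allocates the remnant objects must therefore reserve room for two.
 */
static int split_on_unmap(struct va_range map, uint64_t ustart, uint64_t uend,
			  struct va_range out[2])
{
	int n = 0;

	if (ustart > map.start)
		out[n++] = (struct va_range){ map.start, ustart };
	if (uend < map.end)
		out[n++] = (struct va_range){ uend, map.end };
	return n;
}
```

Unmapping the middle of a mapping yields two remnants, unmapping one end yields one, and unmapping the whole range yields none; the panic described above corresponds to the two-remnant case with only one pre-allocated object.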
Dave Airlie	62cab426d0	Merge tag 'drm-xe-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Increase global invalidation timeout to handle some workloads (Kenneth Graunke) - Fix NPD while evicting BOs in an array of VM binds (Matthew Brost) - Fix resizable BAR to account for possibly needing to move BARs other than the LMEMBAR (Lucas De Marchi) - Fix error handling in xe_migrate_init() (Thomas Hellström) - Fix atomic fault handling with mixed mappings or if the page is already in VRAM (Matthew Brost) - Enable media samplers power gating for platforms before Xe2 (Vinay Belgaumkar) - Fix de-registering exec queue from GuC when unbinding (Matthew Brost) - Ensure data migration to system if indicated by madvise with SVM (Thomas Hellström) - Fix kerneldoc for kunit change (Matt Roper) - Always account for cacheline alignment on migration (Matthew Auld) - Drop bogus assertion on eviction (Matthew Auld) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/rch735eqkmprfyutk3ux2fsqa3e5ve4p77w7a5j66qdpgyquxr@ao3wzcqtpn6s	2025-10-17 09:39:53 +10:00
Dave Airlie	d6dd930a6b	Merge tag 'drm-misc-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: ast: - Fix display output after reboot bridge: - lt9211: Fix version check core: - draw: Avoid color truncation - gpuvm: Avoid kernel-doc warning - sched: Avoid double free panthor: - Fix MCU suspend qaic: - Init bootlog in correct order - Treat remaining == 0 as error in find_and_map_user_pages() - Lock access to DBC request queue rockchip: - vop2: Fix destination size in atomic check Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251016141607.GA73919@linux.fritz.box	2025-10-17 09:14:06 +10:00
Dave Airlie	520133b0ba	Merge tag 'amd-drm-fixes-6.18-2025-10-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.18-2025-10-16: amdgpu: - Backlight fix - SI fixes - CIK fix - Make CE support debug only - IP discovery fix - Ring reset fixes - GPUVM fault memory barrier fix - Drop unused structures in amdgpu_drm.h - JPEG debugfs fix - VRAM handling fixes for GPUs without VRAM - GC 12 MES fixes amdkfd: - MES fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20251016132224.2534946-1-alexander.deucher@amd.com	2025-10-17 06:58:40 +10:00
Alok Tiwari	c700e7279b	drm/rockchip: dw_hdmi: use correct SCLIN mask for RK3228 In dw_hdmi_rk3228_setup_hpd(), the SCLIN mask incorrectly references the RK3328 variant. This change updates it to the RK3228-specific mask RK3228_HDMI_SCLIN_MSK using FIELD_PREP_WM16, ensuring proper HPD and I2C pin configuration for RK3228. Change: RK3328_HDMI_SCLIN_MSK -> RK3228_HDMI_SCLIN_MSK Fixes: `63df37f3fc` ("drm/rockchip: dw_hdmi: switch to FIELD_PREP_WM16* macros") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20251010173143.72733-1-alok.a.tiwari@oracle.com	2025-10-16 17:57:50 +02:00
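Using the wrong mask matters because, on Rockchip-style "write-mask" registers, the upper 16 bits select which of the lower 16 bits the hardware actually updates. The sketch below mimics what a FIELD_PREP_WM16-style value looks like; `wm16_prep()` is a hypothetical helper, not the kernel macro, and the RK3228/RK3328 mask values are not given here:

```c
#include <stdint.h>

/*
 * Build a write-mask register value: bits [31:16] = write-enable mask,
 * bits [15:0] = val shifted into the field described by mask.
 */
static uint32_t wm16_prep(uint16_t mask, uint16_t val)
{
	unsigned int shift = 0;

	if (mask == 0)
		return 0;
	while (!(mask & (1u << shift))) /* find the field's lowest bit */
		shift++;
	return ((uint32_t)mask << 16) | ((uint32_t)(val << shift) & mask);
}
```

With the wrong mask, both the write-enable bits and the field position are wrong, so the intended SCLIN bits are never written and unrelated bits may be.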
Tvrtko Ursulin	5801e65206	drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies When adding dependencies with drm_sched_job_add_dependency(), that function consumes the fence reference both on success and failure, so in the latter case the dma_fence_put() on the error path (xarray failed to expand) is a double free. Interestingly, this bug appears to have been present ever since commit `ebd5f74255` ("drm/sched: Add dependency tracking"), since the code back then looked like this: drm_sched_job_add_implicit_dependencies(): ... for (i = 0; i < fence_count; i++) { ret = drm_sched_job_add_dependency(job, fences[i]); if (ret) break; } for (; i < fence_count; i++) dma_fence_put(fences[i]); This means that for the failing 'i' the dma_fence_put() was already a double free. Possibly there were no users at that time, or the test cases were insufficient to hit it. The bug was then only noticed and fixed after commit `9c2ba26535` ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2") landed, with its fixup of commit `4eaf02d607` ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies"). At that point it was a slightly different flavour of a double free, which commit `963d0b3569` ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder") noticed and attempted to fix. But it only moved the double free from happening inside drm_sched_job_add_dependency(), when releasing the reference not yet obtained, to the caller, when releasing the reference already released by the former in the failure case. As such it is not easy to identify the right target for the Fixes tag, so let's keep it simple and just continue the chain. While fixing, we also improve the comment and explain the reason for taking the reference and not dropping it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: `963d0b3569` ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/ Cc: Christian König <christian.koenig@amd.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Philipp Stanner <phasta@kernel.org> Cc: Christian König <ckoenig.leichtzumerken@gmail.com> Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20251015084015.6273-1-tvrtko.ursulin@igalia.com	2025-10-16 14:26:05 +02:00
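The ownership rule can be modelled with a toy refcount. `struct fence`, `fence_put()`, `add_dependency()` and `add_all()` are hypothetical stand-ins for dma_fence and the scheduler helpers; the point is that once the callee has been handed fences[i], the caller's cleanup must start at i + 1, unlike the quoted historical loop, which started at the failing i:

```c
#include <stddef.h>

/* Toy refcounted object standing in for struct dma_fence. */
struct fence { int refcount; };

static void fence_put(struct fence *f) { f->refcount--; }

/*
 * Consumes the caller's reference on success AND on failure, as the commit
 * describes for drm_sched_job_add_dependency(). "fail" simulates the xarray
 * failing to expand.
 */
static int add_dependency(struct fence *f, int fail)
{
	if (fail) {
		fence_put(f); /* callee drops the reference itself */
		return -1;
	}
	return 0; /* reference is now owned by the job */
}

/* Correct caller: never puts a fence already handed to add_dependency(). */
static int add_all(struct fence **fences, size_t count, size_t fail_at)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (add_dependency(fences[i], i == fail_at)) {
			for (i++; i < count; i++) /* skip the consumed fence */
				fence_put(fences[i]);
			return -1;
		}
	}
	return 0;
}
```

In the buggy shape, the failing fence would be put twice and its refcount would drop below zero in this model.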
Matthew Auld	225bc03d85	drm/xe/evict: drop bogus assert This assert can trigger here with non pin_map users that select LATE_RESTORE, since the vmap is allowed to be NULL given that save/restore can now use the blitter instead. The check here doesn't seem to have much value anymore given that we no longer move pinned memory, so any existing vmap is left well alone, and doesn't need to be recreated upon restore, so just drop the assert here. Fixes: `86f69c2611` ("drm/xe: use backup object for pinned save/restore") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com (cherry picked from commit `a10b4a69c7`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:54 -07:00
Matthew Auld	6a91af25cd	drm/xe/migrate: don't misalign current bytes If current bytes exceeds the max copy size, ensure the clamped size still accounts for the XE_CACHELINE_BYTES alignment, otherwise we trigger the assert in xe_migrate_vram with the size now being out of alignment. Fixes: `8c2d61e0e9` ("drm/xe/migrate: don't overflow max copy size") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com (cherry picked from commit `641bcf8731`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:48 -07:00
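A minimal sketch of clamping while preserving alignment. `CACHELINE_BYTES` is an assumed stand-in for XE_CACHELINE_BYTES (its real value is not stated here), `clamp_copy_size()` is hypothetical, and max_copy is assumed to be at least one cacheline:

```c
#include <stdint.h>

#define CACHELINE_BYTES 64u /* assumed alignment for this sketch */

/*
 * Clamp a copy chunk to max_copy, rounding the clamped value down to a
 * multiple of CACHELINE_BYTES so a consumer asserting alignment is happy.
 */
static uint64_t clamp_copy_size(uint64_t size, uint64_t max_copy)
{
	uint64_t cur = size < max_copy ? size : max_copy;

	if (cur < size) /* only the clamped case can lose alignment */
		cur -= cur % CACHELINE_BYTES;
	return cur;
}
```

For example, with a 64-byte cacheline and a maximum of 8190 bytes, a large copy is clamped to 8128 bytes rather than the misaligned 8190.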
Matt Roper	6d36f65ba5	drm/xe/kunit: Fix kerneldoc for parameterized tests Kunit's generate_params() was recently updated to take an additional test context parameter. Xe's IP and platform parameter generators were updated accordingly at the same time, but the new parameter was not added to the functions' kerneldoc, resulting in the following warnings: Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param' 5 warnings as errors Document the new parameter to eliminate the warnings and make CI happy. Fixes: `b9a214b5f6` ("kunit: Pass parameterized test context to generate_params()") Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `89e347f8a7`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:43 -07:00
Thomas Hellström	7987b93e3a	drm/xe/svm: Ensure data will be migrated to system if indicated by madvise. If the location madvise() is set to DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, the drm_pagemap in the SVM gpu fault handler will be set to NULL. However there is nothing that explicitly migrates the data to system if it is already present in device memory. In that case, set the device memory owner to NULL to ensure data gets properly migrated to system on page-fault. v2: - Remove redundant dpagemap assignment (Himal Prasad Ghimiray) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251010104149.72783-2-thomas.hellstrom@linux.intel.com Fixes: `10aa5c8060` ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages") (cherry picked from commit `2cfcea7a74`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:37 -07:00
Jouni Högander	95355766e5	drm/i915/psr: Deactivate PSR only on LNL and when selective fetch enabled Using intel_psr_exit in frontbuffer flush on older platforms seems to be causing problems. Sending a single full-frame update using intel_psr_force_update is more efficient than PSR deactivate/activate anyway, so move back to this approach for PSR1, PSR HW tracking and Panel Replay full-frame updates, and use deactivate/activate only on LunarLake and only when selective fetch is enabled. Tested-by: Lemen <lemen@lemen.xyz> Tested-by: Koos Vriezen <koos.vriezen@gmail.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14946 Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Link: https://lore.kernel.org/r/20250922102725.2752742-1-jouni.hogander@intel.com (cherry picked from commit `924adb0bbd`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-15 10:12:43 -04:00
Thomas Zimmermann	6f719373b9	drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off Blank the display by disabling sync pulses with VGACR17<7>. Unblank by reenabling them. This VGA setting should be supported by all Aspeed hardware. Ast currently blanks via sync-off bits in VGACRB6. Not all BMCs handle VGACRB6 correctly. After disabling sync during a reboot, some BMCs do not reenable it after the soft reset. The display output remains dark. When the display is off during boot, some BMCs set the sync-off bits in VGACRB6, so the display remains dark. Observed with Blackbird AST2500 BMCs. Clearing the sync-off bits unconditionally fixes these issues. Also do not modify VGASR1's SD bit for blanking, as it only disables GPU access to video memory. v2: - init vgacrb6 correctly (Jocelyn) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: `ce3d99c834` ("drm: Call drm_atomic_helper_shutdown() at shutdown time for misc drivers") Tested-by: Nick Bowler <nbowler@draconx.ca> Reported-by: Nick Bowler <nbowler@draconx.ca> Closes: https://lore.kernel.org/dri-devel/wpwd7rit6t4mnu6kdqbtsnk5bhftgslio6e2jgkz6kgw6cuvvr@xbfswsczfqsi/ Cc: Douglas Anderson <dianders@chromium.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.7+ Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251014084743.18242-1-tzimmermann@suse.de	2025-10-15 09:55:35 +02:00
Thomas Zimmermann	48a710760e	Merge drm/drm-fixes into drm-misc-fixes Updating drm-misc-fixes to the state of v6.18-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-10-14 10:59:58 +02:00
Alok Tiwari	7f38a14875	drm/rockchip: vop2: use correct destination rectangle height check The vop2_plane_atomic_check() function incorrectly checks drm_rect_width(dest) twice instead of verifying both width and height. Fix the second condition to use drm_rect_height(dest) so that invalid destination rectangles with height < 4 are correctly rejected. Fixes: `604be85547` ("drm/rockchip: Add VOP2 driver") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20251012142005.660727-1-alok.a.tiwari@oracle.com	2025-10-14 10:32:17 +02:00
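The bug in this fix is a duplicated operand in a bounds check. A minimal userspace sketch, assuming a simplified `struct rect` loosely modeled on `struct drm_rect` and hypothetical helpers `rect_width()`, `rect_height()` and `dest_rect_ok()`; only the "reject below 4" rule comes from the commit message, the rest is illustrative and not the vop2 code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_rect (x2/y2 are exclusive). */
struct rect {
	int x1, y1, x2, y2;
};

int rect_width(const struct rect *r)  { return r->x2 - r->x1; }
int rect_height(const struct rect *r) { return r->y2 - r->y1; }

/* Buggy shape: width is tested twice, so a 2-pixel-tall rect slips through. */
bool dest_rect_ok_buggy(const struct rect *dest)
{
	return !(rect_width(dest) < 4 || rect_width(dest) < 4);
}

/* Fixed shape: reject when either dimension is below 4 pixels. */
bool dest_rect_ok(const struct rect *dest)
{
	return !(rect_width(dest) < 4 || rect_height(dest) < 4);
}
```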
Francesco Valla	095232711f	drm/draw: fix color truncation in drm_draw_fill24 The color parameter passed to drm_draw_fill24() was truncated to 16 bits, leading to an incorrect color drawn to the target iosys_map. Fix this behavior, widening the parameter to 32 bits. Fixes: `31fa2c1ca0` ("drm/panic: Move drawing functions to drm_draw") Signed-off-by: Francesco Valla <francesco@valla.it> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251003-drm_draw_fill24_fix-v1-1-8fb7c1c2a893@valla.it Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>	2025-10-14 09:25:10 +02:00
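Why a 16-bit parameter breaks a 24-bit fill can be shown in a few lines. This is a sketch under stated assumptions: `fill24_px()` and `fill24_px_u16()` are hypothetical helpers and the byte order is illustrative, not the drm_draw API or its pixel layout:

```c
#include <stdint.h>

/* Hypothetical: store a 24-bit colour as three bytes, low byte first. */
void fill24_px(uint8_t px[3], uint32_t color)
{
	px[0] = color & 0xff;
	px[1] = (color >> 8) & 0xff;
	px[2] = (color >> 16) & 0xff;
}

/* Buggy shape: the u16 parameter silently drops bits 16..23 at the call. */
void fill24_px_u16(uint8_t px[3], uint16_t color)
{
	fill24_px(px, color);
}
```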
Shuicheng Lin	9f64b3cd05	drm/xe/guc: Check GuC running state before deregistering exec queue In normal operation, a registered exec queue is disabled and deregistered through the GuC, and freed only after the GuC confirms completion. However, if the driver is forced to unbind while the exec queue is still running, the user may call exec_destroy() after the GuC has already been stopped and CT communication disabled. In this case, the driver cannot receive a response from the GuC, preventing proper cleanup of exec queue resources. Fix this by directly releasing the resources when GuC is not running. Here is the failure dmesg log: " [ 468.089581] ---[ end trace 0000000000000000 ]--- [ 468.089608] pci 0000:03:00.0: [drm] ERROR GT0: GUC ID manager unclean (1/65535) [ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535 [ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1 [ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1) [ 468.092716] ------------[ cut here ]------------ [ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe] " v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled(). As CT may go down and come back during VF migration. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com (cherry picked from commit `9b42321a02`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:26 -07:00
Vinay Belgaumkar	1852d27aa9	drm/xe: Enable media sampler power gating Where applicable, enable media sampler power gating. Also, add it to the powergate_info debugfs. v2: Remove the sampler powergate status since it is cleared quickly anyway. v3: Use vcs mask (Rodrigo) and fix the version check for media v4: Remove extra spaces v5: Media samplers are independent of vcs mask, use Media version 1255 (Matt Roper) Fixes: `38e8c4184e` ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `4cbc08649a`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:20 -07:00
Matthew Brost	7413e9f2be	drm/xe: Handle mixed mappings and existing VRAM on atomic faults Moving to VRAM will fail if mixed mappings are present or if the page is already located in VRAM. Atomic faults that require a move to VRAM currently retry without attempting to evict mixed mappings or locate existing VRAM mappings. This patch fixes the issue by attempting to evict mixed mappings or find existing VRAM pages when a move to VRAM fails during atomic fault handling. Fixes: `a9ac0fa455` ("drm/xe: Strict migration policy for atomic SVM faults") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com (cherry picked from commit `75188605c5`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:15 -07:00
Thomas Hellström	1117e7d1e8	drm/xe/migrate: Fix an error path The exhaustive eviction conversion accidentally changed an error-path goto to a return. Fix this. Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `381f1ed151`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:08 -07:00
Lucas De Marchi	d30203739b	drm/xe: Move rebar to be done earlier There may be cases in which BAR0 also needs to move to accommodate the bigger BAR2. However, if it is not released, the BAR2 resize fails. During the vram probe it can't be released, as it's already in use by xe_mmio for early register access. Add a new function in xe_vram and let xe_pci call it directly, even before early device probe. This allows BAR2 to resize in cases where BAR0 also needs to move, assuming there aren't other reasons to hold that move: [] xe 0000:03:00.0: vgaarb: deactivate vga console [] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned [] pcieport 0000:00:01.0: PCI bridge to [bus 01-04] [] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref] [] pcieport 0000:01:00.0: PCI bridge to [bus 02-04] [] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] pcieport 0000:02:01.0: PCI bridge to [bus 03] [] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff] [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] xe 0000:03:00.0: [drm] BAR2 resized to 16384M [] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ... For BMG, additional fixes are needed on the PCI side, but this helps get it to a working resize. All the rebar logic is more PCI-specific than xe-specific and can be done very early in the probe sequence. In the future it would be good to move it out of xe_vram.c, but this refactor is left for later. Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `45e33f220f`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:03 -07:00
Matthew Brost	7ac74613e5	drm/xe: Don't allow evicting of BOs in same VM in array of VM binds An array of VM binds can potentially evict other buffer objects (BOs) within the same VM under certain conditions, which may lead to NULL pointer dereferences later in the bind pipeline. To prevent this, clear the allow_res_evict flag in the xe_bo_validate call. v2: - Invert polarity of no_res_evict (Thomas) - Add comment in code explaining issue (Thomas) Cc: stable@vger.kernel.org Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268 Fixes: `774b5fa509` ("drm/xe: Avoid evicting object of the same vm in none fault mode") Fixes: `77f2ef3f16` ("drm/xe: Lock all gpuva ops during VM bind IOCTL") Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com (cherry picked from commit `8b9ba8d6d9`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:02:58 -07:00
Kenneth Graunke	e5ae8d1eb0	drm/xe: Increase global invalidation timeout to 1000us The previous timeout of 500us seems to be too small; panning the map in the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered timeouts within a few seconds of usage, causing the monitor to freeze and the following to be printed to dmesg: [Jul30 13:44] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout [Jul30 13:48] xe 0000:03:00.0: [drm] ERROR [CRTC:82:pipe A] flip_done timed out I haven't hit a single timeout since increasing it to 1000us even after several multi-hour testing sessions. Fixes: `0dd2dd0182` ("drm/xe: Move DSB l2 flush to a more sensible place") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `146046907b`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:02:50 -07:00
Jonathan Kim	079ae5118e	drm/amdkfd: fix suspend/resume all calls in mes based eviction path Suspend/resume all gangs should be done with the device lock held. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	277bb0f83e	drm/amdgpu: enable suspend/resume all for gfx 12 Suspend/resume all gangs has been available for GFX12 for a while now so enable it. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	0ef930e1fa	drm/amdgpu: fix hung reset queue array memory allocation By design, the MES returns an array result that is twice the size of the number of hung doorbells it can report, i.e. if up to k reported doorbells are supported, then the second half of the array, also of length k, holds the HQD information (type/queue/pipe), where queue 1 corresponds to indices 0 and k, queue 2 corresponds to indices 1 and k + 1, and so on. The driver uses the HQD info to target queue/pipe reset for hardware-scheduled user compute queues. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
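The 2k-entry layout described above can be sketched as follows. The struct and helper names (`hung_result`, `hung_result_alloc()`, `hung_doorbell()`, `hung_hqd_info()`) are hypothetical and use 0-based queue indices; the point is only that both halves must be allocated and that queue i's HQD info sits at index k + i:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical view of the MES result: 2 * k entries in one array. */
struct hung_result {
	size_t k;          /* max hung doorbells MES can report */
	uint32_t *entries; /* [0, k): doorbells, [k, 2k): HQD info */
};

int hung_result_alloc(struct hung_result *r, size_t k)
{
	r->k = k;
	/* Size for both halves; allocating only k entries would overflow. */
	r->entries = calloc(2 * k, sizeof(*r->entries));
	return r->entries ? 0 : -1;
}

/* Doorbell reported for queue i (0-based). */
uint32_t hung_doorbell(const struct hung_result *r, size_t i)
{
	return r->entries[i];
}

/* Matching HQD info (type/queue/pipe) for queue i, k entries later. */
uint32_t hung_hqd_info(const struct hung_result *r, size_t i)
{
	return r->entries[r->k + i];
}
```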
Jonathan Kim	8745ca5efb	drm/amdgpu: fix initialization of doorbell array for detect and hang Doorbell array entries should be initialized to the invalid value rather than 0, to prevent the driver from over-counting hung doorbells, since it checks entries against the invalid value to begin with. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
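A short sketch of the over-counting: if the counter treats every entry that differs from the invalid sentinel as hung, a zero-filled array reports every slot. `DOORBELL_INVALID`, `doorbells_init()` and `count_hung()` are hypothetical; the driver defines its own invalid value:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; the real driver uses its own invalid value. */
#define DOORBELL_INVALID 0xffffffffu

void doorbells_init(uint32_t *db, size_t n)
{
	for (size_t i = 0; i < n; i++)
		db[i] = DOORBELL_INVALID; /* not 0: 0 is not the sentinel */
}

/* Count slots that hold a real (non-sentinel) doorbell. */
size_t count_hung(const uint32_t *db, size_t n)
{
	size_t hung = 0;

	for (size_t i = 0; i < n; i++)
		if (db[i] != DOORBELL_INVALID)
			hung++;
	return hung;
}
```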
Jonathan Kim	d0de79f66a	drm/amdgpu: fix gfx12 mes packet status return check GFX12 MES uses low 32 bits of status return for success (1 or 0) and high bits for debug information if low bits are 0. GFX11 MES doesn't do this so checking full 64-bit status return for 1 or 0 is still valid. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-10-13 14:14:16 -04:00
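The GFX12 encoding described above (success flag in the low 32 bits, debug information in the high bits when the low bits are 0) can be decoded as in this sketch. The helper names are hypothetical and this is not the driver's actual check; it only shows why a failure carrying debug bits is still nonzero as a full 64-bit value:

```c
#include <stdbool.h>
#include <stdint.h>

/* GFX12-style success flag: only the low 32 bits carry 1 (ok) or 0. */
bool mes_status_ok_gfx12(uint64_t status)
{
	return (uint32_t)status == 1;
}

/* Debug information carried in the high 32 bits on failure. */
uint32_t mes_status_debug_gfx12(uint64_t status)
{
	return (uint32_t)(status >> 32);
}
```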
Jesse.Zhang	883f309add	drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs, since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. amdgpu_cs.c: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately, skipping VRAM-specific logic that would trigger the NULL dereference. 2. amdgpu_kms.c: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. amdgpu_virt.c: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
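The guard pattern in this fix (check that the manager is in use before touching `man->bdev`) can be sketched in userspace. The `fake_*` structs and `vram_usage_or_zero()` are hypothetical, heavily simplified stand-ins for the TTM types, not the amdgpu code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for TTM structures. */
struct fake_bdev {
	uint64_t vram_used;
};

struct fake_manager {
	bool use_type;          /* true once the manager is set up */
	struct fake_bdev *bdev; /* stays NULL when there is no VRAM manager */
};

bool manager_used(const struct fake_manager *man)
{
	return man->use_type;
}

/* Only dereference man->bdev when the manager reports itself as used. */
uint64_t vram_usage_or_zero(const struct fake_manager *man)
{
	return manager_used(man) ? man->bdev->vram_used : 0;
}
```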
Christian König	33cc891b56	drm/amdgpu: hide VRAM sysfs attributes on GPUs without VRAM Otherwise accessing them can cause a crash. Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Sathishkumar S	74de0eaa00	drm/amdgpu: fix bit shift logic BIT_ULL(n) sets nth bit, remove explicit shift and set the position Fixes: `a7a411e246` ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
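`BIT_ULL(n)` already performs the shift, so its argument must be a bit position; passing a value that was shifted first selects the wrong bit. A sketch that reproduces the kernel macro's shape in userspace; `ring_enabled()` is a hypothetical helper, and the exact expression in the patch is not reproduced here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's BIT_ULL(): the argument is a bit position. */
#define BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical helper: is ring i enabled in a per-ring mask? (i < 64) */
bool ring_enabled(uint64_t mask, unsigned int i)
{
	return (mask & BIT_ULL(i)) != 0;
}
```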
Timur Kristóf	6917112af2	drm/amd/powerplay: Fix CIK shutdown temperature Remove extra multiplication. CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case the shutdown temperature is hardcoded in smu7_init_dpm_defaults and is already multiplied by 1000. The value was mistakenly multiplied another time by smu7_get_thermal_temperature_range. Fixes: `4ba082572a` ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Gui-Dong Han	6df8e84aa6	drm/amdgpu: use atomic functions with memory barriers for vm fault info The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: `b97dfa27ef` ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
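The kernel's `atomic_set_release()` and `atomic_read_acquire()` correspond to C11 release stores and acquire loads. This sketch shows the publish/consume pairing with `<stdatomic.h>`. The `fault_slot` and `fault_info` types and both functions are hypothetical stand-ins for `vm_fault_info`/`vm_fault_info_updated`, and a single-threaded test only checks the protocol, not the reordering guarantees:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical payload standing in for adev->gmc.vm_fault_info. */
struct fault_info {
	uint64_t addr;
	uint32_t status;
};

struct fault_slot {
	struct fault_info info;
	atomic_int updated; /* 0 = slot free, 1 = info valid */
};

/* Producer (e.g. interrupt handler): fill the payload, then publish. */
bool publish_fault(struct fault_slot *s, uint64_t addr, uint32_t status)
{
	/* Acquire pairs with the consumer's release of 0: its reads are done. */
	if (atomic_load_explicit(&s->updated, memory_order_acquire) == 1)
		return false; /* previous fault not consumed yet */
	s->info.addr = addr;
	s->info.status = status;
	/* Release: the payload writes cannot move after this store. */
	atomic_store_explicit(&s->updated, 1, memory_order_release);
	return true;
}

/* Consumer: check the flag first, then read the payload. */
bool consume_fault(struct fault_slot *s, struct fault_info *out)
{
	/* Acquire: the payload reads cannot move before this load. */
	if (atomic_load_explicit(&s->updated, memory_order_acquire) != 1)
		return false;
	*out = s->info;
	atomic_store_explicit(&s->updated, 0, memory_order_release);
	return true;
}
```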
Alex Deucher	ff780f4f80	drm/amdgpu: set an error on all fences from a bad context When we back up ring contents to reemit after a queue reset, we don't back up ring contents from the bad context. When we signal the fences, we should set an error on those fences as well. v2: misc cleanups v3: add locking for fence error, fix comment (Christian) v4: fix wrap around, locking (Christian) Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	1f22fcb88b	drm/amdgpu: handle wrap around in reemit handling Compare the sequence numbers directly. Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
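The commit only says the sequence numbers are now compared directly; it does not spell out the comparison. As general background, this is one common wrap-safe idiom for 32-bit sequence numbers, not the patch itself. It relies on two's-complement conversion to `int32_t`, which GCC and Clang provide and the kernel assumes, and on the two values being less than 2^31 apart:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a is later than b, tolerating wrap-around,
 * provided the two values are less than 2^31 apart. */
bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```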

118468 Commits