To avoid future confusion on the purpose and design of the CRn pinning code.
Also note that if the attacker controls page-tables, the CRn bits lose much of
the attraction anyway.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260320092521.GG3739106@noisy.programming.kicks-ass.net
Commit in Fixes added the FRED CR4 bit to the CR4 pinned bits mask so
that whenever something else modifies CR4, that bit remains set. Which
in itself is a perfectly fine idea.
However, there's an issue when during boot FRED is initialized: first on
the BSP and later on the APs. Thus, there's a window in time when
exceptions cannot be handled.
This becomes particularly nasty when running as SEV-{ES,SNP} or TDX
guests which, when they manage to trigger exceptions during that short
window described above, triple fault due to FRED MSRs not being set up
yet.
See Link tag below for a much more detailed explanation of the
situation.
So, as a result, the commit in that Link URL tried to address this
shortcoming by temporarily disabling CR4 pinning when an AP is not
online yet.
However, that is a problem in itself because in this case, an attack on
the kernel needs to only modify the online bit - a single bit in RW
memory - and then disable CR4 pinning and then disable SM*P, leading to
more and worse things to happen to the system.
So, instead, remove the FRED bit from the CR4 pinning mask, thus
obviating the need to temporarily disable CR4 pinning.
If someone manages to disable FRED when poking at CR4, then
idt_invalidate() would make sure the system would crash'n'burn on the
first exception triggered, which is a much better outcome security-wise.
Fixes: ff45746fbf ("x86/cpu: Add X86_CR4_FRED macro")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org> # 6.12+
Link: https://lore.kernel.org/r/177385987098.1647592.3381141860481415647.tip-bot2@tip-bot2
Move FSGSBASE enablement from identify_cpu() to cpu_init_exception_handling()
to ensure it is enabled before any exceptions can occur on both boot and
secondary CPUs.
== Background ==
Exception entry code (paranoid_entry()) uses ALTERNATIVE patching based on
X86_FEATURE_FSGSBASE to decide whether to use RDGSBASE/WRGSBASE instructions
or the slower RDMSR/SWAPGS sequence for saving/restoring GSBASE.
On boot CPU, ALTERNATIVE patching happens after enabling FSGSBASE in CR4.
When the feature is available, the code is permanently patched to use
RDGSBASE/WRGSBASE, which require CR4.FSGSBASE=1 to execute without triggering
== Boot Sequence ==
Boot CPU (with CR pinning enabled):
trap_init()
cpu_init() <- Uses unpatched code (RDMSR/SWAPGS)
x2apic_setup()
...
arch_cpu_finalize_init()
identify_boot_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) # Enables the feature
# This becomes part of cr4_pinned_bits
...
alternative_instructions() <- Patches code to use RDGSBASE/WRGSBASE
Secondary CPUs (with CR pinning enabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=1
set implicitly via cr4_pinned_bits
cpu_init() <- exceptions work because FSGSBASE is
already enabled
Secondary CPU (with CR pinning disabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=0
cpu_init()
x2apic_setup()
rdmsrq(MSR_IA32_APICBASE) <- Triggers #VC in SNP guests
exc_vmm_communication()
paranoid_entry() <- Uses RDGSBASE with CR4.FSGSBASE=0
(patched code)
...
ap_starting()
identify_secondary_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) <- Enables the feature, which is
too late
== CR Pinning ==
Currently, for secondary CPUs, CR4.FSGSBASE is set implicitly through
CR-pinning: the boot CPU sets it during identify_cpu(), it becomes part of
cr4_pinned_bits, and cr4_init() applies those pinned bits to secondary CPUs.
This works but creates an undocumented dependency between cr4_init() and the
pinning mechanism.
== Problem ==
Secondary CPUs boot after alternatives have been applied globally. They
execute already-patched paranoid_entry() code that uses RDGSBASE/WRGSBASE
instructions, which require CR4.FSGSBASE=1. Upcoming changes to CR pinning
behavior will break the implicit dependency, causing secondary CPUs to
generate #UD.
This issue manifests itself on AMD SEV-SNP guests, where the rdmsrq() in
x2apic_setup() triggers a #VC exception early during cpu_init(). The #VC
handler (exc_vmm_communication()) executes the patched paranoid_entry() path.
Without CR4.FSGSBASE enabled, RDGSBASE instructions trigger #UD.
== Fix ==
Enable FSGSBASE explicitly in cpu_init_exception_handling() before loading
exception handlers. This makes the dependency explicit and ensures both
boot and secondary CPUs have FSGSBASE enabled before paranoid_entry()
executes.
Fixes: c82965f9e5 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20260318075654.1792916-2-nikunj@amd.com
- Improve Qemu MCE-injection behavior by only using
AMD SMCA MSRs if the feature bit is set.
- Fix the relative path of gettimeofday.c inclusion
in vclock_gettime.c
- Fix a boot crash on UV clusters when a socket is
marked as 'deconfigured' which are mapped to the
SOCK_EMPTY (0xffff [short -1]) node ID by the UV firmware,
while Linux APIs expect NUMA_NO_NODE (0xffffffff [int -1]).
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmm/qYwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1imUBAAmL2xQCgOZCeLPWFZ28S5EiKf5xvyC9R4
uIm/WncXYQDJZN0JaTABdjmLiaBQnlyULpmqN47Xuiy8avMO532S92yreFpWR2OB
5TEE/v9+wcbSQOJBELhch3XNzUu7cNPQ+HbOuhot4rpK/MlyJ8rHaHLYVSVhRBYG
ynM3iTDvD6ENtyAOGT1Afkxg/sqMOZG8jdwrN1z8BH3AMU8BJ6OgfguqKuEi99Vp
Lz31a3bR0LcRPZ4ECKH0ModcKgjkqcgnVccOYCDtq8XReciGQo1Zg0diiV6VRn2a
hAqoSvdd9e1XUGoU8dEi0KwR5YXYJh3uOmjSH56rbceXr5XxPcKRJ4mOOX0GX+KC
KZl5KdwJo4veHodxWTYFe+67hi84BwKoB4gM36nRhDmzLS/XHoI4RUOjZVXSPVzJ
dwBBZDKd9tP6aaFruSa+v5sH7EbVartwLuEkQx6SDydS/4htGbESKTxgdqr63gUZ
jwmFRVaPo3FUWkOhS9GExt6XewhBVaJ9/cHZkcpW0vrm9RBlFoyTX9YLfkF4x9O1
h3wxXsULyZuYn5/tunIFfDsLpcmgsz85NXnW+0A1URAAWFNRYfVIKxaO7tUUF2OA
asnOe0NrztvKoZfKzCi7s14HVCPT34aEp3Vf2Am/QRMy5JYLhS3xAwApiZ/hrZ4R
Kb+4dVfh12Q=
=uIqH
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Improve Qemu MCE-injection behavior by only using AMD SMCA MSRs if
the feature bit is set
- Fix the relative path of gettimeofday.c inclusion in vclock_gettime.c
- Fix a boot crash on UV clusters when a socket is marked as
'deconfigured' which are mapped to the SOCK_EMPTY node ID by
the UV firmware, while Linux APIs expect NUMA_NO_NODE.
The difference being (0xffff [unsigned short ~0]) vs [int -1]
* tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Handle deconfigured sockets
x86/entry/vdso: Fix path of included gettimeofday.c
x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmm80s8THHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXuD2B/9lj5tU54YNxRkB7iLjicrX6RfYTTC2
xynQTHRQhAOEBF3woPR7BkTLVBvRyf10fplRP/nSxpAUAzdulz99rprBGlckEJji
VlBhvuWz3fQ5XWsAovWaghhcSw9juf9k2x1x2VudwSLKcsgtqft96JQb4JxnuH02
cNJLeprLuKRwafqF+Wmwj+o4uWEe78Sa7yDg6vai10+e2Q4sE+QrvyXqEJGFDdjj
W4+aQmeAElvYTxKRGMrUsEaraif/BXNRmihCD96b9v8/w0BIZjPIVGvKE0NA4BKS
rkPxs8KfD2Gh7dSvBvvnptd1+36XeyXsW+nFmndumwxqY2NKHuuA9fAt
=Cqrd
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed-20260319' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:
- Fix ARM64 MSHV support (Anirudh Rayabharam)
- Fix MSHV driver memory handling issues (Stanislav Kinsburskii)
- Update maintainers for Hyper-V DRM driver (Saurabh Sengar)
- Misc clean up in MSHV crashdump code (Ard Biesheuvel, Uros Bizjak)
- Minor improvements to MSHV code (Mukesh R, Wei Liu)
- Revert not yet released MSHV scrub partition hypercall (Wei Liu)
* tag 'hyperv-fixes-signed-20260319' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: Fix error handling in mshv_region_pin
MAINTAINERS: Update maintainers for Hyper-V DRM driver
mshv: Fix use-after-free in mshv_map_user_memory error path
mshv: pass struct mshv_user_mem_region by reference
x86/hyperv: Use any general-purpose register when saving %cr2 and %cr8
x86/hyperv: Use current_stack_pointer to avoid asm() in hv_hvcrash_ctxt_save()
x86/hyperv: Save segment registers directly to memory in hv_hvcrash_ctxt_save()
x86/hyperv: Use __naked attribute to fix stackless C function
Revert "mshv: expose the scrub partition hypercall"
mshv: add arm64 support for doorbell & intercept SINTs
mshv: refactor synic init and cleanup
x86/hyperv: print out reserved vectors in hexadecimal
People do effort to inject MCEs into guests in order to simulate/test
handling of hardware errors. The real use case behind it is testing the
handling of SIGBUS which the memory failure code sends to the process.
If that process is QEMU, instead of killing the whole guest, the MCE can
be injected into the guest kernel so that latter can attempt proper
handling and kill the user *process* in the guest, instead, which
caused the MCE. The assumption being here that the whole injection flow
can supply enough information that the guest kernel can pinpoint the
right process. But that's a different topic...
Regardless of virtualization or not, access to SMCA-specific registers
like MCA_DESTAT should only be done after having checked the smca
feature bit. And there are AMD machines like Bulldozer (the one before
Zen1) which do support deferred errors but are not SMCA machines.
Therefore, properly check the feature bit before accessing related MSRs.
[ bp: Rewrite commit message. ]
Fixes: 7cb735d7c0 ("x86/mce: Unify AMD DFR handler with MCA Polling")
Signed-off-by: William Roche <william.roche@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com
Now that the x86 topology code has a sensible nodes-per-package
measure, that does not depend on the online status of CPUs, use this
to divinate the SNC mode.
Note that when Cluster on Die (CoD) is configured on older systems this
will also show multiple NUMA nodes per package. Intel Resource Director
Technology is incomaptible with CoD. Print a warning and do not use the
fixup MSR_RMID_SNC_CONFIG.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Link: https://patch.msgid.link/aaCxbbgjL6OZ6VMd@agluck-desk3
Link: https://patch.msgid.link/20260303110100.367976706@infradead.org
Use the MADT and SRAT table data to compute __num_nodes_per_package.
Specifically, SRAT has already been parsed in x86_numa_init(), which is called
before acpi_boot_init() which parses MADT. So both are available in
topology_init_possible_cpus().
This number is useful to divinate the various Intel CoD/SNC and AMD NPS modes,
since the platforms are failing to provide this otherwise.
Doing it this way is independent of the number of online CPUs and
other such shenanigans.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Kyle Meyer <kyle.meyer@hpe.com>
Link: https://patch.msgid.link/20260303110100.004091624@infradead.org
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmmWuQwTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXnnHB/41Jji+y8FHe2SqpQhUOqHb6NDEr3GX
YpAybhz2IsBHVhbCQn789UiIcSr0UDR7wnVLAmXe+5eY/jRwNggIO3tFqLYn92pK
KSTNafgNbLxh3iKBxRsUy0b3JutjD2LytkpFj2KVbBsZfmRxCZmKIV/4V18rV+fA
uemvoqLwU7emEWkhZ24suHMHPVpv6xKs9O6gOrQ4+zXR0g//eMLDqb17uj8h+8sM
ZsPsMYeuOihXlvGeBRjbnWYjA1ODWGDvwR9VT+VU4+HWht/KSr15EGeXZdV2eZUt
e/8swbqOS94a2ZjOgStzVkcPqAF88t9zZ+gvYElTDzLlHjqbrZdpeDDt
=A7tT
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Wei Liu:
- Debugfs support for MSHV statistics (Nuno Das Neves)
- Support for the integrated scheduler (Stanislav Kinsburskii)
- Various fixes for MSHV memory management and hypervisor status
handling (Stanislav Kinsburskii)
- Expose more capabilities and flags for MSHV partition management
(Anatol Belski, Muminul Islam, Magnus Kulke)
- Miscellaneous fixes to improve code quality and stability (Carlos
López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh
Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros
Bizjak)
- PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka)
* tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits)
mshv: Handle insufficient root memory hypervisor statuses
mshv: Handle insufficient contiguous memory hypervisor status
mshv: Introduce hv_deposit_memory helper functions
mshv: Introduce hv_result_needs_memory() helper function
mshv: Add SMT_ENABLED_GUEST partition creation flag
mshv: Add nested virtualization creation flag
Drivers: hv: vmbus: Simplify allocation of vmbus_evt
mshv: expose the scrub partition hypercall
mshv: Add support for integrated scheduler
mshv: Use try_cmpxchg() instead of cmpxchg()
x86/hyperv: Fix error pointer dereference
x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV
Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
mshv: fix SRCU protection in irqfd resampler ack handler
mshv: make field names descriptive in a header struct
x86/hyperv: Update comment in hyperv_cleanup()
mshv: clear eventfd counter on irqfd shutdown
x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
...
Total patches: 36
Reviews/patch: 1.77
Reviewed rate: 83%
- The 2 patch series "mm/vmscan: fix demotion targets checks in
reclaim/demotion" from Bing Jiao fixes a couple of issues in the
demotion code - pages were failed demotion and were finding themselves
demoted into disallowed nodes.
- The 11 patch series "Remove XA_ZERO from error recovery of dup_mmap()"
from Liam Howlett fixes a rare mapledtree race and performs a number of
cleanups.
- The 13 patch series "mm: add bitmap VMA flag helpers and convert all
mmap_prepare to use them" from Lorenzo Stoakes implements a lot of
cleanups following on from the conversion of the VMA flags into a
bitmap.
- The 5 patch series "support batch checking of references and unmapping
for large folios" from Baolin Wang implements batching to greatly
improve the performance of reclaiming clean file-backed large folios.
- The 3 patch series "selftests/mm: add memory failure selftests" from
Miaohe Lin does as claimed.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaZaIEQAKCRDdBJ7gKXxA
jj73AQCQDwLoipDiQRGyjB5BDYydymWuDoiB1tlDPHfYAP3b/QD/UQtVlOEXqwM3
naOKs3NQ1pwnfhDaQMirGw2eAnJ1SQY=
=6Iif
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failed demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
mapledtree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
MSVC compiler, used to compile the Microsoft Hypervisor, currently
has an assert intrinsic that uses interrupt vector 0x29 to create an
exception. This will cause hypervisor to then crash and collect core. As
such, if this interrupt number is assigned to a device by Linux and the
device generates it, hypervisor will crash. There are two other such
vectors hard coded in the hypervisor, 0x2C and 0x2D for debug purposes.
Fortunately, the three vectors are part of the kernel driver space and
that makes it feasible to reserve them early so they are not assigned
later.
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
In order to be able to use only vma_flags_t in vm_area_desc we must adjust
shmem file setup functions to operate in terms of vma_flags_t rather than
vm_flags_t.
This patch makes this change and updates all callers to use the new
functions.
No functional changes intended.
[akpm@linux-foundation.org: comment fixes, per Baolin]
Link: https://lkml.kernel.org/r/736febd280eb484d79cef5cf55b8a6f79ad832d2.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmmKGBcACgkQaDWVMHDJ
krDRRg/6AjkL6Hcx+3jC7X2VWOY6UK8giMdyDOOBtA82/TLmch4BL3Zcf/eJky6j
OAVd6E22eWNNIRGJqhhSWY/0IZwnWj+VP57S1pFDeS8NVU1kTau2li5xT2tBQOm5
b+zztHZXsgYrAFTUOyVKbNncUlD67YIXl9t2KzS4obRa2HLOrkJotZmni86bGlyF
oBHL14cgCpXdHVH1RawCNxXa3LJSlk39+kgoM4aeJdxlb0WCB2n23ek+QIQ75TPY
mBzmF3Yxt7nri/2Lhf0cRV5Z9JQuk+6kIjbxEmFJ9nyg34CDzYt8nJiT8AlctOe0
Os3+hstgCKcvXKDK5N0UUnG/GvVlZp515fClaNeYIYraZk86WWEUA8tBGEiwghuX
SUL6Xm2OpRUClfk5nW44B01/4NIUKm78Ch6Q3QkMGRfDfWXwXQyQkjX8jiBs7GAQ
xJvN8oqj4QSiMSC9OWZnZwDxMqrnyfbEl69Iibf6ojQhf4rl6ADwzOb6U7V0XGXa
y/VJik7l7EG+UK3t3ACzWLfJ/CwvwL5D/B+O8HEpVn6vXDOFJOg/jtXovrf0LmjL
ynWttGLkXBm6rU4H02jdaHIRwu7OvTj/lNfpiIupvHFt0uKIvwng3pIk8KookQHi
E05nwLtrvvLTNQ04hXHOwxGMLL6DkVaZyOWUN3pJGC92/HEXRRU=
=Izzg
-----END PGP SIGNATURE-----
Merge tag 'x86_entry_for_7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry code updates from Dave Hansen:
"This is entirely composed of a set of long overdue VDSO cleanups. They
makes the VDSO build much more logical and zap quite a bit of old
cruft.
It also results in a coveted net-code-removal diffstat"
* tag 'x86_entry_for_7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/vdso: Add vdso2c to .gitignore
x86/entry/vdso32: Omit '.cfi_offset eflags' for LLVM < 16
MAINTAINERS: Adjust vdso file entry in INTEL SGX
x86/entry/vdso/selftest: Update location of vgetrandom-chacha.S
x86/entry/vdso: Fix filtering of vdso compiler flags
x86/entry/vdso: Update the object paths for "make vdso_install"
x86/entry/vdso32: When using int $0x80, use it directly
x86/cpufeature: Replace X86_FEATURE_SYSENTER32 with X86_FEATURE_SYSFAST32
x86/vdso: Abstract out vdso system call internals
x86/entry/vdso: Include GNU_PROPERTY and GNU_STACK PHDRs
x86/entry/vdso32: Remove open-coded DWARF in sigreturn.S
x86/entry/vdso32: Remove SYSCALL_ENTER_KERNEL macro in sigreturn.S
x86/entry/vdso32: Don't rely on int80_landing_pad for adjusting ip
x86/entry/vdso: Refactor the vdso build
x86/entry/vdso: Move vdso2c to arch/x86/tools
x86/entry/vdso: Rename vdso_image_* to vdso*_image
clock interface, taming the include hell by splitting the pv_ops structure
and removing of a bunch of obsolete code. Work by Juergen Gross.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmLKHAACgkQEsHwGGHe
VUrURg//Ucf+3EAIkLCmFkH0WwYmQl2JjRYww8bPAw3iJMIVxy4dMnaBbsUiAtUp
kYza+pgEtvyAwwd8RIEs85c9VhZn0DKoaWV8goBH3zFH6YvIRiLwb0w2QvjkF+70
FNU+4zlvt/I3FD+tWNElAgVtkFL3Gmzm44qyLLsPtlYaJ71xFl2XB7V+TlqXMHzE
m8BMenP9/CrbTlBBdNJGzAkAbWi1uAP+IydvuFNolH/F2lqVM2z5Ta3gUWWCIk/q
jWrPLDZCHr2WlBZNUGamKVVH9NEh+7YNwBAGUrSNYGZFoaFjqeX6lN3djzS+wXIj
0nDoW35jN0QNKz239MdXZDf1mfpb6ZQd/iOhFjo4dAvbm+J8WPAMr98ac8wR3Dyb
2LF/BxkoKWRabxQApXSCrLPXEuqT6Qc1+lDA0bNHg51zBoqP5vRNVZRwArnzGB+O
LxDKx+o4VYOf+UCaB6oQHjylbSgFvIedZ9p822hBe3QG9act8indRE8LWip7Utld
peoJGgvlQ0xtClh6FjVHpvmVfAvk7Zki5ywj2GwmB/TZ0yywuGStAjE3UqY168/M
gb7MSajh+HHZNj1/2+b/se4CUYlAgIPDQ+SwHJPm5TqyopvnOVi/2XWmjbx8I5jT
jS0nxaxD+SbESSZ6IMAsppnAAxAYbvRHGIS+6mtNCXVkaV1pMbA=
=AeFt
-----END PGP SIGNATURE-----
Merge tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- A nice cleanup to the paravirt code containing a unification of the
paravirt clock interface, taming the include hell by splitting the
pv_ops structure and removing of a bunch of obsolete code (Juergen
Gross)
* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
x86/paravirt: Remove trailing semicolons from alternative asm templates
x86/pvlocks: Move paravirt spinlock functions into own header
x86/paravirt: Specify pv_ops array in paravirt macros
x86/paravirt: Allow pv-calls outside paravirt.h
objtool: Allow multiple pv_ops arrays
x86/xen: Drop xen_mmu_ops
x86/xen: Drop xen_cpu_ops
x86/xen: Drop xen_irq_ops
x86/paravirt: Move pv_native_*() prototypes to paravirt.c
x86/paravirt: Introduce new paravirt-base.h header
x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
x86/paravirt: Use common code for paravirt_steal_clock()
riscv/paravirt: Use common code for paravirt_steal_clock()
loongarch/paravirt: Use common code for paravirt_steal_clock()
arm64/paravirt: Use common code for paravirt_steal_clock()
arm/paravirt: Use common code for paravirt_steal_clock()
sched: Move clock related paravirt code to kernel/sched
paravirt: Remove asm/paravirt_api_clock.h
x86/paravirt: Move thunk macros to paravirt_types.h
...
used in a guest only until now), extend it to be able to do that too.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmLIG4ACgkQEsHwGGHe
VUpN+A//duBFWDIFwftTOuVZW55O1JVLzkCKX4U7F9IABGLcogWryBuJ2Ii5EHJR
wWDuqEjzH0EJMQxCfTbbW8WimLpUY9KANLpDA2GoF1hkB3aYkjdPzQRBNV70AyDN
KwOSXWTriQXCU+09+EWrjPMeswrLOyf0JuDDNsdbgyjUtI/sgdI6oKflpX9arih3
N5PTnMtePS8Z6zp7ORrtZpZMIOuTGUa/hw2LPykcTGvMSjt1/6uDuoDIcRGEd8nH
EIkkcOVp8Z62c/gHI3GOffFVk/R/q9BDbJpm8Cgx0XxPdCDRrmLuRsNZ6ByDExW6
4Bk8cZ5H1ArzdtpZ5LjbpwAClrdI3ZpaDZhq7CvKZAix/W5Wl/DbREEPRs5xn6AB
tFzQ0Y9p9ExJZFvsIQiQ4g8BomFvyv3U0/GboK5HYFSz9VzaVBGQ0Tm2Vo2LZm8P
5XZEVhcxKSr598s3ESXNAX1Q6vniQhX/ofkZkAvx7YEYeL9Igdam0bf2pzYh8Nzb
0mEoWLp4MMU5/FLk/3lIxWMW3+iFeIHdrb0hFkdYKD4zyPE+LI3OimqMHH6tKtJt
H43bw+LWLZxE09d2kGqYYS5p1Gd9WcUyGc+dwpKO4QYyyffh7TBhOj0CeYKH3aq8
sDSYKclUaZ6pBxWqBZURaJsJTj3lIhKPzpNGzYM24/c9LFzGlD8=
=BK7I
-----END PGP SIGNATURE-----
Merge tag 'x86_microcode_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader update from Borislav Petkov:
- Since debugging the microcode loader makes sense on baremetal too (it
was used in a guest only until now), extend it to be able to do that
too
* tag 'x86_microcode_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Allow loader debugging to be enabled on baremetal too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmLFI0ACgkQEsHwGGHe
VUpuwQ/9Gm7faW+oz0yNAm0q/GQw/hVPAPQgAN5UmpbsaW0CYI6EzB/b8ZWO4HXy
zbsKntVoEEV54KBQ9a4gouWzP6id9//NAzElY/ZBo6brNWSRXbN/7M28RczArIzE
Nm5MrHTbEU552HRQ29x5wEl3CzIANzWuij/Xt+lNBt5kXNHgRcOZ35icw692d1va
+7eNgtP/AOGjFpBX/OuBK2HvTxA02tpZPRLSgursWq8rNFIpN5quOmCbNWPJKH4I
+5tDXv3yfMiJ4BPCMLoQWYm6w1avwJI72aCM0V066Q+ZGi5XHfcg5GeOq3NGdE5c
/D4eoNIRqLrxC9V93d8CJV4BnmPuYsTra9B8HsWp50ip2jTVwborAoONkxLBat0r
hoFZ+3IrXU6QEEkVFDeJWv5gb7FK/krJV5y04tQV/byD03QP5HEJeVcewsBUluN3
OdTbin/GQbJKc+kwYlV9iBHeY93zmlHiHNWLfTfkU2Urp6oWQpYk4srhI12CdTFy
upQJ8o6atnONqHWiyIipCNxHHmAikaOiRUpXAgnqTTk8jZZOMs8sU/X1/85fQfg2
cM72/Qb6iGGdyB33AkmmG6aIvmsmoLhMnz2HV7S8E+L8A7EO3/nBkuLpLqH/KNwi
gNo323xqaTfFm9fi13Up2/epPApIicxEjZWD0eTbNA31ft7sMzM=
=oa27
-----END PGP SIGNATURE-----
Merge tag 'x86_cleanups_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- The usual set of cleanups and simplifications all over the tree
* tag 'x86_cleanups_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/segment: Use MOVL when reading segment registers
selftests/x86: Clean up sysret_rip coding style
x86/mm: Hide mm_free_global_asid() definition under CONFIG_BROADCAST_TLB_FLUSH
x86/crash: Use set_memory_p() instead of __set_memory_prot()
x86/CPU/AMD: Simplify the spectral chicken fix
x86/platform/olpc: Replace strcpy() with strscpy() in xo15_sci_add()
x86/split_lock: Remove dead string when split_lock_detect=fatal
practical usage of this is being able to tell how much energy or how much
work can be attributed to a group of tasks tracked under a single
idenitifier. Prepend this work with proper refactoring of resctrl domains
handling code. Work by Tony Luck
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmKaKcACgkQEsHwGGHe
VUrzTxAAkQP0r5DdcNZA6SjnaHMjjjxV/+BlxhVDJV1EOT2/luRDelPQwm+GwaHz
Rk02cpUvLUmGQ3/vD/VLmH2Oar0gkKfzEgSJU/OAr7lDpeA3eN3BRQhyLL99fFFQ
XaH5JaXjweer8er2+ultyux+0yXmwA2Albeh2IVNR6heGjJNIG4/p9YB6z9aCS1b
B9Freb548ISq2MPqzczu5+Ku7N4nsA5TEL1wE+ndMz0NwdSvvhz8LWX3H/5cfgtM
Vf3thqgtIMZq8EiS29tDtE6EeveMMXC5XGpQ4Ts6GLgV6a3L/0Ppc2VfUQ1AsC8u
tkRIrXN9gxqcqTzI/GQ/b2QgcqH6/qy/wHowLDOjf9j8GJbRYtDda1rPo7imnoKL
6Ljn0E6qbstatBz6QBojYJB5fzC3wj0VMdG3jI2ZxaH1+iVwkb4D0fW2EScwcV2X
C6GTg+VsEDUvhm/V/1gyNSZJHPibHpNkmD61M+J9rn3COlzslRF8p7sEUogjfWxu
RXan9aJDocQA7/PDw7uYLlXZYQsyzwZbs3yv0UA8dU2vkNcNPQradlj305Fn2FxI
D8oaFCx/AB0iPI8W6wZPdLR2gbCmwS9cNgt2PQAdSFCy3z4ceAkRYgLn64Sv4SJ7
gQVJXG+V2NZuKhFi7DhSIeffDqtMT6Agia3fWW/p3NIns4FyyXo=
=WMoi
-----END PGP SIGNATURE-----
Merge tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Extend the resctrl machinery to support telemetry monitoring on
Intel (Tony Luck)
The practical usage of this is being able to tell how much energy or
how much work can be attributed to a group of tasks tracked under a
single idenitifier. Prepend this work with proper refactoring of
resctrl domains handling code.
* tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86,fs/resctrl: Update documentation for telemetry events
x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
fs/resctrl: Move RMID initialization to first mount
x86,fs/resctrl: Compute number of RMIDs as minimum across resources
fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
x86/resctrl: Add energy/perf choices to rdt boot option
x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
fs/resctrl: Refactor mkdir_mondata_subdir()
x86/resctrl: Read telemetry events
x86/resctrl: Find and enable usable telemetry events
x86,fs/resctrl: Add architectural event pointer
x86,fs/resctrl: Fill in details of events for performance and energy GUIDs
x86/resctrl: Discover hardware telemetry events
fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains
x86,fs/resctrl: Add and initialize a resource for package scope monitoring
x86,fs/resctrl: Add an architectural hook called for first mount
x86,fs/resctrl: Support binary fixed point event counters
x86,fs/resctrl: Handle events that can be read from any CPU
...
- amd: Correct the microcode table for Zenbleed
- amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[]
- Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix
(Andrew Cooper)
- tsx: Set default TSX mode to auto (Nikolay Borisov)
- Drop unused Kconfig symbol X86_P6_NOP (Randy Dunlap)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJlU0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hVaBAArtXnP3nah0R6UphYGbFew9G5cLHXwhgZ
bUs9j6NbeJhEZBiXQON0Jfy2ucwePMOB+75+UlIm6VnKMHhE+ayB6H4zihSWirgb
Q0CzjwlT+mZ9hRxnG9o3QcbwbSmnTAwyCAcQuEpTYYKW5vqiMaKoXt8iMIwbWNcT
0hX3pLzpfbOjrXgyjGzRk5nj9JZvpL3rdmOi+/7tsDoKgsfbyOKudP/PV5OuZ2mC
9CyQCW00nAUN5+Lq8zZFBkbJS71mI78wdE3uC3NgUENhOFLccQERxdPUyVdx9K38
dXyaRfc61OOxG+RDQH5etGSexKgXrbaE22hasMVCypBP0PP66GeLJZFArOS5Pa2s
39r9NaCALyZZ76gyxceSJ1B1pOsX8MPgC2+WVW2HxZBeOxcUOEcsTRdCthZkb2w9
kg+Fa9ITMzG01yqCT6WtIfADAPxa8FpVk/q4qBTCG/WkPrq9APSKLsj6NLfHM6MM
ZesXoBULiSOED3oypglGbItsIPyWHpIumpObWdTR/U3Z1K/xCRY1qAgfjMHkajrm
8brSnEDYPOdFpvDNCaJnwrnWJwu1YY3qcuVvVvJPzdWNUff0hpIXnn63S+FegVfX
J5velNq+GlHvvh2yZgV8jUVpeqqckIP4/SS3z0sH+QcXXDoJWgyQqwbhx1M7a61m
1UDI1V8K0vo=
=k5rk
-----END PGP SIGNATURE-----
Merge tag 'x86-cpu-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
- CPU model updates (Andrew Cooper):
- amd: Correct the microcode table for Zenbleed
- amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[]
- Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix
- tsx: Set default TSX mode to auto (Nikolay Borisov)
- Drop unused Kconfig symbol X86_P6_NOP (Randy Dunlap)
* tag 'x86-cpu-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Set default TSX mode to auto
x86/cpu: Drop unused Kconfig symbol X86_P6_NOP
x86/cpu: Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix
x86/cpu/amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[]
x86/cpu/amd: Correct the microcode table for Zenbleed
- x86/acpi: Add acpi=spcr to use SPCR-provided default console
(Shenghao Yang)
- x86/acpi/boot: Correct the acpi_is_processor_usable() check again
(Yazen Ghannam)
- Refresh the x86 memory map (e820 table) handling code, and
make the printouts a bit more informative. (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJk1wRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gcvA/9E00IuhZS6RxIPhqPFVhUvYlOEMnBBfnb
Sv0de0etGb2Q5GRL5j+XqMbTE2mns1we8Jh7dyY9N4K1eRlhXJZQWEFjqsFHX5/A
VxSc3/QrfzzpAxYDAg0hOSjMeJv3W4pdLjfwDpNLa/EZ+KA5tVcwu6ufd9ZFXO3a
DWlk0H3JbRuiObFlKqSpccaf9/taRz9LZ9DNqTT0K9zDN4yukWKYXxjhwxASLZ2C
i5cAzgJqvxiK4xtx5WbCs5b0swLzDi/4V7L1D9ojhckoCkrlLUjo7eAvSiS2kYik
NcA7LHDehMmP6mKOQUq9YJhR/P9FfmeItd0VgdAmVTuMdsl0LQDCl5v8XkWoBbXG
yYabeJl3eBrd5OVqsMK/vrnRkxMuBnTHdysyUpJrtCXHrMbQnpxerMIgqwhDH5rx
Z3zN4MRDf2buDJZDQT53ItKjaCGuUAEu869vo12eWhbA+OeL83G6+cs8kPUXquIq
p3NfPTLWj3U7qcwN35x8WTT1PWBrOtiUdjm27lLFUFsrY4xqXrgcHoXkoAp3tqQX
eVoxjROVGsrmFdzVr6c9D/1ZyVDwsetKQdixHrLWiAlm/S0lwDYee180uB+ocYLH
ML4QD7rvb6npAHcxUtvftpDEBO7TRnxZWlSgS70ZOYUnnVxLBhFrhygLMJ1giEew
aAodneB2Os8=
=brCM
-----END PGP SIGNATURE-----
Merge tag 'x86-boot-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/boot updates from Ingo Molnar:
- x86/acpi: Add acpi=spcr to use SPCR-provided default console
(Shenghao Yang)
- x86/acpi/boot: Correct the acpi_is_processor_usable() check again
(Yazen Ghannam)
- Refresh the x86 memory map (e820 table) handling code, and make the
printouts a bit more informative (Ingo Molnar)
* tag 'x86-boot-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86/acpi: Add acpi=spcr to use SPCR-provided default console
x86/boot/e820: Use <linux/sizes.h> symbols for literals
x86/boot/e820: Make sure e820_search_gap() finds all gaps
x86/boot/e820: Simplify the e820__range_remove() API
x86/boot/e820: Remove e820__range_remove()'s unused return parameter
x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables
x86/boot/e820: Standardize __init/__initdata tag placement
x86/boot/e820: Simplify & clarify __e820__range_add() a bit
x86/boot/e820: Rename gap_start/gap_size to max_gap_start/max_gap_start in e820_search_gap() et al
x86/boot/e820: Change e820_search_gap() to search for the highest-address PCI gap
x86/boot/e820: Clean up e820__setup_pci_gap()/e820_search_gap() a bit
x86/boot/e820: Change struct e820_table::nr_entries type from __u32 to u32
x86/boot/e820: Standardize e820 table index variable types under 'u32'
x86/boot/e820: Standardize e820 table index variable names under 'idx'
x86/boot/e820: Remove unnecessary header inclusions
x86/boot/e820: Clean up __refdata use a bit
x86/boot/e820: Clean up __e820__range_add() a bit
x86/boot/e820: Improve e820_print_type() messages
x86/boot/e820: Clean up confusing and self-contradictory verbiage around E820 related resource allocations
x86/boot/e820: Remove pointless early_panic() indirection
...
-----BEGIN PGP SIGNATURE-----
iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmltb/UeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGb3QH/1SeMiCPW/jZTLdr
J5ljJ5eldQXcE3lCDGRKF2s4BNuYrWTQnBh1pjcS1OSH6d5KLjmULf5/BznGHbWY
ZFjH1LL4A6HAWpg6BIPU9pPq6mZzAlu7ESxjXIFOavhuOA22OqF9GfoFxh1qF2er
ew8G4UqalGyD311jqfGjWAIHtMss5//JSc9RZ04SjeuI+8oUlBFKFiZgnQyaAKzv
ntBvQDdavzW1jG7KuWlCvN+0feApXoRP94va3PsZ9UdBqIxY39KTLtyluDE/Cu66
5D4lNaagmZUYPw3YSAsS+J/7Rr28ARp6J5ebzoupc7HfilofBkxWBea8/LaSrGTH
nBiGQac=
=IOwO
-----END PGP SIGNATURE-----
Merge tag 'v6.19-rc6' into tip-x86-cleanups
Pick up upstream work and
d9b40d7262 ("selftests/x86: Add selftests include path for kselftest.h after centralization")
especially which is a build fix needed for a selftests cleanup coming
ontop of this.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
In most cases, the use of "fast 32-bit system call" depends either on
X86_FEATURE_SEP or X86_FEATURE_SYSENTER32 || X86_FEATURE_SYSCALL32.
However, nearly all the logic for both is identical.
Define X86_FEATURE_SYSFAST32 which indicates that *either* SYSENTER32 or
SYSCALL32 should be used, for either 32- or 64-bit kernels. This
defaults to SYSENTER; use SYSCALL if the SYSCALL32 bit is also set.
As this removes ALL existing uses of X86_FEATURE_SYSENTER32, which is
a kernel-only synthetic feature bit, simply remove it and replace it
with X86_FEATURE_SYSFAST32.
This leaves an unused alternative for a true 32-bit kernel, but that
should really not matter in any way.
The clearing of X86_FEATURE_SYSCALL32 can be removed once the patches
for automatically clearing disabled features has been merged.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251216212606.1325678-10-hpa@zytor.com
The memory bandwidth calculation relies on reading the hardware counter
and measuring the delta between samples. To ensure accurate measurement,
the software reads the counter frequently enough to prevent it from
rolling over twice between reads.
The default Memory Bandwidth Monitoring (MBM) counter width is 24 bits.
Hygon CPUs provide a 32-bit width counter, but they do not support the
MBM capability CPUID leaf (0xF.[ECX=1]:EAX) to report the width offset
(from 24 bits).
Consequently, the kernel falls back to the 24-bit default counter width,
which causes incorrect overflow handling on Hygon CPUs.
Fix this by explicitly setting the counter width offset to 8 bits (resulting
in a 32-bit total counter width) for Hygon CPUs.
Fixes: d8df126349 ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251209062650.1536952-3-shenxiaochen@open-hieco.net
Hygon CPUs supporting Platform QoS features currently undergo partial resctrl
initialization through resctrl_cpu_detect() in the Hygon BSP init helper and
AMD/Hygon common initialization code. However, several critical data
structures remain uninitialized for Hygon CPUs in the following paths:
- get_mem_config()-> __rdt_get_mem_config_amd():
rdt_resource::membw,alloc_capable
hw_res::num_closid
- rdt_init_res_defs()->rdt_init_res_defs_amd():
rdt_resource::cache
hw_res::msr_base,msr_update
Add the missing AMD/Hygon common initialization to ensure proper Platform QoS
functionality on Hygon CPUs.
Fixes: d8df126349 ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251209062650.1536952-2-shenxiaochen@open-hieco.net
Paravirt clock related functions are available in multiple archs.
In order to share the common parts, move the common static keys
to kernel/sched/ and remove them from the arch specific files.
Make a common paravirt_steal_clock() implementation available in
kernel/sched/cputime.c, guarding it with a new config option
CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch
in case it wants to use that common variant.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
In some places asm/paravirt.h is included without really being needed.
Remove the related #include statements.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-2-jgross@suse.com
Since telemetry events are enumerated on resctrl mount the RDT_RESOURCE_PERF_PKG
resource is not considered "monitoring capable" during early resctrl initialization.
This means that the domain list for RDT_RESOURCE_PERF_PKG is not built when the CPU
hotplug notifiers are registered and run for the first time right after resctrl
initialization.
Mark the RDT_RESOURCE_PERF_PKG as "monitoring capable" upon successful telemetry
event enumeration to ensure future CPU hotplug events include this resource and
initialize its domain list for CPUs that are already online.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
resctrl assumes that only the L3 resource supports monitor events, so it
simply takes the rdt_resource::num_rmid from RDT_RESOURCE_L3 as the system's
number of RMIDs.
The addition of telemetry events in a different resource breaks that
assumption.
Compute the number of available RMIDs as the minimum value across all
mon_capable resources (analogous to how the number of CLOSIDs is computed
across alloc_capable resources).
Note that mount time enumeration of the telemetry resource means that
this number can be reduced. If this happens, then some memory will
be wasted as the allocations for rdt_l3_mon_domain::mbm_states[] and
rdt_l3_mon_domain::rmid_busy_llc created during resctrl initialization will
be larger than needed.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
There are now three meanings for "number of RMIDs":
1) The number for legacy features enumerated by CPUID leaf 0xF. This is the
maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC.
Note that systems with Sub-NUMA Cluster mode enabled will force scaling down
the CPUID enumerated value by the number of SNC nodes per L3-cache.
2) The number of registers in MMIO space for each event. This is enumerated in
the XML files and is the value initialized into event_group::num_rmid.
3) The number of "hardware counters" (this isn't a strictly accurate
description of how things work, but serves as a useful analogy that does
describe the limitations) feeding to those MMIO registers. This is enumerated
in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature().
Event groups with insufficient "hardware counters" to track all RMIDs are
difficult for users to use, since the system may reassign "hardware counters"
at any time. This means that users cannot reliably collect two consecutive
event counts to compute the rate at which events are occurring.
Disable such event groups by default. The user may override this with
a command line "rdt=" option. In this case limit an under-resourced event
group's number of possible monitor resource groups to the lowest number of
"hardware counters".
Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource
"num_rmid" value to the smallest of these values as this value will be used
later to compare against the number of RMIDs supported by other resources to
determine how many monitoring resource groups are supported.
N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and the
type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid) won't
complain about mixing signed and unsigned types.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Legacy resctrl features are enumerated by X86_FEATURE_* flags. These may be
overridden by quirks to disable features in the case of errata. Users can use
kernel command line options to either disable a feature, or to force enable
a feature that was disabled by a quirk.
A different approach is needed for hardware features that do not have an
X86_FEATURE_* flag.
Update parsing of the "rdt=" boot parameter to call the telemetry driver
directly to handle new "perf" and "energy" options that controls activation of
telemetry monitoring of the named type. By itself a "perf" or "energy" option
controls the forced enabling or disabling (with ! prefix) of all event groups
of the named type. A ":guid" suffix allows for fine grained control per event
group.
[ bp: s/intel_aet_option/intel_handle_aet_option/g ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
The L3 resource has several requirements for domains. There are per-domain
structures that hold the 64-bit values of counters, and elements to keep
track of the overflow and limbo threads.
None of these are needed for the PERF_PKG resource. The hardware counters
are wide enough that they do not wrap around for decades.
Define a new rdt_perf_pkg_mon_domain structure which just consists of the
standard rdt_domain_hdr to keep track of domain id and CPU mask.
Update resctrl_online_mon_domain() for RDT_RESOURCE_PERF_PKG. The only action
needed for this resource is to create and populate domain directories if a
domain is added while resctrl is mounted.
Similarly resctrl_offline_mon_domain() only needs to remove domain directories.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Introduce intel_aet_read_event() to read telemetry events for resource
RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each
package, so scan all of them and add up all counters. Aggregators may return
an invalid data indication if they have received no records for a given RMID.
The user will see "Unavailable" if none of the aggregators on a package
provide valid counts.
Resctrl now uses readq() so depends on X86_64. Update Kconfig.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Every event group has a private copy of the data of all telemetry event
aggregators (aka "telemetry regions") tracking its feature type. Included
may be regions that have the same feature type but tracking different GUID
from the event group's.
Traverse the event group's telemetry region data and mark all regions that
are not usable by the event group as unusable by clearing those regions'
MMIO addresses. A region is considered unusable if:
1) GUID does not match the GUID of the event group.
2) Package ID is invalid.
3) The enumerated size of the MMIO region does not match the expected
value from the XML description file.
Hereafter any telemetry region with an MMIO address is considered valid for
the event group it is associated with.
Enable all the event group's events as long as there is at least one usable
region from where data for its events can be read. Enabling of an event can
fail if the same event has already been enabled as part of another event
group. It should never happen that the same event is described by different
GUID supported by the same system so just WARN (via resctrl_enable_mon_event())
and skip the event.
Note that it is architecturally possible that some telemetry events are only
supported by a subset of the packages in the system. It is not expected that
systems will ever do this. If they do the user will see event files in resctrl
that always return "Unavailable".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
The resctrl file system layer passes the domain, RMID, and event id to the
architecture to fetch an event counter.
Fetching a telemetry event counter requires additional information that is
private to the architecture, for example, the offset into MMIO space from
where the counter should be read.
Add mon_evt::arch_priv that architecture can use for any private data related
to the event. The resctrl filesystem initializes mon_evt::arch_priv when the
architecture enables the event and passes it back to architecture when needing
to fetch an event counter.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
The telemetry event aggregators of the Intel Clearwater Forest CPU support two
RMID-based feature types: "energy" with GUID 0x26696143¹, and "perf" with
GUID 0x26557651².
The event counter offsets in an aggregator's MMIO space are arranged in groups
for each RMID.
E.g., the "energy" counters for GUID 0x26696143 are arranged like this:
MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY
MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY
MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY
MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY
...
MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY
MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY
After all counters there are three status registers that provide indications
of how many times an aggregator was unable to process event counts, the time
stamp for the most recent loss of data, and the time stamp of the most recent
successful update.
MMIO offset:0x2400 AGG_DATA_LOSS_COUNT
MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP
MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP
Define event_group structures for both of these aggregator types and define
the events tracked by the aggregators in the file system code.
PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point format.
File system code must output as floating point values.
¹https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
²https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
[ bp: Massage commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Each CPU collects data for telemetry events that it sends to the nearest
telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID
changes, or when a two millisecond timer expires.
There is a feature type ("energy" or "perf"), GUID, and MMIO region associated
with each aggregator. This combination links to an XML description of the
set of telemetry events tracked by the aggregator. XML files are published
by Intel in a GitHub repository¹.
The telemetry event aggregators maintain per-RMID per-event counts of the
total seen for all the CPUs. There may be multiple telemetry event aggregators
per package.
There are separate sets of aggregators for each feature type. Aggregators
in a set may have different GUIDs. All aggregators with the same feature
type and GUID are symmetric keeping counts for the same set of events for
the CPUs that provide data to them.
The XML file for each aggregator provides the following information:
0) Feature type of the events ("perf" or "energy")
1) Which telemetry events are tracked by the aggregator.
2) The order in which the event counters appear for each RMID.
3) The value type of each event counter (integer or fixed-point).
4) The number of RMIDs supported.
5) Which additional aggregator status registers are included.
6) The total size of the MMIO region for an aggregator.
Introduce struct event_group that condenses the relevant information from
an XML file. Hereafter an "event group" refers to a group of events of a
particular feature type (event_group::pfname set to "energy" or "perf") with
a particular GUID.
Use event_group::pfname to determine the feature id needed to obtain the
aggregator details. It will later be used in console messages and with the
rdt= boot parameter.
The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events.
This driver provides intel_pmt_get_regions_by_feature() to list all available
telemetry event aggregators of a given feature type. The list includes the
"guid", the base address in MMIO space for the region where the event counters
are exposed, and the package id where the all the CPUs that report to this
aggregator are located.
Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event
group to obtain a private copy of that event group's aggregator data. Duplicate
the aggregator data between event groups that have the same feature type
but different GUID. Further processing on this private copy will be unique
to the event group.
¹https://github.com/intel/Intel-PMT
[ bp: Zap text explaining the code, s/guid/GUID/g ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Add a new PERF_PKG resource and introduce package level scope for monitoring
telemetry events so that CPU hotplug notifiers can build domains at the
package granularity.
Use the physical package ID available via topology_physical_package_id()
to identify the monitoring domains with package level scope. This enables
user space to use:
/sys/devices/system/cpu/cpuX/topology/physical_package_id
to identify the monitoring domain a CPU is associated with.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Enumeration of Intel telemetry events is an asynchronous process involving
several mutually dependent drivers added as auxiliary devices during the
device_initcall() phase of Linux boot. The process finishes after the probe
functions of these drivers completes. But this happens after
resctrl_arch_late_init() is executed.
Tracing the enumeration process shows that it does complete a full seven
seconds before the earliest possible mount of the resctrl file system (when
included in /etc/fstab for automatic mount by systemd).
Add a hook for use by telemetry event enumeration and initialization and
run it once at the beginning of resctrl mount without any locks held.
The architecture is responsible for any required locking.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20260105191711.GBaVwON5nZn-uO6Sqg@fat_crate.local
msr_set_bit() takes a bit number to set but MSR_ZEN2_SPECTRAL_CHICKEN_BIT
is a bit mask. The usual pattern that code uses is a _BIT-named type
macro instead of a mask.
So convert it to a bit number to reflect that.
Also, msr_set_bit() already does the reading and checking whether the
bit needs to be set so use that instead of a local variable.
Fixup tabbing while at it.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://patch.msgid.link/20251230110731.28108-1-bp@kernel.org
resctrl assumes that all monitor events can be displayed as unsigned decimal
integers.
Hardware architecture counters may provide some telemetry events with greater
precision where the event is not a simple count, but is a measurement of some
sort (e.g. Joules for energy consumed).
Add a new argument to resctrl_enable_mon_event() for architecture code to
inform the file system that the value for a counter is a fixed-point value
with a specific number of binary places.
Only allow architecture to use floating point format on events that the file
system has marked with mon_evt::is_floating_point which reflects the contract
with user space on how the event values are displayed.
Display fixed point values with values rounded to ceil(binary_bits * log10(2))
decimal places. Special case for zero binary bits to print "{value}.0".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
resctrl assumes that monitor events can only be read from a CPU in the
cpumask_t set of each domain. This is true for x86 events accessed with an
MSR interface, but may not be true for other access methods such as MMIO.
Introduce and use flag mon_evt::any_cpu, settable by architecture, that
indicates there are no restrictions on which CPU can read that event. This
flag is not supported by the L3 event reading that requires to be run on a CPU
that belongs to the L3 domain of the event being read.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
With the arrival of monitor events tied to new domains associated with a
different resource it would be clearer if the L3 resource specific functions
are more accurately named.
Rename three groups of functions:
Functions that allocate/free architecture per-RMID MBM state information:
arch_domain_mbm_alloc() -> l3_mon_domain_mbm_alloc()
mon_domain_free() -> l3_mon_domain_free()
Functions that allocate/free filesystem per-RMID MBM state information:
domain_setup_mon_state() -> domain_setup_l3_mon_state()
domain_destroy_mon_state() -> domain_destroy_l3_mon_state()
Initialization/exit:
rdt_get_mon_l3_config() -> rdt_get_l3_mon_config()
resctrl_mon_resource_init() -> resctrl_l3_mon_resource_init()
resctrl_mon_resource_exit() -> resctrl_l3_mon_resource_exit()
Ensure kernel-doc descriptions of these functions' return values are present
and correctly formatted.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
The upcoming telemetry event monitoring is not tied to the L3 resource and
will have a new domain structure.
Rename the L3 resource specific domain data structures to include "l3_"
in their names to avoid confusion between the different resource specific
domain structures:
rdt_mon_domain -> rdt_l3_mon_domain
rdt_hw_mon_domain -> rdt_hw_l3_mon_domain
No functional change.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Convert the whole call sequence from mon_event_read() to resctrl_arch_rmid_read() to
pass resource independent struct rdt_domain_hdr instead of an L3 specific domain
structure to prepare for monitoring events in other resources.
This additional layer of indirection obscures which aspects of event counting depend
on a valid domain. Event initialization, support for assignable counters, and normal
event counting implicitly depend on a valid domain while summing of domains does not.
Split summing domains from the core event counting handling to make their respective
dependencies obvious.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
sld_state_show() has a dead str1 below:
if (A) {
...
} else if (B) {
pr_info(... A ? str1 : str2 ...);
}
where A is always false in the second block, implied by the "if (A) else"
pattern. Hence, str2 is always used.
This seems to be some mysterious legacy inherited from the earlier patch
revisions of
ebb1064e7c ("x86/traps: Handle #DB for bus lock").
Earlier revisions¹ did enable both sld and bld at the same time to detect
non-WB bus_locks when split_lock_detect=fatal, but that's no longer true in
the merged revision.
Remove it and translate the pr_info() into its equivalent form.
¹ https://lore.kernel.org/r/20201121023624.3604415-3-fenghua.yu@intel.com
[ bp: Massage commit message; simplify braces ]
Signed-off-by: Rong Zhang <i@rong.moe>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251215182907.152881-1-i@rong.moe