Commit Graph

14419 Commits

Author SHA1 Message Date
Linus Torvalds 4ea7c1717f Arm:
- Fix trapping regression when no in-kernel irqchip is present
 
 - Check host-provided, untrusted ranges and offsets in pKVM
 
 - Fix regression restoring the ID_PFR1_EL1 register
 
 - Fix vgic ITS locking issues when LPIs are not directly injected
 
 Arm selftests:
 
 - Correct target CPU programming in vgic_lpi_stress selftest
 
 - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
 
 RISC-V:
 
 - Fix check for local interrupts on riscv32
 
 - Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts
 
 - Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
 
 x86:
 
 - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
   doesn't support virtualization the instructions, but the instructions
   are gated only by VMXON.  That is, they will VM-Exit instead of taking
   a #UD and until now this resulted in KVM exiting to userspace with an
   emulation error.
 
 - Unload the "FPU" when emulating INIT of XSTATE features if and only if
   the FPU is actually loaded, instead of trying to predict when KVM will
   emulate an INIT (CET support missed the MP_STATE path).  Add sanity
   checks to detect and harden against similar bugs in the future.
 
 - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.
 
 - Use a raw spinlock for svm->ir_list_lock as the lock is taken during
   schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.
 
 - Remove guest_memfd bindings on memslot deletion when a gmem file is dying
   to fix a use-after-free race found by syzkaller.
 
 - Fix a goof in the EPT Violation handler where KVM checks the wrong
   variable when determining if the reported GVA is valid.
 
 - Fix and simplify the handling of LBR virtualization on AMD, which was made
   buggy and unnecessarily complicated by nested VM support
 
 Misc:
 
 - Update Oliver's email address
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkQSAAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMDzQf9GW6F8yAo1QTnLNfSIvHMQcJAKwr3
 pd+bOLtMN/OzdQO/dvGVdkYuYYg/PQv/+zw6dVIeiLjtlrsTg40rsqVZEAXYCsbB
 q7TFM634thgON6R6fD6eLa+72UP0wMai7xqEfEyXVW3enAEEe+lrPWC9BiwJRqKZ
 pY1MpVIdYa0XNfUBeiOhO5AH+y9OmUDq5AHptrYn9X5xdsU2OWqQCyHW/1RLPWvA
 9bkyuz1A8+EQ20ngHUd0hrQx4UeJ7jvPblbryUxXaMwqahPC9sA2iDI12gAu8a84
 skvWbIPHSMgj5qDO/CkAsHb47GiqudU4LH7LniDZNsq21iTekiURSbdklw==
 =SM6O
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Arm:

   - Fix trapping regression when no in-kernel irqchip is present

   - Check host-provided, untrusted ranges and offsets in pKVM

   - Fix regression restoring the ID_PFR1_EL1 register

   - Fix vgic ITS locking issues when LPIs are not directly injected

  Arm selftests:

   - Correct target CPU programming in vgic_lpi_stress selftest

   - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest

  RISC-V:

   - Fix check for local interrupts on riscv32

   - Read HGEIP CSR on the correct cpu when checking for IMSIC
     interrupts

   - Remove automatic I/O mapping from kvm_arch_prepare_memory_region()

  x86:

   - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as
     KVM doesn't support virtualization the instructions, but the
     instructions are gated only by VMXON. That is, they will VM-Exit
     instead of taking a #UD and until now this resulted in KVM exiting
     to userspace with an emulation error.

   - Unload the "FPU" when emulating INIT of XSTATE features if and only
     if the FPU is actually loaded, instead of trying to predict when
     KVM will emulate an INIT (CET support missed the MP_STATE path).
     Add sanity checks to detect and harden against similar bugs in the
     future.

   - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is
     unloaded.

   - Use a raw spinlock for svm->ir_list_lock as the lock is taken
     during schedule(), and "normal" spinlocks are sleepable locks when
     PREEMPT_RT=y.

   - Remove guest_memfd bindings on memslot deletion when a gmem file is
     dying to fix a use-after-free race found by syzkaller.

   - Fix a goof in the EPT Violation handler where KVM checks the wrong
     variable when determining if the reported GVA is valid.

   - Fix and simplify the handling of LBR virtualization on AMD, which
     was made buggy and unnecessarily complicated by nested VM support

  Misc:

   - Update Oliver's email address"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: nSVM: Fix and simplify LBR virtualization handling with nested
  KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
  KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
  MAINTAINERS: Switch myself to using kernel.org address
  KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
  KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
  KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
  KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
  KVM: arm64: Make all 32bit ID registers fully writable
  KVM: VMX: Fix check for valid GVA on an EPT violation
  KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying
  KVM: SVM: switch to raw spinlock for svm->ir_list_lock
  KVM: SVM: Make avic_ga_log_notifier() local to avic.c
  KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
  KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
  KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
  KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
  KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
  KVM: arm64: Check the untrusted offset in FF-A memory share
  KVM: arm64: Check range args for pKVM mem transitions
  ...
2025-11-10 08:54:36 -08:00
Paolo Bonzini 0e5ba55750 KVM x86 fixes for 6.18:
- Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
    doesn't support virtualization the instructions, but the instructions
    are gated only by VMXON, i.e. will VM-Exit instead of taking a #UD and
    thus result in KVM exiting to userspace with an emulation error.
 
  - Unload the "FPU" when emulating INIT of XSTATE features if and only if
    the FPU is actually loaded, instead of trying to predict when KVM will
    emulate an INIT (CET support missed the MP_STATE path).  Add sanity
    checks to detect and harden against similar bugs in the future.
 
  - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.
 
  - Use a raw spinlock for svm->ir_list_lock as the lock is taken during
    schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.
 
  - Remove guest_memfd bindings on memslot deletion when a gmem file is dying
    to fix a use-after-free race found by syzkaller.
 
  - Fix a goof in the EPT Violation handler where KVM checks the wrong
    variable when determining if the reported GVA is valid.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkOa3gACgkQOlYIJqCj
 N/22rQ/9GUfFDRCi7oABkZcOiAY3vDLEY0V71/SrOzaZqNZUJ1EZXMOLgfbCytSv
 jGjDawP7qZyjfpWS8VMd+haqDU0yGz9NsJQ6LaOTJfy+519BRykbOz+2kDiL09Xj
 n8N8EuQWgcsn2FhiYINrool6GwIrtEHGaisvBG+BH6pVmXxgTnFcrB0VGoUlYcyh
 giLDvm1o69vJm1nlDksYP903BVOAEGX3/8V6H67D9B77VU82GVZle5kZ1LPGDyqH
 VF5mBls6bRlYa0XSXZs0DzoBVboioWyqELHdOrC2mt0NMYt66M8FcNDA7A8mxK9m
 BqFUctPoGxQy88zlruvHNjdd+RjTawC5L5CshKsq5So7T1CWGCx5+5DtkwKAJtNS
 uxo0CSyI9r72VN2ktb1yALyymJ4QA+HGT1B9FzKqlGLrNFfNTgmuU+gHIvfScD0B
 rqCXI0Z6t9SY4nKGOG8kDF/P8lox+llE8d6Q1srK+dz2jvhlVzMZPLKPUfghdsbM
 HEH7GwzpT00tRyJeeTyDeHTx0ZaZ9mXUCeI0/jC8LaErTdAXyh/qdCpVYfex6+nw
 xZNbJnYsI/0MzEI//4h8Q02+ijYtk3StgvDay0AUdHMaRA8EjEuN8LNT1BXBrB4e
 eoQtSDF5Jb5OCMZDz7LRqBn7AxjcbPY0a8xeU4P3pVDgKyh4odM=
 =yRcD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.18-rc5' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.18:

 - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
   doesn't support virtualization the instructions, but the instructions
   are gated only by VMXON, i.e. will VM-Exit instead of taking a #UD and
   thus result in KVM exiting to userspace with an emulation error.

 - Unload the "FPU" when emulating INIT of XSTATE features if and only if
   the FPU is actually loaded, instead of trying to predict when KVM will
   emulate an INIT (CET support missed the MP_STATE path).  Add sanity
   checks to detect and harden against similar bugs in the future.

 - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.

 - Use a raw spinlock for svm->ir_list_lock as the lock is taken during
   schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.

 - Remove guest_memfd bindings on memslot deletion when a gmem file is dying
   to fix a use-after-free race found by syzkaller.

 - Fix a goof in the EPT Violation handler where KVM checks the wrong
   variable when determining if the reported GVA is valid.
2025-11-09 08:07:32 +01:00
Linus Torvalds 0d7bee10be Miscellaneous fixes:
- Fix AMD PCI root device caching regression that triggers
    on certain firmware variants
 
  - Fix the zen5_rdseed_microcode[] array to be NULL-terminated
 
  - Add more AMD models to microcode signature checking
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmkPQyURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gmBw/+NG6jg8ICxDRcwNSlSfEjOr8EWe7WrmcD
 Y9gYGHi4LbG+8fZCqOZFBUSAePIBJUqcMsRk+CgAxDwZyiJt7jqaXTSpiRTPR9pa
 IjI5FF5qhypTRk5/MgJObKgnmkJwduCuTrkU/9neeLSSZigiOQO1i3imM/KhCSvG
 HGb0qwgDxBdc/uhuiK81hLVHs1CEBnNHoRl8u6gkkNXMFkzVx+5SGWN3BdRwHZuP
 Ttw/B5XZBLYGY0Rs+AUgTnKWdR3DHfZafPnV4mHmAQ3Y5HI49bGxwUqpf+xkx6Mv
 1t/UBPOLG2WlBlO+WJF27/yPit3ITo8yltEBKTsZxTtoXp8cG/qcnfwZnZ5At0/v
 sW94Dg3H+pSsNBSEfDzeywdDpGequ6wcj2DxQDg7qNLSdoOwDARvJD0j21ckdNDq
 HUf8dn8H6Sh+nqQLbWu8nnl6NZUmtaiw0bTsVQjQi8decnpKhbhyLlgSGA0GuWkY
 QzT5mDZuf70N2v+hS03MDtIQvTfF5CbG6LIvUJnzZyge4CWh65JzkTmqGIgbYexV
 CrxMFNY/G6PiTdf7IONG+/qCUiiCRtCFGGNSxkj5NAo0zyX0tInRA2CZHU6M+Dvk
 T7+EOoPl2vPS9hgw9eeMk/0UgjbX8JZSX1pTAcRkSH75GlNqPl4ytxkH+QhvM8vd
 xYqAjIBLA84=
 =duTI
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix AMD PCI root device caching regression that triggers
   on certain firmware variants

 - Fix the zen5_rdseed_microcode[] array to be NULL-terminated

 - Add more AMD models to microcode signature checking

* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add more known models to entry sign checking
  x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
  x86/amd_node: Fix AMD root device caching
2025-11-08 09:01:11 -08:00
Linus Torvalds 284922f4c5 x86: uaccess: don't use runtime-const rewriting in modules
The runtime-const infrastructure was never designed to handle the
modular case, because the constant fixup is only done at boot time for
core kernel code.

But by the time I used it for the x86-64 user space limit handling in
commit 86e6b1547b ("x86: fix user address masking non-canonical
speculation issue"), I had completely repressed that fact.

And it all happens to work because the only code that currently actually
gets inlined by modules is for the access_ok() limit check, where the
default constant value works even when not fixed up.  Because at least I
had intentionally made it be something that is in the non-canonical
address space region.

But it's technically very wrong, and it does mean that at least in
theory, the use of 'access_ok()' + '__get_user()' can trigger the same
speculation issue with non-canonical addresses that the original commit
was all about.

The pattern is unusual enough that this probably doesn't matter in
practice, but very wrong is still very wrong.  Also, let's fix it before
the nice optimized scoped user accessor helpers that Thomas Gleixner is
working on cause this pseudo-constant to then be more widely used.

This all came up due to an unrelated discussion with Mateusz Guzik about
using the runtime const infrastructure for names_cachep accesses too.
There the modular case was much more obviously broken, and Mateusz noted
it in his 'v2' of the patch series.

That then made me notice how broken 'access_ok()' had been in modules
all along.  Mea culpa, mea maxima culpa.

Fix it by simply not using the runtime-const code in modules, and just
using the USER_PTR_MAX variable value instead.  This is not
performance-critical like the core user accessor functions (get_user()
and friends) are.

Also make sure this doesn't get forgotten the next time somebody wants
to do runtime constant optimizations by having the x86 runtime-const.h
header file error out if included by modules.

Fixes: 86e6b1547b ("x86: fix user address masking non-canonical speculation issue")
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Triggered-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-05 10:24:36 +09:00
Yazen Ghannam 0a4b61d9c2 x86/amd_node: Fix AMD root device caching
Recent AMD node rework removed the "search and count" method of caching AMD
root devices. This depended on the value from a Data Fabric register that was
expected to hold the PCI bus of one of the root devices attached to that
fabric.

However, this expectation is incorrect. The register, when read from PCI
config space, returns the bitwise-OR of the buses of all attached root
devices.

This behavior is benign on AMD reference design boards, since the bus numbers
are aligned. This results in a bitwise-OR value matching one of the buses. For
example, 0x00 | 0x40 | 0xA0 | 0xE0 = 0xE0.

This behavior breaks on boards where the bus numbers are not exactly aligned.
For example, 0x00 | 0x07 | 0xE0 | 0x15 = 0x1F.

The examples above are for AMD node 0. The first root device on other nodes
will not be 0x00. The first root device for other nodes will depend on the
total number of root devices, the system topology, and the specific PCI bus
number assignment.

For example, a system with 2 AMD nodes could have this:

  Node 0 : 0x00 0x07 0x0e 0x15
  Node 1 : 0x1c 0x23 0x2a 0x31

The bus numbering style in the reference boards is not a requirement.  The
numbering found in other boards is not incorrect. Therefore, the root device
caching method needs to be adjusted.

Go back to the "search and count" method used before the recent rework.
Search for root devices using PCI class code rather than fixed PCI IDs.

This keeps the goal of the rework (remove dependency on PCI IDs) while being
able to support various board designs.

Merge helper functions to reduce code duplication.

  [ bp: Reflow comment. ]

Fixes: 40a5f6ffdf ("x86/amd_nb: Simplify root device search")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/all/20251028-fix-amd-root-v2-1-843e38f8be2c@amd.com
2025-11-03 12:46:57 +01:00
Nathan Chancellor 9b041a4b66 x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
When building with CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y, there is a series
of errors from the various versions of clear_page() not having __kcfi_typeid_
symbols.

  $ cat kernel/configs/repro.config
  CONFIG_CFI=y
  # CONFIG_LTO_NONE is not set
  CONFIG_LTO_CLANG_FULL=y

  $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config bzImage
  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_rep
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_rep)

  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_orig
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_orig)

  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_erms
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_erms)

With full LTO, it is possible for LLVM to realize that these functions never
have their address taken (as they are only used within an alternative, which
will make them a direct call) across the whole kernel and either drop or skip
generating their kCFI type identification symbols.

clear_page_{rep,orig,erms}() are defined in clear_page_64.S with
SYM_TYPED_FUNC_START as a result of

  2981557cb0 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI"),

as exported functions are free to be called indirectly thus need kCFI type
identifiers.

Use KCFI_REFERENCE with these clear_page() functions to force LLVM to see
these functions as address-taken and generate then keep the kCFI type
identifiers.

Fixes: 2981557cb0 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI")
Closes: https://github.com/ClangBuiltLinux/linux/issues/2128
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://patch.msgid.link/20251013-x86-fix-clear_page-cfi-full-lto-errors-v1-1-d69534c0be61@kernel.org
2025-10-31 22:47:24 +01:00
Tony Luck 89216c9051 x86/cpu: Add/fix core comments for {Panther,Nova} Lake
The E-core in Panther Lake is Darkmont, not Crestmont.

Nova Lake is built from Coyote Cove (P-core) and Arctic Wolf (E-core).

Fixes: 43bb700cff ("x86/cpu: Update Intel Family comments")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20251028172948.6721-1-tony.luck@intel.com
2025-10-30 11:34:02 +01:00
Sean Christopherson 9d7dfb95da KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL
Add VMX exit handlers for SEAMCALL and TDCALL to inject a #UD if a non-TD
guest attempts to execute SEAMCALL or TDCALL.  Neither SEAMCALL nor TDCALL
is gated by any software enablement other than VMXON, and so will generate
a VM-Exit instead of e.g. a native #UD when executed from the guest kernel.

Note!  No unprivileged DoS of the L1 kernel is possible as TDCALL and
SEAMCALL #GP at CPL > 0, and the CPL check is performed prior to the VMX
non-root (VM-Exit) check, i.e. userspace can't crash the VM. And for a
nested guest, KVM forwards unknown exits to L1, i.e. an L2 kernel can
crash itself, but not L1.

Note #2!  The Intel® Trust Domain CPU Architectural Extensions spec's
pseudocode shows the CPL > 0 check for SEAMCALL coming _after_ the VM-Exit,
but that appears to be a documentation bug (likely because the CPL > 0
check was incorrectly bundled with other lower-priority #GP checks).
Testing on SPR and EMR shows that the CPL > 0 check is performed before
the VMX non-root check, i.e. SEAMCALL #GPs when executed in usermode.

Note #3!  The aforementioned Trust Domain spec uses confusing pseudocode
that says that SEAMCALL will #UD if executed "inSEAM", but "inSEAM"
specifically means in SEAM Root Mode, i.e. in the TDX-Module.  The long-
form description explicitly states that SEAMCALL generates an exit when
executed in "SEAM VMX non-root operation".  But that's a moot point as the
TDX-Module injects #UD if the guest attempts to execute SEAMCALL, as
documented in the "Unconditionally Blocked Instructions" section of the
TDX-Module base specification.

Cc: stable@vger.kernel.org
Cc: Kai Huang <kai.huang@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20251016182148.69085-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 09:37:04 -07:00
Linus Torvalds 9591fdb061 - Remove a bunch of asm implementing condition flags testing in KVM's
emulator in favor of int3_emulate_jcc() which is written in C
 
 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm
 
 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant
 
 - Add some fixes and modifications to allow running FRED-enabled kernels
   in KVM even on non-FRED hardware
 
 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups
 
 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors
 
 - Other smaller cleanups and touchups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjqXxkACgkQEsHwGGHe
 VUq9QBAAsjaay99a1+Dc53xyP1/HzCUFZDOzEYhj9zF85I8/xA9vTXZr7Qg2m6os
 +4EEmnlwU43AR5KgwGJcuszLF9qSqTMz5qkAdFpvnoQ1Hbc8b49A+3yo9/hM7NA2
 gPGH0gVZVBcffoETiQ8tJN6C9H6Ec0nTZwKTbasWwxz5oUAw+ppjP+aF4rFQ2/5w
 b1ofrcga5yucjvSlXjBOEwHvd21l7O9iMre1oGEn6b0E2LU8ldToRkJkVZIhkWeL
 2Iq3gYtVNN4Ao06WbV/EfXAqg5HWXjcm5bLcUXDtSF+Blae+gWoCjrT7XQdQGyEq
 J12l4FbIZk5Ha8eWAC425ye9i3Wwo+oie3Cc4SVCMdv5A+AmOF0ijAlo1hcxq0rX
 eGNWm8BKJOJ9zz1kxLISO7CfjULKgpsXLabF5a19uwoCsQgj5YrhlJezaIKHXbnK
 OWwHWg9IuRkN2KLmJa7pXtHkuAHp4MtEV9TP9kU2WCvCInrNrzp3gYtds3pri82c
 8ove+WA3yb/AQ6RCq5vAMLYXBxMRbN7FrmY5ZuwgWJTMi6cp1Sp02mhobwJOgNhO
 H7nKWCZnQMyCLPzVeg97HTSgqSXw13dSrujWX9gWYVWBMfZO1B9HcUrhtiOhH7Q9
 cvELkcqaxKrCKdRHLLYgHeMIQU2tdpsQ5TXHm7C7liEcZPZpk+g=
 =3Otb
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more x86 updates from Borislav Petkov:

 - Remove a bunch of asm implementing condition flags testing in KVM's
   emulator in favor of int3_emulate_jcc() which is written in C

 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm

 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant

 - Add some fixes and modifications to allow running FRED-enabled
   kernels in KVM even on non-FRED hardware

 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups

 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors

 - Other smaller cleanups and touchups all over the place

* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86,retpoline: Optimize patch_retpoline()
  x86,ibt: Use UDB instead of 0xEA
  x86/cfi: Remove __noinitretpoline and __noretpoline
  x86/cfi: Add "debug" option to "cfi=" bootparam
  x86/cfi: Standardize on common "CFI:" prefix for CFI reports
  x86/cfi: Document the "cfi=" bootparam options
  x86/traps: Clarify KCFI instruction layout
  compiler_types.h: Move __nocfi out of compiler-specific header
  objtool: Validate kCFI calls
  x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
  x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
  x86/fred: Install system vector handlers even if FRED isn't fully enabled
  x86/hyperv: Use direct call to hypercall-page
  x86/hyperv: Clean up hv_do_hypercall()
  KVM: x86: Remove fastops
  KVM: x86: Convert em_salc() to C
  KVM: x86: Introduce EM_ASM_3WCL
  KVM: x86: Introduce EM_ASM_1SRC2
  KVM: x86: Introduce EM_ASM_2CL
  KVM: x86: Introduce EM_ASM_2W
  ...
2025-10-11 11:19:16 -07:00
Linus Torvalds 2f0a750453 - Simplify inline asm flag output operands now that the minimum compiler
version supports the =@ccCOND syntax
 
 - Remove a bunch of AS_* Kconfig symbols which detect assembler support for
   various instruction mnemonics now that the minimum assembler version
   supports them all
 
 - The usual cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjqQswACgkQEsHwGGHe
 VUoFQg/9EoQ8TnWyzdTQ83+4sy1ePIgY+WyRPlDPmyoAjGN1WTT1NUY2JBeaW5CA
 UVKJlaO2Nh/c5YypuJR2PtpPuJlNvRBLwpN3Lj+PiAhaYv8gcyeZg64c4MaRaTyc
 yuoj5CaEhyQ16CDBPAjxDQ6+68YHjltlDSZainj77YWSzcBSflJCYH1RnNlCHiM9
 ggBIoFmWltrCEDDW6d0Phl+Fh3K4tuYexRucIavgE+k4ZD+XqujWeLTaau837yW7
 CMvN16elGorWGRBGiaRGH2sbrh8ruYPw4lr5DlFl7ApoBmxgK9s9peicUHtHQz4H
 E9/c2XjGwVE4MtCI5IfeqG87DfojVeiWkXO30CMRalsFlbZzKs4JwalspIzgxH4s
 m2tsfN++y9eC1b4a8EaSVWBk03xmmNWM7FqjC3LOMyV0aI9dqj/u36aadHMC/GsL
 Rwl1GCnJnwu0Z7bho7L2qB0om4NOkX8H3uyzoOzDNC+RTKvgwumI0LpJBwrUrqW7
 Ftf7hIc52hj94drN2RsVtvu3ueBNJF8SW4VJ13UJyZyJDnB4Os2wrI9aJ1vBam1e
 md90pVVGjiXg/PhoCPDHPYzPs8oV2zNEJ0im/wNhkCH42yMAoIlbFDS77JghzSF2
 sI9vMJVsLN7y/SbiysejTBG83j1dEPIpkC7oSzkYOZNNjCKRWWo=
 =dW6J
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - Simplify inline asm flag output operands now that the minimum
   compiler version supports the =@ccCOND syntax

 - Remove a bunch of AS_* Kconfig symbols which detect assembler support
   for various instruction mnemonics now that the minimum assembler
   version supports them all

 - The usual cleanups all over the place

* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
  x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
  x86/mtrr: Remove license boilerplate text with bad FSF address
  x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
  x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
  x86/entry/fred: Push __KERNEL_CS directly
  x86/kconfig: Remove CONFIG_AS_AVX512
  crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
  crypto: X86 - Remove CONFIG_AS_VAES
  crypto: x86 - Remove CONFIG_AS_GFNI
  x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-11 10:51:14 -07:00
Linus Torvalds 256e341706 Generic:
* Rework almost all of KVM's exports to expose symbols only to KVM's x86
   vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
 
 x86:
 
 * Rework almost all of KVM x86's exports to expose symbols only to KVM's
   vendor modules, i.e. to kvm-{amd,intel}.ko.
 
 * Add support for virtualizing Control-flow Enforcement Technology (CET) on
   Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
   It's worth noting that while SHSTK and IBT can be enabled separately in CPUID,
   it is not really possible to virtualize them separately.  Therefore, Intel
   processors will really allow both SHSTK and IBT under the hood if either is
   made visible in the guest's CPUID.  The alternative would be to intercept
   XSAVES/XRSTORS, which is not feasible for performance reasons.
 
 * Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when
   completing userspace I/O.  KVM has already committed to allowing L2 to
   to perform I/O at that point.
 
 * Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
   supposed to exist for v2 PMUs.
 
 * Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
 
 * Add support for the immediate forms of RDMSR and WRMSRNS, sans full
   emulator support (KVM should never need to emulate the MSRs outside of
   forced emulation and other contrived testing scenarios).
 
 * Clean up the MSR APIs in preparation for CET and FRED virtualization, as
   well as mediated vPMU support.
 
 * Clean up a pile of PMU code in anticipation of adding support for mediated
   vPMUs.
 
 * Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits
   needed to faithfully emulate an I/O APIC for such guests.
 
 * Many cleanups and minor fixes.
 
 * Recover possible NX huge pages within the TDP MMU under read lock to
   reduce guest jitter when restoring NX huge pages.
 
 * Return -EAGAIN during prefault if userspace concurrently deletes/moves the
   relevant memslot, to fix an issue where prefaulting could deadlock with the
   memslot update.
 
 x86 (AMD):
 
 * Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported.
 
 * Require a minimum GHCB version of 2 when starting SEV-SNP guests via
   KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
   instead of latent guest failures.
 
 * Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
   unauthorized CPU accesses from reading the ciphertext of SNP guest private
   memory, e.g. to attempt an offline attack.  This feature splits the shared
   SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests,
   therefore a new module parameter is needed to control the number of ASIDs
   that can be used for VMs with CipherText Hiding vs. how many can be used to
   run SEV-ES guests.
 
 * Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
   host from tampering with the guest's TSC frequency, while still allowing the
   the VMM to configure the guest's TSC frequency prior to launch.
 
 * Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs
   resulting from bogus XCR0 values.
 
 * Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
   avoid leaving behind stale state (thankfully not consumed in KVM).
 
 * Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
   instead of subtly relying on guest_memfd to deal with them.
 
 * Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
   desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's
   TSC_AUX in the host MSR until return to userspace.
 
 KVM (Intel):
 
 * Preparation for FRED support.
 
 * Don't retry in TDX's anti-zero-step mitigation if the target memslot is
   invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
   to the aforementioned prefaulting case.
 
 * Misc bugfixes and minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjjx/0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMLFwf9HXZdqBn6VvkbSL/HIGdNG1BEzeJ0
 MQVEMMdmWJ72JtI6soJ6oN5NWTIJJeMTPuCgRrNxFbIivSdm9vYPTSCNwNBhKb+H
 FEsr62a9T4XgnTqy20h+yZJiKNvwtaggdTWFnUAUqsBSFkEtksAP72odvZx+GNv/
 cndqtxy/84TcJ4ZXFdxElylCcQ9xRoRkqkU8KaVfg88wqMIMbSR3OBSH/g8bqR+3
 cjvDGNC7TPHPEN2Wmq2AYluRlBxB2ZhsOauArsdidPXHAevO+AFnbS27fz6bixZK
 LTS/qwKOsvhFzyHngemuG6s6HgkgBEshfcKk5i7d2ReRjaGP4EvkhmlImA==
 =k49c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
 "Generic:

   - Rework almost all of KVM's exports to expose symbols only to KVM's
     x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko

  x86:

   - Rework almost all of KVM x86's exports to expose symbols only to
     KVM's vendor modules, i.e. to kvm-{amd,intel}.ko

   - Add support for virtualizing Control-flow Enforcement Technology
     (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
     (Shadow Stacks).

     It is worth noting that while SHSTK and IBT can be enabled
     separately in CPUID, it is not really possible to virtualize them
     separately. Therefore, Intel processors will really allow both
     SHSTK and IBT under the hood if either is made visible in the
     guest's CPUID. The alternative would be to intercept
     XSAVES/XRSTORS, which is not feasible for performance reasons

   - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
     when completing userspace I/O. KVM has already committed to
     allowing L2 to to perform I/O at that point

   - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
     MSR is supposed to exist for v2 PMUs

   - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs

   - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
     emulator support (KVM should never need to emulate the MSRs outside
     of forced emulation and other contrived testing scenarios)

   - Clean up the MSR APIs in preparation for CET and FRED
     virtualization, as well as mediated vPMU support

   - Clean up a pile of PMU code in anticipation of adding support for
     mediated vPMUs

   - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
     vmexits needed to faithfully emulate an I/O APIC for such guests

   - Many cleanups and minor fixes

   - Recover possible NX huge pages within the TDP MMU under read lock
     to reduce guest jitter when restoring NX huge pages

   - Return -EAGAIN during prefault if userspace concurrently
     deletes/moves the relevant memslot, to fix an issue where
     prefaulting could deadlock with the memslot update

  x86 (AMD):

   - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
     supported

   - Require a minimum GHCB version of 2 when starting SEV-SNP guests
     via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
     errors instead of latent guest failures

   - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
     prevents unauthorized CPU accesses from reading the ciphertext of
     SNP guest private memory, e.g. to attempt an offline attack. This
     feature splits the shared SEV-ES/SEV-SNP ASID space into separate
     ranges for SEV-ES and SEV-SNP guests, therefore a new module
     parameter is needed to control the number of ASIDs that can be used
     for VMs with CipherText Hiding vs. how many can be used to run
     SEV-ES guests

   - Add support for Secure TSC for SEV-SNP guests, which prevents the
     untrusted host from tampering with the guest's TSC frequency, while
     still allowing the the VMM to configure the guest's TSC frequency
     prior to launch

   - Validate the XCR0 provided by the guest (via the GHCB) to avoid
     bugs resulting from bogus XCR0 values

   - Save an SEV guest's policy if and only if LAUNCH_START fully
     succeeds to avoid leaving behind stale state (thankfully not
     consumed in KVM)

   - Explicitly reject non-positive effective lengths during SNP's
     LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
     them

   - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
     host's desired TSC_AUX, to fix a bug where KVM was keeping a
     different vCPU's TSC_AUX in the host MSR until return to userspace

  KVM (Intel):

   - Preparation for FRED support

   - Don't retry in TDX's anti-zero-step mitigation if the target
     memslot is invalid, i.e. is being deleted or moved, to fix a
     deadlock scenario similar to the aforementioned prefaulting case

   - Misc bugfixes and minor cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: x86: Export KVM-internal symbols for sub-modules only
  KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
  KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
  KVM: Export KVM-internal symbols for sub-modules only
  KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
  KVM: VMX: Make CR4.CET a guest owned bit
  KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
  KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  KVM: selftests: Extend MSRs test to validate vCPUs without supported features
  KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  KVM: selftests: Add an MSR test to exercise guest/host and read/write
  KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  KVM: x86: Define Control Protection Exception (#CP) vector
  KVM: x86: Add human friendly formatting for #XM, and #VE
  KVM: SVM: Enable shadow stack virtualization for SVM
  KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  KVM: SVM: Pass through shadow stack MSRs as appropriate
  KVM: SVM: Update dump_vmcb with shadow stack save area additions
  KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  ...
2025-10-06 12:37:34 -07:00
Linus Torvalds 50ac57c3b1 - Make TDX and kexec work together
- Skip TDX bug workaround when the bug is not present
  - Update maintainers entries
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmjdYFQACgkQaDWVMHDJ
 krCXKxAAhtOgFjUfz5JZ9AFDN6w+9eJ+gSuwVjqIXhQW0QNQKcGXHLq8OP7yehI4
 I/8qH/PRKnBeciKXqtmVqLgK6A68qzzkVjnA1QSAvkR3fkxGfNd1j/uwRXPwNNeE
 ZTOX8+6fE3Ol0J7+X6TY0lwUbuSiZdzeErZK/LTLTkckEvAzYlr5BxyGu3GUVtho
 MocuVeOewJv5oSeQ4SOLxg2srZaKxsc9yp8W7aylZ2SDzmV6zvjrYdRvEyAxiPuu
 24foC3IuNDnkAvN/s6kPZchJO0wNZOqad5iN1tLPOYobavpD5Y+wSAb7kY9x8MZg
 znTvhO401BX+Cni7T8GTXrNklaH1T3C6e7/F1JODYL8WpkM6/KxUMaoQUQn4g/m0
 DgMIH5DikTbecrp1GRyLmJO8W2RoL2sWzYTSCRHVjmKUKUDpT2Srx3YHzSgSntJy
 jchFkEi3S/FD+wY1pHcrSUOSKHz1xlMFaKSGEM4cM/dWkcSbYO4mWiwm/kkL7pGR
 a+1yle7xQz2U4dIpSn2RET49j0HbuQQP7SIfCnl3hJTiYlqX4cB/A6R0AxSdSqDj
 LvZ9q4NistGYmXWUdPtc0OotZzSrVb0RJKk85ZXjHRCSAeqM7TTm0qfZFKM6ih0F
 zPYQiEyG05pgECEWGS7I3X/j0Qoxn6FKH+pHbmAoRmh8vTdXmQg=
 =oJ6l
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 TDX updates from Dave Hansen:
 "The biggest change here is making TDX and kexec play nicely together.

  Before this, the memory encryption hardware (which doesn't respect
  cache coherency) could write back old cachelines on top of data in the
  new kernel, so kexec and TDX were made mutually exclusive. This
  removes the limitation.

  There is also some work to tighten up a hardware bug workaround and
  some MAINTAINERS updates.

   - Make TDX and kexec work together

    - Skip TDX bug workaround when the bug is not present

    - Update maintainers entries"

* tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/virt/tdx: Use precalculated TDVPR page physical address
  KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
  x86/virt/tdx: Update the kexec section in the TDX documentation
  x86/virt/tdx: Remove the !KEXEC_CORE dependency
  x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
  x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
  x86/sme: Use percpu boolean to control WBINVD during kexec
  x86/kexec: Consolidate relocate_kernel() function parameters
  x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
  x86/tdx: Tidy reset_pamt functions
  x86/tdx: Eliminate duplicate code in tdx_clear_page()
  MAINTAINERS: Add KVM mail list to the TDX entry
  MAINTAINERS: Add Rick Edgecombe as a TDX reviewer
  MAINTAINERS: Update the file list in the TDX entry.
2025-10-04 10:01:30 -07:00
Linus Torvalds f3826aa996 guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM
   types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
   precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
   no way to detect private vs. shared).
 
   This lays the groundwork for removal of guest memory from the kernel direct
   map, as well as for limited mmap() for guest_memfd-backed memory.
 
   For more information see:
   * a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
   * https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
     (guest_memfd in Firecracker)
   * https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
     (direct map removal)
   * https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
     (mmap support)
 
 ARM:
 
 * Add support for FF-A 1.2 as the secure memory conduit for pKVM,
   allowing more registers to be used as part of the message payload.
 
 * Change the way pKVM allocates its VM handles, making sure that the
   privileged hypervisor is never tricked into using uninitialised
   data.
 
 * Speed up MMIO range registration by avoiding unnecessary RCU
   synchronisation, which results in VMs starting much quicker.
 
 * Add the dump of the instruction stream when panic-ing in the EL2
   payload, just like the rest of the kernel has always done. This will
   hopefully help debugging non-VHE setups.
 
 * Add 52bit PA support to the stage-1 page-table walker, and make use
   of it to populate the fault level reported to the guest on failing
   to translate a stage-1 walk.
 
 * Add NV support to the GICv3-on-GICv5 emulation code, ensuring
   feature parity for guests, irrespective of the host platform.
 
 * Fix some really ugly architecture problems when dealing with debug
   in a nested VM. This has some bad performance impacts, but is at
   least correct.
 
 * Add enough infrastructure to be able to disable EL2 features and
   give effective values to the EL2 control registers. This then allows
   a bunch of features to be turned off, which helps cross-host
   migration.
 
 * Large rework of the selftest infrastructure to allow most tests to
   transparently run at EL2. This is the first step towards enabling
   NV testing.
 
 * Various fixes and improvements all over the map, including one BE
   fix, just in time for the removal of the feature.
 
 LoongArch:
 
 * Detect page table walk feature on new hardware
 
 * Add sign extension with kernel MMIO/IOCSR emulation
 
 * Improve in-kernel IPI emulation
 
 * Improve in-kernel PCH-PIC emulation
 
 * Move kvm_iocsr tracepoint out of generic code
 
 RISC-V:
 
 * Added SBI FWFT extension for Guest/VM with misaligned delegation and
   pointer masking PMLEN features
 
 * Added ONE_REG interface for SBI FWFT extension
 
 * Added Zicbop and bfloat16 extensions for Guest/VM
 
 * Enabled more common KVM selftests for RISC-V
 
 * Added SBI v3.0 PMU enhancements in KVM and perf driver
 
 s390:
 
 * Improve interrupt cpu for wakeup, in particular the heuristic to decide
   which vCPU to deliver a floating interrupt to.
 
 * Clear the PTE when discarding a swapped page because of CMMA; this
   bug was introduced in 6.16 when refactoring gmap code.
 
 x86 selftests:
 
 * Add #DE coverage in the fastops test (the only exception that's guest-
   triggerable in fastop-emulated instructions).
 
 * Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
   Forest (SRF) and Clearwater Forest (CWF).
 
 * Minor cleanups and improvements
 
 x86 (guest side):
 
 * For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.
 
 * Make kvm_async_pf_task_wake() a local static helper and remove its
   export.
 
 * Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.
 
 Generic:
 
 * Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
   now included in GFP_NOWAIT.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
 FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
 6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
 F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
 i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
 FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
 =fyov
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This excludes the bulk of the x86 changes, which I will send
  separately. They have two not complex but relatively unusual conflicts
  so I will wait for other dust to settle.

  guest_memfd:

   - Add support for host userspace mapping of guest_memfd-backed memory
     for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
     (which isn't precisely the same thing as CoCo VMs, since x86's
     SEV-MEM and SEV-ES have no way to detect private vs. shared).

     This lays the groundwork for removal of guest memory from the
     kernel direct map, as well as for limited mmap() for
     guest_memfd-backed memory.

     For more information see:
       - commit a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD")
       - guest_memfd in Firecracker:
           https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
       - direct map removal:
           https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
       - mmap support:
           https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/

  ARM:

   - Add support for FF-A 1.2 as the secure memory conduit for pKVM,
     allowing more registers to be used as part of the message payload.

   - Change the way pKVM allocates its VM handles, making sure that the
     privileged hypervisor is never tricked into using uninitialised
     data.

   - Speed up MMIO range registration by avoiding unnecessary RCU
     synchronisation, which results in VMs starting much quicker.

   - Add the dump of the instruction stream when panic-ing in the EL2
     payload, just like the rest of the kernel has always done. This
     will hopefully help debugging non-VHE setups.

   - Add 52bit PA support to the stage-1 page-table walker, and make use
     of it to populate the fault level reported to the guest on failing
     to translate a stage-1 walk.

   - Add NV support to the GICv3-on-GICv5 emulation code, ensuring
     feature parity for guests, irrespective of the host platform.

   - Fix some really ugly architecture problems when dealing with debug
     in a nested VM. This has some bad performance impacts, but is at
     least correct.

   - Add enough infrastructure to be able to disable EL2 features and
     give effective values to the EL2 control registers. This then
     allows a bunch of features to be turned off, which helps cross-host
     migration.

   - Large rework of the selftest infrastructure to allow most tests to
     transparently run at EL2. This is the first step towards enabling
     NV testing.

   - Various fixes and improvements all over the map, including one BE
     fix, just in time for the removal of the feature.

  LoongArch:

   - Detect page table walk feature on new hardware

   - Add sign extension with kernel MMIO/IOCSR emulation

   - Improve in-kernel IPI emulation

   - Improve in-kernel PCH-PIC emulation

   - Move kvm_iocsr tracepoint out of generic code

  RISC-V:

   - Added SBI FWFT extension for Guest/VM with misaligned delegation
     and pointer masking PMLEN features

   - Added ONE_REG interface for SBI FWFT extension

   - Added Zicbop and bfloat16 extensions for Guest/VM

   - Enabled more common KVM selftests for RISC-V

   - Added SBI v3.0 PMU enhancements in KVM and perf driver

  s390:

   - Improve interrupt cpu for wakeup, in particular the heuristic to
     decide which vCPU to deliver a floating interrupt to.

   - Clear the PTE when discarding a swapped page because of CMMA; this
     bug was introduced in 6.16 when refactoring gmap code.

  x86 selftests:

   - Add #DE coverage in the fastops test (the only exception that's
     guest- triggerable in fastop-emulated instructions).

   - Fix PMU selftests errors encountered on Granite Rapids (GNR),
     Sierra Forest (SRF) and Clearwater Forest (CWF).

   - Minor cleanups and improvements

  x86 (guest side):

   - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
     overriding guest MTRR for TDX/SNP to fix an issue where ACPI
     auto-mapping could map devices as WB and prevent the device drivers
     from mapping their devices with UC/UC-.

   - Make kvm_async_pf_task_wake() a local static helper and remove its
     export.

   - Use native qspinlocks when running in a VM with dedicated
     vCPU=>pCPU bindings even when PV_UNHALT is unsupported.

  Generic:

   - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
     __GFP_NOWARN is now included in GFP_NOWAIT.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
  KVM: s390: Fix to clear PTE when discarding a swapped page
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
  KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  selftests/kvm: remove stale TODO in xapic_state_test
  KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
  ...
2025-10-04 08:52:16 -07:00
Linus Torvalds 58809f614e drm next for 6.18-rc1
cross-subsystem:
 - i2c-hid: Make elan touch controllers power on after panel is enabled
 - dt bindings for STM32MP25 SoC
 - pci vgaarb: use screen_info helpers
 - rust pin-init updates
 - add MEI driver for late binding firmware update/load
 
 uapi:
 - add ioctl for reassigning GEM handles
 - provide boot_display attribute on boot-up devices
 
 core:
 - document DRM_MODE_PAGE_FLIP_EVENT
 - add vendor specific recovery method to drm device wedged uevent
 
 gem:
 - Simplify gpuvm locking
 
 ttm:
 - add interface to populate buffers
 
 sched:
 - Fix race condition in trace code
 
 atomic:
 - Reallow no-op async page flips
 
 display:
 - dp: Fix command length
 
 video:
 - Improve pixel-format handling for struct screen_info
 
 rust:
 - drop Opaque<> from ioctl args
 - Alloc:
 - BorrowedPage type and AsPageIter traits
 - Implement Vmalloc::to_page() and VmallocPageIter
 - DMA/Scatterlist:
 - Add dma::DataDirection and type alias for dma_addr_t
 - Abstraction for struct scatterlist and sg_table
 - DRM:
 - simplify use of generics
 - add DriverFile type alias
 - drop Object::SIZE
 - Rust:
 - pin-init tree merge
 - Various methods for AsBytes and FromBytes traits
 
 gpuvm:
 - Support madvice in Xe driver
 
 gpusvm:
 - fix hmm_pfn_to_map_order usage in gpusvm
 
 bridge:
 - Improve and fix ref counting on bridge management
 - cdns-dsi: Various improvements to mode setting
 - Support Solomon SSD2825 plus DT bindings
 - Support Waveshare DSI2DPI plus DT bindings
 - Support Content Protection property
 - display-connector: Improve DP display detection
 - Add support for Radxa Ra620 plus DT bindings
 - adv7511: Provide SPD and HDMI infoframes
 - it6505: Replace crypto_shash with sha()
 - synopsys: Add support for DW DPTX Controller plus DT bindings
 - adv7511: Write full Audio infoframe
 - ite6263: Support vendor-specific infoframes
 - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings
 
 panel:
 - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
   Support SHP LQ134Z1; Fixes
 - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
 - Support Samsung AMS561RA01
 - Support Hydis HV101HD1 plus DT bindings
 - ilitek-ili9881c: Refactor mode setting; Add support for Bestar
   BSD1218-A101KL68 LCD plus DT bindings
 - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
 - edp: Add support for additonal mt8189 Chromebook panels
 - lvds: Add DT bindings for EDT ETML0700Z8DHA
 
 amdgpu:
 - add CRIU support for gem objects
 - RAS updates
 - VCN SRAM load fixes
 - EDID read fixes
 - eDP ALPM support
 - Documentation updates
 - Rework PTE flag generation
 - DCE6 fixes
 - VCN devcoredump cleanup
 - MMHUB client id fixes
 - VCN 5.0.1 RAS support
 - SMU 13.0.x updates
 - Expanded PCIe DPC support
 - Expanded VCN reset support
 - VPE per queue reset support
 - give kernel jobs unique id for tracing
 - pre-populate exported buffers
 - cyan skillfish updates
 - make vbios build number available in sysfs
 - userq updates
 - HDCP updates
 - support MMIO remap page as ttm pool
 - JPEG parser updates
 - DCE6 DC updates
 - use devm for i2c buses
 - GPUVM locking updates
 - Drop non-DC DCE11 code
 - improve fallback handling for pixel encoding
 
 amdkfd:
 - SVM/page migration fixes
 - debugfs fixes
 - add CRIO support for gem objects
 - SVM updates
 
 radeon:
 - use dev_warn_once in CS parsers
 
 xe:
 - add madvise interface
 - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
   and memory attributes
 - drop L# bank mask reporting from media GT3 on Xe3+.
 - add SLPC power_profile sysfs interface
 - add configs attribs to add post/mid context-switch commands
 - handle firmware reported hardware errors notifying userspace with
   device wedged uevent
 - use same dir structure across sysfs/debugfs
 - cleanup and future proof vram region init
 - add G-states and PCI link states to debugfs
 - Add SRIOV support for CCS surfaces on Xe2+
 - Enable SRIOV PF mode by default on supported platforms
 - move flush to common code
 - extended core workarounds for Xe2/3
 - use DRM scheduler for delayed GT TLB invalidations
 - configs improvements and allow VF device enablement
 - prep work to expose mmio regions to userspace
 - VF migration support added
 - prepare GPU SVM for THP migration
 - start fixing XE_PAGE_SIZE vs PAGE_SIZE
 - add PSMI support for hw validation
 - resize VF bars to max possible size according to number of VFs
 - Ensure GT is in C0 during resume
 - pre-populate exported buffers
 - replace xe_hmm with gpusvm
 - add more SVM GT stats to debugfs
 - improve fake pci and WA kunnit handle for new platform testing
 - Test GuC to GuC comms to add debugging
 - use attribute groups to simplify sysfs registration
 - add Late Binding firmware code to interact with MEI
 
 i915:
 - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
 - protect against overflow in active_engine()
 - Use try_cmpxchg64() in __active_lookup()
 - include GuC registers in error state
 - get rid of dev->struct_mutex
 - iopoll: generalize read_poll_timout
 - lots more display refactoring
 - Reject HBR3 in any eDP Panel
 - Prune modes for YUV420
 - Display Wa fix, additions, and updates
 - DP: Fix 2.7 Gbps link training on g4x
 - DP: Adjust the idle pattern handling
 - DP: Shuffle the link training code a bit
 - Don't set/read the DSI C clock divider on GLK
 - Enable_psr kernel parameter changes
 - Type-C enabled/disconnected dp-alt sink
 - Wildcat Lake enabling
 - DP HDR updates
 - DRAM detection
 - wait PSR idle on dsb commit
 - Remove FBC modulo 4 restriction for ADL-P+
 - panic: refactor framebuffer allocation
 
 habanalabs:
 - debug/visibility improvements
 - vmalloc-backed coherent mmap support
 - HLDIO infrastructure
 
 nova-core:
 - various register!() macro improvements
 - minor vbios/firmware fixes/refactoring
 - advance firmware boot stages; process Booter and patch signatures
 - process GSP and GSP bootloader
 - Add r570.144 firmware bindings and update to it
 - Move GSP boot code to own module
 - Use new pin-init features to store driver's private data in a single
  allocation
 - Update ARef import from sync::aref
 
 nova-drm:
 - Update ARef import from sync::aref
 
 tyr:
 - initial driver skeleton for a rust driver for ARM Mali GPUs
 - capable of powering up, query metadata and provide it to userspace.
 
 msm:
 - GPU and Core:
 - in DT bindings describe clocks per GPU type
 - GMU bandwidth voting for x1-85
 - a623/a663 speedbins
 - cleanup some remaining no-iommu leftovers after VM_BIND conversion
 - fix GEM obj 32b size truncation
 - add missing VM_BIND param validation
 - IFPC for x1-85 and a750
 - register xml and gen_header.py sync from mesa
 - Display:
 - add missing bindings for display on SC8180X
 - added DisplayPort MST bindings
 - conversion from round_rate() to determine_rate()
 
 amdxdna:
 - add IOCTL_AMDXDNA_GET_ARRAY
 - support user space allocated buffers
 - streamline PM interfaces
 - Refactoring wrt. hardware contexts
 - improve error reporting
 
 nouveau:
 - use GSP firmware by default
 - improve error reporting
 - Pre-populate exported buffers
 
 ast:
 - Clean up detection of DRAM config
 
 exynos:
 - add DSIM bridge driver support for Exynos7870
 - Document Exynos7870 DSIM compatible in dt-binding
 
 panthor:
 - Print task/pid on errors
 - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
 - Improve cache flushing
 - Fail VM bind if BO has offset
 
 renesas:
 - convert to RUNTIME_PM_OPS
 
 rcar-du:
 - Make number of lanes configurable
 - Use RUNTIME_PM_OPS
 - Add support for DSI commands
 
 rocket:
 - Add driver for Rockchip NPU plus DT bindings
 - Use kfree() and sizeof() correctly
 - Test DMA status
 
 rockchip:
 - dsi2: Add support for RK3576 plus DT bindings
 - Add support for RK3588 DPTX output
 
 tidss:
 - Use crtc_ fields for programming display mode
 - Remove other drivers from aperture
 
 pixpaper:
 - Add support for Mayqueen Pixpaper plus DT bindings
 
 v3d:
 - Support querying nubmer of GPU resets for KHR_robustness
 
 stm:
 - Clean up logging
 - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings
 
 sitronix:
 - st7571-i2c: Add support for inverted displays and 2-bit grayscale
 
 tidss:
 - Convert to kernel's FIELD_ macros
 
 vesadrm:
 - Support 8-bit palette mode
 
 imagination:
 - Improve power management
 - Add support for TH1520 GPU
 - Support Risc-V architectures
 
 v3d:
 - Improve job management and locking
 
 vkms:
 - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
 - Spport YUV with 16-bit components
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmjcpjkACgkQDHTzWXnE
 hr7Q7g/5AcxXqLUx7wvmDga9TpzIjDD+C+MOt568RpFQ9cYprI+/86ma7ELCpuNe
 dVgeobxQb/jyhf4acdBU+t5aZz+j8VPhPtIPrPY2kOVDuL1NfeQNS8VmGNpFhR+0
 6hqVrtfvbYdLBrAHrU/V/RwZlBJvI/D/I2QGuvZZwWzCBgYd4u4bGuRyBCvGDxOD
 CTPaEqYyzjvpVuzu7AGQk655WkZQnyPmiezIl2lit1meEMMMv80HePkyWHclZo7Q
 hMqsEasSp5w5Q5EpYqVr1z5IdBAV1O53oor9W573J3kEoB4o1zEsTPfLO4N1dgXo
 bfvc24uW3zyChWY2hWyRKvOzvAoClnjfY6whv9NRP0Qi4UjzhLlNOpmhm9cst/J+
 uj2Nn8UJtyvFJbTmDvoocpgdhq2mkGKdIVhVQ6tG7PjihFmyQRF7PJZjb+0Vee7L
 53F0c4d6HiBI4DHa+lH6fgQUBspIvSfmcnR0ACg29NByib+JEoPSPb4ET+uZ8lLd
 IbQvNiCdnUduYDCKfo5ea/FesP8AXy1KfSa+z7oEEFYHbbkc7PSztUagEyZdS/yS
 FnnYqmo/DidmyM4nxDQUII+UDqjng7fo+l4BzIhL12pR693KzCf0mexMr6SA24ny
 gasN97923OTle1J9xrPrKavkx6WjswZCvOaG7ZbnJB47ydJVu5w=
 =ZVKY
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "cross-subsystem:
   - i2c-hid: Make elan touch controllers power on after panel is
     enabled
   - dt bindings for STM32MP25 SoC
   - pci vgaarb: use screen_info helpers
   - rust pin-init updates
   - add MEI driver for late binding firmware update/load

  uapi:
   - add ioctl for reassigning GEM handles
   - provide boot_display attribute on boot-up devices

  core:
   - document DRM_MODE_PAGE_FLIP_EVENT
   - add vendor specific recovery method to drm device wedged uevent

  gem:
   - Simplify gpuvm locking

  ttm:
   - add interface to populate buffers

  sched:
   - Fix race condition in trace code

  atomic:
   - Reallow no-op async page flips

  display:
   - dp: Fix command length

  video:
   - Improve pixel-format handling for struct screen_info

  rust:
   - drop Opaque<> from ioctl args
   - Alloc:
       - BorrowedPage type and AsPageIter traits
       - Implement Vmalloc::to_page() and VmallocPageIter
   - DMA/Scatterlist:
       - Add dma::DataDirection and type alias for dma_addr_t
       - Abstraction for struct scatterlist and sg_table
   - DRM:
       - simplify use of generics
       - add DriverFile type alias
       - drop Object::SIZE
   - Rust:
       - pin-init tree merge
       - Various methods for AsBytes and FromBytes traits

  gpuvm:
   - Support madvice in Xe driver

  gpusvm:
   - fix hmm_pfn_to_map_order usage in gpusvm

  bridge:
   - Improve and fix ref counting on bridge management
   - cdns-dsi: Various improvements to mode setting
   - Support Solomon SSD2825 plus DT bindings
   - Support Waveshare DSI2DPI plus DT bindings
   - Support Content Protection property
   - display-connector: Improve DP display detection
   - Add support for Radxa Ra620 plus DT bindings
   - adv7511: Provide SPD and HDMI infoframes
   - it6505: Replace crypto_shash with sha()
   - synopsys: Add support for DW DPTX Controller plus DT bindings
   - adv7511: Write full Audio infoframe
   - ite6263: Support vendor-specific infoframes
   - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings

  panel:
   - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
     Support SHP LQ134Z1; Fixes
   - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
   - Support Samsung AMS561RA01
   - Support Hydis HV101HD1 plus DT bindings
   - ilitek-ili9881c: Refactor mode setting; Add support for Bestar
     BSD1218-A101KL68 LCD plus DT bindings
   - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
   - edp: Add support for additonal mt8189 Chromebook panels
   - lvds: Add DT bindings for EDT ETML0700Z8DHA

  amdgpu:
   - add CRIU support for gem objects
   - RAS updates
   - VCN SRAM load fixes
   - EDID read fixes
   - eDP ALPM support
   - Documentation updates
   - Rework PTE flag generation
   - DCE6 fixes
   - VCN devcoredump cleanup
   - MMHUB client id fixes
   - VCN 5.0.1 RAS support
   - SMU 13.0.x updates
   - Expanded PCIe DPC support
   - Expanded VCN reset support
   - VPE per queue reset support
   - give kernel jobs unique id for tracing
   - pre-populate exported buffers
   - cyan skillfish updates
   - make vbios build number available in sysfs
   - userq updates
   - HDCP updates
   - support MMIO remap page as ttm pool
   - JPEG parser updates
   - DCE6 DC updates
   - use devm for i2c buses
   - GPUVM locking updates
   - Drop non-DC DCE11 code
   - improve fallback handling for pixel encoding

  amdkfd:
   - SVM/page migration fixes
   - debugfs fixes
   - add CRIO support for gem objects
   - SVM updates

  radeon:
   - use dev_warn_once in CS parsers

  xe:
   - add madvise interface
   - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
     and memory attributes
   - drop L# bank mask reporting from media GT3 on Xe3+.
   - add SLPC power_profile sysfs interface
   - add configs attribs to add post/mid context-switch commands
   - handle firmware reported hardware errors notifying userspace with
     device wedged uevent
   - use same dir structure across sysfs/debugfs
   - cleanup and future proof vram region init
   - add G-states and PCI link states to debugfs
   - Add SRIOV support for CCS surfaces on Xe2+
   - Enable SRIOV PF mode by default on supported platforms
   - move flush to common code
   - extended core workarounds for Xe2/3
   - use DRM scheduler for delayed GT TLB invalidations
   - configs improvements and allow VF device enablement
   - prep work to expose mmio regions to userspace
   - VF migration support added
   - prepare GPU SVM for THP migration
   - start fixing XE_PAGE_SIZE vs PAGE_SIZE
   - add PSMI support for hw validation
   - resize VF bars to max possible size according to number of VFs
   - Ensure GT is in C0 during resume
   - pre-populate exported buffers
   - replace xe_hmm with gpusvm
   - add more SVM GT stats to debugfs
   - improve fake pci and WA kunnit handle for new platform testing
   - Test GuC to GuC comms to add debugging
   - use attribute groups to simplify sysfs registration
   - add Late Binding firmware code to interact with MEI

  i915:
   - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
   - protect against overflow in active_engine()
   - Use try_cmpxchg64() in __active_lookup()
   - include GuC registers in error state
   - get rid of dev->struct_mutex
   - iopoll: generalize read_poll_timout
   - lots more display refactoring
   - Reject HBR3 in any eDP Panel
   - Prune modes for YUV420
   - Display Wa fix, additions, and updates
   - DP: Fix 2.7 Gbps link training on g4x
   - DP: Adjust the idle pattern handling
   - DP: Shuffle the link training code a bit
   - Don't set/read the DSI C clock divider on GLK
   - Enable_psr kernel parameter changes
   - Type-C enabled/disconnected dp-alt sink
   - Wildcat Lake enabling
   - DP HDR updates
   - DRAM detection
   - wait PSR idle on dsb commit
   - Remove FBC modulo 4 restriction for ADL-P+
   - panic: refactor framebuffer allocation

  habanalabs:
   - debug/visibility improvements
   - vmalloc-backed coherent mmap support
   - HLDIO infrastructure

  nova-core:
   - various register!() macro improvements
   - minor vbios/firmware fixes/refactoring
   - advance firmware boot stages; process Booter and patch signatures
   - process GSP and GSP bootloader
   - Add r570.144 firmware bindings and update to it
   - Move GSP boot code to own module
   - Use new pin-init features to store driver's private data in a
     single allocation
   - Update ARef import from sync::aref

  nova-drm:
   - Update ARef import from sync::aref

  tyr:
   - initial driver skeleton for a rust driver for ARM Mali GPUs
   - capable of powering up, query metadata and provide it to userspace.

  msm:
   - GPU and Core:
      - in DT bindings describe clocks per GPU type
      - GMU bandwidth voting for x1-85
      - a623/a663 speedbins
      - cleanup some remaining no-iommu leftovers after VM_BIND conversion
      - fix GEM obj 32b size truncation
      - add missing VM_BIND param validation
      - IFPC for x1-85 and a750
      - register xml and gen_header.py sync from mesa
   - Display:
      - add missing bindings for display on SC8180X
      - added DisplayPort MST bindings
      - conversion from round_rate() to determine_rate()

  amdxdna:
   - add IOCTL_AMDXDNA_GET_ARRAY
   - support user space allocated buffers
   - streamline PM interfaces
   - Refactoring wrt. hardware contexts
   - improve error reporting

  nouveau:
   - use GSP firmware by default
   - improve error reporting
   - Pre-populate exported buffers

  ast:
   - Clean up detection of DRAM config

  exynos:
   - add DSIM bridge driver support for Exynos7870
   - Document Exynos7870 DSIM compatible in dt-binding

  panthor:
   - Print task/pid on errors
   - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
   - Improve cache flushing
   - Fail VM bind if BO has offset

  renesas:
   - convert to RUNTIME_PM_OPS

  rcar-du:
   - Make number of lanes configurable
   - Use RUNTIME_PM_OPS
   - Add support for DSI commands

  rocket:
   - Add driver for Rockchip NPU plus DT bindings
   - Use kfree() and sizeof() correctly
   - Test DMA status

  rockchip:
   - dsi2: Add support for RK3576 plus DT bindings
   - Add support for RK3588 DPTX output

  tidss:
   - Use crtc_ fields for programming display mode
   - Remove other drivers from aperture

  pixpaper:
   - Add support for Mayqueen Pixpaper plus DT bindings

  v3d:
   - Support querying nubmer of GPU resets for KHR_robustness

  stm:
   - Clean up logging
   - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings

  sitronix:
   - st7571-i2c: Add support for inverted displays and 2-bit grayscale

  tidss:
   - Convert to kernel's FIELD_ macros

  vesadrm:
   - Support 8-bit palette mode

  imagination:
   - Improve power management
   - Add support for TH1520 GPU
   - Support Risc-V architectures

  v3d:
   - Improve job management and locking

  vkms:
   - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
   - Spport YUV with 16-bit components"

* tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits)
  drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
  drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
  drm/amdgpu: update MODULE_PARM_DESC for freesync_video
  drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
  drm/amd/display: Share dce100_validate_global with DCE6-8
  drm/amd/display: Share dce100_validate_bandwidth with DCE6-8
  drm/amdgpu: Fix fence signaling race condition in userqueue
  amd/amdkfd: enhance kfd process check in switch partition
  amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  drm/amd/display: Reject modes with too high pixel clock on DCE6-10
  drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
  drm/amd/display: Only enable common modes for eDP and LVDS
  drm/amdgpu: remove the redeclaration of variable i
  drm/amdgpu/userq: assign an error code for invalid userq va
  drm/amdgpu: revert "rework reserved VMID handling" v2
  drm/amdgpu: remove leftover from enforcing isolation by VMID
  drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
  accel/habanalabs: add Infineon version check
  accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
  accel/habanalabs: add HL_GET_P_STATE passthrough type
  ...
2025-10-02 12:47:25 -07:00
Linus Torvalds e1b1d03cee for-6.18/block-20250929
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmjbLCgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoY0D/9J+11BC88pBxCrLKv/V2TwCNokRMi0dU3L
 r3EUdA46k0oXmvb6ueZqIcfY2e+IX7rdQkaRbh1zRdsNejqHo4548C3ePWGdBAcM
 OdNEGfpehO0aD0td1+mK/NxoJMLhbs5QraPanz+SOkGZOKeF+vGCga5PUDivsr5J
 16T9yb7i+isENLdAc2RJbZVyAphqHQlo5GHi5ZIKOVi5cNt8GU/q2sQl7NYmGvHd
 aq37svvZHFOhLRajP959Fw9WOxEYITewzQ4UYf1FZjUodJUxO+vCnP0ooBQRlyu8
 1B4PYWwSE+Vn3GkQE0Om+mzo9AVPOiLmoAWGxdgJBMyEkZndocr46XEslXOufQ1Z
 T3Gu19G6jCxcyByNVhjVnaajYKmvSQAy1w75m4XlfqTRm4f9Om+LAJavUk3RuaOL
 7lXKQ7Ql1/Tby9Jmf8afjYYXXotNDNku6rz2P3qtOwAA26mNJfgVt0rO+8XGRDe9
 ioLbCkTjslYMc/Oh4jSsbrspsVALbaQMq/Dmah8k0EWb4QAHVgCJyGBoff3hOboI
 jD6B1enaKOQVgcjWcjm/FjOk3jv2h3v4X26YWQZTvEc/1PnSnST78Zi/ePhzDdmt
 sBALUAS37TfTgNMzrhbHl5Zs13k0C0XyANuayuKuo5hlNnC1wbdap+5FZJOmpuOB
 YT+VkYnaOA==
 =kOmc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - NVMe pull request via Keith:
     - FC target fixes (Daniel)
     - Authentication fixes and updates (Martin, Chris)
     - Admin controller handling (Kamaljit)
     - Target lockdep assertions (Max)
     - Keep-alive updates for discovery (Alastair)
     - Suspend quirk (Georg)

 - MD pull request via Yu:
     - Add support for a lockless bitmap.

       A key feature for the new bitmap are that the IO fastpath is
       lockless. If a user issues lots of write IO to the same bitmap
       bit in a short time, only the first write has additional overhead
       to update bitmap bit, no additional overhead for the following
       writes.

       By supporting only resync or recover written data, means in the
       case creating new array or replacing with a new disk, there is no
       need to do a full disk resync/recovery.

 - Switch ->getgeo() and ->bios_param() to using struct gendisk rather
   than struct block_device.

 - Rust block changes via Andreas. This series adds configuration via
   configfs and remote completion to the rnull driver. The series also
   includes a set of changes to the rust block device driver API: a few
   cleanup patches, and a few features supporting the rnull changes.

   The series removes the raw buffer formatting logic from
   `kernel::block` and improves the logic available in `kernel::string`
   to support the same use as the removed logic.

 - floppy arch cleanups

 - Reduce the number of dereferencing needed for ublk commands

 - Restrict supported sockets for nbd. Mostly done to eliminate a class
   of issues perpetually reported by syzbot, by using nonsensical socket
   setups.

 - A few s390 dasd block fixes

 - Fix a few issues around atomic writes

 - Improve DMA interation for integrity requests

 - Improve how iovecs are treated with regards to O_DIRECT aligment
   constraints.

   We used to require each segment to adhere to the constraints, now
   only the request as a whole needs to.

 - Clean up and improve p2p support, enabling use of p2p for metadata
   payloads

 - Improve locking of request lookup, using SRCU where appropriate

 - Use page references properly for brd, avoiding very long RCU sections

 - Fix ordering of recursively submitted IOs

 - Clean up and improve updating nr_requests for a live device

 - Various fixes and cleanups

* tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits)
  s390/dasd: enforce dma_alignment to ensure proper buffer validation
  s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request
  ublk: remove redundant zone op check in ublk_setup_iod()
  nvme: Use non zero KATO for persistent discovery connections
  nvmet: add safety check for subsys lock
  nvme-core: use nvme_is_io_ctrl() for I/O controller check
  nvme-core: do ioccsz/iorcsz validation only for I/O controllers
  nvme-core: add method to check for an I/O controller
  blk-cgroup: fix possible deadlock while configuring policy
  blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path
  blk-mq: Fix more tag iteration function documentation
  selftests: ublk: fix behavior when fio is not installed
  ublk: don't access ublk_queue in ublk_unmap_io()
  ublk: pass ublk_io to __ublk_complete_rq()
  ublk: don't access ublk_queue in ublk_need_complete_req()
  ublk: don't access ublk_queue in ublk_check_commit_and_fetch()
  ublk: don't pass ublk_queue to ublk_fetch()
  ublk: don't access ublk_queue in ublk_config_io_buf()
  ublk: don't access ublk_queue in ublk_check_fetch_buf()
  ublk: pass q_id and tag to __ublk_check_and_get_req()
  ...
2025-10-02 10:16:56 -07:00
Linus Torvalds 7601d18be0 A set of changes to consolidate the generic TIF bits accross architectures
All architectures define the same set of generic TIF bits. This makes it
 pointlessly hard to add a new generic TIF bit or to change an existing one.
 
 Provide a generic variant and convert the architectures which utilize the
 generic entry code over to use it. The TIF space is divided into 16 generic
 bits and 16 architecture specific bits, which turned out to provide enough
 space on both sides.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaP78THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTKyD/9NEg3DiN6aM959o1MhvhDk7jNdECXm
 QOvZ85wJYhCv3Pb7RSgMxJL/dR9aWV2b24TSzEeki0awbR4nPeUz82vExmgS7rWX
 9rQLPrOZsrYd76IcAVoV5Ua/g48c+9TM8kLzoZFa9JEYMPmyTEiA3gy4bgab/aov
 L2b903ZrCNkWRKM1Wz5V8xdyPEzhE+cLhbWoPbeCqfzxqbv4+WWKQlPmqamQw+yq
 /61Xhq0tmx3+4hn1IB/Rc8yMTbAK0EwN7SHM+l7yaJ3ijlkGSele4S3mKAHv2I3c
 vODIFwdQ8pRbC1C5eMBnUKRm7Cmf+8CB3m+OIA5ghj10TPFiTUzQ6iG6nUSVniJm
 QB21LHYSWroeQBRibnT5k7RiW5QjtTQmcsjvO7S2rZ/7CkMr7LMu6kN1P0TSiOvc
 SKxs3MO72KaRU/JrEjLqvT2tvdpg2hpffg69U0jA1xCeFULE1jrqo3GwL8dPDk7z
 zKbC73JNg4QJDdi+hIn5nl0fRGVszLzkDum5eyCpCLY/W7BSiQ7q/ayzt9upsSOm
 uc7sqeIgelQMRDMoMNcQUsMnApT0JHOS74WQ03SfahZESj8eFoOb8pr7vaqu4lfi
 6LV4fpwZPBTMDcQ36r2JLuUTqHNHNtWn4xXjQ72ngovIVUL9A2H0DqK+1JhLJ6yX
 tknZ9WVmWVf+JQ==
 =6L98
 -----END PGP SIGNATURE-----

Merge tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull TIF bit unification updates from Thomas Gleixner:
 "A set of changes to consolidate the generic TIF (thread info flag)
  bits accross architectures.

  All architectures define the same set of generic TIF bits. This makes
  it pointlessly hard to add a new generic TIF bit or to change an
  existing one.

  Provide a generic variant and convert the architectures which utilize
  the generic entry code over to use it. The TIF space is divided into
  16 generic bits and 16 architecture specific bits, which turned out to
  provide enough space on both sides"

* tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  LoongArch: Fix bitflag conflict for TIF_FIXADE
  riscv: Use generic TIF bits
  loongarch: Use generic TIF bits
  s390/entry: Remove unused TIF flags
  s390: Use generic TIF bits
  x86: Use generic TIF bits
  asm-generic: Provide generic TIF infrastructure
2025-09-30 14:36:20 -07:00
Linus Torvalds 22bdd6e68b - Add functionality to provide runtime firmware updates for the non-x86 parts
of an AMD platform like the security processor (ASP) firmware, modules
   etc, for example. The intent being that these updates are interim,
   live fixups before a proper BIOS update can be attempted
 
 - Add guest support for AMD's Secure AVIC feature which gives encrypted
   guests the needed protection against a malicious hypervisor generating
   unexpected interrupts and injecting them into such guest, thus
   interfering with its operation in an unexpected and negative manner.
   The advantage of this scheme is that the guest determines which
   interrupts and when to accept them vs leaving that to the benevolence
   (or not) of the hypervisor
 
 - Strictly separate the startup code from the rest of the kernel where
   former is executed from the initial 1:1 mapping of memory. The problem
   was that the toolchain-generated version of the code was being
   executed from a different mapping of memory than what was "assumed"
   during code generation, needing an ever-growing pile of fixups for
   absolute memory references which are invalid in the early, 1:1 memory
   mapping during boot.
 
   The major advantage of this is that there's no need to check the 1:1
   mapping portion of the code for absolute relocations anymore and get
   rid of the RIP_REL_REF() macro sprinkling all over the place.
 
   For more info, see Ard's very detailed writeup on this:
   https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB%2BHxvfQbFz0sTiHdn4B%2B%2BzVBJ2mhkPkQ4Q@mail.gmail.com
 
 - The usual cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWaaMACgkQEsHwGGHe
 VUr/RBAAnfneG+5U7f5x+hYW68mwZu07eoQn02IW9WGb2xjV6LKYxzDqyEj/+l+x
 jgN77i1uhl/4sqqKBvUjFfgot1gQ0g6M2fok2eZscSf+AHZF+LfDJPl4dFheVENo
 KtPieu1yi2bA+stL9JgaKh0I1ELX40qebXeZY4H4rYVzokHG0H+CEcuhv6Es71bW
 1C6efkZKHS3pAhlRUoa2MZagxnw+3mn9bfZDvSSNNM6I4qy9/CAPZlWw0jGrXKQX
 K/gjBI2KcoqK2bdJtCQsTvbrsuBedjkM6BZveAAhvOVCh6Aq6lnbqirJPJX8WJLq
 bIDAdsWGJ1vOzcgiPwT0e3qsfaTWep6MewcAQ/HnzrksH+IFb7J/l9awUgGY6LFh
 GzG7KPEKIWiLOxYFC+gLxRn8SWhcXHeY/fB8i5OOnhnikODWG4bJtM8F1MTQO4O1
 u2UuZ+wNzgdatJDXmLK1eluyuhkIqCZ7Hd8kpE0Zr32rbipEvuxnUPSyMzfhDM9M
 +UJGm3C205vPU6doRG8X0+EosFGCyZcixQNXhOugmedT5g3XGHHoJtiLj2i29jLN
 Xi0npxh2hwBe6N+WcIRnOfonFTsp6wWYatWPnGWTChpe+OGj9ZISXpmxnFUVCSag
 spG1J+upBA7ck1exuwpS3ldNSiw/066iTxB7Ht02vbeQ4JXIF6M=
 =shVa
 -----END PGP SIGNATURE-----

Merge tag 'x86_apic_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV and apic updates from Borislav Petkov:

 - Add functionality to provide runtime firmware updates for the non-x86
   parts of an AMD platform like the security processor (ASP) firmware,
   modules etc, for example. The intent being that these updates are
   interim, live fixups before a proper BIOS update can be attempted

 - Add guest support for AMD's Secure AVIC feature which gives encrypted
   guests the needed protection against a malicious hypervisor
   generating unexpected interrupts and injecting them into such guest,
   thus interfering with its operation in an unexpected and negative
   manner.

   The advantage of this scheme is that the guest determines which
   interrupts and when to accept them vs leaving that to the benevolence
   (or not) of the hypervisor

 - Strictly separate the startup code from the rest of the kernel where
   former is executed from the initial 1:1 mapping of memory.

   The problem was that the toolchain-generated version of the code was
   being executed from a different mapping of memory than what was
   "assumed" during code generation, needing an ever-growing pile of
   fixups for absolute memory references which are invalid in the early,
   1:1 memory mapping during boot.

   The major advantage of this is that there's no need to check the 1:1
   mapping portion of the code for absolute relocations anymore and get
   rid of the RIP_REL_REF() macro sprinkling all over the place.

   For more info, see Ard's very detailed writeup on this [1]

 - The usual cleanups and fixes

Link: https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB%2BHxvfQbFz0sTiHdn4B%2B%2BzVBJ2mhkPkQ4Q@mail.gmail.com [1]

* tag 'x86_apic_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  x86/boot: Drop erroneous __init annotation from early_set_pages_state()
  crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver
  crypto: ccp - Add new HV-Fixed page allocation/free API
  x86/sev: Add new dump_rmp parameter to snp_leak_pages() API
  x86/startup/sev: Document the CPUID flow in the boot #VC handler
  objtool: Ignore __pi___cfi_ prefixed symbols
  x86/sev: Zap snp_abort()
  x86/apic/savic: Do not use snp_abort()
  x86/boot: Get rid of the .head.text section
  x86/boot: Move startup code out of __head section
  efistub/x86: Remap inittext read-execute when needed
  x86/boot: Create a confined code area for startup code
  x86/kbuild: Incorporate boot/startup/ via Kbuild makefile
  x86/boot: Revert "Reject absolute references in .head.text"
  x86/boot: Check startup code for absence of absolute relocations
  objtool: Add action to check for absence of absolute relocations
  x86/sev: Export startup routines for later use
  x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
  x86/sev: Provide PIC aliases for SEV related data objects
  x86/boot: Provide PIC aliases for 5-level paging related constants
  ...
2025-09-30 13:40:35 -07:00
Linus Torvalds 2cb8eeaf00 - Add support on AMD for assigning QOS bandwidth counters to resources
(RMIDs) with the ability for those resources to be tracked by the
   counters as long as they're assigned to them. Previously, due to hw
   limitations, bandwidth counts from untracked resources would get lost
   when those resources are not tracked. Refactor the code and user
   interfaces to be able to also support other, similar features on ARM,
   for example
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWZ+cACgkQEsHwGGHe
 VUoRrRAAjy/bn0gQCJPPySUfUa7rTG/qiriI6Ax9tc09X8ahspeUQ0IHCK1RWNwm
 tpzbsrTQImA8AeBkNbnD6ALKGSZdfL1l0hMZkKr1Htg35JoHSDMIalxab+sYCfg8
 cysw0tmEKQce1wC8PtP4iX6ENbci4thS6td/t8rWcWTg1ndfOF9eTKHtZvZ/W0Qr
 zJ8MnBIqNhPMV0VEUxacmLuqPGnyWuXyIXzX9nHTc3fjSzTc94jqIB20wFBRFPAx
 EVFivq/mpu3uXXvvumuZZzXuqgBTIMMx2KjABw+5LNU9BIvkzoFwYZvRlUoU9es8
 ASSBddt2jvYyHLal8MUXYieIHJAESKOe2apxtP74ZJIR0t7pUswobPyYO0GEseuQ
 +H2QbLlmJGNO3kJZUuSUGZ9yfjW4JsqM5UlowlVoHGvAOXIW+wR6r1tbdErfima9
 rkqwUCzCXJg3UJ+a6xrK7hhc7QsbIHoo8mHBD+q5I1Hoipwoqe7j6YV80Oely/IC
 Gy3mFQ3e/KwRplfO7qHT88sfAAh+J+TmsVQTHlM62wrnjQOOPcMm7r9m0A2eTOme
 7ZlvcU2q9nAJBaA8ylXbBt4l2AcFKaJg8btskWiZLAXGIjAb6B+5uepmwPRavY5Q
 NdiVa3pk7vGDl9oJ2DJIFKbq3adr50TCCarUM0304s3BsQvKnX4=
 =2FMs
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:
 "Add support on AMD for assigning QoS bandwidth counters to resources
  (RMIDs) with the ability for those resources to be tracked by the
  counters as long as they're assigned to them.

  Previously, due to hw limitations, bandwidth counts from untracked
  resources would get lost when those resources are not tracked.

  Refactor the code and user interfaces to be able to also support
  other, similar features on ARM, for example"

* tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled
  MAINTAINERS: resctrl: Add myself as reviewer
  x86/resctrl: Configure mbm_event mode if supported
  fs/resctrl: Introduce the interface to switch between monitor modes
  fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
  fs/resctrl: Introduce the interface to modify assignments in a group
  fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  fs/resctrl: Auto assign counters on mkdir and clean up on group removal
  fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
  fs/resctrl: Provide interface to update the event configurations
  fs/resctrl: Add event configuration directory under info/L3_MON/
  fs/resctrl: Support counter read/reset with mbm_event assignment mode
  x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()
  x86/resctrl: Refactor resctrl_arch_rmid_read()
  fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode
  fs/resctrl: Pass struct rdtgroup instead of individual members
  fs/resctrl: Add the functionality to unassign MBM events
  fs/resctrl: Add the functionality to assign MBM events
  x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  fs/resctrl: Introduce event configuration field in struct mon_evt
  ...
2025-09-30 13:29:42 -07:00
Linus Torvalds a65879b458 - Make UMIP instruction detection more robust
- Correct and cleanup AMD CPU topology detection; document the relevant
   CPUID leaves topology parsing precedence on AMD
 
 - Add support for running the kernel as guest on FreeBSD's Bhyve
   hypervisor
 
 - Cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWZx4ACgkQEsHwGGHe
 VUqskRAAuhsxDxqbA/HnHlDkLM2YgfpE8j9XkFen9R1kFNi8WtQ7DPNdA9SeDJF2
 NpRuZZpzpQrmHwwlERHzOY6AxwWJXdK2lA2GF/jkdvB9lRDAwlb4nenBZNnTL1zo
 NDctnKPto1Nz1fOLFl3Hve++PoOkdzOQZZPQ7LYS5wm6zaRv2O5h0s6Rfw7Kt9YC
 Sl+WYFZ3SA8KOoaHZKhGcgsLNkxv0lV7oXyhSQ1MXT3rFkG2jt/IzjwCyOVK50bj
 iob6MOfpuM9SG2ZMGBL4P82GBUG1E3BMhqiLbYYxyuMdxRgt/t3zlK4x/yjNbOET
 3iQM+aimqOlHKmnJm/qzs2rjRhYQmBjNjIPcuCoiVVG0U6er6VKl+x2wLdAPlTq4
 Du6Oj7veEOLF9lAMQOq/9ZeG7IVlQT1xJ5RwtMpnZKhnlStnFlyDgPAAYs88L3Uf
 aAga/XdLd40mxOj/z3+2Fn7snOHNK/79NWB35DFOjjivNyQXMgk9hPxibECb+9bO
 sXFwhFfujFI+X7UZbfISoaTLq7c/D0EV9uqIEYjbpFXhxTXgMcey/tMDWxCLnGyl
 camkN8An1PmCsZdD9vud4QhuDqPhX6S8ndyK1C/EaStxT9t76sOSpEpybeee0xxq
 eiktBLp0uqlyK6Oo7J0/LcBnNRrYFeipU6HwztY3bz3VH/ejYWM=
 =sMTE
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Make UMIP instruction detection more robust

 - Correct and cleanup AMD CPU topology detection; document the relevant
   CPUID leaves topology parsing precedence on AMD

 - Add support for running the kernel as guest on FreeBSD's Bhyve
   hypervisor

 - Cleanups and improvements

* tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases)
  x86/umip: Check that the instruction opcode is at least two bytes
  Documentation/x86/topology: Detail CPUID leaves used for topology enumeration
  x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR
  x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology
  x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info
  x86/cpu: Rename and move CPU model entry for Diamond Rapids
  x86/cpu: Detect FreeBSD Bhyve hypervisor
2025-09-30 13:19:08 -07:00
Linus Torvalds d7ec0cf1cd - Add VMSCAPE to the attack vector controls infrastructure
- A bunch of the usual cleanups and fixlets, some of them resulting from
   fuzzing the different mitigation options
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWZqMACgkQEsHwGGHe
 VUrMvA//dT7J2fiKEYrKa2MSMAG9/4hvLGehiyjPdlSuMVjgrpNK6+gvYBWWSkO8
 OO2rfx7a3dWZ78qG7dSgk3hnC3X+t0BS3fLq0s2Nu35oITD6TC0nOkETg6xVePqW
 mvYWnqG6CqtNEDIjm7lbUZ5ChjeWdjjBXiYhIu4w1Ev1c1GTcBT02fshd5r+ZsRE
 hcXzh5w50T5ZyOwOtdiv9I8oSXcrg4cjEExZCSefjd0h5Xd0s9SUXyk9PgBzjTAX
 QY6liuzg99YPQ+oCv65gNxfhIfrB9WGuWt2RkGtdFn6hOE+3x5fWuxGXeCKI15GM
 SWrxwIJeTqksSviL8kdTJ5IXLpmVJRGAJeX5rjPHc1CDMHGyeiPqJpZTUqeHUTi/
 hOGrgs+zvPx9rzDKrORPoC3W9kOV5Z1xF3igFQyVDuS2ctbIqlwoz5618oSveTAM
 XaQik8oFNJJxpNA4LXe8meYQWa6BF0wfLD9i2jAio5iQpkrvzbrvm6UpVRqkuZbY
 EHi0K2QD6RwRbICqbPuUh/lJA7a6NtLUYyC4HMctYw/7VmTHxaNo9Zod3yutKfHr
 7xM7T9g1iS9oo2o316Q5n1m0SuXmp58Y/GEUmxaO8WwbLCMoQt7Iicx1CqeVsgKP
 Ws6GZJ6UXzzW8aBmxa0OHB8xviOPYh4v4V7el41Uitw0XqERqdM=
 =ALng
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mitigation updates from Borislav Petkov:

 - Add VMSCAPE to the attack vector controls infrastructure

 - A bunch of the usual cleanups and fixlets, some of them resulting
   from fuzzing the different mitigation options

* tag 'x86_bugs_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Report correct retbleed mitigation status
  x86/bugs: Fix reporting of LFENCE retpoline
  x86/bugs: Fix spectre_v2 forcing
  x86/bugs: Remove uses of cpu_mitigations_off()
  x86/bugs: Simplify SSB cmdline parsing
  x86/bugs: Use early_param() for spectre_v2
  x86/bugs: Use early_param() for spectre_v2_user
  x86/bugs: Add attack vector controls for VMSCAPE
  x86/its: Move ITS indirect branch thunks to .text..__x86.indirect_thunk
2025-09-30 12:46:57 -07:00
Linus Torvalds d9c43b6e43 - Unify and refactor the MCA arch side and better separate code
- Cleanup and simplify the AMD RAS side, unify code, drop unused stuff
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWZhQACgkQEsHwGGHe
 VUqyWBAAndQgG8ZIH2kALrinJn4zBswHtJ0T5a5viadZtmiQBL2Z25ZeMGo22jJF
 zkHb2FOsXIZmOZUkW6UdU3O+4EdlYWsflwDpPZuGnwsSsXYyvT/eSvhkZ3BsBjgI
 OHAxG/1HHZaAz2SfGMFu63xm05+X/d9f5jprU5fa9Obf52VraTM8X5hPa4IQ1Q1B
 UT25TjYaR4IiTy/2hBfKRSFyrOqYDRNkwhoRJkxtJ0uLLZcjvolbc+WL+zVfSd/2
 JC65bTOxpmlImDl8J02y1QD3m0XwTFxdbi4LhpVa1qCu28nnk6HBXEZizr/P3foM
 8HsNRg6vSDDtW4jwy8uyr/AgTCnLj/GTP6wPU49sEJfsCN0XqeIbibu5NVygKQb8
 L+RhEqG7wdbzd1vLc858lZKl1wSx2rFOTE5xz0weTS5eSIFpiTYvsQtoJPjN5SOH
 rl1iwDtYhHXrGEYcel+af3zJrzJGB7+e5dVRPRicbPw1F0+Ty7Zdd04yk5VRV5LV
 XndmK8JqWQPtZ+m8W3I6CznzAlzRUKtdm/J6sst9HQSU2Rk4CHqABi/Kgff2t7A4
 ZjXYoLH9zVcM1R/unMsjqg7u4xLeB5x5vAOBlK9ruQSVZc03W+XAXPBD2a+6yXTb
 rPzDkV9aT77Sipk8cHR2shxu6Aw5HazRZmw5Gsid+8Uy9VFedPE=
 =+njZ
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - Unify and refactor the MCA arch side and better separate code

 - Cleanup and simplify the AMD RAS side, unify code, drop unused stuff

* tag 'ras_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Add a clear_bank() helper
  x86/mce: Move machine_check_poll() status checks to helper functions
  x86/mce: Separate global and per-CPU quirks
  x86/mce: Do 'UNKNOWN' vendor check early
  x86/mce: Define BSP-only SMCA init
  x86/mce: Define BSP-only init
  x86/mce: Set CR4.MCE last during init
  x86/mce: Remove __mcheck_cpu_init_early()
  x86/mce: Cleanup bank processing on init
  x86/mce/amd: Put list_head in threshold_bank
  x86/mce/amd: Remove smca_banks_map
  x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device()
  x86/mce/amd: Rename threshold restart function
2025-09-30 12:43:17 -07:00
Linus Torvalds 45d96dd2c6 - Fix RDPID's output operand size in inline asm and use the insn
mnemonic because the minimum binutils version supports it
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWZUoACgkQEsHwGGHe
 VUpFiBAAjaGbb700+W4qCbdtXHkZqdr/1al4LrxhBzVda6DJTYCIXSLjT086rTsr
 kZ424g/XAw5YhQZBeVbUoYDe1tAfi14yskjJX5rwxD9kwYcoPRTw5Y68V5xc6TBt
 MpG2M5XIJDYG3A0uRjSSrj6zDsN13hxJZr78otmMG8mJo+rejhoVl4QTi0KxHPhw
 dLeu3K8MRtzAwrcuv8J/u63PJPnGeJrS0Dsf+Lu5+tdAIvcmyAkflDd5pEIiVngi
 kXgAZsZrCGjG6UrOirNnW5mzOLQm81N2DTv9A/eJCaCpi+dB0oY4xvy9ilpM6yxE
 kVswTY3loKGvxXpjBSMdA5D6hen1pfqdiYyc0YxD9JvjROxAYQqX/wH26dH24Fjn
 kJY+S9QHt+OLDNsUER901Dk/UKeJpMJw3nzbdiAfhDouAK2nMU3h53W5ILVlMGIR
 l0fH1ryxLLvm4BA2OOK2DTFccNEtO3vGl8NbiIKHI840lsGZldGfmszmM2egxhik
 gptRS+Xz0SsTsCbni0jJ2TZc55rXWfqm1kGJxtuE/+KoR/Xwi1XrlhFpGnUOZiJO
 vdWQZjF4eAFgDVQI0cJ8YFUTzZj/r6cvhhBo7kNa+5zKzul70R3zS03qp+pDHkGW
 1JkmvSHo+XsxQxO5H7A26RmfEmjqGIDgbZ93PYBWvy5BE2veCPY=
 =VSzx
 -----END PGP SIGNATURE-----

Merge tag 'x86_asm_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm update from Borislav Petkov:

 - Fix RDPID's output operand size in inline asm and use the insn
   mnemonic because the minimum binutils version supports it

* tag 'x86_asm_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix output operand size of RDPID
2025-09-30 12:19:29 -07:00
Linus Torvalds 98afd4dd3d - Add instruction decoding support for the XOP-prefixed instruction set
present on the AMD Bulldozer uarch
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmjWZM8ACgkQEsHwGGHe
 VUpDwQ//UeVlmtFqIK/KJSoc0/WrL34D7ICqoKuo9ocLoYHrGLNxinV18GhtJ4vD
 o1pmuwcLj6fIEIjTvlNSvs1/mtB7XYNRMm/W21fOLAyhRARg1P3Mi+crpMB/HrfA
 Lowx+YjKYFRo6NUaNARCAfgnkkFz9w715o5ab5rr3yxkqeVSyYUz7Rl2eIDc03vl
 HYEarpRFhrvwi8ccxoz1xKECehmvW0Htn8QzaHGrJkOf/gfWJdIz5KmjFDONt8mM
 AdMVTZ49lM3fzoYRpsr2+x5oHWfjMw1nRYyJliLhriVDga03FNzj6FL+AHt3itvG
 2wFqJfYxOETMjdvSHAQHkOZUu7ZSKFNIwtYdqsdklTPNB5AVNE2ANd21vY2+wgUP
 P5aIXNo7SEIaUSxkE7JfUubJTn/lRExZOkgZx/fwYeFbF1Wc7e5w9Q5bhxq70s1A
 1E0iSAPR30t9HvveDCDLBGqbNAfNsBN0E3m4EULRcEwKZ3P7OvknddBHpVjnwEG+
 +OgDf/mJQt3hW6ubvYHvgEhOYyEnc92z9YEtqFiGG6bRx0ppqO+K/7FE7vBrUp7a
 fDRsyYtWHLQVOZTmeStK7aeslM9BGnsmhACy566oaX1C11f2I7KLYQAI1apKb8A4
 y+NjQUebmTBmlKx4J+QnKBOhGNbl1xbSFJPIoNM0WIlm410Od48=
 =mUtz
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 instruction decoder update from Borislav Petkov:

 - Add instruction decoding support for the XOP-prefixed instruction set
   present on the AMD Bulldozer uarch

[ These instructions don't normally happen, but a X86_NATIVE_CPU build
  on a bulldozer host can make the compiler then use these unusual
  instruction encodings ]

* tag 'x86_misc_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Add XOP prefix instructions decoder support
2025-09-30 11:48:33 -07:00
Linus Torvalds e4dcbdff11 Performance events updates for v6.18:
Core perf code updates:
 
  - Convert mmap() related reference counts to refcount_t. This
    is in reaction to the recently fixed refcount bugs, which
    could have been detected earlier and could have mitigated
    the bug somewhat. (Thomas Gleixner, Peter Zijlstra)
 
  - Clean up and simplify the callchain code, in preparation
    for sframes. (Steven Rostedt, Josh Poimboeuf)
 
 Uprobes updates:
 
  - Add support to optimize usdt probes on x86-64, which
    gives a substantial speedup. (Jiri Olsa)
 
  - Cleanups and fixes on x86 (Peter Zijlstra)
 
 PMU driver updates:
 
  - Various optimizations and fixes to the Intel PMU driver
    (Dapeng Mi)
 
 Misc cleanups and fixes:
 
  - Remove redundant __GFP_NOWARN (Qianfeng Rong)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmjWpGIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iHvxAAvO8qWbbhUdF3EZaFU0Wx6oh5KBhImU49
 VZ107xe9llA0Szy3hIl1YpdOQA2NAHtma6We/ebonrPVTTkcSCGq8absc+GahA3I
 CHIomx2hjD0OQ01aHvTqgHJUdFUQQ0yzE3+FY6Tsn05JsNZvDmqpAMIoMQT0LuuG
 7VvVRLBuDXtuMtNmGaGCvfDGKTZkGGxD6iZS1iWHuixvVAz4IECK0vYqSyh31UGA
 w9Jwa0thwjKm2EZTmcSKaHSM2zw3N8QXJ3SNPPThuMrtO6QDz2+3Da9kO+vhGcRP
 Jls9KnWC2wxNxqIs3dr80Mzn4qMplc67Ekx2tUqX4tYEGGtJQxW6tm3JOKKIgFMI
 g/KF9/WJPXp0rVI9mtoQkgndzyswR/ZJBAwfEQu+nAqlp3gmmQR9+MeYPCyNnyhB
 2g22PTMbXkihJmRPAVeH+WhwFy1YY3nsRhh61ha3/N0ULXTHUh0E+hWwUVMifYSV
 SwXqQx4srlo6RJJNTji1d6R3muNjXCQNEsJ0lCOX6ajVoxWZsPH2x7/W1A8LKmY+
 FLYQUi6X9ogQbOO3WxCjUhzp5nMTNA2vvo87MUzDlZOCLPqYZmqcjntHuXwdjPyO
 lPcfTzc2nK1Ud26bG3+p2Bk3fjqkX9XcTMFniOvjKfffEfwpAq4xRPBQ3uRlzn0V
 pf9067JYF+c=
 =sVXH
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Core perf code updates:

   - Convert mmap() related reference counts to refcount_t. This is in
     reaction to the recently fixed refcount bugs, which could have been
     detected earlier and could have mitigated the bug somewhat (Thomas
     Gleixner, Peter Zijlstra)

   - Clean up and simplify the callchain code, in preparation for
     sframes (Steven Rostedt, Josh Poimboeuf)

  Uprobes updates:

   - Add support to optimize usdt probes on x86-64, which gives a
     substantial speedup (Jiri Olsa)

   - Cleanups and fixes on x86 (Peter Zijlstra)

  PMU driver updates:

   - Various optimizations and fixes to the Intel PMU driver (Dapeng Mi)

  Misc cleanups and fixes:

   - Remove redundant __GFP_NOWARN (Qianfeng Rong)"

* tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  selftests/bpf: Fix uprobe_sigill test for uprobe syscall error value
  uprobes/x86: Return error from uprobe syscall when not called from trampoline
  perf: Skip user unwind if the task is a kernel thread
  perf: Simplify get_perf_callchain() user logic
  perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL
  perf: Have get_perf_callchain() return NULL if crosstask and user are set
  perf: Remove get_perf_callchain() init_nr argument
  perf/x86: Print PMU counters bitmap in x86_pmu_show_pmu_cap()
  perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
  perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48)
  perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag
  perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
  perf/x86/intel: Use early_initcall() to hook bts_init()
  uprobes: Remove redundant __GFP_NOWARN
  selftests/seccomp: validate uprobe syscall passes through seccomp
  seccomp: passthrough uprobe systemcall without filtering
  selftests/bpf: Fix uprobe syscall shadow stack test
  selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
  selftests/bpf: Add uprobe_regs_equal test
  selftests/bpf: Add optimized usdt variant for basic usdt test
  ...
2025-09-30 11:11:21 -07:00
Sean Christopherson d273b52b6f KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
Move kvm_intr_is_single_vcpu() to lapic.c, drop its export, and make its
"fast" helper local to lapic.c.  kvm_intr_is_single_vcpu() is only usable
if the local APIC is in-kernel, i.e. it most definitely belongs in the
local APIC code.

No functional change intended.

Fixes: cf04ec393e ("KVM: x86: Dedup AVIC vs. PI code for identifying target vCPU")
Link: https://lore.kernel.org/r/20250919003303.1355064-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-30 13:40:02 -04:00
Sean Christopherson 20c4892058 KVM: Export KVM-internal symbols for sub-modules only
Rework the vast majority of KVM's exports to expose symbols only to KVM
submodules, i.e. to x86's kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
With few exceptions, KVM's exported APIs are intended (and safe) for KVM-
internal usage only.

Keep kvm_get_kvm(), kvm_get_kvm_safe(), and kvm_put_kvm() as normal
exports, as they are needed by VFIO, and are generally safe for external
usage (though ideally even the get/put APIs would be KVM-internal, and
VFIO would pin a VM by grabbing a reference to its associated file).

Implement a framework in kvm_types.h in anticipation of providing a macro
to restrict KVM-specific kernel exports, i.e. to provide symbol exports
for KVM if and only if KVM is built as one or more modules.

Link: https://lore.kernel.org/r/20250919003303.1355064-3-seanjc@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-30 13:40:02 -04:00
Paolo Bonzini 12abeb81c8 KVM x86 CET virtualization support for 6.18
Add support for virtualizing Control-flow Enforcement Technology (CET) on
 Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
 
 CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
 Branch Tracking (IBT), that can be utilized by software to help provide
 Control-flow integrity (CFI).  SHSTK defends against backward-edge attacks
 (a.k.a. Return-oriented programming (ROP)), while IBT defends against
 forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).
 
 Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
 flow to unauthorized targets in order to execute small snippets of code,
 a.k.a. gadgets, of the attackers choice.  By chaining together several gadgets,
 an attacker can perform arbitrary operations and circumvent the system's
 defenses.
 
 SHSTK defends against backward-edge attacks, which execute gadgets by modifying
 the stack to branch to the attacker's target via RET, by providing a second
 stack that is used exclusively to track control transfer operations.  The
 shadow stack is separate from the data/normal stack, and can be enabled
 independently in user and kernel mode.
 
 When SHSTK is is enabled, CALL instructions push the return address on both the
 data and shadow stack. RET then pops the return address from both stacks and
 compares the addresses.  If the return addresses from the two stacks do not
 match, the CPU generates a Control Protection (#CP) exception.
 
 IBT defends against backward-edge attacks, which branch to gadgets by executing
 indirect CALL and JMP instructions with attacker controlled register or memory
 state, by requiring the target of indirect branches to start with a special
 marker instruction, ENDBRANCH.  If an indirect branch is executed and the next
 instruction is not an ENDBRANCH, the CPU generates a #CP.  Note, ENDBRANCH
 behaves as a NOP if IBT is disabled or unsupported.
 
 From a virtualization perspective, CET presents several problems.  While SHSTK
 and IBT have two layers of enabling, a global control in the form of a CR4 bit,
 and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
 respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
 Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
 option due to complexity, and outright disallowing use of XSTATE to context
 switch SHSTK/IBT state would render the features unusable to most guests.
 
 To limit the overall complexity without sacrificing performance or usability,
 simply ignore the potential virtualization hole, but ensure that all paths in
 KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
 hardware, and the guest has access to at least one of SHSTK or IBT.  I.e. allow
 userspace to advertise one of SHSTK or IBT if both are supported in hardware,
 even though doing so would allow a misbehaving guest to use the unadvertised
 feature.
 
 Fully emulating SHSTK and IBT would also require significant complexity, e.g.
 to track and update branch state for IBT, and shadow stack state for SHSTK.
 Given that emulating large swaths of the guest code stream isn't necessary on
 modern CPUs, punt on emulating instructions that meaningful impact or consume
 SHSTK or IBT.  However, instead of doing nothing, explicitly reject emulation
 of such instructions so that KVM's emulator can't be abused to circumvent CET.
 Disable support for SHSTK and IBT if KVM is configured such that emulation of
 arbitrary guest instructions may be required, specifically if Unrestricted
 Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
 is smaller than host.MAXPHYADDR.
 
 Lastly disable SHSTK support if shadow paging is enabled, as the protections
 for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
 that they can't be directly modified by software), i.e. would require
 non-trivial support in the Shadow MMU.
 
 Note, AMD CPUs currently only support SHSTK.  Explicitly disable IBT support
 so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
 in SVM requires KVM modifications.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXbisACgkQOlYIJqCj
 N/373w//ckB4c9MjS6eDRp+LtTXQfXyAs8eMcs9YTs7yD3uMvqcbaNuDsf1U2cI6
 i2qcuOdxlnKSJphn6oH2JKDWPjRAfHhCqmYghUPaJwgeYqsTfork9s8rzU2tC82q
 38mQ6BhAuOwa/plodvDp/+POEIoXUyexSoWX+cngGVTmFWdbfA4NNGjWMZOl1XG2
 qLBck6t+IxxUTs1Ij+OsexlAKdY7FcZZ85Ok6I/VE4/lITEhuTJkwkYdh8td3KK/
 IVVk1jb1Z7t8lGQ5fi3+N/D8iHJ/0ladmOux6Yxzw88uyj6XLIFOOFsdK09GyhUS
 QzV06syFkV2vU68VDYiOcMZIdeGmYR5jDpmy9N+o0s86YLU6rKKEaXRP7vW5yHj/
 99AU+DfRHvhqKwWyQ51B+rhr80F3EQrkZXI0QBr8KO7sseFZvZNNVozwKjSyZtNH
 VBhxjIlVQm5Z1rjucKjc573sONK95z9XUSZjYnCUwB1NH7VsvdULQmJBucCmzW/p
 9j49CpmShwggceV6LcYg4Miuvjl/bL1B8Go5Fg+1Fdg7L6Nepi16yywxHmyPqreJ
 Wx/6N0gqZ3LKDdl5CFYxAxvJoldJR6lbw/AGjvFkre8A+TGGRdz3uS9XXqGHvtbu
 W5wKhnvGov69lm4xYbxbI+rvxYmmQLm9SgQXel23icbKJ5kmE48=
 =zsBl
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 CET virtualization support for 6.18

Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).

CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
Branch Tracking (IBT), that can be utilized by software to help provide
Control-flow integrity (CFI).  SHSTK defends against backward-edge attacks
(a.k.a. Return-oriented programming (ROP)), while IBT defends against
forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).

Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
flow to unauthorized targets in order to execute small snippets of code,
a.k.a. gadgets, of the attackers choice.  By chaining together several gadgets,
an attacker can perform arbitrary operations and circumvent the system's
defenses.

SHSTK defends against backward-edge attacks, which execute gadgets by modifying
the stack to branch to the attacker's target via RET, by providing a second
stack that is used exclusively to track control transfer operations.  The
shadow stack is separate from the data/normal stack, and can be enabled
independently in user and kernel mode.

When SHSTK is is enabled, CALL instructions push the return address on both the
data and shadow stack. RET then pops the return address from both stacks and
compares the addresses.  If the return addresses from the two stacks do not
match, the CPU generates a Control Protection (#CP) exception.

IBT defends against backward-edge attacks, which branch to gadgets by executing
indirect CALL and JMP instructions with attacker controlled register or memory
state, by requiring the target of indirect branches to start with a special
marker instruction, ENDBRANCH.  If an indirect branch is executed and the next
instruction is not an ENDBRANCH, the CPU generates a #CP.  Note, ENDBRANCH
behaves as a NOP if IBT is disabled or unsupported.

From a virtualization perspective, CET presents several problems.  While SHSTK
and IBT have two layers of enabling, a global control in the form of a CR4 bit,
and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
option due to complexity, and outright disallowing use of XSTATE to context
switch SHSTK/IBT state would render the features unusable to most guests.

To limit the overall complexity without sacrificing performance or usability,
simply ignore the potential virtualization hole, but ensure that all paths in
KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
hardware, and the guest has access to at least one of SHSTK or IBT.  I.e. allow
userspace to advertise one of SHSTK or IBT if both are supported in hardware,
even though doing so would allow a misbehaving guest to use the unadvertised
feature.

Fully emulating SHSTK and IBT would also require significant complexity, e.g.
to track and update branch state for IBT, and shadow stack state for SHSTK.
Given that emulating large swaths of the guest code stream isn't necessary on
modern CPUs, punt on emulating instructions that meaningful impact or consume
SHSTK or IBT.  However, instead of doing nothing, explicitly reject emulation
of such instructions so that KVM's emulator can't be abused to circumvent CET.
Disable support for SHSTK and IBT if KVM is configured such that emulation of
arbitrary guest instructions may be required, specifically if Unrestricted
Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
is smaller than host.MAXPHYADDR.

Lastly disable SHSTK support if shadow paging is enabled, as the protections
for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
that they can't be directly modified by software), i.e. would require
non-trivial support in the Shadow MMU.

Note, AMD CPUs currently only support SHSTK.  Explicitly disable IBT support
so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
in SVM requires KVM modifications.
2025-09-30 13:37:14 -04:00
Paolo Bonzini d05ca6b793 KVM x86 changes for 6.18
- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
    where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
    intercepts and trigger a variety of WARNs in KVM.
 
  - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
    supposed to exist for v2 PMUs.
 
  - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
 
  - Clean up KVM's vector hashing code for delivering lowest priority IRQs.
 
  - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
    actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
    in the process of doing so add fastpath support for TSC_DEADLINE writes on
    AMD CPUs.
 
  - Clean up a pile of PMU code in anticipation of adding support for mediated
    vPMUs.
 
  - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
    emulator support (KVM should never need to emulate the MSRs outside of
    forced emulation and other contrived testing scenarios).
 
  - Clean up the MSR APIs in preparation for CET and FRED virtualization, as
    well as mediated vPMU support.
 
  - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
    as KVM can't faithfully emulate an I/O APIC for such guests.
 
  - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
    for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
    response to PMU refreshes for guests with mediated vPMUs.
 
  - Misc cleanups and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXIr0ACgkQOlYIJqCj
 N/1bbhAAxHzxN7IcizgAYf1BZWMjRU4zJgwlkoGuBeH/IgUOODPjs93L9kyrzvVL
 tcFgIe9o5fZRGmUfyZbCKnJaQi/4u/2QPRSGhsYt7vyDjCoXzO5CJPMYIqDz5Z2r
 qg+GNMlLtWI8EbcDd4qT22SWC8GufoXFEQnX6PUNhasOHeKit5ye8wmttcG+zvYV
 KeIkPluddQkQ2JKyG53IFNmm1lkY05oAibv61hkxqUSwCIJKsQFuDjl4GVouAd/H
 eu0+pzNmzPUTQ/qJzr2cNL5Nqz08DGp2OCFFRO6bgXaWkvHnFG3EAEHlhTAUh92t
 LPJxmhb6R8SUc+z8rYTgyF/zVpgeJcJO7F44FrXa7r2iV58ds3TfuO53hVaEfyNp
 1GUMH0m8N2vfjtFyUVP1KwZHuFxiGKLd1wZ1h0yKpj1Eg1FjR2cEontqwH44tHn2
 ENq8MIbWIBhvCsz5fIbM4y591JSevJUrDlYu60Lz7VyXHAw8Cq92t/dN9O7oH5mJ
 pIyoracU1g0Q6bbATZYsOGhkCTYLtdelZaBb5AYIgQ+U4C1TA4GpgEBUSVH8HXDy
 kXzVqSFlL0v5rrFkBPjiNFb5WD3iLjJIM3DLGoNegOM8+79r/USGHUY+XU3z/kCH
 rV8JBlTnLBCrNOHEiHJUI2kwBQ9C9/l88X/VwvRUNv7SthuExSo=
 =9IB0
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.18

 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
   where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
   intercepts and trigger a variety of WARNs in KVM.

 - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
   supposed to exist for v2 PMUs.

 - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.

 - Clean up KVM's vector hashing code for delivering lowest priority IRQs.

 - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
   actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
   in the process of doing so add fastpath support for TSC_DEADLINE writes on
   AMD CPUs.

 - Clean up a pile of PMU code in anticipation of adding support for mediated
   vPMUs.

 - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
   emulator support (KVM should never need to emulate the MSRs outside of
   forced emulation and other contrived testing scenarios).

 - Clean up the MSR APIs in preparation for CET and FRED virtualization, as
   well as mediated vPMU support.

 - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
   as KVM can't faithfully emulate an I/O APIC for such guests.

 - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
   for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
   response to PMU refreshes for guests with mediated vPMUs.

 - Misc cleanups and minor fixes.
2025-09-30 13:36:41 -04:00
Paolo Bonzini a104e0a305 KVM SVM changes for 6.18
- Require a minimum GHCB version of 2 when starting SEV-SNP guests via
    KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
    instead of latent guest failures.
 
  - Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
    host from tampering with the guest's TSC frequency, while still allowing the
    the VMM to configure the guest's TSC frequency prior to launch.
 
  - Mitigate the potential for TOCTOU bugs when accessing GHCB fields by
    wrapping all accesses via READ_ONCE().
 
  - Validate the XCR0 provided by the guest (via the GHCB) to avoid tracking a
    bogous XCR0 value in KVM's software model.
 
  - Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
    avoid leaving behind stale state (thankfully not consumed in KVM).
 
  - Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
    instead of subtly relying on guest_memfd to do the "heavy" lifting.
 
  - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
    desired TSC_AUX, to fix a bug where KVM could clobber a different vCPU's
    TSC_AUX due to hardware not matching the value cached in the user-return MSR
    infrastructure.
 
  - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported,
    and clean up the AVIC initialization code along the way.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXH54ACgkQOlYIJqCj
 N/0OCw//e+0o6jov6/PO8ljq6sXJySsXKxEFYnvQlWYzjqtlVs05Y2SY0GBTnMu3
 g0ie2c4V3VD7cY5bGAWETWvrOMLqGXM3E7v9dVOuE4xU3xx0HkCAlXc/woOLUXoT
 jo/komNXnpeiZ1QRO9FlGooHTJ6Y+jg6/mM7asStS2Pk3Mm//wYgQej9mSJDrypo
 NB4+BCS9cyt8rndNtCUkyedFYMboVQ8AEvXh/jeydhw4rdbBh0/Ci2IKGcVI5DP1
 be8GD/FsNTIUDtieHRYCR+LCKCMFj/hYzlg2nQ6UjxHZbvlDyQuh2Ld2LtZiGSef
 ejNr9e+ro6vxWBgX6wplWtKRLxBYEnQ1h/rQ9A3g50TuhrtFJbxBxY7DPQ16hlBJ
 EB/E1JFvVgkGVrYN0oPQCvvfhFtpkx43qnEBw4q0pbdAS79XOnG2GJFvI0hpZAP6
 qwy19lbsJ5g3qLTlDPChxQJC08gThn3CbarCmZNNzBpPDQoLDUfYBfyN4prRPuiN
 UByfaaEC0Fi6JSgmHsO0LsUB9K++k2ucWiIIW4YQhVgPUtCjTNLe9omgGJ1UYe0X
 YITqgklewe3QtBJ46JE0APkPaHio7r6zd7QvO+RhRFkjwZfY6dlsrSImykKrpK3O
 rPaZnW+UpAnA1XIqroMl1RVoczFCfGcP1Cat9JwScBVVxjJ1DlI=
 =zd53
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.18

 - Require a minimum GHCB version of 2 when starting SEV-SNP guests via
   KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
   instead of latent guest failures.

 - Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
   host from tampering with the guest's TSC frequency, while still allowing the
   the VMM to configure the guest's TSC frequency prior to launch.

 - Mitigate the potential for TOCTOU bugs when accessing GHCB fields by
   wrapping all accesses via READ_ONCE().

 - Validate the XCR0 provided by the guest (via the GHCB) to avoid tracking a
   bogous XCR0 value in KVM's software model.

 - Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
   avoid leaving behind stale state (thankfully not consumed in KVM).

 - Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
   instead of subtly relying on guest_memfd to do the "heavy" lifting.

 - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
   desired TSC_AUX, to fix a bug where KVM could clobber a different vCPU's
   TSC_AUX due to hardware not matching the value cached in the user-return MSR
   infrastructure.

 - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported,
   and clean up the AVIC initialization code along the way.
2025-09-30 13:34:12 -04:00
Paolo Bonzini 5b0d0d8542 KVM x86 MMU changes for 6.18
- Recover possible NX huge pages within the TDP MMU under read lock to
    reduce guest jitter when restoring NX huge pages.
 
  - Return -EAGAIN during prefault if userspace concurrently deletes/moves the
    relevant memslot to fix an issue where prefaulting could deadlock with the
    memslot update.
 
  - Don't retry in TDX's anti-zero-step mitigation if the target memslot is
    invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
    to the aforementioned prefaulting case.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXHaEACgkQOlYIJqCj
 N/1uDxAAxGMl1q1Hg0tpVPw7PdcourXlVYJjFzsrK6CdtZpL7n2GJPVhEFBDovud
 oIM9IIiP5f2UDtWeRb6b/mm9INqwTB8lyswbJk/tO+CshBiBdE7PfDbzDzvj9lAv
 Uecc6tQhv+CDpJcSf7t5OqgiRo5gEBTXZZj0l5GOdtiaOU09eq4ttZTME5S1jQgh
 kBddFd3glWeMLv67cTNCxdHsOFnaVWIBoupfw7Fv7LVJ1k6cgKyHAhjfq8A9elEK
 3CyDo8DZ8MG4aguhHzAUQuEM9ELMxOTyJG8xS2BWtFA/glbvUBnOfGeyTmHgo/nN
 qKyjytlpmO0yIlehTd/5tLfpidL8l30VN7+nDpqwTjCDEz9bC39zC9zBmKni84Dt
 wItfmELb6lbvprA+FOseiRwk7/2quLrgc4y21GI29Zqbf6wMoQEnRHF/moFZ3cqg
 C/SP1Ev6N5ENM2BZG9mFSRWr8e2yyan8YWs+AUtsBEM82KaeJrMlZ4yqA1m33a5T
 YK5eL3DablObdfvvz1YXCVxByQ7aIbVCpE3VVigeyHrqoR/EFwZMzYLouOI34jjN
 Nj5+Qck6VMhI+OetUlcXS1D/DIHgpDgZFPcgeLURiwO0l62H/gYLHuoCek4YmkIi
 30ZwVXubBWDg5TcxEi5oIbVfyZfHNi+MyeLMWLEy6hEdnFsTsZU=
 =6qMx
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-mmu-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 MMU changes for 6.18

 - Recover possible NX huge pages within the TDP MMU under read lock to
   reduce guest jitter when restoring NX huge pages.

 - Return -EAGAIN during prefault if userspace concurrently deletes/moves the
   relevant memslot to fix an issue where prefaulting could deadlock with the
   memslot update.

 - Don't retry in TDX's anti-zero-step mitigation if the target memslot is
   invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
   to the aforementioned prefaulting case.
2025-09-30 13:32:27 -04:00
Paolo Bonzini 3c5d19a365 x86/kvm guest side changes for 6.18
- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
    overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
    could map devices as WB and prevent the device drivers from mapping their
    devices with UC/UC-.
 
  - Make kvm_async_pf_task_wake() a local static helper and remove its
    export.
 
  - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
    bindings even when PV_UNHALT is unsupported.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXHJAACgkQOlYIJqCj
 N/2/FhAAgQ32+t+uN0fPWPRLeVKMc4EKYkqGc/rSxNOT2N1edEnLvH3JU/cEkHsN
 og35gJ19bGKjcp6uF0W32xAXTbJijWfmQExZpl/Gwo3Xe84y3IX0gPODj5UUp9ip
 F9DuJ3ZU70kmwuL0VucyL1lD/i+9aQh+eCB27NGuX3NXmN46BWlMcVlsflwqMFFs
 iRnCJnQ2pPT9dhFG6rEeLCdGDWdkobd+662GhNTRzG0Wl95n/T6aA7hugbkbJtmN
 V4mxQsUpLrPbkxyrhVqowSeoU1zENv3e2McCfn24gRn0aXjWV/oSdEff9hqcBD3Q
 ZKagD5nQHv8qpHze1I/psPwwowV9lW7JkFMczdGSfWHtHhdsGsrePx3aUYj3zflZ
 ZL0adRUO5wSVfkA0UAZPGSVZzjlCyMzefx/SEBt+gD0UIiTZaX13Joix4R54ndrg
 MpMV+KvnihkbAjoYd8N3Qgh4YZE+FdFYKUoZXBdUe8opMCo7N969k6wukPjjpCe8
 yr87v7xClpBZQgWHjFT039p0rxv8RKLGHsg9a8Pj92o7zZBpzr5VZmOFjIZ/AIWp
 mIN9LkPGqf5gZD+EQm/E9lNEoGP4W8mgwkztIcaAj8HvD+RBMZmM9jaowKQymv9+
 QhW7GP8iJfKPGT6C2QmZJh+EoDtuHyVj77v2isk3oDk2vHVFYzs=
 =JSdJ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-guest-6.18' of https://github.com/kvm-x86/linux into HEAD

x86/kvm guest side changes for 6.18

 - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.

 - Make kvm_async_pf_task_wake() a local static helper and remove its
   export.

 - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.
2025-09-30 13:24:59 -04:00
Paolo Bonzini 6a13749717 LoongArch KVM changes for v6.18
1. Add PTW feature detection on new hardware.
 2. Add sign extension with kernel MMIO/IOCSR emulation.
 3. Improve in-kernel IPI emulation.
 4. Improve in-kernel PCH-PIC emulation.
 5. Move kvm_iocsr tracepoint out of generic code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmjTd/wWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImep7+D/4uA7triOH1eB4TikB6NlF3T+ry
 1qLycMy+O8mm24UxT0217sqOtKiZHrglyJB+cCmxPNjq0vWX/F08VgNoh8+6feWL
 6X28YlJ8qcRGCTUxlaaleEGIiXt7lbbnZKFfiBn8Ibb9rHn3tE8V738Kzm+SV1Dr
 bSZlAGnAqp/pRI1UFBQ0T+GpqQz+UvDw8JCOJj5Vs5UylhDY3atPaNhLjfR2tkUh
 AFrqr87gKPHdJxmk//7u+e6QLGViBB9aO1fNMP6y8gViJRfkCEbbm8XYe0qe+SmO
 QpLKuHBEVo7C8vOzemEVieQX2VujDcGSDDRGCU3wKbIpbIQgmOGbbsKfrKf9FxaR
 8ieNyP3UExr2ZvYV9SOLqeLD2K2yox9EkO7tD2CM9kwev9HUtr+6e/OaIP3Sjth2
 rd7V47x/8bCdgt0grQXIxejebbO5NawnFjXlFS7M2SRQXLMtyfFbiuZVGw0kZ+nn
 rzMeodQVmGi518nmmc9YcyW0/R8qer9DaaiN1ybgVF/4ZSK+LhlZo7xv4Dv5bWuv
 ThR89Iz09xUmmGYDniAR6q3/zH/52lJAuNU4tQGwB6O/+z8qJR0tZFM86KnApyLU
 pGI3q1s0g9ZK9mouaM+jcV5/fzFGoTXGkIQqaaXOqob97klagudAdBWKN74YynT2
 rAU7sWXF6F9WsKX2sA==
 =keSO
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.18

1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.
2025-09-30 13:23:44 -04:00
Paolo Bonzini 924ccf1d09 KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned
   delegation and pointer masking PMLEN features
 - Added ONE_REG interface for SBI FWFT extension
 - Added Zicbop and bfloat16 extensions for Guest/VM
 - Enabled more common KVM selftests for RISC-V such as
   access_tracking_perf_test, dirty_log_perf_test,
   memslot_modification_stress_test, memslot_perf_test,
   mmu_stress_test, and rseq_test
 - Added SBI v3.0 PMU enhancements in KVM and perf driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmjU1XEACgkQrUjsVaLH
 LAfahQ//aPSwG0jbPwRSmWtoqdm38VvKYuWOOxeCjKyas5w0cZPkMQCh3zV/Ks92
 YKNIe6wlDeLIlT+qIzAi5VLwHHiC3ggoJsuNhf7/qmJJvks+1quvl1clUDkVGbRu
 G+FKFkRpBa1HF2O3NASnvBnRLYzXa/wKOuArwsY5DEG4+irnJU3AegbqRuhyPBFr
 VX+LwpfUUHipFGg/0ivflsVBjcz42Y3i1VQ4oXzuHqsRn3Ig3eSy9dGTHIOqJ0A3
 leNbDUSdZxnMlj3Sfab7hH4Vxlr8kiCEgMogHyYpnlyPvG8zJobMWDE9Ziy86D7f
 130zMcHuF0ZkeuaKX+3o2L7yGNTnN/JNtns7VtClRShGqSA8Dtn2xLhnvyray7RH
 CIYjv//34z1BjWyCBfuN5kFIfX5K7cNQHlDP7RVfgw91k7rbCGCJF743nkxz1WBX
 98G05/Rnmn+KCtU8t2pRoG9Qkq+bE3iz8Ka8thmiUNciqO78nn/p7a6OEMcjn7jH
 C+VUNST519UBGCjLehBrCZuUOtrPRozHz7Adx3VTJT5wBX2hd/QLoNoMssEi9VmS
 1j/abh7HJcbe7aTUDWUhOK9I+bPHJRsbeNyL5r9jpoC2Xg5Njh/V3aQZ/uuLzO5+
 pYjciRrrmdZ30xkInulWI6wORHkmki34RRmsM9gqYIJBMV+Rcxg=
 =BCyU
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.18

- Added SBI FWFT extension for Guest/VM with misaligned
  delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
  access_tracking_perf_test, dirty_log_perf_test,
  memslot_modification_stress_test, memslot_perf_test,
  mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
2025-09-30 13:23:36 -04:00
Linus Torvalds 30d4efb2f5 xen: branch for v6.18-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCaNZ35gAKCRCAXGG7T9hj
 vg/BAPwOpOT8J8TV2CwxRdqEJAJBysoicMbxGw+6v97OssMymwD+JCWYkrt6pyrn
 mgKCT6FSOB6ZQ50IWYSr8DMQXjHybAU=
 =NzbF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - fix the migration of a Xen virq to another cpu plus some related
   cleanup work

 - clean up Xen-PV mode specific code, resulting in removing some of
   that code in the resulting binary in case CONFIG_XEN_PV is not set

 - fixes and cleanup for suspend handling under Xen

* tag 'for-linus-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: take system_transition_mutex on suspend
  xen/manage: Fix suspend error path
  xen/events: Update virq_to_irq on migration
  xen/events: Return -EEXIST for bound VIRQs
  xen/events: Cleanup find_virq() return codes
  x86/xen: select HIBERNATE_CALLBACKS more directly
  drivers/xen/gntdev: use xen_pv_domain() instead of cached value
  xen: replace XENFEAT_auto_translated_physmap with xen_pv_domain()
  xen: rework xen_pv_domain()
2025-09-29 19:42:03 -07:00
Linus Torvalds a5ba183bde hardening updates for v6.18-rc1
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
 
 - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
   (Junjie Cao)
 
 - Add str_assert_deassert() helper (Lad Prabhakar)
 
 - gcc-plugins: Remove TODO_verify_il for GCC >= 16
 
 - kconfig: Fix BrokenPipeError warnings in selftests
 
 - kconfig: Add transitional symbol attribute for migration support
 
 - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNraNQAKCRA2KwveOeQk
 u/DkAPwKPP5BSmVR2wkdpQaXIr3PGA+cbBYp34DMJNujZ9piIwD/WZ+HfGTLoERy
 +2Q6HLj9hUdd+Rx3IZ8/w1QmnhUIUAU=
 =AwV9
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "One notable addition is the creation of the 'transitional' keyword for
  kconfig so CONFIG renaming can go more smoothly.

  This has been a long-standing deficiency, and with the renaming of
  CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
  support), this came up again.

  The breadth of the diffstat is mainly this renaming.

   - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)

   - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
     (Junjie Cao)

   - Add str_assert_deassert() helper (Lad Prabhakar)

   - gcc-plugins: Remove TODO_verify_il for GCC >= 16

   - kconfig: Fix BrokenPipeError warnings in selftests

   - kconfig: Add transitional symbol attribute for migration support

   - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"

* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/string_choices: Add str_assert_deassert() helper
  kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
  kconfig: Add transitional symbol attribute for migration support
  kconfig: Fix BrokenPipeError warnings in selftests
  gcc-plugins: Remove TODO_verify_il for GCC >= 16
  stddef: Introduce __TRAILING_OVERLAP()
  stddef: Remove token-pasting in TRAILING_OVERLAP()
  lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29 17:48:27 -07:00
Linus Torvalds 8c1ed30218 ffs-const update for v6.18-rc1
- PCI: Fix theoretical underflow in use of ffs().
 
 - Universally apply __attribute_const__ to all architecture's ffs()-family
   of functions.
 
 - Add KUnit tests for ffs() behavior and const-ness.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNrWngAKCRA2KwveOeQk
 u3ZGAPwJTscARU4MspnqpbuAV601dG1TNoJG+8JYH84r+R2jjQEAlmBZB0jaHbC2
 qFWjHivD/0ofvihKfAPFgxlakyV1XAg=
 =diXF
 -----END PGP SIGNATURE-----

Merge tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull ffs const-attribute cleanups from Kees Cook:
 "While working on various hardening refactoring a while back we
  encountered inconsistencies in the application of __attribute_const__
  on the ffs() family of functions.

  This series fixes this across all archs and adds KUnit tests.

  Notably, this found a theoretical underflow in PCI (also fixed here)
  and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I
  kept the series separate from the general hardening PR since it is a
  stand-alone "topic".

   - PCI: Fix theoretical underflow in use of ffs().

   - Universally apply __attribute_const__ to all architecture's
     ffs()-family of functions.

   - Add KUnit tests for ffs() behavior and const-ness"

* tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  KUnit: ffs: Validate all the __attribute_const__ annotations
  sparc: Add __attribute_const__ to ffs()-family implementations
  xtensa: Add __attribute_const__ to ffs()-family implementations
  s390: Add __attribute_const__ to ffs()-family implementations
  parisc: Add __attribute_const__ to ffs()-family implementations
  mips: Add __attribute_const__ to ffs()-family implementations
  m68k: Add __attribute_const__ to ffs()-family implementations
  openrisc: Add __attribute_const__ to ffs()-family implementations
  riscv: Add __attribute_const__ to ffs()-family implementations
  hexagon: Add __attribute_const__ to ffs()-family implementations
  alpha: Add __attribute_const__ to ffs()-family implementations
  sh: Add __attribute_const__ to ffs()-family implementations
  powerpc: Add __attribute_const__ to ffs()-family implementations
  x86: Add __attribute_const__ to ffs()-family implementations
  csky: Add __attribute_const__ to ffs()-family implementations
  bitops: Add __attribute_const__ to generic ffs()-family implementations
  KUnit: Introduce ffs()-family tests
  PCI: Test for bit underflow in pcie_set_readrq()
2025-09-29 16:31:35 -07:00
Linus Torvalds 722df25ddf kernel-6.18-rc1.clone3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZgMQAKCRCRxhvAZXjc
 ornXAP954dZjz+OJw6lJLCf0j9TXJOczGHvK3oW5ZD9KnqtTdwEA7p1A6WMOKJyl
 8VtTgCS0yNt8QlznUnsSDfVm0jXVGAY=
 =tUXG
 -----END PGP SIGNATURE-----

Merge tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull copy_process updates from Christian Brauner:
 "This contains the changes to enable support for clone3() on nios2
  which apparently is still a thing.

  The more exciting part of this is that it cleans up the inconsistency
  in how the 64-bit flag argument is passed from copy_process() into the
  various other copy_*() helpers"

[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]

* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nios2: implement architecture-specific portion of sys_clone3
  arch: copy_thread: pass clone_flags as u64
  copy_process: pass clone_flags as u64 across calltree
  copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
2025-09-29 10:36:50 -07:00
Kees Cook 23ef9d4397 kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).

Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-24 14:29:14 -07:00
Sean Christopherson fddd07626b KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
Add {HV,CP,SX}_VECTOR definitions for AMD's Hypervisor Injection Exception,
VMM Communication Exception, and SVM Security Exception vectors, along with
human friendly formatting for trace_kvm_inj_exception().

Note, KVM is all but guaranteed to never observe or inject #SX, and #HV is
also unlikely to go unused.  Add the architectural collateral mostly for
completeness, and on the off chance that hardware goes off the rails.

Link: https://lore.kernel.org/r/20250919223258.1604852-44-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:29:03 -07:00
Sean Christopherson f2f5519aa4 KVM: x86: Define Control Protection Exception (#CP) vector
Add a CP_VECTOR definition for CET's Control Protection Exception (#CP),
along with human friendly formatting for trace_kvm_inj_exception().

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-43-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:28:56 -07:00
Yang Weijiang e140467bbd KVM: x86: Enable CET virtualization for VMX and advertise to userspace
Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the
CET XFEATURE bits in XSS, and  advertise support for IBT and SHSTK to
userspace.  Explicitly clear IBT and SHSTK onn SVM, as additional work is
needed to enable CET on SVM, e.g. to context switch S_CET and other state.

Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
KVM does not support emulating CET, as running without Unrestricted Guest
can result in KVM emulating large swaths of guest code.  While it's highly
unlikely any guest will trigger emulation while also utilizing IBT or
SHSTK, there's zero reason to allow CET without Unrestricted Guest as that
combination should only be possible when explicitly disabling
unrestricted_guest for testing purposes.

Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces
the presence of an Error Code based on exception vector, as attempting to
inject a #CP with an Error Code (#CP architecturally has an Error Code)
will fail due to the #CP vector historically not having an Error Code.

Clear S_CET and SSP-related VMCS on "reset" to emulate the architectural
of CET MSRs and SSP being reset to 0 after RESET, power-up and INIT.  Note,
KVM already clears guest CET state that is managed via XSTATE in
kvm_xstate_reset().

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: move some bits to separate patches, massage changelog]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:22:32 -07:00
Yang Weijiang b3744c59eb KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
Drop X86_CR4_CET from CR4_RESERVED_BITS and instead mark CET as reserved
if and only if IBT *and* SHSTK are unsupported, i.e. allow CR4.CET to be
set if IBT or SHSTK is supported.  This creates a virtualization hole if
the CPU supports both IBT and SHSTK, but the kernel or vCPU model only
supports one of the features.  However, it's entirely legal for a CPU to
have only one of IBT or SHSTK, i.e. the hole is a flaw in the architecture,
not in KVM.

More importantly, so long as KVM is careful to initialize and context
switch both IBT and SHSTK state (when supported in hardware) if either
feature is exposed to the guest, a misbehaving guest can only harm itself.
E.g. VMX initializes host CET VMCS fields based solely on hardware
capabilities.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: split to separate patch, write changelog]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-24-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:17:48 -07:00
Sean Christopherson 296599346c KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM attempts to
check permissions for a Shadow Stack access as KVM hasn't been taught to
understand the magic Writable=0,Dirty=1 combination that is required for
Shadow Stack accesses, and likely will never learn.  There are no plans to
support Shadow Stacks with the Shadow MMU, and the emulator rejects all
instructions that affect Shadow Stacks, i.e. it should be impossible for
KVM to observe a #PF due to a shadow stack access.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-22-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:16:53 -07:00
Yang Weijiang 9d6812d415 KVM: x86: Enable guest SSP read/write interface with new uAPIs
Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace
save and restore the guest's Shadow Stack Pointer (SSP).  On both Intel
and AMD, SSP is a hardware register that can only be accessed by software
via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware
to context switch SSP at entry/exit).  As a result, SSP doesn't fit in
any of KVM's existing interfaces for saving/restoring state.

Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes
to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP
MSRs.  Use a translation layer to hide the KVM-internal MSR index so that
the arbitrary index doesn't become ABI, e.g. so that KVM can rework its
implementation as needed, so long as the ONE_REG ABI is maintained.

Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack
support to avoid running afoul of ignore_msrs, which unfortunately applies
to host-initiated accesses (which is a discussion for another day).  I.e.
ensure consistent behavior for KVM-defined registers irrespective of
ignore_msrs.

Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-14-seanjc@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:10:33 -07:00
Yang Weijiang d6c387fc39 KVM: VMX: Introduce CET VMCS fields and control bits
Control-flow Enforcement Technology (CET) is a kind of CPU feature used
to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks.
It provides two sub-features(SHSTK,IBT) to defend against ROP/COP/JOP
style control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT introduces instruction(ENDBRANCH)to mark valid target addresses of
  indirect branches (CALL, JMP etc...). If an indirect branch is executed
  and the next instruction is _not_ an ENDBRANCH, the processor generates
  a #CP. These instruction behaves as a NOP on platforms that have no CET.

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.

  MSR_IA32_INT_SSP_TAB: Linear address of SHSTK pointer table, whose entry
			is indexed by IST of interrupt gate desc.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
  IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores current active SSP.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.

On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from following
VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from following
VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:00:49 -07:00
Chao Gao 338543cbe0 KVM: x86: Check XSS validity against guest CPUIDs
Maintain per-guest valid XSS bits and check XSS validity against them
rather than against KVM capabilities. This is to prevent bits that are
supported by KVM but not supported for a guest from being set.

Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
host_initiated check.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:00:45 -07:00
Yang Weijiang 06f2969c6a KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
other non-MSR registers through them, along with support for
KVM_GET_REG_LIST to enumerate support for KVM-defined registers.

This is in preparation for allowing userspace to read/write the guest SSP
register, which is needed for the upcoming CET virtualization support.

Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
added for registers that lack existing KVM uAPIs to access them. The "KVM"
in the name is intended to be vague to give KVM flexibility to include
other potential registers.  More precise names like "SYNTHETIC" and
"SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can
be conflated with synthetic guest-visible MSRs) and may put KVM into a
corner (e.g. if KVM wants to change how a KVM-defined register is modeled
internally).

Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid
duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_
registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e.
can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1]
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:00:44 -07:00
Sean Christopherson 5dca3808b2 KVM: x86: Merge 'svm' into 'cet' to pick up GHCB dependencies
Merge the queue of SVM changes for 6.18 to pick up the KVM-defined GHCB
helpers so that kvm_ghcb_get_xss() can be used to virtualize CET for
SEV-ES+ guests.
2025-09-23 08:59:49 -07:00
Hou Wenlong 9bc3663507 KVM: x86: Add helper to retrieve current value of user return MSR
In the user return MSR support, the cached value is always the hardware
value of the specific MSR. Therefore, add a helper to retrieve the
cached value, which can replace the need for RDMSR, for example, to
allow SEV-ES guests to restore the correct host hardware value without
using RDMSR.

Cc: stable@vger.kernel.org
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
[sean: drop "cache" from the name, make it a one-liner, tag for stable]
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250923153738.1875174-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:55:20 -07:00
Sean Christopherson 4135a9a8cc KVM: SEV: Validate XCR0 provided by guest in GHCB
Use __kvm_set_xcr() to propagate XCR0 changes from the GHCB to KVM's
software model in order to validate the new XCR0 against KVM's view of
the supported XCR0.  Allowing garbage is thankfully mostly benign, as
kvm_load_{guest,host}_xsave_state() bail early for vCPUs with protected
state, xstate_required_size() will simply provide garbage back to the
guest, and attempting to save/restore the bad value via KVM_{G,S}ET_XCRS
will only harm the guest (setting XCR0 will fail).

However, allowing the guest to put junk into a field that KVM assumes is
valid is a CVE waiting to happen.  And as a bonus, using the proper API
eliminates the ugly open coding of setting arch.cpuid_dynamic_bits_dirty.

Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage.  In either case, KVM can't fix the broken guest.

Note, using __kvm_set_xcr() also avoids recomputing dynamic CPUID bits
if XCR0 isn't actually changing (relatively to KVM's previous snapshot).

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:55:19 -07:00