Commit Graph

191263 Commits

Author SHA1 Message Date
Linus Torvalds 773602256a A single fix for the x86 scheduler topology:
Using cluster topology on hybrid CPUs, e.g. Alder Lake, biases the
   scheduler towards the ATOM cluster as that has more total capacity.
   Use selection based on CPU priority instead.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmG1xxETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWhaEADFlr1l6QwIJs8Bv3Z9u/kkW1fPIoqP
 aFDrqLiQ/T+gBuS+d7R3YkRf1cICS+gbvv/dw8qx/v6gerUC71tEXxYxvOUqR7QL
 wX2zFNrhQqBaIKKGnbTHe47pB2+/AlpbXuQ2/j415AHxCxB4fOC1N4VKq0hTUyPu
 Ubz+QmOHcB0NhAf1jeN9DAlTJgQNbRW6oYYPJDsr+n+WjdlN3IZW/hCxLN1cvnNd
 NzsDEK1EeuccteoT5btzYy+XB31+WNrZ9afcJgG7goO/zmJfZ1lWAsyjKPocn6dC
 ozZSvHQhbgJIAwBaZJFyjZocgHTIkdFX6NzlpKaWw05tk984hJJ9pvg4KO5YeNx/
 tIehPnkmd9GwwO541+t3DhcbCPQIiKypRZKnXuczkjIuG3t2oD25b1EAqwhoPKGk
 TEX2o4DjixS3oKUP6khDteQkj8B0qOjeY18eGjBilJ764mRfbL+GRk/NoqVidrnt
 JWSCPEMM842Y8W7GYfZZZAypqDz9U0yRl+BCKpNDcx3Nr5OSbfkWC/LdvM3LlgIE
 /a2C6CXIZXgiifTF2PHOmv0O3/VPAtHXZ01ExzZu/MJP0hBUjLrxsj+TwILgociw
 WLVYnBgCf5XN1rMPmora8P5y+O/UpEYhXbmIFa6HQAQyfc0Ffiof4ZOl5VYf3mNh
 1GLB+ByEgCz4ig==
 =rr0v
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2021-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "A single fix for the x86 scheduler topology:

  Using cluster topology on hybrid CPUs, e.g. Alder Lake, biases the
  scheduler towards the ATOM cluster as that has more total capacity.
  Use selection based on CPU priority instead"

* tag 'sched-urgent-2021-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched,x86: Don't use cluster topology for x86 hybrid CPUs
2021-12-12 09:38:04 -08:00
Linus Torvalds 0f3d41e82d csky updates for 5.16-rc5
Only 1 fixup for csky:
  - fpu config macro
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEE2KAv+isbWR/viAKHAXH1GYaIxXsFAmG0fjcSHGd1b3JlbkBr
 ZXJuZWwub3JnAAoJEAFx9RmGiMV7MvAQAI9FNrhBJlK41rtMjzspKWQXYc0R7GRl
 acRUam/97uGKeiVlQcKaSfP07GDQCfcaXlaQ1tW4pgALDbXqaaOOh2DlsNaZKQ3h
 OoBJJ2WcH2+sE+YkZcxjBR2FuuAKRPfGAEbTiZ7rZjZqiQDGhjqwNqBcoz4/Ua90
 saOMoVxstVuALK1W6AZ2JmeSNXzA+aO7kM7O3pp2UgaYCJkh0nqDFJwIkXVhPhNz
 GhADqot1VRad51pa5xNd8KXXXic5lIUsk2qohrxZpluz1kjGFS3JAWSn4Mbj5hBk
 SfzT9FI188NXieh+Zm5lhFbxgOcaWCKUJut7yJOhsYY4YFnNB7lj+AFJj/l0GUfq
 Q1D4fww5GLLv9nKBBUVHF6dbXC3Z66J/5kJa/t1BOLQR7L7BUbd4bKbMlx/sigjm
 CxdS2YZVuDCmz4rSof4B1Y3TOjV6Olb0muawqHq+iSzZqMtB/+5sgMyD5LmEyigK
 5sJT6LZDbL7sdjh5YqU3rVtqsonO5o2hRfdAZaxtgodxCLZpeF4tUMHl2CSD7XKQ
 nhRZvZ7w4n7BY4zS+fyXfPAecVzfH0nIMdskDsZhwcnjxqGuRZbDd/ookXW7P3ID
 7g/fNVAOkxUbRE7MSO7RSOfh6lD4Y6zTrBx+YDj4ht1YDErTeLn4BCVWOb4LgjQs
 CR9Hbai5YUIS
 =g1GK
 -----END PGP SIGNATURE-----

Merge tag 'csky-for-linus-5.16-rc5' of git://github.com/c-sky/csky-linux

Pull csky from Guo Ren:
 "Only one fix for csky: fix fpu config macro"

* tag 'csky-for-linus-5.16-rc5' of git://github.com/c-sky/csky-linux:
  csky: fix typo of fpu config macro
2021-12-12 09:32:49 -08:00
Linus Torvalds b9172f9e88 More x86 fixes:
* Logic bugs in CR0 writes and Hyper-V hypercalls
 * Don't use Enlightened MSR Bitmap for L3
 * Remove user-triggerable WARN
 
 Plus a few selftest fixes and a regression test for the
 user-triggerable WARN.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGzZuIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMD4Qf+Im7Q0XNRZGzK6x4Blu3ZZJSuIIkW
 gEK5mDX/BWSYxoGhRN0IOkyf1Tx/A5qYwbZts87wZSvKONG2MuVzdeQ0mkDxgKc3
 cYwvvIPxCKaW/dQLD2OKVlqdAv6YbeJiFURWXgszMkrcgHvw39H5Tn6ldi0B5nvg
 Gvpj8LtbPDXGXab//Xrhia3+1F9TKOrcOG+obGC5G2mrGKTkG2+pi9L6LohvENhd
 sOSWdpmvQTU4PeqGlhW8RCwcN+vpa+NasHT2i2tHcWZA9Lqp4P81+4ZyQLIBsRB3
 psANG0c40EW+lfjFGbLL/6VR5kypxa6zy9RgX+QiRcj6C0+dJOgnwNY35A==
 =cREz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "More x86 fixes:

   - Logic bugs in CR0 writes and Hyper-V hypercalls

   - Don't use Enlightened MSR Bitmap for L3

   - Remove user-triggerable WARN

  Plus a few selftest fixes and a regression test for the
  user-triggerable WARN"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/O
  KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit
  KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode
  selftests: KVM: avoid failures due to reserved HyperTransport region
  KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
  KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall
  KVM: x86: selftests: svm_int_ctl_test: fix intercept calculation
  KVM: nVMX: Don't use Enlightened MSR Bitmap for L3
2021-12-10 14:09:12 -08:00
Linus Torvalds b8a98b6bf6 pci-v5.16-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmGzf4IUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwjcBAAiWel9P5H947jR9sTbz4ya6wH1biD
 k2w97VDa65DyH/LBJSgNwmblnXs7yIUuGTd+mRq9bhlpE8CQi9BfeCehP1vCfTeQ
 JtMH62dW8KBLkvIHU83H1SSZZNKQgDn7hqUsrrMa0HD+Z+ovbuQYp4M1Oh6xRAEM
 TTBTKb0KivA8bFwvtgj/mu7K7sVJH+cVMilD9ABoVeGmCWfUSO48ovEjWB+vmBFs
 UyTCU5CUg/FkjvVmZTOv5GY4EL83FA9Jdtzy8inRA+hSWY6ImXHTzmQlAzvA+Rkv
 k344ZQM9GNvbvwKfBa9iW2g+B2y/OJXafGoVL0NBUcj/eiY5dnAX0/tZHvx0aXFy
 G1Txy2utaG2MSkfZzchEKbRvS0tV7kiFiTmqp9lNmffTZiP72k4+kFJHQC5AzvZb
 O7Ce/XSQifQ1Z3f5B+Ymx6EOgKYJUaWO9B1U1KF0EKGMe5GB0TBiXh/tS2EmV1O8
 1hkUJm032Bbf1Bv5R6BLdgKVz4I3UsqmGKH5gg3blyylAQ1oHsioaUKeV6iHSq40
 u9rNZaKGC3SweYZVISNE1uoII4qzEgLOHggHpZvWxhQy35cFBz8ZsNfLwBD3/8z9
 UfFuLSLHjx+hv3Ev5mgDWH1mzAlzyq5KkDT0bodBix07s5mviDH+57yyw3JtHUuL
 F+tMrYjUHK9ArC8=
 =BWkT
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Revert emulation of Marvell Armada A3720 expansion ROM because it
   doesn't work as expected (Marek Behún)

 - Assert PERST# in Apple M1 driver to fix initialization when booting
   from bootloaders using PCIe, such as U-Boot (Marc Zyngier)

 - Describe PERST# as active low in Apple T8103 DT and update driver to
   match (Marc Zyngier)

* tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: apple: Fix PERST# polarity
  arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT
  PCI: apple: Follow the PCIe specifications when resetting the port
  Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge"
2021-12-10 11:56:05 -08:00
Sean Christopherson d07898eaf3 KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit
Replace a WARN with a comment to call out that userspace can modify RCX
during an exit to userspace to handle string I/O.  KVM doesn't actually
support changing the rep count during an exit, i.e. the scenario can be
ignored, but the WARN needs to go as it's trivial to trigger from
userspace.

Cc: stable@vger.kernel.org
Fixes: 3b27de2718 ("KVM: x86: split the two parts of emulator_pio_in")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211025201311.1881846-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10 09:38:02 -05:00
Lai Jiangshan 777ab82d7c KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode
In the SDM:
If the logical processor is in 64-bit mode or if CR4.PCIDE = 1, an
attempt to clear CR0.PG causes a general-protection exception (#GP).
Software should transition to compatibility mode and clear CR4.PCIDE
before attempting to disable paging.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211207095230.53437-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10 09:38:01 -05:00
Sean Christopherson 3244867af8 KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
Do not bail early if there are no bits set in the sparse banks for a
non-sparse, a.k.a. "all CPUs", IPI request.  Per the Hyper-V spec, it is
legal to have a variable length of '0', e.g. VP_SET's BankContents in
this case, if the request can be serviced without the extra info.

  It is possible that for a given invocation of a hypercall that does
  accept variable sized input headers that all the header input fits
  entirely within the fixed size header. In such cases the variable sized
  input header is zero-sized and the corresponding bits in the hypercall
  input should be set to zero.

Bailing early results in KVM failing to send IPIs to all CPUs as expected
by the guest.

Fixes: 214ff83d44 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10 07:12:42 -05:00
Vitaly Kuznetsov 1ebfaa11eb KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall
Prior to commit 0baedd7927 ("KVM: x86: make Hyper-V PV TLB flush use
tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
and KVM_REQ_TLB_FLUSH is/was defined as:

 (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
"This call guarantees that by the time control returns back to the
caller, the observable effects of all flushes on the specified virtual
processors have occurred." and without KVM_REQUEST_WAIT there's a small
chance that the vCPU making the TLB flush will resume running before
all IPIs get delivered to other vCPUs and a stale mapping can get read
there.

Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
kvm_hv_flush_tlb() is the sole caller which uses it for
kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
KVM_REQUEST_WAIT makes a difference.

Cc: stable@kernel.org
Fixes: 0baedd7927 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10 07:12:41 -05:00
Linus Torvalds ded746bfc9 Networking fixes for 5.16-rc5, including fixes from bpf, can and netfilter.
Current release - regressions:
 
  - bpf, sockmap: re-evaluate proto ops when psock is removed from sockmap
 
 Current release - new code bugs:
 
  - bpf: fix bpf_check_mod_kfunc_call for built-in modules
 
  - ice: fixes for TC classifier offloads
 
  - vrf: don't run conntrack on vrf with !dflt qdisc
 
 Previous releases - regressions:
 
  - bpf: fix the off-by-two error in range markings
 
  - seg6: fix the iif in the IPv6 socket control block
 
  - devlink: fix netns refcount leak in devlink_nl_cmd_reload()
 
  - dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
 
  - dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
 
 Previous releases - always broken:
 
  - ethtool: do not perform operations on net devices being unregistered
 
  - udp: use datalen to cap max gso segments
 
  - ice: fix races in stats collection
 
  - fec: only clear interrupt of handling queue in fec_enet_rx_queue()
 
  - m_can: pci: fix incorrect reference clock rate
 
  - m_can: disable and ignore ELO interrupt
 
  - mvpp2: fix XDP rx queues registering
 
 Misc:
 
  - treewide: add missing includes masked by cgroup -> bpf.h dependency
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGyN1AACgkQMUZtbf5S
 IrtgMA/8D0qk3c75ts0hCzGXwdNdEBs+e7u1bJVPqdyU8x/ZLAp2c0EKB/7IWuxA
 CtsnbanPcmibqvQJDI1hZEBdafi43BmF5VuFSIxYC4EM/1vLoRprurXlIwL2YWki
 aWi//tyOIGBl6/ClzJ9Vm51HTJQwDmdv8GRnKAbsC1eOTM3pmmcg+6TLbDhycFEQ
 F9kkDCvyB9kWIH645QyJRH+Y5qQOvneCyQCPkkyjTgEADzV5i7YgtRol6J3QIbw3
 umPHSckCBTjMacYcCLsbhQaF2gTMgPV1basNLPMjCquJVrItE0ZaeX3MiD6nBFae
 yY5+Wt5KAZDzjERhneX8AINHoRPA/tNIahC1+ytTmsTA8Hj230FHE5hH1ajWiJ9+
 GSTBCBqjtZXce3r2Efxfzy0Kb9JwL3vDi7LS2eKQLv0zBLfYp2ry9Sp9qe4NhPkb
 OYrxws9kl5GOPvrFB5BWI9XBINciC9yC3PjIsz1noi0vD8/Hi9dPwXeAYh36fXU3
 rwRg9uAt6tvFCpwbuQ9T2rsMST0miur2cDYd8qkJtuJ7zFvc+suMXwBZyI29nF2D
 uyymIC2XStHJfAjUkFsGVUSXF5FhML9OQsqmisdQ8KdH26jMnDeMjIWJM7UWK+zY
 E/fqWT8UyS3mXWqaggid4ZbotipCwA0gxiDHuqqUGTM+dbKrzmk=
 =F6rS
 -----END PGP SIGNATURE-----

Merge tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - bpf, sockmap: re-evaluate proto ops when psock is removed from
     sockmap

  Current release - new code bugs:

   - bpf: fix bpf_check_mod_kfunc_call for built-in modules

   - ice: fixes for TC classifier offloads

   - vrf: don't run conntrack on vrf with !dflt qdisc

  Previous releases - regressions:

   - bpf: fix the off-by-two error in range markings

   - seg6: fix the iif in the IPv6 socket control block

   - devlink: fix netns refcount leak in devlink_nl_cmd_reload()

   - dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"

   - dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports

  Previous releases - always broken:

   - ethtool: do not perform operations on net devices being
     unregistered

   - udp: use datalen to cap max gso segments

   - ice: fix races in stats collection

   - fec: only clear interrupt of handling queue in fec_enet_rx_queue()

   - m_can: pci: fix incorrect reference clock rate

   - m_can: disable and ignore ELO interrupt

   - mvpp2: fix XDP rx queues registering

  Misc:

   - treewide: add missing includes masked by cgroup -> bpf.h
     dependency"

* tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
  net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
  net: wwan: iosm: fixes unable to send AT command during mbim tx
  net: wwan: iosm: fixes net interface nonfunctional after fw flash
  net: wwan: iosm: fixes unnecessary doorbell send
  net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
  MAINTAINERS: s390/net: remove myself as maintainer
  net/sched: fq_pie: prevent dismantle issue
  net: mana: Fix memory leak in mana_hwc_create_wq
  seg6: fix the iif in the IPv6 socket control block
  nfp: Fix memory leak in nfp_cpp_area_cache_add()
  nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
  nfc: fix segfault in nfc_genl_dump_devices_done
  udp: using datalen to cap max gso segments
  net: dsa: mv88e6xxx: error handling for serdes_power functions
  can: kvaser_usb: get CAN clock frequency from device
  can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
  net: mvpp2: fix XDP rx queues registering
  vmxnet3: fix minimum vectors alloc issue
  net, neigh: clear whole pneigh_entry at alloc time
  net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
  ...
2021-12-09 11:26:44 -08:00
Jakub Kicinski 6efcdadc15 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2021-12-08

We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 29 files changed, 659 insertions(+), 80 deletions(-).

The main changes are:

1) Fix an off-by-two error in packet range markings and also add a batch of
   new tests for coverage of these corner cases, from Maxim Mikityanskiy.

2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.

3) Fix two functional regressions and a build warning related to BTF kfunc
   for modules, from Kumar Kartikeya Dwivedi.

4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
   PREEMPT_RT kernels, from Sebastian Andrzej Siewior.

5) Add missing includes in order to be able to detangle cgroup vs bpf header
   dependencies, from Jakub Kicinski.

6) Fix regression in BPF sockmap tests caused by missing detachment of progs
   from sockets when they are removed from the map, from John Fastabend.

7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
   dispatcher, from Björn Töpel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add selftests to cover packet access corner cases
  bpf: Fix the off-by-two error in range markings
  treewide: Add missing includes masked by cgroup -> bpf dependency
  tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
  bpf: Fix bpf_check_mod_kfunc_call for built-in modules
  bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
  mips, bpf: Fix reference to non-existing Kconfig symbol
  bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
  Documentation/locking/locktypes: Update migrate_disable() bits.
  bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
  bpf, sockmap: Attach map progs to psock early for feature probes
  bpf, x86: Fix "no previous prototype" warning
====================

Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 16:06:44 -08:00
Peter Zijlstra cabdc3a847 sched,x86: Don't use cluster topology for x86 hybrid CPUs
For x86 hybrid CPUs like Alder Lake, the order of CPU selection should
be based strictly on CPU priority.  Don't include cluster topology for
hybrid CPUs to avoid interference with such CPU selection order.

On Alder Lake, the Atom CPU cluster has more capacity (4 Atom CPUs) vs
Big core cluster (2 hyperthread CPUs). This could potentially bias CPU
selection towards Atom over Big Core, when Big core CPU has higher
priority.

Fixes: 66558b730f ("sched: Add cluster scheduler level for x86")
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lkml.kernel.org/r/20211204091402.GM16608@worktop.programming.kicks-ass.net
2021-12-08 22:15:37 +01:00
Vitaly Kuznetsov 250552b925 KVM: nVMX: Don't use Enlightened MSR Bitmap for L3
When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened
VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which
are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is
updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from
clean fields to make Hyper-V aware of the change. For KVM's L1s, this is
done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr().
MSR bitmap for L2 is build in nested_vmx_prepare_msr_bitmap() by blending
MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't
check if the resulting bitmap is different and never cleans
HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and
may result in Hyper-V missing the update.

The issue could've been solved by calling evmcs_touch_msr_bitmap() for
eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so
would not give any performance benefits (compared to not using Enlightened
MSR Bitmap at all). 3-level nesting is also not a very common setup
nowadays.

Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for
now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211129094704.326635-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:53:13 -05:00
Kelly Devilliv a0793fdad9 csky: fix typo of fpu config macro
Fix typo which will cause fpe and privilege exception error.

Signed-off-by: Kelly Devilliv <kelly.devilliv@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2021-12-08 14:15:54 +08:00
Marc Zyngier 5b970dfcfe arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT
As the name indicates, PERST# is active low. Fix the DT description to
match the HW behaviour.

Fixes: ff2a8d91d8 ("arm64: apple: Add PCIe node")
Link: https://lore.kernel.org/r/20211123180636.80558-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
Reviewed-by: Mark Kettenis <kettenis@openbsd.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2021-12-07 14:27:07 -06:00
Linus Torvalds 55a677b256 EFI fix for v5.16
Ensure that the EFI memory map resides in encrypted memory even after it
 has been reallocated.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmGt9nUACgkQw08iOZLZ
 jyT6ZgwAt1jkqt/ILev5MPE2rEQ8BrYhR7pMBcNW7su4BECwRfmQBGjYFeeXjlqp
 7tK2L18DsHJxgt699gumDD7IyjbLNUx9zqsd7nJzlvnpwjGSiUhAGLPwoAeRSEgD
 sxK/HIt6J5ZX/zMsEbtptP9M1eitbZQhtt+0aCLcJvt1NCZc/+FvUaEEPI+6BrMt
 TMWdXd6MOZ8FV2iq+FUpTeG9SGzze6QaIiL7H5Z/otvXbBG1iWmlWR0bR17CXGUC
 thIuqCjEKSM7P45kFF+byCq5ajo55ULy3ZrPVQUYLt/FdMdI+pLxK8TTg46xq8KS
 mXqexjtCPLiJP728lWTETIPB7x9aQW/i6cwesh0O6cmhPznJybW/+uR9zBsyiSGs
 8+9ua+JT9+B3bEn1rsbWYKEcy3G4KrxwtJ5aqRNVJM96pel/yxA4EYkmfyf4TlrS
 pxbY2Agzlci18qCHb5lycc1yM6WHdirzY1l0m8uwqk7o8GHtRMMMU/7e/kvmtxUA
 j9V3vqm6
 =mDRi
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Ensure that the EFI memory map resides in encrypted memory even after
  it has been reallocated"

* tag 'efi-urgent-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/sme: Explicitly map new EFI memmap table as encrypted
2021-12-06 10:09:00 -08:00
Linus Torvalds 268ba09537 parisc architecture bug and warning fixes for kernel v5.16-rc4
Some bug and warning fixes:
 - Fix "make install" to use debians "installkernel" script which is now in /usr/sbin
 - Fix the bindeb-pkg make target by giving the correct KBUILD_IMAGE file name
 - Fix compiler warnings by annotating parisc agp init functions with __init
 - Fix timekeeping on SMP machines with dual-core CPUs
 - Enable some more config options in the 64-bit defconfig
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYa0fHwAKCRD3ErUQojoP
 X4auAQDw7wtS4zDGMmEnsLhqkWCPcqnZY0WANi7gCGN7iuJqIQD/ZfqhcBksfbO1
 OyrOpcQ6QYZuiADRPPzmPsnTw0t6YwQ=
 =kzK0
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Some bug and warning fixes:

   - Fix "make install" to use debians "installkernel" script which is
     now in /usr/sbin

   - Fix the bindeb-pkg make target by giving the correct KBUILD_IMAGE
     file name

   - Fix compiler warnings by annotating parisc agp init functions with
     __init

   - Fix timekeeping on SMP machines with dual-core CPUs

   - Enable some more config options in the 64-bit defconfig"

* tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Mark cr16 CPU clocksource unstable on all SMP machines
  parisc: Fix "make install" on newer debian releases
  parisc/agp: Annotate parisc agp init functions with __init
  parisc: Enable sata sil, audit and usb support on 64-bit defconfig
  parisc: Fix KBUILD_IMAGE for self-extracting kernel
2021-12-05 12:58:18 -08:00
Linus Torvalds f5d54a42d3 - Fix a couple of SWAPGS fencing issues in the x86 entry code
- Use the proper operand types in __{get,put}_user() to prevent
 truncation in SEV-ES string io
 
 - Make sure the kernel mappings are present in trampoline_pgd in order
 to prevent any potential accesses to unmapped memory after switching to
 it
 
 - Fix a trivial list corruption in objtool's pv_ops validation
 
 - Disable the clocksource watchdog for TSC on platforms which claim
 that the TSC is constant, doesn't stop in sleep states, CPU has TSC
 adjust and the number of sockets of the platform are max 2, to prevent
 erroneous markings of the TSC as unstable.
 
 - Make sure TSC adjust is always checked not only when going idle
 
 - Prevent a stack leak by initializing struct _fpx_sw_bytes properly in
 the FPU code
 
 - Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGsnWcACgkQEsHwGGHe
 VUoR+g/9FcOP0/XLH+LKHumYc9JHXsp5BvYGihyypMFgU0fXQBORtGqdls8jZtiJ
 kEdbW6iL0MRlyN8aHJCqr7dJqs7KJlpWes6hky7BY+U+7uewtjL5y3eSyZnA34T3
 M/Raecx27Hh0L0kHQlHXTUN73v1cgDvq3dCXWsP7Jqgjf5cEmCcV/tPEateqhq/f
 8TkLVIm55rJlbJ0LBO/cT0V3Q8QH9JPKm7nviOZuKCh9gcttFEPaM9MkaJyKUhoy
 O13jlenDoVkVWRXIQec1EZp2pTLxVAm/3Y0plge1yEVsejzh07gsQnMpoNeF+yFC
 8mDgSv8ZAED/vbsnB+BcgoRVj6ajG0+ilpLzcfPwUquiqS9pZrBSTddlvYDPjRMC
 MEXO548xiYgxmipu3r62H89nqmLEYQPk914rJu6bDnDeJ1gaabh8RXbNtQcRqqj3
 RETgVOp78iWn+aT33RLLD1EyodZb2IkMy087a3+TZICIXG81aDj9VgHvrVRnWnfY
 yKuldyrEKzi60yMQkV6h1oc8KSWQhspUSLtOVS9zrulCinYphFOfYFrzFmcKUWIq
 GdVb9eaP2oNBGfPybXP+TBLGZ4Zv9iXZmaEUk7ZGCjgv3ZmGMWJ18Hs/ufs2cwWK
 RNNUo3sz/y3OsreHowkWIk1eSxI16MabB7G/PDMnBSHlioVT390=
 =d6nS
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix a couple of SWAPGS fencing issues in the x86 entry code

 - Use the proper operand types in __{get,put}_user() to prevent
   truncation in SEV-ES string io

 - Make sure the kernel mappings are present in trampoline_pgd in order
   to prevent any potential accesses to unmapped memory after switching
   to it

 - Fix a trivial list corruption in objtool's pv_ops validation

 - Disable the clocksource watchdog for TSC on platforms which claim
   that the TSC is constant, doesn't stop in sleep states, CPU has TSC
   adjust and the number of sockets of the platform are max 2, to
   prevent erroneous markings of the TSC as unstable.

 - Make sure TSC adjust is always checked not only when going idle

 - Prevent a stack leak by initializing struct _fpx_sw_bytes properly in
   the FPU code

 - Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention

* tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
  x86/entry: Use the correct fence macro after swapgs in kernel CR3
  x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
  x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
  x86/64/mm: Map all kernel memory into trampoline_pgd
  objtool: Fix pv_ops noinstr validation
  x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
  x86/tsc: Add a timer to make sure TSC_adjust is always checked
  x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()
  x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
2021-12-05 08:43:35 -08:00
Linus Torvalds 90bf8d98b4 * Static analysis fix
* New SEV-ES protocol for communicating invalid VMGEXIT requests
 * Ensure APICv is considered inactive if there is no APIC
 * Fix reserved bits for AMD PerfEvtSeln register
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGscuAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOnzAgAg/0tifAOC8F31xu+gbbFa2hJsfH1
 sr2QXqVI6XUWeAHw5GEEcSFHGcFc0BrmfPEL4T3tEhkTGcijL2uTK8dwgq3ue4yg
 5LzaZzh03AWi4x84rV3XNVHHatzF69tgbUG49rSlK2T6BkGWzh4gI5LZV7XqNqLh
 acW+92YcCGo/O9RAUYYakofX4bp0rsQaZornQiD/R5X6AlrtMUyhAYHH5Wnv69n+
 MHf4K9MzrtixXbTvkOXflN5yz6TkIGvCpCK+gmppeuJ3JyZshjn+y95XcDDZtB6o
 +4Ypap3SmIkjTg4VwS8lRwIFkkY31hxhW2ohRnorfP2Q5Ckcz2/kVcIEew==
 =uJMy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm fixes from Paolo Bonzini:

 - Static analysis fix

 - New SEV-ES protocol for communicating invalid VMGEXIT requests

 - Ensure APICv is considered inactive if there is no APIC

 - Fix reserved bits for AMD PerfEvtSeln register

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
  KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
  KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails
  KVM: x86/mmu: Retry page fault if root is invalidated by memslot update
  KVM: VMX: Set failure code in prepare_vmcs02()
  KVM: ensure APICv is considered inactive if there is no APIC
  KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
2021-12-05 08:25:33 -08:00
Tom Lendacky 1ff2fc0286 x86/sme: Explicitly map new EFI memmap table as encrypted
Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory. As part
of adding this new entry, a new EFI memory map is allocated and mapped.
The mapping is where a problem can occur. This new memory map is mapped
using early_memremap() and generally mapped encrypted, unless the new
memory for the mapping happens to come from an area of memory that is
marked as EFI_BOOT_SERVICES_DATA memory. In this case, the new memory will
be mapped unencrypted. However, during replacement of the old memory map,
efi_mem_type() is disabled, so the new memory map will now be long-term
mapped encrypted (in efi.memmap), resulting in the map containing invalid
data and causing the kernel boot to crash.

Since it is known that the area will be mapped encrypted going forward,
explicitly map the new memory map as encrypted using early_memremap_prot().

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 8f716c9b5f ("x86/mm: Add support to access boot related data in the clear")
Link: https://lore.kernel.org/all/ebf1eb2940405438a09d51d121ec0d02c8755558.1634752931.git.thomas.lendacky@amd.com/
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ardb: incorporate Kconfig fix by Arnd]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2021-12-05 16:44:52 +01:00
Tom Lendacky ad5b353240 KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
exit code or exit parameters fails.

The VMGEXIT instruction can be issued from userspace, even though
userspace (likely) can't update the GHCB. To prevent userspace from being
able to kill the guest, return an error through the GHCB when validation
fails rather than terminating the guest. For cases where the GHCB can't be
updated (e.g. the GHCB can't be mapped, etc.), just return back to the
guest.

The new error codes are documented in the lasest update to the GHCB
specification.

Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-05 03:02:04 -05:00
Sean Christopherson a655276a59 KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
Use kvzalloc() to allocate KVM's buffer for SEV-ES's GHCB scratch area so
that KVM falls back to __vmalloc() if physically contiguous memory isn't
available.  The buffer is purely a KVM software construct, i.e. there's
no need for it to be physically contiguous.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109222350.2266045-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-05 03:02:03 -05:00
Sean Christopherson 75236f5f22 KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails
Return appropriate error codes if setting up the GHCB scratch area for an
SEV-ES guest fails.  In particular, returning -EINVAL instead of -ENOMEM
when allocating the kernel buffer could be confusing as userspace would
likely suspect a guest issue.

Fixes: 8f423a80d2 ("KVM: SVM: Support MMIO for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109222350.2266045-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-05 03:02:03 -05:00
Helge Deller afdb4a5b1d parisc: Mark cr16 CPU clocksource unstable on all SMP machines
In commit c8c3735997 ("parisc: Enhance detection of synchronous cr16
clocksources") I assumed that CPUs on the same physical core are syncronous.
While booting up the kernel on two different C8000 machines, one with a
dual-core PA8800 and one with a dual-core PA8900 CPU, this turned out to be
wrong. The symptom was that I saw a jump in the internal clocks printed to the
syslog and strange overall behaviour.  On machines which have 4 cores (2
dual-cores) the problem isn't visible, because the current logic already marked
the cr16 clocksource unstable in this case.

This patch now marks the cr16 interval timers unstable if we have more than one
CPU in the system, and it fixes this issue.

Fixes: c8c3735997 ("parisc: Enhance detection of synchronous cr16 clocksources")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.15+
2021-12-04 21:36:04 +01:00
Helge Deller 0f9fee4cde parisc: Fix "make install" on newer debian releases
On newer debian releases the debian-provided "installkernel" script is
installed in /usr/sbin. Fix the kernel install.sh script to look for the
script in this directory as well.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v3.13+
2021-12-04 21:36:04 +01:00
Linus Torvalds 757f3e6ddd s390 updates for 5.16-rc4
- Fix potential overlap of pseudo-MMIO addresses with MIO addresses.
 
 - Fix stack unwinder test case inline assembly compile error that
   happens with LLVM's integrated assembler.
 
 - Update defconfigs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmGqZq8ACgkQIg7DeRsp
 bsIHuQ/+PeeVR92LJg+b8oo9L8wTQy9w2GslKpU27AGXoj9PC6iGWIfp5v+tj+L7
 LCMykCQIS51JtdtMiymEpmITiSa0hhQ9Cft4qAa3UYjWNZDzYjA03H8emBnnMTlj
 DTBKcubLguxmSTx+MDGIIkUhpicGRky08AKcf8KjF4KkBFtwSr4vcA+I07ZEoE6p
 snoUbrt9CRuPowSkn4vLDARLbRpiom+5k/QA9MnfHXtocC5ivOYvGvS44PXgGO7/
 MCSML5T+4ngv0d9jWyDqJwnRch+BkJKwWTkyNhUethc5hUJxPw+Z0XPrKQCpAQFj
 9mSLbQgf6r32wiEc3NeDnqeqAAn/GJ6QweQkuElb9+kABw7XdNxIzv+uit8eD9/F
 +9sOH4UjbBs9aRMvMEb67ZlnLQ5yWRafYPSLDMyb2UEO6CW7ns+Ekhs39HowqsiB
 phNrgZBKCiXb34wwDedswo1Vt3l1ferTIVtBbKGeDFeD7Den/FSxppWfd+ZVEtZ4
 pFK1KmvbPhFZcoqJg4SRWETASP4tZuMXQTL5Rdnr0d/nLnAQuLf/QEevV32ZIgjN
 QXm6Yre+QFLAiThSLKqKcJ5eEraEDParjkcHvL523BEEuZv7a7c/Aji6pI2gpumn
 x+cM8Iq0ZzLBqDFZaVCtz0gcQ2ctUaFzjdTXvpSGEXw3K/iPovE=
 =+u2H
 -----END PGP SIGNATURE-----

Merge tag 's390-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Fix potential overlap of pseudo-MMIO addresses with MIO addresses

 - Fix stack unwinder test case inline assembly compile error that
   happens with LLVM's integrated assembler

 - Update defconfigs

* tag 's390-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  s390/pci: move pseudo-MMIO to prevent MIO overlap
  s390/test_unwind: use raw opcode instead of invalid instruction
2021-12-03 11:46:20 -08:00
Linus Torvalds a2aeaeabbc arm64 fixes for -rc4
- Add missing BTI landing instructions to the ftrace*_caller trampolines
 
 - Fix kexec() WARN when DEBUG_VIRTUAL is enabled
 
 - Fix PAC documentation by removing stale references to compiler flags
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmGp/DwQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNBRNCACxT4pnnSuZssYZLdn+bVh9ahLqDwATfsYQ
 NZJEzPmaS2QoLZZ3a6ZwRUjnH7VAfsxHyq17m1SN8Hbx5mh3ZkWNbN4sEy8vlLz8
 m7NK0YKU12SMlP8Vmlgw9gzXgk4yQ/OnK2Jl50SQCGkT3MvCohx16X4lcY+M1oTq
 2+9Rwbpi05T0G7rIFQFPwWqbJyiCoJ0Xr/iVmo1IX74yVp0oT1SGTBADcIsCIHRO
 /xVlsHEsOQWtguZcwZE8UDVtBCgrZFnJh3P+EENlRBZ48ANsWCcQpGMf+wrsrG+l
 chIKU3oLFLe1JcNV1zG8D8RdwQA9r/MSzq0KZFJTw+CNvM11AqAg
 =JEy/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Three arm64 fixes for -rc4.

  One of them is just a trivial documentation fix, whereas the other two
  address a warning in the kexec code and a crash in ftrace on systems
  implementing BTI.

  The latter patch has a couple of ugly ifdefs which Mark plans to clean
  up separately, but as-is the patch is straightforward for backporting
  to stable kernels.

  Summary:

   - Add missing BTI landing instructions to the ftrace*_caller
     trampolines

   - Fix kexec() WARN when DEBUG_VIRTUAL is enabled

   - Fix PAC documentation by removing stale references to compiler
     flags"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel
2021-12-03 10:50:14 -08:00
Lai Jiangshan 5c8f6a2e31 x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
In the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the
trampoline stack. But XEN pv doesn't use trampoline stack, so
PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.

In that case, source and destination stacks are identical, which means
that reusing swapgs_restore_regs_and_return_to_usermode() in XEN pv
would cause %rsp to move up to the top of the kernel stack and leave the
IRET frame below %rsp.

This is dangerous as it can be corrupted if #NMI / #MC hit as either of
these events occurring in the middle of the stack pushing would clobber
data on the (original) stack.

And, with  XEN pv, swapgs_restore_regs_and_return_to_usermode() pushing
the IRET frame on to the original address is useless and error-prone
when there is any future attempt to modify the code.

 [ bp: Massage commit message. ]

Fixes: 7f2590a110 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/20211126101209.8613-4-jiangshanlai@gmail.com
2021-12-03 19:21:15 +01:00
Lai Jiangshan 1367afaa2e x86/entry: Use the correct fence macro after swapgs in kernel CR3
The commit

  c758907004 ("x86/entry/64: Remove unneeded kernel CR3 switching")

removed a CR3 write in the faulting path of load_gs_index().

But the path's FENCE_SWAPGS_USER_ENTRY has no fence operation if PTI is
enabled, see spectre_v1_select_mitigation().

Rather, it depended on the serializing CR3 write of SWITCH_TO_KERNEL_CR3
and since it got removed, add a FENCE_SWAPGS_KERNEL_ENTRY call to make
sure speculation is blocked.

 [ bp: Massage commit message and comment. ]

Fixes: c758907004 ("x86/entry/64: Remove unneeded kernel CR3 switching")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211126101209.8613-3-jiangshanlai@gmail.com
2021-12-03 19:13:53 +01:00
Lai Jiangshan c07e45553d x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
Commit

  18ec54fdd6 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")

added FENCE_SWAPGS_{KERNEL|USER}_ENTRY for conditional SWAPGS. In
paranoid_entry(), it uses only FENCE_SWAPGS_KERNEL_ENTRY for both
branches. This is because the fence is required for both cases since the
CR3 write is conditional even when PTI is enabled.

But

  96b2371413 ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")

changed the order of SWAPGS and the CR3 write. And it missed the needed
FENCE_SWAPGS_KERNEL_ENTRY for the user gsbase case.

Add it back by changing the branches so that FENCE_SWAPGS_KERNEL_ENTRY
can cover both branches.

  [ bp: Massage, fix typos, remove obsolete comment while at it. ]

Fixes: 96b2371413 ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211126101209.8613-2-jiangshanlai@gmail.com
2021-12-03 18:55:47 +01:00
Michael Sterritt 1d5379d047 x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
Properly type the operands being passed to __put_user()/__get_user().
Otherwise, these routines truncate data for dependent instructions
(e.g., INSW) and only read/write one byte.

This has been tested by sending a string with REP OUTSW to a port and
then reading it back in with REP INSW on the same port.

Previous behavior was to only send and receive the first char of the
size. For example, word operations for "abcd" would only read/write
"ac". With change, the full string is now written and read back.

Fixes: f980f9c31a (x86/sev-es: Compile early handler code into kernel image)
Signed-off-by: Michael Sterritt <sterritt@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20211119232757.176201-1-sterritt@google.com
2021-12-03 18:09:30 +01:00
Joerg Roedel 51523ed1c2 x86/64/mm: Map all kernel memory into trampoline_pgd
The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff
range of kernel memory (with 4-level paging). This range contains the
kernel's text+data+bss mappings and the module mapping space but not the
direct mapping and the vmalloc area.

This is enough to get the application processors out of real-mode, but
for code that switches back to real-mode the trampoline_pgd is missing
important parts of the address space. For example, consider this code
from arch/x86/kernel/reboot.c, function machine_real_restart() for a
64-bit kernel:

  #ifdef CONFIG_X86_32
  	load_cr3(initial_page_table);
  #else
  	write_cr3(real_mode_header->trampoline_pgd);

  	/* Exiting long mode will fail if CR4.PCIDE is set. */
  	if (boot_cpu_has(X86_FEATURE_PCID))
  		cr4_clear_bits(X86_CR4_PCIDE);
  #endif

  	/* Jump to the identity-mapped low memory code */
  #ifdef CONFIG_X86_32
  	asm volatile("jmpl *%0" : :
  		     "rm" (real_mode_header->machine_real_restart_asm),
  		     "a" (type));
  #else
  	asm volatile("ljmpl *%0" : :
  		     "m" (real_mode_header->machine_real_restart_asm),
  		     "D" (type));
  #endif

The code switches to the trampoline_pgd, which unmaps the direct mapping
and also the kernel stack. The call to cr4_clear_bits() will find no
stack and crash the machine. The real_mode_header pointer below points
into the direct mapping, and dereferencing it also causes a crash.

The reason this does not crash always is only that kernel mappings are
global and the CR3 switch does not flush those mappings. But if theses
mappings are not in the TLB already, the above code will crash before it
can jump to the real-mode stub.

Extend the trampoline_pgd to contain all kernel mappings to prevent
these crashes and to make code which runs on this page-table more
robust.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20211202153226.22946-5-joro@8bytes.org
2021-12-03 09:11:43 +01:00
Heiko Carstens 3c088b1e82 s390: update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-02 19:29:44 +01:00
Mark Rutland 35b6b28e69 arm64: ftrace: add missing BTIs
When branch target identifiers are in use, code reachable via an
indirect branch requires a BTI landing pad at the branch target site.

When building FTRACE_WITH_REGS atop patchable-function-entry, we miss
BTIs at the start start of the `ftrace_caller` and `ftrace_regs_caller`
trampolines, and when these are called from a module via a PLT (which
will use a `BR X16`), we will encounter a BTI failure, e.g.

| # insmod lkdtm.ko
| lkdtm: No crash points registered, enable through debugfs
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # cat /sys/kernel/debug/provoke-crash/DIRECT
| Unhandled 64-bit el1h sync exception on CPU0, ESR 0x34000001 -- BTI
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400405 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=jc)
| pc : ftrace_caller+0x0/0x3c
| lr : lkdtm_debugfs_open+0xc/0x20 [lkdtm]
| sp : ffff800012e43b00
| x29: ffff800012e43b00 x28: 0000000000000000 x27: ffff800012e43c88
| x26: 0000000000000000 x25: 0000000000000000 x24: ffff0000c171f200
| x23: ffff0000c27b1e00 x22: ffff0000c2265240 x21: ffff0000c23c8c30
| x20: ffff8000090ba380 x19: 0000000000000000 x18: 0000000000000000
| x17: 0000000000000000 x16: ffff80001002bb4c x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000900ff0
| x11: ffff0000c4166310 x10: ffff800012e43b00 x9 : ffff8000104f2384
| x8 : 0000000000000001 x7 : 0000000000000000 x6 : 000000000000003f
| x5 : 0000000000000040 x4 : ffff800012e43af0 x3 : 0000000000000001
| x2 : ffff8000090b0000 x1 : ffff0000c171f200 x0 : ffff0000c23c8c30
| Kernel panic - not syncing: Unhandled exception
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1a4
|  show_stack+0x24/0x30
|  dump_stack_lvl+0x68/0x84
|  dump_stack+0x1c/0x38
|  panic+0x168/0x360
|  arm64_exit_nmi.isra.0+0x0/0x80
|  el1h_64_sync_handler+0x68/0xd4
|  el1h_64_sync+0x78/0x7c
|  ftrace_caller+0x0/0x3c
|  do_dentry_open+0x134/0x3b0
|  vfs_open+0x38/0x44
|  path_openat+0x89c/0xe40
|  do_filp_open+0x8c/0x13c
|  do_sys_openat2+0xbc/0x174
|  __arm64_sys_openat+0x6c/0xbc
|  invoke_syscall+0x50/0x120
|  el0_svc_common.constprop.0+0xdc/0x100
|  do_el0_svc+0x84/0xa0
|  el0_svc+0x28/0x80
|  el0t_64_sync_handler+0xa8/0x130
|  el0t_64_sync+0x1a0/0x1a4
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x0,00000f42,da660c5f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: Unhandled exception ]---

Fix this by adding the required `BTI C`, as we only require these to be
reachable via BL for direct calls or BR X16/X17 for PLTs. For now, these
are open-coded in the function prologue, matching the style of the
`__hwasan_tag_mismatch` trampoline.

In future we may wish to consider adding a new SYM_CODE_START_*()
variant which has an implicit BTI.

When ftrace is built atop mcount, the trampolines are marked with
SYM_FUNC_START(), and so get an implicit BTI. We may need to change
these over to SYM_CODE_START() in future for RELIABLE_STACKTRACE, in
case we need to apply special care aroud the return address being
rewritten.

Fixes: 97fed779f2 ("arm64: bti: Provide Kconfig for kernel mode BTI")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129135709.2274019-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-02 10:18:32 +00:00
Mark Rutland 2f2183243f arm64: kexec: use __pa_symbol(empty_zero_page)
In machine_kexec_post_load() we use __pa() on `empty_zero_page`, so that
we can use the physical address during arm64_relocate_new_kernel() to
switch TTBR1 to a new set of tables. While `empty_zero_page` is part of
the old kernel, we won't clobber it until after this switch, so using it
is benign.

However, `empty_zero_page` is part of the kernel image rather than a
linear map address, so it is not correct to use __pa(x), and we should
instead use __pa_symbol(x) or __pa(lm_alias(x)). Otherwise, when the
kernel is built with DEBUG_VIRTUAL, we'll encounter splats as below, as
I've seen when fuzzing v5.16-rc3 with Syzkaller:

| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: 000000008492561a (empty_zero_page+0x0/0x1000)
| WARNING: CPU: 3 PID: 11492 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| CPU: 3 PID: 11492 Comm: syz-executor.0 Not tainted 5.16.0-rc3-00001-g48bd452a045c #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| lr : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| sp : ffff80001af17bb0
| x29: ffff80001af17bb0 x28: ffff1cc65207b400 x27: ffffb7828730b120
| x26: 0000000000000e11 x25: 0000000000000000 x24: 0000000000000001
| x23: ffffb7828963e000 x22: ffffb78289644000 x21: 0000600000000000
| x20: 000000000000002d x19: 0000b78289644000 x18: 0000000000000000
| x17: 74706d6528206131 x16: 3635323934383030 x15: 303030303030203a
| x14: 1ffff000035e2eb8 x13: ffff6398d53f4f0f x12: 1fffe398d53f4f0e
| x11: 1fffe398d53f4f0e x10: ffff6398d53f4f0e x9 : ffffb7827c6f76dc
| x8 : ffff1cc6a9fa7877 x7 : 0000000000000001 x6 : ffff6398d53f4f0f
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff1cc66f2a99c0
| x2 : 0000000000040000 x1 : d7ce7775b09b5d00 x0 : 0000000000000000
| Call trace:
|  __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
|  machine_kexec_post_load+0x284/0x670 arch/arm64/kernel/machine_kexec.c:150
|  do_kexec_load+0x570/0x670 kernel/kexec.c:155
|  __do_sys_kexec_load kernel/kexec.c:250 [inline]
|  __se_sys_kexec_load kernel/kexec.c:231 [inline]
|  __arm64_sys_kexec_load+0x1d8/0x268 kernel/kexec.c:231
|  __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
|  invoke_syscall+0x90/0x2e0 arch/arm64/kernel/syscall.c:52
|  el0_svc_common.constprop.2+0x1e4/0x2f8 arch/arm64/kernel/syscall.c:142
|  do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:181
|  el0_svc+0x60/0x248 arch/arm64/kernel/entry-common.c:603
|  el0t_64_sync_handler+0x90/0xb8 arch/arm64/kernel/entry-common.c:621
|  el0t_64_sync+0x180/0x184 arch/arm64/kernel/entry.S:572
| irq event stamp: 2428
| hardirqs last  enabled at (2427): [<ffffb7827c6f2308>] __up_console_sem+0xf0/0x118 kernel/printk/printk.c:255
| hardirqs last disabled at (2428): [<ffffb7828223df98>] el1_dbg+0x28/0x80 arch/arm64/kernel/entry-common.c:375
| softirqs last  enabled at (2424): [<ffffb7827c411c00>] softirq_handle_end kernel/softirq.c:401 [inline]
| softirqs last  enabled at (2424): [<ffffb7827c411c00>] __do_softirq+0xa28/0x11e4 kernel/softirq.c:587
| softirqs last disabled at (2417): [<ffffb7827c59015c>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] invoke_softirq kernel/softirq.c:439 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] __irq_exit_rcu kernel/softirq.c:636 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] irq_exit_rcu+0x53c/0x688 kernel/softirq.c:648
| ---[ end trace 0ca578534e7ca938 ]---

With or without DEBUG_VIRTUAL __pa() will fall back to __kimg_to_phys()
for non-linear addresses, and will happen to do the right thing in this
case, even with the warning. But we should not depend upon this, and to
keep the warning useful we should fix this case.

Fix this issue by using __pa_symbol(), which handles kernel image
addresses (and checks its input is a kernel image address). This matches
what we do elsewhere, e.g. in arch/arm64/include/asm/pgtable.h:

| #define ZERO_PAGE(vaddr)       phys_to_page(__pa_symbol(empty_zero_page))

Fixes: 3744b5280e ("arm64: kexec: install a copy of the linear-map")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20211130121849.3319010-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-02 10:17:12 +00:00
Sean Christopherson a955cad84c KVM: x86/mmu: Retry page fault if root is invalidated by memslot update
Bail from the page fault handler if the root shadow page was obsoleted by
a memslot update.  Do the check _after_ acuiring mmu_lock, as the TDP MMU
doesn't rely on the memslot/MMU generation, and instead relies on the
root being explicit marked invalid by kvm_mmu_zap_all_fast(), which takes
mmu_lock for write.

For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
moved past the gfn associated with the SP.

For other MMUs, the resulting behavior is far more convoluted, though
unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
root isn't directly problematic, as the obsolete root will be unloaded
and dropped before the vCPU re-enters the guest.  But because the legacy
MMU tracks shadow pages by their role, any SP created by the fault can
can be reused in the new post-reload root.  Again, that _shouldn't_ be
problematic as any leaf child SPTEs will be created for the current/valid
memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
the old generation as they will be flagged as obsolete.  But, given that
continuing with the fault is pointess (the root will be unloaded), apply
the check to all MMUs.

Fixes: b7cccd397f ("KVM: x86/mmu: Fast invalidation for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211120045046.3940942-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-02 04:12:12 -05:00
Dan Carpenter bfbb307c62 KVM: VMX: Set failure code in prepare_vmcs02()
The error paths in the prepare_vmcs02() function are supposed to set
*entry_failure_code but this path does not.  It leads to using an
uninitialized variable in the caller.

Fixes: 71f7347025 ("KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20211130125337.GB24578@kili>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-02 04:12:11 -05:00
Paolo Bonzini ef8b4b7203 KVM: ensure APICv is considered inactive if there is no APIC
kvm_vcpu_apicv_active() returns false if a virtual machine has no in-kernel
local APIC, however kvm_apicv_activated might still be true if there are
no reasons to disable APICv; in fact it is quite likely that there is none
because APICv is inhibited by specific configurations of the local APIC
and those configurations cannot be programmed.  This triggers a WARN:

   WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));

To avoid this, introduce another cause for APICv inhibition, namely the
absence of an in-kernel local APIC.  This cause is enabled by default,
and is dropped by either KVM_CREATE_IRQCHIP or the enabling of
KVM_CAP_IRQCHIP_SPLIT.

Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Fixes: ee49a89329 ("KVM: x86: Move SVM's APICv sanity check to common x86", 2021-10-22)
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com>
Message-Id: <20211130123746.293379-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-02 04:12:11 -05:00
Like Xu cb1d220da0 KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
If we run the following perf command in an AMD Milan guest:

  perf stat \
  -e cpu/event=0x1d0/ \
  -e cpu/event=0x1c7/ \
  -e cpu/umask=0x1f,event=0x18e/ \
  -e cpu/umask=0x7,event=0x18e/ \
  -e cpu/umask=0x18,event=0x18e/ \
  ./workload

dmesg will report a #GP warning from an unchecked MSR access
error on MSR_F15H_PERF_CTLx.

This is because according to APM (Revision: 4.03) Figure 13-7,
the bits [35:32] of AMD PerfEvtSeln register is a part of the
event select encoding, which extends the EVENT_SELECT field
from 8 bits to 12 bits.

Opportunistically update pmu->reserved_bits for reserved bit 19.

Reported-by: Jim Mattson <jmattson@google.com>
Fixes: ca724305a2 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211118130320.95997-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-02 04:11:50 -05:00
Feng Tang b50db7095f x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
There are cases that the TSC clocksource is wrongly judged as unstable by
the clocksource watchdog mechanism which tries to validate the TSC against
HPET, PM_TIMER or jiffies. While there is hardly a general reliable way to
check the validity of a watchdog, Thomas Gleixner proposed [1]:

"I'm inclined to lift that requirement when the CPU has:

    1) X86_FEATURE_CONSTANT_TSC
    2) X86_FEATURE_NONSTOP_TSC
    3) X86_FEATURE_NONSTOP_TSC_S3
    4) X86_FEATURE_TSC_ADJUST
    5) At max. 4 sockets

 After two decades of horrors we're finally at a point where TSC seems
 to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
 was really key as we can now detect even small modifications reliably
 and the important point is that we can cure them as well (not pretty
 but better than all other options)."

As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
of Atom processorz, and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
to use maximal 2 sockets.

The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases that both of them had been
wrongly judged as unreliable.

For more background of tsc/watchdog, there is a good summary in [2]

[tglx} Update vs. jiffies:

  On systems where the only remaining clocksource aside of TSC is jiffies
  there is no way to make this work because that creates a circular
  dependency. Jiffies accuracy depends on not missing a periodic timer
  interrupt, which is not guaranteed. That could be detected by TSC, but as
  TSC is not trusted this cannot be compensated. The consequence is a
  circulus vitiosus which results in shutting down TSC and falling back to
  the jiffies clocksource which is even more unreliable.

[1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
[2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/

[ tglx: Refine comment and amend changelog ]

Fixes: 6e3cd95234 ("x86/hpet: Use another crystalball to evaluate HPET usability")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-2-feng.tang@intel.com
2021-12-02 00:40:36 +01:00
Feng Tang c7719e7934 x86/tsc: Add a timer to make sure TSC_adjust is always checked
The TSC_ADJUST register is checked every time a CPU enters idle state, but
Thomas Gleixner mentioned there is still a caveat that a system won't enter
idle [1], either because it's too busy or configured purposely to not enter
idle.

Setup a periodic timer (every 10 minutes) to make sure the check is
happening on a regular base.

[1] https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/

Fixes: 6e3cd95234 ("x86/hpet: Use another crystalball to evaluate HPET usability")
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-1-feng.tang@intel.com
2021-12-02 00:40:35 +01:00
Marco Elver 52d0b8b187 x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()
save_sw_bytes() did not fully initialize sw_bytes, which caused KMSAN
to report an infoleak (see below).
Initialize sw_bytes explicitly to avoid this.

KMSAN report follows:

=====================================================
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user ./include/linux/instrumented.h:121
BUG: KMSAN: kernel-infoleak in __copy_to_user ./include/linux/uaccess.h:154
BUG: KMSAN: kernel-infoleak in save_xstate_epilog+0x2df/0x510 arch/x86/kernel/fpu/signal.c:127
 instrument_copy_to_user ./include/linux/instrumented.h:121
 __copy_to_user ./include/linux/uaccess.h:154
 save_xstate_epilog+0x2df/0x510 arch/x86/kernel/fpu/signal.c:127
 copy_fpstate_to_sigframe+0x861/0xb60 arch/x86/kernel/fpu/signal.c:245
 get_sigframe+0x656/0x7e0 arch/x86/kernel/signal.c:296
 __setup_rt_frame+0x14d/0x2a60 arch/x86/kernel/signal.c:471
 setup_rt_frame arch/x86/kernel/signal.c:781
 handle_signal arch/x86/kernel/signal.c:825
 arch_do_signal_or_restart+0x417/0xdd0 arch/x86/kernel/signal.c:870
 handle_signal_work kernel/entry/common.c:149
 exit_to_user_mode_loop+0x1f6/0x490 kernel/entry/common.c:173
 exit_to_user_mode_prepare kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290
 syscall_exit_to_user_mode+0x7e/0xc0 kernel/entry/common.c:302
 do_syscall_64+0x60/0xd0 arch/x86/entry/common.c:88
 entry_SYSCALL_64_after_hwframe+0x44/0xae ??:?

Local variable sw_bytes created at:
 save_xstate_epilog+0x80/0x510 arch/x86/kernel/fpu/signal.c:121
 copy_fpstate_to_sigframe+0x861/0xb60 arch/x86/kernel/fpu/signal.c:245

Bytes 20-47 of 48 are uninitialized
Memory access of size 48 starts at ffff8880801d3a18
Data copied to user address 00007ffd90e2ef50
=====================================================

Link: https://lore.kernel.org/all/CAG_fn=V9T6OKPonSjsi9PmWB0hMHFC=yawozdft8i1-MSxrv=w@mail.gmail.com/
Fixes: 53599b4d54 ("x86/fpu/signal: Prepare for variable sigframe length")
Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lkml.kernel.org/r/20211126124746.761278-1-glider@google.com
2021-11-30 15:13:47 -08:00
Tony Luck 7d697f0d57 x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
Convention for all the other "lake" CPUs is all one word.

So s/RAPTOR_LAKE/RAPTORLAKE/

Fixes: fbdb5e8f29 ("x86/cpu: Add Raptor Lake to Intel family")
Reported-by: Rui Zhang <rui.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211119170832.1034220-1-tony.luck@intel.com
2021-11-30 14:05:48 -08:00
Helge Deller 7e8aeb9d46 parisc: Enable sata sil, audit and usb support on 64-bit defconfig
Add some more config options which reflect what's needed to boot our
64-bit debian buildds out of the box.

Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-30 21:49:32 +01:00
Helge Deller 1d7c29b777 parisc: Fix KBUILD_IMAGE for self-extracting kernel
Default KBUILD_IMAGE to $(boot)/bzImage if a self-extracting
(CONFIG_PARISC_SELF_EXTRACT=y) kernel is to be built.
This fixes the bindeb-pkg make target.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.14+
2021-11-30 21:49:32 +01:00
Linus Torvalds f080815fdb ARM64:
* Fix constant sign extension affecting TCR_EL2 and preventing
 running on ARMv8.7 models due to spurious bits being set
 
 * Fix use of helpers using PSTATE early on exit by always sampling
 it as soon as the exit takes place
 
 * Move pkvm's 32bit handling into a common helper
 
 RISC-V:
 
 * Fix incorrect KVM_MAX_VCPUS value
 
 * Unmap stage2 mapping when deleting/moving a memslot
 
 x86:
 
 * Fix and downgrade BUG_ON due to uninitialized cache
 
 * Many APICv and MOVE_ENC_CONTEXT_FROM fixes
 
 * Correctly emulate TLB flushes around nested vmentry/vmexit
 and when the nested hypervisor uses VPID
 
 * Prevent modifications to CPUID after the VM has run
 
 * Other smaller bugfixes
 
 Generic:
 
 * Memslot handling bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGmHBEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOkGgf/RBjt1d7H6Um7tD7oA5QiIHmNY4ko
 K/90OAa8h62rilxpqxkRgLNmphBc5AzcbufVXN4J1hVhw2M+u1ouDxKeHS1GEZTA
 /XdNb0dwK99TpOJkIcuV/NQVIZUxkM00VbIiCoLkX06VuIc1Gie1G4bqzLhWCP8Y
 ts9l/pkfafvfEmjmcjVd7gkDOnEPbT+JPDJcuo/RA7C7Z2L4+8DsFeyfWGqBP647
 J6omUUxD82QRm28OVOK4V7aNALWsAdlaqHrVFAPZywQl7QTWMO0UTcKTdCCB2B4Q
 QnHejFV6pFh55q3/fhe7epy9e2Sw+NOsmWKTEGPbU5nn94R8lyW1GV4ZUQ==
 =Nduu
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix constant sign extension affecting TCR_EL2 and preventing
     running on ARMv8.7 models due to spurious bits being set

   - Fix use of helpers using PSTATE early on exit by always sampling it
     as soon as the exit takes place

   - Move pkvm's 32bit handling into a common helper

  RISC-V:

   - Fix incorrect KVM_MAX_VCPUS value

   - Unmap stage2 mapping when deleting/moving a memslot

  x86:

   - Fix and downgrade BUG_ON due to uninitialized cache

   - Many APICv and MOVE_ENC_CONTEXT_FROM fixes

   - Correctly emulate TLB flushes around nested vmentry/vmexit and when
     the nested hypervisor uses VPID

   - Prevent modifications to CPUID after the VM has run

   - Other smaller bugfixes

  Generic:

   - Memslot handling bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
  KVM: fix avic_set_running for preemptable kernels
  KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled
  KVM: SEV: accept signals in sev_lock_two_vms
  KVM: SEV: do not take kvm->lock when destroying
  KVM: SEV: Prohibit migration of a VM that has mirrors
  KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked
  selftests: sev_migrate_tests: add tests for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
  KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
  KVM: SEV: initialize regions_list of a mirror VM
  KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
  KVM: SEV: do not use list_replace_init on an empty list
  KVM: x86: Use a stable condition around all VT-d PI paths
  KVM: x86: check PIR even for vCPUs with disabled APICv
  KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled
  KVM: selftests: page_table_test: fix calculation of guest_test_phys_mem
  KVM: x86/mmu: Handle "default" period when selectively waking kthread
  KVM: MMU: shadow nested paging does not have PKU
  KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path
  KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping
  KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
  ...
2021-11-30 09:22:15 -08:00
Johan Almbladh 099f83aa2d mips, bpf: Fix reference to non-existing Kconfig symbol
The Kconfig symbol for R10000 ll/sc errata workaround in the MIPS JIT was
misspelled, causing the workaround to not take effect when enabled.

Fixes: 72570224bb ("mips, bpf: Add JIT workarounds for CPU errata")
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211130160824.3781635-1-johan.almbladh@anyfinetworks.com
2021-11-30 17:19:36 +01:00
Paolo Bonzini 7cfc5c653b KVM: fix avic_set_running for preemptable kernels
avic_set_running() passes the current CPU to avic_vcpu_load(), albeit
via vcpu->cpu rather than smp_processor_id().  If the thread is migrated
while avic_set_running runs, the call to avic_vcpu_load() can use a stale
value for the processor id.  Avoid this by blocking preemption over the
entire execution of avic_set_running().

Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: 8221c13700 ("svm: Manage vcpu load/unload when enable AVIC")
Cc: stable@vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30 07:40:48 -05:00
Paolo Bonzini e90e51d5f0 KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled
There is nothing to synchronize if APICv is disabled, since neither
other vCPUs nor assigned devices can set PIR.ON.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30 07:40:47 -05:00
Paolo Bonzini c9d61dcb0b KVM: SEV: accept signals in sev_lock_two_vms
Generally, kvm->lock is not taken for a long time, but
sev_lock_two_vms is different: it takes vCPU locks
inside, so userspace can hold it back just by calling
a vCPU ioctl.  Play it safe and use mutex_lock_killable.

Message-Id: <20211123005036.2954379-13-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30 03:54:15 -05:00
Paolo Bonzini 10a37929ef KVM: SEV: do not take kvm->lock when destroying
Taking the lock is useless since there are no other references,
and there are already accesses (e.g. to sev->enc_context_owner)
that do not take it.  So get rid of it.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-30 03:54:14 -05:00