Commit Graph

2169 Commits

Author SHA1 Message Date
Qiuxu Zhuo f7a29a3737 EDAC/i10nm: Reallocate skx_dev list if preconfigured cnt != runtime cnt
Ideally, read the present DDR memory controller count first and then
allocate the skx_dev list using this count. However, this approach
requires adding a significant amount of code similar to
skx_get_all_bus_mappings() to obtain the PCI bus mappings for the first
socket and use these mappings along with the related PCI register offset
to read the memory controller count.

Given that the Granite Rapids CPU is the only one that can detect the
count of memory controllers at runtime (other CPUs use the count in the
configuration data), to reduce code complexity, reallocate the skx_dev
list only if the preconfigured count of DDR memory controllers differs
from the count read at runtime for Granite Rapids CPU.
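The reallocation policy can be sketched in userspace C. This is a hypothetical model, not the driver's code. `struct mc_dev`, `mc_dev_alloc()` and `mc_dev_fixup()` are illustrative names, and a single device stands in for the per-socket skx_dev list. The allocation made from the preconfigured count is kept unless the runtime count disagrees.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct skx_imc. */
struct imc_inst {
	int mc_mapping;		/* physical index of this logical MC */
};

/* Hypothetical stand-in for struct skx_dev with a flexible imc[] array. */
struct mc_dev {
	int num_imc;
	struct imc_inst imc[];
};

static struct mc_dev *mc_dev_alloc(int num_imc)
{
	struct mc_dev *d = calloc(1, sizeof(*d) + (size_t)num_imc * sizeof(d->imc[0]));

	if (d)
		d->num_imc = num_imc;
	return d;
}

/*
 * Keep the allocation made from the preconfigured count unless the
 * count read at runtime differs; only then allocate a new one.
 * On allocation failure, return NULL and leave @d untouched.
 */
static struct mc_dev *mc_dev_fixup(struct mc_dev *d, int runtime_cnt)
{
	struct mc_dev *n;

	if (d->num_imc == runtime_cnt)
		return d;

	n = mc_dev_alloc(runtime_cnt);
	if (!n)
		return NULL;
	free(d);
	return n;
}
```

In the common case, where the counts match, no extra allocation happens. This mirrors the goal of keeping the Granite Rapids special case cheap.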

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250731145534.2759334-7-qiuxu.zhuo@intel.com
2025-08-19 16:24:57 -07:00
Qiuxu Zhuo 91ded20fa2 EDAC/skx_common: Remove redundant upper bound check for res->imc
The following upper bound check for the memory controller physical index
decoded by ADXL is the only place where the macro 'NUM_IMC' is used:

  res->imc > NUM_IMC - 1

Since this check is already covered by skx_get_mc_mapping(), meaning no
memory controller logical index exists for an invalid memory controller
physical index decoded by ADXL, remove the redundant upper bound check
so that the definition for 'NUM_IMC' can be cleaned up (in another patch).
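A small lookup shows why the bound check is redundant. After the mapping swap, each present controller records its physical index, so a physical index that is out of range simply matches nothing. The function name and array layout below are assumptions for illustration, not the real skx_get_mc_mapping() interface.

```c
/*
 * Hypothetical lookup in the spirit of skx_get_mc_mapping():
 * mapping[lmc] holds the physical index of logical memory
 * controller lmc. Returns the logical index, or -1 if the physical
 * index does not belong to any present controller.
 */
static int lookup_lmc(const int *mapping, int num_present, int pmc)
{
	for (int lmc = 0; lmc < num_present; lmc++) {
		if (mapping[lmc] == pmc)
			return lmc;
	}
	/* Out-of-range physical indices also end up here. */
	return -1;
}
```

Take controllers 0, 2 and 3 as present. lookup_lmc() returns 1 for physical index 2, and -1 for both 1 (disabled) and 99 (out of range). The second case is exactly what the removed `res->imc > NUM_IMC - 1` test guarded against.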

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250731145534.2759334-6-qiuxu.zhuo@intel.com
2025-08-19 16:24:35 -07:00
Qiuxu Zhuo 43060ca533 EDAC/skx_common: Make skx_dev->imc[] a flexible array
The current skx->imc[NUM_IMC] array of memory controller instances is
sized using the macro NUM_IMC. Each time EDAC support is added for a
new CPU, NUM_IMC needs to be updated to ensure it is greater than or
equal to the number of memory controllers for the new CPU. This approach
is inconvenient and results in memory waste for older CPUs with fewer
memory controllers.

To address this, make skx->imc[] a flexible array and determine its size
from configuration data or at runtime.
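The memory saving can be illustrated by comparing a fixed-size array with a flexible array member. The struct names, the 64-byte per-controller size and `MAX_IMC` are hypothetical. Kernel code would normally compute the allocation size with the struct_size() helper, which also guards against overflow. The sketch below does the same arithmetic without that check.

```c
#include <stdlib.h>

/* Hypothetical per-controller state; the size is illustrative. */
struct imc_inst {
	unsigned char regs[64];
};

#define MAX_IMC 12	/* hypothetical global maximum, like the old NUM_IMC */

/* Before: every device reserves room for MAX_IMC controllers. */
struct dev_fixed {
	int num_imc;
	struct imc_inst imc[MAX_IMC];
};

/* After: the array is sized per CPU model when the device is allocated. */
struct dev_flex {
	int num_imc;
	struct imc_inst imc[];
};

static size_t dev_flex_size(int num_imc)
{
	return sizeof(struct dev_flex) + (size_t)num_imc * sizeof(struct imc_inst);
}

static struct dev_flex *dev_flex_alloc(int num_imc)
{
	struct dev_flex *d = calloc(1, dev_flex_size(num_imc));

	if (d)
		d->num_imc = num_imc;
	return d;
}
```

A CPU with two memory controllers now pays for two instances rather than MAX_IMC. Adding a CPU with more controllers no longer requires touching a global macro.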

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250731145534.2759334-5-qiuxu.zhuo@intel.com
2025-08-19 16:24:09 -07:00
Qiuxu Zhuo 30b47b71fd EDAC/skx_common: Swap memory controller index mapping
The current mapping of memory controller indices is from physical index [1]
to logical index [2], as shown below:

  skx_dev->imc[pmc].mc_mapping = lmc

Since skx_dev->imc[] is an array of present memory controller instances,
mapping memory controller indices from logical index to physical index,
as shown below, is more reasonable. This is also a preparatory step for
making skx_dev->imc[] a flexible array.

  skx_dev->imc[lmc].mc_mapping = pmc

Both mappings are equivalent. No functional changes intended.

[1] Indices for memory controllers include both those present to the
    OS and those disabled by BIOS.

[2] Indices for memory controllers present to the OS.
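The two directions carry the same information, which a few lines of C can demonstrate. This is a hypothetical sketch in which a bitmask stands in for whatever the driver reads to learn which controllers the BIOS left enabled. The new table is built from it and then inverted to recover the old one.

```c
/*
 * Hypothetical: present_mask has bit pmc set for each physical memory
 * controller left enabled by BIOS. Fill lmc_to_pmc[] (the new
 * direction, imc[lmc].mc_mapping = pmc) and return how many are present.
 */
static int build_lmc_to_pmc(unsigned int present_mask, int max_pmc, int *lmc_to_pmc)
{
	int lmc = 0;

	for (int pmc = 0; pmc < max_pmc; pmc++) {
		if (present_mask & (1u << pmc))
			lmc_to_pmc[lmc++] = pmc;
	}
	return lmc;
}

/* Recover the old direction (pmc -> lmc); -1 marks a BIOS-disabled MC. */
static void invert_mapping(const int *lmc_to_pmc, int num_present, int max_pmc,
			   int *pmc_to_lmc)
{
	for (int pmc = 0; pmc < max_pmc; pmc++)
		pmc_to_lmc[pmc] = -1;
	for (int lmc = 0; lmc < num_present; lmc++)
		pmc_to_lmc[lmc_to_pmc[lmc]] = lmc;
}
```

Take controllers 0, 2 and 3 as present (mask 0xD). The new table is {0, 2, 3}, and inverting it gives {0, -1, 1, 2}, which is the old table indexed the other way round. Only the new direction fits an array that holds just the present controllers.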

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250731145534.2759334-4-qiuxu.zhuo@intel.com
2025-08-19 16:23:46 -07:00
Qiuxu Zhuo 59cfc06a87 EDAC/skx_common: Move mc_mapping to be a field inside struct skx_imc
The mc_mapping and imc fields of struct skx_dev have the same size,
NUM_IMC. Move mc_mapping to be a field inside struct skx_imc to prepare
for making the imc array of memory controller instances a flexible array.

No functional changes intended.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250731145534.2759334-3-qiuxu.zhuo@intel.com
2025-08-19 16:23:22 -07:00
Qiuxu Zhuo 219af5dfce EDAC/{skx_common,skx}: Use configuration data, not global macros
Use model-specific configuration data for the number of memory controllers
per socket, channels per memory controller, and DIMMs per channel as
intended, instead of relying on global macros for maximum values.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250731145534.2759334-2-qiuxu.zhuo@intel.com
2025-08-19 16:22:50 -07:00
Qiuxu Zhuo 2e6fe1bbef EDAC/i10nm: Skip DIMM enumeration on a disabled memory controller
When loading the i10nm_edac driver on some Intel Granite Rapids servers,
a call trace may appear as follows:

  UBSAN: shift-out-of-bounds in drivers/edac/skx_common.c:453:16
  shift exponent -66 is negative
  ...
  __ubsan_handle_shift_out_of_bounds+0x1e3/0x390
  skx_get_dimm_info.cold+0x47/0xd40 [skx_edac_common]
  i10nm_get_dimm_config+0x23e/0x390 [i10nm_edac]
  skx_register_mci+0x159/0x220 [skx_edac_common]
  i10nm_init+0xcb0/0x1ff0 [i10nm_edac]
  ...

This occurs because some BIOSes may disable a memory controller if no
DIMMs are populated on it. The DIMMMTR register of such a disabled
memory controller contains the invalid value ~0, which leads to the
call trace above.

Fix this call trace by skipping DIMM enumeration on a disabled memory
controller.
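The guard can be sketched as follows. The commit text does not say how the driver decides that a controller is disabled. The all-ones test below is an assumption based on the ~0 DIMMMTR value described above, and the one-register-per-controller layout and the "DIMM populated" bit are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: treat an all-ones register value as the mark of
 * a disabled memory controller, whose fields must not be decoded.
 */
static bool mc_looks_disabled(uint32_t dimmmtr)
{
	return dimmmtr == UINT32_MAX;
}

/*
 * Enumerate DIMMs, skipping disabled controllers. For brevity each
 * controller has one register, and bit 0 is a hypothetical
 * "DIMM populated" flag.
 */
static int count_populated(const uint32_t *dimmmtr, int num_mc)
{
	int n = 0;

	for (int mc = 0; mc < num_mc; mc++) {
		if (mc_looks_disabled(dimmmtr[mc]))
			continue;
		if (dimmmtr[mc] & 1u)
			n++;
	}
	return n;
}
```

Without the early `continue`, the all-ones value would be decoded like a real register. That is the kind of decoding that produced the negative shift exponent in the trace above.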

Fixes: ba987eaaab ("EDAC/i10nm: Add Intel Granite Rapids server support")
Reported-by: Jose Jesus Ambriz Meza <jose.jesus.ambriz.meza@intel.com>
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/all/20250730063155.2612379-1-acelan.kao@canonical.com/
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20250806065707.3533345-1-qiuxu.zhuo@intel.com
2025-08-19 16:18:14 -07:00
Kyle Manna 71b69f817e EDAC/ie31200: Add two more Intel Alder Lake-S SoCs for EDAC support
Host Device IDs (DID0) correspond to:
* Intel Core i7-12700K
* Intel Core i5-12600K

See documentation:
* 12th Generation Intel® Core™ Processors Datasheet
    * Volume 1 of 2, Doc. No.: 655258, Rev.: 011
    * https://edc.intel.com/output/DownloadPdfDocument?id=8297 (PDF)

Signed-off-by: Kyle Manna <kyle@kylemanna.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20250819161739.3241152-1-kyle@kylemanna.com
2025-08-19 14:02:27 -07:00
Sascha Hauer fb13ae067a EDAC: Add EDAC driver for ARM Cortex A72 cores
The driver is designed to support error detection and reporting for
Cortex A72 cores, specifically within their L1 and L2 cache systems.
The errors are detected by reading CPU/L2 memory error syndrome
registers.

Unfortunately there is no robust way to inject errors into the caches,
so this driver doesn't contain any code to actually test it. It has
been tested though with code taken from an older version [1] of this
driver. For reasons stated in thread [1], the error injection code is
not suitable for mainline, so it is removed from the driver.

  [1] https://lore.kernel.org/all/1521073067-24348-1-git-send-email-york.sun@nxp.com/#t

  [ bp: minor touchups. ]

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Co-developed-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/1752714390-27389-2-git-send-email-vijayb@linux.microsoft.com
2025-08-15 23:37:00 +02:00
Linus Torvalds d7223aed30 - i10nm:
     - switch to using scnprintf()
     - Add Granite Rapids-D support

 - synopsys: Make sure ECC error and counter registers are cleared
   during init/probing to avoid reporting stale errors

 - igen6: Add Wildcat Lake SoCs support

 - Make sure scrub features sysfs attributes are initialized properly

 - Allocate memory repair sysfs attributes statically to reduce stack
   usage

 - Fix DIMM module size computation for DIMMs whose total capacity is
   not a power of two, in amd64_edac

 - Do not be too dramatic when reporting disabled memory controllers in
   igen6_edac

 - Add support to ie31200_edac for the following SoCs:
     - Core i5-14600 and Core i7-14700
     - Bartlett Lake-S SoCs
     - Raptor Lake-HX

Merge tag 'edac_updates_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - i10nm:
     - switch to using scnprintf()
     - Add Granite Rapids-D support

 - synopsys: Make sure ECC error and counter registers are cleared
   during init/probing to avoid reporting stale errors

 - igen6: Add Wildcat Lake SoCs support

 - Make sure scrub features sysfs attributes are initialized properly

 - Allocate memory repair sysfs attributes statically to reduce stack
   usage

 - Fix DIMM module size computation for DIMMs whose total capacity is
   not a power of two, in amd64_edac

 - Do not be too dramatic when reporting disabled memory controllers in
   igen6_edac

 - Add support to ie31200_edac for the following SoCs:
     - Core i5-14600 and Core i7-14700
     - Bartlett Lake-S SoCs
     - Raptor Lake-HX

* tag 'edac_updates_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/{skx_common,i10nm}: Use scnprintf() for safer buffer handling
  EDAC/synopsys: Clear the ECC counters on init
  EDAC/ie31200: Add Intel Raptor Lake-HX SoCs support
  EDAC/igen6: Add Intel Wildcat Lake SoCs support
  EDAC/i10nm: Add Intel Granite Rapids-D support
  EDAC/mem_repair: Reduce stack usage in edac_mem_repair_get_desc()
  EDAC/igen6: Reduce log level to debug for absent memory controllers
  EDAC/ie31200: Document which CPUs correspond to each Raptor Lake-S device ID
  EDAC/ie31200: Enable support for Core i5-14600 and i7-14700
  ie31200/EDAC: Add Intel Bartlett Lake-S SoCs support
2025-07-29 16:30:38 -07:00
Wang Haoran 35928bc38d EDAC/{skx_common,i10nm}: Use scnprintf() for safer buffer handling
snprintf() is fragile when its return value will be used to append
additional data to a buffer. Use scnprintf() instead.
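The difference lies in the return value, and a userspace model makes it concrete. scnprintf() is a kernel function, so `scnprintf_model()` below is a hypothetical re-creation of its documented semantics on top of the standard vsnprintf(). `format_pair()` is an illustrative caller that uses the append pattern.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Userspace model of scnprintf() semantics: return the number of
 * characters actually written (excluding the NUL), at most size - 1,
 * and 0 when size is 0. snprintf() instead returns the length the
 * output would have had, which can exceed the buffer.
 */
static int scnprintf_model(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	return (size_t)n < size ? n : (int)(size - 1);
}

/* Append two fields; len can never run past the end of buf. */
static int format_pair(char *buf, size_t size, int a, int b)
{
	int len = 0;

	len += scnprintf_model(buf + len, size - len, "a=%d ", a);
	len += scnprintf_model(buf + len, size - len, "b=%d", b);
	return len;
}
```

With an 8-byte buffer, format_pair(buf, 8, 1, 23) returns 7 and leaves "a=1 b=2" in buf. Suppose plain snprintf() were used instead. The second call would return 4, bringing len to 8, and each further append would push len past the end of the buffer. At that point `size - len` wraps around as an unsigned value.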

Signed-off-by: Wang Haoran <haoranwangsec@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20250715131700.1092720-1-haoranwangsec@gmail.com
2025-07-15 10:06:58 -07:00
Shubhrajyoti Datta b1dc7f097b EDAC/synopsys: Clear the ECC counters on init
Clear the ECC error and counter registers during initialization/probe to avoid
reporting stale errors that may have occurred before EDAC registration.

For that, unify the Zynq and ZynqMP ECC state reading paths and simplify the
code.

  [ bp: Massage commit message.
    Fix an -Wsometimes-uninitialized warning as reported by
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202507141048.obUv3ZUm-lkp@intel.com ]

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250713050753.7042-1-shubhrajyoti.datta@amd.com
2025-07-14 12:15:37 +02:00
Qiuxu Zhuo 05a61c6cb6 EDAC/ie31200: Add Intel Raptor Lake-HX SoCs support
Intel Raptor Lake-HX SoC shares the same memory controller registers
as Raptor Lake-S SoC. Add a compute die ID for Raptor Lake-HX SoCs with
Out-of-Band ECC capability for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Laurens SEGHERS <laurens@rale.com>
Link: https://lore.kernel.org/r/20250704151609.7833-4-qiuxu.zhuo@intel.com
2025-07-07 10:53:47 -07:00
Lili Li 773d8bb5ba EDAC/igen6: Add Intel Wildcat Lake SoCs support
Intel Wildcat Lake is a mobile derivative of Panther Lake with one
memory controller. Wildcat Lake SoCs share the same IBECC registers
with Meteor Lake-P SoCs.

Add a compute die ID and a new configuration structure for Wildcat
Lake SoCs with In-Band ECC capability for EDAC support.

Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250704151609.7833-3-qiuxu.zhuo@intel.com
2025-07-07 10:51:58 -07:00
Qiuxu Zhuo 9ad08c1115 EDAC/i10nm: Add Intel Granite Rapids-D support
The Granite Rapids-D CPU model uses memory controller registers similar
to those of the Granite Rapids server CPU but with a different memory
controller MMIO base.

Add the Granite Rapids-D CPU model ID and use the new memory controller
MMIO base for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: VikasX Chougule <vikasx.chougule@intel.com>
Link: https://lore.kernel.org/r/20250704151609.7833-2-qiuxu.zhuo@intel.com
2025-07-07 10:50:29 -07:00
Shiju Jose 1e14ea901d EDAC: Initialize EDAC features sysfs attributes
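Fix the lockdep splat caused by missing sysfs_attr_init() calls for the
sysfs attributes of the recently added EDAC features.

In lockdep_init_map_type(), the lock-class key check
if (!static_obj(key) && !is_dynamic_key(key)) triggers the splat: without
sysfs_attr_init(), a dynamically allocated attribute has no static
lock-class key assigned, so the key lockdep receives is neither static
nor registered as dynamic.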

  Backtrace:
  RIP: 0010:lockdep_init_map_type
  Call Trace:
   __kernfs_create_file
   sysfs_add_file_mode_ns
   internal_create_group
   internal_create_groups
   device_add
   ? __init_waitqueue_head
   edac_dev_register
   devm_cxl_memdev_edac_register
   ? lock_acquire
   ? find_held_lock
   ? cxl_mem_probe
   ? cxl_mem_probe
   ? lockdep_hardirqs_on
   ? cxl_mem_probe
   cxl_mem_probe

  [ bp: Massage. ]

Fixes: f90b738166 ("EDAC: Add scrub control feature")
Fixes: bcbd069b11 ("EDAC: Add a Error Check Scrub control feature")
Fixes: 699ea5219c ("EDAC: Add a memory repair control feature")
Reported-by: Dave Jiang <dave.jiang@intel.com>
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://lore.kernel.org/20250626101344.1726-1-shiju.jose@huawei.com
2025-06-30 10:57:24 +02:00
Arnd Bergmann 815703e2ec EDAC/mem_repair: Reduce stack usage in edac_mem_repair_get_desc()
Constructing an array on the stack adds complexity and can exceed the
warning limit for per-function stack usage:

  drivers/edac/mem_repair.c:361:5: error: stack frame size (1296) exceeds
  limit (1280) in 'edac_mem_repair_get_desc' [-Werror,-Wframe-larger-than]

Change this to have the actual attribute array allocated statically and then
just add the instance number on the per-instance copy.
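The approach (a statically allocated template copied per instance) can be sketched in userspace C. The struct, the attribute names and the modes are illustrative assumptions rather than the contents of drivers/edac/mem_repair.c. The point is only that the large array no longer lives in the function's stack frame.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute descriptor; not the kernel's struct attribute. */
struct repair_attr {
	const char *name;
	unsigned int mode;
};

/* Built once in static storage rather than in a large stack array. */
static const struct repair_attr repair_attr_template[] = {
	{ "repair_type",  0444 },
	{ "persist_mode", 0644 },
	{ "repair",       0200 },
};

#define NUM_REPAIR_ATTRS \
	(sizeof(repair_attr_template) / sizeof(repair_attr_template[0]))

struct repair_instance {
	int instance;
	struct repair_attr attrs[NUM_REPAIR_ATTRS];
};

/* Per-instance copy: duplicate the template, then add the instance number. */
static struct repair_instance *repair_instance_create(int instance)
{
	struct repair_instance *ri = malloc(sizeof(*ri));

	if (!ri)
		return NULL;
	memcpy(ri->attrs, repair_attr_template, sizeof(repair_attr_template));
	ri->instance = instance;
	return ri;
}
```

The per-instance data lives in heap memory that outlives the function. The only stack usage left is a pointer.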

Fixes: 699ea5219c ("EDAC: Add a memory repair control feature")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250620114135.4017183-1-arnd@kernel.org
2025-06-26 16:18:41 +02:00
Avadhut Naik a3f3040657 EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based
SoCs has an Address Mask and a Secondary Address Mask register associated with
it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity
during init using these two registers.

Currently, the module primarily considers only the Address Mask register for
computing DIMM sizes. The Secondary Address Mask register is considered only
for odd CS, and when it is, the Address Mask register is ignored altogether
for that CS. For power-of-two DIMMs, i.e., DIMMs whose total capacity is a
power of two (32GB, 64GB, etc.), this is not an issue since only the Address
Mask register is used.

For non-power-of-two DIMMs, i.e., DIMMs whose total capacity is not a power of
two (48GB, 96GB, etc.), however, the Secondary Address Mask register is used
in conjunction with the Address Mask register. Since the module considers only
one of the two registers for any given CS, the size it computes is incorrect:
the Secondary Address Mask register is ignored for even CS, and the Address
Mask register is ignored for odd CS.

Introduce a new helper function so that both Address Mask and Secondary
Address Mask registers are considered, when valid, for computing DIMM sizes.
Also rename some variables for greater clarity.
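The idea behind the new helper (every valid mask contributes) can be shown with a deliberately simplified model. This is not amd64_edac's decoding. The real registers involve interleaving and deinterleaving, whereas here each valid mask is simply assumed to describe a region of (mask + 1) bytes. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decoded view of one chip select. Each valid mask is
 * assumed to describe a region of (mask + 1) bytes.
 */
struct cs_masks {
	uint64_t addr_mask;
	bool addr_mask_valid;
	uint64_t addr_mask_sec;
	bool addr_mask_sec_valid;
};

/* Fixed behavior: every valid mask contributes to the chip-select size. */
static uint64_t cs_size(const struct cs_masks *m)
{
	uint64_t size = 0;

	if (m->addr_mask_valid)
		size += m->addr_mask + 1;
	if (m->addr_mask_sec_valid)
		size += m->addr_mask_sec + 1;
	return size;
}
```

Under this model, a 32 GiB primary region plus a 16 GiB secondary region yields 48 GiB. Reading only one of the two masks yields 32 GiB or 16 GiB, the kind of misreport described above.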

Fixes: 81f5090db8 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com
2025-06-25 16:40:03 +02:00
Qiuxu Zhuo 10fa9a4e4d EDAC/igen6: Reduce log level to debug for absent memory controllers
The current KERN_WARNING level message for detecting absent memory
controllers is overly dramatic. The BIOS likely had valid reasons to
disable the memory controller (e.g. it isn't connected to any DIMM
slots on the motherboard for this system). So there's nothing actually
wrong that needs to be fixed.

Reduce the log level to KERN_DEBUG to eliminate the false warning.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250618162307.1523736-2-qiuxu.zhuo@intel.com
2025-06-23 12:46:31 +02:00
George Gaidarov 1de70efcc8 EDAC/ie31200: Document which CPUs correspond to each Raptor Lake-S device ID
Based on table 103 ("Host Device ID (DID0)") in [1], document which CPUs
correspond to each Raptor Lake-S device ID for better readability.

[1] https://www.intel.com/content/www/us/en/content-details/743844/13th-generation-intel-core-intel-core-14th-generation-intel-core-processor-series-1-and-series-2-and-intel-xeon-e-2400-processor-datasheet-volume-1-of-2.html

Signed-off-by: George Gaidarov <gdgaidarov+lkml@gmail.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250529162933.1228735-2-gdgaidarov+lkml@gmail.com
2025-06-23 12:46:18 +02:00
George Gaidarov 493f9c930e EDAC/ie31200: Enable support for Core i5-14600 and i7-14700
Device ID '0xa740' is shared by i7-14700, i7-14700K, and i7-14700T.
Device ID '0xa704' is shared by i5-14600, i5-14600K, and i5-14600T.

Tested locally on my i7-14700K.

Signed-off-by: George Gaidarov <gdgaidarov+lkml@gmail.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250529162933.1228735-1-gdgaidarov+lkml@gmail.com
2025-06-23 12:46:08 +02:00
Qiuxu Zhuo 021681830e ie31200/EDAC: Add Intel Bartlett Lake-S SoCs support
Bartlett Lake-S is a derivative of Raptor Lake-S and is optimized for
IoT/Edge applications. It shares the same memory controller registers
as Raptor Lake-S. Add compute die IDs of Bartlett Lake-S and reuse the
configuration data of Raptor Lake-S for Bartlett Lake-S EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250502013900.343430-1-qiuxu.zhuo@intel.com
2025-06-23 12:45:56 +02:00
Qiuxu Zhuo 88efa0de32 EDAC/igen6: Fix NULL pointer dereference
A kernel panic was reported with the following kernel log:

  EDAC igen6: Expected 2 mcs, but only 1 detected.
  BUG: unable to handle page fault for address: 000000000000d570
  ...
  Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
  RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
  ...
  igen6_probe+0x2a0/0x343 [igen6_edac]
  ...
  igen6_init+0xc5/0xff0 [igen6_edac]
  ...

This issue occurred because one memory controller was disabled by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the NULL MMIO base of the absent memory controller
resulted in the oops above.

Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg->num_imc' to reflect the number of
detected memory controllers.
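The fix can be modeled in a few lines. All names here are hypothetical: `struct mc` stands in for the per-controller state, a NULL `status` pointer marks an absent controller, and `struct res_config` mimics only the num_imc field named in the text. Probing records the detected count, and the error handler loops over that count rather than the expected one.

```c
#include <stddef.h>

/* Hypothetical per-controller state: status is NULL for an absent MC. */
struct mc {
	const unsigned int *status;
};

/* Hypothetical stand-in for the res_cfg field used here. */
struct res_config {
	int num_imc;
};

/*
 * Probe: move present controllers to the front and record how many
 * were found, so later loops never touch an absent controller.
 */
static void record_detected(struct res_config *cfg, struct mc *mcs, int expected)
{
	int found = 0;

	for (int i = 0; i < expected; i++) {
		if (mcs[i].status)
			mcs[found++] = mcs[i];
	}
	cfg->num_imc = found;
}

/* Error handler: bounded by the detected count, not the expected one. */
static int mcs_with_errors(const struct res_config *cfg, const struct mc *mcs)
{
	int n = 0;

	for (int i = 0; i < cfg->num_imc; i++) {
		if (*mcs[i].status & 1u)	/* hypothetical "error logged" bit */
			n++;
	}
	return n;
}
```

This is why the configuration structure has to be writable: num_imc is updated at probe time instead of staying fixed at the expected value.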

Fixes: 20e190b1c1 ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Link: https://lore.kernel.org/r/20250618162307.1523736-1-qiuxu.zhuo@intel.com
2025-06-18 20:19:45 +02:00
Avadhut Naik b2e673ae53 EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fh
AMD's Family 19h-based Models 70h-7fh support 4 unified memory controllers
(UMC) per processor die.

The amd64_edac driver, however, assumes only 2 UMCs are supported because the
max_mcs variable has not been explicitly set to 4 for these models. This
results in incomplete or incorrect memory information being logged to dmesg by
the module during initialization in some instances.
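The failure mode (a default silently applying to models that needed an override) can be illustrated with a hypothetical per-model table. The table, the struct and `max_mcs_for()` are not the driver's code. The default of 2 mirrors the assumption described above and does not reflect the driver's value for any other model.

```c
#include <stddef.h>

#define DEFAULT_MAX_MCS 2	/* the assumption described above */

/* Hypothetical per-model override table. */
struct umc_cfg {
	unsigned int family;
	unsigned int model_lo, model_hi;
	int max_mcs;
};

static const struct umc_cfg umc_cfgs[] = {
	/* Family 19h, models 70h-7fh: 4 UMCs per die, per the text above. */
	{ 0x19, 0x70, 0x7f, 4 },
};

static int max_mcs_for(unsigned int family, unsigned int model)
{
	for (size_t i = 0; i < sizeof(umc_cfgs) / sizeof(umc_cfgs[0]); i++) {
		const struct umc_cfg *c = &umc_cfgs[i];

		if (family == c->family && model >= c->model_lo && model <= c->model_hi)
			return c->max_mcs;
	}
	return DEFAULT_MAX_MCS;
}
```

Without the table entry, every 70h-7fh model falls through to the default of 2. The driver would then walk only half of the UMCs.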

Fixes: 6c79e42169 ("EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh")
Closes: https://lore.kernel.org/all/27dc093f-ce27-4c71-9e81-786150a040b6@reox.at/
Reported-by: reox <mailinglist@reox.at>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250613005233.2330627-1-avadhut.naik@amd.com
2025-06-16 23:11:14 +02:00
Linus Torvalds 29e9359005 CXL changes for v6.16
 - Remove always true condition in cxl features code
 - Add verification of CHBS length for CXL 2.0
 - Ignore interleave granularity when interleave ways is 1
 - Add the missing MODULE_DESCRIPTION for cxl_test
 - A series of cleanups/refactor to prep for AMD Zen5 translate code
 - Clean %pa debug printk in core/hdm.c
 - Documentation updates
   - Update to CXL Maturity Map
   - Fixes to source linking in CXL documentation
   - CXL documentation fixes, spelling corrections
   - A large collection of CXL documentation for the entire CXL subsystem, including
     documentation on CXL related platform and firmware notes
 - Remove redundant code of cxlctl_get_supported_features()
 - Series to support CXL RAS Features
   - Including "Patrol Scrub Control", "Error Check Scrub", "Performance Maintenance"
     and "Memory Sparing". The series connects CXL to EDAC.

Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) updates from Dave Jiang:

 - Remove always true condition in cxl features code

 - Add verification of CHBS length for CXL 2.0

 - Ignore interleave granularity when interleave ways is 1

 - Add the missing MODULE_DESCRIPTION for cxl_test

 - A series of cleanups/refactor to prep for AMD Zen5 translate code

 - Clean %pa debug printk in core/hdm.c

 - Documentation updates:
     - Update to CXL Maturity Map
     - Fixes to source linking in CXL documentation
     - CXL documentation fixes, spelling corrections
     - A large collection of CXL documentation for the entire CXL
       subsystem, including documentation on CXL related platform and
       firmware notes

 - Remove redundant code of cxlctl_get_supported_features()

 - Series to support CXL RAS Features
     - Including "Patrol Scrub Control", "Error Check Scrub",
       "Performance Maintenance" and "Memory Sparing". The series
       connects CXL to EDAC.

* tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (53 commits)
  cxl/edac: Add CXL memory device soft PPR control feature
  cxl/edac: Add CXL memory device memory sparing control feature
  cxl/edac: Support for finding memory operation attributes from the current boot
  cxl/edac: Add support for PERFORM_MAINTENANCE command
  cxl/edac: Add CXL memory device ECS control feature
  cxl/edac: Add CXL memory device patrol scrub control feature
  cxl: Update prototype of function get_support_feature_info()
  EDAC: Update documentation for the CXL memory patrol scrub control feature
  cxl/features: Remove the inline specifier from to_cxlfs()
  cxl/feature: Remove redundant code of get supported features
  docs: ABI: Fix "firwmare" to "firmware"
  cxl/Documentation: Fix typo in sysfs write_bandwidth attribute path
  cxl: doc/linux/access-coordinates Update access coordinates calculation methods
  cxl: docs/platform/acpi/srat Add generic target documentation
  cxl: docs/platform/cdat reference documentation
  Documentation: Update the CXL Maturity Map
  cxl: Sync up the driver-api/cxl documentation
  cxl: docs - add self-referencing cross-links
  cxl: docs/allocation/hugepages
  cxl: docs/allocation/reclaim
  ...
2025-06-03 13:24:14 -07:00
Niravkumar L Rabara e5ef4cd2a4 EDAC/altera: Use correct write width with the INTTEST register
On the SoCFPGA platform, the INTTEST register supports only 16-bit writes.
A 32-bit write triggers an SError to the CPU, so use 16-bit accesses only.

  [ bp: AI-massage the commit message. ]

Fixes: c7b4be8db8 ("EDAC, altera: Add Arria10 OCRAM ECC support")
Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@intel.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250527145707.25458-1-matthew.gerlach@altera.com
2025-05-29 17:38:55 +02:00
Linus Torvalds ada1b0436b - ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies
 - Rework how per-channel RRL register tracking is done in order to
   support newer hardware with different RRL configurations and refactor
   that code. Add support for Granite Rapids server

 - i10nm: explicitly set RRL modes to fix any wrong BIOS programming

 - Properly save and restore Retry Read error Log channel configuration
   info on Intel drivers

 - igen6: Correctly handle the case of fused-off memory controllers on
   Arizona Beach and Amston Lake SoCs before adding support for them

 - the usual set of fixes and cleanups

Merge tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies

 - Rework how per-channel RRL register tracking is done in order to
   support newer hardware with different RRL configurations and refactor
   that code. Add support for Granite Rapids server

 - i10nm: explicitly set RRL modes to fix any wrong BIOS programming

 - Properly save and restore Retry Read error Log channel configuration
   info on Intel drivers

 - igen6: Correctly handle the case of fused-off memory controllers on
   Arizona Beach and Amston Lake SoCs before adding support for them

 - the usual set of fixes and cleanups

* tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/bluefield: Don't use bluefield_edac_readl() result on error
  EDAC/i10nm: Fix the bitwise operation between variables of different sizes
  EDAC/ie31200: Add two Intel SoCs for EDAC support
  EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server
  EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log()
  EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log()
  EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers
  EDAC/i10nm: Explicitly set the modes of the RRL register sets
  EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0
  EDAC/skx_common: Fix general protection fault
  EDAC/igen6: Add Intel Amston Lake SoCs support
  EDAC/igen6: Add Intel Arizona Beach SoCs support
  EDAC/igen6: Skip absent memory controllers
2025-05-27 10:13:06 -07:00
Linus Torvalds 2bd1bea5fa A set of cleanups for the generic interrupt subsystem:
   - Consolidate on one set of functions for the interrupt domain code
     to get rid of pointlessly duplicated code with only marginally
     different semantics.

   - Update the documentation accordingly and consolidate the coding
     style of the irqdomain header.

Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq cleanups from Thomas Gleixner:
 "A set of cleanups for the generic interrupt subsystem:

   - Consolidate on one set of functions for the interrupt domain code
     to get rid of pointlessly duplicated code with only marginally
     different semantics.

   - Update the documentation accordingly and consolidate the coding
     style of the irqdomain header"

* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  irqdomain: Consolidate coding style
  irqdomain: Fix kernel-doc and add it to Documentation
  Documentation: irqdomain: Update it
  Documentation: irq-domain.rst: Simple improvements
  Documentation: irq/concepts: Minor improvements
  Documentation: irq/concepts: Add commas and reflow
  irqdomain: Improve kernel-docs of functions
  irqdomain: Make struct irq_domain_info variables const
  irqdomain: Use irq_domain_instantiate()'s return value as initializers
  irqdomain: Drop irq_linear_revmap()
  pinctrl: keembay: Switch to irq_find_mapping()
  irqchip/armada-370-xp: Switch to irq_find_mapping()
  gpu: ipu-v3: Switch to irq_find_mapping()
  gpio: idt3243x: Switch to irq_find_mapping()
  sh: Switch to irq_find_mapping()
  powerpc: Switch to irq_find_mapping()
  irqdomain: Drop irq_domain_add_*() functions
  powerpc: Switch irq_domain_add_nomap() to use fwnode
  thermal: Switch to irq_domain_create_linear()
  soc: Switch to irq_domain_create_*()
  ...
2025-05-27 08:07:32 -07:00
Shiju Jose 588ca944c2 cxl/edac: Add CXL memory device memory sparing control feature
Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation vary in terms of the scope of the sparing being
performed. The cacheline sparing subclass refers to a sparing action that
can replace a full cacheline. Row sparing is provided as an alternative to
PPR sparing functions and its scope is that of a single DDR row. As per
CXL r3.2 Table 8-125, footnote 1, memory sparing is preferred over PPR
when possible. Bank sparing allows an entire bank to be replaced. Rank
sparing is defined as an operation in which an entire DDR rank is
replaced.

Memory sparing maintenance operations may be supported by CXL devices
that implement the CXL.mem protocol. A sparing maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support memory sparing
features may implement sparing maintenance operations.

The host may issue a query command by setting the query resources flag in
the input payload (CXL spec 3.2 Table 8-120) to determine the availability
of sparing resources for a given address. In response to a query request,
the device shall report the resource availability by producing the memory
sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
of the values specified in the request.

During the execution of a sparing maintenance operation, a CXL memory
device:
 - may not retain data;
 - may not be able to process CXL.mem requests correctly.

These CXL memory device capabilities are specified by restriction flags
in the memory sparing feature readable attributes.

When a CXL device identifies an error on a memory component, the device
may inform the host about the need for a memory sparing maintenance
operation by using a DRAM event record in which the 'maintenance needed'
flag may be set. The event record contains some of the DPA, Channel,
Rank, Nibble Mask, Bank Group, Bank, Row, Column and Sub-Channel fields
that identify the memory to be repaired. The userspace tool requests a
maintenance operation if the 'maintenance needed' flag is set in the CXL
DRAM error record.

CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
maintenance operation feature.

CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
discovery and configuration.

Add support for controlling the CXL memory device memory sparing feature.
Register with the EDAC driver, which gets the memory repair attribute
descriptors from the EDAC memory repair driver and exposes sysfs repair
control attributes for memory sparing to userspace. For example, CXL
memory sparing control for the CXL mem0 device is exposed in
/sys/bus/edac/devices/cxl_mem0/mem_repairX/

Use case
========
1. The CXL device identifies a failure in a memory component and reports
   it to userspace in a CXL DRAM trace event with the DPA and other
   attributes of the memory to repair, such as channel, rank, nibble mask,
   bank group, bank, row, column and sub-channel.

2. Rasdaemon processes the trace event and may issue a query request in
   sysfs to check the resources available for memory sparing if any of
   the following conditions are met:
    - the 'maintenance needed' flag is set in the event record;
    - the 'threshold event' flag is set for the CVME threshold feature;
    - the number of corrected errors reported on a CXL.mem media to
      userspace exceeds the threshold value for the corrected error count
      defined by the userspace policy.

3. Rasdaemon processes the memory sparing trace event and issues a repair
   request for memory sparing.
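The decision in step 2 can be sketched in C as follows. This is a minimal,
hypothetical sketch: struct dram_event, struct sparing_policy and
should_request_sparing() are illustrative names, not rasdaemon or kernel
APIs, and the real trace event carries many more fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the flags rasdaemon inspects in a CXL
 * DRAM event record (not the real trace event layout). */
struct dram_event {
	bool maintenance_needed; /* 'maintenance needed' flag */
	bool threshold_event;    /* 'threshold event' flag (CVME threshold) */
};

/* Hypothetical userspace policy for corrected errors. */
struct sparing_policy {
	uint32_t ce_threshold; /* corrected-error count that triggers sparing */
};

/*
 * Return true if any of the three conditions holds: the device asked for
 * maintenance, the CVME threshold fired, or the corrected-error count seen
 * by userspace exceeds the policy threshold.
 */
static bool should_request_sparing(const struct dram_event *ev,
				   uint32_t ce_count,
				   const struct sparing_policy *policy)
{
	if (ev->maintenance_needed)
		return true;
	if (ev->threshold_event)
		return true;
	return ce_count > policy->ce_threshold;
}
```

Using a strict comparison mirrors the wording "exceeds the threshold"; a
real policy could just as well trigger at the threshold itself.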

The kernel CXL driver shall report the memory sparing event record to
userspace with the resource availability so that rasdaemon can process
the event record and issue a repair request in sysfs for the memory
sparing operation in the CXL device.

Note: Based on feedback from the community, the 'query' sysfs attribute
was removed and reporting the memory sparing error record to userspace is
not supported. Instead, userspace issues the sparing operation and the
kernel performs it on the CXL memory device when the 'maintenance needed'
flag is set in the DRAM event record.

Add checks to ensure that the memory to be repaired is offline or, if it
is online, that it originates from a CXL DRAM error record reported in
the current boot before requesting a memory sparing operation on the
device.
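A minimal sketch of that gate, assuming a hypothetical list of error
records reported in the current boot. The names are illustrative, and
matching on the DPA alone is a simplification; the real check may compare
more attributes of the record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a DRAM error reported during the current boot. */
struct reported_err {
	uint64_t dpa;
};

/*
 * Allow a sparing request for @dpa if the memory is offline, or, if it is
 * online, only when @dpa matches an error record from the current boot.
 */
static bool sparing_allowed(uint64_t dpa, bool online,
			    const struct reported_err *log, size_t nr)
{
	if (!online)
		return true;

	for (size_t i = 0; i < nr; i++)
		if (log[i].dpa == dpa)
			return true;

	return false;
}
```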

Note: Tested memory sparing feature control with QEMU patch
      "hw/cxl: Add emulation for memory sparing control feature"
      https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-8-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-23 13:24:53 -07:00
David Thompson ea3b0b7f54 EDAC/bluefield: Don't use bluefield_edac_readl() result on error
The bluefield_edac_readl() routine returns an uninitialized result on error
paths. In those cases the calling routine should not use the uninitialized
result. The driver should simply log the error, and then return early.
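The caller-side pattern can be sketched as follows. reg_readl() is a
hypothetical stand-in for bluefield_edac_readl() that fails on unaligned
offsets; only the shape of the error handling (check, log, return early,
never consume the output) mirrors the fix.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register read that can fail; on error *result is not
 * written, so the caller must not use it. */
static int reg_readl(uint32_t offset, uint32_t *result)
{
	if (offset & 0x3)        /* pretend unaligned offsets fail */
		return -EIO;
	*result = 0xabcd0000u | offset;
	return 0;
}

/* Caller: log the error and return early instead of consuming an
 * uninitialized value. */
static int read_ecc_status(uint32_t offset, uint32_t *status)
{
	uint32_t val;
	int ret = reg_readl(offset, &val);

	if (ret) {
		fprintf(stderr, "register read at 0x%x failed: %d\n",
			(unsigned int)offset, ret);
		return ret;
	}

	*status = val;
	return 0;
}
```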

Fixes: e419675754 ("EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Link: https://lore.kernel.org/20250318214747.12271-1-davthompson@nvidia.com
2025-05-22 17:58:28 +02:00
Jiri Slaby (SUSE) 6be00e4335 EDAC/altera: Switch to irq_domain_create_linear()
irq_domain_add_linear() is going away as being obsolete now. Switch to
the preferred irq_domain_create_linear(). That differs in the first
parameter: it takes the more generic struct fwnode_handle instead of
struct device_node. Therefore, of_fwnode_handle() is added around the
parameter.

Note that some of the users can likely use dev->fwnode directly instead
of the indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not
guaranteed to be set for all, so this has to be investigated on a
case-by-case basis (by people who can actually test with the HW).

[ tglx: Fix up subject prefix ]

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-17-jirislaby@kernel.org
2025-05-16 21:06:09 +02:00
Ingo Molnar 1f82e8e1ca Merge branch 'x86/msr' into x86/core, to resolve conflicts
Conflicts:
	arch/x86/boot/startup/sme.c
	arch/x86/coco/sev/core.c
	arch/x86/kernel/fpu/core.c
	arch/x86/kernel/fpu/xstate.c

 Semantic conflict:
	arch/x86/include/asm/sev-internal.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13 10:42:06 +02:00
Ingo Molnar 570d58b12f Linux 6.15-rc5

Merge tag 'v6.15-rc5' into x86/msr, to pick up fixes and to resolve conflicts

 Conflicts:
	drivers/cpufreq/intel_pstate.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-06 19:42:00 +02:00
Ingo Molnar 24035886d7 Linux 6.15-rc5

Merge tag 'v6.15-rc5' into x86/cpu, to resolve conflicts

 Conflicts:
	tools/arch/x86/include/asm/cpufeatures.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-06 10:00:58 +02:00
Xin Li (Intel) efef7f184f x86/msr: Add explicit includes of <asm/msr.h>
For historic reasons there are some TSC-related functions in the
<asm/msr.h> header, even though there's an <asm/tsc.h> header.

To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h>
to <asm/tsc.h> and to eventually eliminate the inclusion of
<asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency
to the source files that reference definitions from <asm/msr.h>.

[ mingo: Clarified the changelog. ]

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-05-02 10:23:47 +02:00
Niravkumar L Rabara 6dbe3c5418 EDAC/altera: Set DDR and SDMMC interrupt mask before registration
Mask the DDR and SDMMC interrupts in the probe function to avoid spurious
interrupts before registration. Remove an invalid register write to the
system manager.

Fixes: 1166fde93d ("EDAC, altera: Add Arria10 ECC memory init functions")
Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@altera.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250425142640.33125-3-matthew.gerlach@altera.com
2025-04-28 12:38:53 +02:00
Niravkumar L Rabara 4fb7b8fceb EDAC/altera: Test the correct error reg offset
Test the correct structure member, ecc_cecnt_offset, before using it.

  [ bp: Massage commit message. ]

Fixes: 73bcc942f4 ("EDAC, altera: Add Arria10 EDAC support")
Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@altera.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250425142640.33125-2-matthew.gerlach@altera.com
2025-04-28 12:17:44 +02:00
Qiuxu Zhuo 2b2408aca9 EDAC/i10nm: Fix the bitwise operation between variables of different sizes
The Smatch static checker reported the following warning:

  drivers/edac/i10nm_base.c:364 show_retry_rd_err_log()
  warn: should bitwise negate be 'ullong'?

This warning was due to the bitwise NOT/AND operations between
'status_mask' (a u32 type) and 'log' (a u64 type), which resulted in
the high 32 bits of 'log' being cleared.

This was a false positive warning, as only the low 32 bits of 'log' were
written to the first RRL memory controller register (a u32 type).

To improve code sanity, fix this warning by changing 'status_mask' to
a u64 type, ensuring it matches the size of 'log' for bitwise operations.
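The effect can be reproduced in plain C. The function names and the mask
value are hypothetical; they only demonstrate how a 32-bit ~mask is
zero-extended before it is ANDed with a 64-bit value.

```c
#include <stdint.h>

/* u32 mask: ~status_mask is computed as a 32-bit value and then
 * zero-extended to 64 bits, so the AND also clears bits 63:32 of log. */
static uint64_t clear_status_u32(uint64_t log, uint32_t status_mask)
{
	return log & ~status_mask;
}

/* u64 mask (the fix): ~status_mask is a 64-bit value, so only the
 * requested status bits are cleared and the high half of log survives. */
static uint64_t clear_status_u64(uint64_t log, uint64_t status_mask)
{
	return log & ~status_mask;
}
```

For log = 0x123456789abcdef1 and a mask of 0x1, the u32 variant yields
0x000000009abcdef0 while the u64 variant yields 0x123456789abcdef0.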

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aAih0KmEVq7ch6v2@stanley.mountain/
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250424081454.2952632-1-qiuxu.zhuo@intel.com
2025-04-24 08:42:25 -07:00
Qiuxu Zhuo 180f091224 EDAC/ie31200: Add two Intel SoCs for EDAC support
Add two compute die IDs for Raptor Lake-S and Alder Lake-S for EDAC
support. Note that because Alder Lake-S shares the same memory controller
registers as Raptor Lake-S, it can reuse the configuration data of Raptor
Lake-S for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: James Jernigan <jameswestonjernigan@gmail.com>
Link: https://lore.kernel.org/r/20250422134450.1880648-1-qiuxu.zhuo@intel.com
2025-04-22 08:40:13 -07:00
Qiuxu Zhuo 5904dc561e EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server
Compared to previous generations, Granite Rapids defines the RRL control
bits {en_patspr, noover, en} in different positions, adds an extra RRL set
for the new mode of the first patrol-scrub read error, and extends the
number of CORRERRCNT registers from 4 to 8, encoding one counter per
CORRERRCNT register.

Add a Granite Rapids reg_rrl configuration table and adjust the code to
accommodate the differences mentioned above for RRL support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-8-qiuxu.zhuo@intel.com
2025-04-17 10:45:21 -07:00
Qiuxu Zhuo 126168fa2c EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log()
Make the {valid bit, overwritten status, number} of RRL registers and the
{number, offsets, widths} of per-channel CORRERRCNT registers configurable.
Refactor show_retry_rd_err_log() to use the configurable fields of struct
reg_rrl, making the code more scalable and simpler.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-7-qiuxu.zhuo@intel.com
2025-04-17 10:33:37 -07:00
Qiuxu Zhuo ba3985c1fa EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log()
Refactor enable_retry_rd_err_log() using helper functions for both
DDR and HBM, making the RRL control bits configurable instead of
hard-coded. Additionally, explicitly define the four RRL modes for
better readability.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-6-qiuxu.zhuo@intel.com
2025-04-17 10:31:09 -07:00
Qiuxu Zhuo 1a8a6af663 EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers
As the number of RRL (retry_rd_err_log) registers per memory channel
increases, the positions of the RRL control bits and the widths of the
RRL registers vary across different CPU generations. Adding RRL support
for a new CPU requires handling these differences throughout the
RRL-related code.

Structure the offsets, widths, control bit positions, set numbers, modes,
etc., of the per-channel RRL registers and make them configurable to
facilitate easier RRL support for new CPUs.
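The table-driven idea can be sketched as below. struct rrl_layout is a
reduced, hypothetical analogue of the driver's struct reg_rrl, and all
offsets, widths and bit positions are made-up values, not the real
register layout of any CPU.

```c
#include <stdint.h>

#define RRL_MAX_SETS 3 /* hypothetical upper bound on RRL sets per channel */

/* Hypothetical per-CPU-generation description of per-channel RRL
 * registers. */
struct rrl_layout {
	int set_num;                    /* number of RRL register sets */
	uint32_t offsets[RRL_MAX_SETS]; /* register offset of each set */
	int width;                      /* register width in bits: 32 or 64 */
	unsigned int en_bit;            /* position of the 'en' control bit */
};

static const struct rrl_layout gen_a = {
	.set_num = 2, .offsets = { 0x100, 0x110 }, .width = 32, .en_bit = 0,
};

static const struct rrl_layout gen_b = {
	.set_num = 3, .offsets = { 0x200, 0x210, 0x220 }, .width = 64, .en_bit = 4,
};

/* Generic code: set the enable bit at the position given by the layout
 * and keep the value within the register width, with no per-CPU checks. */
static uint64_t rrl_set_enable(const struct rrl_layout *l, uint64_t ctl)
{
	ctl |= 1ULL << l->en_bit;
	if (l->width == 32)
		ctl &= 0xffffffffULL;
	return ctl;
}
```

Supporting a new generation then means adding another table entry rather
than touching rrl_set_enable().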

No functional changes are intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-5-qiuxu.zhuo@intel.com
2025-04-17 10:28:09 -07:00
Qiuxu Zhuo 4878e1e900 EDAC/i10nm: Explicitly set the modes of the RRL register sets
The i10nm_edac driver uses the default modes (either patrol scrub read
or on-demand read) of the RRL register sets configured by the BIOS.

Explicitly set the modes during the loading of the i10nm_edac driver with
the module parameter retry_rd_err_log=2.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-4-qiuxu.zhuo@intel.com
2025-04-17 10:24:50 -07:00
Qiuxu Zhuo eeed3e03f4 EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0
When enabling the retry_rd_err_log (RRL) feature during the loading of the
i10nm_edac driver with the module parameter retry_rd_err_log=2 (Linux RRL
control mode), the default values of the control bits of RRL are saved so
that they can be restored during the unloading of the driver.

In the current code, the RRL of pseudo channel 1 of HBM overwrites pseudo
channel 0 during the loading of the driver, resulting in the loss of saved
RRL for pseudo channel 0. This causes the RRL of pseudo channel 0 of HBM to
be wrongly restored with the values from pseudo channel 1 when unloading
the driver.

Fix this issue by creating two separate groups of RRL control registers
per channel to save default RRL settings of two {sub-,pseudo-}channels.
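A simplified sketch of the bug and the fix, using hypothetical structure
and function names, with one saved 32-bit value standing in for a group
of RRL control registers.

```c
#include <stdint.h>

#define NUM_SUBCH 2 /* HBM pseudo channels 0 and 1 */

/* Before the fix (sketch): one saved slot per channel. */
struct chan_buggy {
	uint32_t saved_ctl;
};

/* After the fix (sketch): one saved slot per {sub-,pseudo-}channel. */
struct chan_fixed {
	uint32_t saved_ctl[NUM_SUBCH];
};

/* Saving pseudo channel 1 overwrites the value saved for channel 0. */
static void save_buggy(struct chan_buggy *c, int subch, uint32_t ctl)
{
	(void)subch; /* the sub-channel index is ignored */
	c->saved_ctl = ctl;
}

static uint32_t restore_buggy(const struct chan_buggy *c, int subch)
{
	(void)subch;
	return c->saved_ctl;
}

static void save_fixed(struct chan_fixed *c, int subch, uint32_t ctl)
{
	c->saved_ctl[subch] = ctl;
}

static uint32_t restore_fixed(const struct chan_fixed *c, int subch)
{
	return c->saved_ctl[subch];
}
```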

Fixes: acd4cf68fe ("EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-3-qiuxu.zhuo@intel.com
2025-04-17 10:22:56 -07:00
Qiuxu Zhuo 20d2d476b3 EDAC/skx_common: Fix general protection fault
After loading i10nm_edac (which automatically loads skx_edac_common), if
only i10nm_edac is unloaded, then reloaded, and error injection testing is
performed, a general protection fault may occur:

  mce: [Hardware Error]: Machine check events logged
  Oops: general protection fault ...
  ...
  Workqueue: events mce_gen_pool_process
  RIP: 0010:string+0x53/0xe0
  ...
  Call Trace:
  <TASK>
  ? die_addr+0x37/0x90
  ? exc_general_protection+0x1e7/0x3f0
  ? asm_exc_general_protection+0x26/0x30
  ? string+0x53/0xe0
  vsnprintf+0x23e/0x4c0
  snprintf+0x4d/0x70
  skx_adxl_decode+0x16a/0x330 [skx_edac_common]
  skx_mce_check_error.part.0+0xf8/0x220 [skx_edac_common]
  skx_mce_check_error+0x17/0x20 [skx_edac_common]
  ...

The issue arose because the variable 'adxl_component_count' (inside
skx_edac_common), which counts the ADXL components, was not reset. During
the reloading of i10nm_edac, the count was incremented by the actual number
of ADXL components again, resulting in a count that was double the real
number of ADXL components. This led to an out-of-bounds reference to the
ADXL component array, causing the general protection fault above.

Fix this issue by resetting the 'adxl_component_count' in adxl_put(),
which is called during the unloading of {skx,i10nm}_edac.
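The lifetime problem can be sketched with a counter that lives in the
common module. All names are hypothetical stand-ins for
'adxl_component_count' and its load/unload paths (including adxl_put());
the sketch only shows why the count doubles across a reload without the
reset.

```c
#include <stddef.h>

/* State kept in the common module; it survives unloading and reloading
 * the driver module that uses it. */
static size_t component_count;

/* Driver load: count the discovered components. */
static void components_get(size_t discovered)
{
	for (size_t i = 0; i < discovered; i++)
		component_count++;
}

/* Driver unload without the fix: the count is left untouched. */
static void components_put_buggy(void)
{
}

/* Driver unload with the fix: reset the count so that the next load
 * starts from zero. */
static void components_put_fixed(void)
{
	component_count = 0;
}
```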

Fixes: 123b158635 ("EDAC, i10nm: make skx_common.o a separate module")
Reported-by: Feng Xu <feng.f.xu@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Link: https://lore.kernel.org/r/20250417150724.1170168-2-qiuxu.zhuo@intel.com
2025-04-17 10:19:02 -07:00
Qiuxu Zhuo 099d2db362 EDAC/igen6: Add Intel Amston Lake SoCs support
Intel Amston Lake is a series of SoCs tailored for edge computing needs.
The Amston Lake SoCs, equipped with IBECC (In-Band ECC) capability, share
the same IBECC registers with Alder Lake-N SoCs. Add the Intel Amston Lake
SoC compute die ID for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-4-qiuxu.zhuo@intel.com
2025-04-17 10:15:35 -07:00
Qiuxu Zhuo b804d7c59a EDAC/igen6: Add Intel Arizona Beach SoCs support
The Intel Arizona Beach SoC series is oriented toward network computing.
Some types of these SoCs are equipped with IBECC (In-Band ECC) and share
the same IBECC registers with Alder Lake-N SoCs. Add a die ID for Arizona
Beach SoC for EDAC support.

[Tony: s/Arizona Lake/Arizona Beach/ in commit message]

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-3-qiuxu.zhuo@intel.com
2025-04-17 10:09:14 -07:00
Qiuxu Zhuo 20e190b1c1 EDAC/igen6: Skip absent memory controllers
Some BIOS versions may fuse off certain memory controllers and set the
registers of these absent memory controllers to ~0. The current igen6_edac
mistakenly enumerates these absent memory controllers and registers them
with the EDAC core.

Skip the absent memory controllers to avoid mistakenly enumerating them.
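A minimal sketch of the skip logic, assuming a 64-bit register per memory
controller that reads back as all ones (~0) when the controller is fused
off. Which register igen6_edac actually checks, and its width, are not
shown here; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An absent (fused-off) memory controller reads back as ~0. */
static bool mc_absent(uint64_t reg)
{
	return reg == ~0ULL;
}

/* Count the memory controllers that should be registered with the EDAC
 * core, skipping the absent ones. */
static int count_present_mcs(const uint64_t *regs, int n)
{
	int present = 0;

	for (int i = 0; i < n; i++) {
		if (mc_absent(regs[i]))
			continue;
		present++;
	}
	return present;
}
```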

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-2-qiuxu.zhuo@intel.com
2025-04-17 10:06:32 -07:00
Ingo Molnar 0a35c9280a x86/platform/amd: Move the <asm/amd_node.h> header to <asm/amd/node.h>
Collect AMD specific platform header files in <asm/amd/*.h>.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org
2025-04-14 09:34:17 +02:00
Ingo Molnar bcbb655595 x86/platform/amd: Move the <asm/amd_nb.h> header to <asm/amd/nb.h>
Collect AMD specific platform header files in <asm/amd/*.h>.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org
2025-04-14 09:34:14 +02:00
Ingo Molnar c435e608cf x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-10 11:58:27 +02:00
Linus Torvalds ae8371a46e - Add infrastructure support to EDAC in order to be able to register memory
   scrubbing RAS functionality with the kernel and expose sysfs nodes to
   control such scrubbing functionality. The main use case is CXL devices which
   provide different scrubbers for their built-in memories so that tools like
   rasdaemon can configure and control memory scrubbing and other, more
   advanced RAS functionality. (Shiju Jose and Jonathan Cameron)
 
 - Add support to ie31200_edac for client SoCs like Raptor Lake-S which have
   multiple memory controllers and out-of-band ECC capability. (Qiuxu Zhuo)
 
 - The usual round of cleanups, simplifications and fixlets

Merge tag 'edac_updates_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add infrastructure support to EDAC in order to be able to register
   memory scrubbing RAS functionality with the kernel and expose sysfs
   nodes to control such scrubbing functionality.

   The main use case is CXL devices which provide different scrubbers
   for their built-in memories so that tools like rasdaemon can
   configure and control memory scrubbing and other, more advanced RAS
   functionality (Shiju Jose and Jonathan Cameron)

 - Add support to ie31200_edac for client SoCs like Raptor Lake-S which
   have multiple memory controllers and out-of-band ECC capability
   (Qiuxu Zhuo)

 - The usual round of cleanups, simplifications and fixlets

* tag 'edac_updates_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (25 commits)
  MAINTAINERS: Add a secondary maintainer for bluefield_edac
  EDAC/ie31200: Switch Raptor Lake-S to interrupt mode
  EDAC/ie31200: Add Intel Raptor Lake-S SoCs support
  EDAC/ie31200: Break up ie31200_probe1()
  EDAC/ie31200: Fold the two channel loops into one loop
  EDAC/ie31200: Make struct dimm_data contain decoded information
  EDAC/ie31200: Make the memory controller resources configurable
  EDAC/ie31200: Simplify the pci_device_id table
  EDAC/ie31200: Fix the 3rd parameter name of *populate_dimm_info()
  EDAC/ie31200: Fix the error path order of ie31200_init()
  EDAC/ie31200: Fix the DIMM size mask for several SoCs
  EDAC/ie31200: Fix the size of EDAC_MC_LAYER_CHIP_SELECT layer
  EDAC/device: Fix dev_set_name() format string
  EDAC/pnd2: Make read-only const array intlv static
  EDAC/igen6: Constify struct res_config
  EDAC/amd64: Simplify return statement in dct_ecc_enabled()
  EDAC: Update memory repair control interface for memory sparing feature
  EDAC: Add a memory repair control feature
  EDAC: Use string choice helper functions
  EDAC: Add a Error Check Scrub control feature
  ...
2025-03-25 14:00:26 -07:00
Borislav Petkov (AMD) 298ffd5375 Merge remote-tracking branches 'ras/edac-cxl', 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates
* ras/edac-cxl:
  EDAC/device: Fix dev_set_name() format string
  EDAC: Update memory repair control interface for memory sparing feature
  EDAC: Add a memory repair control feature
  EDAC: Add a Error Check Scrub control feature
  EDAC: Add scrub control feature
  EDAC: Add support for EDAC device features control

* ras/edac-drivers:
  EDAC/ie31200: Switch Raptor Lake-S to interrupt mode
  EDAC/ie31200: Add Intel Raptor Lake-S SoCs support
  EDAC/ie31200: Break up ie31200_probe1()
  EDAC/ie31200: Fold the two channel loops into one loop
  EDAC/ie31200: Make struct dimm_data contain decoded information
  EDAC/ie31200: Make the memory controller resources configurable
  EDAC/ie31200: Simplify the pci_device_id table
  EDAC/ie31200: Fix the 3rd parameter name of *populate_dimm_info()
  EDAC/ie31200: Fix the error path order of ie31200_init()
  EDAC/ie31200: Fix the DIMM size mask for several SoCs
  EDAC/ie31200: Fix the size of EDAC_MC_LAYER_CHIP_SELECT layer
  EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald Rapids
  EDAC/igen6: Fix the flood of invalid error reports
  EDAC/ie31200: work around false positive build warning

* ras/edac-misc:
  MAINTAINERS: Add a secondary maintainer for bluefield_edac
  EDAC/pnd2: Make read-only const array intlv static
  EDAC/igen6: Constify struct res_config
  EDAC/amd64: Simplify return statement in dct_ecc_enabled()
  EDAC: Use string choice helper functions

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-03-25 14:53:27 +01:00
Qiuxu Zhuo a5db1b296b EDAC/ie31200: Switch Raptor Lake-S to interrupt mode
Raptor Lake-S SoCs notify correctable memory errors via CMCI (Corrected
Machine Check Interrupt). Switch Raptor Lake-S EDAC support from polling
to interrupt mode by registering the callback to the MCE decode notifier
chain.

Note that as Raptor Lake-S SoCs may not recover from uncorrectable memory
errors, the system will hang as soon as this type of error occurs, and the
registered callback on the MCE decode chain will not be executed. This is
the expected behavior.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-12-qiuxu.zhuo@intel.com
2025-03-10 10:47:40 -07:00
Qiuxu Zhuo d0742284ec EDAC/ie31200: Add Intel Raptor Lake-S SoCs support
The Intel Raptor Lake-S SoC contains two memory controllers with DDR5
memory type and out-of-band ECC capability. The resource definitions of
the memory controller are different from previous generations. One notable
difference is that the PCI ERRSTS register is deprecated and is not used
to indicate the presence of errors or to clear the MMIO-mapped ECC error
log registers.

Extend the ie31200_edac driver to support multiple memory controllers,
add a resource configuration table and use an MSR register to clear the
ECC error log registers to provide EDAC support for Raptor Lake-S SoCs.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-11-qiuxu.zhuo@intel.com
2025-03-10 10:47:14 -07:00
Qiuxu Zhuo 498550e1fa EDAC/ie31200: Break up ie31200_probe1()
Split ie31200_probe1() into two helper functions to easily extend support
for multiple memory controllers.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-10-qiuxu.zhuo@intel.com
2025-03-10 10:46:48 -07:00
Qiuxu Zhuo a217961b83 EDAC/ie31200: Fold the two channel loops into one loop
Fold the two channel loops to simplify the code and improve readability.
Also, delete the comments related to the DRB register, as this register
is not used here.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-9-qiuxu.zhuo@intel.com
2025-03-10 10:46:21 -07:00
Qiuxu Zhuo afdbc36555 EDAC/ie31200: Make struct dimm_data contain decoded information
The current dimm_data structure contains encoded DIMM information,
which needs to be decoded for a given SoC when it is used. Make it
contain decoded information when it's initialized so that the places
where it is used do not need to decode it again, thereby simplifying
the code.
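A minimal C sketch of decoding once at initialization time. The register layout used here (size in bits [5:0] in 1 GiB units, bit 6 for dual rank, bit 7 for x16 devices) and the decode_dimm() helper are hypothetical and do not reflect any real SoC's encoding. The point is only that consumers read ready-made fields instead of re-decoding the raw value.

```c
#include <stdint.h>

/* Decoded DIMM information, filled in once at initialization time. */
struct dimm_data {
	uint64_t size_mb;   /* DIMM size in MiB */
	int ranks;          /* number of ranks */
	int x16_width;      /* 1 if the DIMM is built from x16 devices */
};

/*
 * HYPOTHETICAL encoding, for illustration only:
 *   bits [5:0] size in 1 GiB units, bit 6 dual rank, bit 7 x16 devices.
 */
static struct dimm_data decode_dimm(uint32_t raw)
{
	struct dimm_data d;

	d.size_mb = (uint64_t)(raw & 0x3f) * 1024;
	d.ranks = (raw & (1u << 6)) ? 2 : 1;
	d.x16_width = (raw >> 7) & 1;
	return d;
}
```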

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-8-qiuxu.zhuo@intel.com
2025-03-10 10:45:56 -07:00
Qiuxu Zhuo 2a52cce648 EDAC/ie31200: Make the memory controller resources configurable
The memory controller resources, such as MMIO, register offsets, register
masks, memory DIMM information and ECC error log location, as well as the
number of memory controllers, can be device-ID-specific. Supporting a new
SoC therefore requires adding numerous 'if (device_id == new_id)' special
handling cases to the code.

Make these kinds of resources configurable and separate them from the code
to facilitate the addition of new SoC support.
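A minimal C sketch of the table-driven approach, assuming a hypothetical struct res_config with made-up fields, device IDs and offsets (none of these values come from the driver): the probe path looks the configuration up by device ID instead of branching on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-SoC memory controller resources (hypothetical fields). */
struct res_config {
	int num_imc;             /* number of memory controllers */
	uint32_t imc_base;       /* MMIO offset of the first controller */
	uint32_t imc_stride;     /* MMIO distance between controllers */
	uint32_t dimm_size_mask; /* mask of the DIMM size field */
};

struct id_cfg {
	uint16_t device_id;
	const struct res_config *cfg;
};

/* Placeholder values, not taken from any datasheet. */
static const struct res_config cfg_one_mc = {
	.num_imc = 1, .imc_base = 0x5000, .imc_stride = 0, .dimm_size_mask = 0x3f,
};
static const struct res_config cfg_two_mc = {
	.num_imc = 2, .imc_base = 0xd800, .imc_stride = 0x10000, .dimm_size_mask = 0x7f,
};

/* Placeholder device IDs mapped to their configuration. */
static const struct id_cfg id_table[] = {
	{ 0x1111, &cfg_one_mc },
	{ 0x2222, &cfg_two_mc },
};

static const struct res_config *find_cfg(uint16_t device_id)
{
	for (size_t i = 0; i < sizeof(id_table) / sizeof(id_table[0]); i++)
		if (id_table[i].device_id == device_id)
			return id_table[i].cfg;
	return NULL;
}

/* MMIO offset of controller 'imc' under the given configuration. */
static uint32_t imc_offset(const struct res_config *cfg, int imc)
{
	return cfg->imc_base + (uint32_t)imc * cfg->imc_stride;
}
```

Adding a new SoC then means adding one table entry rather than another 'if (device_id == ...)' branch.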

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-7-qiuxu.zhuo@intel.com
2025-03-10 10:45:07 -07:00
Qiuxu Zhuo 312e67a03d EDAC/ie31200: Simplify the pci_device_id table
Use PCI_VDEVICE() to simplify the pci_device_id table.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-6-qiuxu.zhuo@intel.com
2025-03-10 10:44:40 -07:00
Qiuxu Zhuo 44eae52089 EDAC/ie31200: Fix the 3rd parameter name of *populate_dimm_info()
The 3rd parameter of *populate_dimm_info() pertains to the DIMM index
within a channel, not the channel index. Rename the parameter to 'dimm'
to reflect its actual purpose.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-5-qiuxu.zhuo@intel.com
2025-03-10 10:44:12 -07:00
Qiuxu Zhuo 231e341036 EDAC/ie31200: Fix the error path order of ie31200_init()
The error path order of ie31200_init() is incorrect; fix it.
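For readers unfamiliar with the convention: the error path of a kernel init function is expected to undo completed steps in the reverse order of acquisition, usually through a ladder of goto labels. The sketch below shows that pattern in plain C; demo_init(), step_ok() and undo() are hypothetical stand-ins, not the actual ie31200_init() steps.

```c
#include <string.h>

static char unwind_log[16];   /* records the teardown order */

/* Step 'step' succeeds unless it is the one selected to fail. */
static int step_ok(int fail_at, int step)
{
	return step != fail_at;
}

static void undo(const char *what)
{
	strcat(unwind_log, what);
}

/* Acquire A, B and C in order; fail_at = 0 means nothing fails. */
static int demo_init(int fail_at)
{
	unwind_log[0] = '\0';

	if (!step_ok(fail_at, 1))	/* acquire A */
		goto fail;
	if (!step_ok(fail_at, 2))	/* acquire B */
		goto fail_a;
	if (!step_ok(fail_at, 3))	/* acquire C */
		goto fail_b;
	return 0;

	/* Each label releases what was acquired before the failing step. */
fail_b:
	undo("B");
fail_a:
	undo("A");
fail:
	return -1;
}
```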

Fixes: 709ed1bcef ("EDAC/ie31200: Fallback if host bridge device is already initialized")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-4-qiuxu.zhuo@intel.com
2025-03-10 10:43:39 -07:00
Qiuxu Zhuo 3427befbbc EDAC/ie31200: Fix the DIMM size mask for several SoCs
The DIMM size mask for {Sky, Kaby, Coffee} Lake is not bits{7:0},
but bits{5:0}. Fix it.
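A small userspace illustration of why the mask width matters. GENMASK32() is a local stand-in for the kernel's GENMASK(), and the register value is arbitrary: with bits{7:0}, any bits set in positions 7:6 leak into the decoded size.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l) on 32-bit values. */
#define GENMASK32(h, l) (((~0u) >> (31 - (h))) & ((~0u) << (l)))

/* Extract the DIMM size field using bits [hi:0] of 'reg'. */
static uint32_t dimm_size_field(uint32_t reg, unsigned int hi)
{
	return reg & GENMASK32(hi, 0);
}
```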

Fixes: 953dee9bbd ("EDAC, ie31200_edac: Add Skylake support")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-3-qiuxu.zhuo@intel.com
2025-03-10 10:43:05 -07:00
Qiuxu Zhuo d59d844e31 EDAC/ie31200: Fix the size of EDAC_MC_LAYER_CHIP_SELECT layer
The EDAC_MC_LAYER_CHIP_SELECT layer pertains to the rank, not the DIMM.
Fix its size to reflect the number of ranks instead of the number of DIMMs.
Also delete the unused macros IE31200_{DIMMS,RANKS}.

Fixes: 7ee40b897d ("ie31200_edac: Introduce the driver")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Gary Wang <gary.c.wang@intel.com>
Link: https://lore.kernel.org/r/20250310011411.31685-2-qiuxu.zhuo@intel.com
2025-03-10 10:41:32 -07:00
Arnd Bergmann 49472722d9 EDAC/device: Fix dev_set_name() format string
Passing a variable string as the format to dev_set_name() causes a W=1 warning:

  drivers/edac/edac_device.c:736:9: error: format not a string literal and no format arguments [-Werror=format-security]
    736 |         ret = dev_set_name(&ctx->dev, name);
        |         ^~~

Use a literal "%s" instead so the name can be the argument.
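The same hazard exists with any printf-style function. In this userspace sketch, set_name() is hypothetical and snprintf() stands in for dev_set_name(): a name such as "mem%d_repair" is copied verbatim because it is passed as an argument to a literal "%s". Passing it as the format instead would make snprintf() read a nonexistent integer argument, which is undefined behavior.

```c
#include <stdio.h>

/*
 * Copy 'name' into 'buf'. The variable string is an argument to a
 * literal "%s" format, so '%' characters in it are not interpreted.
 */
static int set_name(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "%s", name);
}
```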

Fixes: db99ea5f2c ("EDAC: Add support for EDAC device features control")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250304143603.995820-1-arnd@kernel.org
2025-03-05 23:35:01 +01:00
Colin Ian King 136899ffc4 EDAC/pnd2: Make read-only const array intlv static
Don't populate the const read-only array intlv on the stack at run time;
instead, make it static. This also shrinks the object size:

  $ size pnd2_edac.o.*

     text    data     bss     dec     hex filename
    15632     264    1384   17280    4380 pnd2_edac.o.new
    15644     264    1384   17292    438c pnd2_edac.o.old
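A sketch of the pattern in isolation. The table values and the intlv_lookup() helper are placeholders, not the contents of pnd2_edac.c: declaring the const array 'static' lets the compiler emit it once in read-only data instead of building it on the stack on every call.

```c
/* Return the table entry for index i, or -1 if i is out of range. */
static int intlv_lookup(unsigned int i)
{
	/*
	 * 'static const': emitted once in read-only data rather than
	 * initialized on the stack on every call. Values are placeholders.
	 */
	static const int intlv[] = { 1, 2, 4, 8 };

	return i < sizeof(intlv) / sizeof(intlv[0]) ? intlv[i] : -1;
}
```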

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20240919170427.497429-1-colin.i.king@gmail.com
2025-03-03 16:39:26 +01:00
Christophe JAILLET ac2fbe0948 EDAC/igen6: Constify struct res_config
The res_config structs are not modified in this driver.

Constifying these structures moves some data to a read-only section, which
increases overall security, especially when the structure holds some function
pointers.

On x86_64, with allmodconfig, as an example:

  Before:
  ======
     text	   data	    bss	    dec	    hex	filename
    36777	   2479	   4304	  43560	   aa28	drivers/edac/igen6_edac.o

  After:
  =====
     text	   data	    bss	    dec	    hex	filename
    37297	   1959	   4304	  43560	   aa28	drivers/edac/igen6_edac.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/a06153870951a64b438e76adf97d440e02c1a1fc.1738355198.git.christophe.jaillet@wanadoo.fr
2025-03-03 16:33:03 +01:00
Thorsten Blum 12378e1c3f EDAC/amd64: Simplify return statement in dct_ecc_enabled()
Simplify the return statement to improve the code's readability.

No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20250201130953.1377-2-thorsten.blum@linux.dev
2025-02-28 13:21:43 +01:00
Shiju Jose 81e42fc1d3 EDAC: Update memory repair control interface for memory sparing feature
Update memory repair control interface for memory sparing feature.

CXL memory devices can support soft and hard memory sparing at cacheline,
row, bank and rank granularities. Memory sparing is defined as a repair
function that replaces a portion of memory with a portion of functional
memory at that same granularity.

When a CXL device detects an error in memory, it will report to the host
that there is a need for a repair maintenance operation by using an event
record where the "maintenance needed" flag is set.

The event records contain the device physical address (DPA) and other
attributes of the memory to repair such as bank group, bank, rank, row,
column, channel etc.

The kernel will report the corresponding CXL general media or DRAM trace
event to userspace, and userspace tools (e.g. rasdaemon) will initiate
a repair operation in response to the device request via the sysfs
repair control.

  [ bp: Massage. ]

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250212143654.1893-15-shiju.jose@huawei.com
2025-02-26 11:14:40 +01:00
Shiju Jose 699ea5219c EDAC: Add a memory repair control feature
Add a generic EDAC memory repair control driver to manage memory repairs in
the system, such as CXL Post Package Repair (PPR) and other soft and hard PPR
features.

For example, a CXL device with DRAM components that support PPR features may
implement PPR maintenance operations. DRAM components may support two types of
PPR:

 - hard PPR, for a permanent row repair, and
 - soft PPR, for a temporary row repair.

Soft PPR is much faster than hard PPR, but the repair is lost with a power
cycle.

When a CXL device detects an error in memory, it may report the need for
a repair maintenance operation by using an event record where the "maintenance
needed" flag is set. The event records contain the device physical
address (DPA) and other optional attributes of the memory to repair.

The kernel will report the corresponding CXL general media or DRAM trace event
to userspace, and userspace tools (e.g. rasdaemon) will initiate a repair
operation in response to the device request via the sysfs repair control.

A device with memory repair features registers with the EDAC device driver,
which retrieves a memory repair descriptor from the EDAC memory repair driver
and exposes the sysfs repair control attributes to userspace in

  /sys/bus/edac/devices/<dev-name>/mem_repairX/.

The common memory repair control interface abstracts the control of arbitrary
memory repair functionality into a standardized set of functions.  The sysfs
memory repair attribute nodes are only available if the client driver has
implemented the corresponding attribute callback function and provided
operations to the EDAC device driver during registration.
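The rule that an attribute exists only if its callback exists can be sketched in plain C as follows. struct repair_ops, its members and repair_attr_visible() are hypothetical; in the kernel, this kind of check is typically done in a struct attribute_group's is_visible() callback, which returns 0 to hide an attribute.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table a client driver might register. */
struct repair_ops {
	int (*get_repair_type)(void *drv, const char **type);
	int (*do_repair)(void *drv, unsigned long long dpa);
};

enum repair_attr { ATTR_REPAIR_TYPE, ATTR_REPAIR };

/* An attribute is exposed only if the matching callback was provided. */
static bool repair_attr_visible(const struct repair_ops *ops, enum repair_attr a)
{
	switch (a) {
	case ATTR_REPAIR_TYPE:
		return ops->get_repair_type != NULL;
	case ATTR_REPAIR:
		return ops->do_repair != NULL;
	}
	return false;
}

/* Dummy callback standing in for a client driver's implementation. */
static int fake_repair(void *drv, unsigned long long dpa)
{
	(void)drv;
	(void)dpa;
	return 0;
}
```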

  [ bp: Massage, fixup edac_dev_register() retvals, merge
    write_overflow fix to mem_repair_create_desc() ]

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250212143654.1893-5-shiju.jose@huawei.com
2025-02-26 11:13:23 +01:00
Thorsten Blum d09055122b EDAC: Use string choice helper functions
Remove hard-coded strings by using the str_enabled_disabled(), str_yes_no(),
str_write_read(), and str_plural() helper functions.

Add a space in "All DIMMs support ECC: yes/no" to improve readability.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20250223212429.3466-2-thorsten.blum@linux.dev
2025-02-25 22:19:55 +01:00
Shiju Jose bcbd069b11 EDAC: Add a Error Check Scrub control feature
Add an Error Check Scrub (ECS) control to manage a memory device's ECS
feature.

The ECS is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-5) and
allows the DRAM to internally read, correct single-bit errors, and write back
corrected data bits to the DRAM array while providing transparency to error
counts.

A DDR5 device contains a number of memory media Field Replaceable Units
(FRUs). The DDR5 ECS feature, and thus the ECS control driver, supports
configuring the ECS parameters per FRU.

Memory devices that support the ECS feature register with the EDAC device
driver, which retrieves the ECS descriptor from the EDAC ECS driver. This
driver exposes sysfs ECS control attributes to userspace via

  /sys/bus/edac/devices/<dev-name>/ecs_fruX/.
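As a small illustration of the per-FRU layout, one directory is formed per FRU index as below. ecs_fru_path() and the device name "cxl_mem0" used in the check are hypothetical; only the path pattern comes from the text above.

```c
#include <stdio.h>

/* Format the ECS control directory for FRU index 'fru' of 'dev_name'. */
static int ecs_fru_path(char *buf, size_t len, const char *dev_name, int fru)
{
	return snprintf(buf, len, "/sys/bus/edac/devices/%s/ecs_fru%d/",
			dev_name, fru);
}
```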

The common sysfs ECS control interface abstracts the control of an arbitrary
ECS functionality to a common set of functions.

Support for the ECS feature is added separately because the control attributes
of the DDR5 ECS feature differ from those of the scrub feature.

The sysfs ECS attribute nodes are only present if the client driver has
implemented the corresponding attribute callback function and passed the
necessary operations to the EDAC RAS feature driver during registration.

  [ bp: Massage, fixup edac_dev_register() retvals. ]

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20250212143654.1893-4-shiju.jose@huawei.com
2025-02-25 15:42:32 +01:00
Shiju Jose f90b738166 EDAC: Add scrub control feature
Add a scrub control to manage memory scrubbers in the system.

Devices with a scrub feature register with the EDAC device driver, which
retrieves the scrub descriptor from the scrub driver and exposes the
control attributes for an instance to userspace at

  /sys/bus/edac/devices/<dev-name>/scrubX/.

The common sysfs scrub control interface abstracts the control of
arbitrary scrubbing functionality into a common set of functions. The
attribute nodes are only present if the client driver has implemented
the corresponding attribute callback function and passed the operations
to the device driver during registration.

  [ bp: Massage commit message, docs and code, simplify text a bit.
    Integrate fixup for: https://lore.kernel.org/r/202502251009.0sGkolEJ-lkp@intel.com
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org> ]

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20250212143654.1893-3-shiju.jose@huawei.com
2025-02-25 15:39:09 +01:00
Shiju Jose db99ea5f2c EDAC: Add support for EDAC device features control
Add generic EDAC device feature controls supporting the registration of RAS
features available in the system. The driver exposes control attributes for
these features to userspace in

  /sys/bus/edac/devices/<dev-name>/<ras-feature>

  [ bp: Touch-up documentation, simplify, make edac_dev_type static,
    fixup edac_dev_register() retvals. ]

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20250212143654.1893-2-shiju.jose@huawei.com
2025-02-25 15:33:27 +01:00
Qiuxu Zhuo d9207cf776 EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald Rapids
When doing error injection to some memory DIMMs on certain Intel Emerald
Rapids servers, the i10nm_edac missed error reports for some memory DIMMs.

Certain BIOS configurations may hide some memory controllers, and the
i10nm_edac doesn't enumerate these hidden memory controllers. However, the
ADXL decodes memory errors using memory controller physical indices even
if there are hidden memory controllers. Therefore, the memory controller
physical indices reported by the ADXL may mismatch the logical indices
enumerated by the i10nm_edac, resulting in missed error reports for some
memory DIMMs.

Fix this issue by creating a mapping table from memory controller physical
indices (used by the ADXL) to logical indices (used by the i10nm_edac) and
using it to convert the physical indices to the logical indices during the
error handling process.
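A minimal sketch of such a mapping table, assuming a hypothetical bitmap of enumerated controllers and an arbitrary upper bound of 12 physical controllers. build_mc_map() and phys_to_logical() are illustrative names, not the driver's helpers. Hidden controllers map to -1, so a hidden or out-of-range physical index is rejected instead of being reported against the wrong controller.

```c
#include <stdint.h>

#define MAX_IMC 12   /* hypothetical upper bound on physical controllers */

/* mc_map[phys] = logical index, or -1 if the controller is hidden/absent. */
static int mc_map[MAX_IMC];

/* Build the table from a bitmap of enumerated (visible) controllers. */
static void build_mc_map(uint32_t present_mask)
{
	int logical = 0;

	for (int phys = 0; phys < MAX_IMC; phys++)
		mc_map[phys] = (present_mask & (1u << phys)) ? logical++ : -1;
}

/* Convert a physical index (as decoded by ADXL) to the logical index. */
static int phys_to_logical(int phys)
{
	if (phys < 0 || phys >= MAX_IMC)
		return -1;
	return mc_map[phys];
}
```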

Fixes: c545f5e412 ("EDAC/i10nm: Skip the absent memory controllers")
Reported-by: Kevin Chang <kevin1.chang@intel.com>
Tested-by: Kevin Chang <kevin1.chang@intel.com>
Reported-by: Thomas Chen <Thomas.Chen@intel.com>
Tested-by: Thomas Chen <Thomas.Chen@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250214002728.6287-1-qiuxu.zhuo@intel.com
2025-02-20 17:02:33 -08:00
Qiuxu Zhuo 267e5b1d26 EDAC/igen6: Fix the flood of invalid error reports
The ECC_ERROR_LOG register of certain SoCs may contain the invalid value
~0, which results in a flood of invalid error reports in polling mode.

Fix the flood of invalid error reports by skipping the invalid ECC error
log value ~0.
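The check can be sketched as follows. The "valid" bit at position 63 is an assumption made for illustration; the relevant point is that an all-ones read also has that bit set, so without the explicit ~0 test every poll would report a bogus error.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECC_LOG_VALID (1ull << 63)  /* hypothetical "valid" bit */

/* Return true if a polled log value should be decoded and reported. */
static bool ecc_log_reportable(uint64_t log)
{
	if (log == ~0ull)   /* all-ones: invalid read, skip it */
		return false;
	return (log & ECC_LOG_VALID) != 0;
}
```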

Fixes: e14232afa9 ("EDAC/igen6: Add polling support")
Reported-by: Ramses <ramses@well-founded.dev>
Closes: https://lore.kernel.org/all/OISL8Rv--F-9@well-founded.dev/
Tested-by: Ramses <ramses@well-founded.dev>
Reported-by: John <therealgraysky@proton.me>
Closes: https://lore.kernel.org/all/p5YcxOE6M3Ncxpn2-Ia_wCt61EM4LwIiN3LroQvT_-G2jMrFDSOW5k2A9D8UUzD2toGpQBN1eI0sL5dSKnkO8iteZegLoQEj-DwQaMhGx4A=@proton.me/
Tested-by: John <therealgraysky@proton.me>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250212083354.31919-1-qiuxu.zhuo@intel.com
2025-02-20 17:00:38 -08:00
Arnd Bergmann c29dfd661f EDAC/ie31200: work around false positive build warning
gcc-14 produces a bogus warning in some configurations:

drivers/edac/ie31200_edac.c: In function 'ie31200_probe1.isra':
drivers/edac/ie31200_edac.c:412:26: error: 'dimm_info' is used uninitialized [-Werror=uninitialized]
  412 |         struct dimm_data dimm_info[IE31200_CHANNELS][IE31200_DIMMS_PER_CHANNEL];
      |                          ^~~~~~~~~
drivers/edac/ie31200_edac.c:412:26: note: 'dimm_info' declared here
  412 |         struct dimm_data dimm_info[IE31200_CHANNELS][IE31200_DIMMS_PER_CHANNEL];
      |                          ^~~~~~~~~

I don't see any way the uninitialized access could really happen here,
but I can see why the compiler gets confused by the two loops.

Instead, rework the two nested loops to only read the addr_decode
registers and then keep only one instance of the dimm info structure.

[Tony: Qiuxu pointed out that the "populate DIMM info" comment was left
behind in the refactor and suggested moving it. I deleted the comment
as unnecessary in front of a call to populate_dimm_info(). That seems
pretty self-describing.]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20250122065031.1321015-1-arnd@kernel.org
2025-02-20 16:56:41 -08:00
Komal Bajaj c158647c10 EDAC/qcom: Correct interrupt enable register configuration
The previous implementation incorrectly configured the cmn_interrupt_2_enable
register for interrupt handling. Using cmn_interrupt_2_enable to configure
Tag, Data RAM ECC interrupts would lead to issues like double handling of the
interrupts (EL1 and EL3), as cmn_interrupt_2_enable is meant to be configured
for interrupts which need to be handled by EL3.

The EL1 LLCC EDAC driver needs to use the cmn_interrupt_0_enable register to
configure Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.

Fixes: 27450653f1 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Komal Bajaj <quic_kbajaj@quicinc.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20241119064608.12326-1-quic_kbajaj@quicinc.com
2025-02-14 20:36:11 +01:00
Linus Torvalds b9d8a295ed - The first part of a restructuring of AMD's representation of a northbridge
   which is legacy now, and the creation of the new AMD node concept which
   represents the Zen architecture of having a collection of I/O devices within
   an SoC. Those nodes comprise the so-called data fabric on Zen. This has
   at least one practical advantage of not having to add a PCI ID each time
   a new data fabric PCI device releases. Eventually, the lot more uniform
   provider of data fabric functionality amd_node.c will be used by all the
   drivers which need it

 - Smaller cleanups

Merge tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

 - The first part of a restructuring of AMD's representation of a
   northbridge which is legacy now, and the creation of the new AMD node
   concept which represents the Zen architecture of having a collection
   of I/O devices within an SoC. Those nodes comprise the so-called data
   fabric on Zen.

   This has at least one practical advantage of not having to add a PCI
   ID each time a new data fabric PCI device releases. Eventually, the
   lot more uniform provider of data fabric functionality amd_node.c
   will be used by all the drivers which need it

 - Smaller cleanups

* tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_node: Use defines for SMN register offsets
  x86/amd_node: Remove dependency on AMD_NB
  x86/amd_node: Update __amd_smn_rw() error paths
  x86/amd_nb: Move SMN access code to a new amd_node driver
  x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id()
  x86/amd_nb: Simplify function 3 search
  x86/amd_nb: Use topology info to get AMD node count
  x86/amd_nb: Simplify root device search
  x86/amd_nb: Simplify function 4 search
  x86: Start moving AMD node functionality out of AMD_NB
  x86/amd_nb: Clean up early_is_amd_nb()
  x86/amd_nb: Restrict init function to AMD-based systems
  x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
2025-01-21 09:38:52 -08:00
Linus Torvalds 48795f90cb - Remove the less generic CPU matching infra around struct x86_cpu_desc and
   use the generic struct x86_cpu_id thing

 - Remove magic naked numbers for CPUID functions and use proper defines of the
   prefix CPUID_LEAF_*. Consolidate some of the crazy use around the tree

 - Smaller cleanups and improvements

Merge tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Remove the less generic CPU matching infra around struct x86_cpu_desc
   and use the generic struct x86_cpu_id thing

 - Remove magic naked numbers for CPUID functions and use proper defines
   of the prefix CPUID_LEAF_*. Consolidate some of the crazy use around
   the tree

 - Smaller cleanups and improvements

* tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Make all all CPUID leaf names consistent
  x86/fpu: Remove unnecessary CPUID level check
  x86/fpu: Move CPUID leaf definitions to common code
  x86/tsc: Remove CPUID "frequency" leaf magic numbers.
  x86/tsc: Move away from TSC leaf magic numbers
  x86/cpu: Move TSC CPUID leaf definition
  x86/cpu: Refresh DCA leaf reading code
  x86/cpu: Remove unnecessary MwAIT leaf checks
  x86/cpu: Use MWAIT leaf definition
  x86/cpu: Move MWAIT leaf definition to common header
  x86/cpu: Remove 'x86_cpu_desc' infrastructure
  x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'
  x86/cpu: Replace PEBS use of 'x86_cpu_desc' use with 'x86_cpu_id'
  x86/cpu: Expose only stepping min/max interface
  x86/cpu: Introduce new microcode matching helper
  x86/cpufeature: Document cpu_feature_enabled() as the default to use
  x86/paravirt: Remove the WBINVD callback
  x86/cpufeatures: Free up unused feature bits
2025-01-21 09:30:59 -08:00
Linus Torvalds 0763dd8928 - Remove the EDAC PowerPC Cell driver due to the removal of the IBM Cell
   blades support

 - Add a new EDAC driver for Loongson SoCs which reports single-bit correctable
   errors

 - Extend the SKX and i10NM EDAC drivers to support UV systems which can have
   more than 8 nodes

 - Add Intel Clearwater Forest server support to i10nm_edac

 - Minor fix

Merge tag 'edac_updates_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Remove the EDAC PowerPC Cell driver due to the removal of the IBM
   Cell blades support

 - Add a new EDAC driver for Loongson SoCs which reports single-bit
   correctable errors

 - Extend the SKX and i10NM EDAC drivers to support UV systems which can
   have more than 8 nodes

 - Add Intel Clearwater Forest server support to i10nm_edac

 - Minor fix

* tag 'edac_updates_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/cell: Remove powerpc Cell driver
  EDAC: Add an EDAC driver for the Loongson memory controller
  EDAC: Fix typos in comments
  EDAC/{i10nm,skx,skx_common}: Support UV systems
  EDAC/i10nm: Add Intel Clearwater Forest server support
2025-01-21 08:21:12 -08:00
Borislav Petkov (AMD) 368736db4d Merge remote-tracking branches 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates
* ras/edac-drivers:
  EDAC/cell: Remove powerpc Cell driver
  EDAC: Add an EDAC driver for the Loongson memory controller
  EDAC/{i10nm,skx,skx_common}: Support UV systems
  EDAC/i10nm: Add Intel Clearwater Forest server support

* ras/edac-misc:
  EDAC: Fix typos in comments

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-01-17 19:36:27 +01:00
Michael Ellerman 6696037a56 EDAC/cell: Remove powerpc Cell driver
This driver can no longer be built since support for IBM Cell Blades was
removed, in particular PPC_CELL_COMMON.

Remove the driver.

  [ bp: Remove EDAC_CELL from Cell's defconfig too. ]

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241218105523.416573-23-mpe@ellerman.id.au
2025-01-16 17:07:50 +01:00
Mario Limonciello d6caeafaa3 x86/amd_nb: Move SMN access code to a new amd_node driver
SMN access was bolted into amd_nb mostly as a convenience. However, this
has limitations that require incurring tech debt to keep it working.

Move SMN access to the newly introduced AMD Node driver.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> # pdx86
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> # PMF, PMC
Link: https://lore.kernel.org/r/20241206161210.163701-11-yazen.ghannam@amd.com
2025-01-08 10:59:44 +01:00
Zhao Qunqin 558aff7a63 EDAC: Add an EDAC driver for the Loongson memory controller
Add ECC support for the Loongson SoC DDR controller. This driver reports
single-bit correctable errors (CEs) only.

Only ACPI firmware is supported.

  [ bp: Document what last_ce_count is for. ]

Signed-off-by: Zhao Qunqin <zhaoqunqin@loongson.cn>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20241219124846.1876-1-zhaoqunqin@loongson.cn
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-01-04 12:02:04 +01:00
Dave Hansen 85b08180df x86/cpu: Expose only stepping min/max interface
The x86_match_cpu() infrastructure can match CPU steppings. Since
there are only 16 possible steppings, the matching infrastructure goes
all out and stores the stepping match as a bitmap. That means it can
match any possible steppings in a single list entry. Fun.

But it exposes this bitmap to each of the X86_MATCH_*() helpers when
none of them really need a bitmap. It makes up for this by exporting a
helper (X86_STEPPINGS()) which converts a contiguous stepping range
into the bitmap which every single user leverages.

Instead of a bitmap, have the main helper for this sort of thing
(X86_MATCH_VFM_STEPS()) just take a stepping range. This ends up
actually being even more compact than before.

Leave the helper in place (renamed to __X86_STEPPINGS()) to make it
more clear what is going on instead of just having a random GENMASK()
in the middle of an already complicated macro.
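To make the bitmap concrete, here is a userspace sketch of turning a min/max stepping range into the 16-bit mask and testing a stepping against it. steppings_mask() and stepping_matches() are hypothetical helpers written for this illustration, not the kernel's macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bits min..max (inclusive); steppings are 0..15, so 16 bits suffice. */
static uint16_t steppings_mask(unsigned int min, unsigned int max)
{
	return (uint16_t)(((1u << (max + 1)) - 1) & ~((1u << min) - 1));
}

/* Does 'stepping' fall inside the range encoded in 'mask'? */
static bool stepping_matches(uint16_t mask, unsigned int stepping)
{
	return stepping < 16 && (mask & (1u << stepping));
}
```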

One oddity that I hit was this macro:

       X86_MATCH_VFM_STEPS(vfm, X86_STEPPING_MIN, max_stepping, issues)

It *could* have been converted over to take a min/max stepping value
for each entry. But that would have been a bit too verbose and would
prevent the one oddball in the list (INTEL_COMETLAKE_L stepping 0)
from sticking out.

Instead, just have it take a *maximum* stepping and imply that the match
is from 0=>max_stepping. This is functional for all the cases now and
also retains the nice property of having INTEL_COMETLAKE_L stepping 0
stick out like a sore thumb.

skx_cpuids[] is goofy. It uses the stepping match but encodes all
possible steppings. Just use a normal, non-stepping match helper.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185129.65527B2A%40davehans-spike.ostc.intel.com
2024-12-17 16:14:49 -08:00
Yan Zhen 586e62fe38 EDAC: Fix typos in comments
Fix the following typos:

'Alocate' ==> 'Allocate',
'specifed' ==> 'specified',
'Technlogy' ==> 'Technology',
'Brnach' ==> 'Branch',
'branchs' ==> 'branches'.

Signed-off-by: Yan Zhen <yanzhen@vivo.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240930074023.618110-1-yanzhen@vivo.com
2024-12-15 22:17:34 +01:00
Kyle Meyer 584e09743d EDAC/{i10nm,skx,skx_common}: Support UV systems
The 3-bit source IDs in PCI configuration space registers, used to map
devices to sockets, are limited to 8 unique IDs, and each ID is local to
a UPI/QPI domain.

Source IDs cannot be used to map devices to sockets on UV systems
because they can exceed 8 sockets and have multiple UPI/QPI domains with
identical, repeating source IDs.

Use NUMA information to get package IDs instead of source IDs on UV
systems, and use package/source IDs to name IMC information structures.

Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/all/20241213012549.43099-1-kyle.meyer@hpe.com/
2024-12-13 11:10:31 -08:00
Borislav Petkov (AMD) 747367340c EDAC/amd64: Simplify ECC check on unified memory controllers
The intent of the check is to see whether at least one UMC has ECC
enabled. So do that instead of tracking which ones are enabled in masks
which are too small in size anyway and lead to not loading the driver on
Zen4 machines with UMCs enabled over UMC8.
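The following is a simplified, hypothetical model of the problem. The struct and helpers are illustrative and are not the amd64_edac code. Recording per-UMC state in an 8-bit mask silently drops UMC indices 8 and above. Asking "is ECC enabled on any UMC?" needs no mask at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-UMC state; only the ECC-enabled flag matters here. */
struct umc_state {
	bool ecc_enabled;
};

/*
 * Mask-based tracking (illustrative): bits for i >= 8 are truncated away
 * by the cast to uint8_t. Assumes n <= 32 so the shift is well defined.
 */
static uint8_t ecc_mask_u8(const struct umc_state *umc, unsigned int n)
{
	uint8_t mask = 0;

	for (unsigned int i = 0; i < n; i++)
		if (umc[i].ecc_enabled)
			mask |= (uint8_t)(1u << i);
	return mask;
}

/* Simplified check: is ECC enabled on at least one UMC? */
static bool any_umc_ecc_enabled(const struct umc_state *umc, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		if (umc[i].ecc_enabled)
			return true;
	return false;
}
```

With ECC enabled only on UMC8 and UMC9, the mask reads as zero, so ECC looks disabled. The simplified check still reports that ECC is enabled.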

Fixes: e2be5955a8 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
Reported-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Avadhut Naik <avadhut.naik@amd.com>
Reviewed-by: Avadhut Naik <avadhut.naik@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20241210212054.3895697-1-avadhut.naik@amd.com
2024-12-11 21:47:33 +01:00
Qiuxu Zhuo 2e55bb9b71 EDAC/i10nm: Add Intel Clearwater Forest server support
Clearwater Forest is the successor to Sierra Forest. Add Clearwater
Forest CPU model ID for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Link: https://lore.kernel.org/r/20241203022038.72873-1-qiuxu.zhuo@intel.com
2024-12-09 11:18:23 -08:00
Linus Torvalds e70140ba0d Get rid of 'remove_new' relic from platform driver struct
The continual trickle of small conversion patches is grating on me, and
is really not helping.  Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:

  /*
   * .remove_new() is a relic from a prototype conversion of .remove().
   * New drivers are supposed to implement .remove(). Once all drivers are
   * converted to not use .remove_new any more, it will be dropped.
   */

This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.

I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.

Then I just removed the old (sic) .remove_new member function, and this
is the end result.  No more unnecessary conversion noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01 15:12:43 -08:00
Linus Torvalds 42d9e8b7cc powerpc updates for 6.13
- Rework kfence support for the HPT MMU to work on systems with >= 16TB of RAM.

- Remove the powerpc "maple" platform, used by the "Yellow Dog Powerstation".

- Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
  DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.

- Add support for running KVM nested guests on Power11.

- Other small features, cleanups and fixes.

Thanks to: Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa Shulyupin,
David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert Uytterhoeven,
Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard, Lukas Bulwahn, Madhavan
Srinivasan, Markus Elfring, Michal Suchanek, Ming Lei, Mukesh Kumar Chaurasiya,
Nathan Chancellor, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A, Paulo Miguel
Almeida, Pavithra Prakash, Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P
Bappalige, Shen Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten
Blum, Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun,
zhang jiao.

Merge tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Rework kfence support for the HPT MMU to work on systems with >= 16TB
   of RAM.

 - Remove the powerpc "maple" platform, used by the "Yellow Dog
   Powerstation".

 - Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
   DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.

 - Add support for running KVM nested guests on Power11.

 - Other small features, cleanups and fixes.

Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa
Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert
Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard,
Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek,
Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao,
Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash,
Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen
Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum,
Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao.

* tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits)
  EDAC/powerpc: Remove PPC_MAPLE drivers
  powerpc/perf: Add per-task/process monitoring to vpa_pmu driver
  powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch
  docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu
  powerpc/perf: Add perf interface to expose vpa counters
  MAINTAINERS: powerpc: Mark Maddy as "M"
  powerpc/Makefile: Allow overriding CPP
  powerpc-km82xx.c: replace of_node_put() with __free
  ps3: Correct some typos in comments
  powerpc/kexec: Fix return of uninitialized variable
  macintosh: Use common error handling code in via_pmu_led_init()
  powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type()
  powerpc: remove dead config options for MPC85xx platform support
  powerpc/xive: Use cpumask_intersects()
  selftests/powerpc: Remove the path after initialization.
  powerpc/xmon: symbol lookup length fixed
  powerpc/ep8248e: Use %pa to format resource_size_t
  powerpc/ps3: Reorganize kerneldoc parameter names
  KVM: PPC: Book3S HV: Fix kmv -> kvm typo
  powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static
  ...
2024-11-23 10:44:31 -08:00
Linus Torvalds c1f2ffe207 - Log and handle two new AMD-specific MCA registers: SYND1 and SYND2 and
  report the Field Replaceable Unit text info reported through them

- Add support for handling variable-sized SMCA BERT records

- Add the capability for reporting vendor-specific RAS error info without
  adding vendor-specific fields to struct mce

- Cleanups

Merge tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Log and handle two new AMD-specific MCA registers: SYND1 and SYND2
   and report the Field Replaceable Unit text info reported through them

 - Add support for handling variable-sized SMCA BERT records

 - Add the capability for reporting vendor-specific RAS error info
   without adding vendor-specific fields to struct mce

 - Cleanups
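   The sketch below covers two of the items above. It shows a wrapper that keeps vendor-specific data next to the generic MCE record, rather than inside it. It also decodes FRU text from SYND1/SYND2. The type names are hypothetical. The sketch also assumes that the FRU text is 16 ASCII bytes, 8 per register, packed least-significant byte first.

```c
#include <stdint.h>

/* Stand-in for the generic record; the real struct mce has many more fields. */
struct mce_sketch {
	uint64_t status;
	uint64_t addr;
};

/* Hypothetical wrapper: vendor data lives beside the generic record. */
struct hw_err_sketch {
	struct mce_sketch m;
	union {
		struct {
			uint64_t synd1;
			uint64_t synd2;
		} amd;
	} vendor;
};

/* Unpack 16 bytes of FRU text (assumed LSB-first) plus a terminating NUL. */
static void fru_text(const struct hw_err_sketch *err, char out[17])
{
	for (int i = 0; i < 8; i++) {
		out[i]     = (char)((err->vendor.amd.synd1 >> (8 * i)) & 0xff);
		out[i + 8] = (char)((err->vendor.amd.synd2 >> (8 * i)) & 0xff);
	}
	out[16] = '\0';
}
```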

* tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Add support for FRU text in MCA
  x86/mce/apei: Handle variable SMCA BERT record size
  x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
  tracing: Add __print_dynamic_array() helper
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff
  x86/mce/mcelog: Use xchg() to get and clear the flags
2024-11-19 12:04:51 -08:00
Linus Torvalds 77286b868f - Add support for BlueField-2 SoCs to bluefield_edac

- Add support for Intel Panther Lake-H to igen6_edac

- Add polling support to igen6_edac as some Intel N100 chips have trouble with
  error interrupts

- Add Kaby Lake-S support to ie31200_edac

- Fix memory source detection in the SKX common module which is used by
  a couple of Intel EDAC drivers

- Add support for the NXP i.MX9 memory controller to fsl_edac

- The usual fixes and cleanups all over the place

Merge tag 'edac_updates_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add support for BlueField-2 SoCs to bluefield_edac

 - Add support for Intel Panther Lake-H to igen6_edac

 - Add polling support to igen6_edac as some Intel N100 chips have
   trouble with error interrupts

 - Add Kaby Lake-S support to ie31200_edac

 - Fix memory source detection in the SKX common module which is used by
   a couple of Intel EDAC drivers

 - Add support for the NXP i.MX9 memory controller to fsl_edac

 - The usual fixes and cleanups all over the place

* tag 'edac_updates_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/igen6: Add polling support
  EDAC/igen6: Initialize edac_op_state according to the configuration data
  EDAC/igen6: Avoid segmentation fault on module unload
  EDAC/ie31200: Add Kaby Lake-S dual-core host bridge ID
  MAINTAINERS: Change FSL DDR EDAC maintainership
  EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
  EDAC/skx_common: Differentiate memory error sources
  EDAC/fsl_ddr: Add support for i.MX9 DDR controller
  dt-bindings: memory: fsl: Add compatible string nxp,imx9-memory-controller
  EDAC/fsl_ddr: Fix bad bit shift operations
  EDAC/fsl_ddr: Move global variables into struct fsl_mc_pdata
  EDAC/fsl_ddr: Pass down fsl_mc_pdata in ddr_in32() and ddr_out32()
  RAS/AMD/ATL: Add debug prints for DF register reads
  EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2
  EDAC/bluefield: Fix potential integer overflow
  EDAC/igen6: Add Intel Panther Lake-H SoCs support
2024-11-19 12:00:10 -08:00
Michael Ellerman 3c592ce799 EDAC/powerpc: Remove PPC_MAPLE drivers
These two drivers are only buildable for the powerpc "maple" platform
(CONFIG_PPC_MAPLE), which has now been removed, see
commit 62f8f307c8 ("powerpc/64: Remove maple platform").

Remove the drivers.

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241112084134.411964-1-mpe@ellerman.id.au
2024-11-19 16:41:16 +11:00
Borislav Petkov (AMD) 1b38da0115 Merge branch 'edac-misc' into edac-updates
* edac-misc:
  MAINTAINERS: Change FSL DDR EDAC maintainership
  RAS/AMD/ATL: Add debug prints for DF register reads
  EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2
  EDAC/bluefield: Fix potential integer overflow
  EDAC/igen6: Add Intel Panther Lake-H SoCs support

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-11-18 11:33:23 +01:00
Orange Kao e14232afa9 EDAC/igen6: Add polling support
Some PCs with Intel N100 (with PCI device 8086:461c, DID_ADL_N_SKU4)
experienced issues with error interrupts not working, even with the
following configuration in the BIOS.

    In-Band ECC Support: Enabled
    In-Band ECC Operation Mode: 2 (make all requests protected and
                                   ignore range checks)
    IBECC Error Injection Control: Inject Correctable Error on insertion
                                   counter
    Error Injection Insertion Count: 251658240 (0xf000000)

Add polling mode support for these machines to ensure that memory error
events are handled.
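One polling pass can be sketched as follows. The register layout (a valid bit at bit 63), its write-to-clear behavior, and all names are assumptions made for illustration. A real driver would run such a pass periodically, for example from a timer or delayed work, instead of waiting for an interrupt that never arrives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ECC error log register: bit 63 = valid (assumed layout). */
#define ECC_LOG_VALID (1ULL << 63)

struct mc_sim {
	uint64_t ecclog;      /* simulated error log register */
	unsigned int handled; /* number of errors processed so far */
};

/* One polling pass: check the log, handle a valid entry, then clear it. */
static bool poll_once(struct mc_sim *mc)
{
	if (!(mc->ecclog & ECC_LOG_VALID))
		return false;
	mc->handled++;  /* a real driver would decode and report here */
	mc->ecclog = 0; /* assumed write-to-clear semantics */
	return true;
}
```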

Signed-off-by: Orange Kao <orange@aiven.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/all/20241106114024.941659-3-orange@aiven.io
2024-11-08 13:36:55 -08:00
Qiuxu Zhuo 1d512b1aa5 EDAC/igen6: Initialize edac_op_state according to the configuration data
Currently, igen6_edac sets edac_op_state to EDAC_OPSTATE_NMI, while the
driver also supports memory errors reported from Machine Check. Initialize
edac_op_state to the correct value according to the configuration data
that the driver probed.
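A minimal sketch of deriving the reporting mode from probed configuration data. The enum follows the idea of the kernel's EDAC_OPSTATE_* values but uses local names. The config struct is hypothetical. The mapping of machine-check reporting to the interrupt state is an assumption, not a quote of the driver.

```c
#include <stdbool.h>

/* Local stand-ins for the EDAC_OPSTATE_* reporting modes. */
enum op_state { OPSTATE_POLL, OPSTATE_NMI, OPSTATE_INT };

/* Hypothetical per-SoC configuration data probed by the driver. */
struct soc_cfg {
	bool machine_check; /* errors are reported via Machine Check */
	bool polling;       /* errors must be polled (no usable interrupt) */
};

/* Derive the reporting mode from the config instead of hard-coding NMI. */
static enum op_state pick_op_state(const struct soc_cfg *cfg)
{
	if (cfg->polling)
		return OPSTATE_POLL;
	return cfg->machine_check ? OPSTATE_INT : OPSTATE_NMI; /* assumed mapping */
}
```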

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20241106114024.941659-2-orange@aiven.io
2024-11-08 13:35:21 -08:00
Orange Kao fefaae9039 EDAC/igen6: Avoid segmentation fault on module unload
The segmentation fault happens because:

During modprobe:
1. In igen6_probe(), igen6_pvt will be allocated with kzalloc()
2. In igen6_register_mci(), mci->pvt_info will point to
   &igen6_pvt->imc[mc]

During rmmod:
1. In mci_release() in edac_mc.c, it will kfree(mci->pvt_info)
2. In igen6_remove(), it will kfree(igen6_pvt)

Fix this issue by setting mci->pvt_info to NULL to avoid the double
kfree.
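The ownership problem can be sketched in user space with hypothetical names. Here mci->pvt_info borrows a pointer into memory owned by the parent allocation, so the release path must not free it. Clearing the borrowed pointer first leaves the parent's owner as the only one that frees the memory.

```c
#include <stdlib.h>

struct imc { int id; };

/* Parent allocation that owns an array of per-controller structures. */
struct pvt {
	struct imc imc[2];
};

/* Stand-in for the core's mci object. */
struct mci {
	void *pvt_info;
};

static int pvt_frees; /* counts frees performed by mci_release_sketch() */

/* Mimics a release hook that frees pvt_info if it is still set. */
static void mci_release_sketch(struct mci *mci)
{
	if (mci->pvt_info) {
		free(mci->pvt_info);
		pvt_frees++;
	}
	mci->pvt_info = NULL;
}

/* Fixed teardown: detach the borrowed pointer, release mci, free parent once. */
static void teardown(struct mci *mci, struct pvt *pvt)
{
	mci->pvt_info = NULL;    /* the fix: mci does not own this memory */
	mci_release_sketch(mci); /* now frees nothing */
	free(pvt);               /* single owner frees the parent exactly once */
}
```

Without the NULL assignment, the release hook would free &pvt->imc[0], which is the same address as pvt. The later free(pvt) would then be the double free described above.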

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360
Signed-off-by: Orange Kao <orange@aiven.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io
2024-11-04 12:09:45 -08:00