Commit Graph

25850 Commits

Author SHA1 Message Date
Ranjan Kumar 39680c59f1 scsi: mpt3sas: Fixed the W=1 compilation warning
Fix W=1 compilation warnings.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20251113153712.31850-7-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 22:15:36 -05:00
Ranjan Kumar 72340fecd0 scsi: mpt3sas: Add configurable command retry limit for slow-to-respond devices
Add a new module parameter "command_retry_count" to control the number
of retries during device discovery and readiness checks, improving
reliability for slow or transient SAS/PCIe devices.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20251113153712.31850-6-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 22:15:36 -05:00
Ranjan Kumar ad59571931 scsi: mpt3sas: Add firmware event requeue support for busy devices
Add support to requeue SAS/PCIe topology change events when devices are
busy or not ready. Introduce delayed work with retry counters so events
are retried instead of dropped, improving device discovery by retrying
transient failures instead of dropping events.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510311720.NiDwRLwp-lkp@intel.com/
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20251113153712.31850-5-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 22:15:36 -05:00
Ranjan Kumar aee682fad6 scsi: mpt3sas: Improve device discovery and readiness handling for slow devices
Introduce a new module parameter "issue_scsi_cmd_to_bringup_drive"
(default=1) which allows overriding the driver's behavior of issuing
SCSI TEST_UNIT_READY/START_UNIT commands to bring devices to READY state
during unblock.

Improve device discovery and I/O unblocking reliability by adding
robust device readiness checks and separate callback handling for
discovery I/O.  This introduces new helper routines for SCSI command
execution and readiness determination, ensuring smoother recovery and
initialization for slow or transient devices.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510310924.crvtELzs-lkp@intel.com/
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20251113153712.31850-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 22:14:52 -05:00
Ranjan Kumar 6b553f2a5c scsi: mpt3sas: Added no_turs flag to device unblock logic
Add a "no_turs" flag to _scsih_ublock_io_all_device() to optionally skip
TEST UNIT READY (TUR) checks while unblocking devices. This is used
after broadcast events where sending TURs is not required.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510310216.gerpzbxP-lkp@intel.com/
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20251113153712.31850-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 22:12:39 -05:00
Martin K. Petersen f5c0386e2c Merge patch series "Update lpfc to revision 14.4.0.13"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.4.0.13

This patch set introduces reporting encryption information for Fibre
Channel HBAs.

First, the scsi_transport_fc layer is updated to include a new
fc_encryption_info attribute that is added to struct fc_rport.
Supporting sysfs and LLDD templates are provided.

Then, the lpfc driver is updated to utilize the new templates for
reporting encrypted fibre channel connections to upper layer.

The patches were cut against Martin's 6.19/scsi-queue tree.

Link: https://patch.msgid.link/20251211001659.138635-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:58:08 -05:00
Justin Tee 6211644253 scsi: lpfc: Update lpfc version to 14.4.0.13
Update lpfc version to 14.4.0.13

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251211001659.138635-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:56:49 -05:00
Sarah Catania e2dacf8e5e scsi: lpfc: Add support for reporting encryption events
Support logging encryption events in both point-to-point and fabric
topologies.  A new LOG_ENCRYPTION flag is defined for reporting
encryption related events for HBAs that support the FEDIF feature.
Encryption information is stored in each NDLP object, which is
determined during the discovery stage after PLOGI completes.

For reporting encryption information to upper layers, the
.get_fc_rport_enc_info routine is implemented in lpfc_get_enc_info().
This allows encryption status to be reported through fc_remote_ports
sysfs.  Debugfs is also updated to report encryption information for all
NDLP objects.

Signed-off-by: Sarah Catania <sarah.catania@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251211001659.138635-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:56:48 -05:00
Sarah Catania bd2bc52869 scsi: scsi_transport_fc: Introduce encryption group in fc_rport attribute
Introduce a new structure for reporting an encrypted session over an
fc_rport.  The encryption group is added as an attribute in struct
fc_rport and reports information in fc_encryption_info.  This structure
contains a status member variable, which stores a bit value indicating
an encrypted session.

Signed-off-by: Sarah Catania <sarah.catania@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251211001659.138635-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:56:48 -05:00
Yury Norov (NVIDIA) 1a56e63c82 scsi: lpfc: Rework lpfc_sli4_fcf_rr_next_index_get()
The function opencodes for_each_set_bit_wrap(). Use it, and while there
switch from goto-driven codeflow to more high-level constructions.

Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251205235808.358258-1-yury.norov@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:49:52 -05:00
John Garry a8cf5c1bee scsi: scsi_debug: Drop NULL scsi_cmnd check in sdebug_q_cmd_complete()
The scp pointer cannot be NULL, as it is evaluated from container_of()
and pointer offsets (so remove the check in sdebug_q_cmd_complete()).

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251113133645.2898748-4-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:48:22 -05:00
John Garry 559ae7a26b scsi: scsi_debug: Stop using READ/WRITE_ONCE() when accessing sdebug_defer.defer_t
Using READ/WRITE_ONCE() means that the read or write is not torn by the
compiler.

READ/WRITE_ONCE() is always used when accessing sdebug_defer.defer_t.

However, we also guard the access in a spinlock when accessing that
member, and spinlock already guarantees no tearing, so stop using
READ/WRITE_ONCE().

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251113133645.2898748-3-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:48:22 -05:00
John Garry a743b12022 scsi: scsi_debug: Stop printing extra function name in debug logs
The driver defines as follows pr_fmt:

  #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__

...meaning that we already get the function name added in any debug
statements.

Remove using of __func__ in debug logs to avoid the duplication.

For instances of where the function name was being printed, add some
verbose comment to avoid using "" (which would be a bit silly).

It would be nicer to stop using pr_fmt(), but that would mean rewriting
approx 100 debug statements to have a sensible and clear message.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251113133645.2898748-2-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:48:22 -05:00
Michal Rábek 0e16776542 scsi: sg: Fix occasional bogus elapsed time that exceeds timeout
A race condition was found in sg_proc_debug_helper(). It was observed on
a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring
/proc/scsi/sg/debug every second. A very large elapsed time would
sometimes appear. This is caused by two race conditions.

We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64
architecture. A patched kernel was built, and the race condition could
not be observed anymore after the application of this patch. A
reproducer C program utilising the scsi_debug module was also built by
Changhui Zhong and can be viewed here:

https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c

The first race happens between the reading of hp->duration in
sg_proc_debug_helper() and request completion in sg_rq_end_io().  The
hp->duration member variable may hold either of two types of
information:

 #1 - The start time of the request. This value is present while
      the request is not yet finished.

 #2 - The total execution time of the request (end_time - start_time).

If sg_proc_debug_helper() executes *after* the value of hp->duration was
changed from #1 to #2, but *before* srp->done is set to 1 in
sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the
elapsed time (value type #2) is subtracted from a timestamp, which
cannot yield a valid elapsed time (which is a type #2 value as well).

To fix this issue, the value of hp->duration must change under the
protection of the sfp->rq_list_lock in sg_rq_end_io().  Since
sg_proc_debug_helper() takes this read lock, the change to srp->done and
srp->header.duration will happen atomically from the perspective of
sg_proc_debug_helper() and the race condition is thus eliminated.

The second race condition happens between sg_proc_debug_helper() and
sg_new_write(). Even though hp->duration is set to the current time
stamp in sg_add_request() under the write lock's protection, it gets
overwritten by a call to get_sg_io_hdr(), which calls copy_from_user()
to copy struct sg_io_hdr from userspace into kernel space. hp->duration
is set to the start time again in sg_common_write(). If
sg_proc_debug_helper() is called between these two calls, an arbitrary
value set by userspace (usually zero) is used to compute the elapsed
time.

To fix this issue, hp->duration must be set to the current timestamp
again after get_sg_io_hdr() returns successfully. A small race window
still exists between get_sg_io_hdr() and setting hp->duration, but this
window is only a few instructions wide and does not result in observable
issues in practice, as confirmed by testing.

Additionally, we fix the format specifier from %d to %u for printing
unsigned int values in sg_proc_debug_helper().

Signed-off-by: Michal Rábek <mrabek@redhat.com>
Suggested-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:24:28 -05:00
Chandrakanth Patil d373163194 scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflow
The driver was not reading the MAX_REQ_PER_REPLY_QUEUE_LIMIT IOCFacts
flag, so the reply-queue-full handling was never enabled, even on
firmware that supports it. Reading this flag enables the feature and
prevents reply queue overflow.

Fixes: f08b24d827 ("scsi: mpi3mr: Avoid reply queue full condition")
Cc: stable@vger.kernel.org
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Link: https://patch.msgid.link/20251211002929.22071-1-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:19:26 -05:00
John Garry 1f7d6e2efe scsi: scsi_debug: Fix atomic write enable module param description
The atomic write enable module param is "atomic_wr", and not
"atomic_write", so fix the module param description.

Fixes: 84f3a3c01d ("scsi: scsi_debug: Atomic write support")
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251211100651.9056-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:09:44 -05:00
Linus Torvalds 6a1636e066 SCSI misc on 20251214
The only core fix is in doc; all the others are in drivers, with the
 biggest impacts in libsas being the rollback on error handling and in
 ufs coming from a couple of error handling fixes, one causing a crash
 if it's activated before scanning and the other fixing W-LUN
 resumption.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaT4RWBsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMSwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy
 c2hpcC5jb20ACgkQ50LJTO6YrIVfSgEA23oY0kPjLLkNj4KAjLoWVPmZ/5efKw49
 m6YUwfvzTS8BAIq4eFHVecwAM8FAe9NwuOKgO5d9u6epAy6wzBeP2jly
 =cpHq
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "The only core fix is in doc; all the others are in drivers, with the
  biggest impacts in libsas being the rollback on error handling and in
  ufs coming from a couple of error handling fixes, one causing a crash
  if it's activated before scanning and the other fixing W-LUN
  resumption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Fix confusing cleanup.h syntax
  scsi: libsas: Add rollback handling when an error occurs
  scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
  scsi: ufs: core: Fix a deadlock in the frequency scaling code
  scsi: ufs: core: Fix an error handler crash
  scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
  scsi: qla4xxx: Use time conversion macros
  scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
  scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  scsi: imm: Fix use-after-free bug caused by unfinished delayed work
  scsi: target: sbp: Remove KMSG_COMPONENT macro
  scsi: core: Correct documentation for scsi_device_quiesce()
  scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
  scsi: target: Reset t_task_cdb pointer in error case
  scsi: ufs: core: Fix EH failure after W-LUN resume error
2025-12-14 15:35:35 +12:00
Chaohai Chen 362432e9b9 scsi: libsas: Add rollback handling when an error occurs
In sas_register_phys(), if an error is triggered in the loop process, we
need to roll back the resources that have already been requested.

Add sas_unregister_phys() when an error occurs in sas_register_ha().

[mkp: a few coding style tweaks and address John's comment]

Signed-off-by: Chaohai Chen <wdhh6@aliyun.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251206060616.69216-1-wdhh6@aliyun.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-08 22:08:31 -05:00
Benjamin Marzinski fd81bc5cca scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
If scsi_dh_attached_handler_name() fails to allocate the handler name,
dm-multipath (its only caller) assumes there is no attached device
handler, and sets the device up incorrectly. Return an error pointer
instead, so multipath can distinguish between failure, success where
there is no attached device handler, or when the path device is not a
SCSI device at all.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Link: https://patch.msgid.link/20251206010015.1595225-1-bmarzins@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-08 22:04:38 -05:00
Linus Torvalds 4482ebb297 block-6.19-20251208
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmk3KZsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkNYD/91yqAJeehx2Heq3dWj9L8hDuETQelj/g9j
 gtZCiriAPy+bb1/BmWjK+BmvjtBt+g3a4Cwi6tVj4F1zoE46IPeLhO+2iJTEBiBq
 AhRtEf/MFXFK3qUnTpEnS8w3CtsXejOTB81VQ+6BysSu+B708m/1AQHv2HocZ37R
 jivrzfCsEdBr+ISwYw/EG5KcDBVTFo/JdXIhs7k4Z8bBfa3P5ye4EhKjORtgbFNU
 5nXb78SZoWNCZF143YV++9MpZc3M2jzkzrk1CTLsUHhOxWg4T/6wTXfPGZc/W4m8
 UBhs03u/gMJnKHhlZd4kpZWDito1TQZTdY2f5sBsysRQqeT7bwDK/1xiQ1nllZiP
 oYbeD6t65yMAlELwNFXo7y/DNcS2VLBMvChIX6p1gweEzyf23YneoHYyN5agEQlN
 9C4EdcYzZRt0DwtHlIRtKvDk2LZzkJAcLau3D6ahU/DPLOawyWZKmvGiU+sSyJjF
 bEIO5c/+MLqkAgLAGaFgA4twFF1aYH9ssmJerDxprarkf1jtlOBLvUQ391Gtb5Hd
 B1yugmIgEwLbCFzhk9FlCtv2nQcWRCElnaeqv+Lv+xCBVPGCLm2qIHoTqmvHZPCd
 GbN/h0XLdgUboYPCFWVAX72/4K/cv+fQQcb+a7tiq6vMKcgJ/2I1szFGpFqz7azB
 hyiK0v3x2g==
 =r1xa
 -----END PGP SIGNATURE-----

Merge tag 'block-6.19-20251208' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:
 "Followup set of fixes and updates for block for the 6.19 merge window.

  NVMe had some late minute debates which lead to dropping some patches
  from that tree, which is why the initial PR didn't have NVMe included.
  It's here now. This pull request contains:

   - NVMe pull request via Keith:
       - Subsystem usage cleanups (Max)
       - Endpoint device fixes (Shin'ichiro)
       - Debug statements (Gerd)
       - FC fabrics cleanups and fixes (Daniel)
       - Consistent alloc API usages (Israel)
       - Code comment updates (Chu)
       - Authentication retry fix (Justin)

   - Fix a memory leak in the discard ioctl code, if the task is being
     interrupted by a signal at just the wrong time

   - Zoned write plugging fixes

   - Add ioctls for for persistent reservations

   - Enable per-cpu bio caching by default

   - Various little fixes and tweaks"

* tag 'block-6.19-20251208' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (27 commits)
  nvme-fabrics: add ENOKEY to no retry criteria for authentication failures
  nvme-auth: use kvfree() for memory allocated with kvcalloc()
  nvmet-tcp: use kvcalloc for commands array
  nvmet-rdma: use kvcalloc for commands and responses arrays
  nvme: fix typo error in nvme target
  nvmet-fc: use pr_* print macros instead of dev_*
  nvmet-fcloop: remove unused lsdir member.
  nvmet-fcloop: check all request and response have been processed
  nvme-fc: check all request and response have been processed
  block: fix memory leak in __blkdev_issue_zero_pages
  block: fix comment for op_is_zone_mgmt() to include RESET_ALL
  block: Clear BLK_ZONE_WPLUG_PLUGGED when aborting plugged BIOs
  blk-mq: Abort suspend when wakeup events are pending
  blk-mq: add blk_rq_nr_bvec() helper
  block: add IOC_PR_READ_RESERVATION ioctl
  block: add IOC_PR_READ_KEYS ioctl
  nvme: reject invalid pr_read_keys() num_keys values
  scsi: sd: reject invalid pr_read_keys() num_keys values
  block: enable per-cpu bio cache by default
  block: use bio_alloc_bioset for passthru IO by default
  ...
2025-12-09 08:53:24 +09:00
Linus Torvalds 7eb7f5723d SCSI misc on 20251204
Usual driver updates (ufs, lpfc, target, qla2xxx) plus assorted
 cleanups and fixes including the WQ_PERCPU series.  The biggest core
 change is the new allocation of pseudo-devices which allow the sending
 of internal commands to a given SCSI target.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaTJf+BsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMSwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy
 c2hpcC5jb20ACgkQ50LJTO6YrIUC9QEA+q+UqGr7hPTs9C4kdLoxjDpG6tmXo11H
 ZkuXKjR5rDABAPPe3HV0Bk9C9jpLVPLp+fGJvgw//rib+XtYMjwxAJ71
 =bQZL
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Usual driver updates (ufs, lpfc, target, qla2xxx) plus assorted
  cleanups and fixes including the WQ_PERCPU series.

  The biggest core change is the new allocation of pseudo-devices which
  allow the sending of internal commands to a given SCSI target"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (147 commits)
  scsi: MAINTAINERS: Add the UFS include directory
  scsi: scsi_debug: Support injecting unaligned write errors
  scsi: qla2xxx: Fix improper freeing of purex item
  scsi: ufs: rockchip: Fix compile error without CONFIG_GPIOLIB
  scsi: ufs: rockchip: Reset controller on PRE_CHANGE of hce enable notify
  scsi: ufs: core: Use scsi_device_busy()
  scsi: ufs: core: Fix single doorbell mode support
  scsi: pm80xx: Add WQ_PERCPU to alloc_workqueue() users
  scsi: target: Add WQ_PERCPU to alloc_workqueue() users
  scsi: qedi: Add WQ_PERCPU to alloc_workqueue() users
  scsi: target: ibmvscsi: Add WQ_PERCPU to alloc_workqueue() users
  scsi: qedf: Add WQ_PERCPU to alloc_workqueue() users
  scsi: bnx2fc: Add WQ_PERCPU to alloc_workqueue() users
  scsi: be2iscsi: Add WQ_PERCPU to alloc_workqueue() users
  scsi: message: fusion: Add WQ_PERCPU to alloc_workqueue() users
  scsi: lpfc: WQ_PERCPU added to alloc_workqueue() users
  scsi: scsi_transport_fc: WQ_PERCPU added to alloc_workqueue users()
  scsi: scsi_dh_alua: WQ_PERCPU added to alloc_workqueue() users
  scsi: qla2xxx: WQ_PERCPU added to alloc_workqueue() users
  scsi: target: sbp: Replace use of system_unbound_wq with system_dfl_wq
  ...
2025-12-05 19:56:50 -08:00
Linus Torvalds 43dfc13ca9 pci-v6.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmkwoyoUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vztdBAAjvZO0NOafCbhn6lUAz/T4VxPY0R7
 L5RqeLci33rQzbQ0yhJYXsd8VemMo6Zk0qmlSwjddlOMPboHoC1jK3i4C16QoP+3
 R5ecab0VoImLl2Ffig0BZoHQpVq01q2kTGQ2YyrryzDCgBCsBG3U10ZD380pGsTW
 ypqEgOCxaQCq2mtqr5CavaCcquq2krrnHkkVQOP1ryWzRq1C3wDXcQXFYNdzXpDP
 Lq8pBIh8WN5pYwrqjrFMrtNhj7BHPmowLEaAbNIWmH8WjGav624XcKq2O+arx3Hl
 BDHFKVjWtiYikrWmAODZhlY3HgEj546h2HtQYwWPOKuSKkgzNVn28LFIDgL3oXDP
 sJ6gWIDWMgRpEI6VzmqzRXJWbTAkIrRfHv3QFzvATSZV7eHk3eUpMZtG/qOi5z3P
 rPwW7NSXKbPg5qi+zKpqC20Im1Wm6fF4qQtim3uNlz2KXlVGvQl5/Ww27nIZMn6B
 Kkv5RGzdodePbKGqd+CADAQoJOpR6kOng5ZaRdZx+6aTTooyM7KSk8bFq7j/CoYM
 PzVtO6IHIvV42di7H/NP8/qtQA3xwSLue3lWpTh+tzYmZvCdldZiIvBo9YXnlsEM
 kCetDIGBxUWBj7OdYK4hC1BPBVw3XtCWM/51ElslDZTPvT543lApBfMnVTv3JmFp
 gzvZ7XfZx443JOg=
 =qSzr
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan
     Williams)

   - Switch vmd from custom domain number allocator to the common
     allocator to prevent a potential race with new non-VMD buses (Dan
     Williams)

   - Enable Precision Time Measurement (PTM) only if device advertises
     support for a relevant role, to prevent invalid PTM Requests that
     cause ACS violations that are reported as AER Uncorrectable
     Non-Fatal errors (Mika Westerberg)

  Resource management:

   - Prevent resource tree corruption when BAR resize fails (Ilpo
     Järvinen)

   - Restore BARs to the original size if a BAR resize fails (Ilpo
     Järvinen)

   - Remove BAR release from BAR resize attempts by the xe, i915, and
     amdgpu drivers so the PCI core can restore BARs if the resize fails
     (Ilpo Järvinen)

   - Move Resizable BAR code to rebar.c (Ilpo Järvinen)

   - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo
     Järvinen)

   - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo
     Järvinen)

  Power management and error handling:

   - For drivers using PCI legacy suspend, save config state at suspend
     so that state (not any earlier state from enumeration, probe, or
     error recovery) will be restored when resuming (Lukas Wunner)

   - For devices with no driver or a driver that lacks power management,
     save config state at hibernate so that state (not any earlier state
     from enumeration, probe, or error recovery) will be restored when
     resuming (Lukas Wunner)

   - Save device config space on device addition, before driver binding,
     so error recovery works more reliably (Lukas Wunner)

   - Drop pci_save_state() from several drivers that no longer need it
     since the PCI core always does it and pci_restore_state() no longer
     invalidates the saved state (Lukas Wunner)

   - Document use of pci_save_state() by drivers to capture the state
     they want restored during error recovery (Lukas Wunner)

  Power control:

   - Add a struct pci_ops.assert_perst() function pointer to
     assert/deassert PCIe PERST# and implement it for the qcom driver
     (Krishna Chaitanya Chundru)

   - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe
     switch, which must be held in reset after poweron so the pwrctrl
     driver can configure the switch via I2C before bringing up the
     links (Krishna Chaitanya Chundru)

  Endpoint framework:

   - Convert the endpoint doorbell test to use a threaded IRQ to fix a
     'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri)

   - Add endpoint VNTB MSI doorbell support to reduce latency between
     host and endpoint (Frank Li)

  New native PCIe controller drivers:

   - Add CIX Sky1 host controller DT binding and driver (Hans Zhang)

   - Add NXP S32G host controller DT binding and driver (Vincent
     Guittot)

   - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu
     Beznea)

   - Add SpacemiT K1 host controller DT binding and driver (Alex Elder)

  Amlogic Meson PCIe controller driver:

   - Update DT binding to name DBI region 'dbi', not 'elbi', and update
     driver to support both (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Move struct pci_host_bridge allocation from pci_host_common_init()
     to callers, which significantly simplifies pcie-apple (Marc
     Zyngier)

  Broadcom STB PCIe controller driver:

   - Disable advertising ASPM L0s support correctly (Jim Quinlan)

   - Add a panic/die handler to print diagnostic info in case PCIe
     caused an unrecoverable abort (Jim Quinlan)

  Cadence PCIe controller driver:

   - Add module support for Cadence platform host and endpoint
     controller driver (Manikandan K Pillai)

   - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare
     for new CIX Sky1 driver (Manikandan K Pillai)

  MediaTek PCIe controller driver:

   - Convert DT binding to YAML schema (Christian Marangi)

   - Add Airoha AN7583 DT compatible and driver support (Christian
     Marangi)

  Qualcomm PCIe controller driver:

   - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu)

   - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280,
     sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT
     schemas (Krzysztof Kozlowski)

   - Look up OPP using both frequency and data rate (not just frequency)
     so RPMh votes can account for both (Krishna Chaitanya Chundru)

  Rockchip DesignWare PCIe controller driver:

   - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Fix a race between link training and endpoint register
     initialization (Christian Bruel)

   - Align endpoint allocations to match the ATU requirements (Christian
     Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Clear L1 PM Substate Capability 'Supported' bits unless glue driver
     says it's supported, which prevents users from enabling non-working
     L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas)

   - Remove now-superfluous L1SS disable code from tegra194 (Bjorn
     Helgaas)

   - Configure L1SS support in dw-rockchip when DT says
     'supports-clkreq' (Shawn Lin)

  TI Keystone PCIe controller driver:

   - Fail the probe instead of silently succeeding if ks_pcie_of_data
     didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli)

   - Make keystone buildable as a loadable module, except on ARM32 where
     hook_fault_code() is __init (Siddharth Vadapalli)"

* tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits)
  MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer
  MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer
  PCI: sky1: Add PCIe host support for CIX Sky1
  dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings
  PCI: cadence: Add support for High Perf Architecture (HPA) controller
  MAINTAINERS: Add NXP S32G PCIe controller driver maintainer
  PCI: s32g: Add NXP S32G PCIe controller driver (RC)
  PCI: dwc: Add register and bitfield definitions
  dt-bindings: PCI: s32g: Add NXP S32G PCIe controller
  PCI: Add Renesas RZ/G3S host controller driver
  PCI: host-generic: Move bridge allocation outside of pci_host_common_init()
  dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding
  PCI: Validate pci_rebar_size_supported() input
  Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  treewide: Drop pci_save_state() after pci_restore_state()
  PCI/ERR: Ensure error recoverability at all times
  PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
  PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
  PCI: dw-rockchip: Configure L1SS support
  PCI: tegra194: Remove unnecessary L1SS disable code
  ...
2025-12-04 17:29:41 -08:00
Stefan Hajnoczi ab4fb1d8f6 scsi: sd: reject invalid pr_read_keys() num_keys values
The pr_read_keys() interface has a u32 num_keys parameter. The SCSI
PERSISTENT RESERVE IN command has a maximum READ KEYS service action
size of 65536 bytes. Reject num_keys values that are too large to fit
into the SCSI command.

This will become important when pr_read_keys() is exposed to untrusted
userspace via an <linux/pr.h> ioctl.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-04 07:19:26 -07:00
Linus Torvalds cc25df3e2e for-6.19/block-20251201
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmktsoMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpuiUD/92eivL+HmOh10o8trvxajB0yuyqfSjHHrL
 g+xUbF4s9bgAg/v+Upx7sTY8jdrTcMjKov+G9T6uPvBMqVmeVdZckA1PSAKQaIX1
 Zb7nS2LnO7F6JKbwpwVrrIaqVbcz8MfGIIMbN4yNNEOMCwdIVMp4fo7trPBknJNx
 WddNSGUFlIF3NqSI8AflSS/pYnGm+McfBHXBpJAKipI3iquKKubHv+FX9kLp7Tn4
 x27ZoCWOHglIBTJXU0mmXCVsLF8b5BA8DQcGtT62azb8+l0cRTkaHY0DFAv5BvhG
 TqcjrKdmR0cGSNt+nEmFrujE3atBRl0G0kiHA80YgA1MTtYzdPaUVOUtM9k/rEem
 gpiGMDpBypdxyJAyijPSaVJdfcg0psOlYbhIR4N2wbj/dq8268h+cWzXlF1spgVt
 /7ygoaCmfMNbTy9rKThTjH+es787AVXUAXXaPHhIFsnCKUj8xQl4pT7XltmgYeWx
 1/XD1NEJeLHHog5upAVlGX3H5tbvP1nIICxbZa9mDOJX1rwxxI7/s/RucPjbNXuY
 AiaKPTfxtB9+Ihd2HrJ/76RVMkckcOBc4GIKoFfwuKDbcdLXQ5FcZCmVRoI1V9SV
 KsH7JBgihLwR9XWKE1vp9+CBNe1Qlu3K4IjG/E7CNLeuDntIBu73ihqGP/DqV6Bq
 RX1Dc0OyAQ==
 =m22w
 -----END PGP SIGNATURE-----

Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - Fix head insertion for mq-deadline, a regression from when priority
   support was added

 - Series simplifying and improving the ublk user copy code

 - Various ublk related cleanups

 - Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the
   request is punted to a thread for handling

 - Merge and then later revert loop dio nowait support, as it ended up
   causing excessive stack usage for when the inline issue code needs to
   dip back into the full file system code

 - Improve auto integrity code, making it less deadlock prone

 - Speedup polled IO handling, but manually managing the hctx lookups

 - Fixes for blk-throttle for SSD devices

 - Small series with fixes for the S390 dasd driver

 - Add support for caching zones, avoiding unnecessary report zone
   queries

 - MD pull requests via Yu:
      - fix null-ptr-dereference regression for dm-raid0
      - fix IO hang for raid5 when array is broken with IO inflight
      - remove legacy 1s delay to speed up system shutdown
      - change maintainer's email address
      - data can be lost if array is created with different lbs devices,
        fix this problem and record lbs of the array in metadata
      - fix rcu protection for md_thread
      - fix mddev kobject lifetime regression
      - enable atomic writes for md-linear
      - some cleanups

 - bcache updates via Coly
      - remove useless discard and cache device code
      - improve usage of per-cpu workqueues

 - Reorganize the IO scheduler switching code, fixing some lockdep
   reports as well

 - Improve the block layer P2P DMA support

 - Add support to the block tracing code for zoned devices

 - Segment calculation improves, and memory alignment flexibility
   improvements

 - Set of prep and cleanups patches for ublk batching support. The
   actual batching hasn't been added yet, but helps shrink down the
   workload of getting that patchset ready for 6.20

 - Fix for how the ps3 block driver handles segments offsets

 - Improve how block plugging handles batch tag allocations

 - nbd fixes for use-after-free of the configuration on device clear/put

 - Set of improvements and fixes for zloop

 - Add Damien as maintainer of the block zoned device code handling

 - Various other fixes and cleanups

* tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
  block/rnbd: correct all kernel-doc complaints
  blk-mq: use queue_hctx in blk_mq_map_queue_type
  md: remove legacy 1s delay in md_notify_reboot
  md/raid5: fix IO hang when array is broken with IO inflight
  md: warn about updating super block failure
  md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid
  sbitmap: fix all kernel-doc warnings
  ublk: add helper of __ublk_fetch()
  ublk: pass const pointer to ublk_queue_is_zoned()
  ublk: refactor auto buffer register in ublk_dispatch_req()
  ublk: add `union ublk_io_buf` with improved naming
  ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg()
  kfifo: add kfifo_alloc_node() helper for NUMA awareness
  blk-mq: fix potential uaf for 'queue_hw_ctx'
  blk-mq: use array manage hctx map instead of xarray
  ublk: prevent invalid access with DEBUG
  s390/dasd: Use scnprintf() instead of sprintf()
  s390/dasd: Move device name formatting into separate function
  s390/dasd: Remove unnecessary debugfs_create() return checks
  s390/dasd: Fix gendisk parent after copy pair swap
  ...
2025-12-03 19:26:18 -08:00
Linus Torvalds 4d38b88fd1 printk changes for 6.19
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmktlbUbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyevsP/1z98/wfCaSCquIq4H8S
 OTqFGybGgYQt1NmMj2cGPpbAE3LJNYORT0A4tcoqOTy1Z5xbQz63rO3clSI/e7Mf
 n4ZZ7NvkE40i8et1BjqtZa9dSkAv4QLYH73KrtNeuTr5tqvHo1x8FakUH6gQnb1k
 QOOebvbVXnOb+rh89j1GZShrLFcCil0psjp165WHAYE/3PyFBgYGLMCgwLqS+W3H
 re5Q4sl/ySXpMFF/XN1Kww48FWxy/h+YQFCxZwuWlUcXtVjqZ+BN+keb7AqaFQ7R
 dC2exV2W0RBoupEJR/FWHoXrm/bDDLhzqRaMvoggLJrMJ9L6V0WdIhaFA4qzoG63
 paJGFjUfmDX3dpPsAddq7kKeevCz4a2/HwFKhiBqqq4tdHuely7wZgnoFO7ovgmu
 DYDCXHtpJuWZR3WJ5I/V/sJ9i9KFXhhyWcKVf13QTAFiCaA09aeSAcUWNYNaaxbn
 nu6IkUxdIVnWIEBgcYH6jz1DrPGreYLYuD4bVb2gdZoP0r3tnMpG6xfSNIUueSGd
 VFAKW9PJYaj7Id+jgACH6V+gQ22L600xJDdL1bPjRbGE0LD7vlz2F1MZTq3BFJFn
 hUxJeOZplHX+TPophdvH4MO9VLmydWLUyJiDBP1yA8M9XZms/5s7IJJ1RYXqUCcf
 qEB4L7W1+Qy1R/lzf2PU9X4R
 =FnfO
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Allow creaing nbcon console drivers with an unsafe write_atomic()
   callback that can only be called by the final nbcon_atomic_flush_unsafe().
   Otherwise, the driver would rely on the kthread.

   It is going to be used as the-best-effort approach for an
   experimental nbcon netconsole driver, see

     https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org

   Note that a safe .write_atomic() callback is supposed to work in NMI
   context. But some networking drivers are not safe even in IRQ
   context:

     https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35

   In an ideal world, all networking drivers would be fixed first and
   the atomic flush would be blocked only in NMI context. But it brings
   the question how reliable networking drivers are when the system is
   in a bad state. They might block flushing more reliable serial
   consoles which are more suitable for serious debugging anyway.

 - Allow to use the last 4 bytes of the printk ring buffer.

 - Prevent queuing IRQ work and block printk kthreads when consoles are
   suspended. Otherwise, they create non-necessary churn or even block
   the suspend.

 - Release console_lock() between each record in the kthread used for
   legacy consoles on RT. It might significantly speed up the boot.

 - Release nbcon context between each record in the atomic flush. It
   prevents stalls of the related printk kthread after it has lost the
   ownership in the middle of a record

 - Add support for NBCON consoles into KDB

 - Add %ptsP modifier for printing struct timespec64 and use it where
   possible

 - Misc code clean up

* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
  printk: Use console_is_usable on console_unblank
  arch: um: kmsg_dump: Use console_is_usable
  drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
  lib/vsprintf: Unify FORMAT_STATE_NUM handlers
  printk: Avoid irq_work for printk_deferred() on suspend
  printk: Avoid scheduling irq_work on suspend
  printk: Allow printk_trigger_flush() to flush all types
  tracing: Switch to use %ptSp
  scsi: snic: Switch to use %ptSp
  scsi: fnic: Switch to use %ptSp
  s390/dasd: Switch to use %ptSp
  ptp: ocp: Switch to use %ptSp
  pps: Switch to use %ptSp
  PCI: epf-test: Switch to use %ptSp
  net: dsa: sja1105: Switch to use %ptSp
  mmc: mmc_test: Switch to use %ptSp
  media: av7110: Switch to use %ptSp
  ipmi: Switch to use %ptSp
  igb: Switch to use %ptSp
  e1000e: Switch to use %ptSp
  ...
2025-12-03 12:42:36 -08:00
Xingui Yang 278712d20b scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
This reverts commit ab2068a6fb.

When probing the exp-attached sata device, libsas/libata will issue a
hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a
broadcast event will be received after the disk probe fails, and this
commit causes the probe will be re-executed on the disk, and a faulty
disk may get into an indefinite loop of probe.

Therefore, revert this commit, although it can fix some temporary issues
with disk probe failure.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-03 01:36:13 -05:00
Shi Hao 9086cac895 scsi: qla4xxx: Use time conversion macros
Replace the raw use of 500 value in schedule_timeout() function with
msecs_to_jiffies() to ensure intended value across different kernel
configurations regardless of HZ value.

Signed-off-by: Shi Hao <i.shihao.999@gmail.com>
Link: https://patch.msgid.link/20251117200949.42557-1-i.shihao.999@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29 16:37:44 -05:00
Wen Xiong eaea513077 scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
A dynamic remove/add storage adapter test hits EEH on PowerPC:

  EEH: [c00000000004f77c] __eeh_send_failure_event+0x7c/0x160
  EEH: [c000000000048464] eeh_dev_check_failure.part.0+0x254/0x660
  EEH: [c000000000934e0c] __pci_read_msi_msg+0x1ac/0x280
  EEH: [c000000000100f68] pseries_msi_compose_msg+0x28/0x40
  EEH: [c00000000020e1cc] irq_chip_compose_msi_msg+0x5c/0x90
  EEH: [c000000000214b1c] msi_domain_set_affinity+0xbc/0x100
  EEH: [c000000000206be4] irq_do_set_affinity+0x214/0x2c0
  EEH: [c000000000206e04] irq_set_affinity_locked+0x174/0x230
  EEH: [c000000000207044] irq_set_affinity+0x64/0xa0
  EEH: [c000000000212890] write_irq_affinity.constprop.0.isra.0+0x130/0x150
  EEH: [c00000000068868c] proc_reg_write+0xfc/0x160
  EEH: [c0000000005adb48] vfs_write+0xf8/0x4e0
  EEH: [c0000000005ae234] ksys_write+0x84/0x140
  EEH: [c00000000002e994] system_call_exception+0x164/0x310
  EEH: [c00000000000bfe8] system_call_vectored_common+0xe8/0x278

The irqbalance daemon kicks in before invoking qla2xxx->slot_reset
during the EEH recovery process.

  irqbalance daemon
  ->irq_set_affinity()
  ->msi_domain_set_affinity()
  ->irq_chip_set_affiinity_parent()
  ->xive_irq_set_affinity()
  ->pseries_msi_compose_ms()
  ->__pci_read_msi_msg()
  ->irq_chip_compose_msi_msg()

In __pci_read_msi_msg(), the first MSI-X vector is set to all F by the
irqbalance daemon.  pci_write_msg_msix: index=0, lo=ffffffff hi=fffffff

IRQ balancing is not required during adapter reset.

Enable "IRQ_NO_BALANCING" bit before starting adapter reset and disable
it calling pci_restore_state(). The irqbalance daemon is disabled for
this short period of time (~2s).

Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Link: https://patch.msgid.link/20251028142427.3969819-3-wenxiong@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29 15:56:18 -05:00
Wen Xiong 6ac3484fb1 scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
A dynamic remove/add storage adapter test hits EEH on PowerPC:

  EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160
  EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650
  EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr]
  EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr]
  EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr]
  EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440
  EEH: [c00000000017e558] process_one_work+0x298/0x590
  EEH: [c00000000017eef8] worker_thread+0xa8/0x620
  EEH: [c00000000018be34] kthread+0x124/0x130
  EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64

A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by
irqbalance daemon. If we disable irqbalance daemon, we won't see the
issue.

With debug enabled in ipr driver:

  [   44.103071] ipr: Entering __ipr_remove
  [   44.103083] ipr: Entering ipr_initiate_ioa_bringdown
  [   44.103091] ipr: Entering ipr_reset_shutdown_ioa
  [   44.103099] ipr: Leaving ipr_reset_shutdown_ioa
  [   44.103105] ipr: Leaving ipr_initiate_ioa_bringdown
  [   44.149918] ipr: Entering ipr_reset_ucode_download
  [   44.149935] ipr: Entering ipr_reset_alert
  [   44.150032] ipr: Entering ipr_reset_start_timer
  [   44.150038] ipr: Leaving ipr_reset_alert
  [   44.244343] scsi 1:2:3:0: alua: Detached
  [   44.254300] ipr: Entering ipr_reset_start_bist
  [   44.254320] ipr: Entering ipr_reset_start_timer
  [   44.254325] ipr: Leaving ipr_reset_start_bist
  [   44.364329] scsi 1:2:4:0: alua: Detached
  [   45.134341] scsi 1:2:5:0: alua: Detached
  [   45.860949] ipr: Entering ipr_reset_shutdown_ioa
  [   45.860962] ipr: Leaving ipr_reset_shutdown_ioa
  [   45.860966] ipr: Entering ipr_reset_alert
  [   45.861028] ipr: Entering ipr_reset_start_timer
  [   45.861035] ipr: Leaving ipr_reset_alert
  [   45.964302] ipr: Entering ipr_reset_start_bist
  [   45.964309] ipr: Entering ipr_reset_start_timer
  [   45.964313] ipr: Leaving ipr_reset_start_bist
  [   46.264301] ipr: Entering ipr_reset_bist_done
  [   46.264309] ipr: Leaving ipr_reset_bist_done

During adapter reset, ipr device driver blocks config space access but
can't block MMIO access for MSI-X entries.  There is very small window:
irqbalance daemon kicks in during adapter reset before ipr driver calls
pci_restore_state(pdev) to restore MSI-X table.

irqbalance daemon reads back all 0 for that MSI-X vector in
__pci_read_msi_msg().

irqbalance daemon:

  msi_domain_set_affinity()
  ->irq_chip_set_affinity_patent()
  ->xive_irq_set_affinity()
  ->irq_chip_compose_msi_msg()
    ->pseries_msi_compose_msg()
    ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state
  ->irq_chip_write_msi_msg()
    -> pci_write_msg_msi(): write 0 to the msix vector entry

When ipr driver calls pci_restore_state(pdev) in
ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared
by irqbalance daemon in pci_write_msg_msix().

  pci_restore_state()
  ->__pci_restore_msix_state()

Below is the MSI-X table for ipr adapter after irqbalance daemon kicked
in during adapter reset:

  Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0
  Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0
  Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0
  Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0
  Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0
  Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0
  Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0
  Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0
  Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0
  ---------> Hit EEH since msix vector of index=8 are 0
  Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0
  Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0

  [   46.264312] ipr: Entering ipr_reset_restore_cfg_space
  [   46.267439] ipr: Entering ipr_fail_all_ops
  [   46.267447] ipr: Leaving ipr_fail_all_ops
  [   46.267451] ipr: Leaving ipr_reset_restore_cfg_space
  [   46.267454] ipr: Entering ipr_ioa_bringdown_done
  [   46.267458] ipr: Leaving ipr_ioa_bringdown_done
  [   46.267467] ipr: Entering ipr_worker_thread
  [   46.267470] ipr: Leaving ipr_worker_thread

IRQ balancing is not required during adapter reset.

Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable
it after calling pci_restore_state(). The irqbalance daemon is disabled
for this short period of time (~2s).

Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29 15:56:18 -05:00
Duoming Zhou ab58153ec6 scsi: imm: Fix use-after-free bug caused by unfinished delayed work
The delayed work item 'imm_tq' is initialized in imm_attach() and
scheduled via imm_queuecommand() for processing SCSI commands.  When the
IMM parallel port SCSI host adapter is detached through imm_detach(),
the imm_struct device instance is deallocated.

However, the delayed work might still be pending or executing
when imm_detach() is called, leading to use-after-free bugs
when the work function imm_interrupt() accesses the already
freed imm_struct memory.

The race condition can occur as follows:

CPU 0(detach thread)   | CPU 1
                       | imm_queuecommand()
                       |   imm_queuecommand_lck()
imm_detach()           |     schedule_delayed_work()
  kfree(dev) //FREE    | imm_interrupt()
                       |   dev = container_of(...) //USE
                           dev-> //USE

Add disable_delayed_work_sync() in imm_detach() to guarantee proper
cancellation of the delayed work item before imm_struct is deallocated.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://patch.msgid.link/20251028100149.40721-1-duoming@zju.edu.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29 15:42:17 -05:00
Miao Li c131c9bf98 scsi: core: Correct documentation for scsi_device_quiesce()
If scsi_device_quiesce() returns zero, the function executed successfully.

Signed-off-by: Miao Li <limiao@kylinos.cn>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://patch.msgid.link/20251124075444.32699-1-limiao870622@163.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29 15:20:15 -05:00
Suganath Prabu S 4588e65cfd scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
Avoid scanning SAS/SATA devices in channel 1 when SAS transport is
enabled, as the SAS/SATA devices are exposed through channel 0.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/stable/20251120071955.463475-1-suganath-prabu.subramani%40broadcom.com
Link: https://patch.msgid.link/20251120071955.463475-1-suganath-prabu.subramani@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29 15:08:41 -05:00
Rafael J. Wysocki f086594adb Merge branch 'pm-sleep'
Merge updates related to system suspend and hibernation for 6.19-rc1:

 - Replace snprintf() with scnprintf() in show_trace_dev_match()
   (Kaushlendra Kumar)

 - Fix memory allocation error handling in pm_vt_switch_required()
   (Malaya Kumar Rout)

 - Introduce CALL_PM_OP() macro and use it to simplify code in
   generic PM operations (Kaushlendra Kumar)

 - Add module param to backtrace all CPUs in the device power management
   watchdog (Sergey Senozhatsky)

 - Rework message printing in swsusp_save() (Rafael Wysocki)

 - Make it possible to change the number of hibernation compression
   threads (Xueqin Luo)

 - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

 - Add document on debugging shutdown hangs to PM documentation and
   correct a mistaken configuration option in it (Mario Limonciello)

 - Shut down wakeup source timer before removing the wakeup source from
   the list (Kaushlendra Kumar, Rafael Wysocki)

 - Introduce new PMSG_POWEROFF event for system shutdown handling with
   the help of PM device callbacks (Mario Limonciello)

 - Make pm_test delay interruptible by wakeup events (Riwen Lu)

 - Clean up kernel-doc comment style usage in the core hibernation
   code and remove unuseful comments from it (Sunday Adelodun, Rafael
   Wysocki)

 - Add support for handling wakeup events and aborting the suspend
   process while it is syncing file systems (Samuel Wu, Rafael Wysocki)

* pm-sleep: (21 commits)
  PM: hibernate: Extra cleanup of comments in swap handling code
  PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
  PM: sleep: Add support for wakeup during filesystem sync
  PM: hibernate: Clean up kernel-doc comment style usage
  PM: suspend: Make pm_test delay interruptible by wakeup events
  usb: sl811-hcd: Add PM_EVENT_POWEROFF into suspend callbacks
  scsi: Add PM_EVENT_POWEROFF into suspend callbacks
  PM: Introduce new PMSG_POWEROFF event
  PM: wakeup: Update after recent wakeup source removal ordering change
  PM: wakeup: Delete timer before removing wakeup source from list
  Documentation: power: Correct a mistaken configuration option
  Documentation: power: Add document on debugging shutdown hangs
  freezer: Clarify that only cgroup1 freezer uses PM freezer
  PM: hibernate: add sysfs interface for hibernate_compression_threads
  PM: hibernate: make compression threads configurable
  PM: hibernate: dynamically allocate crc->unc_len/unc for configurable threads
  PM: hibernate: Rework message printing in swsusp_save()
  PM: dpm_watchdog: add module param to backtrace all CPUs
  PM: sleep: Introduce CALL_PM_OP() macro to simplify code
  PM: console: Fix memory allocation error handling in pm_vt_switch_required()
  ...
2025-11-28 16:01:13 +01:00
Lukas Wunner 383d89699c treewide: Drop pci_save_state() after pci_restore_state()
In 2009, commit c82f63e411 ("PCI: check saved state before restore")
changed the behavior of pci_restore_state() such that it became necessary
to call pci_save_state() afterwards, lest recovery from subsequent PCI
errors fails.

The commit has just been reverted and so all the pci_save_state() after
pci_restore_state() calls that have accumulated in the tree are now
superfluous.  Drop them.

Two drivers chose a different approach to achieve the same result:
drivers/scsi/ipr.c and drivers/net/ethernet/intel/e1000e/netdev.c set the
pci_dev's "state_saved" flag to true before calling pci_restore_state().
Drop this as well.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>  # qat
Link: https://patch.msgid.link/c2b28cc4defa1b743cf1dedee23c455be98b397a.1760274044.git.lukas@wunner.de
2025-11-24 16:58:59 -06:00
Martin K. Petersen e54f7b4b81 Merge branch 6.18/scsi-fixes into 6.19/scsi-staging
Pull in fixes branch to resolve UFS merge conflict.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-19 22:59:25 -05:00
Bart Van Assche 90449f2d1e scsi: sg: Do not sleep in atomic context
sg_finish_rem_req() calls blk_rq_unmap_user(). The latter function may
sleep. Hence, call sg_finish_rem_req() with interrupts enabled instead
of disabled.

Reported-by: syzbot+c01f8e6e73f20459912e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-scsi/691560c4.a70a0220.3124cb.001a.GAE@google.com/
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Fixes: 97d27b0dd0 ("scsi: sg: close race condition in sg_remove_sfp_usercontext()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251113181643.1108973-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-19 22:46:36 -05:00
Bart Van Assche 13b77ed9c2 scsi: scsi_debug: Support injecting unaligned write errors
Allow user space software, e.g. a blktests test, to inject unaligned
write errors.

Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251113174151.1095574-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-19 22:42:09 -05:00
Zilin Guan 78b1a242fe scsi: qla2xxx: Fix improper freeing of purex item
In qla2xxx_process_purls_iocb(), an item is allocated via
qla27xx_copy_multiple_pkt(), which internally calls
qla24xx_alloc_purex_item().

The qla24xx_alloc_purex_item() function may return a pre-allocated item
from a per-adapter pool for small allocations, instead of dynamically
allocating memory with kzalloc().

An error handling path in qla2xxx_process_purls_iocb() incorrectly uses
kfree() to release the item. If the item was from the pre-allocated
pool, calling kfree() on it is a bug that can lead to memory corruption.

Fix this by using the correct deallocation function,
qla24xx_free_purex_item(), which properly handles both dynamically
allocated and pre-allocated items.

Fixes: 875386b988 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com>
Link: https://patch.msgid.link/20251113151246.762510-1-zilin@seu.edu.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-19 22:38:27 -05:00
Andy Shevchenko 7b040d4571 scsi: snic: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://patch.msgid.link/20251113150217.3030010-21-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:30:11 +01:00
Andy Shevchenko d710741f83 scsi: fnic: Switch to use %ptSp
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://patch.msgid.link/20251113150217.3030010-20-andriy.shevchenko@linux.intel.com
[pmladek@suse.com: Fixed output ordering and last_read_time update.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-19 12:28:03 +01:00
Rafael J. Wysocki 37d6d92fe0 Merge back earlier material related to system sleep for 6.19 2025-11-17 16:55:55 +01:00
Mario Limonciello (AMD) 988dd0bd91 scsi: Add PM_EVENT_POWEROFF into suspend callbacks
If the PM core uses hibernation callbacks for powering off the
system, drivers will receive PM_EVENT_POWEROFF and should handle
it the same as they previously handled PM_EVENT_HIBERNATE.

Support this case in the scsi driver.  No functional changes.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251112224025.2051702-3-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:05:53 +01:00
Martin K. Petersen e360bb6dc8 Merge patch series "replace old wq(s), added WQ_PERCPU to alloc_workqueue"
Marco Crivellari <marco.crivellari@suse.com> says:

Hi,

=== Current situation: problems ===

Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, for !WQ_UNBOUND the local CPU is selected.

This leads to different scenarios if a work item is scheduled on an
isolated CPU where "delay" value is 0 or greater then 0:
        schedule_delayed_work(, 0);

This will be handled by __queue_work() that will queue the work item on the
current local (isolated) CPU, while:

        schedule_delayed_work(, 1);

Will move the timer on an housekeeping CPU, and schedule the work there.

Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

=== Recent changes to the WQ API ===

The following, address the recent changes in the Workqueue API:

- commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
- commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

The old workqueues will be removed in a future release cycle.

=== Introduced Changes by this series ===

1) [P 1]  Replace uses of system_wq and system_unbound_wq

    system_unbound_wq is to be used when locality is not required.

    Because of that, system_unbound_wq has been replaced with
	system_dfl_wq, to make clear it should be used when locality
	is not required.

2) [P 2-3-4] WQ_PERCPU added to alloc_workqueue()

    This change adds a new WQ_PERCPU flag to explicitly request
    alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

Thanks!

Link: https://patch.msgid.link/20251031095643.74246-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:30:23 -05:00
Marco Crivellari 8d5cad38cf scsi: pm80xx: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistency cannot be addressed
without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This  adds a new WQ_PERCPU flag to explicitly request to alloc_workqueue()
to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107155257.316728-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:27 -05:00
Marco Crivellari f60b8957d8 scsi: qedi: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistency cannot be addressed
without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107151618.281250-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari 42312d3acd scsi: target: ibmvscsi: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistency cannot be addressed
without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107150542.271229-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari e036dadf78 scsi: qedf: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistency cannot be addressed
without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107150155.267651-3-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari a43a2e48d5 scsi: bnx2fc: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistency cannot be addressed
without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107150155.267651-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari 6184ec8b63 scsi: be2iscsi: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistency cannot be addressed
without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107144949.256894-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari 84150ef06f scsi: lpfc: WQ_PERCPU added to alloc_workqueue() users
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistentcy cannot be
addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

This patch continues the effort to refactor worqueue APIs, which has
begun with the change introducing new workqueues and a new
alloc_workqueue flag:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251104110808.123424-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari f76e4e1e83 scsi: scsi_transport_fc: WQ_PERCPU added to alloc_workqueue users()
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251031095643.74246-5-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari afad6b34de scsi: scsi_dh_alua: WQ_PERCPU added to alloc_workqueue() users
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-4-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:26 -05:00
Marco Crivellari 5ca003bb43 scsi: qla2xxx: WQ_PERCPU added to alloc_workqueue() users
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-3-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:28:25 -05:00
Marco Crivellari cd87aa2e50 scsi: scsi_transport_iscsi: Replace use of system_unbound_wq with system_dfl_wq
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the
API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be
used.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:15:18 -05:00
Marco Crivellari 49783aca15 scsi: qla2xxx: Replace use of system_unbound_wq with system_dfl_wq
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the
API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be
used.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:15:18 -05:00
Ally Heev 3813d28b2b scsi: scsi_debug: Fix uninitialized pointers with __free attr
Uninitialized pointers with '__free' attribute can cause undefined
behaviour as the memory assigned(randomly) to the pointer is freed
automatically when the pointer goes out of scope

scsi doesn't have any bugs related to this as of now, but it is better
to initialize and assign pointers with '__free' attr in one statement to
ensure proper scope-based cleanup

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aPiG_F5EBQUjZqsl@stanley.mountain/
Signed-off-by: Ally Heev <allyheev@gmail.com>
Link: https://patch.msgid.link/20251105-aheev-uninitialized-free-attr-scsi-v1-1-d28435a0a7ea@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 21:08:05 -05:00
Nuno Sá 1028258914 scsi: pm: Drop unneeded call to pm_runtime_mark_last_busy()
There's no need to explicitly call pm_runtime_mark_last_busy() since
pm_runtime_autosuspend() is now doing it since commit 08071e64cb ("PM:
runtime: Mark last busy stamp in pm_runtime_autosuspend()")

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20251111-scsi-pm-improv-v2-1-626b8491f4b4@analog.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 20:56:38 -05:00
Haotian Zhang acd194d9b5 scsi: sim710: Fix resource leak by adding missing ioport_unmap() calls
The driver calls ioport_map() to map I/O ports in sim710_probe_common()
but never calls ioport_unmap() to release the mapping. This causes
resource leaks in both the error path when request_irq() fails and in
the normal device removal path via sim710_device_remove().

Add ioport_unmap() calls in the out_release error path and in
sim710_device_remove().

Fixes: 56fece2008 ("[PATCH] finally fix 53c700 to use the generic iomem infrastructure")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251029032555.1476-1-vulab@iscas.ac.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 20:53:41 -05:00
Junrui Luo f6ab594672 scsi: aic94xx: fix use-after-free in device removal path
The asd_pci_remove() function fails to synchronize with pending tasklets
before freeing the asd_ha structure, leading to a potential
use-after-free vulnerability.

When a device removal is triggered (via hot-unplug or module unload),
race condition can occur.

The fix adds tasklet_kill() before freeing the asd_ha structure,
ensuring all scheduled tasklets complete before cleanup proceeds.

Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Reported-by: Junrui Luo <moonafterrain@outlook.com>
Fixes: 2908d778ab ("[SCSI] aic94xx: new driver")
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/ME2PR01MB3156AB7DCACA206C845FC7E8AFFDA@ME2PR01MB3156.ausprd01.prod.outlook.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 20:50:43 -05:00
Martin K. Petersen d204087a59 Merge patch series "qla2xxx target mode improvements"
Tony Battersby <tonyb@cybernetics.com> says:

This patch series improves the qla2xxx FC driver in target mode. I
developed these patches using the out-of-tree SCST target-mode
subsystem (https://scst.sourceforge.net/), although most of the
improvements will also apply to the other target-mode subsystems such
as the in-tree LIO.  Unfortunately qla2xxx+LIO does not pass all of my
tests, but my patches do not make it any worse (results below). These
patches have been well-tested at my employer with qla2xxx+SCST in both
initiator mode and target mode and with a variety of FC HBAs and
initiators. Since SCST is out-of-tree, some of the patches have parts
that apply in-tree and other parts that apply out-of-tree to SCST. The
SCST patches can be found in the v2 posting linked above.

All patches apply to linux 6.17 and SCST 3.10 master branch.

Summary of patches:
- bugfixes
- cleanups
- improve handling of aborts and task management requests
- improve log message
- add back SLER / SRR support (removed in 2017)

Some of these patches improve handling of aborts and task management
requests. This is some of the testing that I did:

Test 1: Use /dev/sg to queue random disk I/O with short timeouts; make
sure cmds are aborted successfully.
Test 2: Queue lots of disk I/O, then use "sg_reset -N -d /dev/sg" on
initiator to reset logical unit.
Test 3: Queue lots of disk I/O, then use "sg_reset -N -t /dev/sg" on
initiator to reset target.
Test 4: Queue lots of disk I/O, then use "sg_reset -N -b /dev/sg" on
initiator to reset bus.
Test 5: Queue lots of disk I/O, then use "sg_reset -N -H /dev/sg" on
initiator to reset host.
Test 6: Use fiber channel attenuator to trigger SRR during
write/read/compare test; check data integrity.

With my patches, SCST passes all of these tests.

Results with in-tree LIO target-mode subsystem:

Test 1: Seems to abort the same cmd multiple times (both
qlt_24xx_retry_term_exchange() and __qlt_send_term_exchange()). But
cmds get aborted, so give it a pass?

Test 2: Seems to work; cmds are aborted.

Test 3: Target reset doesn't seem to abort cmds, instead, a few seconds
later:
qla2xxx [0000:04:00.0]-f058:9: qla_target(0): tag 1314312, op 2a: CTIO
with TIMEOUT status 0xb received (state 1, port 51:40:2e:c0:18:1d:9f:cc,
LUN 0)

Tests 4 and 5: The initiator is unable to log back in to the target; the
following messages are repeated over and over on the target:
qla2xxx [0000:04:00.0]-e01c:9: Sending TERM ELS CTIO (ha=00000000f8811390)
qla2xxx [0000:04:00.0]-f097:9: Linking sess 000000008df5aba8 [0] wwn
51:40:2e:c0:18:1d:9f:cc with PLOGI ACK to wwn 51:40:2e:c0:18:1d:9f:cc
s_id 00:00:01, ref=2 pla 00000000835a9271 link 0

Test 6: passes with my patches; SRR not supported previously.

So qla2xxx+LIO seems a bit flaky when handling exceptions, but my
patches do not make it any worse. Perhaps someone who is more familiar
with LIO can look at the difference between LIO and SCST and figure out
how to improve it.

Tony Battersby
https://www.cybernetics.com/

v1: https://lore.kernel.org/r/f8977250-638c-4d7d-ac0c-65f742b8d535@cybernetics.com/
v2: https://lore.kernel.org/linux-scsi/e95ee7d0-3580-4124-b854-7f73ca3a3a84@cybernetics.com/
Link: https://patch.msgid.link/aaea0ab0-da8b-4153-9369-60db7507ff7a@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:50 -05:00
Tony Battersby 4f5eb50f7c scsi: qla2xxx: target: Improve safety of cmd lookup by handle
The driver associates two different structs with numeric handles and
passes the handles to the hardware.  When the hardware passes the handle
back to the driver, the driver consults a table of void * to convert the
handle back to the struct without checking the type of struct.  This can
lead to type confusion if the HBA firmware misbehaves (and some firmware
versions do).  So verify the type of struct is what is expected before
using it.

But we can also do better than that.  Also verify that the exchange
address of the message sent from the hardware matches the exchange
address of the command being returned.  This adds an extra guard against
buggy HBA firmware that returns duplicate messages multiple times (which
has also been seen) in case the driver has reused the handle for a
different command of the same type.

These problems were seen on a QLE2694L with firmware 9.08.02 when
testing SLER / SRR support.  The SRR caused the HBA to flood the
response queue with hundreds of bogus entries.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/7c7cb574-fe62-42ae-b800-d136d8dd89ca@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby c7bd85a7b9 scsi: qla2xxx: target: Add back SRR support
Background: loading qla2xxx with "ql2xtgt_tape_enable=1" enables
Sequence Level Error Recovery (SLER), which is most commonly used for
tape drives.  With SLER enabled, if there is a recoverable I/O error
during a SCSI command, a Sequence Retransmission Request (SRR) will be
used to retry the I/O at a low-level completely within the driver
without propagating the error to the upper levels of the SCSI stack.

SRR support was removed in 2017 by commit 2c39b5ca2a ("qla2xxx: Remove
SRR code"). Add it back, new and improved.

The old removed SRR code used sequence numbers to correlate the SRR
CTIOs with SRR immediate notify messages.  I don't see how that would
work reliably with MSI-X interrupts and multiple queues.  So instead use
the exchange address to find the command associated with the immediate
notify (qlt_srr_to_cmd).

The old removed SRR code had a function qlt_check_srr_debug() to
simulate a SRR, but it didn't work for me.  Instead I just used fiber
optic attenuators attached to the FC cable to reduce the strength of the
signal and induce errors.  Unfortunately this only worked for inducing
SRRs on Data-Out (write) commands, so that is all I was able to test.

The code to build a new scatterlist for a SRR with nonzero offset has
been improved to reduce memory requirements and has been well-tested.
However it does not support protection information.

When a single cmd gets multiple SRRs, the old removed SRR code would
restore the data buffer from the values in cmd->se_cmd before processing
the new SRR.  That might be needed if the offset for the new SRR was
lower than the offset for the previous SRR, but I am not sure if that
can happen.  In my testing, when a single cmd gets multiple SRRs, the
SRR offset always increases or stays the same.  But in case it can
decrease, I added the function qlt_restore_orig_sg().  If this is not
supposed to happen then qlt_restore_orig_sg() can be removed to simplify
the code.

I ran into some HBA firmware bugs with QLE269x, QLE27xx, and QLE28xx
firmware 9.05.xx - 9.08.xx where a SRR would cause the HBA to misbehave
badly.  Since SRRs are rare and therefore difficult to test, I figured
it would be worth checking for the buggy firmware and disabling SLER
with a warning instead of letting others run into the same problem on
the rare occasion that they get a SRR.  This turned out to be difficult
because the firmware version isn't known in the normal NVRAM config
routine, so I added a second NVRAM config routine that is called after
the firmware version is known.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/654b7181-b79e-40ed-a15b-6d6e441a5d5f@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby 04957d8c98 scsi: qla2xxx: target: Improve cmd logging
- Add the command tag to various messages so that different messages
   about the same command can be correlated.

 - For CTIO errors (i.e. when the HW reports an error about a cmd),
   print the cmd tag, opcode, state, initiator WWPN, and LUN.  This info
   helps an administrator determine what is going wrong.

 - When a command experiences a transport error, log a message when it
   is freed.  This makes debugging exceptions easier.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/c579987d-5658-41ae-9653-f0e58c9d1880@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby f4199d5812 scsi: qla2xxx: target: Add cmd->rsp_sent
Add cmd->rsp_sent to indicate that the SCSI status has been sent
successfully, so that SCST can be informed of any transport errors.
This will also be used for logging in later patches.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/d4b0203f-7817-4517-9789-5866bb24fad7@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby 091719c21d scsi: qla2xxx: target: Fix invalid memory access with big CDBs
struct atio7_fcp_cmnd is a variable-length data structure because of
add_cdb_len, but it is embedded in struct atio_from_isp and copied
around like a fixed-length data structure.  For big CDBs > 16 bytes,
get_datalen_for_atio() called on a fixed-length copy of the atio will
access invalid memory.

In some cases this can be fixed by moving the atio to the end of the
data structure and using a variable-length allocation.  In other cases
such as allocating struct qla_tgt_cmd, the fixed-length data structures
are preallocated for speed, so in the case that add_cdb_len != 0,
allocate a separate buffer for the CDB.  Also add memcpy_atio() as a
safeguard against invalid memory accesses.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/306a9d0b-3c89-42fc-a69c-eebca8171347@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby 3d56983cc6 scsi: qla2xxx: Fix TMR failure handling
(target mode)

If handle_tmr() fails:

 - The code for QLA_TGT_ABTS results in memory-use-after-free and
   double-free:
	qlt_do_tmr_work()
		qlt_build_abts_resp_iocb()
			qpair->req->outstanding_cmds[h] = (srb_t *)mcmd;
		mempool_free(mcmd, qla_tgt_mgmt_cmd_mempool); FIRST FREE
	qlt_handle_abts_completion()
		mcmd = qlt_ctio_to_cmd()
			cmd = req->outstanding_cmds[h];
			return cmd;
		vha  = mcmd->vha; USE-AFTER-FREE
		ha->tgt.tgt_ops->free_mcmd(mcmd); SECOND FREE

 - qlt_send_busy() makes no sense because it sends a SCSI command
   response instead of a TMR response.

Instead just call qlt_xmit_tm_rsp() to send a TMR failed response, since
that code is well-tested and handles a number of corner cases.  But it
would be incorrect to call ha->tgt.tgt_ops->free_mcmd() after
handle_tmr() failed, so add a flag to mcmd indicating the proper way to
free the mcmd so that qlt_xmit_tm_rsp() can be used for both cases.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/09a1ff3d-6738-4953-a31b-10e89c540462@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby 5c50d84798 scsi: qla2xxx: target: Improve checks in qlt_xmit_response() / qlt_rdy_to_xfer()
Similar fixes to both functions:

qlt_xmit_response:

 - If the cmd cannot be processed, remember to call ->free_cmd() to
   prevent the target-mode midlevel from seeing a cmd lockup.

 - Do not try to send the response if the exchange has been terminated.

 - Check for chip reset once after lock instead of both before and after
   lock.

 - Give errors from qlt_pre_xmit_response() a lower priority to
   compensate for removing the first check for chip reset.

qlt_rdy_to_xfer:

 - Check for chip reset after lock instead of before lock to avoid
   races.

 - Do not try to receive data if the exchange has been terminated.

 - Give errors from qlt_pci_map_calc_cnt() a lower priority to
   compensate for moving the check for chip reset.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/cd6ccd31-33fa-4454-be36-507bf578a546@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby 17488f1390 scsi: qla2xxx: target: Fix races with aborting commands
cmd->cmd_lock only protects cmd->aborted, but when deciding how to
process a cmd, it is necessary to consider other factors such as
cmd->state and if the chip has been reset, which are protected by
qpair->qp_lock_ptr.  So replace cmd_lock with qp_lock_ptr, whick makes
it possible to check additional values and make decisions about what to
do without racing with the CTIO handler and other code.

 - Lock cmd->qpair->qp_lock_ptr when aborting a cmd.

 - Eliminate cmd->cmd_lock and change cmd->aborted to a bitfield since
   it is now protected by qp_lock_ptr just like all the other flags.

 - Add another command state QLA_TGT_STATE_DONE to avoid any possible
   races between qlt_abort_cmd() and tgt_ops->free_cmd().

 - Add the cmd->sent_term_exchg flag to indicate if
   qlt_send_term_exchange() has already been called.

 - Export qlt_send_term_exchange() for SCST so that it can be called
   directly instead of trying to make qlt_abort_cmd() work for both TMR
   abort and HW timeout.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/2c8d03e4-308b-4d5a-a418-a334be23f815@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:28 -05:00
Tony Battersby d46c69a087 scsi: qla2xxx: Clear cmds after chip reset
Commit aefed3e554 ("scsi: qla2xxx: target: Fix offline port handling
and host reset handling") caused two problems:

1. Commands sent to FW, after chip reset got stuck and never freed as FW
   is not going to respond to them anymore.

2. BUG_ON(cmd->sg_mapped) in qlt_free_cmd().  Commit 26f9ce5381
   ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
   attempted to fix this, but introduced another bug under different
   circumstances when two different CPUs were racing to call
   qlt_unmap_sg() at the same time: BUG_ON(!valid_dma_direction(dir)) in
   dma_unmap_sg_attrs().

So revert "scsi: qla2xxx: Fix missed DMA unmap for aborted commands" and
partially revert "scsi: qla2xxx: target: Fix offline port handling and
host reset handling" at __qla2x00_abort_all_cmds.

Fixes: aefed3e554 ("scsi: qla2xxx: target: Fix offline port handling and host reset handling")
Fixes: 26f9ce5381 ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
Co-developed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/0e7e5d26-e7a0-42d1-8235-40eeb27f3e98@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:17:03 -05:00
Martin K. Petersen ab57a18665 Merge patch series "Optimize the hot path in the UFS driver"
Bart Van Assche <bvanassche@acm.org> says:

Hi Martin,

This patch series optimizes the hot path of the UFS driver by making
struct scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data
structures adjacent is realized as follows:

@@ -9040,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = {
     .name           = UFSHCD,
     .proc_name      = UFSHCD,
     .map_queues     = ufshcd_map_queues,
+    .cmd_size       = sizeof(struct ufshcd_lrb),
     .init_cmd_priv  = ufshcd_init_cmd_priv,
     .queuecommand   = ufshcd_queuecommand,
     .mq_poll        = ufshcd_poll,

The following changes had to be made prior to making these two data
structures adjacent:
* Add support for driver-internal and reserved commands in the SCSI core.
* Instead of making the reserved command slot (hba->reserved_slot)
  invisible to the SCSI core, let the SCSI core allocate a reserved command.
* Remove all UFS data structure members that are no longer needed
  because struct scsi_cmnd and struct ufshcd_lrb are now adjacent
* Call ufshcd_init_lrb() from inside the code for queueing a command instead of
  calling this function before I/O starts. This is necessary because
  ufshcd_memory_alloc() allocates fewer instances than the block layer
  allocates requests. See also the following code in the block layer
  core:

    if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx,
                hctx->numa_node))

  Although the UFS driver could be modified such that ufshcd_init_lrb()
  is called from ufshcd_init_cmd_priv(), realizing this would require
  moving the memory allocations that happen from inside
  ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make
  this patch series even larger. Although ufshcd_init_lrb() is called for each
  command, the benefits of reduced indirection and better cache efficiency
  outweigh the small overhead of per-command lrb initialization.
* ufshcd_add_scsi_host() happens now before any device management
  commands are submitted. This change is necessary because this patch
  makes device management command allocation happen when the SCSI host
  is allocated.
* Allocate as many command slots as the host controller supports. Decrease
  host->cmds_per_lun if necessary once it is clear whether or not the UFS
  device supports less command slots than the host controller.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Link: https://patch.msgid.link/20251031204029.2883185-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:16:05 -05:00
Tony Battersby ed382b95f5 scsi: qla2xxx: target: Fix term exchange when cmd_sent_to_fw == 1
Properly set the nport_handle field of the terminate exchange message.
Previously when this field was not set properly, the term exchange would
fail when cmd_sent_to_fw == 1 but work when cmd_sent_to_fw == 0 (i.e. it
would fail when the HW was actively transferring data or status for the
cmd but work when the HW was idle).  With this change, term exchange
works in any cmd state, which now makes it possible to abort a command
that is locked up in the HW.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/1a221699-969b-4f28-8ea4-395d2f7a7c0a@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Tony Battersby c34e373f53 scsi: qla2xxx: target: Improve debug output for term exchange
Print better debug info when terminating a command, and print the
response status from the hardware.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/22f8a0b6-0e24-474d-9f28-9d65c9b7af03@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Tony Battersby 9da4e1dcea scsi: qla2xxx: target: Remove code for unsupported hardware
As far as I can tell, CONTINUE_TGT_IO_TYPE and CTIO_A64_TYPE are message
types from non-FWI2 boards (older than ISP24xx), which are not supported
by qla_target.c.  Removing them makes it possible to turn a void * into
the real type and avoid some typecasts.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/cb006628-e321-4e30-a60b-08b37b8685a5@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Tony Battersby 957aa59749 scsi: qla2xxx: Use reinit_completion on mbx_intr_comp
If a mailbox command completes immediately after
wait_for_completion_timeout() times out, ha->mbx_intr_comp could be left
in an inconsistent state, causing the next mailbox command not to wait
for the hardware.  Fix by reinitializing the completion before use.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/11b6485e-0bfd-4784-8f99-c06a196dad94@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Tony Battersby 4f6aaade2a scsi: qla2xxx: Fix lost interrupts with qlini_mode=disabled
When qla2xxx is loaded with qlini_mode=disabled,
ha->flags.disable_msix_handshake is used before it is set, resulting in
the wrong interrupt handler being used on certain HBAs
(qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be
used).  The only difference between these two interrupt handlers is that
the _hs() version writes to a register to clear the "RISC" interrupt,
whereas the other version does not.  So this bug results in the RISC
interrupt being cleared when it should not be.  This occasionally causes
a different interrupt handler qla24xx_msix_default() for a different
vector to see ((stat & HSRX_RISC_INT) == 0) and ignore its interrupt,
which then causes problems like:

qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20,
  iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500
  host_status 0x40000010 hccr 0x3f00
qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20,
  mb[0]=0x20. Scheduling ISP abort
(the cmd varies; sometimes it is 0x20, 0x22, 0x54, 0x5a, 0x5d, or 0x6a)

This problem can be reproduced with a 16 or 32 Gbps HBA by loading
qla2xxx with qlini_mode=disabled and running a high IOPS test while
triggering frequent RSCN database change events.

While analyzing the problem I discovered that even with
disable_msix_handshake forced to 0, it is not necessary to clear the
RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below).  So just
completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting
it, which also fixes the bug with qlini_mode=disabled.

The test below describes the justification for not needing
qla2xxx_msix_rsp_q_hs():

Force disable_msix_handshake to 0:
qla24xx_config_rings():
if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) &&
    (ha->flags.msix_enabled)) {

In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check:
  (rd_reg_dword(&reg->host_status) & HSRX_RISC_INT)

Count the number of calls to each function with HSRX_RISC_INT set and
the number with HSRX_RISC_INT not set while performing some I/O.

If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code):
qla24xx_msix_rsp_q:    50% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs:  5% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
    (# of qla24xx_msix_rsp_q interrupts) * 3

If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched
code):
qla24xx_msix_rsp_q:    100% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs:   9% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
    (# of qla24xx_msix_rsp_q interrupts) * 3

In the case of the original code, qla24xx_msix_rsp_q() was seeing
HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs()
was clearing it when it shouldn't have been.  In the patched code,
qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which
makes sense if that interrupt handler needs to clear the RISC interrupt
(which it does).  qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of
the time, which is just overlap from the other interrupt during the
high IOPS test.

Tested with SCST on:
QLE2742  FW:v9.08.02 (32 Gbps 2-port)
QLE2694L FW:v9.10.11 (16 Gbps 4-port)
QLE2694L FW:v9.08.02 (16 Gbps 4-port)
QLE2672  FW:v8.07.12 (16 Gbps 2-port)
both initiator and target mode

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/56d378eb-14ad-49c7-bae9-c649b6c7691e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Tony Battersby 8f58fc64d5 scsi: qla2xxx: Fix initiator mode with qlini_mode=exclusive
When given the module parameter qlini_mode=exclusive, qla2xxx in
initiator mode is initially unable to successfully send SCSI commands to
devices it finds while scanning, resulting in an escalating series of
resets until an adapter reset clears the issue.  Fix by checking the
active mode instead of the module parameter.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/1715ec14-ba9a-45dc-9cf2-d41aa6b81b5e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Tony Battersby b57fbc8871 scsi: Revert "scsi: qla2xxx: Perform lockless command completion in abort path"
This reverts commit 0367076b08.

The commit being reverted added code to __qla2x00_abort_all_cmds() to
call sp->done() without holding a spinlock.  But unlike the older code
below it, this new code failed to check sp->cmd_type and just assumed
TYPE_SRB, which results in a jump to an invalid pointer in target-mode
with TYPE_TGT_CMD:

qla2xxx [0000:65:00.0]-d034:8: qla24xx_do_nack_work create sess success
  0000000009f7a79b
qla2xxx [0000:65:00.0]-5003:8: ISP System Error - mbx1=1ff5h mbx2=10h
  mbx3=0h mbx4=0h mbx5=191h mbx6=0h mbx7=0h.
qla2xxx [0000:65:00.0]-d01e:8: -> fwdump no buffer
qla2xxx [0000:65:00.0]-f03a:8: qla_target(0): System error async event
  0x8002 occurred
qla2xxx [0000:65:00.0]-00af:8: Performing ISP error recovery -
  ha=0000000058183fda.
BUG: kernel NULL pointer dereference, address: 0000000000000000
PF: supervisor instruction fetch in kernel mode
PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: 0010 [#1] SMP
CPU: 2 PID: 9446 Comm: qla2xxx_8_dpc Tainted: G           O       6.1.133 #1
Hardware name: Supermicro Super Server/X11SPL-F, BIOS 4.2 12/15/2023
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc90001f93dc8 EFLAGS: 00010206
RAX: 0000000000000282 RBX: 0000000000000355 RCX: ffff88810d16a000
RDX: ffff88810dbadaa8 RSI: 0000000000080000 RDI: ffff888169dc38c0
RBP: ffff888169dc38c0 R08: 0000000000000001 R09: 0000000000000045
R10: ffffffffa034bdf0 R11: 0000000000000000 R12: ffff88810800bb40
R13: 0000000000001aa8 R14: ffff888100136610 R15: ffff8881070f7400
FS:  0000000000000000(0000) GS:ffff88bf80080000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000010c8ff006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die+0x4d/0x8b
 ? page_fault_oops+0x91/0x180
 ? trace_buffer_unlock_commit_regs+0x38/0x1a0
 ? exc_page_fault+0x391/0x5e0
 ? asm_exc_page_fault+0x22/0x30
 __qla2x00_abort_all_cmds+0xcb/0x3e0 [qla2xxx_scst]
 qla2x00_abort_all_cmds+0x50/0x70 [qla2xxx_scst]
 qla2x00_abort_isp_cleanup+0x3b7/0x4b0 [qla2xxx_scst]
 qla2x00_abort_isp+0xfd/0x860 [qla2xxx_scst]
 qla2x00_do_dpc+0x581/0xa40 [qla2xxx_scst]
 kthread+0xa8/0xd0
 </TASK>

Then commit 4475afa264 ("scsi: qla2xxx: Complete command early within
lock") added the spinlock back, because not having the lock caused a
race and a crash.  But qla2x00_abort_srb() in the switch below already
checks for qla2x00_chip_is_down() and handles it the same way, so the
code above the switch is now redundant and still buggy in target-mode.
Remove it.

Cc: stable@vger.kernel.org
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/3a8022dc-bcfd-4b01-9f9b-7a9ec61fa2a3@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 18:07:50 -05:00
Bart Van Assche 581ca49035 scsi: scsi_debug: Abort SCSI commands via an internal command
Add a .queue_reserved_command() implementation and call it from the code
path that aborts SCSI commands. This ensures that the code for
allocating a pseudo SCSI device and also the code for allocating and
processing reserved commands gets triggered while running blktests.

Most of the code in this patch is a modified version of code from John
Garry. See also
https://lore.kernel.org/linux-scsi/75018e17-4dea-4e1b-8c92-7a224a1e13b9@oracle.com/

Reviewed-by: John Garry <john.g.garry@oracle.com>
Suggested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-8-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:31 -05:00
Hannes Reinecke a2ab4e3328 scsi: core: Add scsi_{get,put}_internal_cmd() helpers
Add helper functions to allow LLDDs to allocate and free internal commands.

[ bvanassche: changed the 'nowait' argument into a 'flags' argument. See also
  https://lore.kernel.org/linux-scsi/20211125151048.103910-3-hare@suse.de/ ]

Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:31 -05:00
John Garry 11ea1de3fc scsi: core: Introduce .queue_reserved_command()
Reserved commands will be used by SCSI LLDs for submitting internal
commands. Since the SCSI host, target and device limits do not apply to
the reserved command use cases, bypass the SCSI host limit checks for
reserved commands. Introduce the .queue_reserved_command() callback for
reserved commands. Additionally, do not activate the SCSI error handler
if a reserved command fails such that reserved commands can be submitted
from inside the SCSI error handler.

[ bvanassche: modified patch title and patch description. Renamed
  .reserved_queuecommand() into .queue_reserved_command(). Changed
  the second argument of __blk_mq_end_request() from 0 into error
  code in the completion path if cmd->result != 0. Rewrote the
  scsi_queue_rq() changes. See also
  https://lore.kernel.org/linux-scsi/1666693096-180008-5-git-send-email-john.garry@huawei.com/ ]

Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20251031204029.2883185-6-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:31 -05:00
Hannes Reinecke d630fbf6fc scsi: core: Support allocating a pseudo SCSI device
Allocate a pseudo SCSI device if 'nr_reserved_cmds' has been set. Pseudo
SCSI devices have the SCSI ID <max_id>:U64_MAX so they won't clash with
any devices the LLD might create. Pseudo SCSI devices are excluded from
scanning and will not show up in sysfs. Additionally, pseudo SCSI
devices are skipped by shost_for_each_device(). This prevents that the
SCSI error handler tries to submit a reset to a non-existent logical
unit.

Do not allocate a budget map for pseudo SCSI devices since the
cmd_per_lun limit does not apply to pseudo SCSI devices.

Do not perform queue depth ramp up / ramp down for pseudo SCSI devices.

Pseudo SCSI devices will be used to send internal commands to a storage
device.

[ bvanassche: edited patch description / renamed host_sdev into
  pseudo_sdev / unexported scsi_get_host_dev() / modified error path in
  scsi_get_pseudo_dev() / skip pseudo devices in __scsi_iterate_devices()
  and also when calling sdev_init(), sdev_configure() and sdev_destroy().
  See also
  https://lore.kernel.org/linux-scsi/20211125151048.103910-2-hare@suse.de/ ]

Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:31 -05:00
Bart Van Assche a47c7bef57 scsi: core: Make the budget map optional
Prepare for not allocating a budget map for pseudo SCSI devices by
checking whether a budget map has been allocated before using it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-4-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:31 -05:00
Bart Van Assche 21008cabc5 scsi: core: Move two statements
Move two statements that will be needed for pseudo SCSI devices in front
of code that won't be needed for pseudo SCSI devices. No functionality
has been changed.

Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:30 -05:00
Hannes Reinecke d604e1ec24 scsi: core: Support allocating reserved commands
Quite some drivers are using management commands internally. These
commands typically use the same tag pool as regular SCSI commands. Tags
for these management commands are set aside before allocating the
block-mq tag bitmap for regular SCSI commands. The block layer already
supports this via the reserved tag mechanism. Add a new field
'nr_reserved_cmds' to the SCSI host template to instruct the block layer
to set aside a tag space for these management commands by using reserved
tags. Exclude reserved commands from .can_queue because .can_queue is
visible in sysfs.

[ bvanassche: modified patch title and patch description. Left out the
  following statements: "if (sht->nr_reserved_cmds)" and also
  "if (sdev->host->nr_reserved_cmds) flags |= BLK_MQ_REQ_RESERVED;". Moved
  nr_reserved_cmds declarations and statements close to the
  corresponding can_queue declarations and statements. See also
  https://lore.kernel.org/linux-scsi/20210503150333.130310-11-hare@suse.de/ ]

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:30 -05:00
Martin K. Petersen c53a741a7f Merge patch series "Support power resources defined in acpi on ata"
Markus Probst <markus.probst@posteo.de> says:

This series adds support for power resources defined in acpi on ata
ports/devices. A device can define a power resource in an ata port/device,
which then gets powered on right before the port is probed. This can be
useful for devices, which have sata power connectors that are:
  a: powered down by default
  b: can be individually powered on
like in some synology nas devices. If thats the case it will be assumed,
that the power resource won't survive reboots and therefore the disk will
be stopped.

Link: https://patch.msgid.link/20251104142413.322347-1-markus.probst@posteo.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:26:28 -05:00
Markus Probst 8fdfdb1488 scsi: sd: Add manage_restart device attribute to scsi_disk
In addition to the already existing manage_shutdown,
manage_system_start_stop and manage_runtime_start_stop device scsi_disk
attributes, add manage_restart, which allows the high-level device
driver (sd) to manage the device power state for SYSTEM_RESTART if set
to 1.

This attribute is necessary for the following commit "ata: stop disk on
restart if ACPI power resources are found" to avoid a potential disk
power failure in the case the SATA power connector does not retain the
power state after a restart.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Markus Probst <markus.probst@posteo.de>
Link: https://patch.msgid.link/20251104142413.322347-2-markus.probst@posteo.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:24:55 -05:00
Martin K. Petersen 5b5dedf007 Merge patch series "Update lpfc to revision 14.4.0.12"
Justin Tee <justin.tee@broadcom.com> says:

Update lpfc to revision 14.4.0.12

This patch set contains updates to log messaging, revision of outdated
comment descriptions, fixes to kref accounting, support for BB credit
recovery in point-to-point mode, and introduction of registering unique
platform name identifiers with fabrics.

The patches were cut against Martin's 6.19/scsi-queue tree.

Link: https://patch.msgid.link/20251106224639.139176-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:19:17 -05:00
Justin Tee d45fdc6cdd scsi: lpfc: Update lpfc version to 14.4.0.12
Update lpfc version to 14.4.0.12

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-11-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:01 -05:00
Justin Tee 191da2c71b scsi: lpfc: Add capability to register Platform Name ID to fabric
FC-LS and FC-GS specifications outline fields for registering a platform
name identifier (PNI) to the fabric.  The PNI assists fabrics with
identifying the physical server source of frames in the fabric.

lpfc generates a PNI based partially on the uuid specific for the
system.  Initial attempts to extract a uuid are made from SMBIOS's
System Information 08h uuid entry.  If SMBIOS DMI does not exist, a PNI
is not generated and PNI registration with the fabric is skipped.

The PNI is submitted in FLOGI and FDISC frames.  After successful fabric
login, the RSPNI_PNI CT frame is submitted to the fabric to register the
OS host name tying it to the PNI.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:01 -05:00
Justin Tee 683df5fc3e scsi: lpfc: Allow support for BB credit recovery in point-to-point topology
Currently, BB credit recovery is excluded to fabric topology mode.  This
patch allows setting of BB_SC_N in PLOGIs and PLOGI_ACCs when in
point-to-point mode so that BB credit recovery can operate in
point-to-point topology as well.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee 07caedc6a3 scsi: lpfc: Fix reusing an ndlp that is marked NLP_DROPPED during FLOGI
It's possible for an unstable link to repeatedly bounce allowing a FLOGI
retry, but then bounce again forcing an abort of the FLOGI.  Ensure that
the initial reference count on the FLOGI ndlp is restored in this faulty
link scenario.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee 0b8b15a0b7 scsi: lpfc: Modify kref handling for Fabric Controller ndlps
Currently, there is a kref put in the lpfc_cleanup() routine that takes
care of outstanding references on fabric controller ndlps in UNUSED
state.  While typically there is a state change from UNUSED -> REGLOGIN
when the ndlp successfully logs into the fabric, there may be cases when
FLOGI is unsuccessful and the ndlp will remain in UNUSED state without a
registered rpi, yet the ndlp incorrectly has a kref count of one.

To address this, handling of Fabric Controller ndlps are moved into the
routines: lpfc_issue_els_scr(), lpfc_issue_els_rdf(),
lpfc_cmpl_els_disc_cmd().

In both lpfc_issue_els_scr() and lpfc_issue_els_rdf(), if there does not
exist a previously created fabric controller ndlp, an ndlp will be
created.  Otherwise, we can reuse the pre-existing ndlp object.

In lpfc_cmpl_els_disc_cmd(), if the SCR or RDF are not successfully
issued, the initial reference on the ndlp that is not registered with
upper layers will be decremented with a kref_put().

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee 23f4906729 scsi: lpfc: Fix leaked ndlp krefs when in point-to-point topology
In point-to-point topology, the driver sometimes defers the unsolicited
FLOGI LS_ACC until it sends its FLOGI to the remote port.  When this
happens, lpfc neglects to release the ndlp allocated for the unsolicited
FLOGI.  This patch adds code to release the ndlp for the deferred
unsolicited FLOGI LS_ACC.

An NLP_FLOGI_DFR_ACC flag is introduced to facilitate identifying an
ndlp with an expected deferred FLOGI LS_ACC completion.  When
lpfc_cmpl_els_rsp() detects the correct qualifiers, it releases the
initial reference on the ndlp object.  And when lpfc_cmpl_els_rsp()
exits, the remaining put for the deferred action is executed and the
ndlp is released.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee 6f81582b7a scsi: lpfc: Ensure unregistration of rpis for received PLOGIs
Unregistration of an rpi object should be done when a PLOGI is received
as PLOGI receipt implies an implicit LOGO.  Previously, the driver would
continue using the same, already registered, rpi and ACC the received
PLOGI.

Replace the ACC and early return statement with break to execute the
rest of the lpfc_rcv_plogi logic outside the switch case statement.
This ensures unregistration and reregistration of an rpi after PLOGI_ACC
completion.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee 3c228061c8 scsi: lpfc: Remove redundant NULL ptr assignment in lpfc_els_free_iocb()
Remove redundant cmd_dmabuf and bpl_dmabuf NULL ptr assignment as they
are already initialized to NULL when handling a new unsolicited event.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee f7a302e475 scsi: lpfc: Revise discovery related function headers and comments
Correcting discovery related function headers, return status
information, and comment descriptions.  There are no functional changes.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:18:00 -05:00
Justin Tee 051d4b65e8 scsi: lpfc: Update various NPIV diagnostic log messaging
Update PRLI status log message to automatically warn when CQE status is
non-zero.

When issuing an RSCN, log ndlp's kref count to the debugfs trace buffer.

Add the NPIV virtual port index to the FDMI registration log message
with the fabric.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:17:59 -05:00
Martin K. Petersen e15710242e Merge patch series "smartpqi updates"
Don Brace <don.brace@microchip.com> says:

These patches are based on Martin Petersen's 6.19/scsi-queue tree
  https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
  6.19/scsi-queue

This patch series includes four patches, with two main functional changes:

1. smartpqi-Add-timeout-value-to-RAID-path-requests-to-physical-devices

   Sets a timeout value for requests sent to physical devices via the
   RAID path. This prevents the controller firmware from waiting
   indefinitely and allows it to time out requests a few seconds before
   the OS issues Target Management Function (TMF) commands.

   The timeout value sent to the firmware is set to 3 seconds less than
   the OS timeout, which significantly reduces TMF invocations.

2. smartpqi-fix-Device-resources-accessed-after-device-removal
   Fixes a race condition during device removal by:
     * Checking for device removal in the reset handler and canceling any
       pending reset work if the device is no longer present.
     * Canceling outstanding TMF work items in the .sdev_destroy handler.

   Together, these changes eliminate races between reset operations
   and device removal.

The other two patches:
3. smartpqi-add-new-Hurray-Data-pci-device
   Adds support for new Hurray Data PCI device.
   No functional changes.
4. smartpqi-update-driver-version-to-2.1.36-026
   Updates the driver version string.
   No functional changes.

Link: https://patch.msgid.link/20251106163823.786828-1-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:09:49 -05:00
Don Brace 4cec99e83d scsi: smartpqi: Update version to 2.1.36-026
Update driver version to 2.1.36-026

Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Gerry Morong <gerry.morong@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20251106163823.786828-5-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:08:31 -05:00
David Strahan 48e6b7e708 scsi: smartpqi: Add support for Hurray Data new controller PCI device
Add support for new Hurray Data controller.

All entries are in HEX.

Add PCI IDs for Hurray Data controllers:
                                         VID  / DID  / SVID / SDID
                                         ----   ----   ----   ----
                                         9005   028f   207d   4840

Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: David Strahan <David.Strahan@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20251106163823.786828-4-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:08:31 -05:00
Mike McGowen b518e86d1a scsi: smartpqi: Fix device resources accessed after device removal
Correct possible race conditions during device removal.

Previously, a scheduled work item to reset a LUN could still execute
after the device was removed, leading to use-after-free and other
resource access issues.

This race condition occurs because the abort handler may schedule a LUN
reset concurrently with device removal via sdev_destroy(), leading to
use-after-free and improper access to freed resources.

  - Check in the device reset handler if the device is still present in
    the controller's SCSI device list before running; if not, the reset
    is skipped.

  - Cancel any pending TMF work that has not started in sdev_destroy().

  - Ensure device freeing in sdev_destroy() is done while holding the
    LUN reset mutex to avoid races with ongoing resets.

Fixes: 2d80f4054f ("scsi: smartpqi: Update deleting a LUN via sysfs")
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20251106163823.786828-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:08:31 -05:00
Mike McGowen f3ecbba1aa scsi: smartpqi: Add timeout value to RAID path requests to physical devices
Add a timeout value to requests sent to physical devices via the RAID
path.

A timeout value of zero means wait indefinitely, which may cause the OS
to issue Target Management Function (TMF) commands if the device does
not respond.

For input timeouts of 8 seconds or greater, the value sent to firmware
is reduced by 3 seconds to provide an earlier firmware timeout and allow
the OS additional time before timing out.

This change improves timeout handling between the driver, firmware, and
OS, helping to better manage device responsiveness and avoid indefinite
waits.

Reviewed-by: David Strahan <david.strahan@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Signed-off-by: Mike McGowen <Mike.McGowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20251106163823.786828-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 13:08:31 -05:00
Marco Crivellari 57565f97b0 scsi: fcoe: Add WQ_PERCPU to alloc_workqueue() users
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.  This lack of consistentcy cannot be
addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

Continue the effort to refactor workqueue APIs, which has begun with the
change introducing new workqueues and a new alloc_workqueue flag:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

Adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be
per-CPU when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20251105150336.244079-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 12:28:11 -05:00
David Jeffery d27418aaf8 scsi: st: Skip buffer flush for information ioctls
With commit 9604eea5bd ("scsi: st: Add third party poweron reset
handling") some customer tape applications fail from being unable to
complete ioctls to verify ID information for the device when there has
been any type of reset event to their tape devices.

The st driver currently will fail all standard SCSI ioctls if a call to
flush_buffer() fails in st_ioctl(). This causes ioctls which otherwise
have no effect on tape state to succeed or fail based on events
unrelated to the requested ioctl.

This makes SCSI information ioctls unreliable after a reset even if no
buffering is in use. With a reset setting the pos_unknown field,
flush_buffer() will report failure and fail all ioctls. So any
application expecting to use ioctls to check the identify the device
will be unable to do so in such a state.

For SCSI information ioctls, avoid the need for a buffer flush and allow
the ioctls to execute regardless of buffer state.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Link: https://patch.msgid.link/20251104154709.6436-2-djeffery@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 12:21:10 -05:00
David Jeffery b37d70c0df scsi: st: Separate st-unique ioctl handling from SCSI common ioctl handling
The st ioctl function currently interleaves code for handling various st
specific ioctls with parts of code needed for handling ioctls common to
all SCSI devices. Separate out st's code for the common ioctls into a
more manageable, separate function.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Link: https://patch.msgid.link/20251104154709.6436-1-djeffery@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 12:21:10 -05:00
Haotian Zhang 20da637eb5 scsi: stex: Fix reboot_notifier leak in probe error path
In stex_probe(), register_reboot_notifier() is called at the beginning,
but if any subsequent initialization step fails, the function returns
without unregistering the notifier, resulting in a resource leak.

Add unregister_reboot_notifier() in the out_disable error path to ensure
proper cleanup on all failure paths.

Fixes: 61b745fa63 ("scsi: stex: Add S6 support")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251104094847.270-1-vulab@iscas.ac.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-08 12:11:06 -05:00
Damien Le Moal fdb9aed869 block: introduce disk_report_zone()
Commit b76b840fd9 ("dm: Fix dm-zoned-reclaim zone write pointer
alignment") introduced an indirect call for the callback function of a
report zones executed with blkdev_report_zones(). This is necessary so
that the function disk_zone_wplug_sync_wp_offset() can be called to
refresh a zone write plug zone write pointer offset after a write error.
However, this solution makes following the path of a zone information
harder to understand.

Clean this up by introducing the new blk_report_zones_args structure to
define a zone report callback and its private data and introduce the
helper function disk_report_zone() which calls both
disk_zone_wplug_sync_wp_offset() and the zone report user callback
function for all zones of a zone report. This helper function must be
called by all block device drivers that implement the report zones
block operation in order to correctly report a zone information.

All block device drivers supporting the report_zones block operation are
updated to use this new scheme.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-05 08:07:21 -07:00
Bart Van Assche 867e4b1bae scsi: core: Improve sdev_store_timeout()
Check whether or not the conversion of the argument to an integer
succeeded. Reject invalid timeout values.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031221844.2921694-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-02 21:56:22 -05:00
Bart Van Assche 61deab8a32 scsi: core: Remove unused code from scsi_sysfs.c
Remove unused code since we do not keep unused code in the Linux kernel.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031220857.2917954-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-02 21:54:40 -05:00
Alok Tiwari e9ff858c9a scsi: qla4xxx: Use correct variable in memset for clarity
Both mbox_cmd and mbox_sts have the same size, so using sizeof(mbox_cmd)
when clearing mbox_sts did not cause any functional issue. However, it
is misleading and reduces code readability.

Update the memset() calls to use sizeof(mbox_sts) to make the intent
clear

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20251021090354.1804327-1-alok.a.tiwari@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 23:02:15 -04:00
Bart Van Assche e414748b7e scsi: aacraid: Improve code readability
aac_queuecommand() is a scsi_host_template.queuecommand()
implementation.  Any value returned by this function other than one of
the following values is translated into SCSI_MLQUEUE_HOST_BUSY:

* 0
* SCSI_MLQUEUE_HOST_BUSY
* SCSI_MLQUEUE_DEVICE_BUSY
* SCSI_MLQUEUE_EH_RETRY
* SCSI_MLQUEUE_TARGET_BUSY

Improve readability of aac_queuecommand() by returning
SCSI_MLQUEUE_HOST_BUSY instead of FAILED.

Cc: Gilbert Wu <gilbert.wu@microchip.com>
Cc: Sagar Biradar <Sagar.Biradar@microchip.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251021201743.3539900-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 23:01:00 -04:00
John Garry bb798c1f43 scsi: advansys: Don't call asc_prt_scsi_host() -> scsi_host_busy()
The driver calls asc_prt_scsi_host() -> scsi_host_busy() prior to
calling scsi_add_host(). This should not be done, and has raised issues
for other drivers, like [0].

Function asc_prt_scsi_host() only has a single callsite, as above, where
the shost busy count would always be 0.

Avoid printing the shost busy count to avoid this problem.

[0] https://lore.kernel.org/linux-scsi/20251014200118.3390839-3-bvanassche@acm.org/

Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251023085451.3933666-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:58:53 -04:00
John Garry dcc98c1136 scsi: core: Minor comment fixes for scsi_host_busy()
I guess that the @shost comment on scsi_host_busy() was copied from
scsi_host_get() (as it is the same), however they do not do the same
thing.

Also drop reference to busy counter, which has been removed.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251023082759.3927000-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:56:18 -04:00
Magnus Lindholm ea0e278a5c scsi: qla1280: Fix compiler warnings (DEBUG mode)
Building the qla1280 driver with DEBUG_QLA1280 set will emit compiler
warnings. Fix some print formatting strings to reflect the correct type
of printed variables as well as remove unused code. (static function
ql1280_dump_device) in order to avoid compiler warnings.

[mkp: fixed a few more checkpatch warnings]

Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://patch.msgid.link/20251002052604.24590-1-linmag7@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 22:06:22 -04:00
Bart Van Assche ce085ecdba scsi: core: Do not declare scsi_cmnd pointers const
This change allows removing multiple casts and hence improves type
checking by the compiler.

Cc: Hannes Reinecke <hare@suse.de>
Suggested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251014220426.3690007-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:11:35 -04:00
Bart Van Assche d54c676d4f scsi: core: Fix the unit attention counter implementation
scsi_decide_disposition() may call scsi_check_sense().
scsi_decide_disposition() calls are not serialized. Hence, counter
updates by scsi_check_sense() must be serialized. Hence this patch that
makes the counters updated by scsi_check_sense() atomic.

Cc: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Fixes: a5d518cd4e ("scsi: core: Add counters for New Media and Power On/Reset UNIT ATTENTIONs")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://patch.msgid.link/20251014220244.3689508-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:09:36 -04:00
Qiang Liu 3d0d1c7a5c scsi: fnic: Self-assignment of intr_time_type has no effect
Remove the self-assignment statement of the intr_time_type variable.

Signed-off-by: Qiang Liu <liuqiang@kylinos.cn>
Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://patch.msgid.link/20251017075504.143491-1-liuqiangneo@163.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:43:07 -04:00
Bhanu Seshu Kumar Valluri 05e66c18e3 scsi: smartpqi: Prefer kmalloc_array() over kmalloc()
As a best practice use kmalloc_array() to safely calculate dynamic
object sizes without overflow.

[mkp: line exceeding 100 chars, added newline]

Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Bhanu Seshu Kumar Valluri <bhanuseshukumar@gmail.com>
Link: https://patch.msgid.link/20251007065345.8853-1-bhanuseshukumar@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:01:20 -04:00
Gustavo A. R. Silva 81cb6c228f scsi: megaraid_sas: Avoid a couple -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Use the new TRAILING_OVERLAP() helper to fix the following warnings:

drivers/scsi/megaraid/megaraid_sas_fusion.h:1153:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/scsi/megaraid/megaraid_sas_fusion.h:1198:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

This helper creates a union between a flexible-array member (FAM) and a
set of MEMBERS that would otherwise follow it --in this case 'struct
MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN]' and 'struct
MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES]' in the corresponding
structures.

This overlays the trailing members onto the FAM (struct MR_LD_SPAN_MAP
ldSpanMap[];) while keeping the FAM and the start of MEMBERS aligned.

The static_assert() ensures this alignment remains, and it's
intentionally placed inmediately after the corresponding structures --no
blank line in between.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aM1E7Xa8qYdZ598N@kspp
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:00:42 -04:00
Gustavo A. R. Silva 11956e4b91 scsi: isci: Avoid -Wflex-array-member-not-at-end warning
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration (which happens to be in a union, so
we're moving the entire union) to the end of the corresponding
structure. Notice that `struct ssp_response_iu` is a flexible structure,
this is a structure that contains a flexible-array member.

With these changes fix the following warning:

drivers/scsi/isci/task.h:92:11: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aM09bpl1xj9KZSZl@kspp
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:00:11 -04:00
Bart Van Assche a0b7780602 scsi: core: Fix a regression triggered by scsi_host_busy()
Commit 995412e23b ("blk-mq: Replace tags->lock with SRCU for tag
iterators") introduced the following regression:

Call trace:
 __srcu_read_lock+0x30/0x80 (P)
 blk_mq_tagset_busy_iter+0x44/0x300
 scsi_host_busy+0x38/0x70
 ufshcd_print_host_state+0x34/0x1bc
 ufshcd_link_startup.constprop.0+0xe4/0x2e0
 ufshcd_init+0x944/0xf80
 ufshcd_pltfrm_init+0x504/0x820
 ufs_rockchip_probe+0x2c/0x88
 platform_probe+0x5c/0xa4
 really_probe+0xc0/0x38c
 __driver_probe_device+0x7c/0x150
 driver_probe_device+0x40/0x120
 __driver_attach+0xc8/0x1e0
 bus_for_each_dev+0x7c/0xdc
 driver_attach+0x24/0x30
 bus_add_driver+0x110/0x230
 driver_register+0x68/0x130
 __platform_driver_register+0x20/0x2c
 ufs_rockchip_pltform_init+0x1c/0x28
 do_one_initcall+0x60/0x1e0
 kernel_init_freeable+0x248/0x2c4
 kernel_init+0x20/0x140
 ret_from_fork+0x10/0x20

Fix this regression by making scsi_host_busy() check whether the SCSI
host tag set has already been initialized. tag_set->ops is set by
scsi_mq_setup_tags() just before blk_mq_alloc_tag_set() is called. This
fix is based on the assumption that scsi_host_busy() and
scsi_mq_setup_tags() calls are serialized. This is the case in the UFS
driver.

Reported-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Closes: https://lore.kernel.org/linux-block/pnezafputodmqlpumwfbn644ohjybouveehcjhz2hmhtcf2rka@sdhoiivync4y/
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20251007214800.1678255-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 11:57:52 -04:00
Martin K. Petersen 4827790660 Merge branch '6.18/scsi-queue' into 6.18/scsi-fixes
Pull in outstanding SCSI fixes for 6.18.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-13 15:54:13 -04:00
Linus Torvalds 2a6edd867b SCSI misc on 20251011
Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just
 before or during the merge window.  The most important one is the
 qla2xxx which reverts a conversion to fix flexible array member
 warnings, that went up in this merge window but which turned out on
 further testing to be causing data corruption.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaOpnUhsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMSwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy
 c2hpcC5jb20ACgkQ50LJTO6YrIUCFQEA1cADof79U9suDka2hAa2l7D4wJQ53qSj
 R7CwarAVX2wA/3RaEHySnSk+ivyHRl5AOYlPrYsnBo1o2KUq1M6kfviM
 =itgd
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just
  before or during the merge window.

  The most important one is the qla2xxx which reverts a conversion to
  fix flexible array member warnings, that went up in this merge window
  but which turned out on further testing to be causing data corruption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Include UTP error in INT_FATAL_ERRORS
  scsi: ufs: sysfs: Make HID attributes visible
  scsi: mvsas: Fix use-after-free bugs in mvs_work_queue
  scsi: ufs: core: Fix PM QoS mutex initialization
  scsi: ufs: core: Fix runtime suspend error deadlock
  Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue"
  scsi: target: target_core_configfs: Add length check to avoid buffer overflow
2025-10-11 11:49:00 -07:00
Linus Torvalds 2215336295 hyperv-next for v6.18
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmjkpakTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXip5B/48MvTFJ1qwRGPVzevZQ8Z4SDogEREp
 69VS/xRf1YCIzyXyanwqf1dXLq8NAqicSp6ewpJAmNA55/9O0cwT2EtohjeGCu61
 krPIvS3KT7xI0uSEniBdhBtALYBscnQ0e3cAbLNzL7bwA6Q6OmvoIawpBADgE/cW
 aZNCK9jy+WUqtXc6lNtkJtST0HWGDn0h04o2hjqIkZ+7ewjuEEJBUUB/JZwJ41Od
 UxbID0PAcn9O4n/u/Y/GH65MX+ddrdCgPHEGCLAGAKT24lou3NzVv445OuCw0c4W
 ilALIRb9iea56ZLVBW5O82+7g9Ag41LGq+841MNlZjeRNONGykaUpTWZ
 =OR26
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Unify guest entry code for KVM and MSHV (Sean Christopherson)

 - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
   (Nam Cao)

 - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
   (Mukesh Rathor)

 - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)

 - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
   Kumar T S M)

 - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
   Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)

* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyperv: Remove the spurious null directive line
  MAINTAINERS: Mark hyperv_fb driver Obsolete
  fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
  Drivers: hv: Make CONFIG_HYPERV bool
  Drivers: hv: Add CONFIG_HYPERV_VMBUS option
  Drivers: hv: vmbus: Fix typos in vmbus_drv.c
  Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
  Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
  x86/hyperv: Switch to msi_create_parent_irq_domain()
  mshv: Use common "entry virt" APIs to do work in root before running guest
  entry: Rename "kvm" entry code assets to "virt" to genericize APIs
  entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
  mshv: Handle NEED_RESCHED_LAZY before transferring to guest
  x86/hyperv: Add kexec/kdump support on Azure CVMs
  Drivers: hv: Simplify data structures for VMBus channel close message
  Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
  mshv: Add support for a new parent partition configuration
  clocksource: hyper-v: Skip unnecessary checks for the root partition
  hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-07 08:40:15 -07:00
Dan Carpenter 120642726e scsi: libfc: Prevent integer overflow in fc_fcp_recv_data()
The "offset" comes from the skb->data that we received.  Here the code
is verifying that "offset + len" is within bounds however it does not
take integer overflows into account.  Use size_add() to be safe.

This would only be an issue on 32bit systems which are probably a very
small percent of the users.  Still, it's worth fixing just for
correctness sake.

Fixes: 42e9a92fe6 ("[SCSI] libfc: A modular Fibre Channel library")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Message-Id: <aNvPMet7TPtM9CY1@stanley.mountain>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-06 22:27:28 -04:00
Alok Tiwari 987da233b2 scsi: qla4xxx: Fix typos in comments
Fix several spelling mistakes in qla4xxx driver comments:

 "Unfortunely" -> "Unfortunately"
 "becase" -> "because"
 "funcions" -> "functions"
 "targer_id" -> "target_id"

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-06 22:26:16 -04:00
Long Li b69ffeaa0a scsi: storvsc: Prefer returning channel with the same CPU as on the I/O issuing CPU
When selecting an outgoing channel for I/O, storvsc tries to select a
channel with a returning CPU that is not the same as issuing CPU. This
worked well in the past, however it doesn't work well when the Hyper-V
exposes a large number of channels (up to the number of all CPUs). Use a
different CPU for returning channel is not efficient on Hyper-V.

Change this behavior by preferring to the channel with the same CPU as
the current I/O issuing CPU whenever possible.

Tests have shown improvements in newer Hyper-V/Azure environment, and no
regression with older Hyper-V/Azure environments.

Tested-by: Raheel Abdul Faizy <rabdulfaizy@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Message-Id: <1759381530-7414-1-git-send-email-longli@linux.microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-06 22:04:57 -04:00
Linus Torvalds 2f2c725493 pci-v6.18-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmjgOAkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxzlA//QxoUF4p1cN7+rPwuzCPNi2ZmKNyU
 T7mLfUciV/t8nPLPFdtxdttHB3F+BsA/E9WYFiUUGBzvdYafnoZ/Qnio1WdMIIYz
 0eVrTpnMUMBXrUwGFnnIER3b4GCJb2WR3RPfaBrbqQRHoAlDmv/ijh7rIKhgWIeR
 NsCmPiFnsxPjgVusn2jXWLheUHEbZh2dVTk9lceQXFRdrUELC9wH7zigAA6GviGO
 ssPC1pKfg5DrtuuM6k9JCcEYibQIlynxZ8sbT6YfQ2bs1uSEd2pEcr7AORb4l2yQ
 rcirHwGTpvZ/QvzKpDY8FcuzPFRP7QPd+34zMEQ2OW04y1k61iKE/4EE2Z9w/OoW
 esFQXbevy9P5JHu6DBcaJ2uwvnLiVesry+9CmkKCc6Dxyjbcbgeta1LR5dhn1Rv0
 dMtRnkd/pxzIF5cRnu+WlOFV2aAw2gKL9pGuimH5TO4xL2qCZKak0hh8PAjUN2c/
 12GAlrwAyBK1FeY2ZflTN7Vr8o2O0I6I6NeaF3sCW1VO2e6E9/bAIhrduUO4lhGq
 BHTVRBefFRtbFVaxTlUAj+lSCyqES3Wzm8y/uLQvT6M3opunTziSDff1aWbm1Y2t
 aASl1IByuKsGID8VrT5khHeBKSWtnd/v7LLUjCeq+g6eKdfN2arInPvw5X1NpVMj
 tzzBYqwHgBoA4u8=
 =BUw/
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that
     take config space accessor functions.

     Implement pci_find_capability(), pci_find_ext_capability(), and
     dwc, dwc endpoint, and cadence capability search interfaces with
     them (Hans Zhang)

   - Leave parent unit address 0 in 'interrupt-map' so that when we
     build devicetree nodes to describe PCI functions that contain
     multiple peripherals, we can build this property even when
     interrupt controllers lack 'reg' properties (Lorenzo Pieralisi)

   - Add a Xeon 6 quirk to disable Extended Tags and limit Max Read
     Request Size to 128B to avoid a performance issue (Ilpo Järvinen)

   - Add sysfs 'serial_number' file to expose the Device Serial Number
     (Matthew Wood)

   - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)

  Resource management:

   - Align m68k pcibios_enable_device() with other arches (Ilpo
     Järvinen)

   - Remove sparc pcibios_enable_device() implementations that don't do
     anything beyond what pci_enable_resources() does (Ilpo Järvinen)

   - Remove mips pcibios_enable_resources() and use
     pci_enable_resources() instead (Ilpo Järvinen)

   - Clean up bridge window sizing and assignment (Ilpo Järvinen),
     including:

       - Leave non-claimed bridge windows disabled

       - Enable bridges even if a window wasn't assigned because not all
         windows are required by downstream devices

       - Preserve bridge window type when releasing the resource, since
         the type is needed for reassignment

       - Consolidate selection of bridge windows into two new
         interfaces, pbus_select_window() and
         pbus_select_window_for_type(), so this is done consistently

       - Compute bridge window start and end earlier to avoid logging
         stale information

  MSI:

   - Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
     Vives)

  Error handling:

   - Align AER with EEH by allowing drivers to request a Bus Reset on
     Non-Fatal Errors (in addition to the reset on Fatal Errors that we
     already do) (Lukas Wunner)

   - If error recovery fails, emit FAILED_RECOVERY uevents for the
     devices, not for the bridge leading to them.

     This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner)

   - Align AER with EEH by calling err_handler.error_detected()
     callbacks to notify drivers if error recovery fails (Lukas Wunner)

   - Align AER with EEH by restoring device error_state to
     pci_channel_io_normal before the err_handler.slot_reset() callback.

     This is earlier than before the err_handler.resume() callback
     (Lukas Wunner)

   - Emit a BEGIN_RECOVERY uevent when driver's
     err_handler.error_detected() requests a reset, as well as when it
     says recovery is complete or can be done without a reset (Niklas
     Schnelle)

   - Align s390 with AER and EEH by emitting uevents during error
     recovery (Niklas Schnelle)

   - Align EEH with AER and s390 by emitting BEGIN_RECOVERY,
     SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the
     result of err_handler.error_detected() (Niklas Schnelle)

   - Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES
     error information identifies a device without an AER Capability
     (Breno Leitao)

   - Update error decoding and TLP Log printing for new errors in
     current PCIe base spec (Lukas Wunner)

   - Update error recovery documentation to match the current code
     and use consistent nomenclature (Lukas Wunner)

  ASPM:

   - Enable all ClockPM and ASPM states for devicetree platforms, since
     there's typically no firmware that enables ASPM

     This is a risky change that may uncover hardware or configuration
     defects at boot-time rather than when users enable ASPM via sysfs
     later. Booting with "pcie_aspm=off" prevents this enabling
     (Manivannan Sadhasivam)

   - Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)

  Power management:

   - If a device has already been disconnected, e.g., by a hotplug
     removal, don't bother trying to resume it to D0 when detaching the
     driver.

     This avoids annoying "Unable to change power state from D3cold to
     D0" messages (Mario Limonciello)

   - Ensure devices are powered up before config reads for
     'max_link_width', 'current_link_speed', 'current_link_width',
     'secondary_bus_number', and 'subordinate_bus_number' sysfs files.

     This prevents using invalid data (~0) in drivers or lspci and,
     depending on how the PCIe controller reports errors, may avoid
     error interrupts or crashes (Brian Norris)

  Virtualization:

   - Add rescan/remove locking when enabling/disabling SR-IOV, which
     avoids list corruption on s390, where disabling SR-IOV also
     generates hotplug events (Niklas Schnelle)

  Peer-to-peer DMA:

   - Free struct p2p_pgmap, not a member within it, in the
     pci_p2pdma_add_resource() error path (Sungho Kim)

  Endpoint framework:

   - Document sysfs interface for BAR assignment of vNTB endpoint
     functions (Jerome Brunet)

   - Fix array underflow in endpoint BAR test case (Dan Carpenter)

   - Skip endpoint IRQ test if the IRQ is out of range to avoid false
     errors (Christian Bruel)

   - Fix endpoint test case for controllers with fixed-size BARs smaller
     than requested by the test (Marek Vasut)

   - Restore inbound translation when disabling doorbell so the endpoint
     doorbell test case can be run more than once (Niklas Cassel)

   - Avoid a NULL pointer dereference when releasing DMA channels in
     endpoint DMA test case (Shin'ichiro Kawasaki)

   - Convert tegra194 interrupt number to MSI vector to fix endpoint
     Kselftest MSI_TEST test case (Niklas Cassel)

   - Reset tegra194 BARs when running in endpoint mode so the BAR tests
     don't overwrite the ATU settings in BAR4 (Niklas Cassel)

   - Handle errors in tegra194 BPMP transactions so we don't mistakenly
     skip future PERST# assertion (Vidya Sagar)

  AMD MDB PCIe controller driver:

   - Update DT binding example to separate PERST# to a Root Port stanza
     to make multiple Root Ports possible in the future (Sai Krishna
     Musham)

   - Add driver support for PERST# being described in a Root Port
     stanza, falling back to the host bridge if not found there (Sai
     Krishna Musham)

  Freescale i.MX6 PCIe controller driver:

   - Enable the 3.3V Vaux supply if available so devices can request
     wakeup with either Beacon or WAKE# (Richard Zhu)

  MediaTek PCIe Gen3 controller driver:

   - Add optional sys clock ready time setting to avoid sys_clk_rdy
     signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)

   - Add DT binding and driver support for MT6991 and MT8196
     (AngeloGioacchino Del Regno)

  NVIDIA Tegra PCIe controller driver:

   - When asserting PERST#, disable the controller instead of mistakenly
     disabling the PLL twice (Nagarjuna Kristam)

   - Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  Qualcomm PCIe controller driver:

   - Select PCI Power Control Slot driver so slot voltage rails can be
     turned on/off if described in Root Port devicetree node (Qiang Yu)

   - Parse only PCI bridge child nodes in devicetree, skipping unrelated
     nodes such as OPP (Operating Performance Points), which caused
     probe failures (Krishna Chaitanya Chundru)

   - Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang)

   - Consolidate Root Port 'phy' and 'reset' properties in struct
     qcom_pcie_port, regardless of whether we got them from the Root
     Port node or the host bridge node (Manivannan Sadhasivam)

   - Fetch and map the ELBI register space in the DWC core rather than
     in each driver individually (Krishna Chaitanya Chundru)

   - Enable ECAM mechanism in DWC core by setting up iATU with 'CFG
     Shift Feature' and use this in the qcom driver (Krishna Chaitanya
     Chundru)

   - Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
     Chundru)

   - Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
     Glymur, which is compatible with X1E80100 but doesn't have the
     cnoc_sf_axi clock (Qiang Yu)

  Renesas R-Car PCIe controller driver:

   - Fix a typo that prevented correct PHY initialization (Marek Vasut)

   - Add a missing 1ms delay after PWR reset assertion as required by
     the V4H manual (Marek Vasut)

   - Assure reset has completed before DBI access to avoid SError (Marek
     Vasut)

   - Fix inverted PHY initialization check, which sometimes led to
     timeouts and failure to start the controller (Marek Vasut)

   - Pass the correct IRQ domain to generic_handle_domain_irq() to fix a
     regression when converting to msi_create_parent_irq_domain()
     (Claudiu Beznea)

   - Drop the spinlock protecting the PMSR register - it's no longer
     required since pci_lock already serializes accesses (Marek Vasut)

   - Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  SOPHGO PCIe controller driver:

   - Check for existence of struct cdns_pcie.ops before using it to
     allow Cadence drivers that don't need to supply ops (Chen Wang)

   - Add DT binding and driver for the SOPHGO SG2042 PCIe controller
     (Chen Wang)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Update pinctrl documentation of initial states and use in runtime
     suspend/resume (Christian Bruel)

   - Add pinctrl_pm_select_init_state() for use by stm32 driver, which
     needs it during resume (Christian Bruel)

   - Add devicetree bindings and drivers for the STMicroelectronics
     STM32MP25 in host and endpoint modes (Christian Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Add support for x16 in devicetree 'num-lanes' property (Konrad
     Dybcio)

   - Verify that if DT specifies a single IRQ for all eDMA channels, it
     is named 'dma' (Niklas Cassel)

  TI J721E PCIe driver:

   - Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth
     Vadapalli)

   - Power controller off before configuring the glue layer so the
     controller latches the correct values on power-on (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver
     exits with error (Siddharth Vadapalli)

   - Add Peripheral Virtualization Unit (PVU), which restricts DMA from
     PCIe devices to specific regions of host memory, to the ti,am65
     binding (Jan Kiszka)

  Xilinx NWL PCIe controller driver:

   - Clear bootloader E_ECAM_CONTROL before merging in the new driver
     value to avoid writing invalid values (Jani Nurminen)"

* tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits)
  PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
  MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers
  PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25
  dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings
  PCI: stm32: Add PCIe host support for STM32MP25
  PCI: xilinx-nwl: Fix ECAM programming
  PCI: j721e: Fix incorrect error message in probe()
  PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
  dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
  PCI: dwc: Support 16-lane operation
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
  PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock
  PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0()
  PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock
  PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  PCI: rcar-gen4: Assure reset occurs before DBI access
  PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
  PCI: Set up bridge resources earlier
  PCI: rcar-host: Drop PMSR spinlock
  ...
2025-10-06 10:41:03 -07:00
Linus Torvalds 674b0ddb75 SCSI misc on 20251002
Usual driver updates (ufs, mpi3mr, lpfc, pm80xx, mpt3sas) plus
 assorted cleanups and fixes.  The only core update is to sd.c and is
 mostly cosmetic.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaN7vMhsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMSwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy
 c2hpcC5jb20ACgkQ50LJTO6YrIXdXgD9HsCcnqb5KzHs0EMzy752KjHoHeS5Tg59
 CkTZcpTFAfgBAJJIHo5AripQ0cqPqTzVEkhWbd9q277A3HLgu3NFcsWG
 =seOt
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Usual driver updates (ufs, mpi3mr, lpfc, pm80xx, mpt3sas) plus
  assorted cleanups and fixes.

  The only core update is to sd.c and is mostly cosmetic"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (105 commits)
  scsi: MAINTAINERS: Update FC element owners
  scsi: mpt3sas: Update driver version to 54.100.00.00
  scsi: mpt3sas: Add support for 22.5 Gbps SAS link rate
  scsi: mpt3sas: Suppress unnecessary IOCLogInfo on CONFIG_INVALID_PAGE
  scsi: mpt3sas: Fix crash in transport port remove by using ioc_info()
  scsi: ufs: ufs-qcom: Add support for limiting HS gear and rate
  scsi: ufs: pltfrm: Add DT support to limit HS gear and gear rate
  scsi: ufs: ufs-qcom: Remove redundant re-assignment to hs_rate
  scsi: ufs: dt-bindings: Document gear and rate limit properties
  scsi: ufs: core: Fix data race in CPU latency PM QoS request handling
  scsi: libfc: Fix potential buffer overflow in fc_ct_ms_fill()
  scsi: storvsc: Remove redundant ternary operators
  scsi: ufs: core: Change MCQ interrupt enable flow
  scsi: smartpqi: Replace kmalloc() + copy_from_user() with memdup_user()
  scsi: hpsa: Replace kmalloc() + copy_from_user() with memdup_user()
  scsi: hpsa: Fix potential memory leak in hpsa_big_passthru_ioctl()
  scsi: lpfc: Copyright updates for 14.4.0.11 patches
  scsi: lpfc: Update lpfc version to 14.4.0.11
  scsi: lpfc: Convert debugfs directory counts from atomic to unsigned int
  scsi: lpfc: Clean up extraneous phba dentries
  ...
2025-10-03 19:17:48 -07:00
Linus Torvalds 8804d970fa Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from
   Kairui Song improves performance and reduces the failure rate of swap
   cluster allocation.
 
 - The 4 patch series "support large align and nid in Rust allocators"
   from Vitaly Wool permits Rust allocators to set NUMA node and large
   alignment when perforning slub and vmalloc reallocs.
 
 - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from
   Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets
   for virtual address spaces for ops-level DAMOS filters.
 
 - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock"
   from Suren Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps.
 
 - The 2 patch series "mm/mincore: minor clean up for swap cache
   checking" from Kairui Song performs some cleanup in the swap code.
 
 - The 11 patch series "mm: vm_normal_page*() improvements" from David
   Hildenbrand provides code cleanup in the pagemap code.
 
 - The 5 patch series "add persistent huge zero folio support" from
   Pankaj Raghav provides a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero.
 
 - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a
   few touchups to the recently added Kexec Handover feature.
 
 - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all
   arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap.  To
   end the constant struggle with space shortage on 32-bit conflicting with
   64-bit's needs.
 
 - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li
   cleans up some swap code.
 
 - The 7 patch series "selftests/mm: Fix false positives and skip
   unsupported tests" from Donet Tom fixes a few things in our selftests
   code.
 
 - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide
   THPs when advised" from David Hildenbrand "allows individual processes
   to opt-out of THP=always into THP=madvise, without affecting other
   workloads on the system".
 
   It's a long story - the [1/N] changelog spells out the considerations.
 
 - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox
   gets us started on the memdesc project.  Please see
   https://kernelnewbies.org/MatthewWilcox/Memdescs and
   https://blogs.oracle.com/linux/post/introducing-memdesc.
 
 - The 3 patch series "Tiny optimization for large read operations" from
   Chi Zhiling improves the efficiency of the pagecache read path.
 
 - The 5 patch series "Better split_huge_page_test result check" from Zi
   Yan improves our folio splitting selftest code.
 
 - The 2 patch series "test that rmap behaves as expected" from Wei Yang
   adds some rmap selftests.
 
 - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig
   removes that function and converts its two remaining callers.
 
 - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain
   fixes some UFFD selftests issues.
 
 - The 3 patch series "introduce kernel file mapped folios" from Boris
   Burkov introduces the concept of "kernel file pages".  Using these
   permits btrfs to account its metadata pages to the root cgroup, rather
   than to the cgroups of random inappropriate tasks.
 
 - The 2 patch series "mm/pageblock: improve readability of some
   pageblock handling" from Wei Yang provides some readability improvements
   to the page allocator code.
 
 - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae
   Park teaches DAMON to understand arm32 highmem.
 
 - The 4 patch series "tools: testing: Use existing atomic.h for
   vma/maple tests" from Brendan Jackman performs some code cleanups and
   deduplication under tools/testing/.
 
 - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from
   Liam Howlett fixes a couple of 32-bit issues in
   tools/testing/radix-tree.c.
 
 - The 2 patch series "kasan: unify kasan_enabled() and remove
   arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN
   arch-specific initialization code into a common arch-neutral
   implementation.
 
 - The 3 patch series "mm: remove zpool" from Johannes Weiner removes
   zspool - an indirection layer which now only redirects to a single thing
   (zsmalloc).
 
 - The 2 patch series "mm: task_stack: Stack handling cleanups" from
   Pasha Tatashin makes a couple of cleanups in the fork code.
 
 - The 37 patch series "mm: remove nth_page()" from David Hildenbrand
   makes rather a lot of adjustments at various nth_page() callsites,
   eventually permitting the removal of that undesirable helper function.
 
 - The 2 patch series "introduce kasan.write_only option in hw-tags" from
   Yeoreum Yun creates a KASAN read-only mode for ARM, using that
   architecture's memory tagging feature.  It is felt that a read-only mode
   KASAN is suitable for use in production systems rather than debug-only.
 
 - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation"
   from Kefeng Wang does some tidying in the hugetlb folio allocation code.
 
 - The 12 patch series "mm: establish const-correctness for pointer
   parameters" from Max Kellermann makes quite a number of the MM API
   functions more accurate about the constness of their arguments.  This
   was getting in the way of subsystems (in this case CEPH) when they
   attempt to improving their own const/non-const accuracy.
 
 - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola
   fixes a number of code sites which were confused over when to use
   free_pages() vs __free_pages().
 
 - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice
   Ryhl makes the mapletree code accessible to Rust.  Required by nouveau
   and by its forthcoming successor: the new Rust Nova driver.
 
 - The 2 patch series "selftests/mm: split_huge_page_test:
   split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and
   some cleanups to the thp selftesting code.
 
 - The 14 patch series "mm, swap: introduce swap table as swap cache
   (phase I)" from Chris Li and Kairui Song is the first step along the
   path to implementing "swap tables" - a new approach to swap allocation
   and state tracking which is expected to yield speed and space
   improvements.  This patchset itself yields a 5-20% performance benefit
   in some situations.
 
 - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes
   the new memdesc layer to clean up the ptdesc code a little.
 
 - The 3 patch series "Fix va_high_addr_switch.sh test failure" from
   Chunyu Hu fixes some issues in our 5-level pagetable selftesting code.
 
 - The 2 patch series "Minor fixes for memory allocation profiling" from
   Suren Baghdasaryan addresses a couple of minor issues in relatively new
   memory allocation profiling feature.
 
 - The 3 patch series "Small cleanups" from Matthew Wilcox has a few
   cleanups in preparation for more memdesc work.
 
 - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and
   DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in
   furtherance of supporting arm highmem.
 
 - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix
   warnings" from Muhammad Anjum adds that compiler check to selftests code
   and fixes the fallout, by removing dead code.
 
 - The 10 patch series "Improvements to Victim Process Thawing and OOM
   Reaper Traversal Order" from zhongjinji makes a number of improvements
   in the OOM killer: mainly thawing a more appropriate group of victim
   threads so they can release resources.
 
 - The 5 patch series "mm/damon: misc fixups and improvements for 6.18"
   from SeongJae Park is a bunch of small and unrelated fixups for DAMON.
 
 - The 7 patch series "mm/damon: define and use DAMON initialization
   check function" from SeongJae Park implement reliability and
   maintainability improvements to a recently-added bug fix.
 
 - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and
   non-idle ages" from SeongJae Park provides additional transparency to
   userspace clients of the DAMON_STAT information.
 
 - The 2 patch series "Expand scope of khugepaged anonymous collapse"
   from Dev Jain removes some constraints on khubepaged's collapsing of
   anon VMAs.  It also increases the success rate of MADV_COLLAPSE against
   an anon vma.
 
 - The 2 patch series "mm: do not assume file == vma->vm_file in
   compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards
   removal of file_operations.mmap().  This patchset concentrates upon
   clearing up the treatment of stacked filesystems.
 
 - The 6 patch series "mm: Improve mlock tracking for large folios" from
   Kiryl Shutsemau provides some fixes and improvements to mlock's tracking
   of large folios.  /proc/meminfo's "Mlocked" field became more accurate.
 
 - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters
   during fork" from Donet Tom fixes several user-visible KSM stats
   inaccuracies across forks and adds selftest code to verify these
   counters.
 
 - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei
   Yang addresses some potential but presently benign issues in KSM's
   mm_slot handling.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA
 jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1
 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA=
 =4mhY
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "mm, swap: improve cluster scan strategy" from Kairui Song improves
   performance and reduces the failure rate of swap cluster allocation

 - "support large align and nid in Rust allocators" from Vitaly Wool
   permits Rust allocators to set NUMA node and large alignment when
   perforning slub and vmalloc reallocs

 - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
   DAMOS_STAT's handling of the DAMON operations sets for virtual
   address spaces for ops-level DAMOS filters

 - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
   Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps

 - "mm/mincore: minor clean up for swap cache checking" from Kairui Song
   performs some cleanup in the swap code

 - "mm: vm_normal_page*() improvements" from David Hildenbrand provides
   code cleanup in the pagemap code

 - "add persistent huge zero folio support" from Pankaj Raghav provides
   a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero

 - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
   the recently added Kexec Handover feature

 - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
   Stoakes turns mm_struct.flags into a bitmap. To end the constant
   struggle with space shortage on 32-bit conflicting with 64-bit's
   needs

 - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
   code

 - "selftests/mm: Fix false positives and skip unsupported tests" from
   Donet Tom fixes a few things in our selftests code

 - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
   from David Hildenbrand "allows individual processes to opt-out of
   THP=always into THP=madvise, without affecting other workloads on the
   system".

   It's a long story - the [1/N] changelog spells out the considerations

 - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
   the memdesc project. Please see

      https://kernelnewbies.org/MatthewWilcox/Memdescs and
      https://blogs.oracle.com/linux/post/introducing-memdesc

 - "Tiny optimization for large read operations" from Chi Zhiling
   improves the efficiency of the pagecache read path

 - "Better split_huge_page_test result check" from Zi Yan improves our
   folio splitting selftest code

 - "test that rmap behaves as expected" from Wei Yang adds some rmap
   selftests

 - "remove write_cache_pages()" from Christoph Hellwig removes that
   function and converts its two remaining callers

 - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
   selftests issues

 - "introduce kernel file mapped folios" from Boris Burkov introduces
   the concept of "kernel file pages". Using these permits btrfs to
   account its metadata pages to the root cgroup, rather than to the
   cgroups of random inappropriate tasks

 - "mm/pageblock: improve readability of some pageblock handling" from
   Wei Yang provides some readability improvements to the page allocator
   code

 - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
   to understand arm32 highmem

 - "tools: testing: Use existing atomic.h for vma/maple tests" from
   Brendan Jackman performs some code cleanups and deduplication under
   tools/testing/

 - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
   a couple of 32-bit issues in tools/testing/radix-tree.c

 - "kasan: unify kasan_enabled() and remove arch-specific
   implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
   initialization code into a common arch-neutral implementation

 - "mm: remove zpool" from Johannes Weiner removes zspool - an
   indirection layer which now only redirects to a single thing
   (zsmalloc)

 - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
   couple of cleanups in the fork code

 - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
   adjustments at various nth_page() callsites, eventually permitting
   the removal of that undesirable helper function

 - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
   creates a KASAN read-only mode for ARM, using that architecture's
   memory tagging feature. It is felt that a read-only mode KASAN is
   suitable for use in production systems rather than debug-only

 - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
   some tidying in the hugetlb folio allocation code

 - "mm: establish const-correctness for pointer parameters" from Max
   Kellermann makes quite a number of the MM API functions more accurate
   about the constness of their arguments. This was getting in the way
   of subsystems (in this case CEPH) when they attempt to improving
   their own const/non-const accuracy

 - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
   code sites which were confused over when to use free_pages() vs
   __free_pages()

 - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
   mapletree code accessible to Rust. Required by nouveau and by its
   forthcoming successor: the new Rust Nova driver

 - "selftests/mm: split_huge_page_test: split_pte_mapped_thp
   improvements" from David Hildenbrand adds a fix and some cleanups to
   the thp selftesting code

 - "mm, swap: introduce swap table as swap cache (phase I)" from Chris
   Li and Kairui Song is the first step along the path to implementing
   "swap tables" - a new approach to swap allocation and state tracking
   which is expected to yield speed and space improvements. This
   patchset itself yields a 5-20% performance benefit in some situations

 - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
   layer to clean up the ptdesc code a little

 - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
   issues in our 5-level pagetable selftesting code

 - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
   addresses a couple of minor issues in relatively new memory
   allocation profiling feature

 - "Small cleanups" from Matthew Wilcox has a few cleanups in
   preparation for more memdesc work

 - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
   Quanmin Yan makes some changes to DAMON in furtherance of supporting
   arm highmem

 - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
   Anjum adds that compiler check to selftests code and fixes the
   fallout, by removing dead code

 - "Improvements to Victim Process Thawing and OOM Reaper Traversal
   Order" from zhongjinji makes a number of improvements in the OOM
   killer: mainly thawing a more appropriate group of victim threads so
   they can release resources

 - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
   is a bunch of small and unrelated fixups for DAMON

 - "mm/damon: define and use DAMON initialization check function" from
   SeongJae Park implement reliability and maintainability improvements
   to a recently-added bug fix

 - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
   SeongJae Park provides additional transparency to userspace clients
   of the DAMON_STAT information

 - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
   some constraints on khubepaged's collapsing of anon VMAs. It also
   increases the success rate of MADV_COLLAPSE against an anon vma

 - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
   from Lorenzo Stoakes moves us further towards removal of
   file_operations.mmap(). This patchset concentrates upon clearing up
   the treatment of stacked filesystems

 - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
   provides some fixes and improvements to mlock's tracking of large
   folios. /proc/meminfo's "Mlocked" field became more accurate

 - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
   Donet Tom fixes several user-visible KSM stats inaccuracies across
   forks and adds selftest code to verify these counters

 - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
   some potential but presently benign issues in KSM's mm_slot handling

* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
  mm: swap: check for stable address space before operating on the VMA
  mm: convert folio_page() back to a macro
  mm/khugepaged: use start_addr/addr for improved readability
  hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
  alloc_tag: fix boot failure due to NULL pointer dereference
  mm: silence data-race in update_hiwater_rss
  mm/memory-failure: don't select MEMORY_ISOLATION
  mm/khugepaged: remove definition of struct khugepaged_mm_slot
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  hugetlb: increase number of reserving hugepages via cmdline
  selftests/mm: add fork inheritance test for ksm_merging_pages counter
  mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
  drivers/base/node: fix double free in register_one_node()
  mm: remove PMD alignment constraint in execmem_vmalloc()
  mm/memory_hotplug: fix typo 'esecially' -> 'especially'
  mm/rmap: improve mlock tracking for large folios
  mm/filemap: map entire large folio faultaround
  mm/fault: try to map the entire file folio in finish_fault()
  mm/rmap: mlock large folios in try_to_unmap_one()
  mm/rmap: fix a mlock race condition in folio_referenced_one()
  ...
2025-10-02 18:18:33 -07:00
Linus Torvalds e1b1d03cee for-6.18/block-20250929
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmjbLCgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoY0D/9J+11BC88pBxCrLKv/V2TwCNokRMi0dU3L
 r3EUdA46k0oXmvb6ueZqIcfY2e+IX7rdQkaRbh1zRdsNejqHo4548C3ePWGdBAcM
 OdNEGfpehO0aD0td1+mK/NxoJMLhbs5QraPanz+SOkGZOKeF+vGCga5PUDivsr5J
 16T9yb7i+isENLdAc2RJbZVyAphqHQlo5GHi5ZIKOVi5cNt8GU/q2sQl7NYmGvHd
 aq37svvZHFOhLRajP959Fw9WOxEYITewzQ4UYf1FZjUodJUxO+vCnP0ooBQRlyu8
 1B4PYWwSE+Vn3GkQE0Om+mzo9AVPOiLmoAWGxdgJBMyEkZndocr46XEslXOufQ1Z
 T3Gu19G6jCxcyByNVhjVnaajYKmvSQAy1w75m4XlfqTRm4f9Om+LAJavUk3RuaOL
 7lXKQ7Ql1/Tby9Jmf8afjYYXXotNDNku6rz2P3qtOwAA26mNJfgVt0rO+8XGRDe9
 ioLbCkTjslYMc/Oh4jSsbrspsVALbaQMq/Dmah8k0EWb4QAHVgCJyGBoff3hOboI
 jD6B1enaKOQVgcjWcjm/FjOk3jv2h3v4X26YWQZTvEc/1PnSnST78Zi/ePhzDdmt
 sBALUAS37TfTgNMzrhbHl5Zs13k0C0XyANuayuKuo5hlNnC1wbdap+5FZJOmpuOB
 YT+VkYnaOA==
 =kOmc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - NVMe pull request via Keith:
     - FC target fixes (Daniel)
     - Authentication fixes and updates (Martin, Chris)
     - Admin controller handling (Kamaljit)
     - Target lockdep assertions (Max)
     - Keep-alive updates for discovery (Alastair)
     - Suspend quirk (Georg)

 - MD pull request via Yu:
     - Add support for a lockless bitmap.

       A key feature for the new bitmap are that the IO fastpath is
       lockless. If a user issues lots of write IO to the same bitmap
       bit in a short time, only the first write has additional overhead
       to update bitmap bit, no additional overhead for the following
       writes.

       By supporting only resync or recover written data, means in the
       case creating new array or replacing with a new disk, there is no
       need to do a full disk resync/recovery.

 - Switch ->getgeo() and ->bios_param() to using struct gendisk rather
   than struct block_device.

 - Rust block changes via Andreas. This series adds configuration via
   configfs and remote completion to the rnull driver. The series also
   includes a set of changes to the rust block device driver API: a few
   cleanup patches, and a few features supporting the rnull changes.

   The series removes the raw buffer formatting logic from
   `kernel::block` and improves the logic available in `kernel::string`
   to support the same use as the removed logic.

 - floppy arch cleanups

 - Reduce the number of dereferencing needed for ublk commands

 - Restrict supported sockets for nbd. Mostly done to eliminate a class
   of issues perpetually reported by syzbot, by using nonsensical socket
   setups.

 - A few s390 dasd block fixes

 - Fix a few issues around atomic writes

 - Improve DMA interation for integrity requests

 - Improve how iovecs are treated with regards to O_DIRECT aligment
   constraints.

   We used to require each segment to adhere to the constraints, now
   only the request as a whole needs to.

 - Clean up and improve p2p support, enabling use of p2p for metadata
   payloads

 - Improve locking of request lookup, using SRCU where appropriate

 - Use page references properly for brd, avoiding very long RCU sections

 - Fix ordering of recursively submitted IOs

 - Clean up and improve updating nr_requests for a live device

 - Various fixes and cleanups

* tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits)
  s390/dasd: enforce dma_alignment to ensure proper buffer validation
  s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request
  ublk: remove redundant zone op check in ublk_setup_iod()
  nvme: Use non zero KATO for persistent discovery connections
  nvmet: add safety check for subsys lock
  nvme-core: use nvme_is_io_ctrl() for I/O controller check
  nvme-core: do ioccsz/iorcsz validation only for I/O controllers
  nvme-core: add method to check for an I/O controller
  blk-cgroup: fix possible deadlock while configuring policy
  blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path
  blk-mq: Fix more tag iteration function documentation
  selftests: ublk: fix behavior when fio is not installed
  ublk: don't access ublk_queue in ublk_unmap_io()
  ublk: pass ublk_io to __ublk_complete_rq()
  ublk: don't access ublk_queue in ublk_need_complete_req()
  ublk: don't access ublk_queue in ublk_check_commit_and_fetch()
  ublk: don't pass ublk_queue to ublk_fetch()
  ublk: don't access ublk_queue in ublk_config_io_buf()
  ublk: don't access ublk_queue in ublk_check_fetch_buf()
  ublk: pass q_id and tag to __ublk_check_and_get_req()
  ...
2025-10-02 10:16:56 -07:00
Mukesh Rathor 94b04355e6 Drivers: hv: Add CONFIG_HYPERV_VMBUS option
At present VMBus driver is hinged off of CONFIG_HYPERV which entails
lot of builtin code and encompasses too much. It's not always clear
what depends on builtin hv code and what depends on VMBus. Setting
CONFIG_HYPERV as a module and fudging the Makefile to switch to builtin
adds even more confusion. VMBus is an independent module and should have
its own config option. Also, there are scenarios like baremetal dom0/root
where support is built in with CONFIG_HYPERV but without VMBus. Lastly,
there are more features coming down that use CONFIG_HYPERV and add more
dependencies on it.

So, create a fine grained HYPERV_VMBUS option and update Kconfigs for
dependency on VMBus.

Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# drivers/pci
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-10-01 00:00:42 +00:00
Duoming Zhou 60cd16a3b7 scsi: mvsas: Fix use-after-free bugs in mvs_work_queue
During the detaching of Marvell's SAS/SATA controller, the original code
calls cancel_delayed_work() in mvs_free() to cancel the delayed work
item mwq->work_q. However, if mwq->work_q is already running, the
cancel_delayed_work() may fail to cancel it. This can lead to
use-after-free scenarios where mvs_free() frees the mvs_info while
mvs_work_queue() is still executing and attempts to access the
already-freed mvs_info.

A typical race condition is illustrated below:

CPU 0 (remove)            | CPU 1 (delayed work callback)
mvs_pci_remove()          |
  mvs_free()              | mvs_work_queue()
    cancel_delayed_work() |
      kfree(mvi)          |
                          |   mvi-> // UAF

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled and any executing
delayed work item completes before the mvs_info is deallocated.

This bug was found by static analysis.

Fixes: 20b09c2992 ("[SCSI] mvsas: add support for 94xx; layout change; bug fixes")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-29 17:26:20 -04:00
John Meneghini 285654d58a Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue"
This reverts commit 6f4b10226b.

We've been testing this patch and it turns out there is a significant
bug here. This leaks memory and causes a driver hang.

Link: https://lore.kernel.org/linux-scsi/yq1zfajqpec.fsf@ca-mkp.ca.oracle.com/
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-29 16:35:30 -04:00
Martin K. Petersen 408445e9c1 Merge patch series "mpt3sas: Few Enhancements and minor fixes"
Ranjan Kumar <ranjan.kumar@broadcom.com> says:

Few Enhancements and minor fixes of mpt3sas driver.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 22:35:14 -04:00
Ranjan Kumar 7e5a43897a scsi: mpt3sas: Update driver version to 54.100.00.00
Updated driver version to 54.100.00.00.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Message-Id: <20250922095113.281484-5-ranjan.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 22:34:43 -04:00
Ranjan Kumar 4be7599d6b scsi: mpt3sas: Add support for 22.5 Gbps SAS link rate
Add handling for MPI26_SAS_NEG_LINK_RATE_22_5 in
_transport_convert_phy_link_rate(). This maps the new 22.5 Gbps
negotiated rate to SAS_LINK_RATE_22_5_GBPS, to get correct PHY link
speeds.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Message-Id: <20250922095113.281484-4-ranjan.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 22:34:43 -04:00
Ranjan Kumar 27f0b41de1 scsi: mpt3sas: Suppress unnecessary IOCLogInfo on CONFIG_INVALID_PAGE
Avoid unconditional IOCLogInfo prints for CONFIG_INVALID_PAGE.  Log only
if MPT_DEBUG_REPLY is enabled or when loginfo represents other
errors. This reduces uncessary logging without losing useful error
reporting.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 22:34:42 -04:00
Ranjan Kumar 1703fe4f8a scsi: mpt3sas: Fix crash in transport port remove by using ioc_info()
During mpt3sas_transport_port_remove(), messages were logged with
dev_printk() against &mpt3sas_port->port->dev. At this point the SAS
transport device may already be partially unregistered or freed, leading
to a crash when accessing its struct device.

Using ioc_info(), which logs via the PCI device (ioc->pdev->dev),
guaranteed to remain valid until driver removal.

[83428.295776] Oops: general protection fault, probably for non-canonical address 0x6f702f323a33312d: 0000 [#1] SMP NOPTI
[83428.295785] CPU: 145 UID: 0 PID: 113296 Comm: rmmod Kdump: loaded Tainted: G           OE       6.16.0-rc1+ #1 PREEMPT(voluntary)
[83428.295792] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[83428.295795] Hardware name: Dell Inc. Precision 7875 Tower/, BIOS 89.1.67 02/23/2024
[83428.295799] RIP: 0010:__dev_printk+0x1f/0x70
[83428.295805] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 49 89 d1 48 85 f6 74 52 4c 8b 46 50 4d 85 c0 74 1f 48 8b 46 68 48 85 c0 74 22 <48> 8b 08 0f b6 7f 01 48 c7 c2 db e8 42 ad 83 ef 30 e9 7b f8 ff ff
[83428.295813] RSP: 0018:ff85aeafc3137bb0 EFLAGS: 00010206
[83428.295817] RAX: 6f702f323a33312d RBX: ff4290ee81292860 RCX: 5000cca25103be32
[83428.295820] RDX: ff85aeafc3137bb8 RSI: ff4290eeb1966c00 RDI: ffffffffc1560845
[83428.295823] RBP: ff85aeafc3137c18 R08: 74726f702f303a33 R09: ff85aeafc3137bb8
[83428.295826] R10: ff85aeafc3137b18 R11: ff4290f5bd60fe68 R12: ff4290ee81290000
[83428.295830] R13: ff4290ee6e345de0 R14: ff4290ee81290000 R15: ff4290ee6e345e30
[83428.295833] FS:  00007fd9472a6740(0000) GS:ff4290f5ce96b000(0000) knlGS:0000000000000000
[83428.295837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83428.295840] CR2: 00007f242b4db238 CR3: 00000002372b8006 CR4: 0000000000771ef0
[83428.295844] PKRU: 55555554
[83428.295846] Call Trace:
[83428.295848]  <TASK>
[83428.295850]  _dev_printk+0x5c/0x80
[83428.295857]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.295863]  mpt3sas_transport_port_remove+0x1c7/0x420 [mpt3sas]
[83428.295882]  _scsih_remove_device+0x21b/0x280 [mpt3sas]
[83428.295894]  ? _scsih_expander_node_remove+0x108/0x140 [mpt3sas]
[83428.295906]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.295910]  mpt3sas_device_remove_by_sas_address.part.0+0x8f/0x110 [mpt3sas]
[83428.295921]  _scsih_expander_node_remove+0x129/0x140 [mpt3sas]
[83428.295933]  _scsih_expander_node_remove+0x6a/0x140 [mpt3sas]
[83428.295944]  scsih_remove+0x3f0/0x4a0 [mpt3sas]
[83428.295957]  pci_device_remove+0x3b/0xb0
[83428.295962]  device_release_driver_internal+0x193/0x200
[83428.295968]  driver_detach+0x44/0x90
[83428.295971]  bus_remove_driver+0x69/0xf0
[83428.295975]  pci_unregister_driver+0x2a/0xb0
[83428.295979]  _mpt3sas_exit+0x1f/0x300 [mpt3sas]
[83428.295991]  __do_sys_delete_module.constprop.0+0x174/0x310
[83428.295997]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.296000]  ? __x64_sys_getdents64+0x9a/0x110
[83428.296005]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.296009]  ? syscall_trace_enter+0xf6/0x1b0
[83428.296014]  do_syscall_64+0x7b/0x2c0
[83428.296019]  ? srso_alias_return_thunk+0x5/0xfbef5
[83428.296023]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: f92363d123 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 22:34:42 -04:00
Alok Tiwari 072fdd4b0b scsi: libfc: Fix potential buffer overflow in fc_ct_ms_fill()
The fc_ct_ms_fill() helper currently formats the OS name and version
into entry->value using "%s v%s". Since init_utsname()->sysname and
->release are unbounded strings, snprintf() may attempt to write more
than FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN bytes, triggering a
-Wformat-truncation warning with W=1.

In file included from drivers/scsi/libfc/fc_elsct.c:18:
drivers/scsi/libfc/fc_encode.h: In function ‘fc_ct_ms_fill.constprop’:
drivers/scsi/libfc/fc_encode.h:359:30: error: ‘%s’ directive output may
be truncated writing up to 64 bytes into a region of size between 62
and 126 [-Werror=format-truncation=]
  359 |                         "%s v%s",
      |                              ^~
  360 |                         init_utsname()->sysname,
  361 |                         init_utsname()->release);
      |                         ~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/libfc/fc_encode.h:357:17: note: ‘snprintf’ output between
3 and 131 bytes into a destination of size 128
  357 |                 snprintf((char *)&entry->value,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  358 |                         FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN,
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  359 |                         "%s v%s",
      |                         ~~~~~~~~~
  360 |                         init_utsname()->sysname,
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~
  361 |                         init_utsname()->release);
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by using "%.62s v%.62s", which ensures sysname and release are
truncated to fit within the 128-byte field defined by
FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN.

[mkp: clarified commit description]

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 21:57:03 -04:00
Liao Yuanhong 15968590f0 scsi: storvsc: Remove redundant ternary operators
Remove redundant ternary operators to clean up the code.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 21:44:38 -04:00
Thorsten Blum 0ac3c901fb scsi: smartpqi: Replace kmalloc() + copy_from_user() with memdup_user()
Replace kmalloc() followed by copy_from_user() with memdup_user() to
simplify and improve pqi_passthru_ioctl().

Since memdup_user() already allocates memory, use kzalloc() in the else
branch instead of manually zeroing 'kernel_buffer' using memset(0).

Return early if an error occurs.  No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Don Brace <don.brace@microchip.com>
Message-Id: <20250922201832.1697874-2-thorsten.blum@linux.dev>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 21:41:24 -04:00
Thorsten Blum ac01fc418f scsi: hpsa: Replace kmalloc() + copy_from_user() with memdup_user()
Replace kmalloc() followed by copy_from_user() with memdup_user() to
improve and simplify hpsa_passthru_ioctl().

Since memdup_user() already allocates memory, use kzalloc() in the else
branch instead of manually zeroing 'buff' using memset(0).

Return early if an error occurs and remove the 'out_kfree' label.

No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 21:41:24 -04:00
Thorsten Blum b81296591c scsi: hpsa: Fix potential memory leak in hpsa_big_passthru_ioctl()
Replace kmalloc() followed by copy_from_user() with memdup_user() to fix
a memory leak that occurs when copy_from_user(buff[sg_used],,) fails and
the 'cleanup1:' path does not free the memory for 'buff[sg_used]'. Using
memdup_user() avoids this by freeing the memory internally.

Since memdup_user() already allocates memory, use kzalloc() in the else
branch instead of manually zeroing 'buff[sg_used]' using memset(0).

Cc: stable@vger.kernel.org
Fixes: edd163687e ("[SCSI] hpsa: add driver for HP Smart Array controllers.")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-24 21:27:00 -04:00
Nathan Chancellor c7d3dd9163
Merge patch series "Add generated modalias to modules.builtin.modinfo"
Alexey Gladkov says:

The modules.builtin.modinfo file is used by userspace (kmod to be specific) to
get information about builtin modules. Among other information about the module,
information about module aliases is stored. This is very important to determine
that a particular modalias will be handled by a module that is inside the
kernel.

There are several mechanisms for creating modalias for modules:

The first is to explicitly specify the MODULE_ALIAS of the macro. In this case,
the aliases go into the '.modinfo' section of the module if it is compiled
separately or into vmlinux.o if it is builtin into the kernel.

The second is the use of MODULE_DEVICE_TABLE followed by the use of the
modpost utility. In this case, vmlinux.o no longer has this information and
does not get it into modules.builtin.modinfo.

For example:

$ modinfo pci:v00008086d0000A36Dsv00001043sd00008694bc0Csc03i30
modinfo: ERROR: Module pci:v00008086d0000A36Dsv00001043sd00008694bc0Csc03i30 not found.

$ modinfo xhci_pci
name:           xhci_pci
filename:       (builtin)
license:        GPL
file:           drivers/usb/host/xhci-pci
description:    xHCI PCI Host Controller Driver

The builtin module is missing alias "pci:v*d*sv*sd*bc0Csc03i30*" which will be
generated by modpost if the module is built separately.

To fix this it is necessary to add the generated by modpost modalias to
modules.builtin.modinfo. Fortunately modpost already generates .vmlinux.export.c
for exported symbols. It is possible to add `.modinfo` for builtin modules and
modify the build system so that `.modinfo` section is extracted from the
intermediate vmlinux after modpost is executed.

Link: https://patch.msgid.link/cover.1758182101.git.legion@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-09-24 09:10:54 -07:00
Alexey Gladkov b88f88c267
scsi: Always define blogic_pci_tbl structure
The blogic_pci_tbl structure is used by the MODULE_DEVICE_TABLE macro.
There is no longer a need to protect it with the MODULE condition, since
this no longer causes the compiler to warn about an unused variable.

To avoid warnings when -Wunused-const-variable option is used, mark it
as __maybe_unused for such configuration.

Cc: Khalid Aziz <khalid@gonehiking.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://patch.msgid.link/fd8e30de07de79a4923ae967eaee5ba2f2fcef00.1758182101.git.legion@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-09-24 09:10:44 -07:00
David Hildenbrand d66ff3db89 scsi: sg: drop nth_page() usage within SG entry
It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-32-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:08 -07:00
David Hildenbrand 9b6024fa76 scsi: scsi_lib: drop nth_page() usage within SG entry
It's no longer required to use nth_page() when iterating pages within a
single SG entry, so let's drop the nth_page() usage.

Link: https://lkml.kernel.org/r/20250901150359.867252-31-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:08 -07:00
Martin K. Petersen 88e8acffd7 Merge patch series "Update lpfc to revision 14.4.0.11"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.4.0.11

This patch set contains clean up of unused members in various structs,
bug fixes related to discovery and resource allocation, and updates to
handling of debugfs entries.

The patches were cut against Martin's 6.18/scsi-queue tree.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16 22:20:57 -04:00
Justin Tee a28205c2bc scsi: lpfc: Copyright updates for 14.4.0.11 patches
Update copyrights to 2025 for files modified in the 14.4.0.11 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-15-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-09-16 22:20:00 -04:00