Commit Graph

2992 Commits

Author SHA1 Message Date
Andreas Gruenbacher bb504b4d64
lockref: remove count argument of lockref_init
All users of lockref_init() now initialize the count to 1, so hardcode
that and remove the count argument.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-4-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07 10:27:25 +01:00
Andreas Gruenbacher 34ad6fa2ad
gfs2: switch to lockref_init(..., 1)
In qd_alloc(), initialize the lockref count to 1 to cover the common
case.  Compensate for that in gfs2_quota_init() by adjusting the count
back down to 0; this only occurs when mounting the filesystem rw.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-3-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07 10:27:25 +01:00
Andreas Gruenbacher d9b3a3c70d
gfs2: use lockref_init for gl_lockref
Move the initialization of gl_lockref from gfs2_init_glock_once() to
gfs2_glock_get().  This allows to use lockref_init() there.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-2-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07 10:27:25 +01:00
Linus Torvalds d3d90cc289 Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name
 and parent have their stability guaranteed by the callers;
 ->d_revalidate() is the major exception.
 
 It's easy enough for callers to supply stable values for
 expected name and expected parent of the dentry being
 validated.  That kills quite a bit of boilerplate in
 ->d_revalidate() instances, along with a bunch of races
 where they used to access ->d_name without sufficient
 precautions.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ
 6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm
 xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE=
 =TWY8
 -----END PGP SIGNATURE-----

Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs d_revalidate updates from Al Viro:
 "Provide stable parent and name to ->d_revalidate() instances

  Most of the filesystem methods where we care about dentry name and
  parent have their stability guaranteed by the callers;
  ->d_revalidate() is the major exception.

  It's easy enough for callers to supply stable values for expected name
  and expected parent of the dentry being validated. That kills quite a
  bit of boilerplate in ->d_revalidate() instances, along with a bunch
  of races where they used to access ->d_name without sufficient
  precautions"

* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: fix ->rename_sem exclusion
  orangefs_d_revalidate(): use stable parent inode and name passed by caller
  ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  nfs: fix ->d_revalidate() UAF on ->d_name accesses
  nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  gfs2_drevalidate(): use stable parent inode and name passed by caller
  fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  exfat_d_revalidate(): use stable parent inode passed by caller
  fscrypt_d_revalidate(): use stable parent inode passed by caller
  ceph_d_revalidate(): propagate stable name down into request encoding
  ceph_d_revalidate(): use stable parent inode passed by caller
  afs_d_revalidate(): use stable name and parent inode passed by caller
  Pass parent directory inode and expected name to ->d_revalidate()
  generic_ci_d_compare(): use shortname_storage
  ext4 fast_commit: make use of name_snapshot primitives
  dissolve external_name.u into separate members
  make take_dentry_name_snapshot() lockless
  dcache: back inline names with a struct-wrapped array of unsigned long
  make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-30 09:13:35 -08:00
Al Viro eab2a11e5b gfs2_drevalidate(): use stable parent inode and name passed by caller
No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable.  Theoretically a UAF, but it's
hard to exfiltrate the information...

Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro 5be1fa8abd Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There is a couple
of exceptions (overlayfs and, to less extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).

One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but it the next byte might very well be '/' rather than '\0'.  Do not
ignore name->len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Linus Torvalds 1851bccf60 gfs2 changes
- In the quota code, to avoid spurious audit messages, don't call
   capable() when quotas are off.
 
 - When changing the 'j' flag of an inode, truncate the inode address
   space to avoid mixing "buffer head" and "iomap" pages.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmeOWTkUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqMEBAAh5bu0h7611UZTsYkDvGClnxc7oBo
 j/0JCMyeHDvIRaGBFnzFFx1QNXW9ChXy/FgocQED13LPaiZ7kuFjwQPJoW9TH440
 QrWa8xpxiiSz232maP0wQ7Y6K7EW7GW9tvCrqXj64PGb56TWQu+vEcACUtL6sN3V
 +Xli2TljZNDwGsBQygiGvUR2ICfipFNUoV4Yrxv15WdOM1cQmF9F5P7SwFe9mCBh
 tjF/D/vxxmQwR5njalnF0oTFSNlQmYpuaLPzZMdOnsFbwyFda/DncdLbbl9LsmMF
 +C7zzAC3gY8Iq6K4azDE1SI6Gh5JGxA8lcIHtDVbUoFJwHVV/Jg8kWGPRcTuvKs+
 LL8moxut6Id6HmPDmJA2tjjpKYfbnGstNdUIbozNhV5A634AmlbaqA+PwFxDcNs4
 JZdbK4tPSVV7fzodQHZg0vewB+E49yBsCtZ+ows27MzgQFYWKrcngkf/Twn+e5F+
 s59cFi31KzgaLMMCelkDlFwg5Dp8QDYAKZ0UYknU/rVoHFERGCrBn1QwDCk8aMa3
 /IeCQGDu3ry7kpMXGWXIRu4Bmf/k1J89H2UjMdBMTquqzU6QYcKdzvE6JLt8GHTK
 buE/D1y9t2wxERKBmjHb6KGUcso7RCVwT48cvEPmj+7OdynBPufDGPP666gpsA46
 qZNhpfMnS+f1zrQ=
 =0Cc4
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - In the quota code, to avoid spurious audit messages, don't call
   capable() when quotas are off

 - When changing the 'j' flag of an inode, truncate the inode address
   space to avoid mixing "buffer head" and "iomap" pages

* tag 'gfs2-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
  gfs2: reorder capability check last
2025-01-20 13:06:28 -08:00
Christoph Hellwig 3e652eba24
gfs2: use lockref_init for qd_lockref
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-9-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-16 11:48:12 +01:00
Andreas Gruenbacher 7c9d922380 gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Truncate an inode's address space when flipping the GFS2_DIF_JDATA flag:
depending on that flag, the pages in the address space will either use
buffer heads or iomap_folio_state structs, and we cannot mix the two.

Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-01-14 18:54:08 +01:00
Christian Göttsche ead64b20f1 gfs2: reorder capability check last
capable() calls refer to enabled LSMs whether to permit or deny the
request.  This is relevant in connection with SELinux, where a
capability check results in a policy decision and by default a denial
message on insufficient permission is issued.
It can lead to three undesired cases:
  1. A denial message is generated, even in case the operation was an
     unprivileged one and thus the syscall succeeded, creating noise.
  2. To avoid the noise from 1. the policy writer adds a rule to ignore
     those denial messages, hiding future syscalls, where the task
     performs an actual privileged operation, leading to hidden limited
     functionality of that task.
  3. To avoid the noise from 1. the policy writer adds a rule to permit
     the task the requested capability, while it does not need it,
     violating the principle of least privilege.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-12-09 10:44:35 +01:00
Linus Torvalds ff2a7a064a gfs2 changes
- Fix the code that cleans up left-over unlinked files.  Various fixes
   and minor improvements in deleting files cached or held open remotely.
 
 - Simplify the use of dlm's DLM_LKF_QUECVT flag.
 
 - A few other minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmdESTYUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpdyA/9EWDxx2Y6JeVeAC+J138pSOYqHtwn
 wLtMeTdwbycW6M8V5kyW3vCh+lLLS6s0dZuwn2Xv8jx5QytrD4c51Wj3bRYjuidM
 Zt0L+wohOQISvL1+AViYuIns2pzQQvNZUC2aAVr9J3KGhdIFonbU6PdLOeEN0cZe
 R08Nseux9oJ/geaKJ3jh/ReX2VZehp2WAaQ4I+PoQkkNflBULPkyysxjkv9sc8tW
 9hN1sK7dk/U5OLKr4H6SSi1Uu6N6Wek0x2zo4NxTRqyfBiRXYtZYnXPkdftuB+6N
 M7N2dAIuhnXiAhQdo7OOe9hZZVXTFhmeQK1tyTsw/FZkQJNMX+bdBn4g7NV94drz
 CpTliqm+Z5dTnkSdS4cIozkQZ7zID1eibX8uF7QsnozBWm7bjbW6fi7a+z+u5ykN
 hsWanoMKhH1524oNKaiSjIxT0b1oda114DJQVpdU68HjkyHf5l0GXUTcVpg0dxs3
 peXhpZ+CjHbaTMXl5xqGOucD+ACPhMOGXPAX1lF2bIcfbLqgbTVn0fMMUYWeb8j1
 medJtQ0itwpiCHZTl62xUOLEOCqCiS5J1/TjrwNuJ1HLJ5JP1UePNl5kjT9nDfsA
 KXB31sKFfPX99rPVYJgjLXPgRLwslcniSHOg9p+bWwq7ZI1PSxPSauUgtKZpSe6A
 E3YfnIxjPxRMKks=
 =BP41
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix the code that cleans up left-over unlinked files.

   Various fixes and minor improvements in deleting files cached or held
   open remotely.

 - Simplify the use of dlm's DLM_LKF_QUECVT flag.

 - A few other minor cleanups.

* tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits)
  gfs2: Prevent inode creation race
  gfs2: Only defer deletes when we have an iopen glock
  gfs2: Simplify DLM_LKF_QUECVT use
  gfs2: gfs2_evict_inode clarification
  gfs2: Make gfs2_inode_refresh static
  gfs2: Use get_random_u32 in gfs2_orlov_skip
  gfs2: Randomize GLF_VERIFY_DELETE work delay
  gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict
  gfs2: Update to the evict / remote delete documentation
  gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode
  gfs2: Clean up delete work processing
  gfs2: Minor delete_work_func cleanup
  gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock
  gfs2: Rename dinode_demise to evict_behavior
  gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE
  gfs2: Faster gfs2_upgrade_iopen_glock wakeups
  KMSAN: uninit-value in inode_go_dump (5)
  gfs2: Fix unlinked inode cleanup
  gfs2: Allow immediate GLF_VERIFY_DELETE work
  gfs2: Initialize gl_no_formal_ino earlier
  ...
2024-11-26 12:34:50 -08:00
Linus Torvalds 5c00ff742b - The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
   This leads to improved memory savings.
 
 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
 
 	- "refine mas_mab_cp()"
 	- "Reduce the space to be cleared for maple_big_node"
 	- "maple_tree: simplify mas_push_node()"
 	- "Following cleanup after introduce mas_wr_store_type()"
 	- "refine storing null"
 
 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.
 
 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping code.
 
 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of shadow
   entries.
 
 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.
 
 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in the
   hugetlb code.
 
 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page into
   small pages.  Instead we replace the whole thing with a THP.  More
   consistent cleaner and potentiall saves a large number of pagefaults.
 
 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.
 
 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to do.
 
 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio size
   rather than as individual pages.  A 20% speedup was observed.
 
 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
 
 - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
   removes the long-deprecated memcgv2 charge moving feature.
 
 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.
 
 - The series "x86/module: use large ROX pages for text allocations" from
   Mike Rapoport teaches x86 to use large pages for read-only-execute
   module text.
 
 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.
 
 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/.  A slow march towards shrinking
   struct page.
 
 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.
 
 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression.  It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.
 
 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
   over to the KUnit framework.
 
 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a single
   VMA, rather than requiring that multiple VMAs be created for this.
   Improved efficiencies for userspace memory allocators are expected.
 
 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.
 
 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.
 
 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP from
   the kernel boot command line.
 
 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.
 
 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep is
   enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
 jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
 xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
 =JfWR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. More
   consistent cleaner and potentiall saves a large number of pagefaults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcgv2 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
2024-11-23 09:58:07 -08:00
Andreas Gruenbacher ffd1cf0443 gfs2: Prevent inode creation race
When a request to evict an inode comes in over the network, we are
trying to grab an inode reference via the iopen glock's gl_object
pointer.  There is a very small probability that by the time such a
request comes in, inode creation hasn't completed and the I_NEW flag is
still set.  To deal with that, wait for the inode and then check if
inode creation was successful.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-19 13:05:41 +01:00
Andreas Gruenbacher c5b7a2400e gfs2: Only defer deletes when we have an iopen glock
The mechanism to defer deleting unlinked inodes is tied to
delete_work_func(), which is tied to iopen glocks.  When we don't have
an iopen glock, we must carry out deletes immediately instead.

Fixes a NULL pointer dereference in gfs2_evict_inode().

Fixes: 8c21c2c71e ("gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-19 12:33:20 +01:00
Linus Torvalds 4c797b11a8 vfs-6.13.file
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcW4gAKCRCRxhvAZXjc
 okF+AP9xTMb2SlnRPBOBd9yFcmVXmQi86TSCUPAEVb+wIldGYwD/RIOdvXYJlp9v
 RgJkU1DC3ddkXtONNDY6gFaP+siIWA0=
 =gMc7
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs file updates from Christian Brauner:
 "This contains changes the changes for files for this cycle:

   - Introduce a new reference counting mechanism for files.

     As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop
     it has O(N^2) behaviour under contention with N concurrent
     operations and it is in a hot path in __fget_files_rcu().

     The rcuref infrastructures remedies this problem by using an
     unconditional increment relying on safe- and dead zones to make
     this work and requiring rcu protection for the data structure in
     question. This not just scales better it also introduces overflow
     protection.

     However, in contrast to generic rcuref, files require a memory
     barrier and thus cannot rely on *_relaxed() atomic operations and
     also require to be built on atomic_long_t as having massive amounts
     of reference isn't unheard of even if it is just an attack.

     This adds a file specific variant instead of making this a generic
     library.

     This has been tested by various people and it gives consistent
     improvement up to 3-5% on workloads with loads of threads.

   - Add a fastpath for find_next_zero_bit(). Skip 2-levels searching
     via find_next_zero_bit() when there is a free slot in the word that
     contains the next fd. This improves pts/blogbench-1.1.0 read by 8%
     and write by 4% on Intel ICX 160.

   - Conditionally clear full_fds_bits since it's very likely that a bit
     in full_fds_bits has been cleared during __clear_open_fds(). This
     improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on
     Intel ICX 160.

   - Get rid of all lookup_*_fdget_rcu() variants. They were used to
     lookup files without taking a reference count. That became invalid
     once files were switched to SLAB_TYPESAFE_BY_RCU and now we're
     always taking a reference count. Switch to an already existing
     helper and remove the legacy variants.

   - Remove pointless includes of <linux/fdtable.h>.

   - Avoid cmpxchg() in close_files() as nobody else has a reference to
     the files_struct at that point.

   - Move close_range() into fs/file.c and fold __close_range() into it.

   - Cleanup calling conventions of alloc_fdtable() and expand_files().

   - Merge __{set,clear}_close_on_exec() into one.

   - Make __set_open_fd() set cloexec as well instead of doing it in two
     separate steps"

* tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor
  fs: port files to file_ref
  fs: add file_ref
  expand_files(): simplify calling conventions
  make __set_open_fd() set cloexec state as well
  fs: protect backing files with rcu
  file.c: merge __{set,clear}_close_on_exec()
  alloc_fdtable(): change calling conventions.
  fs/file.c: add fast path in find_next_fd()
  fs/file.c: conditionally clear full_fds
  fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
  move close_range(2) into fs/file.c, fold __close_range() into it
  close_files(): don't bother with xchg()
  remove pointless includes of <linux/fdtable.h>
  get rid of ...lookup...fdget_rcu() family
2024-11-18 10:30:29 -08:00
Kairui Song da0c02516c mm/list_lru: simplify the list_lru walk callback function
Now isolation no longer takes the list_lru global node lock, only use the
per-cgroup lock instead.  And this lock is inside the list_lru_one being
walked, no longer needed to pass the lock explicitly.

Link: https://lkml.kernel.org/r/20241104175257.60853-7-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 17:22:26 -08:00
Andreas Gruenbacher b6900ce151 gfs2: Simplify DLM_LKF_QUECVT use
The DLM_LKF_QUECVT flag needs to be set for "upward" lock conversions to
ensure fairness, but setting it for "downward" lock conversions will
lead to a failure.  The flag is currently set based on the GLF_BLOCKING
flag and it's not immediately obvious why this is correct.  Simplify
things by figuring out if a lock conversion is "upward" by looking at
the before and after locking modes instead of relying on the
GLF_BLOCKING flag.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher 03ff3781bf gfs2: gfs2_evict_inode clarification
When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is
never initialized, but that isn't obvious; if it did initialize gh and
then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to
release it.  To clarify the code, change gfs2_evict_inode() to always
check if gh needs to be released, no matter what evict_should_delete()
returns.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher 70cddf16cb gfs2: Make gfs2_inode_refresh static
Function gfs2_inode_refresh() is only used in fs/gfs2/glops.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher 0c5bee608f gfs2: Use get_random_u32 in gfs2_orlov_skip
Use get_random_u32() instead of get_random_bytes() to remove the last
remaining call to get_random_bytes().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher 085e423b4d gfs2: Randomize GLF_VERIFY_DELETE work delay
Randomize the delay of GLF_VERIFY_DELETE work.  This avoids thundering
herd problems when multiple nodes schedule that kind of work in response
to an inode being unlinked remotely.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher f6ca45e3d2 gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict
In the unlikely case that we're trying to queue GLF_TRY_TO_EVICT work
for an inode that already has GLF_VERIFY_DELETE work queued, we want to
make sure that the GLF_TRY_TO_EVICT work gets scheduled immediately
instead of waiting for the delayed work timer to expire.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher a6033333cc gfs2: Update to the evict / remote delete documentation
Try to be a bit more clear and remove some duplications.  We cannot
actually get rid of the verification step eventually, so remove the
comment saying so.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher 8c21c2c71e gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode
Move calls to gfs2_queue_verify_delete() into gfs2_evict_inode().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher 0baa10b60c gfs2: Clean up delete work processing
Function delete_work_func() was previously assuming that the
GLF_TRY_TO_EVICT and GLF_VERIFY_DELETE flags won't both be set at the
same time, but there probably are races in which that can happen, so
handle that case correctly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:29 +01:00
Andreas Gruenbacher b4100457d0 gfs2: Minor delete_work_func cleanup
Move those definitions into the the scope in which they are used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:28 +01:00
Andreas Gruenbacher a94dafe87d gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock
In case an iopen glock cannot be upgraded, function
gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode()
whether deleting the inode should be deferred or skipped altogether.
Change the function to return the appropriate enum evict_behavior value
to indicate that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:28 +01:00
Andreas Gruenbacher c79ba4be35 gfs2: Rename dinode_demise to evict_behavior
Rename enum dinode_demise to evict_behavior and its items
SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE,
SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and
SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE.

In gfs2_evict_inode(), add a separate variable of type enum
evict_behavior instead of implicitly casting to int.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:28 +01:00
Andreas Gruenbacher 9fb794aac6 gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode()
should take, so rename the flag to GIF_DEFER_DELETE to clarify.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:28 +01:00
Andreas Gruenbacher ee51baa817 gfs2: Faster gfs2_upgrade_iopen_glock wakeups
Move function needs_demote() to glock.h and rename it to
glock_needs_demote().  In handle_callback(), wake up the glock when
setting the GLF_PENDING_DEMOTE flag as well.  (Setting the GLF_DEMOTE
flag already triggered a wake-up.)

With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock()
to wake up when either of those flags is set for the inode glock: the
faster we can react to contention, the better.

The GLF_PENDING_DEMOTE flag is only used for inode glocks (see
gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in
gfs2_drop_inode().  Still, using glock_needs_demote() there as well
makes the code a little easier to read.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05 12:39:28 +01:00
Qianqiang Liu f9417fcfca KMSAN: uninit-value in inode_go_dump (5)
When mounting of a corrupted disk image fails, the error message printed
can reference uninitialized inode fields.  To prevent that from happening,
always initialize those fields.

Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-10-22 14:06:11 +02:00
Al Viro 8fd3395ec9 get rid of ...lookup...fdget_rcu() family
Once upon a time, predecessors of those used to do file lookup
without bumping a refcount, provided that caller held rcu_read_lock()
across the lookup and whatever it wanted to read from the struct
file found.  When struct file allocation switched to SLAB_TYPESAFE_BY_RCU,
that stopped being feasible and these primitives started to bump the
file refcount for lookup result, requiring the caller to call fput()
afterwards.

But that turned them pointless - e.g.
	rcu_read_lock();
	file = lookup_fdget_rcu(fd);
	rcu_read_unlock();
is equivalent to
	file = fget_raw(fd);
and all callers of lookup_fdget_rcu() are of that form.  Similarly,
task_lookup_fdget_rcu() calls can be replaced with calling fget_task().
task_lookup_next_fdget_rcu() doesn't have direct counterparts, but
its callers would be happier if we replaced it with an analogue that
deals with RCU internally.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-07 13:34:41 -04:00
Christian Brauner 09ee2a670d
Merge patch series "Fixup NLM and kNFSD file lock callbacks"
Benjamin Coddington <bcodding@redhat.com> says:

Last year both GFS2 and OCFS2 had some work done to make their locking more
robust when exported over NFS.  Unfortunately, part of that work caused both
NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send
lock notifications to clients.

This in itself is not a huge problem because most NFS clients will still
poll the server in order to acquire a conflicted lock, but now that I've
noticed it I can't help but try to fix it because there are big advantages
for setups that might depend on timely lock notifications, and we've
supported that as a feature for a long time.

Its important for NLM and kNFSD that they do not block their kernel threads
inside filesystem's file_lock implementations because that can produce
deadlocks.  We used to make sure of this by only trusting that
posix_lock_file() can correctly handle blocking lock calls asynchronously,
so the lock managers would only setup their file_lock requests for async
callbacks if the filesystem did not define its own lock() file operation.

However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started signalling this
behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting
posix_lock_file() was inadvertently dropped, so now most filesystems no
longer produce lock notifications when exported over NFS.

I tried to fix this by simply including the old check for lock(), but the
resulting include mess and layering violations was more than I could accept.
There's a much cleaner way presented here using an fop_flag, which while
potentially flag-greedy, greatly simplifies the problem and grooms the
way for future uses by both filesystems and lock managers alike.

* patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com:
  exportfs: Remove EXPORT_OP_ASYNC_LOCK
  NLM/NFSD: Fix lock notifications for async-capable filesystems
  gfs2/ocfs2: set FOP_ASYNC_LOCK
  fs: Introduce FOP_ASYNC_LOCK
  NFS: trace: show TIMEDOUT instead of 0x6e
  nfsd: use system_unbound_wq for nfsd_file_gc_worker()
  nfsd: count nfsd_file allocations
  nfsd: fix refcount leak when file is unhashed after being found
  nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire
  nfsd: add list_head nf_gc to struct nfsd_file

Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 07:52:07 +02:00
Benjamin Coddington b875bd5b38
exportfs: Remove EXPORT_OP_ASYNC_LOCK
Now that GFS2 and OCFS2 are signalling async ->lock() support with
FOP_ASYNC_LOCK and checks for support are converted, we can remove
EXPORT_OP_ASYNC_LOCK.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/0a114db814fec3086f937ae3d44a086f13b8de26.1726083391.git.bcodding@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 17:01:08 +02:00
Andreas Gruenbacher 7c6f714d88 gfs2: Fix unlinked inode cleanup
Before commit f0e56edc2e ("gfs2: Split the two kinds of glock "delete"
work"), function delete_work_func() was used to trigger the eviction of
in-memory inodes from remote as well as deleting unlinked inodes at a
later point.  These two kinds of work were then split into two kinds of
work, and the two places in the code were deferred deletion of inodes is
required accidentally ended up queuing the wrong kind of work.  This
caused unlinked inodes to be left behind, which could in the worst case
fill up filesystems and require a filesystem check to recover.

Fix that by queuing the right kind of work in try_rgrp_unlink() and
gfs2_drop_inode().

Fixes: f0e56edc2e ("gfs2: Split the two kinds of glock "delete" work")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-25 17:11:49 +02:00
Andreas Gruenbacher 160bc9555d gfs2: Allow immediate GLF_VERIFY_DELETE work
Add an argument to gfs2_queue_verify_delete() that allows it to queue
GLF_VERIFY_DELETE work for immediate execution.  This is used in the
next patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-25 17:08:23 +02:00
Andreas Gruenbacher 1072b3aa68 gfs2: Initialize gl_no_formal_ino earlier
Set gl_no_formal_ino of the iopen glock to the generation of the
associated inode (ip->i_no_formal_ino) as soon as that value is known.
This saves us from setting it later, possibly repeatedly, when queuing
GLF_VERIFY_DELETE work.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-24 19:03:33 +02:00
Andreas Gruenbacher 820ce8ed53 gfs2: Rename GLF_VERIFY_EVICT to GLF_VERIFY_DELETE
Rename the GLF_VERIFY_EVICT flag to GLF_VERIFY_DELETE: that flag
indicates that we want to delete an inode / verify that it has been
deleted.

To match, rename gfs2_queue_verify_evict() to
gfs2_queue_verify_delete().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-24 16:44:22 +02:00
Linus Torvalds 721068dec4 gfs2 changes
- Eliminate the writepage address space operation (by Matthew Wilcox).
 
 - A syzkaller fix (by Julian Sun) and a minor cleanup (by Andreas
   Gruenbacher).
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmbwIfMUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqj/RAApBJf8Da7U9Qn3rMtCdV3HU8p9VSF
 Fwzr1qVgthNNfFGSaNhTQcTglSL0OLMg8rLaAFN8G0RJrOyJw9Q0GclhLR+wGy5F
 KpvpVczkYh+UphYJ7LhygUE9/cP2GZCSI5EbI+cIifnioY8rztt+e8QqteFpAoFA
 3asfynVWeAnYYh6qIjZBSvi77KWzaxk9Kyv8WScJ//FTNHYYXLMKUhZEXuh0gCdv
 r9VwCnRNTm8X9YtbeaRduC7mSRcVwtV+KCCJ0Lxw292a795g5oWvdwnx4DQg5120
 XvdcGkZOV/sjQE19vhfkJot6kLPhP9PofTWBIbwqMwV/lkxEKlQAPJ5+mnjg56eU
 JekZjibxEewSDkG4riWibBP2WgIfqHkryl8o9a4dZSIjeNfzJYbvhyL2NwPYfFOY
 43VeC6AB6K49gwUtD0gRTOKV/EKEFswRh1Cstvt+hPrpnF7Y/ZUjCptaEk/ZicyK
 qWG/6YVtjXz9HvOo/eojTTlZuwlmh2l63aWwW9s8/aMCei6V5Hs/S3gtuZSg7bKL
 AtfUmwlNLfNSh4VavO6W3SxuCp12OVwPYe0WQPccMshA6fVbsH1QV8LSUS5WKaE7
 TGsevEWEanAm85zlbar2BvIk1PaxRNvNJ7BgZ1LYUwdHvm9/h+JPMZUYqBUDYbZC
 Fz5wF5zjbmeR91k=
 =cuQP
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 update from Andreas Gruenbacher:

 - Convert the writepage address space operation to writepages (Matthew
   Wilcox)

 - A syzkaller fix (by Julian Sun) and a minor cleanup (Andreas
   Gruenbacher)

* tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Remove gfs2_aspace_writepage()
  gfs2: Remove gfs2_jdata_writepage()
  gfs2: Remove __gfs2_writepage()
  gfs2: Add gfs2_aspace_writepages()
  gfs2: fix double destroy_workqueue error
  gfs2: Minor gfs2_glock_cb cleanup
2024-09-23 11:55:17 -07:00
Benjamin Coddington 2253ab99f2
gfs2/ocfs2: set FOP_ASYNC_LOCK
Both GFS2 and OCFS2 use DLM locking, which will allow async lock requests.
Signal this support by setting FOP_ASYNC_LOCK.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/fc4163dbbf33c58e5a8b8ee8cb8c57e555f53ce5.1726083391.git.bcodding@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 14:39:05 +02:00
Josef Bacik 31754ea6cb iomap: add a private argument for iomap_file_buffered_write
In order to switch fuse over to using iomap for buffered writes we need
to be able to have the struct file for the original write, in case we
have to read in the page to make it uptodate.  Handle this by using the
existing private field in the iomap_iter, and add the argument to
iomap_file_buffered_write.  This will allow us to pass the file in
through the iomap buffered write path, and is flexible for any other
file systems needs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-03 15:01:23 +02:00
Matthew Wilcox (Oracle) 6888c1e85f gfs2: Remove gfs2_aspace_writepage()
There are no remaining callers of gfs2_aspace_writepage() other than
vmscan, which is known to do more harm than good.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02 14:46:37 +02:00
Matthew Wilcox (Oracle) e5ac171992 gfs2: Remove gfs2_jdata_writepage()
There are no remaining callers of gfs2_jdata_writepage() other than
vmscan, which is known to do more harm than good.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02 14:46:33 +02:00
Matthew Wilcox (Oracle) 8d391972ae gfs2: Remove __gfs2_writepage()
Call aops->writepages() instead of using write_cache_pages() to call
aops->writepage.  Change the handling of -ENODATA to not set the
persistent error on the block device.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02 14:46:29 +02:00
Matthew Wilcox (Oracle) 901849e707 gfs2: Add gfs2_aspace_writepages()
This saves one indirect function call per folio and gets us closer to
removing aops->writepage.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-02 14:46:21 +02:00
Julian Sun 6cb9df81a2 gfs2: fix double destroy_workqueue error
When gfs2_fill_super() fails, destroy_workqueue() is called within
gfs2_gl_hash_clear(), and the subsequent code path calls
destroy_workqueue() on the same work queue again.

This issue can be fixed by setting the work queue pointer to NULL after
the first destroy_workqueue() call and checking for a NULL pointer
before attempting to destroy the work queue again.

Reported-by: syzbot+d34c2a269ed512c531b0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d34c2a269ed512c531b0
Fixes: 30e388d573 ("gfs2: Switch to a per-filesystem glock workqueue")
Cc: stable@vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-08-20 16:27:22 +02:00
Andreas Gruenbacher 4117efd5c9 gfs2: Minor gfs2_glock_cb cleanup
In gfs2_glock_cb(), we only need to calculate the glock hold time for
inode glocks; the value is unused otherwise.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-08-20 16:06:43 +02:00
Andreas Gruenbacher f75efefb6d gfs2: Clean up glock demote logic
The logic for determining when to demote a glock in glock_work_func(),
introduced in commit 7cf8dcd3b6 ("GFS2: Automatically adjust glock min
hold time"), doesn't make sense: inode glocks have a minimum hold time
that delays demotion, while all other glocks are expected to be demoted
immediately.  Instead of demoting non-inode glocks immediately,
glock_work_func() schedules glock work for them to be demoted, however.
Get rid of that unnecessary indirection.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-07-09 10:40:03 +02:00
Andreas Gruenbacher 5a1906a476 gfs2: Revert "check for no eligible quota changes"
Since the previous commit, function gfs2_quota_sync() will not cause the
sync generation to creep forward by one every time the function is
called; this helps keep things a but more tidy.  We also don't care that
this function allocates a page of memory every time it is called, so no
good reason for keeping qd_changed() anymore, which just duplicates
qd_grab_sync().

This reverts commit 06aa6fd31a.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-06-20 16:38:15 +02:00
Andreas Gruenbacher d9a75a6069 gfs2: Be more careful with the quota sync generation
The quota sync generation is only ever updated under sd_quota_sync_mutex
by gfs2_quota_sync(), but its current value is fetched ouside of that
mutex, so use WRITE_ONCE() and READ_ONCE() when accessing it without
holding that mutex.

Pass the current sync generation to do_sync() from its callers to ensure
that we're not recording the wrong generation when the syncing is
done.  Also, make sure that qd->qd_sync_gen only ever moves forward.

In gfs2_quota_sync(), only write the new sync generation when we know
that there are changes.  This eliminates the need for function
sd_changed(), which we will remove in the next commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-06-20 16:38:15 +02:00