-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmnPLLMACgkQiiy9cAdy
T1HChQv/exBrgGy7AYavmAZjGsY67XFqAqKyechw2mB0vL1876MHFLcC3ch/MPXX
IkYwbQYjIev4v2HuItRKdxFVZbcEIarttGRdi9zHpyb8GOcT/QZ7SktzdXk/POUE
uUkhZxTmp4Wq/raG328+HpUGcRxs/SUfTJtrnkKTsiG2bbjWn1NkEyyMDTKeVwfa
VfGA8+Ys6pZe5yUUjWyCJ4Qe7ox0sJofXcTAODdOPz3KJPFqO63njBjx4g/zi/nu
7CxGwqcmcVmPeBPKi1FHkNupMRp/cLOdmjoSuf+aG7BHRpgkeGDPMVoBWHtKBua4
dx+MDbEKRf0kQXtkctdwiZ3Qy6oK3Yj6PNYRkq5RFYNsWgDP1DS/r+hOtKh1Azwl
Ncg2K+L8TicNL0di2IbWQUu0YUeU8ugfY49aFmHBFJIXxH7/khzZPATU9vJlDYvp
spqA/6s7gTbCfaJ0Ag3E2h/R9QcG6FmSurcwwYMZMHwmZXThSknNBf9XKcBL03GJ
hi7Ti+58
=PDtM
-----END PGP SIGNATURE-----
Merge tag 'v7.0-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
- Fix potential out of bounds read in mount
* tag 'v7.0-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb/client: fix out-of-bounds read in cifs_sanitize_prepath
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmnN2jYACgkQiiy9cAdy
T1FWcwwAuYPzBSgXdf9Vqr23RViYvFeODFWeyQ5ADuIZN20aeRy2WrJho2MfmrCZ
q+GFf1SfpE/EPr3XF6Px6100K1Ox/0r94h7Jo/ig9Eg7VxsuBGCfTU7EXo3kzvfI
pJlXLa6x/+Sk6Dkyp9/aJTxuM2cm4CDEcnG29AfVdhoHcnZzdqsC9CsS/qAiEe8D
cM01PMFz7wTLSnU45iMVqpGBS/wNO8fsX66MeH3C6ghvZ89AD+ZJki5sF1O7gzsY
pEEi+8agldwrSmFsBrSb92takgK6/8Leq5hC9DvUSbo2SJQocADaL9k1ow8SN1X+
I3tI1cVX91+VcVKZ9prwUKpq7cEPvCj6iI2uv1XPvio4bRcvNjWc32mqmdMrUcd8
plbQCr1fICqQmxXHd3p4EPxCZ6XsQ2XrjXYzwT+hpK6TSRalTXEojI22THZjV23h
a/ln6gKjSZ8Bb3/nzM1JqwGZ9YltGwupnxElqJdlc8j4rNG8VQ2lm9OTO+sHc9Af
iUlaL/J+
=88HL
-----END PGP SIGNATURE-----
Merge tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd
Pull smb server fix from Steve French:
- Fix out of bound write
* tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
ksmbd: fix OOB write in QUERY_INFO for compound requests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmnOouYACgkQxWXV+ddt
WDtuPw/+JGyFfNCB7ypZWOf8Lj/nxGLlNxnVCaSOqo4TDp/5dLmrVd6jHG1QqYhW
B8s4pIbF4QQ3kGa+L9r2xhAXEW4CRqKQ2rz2QhmUzxm8kZuFTFcbc1bfGp4dJT12
10iyfWTuFVSyo6QqY+QzTCxI0rw99MTZclFgaKwSvzP/dZKoxwZlYriXJj0i/j28
ofM9jHOFobDmLV55zOD+VGNt8IUm5QM4pkY8PoOAqPAj04qlNzx/t9d5gLGPucU5
6jm5+lAFXzEDf7klb5x3BKCXhtl86ajATIaS205KF5eH08Du/1U7+qswbBwV8jLB
xp3B1aCSmYDIsK7tjb8a2iFyD+lF4KXGFtzWfIwMn3Qrelvh6LfKc4LEz/BdnEUu
M/gI4HFusQi8WklAadQ726EMQwyU/yM/2yxAtEnSS4suVxKzniQUHprm35hntKxy
yF2GLLrjNMrioAc5HxP9RIl8JGakputC2fgkZsvc4RkNMeD6lH5T2hbZcJXLbqSV
1ggqivzcLOMGUNmZnCDS38f8Df69mz/HctSGIE1cR8+HewDuTlWdt8D5njfXx+iQ
PXDT/mM9Xi314QtOdw1XMbyhH9WloeaaUQxcM7hoQXo51uEOLMOcSUL/X4+KaqIE
Wyv5ufPZoq6i3UuYRkKux37tAKRQo0n7J03hOeody4vvXG4nK2s=
=85r4
-----END PGP SIGNATURE-----
Merge tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix for a potential extent tree corruption due to an
unexpected error value.
When the search for an extent item failed, it under some circumstances
was reported as a success to the caller"
* tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix incorrect return value after changing leaf in lookup_extent_data_ref()
When cifs_sanitize_prepath is called with an empty string or a string
containing only delimiters (e.g., "/"), the current logic attempts to
check *(cursor2 - 1) before cursor2 has advanced. This results in an
out-of-bounds read.
This patch adds an early exit check after stripping prepended
delimiters. If no path content remains, the function returns NULL.
The bug was identified via manual audit and verified using a
standalone test case compiled with AddressSanitizer, which
triggered a SEGV on affected inputs.
Signed-off-by: Fredric Cover <FredTheDude@proton.me>
Reviewed-by: Henrique Carvalho <[2]henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmnLsoQACgkQnJ2qBz9k
QNmnIAgAmJtXFZu/uJiRR6RrxmYzIdznkaZ8aTGeApQfG432goHqPhh1j+TKpe7Q
YwrB90YP6umL16HOyKvci/jNpv7+vFb8sr93FPu8+01AnMJqfvA3IJ+hrzPy6IcX
dqpO4oKr9gqbmf2/9Eny5mv50xcNI3+M7y+xh0MFsKoZxnEb2j4pCKQsPO0qMOZw
Aui1fFb4nXpcWUlIJtNFRuXXR3pQf4n8ZpfywhUCe4FV5RRrwiq6Y8If0Tj8lqgx
2fEwbkkjZkFVlOijNhZSSWKnobwg3oUbpXStZE6Kyw60vZBcQgEKWOAke175v+qf
XMuhiSrSawJDC45SZmb8QtGe8GEV6g==
=gYPI
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf fix from Jan Kara:
"Fix for a race in UDF that can lead to memory corruption"
* tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix race between file type conversion and writeback
mpage: Provide variant of mpage_writepages() with own optional folio handler
After commit 1618aa3c2e ("btrfs: simplify return variables in
lookup_extent_data_ref()"), the err and ret variables were merged into
a single ret variable. However, when btrfs_next_leaf() returns 0
(success), ret is overwritten from -ENOENT to 0. If the first key in
the next leaf does not match (different objectid or type), the function
returns 0 instead of -ENOENT, making the caller believe the lookup
succeeded when it did not. This can lead to operations on the wrong
extent tree item, potentially causing extent tree corruption.
Fix this by returning -ENOENT directly when the key does not match,
instead of relying on the ret variable.
Fixes: 1618aa3c2e ("btrfs: simplify return variables in lookup_extent_data_ref()")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When a compound request such as READ + QUERY_INFO(Security) is received,
and the first command (READ) consumes most of the response buffer,
ksmbd could write beyond the allocated buffer while building a security
descriptor.
The root cause was that smb2_get_info_sec() checked buffer space using
ppntsd_size from xattr, while build_sec_desc() often synthesized a
significantly larger descriptor from POSIX ACLs.
This patch introduces smb_acl_sec_desc_scratch_len() to accurately
compute the final descriptor size beforehand, performs proper buffer
checking with smb2_calc_max_out_buf_len(), and uses exact-sized
allocation + iov pinning.
Cc: stable@vger.kernel.org
Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Signed-off-by: Asim Viladi Oglu Manizada <manizada@pm.me>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Please consider pulling these changes from the signed vfs-7.0-rc6.fixes tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCacmRjQAKCRCRxhvAZXjc
olJnAQD2iiLqih8Y8nX3ESMkkIQWUoSikrfSVw/GqmuKTmlrDgEA/z+LRgDGnI/+
6xzkEw4UNmJ9JoJsiPSlHq18yyga/ww=
=DxTb
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix netfs_limit_iter() hitting BUG() when an ITER_KVEC iterator
reaches it via core dump writes to 9P filesystems. Add ITER_KVEC
handling following the same pattern as the existing ITER_BVEC code.
- Fix a NULL pointer dereference in the netfs unbuffered write retry
path when the filesystem (e.g., 9P) doesn't set the prepare_write
operation.
- Clear I_DIRTY_TIME in sync_lazytime for filesystems implementing
->sync_lazytime. Without this the flag stays set and may cause
additional unnecessary calls during inode deactivation.
- Increase tmpfs size in mount_setattr selftests. A recent commit
bumped the ext4 image size to 2 GB but didn't adjust the tmpfs
backing store, so mkfs.ext4 fails with ENOSPC writing metadata.
- Fix an invalid folio access in iomap when i_blkbits matches the folio
size but differs from the I/O granularity. The cur_folio pointer
would not get invalidated and iomap_read_end() would still be called
on it despite the IO helper owning it.
- Fix hash_name() docstring.
- Fix read abandonment during netfs retry where the subreq variable
used for abandonment could be uninitialized on the first pass or
point to a deleted subrequest on later passes.
- Don't block sync for filesystems with no data integrity guarantees.
Add a SB_I_NO_DATA_INTEGRITY superblock flag replacing the per-inode
AS_NO_DATA_INTEGRITY mapping flag so sync kicks off writeback but
doesn't wait for flusher threads. This fixes a suspend-to-RAM hang on
fuse-overlayfs where the flusher thread blocks when the fuse daemon
is frozen.
- Fix a lockdep splat in iomap when reads fail. iomap_read_end_io()
invokes fserror_report() which calls igrab() taking i_lock in hardirq
context while i_lock is normally held with interrupts enabled. Kick
failed read handling to a workqueue.
- Remove the redundant netfs_io_stream::front member and use
stream->subrequests.next instead, fixing a potential issue in the
direct write code path.
* tag 'vfs-7.0-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: Fix the handling of stream->front by removing it
iomap: fix lockdep complaint when reads fail
writeback: don't block sync for filesystems with no data integrity guarantees
netfs: Fix read abandonment during retry
vfs: fix docstring of hash_name()
iomap: fix invalid folio access when i_blkbits differs from I/O granularity
selftests/mount_setattr: increase tmpfs size for idmapped mount tests
fs: clear I_DIRTY_TIME in sync_lazytime
netfs: Fix NULL pointer dereference in netfs_unbuffered_write() on retry
netfs: Fix kernel BUG in netfs_limit_iter() for ITER_KVEC iterators
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmnI69IACgkQEVvVyTe/
1WqWmxAAn7Bbd9vh6botvvRoMwSlvPOE3oexRbpr+EMCE3Z9mulKi//KoiOk1xxW
nzg41gXQ2ZvsCys8M/5tloo4U0YRZQthf3SyEyvl/c/wGRWvSZYdv86iV5/Qhkwf
VaAT0Kh/V5Rffg1f1xYIVlt4TRbcbKTj9lr1bPnPGEbAY96Dav0gYMn2+29I1642
l6CpWtM2OZCalDK/j0zG6Wxxeko84pT0iv6kxGba3roQVONzYWXycgW3caR4DJq2
l4D81QCG1NhV984MP3UChHPoTa1gUepE6fOhyyTwXrWn8bEdkFXtHgKaC90PirP0
OeYYK7sRx4yrsRcpBsuo83lpCP7aMIiGpU9hrbABdQ2qF9o1/hZyggU+FdKv+dri
6SIw1w+wHpa2gDoSsze9uCsoc/jPflCzBgfo1w1jbGf3kctwmMSnWpkVBhCIwtD4
ntoTWk2MzmLNXCDCrjb1r6SpYgnxrGDcRf6KikF89hkVmWO3wSYzscbaVQz1+R4b
QKBF67zmYoubYPPZ503Y5n7S4L3O8kxGMCGCBx10FIvs8Lvf8dGwTBL+gOtmg1VK
YO99lreo+BN3Q3iui1h2F4wT+XY1Iaq2ORaxi3czjblMfu/rkpcwHS2O6FlecvCQ
Vfgve+6xGuYHMeziDkSiczA2m+8USIefA0zXzwq+2TA82hs95b4=
=sSP0
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
- Fix regression in 'xino' feature detection
I clumsily introduced this regression myself when working on another
subsystem (fsnotify). Both the regression and the fix have almost no
visible impact on users except for some kmsg prints.
- Fix to performance regression in v6.12.
This regression was reported by Google COS developers.
It is not uncommon these days for the year-old mature LTS to get
adopted by distros and get exposed to many new workloads. We made a
sub-smart move of making a behavior change in v6.12 which could
impact performance, without making it opt-in. Fixing this mistake
retroactively, to be picked by LTS.
* tag 'ovl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: make fsync after metadata copy-up opt-in mount option
ovl: fix wrong detection of 32bit inode numbers
Add a test issue an ext4 warning (not a WARN_ON) if there are still
dirty pages attached to an evicted inode.
A lot of ext4 bug fixes including:
* Fix a number of Syzkaller issues.
* Fix memory leaks on error paths.
* Replace some BUG and WARN with EFSCORRUPTED reporting.
* Fix a potential crash when disabling discard via remount followed
by an immediate unmount. (Found by Sashiko)
* Fix a corner case which could lead to allocating blocks for an
indirect-mapped inode block numbers > 2**32.
* Fix a race when reallocating a freed inode that could result in
a deadlock.
* Fix a user-after-free in update_super_work when racing with umount.
* Fix build issues when trying to build ext4's kunit tests as a module
* Fix a bug where ext4_split_extent_zeroout() could fail to pass
back an error from ext4_ext_dirty().
* Avoid allocating blocks from a corrupted block group in
ext4_mb_find_by_goal().
* Fix a percpu_counters list corruption BUG triggered by an
ext4 extents kunit.
* Fix a potetial crash caused by the fast commit flush path potentially
accessing the jinode structure before it is fully initialized.
* Fix fsync(2) in no-journal mode to make sure the dirtied inode is
write to storage.
* Fix a bug when in no-journal mode, when ext4 tries to avoid using
recently deleted inodes, if lazy itable initialization is enabled,
can lead to an unitialized inode getting skipped and triggering
an e2fsck complaint.
* Fix journal credit calculation when setting an xattr when both
the encryption and ea_inode feeatures are enabled.
* Fix corner cases which could result in stale xarray tags after
writeback.
* Fix generic/475 failures caused by ENOSPC errors while creating
a symlink when the system crashes resulting to a file system
inconsistency when replaying the fast commit journal.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmnIo3cACgkQ8vlZVpUN
gaNu+wf+PyZyRFar37GVcXiMWm8wiBrH7XYhcH1z+uEfjxtWuxUDtKZtF/UZ8FiW
peRI785hiIW1xjj3efkNlrg4JFPlY+mLlyY8iUabYj6Gt2kddgy7h6gCf2ugM/V9
1fk9Hf3+QQ38qM9PpBpBVZyhSS7WpafCcUvx622G/+WiQldWAohpfk/KrNQtSWiX
ZFzJtJ+7jJD6Qo1Pe2Em4khsZK2fsLw1XH9GpUWIdauQV+oXnkp7+kOQIOoT+7RX
azjAub5b6L3Bf9p/NNvlUCE9VINeqjKHVGI/gesRz9uiQd/zI3/pBQtddMUy8B3k
buN3uii5iYcLQD8ntX5sTP35PQg4hg==
=Ew20
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
- Update the MAINTAINERS file to add reviewers for the ext4 file system
- Add a test issue an ext4 warning (not a WARN_ON) if there are still
dirty pages attached to an evicted inode.
- Fix a number of Syzkaller issues
- Fix memory leaks on error paths
- Replace some BUG and WARN with EFSCORRUPTED reporting
- Fix a potential crash when disabling discard via remount followed by
an immediate unmount. (Found by Sashiko)
- Fix a corner case which could lead to allocating blocks for an
indirect-mapped inode block numbers > 2**32
- Fix a race when reallocating a freed inode that could result in a
deadlock
- Fix a user-after-free in update_super_work when racing with umount
- Fix build issues when trying to build ext4's kunit tests as a module
- Fix a bug where ext4_split_extent_zeroout() could fail to pass back
an error from ext4_ext_dirty()
- Avoid allocating blocks from a corrupted block group in
ext4_mb_find_by_goal()
- Fix a percpu_counters list corruption BUG triggered by an ext4
extents kunit
- Fix a potetial crash caused by the fast commit flush path potentially
accessing the jinode structure before it is fully initialized
- Fix fsync(2) in no-journal mode to make sure the dirtied inode is
write to storage
- Fix a bug when in no-journal mode, when ext4 tries to avoid using
recently deleted inodes, if lazy itable initialization is enabled,
can lead to an unitialized inode getting skipped and triggering an
e2fsck complaint
- Fix journal credit calculation when setting an xattr when both the
encryption and ea_inode feeatures are enabled
- Fix corner cases which could result in stale xarray tags after
writeback
- Fix generic/475 failures caused by ENOSPC errors while creating a
symlink when the system crashes resulting to a file system
inconsistency when replaying the fast commit journal
* tag 'ext4_for_linus-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (27 commits)
ext4: always drain queued discard work in ext4_mb_release()
ext4: handle wraparound when searching for blocks for indirect mapped blocks
ext4: skip split extent recovery on corruption
ext4: fix iloc.bh leak in ext4_fc_replay_inode() error paths
ext4: fix deadlock on inode reallocation
ext4: fix use-after-free in update_super_work when racing with umount
ext4: fix the might_sleep() warnings in kvfree()
ext4: reject mount if bigalloc with s_first_data_block != 0
ext4: fix extents-test.c is not compiled when EXT4_KUNIT_TESTS=M
ext4: fix mballoc-test.c is not compiled when EXT4_KUNIT_TESTS=M
ext4: introduce EXPORT_SYMBOL_FOR_EXT4_TEST() helper
jbd2: gracefully abort on checkpointing state corruptions
ext4: avoid infinite loops caused by residual data
ext4: validate p_idx bounds in ext4_ext_correct_indexes
ext4: test if inode's all dirty pages are submitted to disk
ext4: minor fix for ext4_split_extent_zeroout()
ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
ext4: kunit: extents-test: lix percpu_counters list corruption
ext4: publish jinode after initialization
ext4: replace BUG_ON with proper error handling in ext4_read_inline_folio
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmnIO1oACgkQxWXV+ddt
WDuLDQ//eWEAvqyA7ch+aLNm0QVHqsdH3NdVFjqw/3NsZfC9A3txTpQmmWpfqIUl
Yqb3AVIdRqwdytinn3xX17SxkJxzCWExcQ8LckoiJ2ODhLoi/vWGnaK8sUBzUDsm
2etvLGbDmhEEpkTMtIFkbY8fhprqthGCQgRHNrn/cZ+AzwohaboinvACysQJSLlH
Nqv0YxnAxsaGIaGD7wigjZ7a2BDoSZ/Q82+RvacrOkyqEThSOqzLmfl++fmHfxQo
UDzVOt9RSUnZ5LU4MYjF3H5otLczqKqRuoQC66ulIkvCml07jaKxfUVJyjkQ3dzZ
//UiylTXnC6XK/x1ayRrM73dKkXoN9mBmvH4b2FnOQJ3D3oyiCuytEF8s4nCrE+P
Zml+OTvIvWMj7Xy2U7cTMWuQGeggZ1m4nxCRScZBEnububUzt13izODwE7C8Cx1y
5FvfFIkiyeBrI03bWUO48H38lwWsfIKj6KnEk1SUeZIARClnZfM4NGZ7MbxjcntG
SunXzuQAywOZCkpskUDWMASLSH6CEGG4zgHPmbzw1NSCE5ZCHFZYXG7znv4w3CTW
TTeowG9r9cgdVBnMUTIToNWuD0jSMLmUfEPWvcwZF9ptfXdoABsPEQZTrKTLeiq7
TVdunKWzU12ZLr2sl4iku8U4H4aNQFoAwV3XqKnG8VXmzN41+Zw=
=I6JH
-----END PGP SIGNATURE-----
Merge tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes. There's one that stands out in size as it fixes an
edge case in fsync.
- fix issue on fsync where file with zero size appears as a non-zero
after log replay
- in zlib compression, handle a crash when data alignment causes
folio reference issues
- fix possible crash with enabled tracepoints on a overlayfs mount
- handle device stats update error
- on zoned filesystems, fix kobject leak on sub-block groups
- fix super block offset in an error message in validation"
* tag 'for-7.0-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix lost error when running device stats on multiple devices fs
btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()
btrfs: zlib: handle page aligned compressed size correctly
btrfs: fix leak of kobject name for sub-group space_info
btrfs: fix zero size inode with non-zero size after log replay
btrfs: fix super block offset in error message in btrfs_validate_super()
While reviewing recent ext4 patch[1], Sashiko raised the following
concern[2]:
> If the filesystem is initially mounted with the discard option,
> deleting files will populate sbi->s_discard_list and queue
> s_discard_work. If it is then remounted with nodiscard, the
> EXT4_MOUNT_DISCARD flag is cleared, but the pending s_discard_work is
> neither cancelled nor flushed.
[1] https://lore.kernel.org/r/20260319094545.19291-1-qiang.zhang@linux.dev/
[2] https://sashiko.dev/#/patchset/20260319094545.19291-1-qiang.zhang%40linux.dev
The concern was valid, but it had nothing to do with the patch[1].
One of the problems with Sashiko in its current (early) form is that
it will detect pre-existing issues and report it as a problem with the
patch that it is reviewing.
In practice, it would be hard to hit deliberately (unless you are a
malicious syzkaller fuzzer), since it would involve mounting the file
system with -o discard, and then deleting a large number of files,
remounting the file system with -o nodiscard, and then immediately
unmounting the file system before the queued discard work has a change
to drain on its own.
Fix it because it's a real bug, and to avoid Sashiko from raising this
concern when analyzing future patches to mballoc.c.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: 55cdd0af2b ("ext4: get discard out of jbd2 commit kthread contex")
Cc: stable@kernel.org
Commit 4865c768b5 ("ext4: always allocate blocks only from groups
inode can use") restricts what blocks will be allocated for indirect
block based files to block numbers that fit within 32-bit block
numbers.
However, when using a review bot running on the latest Gemini LLM to
check this commit when backporting into an LTS based kernel, it raised
this concern:
If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
group was populated via stream allocation from s_mb_last_groups),
then start will be >= ngroups.
Does this allow allocating blocks beyond the 32-bit limit for
indirect block mapped files? The commit message mentions that
ext4_mb_scan_groups_linear() takes care to not select unsupported
groups. However, its loop uses group = *start, and the very first
iteration will call ext4_mb_scan_group() with this unsupported
group because next_linear_group() is only called at the end of the
iteration.
After reviewing the code paths involved and considering the LLM
review, I determined that this can happen when there is a file system
where some files/directories are extent-mapped and others are
indirect-block mapped. To address this, add a safety clamp in
ext4_mb_scan_groups().
Fixes: 4865c768b5 ("ext4: always allocate blocks only from groups inode can use")
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20260326045834.1175822-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
ext4_split_extent_at() retries after ext4_ext_insert_extent() fails by
refinding the original extent and restoring its length. That recovery is
only safe for transient resource failures such as -ENOSPC, -EDQUOT, and
-ENOMEM.
When ext4_ext_insert_extent() fails because the extent tree is already
corrupted, ext4_find_extent() can return a leaf path without p_ext.
ext4_split_extent_at() then dereferences path[depth].p_ext while trying to
fix up the original extent length, causing a NULL pointer dereference while
handling a pre-existing filesystem corruption.
Do not enter the recovery path for corruption errors, and validate p_ext
after refinding the extent before touching it. This keeps the recovery path
limited to cases it can actually repair and turns the syzbot-triggered crash
into a proper corruption report.
Fixes: 716b9c23b8 ("ext4: refactor split and convert extents")
Reported-by: syzbot+1ffa5d865557e51cb604@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1ffa5d865557e51cb604
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: hongao <hongao@uniontech.com>
Link: https://patch.msgid.link/EF77870F23FF9C90+20260324015815.35248-1-hongao@uniontech.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
During code review, Joseph found that ext4_fc_replay_inode() calls
ext4_get_fc_inode_loc() to get the inode location, which holds a
reference to iloc.bh that must be released via brelse().
However, several error paths jump to the 'out' label without
releasing iloc.bh:
- ext4_handle_dirty_metadata() failure
- sync_dirty_buffer() failure
- ext4_mark_inode_used() failure
- ext4_iget() failure
Fix this by introducing an 'out_brelse' label placed just before
the existing 'out' label to ensure iloc.bh is always released.
Additionally, make ext4_fc_replay_inode() propagate errors
properly instead of always returning 0.
Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260323060836.3452660-1-libaokun@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Currently there is a race in ext4 when reallocating freed inode
resulting in a deadlock:
Task1 Task2
ext4_evict_inode()
handle = ext4_journal_start();
...
if (IS_SYNC(inode))
handle->h_sync = 1;
ext4_free_inode()
ext4_new_inode()
handle = ext4_journal_start()
finds the bit in inode bitmap
already clear
insert_inode_locked()
waits for inode to be
removed from the hash.
ext4_journal_stop(handle)
jbd2_journal_stop(handle)
jbd2_log_wait_commit(journal, tid);
- deadlocks waiting for transaction handle Task2 holds
Fix the problem by removing inode from the hash already in
ext4_clear_inode() by which time all IO for the inode is done so reuse
is already fine but we are still before possibly blocking on transaction
commit.
Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Link: https://lore.kernel.org/all/abNvb2PcrKj1FBeC@ly-workstation
Fixes: 88ec797c46 ("fs: make insert_inode_locked() wait for inode destruction")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260320090428.24899-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Commit b98535d091 ("ext4: fix bug_on in start_this_handle during umount
filesystem") moved ext4_unregister_sysfs() before flushing s_sb_upd_work
to prevent new error work from being queued via /proc/fs/ext4/xx/mb_groups
reads during unmount. However, this introduced a use-after-free because
update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
accesses the kobject's kernfs_node after it has been freed by kobject_del()
in ext4_unregister_sysfs():
update_super_work ext4_put_super
----------------- --------------
ext4_unregister_sysfs(sb)
kobject_del(&sbi->s_kobj)
__kobject_del()
sysfs_remove_dir()
kobj->sd = NULL
sysfs_put(sd)
kernfs_put() // RCU free
ext4_notify_error_sysfs(sbi)
sysfs_notify(&sbi->s_kobj)
kn = kobj->sd // stale pointer
kernfs_get(kn) // UAF on freed kernfs_node
ext4_journal_destroy()
flush_work(&sbi->s_sb_upd_work)
Instead of reordering the teardown sequence, fix this by making
ext4_notify_error_sysfs() detect that sysfs has already been torn down
by checking s_kobj.state_in_sysfs, and skipping the sysfs_notify() call
in that case. A dedicated mutex (s_error_notify_mutex) serializes
ext4_notify_error_sysfs() against kobject_del() in ext4_unregister_sysfs()
to prevent TOCTOU races where the kobject could be deleted between the
state_in_sysfs check and the sysfs_notify() call.
Fixes: b98535d091 ("ext4: fix bug_on in start_this_handle during umount filesystem")
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260319120336.157873-1-jiayuan.chen@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Now, only EXT4_KUNIT_TESTS=Y testcase will be compiled in 'extents.c'.
To solve this issue, the ext4 test code needs to be decoupled. The
'extents-test' module is compiled into 'ext4-test' module.
Fixes: cb1e0c1d1f ("ext4: kunit tests for extent splitting and conversion")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260314075258.1317579-4-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch targets two internal state machine invariants in checkpoint.c
residing inside functions that natively return integer error codes.
- In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely
corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE
and a graceful journal abort, returning -EFSCORRUPTED.
- In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for
an unexpected buffer_jwrite state. If the warning triggers, we
explicitly drop the just-taken get_bh() reference and call __flush_batch()
to safely clean up any previously queued buffers in the j_chkpt_bhs array,
preventing a memory leak before returning -EFSCORRUPTED.
Signed-off-by: Milos Nikic <nikic.milos@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260311041548.159424-1-nikic.milos@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
On the mkdir/mknod path, when mapping logical blocks to physical blocks,
if inserting a new extent into the extent tree fails (in this example,
because the file system disabled the huge file feature when marking the
inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
reclaim the physical block without deleting the corresponding data in
the extent tree. This causes subsequent mkdir operations to reference
the previously reclaimed physical block number again, even though this
physical block is already being used by the xattr block. Therefore, a
situation arises where both the directory and xattr are using the same
buffer head block in memory simultaneously.
The above causes ext4_xattr_block_set() to enter an infinite loop about
"inserted" and cannot release the inode lock, ultimately leading to the
143s blocking problem mentioned in [1].
If the metadata is corrupted, then trying to remove some extent space
can do even more harm. Also in case EXT4_GET_BLOCKS_DELALLOC_RESERVE
was passed, remove space wrongly update quota information.
Jan Kara suggests distinguishing between two cases:
1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully
consistent and we must maintain its consistency including all the
accounting. However these errors can happen only early before we've
inserted the extent into the extent tree. So current code works correctly
for this case.
2) Some other error - this means metadata is corrupted. We should strive to
do as few modifications as possible to limit damage. So I'd just skip
freeing of allocated blocks.
[1]
INFO: task syz.0.17:5995 blocked for more than 143 seconds.
Call Trace:
inode_lock_nested include/linux/fs.h:1073 [inline]
__start_dirop fs/namei.c:2923 [inline]
start_dirop fs/namei.c:2934 [inline]
Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
Link: https://patch.msgid.link/tencent_43696283A68450B761D76866C6F360E36705@qq.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
ext4_ext_correct_indexes() walks up the extent tree correcting
index entries when the first extent in a leaf is modified. Before
accessing path[k].p_idx->ei_block, there is no validation that
p_idx falls within the valid range of index entries for that
level.
If the on-disk extent header contains a corrupted or crafted
eh_entries value, p_idx can point past the end of the allocated
buffer, causing a slab-out-of-bounds read.
Fix this by validating path[k].p_idx against EXT_LAST_INDEX() at
both access sites: before the while loop and inside it. Return
-EFSCORRUPTED if the index pointer is out of range, consistent
with how other bounds violations are handled in the ext4 extent
tree code.
Reported-by: syzbot+04c4e65cab786a2e5b7e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=04c4e65cab786a2e5b7e
Signed-off-by: Tejas Bharambe <tejas.bharambe@outlook.com>
Link: https://patch.msgid.link/JH0PR06MB66326016F9B6AD24097D232B897CA@JH0PR06MB6632.apcprd06.prod.outlook.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
The commit aa373cf550 ("writeback: stop background/kupdate works from
livelocking other works") introduced an issue where unmounting a filesystem
in a multi-logical-partition scenario could lead to batch file data loss.
This problem was not fixed until the commit d92109891f ("fs/writeback:
bail out if there is no more inodes for IO and queued once"). It took
considerable time to identify the root cause. Additionally, in actual
production environments, we frequently encountered file data loss after
normal system reboots. Therefore, we are adding a check in the inode
release flow to verify whether all dirty pages have been flushed to disk,
in order to determine whether the data loss is caused by a logic issue in
the filesystem code.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260303012242.3206465-1-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
There's issue as follows:
...
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
EXT4-fs (mmcblk0p1): error count since last fsck: 1
EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
...
According to the log analysis, blocks are always requested from the
corrupted block group. This may happen as follows:
ext4_mb_find_by_goal
ext4_mb_load_buddy
ext4_mb_load_buddy_gfp
ext4_mb_init_cache
ext4_read_block_bitmap_nowait
ext4_wait_block_bitmap
ext4_validate_block_bitmap
if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
return -EFSCORRUPTED; // There's no logs.
if (err)
return err; // Will return error
ext4_lock_group(ac->ac_sb, group);
if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
goto out;
After commit 9008a58e5d ("ext4: make the bitmap read routines return
real error codes") merged, Commit 163a203ddb ("ext4: mark block group
as corrupt on block bitmap error") is no real solution for allocating
blocks from corrupted block groups. This is because if
'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
'ext4_mb_load_buddy()' may return an error. This means that the block
allocation will fail.
Therefore, check block group if corrupted when ext4_mb_load_buddy()
returns error.
Fixes: 163a203ddb ("ext4: mark block group as corrupt on block bitmap error")
Fixes: 9008a58e5d ("ext4: make the bitmap read routines return real error codes")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260302134619.3145520-1-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
commit 82f80e2e3b ("ext4: add extent status cache support to kunit tests"),
added ext4_es_register_shrinker() in extents_kunit_init() function but
failed to add the unregister shrinker routine in extents_kunit_exit().
This could cause the following percpu_counters list corruption bug.
ok 1 split unwrit extent to 2 extents and convert 1st half writ
slab kmalloc-4k start c0000002007ff000 pointer offset 1448 size 4096
list_add corruption. next->prev should be prev (c000000004bc9e60), but was 0000000000000000. (next=c0000002007ff5a8).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
cpu 0x2: Vector: 700 (Program Check) at [c000000241927a30]
pc: c000000000f26ed0: __list_add_valid_or_report+0x120/0x164
lr: c000000000f26ecc: __list_add_valid_or_report+0x11c/0x164
sp: c000000241927cd0
msr: 800000000282b033
current = 0xc000000241215200
paca = 0xc0000003fffff300 irqmask: 0x03 irq_happened: 0x09
pid = 258, comm = kunit_try_catch
kernel BUG at lib/list_debug.c:29!
enter ? for help
__percpu_counter_init_many+0x148/0x184
ext4_es_register_shrinker+0x74/0x23c
extents_kunit_init+0x100/0x308
kunit_try_run_case+0x78/0x1f8
kunit_generic_run_threadfn_adapter+0x40/0x70
kthread+0x190/0x1a0
start_kernel_thread+0x14/0x18
2:mon>
This happens because:
extents_kunit_init(test N):
ext4_es_register_shrinker(sbi)
percpu_counters_init() x 4; // this adds 4 list nodes to global percpu_counters list
list_add(&fbc->list, &percpu_counters);
shrinker_register();
extents_kunit_exit(test N):
kfree(sbi); // frees sbi w/o removing those 4 list nodes.
// So, those list node now becomes dangling pointers
extents_kunit_init(test N+1):
kzalloc_obj(ext4_sb_info) // allocator returns same page, but zeroed.
ext4_es_register_shrinker(sbi)
percpu_counters_init()
list_add(&fbc->list, &percpu_counters);
__list_add_valid(new, prev, next);
next->prev != prev // list corruption bug detected, since next->prev = NULL
Fixes: 82f80e2e3b ("ext4: add extent status cache support to kunit tests")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/5bb9041471dab8ce870c191c19cbe4df57473be8.1772381213.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Replace BUG_ON() with proper error handling when inline data size
exceeds PAGE_SIZE. This prevents kernel panic and allows the system to
continue running while properly reporting the filesystem corruption.
The error is logged via ext4_error_inode(), the buffer head is released
to prevent memory leak, and -EFSCORRUPTED is returned to indicate
filesystem corruption.
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Link: https://patch.msgid.link/20260223123345.14838-2-ytohnuki@amazon.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
When inode metadata is changed, we sometimes just call
ext4_mark_inode_dirty() to track modified metadata. This copies inode
metadata into block buffer which is enough when we are journalling
metadata. However when we are running in nojournal mode we currently
fail to write the dirtied inode buffer during fsync(2) because the inode
is not marked as dirty. Use explicit ext4_write_inode() call to make
sure the inode table buffer is written to the disk. This is a band aid
solution but proper solution requires a much larger rewrite including
changes in metadata bh tracking infrastructure.
Reported-by: Free Ekanayaka <free.ekanayaka@gmail.com>
Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20260216164848.3074-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
recently_deleted() checks whether inode has been used in the near past.
However this can give false positive result when inode table is not
initialized yet and we are in fact comparing to random garbage (or stale
itable block of a filesystem before mkfs). Ultimately this results in
uninitialized inodes being skipped during inode allocation and possibly
they are never initialized and thus e2fsck complains. Verify if the
inode has been initialized before checking for dtime.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20260216164848.3074-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Fix an issue arising when ext4 features has_journal, ea_inode, and encrypt
are activated simultaneously, leading to ENOSPC when creating an encrypted
file.
Fix by passing XATTR_CREATE flag to xattr_set_handle function if a handle
is specified, i.e., when the function is called in the control flow of
creating a new inode. This aligns the number of jbd2 credits set_handle
checks for with the number allocated for creating a new inode.
ext4_set_context must not be called with a non-null handle (fs_data) if
fscrypt context xattr is not guaranteed to not exist yet. The only other
usage of this function currently is when handling the ioctl
FS_IOC_SET_ENCRYPTION_POLICY, which calls it with fs_data=NULL.
Fixes: c1a5d5f6ab ("ext4: improve journal credit handling in set xattr paths")
Co-developed-by: Anthony Durrer <anthonydev@fastmail.com>
Signed-off-by: Anthony Durrer <anthonydev@fastmail.com>
Signed-off-by: Simon Weber <simon.weber.39@gmail.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260207100148.724275-4-simon.weber.39@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Add a check in ext4_setattr() to convert files from inline data storage
to extent-based storage when truncate() grows the file size beyond the
inline capacity. This prevents the filesystem from entering an
inconsistent state where the inline data flag is set but the file size
exceeds what can be stored inline.
Without this fix, the following sequence causes a kernel BUG_ON():
1. Mount filesystem with inode that has inline flag set and small size
2. truncate(file, 50MB) - grows size but inline flag remains set
3. sendfile() attempts to write data
4. ext4_write_inline_data() hits BUG_ON(write_size > inline_capacity)
The crash occurs because ext4_write_inline_data() expects inline storage
to accommodate the write, but the actual inline capacity (~60 bytes for
i_block + ~96 bytes for xattrs) is far smaller than the file size and
write request.
The fix checks if the new size from setattr exceeds the inode's actual
inline capacity (EXT4_I(inode)->i_inline_size) and converts the file to
extent-based storage before proceeding with the size change.
This addresses the root cause by ensuring the inline data flag and file
size remain consistent during truncate operations.
Reported-by: syzbot+7de5fe447862fc37576f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7de5fe447862fc37576f
Tested-by: syzbot+7de5fe447862fc37576f@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
Link: https://patch.msgid.link/20260207043607.1175976-1-kartikey406@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
There are cases where ext4_bio_write_page() gets called for a page which
has no buffers to submit. This happens e.g. when the part of the file is
actually a hole, when we cannot allocate blocks due to being called from
jbd2, or in data=journal mode when checkpointing writes the buffers
earlier. In these cases we just return from ext4_bio_write_page()
however if the page didn't need redirtying, we will leave stale DIRTY
and/or TOWRITE tags in xarray because those get cleared only in
__folio_start_writeback(). As a result we can leave these tags set in
mappings even after a final sync on filesystem that's getting remounted
read-only or that's being frozen. Various assertions can then get upset
when writeback is started on such filesystems (Gerald reported assertion
in ext4_journal_check_start() firing).
Fix the problem by cycling the page through writeback state even if we
decide nothing needs to be written for it so that xarray tags get
properly updated. This is slightly silly (we could update the xarray
tags directly) but I don't think a special helper messing with xarray
tags is really worth it in this relatively rare corner case.
Reported-by: Gerald Yang <gerald.yang@canonical.com>
Link: https://lore.kernel.org/all/20260128074515.2028982-1-gerald.yang@canonical.com
Fixes: dff4ac75ee ("ext4: move keep_towrite handling to ext4_bio_write_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260205092223.21287-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Commit '5f920d5d6083 ("ext4: verify fast symlink length")' causes the
generic/475 test to fail during orphan cleanup of zero-length symlinks.
generic/475 84s ... _check_generic_filesystem: filesystem on /dev/vde is inconsistent
The fsck reports are provided below:
Deleted inode 9686 has zero dtime.
Deleted inode 158230 has zero dtime.
...
Inode bitmap differences: -9686 -158230
Orphan file (inode 12) block 13 is not clean.
Failed to initialize orphan file.
In ext4_symlink(), a newly created symlink can be added to the orphan
list due to ENOSPC. Its data has not been initialized, and its size is
zero. Therefore, we need to disregard the length check of the symbolic
link when cleaning up orphan inodes. Instead, we should ensure that the
nlink count is zero.
Fixes: 5f920d5d60 ("ext4: verify fast symlink length")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260131091156.1733648-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Carlos Maiolino <cem@kernel.org>
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCacY+kQAKCRBcsMJ8RxYu
Y6bMAXwL00+ri1ygA5s3PCaY965nLOchGrWjy0nCS075NTgYGo4Jq/hXBDDJPdRy
bg1cr6sBf2u5Qx2jQ3wrTy63XbHQ78wM+0lNZCMAobSpPC7li+wAIQ8l8wNGitU8
WQdkTBSfCw==
=nciv
-----END PGP SIGNATURE-----
Merge tag 'xfs-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"This includes a few important bug fixes, and some code refactoring
that was necessary for one of the fixes"
* tag 'xfs-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove file_path tracepoint data
xfs: don't irele after failing to iget in xfs_attri_recover_work
xfs: remove redundant validation in xlog_recover_attri_commit_pass2
xfs: fix ri_total validation in xlog_recover_attri_commit_pass2
xfs: close crash window in attr dabtree inactivation
xfs: factor out xfs_attr3_leaf_init
xfs: factor out xfs_attr3_node_entry_remove
xfs: only assert new size for datafork during truncate extents
xfs: annotate struct xfs_attr_list_context with __counted_by_ptr
xfs: cleanup buftarg handling in XFS_IOC_VERIFY_MEDIA
xfs: scrub: unlock dquot before early return in quota scrub
xfs: refactor xfsaild_push loop into helper
xfs: save ailp before dropping the AIL lock in push callbacks
xfs: avoid dereferencing log items after push callbacks
xfs: stop reclaim before pushing AIL during unmount
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmnF6AEACgkQiiy9cAdy
T1GydAv/cQNJiOJM6CVyAcuixCqf8bxmLkjYqGhPQ7jcjYjY688UJ4l5LCwKkWj8
7mlhHO2Ly0trlUn6dBet/0zVctQ8T0caMrOaht9BLnscdOHXdS3Sn27EZ02Ba8zF
aS3XYa792PASIeUf9CEvztaEMYW9BkJ8hQt4Z1qdgYIWgvIEeDiAgR+4tmB0iAaO
UbAU/bqSLzTC80wukUTa41ofJTEdb7Sg147BbP2p+D8aKBKxaQWEz71RS+erjlmQ
bx0AdImfdLkDWyEnFP7raGmNH/XOx76uDBVamKZEiTLz/MlRJRtXNFUwXeH+SH6p
92vatbkUbDHmMxCx0xhhFFci6oGSl1sS2R9jWxLgVHnup+apWyt6XcSXQq7yFt80
+C2EJiIlNpddWg9jEL3Q+Z/r3lN66cIztyuhN+Xuzo0zLMX80kfFhtW4KyJ24Gdj
sJCury5fcDqs264MMlO+q6r7nefViJaEWclfu1uvUjXwdybpofkM4i4Ki7106wsN
XQm/umrf
=41HQ
-----END PGP SIGNATURE-----
Merge tag 'v7.0-rc5-ksmbd-srv-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix out of bounds write
- Fix for better calculating max output buffers
- Fix memory leaks in SMB2/SMB3 lock
- Fix use after free
- Multichannel fix
* tag 'v7.0-rc5-ksmbd-srv-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix potencial OOB in get_file_all_info() for compound requests
ksmbd: replace hardcoded hdr2_len with offsetof() in smb2_calc_max_out_buf_len()
ksmbd: fix memory leaks and NULL deref in smb2_lock()
ksmbd: fix use-after-free and NULL deref in smb_grant_oplock()
ksmbd: do not expire session on binding failure
udf_setsize() can race with udf_writepages() as follows:
udf_setsize() udf_writepages()
if (iinfo->i_alloc_type ==
ICBTAG_FLAG_AD_IN_ICB)
err = udf_expand_file_adinicb(inode);
err = udf_extend_file(inode, newsize);
udf_adinicb_writepages()
memcpy_from_file_folio() - crash
because inode size is too big.
Fix the problem by checking the file type under folio lock in
udf_handle_page_wb() handler called from __mpage_writepages() which
properly serializes with udf_expand_file_adinicb().
Reported-by: Jianzhou Zhao <luckd0g@163.com>
Link: https://lore.kernel.org/all/f622c01.67ac.19cdbdd777d.Coremail.luckd0g@163.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260326140635.15895-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Some filesystems need to treat some folios specially (for example for
inodes with inline data). Doing the handling in their .writepages method
in a race-free manner results in duplicating some of the writeback
internals. So provide generalized version of mpage_writepages() that
allows filesystem to provide a handler called for each folio which can
handle the folio in a special way.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260326140635.15895-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Commit 7d6899fb69 ("ovl: fsync after metadata copy-up") was done to
fix durability of overlayfs copy up on an upper filesystem which does
not enforce ordering on storing of metadata changes (e.g. ubifs).
In an earlier revision of the regressing commit by Lei Lv, the metadata
fsync behavior was opt-in via a new "fsync=strict" mount option.
We were hoping that the opt-in mount option could be avoided, so the
change was only made to depend on metacopy=off, in the hope of not
hurting performance of metadata heavy workloads, which are more likely
to be using metacopy=on.
This hope was proven wrong by a performance regression report from Google
COS workload after upgrade to kernel 6.12.
This is an adaptation of Lei's original "fsync=strict" mount option
to the existing upstream code.
The new mount option is mutually exclusive with the "volatile" mount
option, so the latter is now an alias to the "fsync=volatile" mount
option.
Reported-by: Chenglong Tang <chenglongtang@google.com>
Closes: https://lore.kernel.org/linux-unionfs/CAOdxtTadAFH01Vui1FvWfcmQ8jH1O45owTzUcpYbNvBxnLeM7Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgKC1SgjMWre=fUb00v8rxtd6sQi-S+dxR8oDzAuiGu8g@mail.gmail.com/
Fixes: 7d6899fb69 ("ovl: fsync after metadata copy-up")
Depends: 50e638beb6 ("ovl: Use str_on_off() helper in ovl_show_options()")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Fei Lv <feilv@asrmicro.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
The netfs_io_stream::front member is meant to point to the subrequest
currently being collected on a stream, but it isn't actually used this way
by direct write (which mostly ignores it). However, there's a tracepoint
which looks at it. Further, stream->front is actually redundant with
stream->subrequests.next.
Fix the potential problem in the direct code by just removing the member
and using stream->subrequests.next instead, thereby also simplifying the
code.
Fixes: a0b4c7a491 ("netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence")
Reported-by: Paulo Alcantara <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/4158599.1774426817@warthog.procyon.org.uk
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
The xfile/xmbuf shmem file descriptions are no longer as detailed as
they were when online fsck was first merged, because moving to static
strings in commit 60382993a2 ("xfs: get rid of the
xchk_xfile_*_descr calls") removed a memory allocation and hence a
source of failure.
However this makes encoding the description in the tracepoints sort of a
waste of memory. David Laight also points out that file_path doesn't
zero the whole buffer which causes exposure of stale trace bytes, and
Steven Rostedt wonders why we're not using a dynamic array for the file
path.
I don't think this is worth fixing, so let's just rip it out.
Cc: rostedt@goodmis.org
Cc: david.laight.linux@gmail.com
Link: https://lore.kernel.org/linux-xfs/20260323172204.work.979-kees@kernel.org/
Cc: stable@vger.kernel.org # v6.11
Fixes: 19ebc8f84e ("xfs: fix file_path handling in tracepoints")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xlog_recovery_iget* never set @ip to a valid pointer if they return
an error, so this irele will walk off a dangling pointer. Fix that.
Cc: stable@vger.kernel.org # v6.10
Fixes: ae673f534a ("xfs: record inode generation in xattr update log intent items")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
- Mark I/Os as failed when encountering short reads on file-backed
mounts
- Label GFP_NOIO in the BIO completion when the completion is in
the process context, and directly call into the decompression
to avoid deadlocks
- Improve Kconfig descriptions to better highlight the overall
efforts
- Fix .fadvise() for page cache sharing
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmnEgjARHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qpBBhAAoARM70xNaTJPYsJax+lXpk6xinmYSthW
bkLbi99WDxTlxDWkrPLLTgiqfQ4Wq/Ks1vgtGC5ERZ7r5MnlRhlUaMTgOcS2PedE
gN8kQOGk0MoegA6uVF4TpkecGP5QX40L/xycRA4BKYPcI1OjHNzWtkC+vhqs9fMU
rUVQtxsaa4kHr1by709ttglBOXR4Sbdc/H3W4k/iUsa1jODko5iyuXtOY3dzx2zG
FCKR68m+tqEfyl+Qzt9mq2xyAIPIBGrqAnNgwiO8YlyUADZgd6e+0iiALUy1Ly4L
P7+Of2PKRf0RDyANNnNMICQHjdw/0SXWojp/VhH8evF0127/9sfoDChEvKCHBHfs
+HFfi0hMsO/vTo1OcwaBcuFqM64yRq3zmkUfGTaTpDpWdSDrGXR8ak2uzBxUTinM
5CCSxYjvmfQfAt9M7JqRPMOiE0WKzq9Mqd1+wExyA5SW140lz3ROttwogh0CRzal
oeDphyZo3quLBhoyapWVPCo8onMfgt031lfvf04VdjDjyeyiNCbAEUCtof35p3Y5
uBchgDkhSuCVR63lBwnGI3m3dGBSuFN70acEPExWyi2Dz/fNTfFnSU9tZ5yqBHt5
6fO+NbeLezO82PvWRkyvp0J0244F16LS5xBPBettR15DhaRUyi5Vgaf52Z+t1n/U
tR752Y/avsk=
=OpbI
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Mark I/Os as failed when encountering short reads on file-backed
mounts
- Label GFP_NOIO in the BIO completion when the completion is in the
process context, and directly call into the decompression to avoid
deadlocks
- Improve Kconfig descriptions to better highlight the overall efforts
- Fix .fadvise() for page cache sharing
* tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix .fadvise() for page cache sharing
erofs: update the Kconfig description
erofs: add GFP_NOIO in the bio completion if needed
erofs: set fileio bio failed in short read case
When a compound request consists of QUERY_DIRECTORY + QUERY_INFO
(FILE_ALL_INFORMATION) and the first command consumes nearly the entire
max_trans_size, get_file_all_info() would blindly call smbConvertToUTF16()
with PATH_MAX, causing out-of-bounds write beyond the response buffer.
In get_file_all_info(), there was a missing validation check for
the client-provided OutputBufferLength before copying the filename into
FileName field of the smb2_file_all_info structure.
If the filename length exceeds the available buffer space, it could lead to
potential buffer overflows or memory corruption during smbConvertToUTF16
conversion. This calculating the actual free buffer size using
smb2_calc_max_out_buf_len() and returning -EINVAL if the buffer is
insufficient and updating smbConvertToUTF16 to use the actual filename
length (clamped by PATH_MAX) to ensure a safe copy operation.
Cc: stable@vger.kernel.org
Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Reported-by: Asim Viladi Oglu Manizada <manizada@pm.me>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, .fadvise() doesn't work well if page cache sharing is on
since shared inodes belong to a pseudo fs generated with init_pseudo(),
and sb->s_bdi is the default one &noop_backing_dev_info.
Then, generic_fadvise() will just behave as a no-op if sb->s_bdi is
&noop_backing_dev_info, but as the bdev fs (the bdev fs changes
inode_to_bdi() instead), it's actually NOT a pure memfs.
Let's generate a real bdi for erofs_ishare_mnt instead.
Fixes: d86d7817c0 ("erofs: implement .fadvise for page cache share")
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Refine the description to better highlight its features and use cases.
In addition, add instructions for building it as a module and clarify
the compression option.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>