From 50d7b4332f27762d24641970fc34bb68a2621926 Mon Sep 17 00:00:00 2001 From: "Pratyush Yadav (Google)" Date: Mon, 23 Feb 2026 18:39:28 +0100 Subject: [PATCH 01/15] mm: memfd_luo: always make all folios uptodate Patch series "mm: memfd_luo: fixes for folio flag preservation". This series contains a couple fixes for flag preservation for memfd live update. The first patch fixes memfd preservation when fallocate() was used to pre-allocate some pages. For these memfds, all the writes to fallocated pages touched after preserve were lost. The second patch fixes dirty flag tracking. If the dirty flag is not tracked correctly, the next kernel might incorrectly reclaim some folios under memory pressure, losing user data. This is a theoretical bug that I observed when reading the code, and haven't been able to reproduce it. This patch (of 2): When a folio is added to a shmem file via fallocate, it is not zeroed on allocation. This is done as a performance optimization since it is possible the folio will never end up being used at all. When the folio is used, shmem checks for the uptodate flag, and if absent, zeroes the folio (and sets the flag) before returning to user. With LUO, the flags of each folio are saved at preserve time. It is possible to have a memfd with some folios fallocated but not uptodate. For those, the uptodate flag doesn't get saved. The folios might later end up being used and become uptodate. They would get passed to the next kernel via KHO correctly since they did get preserved. But they won't have the MEMFD_LUO_FOLIO_UPTODATE flag. This means that when the memfd is retrieved, the folios will be added to the shmem file without the uptodate flag. They will be zeroed before first use, losing the data in those folios. Since we take a big performance hit in allocating, zeroing, and pinning all folios at prepare time anyway, take some more and zero all non-uptodate ones too. Later when there is a stronger need to make prepare faster, this can be optimized. To avoid racing with another uptodate operation, take the folio lock. Link: https://lkml.kernel.org/r/20260223173931.2221759-2-pratyush@kernel.org Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd") Signed-off-by: Pratyush Yadav (Google) Reviewed-by: Mike Rapoport (Microsoft) Cc: Pasha Tatashin Cc: Signed-off-by: Andrew Morton --- mm/memfd_luo.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c index e485b828d173..1c9510289312 100644 --- a/mm/memfd_luo.c +++ b/mm/memfd_luo.c @@ -152,10 +152,31 @@ static int memfd_luo_preserve_folios(struct file *file, if (err) goto err_unpreserve; + folio_lock(folio); + if (folio_test_dirty(folio)) flags |= MEMFD_LUO_FOLIO_DIRTY; - if (folio_test_uptodate(folio)) - flags |= MEMFD_LUO_FOLIO_UPTODATE; + + /* + * If the folio is not uptodate, it was fallocated but never + * used. Saving this flag at prepare() doesn't work since it + * might change later when someone uses the folio. + * + * Since we have taken the performance penalty of allocating, + * zeroing, and pinning all the folios in the holes, take a bit + * more and zero all non-uptodate folios too. + * + * NOTE: For someone looking to improve preserve performance, + * this is a good place to look. + */ + if (!folio_test_uptodate(folio)) { + folio_zero_range(folio, 0, folio_size(folio)); + flush_dcache_folio(folio); + folio_mark_uptodate(folio); + } + flags |= MEMFD_LUO_FOLIO_UPTODATE; + + folio_unlock(folio); pfolio->pfn = folio_pfn(folio); pfolio->flags = flags; From 7e04bf1f33151a30e06a65b74b5f2c19fc2be128 Mon Sep 17 00:00:00 2001 From: "Pratyush Yadav (Google)" Date: Mon, 23 Feb 2026 18:39:29 +0100 Subject: [PATCH 02/15] mm: memfd_luo: always dirty all folios A dirty folio is one which has been written to. A clean folio is its opposite. Since a clean folio has no user data, it can be freed under memory pressure. memfd preservation with LUO saves the flag at preserve(). This is problematic. The folio might get dirtied later. Saving it at freeze() also doesn't work, since the dirty bit from PTE is normally synced at unmap and there might still be mappings of the file at freeze(). To see why this is a problem, say a folio is clean at preserve, but gets dirtied later. The serialized state of the folio will mark it as clean. After retrieve, the next kernel will see the folio as clean and might try to reclaim it under memory pressure. This will result in losing user data. Mark all folios of the file as dirty, and always set the MEMFD_LUO_FOLIO_DIRTY flag. This comes with the side effect of making all clean folios un-reclaimable. This is a cost that has to be paid for participants of live update. It is not expected to be a common use case to preserve a lot of clean folios anyway. Since the value of pfolio->flags is a constant now, drop the flags variable and set it directly. Link: https://lkml.kernel.org/r/20260223173931.2221759-3-pratyush@kernel.org Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd") Signed-off-by: Pratyush Yadav (Google) Reviewed-by: Mike Rapoport (Microsoft) Cc: Pasha Tatashin Cc: Signed-off-by: Andrew Morton --- mm/memfd_luo.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c index 1c9510289312..b8edb9f981d7 100644 --- a/mm/memfd_luo.c +++ b/mm/memfd_luo.c @@ -146,7 +146,6 @@ static int memfd_luo_preserve_folios(struct file *file, for (i = 0; i < nr_folios; i++) { struct memfd_luo_folio_ser *pfolio = &folios_ser[i]; struct folio *folio = folios[i]; - unsigned int flags = 0; err = kho_preserve_folio(folio); if (err) @@ -154,8 +153,26 @@ static int memfd_luo_preserve_folios(struct file *file, folio_lock(folio); - if (folio_test_dirty(folio)) - flags |= MEMFD_LUO_FOLIO_DIRTY; + /* + * A dirty folio is one which has been written to. A clean folio + * is its opposite. Since a clean folio does not carry user + * data, it can be freed by page reclaim under memory pressure. + * + * Saving the dirty flag at prepare() time doesn't work since it + * can change later. Saving it at freeze() also won't work + * because the dirty bit is normally synced at unmap and there + * might still be a mapping of the file at freeze(). + * + * To see why this is a problem, say a folio is clean at + * preserve, but gets dirtied later. The pfolio flags will mark + * it as clean. After retrieve, the next kernel might try to + * reclaim this folio under memory pressure, losing user data. + * + * Unconditionally mark it dirty to avoid this problem. This + * comes at the cost of making clean folios un-reclaimable after + * live update. + */ + folio_mark_dirty(folio); /* * If the folio is not uptodate, it was fallocated but never @@ -174,12 +191,11 @@ static int memfd_luo_preserve_folios(struct file *file, flush_dcache_folio(folio); folio_mark_uptodate(folio); } - flags |= MEMFD_LUO_FOLIO_UPTODATE; folio_unlock(folio); pfolio->pfn = folio_pfn(folio); - pfolio->flags = flags; + pfolio->flags = MEMFD_LUO_FOLIO_DIRTY | MEMFD_LUO_FOLIO_UPTODATE; pfolio->index = folio->index; } From d210fdcac9c0d1380eab448aebc93f602c1cd4e6 Mon Sep 17 00:00:00 2001 From: Raul Pazemecxas De Andrade Date: Mon, 23 Feb 2026 17:10:59 -0800 Subject: [PATCH 03/15] mm/damon/core: clear walk_control on inactive context in damos_walk() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit damos_walk() sets ctx->walk_control to the caller-provided control structure before checking whether the context is running. If the context is inactive (damon_is_running() returns false), the function returns -EINVAL without clearing ctx->walk_control. This leaves a dangling pointer to a stack-allocated structure that will be freed when the caller returns. This is structurally identical to the bug fixed in commit f9132fbc2e83 ("mm/damon/core: remove call_control in inactive contexts") for damon_call(), which had the same pattern of linking a control object and returning an error without unlinking it. The dangling walk_control pointer can cause: 1. Use-after-free if the context is later started and kdamond    dereferences ctx->walk_control (e.g., in damos_walk_cancel()    which writes to control->canceled and calls complete()) 2. Permanent -EBUSY from subsequent damos_walk() calls, since the    stale pointer is non-NULL Nonetheless, the real user impact is quite restrictive. The use-after-free is impossible because there is no damos_walk() callers who starts the context later. The permanent -EBUSY can actually confuse users, as DAMON is not running. But the symptom is kept only while the context is turned off. Turning it on again will make DAMON internally uses a newly generated damon_ctx object that doesn't have the invalid damos_walk_control pointer, so everything will work fine again. Fix this by clearing ctx->walk_control under walk_control_lock before returning -EINVAL, mirroring the fix pattern from f9132fbc2e83. Link: https://lkml.kernel.org/r/20260224011102.56033-1-sj@kernel.org Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()") Reported-by: Raul Pazemecxas De Andrade Closes: https://lore.kernel.org/CPUPR80MB8171025468965E583EF2490F956CA@CPUPR80MB8171.lamprd80.prod.outlook.com Signed-off-by: Raul Pazemecxas De Andrade Signed-off-by: SeongJae Park Reviewed-by: SeongJae Park Cc: [6.14+] Signed-off-by: Andrew Morton --- mm/damon/core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/damon/core.c b/mm/damon/core.c index adfc52fee9dc..c1d1091d307e 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -1562,8 +1562,13 @@ int damos_walk(struct damon_ctx *ctx, struct damos_walk_control *control) } ctx->walk_control = control; mutex_unlock(&ctx->walk_control_lock); - if (!damon_is_running(ctx)) + if (!damon_is_running(ctx)) { + mutex_lock(&ctx->walk_control_lock); + if (ctx->walk_control == control) + ctx->walk_control = NULL; + mutex_unlock(&ctx->walk_control_lock); return -EINVAL; + } wait_for_completion(&control->completion); if (control->canceled) return -ECANCELED; From f4355d6bb39fc8e53d772fa0654c8441b214e349 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Tue, 24 Feb 2026 22:12:31 -0500 Subject: [PATCH 04/15] mm/cma: move put_page_testzero() out of VM_WARN_ON in cma_release() When CONFIG_DEBUG_VM is not set, VM_WARN_ON is a NOP. Putting any statement with side effect inside it is incorrect. Collect all !put_page_testzero() results and check the sum using WARN instead after the loop. It restores the same check in free_contig_range() before commit e0c1326779cc ("mm: page_alloc: add alloc_contig_frozen_{range,pages}()"), the commit prior to the Fixes one. Link: https://lkml.kernel.org/r/20260225031231.2352011-1-ziy@nvidia.com Fixes: 9bda131c6093 ("mm: cma: add cma_alloc_frozen{_compound}()") Signed-off-by: Zi Yan Reported-by: Ron Economos Closes: https://lore.kernel.org/all/1b17c38f-30d3-4bb4-a7e1-e74b19ada885@w6rz.net/ Suggested-by: Kefeng Wang Reviewed-by: Vishal Moola (Oracle) Debugged-by: David Hildenbrand (Arm) Acked-by: David Hildenbrand (Arm) Tested-by: Ron Economos Reviewed-by: Kefeng Wang Reviewed-by: Anshuman Khandual Reviewed-by: SeongJae Park Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- mm/cma.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/cma.c b/mm/cma.c index 94b5da468a7d..15cc0ae76c8e 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -1013,6 +1013,7 @@ bool cma_release(struct cma *cma, const struct page *pages, unsigned long count) { struct cma_memrange *cmr; + unsigned long ret = 0; unsigned long i, pfn; cmr = find_cma_memrange(cma, pages, count); @@ -1021,7 +1022,9 @@ bool cma_release(struct cma *cma, const struct page *pages, pfn = page_to_pfn(pages); for (i = 0; i < count; i++, pfn++) - VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))); + ret += !put_page_testzero(pfn_to_page(pfn)); + + WARN(ret, "%lu pages are still in use!\n", ret); __cma_release_frozen(cma, cmr, pages, count); From 2d28ed588f8d7d0d41b0a4fad7f0d05e4bbf1797 Mon Sep 17 00:00:00 2001 From: Axel Rasmussen Date: Tue, 24 Feb 2026 16:24:34 -0800 Subject: [PATCH 05/15] Revert "ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor()" This change swapped out mod_node_page_state for lruvec_stat_add_folio. But, these two APIs are not interchangeable: the lruvec version also increments memcg stats, in addition to "global" pgdat stats. So after this change, the "pagetables" memcg stat in memory.stat always yields "0", which is a userspace visible regression. I tried to look for a refactor where we add a variant of lruvec_stat_mod_folio which takes a pgdat and a memcg instead of a folio, to try to adhere to the spirit of the original patch. But at the end of the day this just means we have to call folio_memcg(ptdesc_folio(ptdesc)) anyway, which doesn't really accomplish much. This regression is visible in master as well as 6.18 stable, so CC stable too. Link: https://lkml.kernel.org/r/20260225002434.2953895-1-axelrasmussen@google.com Fixes: f0c92726e89f ("ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor()") Signed-off-by: Axel Rasmussen Acked-by: Shakeel Butt Acked-by: Johannes Weiner Reviewed-by: Vishal Moola (Oracle) Cc: David Hildenbrand Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Roman Gushchin Cc: Muchun Song Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5be3d8a8f806..abb4963c1f06 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3514,26 +3514,21 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void ptlock_free(struct ptdesc *ptdesc) {} #endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */ -static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc) -{ - return compound_nr(ptdesc_page(ptdesc)); -} - static inline void __pagetable_ctor(struct ptdesc *ptdesc) { - pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags)); + struct folio *folio = ptdesc_folio(ptdesc); - __SetPageTable(ptdesc_page(ptdesc)); - mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc)); + __folio_set_pgtable(folio); + lruvec_stat_add_folio(folio, NR_PAGETABLE); } static inline void pagetable_dtor(struct ptdesc *ptdesc) { - pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags)); + struct folio *folio = ptdesc_folio(ptdesc); ptlock_free(ptdesc); - __ClearPageTable(ptdesc_page(ptdesc)); - mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc)); + __folio_clear_pgtable(folio); + lruvec_stat_sub_folio(folio, NR_PAGETABLE); } static inline void pagetable_dtor_free(struct ptdesc *ptdesc) From 5548dd7fa84510f7bbce67c35cc3b388c86aeddf Mon Sep 17 00:00:00 2001 From: "Mike Rapoport (Microsoft)" Date: Thu, 26 Feb 2026 01:31:11 +0200 Subject: [PATCH 06/15] tools/testing: fix testing/vma and testing/radix-tree build Build of VMA and radix-tree tests is unhappy after the conversion of kzalloc() to kzalloc_obj() in lib/idr.c: cc -I../shared -I. -I../../include -I../../arch/x86/include -I../../../lib -g -Og -Wall -D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined -DNUM_VMA_FLAG_BITS=128 -DNUM_MM_FLAG_BITS=128 -c -o idr.o idr.c idr.c: In function `ida_alloc_range': idr.c:420:34: error: implicit declaration of function `kzalloc_obj'; did you mean `kzalloc_node'? [-Wimplicit-function-declaration] 420 | bitmap = kzalloc_obj(*bitmap, GFP_NOWAIT); | ^~~~~~~~~~~ | kzalloc_node idr.c:420:32: error: assignment to `struct ida_bitmap *' from `int' makes pointer from integer without a cast [-Wint-conversion] 420 | bitmap = kzalloc_obj(*bitmap, GFP_NOWAIT); | ^ idr.c:447:40: error: assignment to `struct ida_bitmap *' from `int' makes pointer from integer without a cast [-Wint-conversion] 447 | bitmap = kzalloc_obj(*bitmap, GFP_NOWAIT); | ^ idr.c:468:15: error: assignment to `struct ida_bitmap *' from `int' makes pointer from integer without a cast [-Wint-conversion] 468 | alloc = kzalloc_obj(*bitmap, gfp); | ^ make: *** [: idr.o] Error 1 Import necessary macros from include/linux to tools/include/linux to fix the compilation. Link: https://lkml.kernel.org/r/20260225233111.2760752-1-rppt@kernel.org Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Signed-off-by: Mike Rapoport (Microsoft) Tested-by: SeongJae Park Reviewed-by: Lorenzo Stoakes Cc: David Hildenbrand Cc: Kees Cook Cc: Liam Howlett Cc: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- tools/include/linux/gfp.h | 4 ++++ tools/include/linux/overflow.h | 19 +++++++++++++++++++ tools/include/linux/slab.h | 9 +++++++++ 3 files changed, 32 insertions(+) diff --git a/tools/include/linux/gfp.h b/tools/include/linux/gfp.h index 6a10ff5f5be9..9e957b57b694 100644 --- a/tools/include/linux/gfp.h +++ b/tools/include/linux/gfp.h @@ -5,6 +5,10 @@ #include #include +/* Helper macro to avoid gfp flags if they are the default one */ +#define __default_gfp(a,...) a +#define default_gfp(...) __default_gfp(__VA_ARGS__ __VA_OPT__(,) GFP_KERNEL) + static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags) { return !!(gfp_flags & __GFP_DIRECT_RECLAIM); diff --git a/tools/include/linux/overflow.h b/tools/include/linux/overflow.h index dcb0c1bf6866..3427d7880326 100644 --- a/tools/include/linux/overflow.h +++ b/tools/include/linux/overflow.h @@ -68,6 +68,25 @@ __builtin_mul_overflow(__a, __b, __d); \ }) +/** + * size_mul() - Calculate size_t multiplication with saturation at SIZE_MAX + * @factor1: first factor + * @factor2: second factor + * + * Returns: calculate @factor1 * @factor2, both promoted to size_t, + * with any overflow causing the return value to be SIZE_MAX. The + * lvalue must be size_t to avoid implicit type conversion. + */ +static inline size_t __must_check size_mul(size_t factor1, size_t factor2) +{ + size_t bytes; + + if (check_mul_overflow(factor1, factor2, &bytes)) + return SIZE_MAX; + + return bytes; +} + /** * array_size() - Calculate size of 2-dimensional array. * diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h index 94937a699402..6d8e9413d5a4 100644 --- a/tools/include/linux/slab.h +++ b/tools/include/linux/slab.h @@ -202,4 +202,13 @@ static inline unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf) return sheaf->size; } +#define __alloc_objs(KMALLOC, GFP, TYPE, COUNT) \ +({ \ + const size_t __obj_size = size_mul(sizeof(TYPE), COUNT); \ + (TYPE *)KMALLOC(__obj_size, GFP); \ +}) + +#define kzalloc_obj(P, ...) \ + __alloc_objs(kzalloc, default_gfp(__VA_ARGS__), typeof(P), 1) + #endif /* _TOOLS_SLAB_H */ From ba4c3698e6963eacd8e7c86c13343631bfeabe55 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Thu, 26 Feb 2026 11:54:21 +0900 Subject: [PATCH 07/15] zram: rename writeback_compressed device attr MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rename writeback_compressed attr to compressed_writeback to avoid possible confusion and have more natural naming. writeback_compressed may look like an alternative version of writeback while in fact writeback_compressed only sets a writeback property. Make this distinction more clear with a new compressed_writeback name. This updates a feature which is new in 7.0-rcX. Link: https://lkml.kernel.org/r/20260226025429.1042083-1-senozhatsky@chromium.org Fixes: 4c1d61389e8e ("zram: introduce writeback_compressed device attribute") Signed-off-by: Sergey Senozhatsky Suggested-by: Minchan Kim Acked-by: Minchan Kim Cc: Brian Geffon Cc: Richard Chang Cc: Suren Baghdasaryan Cc: "Christoph Böhmwalder" Cc: Jens Axboe Cc: Jonathan Corbet Cc: Lars Ellenberg Cc: Philipp Reisner Cc: Shuah Khan Signed-off-by: Andrew Morton --- Documentation/ABI/testing/sysfs-block-zram | 4 ++-- Documentation/admin-guide/blockdev/zram.rst | 6 +++--- drivers/block/zram/zram_drv.c | 24 ++++++++++----------- drivers/block/zram/zram_drv.h | 2 +- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index e538d4850d61..64c03010e951 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram @@ -151,11 +151,11 @@ Description: The algorithm_params file is write-only and is used to setup compression algorithm parameters. -What: /sys/block/zram/writeback_compressed +What: /sys/block/zram/compressed_writeback Date: Decemeber 2025 Contact: Richard Chang Description: - The writeback_compressed device atrribute toggles compressed + The compressed_writeback device atrribute toggles compressed writeback feature. What: /sys/block/zram/writeback_batch_size diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst index 94bb7f2245ee..451fa00d3004 100644 --- a/Documentation/admin-guide/blockdev/zram.rst +++ b/Documentation/admin-guide/blockdev/zram.rst @@ -216,7 +216,7 @@ writeback_limit WO specifies the maximum amount of write IO zram writeback_limit_enable RW show and set writeback_limit feature writeback_batch_size RW show and set maximum number of in-flight writeback operations -writeback_compressed RW show and set compressed writeback feature +compressed_writeback RW show and set compressed writeback feature comp_algorithm RW show and change the compression algorithm algorithm_params WO setup compression algorithm parameters compact WO trigger memory compaction @@ -439,11 +439,11 @@ budget in next setting is user's job. By default zram stores written back pages in decompressed (raw) form, which means that writeback operation involves decompression of the page before writing it to the backing device. This behavior can be changed by enabling -`writeback_compressed` feature, which causes zram to write compressed pages +`compressed_writeback` feature, which causes zram to write compressed pages to the backing device, thus avoiding decompression overhead. To enable this feature, execute:: - $ echo yes > /sys/block/zramX/writeback_compressed + $ echo yes > /sys/block/zramX/compressed_writeback Note that this feature should be configured before the `zramX` device is initialized. diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index bca33403fc8b..a324ede6206d 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -549,7 +549,7 @@ static ssize_t bd_stat_show(struct device *dev, struct device_attribute *attr, return ret; } -static ssize_t writeback_compressed_store(struct device *dev, +static ssize_t compressed_writeback_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { @@ -564,12 +564,12 @@ static ssize_t writeback_compressed_store(struct device *dev, return -EBUSY; } - zram->wb_compressed = val; + zram->compressed_wb = val; return len; } -static ssize_t writeback_compressed_show(struct device *dev, +static ssize_t compressed_writeback_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -577,7 +577,7 @@ static ssize_t writeback_compressed_show(struct device *dev, struct zram *zram = dev_to_zram(dev); guard(rwsem_read)(&zram->dev_lock); - val = zram->wb_compressed; + val = zram->compressed_wb; return sysfs_emit(buf, "%d\n", val); } @@ -946,7 +946,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req) goto out; } - if (zram->wb_compressed) { + if (zram->compressed_wb) { /* * ZRAM_WB slots get freed, we need to preserve data required * for read decompression. @@ -960,7 +960,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req) set_slot_flag(zram, index, ZRAM_WB); set_slot_handle(zram, index, req->blk_idx); - if (zram->wb_compressed) { + if (zram->compressed_wb) { if (huge) set_slot_flag(zram, index, ZRAM_HUGE); set_slot_size(zram, index, size); @@ -1100,7 +1100,7 @@ static int zram_writeback_slots(struct zram *zram, */ if (!test_slot_flag(zram, index, ZRAM_PP_SLOT)) goto next; - if (zram->wb_compressed) + if (zram->compressed_wb) err = read_from_zspool_raw(zram, req->page, index); else err = read_from_zspool(zram, req->page, index); @@ -1429,7 +1429,7 @@ static void zram_async_read_endio(struct bio *bio) * * Keep the existing behavior for now. */ - if (zram->wb_compressed == false) { + if (zram->compressed_wb == false) { /* No decompression needed, complete the parent IO */ bio_endio(req->parent); bio_put(bio); @@ -1508,7 +1508,7 @@ static int read_from_bdev_sync(struct zram *zram, struct page *page, u32 index, flush_work(&req.work); destroy_work_on_stack(&req.work); - if (req.error || zram->wb_compressed == false) + if (req.error || zram->compressed_wb == false) return req.error; return decompress_bdev_page(zram, page, index); @@ -3007,7 +3007,7 @@ static DEVICE_ATTR_WO(writeback); static DEVICE_ATTR_RW(writeback_limit); static DEVICE_ATTR_RW(writeback_limit_enable); static DEVICE_ATTR_RW(writeback_batch_size); -static DEVICE_ATTR_RW(writeback_compressed); +static DEVICE_ATTR_RW(compressed_writeback); #endif #ifdef CONFIG_ZRAM_MULTI_COMP static DEVICE_ATTR_RW(recomp_algorithm); @@ -3031,7 +3031,7 @@ static struct attribute *zram_disk_attrs[] = { &dev_attr_writeback_limit.attr, &dev_attr_writeback_limit_enable.attr, &dev_attr_writeback_batch_size.attr, - &dev_attr_writeback_compressed.attr, + &dev_attr_compressed_writeback.attr, #endif &dev_attr_io_stat.attr, &dev_attr_mm_stat.attr, @@ -3091,7 +3091,7 @@ static int zram_add(void) init_rwsem(&zram->dev_lock); #ifdef CONFIG_ZRAM_WRITEBACK zram->wb_batch_size = 32; - zram->wb_compressed = false; + zram->compressed_wb = false; #endif /* gendisk structure */ diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 515a72d9c06f..f0de8f8218f5 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -133,7 +133,7 @@ struct zram { #ifdef CONFIG_ZRAM_WRITEBACK struct file *backing_dev; bool wb_limit_enable; - bool wb_compressed; + bool compressed_wb; u32 wb_batch_size; u64 bd_wb_limit; struct block_device *bdev; From a1e59fc6ee4ed8988ea4aeb9224e75d03175be9c Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" Date: Thu, 26 Feb 2026 17:56:30 +0530 Subject: [PATCH 08/15] mm/hugetlb.c: use __pa() instead of virt_to_phys() in early bootmem alloc code Architecture like powerpc, checks for pfn_valid() in their virt_to_phys() implementation (when CONFIG_DEBUG_VIRTUAL is enabled) [1]. Commit d49004c5f0c1 "arch, mm: consolidate initialization of nodes, zones and memory map" changed the order of initialization between hugetlb_bootmem_alloc() and free_area_init(). This means, pfn_valid() can now return false in alloc_bootmem() path, since sparse_init() is not yet done. Since, alloc_bootmem() uses memblock_alloc(.., MEMBLOCK_ALLOC_ACCESSIBLE), this means these allocations are always going to happen below high_memory, where __pa() should return valid physical addresses. Hence this patch converts the two callers of virt_to_phys() in alloc_bootmem() path to __pa() to avoid this bootup warning: ------------[ cut here ]------------ WARNING: arch/powerpc/include/asm/io.h:879 at virt_to_phys+0x44/0x1b8, CPU#0: swapper/0 Modules linked in: <...> NIP [c000000000601584] virt_to_phys+0x44/0x1b8 LR [c000000004075de4] alloc_bootmem+0x144/0x1a8 Call Trace: [c000000004d1fb50] [c000000004075dd4] alloc_bootmem+0x134/0x1a8 [c000000004d1fba0] [c000000004075fac] __alloc_bootmem_huge_page+0x164/0x230 [c000000004d1fbe0] [c000000004030bc4] alloc_bootmem_huge_page+0x44/0x138 [c000000004d1fc10] [c000000004076e48] hugetlb_hstate_alloc_pages+0x350/0x5ac [c000000004d1fd30] [c0000000040782f0] hugetlb_bootmem_alloc+0x15c/0x19c [c000000004d1fd70] [c00000000406d7b4] mm_core_init_early+0x7c/0xdf4 [c000000004d1ff30] [c000000004011d84] start_kernel+0xac/0xc58 [c000000004d1ffe0] [c00000000000e99c] start_here_common+0x1c/0x20 [1]: https://lore.kernel.org/linuxppc-dev/87tsv5h544.ritesh.list@gmail.com/ Link: https://lkml.kernel.org/r/b4a7d2c6c4c1dd81dddc904fc21f01303290a4b8.1772107852.git.riteshh@linux.ibm.com Fixes: d49004c5f0c1 ("arch, mm: consolidate initialization of nodes, zones and memory map") Signed-off-by: Ritesh Harjani (IBM) Reviewed-by: Mike Rapoport (Microsoft) Cc: David Hildenbrand Cc: Muchun Song Cc: Oscar Salvador Signed-off-by: Andrew Morton --- mm/hugetlb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0beb6e22bc26..327eaa4074d3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3101,7 +3101,7 @@ static __init void *alloc_bootmem(struct hstate *h, int nid, bool node_exact) * extract the actual node first. */ if (m) - listnode = early_pfn_to_nid(PHYS_PFN(virt_to_phys(m))); + listnode = early_pfn_to_nid(PHYS_PFN(__pa(m))); } if (m) { @@ -3160,7 +3160,7 @@ found: * The head struct page is used to get folio information by the HugeTLB * subsystem like zone id and node id. */ - memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE), + memblock_reserved_mark_noinit(__pa((void *)m + PAGE_SIZE), huge_page_size(h) - PAGE_SIZE); return 1; From dccd5ee2625d50239510bcd73ed78559005e00a3 Mon Sep 17 00:00:00 2001 From: Hao Li Date: Thu, 26 Feb 2026 19:51:37 +0800 Subject: [PATCH 09/15] memcg: fix slab accounting in refill_obj_stock() trylock path In the trylock path of refill_obj_stock(), mod_objcg_mlstate() should use the real alloc/free bytes (i.e., nr_acct) for accounting, rather than nr_bytes. The user-visible impact is that the NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B stats can end up being incorrect. For example, if a user allocates a 6144-byte object, then before this fix efill_obj_stock() calls mod_objcg_mlstate(..., nr_bytes=2048), even though it should account for 6144 bytes (i.e., nr_acct). When the user later frees the same object with kfree(), refill_obj_stock() calls mod_objcg_mlstate(..., nr_bytes=6144). This ends up adding 6144 to the stats, but it should be applying -6144 (i.e., nr_acct) since the object is being freed. Link: https://lkml.kernel.org/r/20260226115145.62903-1-hao.li@linux.dev Fixes: 200577f69f29 ("memcg: objcg stock trylock without irq disabling") Signed-off-by: Hao Li Acked-by: Shakeel Butt Acked-by: Johannes Weiner Cc: Michal Hocko Cc: Muchun Song Cc: Roman Gushchin Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a52da3a5e4fd..772bac21d155 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3086,7 +3086,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes, if (!local_trylock(&obj_stock.lock)) { if (pgdat) - mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes); + mod_objcg_mlstate(objcg, pgdat, idx, nr_acct); nr_pages = nr_bytes >> PAGE_SHIFT; nr_bytes = nr_bytes & (PAGE_SIZE - 1); atomic_add(nr_bytes, &objcg->nr_charged_bytes); From 06de173b138513087896f9cf090f30b35846518d Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Sun, 1 Mar 2026 10:09:02 +0800 Subject: [PATCH 10/15] MAINTAINERS: add RELAY entry RELAYFS was originally developed by Tom Zanussi and Karim Yaghmour in 2005[1]. Jens Axboe converted it from filesystem into a generic API in 2006[2] and made it widely known through the notable I/O tracing tool blktrace. In the decade, there remain a few users scatterred across different subsystems, like recently added wifi commit[3] that is an example to show how to communicate between users and kernel. Last year I've already done some maintenance and added/corrected some diagnostic counters. At Tencent, we internally maintain RELAY as one of most crucial components of network observibility platform which was shared a bit at LPC 2025[4][5] and hopefully will be published in the paper this year. RELAY has proven highly efficient due to its inherent design essence. This design becomes the indispensable way to build a 7x24 platform monitoring various hot paths even without any selectively sampling (yes, sampling is commonly used to avoid the overall performance degradation). One of the recommended usages is to use its zerocopy function relay_reserve() to transfer data in a raw format that can be recognized and parsed by the corresponding application to userspace without introducing heavy locks and complicated logic that appears in other types of approaches, like printk. More details can be discovered by reading through the Documentation :) Credits are given to the all the contributors and reviewers for RELAY/RELAYFS in the past and future! Many thanks! [1]: commit e82894f84dbb ("[PATCH] relayfs") [2]: commit b86ff981a825 ("[PATCH] relay: migrate from relayfs to a generic relay API") [3]: commit c1bf6959dd81 ("wifi: ath11k: Register relayfs entries for CFR dump") [4]: https://lpc.events/event/19/contributions/2055/ [5]: https://lpc.events/event/19/contributions/2010/ Link: https://lkml.kernel.org/r/20260301020902.56476-1-kerneljasonxing@gmail.com Signed-off-by: Jason Xing Acked-by: Andrew Morton Acked-by: Jens Axboe Cc: Andriy Shevchenko Cc: Tom Zanussi Signed-off-by: Andrew Morton --- MAINTAINERS | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index e4572a36afd2..0ecf11bab619 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22284,6 +22284,16 @@ L: linux-wireless@vger.kernel.org S: Orphan F: drivers/net/wireless/rsi/ +RELAY +M: Andrew Morton +M: Jens Axboe +M: Jason Xing +L: linux-kernel@vger.kernel.org +S: Maintained +F: Documentation/filesystems/relay.rst +F: include/linux/relay.h +F: kernel/relay.c + REGISTER MAP ABSTRACTION M: Mark Brown L: linux-kernel@vger.kernel.org From 431b04f0084d244569e81ca4216a40644b23b0c5 Mon Sep 17 00:00:00 2001 From: "Vlastimil Babka (SUSE)" Date: Mon, 2 Mar 2026 11:13:46 +0100 Subject: [PATCH 11/15] MAINTAINERS: add co-maintainer and reviewer for SLAB ALLOCATOR Promote Harry Yoo from reviewer to maintainer. Harry's been involved in slab development for multiple years now and doing a great job. Add Hao Li as a new reviewer. Hao has been doing very useful reviews for a while now, so make it official and ensure the Cc's. Link: https://lkml.kernel.org/r/20260302101345.36713-2-vbabka@kernel.org Signed-off-by: Vlastimil Babka (SUSE) Acked-by: Lorenzo Stoakes Acked-by: Harry Yoo Acked-by: Hao Li Acked-by: SeongJae Park Cc: Christoph Lameter Cc: David Rientjes Cc: Roman Gushchin Signed-off-by: Andrew Morton --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 0ecf11bab619..e510fbc6f882 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -24361,11 +24361,12 @@ F: drivers/nvmem/layouts/sl28vpd.c SLAB ALLOCATOR M: Vlastimil Babka +M: Harry Yoo M: Andrew Morton +R: Hao Li R: Christoph Lameter R: David Rientjes R: Roman Gushchin -R: Harry Yoo L: linux-mm@kvack.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git From 577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Mon, 2 Mar 2026 15:31:59 -0500 Subject: [PATCH 12/15] mm/huge_memory: fix a folio_split() race condition with folio_try_get() During a pagecache folio split, the values in the related xarray should not be changed from the original folio at xarray split time until all after-split folios are well formed and stored in the xarray. Current use of xas_try_split() in __split_unmapped_folio() lets some after-split folios show up at wrong indices in the xarray. When these misplaced after-split folios are unfrozen, before correct folios are stored via __xa_store(), and grabbed by folio_try_get(), they are returned to userspace at wrong file indices, causing data corruption. More detailed explanation is at the bottom. The reproducer is at: https://github.com/dfinity/thp-madv-remove-test It 1. creates a memfd, 2. forks, 3. in the child process, maps the file with large folios (via shmem code path) and reads the mapped file continuously with 16 threads, 4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the large folio. Data corruption can be observed without the fix. Basically, data from a wrong page->index is returned. Fix it by using the original folio in xas_try_split() calls, so that folio_try_get() can get the right after-split folios after the original folio is unfrozen. Uniform split, split_huge_page*(), is not affected, since it uses xas_split_alloc() and xas_split() only once and stores the original folio in the xarray. Change xas_split() used in uniform split branch to use the original folio to avoid confusion. Fixes below points to the commit introduces the code, but folio_split() is used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in truncate operation"). More details: For example, a folio f is split non-uniformly into f, f2, f3, f4 like below: +----------------+---------+----+----+ | f | f2 | f3 | f4 | +----------------+---------+----+----+ but the xarray would look like below after __split_unmapped_folio() is done: +----------------+---------+----+----+ | f | f2 | f3 | f3 | +----------------+---------+----+----+ After __split_unmapped_folio(), the code changes the xarray and unfreezes after-split folios: 1. unfreezes f2, __xa_store(f2) 2. unfreezes f3, __xa_store(f3) 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4. 4. unfreezes f. Meanwhile, a parallel filemap_get_entry() can read the second f3 from the xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then, f3 is wrongly returned to user. After the fix, the xarray looks like below after __split_unmapped_folio(): +----------------+---------+----+----+ | f | f | f | f | +----------------+---------+----+----+ so that the race window no longer exists. [ziy@nvidia.com: move comment, per David] Link: https://lkml.kernel.org/r/5C9FA053-A4C6-4615-BE05-74E47A6462B3@nvidia.com Link: https://lkml.kernel.org/r/20260302203159.3208341-1-ziy@nvidia.com Fixes: 00527733d0dc ("mm/huge_memory: add two new (not yet used) functions for folio_split()") Signed-off-by: Zi Yan Reported-by: Bas van Dijk Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/ Tested-by: Lance Yang Reviewed-by: Lorenzo Stoakes Reviewed-by: Wei Yang Reviewed-by: Baolin Wang Cc: Barry Song Cc: David Hildenbrand Cc: Dev Jain Cc: Hugh Dickins Cc: Liam Howlett Cc: Matthew Wilcox (Oracle) Cc: Nico Pache Cc: Ryan Roberts Cc: Signed-off-by: Andrew Morton --- mm/huge_memory.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8e2746ea74ad..912c248a3f7e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3631,6 +3631,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order, const bool is_anon = folio_test_anon(folio); int old_order = folio_order(folio); int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1; + struct folio *old_folio = folio; int split_order; /* @@ -3651,12 +3652,16 @@ static int __split_unmapped_folio(struct folio *folio, int new_order, * uniform split has xas_split_alloc() called before * irq is disabled to allocate enough memory, whereas * non-uniform split can handle ENOMEM. + * Use the to-be-split folio, so that a parallel + * folio_try_get() waits on it until xarray is updated + * with after-split folios and the original one is + * unfrozen. */ - if (split_type == SPLIT_TYPE_UNIFORM) - xas_split(xas, folio, old_order); - else { + if (split_type == SPLIT_TYPE_UNIFORM) { + xas_split(xas, old_folio, old_order); + } else { xas_set_order(xas, folio->index, split_order); - xas_try_split(xas, folio, old_order); + xas_try_split(xas, old_folio, old_order); if (xas_error(xas)) return xas_error(xas); } From 7392f8e4ea632622b2cd2086675ba022db238b3a Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 1 Mar 2026 16:52:29 -0800 Subject: [PATCH 13/15] uaccess: correct kernel-doc parameter format Use the correct kernel-doc function parameter format to avoid kernel-doc warnings: Warning: include/linux/uaccess.h:814 function parameter 'uptr' not described in 'scoped_user_rw_access_size' Warning: include/linux/uaccess.h:826 function parameter 'uptr' not described in 'scoped_user_rw_access' Link: https://lkml.kernel.org/r/20260302005229.3471955-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap Reviewed-by: Andrew Morton Signed-off-by: Andrew Morton --- include/linux/uaccess.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 1f3804245c06..001cfef21b61 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -806,7 +806,7 @@ for (bool done = false; !done; done = true) \ /** * scoped_user_rw_access_size - Start a scoped user read/write access with given size - * @uptr Pointer to the user space address to read from and write to + * @uptr: Pointer to the user space address to read from and write to * @size: Size of the access starting from @uptr * @elbl: Error label to goto when the access region is rejected * @@ -817,7 +817,7 @@ for (bool done = false; !done; done = true) \ /** * scoped_user_rw_access - Start a scoped user read/write access - * @uptr Pointer to the user space address to read from and write to + * @uptr: Pointer to the user space address to read from and write to * @elbl: Error label to goto when the access region is rejected * * The size of the access starting from @uptr is determined via sizeof(*@uptr)). From 599b4e290c8766b19378d85d4310c6ec8f90ade4 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 1 Mar 2026 16:52:22 -0800 Subject: [PATCH 14/15] mm/mmu_notifier: clean up mmu_notifier.h kernel-doc Eliminate kernel-doc warnings in mmu_notifier.h: - add a missing struct short description - use the correct format for function parameters - add missing function return comment sections Warning: include/linux/mmu_notifier.h:236 missing initial short description on line: * struct mmu_interval_notifier_ops Warning: include/linux/mmu_notifier.h:325 function parameter 'interval_sub' not described in 'mmu_interval_set_seq' Warning: include/linux/mmu_notifier.h:325 function parameter 'cur_seq' not described in 'mmu_interval_set_seq' Warning: include/linux/mmu_notifier.h:346 function parameter 'interval_sub' not described in 'mmu_interval_read_retry' Warning: include/linux/mmu_notifier.h:346 function parameter 'seq' not described in 'mmu_interval_read_retry' Warning: include/linux/mmu_notifier.h:346 No description found for return value of 'mmu_interval_read_retry' Warning: include/linux/mmu_notifier.h:370 function parameter 'interval_sub' not described in 'mmu_interval_check_retry' Warning: include/linux/mmu_notifier.h:370 function parameter 'seq' not described in 'mmu_interval_check_retry' Warning: include/linux/mmu_notifier.h:370 No description found for return value of 'mmu_interval_check_retry' Link: https://lkml.kernel.org/r/20260302005222.3470783-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap Reviewed-by: Jason Gunthorpe Cc: David Hildenbrand Cc: "Liam R. Howlett" Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Randy Dunlap Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/mmu_notifier.h | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 07a2bbaf86e9..8450e18a87c2 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -234,7 +234,7 @@ struct mmu_notifier { }; /** - * struct mmu_interval_notifier_ops + * struct mmu_interval_notifier_ops - callback for range notification * @invalidate: Upon return the caller must stop using any SPTEs within this * range. This function can sleep. Return false only if sleeping * was required but mmu_notifier_range_blockable(range) is false. @@ -309,8 +309,8 @@ void mmu_interval_notifier_remove(struct mmu_interval_notifier *interval_sub); /** * mmu_interval_set_seq - Save the invalidation sequence - * @interval_sub - The subscription passed to invalidate - * @cur_seq - The cur_seq passed to the invalidate() callback + * @interval_sub: The subscription passed to invalidate + * @cur_seq: The cur_seq passed to the invalidate() callback * * This must be called unconditionally from the invalidate callback of a * struct mmu_interval_notifier_ops under the same lock that is used to call @@ -329,8 +329,8 @@ mmu_interval_set_seq(struct mmu_interval_notifier *interval_sub, /** * mmu_interval_read_retry - End a read side critical section against a VA range - * interval_sub: The subscription - * seq: The return of the paired mmu_interval_read_begin() + * @interval_sub: The subscription + * @seq: The return of the paired mmu_interval_read_begin() * * This MUST be called under a user provided lock that is also held * unconditionally by op->invalidate() when it calls mmu_interval_set_seq(). @@ -338,7 +338,7 @@ mmu_interval_set_seq(struct mmu_interval_notifier *interval_sub, * Each call should be paired with a single mmu_interval_read_begin() and * should be used to conclude the read side. * - * Returns true if an invalidation collided with this critical section, and + * Returns: true if an invalidation collided with this critical section, and * the caller should retry. */ static inline bool @@ -350,20 +350,21 @@ mmu_interval_read_retry(struct mmu_interval_notifier *interval_sub, /** * mmu_interval_check_retry - Test if a collision has occurred - * interval_sub: The subscription - * seq: The return of the matching mmu_interval_read_begin() + * @interval_sub: The subscription + * @seq: The return of the matching mmu_interval_read_begin() * * This can be used in the critical section between mmu_interval_read_begin() - * and mmu_interval_read_retry(). A return of true indicates an invalidation - * has collided with this critical region and a future - * mmu_interval_read_retry() will return true. - * - * False is not reliable and only suggests a collision may not have - * occurred. It can be called many times and does not have to hold the user - * provided lock. + * and mmu_interval_read_retry(). * * This call can be used as part of loops and other expensive operations to * expedite a retry. + * It can be called many times and does not have to hold the user + * provided lock. + * + * Returns: true indicates an invalidation has collided with this critical + * region and a future mmu_interval_read_retry() will return true. + * False is not reliable and only suggests a collision may not have + * occurred. */ static inline bool mmu_interval_check_retry(struct mmu_interval_notifier *interval_sub, From b12bbe35c7c1e431f2fa01fe9291daa52fb7ab43 Mon Sep 17 00:00:00 2001 From: "Lorenzo Stoakes (Oracle)" Date: Tue, 3 Mar 2026 19:50:25 +0000 Subject: [PATCH 15/15] MAINTAINERS, mailmap: update email address for Lorenzo Stoakes I want to experiment with a new email setup, and using the @kernel.org address is the easiest way to have flexibility on this. Link: https://lkml.kernel.org/r/20260303195025.1170895-1-ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) Signed-off-by: Andrew Morton --- .mailmap | 3 ++- MAINTAINERS | 20 ++++++++++---------- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/.mailmap b/.mailmap index c124a1306d26..fd062abdb133 100644 --- a/.mailmap +++ b/.mailmap @@ -491,7 +491,8 @@ Lior David Loic Poulain Loic Poulain Lorenzo Pieralisi -Lorenzo Stoakes +Lorenzo Stoakes +Lorenzo Stoakes Luca Ceresoli Luca Weiss Lucas De Marchi diff --git a/MAINTAINERS b/MAINTAINERS index e510fbc6f882..a3b4e75ad1ce 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16654,7 +16654,7 @@ F: mm/balloon.c MEMORY MANAGEMENT - CORE M: Andrew Morton M: David Hildenbrand -R: Lorenzo Stoakes +R: Lorenzo Stoakes R: Liam R. Howlett R: Vlastimil Babka R: Mike Rapoport @@ -16784,7 +16784,7 @@ F: mm/workingset.c MEMORY MANAGEMENT - MISC M: Andrew Morton M: David Hildenbrand -R: Lorenzo Stoakes +R: Lorenzo Stoakes R: Liam R. Howlett R: Vlastimil Babka R: Mike Rapoport @@ -16875,7 +16875,7 @@ R: David Hildenbrand R: Michal Hocko R: Qi Zheng R: Shakeel Butt -R: Lorenzo Stoakes +R: Lorenzo Stoakes L: linux-mm@kvack.org S: Maintained F: mm/vmscan.c @@ -16884,7 +16884,7 @@ F: mm/workingset.c MEMORY MANAGEMENT - RMAP (REVERSE MAPPING) M: Andrew Morton M: David Hildenbrand -M: Lorenzo Stoakes +M: Lorenzo Stoakes R: Rik van Riel R: Liam R. Howlett R: Vlastimil Babka @@ -16929,7 +16929,7 @@ F: mm/swapfile.c MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE) M: Andrew Morton M: David Hildenbrand -M: Lorenzo Stoakes +M: Lorenzo Stoakes R: Zi Yan R: Baolin Wang R: Liam R. Howlett @@ -16969,7 +16969,7 @@ F: tools/testing/selftests/mm/uffd-*.[ch] MEMORY MANAGEMENT - RUST M: Alice Ryhl -R: Lorenzo Stoakes +R: Lorenzo Stoakes R: Liam R. Howlett L: linux-mm@kvack.org L: rust-for-linux@vger.kernel.org @@ -16985,7 +16985,7 @@ F: rust/kernel/page.rs MEMORY MAPPING M: Andrew Morton M: Liam R. Howlett -M: Lorenzo Stoakes +M: Lorenzo Stoakes R: Vlastimil Babka R: Jann Horn R: Pedro Falcato @@ -17015,7 +17015,7 @@ MEMORY MAPPING - LOCKING M: Andrew Morton M: Suren Baghdasaryan M: Liam R. Howlett -M: Lorenzo Stoakes +M: Lorenzo Stoakes R: Vlastimil Babka R: Shakeel Butt L: linux-mm@kvack.org @@ -17030,7 +17030,7 @@ F: mm/mmap_lock.c MEMORY MAPPING - MADVISE (MEMORY ADVICE) M: Andrew Morton M: Liam R. Howlett -M: Lorenzo Stoakes +M: Lorenzo Stoakes M: David Hildenbrand R: Vlastimil Babka R: Jann Horn @@ -23183,7 +23183,7 @@ K: \b(?i:rust)\b RUST [ALLOC] M: Danilo Krummrich -R: Lorenzo Stoakes +R: Lorenzo Stoakes R: Vlastimil Babka R: Liam R. Howlett R: Uladzislau Rezki