If disabled, THPs faulted in or collapsed will not be added to
_deferred_list, and therefore won't be considered for splitting under
memory pressure if underused.
Link: https://lkml.kernel.org/r/20240830100438.3623486-7-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuang Zhai <zhais@google.com>
Cc: Shuang Zhai <szhai2@cs.rochester.edu>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is an attempt to mitigate the issue of running out of memory when THP
is always enabled. During runtime whenever a THP is being faulted in
(__do_huge_pmd_anonymous_page) or collapsed by khugepaged
(collapse_huge_page), the THP is added to _deferred_list. Whenever memory
reclaim happens in linux, the kernel runs the deferred_split shrinker
which goes through the _deferred_list.
If the folio was partially mapped, the shrinker attempts to split it. If
the folio is not partially mapped, the shrinker checks if the THP was
underused, i.e. how many of the base 4K pages of the entire THP were
zero-filled. If this number goes above a certain threshold (decided by
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the
shrinker will attempt to split that THP. Then at remap time, the pages
that were zero-filled are mapped to the shared zeropage, hence saving
memory.
Link: https://lkml.kernel.org/r/20240830100438.3623486-6-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Suggested-by: Rik van Riel <riel@surriel.com>
Co-authored-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuang Zhai <zhais@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Shuang Zhai <szhai2@cs.rochester.edu>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We added a public Google calendar for easy sharing of DAMON bi-weekly
meetups[1]. Add it to the official document for a better visibility.
[1] https://lore.kernel.org/all/20240717235812.53087-1-sj@kernel.org/
Link: https://lkml.kernel.org/r/20240826015741.80707-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
maintainer-profile.rst for DAMON separates the links and target
definitions. It is not really necessary, and only makes the readability
worse. At least the definitions need the section title (say,
"References"). Just add the links in place on the doc.
Link: https://lkml.kernel.org/r/20240826015741.80707-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Docs/damon: update GitHub repo URLs and maintainer-profile".
Replace GitHub URLS on DAMON documents for none-kernel parts DAMON repos
with new ones[1] via the first patch. With following two patches,
wordsmith maitnainer-profile for better readability, and document the
Google clendsar for bi-weekly meetups, respectively.
[1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org
This patch (of 3):
GitHub repos for non-kernel parts of DAMON project including 'damo',
'damon-tests' and 'damoos' will be moved[1] from 'awslabs' org to
'damonitor', by 2024-09-05. Update related URLs in kernel tree.
[1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240826015741.80707-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240826015741.80707-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There are no more callers of isolate_lru_page(), remove it.
[wangkefeng.wang@huawei.com: convert page to folio in comment and document, per Matthew]
Link: https://lkml.kernel.org/r/20240826144114.1928071-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20240826065814.1336616-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a THP is added to the deferred_list due to partially mapped, its
partial pages are unused, leading to wasted memory and potentially
increasing memory reclamation pressure.
Detailing the specifics of how unmapping occurs is quite difficult and not
that useful, so we adopt a simple approach: each time a THP enters the
deferred_list, we increment the count by 1; whenever it leaves for any
reason, we decrement the count by 1.
Link: https://lkml.kernel.org/r/20240824010441.21308-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuai Yuan <yuanshuai@oppo.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: count the number of anonymous THPs per size", v4.
Knowing the number of transparent anon THPs in the system is crucial
for performance analysis. It helps in understanding the ratio and
distribution of THPs versus small folios throughout the system.
Additionally, partial unmapping by userspace can lead to significant waste
of THPs over time and increase memory reclamation pressure. We need this
information for comprehensive system tuning.
This patch (of 2):
Let's track for each anonymous THP size, how many of them are currently
allocated. We'll track the complete lifespan of an anon THP, starting
when it becomes an anon THP ("large anon folio") (->mapping gets set),
until it gets freed (->mapping gets cleared).
Introduce a new "nr_anon" counter per THP size and adjust the
corresponding counter in the following cases:
* We allocate a new THP and call folio_add_new_anon_rmap() to map
it the first time and turn it into an anon THP.
* We split an anon THP into multiple smaller ones.
* We migrate an anon THP, when we prepare the destination.
* We free an anon THP back to the buddy.
Note that AnonPages in /proc/meminfo currently tracks the total number of
*mapped* anonymous *pages*, and therefore has slightly different
semantics. In the future, we might also want to track "nr_anon_mapped"
for each THP size, which might be helpful when comparing it to the number
of allocated anon THPs (long-term pinning, stuck in swapcache, memory
leaks, ...).
Further note that for now, we only track anon THPs after they got their
->mapping set, for example via folio_add_new_anon_rmap(). If we would
allocate some in the swapcache, they will only show up in the statistics
for now after they have been mapped to user space the first time, where we
call folio_add_new_anon_rmap().
[akpm@linux-foundation.org: documentation fixups, per David]
Link: https://lkml.kernel.org/r/3e8add35-e26b-443b-8a04-1078f4bc78f6@redhat.com
Link: https://lkml.kernel.org/r/20240824010441.21308-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240824010441.21308-2-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuai Yuan <yuanshuai@oppo.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As discussed in [1], zswap-related settings natually lose their effect
when zswap is disabled, specifically zswap.writeback here. Be explicit
about this behavior.
[1] https://lore.kernel.org/linux-kernel/CAKEwX=Mhbwhh-=xxCU-RjMXS_n=RpV3Gtznb2m_3JgL+jzz++g@mail.gmail.com/
[akpm@linux-foundation.org: fix/simplify text]
Link: https://lkml.kernel.org/r/20240823162506.12117-3-me@yhndnzj.com
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Increase the number of bits available in page_type".
Kent wants more than 16 bits in page_type, so I resurrected this old patch
and expanded it a bit. It's a bit more efficient than our current scheme
(1 4-byte insn vs 3 insns of 13 bytes total) to test a single page type.
This patch (of 4):
An upcoming patch will convert page type from being a bitfield to a
single byte, so we will not be able to use %pG to print the page type
any more. The printing of the symbolic name will be restored in that
patch.
Link: https://lkml.kernel.org/r/20240821173914.2270383-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20240821173914.2270383-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The ability to observe the demotion and promotion decisions made by the
kernel on a per-cgroup basis is important for monitoring and tuning
containerized workloads on machines equipped with tiered memory.
Different containers in the system may experience drastically different
memory tiering actions that cannot be distinguished from the global
counters alone.
For example, a container running a workload that has a much hotter memory
accesses will likely see more promotions and fewer demotions, potentially
depriving a colocated container of top tier memory to such an extent that
its performance degrades unacceptably.
For another example, some containers may exhibit longer periods between
data reuse, causing much more numa_hint_faults than numa_pages_migrated.
In this case, tuning hot_threshold_ms may be appropriate, but the signal
can easily be lost if only global counters are available.
In the long term, we hope to introduce per-cgroup control of promotion and
demotion actions to implement memory placement policies in tiering.
This patch set adds seven counters to memory.stat in a cgroup:
numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd,
pgdemote_khugepaged, pgdemote_direct and pgpromote_success. pgdemote_*
and pgpromote_success are also available in memory.numa_stat.
count_memcg_events_mm() is added to count multiple event occurrences at
once, and get_mem_cgroup_from_folio() is added because we need to get a
reference to the memcg of a folio before it's migrated to track
numa_pages_migrated. The accounting of PGDEMOTE_* is moved to
shrink_inactive_list() before being changed to per-cgroup.
[kaiyang2@cs.cmu.edu: add documentation of the memcg counters in cgroup-v2.rst]
Link: https://lkml.kernel.org/r/20240814235122.252309-1-kaiyang2@cs.cmu.edu
Link: https://lkml.kernel.org/r/20240814174227.30639-1-kaiyang2@cs.cmu.edu
Signed-off-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
NUMA emulation can be now enabled on arm64 and riscv in addition to x86.
Move description of numa=fake parameters from x86 documentation of
admin-guide/kernel-parameters.txt
Link: https://lkml.kernel.org/r/20240807064110.1003856-27-rppt@kernel.org
Suggested-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The pressure_level in memcg v1 provides memory pressure notifications to
the user space. At the moment it provides notifications for three levels
of memory pressure i.e. low, medium and critical, which are defined based
on internal memory reclaim implementation details. More specifically the
ratio of scanned and reclaimed pages during a memory reclaim. However
this is not robust as there are workloads with mostly unreclaimable user
memory or kernel memory.
For v2, the users can use PSI for memory pressure status of the system or
the cgroup. Let's start the deprecation process for pressure_level and
add warnings to gather the info on how the current users are using this
interface and how they can be used to PSI.
Link: https://lkml.kernel.org/r/20240814220021.3208384-5-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The oom_control provides functionality to disable memcg oom-killer,
notifications on oom-kill and reading the stats regarding oom-kills. This
interface was mainly introduced to provide functionality for userspace
oom-killers. However it is not robust enough and only supports OOM
handling in the page fault path.
For v2, the users can use the combination of memory.events notifications,
memory.high and PSI to provide userspace OOM-killing functionality.
Actually LMKD in Android and OOMd in systemd and Meta infrastructure
already use PSI in combination with other stats to implement userspace
OOM-killing.
Let's start the deprecation process for v1 and gather the info on how the
current users are using this interface and work on providing a more robust
functionality in v2.
Link: https://lkml.kernel.org/r/20240814220021.3208384-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Memcg v1 provides soft limit functionality for the best effort memory
sharing between multiple workloads on a system. It is usually triggered
through kswapd and at the moment does not reclaim kernel memory.
Memcg v2 provides more straightforward best effort (memory.low) and hard
protection (memory.min) functionalities. Let's initiate the deprecation
of soft limit from v1 and gather if v2 needs something more to move the
existing v1 users to v2 regarding soft limit.
Link: https://lkml.kernel.org/r/20240814220021.3208384-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "memcg: initiate deprecation of v1 features", v2.
Start the deprecation process of the memcg v1 features which we discussed
during LSFMMBPF 2024 [1]. For now add the warnings to collect the
information on how the current users are using these features. Next we
will work on providing better alternatives in v2 (if needed) and fully
deprecate these features.
Link: https://lwn.net/Articles/974575 [1]
This patch (of 4):
Memcg v1 provides opt-in TCP memory accounting feature. However it is
mostly unused due to its performance impact on the network traffic. In
v2, the TCP memory is accounted in the regular memory usage and is
transparent to the users but they can observe the TCP memory usage through
memcg stats.
Let's initiate the deprecation process of memcg v1's tcp accounting
functionality and add warnings to gather if there are any users and if
there are, collect how they are using it and plan to provide them better
alternative in v2.
Link: https://lkml.kernel.org/r/20240814220021.3208384-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20240814220021.3208384-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add thp_anon= cmdline parameter to allow specifying the default enablement
of each supported anon THP size. The parameter accepts the following
format and can be provided multiple times to configure each size:
thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
An example:
thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
See Documentation/admin-guide/mm/transhuge.rst for more details.
Configuring the defaults at boot time is useful to allow early user space
to take advantage of mTHP before its been configured through sysfs.
[v-songbaohua@oppo.com: use get_oder() and check size is is_power_of_2]
Link: https://lkml.kernel.org/r/20240814224635.43272-1-21cnbao@gmail.com
[ryan.roberts@arm.com: some minor cleanup according to David's comments]
Link: https://lkml.kernel.org/r/20240820105244.62703-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240814020247.67297-1-21cnbao@gmail.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nobody checks the folio error flag any more, so we can stop setting and
clearing it. Also remove the documentation suggesting to not bother
setting the error bit.
Link: https://lkml.kernel.org/r/20240807193528.1865100-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce burst mode, which can be configured with kfence.burst=$count,
where the burst count denotes the additional successive slab allocations
to be allocated through KFENCE for each sample interval.
The idea is that this can give developers an additional knob to make
KFENCE more aggressive when debugging specific issues of systems where
either rebooting or recompiling the kernel with KASAN is not possible.
Experiment: To assess the effectiveness of the new option, we randomly
picked a recent out-of-bounds [1] and use-after-free bug [2], each with a
reproducer provided by syzbot, that initially detected these bugs with
KASAN. We then tried to reproduce the bugs with KFENCE below.
[1] Fixed by: 7c55b78818 ("jfs: xattr: fix buffer overflow for invalid xattr")
https://syzkaller.appspot.com/bug?id=9d1b59d4718239da6f6069d3891863c25f9f24a2
[2] Fixed by: f8ad00f3fb ("l2tp: fix possible UAF when cleaning up tunnels")
https://syzkaller.appspot.com/bug?id=4f34adc84f4a3b080187c390eeef60611fd450e1
The following KFENCE configs were compared. A pool size of 1023 objects
was used for all configurations.
Baseline
kfence.sample_interval=100
kfence.skip_covered_thresh=75
kfence.burst=0
Aggressive
kfence.sample_interval=1
kfence.skip_covered_thresh=10
kfence.burst=0
AggressiveBurst
kfence.sample_interval=1
kfence.skip_covered_thresh=10
kfence.burst=1000
Each reproducer was run 10 times (after a fresh reboot), with the
following detection counts for each KFENCE config:
| Detection Count out of 10 |
| OOB [1] | UAF [2] |
------------------+-------------+-------------+
Default | 0/10 | 0/10 |
Aggressive | 0/10 | 0/10 |
AggressiveBurst | 8/10 | 8/10 |
With the Default and even the Aggressive configs the results are
unsurprising, given KFENCE has not been designed for deterministic bug
detection of small test cases.
However, when enabling burst mode with relatively large burst count,
KFENCE can start to detect heap memory-safety bugs even in simpler test
cases with high probability (in the above cases with ~80% probability).
Link: https://lkml.kernel.org/r/20240805124203.2692278-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All users are gone, let's remove it and any leftovers in comments. We'll
leave any FOLL/follow_page_() naming cleanups as future work.
Link: https://lkml.kernel.org/r/20240802155524.517137-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm, memcg: cg2 memory{.swap,}.peak write handlers", v7.
This patch (of 2):
Other mechanisms for querying the peak memory usage of either a process or
v1 memory cgroup allow for resetting the high watermark. Restore parity
with those mechanisms, but with a less racy API.
For example:
- Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
the high watermark.
- writing "5" to the clear_refs pseudo-file in a processes's proc
directory resets the peak RSS.
This change is an evolution of a previous patch, which mostly copied the
cgroup v1 behavior, however, there were concerns about races/ownership
issues with a global reset, so instead this change makes the reset
filedescriptor-local.
Writing any non-empty string to the memory.peak and memory.swap.peak
pseudo-files reset the high watermark to the current usage for subsequent
reads through that same FD.
Notably, following Johannes's suggestion, this implementation moves the
O(FDs that have written) behavior onto the FD write(2) path. Instead, on
the page-allocation path, we simply add one additional watermark to
conditionally bump per-hierarchy level in the page-counter.
Additionally, this takes Longman's suggestion of nesting the
page-charging-path checks for the two watermarks to reduce the number of
common-case comparisons.
This behavior is particularly useful for work scheduling systems that need
to track memory usage of worker processes/cgroups per-work-item. Since
memory can't be squeezed like CPU can (the OOM-killer has opinions), these
systems need to track the peak memory usage to compute system/container
fullness when binpacking workitems.
Most notably, Vimeo's use-case involves a system that's doing global
binpacking across many Kubernetes pods/containers, and while we can use
PSI for some local decisions about overload, we strive to avoid packing
workloads too tightly in the first place. To facilitate this, we track
the peak memory usage. However, since we run with long-lived workers (to
amortize startup costs) we need a way to track the high watermark while a
work-item is executing. Polling runs the risk of missing short spikes
that last for timescales below the polling interval, and peak memory
tracking at the cgroup level is otherwise perfect for this use-case.
As this data is used to ensure that binpacked work ends up with sufficient
headroom, this use-case mostly avoids the inaccuracies surrounding
reclaimable memory.
Link: https://lkml.kernel.org/r/20240730231304.761942-1-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-1-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-2-davidf@vimeo.com
Signed-off-by: David Finkel <davidf@vimeo.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Here are some small USB fixes for 6.11-rc6. Included in here are:
- dwc3 driver fixes for reported issues
- MAINTAINER file update, marking a driver as unsupported :(
- cdnsp driver fixes
- USB gadget driver fix
- USB sysfs fix
- other tiny fixes
- new device ids for usb serial driver
All of these have been in linux-next this week with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZtNVTw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymqWwCgmITg5Owdigw+ejnkLJ4Q/CHYfRkAoLh7nkcm
kaX7IqZLgVDr4rvgVcmR
=azoE
-----END PGP SIGNATURE-----
Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.11-rc6. Included in here are:
- dwc3 driver fixes for reported issues
- MAINTAINER file update, marking a driver as unsupported :(
- cdnsp driver fixes
- USB gadget driver fix
- USB sysfs fix
- other tiny fixes
- new device ids for usb serial driver
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add MeiG Smart SRM825L
usb: cdnsp: fix for Link TRB with TC
usb: dwc3: st: add missing depopulate in probe error path
usb: dwc3: st: fix probed platform device ref count on probe error path
usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
usb: typec: fsa4480: Relax CHIP_ID check
usb: dwc3: xilinx: add missing depopulate in probe error path
usb: dwc3: omap: add missing depopulate in probe error path
dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
usb: gadget: uvc: queue pump work in uvcg_video_enable()
cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
usb: dwc3: core: Prevent USB core invalid event buffer address access
MAINTAINERS: Mark UVC gadget driver as orphan
No known outstanding regressions.
Current release - regressions:
- wifi: iwlwifi: fix hibernation
- eth: ionic: prevent tx_timeout due to frequent doorbell ringing
Previous releases - regressions:
- sched: fix sch_fq incorrect behavior for small weights
- wifi:
- iwlwifi: take the mutex before running link selection
- wfx: repair open network AP mode
- netfilter: restore IP sanity checks for netdev/egress
- tcp: fix forever orphan socket caused by tcp_abort
- mptcp: close subflow when receiving TCP+FIN
- bluetooth: fix random crash seen while removing btnxpuart driver
Previous releases - always broken:
- mptcp: more fixes for the in-kernel PM
- eth: bonding: change ipsec_lock from spin lock to mutex
- eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
Misc:
- documentation: drop special comment style for net code
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmbQcqISHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkOJkP+QHZx2LCilc0uvrYkqWBz7aYEigISK+6
NdGiF/c9FO/dvmisUbs7i48TXKplHu56bR0YTVm2pdKUNcXO5jUgy+s4n9uncsCF
/Cq8WnaXJ3THqKKNlMnSeTJ1URE47iagI+LdX4g9a5HE5GgrORcHm4mfcn7m68EP
pZ+TaPDw9jp+o+1nkpqgPe8Vdz1dPlqC1S2KQMl0S60WcSlYgDpVUtVU5m2mitJ1
giNHXcU2UXxFFyvqhHXyQIFIkKU6sNbD32cm9VXDMw680KjmBM63Fz5EIiZospkz
efQWHn/xwqZNEOLdN5sPUtKLv02D8sTusfTaGWaNulmNd346ABrkS+fDjHqRxLFb
OBKXOn4AG1OPCs4MgrgtJf7mrcwJErbE/21dLqGPZWkOzqbe+7r8pXn0XtS9m+gR
V0IkSpwrS1D394nP2TVmw0B+LwXMA3ItzAjNa8Lemr7TVEpAmuCdAz2XiR13KZfw
B0DBtWCWr5skgWZdlhofi71OOXC9bWIq75i+aSZR/KXqRj4I38OsargSVNQve/H4
OmOitt1w0ZRhmb+HmW+xgIffikGsi9YEO165UUzUQCQVn00r0EHP8T3Q2d1/2Hzx
FtnaYFsBAI2TV2m2DTuJNWc0Fa3G08tJWkoEqbrWeAuebOk0oKWExsXzVfMOrig1
a9nNA98DmlN3
=yJIE
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, wireless and netfilter.
No known outstanding regressions.
Current release - regressions:
- wifi: iwlwifi: fix hibernation
- eth: ionic: prevent tx_timeout due to frequent doorbell ringing
Previous releases - regressions:
- sched: fix sch_fq incorrect behavior for small weights
- wifi:
- iwlwifi: take the mutex before running link selection
- wfx: repair open network AP mode
- netfilter: restore IP sanity checks for netdev/egress
- tcp: fix forever orphan socket caused by tcp_abort
- mptcp: close subflow when receiving TCP+FIN
- bluetooth: fix random crash seen while removing btnxpuart driver
Previous releases - always broken:
- mptcp: more fixes for the in-kernel PM
- eth: bonding: change ipsec_lock from spin lock to mutex
- eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
Misc:
- documentation: drop special comment style for net code"
* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
nfc: pn533: Add poll mod list filling check
mailmap: update entry for Sriram Yagnaraman
selftests: mptcp: join: check re-re-adding ID 0 signal
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: validate event numbers
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: pm: fix ID 0 endp usage after multiple re-creations
mptcp: pm: do not remove already closed subflows
selftests: mptcp: join: no extra msg if no counter
selftests: mptcp: join: check re-adding init endp with != id
mptcp: pm: reset MPC endp ID when re-added
mptcp: pm: skip connecting to already established sf
mptcp: pm: send ACK on an active subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: fix RM_ADDR ID for the initial subflow
mptcp: pm: reuse ID 0 after delete and re-add
net: busy-poll: use ktime_get_ns() instead of local_clock()
sctp: fix association labeling in the duplicate COOKIE-ECHO case
mptcp: pr_debug: add missing \n at the end
...
Nothing too interesting. One patch to remove spurious warning and others to
address static checker warnings.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZskq8g4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGfTVAP42MsAOyrlND+cH/zQpSc8OhGbm3v0gJFnPn4UE
Y3B4kgD/W68n57MQ5uWh1vHHvsqjizbXfRez1dVJoGqa/q88GQs=
=Uwdx
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.11-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Nothing too interesting. One patch to remove spurious warning and
others to address static checker warnings"
* tag 'wq-for-6.11-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Correct declaration of cpu_pwq in struct workqueue_struct
workqueue: Fix spruious data race in __flush_work()
workqueue: Remove incorrect "WARN_ON_ONCE(!list_empty(&worker->entry));" from dying worker
workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask()
workqueue: doc: Fix function name, remove markers
- a tweak to uinput interface to reject requests with abnormally large
number of slots. 100 slots/contacts should be enough for real devices
- support for FocalTech FT8201 added to the edt-ft5x06 driver
- tweaks to i8042 to handle more devices that have issue with its
emulation
- Synaptics touchpad switched to native SMbus/RMI mode on HP Elitebook
840 G2
- other minor fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCZsjxKQAKCRBAj56VGEWX
nKE9APwPOZl4TldhPLG37LCvlVmgN0cSSWY+PEEqIaxmFtezjQEAr2/qs1XknZVr
LkuhmHxng2ZI9X4HUL+h52Lha7y4UAc=
=3Pcb
-----END PGP SIGNATURE-----
Merge tag 'input-for-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a tweak to uinput interface to reject requests with abnormally large
number of slots. 100 slots/contacts should be enough for real devices
- support for FocalTech FT8201 added to the edt-ft5x06 driver
- tweaks to i8042 to handle more devices that have issue with its
emulation
- Synaptics touchpad switched to native SMbus/RMI mode on HP Elitebook
840 G2
- other minor fixes
* tag 'input-for-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: himax_hx83112b - fix incorrect size when reading product ID
Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination
Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3
Input: i8042 - add Fujitsu Lifebook E756 to i8042 quirk table
Input: uinput - reject requests with unreasonable number of slots
Input: edt-ft5x06 - add support for FocalTech FT8201
dt-bindings: input: touchscreen: edt-ft5x06: Document FT8201 support
Input: adc-joystick - fix optional value handling
Input: synaptics - enable SMBus for HP Elitebook 840 G2
Input: ads7846 - ratelimit the spi_sync error message
As we discussed in the room at netdevconf earlier this week,
drop the requirement for special comment style for netdev.
For checkpatch, the general check accepts both right now, so
simply drop the special request there as well.
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An USB hub is not a HCD, but an USB device. Fix the referenced schema
accordingly.
Fixes: bfbf2e4b77 ("dt-bindings: usb: Document the Microchip USB2514 hub")
Cc: stable@vger.kernel.org
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20240815113132.372542-1-alexander.stein@ew.tq-group.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Allow large folios on compressed inodes;
- Fix invalid memory accesses if z_erofs_gbuf_growsize()
partially fails;
- Two minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmbF1s8RHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qqzghAAjSi/OITIcpjYe3GOxDNEbIxTFf+5d7ge
66KRMBm5FG6xBaXOK4xhit2EdFsDVMAna6FjYyRJYpDDQX5MAUsifDeZeGZCKOV5
KpUk2CfhIp1zi9CeVA5Tl6rfdz0b0SKz77uRK9BN8PWmCdbeenOKgK6HU06S5enC
uPud9QegQELXad0bIe+uo+4zGQnqxwGMfkNEW+cfbp200m6kpig/SgttoXCtz1mm
CEZpv87MlrQLbLtox+KcesFhYofxWdx/My5ckJVbaHGZdM29HKehRLIsBFX3di5N
4OkGnFbh16lhx6s4a/b1pCeDNH1NcmP7uqCoGr+dCVKQ79glSZQvfiV4myQk7vRN
4Orj+zq4aBSmYOvizIFMH++IGDioPBxkhKv3ReeCe4t/L8LHPkGEaeT+iM9b+b4n
uK0faj6qREygf2rHIRr6ciq7GdxlMwdlF7AeNUIyCHNO84J39+GkAID3Nqs+vzuF
wSJAuqp959WgJf2Co5ugb4vjbrpq8TKfZG0yk7KnIOWJkhAKDcIwta+ooloIYtD8
5gHhMRodIgKjYKPOi9vx9cQPlISTMsqVWpRuyzH9vCvqNvkzDriAez1kFq01tZTO
gcK5QGekSgft/5h7PLFrbVV1kVfVdjNapsIY+XAEmBGGCJCScCrlIxjbsRwsseFW
JIpQXilIU3c=
=gFZW
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.11-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"As I mentioned in the merge window pull request, there is a regression
which could cause system hang due to page migration. The corresponding
fix landed upstream through MM tree last week (commit 2e6506e1c4ee:
"mm/migrate: fix deadlock in migrate_pages_batch() on large folios"),
therefore large folios can be safely allowed for compressed inodes and
stress tests have been running on my fleet for over 20 days without
any regression. Users have explicitly requested this for months, so
let's allow large folios for EROFS full cases now for wider testing.
Additionally, there is a fix which addresses invalid memory accesses
on a failure path triggered by fault injection and two minor cleanups
to simplify the codebase.
Summary:
- Allow large folios on compressed inodes
- Fix invalid memory accesses if z_erofs_gbuf_growsize() partially
fails
- Two minor cleanups"
* tag 'erofs-for-6.11-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix out-of-bound access when z_erofs_gbuf_growsize() partially fails
erofs: allow large folios for compressed files
erofs: get rid of check_layout_compatibility()
erofs: simplify readdir operation
As commit 2e6506e1c4 ("mm/migrate: fix deadlock in
migrate_pages_batch() on large folios") has landed upstream, large
folios can be safely enabled for compressed inodes since all
prerequisites have already landed in 6.11-rc1.
Stress tests has been running on my fleet for over 20 days without any
regression. Additionally, users [1] have requested it for months.
Let's allow large folios for EROFS full cases upstream now for wider
testing.
[1] https://lore.kernel.org/r/CAGsJ_4wtE8OcpinuqVwG4jtdx6Qh5f+TON6wz+4HMCq=A2qFcA@mail.gmail.com
Cc: Barry Song <21cnbao@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
[ Gao Xiang: minor commit typo fixes. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240819025207.3808649-1-hsiangkao@linux.alibaba.com
There are a couple of spelling mistakes in the documentation. This patch
fixes them.
Signed-off-by: Victor Timofei <victor@vtimothy.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
- Fix crashes on 85xx with some configs since the recent hugepd rework.
- Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some platforms.
- Don't enable offline cores when changing SMT modes, to match existing
userspace behaviour.
Thanks to: Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal Jan
K.A, Shrikanth Hegde, Thomas Gleixner, Tyrel Datwyler.
-----BEGIN PGP SIGNATURE-----
iQJLBAABCAA1FiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmbBN48XHG1pY2hhZWxA
ZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYDFhA/7ByodEuDtTZRAhQxJbzTlEMMk
OdEURo5MqJZo2P9A3G1KKQKUUy1cQwKLcOaCa7nSh3IXHswXEGZK/Do1lgUj8BAx
BcaTlm6aAgMnxkEXIGMNBCGn54IxA7pQV7TUUdr+3CJU0udtYceej03beWZuQVvN
DxdoHflNojU+h8AUWEm5KW6X/o8C+DI6rMAP5zW8Xvsbz/QmSSn1frAs+Dgnacyh
niAToWbW4ibw0LJ8NBDIxIgqDXZHGUY9/KMSAn1WgpERcbY8FUD3PWw2FzJxjqKw
h/sjDRpFhY7mImZtzTKez2OHMPiq+730OVEmgfoER/smknnIYi/tO4e2r+wA9YS7
IIpyl42sdTPV6ke1DDT5sUlWq4LjPLobB+2WKwgDkSOnTRjF1/9nf4AVdtwh2cuS
Y/Sttz3YjtfeSPG3sWnn5HkMbBksMoSSO+Q9BqB2BQAIHWHPDZWwadGhSw1omV7/
poYoR3KbmomLL39qk49P0thmhhCDhF64j7XN4ESFUK7tFL1BHCZ2vXSI5vIi0CHZ
z65pJxsid/0oz04abINAsrDOyZTIkPBTDawda4UEHfXpUOOM9iFPfQfcFnJYRCPk
xiOYAhRj10l7eQeSXOcaP1TXraW+DCs4N5neCaZ0zI/4vwTcrFMn37bB7DVYLjkB
08vDj12ybMrz51mjCj4=
=sZ+f
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix crashes on 85xx with some configs since the recent hugepd rework.
- Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some
platforms.
- Don't enable offline cores when changing SMT modes, to match existing
userspace behaviour.
Thanks to Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal
Jan K.A, Shrikanth Hegde, Thomas Gleixner, and Tyrel Datwyler.
* tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/topology: Check if a core is online
cpu/SMT: Enable SMT only if a core is online
powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL
powerpc/mm: Fix size of allocated PGDIR
soc: fsl: qbman: remove unused struct 'cgr_comp'
* The text patching global icache flush has been reintroduced.
* A fix for the syscall entry code to correctly initialize a0, which
manifests as a bug in strace.
* XIP kernels now map the entire kernel, which fixes boot under at least
DEBUG_VIRTUAL=y.
* The acpi_early_node_map initializer now initializes all nodes.
* A fix for a OOB access in the Andes vendor extension probing code.
* A new key for scalar misaligned access performance in hwprobe, which
correctly treat the values as an enum (as opposed to a bitmap).
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAma/ahUTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYicI3D/9BfuMfVsx+L6KlSCQxEtAZeB5w6soD
ptwHIeExZhu2H+RRc2Ej76TQtzRL2Zd5xJtLf6kfvtzeG45y5Ic3Lal08Q4YNkLh
ivFIvFPFxAo5QdyxLmEgUbyJf4DERESb/9I68KIJvo0nBMhd5B8kKqGAcCgQXtZP
tnqLuO0cok4lK/BSbJrC5VkkBZbM5seH4hdwNOh3U8dyb9DEyrSNYXNHU7axCorJ
TTcnRlsBzBQPyNyaXDN0w6Yp1yA3eCs20dvTf0AQW3iBEboIIYV1hbk+dMDgJNgB
JyOz3nRcjncw2+S+zeshiHu62SfjyrhCCZQCC+2ssJKbSqIlZh6C+sZfQpPGe8RH
fK1QoVHbcBKO+zNsTK58oVxK+yQhZhbAHE5WvZ7ZsPw7/Aa7Y2kRUbfG7GoUYgej
Hvb3GVTsiIJc0CO4TkjojBoAbyiY0fCnDzVUn5esZylRcCx1OI2mMRIDWUpbVnsM
LORkK/vVTaR2JebCqoTLMpjbjFyxrRfeuv+nDFP/NaW0HO/poX0n5Zk175kgPtpv
AwdkQvuaw5notNPVDd2mpr+fwCibwEYO4kh4jtuAun07sQPA2KXJrUKSt4Y8rt4D
cmBI+YJDx5Y1kUWnCyIzvggMGDQXvBlepjbhZ9IBIWL5JHozil0Mc1FIC7tH4JhS
yc1SYS1+7r2gEg==
=dO3Z
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- reintroduce the text patching global icache flush
- fix syscall entry code to correctly initialize a0, which manifested
as a strace bug
- XIP kernels now map the entire kernel, which fixes boot under at
least DEBUG_VIRTUAL=y
- initialize all nodes in the acpi_early_node_map initializer
- fix OOB access in the Andes vendor extension probing code
- A new key for scalar misaligned access performance in hwprobe, which
correctly treat the values as an enum (as opposed to a bitmap)
* tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix out-of-bounds when accessing Andes per hart vendor extension array
RISC-V: hwprobe: Add SCALAR to misaligned perf defines
RISC-V: hwprobe: Add MISALIGNED_PERF key
RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: change XIP's kernel_map.size to be size of the entire kernel
riscv: entry: always initialize regs->a0 to -ENOSYS
riscv: Re-introduce global icache flush in patch_text_XXX()
mediatek:
- fix cursor crash
amdgpu:
- Fix MES ring buffer overflow
- DCN 3.5 fix
- DCN 3.2.1 fix
- DP MST fix
- Cursor fixes
- JPEG fixes
- Context ops validation
- MES 12 fixes
- VCN 5.0 fix
- HDP fix
panel:
- dt bindings style fix
- orientation quirks
rockchip:
- inno-hdmi: fix infoframe upload
v3d:
- fix OOB access in v3d_csd_job_run()
xe:
- Validate user fence during creation
- Fix use after free when client stats are captured
- SRIOV fixes
- Runtime PM fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAma+0HEACgkQDHTzWXnE
hr4sdQ//W3niZ5jPlzn0YB40euYZQi5t0bID+iF+OzEqW9rfJ31kbLFxF5Krgmdy
M7PXsnKljdYyrL0ZeyQgsFW26401QHZQwgfo/ucG6v+YzsrB02fYXeQkGn+/Pz5z
uGu49fDMmHiJ6u5XgxSiAmClGOlbW8+TeEz6fPk5lZOc46eikTFT5Xb1Tf7I3RKC
ZQC/0GGOhr9taoJ9n7cTynDkxQUpEXGykdqb7CoZZ2kE1yBGujR1n+UXyMDEq2ll
8j1t0hdRXQC7hZYSg+5ohxJJB6hfgDAj46zBr+wxOid0pNrOwBUPc6bEKicJnSzk
g0R7XBtCGzbIdCas9EIkJ8ewD4ucglg1R64JZVDS4dBw7kIHQM4nphfrwM4+1/Ig
MLQaefEukwZT1s8kMoOK/oei8M/FGE1sVfCtmvCAxLQaysGB33vjjV68CltCsUWd
rkehLi11wtFZtCf9crr/EeoSQuYI9YXElQ7JX10DquYhjchXZBW/MwYkYzicd128
bpMesGIr+iLI5RW3HsB2hyQ5b4lCrp7/jeP954LQWEei1VeMpchnfh6a7Qmwmhrl
stk1+t38PRDLfK1GBXEJBLvXEiQMF4c/akOBgr0j6JuFhwf1yKF+7MeeXjLn3kAv
n4uBtjXcPMPfm4v57rt7wUDJeIw4oLESkk+3hATnpimO3JXAQ8k=
=Xs5O
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2024-08-16' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu and xe. The larger amdgpu fix is for a
new IP block introduced in rc1, so should be fine. The xe fixes
contain some missed fixes from the end of the previous round along
with some fixes which required precursor changes, but otherwise
everything seems fine,
mediatek:
- fix cursor crash
amdgpu:
- Fix MES ring buffer overflow
- DCN 3.5 fix
- DCN 3.2.1 fix
- DP MST fix
- Cursor fixes
- JPEG fixes
- Context ops validation
- MES 12 fixes
- VCN 5.0 fix
- HDP fix
panel:
- dt bindings style fix
- orientation quirks
rockchip:
- inno-hdmi: fix infoframe upload
v3d:
- fix OOB access in v3d_csd_job_run()
xe:
- Validate user fence during creation
- Fix use after free when client stats are captured
- SRIOV fixes
- Runtime PM fixes"
* tag 'drm-fixes-2024-08-16' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
drm/xe: Hold a PM ref when GT TLB invalidations are inflight
drm/xe: Drop xe_gt_tlb_invalidation_wait
drm/xe: Add xe_gt_tlb_invalidation_fence_init helper
drm/xe/pf: Fix VF config validation on multi-GT platforms
drm/xe: Build PM into GuC CT layer
drm/xe/vf: Fix register value lookup
drm/xe: Fix use after free when client stats are captured
drm/xe: Take a ref to xe file when user creates a VM
drm/xe: Add ref counting for xe_file
drm/xe: Move part of xe_file cleanup to a helper
drm/xe: Validate user fence during creation
drm/rockchip: inno-hdmi: Fix infoframe upload
drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
drm/amdgpu: Update kmd_fw_shared for VCN5
drm/amd/amdgpu: command submission parser for JPEG
drm/amdgpu/mes12: fix suspend issue
drm/amdgpu/mes12: sw/hw fini for unified mes
drm/amdgpu/mes12: configure two pipes hardware resources
drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
drm/amdgpu/mes12: add mes pipe switch support
...
The command provided to use ccache with clang is not a literal code
block. Once built, the documentation displays the '' symbols as a "
character, which is wrong, and the command can not be applied as
provided.
Turn the command into a literal code block.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
panel:
- dt-bindings style fixes
panel-orientation:
- add quirk for Any Loki Max
- add quirk for Any Loki Zero
rockchip:
- inno-hdmi: fix infoframe upload
v3d:
- fix OOB access in v3d_csd_job_run()
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAma9/7AACgkQaA3BHVML
eiN4sggAnc4phUhIiLEgqtinh0E8HnqlOujhvlrONdJPJWGoZz+OKM+l/Uy4mMme
/RF9D3CgEumlCN//3wZ+9J/2gutvUytcUzwr3uT+x4kkl9hjHEUiHyKnKXWDJqWK
Jt8Mtn99KojZNRkVbMYp6CVylvhXu4i2jGJhuq1F8sKA0Le6fCF09fDYwS/kUWxd
V2urqvSEWgu1rngcvIplPfhMyJH16VneYLa3CX7nNDy9q0mT0SAhFw8/47GozqeQ
1xPfsaqrxWnLUnV+/g6katbRgbdeXnzeYd47gbtruaLkWnhpfNrlofYpiRTq1Buy
SQ7iTo3lDulhm+pyIauLBhssytEKig==
=kUH6
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2024-08-15' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
panel:
- dt-bindings style fixes
panel-orientation:
- add quirk for Any Loki Max
- add quirk for Any Loki Zero
rockchip:
- inno-hdmi: fix infoframe upload
v3d:
- fix OOB access in v3d_csd_job_run()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240815131751.GA151031@linux.fritz.box
Evan Green <evan@rivosinc.com> says:
The CPUPERF0 hwprobe key was documented and identified in code as
a bitmask value, but its contents were an enum. This produced
incorrect behavior in conjunction with the WHICH_CPUS hwprobe flag.
The first patch in this series fixes the bitmask/enum problem by
creating a new hwprobe key that returns the same data, but is
properly described as a value instead of a bitmask. The second patch
renames the value definitions in preparation for adding vector misaligned
access info. As of this version, the old defines are kept in place to
maintain source compatibility with older userspace programs.
* b4-shazam-merge:
RISC-V: hwprobe: Add SCALAR to misaligned perf defines
RISC-V: hwprobe: Add MISALIGNED_PERF key
Link: https://lore.kernel.org/r/20240809214444.3257596-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Current release - regressions:
- udp: fall back to software USO if IPv6 extension headers are present
- wifi: iwlwifi: correctly lookup DMA address in SG table
Current release - new code bugs:
- eth: mlx5e: fix queue stats access to non-existing channels splat
Previous releases - regressions:
- eth: mlx5e: take state lock during tx timeout reporter
- eth: mlxbf_gige: disable RX filters until RX path initialized
- eth: igc: fix reset adapter logics when tx mode change
Previous releases - always broken:
- tcp: update window clamping condition
- netfilter:
- nf_queue: drop packets with cloned unconfirmed conntracks
- nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
- vsock: fix recursive ->recvmsg calls
- dsa: vsc73xx: fix MDIO bus access and PHY opera
- eth: gtp: pull network headers in gtp_dev_xmit()
- eth: igc: fix packet still tx after gate close by reducing i226 MAC retry buffer
- eth: mana: fix RX buf alloc_size alignment and atomic op panic
- eth: hns3: fix a deadlock problem when config TC during resetting
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAma+CzgSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk46wP/0hNLi1qNd+zdv6E5nqYJ6ckgKJbaIwR
mM3VGmZtLQXTNApzihFsmfEqT1EiIQuiVW4rqu0eJ28oMezDWyrEKHORx+BL4Omj
6qnygnxQw1fDrhvTfZKXyOJw6mpJL3AMbygtw9DG1se4S5kbmo8cdTI9i9Q4Qcon
ms6CExsHLR1Mtf2XIs8K45XQC07CMy76dvd30VxOKus/bHHt+KBnNJ9B12kBYpbD
4Bko63KeJwhZ4n5soIC8MeqXcU1GyF+AgzQhGvuks8EvUVa4XfW7unxLZwuUsf0J
ZPEKCTBinb1adZnHUx7CYRVHhzi+ptQfFW3bACAkK5cWSy8u0KLOb9Aoe68+HDev
Qor2Hg3SckoFfXBEoZE0GbU+SosXMXIrs6qXOaMNo1gz062N7ZT8DoT6fNBamB31
N8QsiNTOyYDZ6icoTir1PCEvuDyx+QVIdTYAKx8wc3Q5FbpHBDTeStNFZgskTW+/
vEcOy23nXT0WImWP6wnK0REYur9UPb/pHwuBeglgBg/0zwuqioHpIjFUnphvQzBt
kabkX/G4Un44w9E97/ERB7vmR1iKHPTtuU9xIsoO7dMDWxKi8v2TV6f/IBugAEFD
Bx3frQFNayrhEnjm/dNnnwLpI0TZbw1YekVWBCk6pB1m7U+bpJHZfyipYloe8/yB
TfoX+7zCQJtA
=o4nr
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter
Current release - regressions:
- udp: fall back to software USO if IPv6 extension headers are
present
- wifi: iwlwifi: correctly lookup DMA address in SG table
Current release - new code bugs:
- eth: mlx5e: fix queue stats access to non-existing channels splat
Previous releases - regressions:
- eth: mlx5e: take state lock during tx timeout reporter
- eth: mlxbf_gige: disable RX filters until RX path initialized
- eth: igc: fix reset adapter logics when tx mode change
Previous releases - always broken:
- tcp: update window clamping condition
- netfilter:
- nf_queue: drop packets with cloned unconfirmed conntracks
- nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
- vsock: fix recursive ->recvmsg calls
- dsa: vsc73xx: fix MDIO bus access and PHY opera
- eth: gtp: pull network headers in gtp_dev_xmit()
- eth: igc: fix packet still tx after gate close by reducing i226 MAC
retry buffer
- eth: mana: fix RX buf alloc_size alignment and atomic op panic
- eth: hns3: fix a deadlock problem when config TC during resetting"
* tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: hns3: use correct release function during uninitialization
net: hns3: void array out of bound when loop tnl_num
net: hns3: fix a deadlock problem when config TC during resetting
net: hns3: use the user's cfg after reset
net: hns3: fix wrong use of semaphore up
selftests: net: lib: kill PIDs before del netns
pse-core: Conditionally set current limit during PI regulator registration
net: thunder_bgx: Fix netdev structure allocation
net: ethtool: Allow write mechanism of LPL and both LPL and EPL
vsock: fix recursive ->recvmsg calls
selftest: af_unix: Fix kselftest compilation warnings
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
netfilter: nf_tables: Introduce nf_tables_getobj_single
netfilter: nf_tables: Audit log dump reset after the fact
selftests: netfilter: add test for br_netfilter+conntrack+queue combination
netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
netfilter: flowtable: initialise extack before use
netfilter: nfnetlink: Initialise extack before use in ACKs
netfilter: allow ipv6 fragments to arrive on different devices
tcp: Update window clamping condition
...
* Fix failure to start guests with kvm.use_gisa=0
* Panic if (un)share fails to maintain security.
ARM:
* Use kvfree() for the kvmalloc'd nested MMUs array
* Set of fixes to address warnings in W=1 builds
* Make KVM depend on assembler support for ARMv8.4
* Fix for vgic-debug interface for VMs without LPIs
* Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
* Minor code / comment cleanups for configuring PAuth traps
* Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
* Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
* Fix smatch issues
* Small cleanups
* Make x2APIC ID 100% readonly
* Fix typo in uapi constant
Generic:
* Use synchronize_srcu_expedited() on irqfd shutdown
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAma85nQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPq5gf8DVZjs2yNwdPLAUN7AOpElsobFnVN
etJ1V09XsfSYMIAV6ksvC1VBzHpyOJN2QKuF0nfIiISsV8W9xLcV2rKIodsGBvdV
K5ODL/yxeYI27t6Uferra1AGlchtn3tlpZzVarZIgRvZa3NMXaQPYJdcQr2Oybou
7hZsboMTl6jaSl6NELzcBRksfkcOqQLQoUqVBqlkBTM3yyFRmV85BisRkOWCIBfA
9c+kn7ZWBfOqYaXjiLeCrdCKrUuFLfR3ejJibYFan5MULYHIL95W/WJo9uQJ1QBr
BNjMfmtVZ2JOWya40uUSKrvxJ0IErAlMNgmnpjeA4cBqYK5GRHUKvO5jNw==
=A8SH
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"s390:
- Fix failure to start guests with kvm.use_gisa=0
- Panic if (un)share fails to maintain security.
ARM:
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
- Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
- Fix smatch issues
- Small cleanups
- Make x2APIC ID 100% readonly
- Fix typo in uapi constant
Generic:
- Use synchronize_srcu_expedited() on irqfd shutdown"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
KVM: selftests: Add a testcase to verify x2APIC is fully readonly
KVM: x86: Make x2APIC ID 100% readonly
KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
KVM: SVM: Fix an error code in sev_gmem_post_populate()
KVM: SVM: Fix uninitialized variable bug
KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
KVM: arm64: Tidying up PAuth code in KVM
KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
s390/uv: Panic for set and remove shared access UVC errors
KVM: s390: fix validity interception issue when gisa is switched off
docs: KVM: Fix register ID of SPSR_FIQ
KVM: arm64: vgic: fix unexpected unlock sparse warnings
KVM: arm64: fix kdoc warnings in W=1 builds
KVM: arm64: fix override-init warnings in W=1 builds
...
In preparation for misaligned vector performance hwprobe keys, rename
the hwprobe key values associated with misaligned scalar accesses to
include the term SCALAR. Leave the old defines in place to maintain
source compatibility.
This change is intended to be a functional no-op.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240809214444.3257596-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISCV_HWPROBE_KEY_CPUPERF_0 was mistakenly flagged as a bitmask in
hwprobe_key_is_bitmask(), when in reality it was an enum value. This
causes problems when used in conjunction with RISCV_HWPROBE_WHICH_CPUS,
since SLOW, FAST, and EMULATED have values whose bits overlap with
each other. If the caller asked for the set of CPUs that was SLOW or
EMULATED, the returned set would also include CPUs that were FAST.
Introduce a new hwprobe key, RISCV_HWPROBE_KEY_MISALIGNED_PERF, which
returns the same values in response to a direct query (with no flags),
but is properly handled as an enumerated value. As a result, SLOW,
FAST, and EMULATED are all correctly treated as distinct values under
the new key when queried with the WHICH_CPUS flag.
Leave the old key in place to avoid disturbing applications which may
have already come to rely on the key, with or without its broken
behavior with respect to the WHICH_CPUS flag.
Fixes: e178bf146e ("RISC-V: hwprobe: Introduce which-cpus flag")
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240809214444.3257596-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZrym4AAKCRCRxhvAZXjc
oqT3AP9ydoUNavaZcRayH8r3ybvz9+aJGJ6Q7NznFVCk71vn0gD/buLzmq96Muns
M5DWHbft2AFwK0Rz2nx8j5OXUeHwrQg=
=HZBL
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"VFS:
- Fix the name of file lease slab cache. When file leases were split
out of file locks the name of the file lock slab cache was used for
the file leases slab cache as well.
- Fix a type in take_fd() helper.
- Fix infinite directory iteration for stable offsets in tmpfs.
- When the icache is pruned all reclaimable inodes are marked with
I_FREEING and other processes that try to lookup such inodes will
block.
But some filesystems like ext4 can trigger lookups in their inode
evict callback causing deadlocks. Ext4 does such lookups if the
ea_inode feature is used whereby a separate inode may be used to
store xattrs.
Introduce I_LRU_ISOLATING which pins the inode while its pages are
reclaimed. This avoids inode deletion during inode_lru_isolate()
avoiding the deadlock and evict is made to wait until
I_LRU_ISOLATING is done.
netfs:
- Fault in smaller chunks for non-large folio mappings for
filesystems that haven't been converted to large folios yet.
- Fix the CONFIG_NETFS_DEBUG config option. The config option was
renamed a short while ago and that introduced two minor issues.
First, it depended on CONFIG_NETFS whereas it wants to depend on
CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
does. Second, the documentation for the config option wasn't fixed
up.
- Revert the removal of the PG_private_2 writeback flag as ceph is
using it and fix how that flag is handled in netfs.
- Fix DIO reads on 9p. A program watching a file on a 9p mount
wouldn't see any changes in the size of the file being exported by
the server if the file was changed directly in the source
filesystem. Fix this by attempting to read the full size specified
when a DIO read is requested.
- Fix a NULL pointer dereference bug due to a data race where a
cachefiles cookies was retired even though it was still in use.
Check the cookie's n_accesses counter before discarding it.
nsfs:
- Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
the kernel is writing to userspace.
pidfs:
- Prevent the creation of pidfds for kthreads until we have a
use-case for it and we know the semantics we want. It also confuses
userspace why they can get pidfds for kthreads.
squashfs:
- Fix an unitialized value bug reported by KMSAN caused by a
corrupted symbolic link size read from disk. Check that the
symbolic link size is not larger than expected"
* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Squashfs: sanity check symbolic link size
9p: Fix DIO read through netfs
vfs: Don't evict inode under the inode lru traversing context
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
file: fix typo in take_fd() comment
pidfd: prevent creation of pidfds for kthreads
netfs: clean up after renaming FSCACHE_DEBUG config
libfs: fix infinite directory reads for offset dir
nsfs: fix ioctl declaration
fs/netfs/fscache_cookie: add missing "n_accesses" check
filelock: fix name of file_lease slab cache
netfs: Fault in smaller chunks for non-large folio mappings
While building kernel documention using make htmldocs command, I was
getting unexpected indentation error. Single description was given for
two module parameters with wrong indentation. So, I corrected the
indentation of both parameters and the description.
Signed-off-by: Shibu kumar <shibukumar.bit@gmail.com>
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 0d815e3400 ("dm-crypt: limit the size of encryption requests")
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZrVPUBccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFoCrAP9ZGQ1M7GdCe4Orm6Ex4R4OMVcz
MWMrFCVM73rnSoCbMwEA7le7M8c+X5i/4oqFOPm/fEr1i5RZT512RL5lc7MxBQ8=
=DG57
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.11, round #1
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
If a core is offline then enabling SMT should not online CPUs of
this core. By enabling SMT, what is intended is either changing the SMT
value from "off" to "on" or setting the SMT level (threads per core) from a
lower to higher value.
On PowerPC the ppc64_cpu utility can be used, among other things, to
perform the following functions:
ppc64_cpu --cores-on # Get the number of online cores
ppc64_cpu --cores-on=X # Put exactly X cores online
ppc64_cpu --offline-cores=X[,Y,...] # Put specified cores offline
ppc64_cpu --smt={on|off|value} # Enable, disable or change SMT level
If the user has decided to offline certain cores, enabling SMT should
not online CPUs in those cores. This patch fixes the issue and changes
the behaviour as described, by introducing an arch specific function
topology_is_core_online(). It is currently implemented only for PowerPC.
Fixes: 73c58e7e14 ("powerpc: Add HOTPLUG_SMT support")
Reported-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Closes: https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ
Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240731030126.956210-2-nysal@linux.ibm.com
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to
CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First,
NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas
the actual intended config is called NETFS_SUPPORT. Second, the config
renaming misses to adjust the documentation of the functionality of this
config.
Clean up those two points.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>