arpi-5.15
1088 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
243f54dd3a |
ANDROID: vendor hook for TLB batching control
Add vendor hook for flushing TLB batching in zap_pte_range. Merged CL 232bdcbd660b1129b7d8d0de25a563b476eeb522: ANDROID: pass argument in zap_pte_range vendor hooks Bug: 238728493 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: If2de5f070dd7b76624961f5a91440bf69a99ca2d Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit d257ef6764f228145d0fca24998162809bb5b9f7) |
||
|
|
c20204c67d |
ANDROID: cma: allow to use CMA in swap-in path
Now, we allow to use CMA pages for certain user space allocations. One of them is anonymous page fault case. To align the use case, we should also allow to use CMA pages in swap-in cases. This could help mitigate OOM on swap-in cases showing plenty of free CMA left. logd.klogd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 CPU: 0 PID: 433 Comm: logd.klogd Tainted: G W OE 5.10.100-android13-0 #1 Call trace: dump_backtrace.cfi_jt+0x0/0x8 show_stack+0x1c/0x2c dump_stack_lvl+0xc0/0x13c dump_header+0x54/0x238 oom_kill_process+0xb0/0x158 out_of_memory+0x17c/0x328 __alloc_pages_slowpath+0x5c4/0x8d0 __alloc_pages_nodemask+0x1bc/0x2e0 __read_swap_cache_async+0xdc/0x370 swap_vma_readahead+0x3b4/0x488 swapin_readahead+0x3c/0x54 do_swap_page+0x1e0/0xaa0 handle_pte_fault+0x128/0x1e0 handle_mm_fault+0x308/0x590 do_page_fault+0x33c/0x478 do_translation_fault+0x58/0x11c do_mem_abort+0x68/0x144 el0_da+0x24/0x34 el0_sync_handler+0xc4/0xec el0_sync+0x1c0/0x200 Mem-Info: active_anon:0 inactive_anon:3222 isolated_anon:62 active_file:232 inactive_file:428 isolated_file:0 unevictable:37232 dirty:3 writeback:40 slab_reclaimable:19943 slab_unreclaimable:281193 mapped:37126 shmem:2815 pagetables:8981 bounce:0 free:126007 free_pcp:223 free_cma:123062 Node 0 active_anon:16kB inactive_anon:13160kB active_file:292kB inactive_file:2000kB unevictable:148928kB isolated(anon):0kB isolated(file):0kB mapped:148308kB dirty:12kB writeback:164kB shmem:11260kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:20528kB shadow_call_stack:5200kB all_unreclaimable? no DMA32 free:14128kB min:7572kB low:22636kB high:37700kB reserved_highatomic:4096KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1913856kB managed:1553276kB mlocked:0kB pagetables:1292kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:2520kB lowmem_reserve[]: 0 0 0 Normal free:489888kB min:19808kB low:59220kB high:98632kB reserved_highatomic:36864KB active_anon:20kB inactive_anon:12168kB active_file:0kB inactive_file:1640kB unevictable:148928kB writepending:180kB present:4194304kB managed:4063392kB mlocked:148928kB pagetables:34632kB bounce:0kB free_pcp:1928kB local_pcp:0kB free_cma:489752kB lowmem_reserve[]: 0 0 0 DMA32: 166*4kB (UME) 163*8kB (UMECH) 592*16kB (UMCH) 5*32kB (UC) 2*64kB (C) 2*128kB (C) 2*256kB (C) 1*512kB (C) 1*1024kB (C) 0*2048kB 0*4096kB = 14032kB Normal: 969*4kB (C) 77*8kB (C) 40*16kB (C) 17*32kB (C) 5*64kB (C) 1*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 118*4096kB (C) = 490476kB 40220 total pagecache pages 30 pages in swap cache Swap cache stats: add 2634625, delete 2635304, find 160621/2963954 Free swap = 1473788kB Total swap = 2097148kB Bug: 229822798 Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: Ia0bb6f72e52f77f26062e1769bfd92e831f07cab Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit 3e591c63b13772a1de0ffed995b482166d27ed71) |
||
|
|
d5b2fa1c7b |
BACKPORT: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in page->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These techniques are commonly used to optimize job scheduling (bin packing) in data centers [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com Change-Id: I7b24d1e9d263e4eb2c2ee23f2eb143824fcb5201 Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4) [ Resolve conflicts in mm/memory.c, mm/memcontrol.c, mm/Kconfig, include/linux/mm_inline.h] Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
d232fd437a |
BACKPORT: mm: x86, arm64: add arch_has_hw_pte_young()
Patch series "Multi-Gen LRU Framework", v14. What's new ========== 1. OpenWrt, in addition to Android, Arch Linux Zen, Armbian, ChromeOS, Liquorix, post-factum and XanMod, is now shipping MGLRU on 5.15. 2. Fixed long-tailed direct reclaim latency seen on high-memory (TBs) machines. The old direct reclaim backoff, which tries to enforce a minimum fairness among all eligible memcgs, over-swapped by about (total_mem>>DEF_PRIORITY)-nr_to_reclaim. The new backoff, which pulls the plug on swapping once the target is met, trades some fairness for curtailed latency: https://lore.kernel.org/r/20220918080010.2920238-10-yuzhao@google.com/ 3. Fixed minior build warnings and conflicts. More comments and nits. TLDR ==== The current page reclaim is too expensive in terms of CPU usage and it often makes poor choices about what to evict. This patchset offers an alternative solution that is performant, versatile and straightforward. Patchset overview ================= The design and implementation overview is in patch 14: https://lore.kernel.org/r/20220918080010.2920238-15-yuzhao@google.com/ 01. mm: x86, arm64: add arch_has_hw_pte_young() 02. mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Take advantage of hardware features when trying to clear the accessed bit in many PTEs. 03. mm/vmscan.c: refactor shrink_node() 04. Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" Minor refactors to improve readability for the following patches. 05. mm: multi-gen LRU: groundwork Adds the basic data structure and the functions that insert pages to and remove pages from the multi-gen LRU (MGLRU) lists. 06. mm: multi-gen LRU: minimal implementation A minimal implementation without optimizations. 07. mm: multi-gen LRU: exploit locality in rmap Exploits spatial locality to improve efficiency when using the rmap. 08. mm: multi-gen LRU: support page table walks Further exploits spatial locality by optionally scanning page tables. 09. mm: multi-gen LRU: optimize multiple memcgs Optimizes the overall performance for multiple memcgs running mixed types of workloads. 10. mm: multi-gen LRU: kill switch Adds a kill switch to enable or disable MGLRU at runtime. 11. mm: multi-gen LRU: thrashing prevention 12. mm: multi-gen LRU: debugfs interface Provide userspace with features like thrashing prevention, working set estimation and proactive reclaim. 13. mm: multi-gen LRU: admin guide 14. mm: multi-gen LRU: design doc Add an admin guide and a design doc. Benchmark results ================= Independent lab results ----------------------- Based on the popularity of searches [01] and the memory usage in Google's public cloud, the most popular open-source memory-hungry applications, in alphabetical order, are: Apache Cassandra Memcached Apache Hadoop MongoDB Apache Spark PostgreSQL MariaDB (MySQL) Redis An independent lab evaluated MGLRU with the most widely used benchmark suites for the above applications. They posted 960 data points along with kernel metrics and perf profiles collected over more than 500 hours of total benchmark time. Their final reports show that, with 95% confidence intervals (CIs), the above applications all performed significantly better for at least part of their benchmark matrices. On 5.14: 1. Apache Spark [02] took 95% CIs [9.28, 11.19]% and [12.20, 14.93]% less wall time to sort three billion random integers, respectively, under the medium- and the high-concurrency conditions, when overcommitting memory. There were no statistically significant changes in wall time for the rest of the benchmark matrix. 2. MariaDB [03] achieved 95% CIs [5.24, 10.71]% and [20.22, 25.97]% more transactions per minute (TPM), respectively, under the medium- and the high-concurrency conditions, when overcommitting memory. There were no statistically significant changes in TPM for the rest of the benchmark matrix. 3. Memcached [04] achieved 95% CIs [23.54, 32.25]%, [20.76, 41.61]% and [21.59, 30.02]% more operations per second (OPS), respectively, for sequential access, random access and Gaussian (distribution) access, when THP=always; 95% CIs [13.85, 15.97]% and [23.94, 29.92]% more OPS, respectively, for random access and Gaussian access, when THP=never. There were no statistically significant changes in OPS for the rest of the benchmark matrix. 4. MongoDB [05] achieved 95% CIs [2.23, 3.44]%, [6.97, 9.73]% and [2.16, 3.55]% more operations per second (OPS), respectively, for exponential (distribution) access, random access and Zipfian (distribution) access, when underutilizing memory; 95% CIs [8.83, 10.03]%, [21.12, 23.14]% and [5.53, 6.46]% more OPS, respectively, for exponential access, random access and Zipfian access, when overcommitting memory. On 5.15: 5. Apache Cassandra [06] achieved 95% CIs [1.06, 4.10]%, [1.94, 5.43]% and [4.11, 7.50]% more operations per second (OPS), respectively, for exponential (distribution) access, random access and Zipfian (distribution) access, when swap was off; 95% CIs [0.50, 2.60]%, [6.51, 8.77]% and [3.29, 6.75]% more OPS, respectively, for exponential access, random access and Zipfian access, when swap was on. 6. Apache Hadoop [07] took 95% CIs [5.31, 9.69]% and [2.02, 7.86]% less average wall time to finish twelve parallel TeraSort jobs, respectively, under the medium- and the high-concurrency conditions, when swap was on. There were no statistically significant changes in average wall time for the rest of the benchmark matrix. 7. PostgreSQL [08] achieved 95% CI [1.75, 6.42]% more transactions per minute (TPM) under the high-concurrency condition, when swap was off; 95% CIs [12.82, 18.69]% and [22.70, 46.86]% more TPM, respectively, under the medium- and the high-concurrency conditions, when swap was on. There were no statistically significant changes in TPM for the rest of the benchmark matrix. 8. Redis [09] achieved 95% CIs [0.58, 5.94]%, [6.55, 14.58]% and [11.47, 19.36]% more total operations per second (OPS), respectively, for sequential access, random access and Gaussian (distribution) access, when THP=always; 95% CIs [1.27, 3.54]%, [10.11, 14.81]% and [8.75, 13.64]% more total OPS, respectively, for sequential access, random access and Gaussian access, when THP=never. Our lab results --------------- To supplement the above results, we ran the following benchmark suites on 5.16-rc7 and found no regressions [10]. fs_fio_bench_hdd_mq pft fs_lmbench pgsql-hammerdb fs_parallelio redis fs_postmark stream hackbench sysbenchthread kernbench tpcc_spark memcached unixbench multichase vm-scalability mutilate will-it-scale nginx [01] https://trends.google.com [02] https://lore.kernel.org/r/20211102002002.92051-1-bot@edi.works/ [03] https://lore.kernel.org/r/20211009054315.47073-1-bot@edi.works/ [04] https://lore.kernel.org/r/20211021194103.65648-1-bot@edi.works/ [05] https://lore.kernel.org/r/20211109021346.50266-1-bot@edi.works/ [06] https://lore.kernel.org/r/20211202062806.80365-1-bot@edi.works/ [07] https://lore.kernel.org/r/20211209072416.33606-1-bot@edi.works/ [08] https://lore.kernel.org/r/20211218071041.24077-1-bot@edi.works/ [09] https://lore.kernel.org/r/20211122053248.57311-1-bot@edi.works/ [10] https://lore.kernel.org/r/20220104202247.2903702-1-yuzhao@google.com/ Read-world applications ======================= Third-party testimonials ------------------------ Konstantin reported [11]: I have Archlinux with 8G RAM + zswap + swap. While developing, I have lots of apps opened such as multiple LSP-servers for different langs, chats, two browsers, etc... Usually, my system gets quickly to a point of SWAP-storms, where I have to kill LSP-servers, restart browsers to free memory, etc, otherwise the system lags heavily and is barely usable. 1.5 day ago I migrated from 5.11.15 kernel to 5.12 + the LRU patchset, and I started up by opening lots of apps to create memory pressure, and worked for a day like this. Till now I had not a single SWAP-storm, and mind you I got 3.4G in SWAP. I was never getting to the point of 3G in SWAP before without a single SWAP-storm. Vaibhav from IBM reported [12]: In a synthetic MongoDB Benchmark, seeing an average of ~19% throughput improvement on POWER10(Radix MMU + 64K Page Size) with MGLRU patches on top of 5.16 kernel for MongoDB + YCSB across three different request distributions, namely, Exponential, Uniform and Zipfan. Shuang from U of Rochester reported [13]: With the MGLRU, fio achieved 95% CIs [38.95, 40.26]%, [4.12, 6.64]% and [9.26, 10.36]% higher throughput, respectively, for random access, Zipfian (distribution) access and Gaussian (distribution) access, when the average number of jobs per CPU is 1; 95% CIs [42.32, 49.15]%, [9.44, 9.89]% and [20.99, 22.86]% higher throughput, respectively, for random access, Zipfian access and Gaussian access, when the average number of jobs per CPU is 2. Daniel from Michigan Tech reported [14]: With Memcached allocating ~100GB of byte-addressable Optante, performance improvement in terms of throughput (measured as queries per second) was about 10% for a series of workloads. Large-scale deployments ----------------------- We've rolled out MGLRU to tens of millions of ChromeOS users and about a million Android users. Google's fleetwide profiling [15] shows an overall 40% decrease in kswapd CPU usage, in addition to improvements in other UX metrics, e.g., an 85% decrease in the number of low-memory kills at the 75th percentile and an 18% decrease in app launch time at the 50th percentile. The downstream kernels that have been using MGLRU include: 1. Android [16] 2. Arch Linux Zen [17] 3. Armbian [18] 4. ChromeOS [19] 5. Liquorix [20] 6. OpenWrt [21] 7. post-factum [22] 8. XanMod [23] [11] https://lore.kernel.org/r/140226722f2032c86301fbd326d91baefe3d7d23.camel@yandex.ru/ [12] https://lore.kernel.org/r/87czj3mux0.fsf@vajain21.in.ibm.com/ [13] https://lore.kernel.org/r/20220105024423.26409-1-szhai2@cs.rochester.edu/ [14] https://lore.kernel.org/r/CA+4-3vksGvKd18FgRinxhqHetBS1hQekJE2gwco8Ja-bJWKtFw@mail.gmail.com/ [15] https://dl.acm.org/doi/10.1145/2749469.2750392 [16] https://android.com [17] https://archlinux.org [18] https://armbian.com [19] https://chromium.org [20] https://liquorix.net [21] https://openwrt.org [22] https://codeberg.org/pf-kernel [23] https://xanmod.org Summary ======= The facts are: 1. The independent lab results and the real-world applications indicate substantial improvements; there are no known regressions. 2. Thrashing prevention, working set estimation and proactive reclaim work out of the box; there are no equivalent solutions. 3. There is a lot of new code; no smaller changes have been demonstrated similar effects. Our options, accordingly, are: 1. Given the amount of evidence, the reported improvements will likely materialize for a wide range of workloads. 2. Gauging the interest from the past discussions, the new features will likely be put to use for both personal computers and data centers. 3. Based on Google's track record, the new code will likely be well maintained in the long term. It'd be more difficult if not impossible to achieve similar effects with other approaches. This patch (of 14): Some architectures automatically set the accessed bit in PTEs, e.g., x86 and arm64 v8.2. On architectures that do not have this capability, clearing the accessed bit in a PTE usually triggers a page fault following the TLB miss of this PTE (to emulate the accessed bit). Being aware of this capability can help make better decisions, e.g., whether to spread the work out over a period of time to reduce bursty page faults when trying to clear the accessed bit in many PTEs. Note that theoretically this capability can be unreliable, e.g., hotplugged CPUs might be different from builtin ones. Therefore it should not be used in architecture-independent code that involves correctness, e.g., to determine whether TLB flushes are required (in combination with the accessed bit). Link: https://lkml.kernel.org/r/20220918080010.2920238-1-yuzhao@google.com Link: https://lkml.kernel.org/r/20220918080010.2920238-2-yuzhao@google.com Change-Id: I7c94aa3ffeb0a8e570c2d7db15183f87658d0141 Signed-off-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit e1fd09e3d1dd4a1a8b3b33bc1fd647eee9f4e475) [Kalesh Singh - Fix trivial conflict in arch/arm64/include/asm/pgtable.h] Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
b3890c0f96 |
Revert "FROMLIST: mm: x86, arm64: add arch_has_hw_pte_young()"
This reverts commit
|
||
|
|
2635d7d108 |
Revert "FROMLIST: mm: multi-gen LRU: groundwork"
This reverts commit
|
||
|
|
4ea18cd059 |
ANDROID: mm: fix speculative walk which is unsafe under RCU
Speculative page fault handling expects MMU_GATHER_RCU_TABLE_FREE to guarantee that page tables are stable, however tlb_remove_table() has a slow-path fall-back case when __get_free_page() returns NULL and tlb_remove_table_one() gets called. The way synchronization is implemented in that function is not RCU-safe and require IRQs to be disabled (see the comment in tlb_remove_table_sync_one()). Fix the invalid assumption to disable IRQs even when MMU_GATHER_RCU_TABLE_FREE=y. Bug: 257443051 Change-Id: I227f351607cf73022cb31f6f7a232cab41cf6a5a Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
ca96bd7bf1 |
ANDROID: mm: avoid using vmacache in lockless vma search
When searching vma under RCU protection vmcache should be avoided because a race with munmap() might result in finding a vma and placing it into vmcache after munmap() removed that vma and called vmcache_invalidate. Once that vma is freed, vmcache will be left with an invalid vma pointer. Bug: 257443051 Change-Id: I62438305fcf5139974f4f7d3bae5b22c74084a59 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
a1f65b39ba |
ANDROID: mm: skip pte_alloc during speculative page fault
Speculative page fault checks pmd to be valid before starting to handle the page fault and pte_alloc() should do nothing if pmd stays valid. If pmd gets changed during speculative page fault, we will detect the change later and retry with mmap_lock. Therefore pte_alloc() can be safely skipped and this prevents the racy pmd_lock() call which can access pmd->ptl after pmd was cleared. Bug: 257443051 Change-Id: Iec57df5530dba6e0e0bdf9f7500f910851c3d3fd Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
3f311327f9 |
ANDROID: mm: introduce vma refcounting to protect vma during SPF
Current mechanism to stabilize a vma during speculative page fault handling makes a copy of the faulting vma under RCU protection. This makes it hard to protect elements which do not belong to the vma but are used by the page fault handler like vma->vm_file. The problems is that a copy of the vma can't be used to safely protect the file attached to the original vma unless the file is also released after RCU grace period (which is how SPF was designed originally but that caused performance regression and had to be changed). To avoid these complications, introduce vma refcounting to stabilize and operate on the original vma during page fault handling. Page fault handler finds the vma and increases its refcount under RCU protection, vma is freed after RCU grace period, vma->vm_file is released only after refcount indicates no users. This mechanism guarantees that once get_vma returns a vma, both the vma itself and vma->vm_file are stable. Additional benefits of this patch are: we don't need to copy the vma and no additional logic is needed to stabilize vma->vm_file. Bug: 257443051 Change-Id: I59d373926d687fcbd56847a8c3500c43bf1844c8 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
19f11a21de |
ANDROID: fix a race between speculative page walk and unmap operations
Speculative page fault walks the page tables under RCU protection and
assumes that page tables are stable after ptl lock is taken. Current
implementation has three issues:
1. While pmd can't be destroyed while in RCU read section, it can be
cleared and result in an invalid ptl lock address. Fix that by
rechecking pmd value after obtaining ptl lock.
2. In case of CONFIG_ALLOC_SPLIT_PTLOCKS, ptl lock is separate from the
pmd and is destroyed by a synchronous call to pgtable_pmd_page_dtor,
which can happen while page walker is in RCU section. Prevent this by
adding a dependency for CONFIG_SPECULATIVE_PAGE_FAULT to require
!CONIG_ALLOC_SPLIT_PTLOCKS.
3. Below sequence when do_mmap happens after the last mmap_seq check
would result in use-after-free issue.
__pte_map_lock
rcu_read_lock()
mmap_seq_read_check()
ptl = pte_lockptr(vmf->pmd)
spin_trylock(ptl)
mmap_seq_read_check()
mmap_write_lock()
do_mmap()
unmap_region()
unmap_vmas()
free_pgtables()
...
free_pte_range
pmd_clear
pte_free_tlb
...
call_rcu(tlb_remove_table_rcu)
rcu_read_unlock()
tlb_remove_table_rcu
spin_unlock(ptl) <-- UAF!
To prevent that free_pte_range needs to be blocked if ptl is locked and
is in use.
Bug: 245389404
Change-Id: I7b353f0995fc59e92bb2069bcdc7d1ac29b521b9
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
(cherry picked from commit 365ffc56b4862e9b49097450a1ad10392723bbe0)
|
||
|
|
046ce7a74e |
Merge 5.15.59 into android14-5.15
Changes in 5.15.59
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Revert "ocfs2: mount shared volume without ha stack"
ntfs: fix use-after-free in ntfs_ucsncmp()
fs: sendfile handles O_NONBLOCK of out_fd
secretmem: fix unhandled fault in truncate
mm: fix page leak with multiple threads mapping the same page
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
asm-generic: remove a broken and needless ifdef conditional
s390/archrandom: prevent CPACF trng invocations in interrupt context
nouveau/svm: Fix to migrate all requested pages
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
watch_queue: Fix missing rcu annotation
watch_queue: Fix missing locking in add_watch_to_object()
tcp: Fix data-races around sysctl_tcp_dsack.
tcp: Fix a data-race around sysctl_tcp_app_win.
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
tcp: Fix a data-race around sysctl_tcp_frto.
tcp: Fix a data-race around sysctl_tcp_nometrics_save.
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: do not setup vlan for loopback VSI
scsi: ufs: host: Hold reference returned by of_parse_phandle()
Revert "tcp: change pingpong threshold to 3"
octeontx2-pf: Fix UDP/TCP src and dst port tc filters
tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
scsi: core: Fix warning in scsi_alloc_sgtables()
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
net: ping6: Fix memleak in ipv6_renew_options().
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net/tls: Remove the context from the list in tls_device_down
igmp: Fix data-races around sysctl_igmp_qrv.
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
tcp: Fix a data-race around sysctl_tcp_autocorking.
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
Documentation: fix sctp_wmem in ip-sysctl.rst
macsec: fix NULL deref in macsec_add_rxsa
macsec: fix error message in macsec_add_rxsa and _txsa
macsec: limit replay window size with XPN
macsec: always read MACSEC_SA_ATTR_PN as a u64
net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()
net: mld: fix reference count leak in mld_{query | report}_work()
tcp: Fix data-races around sk_pacing_rate.
net: Fix data-races around sysctl_[rw]mem(_offset)?.
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
tcp: Fix data-races around sysctl_tcp_reflect_tos.
ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
i40e: Fix interface init with MSI interrupts (no MSI-X)
sctp: fix sleep in atomic context bug in timer handlers
octeontx2-pf: cn10k: Fix egress ratelimit configuration
netfilter: nf_queue: do not allow packet truncation below transport header offset
virtio-net: fix the race between refill work and close
perf symbol: Correct address for bss symbols
sfc: disable softirqs for ptp TX
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
ARM: crypto: comment out gcc warning that breaks clang builds
mm/hmm: fault non-owner device private entries
page_alloc: fix invalid watermark check on a negative value
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
EDAC/ghes: Set the DIMM label unconditionally
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Linux 5.15.59
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4f2002d38aea467e150a912f50d456c41b23de89
|
||
|
|
2722fb0f70 |
mm: fix page leak with multiple threads mapping the same page
commit 3fe2895cfecd03ac74977f32102b966b6589f481 upstream. We have an application with a lot of threads that use a shared mmap backed by tmpfs mounted with -o huge=within_size. This application started leaking loads of huge pages when we upgraded to a recent kernel. Using the page ref tracepoints and a BPF program written by Tejun Heo we were able to determine that these pages would have multiple refcounts from the page fault path, but when it came to unmap time we wouldn't drop the number of refs we had added from the faults. I wrote a reproducer that mmap'ed a file backed by tmpfs with -o huge=always, and then spawned 20 threads all looping faulting random offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge page aligned ranges. This very quickly reproduced the problem. The problem here is that we check for the case that we have multiple threads faulting in a range that was previously unmapped. One thread maps the PMD, the other thread loses the race and then returns 0. However at this point we already have the page, and we are no longer putting this page into the processes address space, and so we leak the page. We actually did the correct thing prior to |
||
|
|
56f32ebb01 |
Merge 5.15.56 into android14-5.15
Changes in 5.15.56
ALSA: hda - Add fixup for Dell Latitidue E5430
ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
ALSA: hda/realtek: Fix headset mic for Acer SF313-51
ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
ALSA: hda/realtek: fix mute/micmute LEDs for HP machines
ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
fix race between exit_itimers() and /proc/pid/timers
mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages
mm: split huge PUD on wp_huge_pud fallback
tracing/histograms: Fix memory leak problem
net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer
ip: fix dflt addr selection for connected nexthop
ARM: 9213/1: Print message about disabled Spectre workarounds only once
ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
wifi: mac80211: fix queue selection for mesh/OCB interfaces
cgroup: Use separate src/dst nodes when preloading css_sets for migration
btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
drm/panfrost: Put mapping instead of shmem obj on panfrost_mmu_map_fault_addr() error
drm/panfrost: Fix shrinker list corruption by madvise IOCTL
fs/remap: constrain dedupe of EOF blocks
nilfs2: fix incorrect masking of permission flags for symlinks
sh: convert nommu io{re,un}map() to static inline functions
Revert "evm: Fix memleak in init_desc"
xfs: only run COW extent recovery when there are no live extents
xfs: don't include bnobt blocks when reserving free block pool
xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
xfs: drop async cache flushes from CIL commits.
reset: Fix devm bulk optional exclusive control getter
ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
spi: amd: Limit max transfer and message size
ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle
ARM: 9210/1: Mark the FDT_FIXED sections as shareable
net/mlx5e: kTLS, Fix build time constant test in TX
net/mlx5e: kTLS, Fix build time constant test in RX
net/mlx5e: Fix enabling sriov while tc nic rules are offloaded
net/mlx5e: Fix capability check for updating vnic env counters
net/mlx5e: Ring the TX doorbell on DMA errors
drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
ima: Fix a potential integer overflow in ima_appraise_measurement
ASoC: sgtl5000: Fix noise on shutdown/remove
ASoC: tas2764: Add post reset delays
ASoC: tas2764: Fix and extend FSYNC polarity handling
ASoC: tas2764: Correct playback volume range
ASoC: tas2764: Fix amp gain register offset & default
ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks()
ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array
net: stmmac: dwc-qos: Disable split header for Tegra194
net: ethernet: ti: am65-cpsw: Fix devlink port register sequence
sysctl: Fix data races in proc_dointvec().
sysctl: Fix data races in proc_douintvec().
sysctl: Fix data races in proc_dointvec_minmax().
sysctl: Fix data races in proc_douintvec_minmax().
sysctl: Fix data races in proc_doulongvec_minmax().
sysctl: Fix data races in proc_dointvec_jiffies().
tcp: Fix a data-race around sysctl_tcp_max_orphans.
inetpeer: Fix data-races around sysctl.
net: Fix data-races around sysctl_mem.
cipso: Fix data-races around sysctl.
icmp: Fix data-races around sysctl.
ipv4: Fix a data-race around sysctl_fib_sync_mem.
ARM: dts: at91: sama5d2: Fix typo in i2s1 node
ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
netfilter: nf_log: incorrect offset to network header
netfilter: nf_tables: replace BUG_ON by element length check
drm/i915/gvt: IS_ERR() vs NULL bug in intel_gvt_update_reg_whitelist()
xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
lockd: set fl_owner when unlocking files
lockd: fix nlm_close_files
tracing: Fix sleeping while atomic in kdb ftdump
drm/i915/selftests: fix a couple IS_ERR() vs NULL tests
drm/i915/dg2: Add Wa_22011100796
drm/i915/gt: Serialize GRDOM access between multiple engine resets
drm/i915/gt: Serialize TLB invalidates with GT resets
drm/i915/uc: correctly track uc_fw init failure
drm/i915: Require the vm mutex for i915_vma_bind()
bnxt_en: Fix bnxt_reinit_after_abort() code path
bnxt_en: Fix bnxt_refclk_read()
sysctl: Fix data-races in proc_dou8vec_minmax().
sysctl: Fix data-races in proc_dointvec_ms_jiffies().
icmp: Fix data-races around sysctl_icmp_echo_enable_probe.
icmp: Fix a data-race around sysctl_icmp_ignore_bogus_error_responses.
icmp: Fix a data-race around sysctl_icmp_errors_use_inbound_ifaddr.
icmp: Fix a data-race around sysctl_icmp_ratelimit.
icmp: Fix a data-race around sysctl_icmp_ratemask.
raw: Fix a data-race around sysctl_raw_l3mdev_accept.
tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
ipv4: Fix data-races around sysctl_ip_dynaddr.
nexthop: Fix data-races around nexthop_compat_mode.
net: ftgmac100: Hold reference returned by of_get_child_by_name()
net: stmmac: fix leaks in probe
ima: force signature verification when CONFIG_KEXEC_SIG is configured
ima: Fix potential memory leak in ima_init_crypto()
drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
drm/amd/pm: Prevent divide by zero
sfc: fix use after free when disabling sriov
ceph: switch netfs read ops to use rreq->inode instead of rreq->mapping->host
seg6: fix skb checksum evaluation in SRH encapsulation/insertion
seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
sfc: fix kernel panic when creating VF
net: atlantic: remove deep parameter on suspend/resume functions
net: atlantic: remove aq_nic_deinit() when resume
KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
net/tls: Check for errors in tls_device_init
ACPI: video: Fix acpi_video_handles_brightness_key_presses()
mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
btrfs: rename btrfs_bio to btrfs_io_context
btrfs: zoned: fix a leaked bioc in read_zone_info
ksmbd: use SOCK_NONBLOCK type for kernel_accept()
powerpc/xive/spapr: correct bitmap allocation size
vdpa/mlx5: Initialize CVQ vringh only once
vduse: Tie vduse mgmtdev and its device
virtio_mmio: Add missing PM calls to freeze/restore
virtio_mmio: Restore guest page size on resume
netfilter: br_netfilter: do not skip all hooks with 0 priority
scsi: hisi_sas: Limit max hw sectors for v3 HW
cpufreq: pmac32-cpufreq: Fix refcount leak bug
platform/x86: hp-wmi: Ignore Sanitization Mode event
firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
firmware: sysfb: Add sysfb_disable() helper function
fbdev: Disable sysfb device registration when removing conflicting FBs
net: tipc: fix possible refcount leak in tipc_sk_create()
NFC: nxp-nci: don't print header length mismatch on i2c error
nvme-tcp: always fail a request when sending it failed
nvme: fix regression when disconnect a recovering ctrl
net: sfp: fix memory leak in sfp_probe()
ASoC: ops: Fix off by one in range control validation
pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux()
ASoC: Realtek/Maxim SoundWire codecs: disable pm_runtime on remove
ASoC: rt711-sdca-sdw: fix calibrate mutex initialization
ASoC: Intel: sof_sdw: handle errors on card registration
ASoC: rt711: fix calibrate mutex initialization
ASoC: rt7*-sdw: harden jack_detect_handler
ASoC: codecs: rt700/rt711/rt711-sdca: initialize workqueues in probe
ASoC: SOF: Intel: hda-loader: Clarify the cl_dsp_init() flow
ASoC: wcd938x: Fix event generation for some controls
ASoC: Intel: bytcr_wm5102: Fix GPIO related probe-ordering problem
ASoC: wm5110: Fix DRE control
ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error
ASoC: dapm: Initialise kcontrol data for mux/demux controls
ASoC: cs47l15: Fix event generation for low power mux control
ASoC: madera: Fix event generation for OUT1 demux
ASoC: madera: Fix event generation for rate controls
irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
x86: Clear .brk area at early boot
soc: ixp4xx/npe: Fix unused match warning
ARM: dts: stm32: use the correct clock source for CEC on stm32mp151
Revert "can: xilinx_can: Limit CANFD brp to 2"
ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
ALSA: usb-audio: Add quirk for Fiero SC-01
ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0)
nvme-pci: phison e16 has bogus namespace ids
signal handling: don't use BUG_ON() for debugging
USB: serial: ftdi_sio: add Belimo device ids
usb: typec: add missing uevent when partner support PD
usb: dwc3: gadget: Fix event pending check
tty: serial: samsung_tty: set dma burst_size to 1
vt: fix memory overlapping when deleting chars in the buffer
serial: 8250: fix return error code in serial8250_request_std_resource()
serial: stm32: Clear prev values before setting RTS delays
serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
serial: 8250: Fix PM usage_count for console handover
x86/pat: Fix x86_has_pat_wp()
drm/aperture: Run fbdev removal before internal helpers
Linux 5.15.56
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I763d2a7b49435bf2996b31e201aa9794ab64609e
|
||
|
|
e4967d2288 |
mm: split huge PUD on wp_huge_pud fallback
commit 14c99d65941538aa33edd8dc7b1bbbb593c324a2 upstream.
Currently the implementation will split the PUD when a fallback is taken
inside the create_huge_pud function. This isn't where it should be done:
the splitting should be done in wp_huge_pud, just like it's done for PMDs.
Reason being that if a callback is taken during create, there is no PUD
yet so nothing to split, whereas if a fallback is taken when encountering
a write protection fault there is something to split.
It looks like this was the original intention with the commit where the
splitting was introduced, but somehow it got moved to the wrong place
between v1 and v2 of the patch series. Rebase mistake perhaps.
Link: https://lkml.kernel.org/r/6f48d622eb8bce1ae5dd75327b0b73894a2ec407.camel@amazon.com
Fixes:
|
||
|
|
0864756fb0 |
Revert "ANDROID: Use the notifier lock to perform file-backed vma teardown"
This reverts commit
|
||
|
|
6072c99f21 |
ANDROID: Use the notifier lock to perform file-backed vma teardown
When a file-backed vma is being released, the userspace can have an
expectation that the vma and the file it's pinning will be released
synchronously. This does not happen when SPF is enabled because vma
and associated file are released asynchronously after RCU grace
period. This is done to prevent pagefault handler from stepping on
a deleted object. Fix this issue by synchronizing the file-backed
pagefault handler with the vma tear-down using notifier lock.
Fixes:
|
||
|
|
3fb2e91e28 |
Merge 5.15.40 into android-5.15
Changes in 5.15.40
x86/lib/atomic64_386_32: Rename things
x86: Prepare asm files for straight-line-speculation
x86: Prepare inline-asm for straight-line-speculation
objtool: Add straight-line-speculation validation
x86/alternative: Relax text_poke_bp() constraint
kbuild: move objtool_args back to scripts/Makefile.build
x86: Add straight-line-speculation mitigation
tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
kvm/emulate: Fix SETcc emulation function offsets with SLS
crypto: x86/poly1305 - Fixup SLS
objtool: Fix SLS validation for kcov tail-call replacement
Bluetooth: Fix the creation of hdev->name
rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
udf: Avoid using stale lengthOfImpUse
mm: fix missing cache flush for all tail pages of compound page
mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
mm/hwpoison: fix error page recovered but reported "not recovered"
mm/mlock: fix potential imbalanced rlimit ucounts adjustment
mm: fix invalid page pointer returned with FOLL_PIN gups
Linux 5.15.40
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I039f8014f8e898f74f2f98ab71ef29c39a1ad544
|
||
|
|
e36b476a82 |
mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
commit e763243cc6cb1fcc720ec58cfd6e7c35ae90a479 upstream.
userfaultfd calls copy_huge_page_from_user() which does not do any cache
flushing for the target page. Then the target page will be mapped to
the user space with a different address (user address), which might have
an alias issue with the kernel address used to copy the data from the
user to.
Fix this issue by flushing dcache in copy_huge_page_from_user().
Link: https://lkml.kernel.org/r/20220210123058.79206-4-songmuchun@bytedance.com
Fixes:
|
||
|
|
33f5d1daec |
Merge 5.15.34 into android13-5.15
Changes in 5.15.34
lib/logic_iomem: correct fallback config references
um: fix and optimize xor select template for CONFIG64 and timetravel mode
rtc: wm8350: Handle error for wm8350_register_irq
nbd: add error handling support for add_disk()
nbd: Fix incorrect error handle when first_minor is illegal in nbd_dev_add
nbd: Fix hungtask when nbd_config_put
nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
kfence: count unexpectedly skipped allocations
kfence: move saving stack trace of allocations into __kfence_alloc()
kfence: limit currently covered allocations when pool nearly full
KVM: x86/pmu: Use different raw event masks for AMD and Intel
KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode()
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
drm: Add orientation quirk for GPD Win Max
ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
drm/amd/display: Add signal type check when verify stream backends same
drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj
drm/amd/display: Fix memory leak
drm/amd/display: Use PSR version selected during set_psr_caps
usb: gadget: tegra-xudc: Do not program SPARAM
usb: gadget: tegra-xudc: Fix control endpoint's definitions
usb: cdnsp: fix cdnsp_decode_trb function to properly handle ret value
ptp: replace snprintf with sysfs_emit
drm/amdkfd: Don't take process mutex for svm ioctls
powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
ath11k: fix kernel panic during unload/load ath11k modules
ath11k: pci: fix crash on suspend if board file is not found
ath11k: mhi: use mhi_sync_power_up()
net/smc: Send directly when TCP_CORK is cleared
drm/bridge: Add missing pm_runtime_put_sync
bpf: Make dst_port field in struct bpf_sock 16-bit wide
scsi: mvsas: Replace snprintf() with sysfs_emit()
scsi: bfa: Replace snprintf() with sysfs_emit()
drm/v3d: fix missing unlock
power: supply: axp20x_battery: properly report current when discharging
mt76: mt7921: fix crash when startup fails.
mt76: dma: initialize skip_unmap in mt76_dma_rx_fill
cfg80211: don't add non transmitted BSS to 6GHz scanned channels
libbpf: Fix build issue with llvm-readelf
ipv6: make mc_forwarding atomic
net: initialize init_net earlier
powerpc: Set crashkernel offset to mid of RMA region
drm/amdgpu: Fix recursive locking warning
scsi: smartpqi: Fix kdump issue when controller is locked up
PCI: aardvark: Fix support for MSI interrupts
iommu/arm-smmu-v3: fix event handling soft lockup
usb: ehci: add pci device support for Aspeed platforms
PCI: endpoint: Fix alignment fault error in copy tests
tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
scsi: mpi3mr: Fix reporting of actual data transfer size
scsi: mpi3mr: Fix memory leaks
powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
power: supply: axp288-charger: Set Vhold to 4.4V
net/mlx5e: Disable TX queues before registering the netdev
usb: dwc3: pci: Set the swnode from inside dwc3_pci_quirks()
iwlwifi: mvm: Correctly set fragmented EBS
iwlwifi: mvm: move only to an enabled channel
drm/msm/dsi: Remove spurious IRQF_ONESHOT flag
ipv4: Invalidate neighbour for broadcast address upon address addition
dm ioctl: prevent potential spectre v1 gadget
dm: requeue IO if mapping table not yet available
drm/amdkfd: make CRAT table missing message informational only
vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA
scsi: pm8001: Fix pm80xx_pci_mem_copy() interface
scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
scsi: pm8001: Fix task leak in pm8001_send_abort_all()
scsi: pm8001: Fix tag leaks on error
scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req()
mt76: mt7915: fix injected MPDU transmission to not use HW A-MSDU
powerpc/64s/hash: Make hash faults work in NMI context
mt76: mt7615: Fix assigning negative values to unsigned variable
scsi: aha152x: Fix aha152x_setup() __setup handler return value
scsi: hisi_sas: Free irq vectors in order for v3 HW
scsi: hisi_sas: Limit users changing debugfs BIST count value
net/smc: correct settings of RMB window update limit
mips: ralink: fix a refcount leak in ill_acc_of_setup()
macvtap: advertise link netns via netlink
tuntap: add sanity checks about msg_controllen in sendmsg
Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg}
Bluetooth: use memset avoid memory leaks
bnxt_en: Eliminate unintended link toggle during FW reset
PCI: endpoint: Fix misused goto label
MIPS: fix fortify panic when copying asm exception handlers
powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
powerpc/secvar: fix refcount leak in format_show()
scsi: libfc: Fix use after free in fc_exch_abts_resp()
can: isotp: set default value for N_As to 50 micro seconds
can: etas_es58x: es58x_fd_rx_event_msg(): initialize rx_event_msg before calling es58x_check_msg_len()
riscv: Fixed misaligned memory access. Fixed pointer comparison.
net: account alternate interface name memory
net: limit altnames to 64k total
net/mlx5e: Remove overzealous validations in netlink EEPROM query
net: sfp: add 2500base-X quirk for Lantech SFP module
usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
mt76: fix monitor mode crash with sdio driver
xtensa: fix DTC warning unit_address_format
MIPS: ingenic: correct unit node address
Bluetooth: Fix use after free in hci_send_acl
netfilter: conntrack: revisit gc autotuning
netlabel: fix out-of-bounds memory accesses
ceph: fix inode reference leakage in ceph_get_snapdir()
ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option
init/main.c: return 1 from handled __setup() functions
minix: fix bug when opening a file with O_DIRECT
clk: si5341: fix reported clk_rate when output divider is 2
staging: vchiq_arm: Avoid NULL ptr deref in vchiq_dump_platform_instances
staging: vchiq_core: handle NULL result of find_service_by_handle
phy: amlogic: phy-meson-gxl-usb2: fix shared reset controller use
phy: amlogic: meson8b-usb2: Use dev_err_probe()
phy: amlogic: meson8b-usb2: fix shared reset control use
clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568
cpufreq: CPPC: Fix performance/frequency conversion
opp: Expose of-node's name in debugfs
staging: wfx: fix an error handling in wfx_init_common()
w1: w1_therm: fixes w1_seq for ds28ea00 sensors
NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify()
NFSv4: Protect the state recovery thread against direct reclaim
habanalabs: fix possible memory leak in MMU DR fini
xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
clk: ti: Preserve node in ti_dt_clocks_register()
clk: Enforce that disjoints limits are invalid
SUNRPC/call_alloc: async tasks mustn't block waiting for memory
SUNRPC/xprt: async tasks mustn't block waiting for memory
SUNRPC: remove scheduling boost for "SWAPPER" tasks.
NFS: swap IO handling is slightly different for O_DIRECT IO
NFS: swap-out must always use STABLE writes.
x86: Annotate call_on_stack()
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
virtio_console: eliminate anonymous module_init & module_exit
jfs: prevent NULL deref in diFree
SUNRPC: Fix socket waits for write buffer space
NFS: nfsiod should not block forever in mempool_alloc()
NFS: Avoid writeback threads getting stuck in mempool_alloc()
selftests: net: Add tls config dependency for tls selftests
parisc: Fix CPU affinity for Lasi, WAX and Dino chips
parisc: Fix patch code locking and flushing
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
rtc: mc146818-lib: change return values of mc146818_get_time()
rtc: Check return value from mc146818_get_time()
rtc: mc146818-lib: fix RTC presence check
drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
Drivers: hv: vmbus: Fix potential crash on module unload
Revert "NFSv4: Handle the special Linux file open access mode"
NFSv4: fix open failure with O_ACCMODE flag
scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling
scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map()
scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
vdpa/mlx5: Rename control VQ workqueue to vdpa wq
vdpa/mlx5: Propagate link status from device to vdpa driver
vdpa: mlx5: prevent cvq work from hogging CPU
net: sfc: add missing xdp queue reinitialization
net/tls: fix slab-out-of-bounds bug in decrypt_internal
vrf: fix packet sniffing for traffic originating from ip tunnels
skbuff: fix coalescing for page_pool fragment recycling
ice: Clear default forwarding VSI during VSI release
mctp: Fix check for dev_hard_header() result
net: ipv4: fix route with nexthop object delete warning
net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
drm/imx: imx-ldb: Check for null pointer after calling kmemdup
drm/imx: Fix memory leak in imx_pd_connector_get_modes
drm/imx: dw_hdmi-imx: Fix bailout in error cases of probe
regulator: rtq2134: Fix missing active_discharge_on setting
regulator: atc260x: Fix missing active_discharge_on setting
arch/arm64: Fix topology initialization for core scheduling
bnxt_en: Synchronize tx when xdp redirects happen on same ring
bnxt_en: reserve space inside receive page for skb_shared_info
bnxt_en: Prevent XDP redirect from running when stopping TX queue
sfc: Do not free an empty page_ring
RDMA/mlx5: Don't remove cache MRs when a delay is needed
RDMA/mlx5: Add a missing update of cache->last_add
IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD
IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition
sctp: count singleton chunks in assoc user stats
dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe
ice: Set txq_teid to ICE_INVAL_TEID on ring creation
ice: Do not skip not enabled queues in ice_vc_dis_qs_msg
ipv6: Fix stats accounting in ip6_pkt_drop
ice: synchronize_rcu() when terminating rings
ice: xsk: fix VSI state check in ice_xsk_wakeup()
net: openvswitch: don't send internal clone attribute to the userspace.
net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
net: openvswitch: fix leak of nested actions
rxrpc: fix a race in rxrpc_exit_net()
net: sfc: fix using uninitialized xdp tx_queue
net: phy: mscc-miim: reject clause 45 register accesses
qede: confirm skb is allocated before using
spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op()
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
drbd: Fix five use after free bugs in get_initial_state
scsi: ufs: ufshpb: Fix a NULL check on list iterator
io_uring: nospec index for tags on files update
io_uring: don't touch scm_fp_list after queueing skb
SUNRPC: Handle ENOMEM in call_transmit_status()
SUNRPC: Handle low memory situations in call_status()
SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
iommu/omap: Fix regression in probe for NULL pointer dereference
perf: arm-spe: Fix perf report --mem-mode
perf tools: Fix perf's libperf_print callback
perf session: Remap buf if there is no space for event
arm64: Add part number for Arm Cortex-A78AE
scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove()
scsi: ufs: ufs-pci: Add support for Intel MTL
Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
mmc: block: Check for errors after write on SPI
mmc: mmci: stm32: correctly check all elements of sg list
mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete
mmc: core: Fixup support for writeback-cache for eMMC and SD
lz4: fix LZ4_decompress_safe_partial read out of bound
highmem: fix checks in __kmap_local_sched_{in,out}
mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
mm/mempolicy: fix mpol_new leak in shared_policy_replace
io_uring: don't check req->file in io_fsync_prep()
io_uring: defer splice/tee file validity check until command issue
io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
io_uring: fix race between timeout flush and removal
x86/pm: Save the MSR validity status at context setup
x86/speculation: Restore speculation related MSRs during S3 resume
perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids
btrfs: fix qgroup reserve overflow the qgroup limit
btrfs: prevent subvol with swapfile from being deleted
spi: core: add dma_map_dev for __spi_unmap_msg()
arm64: patch_text: Fixup last cpu should be master
RDMA/hfi1: Fix use-after-free bug for mm struct
gpio: Restrict usage of GPIO chip irq members before initialization
x86/msi: Fix msi message data shadow struct
x86/mm/tlb: Revert retpoline avoidance approach
perf/x86/intel: Don't extend the pseudo-encoding to GP counters
ata: sata_dwc_460ex: Fix crash due to OOB write
perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
perf/core: Inherit event_caps
irqchip/gic-v3: Fix GICR_CTLR.RWP polling
fbdev: Fix unregistering of framebuffers without device
amd/display: set backlight only if required
SUNRPC: Prevent immediate close+reconnect
drm/panel: ili9341: fix optional regulator handling
drm/amdgpu/display: change pipe policy for DCN 2.1
drm/amdgpu/smu10: fix SoC/fclk units in auto mode
drm/amdgpu/vcn: Fix the register setting for vcn1
drm/nouveau/pmu: Add missing callbacks for Tegra devices
drm/amdkfd: Create file descriptor after client is added to smi_clients list
drm/amdgpu: don't use BACO for reset in S3
KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255
net/smc: send directly on setting TCP_NODELAY
Revert "selftests: net: Add tls config dependency for tls selftests"
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port
rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
SUNRPC: Don't call connect() more than once on a TCP socket
Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
perf python: Fix probing for some clang command line options
tools build: Filter out options and warnings not supported by clang
tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
Revert "net/mlx5: Accept devlink user input after driver initialization complete"
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644
selftests: cgroup: Test open-time credential usage for migration checks
selftests: cgroup: Test open-time cgroup namespace usage for migration checks
mm: don't skip swap entry even if zap_details specified
Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
x86/bug: Prevent shadowing in __WARN_FLAGS
sched: Teach the forced-newidle balancer about CPU affinity limitation.
x86,static_call: Fix __static_call_return0 for i386
irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling
powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S
irqchip/gic, gic-v3: Prevent GSI to SGI translations
mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
static_call: Don't make __static_call_return0 static
powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
stacktrace: move filter_irq_stacks() to kernel/stacktrace.c
Linux 5.15.34
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I98049d0d8ebd427296418d31085bfde482ad30e7
|
||
|
|
f88ed5a3d3 |
FROMLIST: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in page->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These features are required to optimize job scheduling (bin packing) in data centers. The variable size of the sliding window is designed for such use cases [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking the rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lore.kernel.org/lkml/20220309021230.721028-6-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512 |
||
|
|
1861f17391 |
FROMLIST: mm: x86, arm64: add arch_has_hw_pte_young()
Some architectures automatically set the accessed bit in PTEs, e.g., x86 and arm64 v8.2. On architectures that do not have this capability, clearing the accessed bit in a PTE usually triggers a page fault following the TLB miss of this PTE (to emulate the accessed bit). Being aware of this capability can help make better decisions, e.g., whether to spread the work out over a period of time to reduce bursty page faults when trying to clear the accessed bit in many PTEs. Note that theoretically this capability can be unreliable, e.g., hotplugged CPUs might be different from builtin ones. Therefore it should not be used in architecture-independent code that involves correctness, e.g., to determine whether TLB flushes are required (in combination with the accessed bit). Link: https://lore.kernel.org/lkml/20220309021230.721028-2-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: Ie81175d7e0d239f688d31487b298cf9b4fb66707 |
||
|
|
b41a37c036 |
Merge 5.15.33 into android13-5.15
Changes in 5.15.33
Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
USB: serial: pl2303: add IBM device IDs
dt-bindings: usb: hcd: correct usb-device path
USB: serial: pl2303: fix GS type detection
USB: serial: simple: add Nokia phone driver
mm: kfence: fix missing objcg housekeeping for SLAB
hv: utils: add PTP_1588_CLOCK to Kconfig to fix build
HID: logitech-dj: add new lightspeed receiver id
HID: Add support for open wheel and no attachment to T300
xfrm: fix tunnel model fragmentation behavior
ARM: mstar: Select HAVE_ARM_ARCH_TIMER
virtio_console: break out of buf poll on remove
vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
tools/virtio: fix virtio_test execution
ethernet: sun: Free the coherent when failing in probing
gpio: Revert regression in sysfs-gpio (gpiolib.c)
spi: Fix invalid sgs value
net:mcf8390: Use platform_get_irq() to get the interrupt
Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"
spi: Fix erroneous sgs value with min_t()
Input: zinitix - do not report shadow fingers
af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
net: dsa: microchip: add spi_device_id tables
selftests: vm: fix clang build error multiple output files
locking/lockdep: Avoid potential access of invalid memory in lock_class
drm/amdgpu: move PX checking into amdgpu_device_ip_early_init
drm/amdgpu: only check for _PR3 on dGPUs
iommu/iova: Improve 32-bit free space estimate
virtio-blk: Use blk_validate_block_size() to validate block size
tpm: fix reference counting for struct tpm_chip
usb: typec: tipd: Forward plug orientation to typec subsystem
USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
xhci: fix garbage USBSTS being logged in some cases
xhci: fix runtime PM imbalance in USB2 resume
xhci: make xhci_handshake timeout for xhci_reset() adjustable
xhci: fix uninitialized string returned by xhci_decode_ctrl_ctx()
mei: me: disable driver on the ign firmware
mei: me: add Alder Lake N device id.
mei: avoid iterator usage outside of list_for_each_entry
bus: mhi: pci_generic: Add mru_default for Quectel EM1xx series
bus: mhi: Fix MHI DMA structure endianness
docs: sphinx/requirements: Limit jinja2<3.1
coresight: Fix TRCCONFIGR.QE sysfs interface
coresight: syscfg: Fix memleak on registration failure in cscfg_create_device
iio: afe: rescale: use s64 for temporary scale calculations
iio: inkern: apply consumer scale on IIO_VAL_INT cases
iio: inkern: apply consumer scale when no channel scale is available
iio: inkern: make a best effort on offset calculation
greybus: svc: fix an error handling bug in gb_svc_hello()
clk: rockchip: re-add rational best approximation algorithm to the fractional divider
clk: uniphier: Fix fixed-rate initialization
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
cifs: fix handlecache and multiuser
cifs: we do not need a spinlock around the tree access during umount
KEYS: fix length validation in keyctl_pkey_params_get_2()
KEYS: asymmetric: enforce that sig algo matches key algo
KEYS: asymmetric: properly validate hash_algo and encoding
Documentation: add link to stable release candidate tree
Documentation: update stable tree link
firmware: stratix10-svc: add missing callback parameter on RSU
firmware: sysfb: fix platform-device leak in error path
HID: intel-ish-hid: Use dma_alloc_coherent for firmware update
SUNRPC: avoid race between mod_timer() and del_timer_sync()
NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR
NFSD: prevent underflow in nfssvc_decode_writeargs()
NFSD: prevent integer overflow on 32 bit systems
f2fs: fix to unlock page correctly in error path of is_alive()
f2fs: quota: fix loop condition at f2fs_quota_sync()
f2fs: fix to do sanity check on .cp_pack_total_block_count
remoteproc: Fix count check in rproc_coredump_write()
mm/mlock: fix two bugs in user_shm_lock()
pinctrl: ingenic: Fix regmap on X series SoCs
pinctrl: samsung: drop pin banks references on error paths
net: bnxt_ptp: fix compilation error
spi: mxic: Fix the transmit path
mtd: rawnand: protect access to rawnand devices while in suspend
can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
can: m_can: m_can_tx_handler(): fix use after free of skb
can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
jffs2: fix memory leak in jffs2_do_mount_fs
jffs2: fix memory leak in jffs2_scan_medium
mm: fs: fix lru_cache_disabled race in bh_lru
mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
mm: invalidate hwpoison page cache page in fault path
mempolicy: mbind_range() set_policy() after vma_merge()
scsi: core: sd: Add silence_suspend flag to suppress some PM messages
scsi: ufs: Fix runtime PM messages never-ending cycle
scsi: scsi_transport_fc: Fix FPIN Link Integrity statistics counters
scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
qed: display VF trust config
qed: validate and restrict untrusted VFs vlan promisc mode
riscv: dts: canaan: Fix SPI3 bus width
riscv: Fix fill_callchain return value
riscv: Increase stack size under KASAN
Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
cifs: prevent bad output lengths in smb2_ioctl_query_info()
cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
ALSA: cs4236: fix an incorrect NULL check on list iterator
ALSA: hda: Avoid unsol event during RPM suspending
ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock
ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
rtc: mc146818-lib: fix locking in mc146818_set_time
rtc: pl031: fix rtc features null pointer dereference
ocfs2: fix crash when mount with quota enabled
drm/simpledrm: Add "panel orientation" property on non-upright mounted LCD panels
mm: madvise: skip unmapped vma holes passed to process_madvise
mm: madvise: return correct bytes advised with process_madvise
Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
mm,hwpoison: unmap poisoned page before invalidation
mm/kmemleak: reset tag when compare object pointer
dm stats: fix too short end duration_ns when using precise_timestamps
dm: fix use-after-free in dm_cleanup_zoned_dev()
dm: interlock pending dm_io and dm_wait_for_bios_completion
dm: fix double accounting of flush with data
dm integrity: set journal entry unused when shrinking device
tracing: Have trace event string test handle zero length strings
drbd: fix potential silent data corruption
powerpc/kvm: Fix kvm_use_magic_page
PCI: fu740: Force 2.5GT/s for initial device probe
arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
arm64: dts: qcom: sm8250: Fix MSI IRQ for PCIe1 and PCIe2
arm64: dts: ti: k3-am65: Fix gic-v3 compatible regs
arm64: dts: ti: k3-j721e: Fix gic-v3 compatible regs
arm64: dts: ti: k3-j7200: Fix gic-v3 compatible regs
arm64: dts: ti: k3-am64: Fix gic-v3 compatible regs
ASoC: SOF: Intel: Fix NULL ptr dereference when ENOMEM
Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
ACPI: properties: Consistently return -ENOENT if there are no more references
coredump: Also dump first pages of non-executable ELF libraries
ext4: fix ext4_fc_stats trace point
ext4: fix fs corruption when tring to remove a non-empty directory with IO error
ext4: make mb_optimize_scan performance mount option work with extents
drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
samples/landlock: Fix path_list memory leak
landlock: Use square brackets around "landlock-ruleset"
mailbox: tegra-hsp: Flush whole channel
block: limit request dispatch loop duration
block: don't merge across cgroup boundaries if blkcg is enabled
drm/edid: check basic audio support on CEA extension block
fbdev: Hot-unplug firmware fb devices on forced removal
video: fbdev: sm712fb: Fix crash in smtcfb_read()
video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
rfkill: make new event layout opt-in
ARM: dts: at91: sama7g5: Remove unused properties in i2c nodes
ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
ARM: dts: exynos: add missing HDMI supplies on SMDK5250
ARM: dts: exynos: add missing HDMI supplies on SMDK5420
mgag200 fix memmapsl configuration in GCTL6 register
carl9170: fix missing bit-wise or operator for tx_params
pstore: Don't use semaphores in always-atomic-context code
thermal: int340x: Increase bitmap size
lib/raid6/test: fix multiple definition linking error
exec: Force single empty string when argv is empty
crypto: rsa-pkcs1pad - only allow with rsa
crypto: rsa-pkcs1pad - correctly get hash from source scatterlist
crypto: rsa-pkcs1pad - restore signature length check
crypto: rsa-pkcs1pad - fix buffer overread in pkcs1pad_verify_complete()
bcache: fixup multiple threads crash
PM: domains: Fix sleep-in-atomic bug caused by genpd_debug_remove()
DEC: Limit PMAX memory probing to R3k systems
media: gpio-ir-tx: fix transmit with long spaces on Orange Pi PC
media: venus: hfi_cmds: List HDR10 property as unsupported for v1 and v3
media: venus: venc: Fix h264 8x8 transform control
media: davinci: vpif: fix unbalanced runtime PM get
media: davinci: vpif: fix unbalanced runtime PM enable
btrfs: zoned: mark relocation as writing
btrfs: extend locking to all space_info members accesses
btrfs: verify the tranisd of the to-be-written dirty extent buffer
xtensa: define update_mmu_tlb function
xtensa: fix stop_machine_cpuslocked call in patch_text
xtensa: fix xtensa_wsr always writing 0
drm/syncobj: flatten dma_fence_chains on transfer
drm/nouveau/backlight: Fix LVDS backlight detection on some laptops
drm/nouveau/backlight: Just set all backlight types as RAW
drm/fb-helper: Mark screen buffers in system memory with FBINFO_VIRTFB
brcmfmac: firmware: Allocate space for default boardrev in nvram
brcmfmac: pcie: Release firmwares in the brcmf_pcie_setup error path
brcmfmac: pcie: Declare missing firmware files in pcie.c
brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
brcmfmac: pcie: Fix crashes due to early IRQs
drm/i915/opregion: check port number bounds for SWSCI display power state
drm/i915/gem: add missing boundary check in vm_access
PCI: imx6: Allow to probe when dw_pcie_wait_for_link() fails
PCI: pciehp: Clear cmd_busy bit in polling mode
PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
regulator: qcom_smd: fix for_each_child.cocci warnings
selinux: access superblock_security_struct in LSM blob way
selinux: check return value of sel_make_avc_files
crypto: ccp - Ensure psp_ret is always init'd in __sev_platform_init_locked()
hwrng: cavium - Check health status while reading random data
hwrng: cavium - HW_RANDOM_CAVIUM should depend on ARCH_THUNDER
crypto: sun8i-ss - really disable hash on A80
crypto: authenc - Fix sleep in atomic context in decrypt_tail
crypto: mxs-dcp - Fix scatterlist processing
selinux: Fix selinux_sb_mnt_opts_compat()
thermal: int340x: Check for NULL after calling kmemdup()
crypto: octeontx2 - remove CONFIG_DM_CRYPT check
spi: tegra114: Add missing IRQ check in tegra_spi_probe
spi: tegra210-quad: Fix missin IRQ check in tegra_qspi_probe
stack: Constrain and fix stack offset randomization with Clang builds
arm64/mm: avoid fixmap race condition when create pud mapping
blk-cgroup: set blkg iostat after percpu stat aggregation
selftests/x86: Add validity check and allow field splitting
selftests/sgx: Treat CC as one argument
crypto: rockchip - ECB does not need IV
audit: log AUDIT_TIME_* records only from rules
EVM: fix the evm= __setup handler return value
crypto: ccree - don't attempt 0 len DMA mappings
crypto: hisilicon/sec - fix the aead software fallback for engine
spi: pxa2xx-pci: Balance reference count for PCI DMA device
hwmon: (pmbus) Add mutex to regulator ops
hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
nvme: cleanup __nvme_check_ids
nvme: fix the check for duplicate unique identifiers
block: don't delete queue kobject before its children
PM: hibernate: fix __setup handler error handling
PM: suspend: fix return value of __setup handler
spi: spi-zynqmp-gqspi: Handle error for dma_set_mask
hwrng: atmel - disable trng on failure path
crypto: sun8i-ss - call finalize with bh disabled
crypto: sun8i-ce - call finalize with bh disabled
crypto: amlogic - call finalize with bh disabled
crypto: gemini - call finalize with bh disabled
crypto: vmx - add missing dependencies
clocksource/drivers/timer-ti-dm: Fix regression from errata i940 fix
clocksource/drivers/exynos_mct: Refactor resources allocation
clocksource/drivers/exynos_mct: Handle DTS with higher number of interrupts
clocksource/drivers/timer-microchip-pit64b: Use notrace
clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init()
arm64: prevent instrumentation of bp hardening callbacks
KEYS: trusted: Fix trusted key backends when building as module
KEYS: trusted: Avoid calling null function trusted_key_exit
ACPI: APEI: fix return value of __setup handlers
crypto: ccp - ccp_dmaengine_unregister release dma channels
crypto: ccree - Fix use after free in cc_cipher_exit()
hwrng: nomadik - Change clk_disable to clk_disable_unprepare
hwmon: (pmbus) Add Vin unit off handling
clocksource: acpi_pm: fix return value of __setup handler
io_uring: don't check unrelated req->open.how in accept request
io_uring: terminate manual loop iterator loop correctly for non-vecs
watch_queue: Fix NULL dereference in error cleanup
watch_queue: Actually free the watch
f2fs: fix to enable ATGC correctly via gc_idle sysfs interface
sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
sched/core: Export pelt_thermal_tp
sched/uclamp: Fix iowait boost escaping uclamp restriction
rseq: Remove broken uapi field layout on 32-bit little endian
perf/core: Fix address filter parser for multiple filters
perf/x86/intel/pt: Fix address filter config for 32-bit kernel
sched/fair: Improve consistency of allowed NUMA balance calculations
f2fs: fix missing free nid in f2fs_handle_failed_inode
nfsd: more robust allocation failure handling in nfsd_file_cache_init
sched/cpuacct: Fix charge percpu cpuusage
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
f2fs: fix to avoid potential deadlock
btrfs: fix unexpected error path when reflinking an inline extent
f2fs: fix compressed file start atomic write may cause data corruption
selftests, x86: fix how check_cc.sh is being invoked
drivers/base/memory: add memory block to memory group after registration succeeded
kunit: make kunit_test_timeout compatible with comment
pinctrl: samsung: Remove EINT handler for Exynos850 ALIVE and CMGP gpios
media: staging: media: zoran: fix usage of vb2_dma_contig_set_max_seg_size
media: camss: csid-170: fix non-10bit formats
media: camss: csid-170: don't enable unused irqs
media: camss: csid-170: set the right HALT_CMD when disabled
media: camss: vfe-170: fix "VFE halt timeout" error
media: staging: media: imx: imx7-mipi-csis: Make subdev name unique
media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across ioctls
media: mtk-vcodec: potential dereference of null pointer
media: imx: imx8mq-mipi-csi2: remove wrong irq config write operation
media: imx: imx8mq-mipi_csi2: fix system resume
media: bttv: fix WARNING regression on tunerless devices
media: atmel: atmel-sama7g5-isc: fix ispck leftover
ASoC: sh: rz-ssi: Drop calling rz_ssi_pio_recv() recursively
ASoC: codecs: Check for error pointer after calling devm_regmap_init_mmio
ASoC: xilinx: xlnx_formatter_pcm: Handle sysclk setting
ASoC: simple-card-utils: Set sysclk on all components
media: coda: Fix missing put_device() call in coda_get_vdoa_data
media: meson: vdec: potential dereference of null pointer
media: hantro: Fix overfill bottom register field name
media: ov6650: Fix set format try processing path
media: v4l: Avoid unaligned access warnings when printing 4cc modifiers
media: ov5648: Don't pack controls struct
media: aspeed: Correct value for h-total-pixels
video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen
video: fbdev: controlfb: Fix COMPILE_TEST build
video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
video: fbdev: atmel_lcdfb: fix an error code in atmel_lcdfb_probe()
video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
ARM: dts: Fix OpenBMC flash layout label addresses
firmware: qcom: scm: Remove reassignment to desc following initializer
ARM: dts: qcom: ipq4019: fix sleep clock
soc: qcom: rpmpd: Check for null return of devm_kcalloc
soc: qcom: ocmem: Fix missing put_device() call in of_get_ocmem
soc: qcom: aoss: remove spurious IRQF_ONESHOT flags
arm64: dts: qcom: sdm845: fix microphone bias properties and values
arm64: dts: qcom: sm8250: fix PCIe bindings to follow schema
arm64: dts: broadcom: bcm4908: use proper TWD binding
arm64: dts: qcom: sm8150: Correct TCS configuration for apps rsc
arm64: dts: qcom: sm8350: Correct TCS configuration for apps rsc
firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not defined
soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
ARM: dts: sun8i: v3s: Move the csi1 block to follow address order
vsprintf: Fix potential unaligned access
ARM: dts: imx: Add missing LVDS decoder on M53Menlo
media: mexon-ge2d: fixup frames size in registers
media: video/hdmi: handle short reads of hdmi info frame.
media: ti-vpe: cal: Fix a NULL pointer dereference in cal_ctx_v4l2_init_formats()
media: em28xx: initialize refcount before kref_get
media: usb: go7007: s2250-board: fix leak in probe()
media: cedrus: H265: Fix neighbour info buffer size
media: cedrus: h264: Fix neighbour info buffer size
ASoC: codecs: rx-macro: fix accessing compander for aux
ASoC: codecs: rx-macro: fix accessing array out of bounds for enum type
ASoC: codecs: va-macro: fix accessing array out of bounds for enum type
ASoC: codecs: wc938x: fix accessing array out of bounds for enum type
ASoC: codecs: wcd938x: fix kcontrol max values
ASoC: codecs: wcd934x: fix kcontrol max values
ASoC: codecs: wcd934x: fix return value of wcd934x_rx_hph_mode_put
media: v4l2-core: Initialize h264 scaling matrix
media: ov5640: Fix set format, v4l2_mbus_pixelcode not updated
selftests/lkdtm: Add UBSAN config
lib: uninline simple_strntoull() as well
vsprintf: Fix %pK with kptr_restrict == 0
uaccess: fix nios2 and microblaze get_user_8()
ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp()
soc: mediatek: pm-domains: Add wakeup capacity support in power domain
mmc: sdhci_am654: Fix the driver data of AM64 SoC
ASoC: ti: davinci-i2s: Add check for clk_enable()
ALSA: spi: Add check for clk_enable()
arm64: dts: ns2: Fix spi-cpol and spi-cpha property
arm64: dts: broadcom: Fix sata nodename
printk: fix return value of printk.devkmsg __setup handler
ASoC: mxs-saif: Handle errors for clk_enable
ASoC: atmel_ssc_dai: Handle errors for clk_enable
ASoC: dwc-i2s: Handle errors for clk_enable
ASoC: soc-compress: prevent the potentially use of null pointer
memory: emif: Add check for setup_interrupts
memory: emif: check the pointer temp in get_device_details()
ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
arm64: dts: rockchip: Fix SDIO regulator supply properties on rk3399-firefly
m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined
media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
media: vidtv: Check for null return of vzalloc
ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
ASoC: wm8350: Handle error for wm8350_register_irq
ASoC: fsi: Add check for clk_enable
video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
media: saa7134: fix incorrect use to determine if list is empty
ivtv: fix incorrect device_caps for ivtvfb
ASoC: atmel: Fix error handling in snd_proto_probe
ASoC: rockchip: i2s: Fix missing clk_disable_unprepare() in rockchip_i2s_probe
ASoC: SOF: Add missing of_node_put() in imx8m_probe
ASoC: mediatek: use of_device_get_match_data()
ASoC: mediatek: mt8192-mt6359: Fix error handling in mt8192_mt6359_dev_probe
ASoC: rk817: Fix missing clk_disable_unprepare() in rk817_platform_probe
ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
ASoC: fsl_spdif: Disable TX clock when stop
ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
ASoC: SOF: Intel: enable DMI L1 for playback streams
ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
mmc: davinci_mmc: Handle error for clk_enable
ASoC: atmel: Fix error handling in sam9x5_wm8731_driver_probe
ASoC: msm8916-wcd-analog: Fix error handling in pm8916_wcd_analog_spmi_probe
ASoC: codecs: wcd934x: Add missing of_node_put() in wcd934x_codec_parse_data
ASoC: amd: Fix reference to PCM buffer address
ARM: configs: multi_v5_defconfig: re-enable CONFIG_V4L_PLATFORM_DRIVERS
ARM: configs: multi_v5_defconfig: re-enable DRM_PANEL and FB_xxx
drm/meson: osd_afbcd: Add an exit callback to struct meson_afbcd_ops
drm/meson: Make use of the helper function devm_platform_ioremap_resourcexxx()
drm/meson: split out encoder from meson_dw_hdmi
drm/meson: Fix error handling when afbcd.ops->init fails
drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev
drm/bridge: Add missing pm_runtime_disable() in __dw_mipi_dsi_probe
drm/bridge: nwl-dsi: Fix PM disable depth imbalance in nwl_dsi_probe
drm: bridge: adv7511: Fix ADV7535 HPD enablement
ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern
drm/v3d/v3d_drv: Check for error num after setting mask
drm/panfrost: Check for error num after setting mask
libbpf: Fix possible NULL pointer dereference when destroying skeleton
bpftool: Only set obj->skeleton on complete success
udmabuf: validate ubuf->pagecount
bpf: Fix UAF due to race between btf_try_get_module and load_module
drm/selftests/test-drm_dp_mst_helper: Fix memory leak in sideband_msg_req_encode_decode
selftests: bpf: Fix bind on used port
Bluetooth: btintel: Fix WBS setting for Intel legacy ROM products
Bluetooth: hci_serdev: call init_rwsem() before p->open()
mtd: onenand: Check for error irq
mtd: rawnand: gpmi: fix controller timings setting
drm/edid: Don't clear formats if using deep color
drm/edid: Split deep color modes between RGB and YUV444
ionic: fix type complaint in ionic_dev_cmd_clean()
ionic: start watchdog after all is setup
ionic: Don't send reset commands if FW isn't running
drm/nouveau/acr: Fix undefined behavior in nvkm_acr_hsfw_load_bl()
drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes()
drm/amd/pm: return -ENOTSUPP if there is no get_dpm_ultimate_freq function
net: phy: at803x: move page selection fix to config_init
selftests/bpf: Normalize XDP section names in selftests
selftests/bpf/test_xdp_redirect_multi: use temp netns for testing
ath9k_htc: fix uninit value bugs
RDMA/core: Set MR type in ib_reg_user_mr
KVM: PPC: Fix vmx/vsx mixup in mmio emulation
selftests/net: timestamping: Fix bind_phc check
i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
i40e: respect metadata on XSK Rx to skb
igc: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directly
ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
ixgbe: respect metadata on XSK Rx to skb
power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
ray_cs: Check ioremap return value
powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
powerpc/perf: Don't use perf_hw_context for trace IMC PMU
mt76: connac: fix sta_rec_wtbl tag len
mt76: mt7915: use proper aid value in mt7915_mcu_wtbl_generic_tlv in sta mode
mt76: mt7915: use proper aid value in mt7915_mcu_sta_basic_tlv
mt76: mt7921: fix a leftover race in runtime-pm
mt76: mt7615: fix a leftover race in runtime-pm
mt76: mt7603: check sta_rates pointer in mt7603_sta_rate_tbl_update
mt76: mt7615: check sta_rates pointer in mt7615_sta_rate_tbl_update
ptp: unregister virtual clocks when unregistering physical clock.
net: dsa: mv88e6xxx: Enable port policy support on 6097
mac80211: Remove a couple of obsolete TODO
mac80211: limit bandwidth in HE capabilities
scripts/dtc: Call pkg-config POSIXly correct
livepatch: Fix build failure on 32 bits processors
net: asix: add proper error handling of usb read errors
i2c: bcm2835: Use platform_get_irq() to get the interrupt
i2c: bcm2835: Fix the error handling in 'bcm2835_i2c_probe()'
mtd: mchp23k256: Add SPI ID table
mtd: mchp48l640: Add SPI ID table
igc: avoid kernel warning when changing RX ring parameters
igb: refactor XDP registration
PCI: aardvark: Fix reading MSI interrupt number
PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge
RDMA/rxe: Check the last packet by RXE_END_MASK
libbpf: Fix signedness bug in btf_dump_array_data()
cxl/core: Fix cxl_probe_component_regs() error message
cxl/regs: Fix size of CXL Capability Header Register
net:enetc: allocate CBD ring data memory using DMA coherent methods
libbpf: Fix compilation warning due to mismatched printf format
drm/bridge: dw-hdmi: use safe format when first in bridge chain
libbpf: Use dynamically allocated buffer when receiving netlink messages
power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
iommu/ipmmu-vmsa: Check for error num after setting mask
drm/bridge: anx7625: Fix overflow issue on reading EDID
bpftool: Fix the error when lookup in no-btf maps
drm/amd/pm: enable pm sysfs write for one VF mode
drm/amd/display: Add affected crtcs to atomic state for dsc mst unplug
libbpf: Fix memleak in libbpf_netlink_recv()
IB/cma: Allow XRC INI QPs to set their local ACK timeout
dax: make sure inodes are flushed before destroy cache
selftests: mptcp: add csum mib check for mptcp_connect
iwlwifi: mvm: Don't call iwl_mvm_sta_from_mac80211() with NULL sta
iwlwifi: mvm: don't iterate unadded vifs when handling FW SMPS req
iwlwifi: mvm: align locking in D3 test debugfs
iwlwifi: yoyo: remove DBGI_SRAM address reset writing
iwlwifi: Fix -EIO error code that is never returned
iwlwifi: mvm: Fix an error code in iwl_mvm_up()
mtd: rawnand: pl353: Set the nand chip node as the flash node
drm/msm/dp: populate connector of struct dp_panel
drm/msm/dp: stop link training after link training 2 failed
drm/msm/dp: always add fail-safe mode into connector mode list
drm/msm/dsi: Use "ref" fw clock instead of global name for VCO parent
drm/msm/dsi/phy: fix 7nm v4.0 settings for C-PHY mode
drm/msm/dpu: add DSPP blocks teardown
drm/msm/dpu: fix dp audio condition
dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
vfio/pci: fix memory leak during D3hot to D0 transition
vfio/pci: wake-up devices around reset functions
scsi: fnic: Fix a tracing statement
scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config()
scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update()
scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req()
scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req()
scsi: pm8001: Fix NCQ NON DATA command task initialization
scsi: pm8001: Fix NCQ NON DATA command completion handling
scsi: pm8001: Fix abort all task initialization
RDMA/mlx5: Fix the flow of a miss in the allocation of a cache ODP MR
drm/amd/display: Remove vupdate_int_entry definition
TOMOYO: fix __setup handlers return values
power: supply: sbs-charger: Don't cancel work that is not initialized
ext2: correct max file size computing
drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
scsi: hisi_sas: Change permission of parameter prot_mask
drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt
bpf, arm64: Call build_prologue() first in first JIT pass
bpf, arm64: Feed byte-offset into bpf line info
xsk: Fix race at socket teardown
RDMA/irdma: Fix netdev notifications for vlan's
RDMA/irdma: Fix Passthrough mode in VM
RDMA/irdma: Remove incorrect masking of PD
gpu: host1x: Fix a memory leak in 'host1x_remove()'
libbpf: Skip forward declaration when counting duplicated type names
powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
KVM: x86: Fix emulation in writing cr8
KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
hv_balloon: rate-limit "Unhandled message" warning
i2c: xiic: Make bus names unique
power: supply: wm8350-power: Handle error for wm8350_register_irq
power: supply: wm8350-power: Add missing free in free_charger_irq
IB/hfi1: Allow larger MTU without AIP
RDMA/core: Fix ib_qp_usecnt_dec() called when error
PCI: Reduce warnings on possible RW1C corruption
net: axienet: fix RX ring refill allocation failure handling
drm/msm/a6xx: Fix missing ARRAY_SIZE() check
mips: DEC: honor CONFIG_MIPS_FP_SUPPORT=n
MIPS: Sanitise Cavium switch cases in TLB handler synthesizers
powerpc/sysdev: fix incorrect use to determine if list is empty
powerpc/64s: Don't use DSISR for SLB faults
mfd: mc13xxx: Add check for mc13xxx_irq_request
libbpf: Unmap rings when umem deleted
selftests/bpf: Make test_lwt_ip_encap more stable and faster
platform/x86: huawei-wmi: check the return value of device_create_file()
scsi: mpt3sas: Fix incorrect 4GB boundary check
powerpc: 8xx: fix a return value error in mpc8xx_pic_init
vxcan: enable local echo for sent CAN frames
ath10k: Fix error handling in ath10k_setup_msa_resources
mips: cdmm: Fix refcount leak in mips_cdmm_phys_base
MIPS: RB532: fix return value of __setup handler
MIPS: pgalloc: fix memory leak caused by pgd_free()
mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
power: ab8500_chargalg: Use CLOCK_MONOTONIC
RDMA/irdma: Prevent some integer underflows
Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
bpf, sockmap: Fix memleak in sk_psock_queue_msg
bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
bpf, sockmap: Fix more uncharged while msg has more_data
bpf, sockmap: Fix double uncharge the mem of sk_msg
samples/bpf, xdpsock: Fix race when running for fix duration of time
USB: storage: ums-realtek: fix error code in rts51x_read_mem()
drm/i915/display: Fix HPD short pulse handling for eDP
netfilter: flowtable: Fix QinQ and pppoe support for inet table
mt76: mt7921: fix mt7921_queues_acq implementation
can: isotp: sanitize CAN ID checks in isotp_bind()
can: isotp: return -EADDRNOTAVAIL when reading from unbound socket
can: isotp: support MSG_TRUNC flag when reading from socket
bareudp: use ipv6_mod_enabled to check if IPv6 enabled
ibmvnic: fix race between xmit and reset
af_unix: Fix some data-races around unix_sk(sk)->oob_skb.
selftests/bpf: Fix error reporting from sock_fields programs
Bluetooth: hci_uart: add missing NULL check in h5_enqueue
Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
af_netlink: Fix shift out of bounds in group mask calculation
i2c: meson: Fix wrong speed use from probe
netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()
i2c: mux: demux-pinctrl: do not deactivate a master that is not active
powerpc/pseries: Fix use after free in remove_phb_dynamic()
selftests/bpf/test_lirc_mode2.sh: Exit with proper code
PCI: Avoid broken MSI on SB600 USB devices
net: bcmgenet: Use stronger register read/writes to assure ordering
tcp: ensure PMTU updates are processed during fastopen
openvswitch: always update flow key after nat
net: dsa: fix panic on shutdown if multi-chip tree failed to probe
tipc: fix the timer expires after interval 100ms
mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
ice: fix 'scheduling while atomic' on aux critical err interrupt
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
kernel/resource: fix kfree() of bootmem memory again
staging: r8188eu: convert DBG_88E_LEVEL call in hal/rtl8188e_hal_init.c
staging: r8188eu: release_firmware is not called if allocation fails
mxser: fix xmit_buf leak in activate when LSR == 0xff
fsi: scom: Fix error handling
fsi: scom: Remove retries in indirect scoms
pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
pps: clients: gpio: Propagate return value from pps_gpio_probe
fsi: Aspeed: Fix a potential double free
misc: alcor_pci: Fix an error handling path
cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse
soundwire: intel: fix wrong register name in intel_shim_wake
clk: qcom: ipq8074: fix PCI-E clock oops
dmaengine: idxd: check GENCAP config support for gencfg register
dmaengine: idxd: change bandwidth token to read buffers
dmaengine: idxd: restore traffic class defaults after wq reset
iio: mma8452: Fix probe failing when an i2c_device_id is used
serial: 8250_aspeed_vuart: add PORT_ASPEED_VUART port type
staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
pinctrl: renesas: r8a77470: Reduce size for narrow VIN1 channel
pinctrl: renesas: checker: Fix miscalculation of number of states
clk: qcom: ipq8074: Use floor ops for SDCC1 clock
phy: dphy: Correct lpx parameter and its derivatives(ta_{get,go,sure})
phy: phy-brcm-usb: fixup BCM4908 support
serial: 8250_mid: Balance reference count for PCI DMA device
serial: 8250_lpss: Balance reference count for PCI DMA device
NFS: Use of mapping_set_error() results in spurious errors
serial: 8250: Fix race condition in RTS-after-send handling
iio: adc: Add check for devm_request_threaded_irq
habanalabs: Add check for pci_enable_device
NFS: Return valid errors from nfs2/3_decode_dirent()
staging: r8188eu: fix endless loop in recv_func
dma-debug: fix return value of __setup handlers
clk: imx7d: Remove audio_mclk_root_clk
clk: imx: off by one in imx_lpcg_parse_clks_from_dt()
clk: at91: sama7g5: fix parents of PDMCs' GCLK
clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
clk: qcom: clk-rcg2: Update the frac table for pixel clock
dmaengine: hisi_dma: fix MSI allocate fail when reload hisi_dma
remoteproc: qcom: Fix missing of_node_put in adsp_alloc_memory_region
remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
remoteproc: qcom_q6v5_mss: Fix some leaks in q6v5_alloc_memory_region
nvdimm/region: Fix default alignment for small regions
clk: actions: Terminate clk_div_table with sentinel element
clk: loongson1: Terminate clk_div_table with sentinel element
clk: hisilicon: Terminate clk_div_table with sentinel element
clk: clps711x: Terminate clk_div_table with sentinel element
clk: Fix clk_hw_get_clk() when dev is NULL
clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
mailbox: imx: fix crash in resume on i.mx8ulp
NFS: remove unneeded check in decode_devicenotify_args()
staging: mt7621-dts: fix LEDs and pinctrl on GB-PC1 devicetree
staging: mt7621-dts: fix formatting
staging: mt7621-dts: fix pinctrl properties for ethernet
staging: mt7621-dts: fix GB-PC2 devicetree
pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
pinctrl: mediatek: paris: Fix PIN_CONFIG_BIAS_* readback
pinctrl: mediatek: paris: Fix "argument" argument type for mtk_pinconf_get()
pinctrl: mediatek: paris: Fix pingroup pin config state readback
pinctrl: mediatek: paris: Skip custom extra pin config dump for virtual GPIOs
pinctrl: microchip sgpio: use reset driver
pinctrl: microchip-sgpio: lock RMW access
pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
tty: hvc: fix return value of __setup handler
kgdboc: fix return value of __setup handler
serial: 8250: fix XOFF/XON sending when DMA is used
virt: acrn: obtain pa from VMA with PFNMAP flag
virt: acrn: fix a memory leak in acrn_dev_ioctl()
kgdbts: fix return value of __setup handler
firmware: google: Properly state IOMEM dependency
driver core: dd: fix return value of __setup handler
jfs: fix divide error in dbNextAG
netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
SUNRPC don't resend a task on an offlined transport
NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
kdb: Fix the putarea helper function
perf stat: Fix forked applications enablement of counters
clk: qcom: gcc-msm8994: Fix gpll4 width
vsock/virtio: initialize vdev->priv before using VQs
vsock/virtio: read the negotiated features before using VQs
vsock/virtio: enable VQs early on probe
clk: Initialize orphan req_rate
xen: fix is_xen_pmu()
net: enetc: report software timestamping via SO_TIMESTAMPING
net: hns3: fix bug when PF set the duplicate MAC address for VFs
net: hns3: fix port base vlan add fail when concurrent with reset
net: hns3: add vlan list lock to protect vlan list
net: hns3: format the output of the MAC address
net: hns3: refine the process when PF set VF VLAN
net: phy: broadcom: Fix brcm_fet_config_init()
selftests: test_vxlan_under_vrf: Fix broken test case
NFS: Don't loop forever in nfs_do_recoalesce()
net: hns3: clean residual vf config after disable sriov
net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
qlcnic: dcb: default to returning -EOPNOTSUPP
net/x25: Fix null-ptr-deref caused by x25_disconnect
net: sparx5: switchdev: fix possible NULL pointer dereference
octeontx2-af: initialize action variable
net: prefer nf_ct_put instead of nf_conntrack_put
net/sched: act_ct: fix ref leak when switching zones
NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
fs: fd tables have to be multiples of BITS_PER_LONG
lib/test: use after free in register_test_dev_kmod()
fs: fix fd table size alignment properly
LSM: general protection fault in legacy_parse_param
regulator: rpi-panel: Handle I2C errors/timing to the Atmel
crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos
gcc-plugins/stackleak: Exactly match strings instead of prefixes
pinctrl: npcm: Fix broken references to chip->parent_device
rcu: Mark writes to the rcu_segcblist structure's ->flags field
block/bfq_wf2q: correct weight to ioprio
crypto: xts - Add softdep on ecb
crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3
block, bfq: don't move oom_bfqq
selinux: use correct type for context length
arm64: module: remove (NOLOAD) from linker script
selinux: allow FIOCLEX and FIONCLEX with policy capability
loop: use sysfs_emit() in the sysfs xxx show()
Fix incorrect type in assignment of ipv6 port for audit
irqchip/qcom-pdc: Fix broken locking
irqchip/nvic: Release nvic_base upon failure
fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
bfq: fix use-after-free in bfq_dispatch_request
ACPICA: Avoid walking the ACPI Namespace if it is not there
lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
Revert "Revert "block, bfq: honor already-setup queue merges""
ACPI/APEI: Limit printable size of BERT table data
PM: core: keep irq flags in device_pm_check_callbacks()
parisc: Fix handling off probe non-access faults
nvme-tcp: lockdep: annotate in-kernel sockets
spi: tegra20: Use of_device_get_match_data()
atomics: Fix atomic64_{read_acquire,set_release} fallbacks
locking/lockdep: Iterate lock_classes directly when reading lockdep files
ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb
ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
ext4: don't BUG if someone dirty pages without asking ext4 first
f2fs: fix to do sanity check on curseg->alloc_type
NFSD: Fix nfsd_breaker_owns_lease() return values
f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
btrfs: harden identification of a stale device
btrfs: make search_csum_tree return 0 if we get -EFBIG
f2fs: use spin_lock to avoid hang
f2fs: compress: fix to print raw data size in error path of lz4 decompression
Adjust cifssb maximum read size
ntfs: add sanity check on allocation size
media: staging: media: zoran: move videodev alloc
media: staging: media: zoran: calculate the right buffer number for zoran_reap_stat_com
media: staging: media: zoran: fix various V4L2 compliance errors
media: atmel: atmel-isc-base: report frame sizes as full supported range
media: ir_toy: free before error exiting
ASoC: sh: rz-ssi: Make the data structures available before registering the handlers
ASoC: SOF: Intel: match sdw version on link_slaves_found
media: imx-jpeg: Prevent decoding NV12M jpegs into single-planar buffers
media: iommu/mediatek-v1: Free the existed fwspec if the master dev already has
media: iommu/mediatek: Return ENODEV if the device is NULL
media: iommu/mediatek: Add device_link between the consumer and the larb devices
video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
video: fbdev: w100fb: Reset global state
video: fbdev: cirrusfb: check pixclock to avoid divide by zero
video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
ARM: dts: bcm2837: Add the missing L1/L2 cache information
ASoC: madera: Add dependencies on MFD
media: atomisp_gmin_platform: Add DMI quirk to not turn AXP ELDO2 regulator off on some boards
media: atomisp: fix dummy_ptr check to avoid duplicate active_bo
ARM: ftrace: avoid redundant loads or clobbering IP
ARM: dts: imx7: Use audio_mclk_post_div instead audio_mclk_root_clk
arm64: defconfig: build imx-sdma as a module
video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit
ARM: dts: bcm2711: Add the missing L1/L2 cache information
ASoC: soc-core: skip zero num_dai component in searching dai name
media: imx-jpeg: fix a bug of accessing array out of bounds
media: cx88-mpeg: clear interrupt status register before streaming video
uaccess: fix type mismatch warnings from access_ok()
lib/test_lockup: fix kernel pointer check for separate address spaces
ARM: tegra: tamonten: Fix I2C3 pad setting
ARM: mmp: Fix failure to remove sram device
ASoC: amd: vg: fix for pm resume callback sequence
video: fbdev: sm712fb: Fix crash in smtcfb_write()
media: i2c: ov5648: Fix lockdep error
media: Revert "media: em28xx: add missing em28xx_close_extension"
media: hdpvr: initialize dev->worker at hdpvr_register_videodev
ASoC: Intel: sof_sdw: fix quirks for 2022 HP Spectre x360 13"
tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
mmc: host: Return an error when ->enable_sdio_irq() ops is missing
media: atomisp: fix bad usage at error handling logic
ALSA: hda/realtek: Add alc256-samsung-headphone fixup
KVM: x86: Reinitialize context if host userspace toggles EFER.LME
KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU
KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi()
KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb()
KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls
KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
powerpc/kasan: Fix early region not updated correctly
powerpc/lib/sstep: Fix 'sthcx' instruction
powerpc/lib/sstep: Fix build errors with newer binutils
powerpc: Add set_memory_{p/np}() and remove set_memory_attr()
powerpc: Fix build errors with newer binutils
drm/dp: Fix off-by-one in register cache size
drm/i915: Treat SAGV block time 0 as SAGV disabled
drm/i915: Fix PSF GV point mask when SAGV is not possible
drm/i915: Reject unsupported TMDS rates on ICL+
scsi: qla2xxx: Refactor asynchronous command initialization
scsi: qla2xxx: Implement ref count for SRB
scsi: qla2xxx: Fix stuck session in gpdb
scsi: qla2xxx: Fix warning message due to adisc being flushed
scsi: qla2xxx: Fix scheduling while atomic
scsi: qla2xxx: Fix premature hw access after PCI error
scsi: qla2xxx: Fix wrong FDMI data for 64G adapter
scsi: qla2xxx: Fix warning for missing error code
scsi: qla2xxx: Fix device reconnect in loop topology
scsi: qla2xxx: edif: Fix clang warning
scsi: qla2xxx: Fix T10 PI tag escape and IP guard options for 28XX adapters
scsi: qla2xxx: Add devids and conditionals for 28xx
scsi: qla2xxx: Check for firmware dump already collected
scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
scsi: qla2xxx: Fix disk failure to rediscover
scsi: qla2xxx: Fix incorrect reporting of task management failure
scsi: qla2xxx: Fix hang due to session stuck
scsi: qla2xxx: Fix missed DMA unmap for NVMe ls requests
scsi: qla2xxx: Fix N2N inconsistent PLOGI
scsi: qla2xxx: Fix stuck session of PRLI reject
scsi: qla2xxx: Reduce false trigger to login
scsi: qla2xxx: Use correct feature type field during RFF_ID processing
platform: chrome: Split trace include file
KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq
KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast()
KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
KVM: Prevent module exit until all VMs are freed
KVM: x86: fix sending PV IPI
KVM: SVM: fix panic on out-of-bounds guest IRQ
ubifs: rename_whiteout: Fix double free for whiteout_ui->data
ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
ubifs: Rename whiteout atomically
ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
ubifs: Rectify space amount budget for mkdir/tmpfile operations
ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
ubifs: Fix to add refcount once page is set private
ubifs: rename_whiteout: correct old_dir size computing
nvme: allow duplicate NSIDs for private namespaces
nvme: fix the read-only state for zoned namespaces with unsupposed features
wireguard: queueing: use CFI-safe ptr_ring cleanup function
wireguard: socket: free skb in send6 when ipv6 is disabled
wireguard: socket: ignore v6 endpoints when ipv6 is disabled
XArray: Fix xas_create_range() when multi-order entry present
can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
can: mcba_usb: properly check endpoint type
can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
XArray: Update the LRU list in xas_split()
modpost: restore the warning message for missing symbol versions
rtc: check if __rtc_read_time was successful
gfs2: gfs2_setattr_size error path fix
gfs2: Make sure FITRIM minlen is rounded up to fs block size
net: hns3: fix the concurrency between functions reading debugfs
net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
rxrpc: fix some null-ptr-deref bugs in server_key.c
rxrpc: Fix call timer start racing with call destruction
mailbox: imx: fix wakeup failure from freeze mode
crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
watch_queue: Free the page array when watch_queue is dismantled
pinctrl: pinconf-generic: Print arguments for bias-pull-*
watchdog: rti-wdt: Add missing pm_runtime_disable() in probe function
net: sparx5: uses, depends on BRIDGE or !BRIDGE
pinctrl: nuvoton: npcm7xx: Rename DS() macro to DSTR()
pinctrl: nuvoton: npcm7xx: Use %zu printk format for ARRAY_SIZE()
ASoC: mediatek: mt6358: add missing EXPORT_SYMBOLs
ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
ARM: iop32x: offset IRQ numbers by 1
block: Fix the maximum minor value is blk_alloc_ext_minor()
io_uring: fix memory leak of uid in files registration
riscv module: remove (NOLOAD)
ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
vhost: handle error while adding split ranges to iotlb
spi: Fix Tegra QSPI example
platform/chrome: cros_ec_typec: Check for EC device
can: isotp: restore accidentally removed MSG_PEEK feature
proc: bootconfig: Add null pointer check
drm/connector: Fix typo in documentation
scsi: qla2xxx: Add qla2x00_async_done() for async routines
staging: mt7621-dts: fix pinctrl-0 items to be size-1 items on ethernet
arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
ASoC: soc-compress: Change the check for codec_dai
Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
tracing: Have type enum modifications copy the strings
net: add skb_set_end_offset() helper
net: preserve skb_end_offset() in skb_unclone_keeptruesize()
mm/mmap: return 1 from stack_guard_gap __setup() handler
ARM: 9187/1: JIVE: fix return value of __setup handler
mm/memcontrol: return 1 from cgroup.memory __setup() handler
mm/usercopy: return 1 from hardened_usercopy __setup() handler
af_unix: Support POLLPRI for OOB.
bpf: Adjust BPF stack helper functions to accommodate skip > 0
bpf: Fix comment for helper bpf_current_task_under_cgroup()
mmc: rtsx: Use pm_runtime_{get,put}() to handle runtime PM
dt-bindings: mtd: nand-controller: Fix the reg property description
dt-bindings: mtd: nand-controller: Fix a comment in the examples
dt-bindings: spi: mxic: The interrupt property is not mandatory
dt-bindings: memory: mtk-smi: No need mediatek,larb-id for mt8167
dt-bindings: pinctrl: pinctrl-microchip-sgpio: Fix example
ubi: fastmap: Return error code if memory allocation fails in add_aeb()
ASoC: SOF: Intel: Fix build error without SND_SOC_SOF_PCI_DEV
ASoC: topology: Allow TLV control to be either read or write
perf vendor events: Update metrics for SkyLake Server
media: ov6650: Add try support to selection API operations
media: ov6650: Fix crop rectangle affected by set format
spi: mediatek: support tick_delay without enhance_timing
ARM: dts: spear1340: Update serial node properties
ARM: dts: spear13xx: Update SPI dma properties
arm64: dts: ls1043a: Update i2c dma properties
arm64: dts: ls1046a: Update i2c node dma properties
um: Fix uml_mconsole stop/go
docs: sysctl/kernel: add missing bit to panic_print
openvswitch: Fixed nd target mask field in the flow dump.
torture: Make torture.sh help message match reality
n64cart: convert bi_disk to bi_bdev->bd_disk fix build
mmc: rtsx: Let MMC core handle runtime PM
mmc: rtsx: Fix build errors/warnings for unused variable
KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
iommu/dma: Skip extra sync during unmap w/swiotlb
iommu/dma: Fold _swiotlb helpers into callers
iommu/dma: Check CONFIG_SWIOTLB more broadly
swiotlb: Support aligned swiotlb buffers
iommu/dma: Account for min_align_mask w/swiotlb
coredump: Snapshot the vmas in do_coredump
coredump: Remove the WARN_ON in dump_vma_snapshot
coredump/elf: Pass coredump_params into fill_note_info
coredump: Use the vma snapshot in fill_files_note
PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
Linux 5.15.33
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id62bd8a22d0bfa7c2096539d253ffce804bed017
|
||
|
|
f5e59185b0 |
mm: don't skip swap entry even if zap_details specified
commit 5abfd71d936a8aefd9f9ccd299dea7a164a5d455 upstream.
Patch series "mm: Rework zap ptes on swap entries", v5.
Patch 1 should fix a long standing bug for zap_pte_range() on
zap_details usage. The risk is we could have some swap entries skipped
while we should have zapped them.
Migration entries are not the major concern because file backed memory
always zap in the pattern that "first time without page lock, then
re-zap with page lock" hence the 2nd zap will always make sure all
migration entries are already recovered.
However there can be issues with real swap entries got skipped
errornoously. There's a reproducer provided in commit message of patch
1 for that.
Patch 2-4 are cleanups that are based on patch 1. After the whole
patchset applied, we should have a very clean view of zap_pte_range().
Only patch 1 needs to be backported to stable if necessary.
This patch (of 4):
The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.
For example, when the callers specified details->zap_mapping==NULL, it
means the user wants to zap all the pages (including COWed pages), then
we need to look into swap entries because there can be private COWed
pages that was swapped out.
Skipping some swap entries when details is non-NULL may lead to wrongly
leaving some of the swap entries while we should have zapped them.
A reproducer of the problem:
===8<===
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int page_size;
int shmem_fd;
char *buffer;
void main(void)
{
int ret;
char val;
page_size = getpagesize();
shmem_fd = memfd_create("test", 0);
assert(shmem_fd >= 0);
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
MAP_PRIVATE, shmem_fd, 0);
assert(buffer != MAP_FAILED);
/* Write private page, swap it out */
buffer[page_size] = 1;
madvise(buffer, page_size * 2, MADV_PAGEOUT);
/* This should drop private buffer[page_size] already */
ret = ftruncate(shmem_fd, page_size);
assert(ret == 0);
/* Recover the size */
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
/* Re-read the data, it should be all zero */
val = buffer[page_size];
if (val == 0)
printf("Good\n");
else
printf("BUG\n");
}
===8<===
We don't need to touch up the pmd path, because pmd never had a issue with
swap entries. For example, shmem pmd migration will always be split into
pte level, and same to swapping on anonymous.
Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.
This patch drops that trick, so we handle swap ptes coherently. Meanwhile
we should do the same check upon migration entry, hwpoison entry and
genuine swap entries too.
To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.
The issue seems to exist starting from the initial commit of git.
[peterx@redhat.com: comment tweaks]
Link: https://lkml.kernel.org/r/20220217060746.71256-2-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220217060746.71256-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com
Fixes:
|
||
|
|
7d04d6d5c1 |
mm,hwpoison: unmap poisoned page before invalidation
commit 3149c79f3cb0e2e3bafb7cfadacec090cbd250d3 upstream. In some cases it appears the invalidation of a hwpoisoned page fails because the page is still mapped in another process. This can cause a program to be continuously restarted and die when it page faults on the page that was not invalidated. Avoid that problem by unmapping the hwpoisoned page when we find it. Another issue is that sometimes we end up oopsing in finish_fault, if the code tries to do something with the now-NULL vmf->page. I did not hit this error when submitting the previous patch because there are several opportunities for alloc_set_pte to bail out before accessing vmf->page, and that apparently happened on those systems, and most of the time on other systems, too. However, across several million systems that error does occur a handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE which will cause do_read_fault to return before calling finish_fault. Link: https://lkml.kernel.org/r/20220325161428.5068d97e@imladris.surriel.com Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path") Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
3bae72c2db |
mm: invalidate hwpoison page cache page in fault path
commit e53ac7374e64dede04d745ff0e70ff5048378d1f upstream. Sometimes the page offlining code can leave behind a hwpoisoned clean page cache page. This can lead to programs being killed over and over and over again as they fault in the hwpoisoned page, get killed, and then get re-spawned by whatever wanted to run them. This is particularly embarrassing when the page was offlined due to having too many corrected memory errors. Now we are killing tasks due to them trying to access memory that probably isn't even corrupted. This problem can be avoided by invalidating the page from the page fault handler, which already has a branch for dealing with these kinds of pages. With this patch we simply pretend the page fault was successful if the page was invalidated, return to userspace, incur another page fault, read in the file from disk (to a new memory page), and then everything works again. Link: https://lkml.kernel.org/r/20220212213740.423efcea@imladris.surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
385b0dd1f9 |
ANDROID: mm: Fix page table lookup in speculative fault path
In speculative fault path, while doing page table lookup, offset is obtained at each level and value at that offset is read and checks are perfomed on it, later to get next level offset we read from previous level offset again. A concurrent page table reclaimation operation could result in change in value at this offset, and we go ahead and access it, this would result in reading an invalid entry. Fix this by reading from previous level offset again and comparing before performing next level access. Bug: 221005439 Change-Id: I66b3d24ae79c7ee5ccce4ba7a94f028f4cf3fda0 Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com> |
||
|
|
7d6787088d |
BACKPORT: FROMLIST: mm: enable speculative fault handling for supported file types.
Introduce vma_can_speculate(), which allows speculative handling for VMAs mapping supported file types. From do_handle_mm_fault(), speculative handling will follow through __handle_mm_fault(), handle_pte_fault() and do_fault(). At this point, we expect speculative faults to continue through one of: - do_read_fault(), fully implemented; - do_cow_fault(), which might abort if missing anon vmas, - do_shared_fault(), not implemented yet (would require ->page_mkwrite() changes). vma_can_speculate() provides an early abort for the do_shared_fault() case, limiting the time spent on trying that unimplemented case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-31-michel@lespinasse.org/ Conflicts: include/linux/vm_event_item.h mm/vmstat.c 1. SPF_ATTEMPT_FILE is taken from https://lore.kernel.org/all/20210407014502.24091-36-michel@lespinasse.org/ since the patch posted upstream at the time had a different structure with stats for anonymouse and file-backed pagefaults introduced in a separate patch. Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I3a28af63b41b649f02f8b73d53f6494ad114ee5a |
||
|
|
7045d2d838 |
FROMLIST: mm: implement speculative handling in do_fault_around()
Call the vm_ops->map_pages method within an rcu read locked section. In the speculative case, verify the mmap sequence lock at the start of the section. A match guarantees that the original vma is still valid at that time, and that the associated vma->vm_file stays valid while the vm_ops->map_pages() method is running. Do not test vmf->pmd in the speculative case - we only speculate when a page table already exists, and and this saves us from having to handle synchronization around the vmf->pmd read. Change xfs_filemap_map_pages() account for the fact that it can not block anymore, as it is now running within an rcu read lock. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-28-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Id771c1e6fa9b883595a48d4df63f448a05916eda |
||
|
|
6877640598 |
BACKPORT: FROMLIST: mm: implement speculative fault handling in finish_fault()
In the speculative case, we want to avoid direct pmd checks (which would require some extra synchronization to be safe), and rely on pte_map_lock which will both lock the page table and verify that the pmd has not changed from its initial value. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-27-michel@lespinasse.org/ Conflicts: mm/memory.c 1. Merge conflict due to new vmf->prealloc_pte usage in finish_fault. Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If6046592083eaf12caf5c51c3fbb287a4dfa1ace |
||
|
|
b12e52ca98 |
FROMLIST: mm: implement speculative handling in __do_fault()
In the speculative case, call the vm_ops->fault() method from within an rcu read locked section, and verify the mmap sequence lock at the start of the section. A match guarantees that the original vma is still valid at that time, and that the associated vma->vm_file stays valid while the vm_ops->fault() method is running. Note that this implies that speculative faults can not sleep within the vm_ops->fault method. We will only attempt to fetch existing pages from the page cache during speculative faults; any miss (or prefetch) will be handled by falling back to non-speculative fault handling. The speculative handling case also does not preallocate page tables, as it is always called with a pre-existing page table. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-25-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I995ba94d8e96014ef83ac93fe5a4669afcde34b9 |
||
|
|
9b92402808 |
FROMLIST: mm: anon spf statistics
Add a new CONFIG_SPECULATIVE_PAGE_FAULT_STATS config option, and dump extra statistics about executed spf cases and abort reasons when the option is set. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-32-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia53cd88e4a7140aeb26bf8f3869e1fc5270012da |
||
|
|
aa9ae5c915 |
FROMLIST: mm: implement and enable speculative fault handling in handle_pte_fault()
In handle_pte_fault(), allow speculative execution to proceed. Use pte_spinlock() to validate the mmap sequence count when locking the page table. If speculative execution proceeds through do_wp_page(), ensure that we end up in the wp_page_reuse() or wp_page_copy() paths, rather than wp_pfn_shared() or wp_page_shared() (both unreachable as we only handle anon vmas so far) or handle_userfault() (needs an explicit abort to handle non-speculatively). Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-28-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia45d095ec7b8e23f1c5d68b7a7f572a3f6f6df97 |
||
|
|
40bc9ed389 |
FROMLIST: mm: implement speculative handling in wp_page_copy()
Change wp_page_copy() to handle the speculative case. This involves aborting speculative faults if they have to allocate an anon_vma, read-locking the mmu_notifier_lock to avoid races with mmu_notifier_register(), and using pte_map_lock() instead of pte_offset_map_lock() to complete the page fault. Also change call sites to clear vmf->pte after unmapping the page table, in order to satisfy pte_map_lock()'s preconditions. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-27-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Icd2188e9facf5a7fea42000a2808bcda1ad6f0fc |
||
|
|
009020e3d1 |
FROMLIST: mm: enable speculative fault handling in do_numa_page()
Change handle_pte_fault() to allow speculative fault execution to proceed through do_numa_page(). do_swap_page() does not implement speculative execution yet, so it needs to abort with VM_FAULT_RETRY in that case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-22-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I0390331facc9ecd37534012abdd9f255ab5bbb12 |
||
|
|
fedc4d513e |
FROMLIST: mm: implement speculative handling in do_numa_page()
change do_numa_page() to use pte_spinlock() when locking the page table, so that the mmap sequence counter will be validated in the speculative case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-21-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If252547faf2a8a6cbba4c0a7ff929071a5f6a657 |
||
|
|
c2b2abe724 |
FROMLIST: mm: enable speculative fault handling through do_anonymous_page()
in x86 fault handler, only attempt spf if the vma is anonymous. In do_handle_mm_fault(), let speculative page faults proceed as long as they fall into anonymous vmas. This enables the speculative handling code in __handle_mm_fault() and do_anonymous_page(). In handle_pte_fault(), if vmf->pte is set (the original pte was not pte_none), catch speculative faults and return VM_FAULT_RETRY as those cases are not implemented yet. Also assert that do_fault() is not reached in the speculative case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-20-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I875106fcfa1084f570c2bf8f24a129bdce55316b |
||
|
|
31cf1fd564 |
FROMLIST: mm: implement speculative handling in do_anonymous_page()
Change do_anonymous_page() to handle the speculative case. This involves aborting speculative faults if they have to allocate a new anon_vma, and using pte_map_lock() instead of pte_offset_map_lock() to complete the page fault. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-19-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I5ad955323faabc142c21f62415db039ac889066a |
||
|
|
6e6766ab76 |
BACKPORT: FROMLIST: mm: add pte_map_lock() and pte_spinlock()
pte_map_lock() and pte_spinlock() are used by fault handlers to ensure the pte is mapped and locked before they commit the faulted page to the mm's address space at the end of the fault. The functions differ in their preconditions; pte_map_lock() expects the pte to be unmapped prior to the call, while pte_spinlock() expects it to be already mapped. In the speculative fault case, the functions verify, after locking the pte, that the mmap sequence count has not changed since the start of the fault, and thus that no mmap lock writers have been running concurrently with the fault. After that point the page table lock serializes any further races with concurrent mmap lock writers. If the mmap sequence count check fails, both functions will return false with the pte being left unmapped and unlocked. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-18-michel@lespinasse.org/ Conflicts: include/linux/mm.h 1. Fixed pte_map_lock and pte_spinlock macros not to fail when CONFIG_SPECULATIVE_PAGE_FAULT=n Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ibd7ccc2ead4fdf29f28c7657b312b2f677ac8836 |
||
|
|
6ab660d7cb |
FROMLIST: mm: implement speculative handling in __handle_mm_fault().
The speculative path calls speculative_page_walk_begin() before walking the page table tree to prevent page table reclamation. The logic is otherwise similar to the non-speculative path, but with additional restrictions: in the speculative path, we do not handle huge pages or wiring new pages tables. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-17-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If099534da8b0ac105bbaa5ea4714a6654032592a |
||
|
|
f3f9f17a32 |
FROMLIST: mm: refactor __handle_mm_fault() / handle_pte_fault()
Move the code that initializes vmf->pte and vmf->orig_pte from handle_pte_fault() to its single call site in __handle_mm_fault(). This ensures vmf->pte is now initialized together with the higher levels of the page table hierarchy. This also prepares for speculative page fault handling, where the entire page table walk (higher levels down to ptes) needs special care in the speculative case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-16-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Id550086fe568331aa71c91468f8314faad993b20 |
||
|
|
f8a4611b47 |
FROMLIST: mm: add speculative_page_walk_begin() and speculative_page_walk_end()
Speculative page faults will use these to protect against races with page table reclamation. This could always be handled by disabling local IRQs as the fast GUP code does; however speculative page faults do not need to protect against races with THP page splitting, so a weaker rcu read lock is sufficient in the MMU_GATHER_RCU_TABLE_FREE case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-15-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I3efe5fc6a5a49d537cf33e8093daeea42550077a |
||
|
|
4e2e391ff7 |
BACKPORT: FROMLIST: mm: add do_handle_mm_fault()
Add a new do_handle_mm_fault function, which extends the existing handle_mm_fault() API by adding an mmap sequence count, to be used in the FAULT_FLAG_SPECULATIVE case. In the initial implementation, FAULT_FLAG_SPECULATIVE always fails (by returning VM_FAULT_RETRY). The existing handle_mm_fault() API is kept as a wrapper around do_handle_mm_fault() so that we do not have to immediately update every handle_mm_fault() call site. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Conflicts: mm/memory.c 1. Trivial merge conflict due to folios. Link: https://lore.kernel.org/all/20220128131006.67712-10-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic07b6d84af3e5d1fcc856e0968f1a6dd1544fa88 |
||
|
|
57f3bb2b12 |
BACKPORT: FROMLIST: do_anonymous_page: reduce code duplication
In do_anonymous_page(), we have separate cases for the zero page vs allocating new anonymous pages. However, once the pte entry has been computed, the rest of the handling (mapping and locking the page table, checking that we didn't lose a race with another page fault handler, etc) is identical between the two cases. This change reduces the code duplication between the two cases. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Conflicts: mm/memory.c 1. Trivial merge conflict caused by folios in mem_cgroup_charge call. Link: https://lore.kernel.org/all/20220128131006.67712-6-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic19579571925878d632e43aa40b9f50cdf473ee6 |
||
|
|
82ab55ebcc |
FROMLIST: do_anonymous_page: use update_mmu_tlb()
update_mmu_tlb() can be used instead of update_mmu_cache() when the page fault handler detects that it lost the race to another page fault. It looks like this one call was missed in https://patchwork.kernel.org/project/linux-mips/patch/1590375160-6997-2-git-send-email-maobibo@loongson.cn after Andrew asked to replace all update_mmu_cache() calls with an alias in the previous version of this patch here: https://patchwork.kernel.org/project/linux-mips/patch/1590031837-9582-2-git-send-email-maobibo@loongson.cn/#23374625 Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20220128131006.67712-5-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Iaad4d3c27e12c2d9bf68d1140709788fc8dead24 |
||
|
|
1a77f04aac |
Revert "ANDROID: mm: Throttle rss_stat tracepoint"
This reverts commit
|
||
|
|
d1a66e7942 |
Merge tag 'v5.15' into android-mainline
Linux 5.15 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I81d96ada6b66bcf73d192ddde9018f1804e7d90b |
||
|
|
eac96c3efd |
mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
When handling shmem page fault the THP with corrupted subpage could be PMD mapped if certain conditions are satisfied. But kernel is supposed to send SIGBUS when trying to map hwpoisoned page. There are two paths which may do PMD map: fault around and regular fault. Before commit |
||
|
|
2d03f6ec4b |
Merge 92477dd1fa ("Merge tag 's390-5.15-ebpf-jit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux") into android-mainline
Steps on the way to 5.15-rc3 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia35478d4c4a085bff847b8cd484bd3290f9529b8 |
||
|
|
6e0e99d58a |
afs: Fix mmap coherency vs 3rd-party changes
Fix the coherency management of mmap'd data such that 3rd-party changes
become visible as soon as possible after the callback notification is
delivered by the fileserver. This is done by the following means:
(1) When we break a callback on a vnode specified by the CB.CallBack call
from the server, we queue a work item (vnode->cb_work) to go and
clobber all the PTEs mapping to that inode.
This causes the CPU to trip through the ->map_pages() and
->page_mkwrite() handlers if userspace attempts to access the page(s)
again.
(Ideally, this would be done in the service handler for CB.CallBack,
but the server is waiting for our reply before considering, and we
have a list of vnodes, all of which need breaking - and the process of
getting the mmap_lock and stripping the PTEs on all CPUs could be
quite slow.)
(2) Call afs_validate() from the ->map_pages() handler to check to see if
the file has changed and to get a new callback promise from the
server.
Also handle the fileserver telling us that it's dropping all callbacks,
possibly after it's been restarted by sending us a CB.InitCallBackState*
call by the following means:
(3) Maintain a per-cell list of afs files that are currently mmap'd
(cell->fs_open_mmaps).
(4) Add a work item to each server that is invoked if there are any open
mmaps when CB.InitCallBackState happens. This work item goes through
the aforementioned list and invokes the vnode->cb_work work item for
each one that is currently using this server.
This causes the PTEs to be cleared, causing ->map_pages() or
->page_mkwrite() to be called again, thereby calling afs_validate()
again.
I've chosen to simply strip the PTEs at the point of notification reception
rather than invalidate all the pages as well because (a) it's faster, (b)
we may get a notification for other reasons than the data being altered (in
which case we don't want to clobber the pagecache) and (c) we need to ask
the server to find out - and I don't want to wait for the reply before
holding up userspace.
This was tested using the attached test program:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
size_t size = getpagesize();
unsigned char *p;
bool mod = (argc == 3);
int fd;
if (argc != 2 && argc != 3) {
fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]);
exit(2);
}
fd = open(argv[1], mod ? O_RDWR : O_RDONLY);
if (fd < 0) {
perror(argv[1]);
exit(1);
}
p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ,
MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror("mmap");
exit(1);
}
for (;;) {
if (mod) {
p[0]++;
msync(p, size, MS_ASYNC);
fsync(fd);
}
printf("%02x", p[0]);
fflush(stdout);
sleep(1);
}
}
It runs in two modes: in one mode, it mmaps a file, then sits in a loop
reading the first byte, printing it and sleeping for a second; in the
second mode it mmaps a file, then sits in a loop incrementing the first
byte and flushing, then printing and sleeping.
Two instances of this program can be run on different machines, one doing
the reading and one doing the writing. The reader should see the changes
made by the writer, but without this patch, they aren't because validity
checking is being done lazily - only on entry to the filesystem.
Testing the InitCallBackState change is more complicated. The server has
to be taken offline, the saved callback state file removed and then the
server restarted whilst the reading-mode program continues to run. The
client machine then has to poke the server to trigger the InitCallBackState
call.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/
|