Commit Graph

17592 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
046ce7a74e Merge 5.15.59 into android14-5.15
Changes in 5.15.59
	Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
	Revert "ocfs2: mount shared volume without ha stack"
	ntfs: fix use-after-free in ntfs_ucsncmp()
	fs: sendfile handles O_NONBLOCK of out_fd
	secretmem: fix unhandled fault in truncate
	mm: fix page leak with multiple threads mapping the same page
	hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
	asm-generic: remove a broken and needless ifdef conditional
	s390/archrandom: prevent CPACF trng invocations in interrupt context
	nouveau/svm: Fix to migrate all requested pages
	drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
	watch_queue: Fix missing rcu annotation
	watch_queue: Fix missing locking in add_watch_to_object()
	tcp: Fix data-races around sysctl_tcp_dsack.
	tcp: Fix a data-race around sysctl_tcp_app_win.
	tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
	tcp: Fix a data-race around sysctl_tcp_frto.
	tcp: Fix a data-race around sysctl_tcp_nometrics_save.
	tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
	ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
	ice: do not setup vlan for loopback VSI
	scsi: ufs: host: Hold reference returned by of_parse_phandle()
	Revert "tcp: change pingpong threshold to 3"
	octeontx2-pf: Fix UDP/TCP src and dst port tc filters
	tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
	tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
	tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
	scsi: core: Fix warning in scsi_alloc_sgtables()
	scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
	net: ping6: Fix memleak in ipv6_renew_options().
	ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
	net/tls: Remove the context from the list in tls_device_down
	igmp: Fix data-races around sysctl_igmp_qrv.
	net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
	net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
	tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
	tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
	tcp: Fix a data-race around sysctl_tcp_autocorking.
	tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
	Documentation: fix sctp_wmem in ip-sysctl.rst
	macsec: fix NULL deref in macsec_add_rxsa
	macsec: fix error message in macsec_add_rxsa and _txsa
	macsec: limit replay window size with XPN
	macsec: always read MACSEC_SA_ATTR_PN as a u64
	net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()
	net: mld: fix reference count leak in mld_{query | report}_work()
	tcp: Fix data-races around sk_pacing_rate.
	net: Fix data-races around sysctl_[rw]mem(_offset)?.
	tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
	tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
	tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
	tcp: Fix data-races around sysctl_tcp_reflect_tos.
	ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
	i40e: Fix interface init with MSI interrupts (no MSI-X)
	sctp: fix sleep in atomic context bug in timer handlers
	octeontx2-pf: cn10k: Fix egress ratelimit configuration
	netfilter: nf_queue: do not allow packet truncation below transport header offset
	virtio-net: fix the race between refill work and close
	perf symbol: Correct address for bss symbols
	sfc: disable softirqs for ptp TX
	sctp: leave the err path free in sctp_stream_init to sctp_stream_free
	ARM: crypto: comment out gcc warning that breaks clang builds
	mm/hmm: fault non-owner device private entries
	page_alloc: fix invalid watermark check on a negative value
	ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
	EDAC/ghes: Set the DIMM label unconditionally
	docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
	locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
	x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
	Linux 5.15.59

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4f2002d38aea467e150a912f50d456c41b23de89
2022-08-04 15:18:41 +02:00
Marc Zyngier
d7ddd989d6 ANDROID: mm/vmalloc: Add arch-specific callbacks to track io{remap,unmap} physical pages
Add a pair of hooks (ioremap_phys_range_hook/iounmap_phys_range_hook)
that can be implemented by an architecture. Contrary to the existing
arch_sync_kernel_mappings(), this one tracks things at the physical
address level.

This is specially useful in these virtualised environments where
the guest has to tell the host whether (and how) it intends to use
a MMIO device.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: I970c2e632cb2b01060d5e66e4194fa9248188f43
Signed-off-by: Will Deacon <willdeacon@google.com>
2022-08-04 13:03:53 +00:00
Will Deacon
432cf24bb2 Revert "ANDROID: mm/vmalloc: Add arch-specific callbacks to track io{remap,unmap} physical pages"
This reverts commit acd8b4b1f1.

Bug: 233587962
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ie56327837f823b424f3438d5b974c8086f8ba1e9
2022-08-04 13:03:53 +00:00
Jaewon Kim
86e83233dd page_alloc: fix invalid watermark check on a negative value
commit 9282012fc0aa248b77a69f5eb802b67c5a16bb13 upstream.

There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e140 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.

Handle the negative case by using MIN.

Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e140 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03 12:03:55 +02:00
Ralph Campbell
51a772c34e mm/hmm: fault non-owner device private entries
commit 8a295dbbaf7292c582a40ce469c326f472d51f66 upstream.

If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.

For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error.  With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.

Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com
Fixes: 08ddddda66 ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reported-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03 12:03:54 +02:00
Miaohe Lin
dc124c849c hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
commit da9a298f5fad0dc615079a340da42928bc5b138e upstream.

When alloc_huge_page fails, *pagep is set to NULL without put_page first.
So the hugepage indicated by *pagep is leaked.

Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
Fixes: 8cc5fcbb5b ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03 12:03:42 +02:00
Josef Bacik
2722fb0f70 mm: fix page leak with multiple threads mapping the same page
commit 3fe2895cfecd03ac74977f32102b966b6589f481 upstream.

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size.  This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges.  This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped.  One thread maps
the PMD, the other thread loses the race and then returns 0.  However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page.  We
actually did the correct thing prior to f9ce0be71d, however it looks
like Kirill copied what we do in the anonymous page case.  In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything.  Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.

[josef@toxicpanda.com: v2]
  Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03 12:03:42 +02:00
Mike Rapoport
70d0ce332d secretmem: fix unhandled fault in truncate
commit 84ac013046ccc438af04b7acecd4d3ab84fe4bde upstream.

syzkaller reports the following issue:

BUG: unable to handle page fault for address: ffff888021f7e005
PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
Oops: 0002 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e791269 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
FS:  00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 zero_user_segments include/linux/highmem.h:272 [inline]
 folio_zero_range include/linux/highmem.h:428 [inline]
 truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
 truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
 truncate_inode_pages mm/truncate.c:452 [inline]
 truncate_pagecache+0x63/0x90 mm/truncate.c:753
 simple_setattr+0xed/0x110 fs/libfs.c:535
 secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
 notify_change+0xb8c/0x12b0 fs/attr.c:424
 do_truncate+0x13c/0x200 fs/open.c:65
 do_sys_ftruncate+0x536/0x730 fs/open.c:193
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb29d900899
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
 </TASK>
Modules linked in:
CR2: ffff888021f7e005
---[ end trace 0000000000000000 ]---

Eric Biggers suggested that this happens when
secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
a page that is faulted in by secretmem_fault() (and thus removed from the
direct map) is zeroed by inode truncation right afterwards.

Use mapping->invalidate_lock to make secretmem_fault() and
secretmem_setattr() mutually exclusive.

[rppt@linux.ibm.com: v3]
  Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
Reported-by: syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03 12:03:41 +02:00
Greg Kroah-Hartman
6d2ac8a0a4 Merge 5.15.58 into android14-5.15
Changes in 5.15.58
	pinctrl: stm32: fix optional IRQ support to gpios
	riscv: add as-options for modules with assembly compontents
	mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
	lockdown: Fix kexec lockdown bypass with ima policy
	drm/ttm: fix locking in vmap/vunmap TTM GEM helpers
	bus: mhi: host: pci_generic: add Telit FN980 v1 hardware revision
	bus: mhi: host: pci_generic: add Telit FN990
	Revert "selftest/vm: verify remap destination address in mremap_test"
	Revert "selftest/vm: verify mmap addr in mremap_test"
	PCI: hv: Fix multi-MSI to allow more than one MSI vector
	PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
	PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
	PCI: hv: Fix interrupt mapping for multi-MSI
	serial: mvebu-uart: correctly report configured baudrate value
	batman-adv: Use netif_rx_any_context() any.
	Revert "mt76: mt7921: Fix the error handling path of mt7921_pci_probe()"
	Revert "mt76: mt7921e: fix possible probe failure after reboot"
	mt76: mt7921: use physical addr to unify register access
	mt76: mt7921e: fix possible probe failure after reboot
	mt76: mt7921: Fix the error handling path of mt7921_pci_probe()
	xfs: fix maxlevels comparisons in the btree staging code
	xfs: fold perag loop iteration logic into helper function
	xfs: rename the next_agno perag iteration variable
	xfs: terminate perag iteration reliably on agcount
	xfs: fix perag reference leak on iteration race with growfs
	xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list()
	r8152: fix a WOL issue
	ip: Fix data-races around sysctl_ip_default_ttl.
	xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup()
	power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
	RDMA/irdma: Do not advertise 1GB page size for x722
	RDMA/irdma: Fix sleep from invalid context BUG
	pinctrl: ralink: rename MT7628(an) functions to MT76X8
	pinctrl: ralink: rename pinctrl-rt2880 to pinctrl-ralink
	pinctrl: ralink: Check for null return of devm_kcalloc
	perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
	ipv4/tcp: do not use per netns ctl sockets
	net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement"
	mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%
	sysctl: move some boundary constants from sysctl.c to sysctl_vals
	tcp: Fix data-races around sysctl_tcp_ecn.
	drm/amd/display: Support for DMUB HPD interrupt handling
	drm/amd/display: Add option to defer works of hpd_rx_irq
	drm/amd/display: Fork thread to offload work of hpd_rx_irq
	drm/amdgpu/display: add quirk handling for stutter mode
	drm/amd/display: Ignore First MST Sideband Message Return Error
	scsi: megaraid: Clear READ queue map's nr_queues
	scsi: ufs: core: Drop loglevel of WriteBoost message
	nvme: check for duplicate identifiers earlier
	nvme: fix block device naming collision
	e1000e: Enable GPT clock before sending message to CSME
	Revert "e1000e: Fix possible HW unit hang after an s0ix exit"
	igc: Reinstate IGC_REMOVED logic and implement it properly
	ip: Fix data-races around sysctl_ip_no_pmtu_disc.
	ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
	ip: Fix data-races around sysctl_ip_fwd_update_priority.
	ip: Fix data-races around sysctl_ip_nonlocal_bind.
	ip: Fix a data-race around sysctl_ip_autobind_reuse.
	ip: Fix a data-race around sysctl_fwmark_reflect.
	tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
	tcp: sk->sk_bound_dev_if once in inet_request_bound_dev_if()
	tcp: Fix data-races around sysctl_tcp_l3mdev_accept.
	tcp: Fix data-races around sysctl_tcp_mtu_probing.
	tcp: Fix data-races around sysctl_tcp_base_mss.
	tcp: Fix data-races around sysctl_tcp_min_snd_mss.
	tcp: Fix a data-race around sysctl_tcp_mtu_probe_floor.
	tcp: Fix a data-race around sysctl_tcp_probe_threshold.
	tcp: Fix a data-race around sysctl_tcp_probe_interval.
	net: stmmac: fix pm runtime issue in stmmac_dvr_remove()
	net: stmmac: fix unbalanced ptp clock issue in suspend/resume flow
	mtd: rawnand: gpmi: validate controller clock rate
	mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times
	net: dsa: microchip: ksz_common: Fix refcount leak bug
	net: skb: introduce kfree_skb_reason()
	net: skb: use kfree_skb_reason() in tcp_v4_rcv()
	net: skb: use kfree_skb_reason() in __udp4_lib_rcv()
	net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
	net: skb_drop_reason: add document for drop reasons
	net: netfilter: use kfree_drop_reason() for NF_DROP
	net: ipv4: use kfree_skb_reason() in ip_rcv_core()
	net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core()
	i2c: mlxcpld: Fix register setting for 400KHz frequency
	i2c: cadence: Change large transfer count reset logic to be unconditional
	perf tests: Fix Convert perf time to TSC test for hybrid
	net: stmmac: fix dma queue left shift overflow issue
	net/tls: Fix race in TLS device down flow
	igmp: Fix data-races around sysctl_igmp_llm_reports.
	igmp: Fix a data-race around sysctl_igmp_max_memberships.
	igmp: Fix data-races around sysctl_igmp_max_msf.
	tcp: Fix data-races around keepalive sysctl knobs.
	tcp: Fix data-races around sysctl_tcp_syn(ack)?_retries.
	tcp: Fix data-races around sysctl_tcp_syncookies.
	tcp: Fix data-races around sysctl_tcp_migrate_req.
	tcp: Fix data-races around sysctl_tcp_reordering.
	tcp: Fix data-races around some timeout sysctl knobs.
	tcp: Fix a data-race around sysctl_tcp_notsent_lowat.
	tcp: Fix a data-race around sysctl_tcp_tw_reuse.
	tcp: Fix data-races around sysctl_max_syn_backlog.
	tcp: Fix data-races around sysctl_tcp_fastopen.
	tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout.
	iavf: Fix handling of dummy receive descriptors
	pinctrl: armada-37xx: Use temporary variable for struct device
	pinctrl: armada-37xx: Make use of the devm_platform_ioremap_resource()
	pinctrl: armada-37xx: Convert to use dev_err_probe()
	pinctrl: armada-37xx: use raw spinlocks for regmap to avoid invalid wait context
	i40e: Fix erroneous adapter reinitialization during recovery process
	ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero
	net: stmmac: remove redunctant disable xPCS EEE call
	gpio: pca953x: only use single read/write for No AI mode
	gpio: pca953x: use the correct range when do regmap sync
	gpio: pca953x: use the correct register address when regcache sync during init
	be2net: Fix buffer overflow in be_get_module_eeprom
	net: dsa: sja1105: silent spi_device_id warnings
	net: dsa: vitesse-vsc73xx: silent spi_device_id warnings
	drm/imx/dcss: Add missing of_node_put() in fail path
	ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
	ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
	ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
	ip: Fix data-races around sysctl_ip_prot_sock.
	udp: Fix a data-race around sysctl_udp_l3mdev_accept.
	tcp: Fix data-races around sysctl knobs related to SYN option.
	tcp: Fix a data-race around sysctl_tcp_early_retrans.
	tcp: Fix data-races around sysctl_tcp_recovery.
	tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
	tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
	tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
	tcp: Fix a data-race around sysctl_tcp_stdurg.
	tcp: Fix a data-race around sysctl_tcp_rfc1337.
	tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.
	tcp: Fix data-races around sysctl_tcp_max_reordering.
	gpio: gpio-xilinx: Fix integer overflow
	KVM: selftests: Fix target thread to be migrated in rseq_test
	spi: bcm2835: bcm2835_spi_handle_err(): fix NULL pointer deref for non DMA transfers
	KVM: Don't null dereference ops->destroy
	mm/mempolicy: fix uninit-value in mpol_rebind_policy()
	bpf: Make sure mac_header was set before using it
	sched/deadline: Fix BUG_ON condition for deboosted tasks
	x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
	dlm: fix pending remove if msg allocation fails
	x86/uaccess: Implement macros for CMPXCHG on user addresses
	x86/extable: Tidy up redundant handler functions
	x86/extable: Get rid of redundant macros
	x86/mce: Deduplicate exception handling
	x86/extable: Rework the exception table mechanics
	x86/extable: Provide EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE
	bitfield.h: Fix "type of reg too small for mask" test
	x86/entry_32: Remove .fixup usage
	x86/extable: Extend extable functionality
	x86/msr: Remove .fixup usage
	x86/futex: Remove .fixup usage
	KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses
	xhci: dbc: refactor xhci_dbc_init()
	xhci: dbc: create and remove dbc structure in dbgtty driver.
	xhci: dbc: Rename xhci_dbc_init and xhci_dbc_exit
	xhci: Set HCD flag to defer primary roothub registration
	mt76: fix use-after-free by removing a non-RCU wcid pointer
	iwlwifi: fw: uefi: add missing include guards
	crypto: qat - set to zero DH parameters before free
	crypto: qat - use pre-allocated buffers in datapath
	crypto: qat - refactor submission logic
	crypto: qat - add backlog mechanism
	crypto: qat - fix memory leak in RSA
	crypto: qat - remove dma_free_coherent() for RSA
	crypto: qat - remove dma_free_coherent() for DH
	crypto: qat - add param check for RSA
	crypto: qat - add param check for DH
	crypto: qat - re-enable registration of algorithms
	exfat: fix referencing wrong parent directory information after renaming
	tracing: Have event format check not flag %p* on __get_dynamic_array()
	tracing: Place trace_pid_list logic into abstract functions
	tracing: Fix return value of trace_pid_write()
	um: virtio_uml: Allow probing from devicetree
	um: virtio_uml: Fix broken device handling in time-travel
	Bluetooth: Add bt_skb_sendmsg helper
	Bluetooth: Add bt_skb_sendmmsg helper
	Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg
	Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg
	Bluetooth: Fix passing NULL to PTR_ERR
	Bluetooth: SCO: Fix sco_send_frame returning skb->len
	Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
	exfat: use updated exfat_chain directly during renaming
	drm/amd/display: Reset DMCUB before HW init
	drm/amd/display: Optimize bandwidth on following fast update
	drm/amd/display: Fix surface optimization regression on Carrizo
	x86/amd: Use IBPB for firmware calls
	x86/alternative: Report missing return thunk details
	watchqueue: make sure to serialize 'wqueue->defunct' properly
	tty: drivers/tty/, stop using tty_schedule_flip()
	tty: the rest, stop using tty_schedule_flip()
	tty: drop tty_schedule_flip()
	tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
	tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
	net: usb: ax88179_178a needs FLAG_SEND_ZLP
	watch-queue: remove spurious double semicolon
	drm/amd/display: Don't lock connection_mutex for DMUB HPD
	drm/amd/display: invalid parameter check in dmub_hpd_callback
	x86/extable: Prefer local labels in .set directives
	KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness
	x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
	drm/amdgpu: Off by one in dm_dmub_outbox1_low_irq()
	x86/entry_32: Fix segment exceptions
	drm/amd/display: Fix wrong format specifier in amdgpu_dm.c
	Linux 5.15.58

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6655a937b4226d3011278d13df84b25e5ab4b9ef
2022-08-02 08:37:15 +02:00
Wang Cheng
8c5429a04c mm/mempolicy: fix uninit-value in mpol_rebind_policy()
commit 018160ad314d75b1409129b2247b614a9f35894c upstream.

mpol_set_nodemask()(mm/mempolicy.c) does not set up nodemask when
pol->mode is MPOL_LOCAL.  Check pol->mode before access
pol->w.cpuset_mems_allowed in mpol_rebind_policy()(mm/mempolicy.c).

BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
 mpol_rebind_policy mm/mempolicy.c:352 [inline]
 mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
 cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline]
 cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278
 cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515
 cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline]
 cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804
 __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520
 cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539
 cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852
 kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2162 [inline]
 new_sync_write fs/read_write.c:503 [inline]
 vfs_write+0x1318/0x2030 fs/read_write.c:590
 ksys_write+0x28b/0x510 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __x64_sys_write+0xdb/0x120 fs/read_write.c:652
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:524 [inline]
 slab_alloc_node mm/slub.c:3251 [inline]
 slab_alloc mm/slub.c:3259 [inline]
 kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264
 mpol_new mm/mempolicy.c:293 [inline]
 do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853
 kernel_set_mempolicy mm/mempolicy.c:1504 [inline]
 __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline]
 __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507
 __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

KMSAN: uninit-value in mpol_rebind_task (2)
https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59dc

This patch seems to fix below bug too.
KMSAN: uninit-value in mpol_rebind_mm (2)
https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926b

The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy().
When syzkaller reproducer runs to the beginning of mpol_new(),

	    mpol_new() mm/mempolicy.c
	  do_mbind() mm/mempolicy.c
	kernel_mbind() mm/mempolicy.c

`mode` is 1(MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags`
is 0. Then

	mode = MPOL_LOCAL;
	...
	policy->mode = mode;
	policy->flags = flags;

will be executed. So in mpol_set_nodemask(),

	    mpol_set_nodemask() mm/mempolicy.c
	  do_mbind()
	kernel_mbind()

pol->mode is 4 (MPOL_LOCAL), that `nodemask` in `pol` is not initialized,
which will be accessed in mpol_rebind_policy().

Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomain
Signed-off-by: Wang Cheng <wanngchenng@gmail.com>
Reported-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
Tested-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-29 17:25:24 +02:00
Greg Kroah-Hartman
56f32ebb01 Merge 5.15.56 into android14-5.15
Changes in 5.15.56
	ALSA: hda - Add fixup for Dell Latitidue E5430
	ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
	ALSA: hda/realtek: Fix headset mic for Acer SF313-51
	ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
	ALSA: hda/realtek: fix mute/micmute LEDs for HP machines
	ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
	ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
	xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
	fix race between exit_itimers() and /proc/pid/timers
	mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages
	mm: split huge PUD on wp_huge_pud fallback
	tracing/histograms: Fix memory leak problem
	net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer
	ip: fix dflt addr selection for connected nexthop
	ARM: 9213/1: Print message about disabled Spectre workarounds only once
	ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
	wifi: mac80211: fix queue selection for mesh/OCB interfaces
	cgroup: Use separate src/dst nodes when preloading css_sets for migration
	btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
	drm/panfrost: Put mapping instead of shmem obj on panfrost_mmu_map_fault_addr() error
	drm/panfrost: Fix shrinker list corruption by madvise IOCTL
	fs/remap: constrain dedupe of EOF blocks
	nilfs2: fix incorrect masking of permission flags for symlinks
	sh: convert nommu io{re,un}map() to static inline functions
	Revert "evm: Fix memleak in init_desc"
	xfs: only run COW extent recovery when there are no live extents
	xfs: don't include bnobt blocks when reserving free block pool
	xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
	xfs: drop async cache flushes from CIL commits.
	reset: Fix devm bulk optional exclusive control getter
	ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
	spi: amd: Limit max transfer and message size
	ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle
	ARM: 9210/1: Mark the FDT_FIXED sections as shareable
	net/mlx5e: kTLS, Fix build time constant test in TX
	net/mlx5e: kTLS, Fix build time constant test in RX
	net/mlx5e: Fix enabling sriov while tc nic rules are offloaded
	net/mlx5e: Fix capability check for updating vnic env counters
	net/mlx5e: Ring the TX doorbell on DMA errors
	drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
	ima: Fix a potential integer overflow in ima_appraise_measurement
	ASoC: sgtl5000: Fix noise on shutdown/remove
	ASoC: tas2764: Add post reset delays
	ASoC: tas2764: Fix and extend FSYNC polarity handling
	ASoC: tas2764: Correct playback volume range
	ASoC: tas2764: Fix amp gain register offset & default
	ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks()
	ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array
	net: stmmac: dwc-qos: Disable split header for Tegra194
	net: ethernet: ti: am65-cpsw: Fix devlink port register sequence
	sysctl: Fix data races in proc_dointvec().
	sysctl: Fix data races in proc_douintvec().
	sysctl: Fix data races in proc_dointvec_minmax().
	sysctl: Fix data races in proc_douintvec_minmax().
	sysctl: Fix data races in proc_doulongvec_minmax().
	sysctl: Fix data races in proc_dointvec_jiffies().
	tcp: Fix a data-race around sysctl_tcp_max_orphans.
	inetpeer: Fix data-races around sysctl.
	net: Fix data-races around sysctl_mem.
	cipso: Fix data-races around sysctl.
	icmp: Fix data-races around sysctl.
	ipv4: Fix a data-race around sysctl_fib_sync_mem.
	ARM: dts: at91: sama5d2: Fix typo in i2s1 node
	ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
	arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
	arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
	netfilter: nf_log: incorrect offset to network header
	netfilter: nf_tables: replace BUG_ON by element length check
	drm/i915/gvt: IS_ERR() vs NULL bug in intel_gvt_update_reg_whitelist()
	xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
	lockd: set fl_owner when unlocking files
	lockd: fix nlm_close_files
	tracing: Fix sleeping while atomic in kdb ftdump
	drm/i915/selftests: fix a couple IS_ERR() vs NULL tests
	drm/i915/dg2: Add Wa_22011100796
	drm/i915/gt: Serialize GRDOM access between multiple engine resets
	drm/i915/gt: Serialize TLB invalidates with GT resets
	drm/i915/uc: correctly track uc_fw init failure
	drm/i915: Require the vm mutex for i915_vma_bind()
	bnxt_en: Fix bnxt_reinit_after_abort() code path
	bnxt_en: Fix bnxt_refclk_read()
	sysctl: Fix data-races in proc_dou8vec_minmax().
	sysctl: Fix data-races in proc_dointvec_ms_jiffies().
	icmp: Fix data-races around sysctl_icmp_echo_enable_probe.
	icmp: Fix a data-race around sysctl_icmp_ignore_bogus_error_responses.
	icmp: Fix a data-race around sysctl_icmp_errors_use_inbound_ifaddr.
	icmp: Fix a data-race around sysctl_icmp_ratelimit.
	icmp: Fix a data-race around sysctl_icmp_ratemask.
	raw: Fix a data-race around sysctl_raw_l3mdev_accept.
	tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
	ipv4: Fix data-races around sysctl_ip_dynaddr.
	nexthop: Fix data-races around nexthop_compat_mode.
	net: ftgmac100: Hold reference returned by of_get_child_by_name()
	net: stmmac: fix leaks in probe
	ima: force signature verification when CONFIG_KEXEC_SIG is configured
	ima: Fix potential memory leak in ima_init_crypto()
	drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
	drm/amd/pm: Prevent divide by zero
	sfc: fix use after free when disabling sriov
	ceph: switch netfs read ops to use rreq->inode instead of rreq->mapping->host
	seg6: fix skb checksum evaluation in SRH encapsulation/insertion
	seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
	seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
	sfc: fix kernel panic when creating VF
	net: atlantic: remove deep parameter on suspend/resume functions
	net: atlantic: remove aq_nic_deinit() when resume
	KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
	net/tls: Check for errors in tls_device_init
	ACPI: video: Fix acpi_video_handles_brightness_key_presses()
	mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
	btrfs: rename btrfs_bio to btrfs_io_context
	btrfs: zoned: fix a leaked bioc in read_zone_info
	ksmbd: use SOCK_NONBLOCK type for kernel_accept()
	powerpc/xive/spapr: correct bitmap allocation size
	vdpa/mlx5: Initialize CVQ vringh only once
	vduse: Tie vduse mgmtdev and its device
	virtio_mmio: Add missing PM calls to freeze/restore
	virtio_mmio: Restore guest page size on resume
	netfilter: br_netfilter: do not skip all hooks with 0 priority
	scsi: hisi_sas: Limit max hw sectors for v3 HW
	cpufreq: pmac32-cpufreq: Fix refcount leak bug
	platform/x86: hp-wmi: Ignore Sanitization Mode event
	firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
	firmware: sysfb: Add sysfb_disable() helper function
	fbdev: Disable sysfb device registration when removing conflicting FBs
	net: tipc: fix possible refcount leak in tipc_sk_create()
	NFC: nxp-nci: don't print header length mismatch on i2c error
	nvme-tcp: always fail a request when sending it failed
	nvme: fix regression when disconnect a recovering ctrl
	net: sfp: fix memory leak in sfp_probe()
	ASoC: ops: Fix off by one in range control validation
	pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux()
	ASoC: Realtek/Maxim SoundWire codecs: disable pm_runtime on remove
	ASoC: rt711-sdca-sdw: fix calibrate mutex initialization
	ASoC: Intel: sof_sdw: handle errors on card registration
	ASoC: rt711: fix calibrate mutex initialization
	ASoC: rt7*-sdw: harden jack_detect_handler
	ASoC: codecs: rt700/rt711/rt711-sdca: initialize workqueues in probe
	ASoC: SOF: Intel: hda-loader: Clarify the cl_dsp_init() flow
	ASoC: wcd938x: Fix event generation for some controls
	ASoC: Intel: bytcr_wm5102: Fix GPIO related probe-ordering problem
	ASoC: wm5110: Fix DRE control
	ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error
	ASoC: dapm: Initialise kcontrol data for mux/demux controls
	ASoC: cs47l15: Fix event generation for low power mux control
	ASoC: madera: Fix event generation for OUT1 demux
	ASoC: madera: Fix event generation for rate controls
	irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
	x86: Clear .brk area at early boot
	soc: ixp4xx/npe: Fix unused match warning
	ARM: dts: stm32: use the correct clock source for CEC on stm32mp151
	Revert "can: xilinx_can: Limit CANFD brp to 2"
	ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
	ALSA: usb-audio: Add quirk for Fiero SC-01
	ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0)
	nvme-pci: phison e16 has bogus namespace ids
	signal handling: don't use BUG_ON() for debugging
	USB: serial: ftdi_sio: add Belimo device ids
	usb: typec: add missing uevent when partner support PD
	usb: dwc3: gadget: Fix event pending check
	tty: serial: samsung_tty: set dma burst_size to 1
	vt: fix memory overlapping when deleting chars in the buffer
	serial: 8250: fix return error code in serial8250_request_std_resource()
	serial: stm32: Clear prev values before setting RTS delays
	serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
	serial: 8250: Fix PM usage_count for console handover
	x86/pat: Fix x86_has_pat_wp()
	drm/aperture: Run fbdev removal before internal helpers
	Linux 5.15.56

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I763d2a7b49435bf2996b31e201aa9794ab64609e
2022-07-22 17:43:50 +02:00
Gowans, James
e4967d2288 mm: split huge PUD on wp_huge_pud fallback
commit 14c99d65941538aa33edd8dc7b1bbbb593c324a2 upstream.

Currently the implementation will split the PUD when a fallback is taken
inside the create_huge_pud function.  This isn't where it should be done:
the splitting should be done in wp_huge_pud, just like it's done for PMDs.
Reason being that if a callback is taken during create, there is no PUD
yet so nothing to split, whereas if a fallback is taken when encountering
a write protection fault there is something to split.

It looks like this was the original intention with the commit where the
splitting was introduced, but somehow it got moved to the wrong place
between v1 and v2 of the patch series.  Rebase mistake perhaps.

Link: https://lkml.kernel.org/r/6f48d622eb8bce1ae5dd75327b0b73894a2ec407.camel@amazon.com
Fixes: 327e9fd489 ("mm: Split huge pages on write-notify or COW")
Signed-off-by: James Gowans <jgowans@amazon.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21 21:24:11 +02:00
Axel Rasmussen
27056f20d7 mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages
commit 73f37dbcfe1763ee2294c7717a1f571e27d17fd8 upstream.

When fallocate() is used on a shmem file, the pages we allocate can end up
with !PageUptodate.

Since UFFDIO_CONTINUE tries to find the existing page the user wants to
map with SGP_READ, we would fail to find such a page, since
shmem_getpage_gfp returns with a "NULL" pagep for SGP_READ if it discovers
!PageUptodate.  As a result, UFFDIO_CONTINUE returns -EFAULT, as it would
do if the page wasn't found in the page cache at all.

This isn't the intended behavior.  UFFDIO_CONTINUE is just trying to find
if a page exists, and doesn't care whether it still needs to be cleared or
not.  So, instead of SGP_READ, pass in SGP_NOALLOC.  This is the same,
except for one critical difference: in the !PageUptodate case, SGP_NOALLOC
will clear the page and then return it.  With this change, UFFDIO_CONTINUE
works properly (succeeds) on a shmem file which has been fallocated, but
otherwise not modified.

Link: https://lkml.kernel.org/r/20220610173812.1768919-1-axelrasmussen@google.com
Fixes: 153132571f ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21 21:24:11 +02:00
Liujie Xie
1ed025b9a1 ANDROID: Allow vendor module to reclaim a memcg
Export try_to_free_mem_cgroup_pages function to allow vendor modules to reclaim a memory cgroup.

Bug: 192052083

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Iec6ef50f5c71c62d0c9aa6de90e56a143dac61c1
(cherry picked from commit a8385d61f2)
2022-07-19 12:47:38 +00:00
Liujie Xie
bf24c43b7f ANDROID: Export memcg functions to allow module to add new files
Export cgroup_add_legacy_cftypes and a helper function to allow vendor module to expose additional files in the memory cgroup hierarchy.

Bug: 192052083

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ie2b936b3e77c7ab6d740d1bb6d70e03c70a326a7
(cherry picked from commit f41a95eadc)
2022-07-19 12:47:38 +00:00
Liujie Xie
7af5027889 ANDROID: vendor_hooks: add hooks in mem_cgroup subsystem
Add hooks to tune memory policy based on mem_cgroup.

Bug: 192052083

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ica1a5409eed86fbd466edd2c7557f94972a40175
(cherry picked from commit 1cdcf76b15)
2022-07-19 12:47:38 +00:00
rongqianfeng
8da6ee328b ANDROID: vendor_hooks: add hook and OEM data for slab shrink
Some shrinker add lock in count_objects() may cause lock contention
issues and lead all task stall on slab shrink. Add vendor hook in
do_shrink_slab() for shrinker->count_objects() latency measuring.

Add 3 oem data in shrink_control struct. Two is for
shrinker->count_objects() and shrinker->scan_objects() latency
measuring, other one to store priority, some shrinker know the reclaimer
priority can control the memory reclaim more better.

Bug: 188684131

Change-Id: I80e9d90179bb52a99c54d9a067c6fcee835bb2ad
Signed-off-by: rongqianfeng <rongqianfeng@vivo.com>
2022-07-19 12:47:36 +00:00
zhaoyang.huang
109097ed1c ANDROID: Add vendor hook for MemcgV2 optimization
The associated vendor hooks/data are used for implementing dynamic memory.low protection based on memcgv2.

Bug: 232723420
Test: build pass
Change-Id: I2e92bdc2840af1eaaa08ee6427d2a82d78390005
Signed-off-by: zhaoyang.huang <zhaoyang.huang@unisoc.com>
2022-07-19 12:47:28 +00:00
Suren Baghdasaryan
0864756fb0 Revert "ANDROID: Use the notifier lock to perform file-backed vma teardown"
This reverts commit dc8ac508af.
Reason for revert: performance regression.

Bug: 234527424
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I5811c7ee2feac7f9057ed0ccd848e5de71f79354
2022-07-19 12:47:27 +00:00
wudean
842c68a977 ANDROID: vendor_hooks: bypass shrink slab
Add hooks for bypass shrink slab.

Bug: 185951972
(cherry picked from commit 396a6adfd3)

Change-Id: I343e02ae5cd6d076d525d0e4bfc09ecdfeda1d7b
Signed-off-by: wudean <dean.wu@vivo.com>
2022-07-19 12:47:26 +00:00
Liangcai Fan
bf46e6f5db BACKPORT: mm: khugepaged: recalculate min_free_kbytes after stopping khugepaged
When initializing transparent huge pages, min_free_kbytes would be
calculated according to what khugepaged expected.

So when transparent huge pages get disabled, min_free_kbytes should be
recalculated instead of the higher value set by khugepaged.

Link: https://lkml.kernel.org/r/1633937809-16558-1-git-send-email-liangcaifan19@gmail.com
Signed-off-by: Liangcai Fan <liangcaifan19@gmail.com>
Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit bd3400ea173fb611cdf2030d03620185ff6c0b0e)

Bug: 235523176
Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com>
Change-Id: I815893d25186847933db2a0872528fb15a00b3c8
2022-07-19 03:54:51 +00:00
Liujie Xie
b7ea1c4987 ANDROID: vendor_hooks: Add hook in shrink_node_memcgs
Add vendor hook in shrink_node_memcgs to adjust whether
to skip memory reclamation of memcg.

Bug: 226482420
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I925856353e63c5a821027de4f8476c833e21b982
2022-07-19 03:52:52 +00:00
Liujie Xie
5fdfed7d78 ANDROID: vendor_hooks: Add hooks for memory when debug
Add vendors hooks for recording memory used

Bug: 182443489
Bug: 234407991
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1
2022-07-19 03:52:51 +00:00
Jiewen Wang
ecbd315a64 ANDROID: vendor_hooks: Add hook in try_to_unmap_one()
Add hook in try_to_unmap_one() to trace this function for debug memory
swap bugs.

Bug: 198385827
Change-Id: I1fdbe60e09bb491b949e06a07133710453ecca03
Signed-off-by: Jiewen Wang <jiewen.wang@vivo.com>
(cherry picked from commit 955f917251)
2022-07-19 03:52:51 +00:00
Jiewen Wang
e846635d85 ANDROID: vendor_hooks: Add hook in mmap_region()
Add hook in mmap_region() to record the vma and address information
of monitored processes.

Bug: 198385827
Change-Id: I0bde29113b47ca7f4a9f5d42a54188e791ca3b7e
Signed-off-by: Jiewen Wang <jiewen.wang@vivo.com>
(cherry picked from commit 878e0caa77)
2022-07-19 03:52:51 +00:00
Patrick Daly
629b40ebc0 ANDROID: mm/memory_hotplug: Don't special case memory_block_size_bytes
If add_memory_subsection() is called with a size of
memory_block_size_bytes, it calls into add_memory(), which declares
the region as system ram, and adds it to the buddy allocator. This
is inconsistent with the behavior of add_memory_subsection() for
other sizes, for which it does not add the memory to buddy and
instead reserves it for the caller's private use.

Bug: 210008865
Fixes: 417ac617ea ("ANDROID: mm/memory_hotplug: implement {add/remove}_memory_subsection")
Change-Id: Iefb69b0b4e96af670d0e65c325a9538d14b460e3
Signed-off-by: Patrick Daly <quic_pdaly@quicinc.com>
2022-07-19 03:52:50 +00:00
Hailong Tu
7231d46778 FROMLIST: mm/damon/reclaim: Fix the timer always stays active
The timer stays active even if the reclaim mechanism is never enabled.
It is unnecessary overhead can be completely avoided by using module_param_cb() for enabled flag.

Link: https://lore.kernel.org/all/20220421125910.1052459-1-tuhailong@gmail.com/

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I77591e41ce424ce16a4a5c70e7a86cdae996a354
2022-07-19 03:52:48 +00:00
Jakub Kicinski
301cbec682 BACKPORT: treewide: Add missing includes masked by cgroup -> bpf dependency
cgroup.h (therefore swap.h, therefore half of the universe)
includes bpf.h which in turn includes module.h and slab.h.
Since we're about to get rid of that dependency we need
to clean things up.

v2: drop the cpu.h include from cacheinfo.h, it's not necessary
and it makes riscv sensitive to ordering of include files.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/  # v1
Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org

(cherry picked from commit 8581fd402a0cf80b5298e3b225e7a7bd8f110e69)

Dropped all the changes except for the ones made in mm/damon/vaddr.c

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Ib64ecbe4e06f192ed576af77e03e6b49d538bac9
2022-07-19 03:52:48 +00:00
SeongJae Park
be7103897c UPSTREAM: mm/damon: hide kernel pointer from tracepoint event
DAMON's virtual address spaces monitoring primitive uses 'struct pid *'
of the target process as its monitoring target id.  The kernel address
is exposed as-is to the user space via the DAMON tracepoint,
'damon_aggregated'.

Though primarily only privileged users are allowed to access that, it
would be better to avoid unnecessarily exposing kernel pointers so.
Because the trace result is only required to be able to distinguish each
target, we aren't need to use the pointer as-is.

This makes the tracepoint to use the index of the target in the
context's targets list as its id in the tracepoint, to hide the kernel
space address.

Link: https://lkml.kernel.org/r/20211229131016.23641-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 76fd0285b447991267e838842c0be7395eb454bb)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Iee4a6f56cf9bf5f61fb95f438dfc3316c10198e5
2022-07-19 03:52:48 +00:00
SeongJae Park
ec4797a0fc UPSTREAM: mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
The failure log message for 'damon_va_three_regions()' prints the target
id, which is a 'struct pid' pointer in the case.  To avoid exposing the
kernel pointer via the log, this makes the log to use the index of the
target in the context's targets list instead.

Link: https://lkml.kernel.org/r/20211229131016.23641-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 962fe7a6b1b2f9deb1b31b3344afa3b11afdf7ab)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I6b9b09a9b7024630f05ec41db030006d63f47ce1
2022-07-19 03:52:48 +00:00
SeongJae Park
20616f4e0b UPSTREAM: mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
Failure of 'damon_va_three_regions()' is logged using 'pr_err()'.  But,
the function can fail in legal situations.  To avoid making users be
surprised and to keep the kernel clean, this makes the log to be printed
using 'pr_debug()'.

Link: https://lkml.kernel.org/r/20211229131016.23641-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 251403f19aab6a122f4dcfb14149814e85564202)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I0a2c08d856d0067af6af7a008e89f63fa5fac381
2022-07-19 03:52:47 +00:00
SeongJae Park
8b69d49ed0 UPSTREAM: mm/damon/dbgfs: remove an unnecessary variable
Patch series "mm/damon: Hide unnecessary information disclosures".

DAMON is exposing some unnecessary information including kernel pointer
in kernel log and tracepoint.  This patchset hides such information.
The first patch is only for a trivial cleanup, though.

This patch (of 4):

This commit removes a unnecessarily used variable in
dbgfs_target_ids_write().

Link: https://lkml.kernel.org/r/20211229131016.23641-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211229131016.23641-2-sj@kernel.org
Fixes: 4bc05954d0 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 70b8480812d0a3930049a44820a1fa149b090c10)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I774c900de92780ae9ebf09f01c0b0536eb43f822
2022-07-19 03:52:47 +00:00
Guoqing Jiang
f23143e4b5 UPSTREAM: mm/damon: move the implementation of damon_insert_region to damon.h
Usually, inline function is declared static since it should sit between
storage and type.  And implement it in a header file if used by multiple
files.

And this change also fixes compile issue when backport damon to 5.10.

  mm/damon/vaddr.c: In function `damon_va_evenly_split_region':
  ./include/linux/damon.h:425:13: error: inlining failed in call to `always_inline' `damon_insert_region': function body not available
  425 | inline void damon_insert_region(struct damon_region *r,
      | ^~~~~~~~~~~~~~~~~~~
  mm/damon/vaddr.c:86:3: note: called from here
  86 | damon_insert_region(n, r, next, t);
     | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/20211223085703.6142-1-guoqing.jiang@linux.dev
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 2cd4b8e10cc31eadb5b10b1d73b3f28156f3776c)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Iaa05318092b8e98bfbfc72a8c5df2cb6de97d224
2022-07-19 03:52:47 +00:00
Baolin Wang
7b782ddac2 UPSTREAM: mm/damon: add access checking for hugetlb pages
The process's VMAs can be mapped by hugetlb page, but now the DAMON did
not implement the access checking for hugetlb pte, so we can not get the
actual access count like below if a process VMAs were mapped by hugetlb.

  damon_aggregated: target_id=18446614368406014464 nr_regions=12 4194304-5476352: 0 545
  damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662370467840-140662372970496: 0 545
  damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662372970496-140662375460864: 0 545
  damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662375460864-140662377951232: 0 545
  damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662377951232-140662380449792: 0 545
  damon_aggregated: target_id=18446614368406014464 nr_regions=12 140662380449792-140662382944256: 0 545
  ......

Thus this patch adds hugetlb access checking support, with this patch we
can see below VMA mapped by hugetlb access count.

  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296486649856-140296489914368: 1 3
  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296489914368-140296492978176: 1 3
  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296492978176-140296495439872: 1 3
  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296495439872-140296498311168: 1 3
  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296498311168-140296501198848: 1 3
  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296501198848-140296504320000: 1 3
  damon_aggregated: target_id=18446613056935405824 nr_regions=12 140296504320000-140296507568128: 1 2
  ......

[baolin.wang@linux.alibaba.com: fix unused var warning]
  Link: https://lkml.kernel.org/r/1aaf9c11-0d8e-b92d-5c92-46e50a6e8d4e@linux.alibaba.com
[baolin.wang@linux.alibaba.com: v3]
  Link: https://lkml.kernel.org/r/486927ecaaaecf2e3a7fbe0378ec6e1c58b50747.1640852276.git.baolin.wang@linux.alibaba.com

Link: https://lkml.kernel.org/r/6afcbd1fda5f9c7c24f320d26a98188c727ceec3.1639623751.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 49f4203aae06ba9d67b500c90339b262b0a52637)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Id7c86f5c0344efef150b3b27f662b696263947ed
2022-07-19 03:52:47 +00:00
SeongJae Park
b66d1048c2 UPSTREAM: mm/damon/dbgfs: support all DAMOS stats
Currently, DAMON debugfs interface is not supporting DAMON-based
Operation Schemes (DAMOS) stats for schemes successfully applied regions
and time/space quota limit exceeds.  This adds the support.

Link: https://lkml.kernel.org/r/20211210150016.35349-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 3a619fdb8de8a3ecd4200e7d183d2c8ceb32289e)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Ide3cfaff1b63e053ece5a8746a463fb87ef6c607
2022-07-19 03:52:47 +00:00
SeongJae Park
d62d8cb4a1 UPSTREAM: mm/damon/reclaim: provide reclamation statistics
This implements new DAMON_RECLAIM parameters for statistics reporting.
Those can be used for understanding how DAMON_RECLAIM is working, and
for tuning the other parameters.

Link: https://lkml.kernel.org/r/20211210150016.35349-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 60e52e7c46a127bca5ddd48b89002564f3862063)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I721bdacb3b2d8b41154296f5dbf8ef48c5dd0744
2022-07-19 03:52:47 +00:00
SeongJae Park
a5c6a46772 UPSTREAM: mm/damon/schemes: account how many times quota limit has exceeded
If the time/space quotas of a given DAMON-based operation scheme is too
small, the scheme could show unexpectedly slow progress.  However, there
is no good way to notice the case in runtime.  This commit extends the
DAMOS stat to provide how many times the quota limits exceeded so that
the users can easily notice the case and tune the scheme.

Link: https://lkml.kernel.org/r/20211210150016.35349-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 6268eac34ca30af7f6313504d556ec7fcd295621)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I36184dc51917810c81ac8d576b144e33a4454dc1
2022-07-19 03:52:47 +00:00
SeongJae Park
df693f153b UPSTREAM: mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".

To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme.  Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier.  However, that also made
understanding of the working schemes a little bit more difficult.

For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas.  So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.

This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).

This patch (of 6):

DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied.  Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning.  To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.

Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 0e92c2ee9f459542c5384d9cfab24873c3dd6398)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Iddfe9257cb99091404202576a1addc5cc340cb8b
2022-07-19 03:52:47 +00:00
SeongJae Park
ae03ced4a8 UPSTREAM: mm/damon: convert macro functions to static inline functions
Patch series "mm/damon: Misc cleanups".

This patchset contains miscellaneous cleanups for DAMON's macro
functions and documentation.

This patch (of 6):

This commit converts macro functions in DAMON to static inline functions,
for better type checking, code documentation, etc[1].

[1] https://lore.kernel.org/linux-mm/20211202151213.6ec830863342220da4141bc5@linux-foundation.org/

Link: https://lkml.kernel.org/r/20211209131806.19317-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211209131806.19317-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 88f86dcfa454784f7de550966c60fc78a3e95d6d)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Id1e4e08a844b49cf3572cb3ac02158f5c1909ea2
2022-07-19 03:52:46 +00:00
Xin Hao
a871a6037c UPSTREAM: mm/damon: move damon_rand() definition into damon.h
damon_rand() is called in three files:damon/core.c, damon/ paddr.c,
damon/vaddr.c, i think there is no need to redefine this twice, So move
it to damon.h will be a good choice.

Link: https://lkml.kernel.org/r/20211202075859.51341-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9b2a38d6ef25c1748e3964b0ff30a89e4ed26583)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Ib7be3062385fac4b422faa86705968aa39095a72
2022-07-19 03:52:46 +00:00
Xin Hao
f9feb73419 UPSTREAM: mm/damon/schemes: add the validity judgment of thresholds
In dbgfs "schemes" interface, i do some test like this:
    # cd /sys/kernel/debug/damon
    # echo "2 1 2 1 10 1 3 10 1 1 1 1 1 1 1 1 2 3" > schemes
    # cat schemes
    # 2 1 2 1 10 1 3 10 1 1 1 1 1 1 1 1 2 3 0 0

There have some unreasonable places, i set the valules of these variables
"<min_sz, max_sz> <min_nr_a, max_nr_a>, <min_age, max_age>, <wmarks.high,
wmarks.mid, wmarks.low>" as "<2, 1>, <2, 1>, <10, 1>, <1, 2, 3>.

So there add a validity judgment for these thresholds value.

Link: https://lkml.kernel.org/r/d78360e52158d786fcbf20bc62c96785742e76d3.1637239568.git.xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit c89ae63eb0662b6c9f82dbfad3ef010239b8c1b1)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I2b23124b95ebeac935f28f9632dccaeabace4533
2022-07-19 03:52:46 +00:00
Yihao Han
1853e7689a UPSTREAM: mm/damon/vaddr: remove swap_ranges() and replace it with swap()
Remove 'swap_ranges()' and replace it with the macro 'swap()' defined in
'include/linux/minmax.h' to simplify code and improve efficiency

Link: https://lkml.kernel.org/r/20211111115355.2808-1-hanyihao@vivo.com
Signed-off-by: Yihao Han <hanyihao@vivo.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 8bd0b9da03c9154e279b1a502636103887b9fbed)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Icf565b52a7642fb830ae004764ca496986aed82a
2022-07-19 03:52:46 +00:00
Xin Hao
4d050da29e UPSTREAM: mm/damon: remove some unneeded function definitions in damon.h
In damon.h some func definitions about VA & PA can only be used in its
own file, so there no need to define in the header file, and the header
file will look cleaner.

If other files later need these functions, the prototypes can be added
to damon.h at that time.

[sj@kernel.org: remove unnecessary function prototype position changes]
 Link: https://lkml.kernel.org/r/20211118114827.20052-1-sj@kernel.org

Link: https://lkml.kernel.org/r/45fd5b3ef6cce8e28dbc1c92f9dc845ccfc949d7.1636989871.git.xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit cdeed009f3bceee41f73f0137db785fd29a05cb8)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I2203b9c8e4625493797e1f3e506431799c4404c9
2022-07-19 03:52:46 +00:00
Xin Hao
0026bea8c0 UPSTREAM: mm/damon/core: use abs() instead of diff_of()
In kernel, we can use abs(a - b) to get the absolute value, So there is no
need to redefine a new one.

Link: https://lkml.kernel.org/r/b24e7b82d9efa90daf150d62dea171e19390ad0b.1636989871.git.xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit d720bbbd70e968f8a0257393b575c3a29b56f990)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Iab418bdc79cea24a175c2c5e17ab7ee393df6c47
2022-07-19 03:52:46 +00:00
Xin Hao
a5e1a7efe9 UPSTREAM: mm/damon: unified access_check function naming rules
Patch series "mm/damon: Do some small changes", v4.

This patch (of 4):

In damon/paddr.c file, two functions names start with underscore,
	static void __damon_pa_prepare_access_check(struct damon_ctx *ctx,
			struct damon_region *r)
	static void __damon_pa_prepare_access_check(struct damon_ctx *ctx,
			struct damon_region *r)
In damon/vaddr.c file, there are also two functions with the same function,
	static void damon_va_prepare_access_check(struct damon_ctx *ctx,
			struct mm_struct *mm, struct damon_region *r)
	static void damon_va_check_access(struct damon_ctx *ctx,
			struct mm_struct *mm, struct damon_region *r)

It makes sense to keep consistent, and it is not easy to be confused with
the function that call them.

Link: https://lkml.kernel.org/r/cover.1636989871.git.xhao@linux.alibaba.com
Link: https://lkml.kernel.org/r/529054aed932a42b9c09fc9977ad4574b9e7b0bd.1636989871.git.xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit b627b774911660852ce7f3f3817955ddad2bd130)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I180e7f068ed4745c5fbc0e2a81320b7fda03c52e
2022-07-19 03:52:46 +00:00
SeongJae Park
b933d9e6d5 UPSTREAM: mm/damon/vaddr-test: remove unnecessary variables
A couple of test functions in DAMON virtual address space monitoring
primitives implementation has unnecessary damon_ctx variables.  This
commit removes those.

Link: https://lkml.kernel.org/r/20211201150440.1088-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9f86d624292c238203b3687cdb870a2cde1a6f9b)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: Ic7543d67d15843d7c2110302de950958a3b46e04
2022-07-19 03:52:45 +00:00
SeongJae Park
bc7c4add16 UPSTREAM: mm/damon/vaddr-test: split a test function having >1024 bytes frame size
On some configuration[1], 'damon_test_split_evenly()' kunit test
function has >1024 bytes frame size, so below build warning is
triggered:

      CC      mm/damon/vaddr.o
    In file included from mm/damon/vaddr.c:672:
    mm/damon/vaddr-test.h: In function 'damon_test_split_evenly':
    mm/damon/vaddr-test.h:309:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      309 | }
          | ^

This commit fixes the warning by separating the common logic in the
function.

[1] https://lore.kernel.org/linux-mm/202111182146.OV3C4uGr-lkp@intel.com/

Link: https://lkml.kernel.org/r/20211201150440.1088-6-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 044cd9750fe010170f5dc812e4824d98f5ea928c)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I78226fdde7f992f5aa8209f0996561a3f7efce84
2022-07-19 03:52:45 +00:00
SeongJae Park
83cfe3652e UPSTREAM: mm/damon/vaddr: remove an unnecessary warning message
The DAMON virtual address space monitoring primitive prints a warning
message for wrong DAMOS action.  However, it is not essential as the
code returns appropriate failure in the case.  This commit removes the
message to make the log clean.

Link: https://lkml.kernel.org/r/20211201150440.1088-5-sj@kernel.org
Fixes: 6dea8add4d28 ("mm/damon/vaddr: support DAMON-based Operation Schemes")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 09e12289cc044afa484e70c0b379d579d52caf9a)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I695d3e813f9449b388f2757c6e81cba76ee3e1bf
2022-07-19 03:52:45 +00:00
SeongJae Park
9ff23f4240 UPSTREAM: mm/damon/core: remove unnecessary error messages
DAMON core prints error messages when damon_target object creation is
failed or wrong monitoring attributes are given.  Because appropriate
error code is returned for each case, the messages are not essential.
Also, because the code path can be triggered with user-specified input,
this could result in kernel log mistakenly being messy.  To avoid the
case, this commit removes the messages.

Link: https://lkml.kernel.org/r/20211201150440.1088-4-sj@kernel.org
Fixes: 4bc05954d0 ("mm/damon: implement a debugfs-based user space interface")
Fixes: b9a6ac4e4e ("mm/damon: adaptively adjust regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 1afaf5cb687de85c5e00ac70f6eea5597077cbc5)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I571336c5d49998029d4727b523b7a0950c753b2f
2022-07-19 03:52:45 +00:00
SeongJae Park
c9078aeb90 UPSTREAM: mm/damon/dbgfs: remove an unnecessary error message
When wrong scheme action is requested via the debugfs interface, DAMON
prints an error message.  Because the function returns error code, this
is not really needed.  Because the code path is triggered by the user
specified input, this can result in kernel log mistakenly being messy.
To avoid the case, this commit removes the message.

Link: https://lkml.kernel.org/r/20211201150440.1088-3-sj@kernel.org
Fixes: af122dd8f3c0 ("mm/damon/dbgfs: support DAMON-based Operation Schemes")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 0bceffa236af401f5206feaf3538526cbc427209)

Bug: 228223814
Signed-off-by: zhijun wan <wanzhijun@oppo.com>
Change-Id: I6c6e4ea6891eee04cb934f237b7d1768a766f240
2022-07-19 03:52:45 +00:00