Commit Graph

1193 Commits

Author SHA1 Message Date
Yu Zhao
7f53b0e704 BACKPORT: mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page tables
to search for young PTEs and promote hot pages.  A kill switch will be
added in the next patch to disable this behavior.  When disabled, the
aging relies on the rmap only.

NB: this behavior has nothing similar with the page table scanning in the
2.4 kernel [1], which searches page tables for old PTEs, adds cold pages
to swapcache and unmaps them.

To avoid confusion, the term "iteration" specifically means the traversal
of an entire mm_struct list; the term "walk" will be applied to page
tables and the rmap, as usual.

An mm_struct list is maintained for each memcg, and an mm_struct follows
its owner task to the new memcg when this task is migrated.  Given an
lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot pages
before it increments max_seq.

When multiple page table walkers iterate the same list, each of them gets
a unique mm_struct; therefore they can run concurrently.  Page table
walkers ignore any misplaced pages, e.g., if an mm_struct was migrated,
pages it left in the previous memcg will not be promoted when its current
memcg is under reclaim.  Similarly, page table walkers will not promote
pages from nodes other than the one under reclaim.

This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
   page table walkers can skip processes that have been sleeping since
   the last iteration.
2. It uses generational Bloom filters to record populated branches so
   that page table walkers can reduce their search space based on the
   query results, e.g., to skip page tables containing mostly holes or
   misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
   CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
   spanning multiple VMAs. IOW, it finishes all the VMAs within the
   range of the same PMD table before it returns to a PGD table. This
   improves the cache performance for workloads that have large
   numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.

Server benchmark results:
  Single workload:
    fio (buffered I/O): no change

  Single workload:
    memcached (anon): +[8, 10]%
                Ops/sec      KB/sec
      patch1-7: 1147696.57   44640.29
      patch1-8: 1245274.91   48435.66

  Configurations:
    no change

Client benchmark results:
  kswapd profiles:
    patch1-7
      48.16%  lzo1x_1_do_compress (real work)
       8.20%  page_vma_mapped_walk (overhead)
       7.06%  _raw_spin_unlock_irq
       2.92%  ptep_clear_flush
       2.53%  __zram_bvec_write
       2.11%  do_raw_spin_lock
       2.02%  memmove
       1.93%  lru_gen_look_around
       1.56%  free_unref_page_list
       1.40%  memset

    patch1-8
      49.44%  lzo1x_1_do_compress (real work)
       6.19%  page_vma_mapped_walk (overhead)
       5.97%  _raw_spin_unlock_irq
       3.13%  get_pfn_page
       2.85%  ptep_clear_flush
       2.42%  __zram_bvec_write
       2.08%  do_raw_spin_lock
       1.92%  memmove
       1.44%  alloc_zspage
       1.36%  memset

  Configurations:
    no change

Thanks to the following developers for their efforts [3].
  kernel test robot <lkp@intel.com>

[1] https://lwn.net/Articles/23732/
[2] https://llvm.org/docs/ScudoHardenedAllocator.html
[3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/

Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
Change-Id: I7ed3daf288e664e15bfd34991a77467a19a4e39a
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit bd74fdaea146029e4fa12c6de89adbe0779348a9)
[ Resolve conflicts in include/linux/memcontrol.h,
  include/linux/mm_types.h ]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2022-11-30 00:28:11 +00:00
Kalesh Singh
02dc0d1dda Revert "FROMLIST: mm: multi-gen LRU: support page table walks"
This reverts commit 5280d76d38.

To be replaced with upstream version.

Bug: 249601646
Change-Id: I5bf4095117082b6627d182a8d987ca78a18fa392
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2022-11-30 00:28:11 +00:00
Suren Baghdasaryan
3f311327f9 ANDROID: mm: introduce vma refcounting to protect vma during SPF
Current mechanism to stabilize a vma during speculative page fault
handling makes a copy of the faulting vma under RCU protection. This
makes it hard to protect elements which do not belong to the vma but
are used by the page fault handler like vma->vm_file.
The problems is that a copy of the vma can't be used to safely
protect the file attached to the original vma unless the file is
also released after RCU grace period (which is how SPF was designed
originally but that caused performance regression and had to be
changed).
To avoid these complications, introduce vma refcounting to stabilize
and operate on the original vma during page fault handling. Page
fault handler finds the vma and increases its refcount under RCU
protection, vma is freed after RCU grace period, vma->vm_file is
released only after refcount indicates no users. This mechanism
guarantees that once get_vma returns a vma, both the vma itself and
vma->vm_file are stable.
Additional benefits of this patch are: we don't need to copy the vma
and no additional logic is needed to stabilize vma->vm_file.

Bug: 257443051
Change-Id: I59d373926d687fcbd56847a8c3500c43bf1844c8
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2022-11-23 10:25:27 -08:00
Suren Baghdasaryan
50567620db ANDROID: reimplement vm_file protection during speculative page fault
Use vma->vm_file refcounting to protect the file during speculative page
fault handling.

Bug: 258731892
Change-Id: I222c23785391bea7d95c4506d70d6f68029ec45f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2022-11-23 10:25:27 -08:00
Suren Baghdasaryan
c11ef6356b Revert "ANDROID: add vma->file_ref_count to synchronize vma->vm_file destruction"
This reverts commit a3fe25d923.

File refcounting implemented in this patch is broken and needs to be
redone.
The change in include/linux/mm_types.h which adds file_ref_count into
vm_area_struct is left untouched to keep ABI intact.

Bug: 258731892
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I37984eb2f0981a989f74bcaaa6be42040a2f241e
2022-11-23 10:25:26 -08:00
Greg Kroah-Hartman
5a1075de9c Merge 5.15.68 into android14-5.15
Changes in 5.15.68
	net: wwan: iosm: remove pointless null check
	efi: libstub: Disable struct randomization
	efi: capsule-loader: Fix use-after-free in efi_capsule_write
	wifi: iwlegacy: 4965: corrected fix for potential off-by-one overflow in il4965_rs_fill_link_cmd()
	fs: only do a memory barrier for the first set_buffer_uptodate()
	Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"
	scsi: qla2xxx: Disable ATIO interrupt coalesce for quad port ISP27XX
	scsi: megaraid_sas: Fix double kfree()
	drm/gem: Fix GEM handle release errors
	drm/amdgpu: Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to psp_hw_fini
	drm/amdgpu: Check num_gfx_rings for gfx v9_0 rb setup.
	drm/radeon: add a force flush to delay work when radeon
	scsi: ufs: core: Reduce the power mode change timeout
	Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
	parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
	parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
	arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level
	netfilter: conntrack: work around exceeded receive window
	cpufreq: check only freq_table in __resolve_freq()
	net/core/skbuff: Check the return value of skb_copy_bits()
	md: Flush workqueue md_rdev_misc_wq in md_alloc()
	fbdev: fbcon: Destroy mutex on freeing struct fb_info
	fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init()
	drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctly
	ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC
	ALSA: emu10k1: Fix out of bounds access in snd_emu10k1_pcm_channel_alloc()
	ALSA: aloop: Fix random zeros in capture data when using jiffies timer
	ALSA: usb-audio: Split endpoint setups for hw_params and prepare
	ALSA: usb-audio: Fix an out-of-bounds bug in __snd_usb_parse_audio_interface()
	tracing: Fix to check event_mutex is held while accessing trigger list
	btrfs: zoned: set pseudo max append zone limit in zone emulation mode
	vfio/type1: Unpin zero pages
	kprobes: Prohibit probes in gate area
	debugfs: add debugfs_lookup_and_remove()
	sched/debug: fix dentry leak in update_sched_domain_debugfs
	drm/amd/display: fix memory leak when using debugfs_lookup()
	nvmet: fix a use-after-free
	drm/i915: Implement WaEdpLinkRateDataReload
	scsi: mpt3sas: Fix use-after-free warning
	scsi: lpfc: Add missing destroy_workqueue() in error path
	NFS: Further optimisations for 'ls -l'
	NFS: Save some space in the inode
	NFS: Fix another fsync() issue after a server reboot
	cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
	cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
	ASoC: qcom: sm8250: add missing module owner
	RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
	RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL
	ARM: dts: imx6qdl-kontron-samx6i: remove duplicated node
	soc: imx: gpcv2: Assert reset before ungating clock
	regulator: core: Clean up on enable failure
	tee: fix compiler warning in tee_shm_register()
	RDMA/cma: Fix arguments order in net device validation
	soc: brcmstb: pm-arm: Fix refcount leak and __iomem leak bugs
	RDMA/hns: Fix supported page size
	RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift
	wifi: wilc1000: fix DMA on stack objects
	ARM: at91: pm: fix self-refresh for sama7g5
	ARM: at91: pm: fix DDR recalibration when resuming from backup and self-refresh
	ARM: dts: at91: sama5d27_wlsom1: specify proper regulator output ranges
	ARM: dts: at91: sama5d2_icp: specify proper regulator output ranges
	ARM: dts: at91: sama5d27_wlsom1: don't keep ldo2 enabled all the time
	ARM: dts: at91: sama5d2_icp: don't keep vdd_other enabled all the time
	netfilter: br_netfilter: Drop dst references before setting.
	netfilter: nf_tables: clean up hook list when offload flags check fails
	netfilter: nf_conntrack_irc: Fix forged IP logic
	RDMA/srp: Set scmnd->result only when scmnd is not NULL
	ALSA: usb-audio: Inform the delayed registration more properly
	ALSA: usb-audio: Register card again for iface over delayed_register option
	rxrpc: Fix ICMP/ICMP6 error handling
	rxrpc: Fix an insufficiently large sglist in rxkad_verify_packet_2()
	afs: Use the operation issue time instead of the reply time for callbacks
	Revert "net: phy: meson-gxl: improve link-up behavior"
	sch_sfb: Don't assume the skb is still around after enqueueing to child
	tipc: fix shift wrapping bug in map_get()
	net: introduce __skb_fill_page_desc_noacc
	tcp: TX zerocopy should not sense pfmemalloc status
	ice: use bitmap_free instead of devm_kfree
	i40e: Fix kernel crash during module removal
	iavf: Detach device during reset task
	xen-netback: only remove 'hotplug-status' when the vif is actually destroyed
	RDMA/siw: Pass a pointer to virt_to_page()
	ipv6: sr: fix out-of-bounds read when setting HMAC data.
	IB/core: Fix a nested dead lock as part of ODP flow
	RDMA/mlx5: Set local port to one when accessing counters
	erofs: fix pcluster use-after-free on UP platforms
	nvme-tcp: fix UAF when detecting digest errors
	nvme-tcp: fix regression that causes sporadic requests to time out
	tcp: fix early ETIMEDOUT after spurious non-SACK RTO
	nvmet: fix mar and mor off-by-one errors
	RDMA/irdma: Report the correct max cqes from query device
	RDMA/irdma: Return correct WC error for bind operation failure
	RDMA/irdma: Report RNR NAK generation in device caps
	sch_sfb: Also store skb len before calling child enqueue
	perf script: Fix Cannot print 'iregs' field for hybrid systems
	hwmon: (tps23861) fix byte order in resistance register
	ASoC: mchp-spdiftx: remove references to mchp_i2s_caps
	ASoC: mchp-spdiftx: Fix clang -Wbitfield-constant-conversion
	MIPS: loongson32: ls1c: Fix hang during startup
	kbuild: disable header exports for UML in a straightforward way
	i40e: Refactor tc mqprio checks
	i40e: Fix ADQ rate limiting for PF
	swiotlb: avoid potential left shift overflow
	iommu/amd: use full 64-bit value in build_completion_wait()
	s390/boot: fix absolute zero lowcore corruption on boot
	hwmon: (mr75203) fix VM sensor allocation when "intel,vm-map" not defined
	hwmon: (mr75203) update pvt->v_num and vm_num to the actual number of used sensors
	hwmon: (mr75203) fix voltage equation for negative source input
	hwmon: (mr75203) fix multi-channel voltage reading
	hwmon: (mr75203) enable polling for all VM channels
	Revert "arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags""
	arm64/bti: Disable in kernel BTI when cross section thunks are broken
	iommu/vt-d: Correctly calculate sagaw value of IOMMU
	arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
	drm/bridge: display-connector: implement bus fmts callbacks
	perf machine: Use path__join() to compose a path instead of snprintf(dir, '/', filename)
	ARM: at91: ddr: remove CONFIG_SOC_SAMA7 dependency
	Linux 5.15.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3e23c18230fda5af55fc5b73db9ac288835c8c23
2022-09-24 14:12:45 +02:00
Yishai Hadas
819110054b IB/core: Fix a nested dead lock as part of ODP flow
[ Upstream commit 85eaeb5058f0f04dffb124c97c86b4f18db0b833 ]

Fix a nested dead lock as part of ODP flow by using mmput_async().

From the below call trace [1] can see that calling mmput() once we have
the umem_odp->umem_mutex locked as required by
ib_umem_odp_map_dma_and_lock() might trigger in the same task the
exit_mmap()->__mmu_notifier_release()->mlx5_ib_invalidate_range() which
may dead lock when trying to lock the same mutex.

Moving to use mmput_async() will solve the problem as the above
exit_mmap() flow will be called in other task and will be executed once
the lock will be available.

[1]
[64843.077665] task:kworker/u133:2  state:D stack:    0 pid:80906 ppid:
2 flags:0x00004000
[64843.077672] Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
[64843.077719] Call Trace:
[64843.077722]  <TASK>
[64843.077724]  __schedule+0x23d/0x590
[64843.077729]  schedule+0x4e/0xb0
[64843.077735]  schedule_preempt_disabled+0xe/0x10
[64843.077740]  __mutex_lock.constprop.0+0x263/0x490
[64843.077747]  __mutex_lock_slowpath+0x13/0x20
[64843.077752]  mutex_lock+0x34/0x40
[64843.077758]  mlx5_ib_invalidate_range+0x48/0x270 [mlx5_ib]
[64843.077808]  __mmu_notifier_release+0x1a4/0x200
[64843.077816]  exit_mmap+0x1bc/0x200
[64843.077822]  ? walk_page_range+0x9c/0x120
[64843.077828]  ? __cond_resched+0x1a/0x50
[64843.077833]  ? mutex_lock+0x13/0x40
[64843.077839]  ? uprobe_clear_state+0xac/0x120
[64843.077860]  mmput+0x5f/0x140
[64843.077867]  ib_umem_odp_map_dma_and_lock+0x21b/0x580 [ib_core]
[64843.077931]  pagefault_real_mr+0x9a/0x140 [mlx5_ib]
[64843.077962]  pagefault_mr+0xb4/0x550 [mlx5_ib]
[64843.077992]  pagefault_single_data_segment.constprop.0+0x2ac/0x560
[mlx5_ib]
[64843.078022]  mlx5_ib_eqe_pf_action+0x528/0x780 [mlx5_ib]
[64843.078051]  process_one_work+0x22b/0x3d0
[64843.078059]  worker_thread+0x53/0x410
[64843.078065]  ? process_one_work+0x3d0/0x3d0
[64843.078073]  kthread+0x12a/0x150
[64843.078079]  ? set_kthread_struct+0x50/0x50
[64843.078085]  ret_from_fork+0x22/0x30
[64843.078093]  </TASK>

Fixes: 36f30e486d ("IB/core: Improve ODP to use hmm_range_fault()")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/74d93541ea533ef7daec6f126deb1072500aeb16.1661251841.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15 11:30:06 +02:00
Suren Baghdasaryan
4daa3c254e ANDROID: add vma->file_ref_count to synchronize vma->vm_file destruction
In order to prevent destruction of vma->vm_file while it's being used
during speculative page fault handling, introduce an atomic refcounter.

Bug: 234527424
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I0e971156f3e76feb45136bac1582a7eaab8c75df
2022-07-19 12:47:27 +00:00
Suren Baghdasaryan
0864756fb0 Revert "ANDROID: Use the notifier lock to perform file-backed vma teardown"
This reverts commit dc8ac508af.
Reason for revert: performance regression.

Bug: 234527424
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I5811c7ee2feac7f9057ed0ccd848e5de71f79354
2022-07-19 12:47:27 +00:00
Liangliang Li
bafafe0ec4 ANDROID: vendor_hooks: Add hooks to dup_task_struct
Add hook to dup_task_struct for vendor data fields initialisation.

Bug: 188004638

Change-Id: I4b58604ee822fb8d1e0cc37bec72e820e7318427
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit f66d96b14a)
2022-07-19 03:52:54 +00:00
Bart Van Assche
38c4790b88 ANDROID: Fix the CONFIG_ANDROID_VENDOR_OEM_DATA=n build
Scripts like
https://github.com/bvanassche/build-scsi-drivers/blob/main/build-scsi-drivers
do not set CONFIG_ANDROID_VENDOR_OEM_DATA. Hence this patch that
unbreaks the CONFIG_ANDROID_VENDOR_OEM_DATA=n build.

Fixes: 291dfda577 ("ANDROID: init_task: Init android vendor and oem data")
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: Ic8223e69495fce7e2d0531313856ea5ed21659b7
2022-07-19 03:52:52 +00:00
Maria Yu
34c53e1824 ANDROID: init_task: Init android vendor and oem data
Without initialization, it will be random data and hard for
vendor hook to decide.

Bug: 207739506
Change-Id: I278772d87eea38c03a40d4f0bef20ac8644e2ecd
Signed-off-by: Maria Yu <quic_aiquny@quicinc.com>
(cherry picked from commit 898e7ec950)
2022-07-19 03:52:49 +00:00
Suren Baghdasaryan
6072c99f21 ANDROID: Use the notifier lock to perform file-backed vma teardown
When a file-backed vma is being released, the userspace can have an
expectation that the vma and the file it's pinning will be released
synchronously. This does not happen when SPF is enabled because vma
and associated file are released asynchronously after RCU grace
period. This is done to prevent pagefault handler from stepping on
a deleted object. Fix this issue by synchronizing the file-backed
pagefault handler with the vma tear-down using notifier lock.

Fixes: 48e35d053f "FROMLIST: mm: rcu safe vma->vm_file freeing"
Bug: 231394031
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idabf44b8e5a91805e99d79884af77a000dca7637
2022-07-16 17:06:39 +00:00
Steve Muckle
23ef56f65c Revert "ANDROID: Make file-backed vma teardown synchronous"
This reverts commit fe25fc5375.

Reason for revert: test regressions
Bug: 232427425
Bug: 232421416
Change-Id: If7006fe6c3f1a55361099ef24927d4bc7c821b9c
Signed-off-by: Steve Muckle <smuckle@google.com>
2022-05-13 15:42:24 +00:00
Suren Baghdasaryan
fe25fc5375 ANDROID: Make file-backed vma teardown synchronous
When a file-backed vma is being released, the userspace can have an
expectation that the vma and the file it's pinning will be released
synchronously. This does not happen when SPF is enabled because vma
and associated file are released asynchronously after RCU grace
period. This is done to prevent pagefault handler from stepping on
a deleted object. Fix this issue by synchronously waiting for RCU
grace period during file-backed vma tear-down.

Fixes: 48e35d053f "FROMLIST: mm: rcu safe vma->vm_file freeing"
Bug: 231394031
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9f672d5bd947763c7d180a8c1b1f964600d407f3
2022-05-11 17:08:16 +00:00
Yu Zhao
5280d76d38 FROMLIST: mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page
tables to search for young PTEs and promote hot pages. A kill switch
will be added in the next patch to disable this behavior. When
disabled, the aging relies on the rmap only.

NB: this behavior has nothing similar with the page table scanning in
the 2.4 kernel [1], which searches page tables for old PTEs, adds cold
pages to swapcache and unmaps them.

To avoid confusion, the term "iteration" specifically means the
traversal of an entire mm_struct list; the term "walk" will be applied
to page tables and the rmap, as usual.

An mm_struct list is maintained for each memcg, and an mm_struct
follows its owner task to the new memcg when this task is migrated.
Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot
pages before it increments max_seq.

When multiple page table walkers iterate the same list, each of them
gets a unique mm_struct; therefore they can run concurrently. Page
table walkers ignore any misplaced pages, e.g., if an mm_struct was
migrated, pages it left in the previous memcg will not be promoted
when its current memcg is under reclaim. Similarly, page table walkers
will not promote pages from nodes other than the one under reclaim.

This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
   page table walkers can skip processes that have been sleeping since
   the last iteration.
2. It uses generational Bloom filters to record populated branches so
   that page table walkers can reduce their search space based on the
   query results, e.g., to skip page tables containing mostly holes or
   misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
   CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
   spanning multiple VMAs. IOW, it finishes all the VMAs within the
   range of the same PMD table before it returns to a PGD table. This
   improves the cache performance for workloads that have large
   numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.

Server benchmark results:
  Single workload:
    fio (buffered I/O): no change

  Single workload:
    memcached (anon): +[5.5, 7.5]%
                         Ops/sec      KB/sec
      patch1-7:          1014393.57   39455.42
      patch1-8:          1078507.59   41949.15

  Configurations:
    no change

Client benchmark results:
  kswapd profiles:
    patch1-7
      45.54%  lzo1x_1_do_compress (real work)
       9.56%  page_vma_mapped_walk
       6.70%  _raw_spin_unlock_irq
       2.78%  ptep_clear_flush
       2.47%  do_raw_spin_lock
       2.22%  __zram_bvec_write
       1.87%  lru_gen_look_around
       1.78%  memmove
       1.77%  obj_malloc
       1.44%  free_unref_page_list

    patch1-8
      47.02%  lzo1x_1_do_compress (real work)
       6.73%  page_vma_mapped_walk
       6.14%  _raw_spin_unlock_irq
       3.39%  walk_pte_range
       2.63%  ptep_clear_flush
       2.29%  __zram_bvec_write
       2.10%  do_raw_spin_lock
       1.81%  memmove
       1.73%  obj_malloc
       1.53%  free_unref_page_list

  Configurations:
    no change

[1] https://lwn.net/Articles/23732/
[2] https://source.android.com/devices/tech/debug/scudo

Link: https://lore.kernel.org/lkml/20220309021230.721028-9-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5a3c97cf8ebf8d65d5f9528cd979a637c190053e
2022-04-20 17:38:55 +00:00
Suren Baghdasaryan
0fd37220d8 UPSTREAM: mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 5c26f6ac9416b63d093e29c30e79b3297e425472)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4a6b5602ce7151d1a4b88fac489f86d68089bd4d
2022-03-24 18:44:39 -07:00
Michel Lespinasse
48e35d053f FROMLIST: mm: rcu safe vma->vm_file freeing
Defer freeing of vma->vm_file when freeing vmas.
This is to allow speculative page faults in the mapped file case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-24-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic766bc2086db82eae9f3aadf0f23dd743be1c464
2022-03-23 11:32:18 -07:00
Michel Lespinasse
12230588f3 FROMLIST: mm: disable rcu safe vma freeing for single threaded user space
Performance tuning: as single threaded userspace does not use
speculative page faults, it does not require rcu safe vma freeing.
Turn this off to avoid the related (small) extra overheads.

For multi threaded userspace, we often see a performance benefit from
the rcu safe vma freeing - even in tests that do not have any frequent
concurrent page faults ! This is because rcu safe vma freeing prevents
recently released vmas from being immediately reused in a new thread.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-30-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I81ef7ab43e2757f268c567d5bfe6ab02f1e43a1c
2022-03-23 11:32:17 -07:00
Michel Lespinasse
1ae855f191 FROMLIST: mm: add mmu_notifier_lock
Introduce mmu_notifier_lock as a per-mm percpu_rw_semaphore,
as well as the code to initialize and destroy it together with the mm.

This lock will be used to prevent races between mmu_notifier_register()
and speculative fault handlers that need to fire MMU notifications
without holding any of the mmap or rmap locks.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-24-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I453ebe979c8b9dcc6159b41c5ec7a1ea17d85ee2
2022-03-23 11:32:16 -07:00
Michel Lespinasse
67cc8ce9a6 FROMLIST: mm: rcu safe vma freeing
This prepares for speculative page faults looking up and copying vmas
under protection of an rcu read lock, instead of the usual mmap read lock.

Note - it might also be feasible to just use SLAB_TYPESAFE_BY_RCU when
creating the vm_area_cachep, but that's probably too subtle to consider here.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-12-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I992fddb7c32c61bb4ab10b387f91c4e54c2250ef
2022-03-23 11:32:14 -07:00
Peter Zijlstra
3411613611 sched: Fix yet more sched_fork() races
commit b1e8206582f9d680cff7d04828708c8b6ab32957 upstream.

Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.

Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.

Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:12:49 +01:00
Peter Zijlstra
42da9cb956 UPSTREAM: sched: Fix yet more sched_fork() races
Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.

Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.

Change-Id: I4d34311eac28b23ee32e9308a21c66afe8fa8a3b
Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
BUG: 221850698
(cherry picked from commit b1e8206582f9d680cff7d04828708c8b6ab32957)
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
2022-03-01 21:35:25 +00:00
Greg Kroah-Hartman
2ded03fd7c Merge 5.15.25 into android13-5.15
Changes in 5.15.25
	drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
	fs/proc: task_mmu.c: don't read mapcount for migration entry
	btrfs: zoned: cache reported zone during mount
	scsi: lpfc: Fix mailbox command failure during driver initialization
	HID:Add support for UGTABLET WP5540
	Revert "svm: Add warning message for AVIC IPI invalid target"
	parisc: Show error if wrong 32/64-bit compiler is being used
	serial: parisc: GSC: fix build when IOSAPIC is not set
	parisc: Drop __init from map_pages declaration
	parisc: Fix data TLB miss in sba_unmap_sg
	parisc: Fix sglist access in ccio-dma.c
	mmc: block: fix read single on recovery logic
	mm: don't try to NUMA-migrate COW pages that have other uses
	HID: amd_sfh: Add illuminance mask to limit ALS max value
	HID: i2c-hid: goodix: Fix a lockdep splat
	HID: amd_sfh: Increase sensor command timeout
	HID: amd_sfh: Correct the structure field name
	PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
	parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
	btrfs: send: in case of IO error log it
	platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
	platform/x86: ISST: Fix possible circular locking dependency detected
	kunit: tool: Import missing importlib.abc
	selftests: rtc: Increase test timeout so that all tests run
	kselftest: signal all child processes
	net: ieee802154: at86rf230: Stop leaking skb's
	selftests/zram: Skip max_comp_streams interface on newer kernel
	selftests/zram01.sh: Fix compression ratio calculation
	selftests/zram: Adapt the situation that /dev/zram0 is being used
	selftests: openat2: Print also errno in failure messages
	selftests: openat2: Add missing dependency in Makefile
	selftests: openat2: Skip testcases that fail with EOPNOTSUPP
	selftests: skip mincore.check_file_mmap when fs lacks needed support
	ax25: improve the incomplete fix to avoid UAF and NPD bugs
	pinctrl: bcm63xx: fix unmet dependency on REGMAP for GPIO_REGMAP
	vfs: make freeze_super abort when sync_filesystem returns error
	quota: make dquot_quota_sync return errors from ->sync_fs
	scsi: pm80xx: Fix double completion for SATA devices
	kselftest: Fix vdso_test_abi return status
	scsi: core: Reallocate device's budget map on queue depth change
	scsi: pm8001: Fix use-after-free for aborted TMF sas_task
	scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
	drm/amd: Warn users about potential s0ix problems
	nvme: fix a possible use-after-free in controller reset during load
	nvme-tcp: fix possible use-after-free in transport error_recovery work
	nvme-rdma: fix possible use-after-free in transport error_recovery work
	net: sparx5: do not refer to skb after passing it on
	drm/amd: add support to check whether the system is set to s3
	drm/amd: Only run s3 or s0ix if system is configured properly
	drm/amdgpu: fix logic inversion in check
	x86/Xen: streamline (and fix) PV CPU enumeration
	Revert "module, async: async_synchronize_full() on module init iff async is used"
	gcc-plugins/stackleak: Use noinstr in favor of notrace
	random: wake up /dev/random writers after zap
	KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
	KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
	KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
	KVM: x86: nSVM: fix potential NULL derefernce on nested migration
	KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
	iwlwifi: fix use-after-free
	drm/radeon: Fix backlight control on iMac 12,1
	drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
	drm/amd/pm: correct the sequence of sending gpu reset msg
	drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
	drm/i915/opregion: check port number bounds for SWSCI display power state
	drm/i915: Fix dbuf slice config lookup
	drm/i915: Fix mbus join config lookup
	vsock: remove vsock from connected table when connect is interrupted by a signal
	drm/cma-helper: Set VM_DONTEXPAND for mmap
	drm/i915/gvt: Make DRM_I915_GVT depend on X86
	drm/i915/ttm: tweak priority hint selection
	iwlwifi: pcie: fix locking when "HW not ready"
	iwlwifi: pcie: gen2: fix locking when "HW not ready"
	iwlwifi: mvm: don't send SAR GEO command for 3160 devices
	selftests: netfilter: fix exit value for nft_concat_range
	netfilter: nft_synproxy: unregister hooks on init error path
	selftests: netfilter: disable rp_filter on router
	ipv4: fix data races in fib_alias_hw_flags_set
	ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
	ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
	ipv6: per-netns exclusive flowlabel checks
	Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
	mac80211: mlme: check for null after calling kmemdup
	brcmfmac: firmware: Fix crash in brcm_alt_fw_path
	cfg80211: fix race in netlink owner interface destruction
	net: dsa: lan9303: fix reset on probe
	net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
	net: dsa: lantiq_gswip: fix use after free in gswip_remove()
	net: dsa: lan9303: handle hwaccel VLAN tags
	net: dsa: lan9303: add VLAN IDs to master device
	net: ieee802154: ca8210: Fix lifs/sifs periods
	ping: fix the dif and sdif check in ping_lookup
	bonding: force carrier update when releasing slave
	drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
	net_sched: add __rcu annotation to netdev->qdisc
	bonding: fix data-races around agg_select_timer
	libsubcmd: Fix use-after-free for realloc(..., 0)
	net/smc: Avoid overwriting the copies of clcsock callback functions
	net: phy: mediatek: remove PHY mode check on MT7531
	atl1c: fix tx timeout after link flap on Mikrotik 10/25G NIC
	tipc: fix wrong publisher node address in link publications
	dpaa2-switch: fix default return of dpaa2_switch_flower_parse_mirror_key
	dpaa2-eth: Initialize mutex used in one step timestamping path
	net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
	perf bpf: Defer freeing string after possible strlen() on it
	selftests/exec: Add non-regular to TEST_GEN_PROGS
	arm64: Correct wrong label in macro __init_el2_gicv3
	ALSA: usb-audio: revert to IMPLICIT_FB_FIXED_DEV for M-Audio FastTrack Ultra
	ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
	ALSA: hda/realtek: Fix deadlock by COEF mutex
	ALSA: hda: Fix regression on forced probe mask option
	ALSA: hda: Fix missing codec probe on Shenker Dock 15
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_sx()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_xr_sx()
	cifs: fix set of group SID via NTSD xattrs
	powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
	powerpc/lib/sstep: fix 'ptesync' build error
	mtd: rawnand: gpmi: don't leak PM reference in error path
	smb3: fix snapshot mount option
	tipc: fix wrong notification node addresses
	scsi: ufs: Remove dead code
	scsi: ufs: Fix a deadlock in the error handler
	ASoC: tas2770: Insert post reset delay
	ASoC: qcom: Actually clear DMA interrupt register for HDMI
	block/wbt: fix negative inflight counter when remove scsi device
	NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
	NFS: LOOKUP_DIRECTORY is also ok with symlinks
	NFS: Do not report writeback errors in nfs_getattr()
	tty: n_tty: do not look ahead for EOL character past the end of the buffer
	block: fix surprise removal for drivers calling blk_set_queue_dying
	mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
	mtd: parsers: qcom: Fix kernel panic on skipped partition
	mtd: parsers: qcom: Fix missing free for pparts in cleanup
	mtd: phram: Prevent divide by zero bug in phram_setup()
	mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
	HID: elo: fix memory leak in elo_probe
	mtd: rawnand: ingenic: Fix missing put_device in ingenic_ecc_get
	Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
	KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
	KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
	KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
	ARM: OMAP2+: hwmod: Add of_node_put() before break
	ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
	phy: usb: Leave some clocks running during suspend
	staging: vc04_services: Fix RCU dereference check
	phy: phy-mtk-tphy: Fix duplicated argument in phy-mtk-tphy
	irqchip/sifive-plic: Add missing thead,c900-plic match string
	x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
	netfilter: conntrack: don't refresh sctp entries in closed state
	ksmbd: fix same UniqueId for dot and dotdot entries
	ksmbd: don't align last entry offset in smb2 query directory
	arm64: dts: meson-gx: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
	pidfd: fix test failure due to stack overflow on some arches
	selftests: fixup build warnings in pidfd / clone3 tests
	mm: io_uring: allow oom-killer from io_uring_setup
	ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
	kconfig: let 'shell' return enough output for deep path names
	ata: libata-core: Disable TRIM on M88V29
	soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
	xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
	drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
	tracing: Fix tp_printk option related with tp_printk_stop_on_boot
	display/amd: decrease message verbosity about watermarks table failure
	drm/amd/display: Cap pflip irqs per max otg number
	drm/amd/display: fix yellow carp wm clamping
	net: usb: qmi_wwan: Add support for Dell DW5829e
	net: macb: Align the dma and coherent dma masks
	kconfig: fix failing to generate auto.conf
	scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
	EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
	ucounts: Handle wrapping in is_ucounts_overlimit
	ucounts: In set_cred_ucounts assume new->ucounts is non-NULL
	ucounts: Base set_cred_ucounts changes on the real user
	ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
	lib/iov_iter: initialize "flags" in new pipe_buffer
	rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
	ucounts: Move RLIMIT_NPROC handling after set_user
	net: sched: limit TC_ACT_REPEAT loops
	dmaengine: sh: rcar-dmac: Check for error num after setting mask
	dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
	dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
	tests: fix idmapped mount_setattr test
	i2c: qcom-cci: don't delete an unregistered adapter
	i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
	dmaengine: ptdma: Fix the error handling path in pt_core_init()
	copy_process(): Move fd_install() out of sighand->siglock critical section
	scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
	ice: enable parsing IPSEC SPI headers for RSS
	i2c: brcmstb: fix support for DSL and CM variants
	lockdep: Correct lock_classes index mapping
	Linux 5.15.25

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib129a0e11f5e82d67563329a5de1b0aef1d87928
2022-02-23 12:30:26 +01:00
Waiman Long
795feafb72 copy_process(): Move fd_install() out of sighand->siglock critical section
commit ddc204b517e60ae64db34f9832dc41dafa77c751 upstream.

I was made aware of the following lockdep splat:

[ 2516.308763] =====================================================
[ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
[ 2516.309703] -----------------------------------------------------
[ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
[ 2516.310944]
               and this task is already holding:
[ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
[ 2516.311804] which would create a new lock dependency:
[ 2516.312066]  (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
[ 2516.312446]
               but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2516.312983]  (&sighand->siglock){-.-.}-{2:2}
   :
[ 2516.330700]  Possible interrupt unsafe locking scenario:

[ 2516.331075]        CPU0                    CPU1
[ 2516.331328]        ----                    ----
[ 2516.331580]   lock(&newf->file_lock);
[ 2516.331790]                                local_irq_disable();
[ 2516.332231]                                lock(&sighand->siglock);
[ 2516.332579]                                lock(&newf->file_lock);
[ 2516.332922]   <Interrupt>
[ 2516.333069]     lock(&sighand->siglock);
[ 2516.333291]
                *** DEADLOCK ***
[ 2516.389845]
               stack backtrace:
[ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
[ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2516.391155] Call trace:
[ 2516.391302]  dump_backtrace+0x0/0x3e0
[ 2516.391518]  show_stack+0x24/0x30
[ 2516.391717]  dump_stack_lvl+0x9c/0xd8
[ 2516.391938]  dump_stack+0x1c/0x38
[ 2516.392247]  print_bad_irq_dependency+0x620/0x710
[ 2516.392525]  check_irq_usage+0x4fc/0x86c
[ 2516.392756]  check_prev_add+0x180/0x1d90
[ 2516.392988]  validate_chain+0x8e0/0xee0
[ 2516.393215]  __lock_acquire+0x97c/0x1e40
[ 2516.393449]  lock_acquire.part.0+0x240/0x570
[ 2516.393814]  lock_acquire+0x90/0xb4
[ 2516.394021]  _raw_spin_lock+0xe8/0x154
[ 2516.394244]  fd_install+0x368/0x4f0
[ 2516.394451]  copy_process+0x1f5c/0x3e80
[ 2516.394678]  kernel_clone+0x134/0x660
[ 2516.394895]  __do_sys_clone3+0x130/0x1f4
[ 2516.395128]  __arm64_sys_clone3+0x5c/0x7c
[ 2516.395478]  invoke_syscall.constprop.0+0x78/0x1f0
[ 2516.395762]  el0_svc_common.constprop.0+0x22c/0x2c4
[ 2516.396050]  do_el0_svc+0xb0/0x10c
[ 2516.396252]  el0_svc+0x24/0x34
[ 2516.396436]  el0t_64_sync_handler+0xa4/0x12c
[ 2516.396688]  el0t_64_sync+0x198/0x19c
[ 2517.491197] NET: Registered PF_ATMPVC protocol family
[ 2517.491524] NET: Registered PF_ATMSVC protocol family
[ 2591.991877] sched: RT throttling activated

One way to solve this problem is to move the fd_install() call out of
the sighand->siglock critical section.

Before commit 6fd2fe494b ("copy_process(): don't use ksys_close()
on cleanups"), the pidfd installation was done without holding both
the task_list lock and the sighand->siglock. Obviously, holding these
two locks are not really needed to protect the fd_install() call.
So move the fd_install() call down to after the releases of both locks.

Link: https://lore.kernel.org/r/20220208163912.1084752-1-longman@redhat.com
Fixes: 6fd2fe494b ("copy_process(): don't use ksys_close() on cleanups")
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:03:21 +01:00
Eric W. Biederman
2d2d92cfcd ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
commit 8f2f9c4d82f24f172ae439e5035fc1e0e4c229dd upstream.

Michal Koutný <mkoutny@suse.com> wrote:

> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is consequence of the commit 21d1c5e386 ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.

This can be fixed either by fixing the test or by moving the increment
to be before the test.  Fix it my moving copy_creds which contains
the increment before is_ucounts_overlimit.

In the case of CLONE_NEWUSER the ucounts in the task_cred changes.
The function is_ucounts_overlimit needs to use the final version of
the ucounts for the new process.  Which means moving the
is_ucounts_overlimit test after copy_creds is necessary.

Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts.  The change of the test in
fork was bad because it was before the increment.  The test in
set_user was wrong and the change to ucounts fixed it.  So this
fix only restores the old behavior in one lcation not two.

Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220216155832.680775-2-ebiederm@xmission.com
Cc: stable@vger.kernel.org
Reported-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:03:20 +01:00
Andrey Konovalov
cdb4c18935 FROMLIST: kasan, fork: reset pointer tags of vmapped stacks
[Combines a FROMGIT patch and a FROMLIST fix for it.]

Once tag-based KASAN modes start tagging vmalloc() allocations, kernel
stacks start getting tagged if CONFIG_VMAP_STACK is enabled.

Reset the tag of kernel stack pointers after allocation in
alloc_thread_stack_node().

For SW_TAGS KASAN, when CONFIG_KASAN_STACK is enabled, the instrumentation
can't handle the SP register being tagged.

For HW_TAGS KASAN, there's no instrumentation-related issues.  However,
the impact of having a tagged SP register needs to be properly evaluated,
so keep it non-tagged for now.

Note, that the memory for the stack allocation still gets tagged to catch
vmalloc-into-stack out-of-bounds accesses.

Link: https://lkml.kernel.org/r/c6c96f012371ecd80e1936509ebcd3b07a5956f7.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 9d2dae85d689202c56068ce62e20821ad91c3606
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/linux-mm/f50c5f96ef896d7936192c888b0c0a7674e33184.1644943792.git.andreyknvl@google.com/
Bug: 217222520
Change-Id: Ie723b03f1b857bc841cffc9a424b2791c97044a6
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2022-02-15 17:58:45 +01:00
Arnd Bergmann
049413278d UPSTREAM: mm: move anon_vma declarations to linux/mm_inline.h
The patch to add anonymous vma names causes a build failure in some
configurations:

  include/linux/mm_types.h: In function 'is_same_vma_anon_name':
  include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration]
    924 |         return name && vma_name && !strcmp(name, vma_name);
        |                                     ^~~~~~
  include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'?

This should not really be part of linux/mm_types.h in the first place,
as that header is meant to only contain structure defintions and need a
minimum set of indirect includes itself.

While the header clearly includes more than it should at this point,
let's not make it worse by including string.h as well, which would pull
in the expensive (compile-speed wise) fortify-string logic.

Move the new functions into a separate header that only needs to be
included in a couple of locations.

Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org
Fixes: "mm: add a field to store names for private anonymous memory"
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 17fca131cee21724ee953a17c185c14e9533af5b)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I54719d7ea27d3cf53ef7245b2af88d2a2bc9bafe
2022-01-18 16:01:48 -08:00
Colin Cross
301c56064d UPSTREAM: mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use.  At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.).  Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc.  This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages.  It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes.  It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace.  Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace.  It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas.  The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].

Userspace can set the name for a region of memory by calling

   prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)

Setting the name to NULL clears it.  The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.

Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps.  Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.

The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string.  Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name.  The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.

CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature.  It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.

The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names.  In that design, name
pointers could be shared between vmas.  However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.

One big concern is about fork() performance which would need to strdup
anonymous vma names.  Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4].  I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.

This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name.  Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.

[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/

Changes for prctl(2) manual page (in the options section):

PR_SET_VMA
	Sets an attribute specified in arg2 for virtual memory areas
	starting from the address specified in arg3 and spanning the
	size specified	in arg4. arg5 specifies the value of the attribute
	to be set. Note that assigning an attribute to a virtual memory
	area might prevent it from being merged with adjacent virtual
	memory areas due to the difference in that attribute's value.

	Currently, arg2 must be one of:

	PR_SET_VMA_ANON_NAME
		Set a name for anonymous virtual memory areas. arg5 should
		be a pointer to a null-terminated string containing the
		name. The name length including null byte cannot exceed
		80 bytes. If arg5 is NULL, the name of the appropriate
		anonymous virtual memory areas will be reset. The name
		can contain only printable ascii characters (including
                space), except '[',']','\','$' and '`'.

                This feature is available only if the kernel is built with
                the CONFIG_ANON_VMA_NAME option enabled.

[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
  Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
 added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
 work here was done by Colin Cross, therefore, with his permission, keeping
 him as the author]

Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9a10064f5625d5572c3626c1516e0bebc6c9fe9b)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I53d56d551a7d62f75341304751814294b447c04e
2022-01-18 15:30:27 -08:00
Greg Kroah-Hartman
36de88a855 Merge 5.15.3 into android13-5.15
Changes in 5.15.3
	xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
	usb: xhci: Enable runtime-pm by default on AMD Yellow Carp platform
	Input: iforce - fix control-message timeout
	Input: elantench - fix misreporting trackpoint coordinates
	Input: i8042 - Add quirk for Fujitsu Lifebook T725
	libata: fix read log timeout value
	ocfs2: fix data corruption on truncate
	scsi: scsi_ioctl: Validate command size
	scsi: core: Avoid leaving shost->last_reset with stale value if EH does not run
	scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd()
	scsi: lpfc: Don't release final kref on Fport node while ABTS outstanding
	scsi: lpfc: Fix FCP I/O flush functionality for TMF routines
	scsi: qla2xxx: Fix crash in NVMe abort path
	scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file
	scsi: qla2xxx: Fix use after free in eh_abort path
	ce/gf100: fix incorrect CE0 address calculation on some GPUs
	char: xillybus: fix msg_ep UAF in xillyusb_probe()
	mmc: mtk-sd: Add wait dma stop done flow
	mmc: dw_mmc: Dont wait for DRTO on Write RSP error
	exfat: fix incorrect loading of i_blocks for large files
	io-wq: remove worker to owner tw dependency
	parisc: Fix set_fixmap() on PA1.x CPUs
	parisc: Fix ptrace check on syscall return
	tpm: Check for integer overflow in tpm2_map_response_body()
	firmware/psci: fix application of sizeof to pointer
	crypto: s5p-sss - Add error handling in s5p_aes_probe()
	media: rkvdec: Do not override sizeimage for output format
	media: ite-cir: IR receiver stop working after receive overflow
	media: rkvdec: Support dynamic resolution changes
	media: ir-kbd-i2c: improve responsiveness of hauppauge zilog receivers
	media: v4l2-ioctl: Fix check_ext_ctrls
	ALSA: hda/realtek: Fix mic mute LED for the HP Spectre x360 14
	ALSA: hda/realtek: Add a quirk for HP OMEN 15 mute LED
	ALSA: hda/realtek: Add quirk for Clevo PC70HS
	ALSA: hda/realtek: Headset fixup for Clevo NH77HJQ
	ALSA: hda/realtek: Add a quirk for Acer Spin SP513-54N
	ALSA: hda/realtek: Add quirk for ASUS UX550VE
	ALSA: hda/realtek: Add quirk for HP EliteBook 840 G7 mute LED
	ALSA: ua101: fix division by zero at probe
	ALSA: 6fire: fix control and bulk message timeouts
	ALSA: line6: fix control and interrupt message timeouts
	ALSA: mixer: oss: Fix racy access to slots
	ALSA: mixer: fix deadlock in snd_mixer_oss_set_volume
	ALSA: usb-audio: Line6 HX-Stomp XL USB_ID for 48k-fixed quirk
	ALSA: usb-audio: Add registration quirk for JBL Quantum 400
	ALSA: hda: Free card instance properly at probe errors
	ALSA: synth: missing check for possible NULL after the call to kstrdup
	ALSA: pci: rme: Fix unaligned buffer addresses
	ALSA: PCM: Fix NULL dereference at mmap checks
	ALSA: timer: Fix use-after-free problem
	ALSA: timer: Unconditionally unlink slave instances, too
	Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"
	ext4: fix lazy initialization next schedule time computation in more granular unit
	ext4: ensure enough credits in ext4_ext_shift_path_extents
	ext4: refresh the ext4_ext_path struct after dropping i_data_sem.
	fuse: fix page stealing
	x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
	x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
	x86/irq: Ensure PI wakeup handler is unregistered before module unload
	x86/iopl: Fake iopl(3) CLI/STI usage
	btrfs: clear MISSING device status bit in btrfs_close_one_device
	btrfs: fix lost error handling when replaying directory deletes
	btrfs: call btrfs_check_rw_degradable only if there is a missing device
	KVM: x86/mmu: Drop a redundant, broken remote TLB flush
	KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetup
	KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
	ia64: kprobes: Fix to pass correct trampoline address to the handler
	selinux: fix race condition when computing ocontext SIDs
	ipmi:watchdog: Set panic count to proper value on a panic
	md/raid1: only allocate write behind bio for WriteMostly device
	hwmon: (pmbus/lm25066) Add offset coefficients
	regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is disabled
	regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-dvs-idx property
	EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
	mwifiex: fix division by zero in fw download path
	ath6kl: fix division by zero in send path
	ath6kl: fix control-message timeout
	ath10k: fix control-message timeout
	ath10k: fix division by zero in send path
	PCI: Mark Atheros QCA6174 to avoid bus reset
	rtl8187: fix control-message timeouts
	evm: mark evm_fixmode as __ro_after_init
	ifb: Depend on netfilter alternatively to tc
	platform/surface: aggregator_registry: Add support for Surface Laptop Studio
	mt76: mt7615: fix skb use-after-free on mac reset
	HID: surface-hid: Use correct event registry for managing HID events
	HID: surface-hid: Allow driver matching for target ID 1 devices
	wcn36xx: Fix HT40 capability for 2Ghz band
	wcn36xx: Fix tx_status mechanism
	wcn36xx: Fix (QoS) null data frame bitrate/modulation
	PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
	mwifiex: Read a PCI register after writing the TX ring write pointer
	mwifiex: Try waking the firmware until we get an interrupt
	libata: fix checking of DMA state
	dma-buf: fix and rework dma_buf_poll v7
	wcn36xx: handle connection loss indication
	rsi: fix occasional initialisation failure with BT coex
	rsi: fix key enabled check causing unwanted encryption for vap_id > 0
	rsi: fix rate mask set leading to P2P failure
	rsi: Fix module dev_oper_mode parameter description
	perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
	perf/x86/intel/uncore: Fix invalid unit check
	perf/x86/intel/uncore: Fix Intel ICX IIO event constraints
	RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
	ASoC: tegra: Set default card name for Trimslice
	ASoC: tegra: Restore AC97 support
	signal: Remove the bogus sigkill_pending in ptrace_stop
	memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode
	signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
	signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
	soc: samsung: exynos-pmu: Fix compilation when nothing selects CONFIG_MFD_CORE
	soc: fsl: dpio: replace smp_processor_id with raw_smp_processor_id
	soc: fsl: dpio: use the combined functions to protect critical zone
	mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines
	mctp: handle the struct sockaddr_mctp padding fields
	power: supply: max17042_battery: Prevent int underflow in set_soc_threshold
	power: supply: max17042_battery: use VFSOC for capacity when no rsns
	iio: core: fix double free in iio_device_unregister_sysfs()
	iio: core: check return value when calling dev_set_name()
	KVM: arm64: Extract ESR_ELx.EC only
	KVM: x86: Fix recording of guest steal time / preempted status
	KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
	KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
	KVM: nVMX: Handle dynamic MSR intercept toggling
	can: peak_usb: always ask for BERR reporting for PCAN-USB devices
	can: mcp251xfd: mcp251xfd_irq(): add missing can_rx_offload_threaded_irq_finish() in case of bus off
	can: j1939: j1939_tp_cmd_recv(): ignore abort message in the BAM transport
	can: j1939: j1939_can_recv(): ignore messages with invalid source address
	can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM
	iio: adc: tsc2046: fix scan interval warning
	powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
	io_uring: honour zeroes as io-wq worker limits
	ring-buffer: Protect ring_buffer_reset() from reentrancy
	serial: core: Fix initializing and restoring termios speed
	ifb: fix building without CONFIG_NET_CLS_ACT
	xen/balloon: add late_initcall_sync() for initial ballooning done
	ovl: fix use after free in struct ovl_aio_req
	ovl: fix filattr copy-up failure
	PCI: pci-bridge-emul: Fix emulation of W1C bits
	PCI: cadence: Add cdns_plat_pcie_probe() missing return
	cxl/pci: Fix NULL vs ERR_PTR confusion
	PCI: aardvark: Do not clear status bits of masked interrupts
	PCI: aardvark: Fix checking for link up via LTSSM state
	PCI: aardvark: Do not unmask unused interrupts
	PCI: aardvark: Fix reporting Data Link Layer Link Active
	PCI: aardvark: Fix configuring Reference clock
	PCI: aardvark: Fix return value of MSI domain .alloc() method
	PCI: aardvark: Read all 16-bits from PCIE_MSI_PAYLOAD_REG
	PCI: aardvark: Fix support for bus mastering and PCI_COMMAND on emulated bridge
	PCI: aardvark: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
	PCI: aardvark: Set PCI Bridge Class Code to PCI Bridge
	PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge
	quota: check block number when reading the block in quota file
	quota: correct error number in free_dqentry()
	cifs: To match file servers, make sure the server hostname matches
	cifs: set a minimum of 120s for next dns resolution
	mfd: simple-mfd-i2c: Select MFD_CORE to fix build error
	pinctrl: core: fix possible memory leak in pinctrl_enable()
	coresight: cti: Correct the parameter for pm_runtime_put
	coresight: trbe: Fix incorrect access of the sink specific data
	coresight: trbe: Defer the probe on offline CPUs
	iio: buffer: check return value of kstrdup_const()
	iio: buffer: Fix memory leak in iio_buffers_alloc_sysfs_and_mask()
	iio: buffer: Fix memory leak in __iio_buffer_alloc_sysfs_and_mask()
	iio: buffer: Fix memory leak in iio_buffer_register_legacy_sysfs_groups()
	drivers: iio: dac: ad5766: Fix dt property name
	iio: dac: ad5446: Fix ad5622_write() return value
	iio: ad5770r: make devicetree property reading consistent
	Documentation:devicetree:bindings:iio:dac: Fix val
	USB: serial: keyspan: fix memleak on probe errors
	serial: 8250: fix racy uartclk update
	ksmbd: set unique value to volume serial field in FS_VOLUME_INFORMATION
	io-wq: serialize hash clear with wakeup
	serial: 8250: Fix reporting real baudrate value in c_ospeed field
	Revert "serial: 8250: Fix reporting real baudrate value in c_ospeed field"
	most: fix control-message timeouts
	USB: iowarrior: fix control-message timeouts
	USB: chipidea: fix interrupt deadlock
	power: supply: max17042_battery: Clear status bits in interrupt handler
	component: do not leave master devres group open after bind
	dma-buf: WARN on dmabuf release with pending attachments
	drm: panel-orientation-quirks: Update the Lenovo Ideapad D330 quirk (v2)
	drm: panel-orientation-quirks: Add quirk for KD Kurio Smart C15200 2-in-1
	drm: panel-orientation-quirks: Add quirk for the Samsung Galaxy Book 10.6
	Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()
	Bluetooth: fix use-after-free error in lock_sock_nested()
	Bluetooth: call sock_hold earlier in sco_conn_del
	drm/panel-orientation-quirks: add Valve Steam Deck
	rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
	platform/x86: wmi: do not fail if disabling fails
	drm/amdgpu: move iommu_resume before ip init/resume
	MIPS: lantiq: dma: add small delay after reset
	MIPS: lantiq: dma: reset correct number of channel
	locking/lockdep: Avoid RCU-induced noinstr fail
	net: sched: update default qdisc visibility after Tx queue cnt changes
	ACPI: resources: Add DMI-based legacy IRQ override quirk
	rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
	smackfs: Fix use-after-free in netlbl_catmap_walk()
	ath11k: Align bss_chan_info structure with firmware
	crypto: aesni - check walk.nbytes instead of err
	x86/mm/64: Improve stack overflow warnings
	x86: Increase exception stack sizes
	mwifiex: Run SET_BSS_MODE when changing from P2P to STATION vif-type
	mwifiex: Properly initialize private structure on interface type changes
	spi: Check we have a spi_device_id for each DT compatible
	fscrypt: allow 256-bit master keys with AES-256-XTS
	drm/amdgpu: Fix MMIO access page fault
	drm/amd/display: Fix null pointer dereference for encoders
	selftests: net: fib_nexthops: Wait before checking reported idle time
	ath11k: Avoid reg rules update during firmware recovery
	ath11k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED
	ath11k: Change DMA_FROM_DEVICE to DMA_TO_DEVICE when map reinjected packets
	ath10k: high latency fixes for beacon buffer
	octeontx2-pf: Enable promisc/allmulti match MCAM entries.
	media: mt9p031: Fix corrupted frame after restarting stream
	media: netup_unidvb: handle interrupt properly according to the firmware
	media: atomisp: Fix error handling in probe
	media: stm32: Potential NULL pointer dereference in dcmi_irq_thread()
	media: uvcvideo: Set capability in s_param
	media: uvcvideo: Return -EIO for control errors
	media: uvcvideo: Set unique vdev name based in type
	media: vidtv: Fix memory leak in remove
	media: s5p-mfc: fix possible null-pointer dereference in s5p_mfc_probe()
	media: s5p-mfc: Add checking to s5p_mfc_probe().
	media: videobuf2: rework vb2_mem_ops API
	media: imx: set a media_device bus_info string
	media: rcar-vin: Use user provided buffers when starting
	media: mceusb: return without resubmitting URB in case of -EPROTO error.
	ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
	rtw88: fix RX clock gate setting while fifo dump
	brcmfmac: Add DMI nvram filename quirk for Cyberbook T116 tablet
	media: rcar-csi2: Add checking to rcsi2_start_receiver()
	ipmi: Disable some operations during a panic
	fs/proc/uptime.c: Fix idle time reporting in /proc/uptime
	kselftests/sched: cleanup the child processes
	ACPICA: Avoid evaluating methods too early during system resume
	cpufreq: Make policy min/max hard requirements
	ice: Move devlink port to PF/VF struct
	media: imx-jpeg: Fix possible null pointer dereference
	media: ipu3-imgu: imgu_fmt: Handle properly try
	media: ipu3-imgu: VIDIOC_QUERYCAP: Fix bus_info
	media: usb: dvd-usb: fix uninit-value bug in dibusb_read_eeprom_byte()
	net-sysfs: try not to restart the syscall if it will fail eventually
	drm/amdkfd: rm BO resv on validation to avoid deadlock
	tracefs: Have tracefs directories not set OTH permission bits by default
	tracing: Disable "other" permission bits in the tracefs files
	ath: dfs_pattern_detector: Fix possible null-pointer dereference in channel_detector_create()
	KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall
	mmc: moxart: Fix reference count leaks in moxart_probe
	iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
	ACPI: battery: Accept charges over the design capacity as full
	ACPI: scan: Release PM resources blocked by unused objects
	drm/amd/display: fix null pointer deref when plugging in display
	drm/amdkfd: fix resume error when iommu disabled in Picasso
	net: phy: micrel: make *-skew-ps check more lenient
	leaking_addresses: Always print a trailing newline
	thermal/core: Fix null pointer dereference in thermal_release()
	drm/msm: prevent NULL dereference in msm_gpu_crashstate_capture()
	thermal/drivers/tsens: Add timeout to get_temp_tsens_valid
	block: bump max plugged deferred size from 16 to 32
	floppy: fix calling platform_device_unregister() on invalid drives
	md: update superblock after changing rdev flags in state_store
	memstick: r592: Fix a UAF bug when removing the driver
	locking/rwsem: Disable preemption for spinning region
	lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
	lib/xz: Validate the value before assigning it to an enum variable
	workqueue: make sysfs of unbound kworker cpumask more clever
	tracing/cfi: Fix cmp_entries_* functions signature mismatch
	mt76: mt7915: fix an off-by-one bound check
	mwl8k: Fix use-after-free in mwl8k_fw_state_machine()
	iwlwifi: change all JnP to NO-160 configuration
	block: remove inaccurate requeue check
	media: allegro: ignore interrupt if mailbox is not initialized
	drm/amdgpu/pm: properly handle sclk for profiling modes on vangogh
	nvmet: fix use-after-free when a port is removed
	nvmet-rdma: fix use-after-free when a port is removed
	nvmet-tcp: fix use-after-free when a port is removed
	nvme: drop scan_lock and always kick requeue list when removing namespaces
	samples/bpf: Fix application of sizeof to pointer
	arm64: vdso32: suppress error message for 'make mrproper'
	PM: hibernate: Get block device exclusively in swsusp_check()
	selftests: kvm: fix mismatched fclose() after popen()
	selftests/bpf: Fix perf_buffer test on system with offline cpus
	iwlwifi: mvm: disable RX-diversity in powersave
	smackfs: use __GFP_NOFAIL for smk_cipso_doi()
	ARM: clang: Do not rely on lr register for stacktrace
	gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
	can: bittiming: can_fixup_bittiming(): change type of tseg1 and alltseg to unsigned int
	gfs2: Cancel remote delete work asynchronously
	gfs2: Fix glock_hash_walk bugs
	ARM: 9136/1: ARMv7-M uses BE-8, not BE-32
	tools/latency-collector: Use correct size when writing queue_full_warning
	vrf: run conntrack only in context of lower/physdev for locally generated packets
	net: annotate data-race in neigh_output()
	ACPI: AC: Quirk GK45 to skip reading _PSR
	ACPI: resources: Add one more Medion model in IRQ override quirk
	btrfs: reflink: initialize return value to 0 in btrfs_extent_same()
	btrfs: do not take the uuid_mutex in btrfs_rm_device
	spi: bcm-qspi: Fix missing clk_disable_unprepare() on error in bcm_qspi_probe()
	wcn36xx: Correct band/freq reporting on RX
	wcn36xx: Fix packet drop on resume
	Revert "wcn36xx: Enable firmware link monitoring"
	ftrace: do CPU checking after preemption disabled
	inet: remove races in inet{6}_getname()
	x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
	drm/amd/display: dcn20_resource_construct reduce scope of FPU enabled
	selftests/core: fix conflicting types compile error for close_range()
	perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings
	parisc: fix warning in flush_tlb_all
	task_stack: Fix end_of_stack() for architectures with upwards-growing stack
	erofs: don't trigger WARN() when decompression fails
	parisc/unwind: fix unwinder when CONFIG_64BIT is enabled
	parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling
	netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream state
	selftests/bpf: Fix strobemeta selftest regression
	fbdev/efifb: Release PCI device's runtime PM ref during FB destroy
	drm/bridge: anx7625: Propagate errors from sp_tx_rst_aux()
	perf/x86/intel/uncore: Fix Intel SPR CHA event constraints
	perf/x86/intel/uncore: Fix Intel SPR IIO event constraints
	perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints
	perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints
	drm/bridge: it66121: Initialize {device,vendor}_ids
	drm/bridge: it66121: Wait for next bridge to be probed
	Bluetooth: fix init and cleanup of sco_conn.timeout_work
	libbpf: Don't crash on object files with no symbol tables
	Bluetooth: hci_uart: fix GPF in h5_recv
	rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
	MIPS: lantiq: dma: fix burst length for DEU
	x86/xen: Mark cpu_bringup_and_idle() as dead_end_function
	objtool: Handle __sanitize_cov*() tail calls
	net/mlx5: Publish and unpublish all devlink parameters at once
	drm/v3d: fix wait for TMU write combiner flush
	crypto: sm4 - Do not change section of ck and sbox
	virtio-gpu: fix possible memory allocation failure
	lockdep: Let lock_is_held_type() detect recursive read as read
	net: net_namespace: Fix undefined member in key_remove_domain()
	net: phylink: don't call netif_carrier_off() with NULL netdev
	drm: bridge: it66121: Fix return value it66121_probe
	spi: Fixed division by zero warning
	cgroup: Make rebind_subsystems() disable v2 controllers all at once
	wcn36xx: Fix Antenna Diversity Switching
	wilc1000: fix possible memory leak in cfg_scan_result()
	Bluetooth: btmtkuart: fix a memleak in mtk_hci_wmt_sync
	drm/amdgpu: Fix crash on device remove/driver unload
	drm/amd/display: Pass display_pipe_params_st as const in DML
	drm/amdgpu: move amdgpu_virt_release_full_gpu to fini_early stage
	crypto: caam - disable pkc for non-E SoCs
	crypto: qat - power up 4xxx device
	Bluetooth: hci_h5: Fix (runtime)suspend issues on RTL8723BS HCIs
	bnxt_en: Check devlink allocation and registration status
	qed: Don't ignore devlink allocation failures
	rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()
	mptcp: do not shrink snd_nxt when recovering
	fortify: Fix dropped strcpy() compile-time write overflow check
	mac80211: twt: don't use potentially unaligned pointer
	cfg80211: always free wiphy specific regdomain
	net/mlx5: Accept devlink user input after driver initialization complete
	net: dsa: rtl8366rb: Fix off-by-one bug
	net: dsa: rtl8366: Fix a bug in deleting VLANs
	bpf/tests: Fix error in tail call limit tests
	ath11k: fix some sleeping in atomic bugs
	ath11k: Avoid race during regd updates
	ath11k: fix packet drops due to incorrect 6 GHz freq value in rx status
	ath11k: Fix memory leak in ath11k_qmi_driver_event_work
	gve: DQO: avoid unused variable warnings
	ath10k: Fix missing frame timestamp for beacon/probe-resp
	ath10k: sdio: Add missing BH locking around napi_schdule()
	drm/ttm: stop calling tt_swapin in vm_access
	arm64: mm: update max_pfn after memory hotplug
	drm/amdgpu: fix warning for overflow check
	libbpf: Fix skel_internal.h to set errno on loader retval < 0
	media: em28xx: add missing em28xx_close_extension
	media: meson-ge2d: Fix rotation parameter changes detection in 'ge2d_s_ctrl()'
	media: cxd2880-spi: Fix a null pointer dereference on error handling path
	media: ttusb-dec: avoid release of non-acquired mutex
	media: dvb-usb: fix ununit-value in az6027_rc_query
	media: imx258: Fix getting clock frequency
	media: v4l2-ioctl: S_CTRL output the right value
	media: mtk-vcodec: venc: fix return value when start_streaming fails
	media: TDA1997x: handle short reads of hdmi info frame.
	media: mtk-vpu: Fix a resource leak in the error handling path of 'mtk_vpu_probe()'
	media: imx-jpeg: Fix the error handling path of 'mxc_jpeg_probe()'
	media: i2c: ths8200 needs V4L2_ASYNC
	media: sun6i-csi: Allow the video device to be open multiple times
	media: radio-wl1273: Avoid card name truncation
	media: si470x: Avoid card name truncation
	media: tm6000: Avoid card name truncation
	media: cx23885: Fix snd_card_free call on null card pointer
	media: atmel: fix the ispck initialization
	scs: Release kasan vmalloc poison in scs_free process
	kprobes: Do not use local variable when creating debugfs file
	crypto: ecc - fix CRYPTO_DEFAULT_RNG dependency
	drm: fb_helper: fix CONFIG_FB dependency
	cpuidle: Fix kobject memory leaks in error paths
	media: em28xx: Don't use ops->suspend if it is NULL
	ath10k: Don't always treat modem stop events as crashes
	ath9k: Fix potential interrupt storm on queue reset
	PM: EM: Fix inefficient states detection
	x86/insn: Use get_unaligned() instead of memcpy()
	EDAC/amd64: Handle three rank interleaving mode
	rcu: Always inline rcu_dynticks_task*_{enter,exit}()
	rcu: Fix rcu_dynticks_curr_cpu_in_eqs() vs noinstr
	netfilter: nft_dynset: relax superfluous check on set updates
	media: venus: fix vpp frequency calculation for decoder
	media: dvb-frontends: mn88443x: Handle errors of clk_prepare_enable()
	crypto: ccree - avoid out-of-range warnings from clang
	crypto: qat - detect PFVF collision after ACK
	crypto: qat - disregard spurious PFVF interrupts
	hwrng: mtk - Force runtime pm ops for sleep ops
	ima: fix deadlock when traversing "ima_default_rules".
	b43legacy: fix a lower bounds test
	b43: fix a lower bounds test
	gve: Recover from queue stall due to missed IRQ
	gve: Track RX buffer allocation failures
	mmc: sdhci-omap: Fix NULL pointer exception if regulator is not configured
	mmc: sdhci-omap: Fix context restore
	memstick: avoid out-of-range warning
	memstick: jmb38x_ms: use appropriate free function in jmb38x_ms_alloc_host()
	net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE
	hwmon: Fix possible memleak in __hwmon_device_register()
	hwmon: (pmbus/lm25066) Let compiler determine outer dimension of lm25066_coeff
	ath10k: fix max antenna gain unit
	kernel/sched: Fix sched_fork() access an invalid sched_task_group
	net: fealnx: fix build for UML
	net: intel: igc_ptp: fix build for UML
	net: tulip: winbond-840: fix build for UML
	tcp: switch orphan_count to bare per-cpu counters
	crypto: octeontx2 - set assoclen in aead_do_fallback()
	thermal/core: fix a UAF bug in __thermal_cooling_device_register()
	drm/msm/dsi: do not enable irq handler before powering up the host
	drm/msm: Fix potential Oops in a6xx_gmu_rpmh_init()
	drm/msm: potential error pointer dereference in init()
	drm/msm: unlock on error in get_sched_entity()
	drm/msm: fix potential NULL dereference in cleanup
	drm/msm: uninitialized variable in msm_gem_import()
	net: stream: don't purge sk_error_queue in sk_stream_kill_queues()
	thermal/drivers/qcom/lmh: make QCOM_LMH depends on QCOM_SCM
	mailbox: Remove WARN_ON for async_cb.cb in cmdq_exec_done
	media: ivtv: fix build for UML
	media: ir_toy: assignment to be16 should be of correct type
	mmc: mxs-mmc: disable regulator on error and in the remove function
	io-wq: Remove duplicate code in io_workqueue_create()
	block: ataflop: fix breakage introduced at blk-mq refactoring
	blk-wbt: prevent NULL pointer dereference in wb_timer_fn
	platform/x86: thinkpad_acpi: Fix bitwise vs. logical warning
	mailbox: mtk-cmdq: Validate alias_id on probe
	mailbox: mtk-cmdq: Fix local clock ID usage
	ACPI: PM: Turn off unused wakeup power resources
	ACPI: PM: Fix sharing of wakeup power resources
	drm/amdkfd: Fix an inappropriate error handling in allloc memory of gpu
	mt76: mt7921: fix endianness in mt7921_mcu_tx_done_event
	mt76: mt7915: fix endianness warning in mt7915_mac_add_txs_skb
	mt76: mt7921: fix endianness warning in mt7921_update_txs
	mt76: mt7615: fix endianness warning in mt7615_mac_write_txwi
	mt76: mt7915: fix info leak in mt7915_mcu_set_pre_cal()
	mt76: connac: fix mt76_connac_gtk_rekey_tlv usage
	mt76: fix build error implicit enumeration conversion
	mt76: mt7921: fix survey-dump reporting
	mt76: mt76x02: fix endianness warnings in mt76x02_mac.c
	mt76: mt7921: Fix out of order process by invalid event pkt
	mt76: mt7915: fix potential overflow of eeprom page index
	mt76: mt7915: fix bit fields for HT rate idx
	mt76: mt7921: fix dma hang in rmmod
	mt76: connac: fix GTK rekey offload failure on WPA mixed mode
	mt76: overwrite default reg_ops if necessary
	mt76: mt7921: report HE MU radiotap
	mt76: mt7921: fix firmware usage of RA info using legacy rates
	mt76: mt7921: fix kernel warning from cfg80211_calculate_bitrate
	mt76: mt7921: always wake device if necessary in debugfs
	mt76: mt7915: fix hwmon temp sensor mem use-after-free
	mt76: mt7615: fix hwmon temp sensor mem use-after-free
	mt76: mt7915: fix possible infinite loop release semaphore
	mt76: mt7921: fix retrying release semaphore without end
	mt76: mt7615: fix monitor mode tear down crash
	mt76: connac: fix possible NULL pointer dereference in mt76_connac_get_phy_mode_v2
	mt76: mt7915: fix sta_rec_wtbl tag len
	mt76: mt7915: fix muar_idx in mt7915_mcu_alloc_sta_req()
	rsi: stop thread firstly in rsi_91x_init() error handling
	mwifiex: Send DELBA requests according to spec
	iwlwifi: mvm: reset PM state on unsuccessful resume
	iwlwifi: pnvm: don't kmemdup() more than we have
	iwlwifi: pnvm: read EFI data only if long enough
	net: enetc: unmap DMA in enetc_send_cmd()
	phy: micrel: ksz8041nl: do not use power down mode
	nbd: Fix use-after-free in pid_show
	nvme-rdma: fix error code in nvme_rdma_setup_ctrl
	PM: hibernate: fix sparse warnings
	clocksource/drivers/timer-ti-dm: Select TIMER_OF
	x86/sev: Fix stack type check in vc_switch_off_ist()
	drm/msm: Fix potential NULL dereference in DPU SSPP
	drm/msm/dsi: fix wrong type in msm_dsi_host
	crypto: tcrypt - fix skcipher multi-buffer tests for 1420B blocks
	smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi
	KVM: selftests: Fix nested SVM tests when built with clang
	libbpf: Fix memory leak in btf__dedup()
	bpftool: Avoid leaking the JSON writer prepared for program metadata
	libbpf: Fix overflow in BTF sanity checks
	libbpf: Fix BTF header parsing checks
	mt76: mt7615: mt7622: fix ibss and meshpoint
	s390/gmap: validate VMA in __gmap_zap()
	s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
	s390/mm: validate VMA in PGSTE manipulation functions
	s390/mm: fix VMA and page table handling code in storage key handling functions
	s390/uv: fully validate the VMA before calling follow_page()
	KVM: s390: pv: avoid double free of sida page
	KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
	irq: mips: avoid nested irq_enter()
	net: dsa: avoid refcount warnings when ->port_{fdb,mdb}_del returns error
	ARM: 9142/1: kasan: work around LPAE build warning
	ath10k: fix module load regression with iram-recovery feature
	block: ataflop: more blk-mq refactoring fixes
	blk-cgroup: synchronize blkg creation against policy deactivation
	libbpf: Fix off-by-one bug in bpf_core_apply_relo()
	tpm: fix Atmel TPM crash caused by too frequent queries
	tpm_tis_spi: Add missing SPI ID
	libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()
	tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
	tracing: Fix missing trace_boot_init_histograms kstrdup NULL checks
	cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
	spi: spi-rpc-if: Check return value of rpcif_sw_init()
	samples/kretprobes: Fix return value if register_kretprobe() failed
	KVM: s390: Fix handle_sske page fault handling
	libertas_tf: Fix possible memory leak in probe and disconnect
	libertas: Fix possible memory leak in probe and disconnect
	wcn36xx: add proper DMA memory barriers in rx path
	wcn36xx: Fix discarded frames due to wrong sequence number
	bpf: Avoid races in __bpf_prog_run() for 32bit arches
	bpf: Fixes possible race in update_prog_stats() for 32bit arches
	wcn36xx: Channel list update before hardware scan
	drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()
	drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
	selftests/bpf: Fix fd cleanup in sk_lookup test
	selftests/bpf: Fix memory leak in test_ima
	sctp: allow IP fragmentation when PLPMTUD enters Error state
	sctp: reset probe_timer in sctp_transport_pl_update
	sctp: subtract sctphdr len in sctp_transport_pl_hlen
	sctp: return true only for pathmtu update in sctp_transport_pl_toobig
	net: amd-xgbe: Toggle PLL settings during rate change
	ipmi: kcs_bmc: Fix a memory leak in the error handling path of 'kcs_bmc_serio_add_device()'
	nfp: fix NULL pointer access when scheduling dim work
	nfp: fix potential deadlock when canceling dim work
	net: phylink: avoid mvneta warning when setting pause parameters
	net: bridge: fix uninitialized variables when BRIDGE_CFM is disabled
	selftests: net: bridge: update IGMP/MLD membership interval value
	crypto: pcrypt - Delay write to padata->info
	selftests/bpf: Fix fclose/pclose mismatch in test_progs
	udp6: allow SO_MARK ctrl msg to affect routing
	ibmvnic: don't stop queue in xmit
	ibmvnic: Process crqs after enabling interrupts
	ibmvnic: delay complete()
	selftests: mptcp: fix proto type in link_failure tests
	skmsg: Lose offset info in sk_psock_skb_ingress
	cgroup: Fix rootcg cpu.stat guest double counting
	bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
	bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
	of: unittest: fix EXPECT text for gpio hog errors
	cpufreq: Fix parameter in parse_perf_domain()
	staging: r8188eu: fix memory leak in rtw_set_key
	arm64: dts: meson: sm1: add Ethernet PHY reset line for ODROID-C4/HC4
	iio: st_sensors: disable regulators after device unregistration
	RDMA/rxe: Fix wrong port_cap_flags
	ARM: dts: BCM5301X: Fix memory nodes names
	arm64: dts: broadcom: bcm4908: Fix UART clock name
	clk: mvebu: ap-cpu-clk: Fix a memory leak in error handling paths
	scsi: pm80xx: Fix lockup in outbound queue management
	scsi: qla2xxx: edif: Use link event to wake up app
	scsi: lpfc: Fix NVMe I/O failover to non-optimized path
	ARM: s3c: irq-s3c24xx: Fix return value check for s3c24xx_init_intc()
	arm64: dts: rockchip: Fix GPU register width for RK3328
	ARM: dts: qcom: msm8974: Add xo_board reference clock to DSI0 PHY
	RDMA/bnxt_re: Fix query SRQ failure
	arm64: dts: ti: k3-j721e-main: Fix "max-virtual-functions" in PCIe EP nodes
	arm64: dts: ti: k3-j721e-main: Fix "bus-range" upto 256 bus number for PCIe
	arm64: dts: ti: j7200-main: Fix "vendor-id"/"device-id" properties of pcie node
	arm64: dts: ti: j7200-main: Fix "bus-range" upto 256 bus number for PCIe
	arm64: dts: meson-g12a: Fix the pwm regulator supply properties
	arm64: dts: meson-g12b: Fix the pwm regulator supply properties
	arm64: dts: meson-sm1: Fix the pwm regulator supply properties
	bus: ti-sysc: Fix timekeeping_suspended warning on resume
	ARM: dts: at91: tse850: the emac<->phy interface is rmii
	arm64: dts: qcom: sc7180: Base dynamic CPU power coefficients in reality
	soc: qcom: llcc: Disable MMUHWT retention
	arm64: dts: qcom: sc7280: fix display port phy reg property
	scsi: dc395: Fix error case unwinding
	MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
	JFS: fix memleak in jfs_mount
	pinctrl: renesas: rzg2l: Fix missing port register 21h
	ASoC: wcd9335: Use correct version to initialize Class H
	arm64: dts: qcom: msm8916: Fix Secondary MI2S bit clock
	arm64: dts: renesas: beacon: Fix Ethernet PHY mode
	iommu/mediatek: Fix out-of-range warning with clang
	arm64: dts: qcom: pm8916: Remove wrong reg-names for rtc@6000
	iommu/dma: Fix sync_sg with swiotlb
	iommu/dma: Fix arch_sync_dma for map
	ALSA: hda: Reduce udelay() at SKL+ position reporting
	ALSA: hda: Use position buffer for SKL+ again
	ALSA: usb-audio: Fix possible race at sync of urb completions
	soundwire: debugfs: use controller id and link_id for debugfs
	power: reset: at91-reset: check properly the return value of devm_of_iomap
	scsi: ufs: core: Fix ufshcd_probe_hba() prototype to match the definition
	scsi: ufs: core: Stop clearing UNIT ATTENTIONS
	scsi: megaraid_sas: Fix concurrent access to ISR between IRQ polling and real interrupt
	scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
	driver core: Fix possible memory leak in device_link_add()
	arm: dts: omap3-gta04a4: accelerometer irq fix
	ASoC: SOF: topology: do not power down primary core during topology removal
	iio: st_pressure_spi: Add missing entries SPI to device ID table
	soc/tegra: Fix an error handling path in tegra_powergate_power_up()
	memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe
	clk: at91: check pmc node status before registering syscore ops
	powerpc/mem: Fix arch/powerpc/mm/mem.c:53:12: error: no previous prototype for 'create_section_mapping'
	video: fbdev: chipsfb: use memset_io() instead of memset()
	powerpc: fix unbalanced node refcount in check_kvm_guest()
	powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()
	serial: 8250_dw: Drop wrong use of ACPI_PTR()
	usb: gadget: hid: fix error code in do_config()
	power: supply: rt5033_battery: Change voltage values to µV
	power: supply: max17040: fix null-ptr-deref in max17040_probe()
	scsi: csiostor: Uninitialized data in csio_ln_vnp_read_cbfn()
	RDMA/mlx4: Return missed an error if device doesn't support steering
	usb: musb: select GENERIC_PHY instead of depending on it
	staging: most: dim2: do not double-register the same device
	staging: ks7010: select CRYPTO_HASH/CRYPTO_MICHAEL_MIC
	RDMA/core: Set sgtable nents when using ib_dma_virt_map_sg()
	dyndbg: make dyndbg a known cli param
	powerpc/perf: Fix cycles/instructions as PM_CYC/PM_INST_CMPL in power10
	pinctrl: renesas: checker: Fix off-by-one bug in drive register check
	ARM: dts: stm32: Reduce DHCOR SPI NOR frequency to 50 MHz
	ARM: dts: stm32: fix STUSB1600 Type-C irq level on stm32mp15xx-dkx
	ARM: dts: stm32: fix SAI sub nodes register range
	ARM: dts: stm32: fix AV96 board SAI2 pin muxing on stm32mp15
	ASoC: cs42l42: Always configure both ASP TX channels
	ASoC: cs42l42: Correct some register default values
	ASoC: cs42l42: Defer probe if request_threaded_irq() returns EPROBE_DEFER
	soc: qcom: rpmhpd: Make power_on actually enable the domain
	soc: qcom: socinfo: add two missing PMIC IDs
	iio: buffer: Fix double-free in iio_buffers_alloc_sysfs_and_mask()
	usb: typec: STUSB160X should select REGMAP_I2C
	iio: adis: do not disabe IRQs in 'adis_init()'
	soundwire: bus: stop dereferencing invalid slave pointer
	scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
	scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset
	serial: imx: fix detach/attach of serial console
	usb: dwc2: drd: fix dwc2_force_mode call in dwc2_ovr_init
	usb: dwc2: drd: fix dwc2_drd_role_sw_set when clock could be disabled
	usb: dwc2: drd: reset current session before setting the new one
	powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE
	usb: dwc3: gadget: Skip resizing EP's TX FIFO if already resized
	firmware: qcom_scm: Fix error retval in __qcom_scm_is_call_available()
	soc: qcom: rpmhpd: fix sm8350_mxc's peer domain
	soc: qcom: apr: Add of_node_put() before return
	arm64: dts: qcom: pmi8994: Fix "eternal"->"external" typo in WLED node
	arm64: dts: qcom: sdm845: Use RPMH_CE_CLK macro directly
	arm64: dts: qcom: sdm845: Fix Qualcomm crypto engine bus clock
	pinctrl: equilibrium: Fix function addition in multiple groups
	ASoC: topology: Fix stub for snd_soc_tplg_component_remove()
	phy: qcom-qusb2: Fix a memory leak on probe
	phy: ti: gmii-sel: check of_get_address() for failure
	phy: qcom-qmp: another fix for the sc8180x PCIe definition
	phy: qcom-snps: Correct the FSEL_MASK
	phy: Sparx5 Eth SerDes: Fix return value check in sparx5_serdes_probe()
	serial: xilinx_uartps: Fix race condition causing stuck TX
	clk: at91: sam9x60-pll: use DIV_ROUND_CLOSEST_ULL
	clk: at91: clk-master: check if div or pres is zero
	clk: at91: clk-master: fix prescaler logic
	HID: u2fzero: clarify error check and length calculations
	HID: u2fzero: properly handle timeouts in usb_submit_urb
	powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
	powerpc/book3e: Fix set_memory_x() and set_memory_nx()
	powerpc/44x/fsp2: add missing of_node_put
	powerpc/xmon: fix task state output
	ALSA: oxfw: fix functional regression for Mackie Onyx 1640i in v5.14 or later
	iommu/dma: Fix incorrect error return on iommu deferred attach
	powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
	ASoC: cs42l42: Correct configuring of switch inversion from ts-inv
	RDMA/hns: Fix initial arm_st of CQ
	RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility
	ASoC: rsnd: Fix an error handling path in 'rsnd_node_count()'
	serial: cpm_uart: Protect udbg definitions by CONFIG_SERIAL_CPM_CONSOLE
	virtio_ring: check desc == NULL when using indirect with packed
	vdpa/mlx5: Fix clearing of VIRTIO_NET_F_MAC feature bit
	mips: cm: Convert to bitfield API to fix out-of-bounds access
	power: supply: bq27xxx: Fix kernel crash on IRQ handler register error
	RDMA/core: Require the driver to set the IOVA correctly during rereg_mr
	apparmor: fix error check
	rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
	mtd: rawnand: intel: Fix potential buffer overflow in probe
	nfsd: don't alloc under spinlock in rpc_parse_scope_id
	rtc: ds1302: Add SPI ID table
	rtc: ds1390: Add SPI ID table
	rtc: pcf2123: Add SPI ID table
	remoteproc: imx_rproc: Fix TCM io memory type
	i2c: i801: Use PCI bus rescan mutex to protect P2SB access
	dmaengine: idxd: move out percpu_ref_exit() to ensure it's outside submission
	rtc: mcp795: Add SPI ID table
	Input: ariel-pwrbutton - add SPI device ID table
	i2c: mediatek: fixing the incorrect register offset
	NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED
	NFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA
	NFS: Ignore the directory size when marking for revalidation
	NFS: Fix dentry verifier races
	pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
	drm/bridge/lontium-lt9611uxc: fix provided connector suport
	drm/plane-helper: fix uninitialized variable reference
	PCI: aardvark: Don't spam about PIO Response Status
	PCI: aardvark: Fix preserving PCI_EXP_RTCTL_CRSSVE flag on emulated bridge
	opp: Fix return in _opp_add_static_v2()
	NFS: Fix deadlocks in nfs_scan_commit_list()
	sparc: Add missing "FORCE" target when using if_changed
	fs: orangefs: fix error return code of orangefs_revalidate_lookup()
	Input: st1232 - increase "wait ready" timeout
	drm/bridge: nwl-dsi: Add atomic_get_input_bus_fmts
	mtd: spi-nor: hisi-sfc: Remove excessive clk_disable_unprepare()
	PCI: uniphier: Serialize INTx masking/unmasking and fix the bit operation
	mtd: rawnand: arasan: Prevent an unsupported configuration
	mtd: core: don't remove debugfs directory if device is in use
	remoteproc: Fix a memory leak in an error handling path in 'rproc_handle_vdev()'
	rtc: rv3032: fix error handling in rv3032_clkout_set_rate()
	dmaengine: at_xdmac: call at_xdmac_axi_config() on resume path
	dmaengine: at_xdmac: fix AT_XDMAC_CC_PERID() macro
	dmaengine: stm32-dma: fix stm32_dma_get_max_width
	NFS: Fix up commit deadlocks
	NFS: Fix an Oops in pnfs_mark_request_commit()
	Fix user namespace leak
	auxdisplay: img-ascii-lcd: Fix lock-up when displaying empty string
	auxdisplay: ht16k33: Connect backlight to fbdev
	auxdisplay: ht16k33: Fix frame buffer device blanking
	soc: fsl: dpaa2-console: free buffer before returning from dpaa2_console_read
	netfilter: nfnetlink_queue: fix OOB when mac header was cleared
	dmaengine: dmaengine_desc_callback_valid(): Check for `callback_result`
	dmaengine: tegra210-adma: fix pm runtime unbalance
	dmanegine: idxd: fix resource free ordering on driver removal
	dmaengine: idxd: reconfig device after device reset command
	signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
	m68k: set a default value for MEMORY_RESERVE
	watchdog: f71808e_wdt: fix inaccurate report in WDIOC_GETTIMEOUT
	ar7: fix kernel builds for compiler test
	scsi: target: core: Remove from tmr_list during LUN unlink
	scsi: qla2xxx: Relogin during fabric disturbance
	scsi: qla2xxx: Fix gnl list corruption
	scsi: qla2xxx: Turn off target reset during issue_lip
	scsi: qla2xxx: edif: Fix app start fail
	scsi: qla2xxx: edif: Fix app start delay
	scsi: qla2xxx: edif: Flush stale events and msgs on session down
	scsi: qla2xxx: edif: Increase ELS payload
	scsi: qla2xxx: edif: Fix EDIF bsg
	NFSv4: Fix a regression in nfs_set_open_stateid_locked()
	dmaengine: idxd: fix resource leak on dmaengine driver disable
	i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
	gpio: realtek-otto: fix GPIO line IRQ offset
	xen-pciback: Fix return in pm_ctrl_init()
	nbd: fix max value for 'first_minor'
	nbd: fix possible overflow for 'first_minor' in nbd_dev_add()
	io-wq: fix max-workers not correctly set on multi-node system
	net: davinci_emac: Fix interrupt pacing disable
	kselftests/net: add missed icmp.sh test to Makefile
	kselftests/net: add missed setup_loopback.sh/setup_veth.sh to Makefile
	kselftests/net: add missed SRv6 tests
	kselftests/net: add missed vrf_strict_mode_test.sh test to Makefile
	kselftests/net: add missed toeplitz.sh/toeplitz_client.sh to Makefile
	ethtool: fix ethtool msg len calculation for pause stats
	openrisc: fix SMP tlb flush NULL pointer dereference
	net: vlan: fix a UAF in vlan_dev_real_dev()
	net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge
	ice: Fix replacing VF hardware MAC to existing MAC filter
	ice: Fix not stopping Tx queues for VFs
	kdb: Adopt scheduler's task classification
	ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
	PCI: j721e: Fix j721e_pcie_probe() error path
	nvdimm/btt: do not call del_gendisk() if not needed
	scsi: bsg: Fix errno when scsi_bsg_register_queue() fails
	scsi: ufs: ufshpb: Use proper power management API
	scsi: ufs: core: Fix NULL pointer dereference
	scsi: ufs: ufshpb: Properly handle max-single-cmd
	selftests: net: properly support IPv6 in GSO GRE test
	drm/nouveau/svm: Fix refcount leak bug and missing check against null bug
	nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned
	block/ataflop: use the blk_cleanup_disk() helper
	block/ataflop: add registration bool before calling del_gendisk()
	block/ataflop: provide a helper for cleanup up an atari disk
	ataflop: remove ataflop_probe_lock mutex
	PCI: Do not enable AtomicOps on VFs
	cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline
	net: phy: fix duplex out of sync problem while changing settings
	block: fix device_add_disk() kobject_create_and_add() error handling
	drm/ttm: remove ttm_bo_vm_insert_huge()
	bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed
	octeontx2-pf: select CONFIG_NET_DEVLINK
	ALSA: memalloc: Catch call with NULL snd_dma_buffer pointer
	mfd: core: Add missing of_node_put for loop iteration
	mfd: cpcap: Add SPI device ID table
	mfd: sprd: Add SPI device ID table
	mfd: altera-sysmgr: Fix a mistake caused by resource_size conversion
	ACPI: PM: Fix device wakeup power reference counting error
	libbpf: Fix lookup_and_delete_elem_flags error reporting
	selftests/bpf/xdp_redirect_multi: Put the logs to tmp folder
	selftests/bpf/xdp_redirect_multi: Use arping to accurate the arp number
	selftests/bpf/xdp_redirect_multi: Give tcpdump a chance to terminate cleanly
	selftests/bpf/xdp_redirect_multi: Limit the tests in netns
	drm: fb_helper: improve CONFIG_FB dependency
	Revert "drm/imx: Annotate dma-fence critical section in commit path"
	drm/amdgpu/powerplay: fix sysfs_emit/sysfs_emit_at handling
	can: etas_es58x: es58x_rx_err_msg(): fix memory leak in error path
	can: mcp251xfd: mcp251xfd_chip_start(): fix error handling for mcp251xfd_chip_rx_int_enable()
	mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
	zram: off by one in read_block_state()
	perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
	llc: fix out-of-bound array index in llc_sk_dev_hash()
	nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails
	litex_liteeth: Fix a double free in the remove function
	arm64: arm64_ftr_reg->name may not be a human-readable string
	arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
	bpf, sockmap: Remove unhash handler for BPF sockmap usage
	bpf, sockmap: Fix race in ingress receive verdict with redirect to self
	bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding
	bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg
	dmaengine: stm32-dma: fix burst in case of unaligned memory address
	dmaengine: stm32-dma: avoid 64-bit division in stm32_dma_get_max_width
	gve: Fix off by one in gve_tx_timeout()
	drm/i915/fb: Fix rounding error in subsampled plane size calculation
	init: make unknown command line param message clearer
	seq_file: fix passing wrong private data
	drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
	net: dsa: mv88e6xxx: Don't support >1G speeds on 6191X on ports other than 10
	net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any
	net: hns3: fix ROCE base interrupt vector initialization bug
	net: hns3: fix pfc packet number incorrect after querying pfc parameters
	net: hns3: fix kernel crash when unload VF while it is being reset
	net: hns3: allow configure ETS bandwidth of all TCs
	net: stmmac: allow a tc-taprio base-time of zero
	net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory
	net: marvell: mvpp2: Fix wrong SerDes reconfiguration order
	vsock: prevent unnecessary refcnt inc for nonblocking connect
	net/smc: fix sk_refcnt underflow on linkdown and fallback
	cxgb4: fix eeprom len when diagnostics not implemented
	selftests/net: udpgso_bench_rx: fix port argument
	thermal: int340x: fix build on 32-bit targets
	smb3: do not error on fsync when readonly
	ARM: 9155/1: fix early early_iounmap()
	ARM: 9156/1: drop cc-option fallbacks for architecture selection
	parisc: Fix backtrace to always include init funtion names
	parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
	MIPS: fix duplicated slashes for Platform file path
	MIPS: fix *-pkg builds for loongson2ef platform
	MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL
	x86/mce: Add errata workaround for Skylake SKX37
	PCI/MSI: Move non-mask check back into low level accessors
	PCI/MSI: Destroy sysfs before freeing entries
	KVM: x86: move guest_pv_has out of user_access section
	posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
	irqchip/sifive-plic: Fixup EOI failed when masked
	f2fs: should use GFP_NOFS for directory inodes
	f2fs: include non-compressed blocks in compr_written_block
	f2fs: fix UAF in f2fs_available_free_memory
	ceph: fix mdsmap decode when there are MDS's beyond max_mds
	erofs: fix unsafe pagevec reuse of hooked pclusters
	drm/i915/guc: Fix blocked context accounting
	block: Hold invalidate_lock in BLKDISCARD ioctl
	block: Hold invalidate_lock in BLKZEROOUT ioctl
	block: Hold invalidate_lock in BLKRESETZONE ioctl
	ksmbd: Fix buffer length check in fsctl_validate_negotiate_info()
	ksmbd: don't need 8byte alignment for request length in ksmbd_check_message
	dmaengine: ti: k3-udma: Set bchan to NULL if a channel request fail
	dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request fail
	dmaengine: bestcomm: fix system boot lockups
	net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE
	9p/net: fix missing error check in p9_check_errors
	mm/filemap.c: remove bogus VM_BUG_ON
	memcg: prohibit unconditional exceeding the limit of dying tasks
	mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
	mm, oom: do not trigger out_of_memory from the #PF
	mm, thp: lock filemap when truncating page cache
	mm, thp: fix incorrect unmap behavior for private pages
	mfd: dln2: Add cell for initializing DLN2 ADC
	video: backlight: Drop maximum brightness override for brightness zero
	bcache: fix use-after-free problem in bcache_device_free()
	bcache: Revert "bcache: use bvec_virt"
	PM: sleep: Avoid calling put_device() under dpm_list_mtx
	s390/cpumf: cpum_cf PMU displays invalid value after hotplug remove
	s390/cio: check the subchannel validity for dev_busid
	s390/tape: fix timer initialization in tape_std_assign()
	s390/ap: Fix hanging ioctl caused by orphaned replies
	s390/cio: make ccw_device_dma_* more robust
	remoteproc: elf_loader: Fix loading segment when is_iomem true
	remoteproc: Fix the wrong default value of is_iomem
	remoteproc: imx_rproc: Fix ignoring mapping vdev regions
	remoteproc: imx_rproc: Fix rsc-table name
	mtd: rawnand: fsmc: Fix use of SM ORDER
	mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines
	powerpc/vas: Fix potential NULL pointer dereference
	powerpc/bpf: Fix write protecting JIT code
	powerpc/32e: Ignore ESR in instruction storage interrupt handler
	powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
	powerpc/security: Use a mutex for interrupt exit code patching
	powerpc/64s/interrupt: Fix check_return_regs_valid() false positive
	powerpc/pseries/mobility: ignore ibm, platform-facilities updates
	powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
	drm/sun4i: Fix macros in sun8i_csc.h
	PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros
	PCI: aardvark: Fix PCIe Max Payload Size setting
	SUNRPC: Partial revert of commit 6f9f17287e
	drm/amd/display: Look at firmware version to determine using dmub on dcn21
	media: vidtv: move kfree(dvb) to vidtv_bridge_dev_release()
	cifs: fix memory leak of smb3_fs_context_dup::server_hostname
	ath10k: fix invalid dma_addr_t token assignment
	mmc: moxart: Fix null pointer dereference on pointer host
	selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage
	selftests/bpf: Fix also no-alu32 strobemeta selftest
	arch/cc: Introduce a function to check for confidential computing features
	x86/sev: Add an x86 version of cc_platform_has()
	x86/sev: Make the #VC exception stacks part of the default stacks storage
	media: videobuf2: always set buffer vb2 pointer
	media: videobuf2-dma-sg: Fix buf->vb NULL pointer dereference
	Linux 5.15.3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I09574eb6b4fbe930bd13f932cc618846972fcc27
2021-11-19 15:38:07 +01:00
Michael Pratt
d5d21724af posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
commit ca7752caeaa70bd31d1714af566c9809688544af upstream.

copy_process currently copies task_struct.posix_cputimers_work as-is. If a
timer interrupt arrives while handling clone and before dup_task_struct
completes then the child task will have:

1. posix_cputimers_work.scheduled = true
2. posix_cputimers_work.work queued.

copy_process clears task_struct.task_works, so (2) will have no effect and
posix_cpu_timers_work will never run (not to mention it doesn't make sense
for two tasks to share a common linked list).

Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is
never cleared. Since scheduled is set, future timer interrupts will skip
scheduling work, with the ultimate result that the task will never receive
timer expirations.

Together, the complete flow is:

1. Task 1 calls clone(), enters kernel.
2. Timer interrupt fires, schedules task work on Task 1.
   2a. task_struct.posix_cputimers_work.scheduled = true
   2b. task_struct.posix_cputimers_work.work added to
       task_struct.task_works.
3. dup_task_struct() copies Task 1 to Task 2.
4. copy_process() clears task_struct.task_works for Task 2.
5. Future timer interrupts on Task 2 see
   task_struct.posix_cputimers_work.scheduled = true and skip scheduling
   work.

Fix this by explicitly clearing contents of task_struct.posix_cputimers_work
in copy_process(). This was never meant to be shared or inherited across
tasks in the first place.

Fixes: 1fb497dd00 ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work")
Reported-by: Rhys Hiltner <rhys@justin.tv>
Signed-off-by: Michael Pratt <mpratt@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:14 +01:00
Zhang Qiao
3869eecf05 kernel/sched: Fix sched_fork() access an invalid sched_task_group
[ Upstream commit 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a ]

There is a small race between copy_process() and sched_fork()
where child->sched_task_group point to an already freed pointer.

	parent doing fork()      | someone moving the parent
				 | to another cgroup
  -------------------------------+-------------------------------
  copy_process()
      + dup_task_struct()<1>
				  parent move to another cgroup,
				  and free the old cgroup. <2>
      + sched_fork()
	+ __set_task_cpu()<3>
	+ task_fork_fair()
	  + sched_slice()<4>

In the worst case, this bug can lead to "use-after-free" and
cause panic as shown above:

  (1) parent copy its sched_task_group to child at <1>;

  (2) someone move the parent to another cgroup and free the old
      cgroup at <2>;

  (3) the sched_task_group and cfs_rq that belong to the old cgroup
      will be accessed at <3> and <4>, which cause a panic:

  [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  [] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0
  [] Oops: 0000 [#1] SMP PTI
  [] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0.x86_64+ #1
  [] RIP: 0010:sched_slice+0x84/0xc0

  [] Call Trace:
  []  task_fork_fair+0x81/0x120
  []  sched_fork+0x132/0x240
  []  copy_process.part.5+0x675/0x20e0
  []  ? __handle_mm_fault+0x63f/0x690
  []  _do_fork+0xcd/0x3b0
  []  do_syscall_64+0x5d/0x1d0
  []  entry_SYSCALL_64_after_hwframe+0x65/0xca
  [] RIP: 0033:0x7f04418cd7e1

Between cgroup_can_fork() and cgroup_post_fork(), the cgroup
membership and thus sched_task_group can't change. So update child's
sched_task_group at sched_post_fork() and move task_fork() and
__set_task_cpu() (where accees the sched_task_group) from sched_fork()
to sched_post_fork().

Fixes: 8323f26ce3 ("sched: Fix race in task_group")
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20210915064030.2231-1-zhangqiao22@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:32 +01:00
lijianzhong
bc70904edc ANDROID: sched: Add vendor hooks for sched.
Add vendor hooks in scheduler to support OEM's value adds.

Bug: 183674818
Signed-off-by: lijianzhong <lijianzhong@xiaomi.com>
Change-Id: I8415958749948b3702e411f835c227ad4f8d8e92
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
2021-10-18 15:11:09 +00:00
Greg Kroah-Hartman
e74ef7cf8f Merge tag 'v5.15-rc1' into android-mainline
Linux 5.15-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib4933c598d27b18268860e7549966ef7724652fc
2021-09-16 09:51:19 +02:00
Greg Kroah-Hartman
c0d1ebaba1 Merge 2d338201d5 ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
Steps on the way to 5.15-rc1

Resolves merge conflict in:
	fs/proc/base.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic554ca8447961e52fbc6f27d91470a816b59a771
2021-09-15 14:34:48 +02:00
Greg Kroah-Hartman
c5cd945b24 Merge fd47ff55c9 ("Merge tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") into android-mainline
Steps on the way to 5.15-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I42ffa8818bbb2072f043923553c4d8f91d9647a5
2021-09-14 14:42:51 +02:00
Liu Zixian
13db8c5047 mm/hugetlb: initialize hugetlb_usage in mm_init
After fork, the child process will get incorrect (2x) hugetlb_usage.  If
a process uses 5 2MB hugetlb pages in an anonymous mapping,

	HugetlbPages:	   10240 kB

and then forks, the child will show,

	HugetlbPages:	   20480 kB

The reason for double the amount is because hugetlb_usage will be copied
from the parent and then increased when we copy page tables from parent
to child.  Child will have 2x actual usage.

Fix this by adding hugetlb_count_init in mm_init.

Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes: 5d317b2b65 ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 18:45:53 -07:00
Linus Torvalds
2d338201d5 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "147 patches, based on 7d2a07b769.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
  checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
  selftests, ipc, and scripts"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  scripts: check_extable: fix typo in user error message
  mm/workingset: correct kernel-doc notations
  ipc: replace costly bailout check in sysvipc_find_ipc()
  selftests/memfd: remove unused variable
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  configs: remove the obsolete CONFIG_INPUT_POLLDEV
  prctl: allow to setup brk for et_dyn executables
  pid: cleanup the stale comment mentioning pidmap_init().
  kernel/fork.c: unexport get_{mm,task}_exe_file
  coredump: fix memleak in dump_vma_snapshot()
  fs/coredump.c: log if a core dump is aborted due to changed file permissions
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  trap: cleanup trap_init()
  init: move usermodehelper_enable() to populate_rootfs()
  ...
2021-09-08 12:55:35 -07:00
Christoph Hellwig
05da8113c9 kernel/fork.c: unexport get_{mm,task}_exe_file
Only used by core code and the tomoyo which can't be a module either.

Link: https://lkml.kernel.org/r/20210820095430.445242-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:28 -07:00
Greg Kroah-Hartman
bc2f6edebd Merge 9e9fb7655e ("Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline
Steps on the way to 5.15-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I49577d606b2710975407eae3fee60bc331397810
2021-09-07 14:40:30 +02:00
Linus Torvalds
49624efa65 Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux
Pull MAP_DENYWRITE removal from David Hildenbrand:
 "Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
  VM_DENYWRITE.

  There are some (minor) user-visible changes:

   - We no longer deny write access to shared libaries loaded via legacy
     uselib(); this behavior matches modern user space e.g. dlopen().

   - We no longer deny write access to the elf interpreter after exec
     completed, treating it just like shared libraries (which it often
     is).

   - We always deny write access to the file linked via /proc/pid/exe:
     sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
     file cannot be denied, and write access to the file will remain
     denied until the link is effectivel gone (exec, termination,
     sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.

  Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
  s390x, ...) and verified via ltp that especially the relevant tests
  (i.e., creat07 and execve04) continue working as expected"

* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
  fs: update documentation of get_write_access() and friends
  mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
  mm: remove VM_DENYWRITE
  binfmt: remove in-tree usage of MAP_DENYWRITE
  kernel/fork: always deny write access to current MM exe_file
  kernel/fork: factor out replacing the current MM exe_file
  binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
2021-09-04 11:35:47 -07:00
David Hildenbrand
8d0920bde5 mm: remove VM_DENYWRITE
All in-tree users of MAP_DENYWRITE are gone. MAP_DENYWRITE cannot be
set from user space, so all users are gone; let's remove it.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2021-09-03 18:42:01 +02:00
David Hildenbrand
fe69d560b5 kernel/fork: always deny write access to current MM exe_file
We want to remove VM_DENYWRITE only currently only used when mapping the
executable during exec. During exec, we already deny_write_access() the
executable, however, after exec completes the VMAs mapped
with VM_DENYWRITE effectively keeps write access denied via
deny_write_access().

Let's deny write access when setting or replacing the MM exe_file. With
this change, we can remove VM_DENYWRITE for mapping executables.

Make set_mm_exe_file() return an error in case deny_write_access()
fails; note that this should never happen, because exec code does a
deny_write_access() early and keeps write access denied when calling
set_mm_exe_file. However, it makes the code easier to read and makes
set_mm_exe_file() and replace_mm_exe_file() look more similar.

This represents a minor user space visible change:
sys_prctl(PR_SET_MM_MAP/EXE_FILE) can now fail if the file is already
opened writable. Also, after sys_prctl(PR_SET_MM_MAP/EXE_FILE) the file
cannot be opened writable. Note that we can already fail with -EACCES if
the file doesn't have execute permissions.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2021-09-03 18:42:01 +02:00
David Hildenbrand
35d7bdc860 kernel/fork: factor out replacing the current MM exe_file
Let's factor the main logic out into replace_mm_exe_file(), such that
all mm->exe_file logic is contained in kernel/fork.c.

While at it, perform some simple cleanups that are possible now that
we're simplifying the individual functions.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2021-09-03 18:42:01 +02:00
Greg Kroah-Hartman
506f6a3567 Merge 5d3c0db459 ("Merge tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Steps on the way to 5.15-rc1.

Resolves merge conflicts with:
	kernel/fork.c
	kernel/sched/core.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I73e7bfc310639dae4ca5df7b63d47fa32a171760
2021-09-02 11:02:37 +02:00
Greg Kroah-Hartman
571976d07a Merge tag 'v5.14' into android-mainline
Linux 5.14

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie2edcf6ad1f7d17ffbe5322bf2378ff01be31e64
2021-09-01 17:33:51 +02:00
Linus Torvalds
9e9fb7655e Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core:

   - Enable memcg accounting for various networking objects.

  BPF:

   - Introduce bpf timers.

   - Add perf link and opaque bpf_cookie which the program can read out
     again, to be used in libbpf-based USDT library.

   - Add bpf_task_pt_regs() helper to access user space pt_regs in
     kprobes, to help user space stack unwinding.

   - Add support for UNIX sockets for BPF sockmap.

   - Extend BPF iterator support for UNIX domain sockets.

   - Allow BPF TCP congestion control progs and bpf iterators to call
     bpf_setsockopt(), e.g. to switch to another congestion control
     algorithm.

  Protocols:

   - Support IOAM Pre-allocated Trace with IPv6.

   - Support Management Component Transport Protocol.

   - bridge: multicast: add vlan support.

   - netfilter: add hooks for the SRv6 lightweight tunnel driver.

   - tcp:
       - enable mid-stream window clamping (by user space or BPF)
       - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
       - more accurate DSACK processing for RACK-TLP

   - mptcp:
       - add full mesh path manager option
       - add partial support for MP_FAIL
       - improve use of backup subflows
       - optimize option processing

   - af_unix: add OOB notification support.

   - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
     router.

   - mac80211: Target Wake Time support in AP mode.

   - can: j1939: extend UAPI to notify about RX status.

  Driver APIs:

   - Add page frag support in page pool API.

   - Many improvements to the DSA (distributed switch) APIs.

   - ethtool: extend IRQ coalesce uAPI with timer reset modes.

   - devlink: control which auxiliary devices are created.

   - Support CAN PHYs via the generic PHY subsystem.

   - Proper cross-chip support for tag_8021q.

   - Allow TX forwarding for the software bridge data path to be
     offloaded to capable devices.

  Drivers:

   - veth: more flexible channels number configuration.

   - openvswitch: introduce per-cpu upcall dispatch.

   - Add internet mix (IMIX) mode to pktgen.

   - Transparently handle XDP operations in the bonding driver.

   - Add LiteETH network driver.

   - Renesas (ravb):
       - support Gigabit Ethernet IP

   - NXP Ethernet switch (sja1105):
       - fast aging support
       - support for "H" switch topologies
       - traffic termination for ports under VLAN-aware bridge

   - Intel 1G Ethernet
       - support getcrosststamp() with PCIe PTM (Precision Time
         Measurement) for better time sync
       - support Credit-Based Shaper (CBS) offload, enabling HW traffic
         prioritization and bandwidth reservation

   - Broadcom Ethernet (bnxt)
       - support pulse-per-second output
       - support larger Rx rings

   - Mellanox Ethernet (mlx5)
       - support ethtool RSS contexts and MQPRIO channel mode
       - support LAG offload with bridging
       - support devlink rate limit API
       - support packet sampling on tunnels

   - Huawei Ethernet (hns3):
       - basic devlink support
       - add extended IRQ coalescing support
       - report extended link state

   - Netronome Ethernet (nfp):
       - add conntrack offload support

   - Broadcom WiFi (brcmfmac):
       - add WPA3 Personal with FT to supported cipher suites
       - support 43752 SDIO device

   - Intel WiFi (iwlwifi):
       - support scanning hidden 6GHz networks
       - support for a new hardware family (Bz)

   - Xen pv driver:
       - harden netfront against malicious backends

   - Qualcomm mobile
       - ipa: refactor power management and enable automatic suspend
       - mhi: move MBIM to WWAN subsystem interfaces

  Refactor:

   - Ambient BPF run context and cgroup storage cleanup.

   - Compat rework for ndo_ioctl.

  Old code removal:

   - prism54 remove the obsoleted driver, deprecated by the p54 driver.

   - wan: remove sbni/granch driver"

* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
  net: Add depends on OF_NET for LiteX's LiteETH
  ipv6: seg6: remove duplicated include
  net: hns3: remove unnecessary spaces
  net: hns3: add some required spaces
  net: hns3: clean up a type mismatch warning
  net: hns3: refine function hns3_set_default_feature()
  ipv6: remove duplicated 'net/lwtunnel.h' include
  net: w5100: check return value after calling platform_get_resource()
  net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
  net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
  net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
  fou: remove sparse errors
  ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
  octeontx2-af: Set proper errorcode for IPv4 checksum errors
  octeontx2-af: Fix static code analyzer reported issues
  octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
  octeontx2-af: Fix loop in free and unmap counter
  af_unix: fix potential NULL deref in unix_dgram_connect()
  dpaa2-eth: Replace strlcpy with strscpy
  octeontx2-af: Use NDC TX for transmit packet data
  ...
2021-08-31 16:43:06 -07:00
Linus Torvalds
5d3c0db459 Merge tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - The biggest change in this cycle is scheduler support for asymmetric
   scheduling affinity, to support the execution of legacy 32-bit tasks
   on AArch32 systems that also have 64-bit-only CPUs.

   Architectures can fill in this functionality by defining their own
   task_cpu_possible_mask(p). When this is done, the scheduler will make
   sure the task will only be scheduled on CPUs that support it.

   (The actual arm64 specific changes are not part of this tree.)

   For other architectures there will be no change in functionality.

 - Add cgroup SCHED_IDLE support

 - Increase node-distance flexibility & delay determining it until a CPU
   is brought online. (This enables platforms where node distance isn't
   final until the CPU is only.)

 - Deadline scheduler enhancements & fixes

 - Misc fixes & cleanups.

* tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  eventfd: Make signal recursion protection a task bit
  sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case
  sched: Introduce dl_task_check_affinity() to check proposed affinity
  sched: Allow task CPU affinity to be restricted on asymmetric systems
  sched: Split the guts of sched_setaffinity() into a helper function
  sched: Introduce task_struct::user_cpus_ptr to track requested affinity
  sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
  cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
  sched: Cgroup SCHED_IDLE support
  sched/topology: Skip updating masks for non-online nodes
  sched: Replace deprecated CPU-hotplug functions.
  sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
  sched: Fix UCLAMP_FLAG_IDLE setting
  sched/deadline: Fix missing clock update in migrate_task_rq_dl()
  sched/fair: Avoid a second scan of target in select_idle_cpu
  sched/fair: Use prev instead of new target as recent_used_cpu
  sched: Don't report SCHED_FLAG_SUGOV in sched_getattr()
  ...
2021-08-30 13:42:10 -07:00
Jakub Kicinski
97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Eric W. Biederman
5ddf994fa2 ucounts: Fix regression preventing increasing of rlimits in init_user_ns
"Ma, XinjianX" <xinjianx.ma@intel.com> reported:

> When lkp team run kernel selftests, we found after these series of patches, testcase mqueue: mq_perf_tests
> in kselftest failed with following message.
>
> # selftests: mqueue: mq_perf_tests
> #
> # Initial system state:
> #       Using queue path:                       /mq_perf_tests
> #       RLIMIT_MSGQUEUE(soft):                  819200
> #       RLIMIT_MSGQUEUE(hard):                  819200
> #       Maximum Message Size:                   8192
> #       Maximum Queue Size:                     10
> #       Nice value:                             0
> #
> # Adjusted system state for testing:
> #       RLIMIT_MSGQUEUE(soft):                  (unlimited)
> #       RLIMIT_MSGQUEUE(hard):                  (unlimited)
> #       Maximum Message Size:                   16777216
> #       Maximum Queue Size:                     65530
> #       Nice value:                             -20
> #       Continuous mode:                        (disabled)
> #       CPUs to pin:                            3
> # ./mq_perf_tests: mq_open() at 296: Too many open files
> not ok 2 selftests: mqueue: mq_perf_tests # exit=1
> ```
>
> Test env:
> rootfs: debian-10
> gcc version: 9

After investigation the problem turned out to be that ucount_max for
the rlimits in init_user_ns was being set to the initial rlimit value.
The practical problem is that ucount_max provides a limit that
applications inside the user namespace can not exceed.  Which means in
practice that rlimits that have been converted to use the ucount
infrastructure were not able to exceend their initial rlimits.

Solve this by setting the relevant values of ucount_max to
RLIM_INIFINITY.  A limit in init_user_ns is pointless so the code
should allow the values to grow as large as possible without riscking
an underflow or an overflow.

As the ltp test case was a bit of a pain I have reproduced the rlimit failure
and tested the fix with the following little C program:
> #include <stdio.h>
> #include <fcntl.h>
> #include <sys/stat.h>
> #include <mqueue.h>
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <errno.h>
> #include <string.h>
> #include <stdlib.h>
> #include <limits.h>
> #include <unistd.h>
>
> int main(int argc, char **argv)
> {
> 	struct mq_attr mq_attr;
> 	struct rlimit rlim;
> 	mqd_t mqd;
> 	int ret;
>
> 	ret = getrlimit(RLIMIT_MSGQUEUE, &rlim);
> 	if (ret != 0) {
> 		fprintf(stderr, "getrlimit(RLIMIT_MSGQUEUE) failed: %s\n", strerror(errno));
> 		exit(EXIT_FAILURE);
> 	}
> 	printf("RLIMIT_MSGQUEUE %lu %lu\n",
> 	       rlim.rlim_cur, rlim.rlim_max);
> 	rlim.rlim_cur = RLIM_INFINITY;
> 	rlim.rlim_max = RLIM_INFINITY;
> 	ret = setrlimit(RLIMIT_MSGQUEUE, &rlim);
> 	if (ret != 0) {
> 		fprintf(stderr, "setrlimit(RLIMIT_MSGQUEUE, RLIM_INFINITY) failed: %s\n", strerror(errno));
> 		exit(EXIT_FAILURE);
> 	}
>
> 	memset(&mq_attr, 0, sizeof(struct mq_attr));
> 	mq_attr.mq_maxmsg = 65536 - 1;
> 	mq_attr.mq_msgsize = 16*1024*1024 - 1;
>
> 	mqd = mq_open("/mq_rlimit_test", O_RDONLY|O_CREAT, 0600, &mq_attr);
> 	if (mqd == (mqd_t)-1) {
> 		fprintf(stderr, "mq_open failed: %s\n", strerror(errno));
> 		exit(EXIT_FAILURE);
> 	}
> 	ret = mq_close(mqd);
> 	if (ret) {
> 		fprintf(stderr, "mq_close failed; %s\n", strerror(errno));
> 		exit(EXIT_FAILURE);
> 	}
>
> 	return EXIT_SUCCESS;
> }

Fixes: 6e52a9f053 ("Reimplement RLIMIT_MSGQUEUE on top of ucounts")
Fixes: d7c9e99aee ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Fixes: d646969055 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Reported-by: kernel test robot lkp@intel.com
Acked-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/87eeajswfc.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-08-23 16:10:42 -05:00