Commit Graph

38228 Commits

Author SHA1 Message Date
Yu Zhao
5280d76d38 FROMLIST: mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page
tables to search for young PTEs and promote hot pages. A kill switch
will be added in the next patch to disable this behavior. When
disabled, the aging relies on the rmap only.

NB: this behavior has nothing similar with the page table scanning in
the 2.4 kernel [1], which searches page tables for old PTEs, adds cold
pages to swapcache and unmaps them.

To avoid confusion, the term "iteration" specifically means the
traversal of an entire mm_struct list; the term "walk" will be applied
to page tables and the rmap, as usual.

An mm_struct list is maintained for each memcg, and an mm_struct
follows its owner task to the new memcg when this task is migrated.
Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot
pages before it increments max_seq.

When multiple page table walkers iterate the same list, each of them
gets a unique mm_struct; therefore they can run concurrently. Page
table walkers ignore any misplaced pages, e.g., if an mm_struct was
migrated, pages it left in the previous memcg will not be promoted
when its current memcg is under reclaim. Similarly, page table walkers
will not promote pages from nodes other than the one under reclaim.

This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
   page table walkers can skip processes that have been sleeping since
   the last iteration.
2. It uses generational Bloom filters to record populated branches so
   that page table walkers can reduce their search space based on the
   query results, e.g., to skip page tables containing mostly holes or
   misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
   CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
   spanning multiple VMAs. IOW, it finishes all the VMAs within the
   range of the same PMD table before it returns to a PGD table. This
   improves the cache performance for workloads that have large
   numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.

Server benchmark results:
  Single workload:
    fio (buffered I/O): no change

  Single workload:
    memcached (anon): +[5.5, 7.5]%
                         Ops/sec      KB/sec
      patch1-7:          1014393.57   39455.42
      patch1-8:          1078507.59   41949.15

  Configurations:
    no change

Client benchmark results:
  kswapd profiles:
    patch1-7
      45.54%  lzo1x_1_do_compress (real work)
       9.56%  page_vma_mapped_walk
       6.70%  _raw_spin_unlock_irq
       2.78%  ptep_clear_flush
       2.47%  do_raw_spin_lock
       2.22%  __zram_bvec_write
       1.87%  lru_gen_look_around
       1.78%  memmove
       1.77%  obj_malloc
       1.44%  free_unref_page_list

    patch1-8
      47.02%  lzo1x_1_do_compress (real work)
       6.73%  page_vma_mapped_walk
       6.14%  _raw_spin_unlock_irq
       3.39%  walk_pte_range
       2.63%  ptep_clear_flush
       2.29%  __zram_bvec_write
       2.10%  do_raw_spin_lock
       1.81%  memmove
       1.73%  obj_malloc
       1.53%  free_unref_page_list

  Configurations:
    no change

[1] https://lwn.net/Articles/23732/
[2] https://source.android.com/devices/tech/debug/scudo

Link: https://lore.kernel.org/lkml/20220309021230.721028-9-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5a3c97cf8ebf8d65d5f9528cd979a637c190053e
2022-04-20 17:38:55 +00:00
Yu Zhao
a1537a68c5 FROMLIST: mm: multi-gen LRU: minimal implementation
To avoid confusion, the terms "promotion" and "demotion" will be
applied to the multi-gen LRU, as a new convention; the terms
"activation" and "deactivation" will be applied to the active/inactive
LRU, as usual.

The aging produces young generations. Given an lruvec, it increments
max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging
promotes hot pages to the youngest generation when it finds them
accessed through page tables; the demotion of cold pages happens
consequently when it increments max_seq. The aging has the complexity
O(nr_hot_pages), since it is only interested in hot pages. Promotion
in the aging path does not require any LRU list operations, only the
updates of the gen counter and lrugen->nr_pages[]; demotion, unless as
the result of the increment of max_seq, requires LRU list operations,
e.g., lru_deactivate_fn().

The eviction consumes old generations. Given an lruvec, it increments
min_seq when the lists indexed by min_seq%MAX_NR_GENS become empty. A
feedback loop modeled after the PID controller monitors refaults over
anon and file types and decides which type to evict when both types
are available from the same generation.

Each generation is divided into multiple tiers. Tiers represent
different ranges of numbers of accesses through file descriptors. A
page accessed N times through file descriptors is in tier
order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
bits in page->flags. In contrast to moving across generations, which
requires the LRU lock, moving across tiers only involves operations on
page->flags. The feedback loop also monitors refaults over all tiers
and decides when to protect pages in which tiers (N>1), using the
first tier (N=0,1) as a baseline. The first tier contains single-use
unmapped clean pages, which are most likely the best choices. The
eviction moves a page to the next generation, i.e., min_seq+1, if the
feedback loop decides so. This approach has the following advantages:
1. It removes the cost of activation in the buffered access path by
   inferring whether pages accessed multiple times through file
   descriptors are statistically hot and thus worth protecting in the
   eviction path.
2. It takes pages accessed through page tables into account and avoids
   overprotecting pages accessed multiple times through file
   descriptors. (Pages accessed through page tables are in the first
   tier, since N=0.)
3. More tiers provide better protection for pages accessed more than
   twice through file descriptors, when under heavy buffered I/O
   workloads.

Server benchmark results:
  Single workload:
    fio (buffered I/O): +[38, 40]%
                         IOPS         BW
      5.18-ed4643521e6a: 2547k        9989MiB/s
      patch1-6:          3540k        13.5GiB/s

  Single workload:
    memcached (anon): +[103, 107]%
                         Ops/sec      KB/sec
      5.18-ed4643521e6a: 469048.66    18243.91
      patch1-6:          964656.80    37520.88

  Configurations:
    CPU: two Xeon 6154
    Mem: total 256G

    Node 1 was only used as a ram disk to reduce the variance in the
    results.

    patch drivers/block/brd.c <<EOF
    99,100c99,100
    < 	gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
    < 	page = alloc_page(gfp_flags);
    ---
    > 	gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE;
    > 	page = alloc_pages_node(1, gfp_flags, 0);
    EOF

    cat >>/etc/systemd/system.conf <<EOF
    CPUAffinity=numa
    NUMAPolicy=bind
    NUMAMask=0
    EOF

    cat >>/etc/memcached.conf <<EOF
    -m 184320
    -s /var/run/memcached/memcached.sock
    -a 0766
    -t 36
    -B binary
    EOF

    cat fio.sh
    modprobe brd rd_nr=1 rd_size=113246208
    swapoff -a
    mkfs.ext4 /dev/ram0
    mount -t ext4 /dev/ram0 /mnt

    mkdir /sys/fs/cgroup/user.slice/test
    echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max
    echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs
    fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=5m --group_reporting

    cat memcached.sh
    modprobe brd rd_nr=1 rd_size=113246208
    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

Client benchmark results:
  kswapd profiles:
    5.18-ed4643521e6a
      39.56%  page_vma_mapped_walk
      19.32%  lzo1x_1_do_compress (real work)
       7.18%  do_raw_spin_lock
       4.23%  _raw_spin_unlock_irq
       2.26%  vma_interval_tree_subtree_search
       2.12%  vma_interval_tree_iter_next
       2.11%  folio_referenced_one
       1.90%  anon_vma_interval_tree_iter_first
       1.47%  ptep_clear_flush
       0.97%  __anon_vma_interval_tree_subtree_search

    patch1-6
      36.13%  lzo1x_1_do_compress (real work)
      19.16%  page_vma_mapped_walk
       6.55%  _raw_spin_unlock_irq
       4.02%  do_raw_spin_lock
       2.32%  anon_vma_interval_tree_iter_first
       2.11%  ptep_clear_flush
       1.76%  __zram_bvec_write
       1.64%  folio_referenced_one
       1.40%  memmove
       1.35%  obj_malloc

  Configurations:
    CPU: single Snapdragon 7c
    Mem: total 4G

    Chrome OS MemoryPressure [1]

[1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/

Link: https://lore.kernel.org/lkml/20220309021230.721028-7-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I3fe4850006d7984cd9f4fd46134b826609dc2f86
2022-04-20 17:38:55 +00:00
Yu Zhao
f88ed5a3d3 FROMLIST: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec.
The youngest generation number is stored in lrugen->max_seq for both
anon and file types as they are aged on an equal footing. The oldest
generation numbers are stored in lrugen->min_seq[] separately for anon
and file types as clean file pages can be evicted regardless of swap
constraints. These three variables are monotonically increasing.

Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
in order to fit into the gen counter in page->flags. Each truncated
generation number is an index to lrugen->lists[]. The sliding window
technique is used to track at least MIN_NR_GENS and at most
MAX_NR_GENS generations. The gen counter stores a value within [1,
MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
stores 0.

There are two conceptually independent procedures: "the aging", which
produces young generations, and "the eviction", which consumes old
generations. They form a closed-loop system, i.e., "the page reclaim".
Both procedures can be invoked from userspace for the purposes of
working set estimation and proactive reclaim. These features are
required to optimize job scheduling (bin packing) in data centers. The
variable size of the sliding window is designed for such use cases
[1][2].

To avoid confusion, the terms "hot" and "cold" will be applied to the
multi-gen LRU, as a new convention; the terms "active" and "inactive"
will be applied to the active/inactive LRU, as usual.

The protection of hot pages and the selection of cold pages are based
on page access channels and patterns. There are two access channels:
one through page tables and the other through file descriptors. The
protection of the former channel is by design stronger because:
1. The uncertainty in determining the access patterns of the former
   channel is higher due to the approximation of the accessed bit.
2. The cost of evicting the former channel is higher due to the TLB
   flushes required and the likelihood of encountering the dirty bit.
3. The penalty of underprotecting the former channel is higher because
   applications usually do not prepare themselves for major page
   faults like they do for blocked I/O. E.g., GUI applications
   commonly use dedicated I/O threads to avoid blocking the rendering
   threads.
There are also two access patterns: one with temporal locality and the
other without. For the reasons listed above, the former channel is
assumed to follow the former pattern unless VM_SEQ_READ or
VM_RAND_READ is present; the latter channel is assumed to follow the
latter pattern unless outlying refaults have been observed [3][4].

The next patch will address the "outlying refaults". Three macros,
i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are
added in this patch to make the entire patchset less diffy.

A page is added to the youngest generation on faulting. The aging
needs to check the accessed bit at least twice before handing this
page over to the eviction. The first check takes care of the accessed
bit set on the initial fault; the second check makes sure this page
has not been used since then. This protocol, AKA second chance,
requires a minimum of two generations, hence MIN_NR_GENS.

[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731
[3] https://lwn.net/Articles/495543/
[4] https://lwn.net/Articles/815342/

Link: https://lore.kernel.org/lkml/20220309021230.721028-6-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512
2022-04-20 17:38:55 +00:00
Greg Kroah-Hartman
b41a37c036 Merge 5.15.33 into android13-5.15
Changes in 5.15.33
	Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
	USB: serial: pl2303: add IBM device IDs
	dt-bindings: usb: hcd: correct usb-device path
	USB: serial: pl2303: fix GS type detection
	USB: serial: simple: add Nokia phone driver
	mm: kfence: fix missing objcg housekeeping for SLAB
	hv: utils: add PTP_1588_CLOCK to Kconfig to fix build
	HID: logitech-dj: add new lightspeed receiver id
	HID: Add support for open wheel and no attachment to T300
	xfrm: fix tunnel model fragmentation behavior
	ARM: mstar: Select HAVE_ARM_ARCH_TIMER
	virtio_console: break out of buf poll on remove
	vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
	tools/virtio: fix virtio_test execution
	ethernet: sun: Free the coherent when failing in probing
	gpio: Revert regression in sysfs-gpio (gpiolib.c)
	spi: Fix invalid sgs value
	net:mcf8390: Use platform_get_irq() to get the interrupt
	Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"
	spi: Fix erroneous sgs value with min_t()
	Input: zinitix - do not report shadow fingers
	af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
	net: dsa: microchip: add spi_device_id tables
	selftests: vm: fix clang build error multiple output files
	locking/lockdep: Avoid potential access of invalid memory in lock_class
	drm/amdgpu: move PX checking into amdgpu_device_ip_early_init
	drm/amdgpu: only check for _PR3 on dGPUs
	iommu/iova: Improve 32-bit free space estimate
	virtio-blk: Use blk_validate_block_size() to validate block size
	tpm: fix reference counting for struct tpm_chip
	usb: typec: tipd: Forward plug orientation to typec subsystem
	USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
	xhci: fix garbage USBSTS being logged in some cases
	xhci: fix runtime PM imbalance in USB2 resume
	xhci: make xhci_handshake timeout for xhci_reset() adjustable
	xhci: fix uninitialized string returned by xhci_decode_ctrl_ctx()
	mei: me: disable driver on the ign firmware
	mei: me: add Alder Lake N device id.
	mei: avoid iterator usage outside of list_for_each_entry
	bus: mhi: pci_generic: Add mru_default for Quectel EM1xx series
	bus: mhi: Fix MHI DMA structure endianness
	docs: sphinx/requirements: Limit jinja2<3.1
	coresight: Fix TRCCONFIGR.QE sysfs interface
	coresight: syscfg: Fix memleak on registration failure in cscfg_create_device
	iio: afe: rescale: use s64 for temporary scale calculations
	iio: inkern: apply consumer scale on IIO_VAL_INT cases
	iio: inkern: apply consumer scale when no channel scale is available
	iio: inkern: make a best effort on offset calculation
	greybus: svc: fix an error handling bug in gb_svc_hello()
	clk: rockchip: re-add rational best approximation algorithm to the fractional divider
	clk: uniphier: Fix fixed-rate initialization
	ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
	cifs: fix handlecache and multiuser
	cifs: we do not need a spinlock around the tree access during umount
	KEYS: fix length validation in keyctl_pkey_params_get_2()
	KEYS: asymmetric: enforce that sig algo matches key algo
	KEYS: asymmetric: properly validate hash_algo and encoding
	Documentation: add link to stable release candidate tree
	Documentation: update stable tree link
	firmware: stratix10-svc: add missing callback parameter on RSU
	firmware: sysfb: fix platform-device leak in error path
	HID: intel-ish-hid: Use dma_alloc_coherent for firmware update
	SUNRPC: avoid race between mod_timer() and del_timer_sync()
	NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR
	NFSD: prevent underflow in nfssvc_decode_writeargs()
	NFSD: prevent integer overflow on 32 bit systems
	f2fs: fix to unlock page correctly in error path of is_alive()
	f2fs: quota: fix loop condition at f2fs_quota_sync()
	f2fs: fix to do sanity check on .cp_pack_total_block_count
	remoteproc: Fix count check in rproc_coredump_write()
	mm/mlock: fix two bugs in user_shm_lock()
	pinctrl: ingenic: Fix regmap on X series SoCs
	pinctrl: samsung: drop pin banks references on error paths
	net: bnxt_ptp: fix compilation error
	spi: mxic: Fix the transmit path
	mtd: rawnand: protect access to rawnand devices while in suspend
	can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
	can: m_can: m_can_tx_handler(): fix use after free of skb
	can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
	jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
	jffs2: fix memory leak in jffs2_do_mount_fs
	jffs2: fix memory leak in jffs2_scan_medium
	mm: fs: fix lru_cache_disabled race in bh_lru
	mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
	mm: invalidate hwpoison page cache page in fault path
	mempolicy: mbind_range() set_policy() after vma_merge()
	scsi: core: sd: Add silence_suspend flag to suppress some PM messages
	scsi: ufs: Fix runtime PM messages never-ending cycle
	scsi: scsi_transport_fc: Fix FPIN Link Integrity statistics counters
	scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
	qed: display VF trust config
	qed: validate and restrict untrusted VFs vlan promisc mode
	riscv: dts: canaan: Fix SPI3 bus width
	riscv: Fix fill_callchain return value
	riscv: Increase stack size under KASAN
	Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
	cifs: prevent bad output lengths in smb2_ioctl_query_info()
	cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
	ALSA: cs4236: fix an incorrect NULL check on list iterator
	ALSA: hda: Avoid unsol event during RPM suspending
	ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock
	ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
	rtc: mc146818-lib: fix locking in mc146818_set_time
	rtc: pl031: fix rtc features null pointer dereference
	ocfs2: fix crash when mount with quota enabled
	drm/simpledrm: Add "panel orientation" property on non-upright mounted LCD panels
	mm: madvise: skip unmapped vma holes passed to process_madvise
	mm: madvise: return correct bytes advised with process_madvise
	Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
	mm,hwpoison: unmap poisoned page before invalidation
	mm/kmemleak: reset tag when compare object pointer
	dm stats: fix too short end duration_ns when using precise_timestamps
	dm: fix use-after-free in dm_cleanup_zoned_dev()
	dm: interlock pending dm_io and dm_wait_for_bios_completion
	dm: fix double accounting of flush with data
	dm integrity: set journal entry unused when shrinking device
	tracing: Have trace event string test handle zero length strings
	drbd: fix potential silent data corruption
	powerpc/kvm: Fix kvm_use_magic_page
	PCI: fu740: Force 2.5GT/s for initial device probe
	arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
	arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
	arm64: dts: qcom: sm8250: Fix MSI IRQ for PCIe1 and PCIe2
	arm64: dts: ti: k3-am65: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j721e: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j7200: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-am64: Fix gic-v3 compatible regs
	ASoC: SOF: Intel: Fix NULL ptr dereference when ENOMEM
	Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
	ACPI: properties: Consistently return -ENOENT if there are no more references
	coredump: Also dump first pages of non-executable ELF libraries
	ext4: fix ext4_fc_stats trace point
	ext4: fix fs corruption when tring to remove a non-empty directory with IO error
	ext4: make mb_optimize_scan performance mount option work with extents
	drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
	samples/landlock: Fix path_list memory leak
	landlock: Use square brackets around "landlock-ruleset"
	mailbox: tegra-hsp: Flush whole channel
	block: limit request dispatch loop duration
	block: don't merge across cgroup boundaries if blkcg is enabled
	drm/edid: check basic audio support on CEA extension block
	fbdev: Hot-unplug firmware fb devices on forced removal
	video: fbdev: sm712fb: Fix crash in smtcfb_read()
	video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
	rfkill: make new event layout opt-in
	ARM: dts: at91: sama7g5: Remove unused properties in i2c nodes
	ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
	ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5420
	mgag200 fix memmapsl configuration in GCTL6 register
	carl9170: fix missing bit-wise or operator for tx_params
	pstore: Don't use semaphores in always-atomic-context code
	thermal: int340x: Increase bitmap size
	lib/raid6/test: fix multiple definition linking error
	exec: Force single empty string when argv is empty
	crypto: rsa-pkcs1pad - only allow with rsa
	crypto: rsa-pkcs1pad - correctly get hash from source scatterlist
	crypto: rsa-pkcs1pad - restore signature length check
	crypto: rsa-pkcs1pad - fix buffer overread in pkcs1pad_verify_complete()
	bcache: fixup multiple threads crash
	PM: domains: Fix sleep-in-atomic bug caused by genpd_debug_remove()
	DEC: Limit PMAX memory probing to R3k systems
	media: gpio-ir-tx: fix transmit with long spaces on Orange Pi PC
	media: venus: hfi_cmds: List HDR10 property as unsupported for v1 and v3
	media: venus: venc: Fix h264 8x8 transform control
	media: davinci: vpif: fix unbalanced runtime PM get
	media: davinci: vpif: fix unbalanced runtime PM enable
	btrfs: zoned: mark relocation as writing
	btrfs: extend locking to all space_info members accesses
	btrfs: verify the tranisd of the to-be-written dirty extent buffer
	xtensa: define update_mmu_tlb function
	xtensa: fix stop_machine_cpuslocked call in patch_text
	xtensa: fix xtensa_wsr always writing 0
	drm/syncobj: flatten dma_fence_chains on transfer
	drm/nouveau/backlight: Fix LVDS backlight detection on some laptops
	drm/nouveau/backlight: Just set all backlight types as RAW
	drm/fb-helper: Mark screen buffers in system memory with FBINFO_VIRTFB
	brcmfmac: firmware: Allocate space for default boardrev in nvram
	brcmfmac: pcie: Release firmwares in the brcmf_pcie_setup error path
	brcmfmac: pcie: Declare missing firmware files in pcie.c
	brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
	brcmfmac: pcie: Fix crashes due to early IRQs
	drm/i915/opregion: check port number bounds for SWSCI display power state
	drm/i915/gem: add missing boundary check in vm_access
	PCI: imx6: Allow to probe when dw_pcie_wait_for_link() fails
	PCI: pciehp: Clear cmd_busy bit in polling mode
	PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
	regulator: qcom_smd: fix for_each_child.cocci warnings
	selinux: access superblock_security_struct in LSM blob way
	selinux: check return value of sel_make_avc_files
	crypto: ccp - Ensure psp_ret is always init'd in __sev_platform_init_locked()
	hwrng: cavium - Check health status while reading random data
	hwrng: cavium - HW_RANDOM_CAVIUM should depend on ARCH_THUNDER
	crypto: sun8i-ss - really disable hash on A80
	crypto: authenc - Fix sleep in atomic context in decrypt_tail
	crypto: mxs-dcp - Fix scatterlist processing
	selinux: Fix selinux_sb_mnt_opts_compat()
	thermal: int340x: Check for NULL after calling kmemdup()
	crypto: octeontx2 - remove CONFIG_DM_CRYPT check
	spi: tegra114: Add missing IRQ check in tegra_spi_probe
	spi: tegra210-quad: Fix missin IRQ check in tegra_qspi_probe
	stack: Constrain and fix stack offset randomization with Clang builds
	arm64/mm: avoid fixmap race condition when create pud mapping
	blk-cgroup: set blkg iostat after percpu stat aggregation
	selftests/x86: Add validity check and allow field splitting
	selftests/sgx: Treat CC as one argument
	crypto: rockchip - ECB does not need IV
	audit: log AUDIT_TIME_* records only from rules
	EVM: fix the evm= __setup handler return value
	crypto: ccree - don't attempt 0 len DMA mappings
	crypto: hisilicon/sec - fix the aead software fallback for engine
	spi: pxa2xx-pci: Balance reference count for PCI DMA device
	hwmon: (pmbus) Add mutex to regulator ops
	hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
	nvme: cleanup __nvme_check_ids
	nvme: fix the check for duplicate unique identifiers
	block: don't delete queue kobject before its children
	PM: hibernate: fix __setup handler error handling
	PM: suspend: fix return value of __setup handler
	spi: spi-zynqmp-gqspi: Handle error for dma_set_mask
	hwrng: atmel - disable trng on failure path
	crypto: sun8i-ss - call finalize with bh disabled
	crypto: sun8i-ce - call finalize with bh disabled
	crypto: amlogic - call finalize with bh disabled
	crypto: gemini - call finalize with bh disabled
	crypto: vmx - add missing dependencies
	clocksource/drivers/timer-ti-dm: Fix regression from errata i940 fix
	clocksource/drivers/exynos_mct: Refactor resources allocation
	clocksource/drivers/exynos_mct: Handle DTS with higher number of interrupts
	clocksource/drivers/timer-microchip-pit64b: Use notrace
	clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init()
	arm64: prevent instrumentation of bp hardening callbacks
	KEYS: trusted: Fix trusted key backends when building as module
	KEYS: trusted: Avoid calling null function trusted_key_exit
	ACPI: APEI: fix return value of __setup handlers
	crypto: ccp - ccp_dmaengine_unregister release dma channels
	crypto: ccree - Fix use after free in cc_cipher_exit()
	hwrng: nomadik - Change clk_disable to clk_disable_unprepare
	hwmon: (pmbus) Add Vin unit off handling
	clocksource: acpi_pm: fix return value of __setup handler
	io_uring: don't check unrelated req->open.how in accept request
	io_uring: terminate manual loop iterator loop correctly for non-vecs
	watch_queue: Fix NULL dereference in error cleanup
	watch_queue: Actually free the watch
	f2fs: fix to enable ATGC correctly via gc_idle sysfs interface
	sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
	sched/core: Export pelt_thermal_tp
	sched/uclamp: Fix iowait boost escaping uclamp restriction
	rseq: Remove broken uapi field layout on 32-bit little endian
	perf/core: Fix address filter parser for multiple filters
	perf/x86/intel/pt: Fix address filter config for 32-bit kernel
	sched/fair: Improve consistency of allowed NUMA balance calculations
	f2fs: fix missing free nid in f2fs_handle_failed_inode
	nfsd: more robust allocation failure handling in nfsd_file_cache_init
	sched/cpuacct: Fix charge percpu cpuusage
	sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
	f2fs: fix to avoid potential deadlock
	btrfs: fix unexpected error path when reflinking an inline extent
	f2fs: fix compressed file start atomic write may cause data corruption
	selftests, x86: fix how check_cc.sh is being invoked
	drivers/base/memory: add memory block to memory group after registration succeeded
	kunit: make kunit_test_timeout compatible with comment
	pinctrl: samsung: Remove EINT handler for Exynos850 ALIVE and CMGP gpios
	media: staging: media: zoran: fix usage of vb2_dma_contig_set_max_seg_size
	media: camss: csid-170: fix non-10bit formats
	media: camss: csid-170: don't enable unused irqs
	media: camss: csid-170: set the right HALT_CMD when disabled
	media: camss: vfe-170: fix "VFE halt timeout" error
	media: staging: media: imx: imx7-mipi-csis: Make subdev name unique
	media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across ioctls
	media: mtk-vcodec: potential dereference of null pointer
	media: imx: imx8mq-mipi-csi2: remove wrong irq config write operation
	media: imx: imx8mq-mipi_csi2: fix system resume
	media: bttv: fix WARNING regression on tunerless devices
	media: atmel: atmel-sama7g5-isc: fix ispck leftover
	ASoC: sh: rz-ssi: Drop calling rz_ssi_pio_recv() recursively
	ASoC: codecs: Check for error pointer after calling devm_regmap_init_mmio
	ASoC: xilinx: xlnx_formatter_pcm: Handle sysclk setting
	ASoC: simple-card-utils: Set sysclk on all components
	media: coda: Fix missing put_device() call in coda_get_vdoa_data
	media: meson: vdec: potential dereference of null pointer
	media: hantro: Fix overfill bottom register field name
	media: ov6650: Fix set format try processing path
	media: v4l: Avoid unaligned access warnings when printing 4cc modifiers
	media: ov5648: Don't pack controls struct
	media: aspeed: Correct value for h-total-pixels
	video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen
	video: fbdev: controlfb: Fix COMPILE_TEST build
	video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
	video: fbdev: atmel_lcdfb: fix an error code in atmel_lcdfb_probe()
	video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
	ARM: dts: Fix OpenBMC flash layout label addresses
	firmware: qcom: scm: Remove reassignment to desc following initializer
	ARM: dts: qcom: ipq4019: fix sleep clock
	soc: qcom: rpmpd: Check for null return of devm_kcalloc
	soc: qcom: ocmem: Fix missing put_device() call in of_get_ocmem
	soc: qcom: aoss: remove spurious IRQF_ONESHOT flags
	arm64: dts: qcom: sdm845: fix microphone bias properties and values
	arm64: dts: qcom: sm8250: fix PCIe bindings to follow schema
	arm64: dts: broadcom: bcm4908: use proper TWD binding
	arm64: dts: qcom: sm8150: Correct TCS configuration for apps rsc
	arm64: dts: qcom: sm8350: Correct TCS configuration for apps rsc
	firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not defined
	soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
	ARM: dts: sun8i: v3s: Move the csi1 block to follow address order
	vsprintf: Fix potential unaligned access
	ARM: dts: imx: Add missing LVDS decoder on M53Menlo
	media: mexon-ge2d: fixup frames size in registers
	media: video/hdmi: handle short reads of hdmi info frame.
	media: ti-vpe: cal: Fix a NULL pointer dereference in cal_ctx_v4l2_init_formats()
	media: em28xx: initialize refcount before kref_get
	media: usb: go7007: s2250-board: fix leak in probe()
	media: cedrus: H265: Fix neighbour info buffer size
	media: cedrus: h264: Fix neighbour info buffer size
	ASoC: codecs: rx-macro: fix accessing compander for aux
	ASoC: codecs: rx-macro: fix accessing array out of bounds for enum type
	ASoC: codecs: va-macro: fix accessing array out of bounds for enum type
	ASoC: codecs: wc938x: fix accessing array out of bounds for enum type
	ASoC: codecs: wcd938x: fix kcontrol max values
	ASoC: codecs: wcd934x: fix kcontrol max values
	ASoC: codecs: wcd934x: fix return value of wcd934x_rx_hph_mode_put
	media: v4l2-core: Initialize h264 scaling matrix
	media: ov5640: Fix set format, v4l2_mbus_pixelcode not updated
	selftests/lkdtm: Add UBSAN config
	lib: uninline simple_strntoull() as well
	vsprintf: Fix %pK with kptr_restrict == 0
	uaccess: fix nios2 and microblaze get_user_8()
	ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp()
	soc: mediatek: pm-domains: Add wakeup capacity support in power domain
	mmc: sdhci_am654: Fix the driver data of AM64 SoC
	ASoC: ti: davinci-i2s: Add check for clk_enable()
	ALSA: spi: Add check for clk_enable()
	arm64: dts: ns2: Fix spi-cpol and spi-cpha property
	arm64: dts: broadcom: Fix sata nodename
	printk: fix return value of printk.devkmsg __setup handler
	ASoC: mxs-saif: Handle errors for clk_enable
	ASoC: atmel_ssc_dai: Handle errors for clk_enable
	ASoC: dwc-i2s: Handle errors for clk_enable
	ASoC: soc-compress: prevent the potentially use of null pointer
	memory: emif: Add check for setup_interrupts
	memory: emif: check the pointer temp in get_device_details()
	ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
	arm64: dts: rockchip: Fix SDIO regulator supply properties on rk3399-firefly
	m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined
	media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
	media: vidtv: Check for null return of vzalloc
	ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
	ASoC: wm8350: Handle error for wm8350_register_irq
	ASoC: fsi: Add check for clk_enable
	video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
	media: saa7134: fix incorrect use to determine if list is empty
	ivtv: fix incorrect device_caps for ivtvfb
	ASoC: atmel: Fix error handling in snd_proto_probe
	ASoC: rockchip: i2s: Fix missing clk_disable_unprepare() in rockchip_i2s_probe
	ASoC: SOF: Add missing of_node_put() in imx8m_probe
	ASoC: mediatek: use of_device_get_match_data()
	ASoC: mediatek: mt8192-mt6359: Fix error handling in mt8192_mt6359_dev_probe
	ASoC: rk817: Fix missing clk_disable_unprepare() in rk817_platform_probe
	ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
	ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
	ASoC: fsl_spdif: Disable TX clock when stop
	ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
	ASoC: SOF: Intel: enable DMI L1 for playback streams
	ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
	mmc: davinci_mmc: Handle error for clk_enable
	ASoC: atmel: Fix error handling in sam9x5_wm8731_driver_probe
	ASoC: msm8916-wcd-analog: Fix error handling in pm8916_wcd_analog_spmi_probe
	ASoC: codecs: wcd934x: Add missing of_node_put() in wcd934x_codec_parse_data
	ASoC: amd: Fix reference to PCM buffer address
	ARM: configs: multi_v5_defconfig: re-enable CONFIG_V4L_PLATFORM_DRIVERS
	ARM: configs: multi_v5_defconfig: re-enable DRM_PANEL and FB_xxx
	drm/meson: osd_afbcd: Add an exit callback to struct meson_afbcd_ops
	drm/meson: Make use of the helper function devm_platform_ioremap_resourcexxx()
	drm/meson: split out encoder from meson_dw_hdmi
	drm/meson: Fix error handling when afbcd.ops->init fails
	drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev
	drm/bridge: Add missing pm_runtime_disable() in __dw_mipi_dsi_probe
	drm/bridge: nwl-dsi: Fix PM disable depth imbalance in nwl_dsi_probe
	drm: bridge: adv7511: Fix ADV7535 HPD enablement
	ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern
	drm/v3d/v3d_drv: Check for error num after setting mask
	drm/panfrost: Check for error num after setting mask
	libbpf: Fix possible NULL pointer dereference when destroying skeleton
	bpftool: Only set obj->skeleton on complete success
	udmabuf: validate ubuf->pagecount
	bpf: Fix UAF due to race between btf_try_get_module and load_module
	drm/selftests/test-drm_dp_mst_helper: Fix memory leak in sideband_msg_req_encode_decode
	selftests: bpf: Fix bind on used port
	Bluetooth: btintel: Fix WBS setting for Intel legacy ROM products
	Bluetooth: hci_serdev: call init_rwsem() before p->open()
	mtd: onenand: Check for error irq
	mtd: rawnand: gpmi: fix controller timings setting
	drm/edid: Don't clear formats if using deep color
	drm/edid: Split deep color modes between RGB and YUV444
	ionic: fix type complaint in ionic_dev_cmd_clean()
	ionic: start watchdog after all is setup
	ionic: Don't send reset commands if FW isn't running
	drm/nouveau/acr: Fix undefined behavior in nvkm_acr_hsfw_load_bl()
	drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes()
	drm/amd/pm: return -ENOTSUPP if there is no get_dpm_ultimate_freq function
	net: phy: at803x: move page selection fix to config_init
	selftests/bpf: Normalize XDP section names in selftests
	selftests/bpf/test_xdp_redirect_multi: use temp netns for testing
	ath9k_htc: fix uninit value bugs
	RDMA/core: Set MR type in ib_reg_user_mr
	KVM: PPC: Fix vmx/vsx mixup in mmio emulation
	selftests/net: timestamping: Fix bind_phc check
	i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	i40e: respect metadata on XSK Rx to skb
	igc: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directly
	ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	ixgbe: respect metadata on XSK Rx to skb
	power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
	ray_cs: Check ioremap return value
	powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
	KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
	powerpc/perf: Don't use perf_hw_context for trace IMC PMU
	mt76: connac: fix sta_rec_wtbl tag len
	mt76: mt7915: use proper aid value in mt7915_mcu_wtbl_generic_tlv in sta mode
	mt76: mt7915: use proper aid value in mt7915_mcu_sta_basic_tlv
	mt76: mt7921: fix a leftover race in runtime-pm
	mt76: mt7615: fix a leftover race in runtime-pm
	mt76: mt7603: check sta_rates pointer in mt7603_sta_rate_tbl_update
	mt76: mt7615: check sta_rates pointer in mt7615_sta_rate_tbl_update
	ptp: unregister virtual clocks when unregistering physical clock.
	net: dsa: mv88e6xxx: Enable port policy support on 6097
	mac80211: Remove a couple of obsolete TODO
	mac80211: limit bandwidth in HE capabilities
	scripts/dtc: Call pkg-config POSIXly correct
	livepatch: Fix build failure on 32 bits processors
	net: asix: add proper error handling of usb read errors
	i2c: bcm2835: Use platform_get_irq() to get the interrupt
	i2c: bcm2835: Fix the error handling in 'bcm2835_i2c_probe()'
	mtd: mchp23k256: Add SPI ID table
	mtd: mchp48l640: Add SPI ID table
	igc: avoid kernel warning when changing RX ring parameters
	igb: refactor XDP registration
	PCI: aardvark: Fix reading MSI interrupt number
	PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge
	RDMA/rxe: Check the last packet by RXE_END_MASK
	libbpf: Fix signedness bug in btf_dump_array_data()
	cxl/core: Fix cxl_probe_component_regs() error message
	cxl/regs: Fix size of CXL Capability Header Register
	net:enetc: allocate CBD ring data memory using DMA coherent methods
	libbpf: Fix compilation warning due to mismatched printf format
	drm/bridge: dw-hdmi: use safe format when first in bridge chain
	libbpf: Use dynamically allocated buffer when receiving netlink messages
	power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
	HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
	iommu/ipmmu-vmsa: Check for error num after setting mask
	drm/bridge: anx7625: Fix overflow issue on reading EDID
	bpftool: Fix the error when lookup in no-btf maps
	drm/amd/pm: enable pm sysfs write for one VF mode
	drm/amd/display: Add affected crtcs to atomic state for dsc mst unplug
	libbpf: Fix memleak in libbpf_netlink_recv()
	IB/cma: Allow XRC INI QPs to set their local ACK timeout
	dax: make sure inodes are flushed before destroy cache
	selftests: mptcp: add csum mib check for mptcp_connect
	iwlwifi: mvm: Don't call iwl_mvm_sta_from_mac80211() with NULL sta
	iwlwifi: mvm: don't iterate unadded vifs when handling FW SMPS req
	iwlwifi: mvm: align locking in D3 test debugfs
	iwlwifi: yoyo: remove DBGI_SRAM address reset writing
	iwlwifi: Fix -EIO error code that is never returned
	iwlwifi: mvm: Fix an error code in iwl_mvm_up()
	mtd: rawnand: pl353: Set the nand chip node as the flash node
	drm/msm/dp: populate connector of struct dp_panel
	drm/msm/dp: stop link training after link training 2 failed
	drm/msm/dp: always add fail-safe mode into connector mode list
	drm/msm/dsi: Use "ref" fw clock instead of global name for VCO parent
	drm/msm/dsi/phy: fix 7nm v4.0 settings for C-PHY mode
	drm/msm/dpu: add DSPP blocks teardown
	drm/msm/dpu: fix dp audio condition
	dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
	vfio/pci: fix memory leak during D3hot to D0 transition
	vfio/pci: wake-up devices around reset functions
	scsi: fnic: Fix a tracing statement
	scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
	scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
	scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
	scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config()
	scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req()
	scsi: pm8001: Fix NCQ NON DATA command task initialization
	scsi: pm8001: Fix NCQ NON DATA command completion handling
	scsi: pm8001: Fix abort all task initialization
	RDMA/mlx5: Fix the flow of a miss in the allocation of a cache ODP MR
	drm/amd/display: Remove vupdate_int_entry definition
	TOMOYO: fix __setup handlers return values
	power: supply: sbs-charger: Don't cancel work that is not initialized
	ext2: correct max file size computing
	drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
	power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
	scsi: hisi_sas: Change permission of parameter prot_mask
	drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt
	bpf, arm64: Call build_prologue() first in first JIT pass
	bpf, arm64: Feed byte-offset into bpf line info
	xsk: Fix race at socket teardown
	RDMA/irdma: Fix netdev notifications for vlan's
	RDMA/irdma: Fix Passthrough mode in VM
	RDMA/irdma: Remove incorrect masking of PD
	gpu: host1x: Fix a memory leak in 'host1x_remove()'
	libbpf: Skip forward declaration when counting duplicated type names
	powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
	powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
	KVM: x86: Fix emulation in writing cr8
	KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
	hv_balloon: rate-limit "Unhandled message" warning
	i2c: xiic: Make bus names unique
	power: supply: wm8350-power: Handle error for wm8350_register_irq
	power: supply: wm8350-power: Add missing free in free_charger_irq
	IB/hfi1: Allow larger MTU without AIP
	RDMA/core: Fix ib_qp_usecnt_dec() called when error
	PCI: Reduce warnings on possible RW1C corruption
	net: axienet: fix RX ring refill allocation failure handling
	drm/msm/a6xx: Fix missing ARRAY_SIZE() check
	mips: DEC: honor CONFIG_MIPS_FP_SUPPORT=n
	MIPS: Sanitise Cavium switch cases in TLB handler synthesizers
	powerpc/sysdev: fix incorrect use to determine if list is empty
	powerpc/64s: Don't use DSISR for SLB faults
	mfd: mc13xxx: Add check for mc13xxx_irq_request
	libbpf: Unmap rings when umem deleted
	selftests/bpf: Make test_lwt_ip_encap more stable and faster
	platform/x86: huawei-wmi: check the return value of device_create_file()
	scsi: mpt3sas: Fix incorrect 4GB boundary check
	powerpc: 8xx: fix a return value error in mpc8xx_pic_init
	vxcan: enable local echo for sent CAN frames
	ath10k: Fix error handling in ath10k_setup_msa_resources
	mips: cdmm: Fix refcount leak in mips_cdmm_phys_base
	MIPS: RB532: fix return value of __setup handler
	MIPS: pgalloc: fix memory leak caused by pgd_free()
	mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
	power: ab8500_chargalg: Use CLOCK_MONOTONIC
	RDMA/irdma: Prevent some integer underflows
	Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
	RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
	bpf, sockmap: Fix memleak in sk_psock_queue_msg
	bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
	bpf, sockmap: Fix more uncharged while msg has more_data
	bpf, sockmap: Fix double uncharge the mem of sk_msg
	samples/bpf, xdpsock: Fix race when running for fix duration of time
	USB: storage: ums-realtek: fix error code in rts51x_read_mem()
	drm/i915/display: Fix HPD short pulse handling for eDP
	netfilter: flowtable: Fix QinQ and pppoe support for inet table
	mt76: mt7921: fix mt7921_queues_acq implementation
	can: isotp: sanitize CAN ID checks in isotp_bind()
	can: isotp: return -EADDRNOTAVAIL when reading from unbound socket
	can: isotp: support MSG_TRUNC flag when reading from socket
	bareudp: use ipv6_mod_enabled to check if IPv6 enabled
	ibmvnic: fix race between xmit and reset
	af_unix: Fix some data-races around unix_sk(sk)->oob_skb.
	selftests/bpf: Fix error reporting from sock_fields programs
	Bluetooth: hci_uart: add missing NULL check in h5_enqueue
	Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
	Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
	ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
	af_netlink: Fix shift out of bounds in group mask calculation
	i2c: meson: Fix wrong speed use from probe
	netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()
	i2c: mux: demux-pinctrl: do not deactivate a master that is not active
	powerpc/pseries: Fix use after free in remove_phb_dynamic()
	selftests/bpf/test_lirc_mode2.sh: Exit with proper code
	PCI: Avoid broken MSI on SB600 USB devices
	net: bcmgenet: Use stronger register read/writes to assure ordering
	tcp: ensure PMTU updates are processed during fastopen
	openvswitch: always update flow key after nat
	net: dsa: fix panic on shutdown if multi-chip tree failed to probe
	tipc: fix the timer expires after interval 100ms
	mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
	ice: fix 'scheduling while atomic' on aux critical err interrupt
	ice: don't allow to run ice_send_event_to_aux() in atomic ctx
	drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
	kernel/resource: fix kfree() of bootmem memory again
	staging: r8188eu: convert DBG_88E_LEVEL call in hal/rtl8188e_hal_init.c
	staging: r8188eu: release_firmware is not called if allocation fails
	mxser: fix xmit_buf leak in activate when LSR == 0xff
	fsi: scom: Fix error handling
	fsi: scom: Remove retries in indirect scoms
	pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
	pps: clients: gpio: Propagate return value from pps_gpio_probe
	fsi: Aspeed: Fix a potential double free
	misc: alcor_pci: Fix an error handling path
	cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse
	soundwire: intel: fix wrong register name in intel_shim_wake
	clk: qcom: ipq8074: fix PCI-E clock oops
	dmaengine: idxd: check GENCAP config support for gencfg register
	dmaengine: idxd: change bandwidth token to read buffers
	dmaengine: idxd: restore traffic class defaults after wq reset
	iio: mma8452: Fix probe failing when an i2c_device_id is used
	serial: 8250_aspeed_vuart: add PORT_ASPEED_VUART port type
	staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
	pinctrl: renesas: r8a77470: Reduce size for narrow VIN1 channel
	pinctrl: renesas: checker: Fix miscalculation of number of states
	clk: qcom: ipq8074: Use floor ops for SDCC1 clock
	phy: dphy: Correct lpx parameter and its derivatives(ta_{get,go,sure})
	phy: phy-brcm-usb: fixup BCM4908 support
	serial: 8250_mid: Balance reference count for PCI DMA device
	serial: 8250_lpss: Balance reference count for PCI DMA device
	NFS: Use of mapping_set_error() results in spurious errors
	serial: 8250: Fix race condition in RTS-after-send handling
	iio: adc: Add check for devm_request_threaded_irq
	habanalabs: Add check for pci_enable_device
	NFS: Return valid errors from nfs2/3_decode_dirent()
	staging: r8188eu: fix endless loop in recv_func
	dma-debug: fix return value of __setup handlers
	clk: imx7d: Remove audio_mclk_root_clk
	clk: imx: off by one in imx_lpcg_parse_clks_from_dt()
	clk: at91: sama7g5: fix parents of PDMCs' GCLK
	clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
	clk: qcom: clk-rcg2: Update the frac table for pixel clock
	dmaengine: hisi_dma: fix MSI allocate fail when reload hisi_dma
	remoteproc: qcom: Fix missing of_node_put in adsp_alloc_memory_region
	remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
	remoteproc: qcom_q6v5_mss: Fix some leaks in q6v5_alloc_memory_region
	nvdimm/region: Fix default alignment for small regions
	clk: actions: Terminate clk_div_table with sentinel element
	clk: loongson1: Terminate clk_div_table with sentinel element
	clk: hisilicon: Terminate clk_div_table with sentinel element
	clk: clps711x: Terminate clk_div_table with sentinel element
	clk: Fix clk_hw_get_clk() when dev is NULL
	clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
	mailbox: imx: fix crash in resume on i.mx8ulp
	NFS: remove unneeded check in decode_devicenotify_args()
	staging: mt7621-dts: fix LEDs and pinctrl on GB-PC1 devicetree
	staging: mt7621-dts: fix formatting
	staging: mt7621-dts: fix pinctrl properties for ethernet
	staging: mt7621-dts: fix GB-PC2 devicetree
	pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
	pinctrl: mediatek: paris: Fix PIN_CONFIG_BIAS_* readback
	pinctrl: mediatek: paris: Fix "argument" argument type for mtk_pinconf_get()
	pinctrl: mediatek: paris: Fix pingroup pin config state readback
	pinctrl: mediatek: paris: Skip custom extra pin config dump for virtual GPIOs
	pinctrl: microchip sgpio: use reset driver
	pinctrl: microchip-sgpio: lock RMW access
	pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
	pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
	tty: hvc: fix return value of __setup handler
	kgdboc: fix return value of __setup handler
	serial: 8250: fix XOFF/XON sending when DMA is used
	virt: acrn: obtain pa from VMA with PFNMAP flag
	virt: acrn: fix a memory leak in acrn_dev_ioctl()
	kgdbts: fix return value of __setup handler
	firmware: google: Properly state IOMEM dependency
	driver core: dd: fix return value of __setup handler
	jfs: fix divide error in dbNextAG
	netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
	SUNRPC don't resend a task on an offlined transport
	NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
	kdb: Fix the putarea helper function
	perf stat: Fix forked applications enablement of counters
	clk: qcom: gcc-msm8994: Fix gpll4 width
	vsock/virtio: initialize vdev->priv before using VQs
	vsock/virtio: read the negotiated features before using VQs
	vsock/virtio: enable VQs early on probe
	clk: Initialize orphan req_rate
	xen: fix is_xen_pmu()
	net: enetc: report software timestamping via SO_TIMESTAMPING
	net: hns3: fix bug when PF set the duplicate MAC address for VFs
	net: hns3: fix port base vlan add fail when concurrent with reset
	net: hns3: add vlan list lock to protect vlan list
	net: hns3: format the output of the MAC address
	net: hns3: refine the process when PF set VF VLAN
	net: phy: broadcom: Fix brcm_fet_config_init()
	selftests: test_vxlan_under_vrf: Fix broken test case
	NFS: Don't loop forever in nfs_do_recoalesce()
	net: hns3: clean residual vf config after disable sriov
	net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
	qlcnic: dcb: default to returning -EOPNOTSUPP
	net/x25: Fix null-ptr-deref caused by x25_disconnect
	net: sparx5: switchdev: fix possible NULL pointer dereference
	octeontx2-af: initialize action variable
	net: prefer nf_ct_put instead of nf_conntrack_put
	net/sched: act_ct: fix ref leak when switching zones
	NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
	net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
	fs: fd tables have to be multiples of BITS_PER_LONG
	lib/test: use after free in register_test_dev_kmod()
	fs: fix fd table size alignment properly
	LSM: general protection fault in legacy_parse_param
	regulator: rpi-panel: Handle I2C errors/timing to the Atmel
	crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos
	gcc-plugins/stackleak: Exactly match strings instead of prefixes
	pinctrl: npcm: Fix broken references to chip->parent_device
	rcu: Mark writes to the rcu_segcblist structure's ->flags field
	block/bfq_wf2q: correct weight to ioprio
	crypto: xts - Add softdep on ecb
	crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3
	block, bfq: don't move oom_bfqq
	selinux: use correct type for context length
	arm64: module: remove (NOLOAD) from linker script
	selinux: allow FIOCLEX and FIONCLEX with policy capability
	loop: use sysfs_emit() in the sysfs xxx show()
	Fix incorrect type in assignment of ipv6 port for audit
	irqchip/qcom-pdc: Fix broken locking
	irqchip/nvic: Release nvic_base upon failure
	fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
	bfq: fix use-after-free in bfq_dispatch_request
	ACPICA: Avoid walking the ACPI Namespace if it is not there
	lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
	Revert "Revert "block, bfq: honor already-setup queue merges""
	ACPI/APEI: Limit printable size of BERT table data
	PM: core: keep irq flags in device_pm_check_callbacks()
	parisc: Fix handling off probe non-access faults
	nvme-tcp: lockdep: annotate in-kernel sockets
	spi: tegra20: Use of_device_get_match_data()
	atomics: Fix atomic64_{read_acquire,set_release} fallbacks
	locking/lockdep: Iterate lock_classes directly when reading lockdep files
	ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb
	ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit
	sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
	ext4: don't BUG if someone dirty pages without asking ext4 first
	f2fs: fix to do sanity check on curseg->alloc_type
	NFSD: Fix nfsd_breaker_owns_lease() return values
	f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
	btrfs: harden identification of a stale device
	btrfs: make search_csum_tree return 0 if we get -EFBIG
	f2fs: use spin_lock to avoid hang
	f2fs: compress: fix to print raw data size in error path of lz4 decompression
	Adjust cifssb maximum read size
	ntfs: add sanity check on allocation size
	media: staging: media: zoran: move videodev alloc
	media: staging: media: zoran: calculate the right buffer number for zoran_reap_stat_com
	media: staging: media: zoran: fix various V4L2 compliance errors
	media: atmel: atmel-isc-base: report frame sizes as full supported range
	media: ir_toy: free before error exiting
	ASoC: sh: rz-ssi: Make the data structures available before registering the handlers
	ASoC: SOF: Intel: match sdw version on link_slaves_found
	media: imx-jpeg: Prevent decoding NV12M jpegs into single-planar buffers
	media: iommu/mediatek-v1: Free the existed fwspec if the master dev already has
	media: iommu/mediatek: Return ENODEV if the device is NULL
	media: iommu/mediatek: Add device_link between the consumer and the larb devices
	video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
	video: fbdev: w100fb: Reset global state
	video: fbdev: cirrusfb: check pixclock to avoid divide by zero
	video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
	ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
	ARM: dts: bcm2837: Add the missing L1/L2 cache information
	ASoC: madera: Add dependencies on MFD
	media: atomisp_gmin_platform: Add DMI quirk to not turn AXP ELDO2 regulator off on some boards
	media: atomisp: fix dummy_ptr check to avoid duplicate active_bo
	ARM: ftrace: avoid redundant loads or clobbering IP
	ARM: dts: imx7: Use audio_mclk_post_div instead audio_mclk_root_clk
	arm64: defconfig: build imx-sdma as a module
	video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
	video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
	video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit
	ARM: dts: bcm2711: Add the missing L1/L2 cache information
	ASoC: soc-core: skip zero num_dai component in searching dai name
	media: imx-jpeg: fix a bug of accessing array out of bounds
	media: cx88-mpeg: clear interrupt status register before streaming video
	uaccess: fix type mismatch warnings from access_ok()
	lib/test_lockup: fix kernel pointer check for separate address spaces
	ARM: tegra: tamonten: Fix I2C3 pad setting
	ARM: mmp: Fix failure to remove sram device
	ASoC: amd: vg: fix for pm resume callback sequence
	video: fbdev: sm712fb: Fix crash in smtcfb_write()
	media: i2c: ov5648: Fix lockdep error
	media: Revert "media: em28xx: add missing em28xx_close_extension"
	media: hdpvr: initialize dev->worker at hdpvr_register_videodev
	ASoC: Intel: sof_sdw: fix quirks for 2022 HP Spectre x360 13"
	tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
	mmc: host: Return an error when ->enable_sdio_irq() ops is missing
	media: atomisp: fix bad usage at error handling logic
	ALSA: hda/realtek: Add alc256-samsung-headphone fixup
	KVM: x86: Reinitialize context if host userspace toggles EFER.LME
	KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
	KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
	KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU
	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi()
	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb()
	KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls
	KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
	powerpc/kasan: Fix early region not updated correctly
	powerpc/lib/sstep: Fix 'sthcx' instruction
	powerpc/lib/sstep: Fix build errors with newer binutils
	powerpc: Add set_memory_{p/np}() and remove set_memory_attr()
	powerpc: Fix build errors with newer binutils
	drm/dp: Fix off-by-one in register cache size
	drm/i915: Treat SAGV block time 0 as SAGV disabled
	drm/i915: Fix PSF GV point mask when SAGV is not possible
	drm/i915: Reject unsupported TMDS rates on ICL+
	scsi: qla2xxx: Refactor asynchronous command initialization
	scsi: qla2xxx: Implement ref count for SRB
	scsi: qla2xxx: Fix stuck session in gpdb
	scsi: qla2xxx: Fix warning message due to adisc being flushed
	scsi: qla2xxx: Fix scheduling while atomic
	scsi: qla2xxx: Fix premature hw access after PCI error
	scsi: qla2xxx: Fix wrong FDMI data for 64G adapter
	scsi: qla2xxx: Fix warning for missing error code
	scsi: qla2xxx: Fix device reconnect in loop topology
	scsi: qla2xxx: edif: Fix clang warning
	scsi: qla2xxx: Fix T10 PI tag escape and IP guard options for 28XX adapters
	scsi: qla2xxx: Add devids and conditionals for 28xx
	scsi: qla2xxx: Check for firmware dump already collected
	scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
	scsi: qla2xxx: Fix disk failure to rediscover
	scsi: qla2xxx: Fix incorrect reporting of task management failure
	scsi: qla2xxx: Fix hang due to session stuck
	scsi: qla2xxx: Fix missed DMA unmap for NVMe ls requests
	scsi: qla2xxx: Fix N2N inconsistent PLOGI
	scsi: qla2xxx: Fix stuck session of PRLI reject
	scsi: qla2xxx: Reduce false trigger to login
	scsi: qla2xxx: Use correct feature type field during RFF_ID processing
	platform: chrome: Split trace include file
	KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq
	KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast()
	KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
	KVM: Prevent module exit until all VMs are freed
	KVM: x86: fix sending PV IPI
	KVM: SVM: fix panic on out-of-bounds guest IRQ
	ubifs: rename_whiteout: Fix double free for whiteout_ui->data
	ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
	ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
	ubifs: Rename whiteout atomically
	ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
	ubifs: Rectify space amount budget for mkdir/tmpfile operations
	ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
	ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
	ubifs: Fix to add refcount once page is set private
	ubifs: rename_whiteout: correct old_dir size computing
	nvme: allow duplicate NSIDs for private namespaces
	nvme: fix the read-only state for zoned namespaces with unsupposed features
	wireguard: queueing: use CFI-safe ptr_ring cleanup function
	wireguard: socket: free skb in send6 when ipv6 is disabled
	wireguard: socket: ignore v6 endpoints when ipv6 is disabled
	XArray: Fix xas_create_range() when multi-order entry present
	can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
	can: mcba_usb: properly check endpoint type
	can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
	XArray: Update the LRU list in xas_split()
	modpost: restore the warning message for missing symbol versions
	rtc: check if __rtc_read_time was successful
	gfs2: gfs2_setattr_size error path fix
	gfs2: Make sure FITRIM minlen is rounded up to fs block size
	net: hns3: fix the concurrency between functions reading debugfs
	net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
	rxrpc: fix some null-ptr-deref bugs in server_key.c
	rxrpc: Fix call timer start racing with call destruction
	mailbox: imx: fix wakeup failure from freeze mode
	crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
	watch_queue: Free the page array when watch_queue is dismantled
	pinctrl: pinconf-generic: Print arguments for bias-pull-*
	watchdog: rti-wdt: Add missing pm_runtime_disable() in probe function
	net: sparx5: uses, depends on BRIDGE or !BRIDGE
	pinctrl: nuvoton: npcm7xx: Rename DS() macro to DSTR()
	pinctrl: nuvoton: npcm7xx: Use %zu printk format for ARRAY_SIZE()
	ASoC: mediatek: mt6358: add missing EXPORT_SYMBOLs
	ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
	ARM: iop32x: offset IRQ numbers by 1
	block: Fix the maximum minor value is blk_alloc_ext_minor()
	io_uring: fix memory leak of uid in files registration
	riscv module: remove (NOLOAD)
	ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
	vhost: handle error while adding split ranges to iotlb
	spi: Fix Tegra QSPI example
	platform/chrome: cros_ec_typec: Check for EC device
	can: isotp: restore accidentally removed MSG_PEEK feature
	proc: bootconfig: Add null pointer check
	drm/connector: Fix typo in documentation
	scsi: qla2xxx: Add qla2x00_async_done() for async routines
	staging: mt7621-dts: fix pinctrl-0 items to be size-1 items on ethernet
	arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
	ASoC: soc-compress: Change the check for codec_dai
	Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
	tracing: Have type enum modifications copy the strings
	net: add skb_set_end_offset() helper
	net: preserve skb_end_offset() in skb_unclone_keeptruesize()
	mm/mmap: return 1 from stack_guard_gap __setup() handler
	ARM: 9187/1: JIVE: fix return value of __setup handler
	mm/memcontrol: return 1 from cgroup.memory __setup() handler
	mm/usercopy: return 1 from hardened_usercopy __setup() handler
	af_unix: Support POLLPRI for OOB.
	bpf: Adjust BPF stack helper functions to accommodate skip > 0
	bpf: Fix comment for helper bpf_current_task_under_cgroup()
	mmc: rtsx: Use pm_runtime_{get,put}() to handle runtime PM
	dt-bindings: mtd: nand-controller: Fix the reg property description
	dt-bindings: mtd: nand-controller: Fix a comment in the examples
	dt-bindings: spi: mxic: The interrupt property is not mandatory
	dt-bindings: memory: mtk-smi: No need mediatek,larb-id for mt8167
	dt-bindings: pinctrl: pinctrl-microchip-sgpio: Fix example
	ubi: fastmap: Return error code if memory allocation fails in add_aeb()
	ASoC: SOF: Intel: Fix build error without SND_SOC_SOF_PCI_DEV
	ASoC: topology: Allow TLV control to be either read or write
	perf vendor events: Update metrics for SkyLake Server
	media: ov6650: Add try support to selection API operations
	media: ov6650: Fix crop rectangle affected by set format
	spi: mediatek: support tick_delay without enhance_timing
	ARM: dts: spear1340: Update serial node properties
	ARM: dts: spear13xx: Update SPI dma properties
	arm64: dts: ls1043a: Update i2c dma properties
	arm64: dts: ls1046a: Update i2c node dma properties
	um: Fix uml_mconsole stop/go
	docs: sysctl/kernel: add missing bit to panic_print
	openvswitch: Fixed nd target mask field in the flow dump.
	torture: Make torture.sh help message match reality
	n64cart: convert bi_disk to bi_bdev->bd_disk fix build
	mmc: rtsx: Let MMC core handle runtime PM
	mmc: rtsx: Fix build errors/warnings for unused variable
	KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
	iommu/dma: Skip extra sync during unmap w/swiotlb
	iommu/dma: Fold _swiotlb helpers into callers
	iommu/dma: Check CONFIG_SWIOTLB more broadly
	swiotlb: Support aligned swiotlb buffers
	iommu/dma: Account for min_align_mask w/swiotlb
	coredump: Snapshot the vmas in do_coredump
	coredump: Remove the WARN_ON in dump_vma_snapshot
	coredump/elf: Pass coredump_params into fill_note_info
	coredump: Use the vma snapshot in fill_files_note
	PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
	Linux 5.15.33

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id62bd8a22d0bfa7c2096539d253ffce804bed017
2022-04-20 08:18:54 +02:00
Stephen Dickey
40bcf12c6b ANDROID: sched: vendor hook for sched_getaffinity
Just as sched_getaffinity updates the affinity based upon the
active_mask, the vendor hook needs to be able to constrain
the task's affinity.

Introduce a hook to sched_getaffinity such that vendor modules
are able to update the affinity.

Bug: 229133948
Change-Id: I5c501e9204d4fcc6688f675a41be60ef5a2d1075
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2022-04-20 00:30:01 +00:00
Namkyu Kim
87b89ce83b ANDROID: scheduler: add vendor-specific wake flag
specific wake flag for android vendor

Bug: 189858948
Bug: 226256614
Signed-off-by: Namkyu Kim <namkyu78.kim@samsung.com>
Change-Id: Idc23c1c47f7d83b298c0b2560859f1ce2761fd85
(cherry picked from commit 4c1097df5d)
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
2022-04-19 23:47:01 +00:00
Kalesh Singh
46a5d62344 FROMGIT: EXP rcu: Move expedited grace period (GP) work to RT kthread_worker
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes.  This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.

This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

The results were evaluated on arm64 Android devices (6GB ram) running
5.10 kernel, and capturing trace data in critical user-level code.

The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Count                    |          725  |            688  |         |
------------------------------------------------------------------------
| Min Duration       (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
------------------------------------------------------------------------
| Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
------------------------------------------------------------------------
| Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
------------------------------------------------------------------------
| Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
------------------------------------------------------------------------

The below table show the range of maximums/minimums for
synchronize_rcu_expedited() latency from all experiments:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Total No. of Experiments |           25  |             23  |         |
------------------------------------------------------------------------
| Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
------------------------------------------------------------------------
| Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
------------------------------------------------------------------------
| Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Range of Minimums  (ns)  |       88,297  |         27,141  |         |
------------------------------------------------------------------------

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Wei Wang <wvw@google.com>
Tested-by: Kyle Lin <kylelin@google.com>
Tested-by: Chunwei Lu <chunweilu@google.com>
Tested-by: Lulu Wang <luluw@google.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220409003527.1587028-1-kaleshsingh@google.com/
(cherry picked from commit 3902dd17a29bd3ed1ead364a331a1761edb7162b
git: //git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git fastexp.2022.04.11a)
Bug: 224791892
Change-Id: I4cc5d28f9ae99e44e92afc43ff4db4b71c415d6c
2022-04-19 23:20:40 +00:00
Abhijeet Dharmapurikar
fad13230ac ANDROID: sched: create trace points for 32bit execve
Module code would like to hold some locks when affinity is being updated
for 32 bit task exec.

Create pre and post tracepoints in force_compatible_cpus_allowed_ptr()

Bug: 187917024
Change-Id: I95bff9f4d5b5d37c1d5440acbd6857d2855c2b43
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2022-04-19 22:23:21 +00:00
David Stevens
3e44e13656 swiotlb: Support aligned swiotlb buffers
commit e81e99bacc9f9347bda7808a949c1ce9fcc2bbf4 upstream.

Add an argument to swiotlb_tbl_map_single that specifies the desired
alignment of the allocated buffer. This is used by dma-iommu to ensure
the buffer is aligned to the iova granule size when using swiotlb with
untrusted sub-granule mappings. This addresses an issue where adjacent
slots could be exposed to the untrusted device if IO_TLB_SIZE < iova
granule < PAGE_SIZE.

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210929023300.335969-7-stevensd@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:24:17 +02:00
Namhyung Kim
398ac11f44 bpf: Adjust BPF stack helper functions to accommodate skip > 0
commit ee2a098851bfbe8bcdd964c0121f4246f00ff41e upstream.

Let's say that the caller has storage for num_elem stack frames.  Then,
the BPF stack helper functions walk the stack for only num_elem frames.
This means that if skip > 0, one keeps only 'num_elem - skip' frames.

This is because it sets init_nr in the perf_callchain_entry to the end
of the buffer to save num_elem entries only.  I believe it was because
the perf callchain code unwound the stack frames until it reached the
global max size (sysctl_perf_event_max_stack).

However it now has perf_callchain_entry_ctx.max_stack to limit the
iteration locally.  This simplifies the code to handle init_nr in the
BPF callstack entries and removes the confusion with the perf_event's
__PERF_SAMPLE_CALLCHAIN_EARLY which sets init_nr to 0.

Also change the comment on bpf_get_stack() in the header file to be
more explicit what the return value means.

Fixes: c195651e56 ("bpf: add bpf_get_stack helper")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com
Link: https://lore.kernel.org/bpf/20220314182042.71025-1-namhyung@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
2022-04-08 14:24:14 +02:00
Steven Rostedt (Google)
14d552ab31 tracing: Have type enum modifications copy the strings
commit 795301d3c28996219d555023ac6863401b6076bc upstream.

When an enum is used in the visible parts of a trace event that is
exported to user space, the user space applications like perf and
trace-cmd do not have a way to know what the value of the enum is. To
solve this, at boot up (or module load) the printk formats are modified to
replace the enum with their numeric value in the string output.

Array fields of the event are defined by [<nr-elements>] in the type
portion of the format file so that the user space parsers can correctly
parse the array into the appropriate size chunks. But in some trace
events, an enum is used in defining the size of the array, which once
again breaks the parsing of user space tooling.

This was solved the same way as the print formats were, but it modified
the type strings of the trace event. This caused crashes in some
architectures because, as supposed to the print string, is a const string
value. This was not detected on x86, as it appears that const strings are
still writable (at least in boot up), but other architectures this is not
the case, and writing to a const string will cause a kernel fault.

To fix this, use kstrdup() to copy the type before modifying it. If the
trace event is for the core kernel there's no need to free it because the
string will be in use for the life of the machine being on line. For
modules, create a link list to store all the strings being allocated for
modules and when the module is removed, free them.

Link: https://lore.kernel.org/all/yt9dr1706b4i.fsf@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220318153432.3984b871@gandalf.local.home

Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: b3bc8547d3be ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:24:13 +02:00
Linus Torvalds
7007c89463 Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
commit 901c7280ca0d5e2b4a8929fbe0bfb007ac2a6544 upstream.

Halil Pasic points out [1] that the full revert of that commit (revert
in bddac7c1e02b), and that a partial revert that only reverts the
problematic case, but still keeps some of the cleanups is probably
better.  

And that partial revert [2] had already been verified by Oleksandr
Natalenko to also fix the issue, I had just missed that in the long
discussion.

So let's reinstate the cleanups from commit aa6f8dcbab47 ("swiotlb:
rework "fix info leak with DMA_FROM_DEVICE""), and effectively only
revert the part that caused problems.

Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1]
Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2]
Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3]
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:24:13 +02:00
Eric Dumazet
4913daecd0 watch_queue: Free the page array when watch_queue is dismantled
commit b490207017ba237d97b735b2aa66dc241ccd18f5 upstream.

Commit 7ea1a0124b6d ("watch_queue: Free the alloc bitmap when the
watch_queue is torn down") took care of the bitmap, but not the page
array.

  BUG: memory leak
  unreferenced object 0xffff88810d9bc140 (size 32):
  comm "syz-executor335", pid 3603, jiffies 4294946994 (age 12.840s)
  hex dump (first 32 bytes):
    40 a7 40 04 00 ea ff ff 00 00 00 00 00 00 00 00  @.@.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
     kmalloc_array include/linux/slab.h:621 [inline]
     kcalloc include/linux/slab.h:652 [inline]
     watch_queue_set_size+0x12f/0x2e0 kernel/watch_queue.c:251
     pipe_ioctl+0x82/0x140 fs/pipe.c:632
     vfs_ioctl fs/ioctl.c:51 [inline]
     __do_sys_ioctl fs/ioctl.c:874 [inline]
     __se_sys_ioctl fs/ioctl.c:860 [inline]
     __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]

Reported-by: syzbot+25ea042ae28f3888727a@syzkaller.appspotmail.com
Fixes: c73be61ced ("pipe: Add general notification queue support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20220322004654.618274-1-eric.dumazet@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:24:11 +02:00
Steven Rostedt (Google)
7c6bd60999 tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
[ Upstream commit b3bc8547d3be60898818885f5bf22d0a62e2eb48 ]

The macro TRACE_DEFINE_ENUM is used to convert enums in the kernel to
their actual value when they are exported to user space via the trace
event format file.

Currently only the enums in the "print fmt" (TP_printk in the TRACE_EVENT
macro) have the enums converted. But the enums can be used to denote array
size:

        field:unsigned int fc_ineligible_rc[EXT4_FC_REASON_MAX]; offset:12;      size:36;        signed:0;

The EXT4_FC_REASON_MAX has no meaning to userspace but it needs to know
that information to know how to parse the array.

Have the array indexes also be parsed as well.

Link: https://lore.kernel.org/all/cover.1646922487.git.riteshh@linux.ibm.com/

Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:24:02 +02:00
Waiman Long
1388c10b32 locking/lockdep: Iterate lock_classes directly when reading lockdep files
[ Upstream commit fb7275acd6fb988313dddd8d3d19efa70d9015ad ]

When dumping lock_classes information via /proc/lockdep, we can't take
the lockdep lock as the lock hold time is indeterminate. Iterating
over all_lock_classes without holding lock can be dangerous as there
is a slight chance that it may branch off to other lists leading to
infinite loop or even access invalid memory if changes are made to
all_lock_classes list in parallel.

To avoid this problem, iteration of lock classes is now done directly
on the lock_classes array itself. The lock_classes_in_use bitmap is
checked to see if the lock class is being used. To avoid iterating
the full array all the times, a new max_lock_class_idx value is added
to track the maximum lock_class index that is currently being used.

We can theoretically take the lockdep lock for iterating all_lock_classes
when other lockdep files (lockdep_stats and lock_stat) are accessed as
the lock hold time will be shorter for them. For consistency, they are
also modified to iterate the lock_classes array directly.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220211035526.1329503-2-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:57 +02:00
Paul E. McKenney
e34806c6c2 rcu: Mark writes to the rcu_segcblist structure's ->flags field
[ Upstream commit c09929031018913b5783872a8b8cdddef4a543c7 ]

KCSAN reports data races between the rcu_segcblist_clear_flags() and
rcu_segcblist_set_flags() functions, though misreporting the latter
as a call to rcu_segcblist_is_enabled() from call_rcu().  This commit
converts the updates of this field to WRITE_ONCE(), relying on the
resulting unmarked reads to continue to detect buggy concurrent writes
to this field.

Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:55 +02:00
Daniel Thompson
c532caa7df kdb: Fix the putarea helper function
[ Upstream commit c1cb81429df462eca1b6ba615cddd21dd3103c46 ]

Currently kdb_putarea_size() uses copy_from_kernel_nofault() to write *to*
arbitrary kernel memory. This is obviously wrong and means the memory
modify ('mm') command is a serious risk to debugger stability: if we poke
to a bad address we'll double-fault and lose our debug session.

Fix this the (very) obvious way.

Note that there are two Fixes: tags because the API was renamed and this
patch will only trivially backport as far as the rename (and this is
probably enough). Nevertheless Christoph's rename did not introduce this
problem so I wanted to record that!

Fixes: fe557319aa ("maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault")
Fixes: 5d5314d679 ("kdb: core for kgdb back end (1 of 2)")
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20220128144055.207267-1-daniel.thompson@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:51 +02:00
Randy Dunlap
c39a750b61 dma-debug: fix return value of __setup handlers
[ Upstream commit 80e4390981618e290616dbd06ea190d4576f219d ]

When valid kernel command line parameters
  dma_debug=off dma_debug_entries=100
are used, they are reported as Unknown parameters and added to init's
environment strings, polluting it.

  Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc5
    dma_debug=off dma_debug_entries=100", will be passed to user space.

and

 Run /sbin/init as init process
   with arguments:
     /sbin/init
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc5
     dma_debug=off
     dma_debug_entries=100

Return 1 from these __setup handlers to indicate that the command line
option has been handled.

Fixes: 59d3daafa1 ("dma-debug: add kernel command line parameters")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:47 +02:00
Miaohe Lin
a9e88c2618 kernel/resource: fix kfree() of bootmem memory again
[ Upstream commit 0cbcc92917c5de80f15c24d033566539ad696892 ]

Since commit ebff7d8f27 ("mem hotunplug: fix kfree() of bootmem
memory"), we could get a resource allocated during boot via
alloc_resource().  And it's required to release the resource using
free_resource().  Howerver, many people use kfree directly which will
result in kernel BUG.  In order to fix this without fixing every call
site, just leak a couple of bytes in such corner case.

Link: https://lkml.kernel.org/r/20220217083619.19305-1-linmiaohe@huawei.com
Fixes: ebff7d8f27 ("mem hotunplug: fix kfree() of bootmem memory")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:43 +02:00
Christophe Leroy
b997cfdc3f livepatch: Fix build failure on 32 bits processors
[ Upstream commit 2f293651eca3eacaeb56747dede31edace7329d2 ]

Trying to build livepatch on powerpc/32 results in:

	kernel/livepatch/core.c: In function 'klp_resolve_symbols':
	kernel/livepatch/core.c:221:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
	  221 |                 sym = (Elf64_Sym *)sechdrs[symndx].sh_addr + ELF_R_SYM(relas[i].r_info);
	      |                       ^
	kernel/livepatch/core.c:221:21: error: assignment to 'Elf32_Sym *' {aka 'struct elf32_sym *'} from incompatible pointer type 'Elf64_Sym *' {aka 'struct elf64_sym *'} [-Werror=incompatible-pointer-types]
	  221 |                 sym = (Elf64_Sym *)sechdrs[symndx].sh_addr + ELF_R_SYM(relas[i].r_info);
	      |                     ^
	kernel/livepatch/core.c: In function 'klp_apply_section_relocs':
	kernel/livepatch/core.c:312:35: error: passing argument 1 of 'klp_resolve_symbols' from incompatible pointer type [-Werror=incompatible-pointer-types]
	  312 |         ret = klp_resolve_symbols(sechdrs, strtab, symndx, sec, sec_objname);
	      |                                   ^~~~~~~
	      |                                   |
	      |                                   Elf32_Shdr * {aka struct elf32_shdr *}
	kernel/livepatch/core.c:193:44: note: expected 'Elf64_Shdr *' {aka 'struct elf64_shdr *'} but argument is of type 'Elf32_Shdr *' {aka 'struct elf32_shdr *'}
	  193 | static int klp_resolve_symbols(Elf64_Shdr *sechdrs, const char *strtab,
	      |                                ~~~~~~~~~~~~^~~~~~~

Fix it by using the right types instead of forcing 64 bits types.

Fixes: 7c8e2bdd5f ("livepatch: Apply vmlinux-specific KLP relocations early")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5288e11b018a762ea3351cc8fb2d4f15093a4457.1640017960.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:29 +02:00
Kumar Kartikeya Dwivedi
51b82141ff bpf: Fix UAF due to race between btf_try_get_module and load_module
[ Upstream commit 18688de203b47e5d8d9d0953385bf30b5949324f ]

While working on code to populate kfunc BTF ID sets for module BTF from
its initcall, I noticed that by the time the initcall is invoked, the
module BTF can already be seen by userspace (and the BPF verifier). The
existing btf_try_get_module calls try_module_get which only fails if
mod->state == MODULE_STATE_GOING, i.e. it can increment module reference
when module initcall is happening in parallel.

Currently, BTF parsing happens from MODULE_STATE_COMING notifier
callback. At this point, the module initcalls have not been invoked.
The notifier callback parses and prepares the module BTF, allocates an
ID, which publishes it to userspace, and then adds it to the btf_modules
list allowing the kernel to invoke btf_try_get_module for the BTF.

However, at this point, the module has not been fully initialized (i.e.
its initcalls have not finished). The code in module.c can still fail
and free the module, without caring for other users. However, nothing
stops btf_try_get_module from succeeding between the state transition
from MODULE_STATE_COMING to MODULE_STATE_LIVE.

This leads to a use-after-free issue when BPF program loads
successfully in the state transition, load_module's do_init_module call
fails and frees the module, and BPF program fd on close calls module_put
for the freed module. Future patch has test case to verify we don't
regress in this area in future.

There are multiple points after prepare_coming_module (in load_module)
where failure can occur and module loading can return error. We
illustrate and test for the race using the last point where it can
practically occur (in module __init function).

An illustration of the race:

CPU 0                           CPU 1
			  load_module
			    notifier_call(MODULE_STATE_COMING)
			      btf_parse_module
			      btf_alloc_id	// Published to userspace
			      list_add(&btf_mod->list, btf_modules)
			    mod->init(...)
...				^
bpf_check		        |
check_pseudo_btf_id             |
  btf_try_get_module            |
    returns true                |  ...
...                             |  module __init in progress
return prog_fd                  |  ...
...                             V
			    if (ret < 0)
			      free_module(mod)
			    ...
close(prog_fd)
 ...
 bpf_prog_free_deferred
  module_put(used_btf.mod) // use-after-free

We fix this issue by setting a flag BTF_MODULE_F_LIVE, from the notifier
callback when MODULE_STATE_LIVE state is reached for the module, so that
we return NULL from btf_try_get_module for modules that are not fully
formed. Since try_module_get already checks that module is not in
MODULE_STATE_GOING state, and that is the only transition a live module
can make before being removed from btf_modules list, this is enough to
close the race and prevent the bug.

A later selftest patch crafts the race condition artifically to verify
that it has been fixed, and that verifier fails to load program (with
ENXIO).

Lastly, a couple of comments:

 1. Even if this race didn't exist, it seems more appropriate to only
    access resources (ksyms and kfuncs) of a fully formed module which
    has been initialized completely.

 2. This patch was born out of need for synchronization against module
    initcall for the next patch, so it is needed for correctness even
    without the aforementioned race condition. The BTF resources
    initialized by module initcall are set up once and then only looked
    up, so just waiting until the initcall has finished ensures correct
    behavior.

Fixes: 541c3bad8d ("bpf: Support BPF ksym variables in kernel modules")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:24 +02:00
Randy Dunlap
f64c5b235b printk: fix return value of printk.devkmsg __setup handler
[ Upstream commit b665eae7a788c5e2bc10f9ac3c0137aa0ad1fc97 ]

If an invalid option value is used with "printk.devkmsg=<value>",
it is silently ignored.
If a valid option value is used, it is honored but the wrong return
value (0) is used, indicating that the command line option had an
error and was not handled. This string is not added to init's
environment strings due to init/main.c::unknown_bootoption()
checking for a '.' in the boot option string and then considering
that string to be an "Unused module parameter".

Print a warning message if a bad option string is used.
Always return 1 from the __setup handler to indicate that the command
line option has been handled.

Fixes: 750afe7bab ("printk: add kernel parameter to control writes to /dev/kmsg")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220228220556.23484-1-rdunlap@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:19 +02:00
Valentin Schneider
1e0e63ad62 sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
[ Upstream commit 49bef33e4b87b743495627a529029156c6e09530 ]

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
	check_class_changed()
	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
	  switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
	 push_rt_task()
	   // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.

Fixes: a7c81556ec ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:11 +02:00
Chengming Zhou
b7aec0843e sched/cpuacct: Fix charge percpu cpuusage
[ Upstream commit 248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd ]

The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.

But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.

Fixes: 73e6aafd9e ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:11 +02:00
Mel Gorman
ec5884cbbf sched/fair: Improve consistency of allowed NUMA balance calculations
[ Upstream commit 2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a ]

There are inconsistencies when determining if a NUMA imbalance is allowed
that should be corrected.

o allow_numa_imbalance changes types and is not always examining
  the destination group so both the type should be corrected as
  well as the naming.
o find_idlest_group uses the sched_domain's weight instead of the
  group weight which is different to find_busiest_group
o find_busiest_group uses the source group instead of the destination
  which is different to task_numa_find_cpu
o Both find_idlest_group and find_busiest_group should account
  for the number of running tasks if a move was allowed to be
  consistent with task_numa_find_cpu

Fixes: 7d2b5dd0bc ("sched/numa: Allow a floating imbalance between NUMA nodes")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20220208094334.16379-2-mgorman@techsingularity.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:11 +02:00
Adrian Hunter
929d8a87f7 perf/core: Fix address filter parser for multiple filters
[ Upstream commit d680ff24e9e14444c63945b43a37ede7cd6958f9 ]

Reset appropriate variables in the parser loop between parsing separate
filters, so that they do not interfere with parsing the next filter.

Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220131072453.2839535-4-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
Mathieu Desnoyers
3bb11f3f68 rseq: Remove broken uapi field layout on 32-bit little endian
[ Upstream commit bfdf4e6208051ed7165b2e92035b4bf11f43eb63 ]

The rseq rseq_cs.ptr.{ptr32,padding} uapi endianness handling is
entirely wrong on 32-bit little endian: a preprocessor logic mistake
wrongly uses the big endian field layout on 32-bit little endian
architectures.

Fortunately, those ptr32 accessors were never used within the kernel,
and only meant as a convenience for user-space.

Remove those and replace the whole rseq_cs union by a __u64 type, as
this is the only thing really needed to express the ABI. Document how
32-bit architectures are meant to interact with this field.

Fixes: ec9c82e03a ("rseq: uapi: Declare rseq_cs field as union, update includes")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220127152720.25898-1-mathieu.desnoyers@efficios.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
Qais Yousef
d2c741290f sched/uclamp: Fix iowait boost escaping uclamp restriction
[ Upstream commit d37aee9018e68b0d356195caefbb651910e0bbfa ]

iowait_boost signal is applied independently of util and doesn't take
into account uclamp settings of the rq. An io heavy task that is capped
by uclamp_max could still request higher frequency because
sugov_iowait_apply() doesn't clamp the boost via uclamp_rq_util_with()
like effective_cpu_util() does.

Make sure that iowait_boost honours uclamp requests by calling
uclamp_rq_util_with() when applying the boost.

Fixes: 982d9cdc22 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20211216225320.2957053-3-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
Qais Yousef
6c72766223 sched/core: Export pelt_thermal_tp
[ Upstream commit 77cf151b7bbdfa3577b3c3f3a5e267a6c60a263b ]

We can't use this tracepoint in modules without having the symbol
exported first, fix that.

Fixes: 765047932f ("sched/pelt: Add support to track thermal pressure")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211028115005.873539-1-qais.yousef@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
Bharata B Rao
8bc68c44d9 sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
[ Upstream commit 28c988c3ec29db74a1dda631b18785958d57df4f ]

The older format of /proc/pid/sched printed home node info which
required the mempolicy and task lock around mpol_get(). However
the format has changed since then and there is no need for
sched_show_numa() any more to have mempolicy argument,
asssociated mpol_get/put and task_lock/unlock. Remove them.

Fixes: 397f2378f1 ("sched/numa: Fix numa balancing stats in /proc/pid/sched")
Signed-off-by: Bharata B Rao <bharata@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20220118050515.2973-1-bharata@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
David Howells
f69aecb499 watch_queue: Actually free the watch
[ Upstream commit 3d8dcf278b1ee1eff1e90be848fa2237db4c07a7 ]

free_watch() does everything barring actually freeing the watch object.  Fix
this by adding the missing kfree.

kmemleak produces a report something like the following.  Note that as an
address can be seen in the first word, the watch would appear to have gone
through call_rcu().

BUG: memory leak
unreferenced object 0xffff88810ce4a200 (size 96):
  comm "syz-executor352", pid 3605, jiffies 4294947473 (age 13.720s)
  hex dump (first 32 bytes):
    e0 82 48 0d 81 88 ff ff 00 00 00 00 00 00 00 00  ..H.............
    80 a2 e4 0c 81 88 ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8214e6cc>] kmalloc include/linux/slab.h:581 [inline]
    [<ffffffff8214e6cc>] kzalloc include/linux/slab.h:714 [inline]
    [<ffffffff8214e6cc>] keyctl_watch_key+0xec/0x2e0 security/keys/keyctl.c:1800
    [<ffffffff8214ec84>] __do_sys_keyctl+0x3c4/0x490 security/keys/keyctl.c:2016
    [<ffffffff84493a25>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84493a25>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-and-tested-by: syzbot+6e2de48f06cdb2884bfc@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
David Howells
695c47cea0 watch_queue: Fix NULL dereference in error cleanup
[ Upstream commit a635415a064e77bcfbf43da413fd9dfe0bbed9cb ]

In watch_queue_set_size(), the error cleanup code doesn't take account of
the fact that __free_page() can't handle a NULL pointer when trying to free
up buffer pages that did get allocated.

Fix this by only calling __free_page() on the pages actually allocated.

Without the fix, this can lead to something like the following:

BUG: KASAN: null-ptr-deref in __free_pages+0x1f/0x1b0 mm/page_alloc.c:5473
Read of size 4 at addr 0000000000000034 by task syz-executor168/3599
...
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 __kasan_report mm/kasan/report.c:446 [inline]
 kasan_report.cold+0x66/0xdf mm/kasan/report.c:459
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
 page_ref_count include/linux/page_ref.h:67 [inline]
 put_page_testzero include/linux/mm.h:717 [inline]
 __free_pages+0x1f/0x1b0 mm/page_alloc.c:5473
 watch_queue_set_size+0x499/0x630 kernel/watch_queue.c:275
 pipe_ioctl+0xac/0x2b0 fs/pipe.c:632
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:874 [inline]
 __se_sys_ioctl fs/ioctl.c:860 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-and-tested-by: syzbot+d55757faa9b80590767b@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:10 +02:00
Randy Dunlap
7e45fc93dd PM: suspend: fix return value of __setup handler
[ Upstream commit 7a64ca17e4dd50d5f910769167f3553902777844 ]

If an invalid option is given for "test_suspend=<option>", the entire
string is added to init's environment, so return 1 instead of 0 from
the __setup handler.

  Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc5
    test_suspend=invalid"

and

 Run /sbin/init as init process
   with arguments:
     /sbin/init
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc5
     test_suspend=invalid

Fixes: 2ce986892f ("PM / sleep: Enhance test_suspend option with repeat capability")
Fixes: 27ddcc6596 ("PM / sleep: Add state field to pm_states[] entries")
Fixes: a9d7052363 ("PM: Separate suspend to RAM functionality from core")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:07 +02:00
Randy Dunlap
d0cd9da501 PM: hibernate: fix __setup handler error handling
[ Upstream commit ba7ffcd4c4da374b0f64666354eeeda7d3827131 ]

If an invalid value is used in "resumedelay=<seconds>", it is
silently ignored. Add a warning message and then let the __setup
handler return 1 to indicate that the kernel command line option
has been handled.

Fixes: 317cf7e5e8 ("PM / hibernate: convert simple_strtoul to kstrtoul")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:07 +02:00
Richard Guy Briggs
3a10df7315 audit: log AUDIT_TIME_* records only from rules
[ Upstream commit 272ceeaea355214b301530e262a0df8600bfca95 ]

AUDIT_TIME_* events are generated when there are syscall rules present
that are not related to time keeping.  This will produce noisy log
entries that could flood the logs and hide events we really care about.

Rather than immediately produce the AUDIT_TIME_* records, store the data
in the context and log it at syscall exit time respecting the filter
rules.

Note: This eats the audit_buffer, unlike any others in show_special().

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1991919

Fixes: 7e8eda734d ("ntp: Audit NTP parameters adjustment")
Fixes: 2d87a0674b ("timekeeping: Audit clock adjustments")
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: fixed style/whitespace issues]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:06 +02:00
Steven Rostedt (Google)
39483fd3b2 tracing: Have trace event string test handle zero length strings
commit eca344a7362e0f34f179298fd8366bcd556eede1 upstream.

If a trace event has in its TP_printk():

 "%*.s", len, len ? __get_str(string) : NULL

It is perfectly valid if len is zero and passing in the NULL.
Unfortunately, the runtime string check at time of reading the trace sees
the NULL and flags it as a bad string and produces a WARN_ON().

Handle this case by passing into the test function if the format has an
asterisk (star) and if so, if the length is zero, then mark it as safe.

Link: https://lore.kernel.org/all/YjsWzuw5FbWPrdqq@bfoster/

Cc: stable@vger.kernel.org
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Fixes: 9a6944fee6 ("tracing: Add a verifier to check string pointers for trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:57 +02:00
Jann Horn
b6d75218ff ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
commit ee1fee900537b5d9560e9f937402de5ddc8412f3 upstream.

Setting PTRACE_O_SUSPEND_SECCOMP is supposed to be a highly privileged
operation because it allows the tracee to completely bypass all seccomp
filters on kernels with CONFIG_CHECKPOINT_RESTORE=y. It is only supposed to
be settable by a process with global CAP_SYS_ADMIN, and only if that
process is not subject to any seccomp filters at all.

However, while these permission checks were done on the PTRACE_SETOPTIONS
path, they were missing on the PTRACE_SEIZE path, which also sets
user-specified ptrace flags.

Move the permissions checks out into a helper function and let both
ptrace_attach() and ptrace_setoptions() call it.

Cc: stable@kernel.org
Fixes: 13c4a90119 ("seccomp: add ptrace options for suspend/resume")
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lkml.kernel.org/r/20220319010838.1386861-1-jannh@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:50 +02:00
Waiman Long
03f836fcb6 locking/lockdep: Avoid potential access of invalid memory in lock_class
commit 61cc4534b6550997c97a03759ab46b29d44c0017 upstream.

It was found that reading /proc/lockdep after a lockdep splat may
potentially cause an access to freed memory if lockdep_unregister_key()
is called after the splat but before access to /proc/lockdep [1]. This
is due to the fact that graph_lock() call in lockdep_unregister_key()
fails after the clearing of debug_locks by the splat process.

After lockdep_unregister_key() is called, the lock_name may be freed
but the corresponding lock_class structure still have a reference to
it. That invalid memory pointer will then be accessed when /proc/lockdep
is read by a user and a use-after-free (UAF) error will be reported if
KASAN is enabled.

To fix this problem, lockdep_unregister_key() is now modified to always
search for a matching key irrespective of the debug_locks state and
zap the corresponding lock class if a matching one is found.

[1] https://lore.kernel.org/lkml/77f05c15-81b6-bddd-9650-80d5f23fe330@i-love.sakura.ne.jp/

Fixes: 8b39adbee8 ("locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lkml.kernel.org/r/20220103023558.1377055-1-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:48 +02:00
Linus Torvalds
890f78e54b Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
commit bddac7c1e02ba47f0570e494c9289acea3062cc1 upstream.

This reverts commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.

It turns out this breaks at least the ath9k wireless driver, and
possibly others.

What the ath9k driver does on packet receive is to set up the DMA
transfer with:

  int ath_rx_init(..)
  ..
                bf->bf_buf_addr = dma_map_single(sc->dev, skb->data,
                                                 common->rx_bufsize,
                                                 DMA_FROM_DEVICE);

and then the receive logic (through ath_rx_tasklet()) will fetch
incoming packets

  static bool ath_edma_get_buffers(..)
  ..
        dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
                                common->rx_bufsize, DMA_FROM_DEVICE);

        ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data);
        if (ret == -EINPROGRESS) {
                /*let device gain the buffer again*/
                dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
                                common->rx_bufsize, DMA_FROM_DEVICE);
                return false;
        }

and it's worth noting how that first DMA sync:

    dma_sync_single_for_cpu(..DMA_FROM_DEVICE);

is there to make sure the CPU can read the DMA buffer (possibly by
copying it from the bounce buffer area, or by doing some cache flush).
The iommu correctly turns that into a "copy from bounce bufer" so that
the driver can look at the state of the packets.

In the meantime, the device may continue to write to the DMA buffer, but
we at least have a snapshot of the state due to that first DMA sync.

But that _second_ DMA sync:

    dma_sync_single_for_device(..DMA_FROM_DEVICE);

is telling the DMA mapping that the CPU wasn't interested in the area
because the packet wasn't there.  In the case of a DMA bounce buffer,
that is a no-op.

Note how it's not a sync for the CPU (the "for_device()" part), and it's
not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part).

Or rather, it _should_ be a no-op.  That's what commit aa6f8dcbab47
broke: it made the code bounce the buffer unconditionally, and changed
the DMA_FROM_DEVICE to just unconditionally and illogically be
DMA_TO_DEVICE.

[ Side note: purely within the confines of the swiotlb driver it wasn't
  entirely illogical: The reason it did that odd DMA_FROM_DEVICE ->
  DMA_TO_DEVICE conversion thing is because inside the swiotlb driver,
  it uses just a swiotlb_bounce() helper that doesn't care about the
  whole distinction of who the sync is for - only which direction to
  bounce.

  So it took the "sync for device" to mean that the CPU must have been
  the one writing, and thought it meant DMA_TO_DEVICE. ]

Also note how the commentary in that commit was wrong, probably due to
that whole confusion, claiming that the commit makes the swiotlb code

                                  "bounce unconditionally (that is, also
    when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
    data from the swiotlb buffer"

which is nonsensical for two reasons:

 - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was
   exactly when it always did - and should do - the bounce.

 - since this is a sync for the device (not for the CPU), we're clearly
   fundamentally not coping back stale data from the bounce buffers at
   all, because we'd be copying *to* the bounce buffers.

So that commit was just very confused.  It confused the direction of the
synchronization (to the device, not the cpu) with the direction of the
DMA (from the device).

Reported-and-bisected-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Olha Cherevyk <olha.cherevyk@gmail.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:45 +02:00
weidong guan
5168caa32a ANDROID: Add vendor hook for tk_based_time_sync
android_rvh_tk_based_time_sync:
	to get vendor-specific time sync func

Bug: 223950152
Test: build pass

Signed-off-by: weidong guan <weidong.guan@unisoc.com>
Change-Id: I4c540e077668bff548c55d4afba3a7bd6725497e
2022-04-06 08:31:30 -07:00
Xuewen Yan
c6fb9f6636 ANDROID: Add vendor hook to the effective_cpu_util
android_rvh_effective_cpu_util:
	To perform vendor-specific cpu util, it is used in EAS/schedutil/thermal.

The effective_cpu_util would be called when thermal calc the dynamic power,
it's non-atomic context, so set the hook be restricted.

Bug: 226686099
Test: build pass

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Change-Id: I6fd77f44ca4328f5ef37d96989aa2e08d65e29bb
2022-04-01 17:12:18 +00:00
Sangmoon Kim
73a2a894a9 ANDROID: stacktrace: export stack_trace_save_tsk/regs
Export stack_trace_save_tsk and stack_trace_save_regs so that modules
can use it to get the stacktrace of specific tasks or pt_regs.

Bug: 220704805
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I568e5d8c3e2474a0dce96da6adb4eec2583d45d2
(cherry picked from commit db42a16b80a073fdf3f6f3ee207395bba0c7219e)
2022-03-29 23:38:13 +00:00
Greg Kroah-Hartman
3019c1326d Merge 5.15.32 into android13-5.15
Changes in 5.15.32
	nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION
	net: ipv6: fix skb_over_panic in __ip6_append_data
	tpm: Fix error handling in async work
	Bluetooth: btusb: Add another Realtek 8761BU
	llc: fix netdevice reference leaks in llc_ui_bind()
	ASoC: sti: Fix deadlock via snd_pcm_stop_xrun() call
	ALSA: oss: Fix PCM OSS buffer allocation overflow
	ALSA: usb-audio: add mapping for new Corsair Virtuoso SE
	ALSA: hda/realtek: Add quirk for Clevo NP70PNJ
	ALSA: hda/realtek: Add quirk for Clevo NP50PNJ
	ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
	ALSA: hda/realtek: Add quirk for ASUS GA402
	ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
	ALSA: pcm: Fix races among concurrent read/write and buffer changes
	ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls
	ALSA: pcm: Fix races among concurrent prealloc proc writes
	ALSA: pcm: Add stream lock during PCM reset ioctl operations
	ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
	ALSA: cmipci: Restore aux vol on suspend/resume
	ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
	drivers: net: xgene: Fix regression in CRC stripping
	netfilter: nf_tables: initialize registers in nft_do_chain()
	netfilter: nf_tables: validate registers coming from userspace.
	ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
	ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
	ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
	crypto: qat - disable registration of algorithms
	Bluetooth: btusb: Add one more Bluetooth part for the Realtek RTL8852AE
	Revert "ath: add support for special 0x0 regulatory domain"
	drm/virtio: Ensure that objs is not NULL in virtio_gpu_array_put_free()
	rcu: Don't deboost before reporting expedited quiescent state
	uaccess: fix integer overflow on access_ok()
	mac80211: fix potential double free on mesh join
	tpm: use try_get_ops() in tpm-space.c
	wcn36xx: Differentiate wcn3660 from wcn3620
	m68k: fix access_ok for coldfire
	nds32: fix access_ok() checks in get/put_user
	llc: only change llc->dev when bind() succeeds
	Linux 5.15.32

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3f78747142047de7aa26e2c81bc378d4d0aedaff
2022-03-29 12:52:53 +02:00
Paul E. McKenney
058d62a03e rcu: Don't deboost before reporting expedited quiescent state
commit 10c535787436d62ea28156a4b91365fd89b5a432 upstream.

Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
before reporting the expedited quiescent state.  Under heavy real-time
load, this can result in this function being preempted before the
quiescent state is reported, which can in turn prevent the expedited grace
period from completing.  Tim Murray reports that the resulting expedited
grace periods can take hundreds of milliseconds and even more than one
second, when they should normally complete in less than a millisecond.

This was fine given that there were no particular response-time
constraints for synchronize_rcu_expedited(), as it was designed
for throughput rather than latency.  However, some users now need
sub-100-millisecond response-time constratints.

This patch therefore follows Neeraj's suggestion (seconded by Tim and
by Uladzislau Rezki) of simply reversing the two operations.

Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Joel Fernandes <joelaf@google.com>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Tim Murray <timmurray@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-28 09:58:45 +02:00
Suren Baghdasaryan
0fd37220d8 UPSTREAM: mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 5c26f6ac9416b63d093e29c30e79b3297e425472)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4a6b5602ce7151d1a4b88fac489f86d68089bd4d
2022-03-24 18:44:39 -07:00
Prasad Sodagudi
b1315d6275 ANDROID: timer: update vendor hook for timer calc index
commit '18550710107d ANDROID: timer: Add vendor hook for timer calc index'
is changing the expires value only for bucket calculation, so timer
wont be very precise in this case.

Bug: 178758017
Change-Id: I8218a4065d31ed60bafdbbb36c1d6d9c9f0a6e9e
Signed-off-by: Prasad Sodagudi <quic_psodagud@quicinc.com>
Signed-off-by: Gokul krishna Krishnakumar <quic_gokukris@quicinc.com>
2022-03-24 22:02:34 +00:00
Greg Kroah-Hartman
8aa6f1cde0 Revert "Revert "bpf: Fix possible race in inc_misses_counter""
This reverts commit bb592b6898.

It is no longer needed as we can modify the KABI at this point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1467db884a714b1379d31d65e650efccbc17ac5c
2022-03-23 11:32:21 -07:00
Greg Kroah-Hartman
57270a84df Revert "Revert "bpf: Use u64_stats_t in struct bpf_prog_stats""
This reverts commit beb134d21a.

It is no longer needed as we can modify the KABI at this point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibabb6d2e2a1e00d18ad2e8c39b4459ba118c7002
2022-03-23 11:32:21 -07:00
JianMin Liu
46dfaf84ab ANDROID: rwsem: Add vendor hook to the rw-semaphore
Add the hook to apply vendor's performance tune for owner
of rwsem.

Add the hook for the waiter list of rwsem to allow
vendor perform waiting queue enhancement

ANDROID_VENDOR_DATA added to rw_semaphore

Bug: 222402411

Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com>
Signed-off-by: Jino Hsu <jino.hsu@mediatek.com>
Change-Id: I007a5e26f3db2adaeaf4e5ccea414ce7abfa83b8
2022-03-23 11:32:20 -07:00
Michel Lespinasse
48e35d053f FROMLIST: mm: rcu safe vma->vm_file freeing
Defer freeing of vma->vm_file when freeing vmas.
This is to allow speculative page faults in the mapped file case.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20210407014502.24091-24-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic766bc2086db82eae9f3aadf0f23dd743be1c464
2022-03-23 11:32:18 -07:00