Commit Graph

199 Commits

Author SHA1 Message Date
Minchan Kim
7df45e50a5 ANDROID: vendor hook to control blk_plug for memory reclaim
Add vendor hook to contorl blk plugging.

Bug: 255471591
Bug: 238728493
Change-Id: I96b73cec14f0d2fea46a4828526e6ae5aa5c71b7
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit a17e132ec4f290621666311e73f43202706d2743)
2023-01-13 18:43:29 +00:00
Greg Kroah-Hartman
eb07b1080f Merge 5.15.72 into android14-5.15
Changes in 5.15.72
	ALSA: hda: Do disconnect jacks at codec unbind
	ALSA: hda: Fix hang at HD-audio codec unbinding due to refcount saturation
	ALSA: hda: Fix Nvidia dp infoframe
	ALSA: hda/realtek: fix speakers and micmute on HP 855 G8
	cgroup: reduce dependency on cgroup_mutex
	cgroup: cgroup_get_from_id() must check the looked-up kn is a directory
	uas: add no-uas quirk for Hiksemi usb_disk
	usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
	uas: ignore UAS for Thinkplus chips
	usb: typec: ucsi: Remove incorrect warning
	thunderbolt: Explicitly reset plug events delay back to USB4 spec value
	net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
	Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
	can: c_can: don't cache TX messages for C_CAN cores
	clk: ingenic-tcu: Properly enable registers before accessing timers
	x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd
	ARM: dts: integrator: Tag PCI host with device_type
	ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
	mm/damon/dbgfs: fix memory leak when using debugfs_lookup()
	net: mt7531: only do PLL once after the reset
	Revert "firmware: arm_scmi: Add clock management to the SCMI power domain"
	drm/i915/gt: Restrict forced preemption to the active context
	drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV
	vduse: prevent uninitialized memory accesses
	libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
	mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
	mmc: hsq: Fix data stomping during mmc recovery
	mm/page_alloc: fix race condition between build_all_zonelists and page allocation
	mm: prevent page_frag_alloc() from corrupting the memory
	mm: fix dereferencing possible ERR_PTR
	mm/migrate_device.c: flush TLB while holding PTL
	mm: fix madivse_pageout mishandling on non-LRU page
	mm,hwpoison: check mm when killing accessing process
	media: dvb_vb2: fix possible out of bound access
	media: rkvdec: Disable H.264 error detection
	media: v4l2-compat-ioctl32.c: zero buffer passed to v4l2_compat_get_array_args()
	swiotlb: max mapping size takes min align mask into account
	ARM: dts: am33xx: Fix MMCHS0 dma properties
	reset: imx7: Fix the iMX8MP PCIe PHY PERST support
	ARM: dts: am5748: keep usb4_tm disabled
	soc: sunxi: sram: Actually claim SRAM regions
	soc: sunxi: sram: Prevent the driver from being unbound
	soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource()
	soc: sunxi: sram: Fix probe function ordering issues
	soc: sunxi: sram: Fix debugfs info for A64 SRAM C
	ASoC: imx-card: Fix refcount issue with of_node_put
	arm64: dts: qcom: sm8350: fix UFS PHY serdes size
	ASoC: tas2770: Reinit regcache on reset
	drm/bridge: lt8912b: add vsync hsync
	drm/bridge: lt8912b: set hdmi or dvi mode
	drm/bridge: lt8912b: fix corrupted image output
	Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
	Input: melfas_mip4 - fix return value check in mip4_probe()
	gpio: mvebu: Fix check for pwm support on non-A8K platforms
	usbnet: Fix memory leak in usbnet_disconnect()
	net: sched: act_ct: fix possible refcount leak in tcf_ct_init()
	cxgb4: fix missing unlock on ETHOFLD desc collect fail path
	net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
	nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
	wifi: mac80211: fix regression with non-QoS drivers
	net: stmmac: power up/down serdes in stmmac_open/release
	net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
	selftests: Fix the if conditions of in test_extra_filter()
	vdpa/ifcvf: fix the calculation of queuepair
	fs: split off setxattr_copy and do_setxattr function from setxattr
	clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks
	clk: iproc: Do not rely on node name for correct PLL setup
	KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest
	x86/alternative: Fix race in try_get_desc()
	drm/i915/gem: Really move i915_gem_context.link under ref protection
	Linux 5.15.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib8569de2af78d5080b026a46196aad5fc816fd42
2022-10-05 11:59:55 +02:00
Minchan Kim
f9aed3d8a0 mm: fix madivse_pageout mishandling on non-LRU page
commit 58d426a7ba92870d489686dfdb9d06b66815a2ab upstream.

MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from
isolate_lru_page below.

Fix it by checking PageLRU in advance.

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140

Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes: 1a4e58cce8 ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: 韩天ç`• <hantianshuo@iie.ac.cn>
Suggested-by: Yang Shi <shy828301@gmail.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-05 10:39:39 +02:00
Greg Kroah-Hartman
06d58f3cef Merge 5.15.54 into android14-5.15
Changes in 5.15.54
	mm/slub: add missing TID updates on slab deactivation
	mm/filemap: fix UAF in find_lock_entries
	Revert "selftests/bpf: Add test for bpf_timer overwriting crash"
	ALSA: usb-audio: Workarounds for Behringer UMC 204/404 HD
	ALSA: hda/realtek: Add quirk for Clevo L140PU
	ALSA: cs46xx: Fix missing snd_card_free() call at probe error
	can: bcm: use call_rcu() instead of costly synchronize_rcu()
	can: grcan: grcan_probe(): remove extra of_node_get()
	can: gs_usb: gs_usb_open/close(): fix memory leak
	can: m_can: m_can_chip_config(): actually enable internal timestamping
	can: m_can: m_can_{read_fifo,echo_tx_event}(): shift timestamp to full 32 bits
	can: mcp251xfd: mcp251xfd_regmap_crc_read(): improve workaround handling for mcp2517fd
	can: mcp251xfd: mcp251xfd_regmap_crc_read(): update workaround broken CRC on TBC register
	bpf: Fix incorrect verifier simulation around jmp32's jeq/jne
	bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals
	usbnet: fix memory leak in error case
	net: rose: fix UAF bug caused by rose_t0timer_expiry
	netfilter: nft_set_pipapo: release elements in clone from abort path
	netfilter: nf_tables: stricter validation of element data
	btrfs: rename btrfs_alloc_chunk to btrfs_create_chunk
	btrfs: add additional parameters to btrfs_init_tree_ref/btrfs_init_data_ref
	btrfs: fix invalid delayed ref after subvolume creation failure
	btrfs: fix warning when freeing leaf after subvolume creation failure
	Input: cpcap-pwrbutton - handle errors from platform_get_irq()
	Input: goodix - change goodix_i2c_write() len parameter type to int
	Input: goodix - add a goodix.h header file
	Input: goodix - refactor reset handling
	Input: goodix - try not to touch the reset-pin on x86/ACPI devices
	dma-buf/poll: Get a file reference for outstanding fence callbacks
	btrfs: fix deadlock between chunk allocation and chunk btree modifications
	drm/i915: Disable bonding on gen12+ platforms
	drm/i915/gt: Register the migrate contexts with their engines
	drm/i915: Replace the unconditional clflush with drm_clflush_virt_range()
	PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
	PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
	media: ir_toy: prevent device from hanging during transmit
	memory: renesas-rpc-if: Avoid unaligned bus access for HyperFlash
	ath11k: add hw_param for wakeup_mhi
	qed: Improve the stack space of filter_config()
	platform/x86: wmi: introduce helper to convert driver to WMI driver
	platform/x86: wmi: Replace read_takes_no_args with a flags field
	platform/x86: wmi: Fix driver->notify() vs ->probe() race
	mt76: mt7921: get rid of mt7921_mac_set_beacon_filter
	mt76: mt7921: introduce mt7921_mcu_set_beacon_filter utility routine
	mt76: mt7921: fix a possible race enabling/disabling runtime-pm
	bpf: Stop caching subprog index in the bpf_pseudo_func insn
	bpf, arm64: Use emit_addr_mov_i64() for BPF_PSEUDO_FUNC
	riscv: defconfig: enable DRM_NOUVEAU
	RISC-V: defconfigs: Set CONFIG_FB=y, for FB console
	net/mlx5e: Check action fwd/drop flag exists also for nic flows
	net/mlx5e: Split actions_match_supported() into a sub function
	net/mlx5e: TC, Reject rules with drop and modify hdr action
	net/mlx5e: TC, Reject rules with forward and drop actions
	ASoC: rt5682: Avoid the unexpected IRQ event during going to suspend
	ASoC: rt5682: Re-detect the combo jack after resuming
	ASoC: rt5682: Fix deadlock on resume
	netfilter: nf_tables: convert pktinfo->tprot_set to flags field
	netfilter: nft_payload: support for inner header matching / mangling
	netfilter: nft_payload: don't allow th access for fragments
	s390/boot: allocate amode31 section in decompressor
	s390/setup: use physical pointers for memblock_reserve()
	s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
	ibmvnic: init init_done_rc earlier
	ibmvnic: clear fop when retrying probe
	ibmvnic: Allow queueing resets during probe
	virtio-blk: avoid preallocating big SGL for data
	io_uring: ensure that fsnotify is always called
	block: use bdev_get_queue() in bio.c
	block: only mark bio as tracked if it really is tracked
	block: fix rq-qos breakage from skipping rq_qos_done_bio()
	stddef: Introduce struct_group() helper macro
	media: omap3isp: Use struct_group() for memcpy() region
	media: davinci: vpif: fix use-after-free on driver unbind
	mt76: mt76_connac: fix MCU_CE_CMD_SET_ROC definition error
	mt76: mt7921: do not always disable fw runtime-pm
	cxl/port: Hold port reference until decoder release
	clk: renesas: r9a07g044: Update multiplier and divider values for PLL2/3
	KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping
	KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook
	scsi: qla2xxx: Move heartbeat handling from DPC thread to workqueue
	scsi: qla2xxx: Fix laggy FC remote port session recovery
	scsi: qla2xxx: edif: Replace list_for_each_safe with list_for_each_entry_safe
	scsi: qla2xxx: Fix crash during module load unload test
	gfs2: Fix gfs2_file_buffered_write endless loop workaround
	vdpa/mlx5: Avoid processing works if workqueue was destroyed
	btrfs: handle device lookup with btrfs_dev_lookup_args
	btrfs: add a btrfs_get_dev_args_from_path helper
	btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls
	btrfs: remove device item and update super block in the same transaction
	drbd: add error handling support for add_disk()
	drbd: Fix double free problem in drbd_create_device
	drbd: fix an invalid memory access caused by incorrect use of list iterator
	drm/amd/display: Set min dcfclk if pipe count is 0
	drm/amd/display: Fix by adding FPU protection for dcn30_internal_validate_bw
	NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id)
	NFSD: COMMIT operations must not return NFS?ERR_INVAL
	riscv/mm: Add XIP_FIXUP for riscv_pfn_base
	iio: accel: mma8452: use the correct logic to get mma8452_data
	batman-adv: Use netif_rx().
	mtd: spi-nor: Skip erase logic when SPI_NOR_NO_ERASE is set
	Compiler Attributes: add __alloc_size() for better bounds checking
	mm: vmalloc: introduce array allocation functions
	KVM: use __vcalloc for very large allocations
	btrfs: don't access possibly stale fs_info data in device_list_add
	KVM: s390x: fix SCK locking
	scsi: qla2xxx: Fix loss of NVMe namespaces after driver reload test
	powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs
	powerpc: flexible GPR range save/restore macros
	powerpc/tm: Fix more userspace r13 corruption
	serial: sc16is7xx: Clear RS485 bits in the shutdown
	bus: mhi: core: Use correctly sized arguments for bit field
	bus: mhi: Fix pm_state conversion to string
	stddef: Introduce DECLARE_FLEX_ARRAY() helper
	uapi/linux/stddef.h: Add include guards
	ASoC: rt5682: move clk related code to rt5682_i2c_probe
	ASoC: rt5682: fix an incorrect NULL check on list iterator
	drm/amd/vcn: fix an error msg on vcn 3.0
	KVM: Don't create VM debugfs files outside of the VM directory
	tty: n_gsm: Modify CR,PF bit when config requester
	tty: n_gsm: Save dlci address open status when config requester
	tty: n_gsm: fix frame reception handling
	ALSA: usb-audio: add mapping for MSI MPG X570S Carbon Max Wifi.
	ALSA: usb-audio: add mapping for MSI MAG X570S Torpedo MAX.
	tty: n_gsm: fix missing update of modem controls after DLCI open
	btrfs: zoned: encapsulate inode locking for zoned relocation
	btrfs: zoned: use dedicated lock for data relocation
	KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
	mm/hwpoison: mf_mutex for soft offline and unpoison
	mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
	mm/memory-failure.c: fix race with changing page compound again
	mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
	tty: n_gsm: fix invalid use of MSC in advanced option
	tty: n_gsm: fix sometimes uninitialized warning in gsm_dlci_modem_output()
	serial: 8250_mtk: Make sure to select the right FEATURE_SEL
	tty: n_gsm: fix invalid gsmtty_write_room() result
	drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
	drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems
	drm/i915: Fix a race between vma / object destruction and unbinding
	drm/mediatek: Use mailbox rx_callback instead of cmdq_task_cb
	drm/mediatek: Remove the pointer of struct cmdq_client
	drm/mediatek: Detect CMDQ execution timeout
	drm/mediatek: Add cmdq_handle in mtk_crtc
	drm/mediatek: Add vblank register/unregister callback functions
	Bluetooth: protect le accept and resolv lists with hdev->lock
	Bluetooth: btmtksdio: fix use-after-free at btmtksdio_recv_event
	io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
	irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
	irqchip/gic-v3: Refactor ISB + EOIR at ack time
	rxrpc: Fix locking issue
	dt-bindings: soc: qcom: smd-rpm: Add compatible for MSM8953 SoC
	dt-bindings: soc: qcom: smd-rpm: Fix missing MSM8936 compatible
	module: change to print useful messages from elf_validity_check()
	module: fix [e_shstrndx].sh_size=0 OOB access
	iommu/vt-d: Fix PCI bus rescan device hot add
	fbdev: fbmem: Fix logo center image dx issue
	fbmem: Check virtual screen sizes in fb_set_var()
	fbcon: Disallow setting font bigger than screen size
	fbcon: Prevent that screen size is smaller than font size
	PM: runtime: Redefine pm_runtime_release_supplier()
	memregion: Fix memregion_free() fallback definition
	video: of_display_timing.h: include errno.h
	powerpc/powernv: delay rng platform device creation until later in boot
	net: dsa: qca8k: reset cpu port on MTU change
	can: kvaser_usb: replace run-time checks with struct kvaser_usb_driver_info
	can: kvaser_usb: kvaser_usb_leaf: fix CAN clock frequency regression
	can: kvaser_usb: kvaser_usb_leaf: fix bittiming limits
	xfs: remove incorrect ASSERT in xfs_rename
	Revert "serial: sc16is7xx: Clear RS485 bits in the shutdown"
	btrfs: fix error pointer dereference in btrfs_ioctl_rm_dev_v2()
	virtio-blk: modify the value type of num in virtio_queue_rq()
	btrfs: fix use of uninitialized variable at rm device ioctl
	tty: n_gsm: fix encoding of command/response bit
	ARM: meson: Fix refcount leak in meson_smp_prepare_cpus
	pinctrl: sunxi: a83t: Fix NAND function name for some pins
	ASoC: rt711: Add endianness flag in snd_soc_component_driver
	ASoC: rt711-sdca: Add endianness flag in snd_soc_component_driver
	ASoC: codecs: rt700/rt711/rt711-sdca: resume bus/codec in .set_jack_detect
	arm64: dts: qcom: msm8994: Fix CPU6/7 reg values
	arm64: dts: qcom: sdm845: use dispcc AHB clock for mdss node
	ARM: mxs_defconfig: Enable the framebuffer
	arm64: dts: imx8mp-evk: correct mmc pad settings
	arm64: dts: imx8mp-evk: correct the uart2 pinctl value
	arm64: dts: imx8mp-evk: correct gpio-led pad settings
	arm64: dts: imx8mp-evk: correct vbus pad settings
	arm64: dts: imx8mp-evk: correct eqos pad settings
	arm64: dts: imx8mp-evk: correct I2C1 pad settings
	arm64: dts: imx8mp-evk: correct I2C3 pad settings
	arm64: dts: imx8mp-phyboard-pollux-rdk: correct uart pad settings
	arm64: dts: imx8mp-phyboard-pollux-rdk: correct eqos pad settings
	arm64: dts: imx8mp-phyboard-pollux-rdk: correct i2c2 & mmc settings
	pinctrl: sunxi: sunxi_pconf_set: use correct offset
	arm64: dts: qcom: msm8992-*: Fix vdd_lvs1_2-supply typo
	ARM: at91: pm: use proper compatible for sama5d2's rtc
	ARM: at91: pm: use proper compatibles for sam9x60's rtc and rtt
	ARM: at91: pm: use proper compatibles for sama7g5's rtc and rtt
	ARM: dts: at91: sam9x60ek: fix eeprom compatible and size
	ARM: dts: at91: sama5d2_icp: fix eeprom compatibles
	ARM: at91: fix soc detection for SAM9X60 SiPs
	xsk: Clear page contiguity bit when unmapping pool
	i2c: piix4: Fix a memory leak in the EFCH MMIO support
	i40e: Fix dropped jumbo frames statistics
	i40e: Fix VF's MAC Address change on VM
	ARM: dts: stm32: use usbphyc ck_usbo_48m as USBH OHCI clock on stm32mp151
	ARM: dts: stm32: add missing usbh clock and fix clk order on stm32mp15
	ibmvnic: Properly dispose of all skbs during a failover.
	selftests: forwarding: fix flood_unicast_test when h2 supports IFF_UNICAST_FLT
	selftests: forwarding: fix learning_test when h1 supports IFF_UNICAST_FLT
	selftests: forwarding: fix error message in learning_test
	r8169: fix accessing unset transport header
	i2c: cadence: Unregister the clk notifier in error path
	dmaengine: imx-sdma: Allow imx8m for imx7 FW revs
	misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
	misc: rtsx_usb: use separate command and response buffers
	misc: rtsx_usb: set return value in rsp_buf alloc err path
	Revert "mm/memory-failure.c: fix race with changing page compound again"
	Revert "serial: 8250_mtk: Make sure to select the right FEATURE_SEL"
	dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo
	ida: don't use BUG_ON() for debugging
	dmaengine: pl330: Fix lockdep warning about non-static key
	dmaengine: lgm: Fix an error handling path in intel_ldma_probe()
	dmaengine: at_xdma: handle errors of at_xdmac_alloc_desc() correctly
	dmaengine: ti: Fix refcount leak in ti_dra7_xbar_route_allocate
	dmaengine: qcom: bam_dma: fix runtime PM underflow
	dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate
	dmaengine: idxd: force wq context cleanup on device disable path
	selftests/net: fix section name when using xdp_dummy.o
	Linux 5.15.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3ca4c0aa09a3bea6969c7a127d833034a123f437
2022-07-13 19:41:43 +02:00
luofei
7a07875fab mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
[ Upstream commit d1fe111fb62a1cf0446a2919f5effbb33ad0702c ]

When the hwpoison page meets the filter conditions, it should not be
regarded as successful memory_failure() processing for mce handler, but
should return a distinct value, otherwise mce handler regards the error
page has been identified and isolated, which may lead to calling
set_mce_nospec() to change page attribute, etc.

Here memory_failure() return -EOPNOTSUPP to indicate that the error
event is filtered, mce handler should not take any action for this
situation and hwpoison injector should treat as correct.

Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.com
Signed-off-by: luofei <luofei@unicloud.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-12 16:35:05 +02:00
Greg Kroah-Hartman
b41a37c036 Merge 5.15.33 into android13-5.15
Changes in 5.15.33
	Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
	USB: serial: pl2303: add IBM device IDs
	dt-bindings: usb: hcd: correct usb-device path
	USB: serial: pl2303: fix GS type detection
	USB: serial: simple: add Nokia phone driver
	mm: kfence: fix missing objcg housekeeping for SLAB
	hv: utils: add PTP_1588_CLOCK to Kconfig to fix build
	HID: logitech-dj: add new lightspeed receiver id
	HID: Add support for open wheel and no attachment to T300
	xfrm: fix tunnel model fragmentation behavior
	ARM: mstar: Select HAVE_ARM_ARCH_TIMER
	virtio_console: break out of buf poll on remove
	vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
	tools/virtio: fix virtio_test execution
	ethernet: sun: Free the coherent when failing in probing
	gpio: Revert regression in sysfs-gpio (gpiolib.c)
	spi: Fix invalid sgs value
	net:mcf8390: Use platform_get_irq() to get the interrupt
	Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"
	spi: Fix erroneous sgs value with min_t()
	Input: zinitix - do not report shadow fingers
	af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
	net: dsa: microchip: add spi_device_id tables
	selftests: vm: fix clang build error multiple output files
	locking/lockdep: Avoid potential access of invalid memory in lock_class
	drm/amdgpu: move PX checking into amdgpu_device_ip_early_init
	drm/amdgpu: only check for _PR3 on dGPUs
	iommu/iova: Improve 32-bit free space estimate
	virtio-blk: Use blk_validate_block_size() to validate block size
	tpm: fix reference counting for struct tpm_chip
	usb: typec: tipd: Forward plug orientation to typec subsystem
	USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
	xhci: fix garbage USBSTS being logged in some cases
	xhci: fix runtime PM imbalance in USB2 resume
	xhci: make xhci_handshake timeout for xhci_reset() adjustable
	xhci: fix uninitialized string returned by xhci_decode_ctrl_ctx()
	mei: me: disable driver on the ign firmware
	mei: me: add Alder Lake N device id.
	mei: avoid iterator usage outside of list_for_each_entry
	bus: mhi: pci_generic: Add mru_default for Quectel EM1xx series
	bus: mhi: Fix MHI DMA structure endianness
	docs: sphinx/requirements: Limit jinja2<3.1
	coresight: Fix TRCCONFIGR.QE sysfs interface
	coresight: syscfg: Fix memleak on registration failure in cscfg_create_device
	iio: afe: rescale: use s64 for temporary scale calculations
	iio: inkern: apply consumer scale on IIO_VAL_INT cases
	iio: inkern: apply consumer scale when no channel scale is available
	iio: inkern: make a best effort on offset calculation
	greybus: svc: fix an error handling bug in gb_svc_hello()
	clk: rockchip: re-add rational best approximation algorithm to the fractional divider
	clk: uniphier: Fix fixed-rate initialization
	ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
	cifs: fix handlecache and multiuser
	cifs: we do not need a spinlock around the tree access during umount
	KEYS: fix length validation in keyctl_pkey_params_get_2()
	KEYS: asymmetric: enforce that sig algo matches key algo
	KEYS: asymmetric: properly validate hash_algo and encoding
	Documentation: add link to stable release candidate tree
	Documentation: update stable tree link
	firmware: stratix10-svc: add missing callback parameter on RSU
	firmware: sysfb: fix platform-device leak in error path
	HID: intel-ish-hid: Use dma_alloc_coherent for firmware update
	SUNRPC: avoid race between mod_timer() and del_timer_sync()
	NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR
	NFSD: prevent underflow in nfssvc_decode_writeargs()
	NFSD: prevent integer overflow on 32 bit systems
	f2fs: fix to unlock page correctly in error path of is_alive()
	f2fs: quota: fix loop condition at f2fs_quota_sync()
	f2fs: fix to do sanity check on .cp_pack_total_block_count
	remoteproc: Fix count check in rproc_coredump_write()
	mm/mlock: fix two bugs in user_shm_lock()
	pinctrl: ingenic: Fix regmap on X series SoCs
	pinctrl: samsung: drop pin banks references on error paths
	net: bnxt_ptp: fix compilation error
	spi: mxic: Fix the transmit path
	mtd: rawnand: protect access to rawnand devices while in suspend
	can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
	can: m_can: m_can_tx_handler(): fix use after free of skb
	can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
	jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
	jffs2: fix memory leak in jffs2_do_mount_fs
	jffs2: fix memory leak in jffs2_scan_medium
	mm: fs: fix lru_cache_disabled race in bh_lru
	mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
	mm: invalidate hwpoison page cache page in fault path
	mempolicy: mbind_range() set_policy() after vma_merge()
	scsi: core: sd: Add silence_suspend flag to suppress some PM messages
	scsi: ufs: Fix runtime PM messages never-ending cycle
	scsi: scsi_transport_fc: Fix FPIN Link Integrity statistics counters
	scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
	qed: display VF trust config
	qed: validate and restrict untrusted VFs vlan promisc mode
	riscv: dts: canaan: Fix SPI3 bus width
	riscv: Fix fill_callchain return value
	riscv: Increase stack size under KASAN
	Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
	cifs: prevent bad output lengths in smb2_ioctl_query_info()
	cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
	ALSA: cs4236: fix an incorrect NULL check on list iterator
	ALSA: hda: Avoid unsol event during RPM suspending
	ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock
	ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
	rtc: mc146818-lib: fix locking in mc146818_set_time
	rtc: pl031: fix rtc features null pointer dereference
	ocfs2: fix crash when mount with quota enabled
	drm/simpledrm: Add "panel orientation" property on non-upright mounted LCD panels
	mm: madvise: skip unmapped vma holes passed to process_madvise
	mm: madvise: return correct bytes advised with process_madvise
	Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
	mm,hwpoison: unmap poisoned page before invalidation
	mm/kmemleak: reset tag when compare object pointer
	dm stats: fix too short end duration_ns when using precise_timestamps
	dm: fix use-after-free in dm_cleanup_zoned_dev()
	dm: interlock pending dm_io and dm_wait_for_bios_completion
	dm: fix double accounting of flush with data
	dm integrity: set journal entry unused when shrinking device
	tracing: Have trace event string test handle zero length strings
	drbd: fix potential silent data corruption
	powerpc/kvm: Fix kvm_use_magic_page
	PCI: fu740: Force 2.5GT/s for initial device probe
	arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
	arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
	arm64: dts: qcom: sm8250: Fix MSI IRQ for PCIe1 and PCIe2
	arm64: dts: ti: k3-am65: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j721e: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j7200: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-am64: Fix gic-v3 compatible regs
	ASoC: SOF: Intel: Fix NULL ptr dereference when ENOMEM
	Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
	ACPI: properties: Consistently return -ENOENT if there are no more references
	coredump: Also dump first pages of non-executable ELF libraries
	ext4: fix ext4_fc_stats trace point
	ext4: fix fs corruption when tring to remove a non-empty directory with IO error
	ext4: make mb_optimize_scan performance mount option work with extents
	drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
	samples/landlock: Fix path_list memory leak
	landlock: Use square brackets around "landlock-ruleset"
	mailbox: tegra-hsp: Flush whole channel
	block: limit request dispatch loop duration
	block: don't merge across cgroup boundaries if blkcg is enabled
	drm/edid: check basic audio support on CEA extension block
	fbdev: Hot-unplug firmware fb devices on forced removal
	video: fbdev: sm712fb: Fix crash in smtcfb_read()
	video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
	rfkill: make new event layout opt-in
	ARM: dts: at91: sama7g5: Remove unused properties in i2c nodes
	ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
	ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5420
	mgag200 fix memmapsl configuration in GCTL6 register
	carl9170: fix missing bit-wise or operator for tx_params
	pstore: Don't use semaphores in always-atomic-context code
	thermal: int340x: Increase bitmap size
	lib/raid6/test: fix multiple definition linking error
	exec: Force single empty string when argv is empty
	crypto: rsa-pkcs1pad - only allow with rsa
	crypto: rsa-pkcs1pad - correctly get hash from source scatterlist
	crypto: rsa-pkcs1pad - restore signature length check
	crypto: rsa-pkcs1pad - fix buffer overread in pkcs1pad_verify_complete()
	bcache: fixup multiple threads crash
	PM: domains: Fix sleep-in-atomic bug caused by genpd_debug_remove()
	DEC: Limit PMAX memory probing to R3k systems
	media: gpio-ir-tx: fix transmit with long spaces on Orange Pi PC
	media: venus: hfi_cmds: List HDR10 property as unsupported for v1 and v3
	media: venus: venc: Fix h264 8x8 transform control
	media: davinci: vpif: fix unbalanced runtime PM get
	media: davinci: vpif: fix unbalanced runtime PM enable
	btrfs: zoned: mark relocation as writing
	btrfs: extend locking to all space_info members accesses
	btrfs: verify the tranisd of the to-be-written dirty extent buffer
	xtensa: define update_mmu_tlb function
	xtensa: fix stop_machine_cpuslocked call in patch_text
	xtensa: fix xtensa_wsr always writing 0
	drm/syncobj: flatten dma_fence_chains on transfer
	drm/nouveau/backlight: Fix LVDS backlight detection on some laptops
	drm/nouveau/backlight: Just set all backlight types as RAW
	drm/fb-helper: Mark screen buffers in system memory with FBINFO_VIRTFB
	brcmfmac: firmware: Allocate space for default boardrev in nvram
	brcmfmac: pcie: Release firmwares in the brcmf_pcie_setup error path
	brcmfmac: pcie: Declare missing firmware files in pcie.c
	brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
	brcmfmac: pcie: Fix crashes due to early IRQs
	drm/i915/opregion: check port number bounds for SWSCI display power state
	drm/i915/gem: add missing boundary check in vm_access
	PCI: imx6: Allow to probe when dw_pcie_wait_for_link() fails
	PCI: pciehp: Clear cmd_busy bit in polling mode
	PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
	regulator: qcom_smd: fix for_each_child.cocci warnings
	selinux: access superblock_security_struct in LSM blob way
	selinux: check return value of sel_make_avc_files
	crypto: ccp - Ensure psp_ret is always init'd in __sev_platform_init_locked()
	hwrng: cavium - Check health status while reading random data
	hwrng: cavium - HW_RANDOM_CAVIUM should depend on ARCH_THUNDER
	crypto: sun8i-ss - really disable hash on A80
	crypto: authenc - Fix sleep in atomic context in decrypt_tail
	crypto: mxs-dcp - Fix scatterlist processing
	selinux: Fix selinux_sb_mnt_opts_compat()
	thermal: int340x: Check for NULL after calling kmemdup()
	crypto: octeontx2 - remove CONFIG_DM_CRYPT check
	spi: tegra114: Add missing IRQ check in tegra_spi_probe
	spi: tegra210-quad: Fix missin IRQ check in tegra_qspi_probe
	stack: Constrain and fix stack offset randomization with Clang builds
	arm64/mm: avoid fixmap race condition when create pud mapping
	blk-cgroup: set blkg iostat after percpu stat aggregation
	selftests/x86: Add validity check and allow field splitting
	selftests/sgx: Treat CC as one argument
	crypto: rockchip - ECB does not need IV
	audit: log AUDIT_TIME_* records only from rules
	EVM: fix the evm= __setup handler return value
	crypto: ccree - don't attempt 0 len DMA mappings
	crypto: hisilicon/sec - fix the aead software fallback for engine
	spi: pxa2xx-pci: Balance reference count for PCI DMA device
	hwmon: (pmbus) Add mutex to regulator ops
	hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
	nvme: cleanup __nvme_check_ids
	nvme: fix the check for duplicate unique identifiers
	block: don't delete queue kobject before its children
	PM: hibernate: fix __setup handler error handling
	PM: suspend: fix return value of __setup handler
	spi: spi-zynqmp-gqspi: Handle error for dma_set_mask
	hwrng: atmel - disable trng on failure path
	crypto: sun8i-ss - call finalize with bh disabled
	crypto: sun8i-ce - call finalize with bh disabled
	crypto: amlogic - call finalize with bh disabled
	crypto: gemini - call finalize with bh disabled
	crypto: vmx - add missing dependencies
	clocksource/drivers/timer-ti-dm: Fix regression from errata i940 fix
	clocksource/drivers/exynos_mct: Refactor resources allocation
	clocksource/drivers/exynos_mct: Handle DTS with higher number of interrupts
	clocksource/drivers/timer-microchip-pit64b: Use notrace
	clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init()
	arm64: prevent instrumentation of bp hardening callbacks
	KEYS: trusted: Fix trusted key backends when building as module
	KEYS: trusted: Avoid calling null function trusted_key_exit
	ACPI: APEI: fix return value of __setup handlers
	crypto: ccp - ccp_dmaengine_unregister release dma channels
	crypto: ccree - Fix use after free in cc_cipher_exit()
	hwrng: nomadik - Change clk_disable to clk_disable_unprepare
	hwmon: (pmbus) Add Vin unit off handling
	clocksource: acpi_pm: fix return value of __setup handler
	io_uring: don't check unrelated req->open.how in accept request
	io_uring: terminate manual loop iterator loop correctly for non-vecs
	watch_queue: Fix NULL dereference in error cleanup
	watch_queue: Actually free the watch
	f2fs: fix to enable ATGC correctly via gc_idle sysfs interface
	sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
	sched/core: Export pelt_thermal_tp
	sched/uclamp: Fix iowait boost escaping uclamp restriction
	rseq: Remove broken uapi field layout on 32-bit little endian
	perf/core: Fix address filter parser for multiple filters
	perf/x86/intel/pt: Fix address filter config for 32-bit kernel
	sched/fair: Improve consistency of allowed NUMA balance calculations
	f2fs: fix missing free nid in f2fs_handle_failed_inode
	nfsd: more robust allocation failure handling in nfsd_file_cache_init
	sched/cpuacct: Fix charge percpu cpuusage
	sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
	f2fs: fix to avoid potential deadlock
	btrfs: fix unexpected error path when reflinking an inline extent
	f2fs: fix compressed file start atomic write may cause data corruption
	selftests, x86: fix how check_cc.sh is being invoked
	drivers/base/memory: add memory block to memory group after registration succeeded
	kunit: make kunit_test_timeout compatible with comment
	pinctrl: samsung: Remove EINT handler for Exynos850 ALIVE and CMGP gpios
	media: staging: media: zoran: fix usage of vb2_dma_contig_set_max_seg_size
	media: camss: csid-170: fix non-10bit formats
	media: camss: csid-170: don't enable unused irqs
	media: camss: csid-170: set the right HALT_CMD when disabled
	media: camss: vfe-170: fix "VFE halt timeout" error
	media: staging: media: imx: imx7-mipi-csis: Make subdev name unique
	media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across ioctls
	media: mtk-vcodec: potential dereference of null pointer
	media: imx: imx8mq-mipi-csi2: remove wrong irq config write operation
	media: imx: imx8mq-mipi_csi2: fix system resume
	media: bttv: fix WARNING regression on tunerless devices
	media: atmel: atmel-sama7g5-isc: fix ispck leftover
	ASoC: sh: rz-ssi: Drop calling rz_ssi_pio_recv() recursively
	ASoC: codecs: Check for error pointer after calling devm_regmap_init_mmio
	ASoC: xilinx: xlnx_formatter_pcm: Handle sysclk setting
	ASoC: simple-card-utils: Set sysclk on all components
	media: coda: Fix missing put_device() call in coda_get_vdoa_data
	media: meson: vdec: potential dereference of null pointer
	media: hantro: Fix overfill bottom register field name
	media: ov6650: Fix set format try processing path
	media: v4l: Avoid unaligned access warnings when printing 4cc modifiers
	media: ov5648: Don't pack controls struct
	media: aspeed: Correct value for h-total-pixels
	video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen
	video: fbdev: controlfb: Fix COMPILE_TEST build
	video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
	video: fbdev: atmel_lcdfb: fix an error code in atmel_lcdfb_probe()
	video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
	ARM: dts: Fix OpenBMC flash layout label addresses
	firmware: qcom: scm: Remove reassignment to desc following initializer
	ARM: dts: qcom: ipq4019: fix sleep clock
	soc: qcom: rpmpd: Check for null return of devm_kcalloc
	soc: qcom: ocmem: Fix missing put_device() call in of_get_ocmem
	soc: qcom: aoss: remove spurious IRQF_ONESHOT flags
	arm64: dts: qcom: sdm845: fix microphone bias properties and values
	arm64: dts: qcom: sm8250: fix PCIe bindings to follow schema
	arm64: dts: broadcom: bcm4908: use proper TWD binding
	arm64: dts: qcom: sm8150: Correct TCS configuration for apps rsc
	arm64: dts: qcom: sm8350: Correct TCS configuration for apps rsc
	firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not defined
	soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
	ARM: dts: sun8i: v3s: Move the csi1 block to follow address order
	vsprintf: Fix potential unaligned access
	ARM: dts: imx: Add missing LVDS decoder on M53Menlo
	media: mexon-ge2d: fixup frames size in registers
	media: video/hdmi: handle short reads of hdmi info frame.
	media: ti-vpe: cal: Fix a NULL pointer dereference in cal_ctx_v4l2_init_formats()
	media: em28xx: initialize refcount before kref_get
	media: usb: go7007: s2250-board: fix leak in probe()
	media: cedrus: H265: Fix neighbour info buffer size
	media: cedrus: h264: Fix neighbour info buffer size
	ASoC: codecs: rx-macro: fix accessing compander for aux
	ASoC: codecs: rx-macro: fix accessing array out of bounds for enum type
	ASoC: codecs: va-macro: fix accessing array out of bounds for enum type
	ASoC: codecs: wc938x: fix accessing array out of bounds for enum type
	ASoC: codecs: wcd938x: fix kcontrol max values
	ASoC: codecs: wcd934x: fix kcontrol max values
	ASoC: codecs: wcd934x: fix return value of wcd934x_rx_hph_mode_put
	media: v4l2-core: Initialize h264 scaling matrix
	media: ov5640: Fix set format, v4l2_mbus_pixelcode not updated
	selftests/lkdtm: Add UBSAN config
	lib: uninline simple_strntoull() as well
	vsprintf: Fix %pK with kptr_restrict == 0
	uaccess: fix nios2 and microblaze get_user_8()
	ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp()
	soc: mediatek: pm-domains: Add wakeup capacity support in power domain
	mmc: sdhci_am654: Fix the driver data of AM64 SoC
	ASoC: ti: davinci-i2s: Add check for clk_enable()
	ALSA: spi: Add check for clk_enable()
	arm64: dts: ns2: Fix spi-cpol and spi-cpha property
	arm64: dts: broadcom: Fix sata nodename
	printk: fix return value of printk.devkmsg __setup handler
	ASoC: mxs-saif: Handle errors for clk_enable
	ASoC: atmel_ssc_dai: Handle errors for clk_enable
	ASoC: dwc-i2s: Handle errors for clk_enable
	ASoC: soc-compress: prevent the potentially use of null pointer
	memory: emif: Add check for setup_interrupts
	memory: emif: check the pointer temp in get_device_details()
	ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
	arm64: dts: rockchip: Fix SDIO regulator supply properties on rk3399-firefly
	m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined
	media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
	media: vidtv: Check for null return of vzalloc
	ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
	ASoC: wm8350: Handle error for wm8350_register_irq
	ASoC: fsi: Add check for clk_enable
	video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
	media: saa7134: fix incorrect use to determine if list is empty
	ivtv: fix incorrect device_caps for ivtvfb
	ASoC: atmel: Fix error handling in snd_proto_probe
	ASoC: rockchip: i2s: Fix missing clk_disable_unprepare() in rockchip_i2s_probe
	ASoC: SOF: Add missing of_node_put() in imx8m_probe
	ASoC: mediatek: use of_device_get_match_data()
	ASoC: mediatek: mt8192-mt6359: Fix error handling in mt8192_mt6359_dev_probe
	ASoC: rk817: Fix missing clk_disable_unprepare() in rk817_platform_probe
	ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
	ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
	ASoC: fsl_spdif: Disable TX clock when stop
	ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
	ASoC: SOF: Intel: enable DMI L1 for playback streams
	ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
	mmc: davinci_mmc: Handle error for clk_enable
	ASoC: atmel: Fix error handling in sam9x5_wm8731_driver_probe
	ASoC: msm8916-wcd-analog: Fix error handling in pm8916_wcd_analog_spmi_probe
	ASoC: codecs: wcd934x: Add missing of_node_put() in wcd934x_codec_parse_data
	ASoC: amd: Fix reference to PCM buffer address
	ARM: configs: multi_v5_defconfig: re-enable CONFIG_V4L_PLATFORM_DRIVERS
	ARM: configs: multi_v5_defconfig: re-enable DRM_PANEL and FB_xxx
	drm/meson: osd_afbcd: Add an exit callback to struct meson_afbcd_ops
	drm/meson: Make use of the helper function devm_platform_ioremap_resourcexxx()
	drm/meson: split out encoder from meson_dw_hdmi
	drm/meson: Fix error handling when afbcd.ops->init fails
	drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev
	drm/bridge: Add missing pm_runtime_disable() in __dw_mipi_dsi_probe
	drm/bridge: nwl-dsi: Fix PM disable depth imbalance in nwl_dsi_probe
	drm: bridge: adv7511: Fix ADV7535 HPD enablement
	ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern
	drm/v3d/v3d_drv: Check for error num after setting mask
	drm/panfrost: Check for error num after setting mask
	libbpf: Fix possible NULL pointer dereference when destroying skeleton
	bpftool: Only set obj->skeleton on complete success
	udmabuf: validate ubuf->pagecount
	bpf: Fix UAF due to race between btf_try_get_module and load_module
	drm/selftests/test-drm_dp_mst_helper: Fix memory leak in sideband_msg_req_encode_decode
	selftests: bpf: Fix bind on used port
	Bluetooth: btintel: Fix WBS setting for Intel legacy ROM products
	Bluetooth: hci_serdev: call init_rwsem() before p->open()
	mtd: onenand: Check for error irq
	mtd: rawnand: gpmi: fix controller timings setting
	drm/edid: Don't clear formats if using deep color
	drm/edid: Split deep color modes between RGB and YUV444
	ionic: fix type complaint in ionic_dev_cmd_clean()
	ionic: start watchdog after all is setup
	ionic: Don't send reset commands if FW isn't running
	drm/nouveau/acr: Fix undefined behavior in nvkm_acr_hsfw_load_bl()
	drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes()
	drm/amd/pm: return -ENOTSUPP if there is no get_dpm_ultimate_freq function
	net: phy: at803x: move page selection fix to config_init
	selftests/bpf: Normalize XDP section names in selftests
	selftests/bpf/test_xdp_redirect_multi: use temp netns for testing
	ath9k_htc: fix uninit value bugs
	RDMA/core: Set MR type in ib_reg_user_mr
	KVM: PPC: Fix vmx/vsx mixup in mmio emulation
	selftests/net: timestamping: Fix bind_phc check
	i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	i40e: respect metadata on XSK Rx to skb
	igc: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directly
	ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	ixgbe: respect metadata on XSK Rx to skb
	power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
	ray_cs: Check ioremap return value
	powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
	KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
	powerpc/perf: Don't use perf_hw_context for trace IMC PMU
	mt76: connac: fix sta_rec_wtbl tag len
	mt76: mt7915: use proper aid value in mt7915_mcu_wtbl_generic_tlv in sta mode
	mt76: mt7915: use proper aid value in mt7915_mcu_sta_basic_tlv
	mt76: mt7921: fix a leftover race in runtime-pm
	mt76: mt7615: fix a leftover race in runtime-pm
	mt76: mt7603: check sta_rates pointer in mt7603_sta_rate_tbl_update
	mt76: mt7615: check sta_rates pointer in mt7615_sta_rate_tbl_update
	ptp: unregister virtual clocks when unregistering physical clock.
	net: dsa: mv88e6xxx: Enable port policy support on 6097
	mac80211: Remove a couple of obsolete TODO
	mac80211: limit bandwidth in HE capabilities
	scripts/dtc: Call pkg-config POSIXly correct
	livepatch: Fix build failure on 32 bits processors
	net: asix: add proper error handling of usb read errors
	i2c: bcm2835: Use platform_get_irq() to get the interrupt
	i2c: bcm2835: Fix the error handling in 'bcm2835_i2c_probe()'
	mtd: mchp23k256: Add SPI ID table
	mtd: mchp48l640: Add SPI ID table
	igc: avoid kernel warning when changing RX ring parameters
	igb: refactor XDP registration
	PCI: aardvark: Fix reading MSI interrupt number
	PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge
	RDMA/rxe: Check the last packet by RXE_END_MASK
	libbpf: Fix signedness bug in btf_dump_array_data()
	cxl/core: Fix cxl_probe_component_regs() error message
	cxl/regs: Fix size of CXL Capability Header Register
	net:enetc: allocate CBD ring data memory using DMA coherent methods
	libbpf: Fix compilation warning due to mismatched printf format
	drm/bridge: dw-hdmi: use safe format when first in bridge chain
	libbpf: Use dynamically allocated buffer when receiving netlink messages
	power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
	HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
	iommu/ipmmu-vmsa: Check for error num after setting mask
	drm/bridge: anx7625: Fix overflow issue on reading EDID
	bpftool: Fix the error when lookup in no-btf maps
	drm/amd/pm: enable pm sysfs write for one VF mode
	drm/amd/display: Add affected crtcs to atomic state for dsc mst unplug
	libbpf: Fix memleak in libbpf_netlink_recv()
	IB/cma: Allow XRC INI QPs to set their local ACK timeout
	dax: make sure inodes are flushed before destroy cache
	selftests: mptcp: add csum mib check for mptcp_connect
	iwlwifi: mvm: Don't call iwl_mvm_sta_from_mac80211() with NULL sta
	iwlwifi: mvm: don't iterate unadded vifs when handling FW SMPS req
	iwlwifi: mvm: align locking in D3 test debugfs
	iwlwifi: yoyo: remove DBGI_SRAM address reset writing
	iwlwifi: Fix -EIO error code that is never returned
	iwlwifi: mvm: Fix an error code in iwl_mvm_up()
	mtd: rawnand: pl353: Set the nand chip node as the flash node
	drm/msm/dp: populate connector of struct dp_panel
	drm/msm/dp: stop link training after link training 2 failed
	drm/msm/dp: always add fail-safe mode into connector mode list
	drm/msm/dsi: Use "ref" fw clock instead of global name for VCO parent
	drm/msm/dsi/phy: fix 7nm v4.0 settings for C-PHY mode
	drm/msm/dpu: add DSPP blocks teardown
	drm/msm/dpu: fix dp audio condition
	dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
	vfio/pci: fix memory leak during D3hot to D0 transition
	vfio/pci: wake-up devices around reset functions
	scsi: fnic: Fix a tracing statement
	scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
	scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
	scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
	scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config()
	scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req()
	scsi: pm8001: Fix NCQ NON DATA command task initialization
	scsi: pm8001: Fix NCQ NON DATA command completion handling
	scsi: pm8001: Fix abort all task initialization
	RDMA/mlx5: Fix the flow of a miss in the allocation of a cache ODP MR
	drm/amd/display: Remove vupdate_int_entry definition
	TOMOYO: fix __setup handlers return values
	power: supply: sbs-charger: Don't cancel work that is not initialized
	ext2: correct max file size computing
	drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
	power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
	scsi: hisi_sas: Change permission of parameter prot_mask
	drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt
	bpf, arm64: Call build_prologue() first in first JIT pass
	bpf, arm64: Feed byte-offset into bpf line info
	xsk: Fix race at socket teardown
	RDMA/irdma: Fix netdev notifications for vlan's
	RDMA/irdma: Fix Passthrough mode in VM
	RDMA/irdma: Remove incorrect masking of PD
	gpu: host1x: Fix a memory leak in 'host1x_remove()'
	libbpf: Skip forward declaration when counting duplicated type names
	powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
	powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
	KVM: x86: Fix emulation in writing cr8
	KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
	hv_balloon: rate-limit "Unhandled message" warning
	i2c: xiic: Make bus names unique
	power: supply: wm8350-power: Handle error for wm8350_register_irq
	power: supply: wm8350-power: Add missing free in free_charger_irq
	IB/hfi1: Allow larger MTU without AIP
	RDMA/core: Fix ib_qp_usecnt_dec() called when error
	PCI: Reduce warnings on possible RW1C corruption
	net: axienet: fix RX ring refill allocation failure handling
	drm/msm/a6xx: Fix missing ARRAY_SIZE() check
	mips: DEC: honor CONFIG_MIPS_FP_SUPPORT=n
	MIPS: Sanitise Cavium switch cases in TLB handler synthesizers
	powerpc/sysdev: fix incorrect use to determine if list is empty
	powerpc/64s: Don't use DSISR for SLB faults
	mfd: mc13xxx: Add check for mc13xxx_irq_request
	libbpf: Unmap rings when umem deleted
	selftests/bpf: Make test_lwt_ip_encap more stable and faster
	platform/x86: huawei-wmi: check the return value of device_create_file()
	scsi: mpt3sas: Fix incorrect 4GB boundary check
	powerpc: 8xx: fix a return value error in mpc8xx_pic_init
	vxcan: enable local echo for sent CAN frames
	ath10k: Fix error handling in ath10k_setup_msa_resources
	mips: cdmm: Fix refcount leak in mips_cdmm_phys_base
	MIPS: RB532: fix return value of __setup handler
	MIPS: pgalloc: fix memory leak caused by pgd_free()
	mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
	power: ab8500_chargalg: Use CLOCK_MONOTONIC
	RDMA/irdma: Prevent some integer underflows
	Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
	RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
	bpf, sockmap: Fix memleak in sk_psock_queue_msg
	bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
	bpf, sockmap: Fix more uncharged while msg has more_data
	bpf, sockmap: Fix double uncharge the mem of sk_msg
	samples/bpf, xdpsock: Fix race when running for fix duration of time
	USB: storage: ums-realtek: fix error code in rts51x_read_mem()
	drm/i915/display: Fix HPD short pulse handling for eDP
	netfilter: flowtable: Fix QinQ and pppoe support for inet table
	mt76: mt7921: fix mt7921_queues_acq implementation
	can: isotp: sanitize CAN ID checks in isotp_bind()
	can: isotp: return -EADDRNOTAVAIL when reading from unbound socket
	can: isotp: support MSG_TRUNC flag when reading from socket
	bareudp: use ipv6_mod_enabled to check if IPv6 enabled
	ibmvnic: fix race between xmit and reset
	af_unix: Fix some data-races around unix_sk(sk)->oob_skb.
	selftests/bpf: Fix error reporting from sock_fields programs
	Bluetooth: hci_uart: add missing NULL check in h5_enqueue
	Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
	Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
	ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
	af_netlink: Fix shift out of bounds in group mask calculation
	i2c: meson: Fix wrong speed use from probe
	netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()
	i2c: mux: demux-pinctrl: do not deactivate a master that is not active
	powerpc/pseries: Fix use after free in remove_phb_dynamic()
	selftests/bpf/test_lirc_mode2.sh: Exit with proper code
	PCI: Avoid broken MSI on SB600 USB devices
	net: bcmgenet: Use stronger register read/writes to assure ordering
	tcp: ensure PMTU updates are processed during fastopen
	openvswitch: always update flow key after nat
	net: dsa: fix panic on shutdown if multi-chip tree failed to probe
	tipc: fix the timer expires after interval 100ms
	mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
	ice: fix 'scheduling while atomic' on aux critical err interrupt
	ice: don't allow to run ice_send_event_to_aux() in atomic ctx
	drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
	kernel/resource: fix kfree() of bootmem memory again
	staging: r8188eu: convert DBG_88E_LEVEL call in hal/rtl8188e_hal_init.c
	staging: r8188eu: release_firmware is not called if allocation fails
	mxser: fix xmit_buf leak in activate when LSR == 0xff
	fsi: scom: Fix error handling
	fsi: scom: Remove retries in indirect scoms
	pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
	pps: clients: gpio: Propagate return value from pps_gpio_probe
	fsi: Aspeed: Fix a potential double free
	misc: alcor_pci: Fix an error handling path
	cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse
	soundwire: intel: fix wrong register name in intel_shim_wake
	clk: qcom: ipq8074: fix PCI-E clock oops
	dmaengine: idxd: check GENCAP config support for gencfg register
	dmaengine: idxd: change bandwidth token to read buffers
	dmaengine: idxd: restore traffic class defaults after wq reset
	iio: mma8452: Fix probe failing when an i2c_device_id is used
	serial: 8250_aspeed_vuart: add PORT_ASPEED_VUART port type
	staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
	pinctrl: renesas: r8a77470: Reduce size for narrow VIN1 channel
	pinctrl: renesas: checker: Fix miscalculation of number of states
	clk: qcom: ipq8074: Use floor ops for SDCC1 clock
	phy: dphy: Correct lpx parameter and its derivatives(ta_{get,go,sure})
	phy: phy-brcm-usb: fixup BCM4908 support
	serial: 8250_mid: Balance reference count for PCI DMA device
	serial: 8250_lpss: Balance reference count for PCI DMA device
	NFS: Use of mapping_set_error() results in spurious errors
	serial: 8250: Fix race condition in RTS-after-send handling
	iio: adc: Add check for devm_request_threaded_irq
	habanalabs: Add check for pci_enable_device
	NFS: Return valid errors from nfs2/3_decode_dirent()
	staging: r8188eu: fix endless loop in recv_func
	dma-debug: fix return value of __setup handlers
	clk: imx7d: Remove audio_mclk_root_clk
	clk: imx: off by one in imx_lpcg_parse_clks_from_dt()
	clk: at91: sama7g5: fix parents of PDMCs' GCLK
	clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
	clk: qcom: clk-rcg2: Update the frac table for pixel clock
	dmaengine: hisi_dma: fix MSI allocate fail when reload hisi_dma
	remoteproc: qcom: Fix missing of_node_put in adsp_alloc_memory_region
	remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
	remoteproc: qcom_q6v5_mss: Fix some leaks in q6v5_alloc_memory_region
	nvdimm/region: Fix default alignment for small regions
	clk: actions: Terminate clk_div_table with sentinel element
	clk: loongson1: Terminate clk_div_table with sentinel element
	clk: hisilicon: Terminate clk_div_table with sentinel element
	clk: clps711x: Terminate clk_div_table with sentinel element
	clk: Fix clk_hw_get_clk() when dev is NULL
	clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
	mailbox: imx: fix crash in resume on i.mx8ulp
	NFS: remove unneeded check in decode_devicenotify_args()
	staging: mt7621-dts: fix LEDs and pinctrl on GB-PC1 devicetree
	staging: mt7621-dts: fix formatting
	staging: mt7621-dts: fix pinctrl properties for ethernet
	staging: mt7621-dts: fix GB-PC2 devicetree
	pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
	pinctrl: mediatek: paris: Fix PIN_CONFIG_BIAS_* readback
	pinctrl: mediatek: paris: Fix "argument" argument type for mtk_pinconf_get()
	pinctrl: mediatek: paris: Fix pingroup pin config state readback
	pinctrl: mediatek: paris: Skip custom extra pin config dump for virtual GPIOs
	pinctrl: microchip sgpio: use reset driver
	pinctrl: microchip-sgpio: lock RMW access
	pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
	pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
	tty: hvc: fix return value of __setup handler
	kgdboc: fix return value of __setup handler
	serial: 8250: fix XOFF/XON sending when DMA is used
	virt: acrn: obtain pa from VMA with PFNMAP flag
	virt: acrn: fix a memory leak in acrn_dev_ioctl()
	kgdbts: fix return value of __setup handler
	firmware: google: Properly state IOMEM dependency
	driver core: dd: fix return value of __setup handler
	jfs: fix divide error in dbNextAG
	netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
	SUNRPC don't resend a task on an offlined transport
	NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
	kdb: Fix the putarea helper function
	perf stat: Fix forked applications enablement of counters
	clk: qcom: gcc-msm8994: Fix gpll4 width
	vsock/virtio: initialize vdev->priv before using VQs
	vsock/virtio: read the negotiated features before using VQs
	vsock/virtio: enable VQs early on probe
	clk: Initialize orphan req_rate
	xen: fix is_xen_pmu()
	net: enetc: report software timestamping via SO_TIMESTAMPING
	net: hns3: fix bug when PF set the duplicate MAC address for VFs
	net: hns3: fix port base vlan add fail when concurrent with reset
	net: hns3: add vlan list lock to protect vlan list
	net: hns3: format the output of the MAC address
	net: hns3: refine the process when PF set VF VLAN
	net: phy: broadcom: Fix brcm_fet_config_init()
	selftests: test_vxlan_under_vrf: Fix broken test case
	NFS: Don't loop forever in nfs_do_recoalesce()
	net: hns3: clean residual vf config after disable sriov
	net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
	qlcnic: dcb: default to returning -EOPNOTSUPP
	net/x25: Fix null-ptr-deref caused by x25_disconnect
	net: sparx5: switchdev: fix possible NULL pointer dereference
	octeontx2-af: initialize action variable
	net: prefer nf_ct_put instead of nf_conntrack_put
	net/sched: act_ct: fix ref leak when switching zones
	NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
	net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
	fs: fd tables have to be multiples of BITS_PER_LONG
	lib/test: use after free in register_test_dev_kmod()
	fs: fix fd table size alignment properly
	LSM: general protection fault in legacy_parse_param
	regulator: rpi-panel: Handle I2C errors/timing to the Atmel
	crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos
	gcc-plugins/stackleak: Exactly match strings instead of prefixes
	pinctrl: npcm: Fix broken references to chip->parent_device
	rcu: Mark writes to the rcu_segcblist structure's ->flags field
	block/bfq_wf2q: correct weight to ioprio
	crypto: xts - Add softdep on ecb
	crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3
	block, bfq: don't move oom_bfqq
	selinux: use correct type for context length
	arm64: module: remove (NOLOAD) from linker script
	selinux: allow FIOCLEX and FIONCLEX with policy capability
	loop: use sysfs_emit() in the sysfs xxx show()
	Fix incorrect type in assignment of ipv6 port for audit
	irqchip/qcom-pdc: Fix broken locking
	irqchip/nvic: Release nvic_base upon failure
	fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
	bfq: fix use-after-free in bfq_dispatch_request
	ACPICA: Avoid walking the ACPI Namespace if it is not there
	lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
	Revert "Revert "block, bfq: honor already-setup queue merges""
	ACPI/APEI: Limit printable size of BERT table data
	PM: core: keep irq flags in device_pm_check_callbacks()
	parisc: Fix handling off probe non-access faults
	nvme-tcp: lockdep: annotate in-kernel sockets
	spi: tegra20: Use of_device_get_match_data()
	atomics: Fix atomic64_{read_acquire,set_release} fallbacks
	locking/lockdep: Iterate lock_classes directly when reading lockdep files
	ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb
	ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit
	sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
	ext4: don't BUG if someone dirty pages without asking ext4 first
	f2fs: fix to do sanity check on curseg->alloc_type
	NFSD: Fix nfsd_breaker_owns_lease() return values
	f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
	btrfs: harden identification of a stale device
	btrfs: make search_csum_tree return 0 if we get -EFBIG
	f2fs: use spin_lock to avoid hang
	f2fs: compress: fix to print raw data size in error path of lz4 decompression
	Adjust cifssb maximum read size
	ntfs: add sanity check on allocation size
	media: staging: media: zoran: move videodev alloc
	media: staging: media: zoran: calculate the right buffer number for zoran_reap_stat_com
	media: staging: media: zoran: fix various V4L2 compliance errors
	media: atmel: atmel-isc-base: report frame sizes as full supported range
	media: ir_toy: free before error exiting
	ASoC: sh: rz-ssi: Make the data structures available before registering the handlers
	ASoC: SOF: Intel: match sdw version on link_slaves_found
	media: imx-jpeg: Prevent decoding NV12M jpegs into single-planar buffers
	media: iommu/mediatek-v1: Free the existed fwspec if the master dev already has
	media: iommu/mediatek: Return ENODEV if the device is NULL
	media: iommu/mediatek: Add device_link between the consumer and the larb devices
	video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
	video: fbdev: w100fb: Reset global state
	video: fbdev: cirrusfb: check pixclock to avoid divide by zero
	video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
	ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
	ARM: dts: bcm2837: Add the missing L1/L2 cache information
	ASoC: madera: Add dependencies on MFD
	media: atomisp_gmin_platform: Add DMI quirk to not turn AXP ELDO2 regulator off on some boards
	media: atomisp: fix dummy_ptr check to avoid duplicate active_bo
	ARM: ftrace: avoid redundant loads or clobbering IP
	ARM: dts: imx7: Use audio_mclk_post_div instead audio_mclk_root_clk
	arm64: defconfig: build imx-sdma as a module
	video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
	video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
	video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit
	ARM: dts: bcm2711: Add the missing L1/L2 cache information
	ASoC: soc-core: skip zero num_dai component in searching dai name
	media: imx-jpeg: fix a bug of accessing array out of bounds
	media: cx88-mpeg: clear interrupt status register before streaming video
	uaccess: fix type mismatch warnings from access_ok()
	lib/test_lockup: fix kernel pointer check for separate address spaces
	ARM: tegra: tamonten: Fix I2C3 pad setting
	ARM: mmp: Fix failure to remove sram device
	ASoC: amd: vg: fix for pm resume callback sequence
	video: fbdev: sm712fb: Fix crash in smtcfb_write()
	media: i2c: ov5648: Fix lockdep error
	media: Revert "media: em28xx: add missing em28xx_close_extension"
	media: hdpvr: initialize dev->worker at hdpvr_register_videodev
	ASoC: Intel: sof_sdw: fix quirks for 2022 HP Spectre x360 13"
	tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
	mmc: host: Return an error when ->enable_sdio_irq() ops is missing
	media: atomisp: fix bad usage at error handling logic
	ALSA: hda/realtek: Add alc256-samsung-headphone fixup
	KVM: x86: Reinitialize context if host userspace toggles EFER.LME
	KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
	KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
	KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU
	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi()
	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb()
	KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls
	KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
	powerpc/kasan: Fix early region not updated correctly
	powerpc/lib/sstep: Fix 'sthcx' instruction
	powerpc/lib/sstep: Fix build errors with newer binutils
	powerpc: Add set_memory_{p/np}() and remove set_memory_attr()
	powerpc: Fix build errors with newer binutils
	drm/dp: Fix off-by-one in register cache size
	drm/i915: Treat SAGV block time 0 as SAGV disabled
	drm/i915: Fix PSF GV point mask when SAGV is not possible
	drm/i915: Reject unsupported TMDS rates on ICL+
	scsi: qla2xxx: Refactor asynchronous command initialization
	scsi: qla2xxx: Implement ref count for SRB
	scsi: qla2xxx: Fix stuck session in gpdb
	scsi: qla2xxx: Fix warning message due to adisc being flushed
	scsi: qla2xxx: Fix scheduling while atomic
	scsi: qla2xxx: Fix premature hw access after PCI error
	scsi: qla2xxx: Fix wrong FDMI data for 64G adapter
	scsi: qla2xxx: Fix warning for missing error code
	scsi: qla2xxx: Fix device reconnect in loop topology
	scsi: qla2xxx: edif: Fix clang warning
	scsi: qla2xxx: Fix T10 PI tag escape and IP guard options for 28XX adapters
	scsi: qla2xxx: Add devids and conditionals for 28xx
	scsi: qla2xxx: Check for firmware dump already collected
	scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
	scsi: qla2xxx: Fix disk failure to rediscover
	scsi: qla2xxx: Fix incorrect reporting of task management failure
	scsi: qla2xxx: Fix hang due to session stuck
	scsi: qla2xxx: Fix missed DMA unmap for NVMe ls requests
	scsi: qla2xxx: Fix N2N inconsistent PLOGI
	scsi: qla2xxx: Fix stuck session of PRLI reject
	scsi: qla2xxx: Reduce false trigger to login
	scsi: qla2xxx: Use correct feature type field during RFF_ID processing
	platform: chrome: Split trace include file
	KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq
	KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast()
	KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
	KVM: Prevent module exit until all VMs are freed
	KVM: x86: fix sending PV IPI
	KVM: SVM: fix panic on out-of-bounds guest IRQ
	ubifs: rename_whiteout: Fix double free for whiteout_ui->data
	ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
	ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
	ubifs: Rename whiteout atomically
	ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
	ubifs: Rectify space amount budget for mkdir/tmpfile operations
	ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
	ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
	ubifs: Fix to add refcount once page is set private
	ubifs: rename_whiteout: correct old_dir size computing
	nvme: allow duplicate NSIDs for private namespaces
	nvme: fix the read-only state for zoned namespaces with unsupposed features
	wireguard: queueing: use CFI-safe ptr_ring cleanup function
	wireguard: socket: free skb in send6 when ipv6 is disabled
	wireguard: socket: ignore v6 endpoints when ipv6 is disabled
	XArray: Fix xas_create_range() when multi-order entry present
	can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
	can: mcba_usb: properly check endpoint type
	can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
	XArray: Update the LRU list in xas_split()
	modpost: restore the warning message for missing symbol versions
	rtc: check if __rtc_read_time was successful
	gfs2: gfs2_setattr_size error path fix
	gfs2: Make sure FITRIM minlen is rounded up to fs block size
	net: hns3: fix the concurrency between functions reading debugfs
	net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
	rxrpc: fix some null-ptr-deref bugs in server_key.c
	rxrpc: Fix call timer start racing with call destruction
	mailbox: imx: fix wakeup failure from freeze mode
	crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
	watch_queue: Free the page array when watch_queue is dismantled
	pinctrl: pinconf-generic: Print arguments for bias-pull-*
	watchdog: rti-wdt: Add missing pm_runtime_disable() in probe function
	net: sparx5: uses, depends on BRIDGE or !BRIDGE
	pinctrl: nuvoton: npcm7xx: Rename DS() macro to DSTR()
	pinctrl: nuvoton: npcm7xx: Use %zu printk format for ARRAY_SIZE()
	ASoC: mediatek: mt6358: add missing EXPORT_SYMBOLs
	ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
	ARM: iop32x: offset IRQ numbers by 1
	block: Fix the maximum minor value is blk_alloc_ext_minor()
	io_uring: fix memory leak of uid in files registration
	riscv module: remove (NOLOAD)
	ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
	vhost: handle error while adding split ranges to iotlb
	spi: Fix Tegra QSPI example
	platform/chrome: cros_ec_typec: Check for EC device
	can: isotp: restore accidentally removed MSG_PEEK feature
	proc: bootconfig: Add null pointer check
	drm/connector: Fix typo in documentation
	scsi: qla2xxx: Add qla2x00_async_done() for async routines
	staging: mt7621-dts: fix pinctrl-0 items to be size-1 items on ethernet
	arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
	ASoC: soc-compress: Change the check for codec_dai
	Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
	tracing: Have type enum modifications copy the strings
	net: add skb_set_end_offset() helper
	net: preserve skb_end_offset() in skb_unclone_keeptruesize()
	mm/mmap: return 1 from stack_guard_gap __setup() handler
	ARM: 9187/1: JIVE: fix return value of __setup handler
	mm/memcontrol: return 1 from cgroup.memory __setup() handler
	mm/usercopy: return 1 from hardened_usercopy __setup() handler
	af_unix: Support POLLPRI for OOB.
	bpf: Adjust BPF stack helper functions to accommodate skip > 0
	bpf: Fix comment for helper bpf_current_task_under_cgroup()
	mmc: rtsx: Use pm_runtime_{get,put}() to handle runtime PM
	dt-bindings: mtd: nand-controller: Fix the reg property description
	dt-bindings: mtd: nand-controller: Fix a comment in the examples
	dt-bindings: spi: mxic: The interrupt property is not mandatory
	dt-bindings: memory: mtk-smi: No need mediatek,larb-id for mt8167
	dt-bindings: pinctrl: pinctrl-microchip-sgpio: Fix example
	ubi: fastmap: Return error code if memory allocation fails in add_aeb()
	ASoC: SOF: Intel: Fix build error without SND_SOC_SOF_PCI_DEV
	ASoC: topology: Allow TLV control to be either read or write
	perf vendor events: Update metrics for SkyLake Server
	media: ov6650: Add try support to selection API operations
	media: ov6650: Fix crop rectangle affected by set format
	spi: mediatek: support tick_delay without enhance_timing
	ARM: dts: spear1340: Update serial node properties
	ARM: dts: spear13xx: Update SPI dma properties
	arm64: dts: ls1043a: Update i2c dma properties
	arm64: dts: ls1046a: Update i2c node dma properties
	um: Fix uml_mconsole stop/go
	docs: sysctl/kernel: add missing bit to panic_print
	openvswitch: Fixed nd target mask field in the flow dump.
	torture: Make torture.sh help message match reality
	n64cart: convert bi_disk to bi_bdev->bd_disk fix build
	mmc: rtsx: Let MMC core handle runtime PM
	mmc: rtsx: Fix build errors/warnings for unused variable
	KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
	iommu/dma: Skip extra sync during unmap w/swiotlb
	iommu/dma: Fold _swiotlb helpers into callers
	iommu/dma: Check CONFIG_SWIOTLB more broadly
	swiotlb: Support aligned swiotlb buffers
	iommu/dma: Account for min_align_mask w/swiotlb
	coredump: Snapshot the vmas in do_coredump
	coredump: Remove the WARN_ON in dump_vma_snapshot
	coredump/elf: Pass coredump_params into fill_note_info
	coredump: Use the vma snapshot in fill_files_note
	PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
	Linux 5.15.33

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id62bd8a22d0bfa7c2096539d253ffce804bed017
2022-04-20 08:18:54 +02:00
Charan Teja Kalla
d4835551fd Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
commit e6b0a7b357659c332231621e4315658d062c23ee upstream.

This reverts commit 08095d6310a7 ("mm: madvise: skip unmapped vma holes
passed to process_madvise") as process_madvise() fails to return the
exact processed bytes in other cases too.

As an example: if process_madvise() hits mlocked pages after processing
some initial bytes passed in [start, end), it just returns EINVAL
although some bytes are processed.  Thus making an exception only for
ENOMEM is partially fixing the problem of returning the proper advised
bytes.

Thus revert this patch and return proper bytes advised.

Link: https://lkml.kernel.org/r/e73da1304a88b6a8a11907045117cccf4c2b8374.1648046642.git.quic_charante@quicinc.com
Fixes: 08095d6310a7ce ("mm: madvise: skip unmapped vma holes passed to process_madvise")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:56 +02:00
Charan Teja Kalla
a07a4b75cc mm: madvise: return correct bytes advised with process_madvise
commit 5bd009c7c9a9e888077c07535dc0c70aeab242c3 upstream.

Patch series "mm: madvise: return correct bytes processed with
process_madvise", v2.  With the process_madvise(), always choose to return
non zero processed bytes over an error.  This can help the user to know on
which VMA, passed in the 'struct iovec' vector list, is failed to advise
thus can take the decission of retrying/skipping on that VMA.

This patch (of 2):

The process_madvise() system call returns error even after processing some
VMA's passed in the 'struct iovec' vector list which leaves the user
confused to know where to restart the advise next.  It is also against
this syscall man page[1] documentation where it mentions that "return
value may be less than the total number of requested bytes, if an error
occurred after some iovec elements were already processed.".

Consider a user passed 10 VMA's in the 'struct iovec' vector list of which
9 are processed but one.  Then it just returns the error caused on that
failed VMA despite the first 9 VMA's processed, leaving the user confused
about on which VMA it is failed.  Returning the number of bytes processed
here can help the user to know which VMA it is failed on and thus can
retry/skip the advise on that VMA.

[1]https://man7.org/linux/man-pages/man2/process_madvise.2.html.

Link: https://lkml.kernel.org/r/cover.1647008754.git.quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/125b61a0edcee5c2db8658aed9d06a43a19ccafc.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:56 +02:00
Charan Teja Kalla
27d96f11b0 mm: madvise: skip unmapped vma holes passed to process_madvise
commit 08095d6310a7ce43256b4251577bc66a25c6e1a6 upstream.

The process_madvise() system call is expected to skip holes in vma passed
through 'struct iovec' vector list.  But do_madvise, which
process_madvise() calls for each vma, returns ENOMEM in case of unmapped
holes, despite the VMA is processed.

Thus process_madvise() should treat ENOMEM as expected and consider the
VMA passed to as processed and continue processing other vma's in the
vector list.  Returning -ENOMEM to user, despite the VMA is processed,
will be unable to figure out where to start the next madvise.

Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:56 +02:00
Suren Baghdasaryan
e9eea2a170 UPSTREAM: mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed.  In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge.
In the cases when vma_merge merges the original vma and destroys it,
this might result in UAF.  For that the original vma would have to hold
the anon_vma_name with the last reference.  The following vma would need
to contain a different anon_vma_name object with the same string.  Such
scenario is shown below:

madvise_vma_behavior(vma)
  madvise_update_vma(vma, ..., anon_name == vma->anon_name)
    vma_merge(vma)
      __vma_adjust(vma) <-- merges vma with adjacent one
        vm_area_free(vma) <-- frees the original vma
    replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name

Fix this by raising the name refcount and stabilizing it.

Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com
Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com
Fixes: 9a10064f5625 ("mm: add a field to store names for private anonymous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 942341dcc5748d9c1fc7009a359fc1916bfe0ef0)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I07e3cbff2eaa69a0d56281537510f7a42feaaf09
2022-03-24 18:44:46 -07:00
Suren Baghdasaryan
6e2654ba49 UPSTREAM: mm: prevent vm_area_struct::anon_name refcount saturation
A deep process chain with many vmas could grow really high.  With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).

Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults.  Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647).  In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).

kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter.  This would lead to the
refcounted object never being freed.  A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.

To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED.  This should provide enough headroom for raising the
refcounts temporarily.

Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 96403e11283def1d1c465c8279514c9a504d8630)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ieaab58f6300d9aff3139eed1c1d3417237d81955
2022-03-24 18:44:43 -07:00
Suren Baghdasaryan
0fd37220d8 UPSTREAM: mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 5c26f6ac9416b63d093e29c30e79b3297e425472)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4a6b5602ce7151d1a4b88fac489f86d68089bd4d
2022-03-24 18:44:39 -07:00
Suren Baghdasaryan
7ff2a03673 Revert "FROMGIT: mm: fix use-after-free when anon vma name is used after vma is freed"
This reverts commit 91b7584411.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic9b7399bc4649e68cad9ac7017225262a2dea579
2022-03-24 18:44:32 -07:00
Suren Baghdasaryan
91b7584411 FROMGIT: mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed.  In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name->name and it is used after the call to
vma_merge.  In the cases when vma_merge merges the original vma and
destroys it, this will result in use-after-free bug as shown below:

madvise_vma_behavior << passes vma->anon_name->name as name param
  madvise_update_vma(name)
    vma_merge
      __vma_adjust
        vm_area_free <-- frees the vma
    replace_vma_anon_name(name) <-- UAF

Fix this by raising the name refcount and stabilizing it. Introduce
vma_anon_name_{get/put} API for this purpose.

Link: https://lkml.kernel.org/r/20220211013032.623763-1-surenb@google.com
Fixes: 9a10064f5625 ("mm: add a field to store names for private anonymous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

(cherry picked from commit 8cadf02b8d6dc771f05f5934c12e751481964039
 https://github.com/hnaz/linux-mm.git master)
Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I30c00a89109ee842d40e5f4b21a35c07ad695903
2022-02-16 16:52:46 +00:00
Arnd Bergmann
049413278d UPSTREAM: mm: move anon_vma declarations to linux/mm_inline.h
The patch to add anonymous vma names causes a build failure in some
configurations:

  include/linux/mm_types.h: In function 'is_same_vma_anon_name':
  include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration]
    924 |         return name && vma_name && !strcmp(name, vma_name);
        |                                     ^~~~~~
  include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'?

This should not really be part of linux/mm_types.h in the first place,
as that header is meant to only contain structure defintions and need a
minimum set of indirect includes itself.

While the header clearly includes more than it should at this point,
let's not make it worse by including string.h as well, which would pull
in the expensive (compile-speed wise) fortify-string logic.

Move the new functions into a separate header that only needs to be
included in a couple of locations.

Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org
Fixes: "mm: add a field to store names for private anonymous memory"
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 17fca131cee21724ee953a17c185c14e9533af5b)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I54719d7ea27d3cf53ef7245b2af88d2a2bc9bafe
2022-01-18 16:01:48 -08:00
Suren Baghdasaryan
3f4e41d409 UPSTREAM: mm: add anonymous vma name refcounting
While forking a process with high number (64K) of named anonymous vmas
the overhead caused by strdup() is noticeable.  Experiments with ARM64
Android device show up to 40% performance regression when forking a
process with 64k unpopulated anonymous vmas using the max name lengths
vs the same process with the same number of anonymous vmas having no
name.

Introduce anon_vma_name refcounted structure to avoid the overhead of
copying vma names during fork() and when splitting named anonymous vmas.

When a vma is duplicated, instead of copying the name we increment the
refcount of this structure.  Multiple vmas can point to the same
anon_vma_name as long as they increment the refcount.  The name member
of anon_vma_name structure is assigned at structure allocation time and
is never changed.  If vma name changes then the refcount of the original
structure is dropped, a new anon_vma_name structure is allocated to hold
the new name and the vma pointer is updated to point to the new
structure.

With this approach the fork() performance regressions is reduced 3-4x
times and with usecases using more reasonable number of VMAs (a few
thousand) the regressions is not measurable.

Link: https://lkml.kernel.org/r/20211019215511.3771969-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 78db3412833dc9c479cd17412035f216cfd01a29)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4b6d63b1aced3813ebb91479f4bcfd0d89e8fa29
2022-01-18 15:59:04 -08:00
Colin Cross
301c56064d UPSTREAM: mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use.  At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.).  Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc.  This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages.  It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes.  It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace.  Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace.  It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas.  The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].

Userspace can set the name for a region of memory by calling

   prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)

Setting the name to NULL clears it.  The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.

Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps.  Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.

The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string.  Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name.  The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.

CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature.  It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.

The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names.  In that design, name
pointers could be shared between vmas.  However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.

One big concern is about fork() performance which would need to strdup
anonymous vma names.  Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4].  I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.

This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name.  Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.

[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/

Changes for prctl(2) manual page (in the options section):

PR_SET_VMA
	Sets an attribute specified in arg2 for virtual memory areas
	starting from the address specified in arg3 and spanning the
	size specified	in arg4. arg5 specifies the value of the attribute
	to be set. Note that assigning an attribute to a virtual memory
	area might prevent it from being merged with adjacent virtual
	memory areas due to the difference in that attribute's value.

	Currently, arg2 must be one of:

	PR_SET_VMA_ANON_NAME
		Set a name for anonymous virtual memory areas. arg5 should
		be a pointer to a null-terminated string containing the
		name. The name length including null byte cannot exceed
		80 bytes. If arg5 is NULL, the name of the appropriate
		anonymous virtual memory areas will be reset. The name
		can contain only printable ascii characters (including
                space), except '[',']','\','$' and '`'.

                This feature is available only if the kernel is built with
                the CONFIG_ANON_VMA_NAME option enabled.

[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
  Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
 added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
 work here was done by Colin Cross, therefore, with his permission, keeping
 him as the author]

Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9a10064f5625d5572c3626c1516e0bebc6c9fe9b)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I53d56d551a7d62f75341304751814294b447c04e
2022-01-18 15:30:27 -08:00
Colin Cross
730a9d73ab UPSTREAM: mm: rearrange madvise code to allow for reuse
Patch series "mm: rearrange madvise code to allow for reuse", v11.

Avoid performance regression of the new anon vma name field refcounting it.

I checked the image sizes with allnoconfig builds:

  unpatched Linus' ToT
     text    data     bss     dec     hex filename
  1324759      32   73928 1398719 1557bf vmlinux

  After the first patch is applied (madvise refactoring)
     text    data     bss     dec     hex filename
  1322346      32   73928 1396306 154e52 vmlinux
  >>> 2413 bytes decrease vs ToT <<<

  After all patches applied with CONFIG_ANON_VMA_NAME=n
     text    data     bss     dec     hex filename
  1322337      32   73928 1396297 154e49 vmlinux
  >>> 2422 bytes decrease vs ToT <<<

  After all patches applied with CONFIG_ANON_VMA_NAME=y
     text    data     bss     dec     hex filename
  1325228      32   73928 1399188 155994 vmlinux
  >>> 469 bytes increase vs ToT <<<

This patch (of 3):

Refactor the madvise syscall to allow for parts of it to be reused by a
prctl syscall that affects vmas.

Move the code that walks vmas in a virtual address range into a function
that takes a function pointer as a parameter.  The only caller for now
is sys_madvise, which uses it to call madvise_vma_behavior on each vma,
but the next patch will add an additional caller.

Move handling all vma behaviors inside madvise_behavior, and rename it
to madvise_vma_behavior.

Move the code that updates the flags on a vma, including splitting or
merging the vma as necessary, into a new function called
madvise_update_vma.  The next patch will add support for updating a new
anon_name field as well.

Link: https://lkml.kernel.org/r/20211019215511.3771969-1-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Rob Landley <rob@landley.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit ac1e9acc5acf0b41d54de6a4c45471644f8b97ff)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If96c14ca3acc3795de373d658ba0a940dda68e1c
2022-01-18 14:57:50 -08:00
Suren Baghdasaryan
f355f9635d Revert "ANDROID: mm: add a field to store names for private anonymous memory"
This reverts commit 60500a4228.
Replacing out-of-tree implementation with the upstream one.

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic34c8e16d51ccf9f00cb59d2de341e911bcb2828
2022-01-18 14:54:47 -08:00
Greg Kroah-Hartman
c2b303f98f Merge 4e71add028 ("Merge branch 'stable/for-linus-5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft") into android-mainline
Steps on the way to 5.15-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib3f181326491eb896547d802a6f0a1b3be54ce28
2021-09-14 14:35:23 +02:00
Linus Torvalds
14726903c8 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
2021-09-03 10:08:28 -07:00
zhangkui
d5fffc5aff mm/madvise: add MADV_WILLNEED to process_madvise()
There is a usecase in Android that an app process's memory is swapped out
by process_madvise() with MADV_PAGEOUT, such as the memory is swapped to
zram or a backing device.  When the process is scheduled to running, like
switch to foreground, multiple page faults may cause the app dropped
frames.

To reduce the problem, System Management Software can read-ahead memory
of the process immediately when the app switches to forground.  Calling
process_madvise() with MADV_WILLNEED can meet this need.

Link: https://lkml.kernel.org/r/20210804082010.12482-1-zhangkui@oppo.com
Signed-off-by: zhangkui <zhangkui@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:18 -07:00
Greg Kroah-Hartman
7c647e3d19 Merge 4520dcbe0d ("Merge tag 'for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply") into android-mainline
Steps on the way to 5.15-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia88ca4292f7c791a2c9a0b256584969fa86a06c4
2021-09-02 09:42:36 +02:00
Linus Torvalds
aa99f3c2b9 Merge tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fs hole punching vs cache filling race fixes from Jan Kara:
 "Fix races leading to possible data corruption or stale data exposure
  in multiple filesystems when hole punching races with operations such
  as readahead.

  This is the series I was sending for the last merge window but with
  your objection fixed - now filemap_fault() has been modified to take
  invalidate_lock only when we need to create new page in the page cache
  and / or bring it uptodate"

* tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  filesystems/locking: fix Malformed table warning
  cifs: Fix race between hole punch and page fault
  ceph: Fix race between hole punch and page fault
  fuse: Convert to using invalidate_lock
  f2fs: Convert to using invalidate_lock
  zonefs: Convert to using invalidate_lock
  xfs: Convert double locking of MMAPLOCK to use VFS helpers
  xfs: Convert to use invalidate_lock
  xfs: Refactor xfs_isilocked()
  ext2: Convert to using invalidate_lock
  ext4: Convert to use mapping->invalidate_lock
  mm: Add functions to lock invalidate_lock for two mappings
  mm: Protect operations adding pages to page cache with invalidate_lock
  documentation: Sync file_operations members with reality
  mm: Fix comments mentioning i_mutex
2021-08-30 10:24:50 -07:00
Lee Jones
ab4d4efea8 Merge tag 'v5.14-rc6' into android-mainline
Linux 5.14-rc6

Change-Id: I2a11e448621a8133f80acbc241a1851d33bc38fb
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-08-16 09:30:23 +01:00
David Hildenbrand
eb2faa513c mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
Doing some extended tests and polishing the man page update for
MADV_POPULATE_(READ|WRITE), I realized that we end up converting also
SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another
madvise() user error.

We want to report only problematic mappings and permission problems that
the user could have know as -EINVAL.

Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL,
but instead indicate -EFAULT to user space.  While we could also convert
it to -ENOMEM, using -EFAULT looks more helpful when user space might
want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is
not part of an final Linux release and we can still adjust the behavior.

Link: https://lkml.kernel.org/r/20210726154932.102880-1-david@redhat.com
Fixes: 4ca9b3859d ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13 14:09:31 -10:00
Jan Kara
9608703e48 mm: Fix comments mentioning i_mutex
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix
comments still mentioning i_mutex.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-07-12 18:31:16 +02:00
Lee Jones
293f275f4d Merge commit df8ba5f160 ("Merge tag 'kgdb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux") into android-mainline
A large step en route to v5.14-rc1

Change-Id: I52bb71dc737044a593d1a9dfd7fe02b31e273ff9
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-07-12 11:00:18 +01:00
David Hildenbrand
4ca9b3859d mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables
I. Background: Sparse Memory Mappings

When we manage sparse memory mappings dynamically in user space - also
sometimes involving MAP_NORESERVE - we want to dynamically populate/
discard memory inside such a sparse memory region.  Example users are
hypervisors (especially implementing memory ballooning or similar
technologies like virtio-mem) and memory allocators.  In addition, we want
to fail in a nice way (instead of generating SIGBUS) if populating does
not succeed because we are out of backend memory (which can happen easily
with file-based mappings, especially tmpfs and hugetlbfs).

While MADV_DONTNEED, MADV_REMOVE and FALLOC_FL_PUNCH_HOLE allow for
reliably discarding memory for most mapping types, there is no generic
approach to populate page tables and preallocate memory.

Although mmap() supports MAP_POPULATE, it is not applicable to the concept
of sparse memory mappings, where we want to populate/discard dynamically
and avoid expensive/problematic remappings.  In addition, we never
actually report errors during the final populate phase - it is best-effort
only.

fallocate() can be used to preallocate file-based memory and fail in a
safe way.  However, it cannot really be used for any private mappings on
anonymous files via memfd due to COW semantics.  In addition, fallocate()
does not actually populate page tables, so we still always get pagefaults
on first access - which is sometimes undesired (i.e., real-time workloads)
and requires real prefaulting of page tables, not just a preallocation of
backend storage.  There might be interesting use cases for sparse memory
regions along with mlockall(MCL_ONFAULT) which fallocate() cannot satisfy
as it does not prefault page tables.

II. On preallcoation/prefaulting from user space

Because we don't have a proper interface, what applications (like QEMU and
databases) end up doing is touching (i.e., reading+writing one byte to not
overwrite existing data) all individual pages.

However, that approach
1) Can result in wear on storage backing, because we end up reading/writing
   each page; this is especially a problem for dax/pmem.
2) Can result in mmap_sem contention when prefaulting via multiple
   threads.
3) Requires expensive signal handling, especially to catch SIGBUS in case
   of hugetlbfs/shmem/file-backed memory. For example, this is
   problematic in hypervisors like QEMU where SIGBUS handlers might already
   be used by other subsystems concurrently to e.g, handle hardware errors.
   "Simply" doing preallocation concurrently from other thread is not that
   easy.

III. On MADV_WILLNEED

Extending MADV_WILLNEED is not an option because
1. It would change the semantics: "Expect access in the near future." and
   "might be a good idea to read some pages" vs. "Definitely populate/
   preallocate all memory and definitely fail on errors.".
2. Existing users (like virtio-balloon in QEMU when deflating the balloon)
   don't want populate/prealloc semantics. They treat this rather as a hint
   to give a little performance boost without too much overhead - and don't
   expect that a lot of memory might get consumed or a lot of time
   might be spent.

IV. MADV_POPULATE_READ and MADV_POPULATE_WRITE

Let's introduce MADV_POPULATE_READ and MADV_POPULATE_WRITE, inspired by
MAP_POPULATE, with the following semantics:
1. MADV_POPULATE_READ can be used to prefault page tables just like
   manually reading each individual page. This will not break any COW
   mappings. The shared zero page might get mapped and no backend storage
   might get preallocated -- allocation might be deferred to
   write-fault time. Especially shared file mappings require an explicit
   fallocate() upfront to actually preallocate backend memory (blocks in
   the file system) in case the file might have holes.
2. If MADV_POPULATE_READ succeeds, all page tables have been populated
   (prefaulted) readable once.
3. MADV_POPULATE_WRITE can be used to preallocate backend memory and
   prefault page tables just like manually writing (or
   reading+writing) each individual page. This will break any COW
   mappings -- e.g., the shared zeropage is never populated.
4. If MADV_POPULATE_WRITE succeeds, all page tables have been populated
   (prefaulted) writable once.
5. MADV_POPULATE_READ and MADV_POPULATE_WRITE cannot be applied to special
   mappings marked with VM_PFNMAP and VM_IO. Also, proper access
   permissions (e.g., PROT_READ, PROT_WRITE) are required. If any such
   mapping is encountered, madvise() fails with -EINVAL.
6. If MADV_POPULATE_READ or MADV_POPULATE_WRITE fails, some page tables
   might have been populated.
7. MADV_POPULATE_READ and MADV_POPULATE_WRITE will return -EHWPOISON
   when encountering a HW poisoned page in the range.
8. Similar to MAP_POPULATE, MADV_POPULATE_READ and MADV_POPULATE_WRITE
   cannot protect from the OOM (Out Of Memory) handler killing the
   process.

While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e.,
preallocate memory and prefault page tables for VMs), one issue is that
whenever we prefault pages writable, the pages have to be marked dirty,
because the CPU could dirty them any time.  while not a real problem for
hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each
page will be marked dirty and has to be written back later when evicting.

MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole
mapping from backend storage without marking it dirty, such that eviction
won't have to write it back.  As discussed above, shared file mappings
might require an explciit fallocate() upfront to achieve
preallcoation+prepopulation.

Although sparse memory mappings are the primary use case, this will also
be useful for other preallocate/prefault use cases where MAP_POPULATE is
not desired or the semantics of MAP_POPULATE are not sufficient: as one
example, QEMU users can trigger preallocation/prefaulting of guest RAM
after the mapping was created -- and don't want errors to be silently
suppressed.

Looking at the history, MADV_POPULATE was already proposed in 2013 [1],
however, the main motivation back than was performance improvements --
which should also still be the case.

V. Single-threaded performance comparison

I did a short experiment, prefaulting page tables on completely *empty
mappings/files* and repeated the experiment 10 times.  The results
correspond to the shortest execution time.  In general, the performance
benefit for huge pages is negligible with small mappings.

V.1: Private mappings

POPULATE_READ and POPULATE_WRITE is fastest.  Note that
Reading/POPULATE_READ will populate the shared zeropage where applicable
-- which result in short population times.

The fastest way to allocate backend storage (here: swap or huge pages) and
prefault page tables is POPULATE_WRITE.

V.2: Shared mappings

fallocate() is fastest, however, doesn't prefault page tables.
POPULATE_WRITE is faster than simple writes and read/writes.
POPULATE_READ is faster than simple reads.

Without a fd, the fastest way to allocate backend storage and prefault
page tables is POPULATE_WRITE.  With an fd, the fastest way is usually
FALLOCATE+POPULATE_READ or FALLOCATE+POPULATE_WRITE respectively; one
exception are actual files: FALLOCATE+Read is slightly faster than
FALLOCATE+POPULATE_READ.

The fastest way to allocate backend storage prefault page tables is
FALLOCATE+POPULATE_WRITE -- except when dealing with actual files; then,
FALLOCATE+POPULATE_READ is fastest and won't directly mark all pages as
dirty.

v.3: Detailed results

==================================================
2 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB     : Read                     :     0.119 ms
Anon 4 KiB     : Write                    :     0.222 ms
Anon 4 KiB     : Read/Write               :     0.380 ms
Anon 4 KiB     : POPULATE_READ            :     0.060 ms
Anon 4 KiB     : POPULATE_WRITE           :     0.158 ms
Memfd 4 KiB    : Read                     :     0.034 ms
Memfd 4 KiB    : Write                    :     0.310 ms
Memfd 4 KiB    : Read/Write               :     0.362 ms
Memfd 4 KiB    : POPULATE_READ            :     0.039 ms
Memfd 4 KiB    : POPULATE_WRITE           :     0.229 ms
Memfd 2 MiB    : Read                     :     0.030 ms
Memfd 2 MiB    : Write                    :     0.030 ms
Memfd 2 MiB    : Read/Write               :     0.030 ms
Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
tmpfs          : Read                     :     0.033 ms
tmpfs          : Write                    :     0.313 ms
tmpfs          : Read/Write               :     0.406 ms
tmpfs          : POPULATE_READ            :     0.039 ms
tmpfs          : POPULATE_WRITE           :     0.285 ms
file           : Read                     :     0.033 ms
file           : Write                    :     0.351 ms
file           : Read/Write               :     0.408 ms
file           : POPULATE_READ            :     0.039 ms
file           : POPULATE_WRITE           :     0.290 ms
hugetlbfs      : Read                     :     0.030 ms
hugetlbfs      : Write                    :     0.030 ms
hugetlbfs      : Read/Write               :     0.030 ms
hugetlbfs      : POPULATE_READ            :     0.030 ms
hugetlbfs      : POPULATE_WRITE           :     0.030 ms
**************************************************
4096 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB     : Read                     :   237.940 ms
Anon 4 KiB     : Write                    :   708.409 ms
Anon 4 KiB     : Read/Write               :  1054.041 ms
Anon 4 KiB     : POPULATE_READ            :   124.310 ms
Anon 4 KiB     : POPULATE_WRITE           :   572.582 ms
Memfd 4 KiB    : Read                     :   136.928 ms
Memfd 4 KiB    : Write                    :   963.898 ms
Memfd 4 KiB    : Read/Write               :  1106.561 ms
Memfd 4 KiB    : POPULATE_READ            :    78.450 ms
Memfd 4 KiB    : POPULATE_WRITE           :   805.881 ms
Memfd 2 MiB    : Read                     :   357.116 ms
Memfd 2 MiB    : Write                    :   357.210 ms
Memfd 2 MiB    : Read/Write               :   357.606 ms
Memfd 2 MiB    : POPULATE_READ            :   356.094 ms
Memfd 2 MiB    : POPULATE_WRITE           :   356.937 ms
tmpfs          : Read                     :   137.536 ms
tmpfs          : Write                    :   954.362 ms
tmpfs          : Read/Write               :  1105.954 ms
tmpfs          : POPULATE_READ            :    80.289 ms
tmpfs          : POPULATE_WRITE           :   822.826 ms
file           : Read                     :   137.874 ms
file           : Write                    :   987.025 ms
file           : Read/Write               :  1107.439 ms
file           : POPULATE_READ            :    80.413 ms
file           : POPULATE_WRITE           :   857.622 ms
hugetlbfs      : Read                     :   355.607 ms
hugetlbfs      : Write                    :   355.729 ms
hugetlbfs      : Read/Write               :   356.127 ms
hugetlbfs      : POPULATE_READ            :   354.585 ms
hugetlbfs      : POPULATE_WRITE           :   355.138 ms
**************************************************
2 MiB MAP_SHARED:
**************************************************
Anon 4 KiB     : Read                     :     0.394 ms
Anon 4 KiB     : Write                    :     0.348 ms
Anon 4 KiB     : Read/Write               :     0.400 ms
Anon 4 KiB     : POPULATE_READ            :     0.326 ms
Anon 4 KiB     : POPULATE_WRITE           :     0.273 ms
Anon 2 MiB     : Read                     :     0.030 ms
Anon 2 MiB     : Write                    :     0.030 ms
Anon 2 MiB     : Read/Write               :     0.030 ms
Anon 2 MiB     : POPULATE_READ            :     0.030 ms
Anon 2 MiB     : POPULATE_WRITE           :     0.030 ms
Memfd 4 KiB    : Read                     :     0.412 ms
Memfd 4 KiB    : Write                    :     0.372 ms
Memfd 4 KiB    : Read/Write               :     0.419 ms
Memfd 4 KiB    : POPULATE_READ            :     0.343 ms
Memfd 4 KiB    : POPULATE_WRITE           :     0.288 ms
Memfd 4 KiB    : FALLOCATE                :     0.137 ms
Memfd 4 KiB    : FALLOCATE+Read           :     0.446 ms
Memfd 4 KiB    : FALLOCATE+Write          :     0.330 ms
Memfd 4 KiB    : FALLOCATE+Read/Write     :     0.454 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :     0.379 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :     0.268 ms
Memfd 2 MiB    : Read                     :     0.030 ms
Memfd 2 MiB    : Write                    :     0.030 ms
Memfd 2 MiB    : Read/Write               :     0.030 ms
Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
Memfd 2 MiB    : FALLOCATE                :     0.030 ms
Memfd 2 MiB    : FALLOCATE+Read           :     0.031 ms
Memfd 2 MiB    : FALLOCATE+Write          :     0.031 ms
Memfd 2 MiB    : FALLOCATE+Read/Write     :     0.031 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :     0.030 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :     0.030 ms
tmpfs          : Read                     :     0.416 ms
tmpfs          : Write                    :     0.369 ms
tmpfs          : Read/Write               :     0.425 ms
tmpfs          : POPULATE_READ            :     0.346 ms
tmpfs          : POPULATE_WRITE           :     0.295 ms
tmpfs          : FALLOCATE                :     0.139 ms
tmpfs          : FALLOCATE+Read           :     0.447 ms
tmpfs          : FALLOCATE+Write          :     0.333 ms
tmpfs          : FALLOCATE+Read/Write     :     0.454 ms
tmpfs          : FALLOCATE+POPULATE_READ  :     0.380 ms
tmpfs          : FALLOCATE+POPULATE_WRITE :     0.272 ms
file           : Read                     :     0.191 ms
file           : Write                    :     0.511 ms
file           : Read/Write               :     0.524 ms
file           : POPULATE_READ            :     0.196 ms
file           : POPULATE_WRITE           :     0.434 ms
file           : FALLOCATE                :     0.004 ms
file           : FALLOCATE+Read           :     0.197 ms
file           : FALLOCATE+Write          :     0.554 ms
file           : FALLOCATE+Read/Write     :     0.480 ms
file           : FALLOCATE+POPULATE_READ  :     0.201 ms
file           : FALLOCATE+POPULATE_WRITE :     0.381 ms
hugetlbfs      : Read                     :     0.030 ms
hugetlbfs      : Write                    :     0.030 ms
hugetlbfs      : Read/Write               :     0.030 ms
hugetlbfs      : POPULATE_READ            :     0.030 ms
hugetlbfs      : POPULATE_WRITE           :     0.030 ms
hugetlbfs      : FALLOCATE                :     0.030 ms
hugetlbfs      : FALLOCATE+Read           :     0.031 ms
hugetlbfs      : FALLOCATE+Write          :     0.031 ms
hugetlbfs      : FALLOCATE+Read/Write     :     0.030 ms
hugetlbfs      : FALLOCATE+POPULATE_READ  :     0.030 ms
hugetlbfs      : FALLOCATE+POPULATE_WRITE :     0.030 ms
**************************************************
4096 MiB MAP_SHARED:
**************************************************
Anon 4 KiB     : Read                     :  1053.090 ms
Anon 4 KiB     : Write                    :   913.642 ms
Anon 4 KiB     : Read/Write               :  1060.350 ms
Anon 4 KiB     : POPULATE_READ            :   893.691 ms
Anon 4 KiB     : POPULATE_WRITE           :   782.885 ms
Anon 2 MiB     : Read                     :   358.553 ms
Anon 2 MiB     : Write                    :   358.419 ms
Anon 2 MiB     : Read/Write               :   357.992 ms
Anon 2 MiB     : POPULATE_READ            :   357.533 ms
Anon 2 MiB     : POPULATE_WRITE           :   357.808 ms
Memfd 4 KiB    : Read                     :  1078.144 ms
Memfd 4 KiB    : Write                    :   942.036 ms
Memfd 4 KiB    : Read/Write               :  1100.391 ms
Memfd 4 KiB    : POPULATE_READ            :   925.829 ms
Memfd 4 KiB    : POPULATE_WRITE           :   804.394 ms
Memfd 4 KiB    : FALLOCATE                :   304.632 ms
Memfd 4 KiB    : FALLOCATE+Read           :  1163.359 ms
Memfd 4 KiB    : FALLOCATE+Write          :   933.186 ms
Memfd 4 KiB    : FALLOCATE+Read/Write     :  1187.304 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :  1013.660 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :   794.560 ms
Memfd 2 MiB    : Read                     :   358.131 ms
Memfd 2 MiB    : Write                    :   358.099 ms
Memfd 2 MiB    : Read/Write               :   358.250 ms
Memfd 2 MiB    : POPULATE_READ            :   357.563 ms
Memfd 2 MiB    : POPULATE_WRITE           :   357.334 ms
Memfd 2 MiB    : FALLOCATE                :   356.735 ms
Memfd 2 MiB    : FALLOCATE+Read           :   358.152 ms
Memfd 2 MiB    : FALLOCATE+Write          :   358.331 ms
Memfd 2 MiB    : FALLOCATE+Read/Write     :   358.018 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :   357.286 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :   357.523 ms
tmpfs          : Read                     :  1087.265 ms
tmpfs          : Write                    :   950.840 ms
tmpfs          : Read/Write               :  1107.567 ms
tmpfs          : POPULATE_READ            :   922.605 ms
tmpfs          : POPULATE_WRITE           :   810.094 ms
tmpfs          : FALLOCATE                :   306.320 ms
tmpfs          : FALLOCATE+Read           :  1169.796 ms
tmpfs          : FALLOCATE+Write          :   933.730 ms
tmpfs          : FALLOCATE+Read/Write     :  1191.610 ms
tmpfs          : FALLOCATE+POPULATE_READ  :  1020.474 ms
tmpfs          : FALLOCATE+POPULATE_WRITE :   798.945 ms
file           : Read                     :   654.101 ms
file           : Write                    :  1259.142 ms
file           : Read/Write               :  1289.509 ms
file           : POPULATE_READ            :   661.642 ms
file           : POPULATE_WRITE           :  1106.816 ms
file           : FALLOCATE                :     1.864 ms
file           : FALLOCATE+Read           :   656.328 ms
file           : FALLOCATE+Write          :  1153.300 ms
file           : FALLOCATE+Read/Write     :  1180.613 ms
file           : FALLOCATE+POPULATE_READ  :   668.347 ms
file           : FALLOCATE+POPULATE_WRITE :   996.143 ms
hugetlbfs      : Read                     :   357.245 ms
hugetlbfs      : Write                    :   357.413 ms
hugetlbfs      : Read/Write               :   357.120 ms
hugetlbfs      : POPULATE_READ            :   356.321 ms
hugetlbfs      : POPULATE_WRITE           :   356.693 ms
hugetlbfs      : FALLOCATE                :   355.927 ms
hugetlbfs      : FALLOCATE+Read           :   357.074 ms
hugetlbfs      : FALLOCATE+Write          :   357.120 ms
hugetlbfs      : FALLOCATE+Read/Write     :   356.983 ms
hugetlbfs      : FALLOCATE+POPULATE_READ  :   356.413 ms
hugetlbfs      : FALLOCATE+POPULATE_WRITE :   356.266 ms
**************************************************

[1] https://lkml.org/lkml/2013/6/27/698

[akpm@linux-foundation.org: coding style fixes]

Link: https://lkml.kernel.org/r/20210419135443.12822-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:30 -07:00
Lee Jones
85adc860fd Merge 6efb943b86 Linux 5.13-rc1 into android-mainline
One giant leap, all the way up to 5.13-rc1

Also take the opportunity to re-align (a.k.a. fix a couple of previous
merge conflict fix-up issues) which occurred during this merge-window.

Fixes: 4797acfb9c ("Merge 16b3d0cf5b Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into android-mainline")
Fixes: 92f282f338 ("Merge 8ca5297e7e Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild into android-mainline")

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ie9389f595776e8f66bba6eaf0fa7a3587c6a5749
2021-05-15 09:09:01 +01:00
Ingo Molnar
f0953a1bba mm: fix typos in comments
Fix ~94 single-word typos in locking code comments, plus a few
very obvious grammar mistakes.

Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:35 -07:00
Greg Kroah-Hartman
6bb5c8b522 Merge 5.12-rc3 into android-mainline
Linux 5.12-rc3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibc683be6a28885f39d477e3f86ec2ecd1a9e7e21
2021-03-15 08:48:29 +01:00
Suren Baghdasaryan
96cfe2c0fd mm/madvise: replace ptrace attach requirement for process_madvise
process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process.  It effectively removes the security boundary between the two
processes (in one direction).  Granting ptrace attach capability even to a
system process is considered dangerous since it creates an attack surface.
This severely limits the usage of this API.

The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data is
physically located (and therefore, how fast it can be accessed).  What we
want is the ability for one process to influence another process in order
to optimize performance across the entire system while leaving the
security boundary intact.

Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
CAP_SYS_NICE.  PTRACE_MODE_READ to prevent leaking ASLR metadata and
CAP_SYS_NICE for influencing process performance.

Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@vger.kernel.org>	[5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:30 -08:00
Greg Kroah-Hartman
bbf615b8e1 Merge 7d6beb71da ("Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux") into android-mainline
Steps on the way to 5.12-rc1.

Resolves conflicts in:
	fs/overlayfs/inode.c

Note, incfs is broken here, will mark it as BROKEN in another patch to
make it more obvious.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I95a5938fc3cfa27d521c5d4fd18b2adfb13a6d84
2021-03-06 17:42:01 +01:00
Greg Kroah-Hartman
1e5781383b Merge 657bd90c93 ("Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Steps on the way to 5.12-rc1

Resolves conflicts in:
        kernel/sched/cpufreq_schedutil.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I06d90f919467f3e7e8970aaedbb872a10eb699ff
2021-03-03 15:29:15 +01:00
Linus Torvalds
7d6beb71da Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Will Deacon
a72afd8730 tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu()
The 'start' and 'end' arguments to tlb_gather_mmu() are no longer
needed now that there is a separate function for 'fullmm' flushing.

Remove the unused arguments and update all callers.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wjQWa14_4UpfDf=fiineNP+RH74kZeDMo_f1D35xNzq9w@mail.gmail.com
2021-01-29 20:02:29 +01:00
Will Deacon
ae8eba8b5d tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu()
Since commit 7a30df49f6 ("mm: mmu_gather: remove __tlb_reset_range()
for force flush"), the 'start' and 'end' arguments to tlb_finish_mmu()
are no longer used, since we flush the whole mm in case of a nested
invalidation.

Remove the unused arguments and update all callers.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20210127235347.1402-3-will@kernel.org
2021-01-29 20:02:28 +01:00
Christian Brauner
21cb47be6f inode: make init and permission helpers idmapped mount aware
The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Christian Brauner
02f92b3868 fs: add file and path permissions helpers
Add two simple helpers to check permissions on a file and path
respectively and convert over some callers. It simplifies quite a few
codepaths and also reduces the churn in later patches quite a bit.
Christoph also correctly points out that this makes codepaths (e.g.
ioctls) way easier to follow that would otherwise have to do more
complex argument passing than necessary.

Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Greg Kroah-Hartman
8c3b398d8c Merge ac73e3dc8a ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
Steps on the way to 5.11-rc1

Change-Id: I23957617a1e123aa05d3c1d48ea24e6acd131bdd
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-12-17 07:57:30 +01:00
Oscar Salvador
1e8aaedb18 mm,memory_failure: always pin the page in madvise_inject_error
madvise_inject_error() uses get_user_pages_fast to translate the address
we specified to a page.  After [1], we drop the extra reference count for
memory_failure() path.  That commit says that memory_failure wanted to
keep the pin in order to take the page out of circulation.

The truth is that we need to keep the page pinned, otherwise the page
might be re-used after the put_page() and we can end up messing with
someone else's memory.

E.g:

CPU0
process X					CPU1
 madvise_inject_error
  get_user_pages
   put_page
					page gets reclaimed
					process Y allocates the page
  memory_failure
   // We mess with process Y memory

madvise() is meant to operate on a self address space, so messing with
pages that do not belong to us seems the wrong thing to do.
To avoid that, let us keep the page pinned for memory_failure as well.

Pages for DAX mappings will release this extra refcount in
memory_failure_dev_pagemap.

[1] ("23e7b5c2e271: mm, madvise_inject_error:
      Let memory_failure() optionally take a page reference")

Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
Fixes: 23e7b5c2e2 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:44 -08:00
Oscar Salvador
32409cba3f mm,hwpoison: drop unneeded pcplist draining
memory_failure and soft_offline_path paths now drain pcplists by calling
get_hwpoison_page.

memory_failure flags the page as HWPoison before, so that page cannot
longer go into a pcplist, and soft_offline_page only flags a page as
HWPoison if 1) we took the page off a buddy freelist 2) the page was
in-use and we migrated it 3) was a clean pagecache.

Because of that, a page cannot longer be poisoned and be in a pcplist.

Link: https://lkml.kernel.org/r/20201013144447.6706-5-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:44 -08:00
Greg Kroah-Hartman
4de82f1b3b Merge a68a0262ab ("mm/madvise: remove racy mm ownership check") into android-mainline
Steps on the way to 5.10-rc8/final

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0d1e2f396425894a7c2e9a2680edf763a223c644
2020-12-10 11:34:58 +01:00
Minchan Kim
a68a0262ab mm/madvise: remove racy mm ownership check
Jann spotted the security hole due to race of mm ownership check.

If the task is sharing the mm_struct but goes through execve() before
mm_access(), it could skip process_madvise_behavior_valid check.  That
makes *any advice hint* to reach into the remote process.

This patch removes the mm ownership check.  With it, it will lose the
ability that local process could give *any* advice hint with vector
interface for some reason (e.g., performance).  Since there is no
concrete example in upstream yet, it would be better to remove the
abiliity at this moment and need to review when such new advice comes
up.

Fixes: ecb8ac8b1f ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-08 20:57:18 -08:00
Greg Kroah-Hartman
5acba58e59 Merge 5.10-rc5 into android-mainline
Linux 5.10-rc5

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia5b23cceb3e0212c1c841f1297ecfab65cc9aaa6
2020-11-23 08:17:16 +01:00
Matthew Wilcox (Oracle)
66383800df mm: fix madvise WILLNEED performance problem
The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng.

With this fix, we instead see a performance improvement of 3%.

Fixes: e6e88712e4 ("mm: optimise madvise WILLNEED")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Eric Dumazet
450677dcb0 mm/madvise: fix memory leak from process_madvise
The early return in process_madvise() will produce a memory leak.

Fix it.

Fixes: ecb8ac8b1f ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Greg Kroah-Hartman
b3dd1b5952 Merge 922a763ae1 ("Merge tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs") into android-mainline
Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I520719ae5e0d992c3756e393cb299d77d650622e
2020-10-26 10:08:43 +01:00
Greg Kroah-Hartman
05d2a661fd Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1

Resolves conflicts in:
	fs/userfaultfd.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
2020-10-26 09:23:33 +01:00