Commit Graph

160 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
3fb2e91e28 Merge 5.15.40 into android-5.15
Changes in 5.15.40
	x86/lib/atomic64_386_32: Rename things
	x86: Prepare asm files for straight-line-speculation
	x86: Prepare inline-asm for straight-line-speculation
	objtool: Add straight-line-speculation validation
	x86/alternative: Relax text_poke_bp() constraint
	kbuild: move objtool_args back to scripts/Makefile.build
	x86: Add straight-line-speculation mitigation
	tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
	kvm/emulate: Fix SETcc emulation function offsets with SLS
	crypto: x86/poly1305 - Fixup SLS
	objtool: Fix SLS validation for kcov tail-call replacement
	Bluetooth: Fix the creation of hdev->name
	rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
	udf: Avoid using stale lengthOfImpUse
	mm: fix missing cache flush for all tail pages of compound page
	mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
	mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
	mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
	mm/hwpoison: fix error page recovered but reported "not recovered"
	mm/mlock: fix potential imbalanced rlimit ucounts adjustment
	mm: fix invalid page pointer returned with FOLL_PIN gups
	Linux 5.15.40

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I039f8014f8e898f74f2f98ab71ef29c39a1ad544
2022-06-09 15:39:10 +02:00
Miaohe Lin
954c78ed8c mm/mlock: fix potential imbalanced rlimit ucounts adjustment
commit 5c2a956c3eea173b2bc89f632507c0eeaebf6c4a upstream.

user_shm_lock forgets to set allowed to 0 when get_ucounts fails.  So
the later user_shm_unlock might do the extra dec_rlimit_ucounts.  Fix
this by resetting allowed to 0.

Link: https://lkml.kernel.org/r/20220310132417.41189-1-linmiaohe@huawei.com
Fixes: d7c9e99aee ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-15 20:18:53 +02:00
Greg Kroah-Hartman
b41a37c036 Merge 5.15.33 into android13-5.15
Changes in 5.15.33
	Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
	USB: serial: pl2303: add IBM device IDs
	dt-bindings: usb: hcd: correct usb-device path
	USB: serial: pl2303: fix GS type detection
	USB: serial: simple: add Nokia phone driver
	mm: kfence: fix missing objcg housekeeping for SLAB
	hv: utils: add PTP_1588_CLOCK to Kconfig to fix build
	HID: logitech-dj: add new lightspeed receiver id
	HID: Add support for open wheel and no attachment to T300
	xfrm: fix tunnel model fragmentation behavior
	ARM: mstar: Select HAVE_ARM_ARCH_TIMER
	virtio_console: break out of buf poll on remove
	vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
	tools/virtio: fix virtio_test execution
	ethernet: sun: Free the coherent when failing in probing
	gpio: Revert regression in sysfs-gpio (gpiolib.c)
	spi: Fix invalid sgs value
	net:mcf8390: Use platform_get_irq() to get the interrupt
	Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"
	spi: Fix erroneous sgs value with min_t()
	Input: zinitix - do not report shadow fingers
	af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
	net: dsa: microchip: add spi_device_id tables
	selftests: vm: fix clang build error multiple output files
	locking/lockdep: Avoid potential access of invalid memory in lock_class
	drm/amdgpu: move PX checking into amdgpu_device_ip_early_init
	drm/amdgpu: only check for _PR3 on dGPUs
	iommu/iova: Improve 32-bit free space estimate
	virtio-blk: Use blk_validate_block_size() to validate block size
	tpm: fix reference counting for struct tpm_chip
	usb: typec: tipd: Forward plug orientation to typec subsystem
	USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
	xhci: fix garbage USBSTS being logged in some cases
	xhci: fix runtime PM imbalance in USB2 resume
	xhci: make xhci_handshake timeout for xhci_reset() adjustable
	xhci: fix uninitialized string returned by xhci_decode_ctrl_ctx()
	mei: me: disable driver on the ign firmware
	mei: me: add Alder Lake N device id.
	mei: avoid iterator usage outside of list_for_each_entry
	bus: mhi: pci_generic: Add mru_default for Quectel EM1xx series
	bus: mhi: Fix MHI DMA structure endianness
	docs: sphinx/requirements: Limit jinja2<3.1
	coresight: Fix TRCCONFIGR.QE sysfs interface
	coresight: syscfg: Fix memleak on registration failure in cscfg_create_device
	iio: afe: rescale: use s64 for temporary scale calculations
	iio: inkern: apply consumer scale on IIO_VAL_INT cases
	iio: inkern: apply consumer scale when no channel scale is available
	iio: inkern: make a best effort on offset calculation
	greybus: svc: fix an error handling bug in gb_svc_hello()
	clk: rockchip: re-add rational best approximation algorithm to the fractional divider
	clk: uniphier: Fix fixed-rate initialization
	ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
	cifs: fix handlecache and multiuser
	cifs: we do not need a spinlock around the tree access during umount
	KEYS: fix length validation in keyctl_pkey_params_get_2()
	KEYS: asymmetric: enforce that sig algo matches key algo
	KEYS: asymmetric: properly validate hash_algo and encoding
	Documentation: add link to stable release candidate tree
	Documentation: update stable tree link
	firmware: stratix10-svc: add missing callback parameter on RSU
	firmware: sysfb: fix platform-device leak in error path
	HID: intel-ish-hid: Use dma_alloc_coherent for firmware update
	SUNRPC: avoid race between mod_timer() and del_timer_sync()
	NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR
	NFSD: prevent underflow in nfssvc_decode_writeargs()
	NFSD: prevent integer overflow on 32 bit systems
	f2fs: fix to unlock page correctly in error path of is_alive()
	f2fs: quota: fix loop condition at f2fs_quota_sync()
	f2fs: fix to do sanity check on .cp_pack_total_block_count
	remoteproc: Fix count check in rproc_coredump_write()
	mm/mlock: fix two bugs in user_shm_lock()
	pinctrl: ingenic: Fix regmap on X series SoCs
	pinctrl: samsung: drop pin banks references on error paths
	net: bnxt_ptp: fix compilation error
	spi: mxic: Fix the transmit path
	mtd: rawnand: protect access to rawnand devices while in suspend
	can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
	can: m_can: m_can_tx_handler(): fix use after free of skb
	can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
	jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
	jffs2: fix memory leak in jffs2_do_mount_fs
	jffs2: fix memory leak in jffs2_scan_medium
	mm: fs: fix lru_cache_disabled race in bh_lru
	mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
	mm: invalidate hwpoison page cache page in fault path
	mempolicy: mbind_range() set_policy() after vma_merge()
	scsi: core: sd: Add silence_suspend flag to suppress some PM messages
	scsi: ufs: Fix runtime PM messages never-ending cycle
	scsi: scsi_transport_fc: Fix FPIN Link Integrity statistics counters
	scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
	qed: display VF trust config
	qed: validate and restrict untrusted VFs vlan promisc mode
	riscv: dts: canaan: Fix SPI3 bus width
	riscv: Fix fill_callchain return value
	riscv: Increase stack size under KASAN
	Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
	cifs: prevent bad output lengths in smb2_ioctl_query_info()
	cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
	ALSA: cs4236: fix an incorrect NULL check on list iterator
	ALSA: hda: Avoid unsol event during RPM suspending
	ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock
	ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
	rtc: mc146818-lib: fix locking in mc146818_set_time
	rtc: pl031: fix rtc features null pointer dereference
	ocfs2: fix crash when mount with quota enabled
	drm/simpledrm: Add "panel orientation" property on non-upright mounted LCD panels
	mm: madvise: skip unmapped vma holes passed to process_madvise
	mm: madvise: return correct bytes advised with process_madvise
	Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
	mm,hwpoison: unmap poisoned page before invalidation
	mm/kmemleak: reset tag when compare object pointer
	dm stats: fix too short end duration_ns when using precise_timestamps
	dm: fix use-after-free in dm_cleanup_zoned_dev()
	dm: interlock pending dm_io and dm_wait_for_bios_completion
	dm: fix double accounting of flush with data
	dm integrity: set journal entry unused when shrinking device
	tracing: Have trace event string test handle zero length strings
	drbd: fix potential silent data corruption
	powerpc/kvm: Fix kvm_use_magic_page
	PCI: fu740: Force 2.5GT/s for initial device probe
	arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
	arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
	arm64: dts: qcom: sm8250: Fix MSI IRQ for PCIe1 and PCIe2
	arm64: dts: ti: k3-am65: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j721e: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-j7200: Fix gic-v3 compatible regs
	arm64: dts: ti: k3-am64: Fix gic-v3 compatible regs
	ASoC: SOF: Intel: Fix NULL ptr dereference when ENOMEM
	Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
	ACPI: properties: Consistently return -ENOENT if there are no more references
	coredump: Also dump first pages of non-executable ELF libraries
	ext4: fix ext4_fc_stats trace point
	ext4: fix fs corruption when tring to remove a non-empty directory with IO error
	ext4: make mb_optimize_scan performance mount option work with extents
	drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
	samples/landlock: Fix path_list memory leak
	landlock: Use square brackets around "landlock-ruleset"
	mailbox: tegra-hsp: Flush whole channel
	block: limit request dispatch loop duration
	block: don't merge across cgroup boundaries if blkcg is enabled
	drm/edid: check basic audio support on CEA extension block
	fbdev: Hot-unplug firmware fb devices on forced removal
	video: fbdev: sm712fb: Fix crash in smtcfb_read()
	video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
	rfkill: make new event layout opt-in
	ARM: dts: at91: sama7g5: Remove unused properties in i2c nodes
	ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
	ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5250
	ARM: dts: exynos: add missing HDMI supplies on SMDK5420
	mgag200 fix memmapsl configuration in GCTL6 register
	carl9170: fix missing bit-wise or operator for tx_params
	pstore: Don't use semaphores in always-atomic-context code
	thermal: int340x: Increase bitmap size
	lib/raid6/test: fix multiple definition linking error
	exec: Force single empty string when argv is empty
	crypto: rsa-pkcs1pad - only allow with rsa
	crypto: rsa-pkcs1pad - correctly get hash from source scatterlist
	crypto: rsa-pkcs1pad - restore signature length check
	crypto: rsa-pkcs1pad - fix buffer overread in pkcs1pad_verify_complete()
	bcache: fixup multiple threads crash
	PM: domains: Fix sleep-in-atomic bug caused by genpd_debug_remove()
	DEC: Limit PMAX memory probing to R3k systems
	media: gpio-ir-tx: fix transmit with long spaces on Orange Pi PC
	media: venus: hfi_cmds: List HDR10 property as unsupported for v1 and v3
	media: venus: venc: Fix h264 8x8 transform control
	media: davinci: vpif: fix unbalanced runtime PM get
	media: davinci: vpif: fix unbalanced runtime PM enable
	btrfs: zoned: mark relocation as writing
	btrfs: extend locking to all space_info members accesses
	btrfs: verify the tranisd of the to-be-written dirty extent buffer
	xtensa: define update_mmu_tlb function
	xtensa: fix stop_machine_cpuslocked call in patch_text
	xtensa: fix xtensa_wsr always writing 0
	drm/syncobj: flatten dma_fence_chains on transfer
	drm/nouveau/backlight: Fix LVDS backlight detection on some laptops
	drm/nouveau/backlight: Just set all backlight types as RAW
	drm/fb-helper: Mark screen buffers in system memory with FBINFO_VIRTFB
	brcmfmac: firmware: Allocate space for default boardrev in nvram
	brcmfmac: pcie: Release firmwares in the brcmf_pcie_setup error path
	brcmfmac: pcie: Declare missing firmware files in pcie.c
	brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
	brcmfmac: pcie: Fix crashes due to early IRQs
	drm/i915/opregion: check port number bounds for SWSCI display power state
	drm/i915/gem: add missing boundary check in vm_access
	PCI: imx6: Allow to probe when dw_pcie_wait_for_link() fails
	PCI: pciehp: Clear cmd_busy bit in polling mode
	PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
	regulator: qcom_smd: fix for_each_child.cocci warnings
	selinux: access superblock_security_struct in LSM blob way
	selinux: check return value of sel_make_avc_files
	crypto: ccp - Ensure psp_ret is always init'd in __sev_platform_init_locked()
	hwrng: cavium - Check health status while reading random data
	hwrng: cavium - HW_RANDOM_CAVIUM should depend on ARCH_THUNDER
	crypto: sun8i-ss - really disable hash on A80
	crypto: authenc - Fix sleep in atomic context in decrypt_tail
	crypto: mxs-dcp - Fix scatterlist processing
	selinux: Fix selinux_sb_mnt_opts_compat()
	thermal: int340x: Check for NULL after calling kmemdup()
	crypto: octeontx2 - remove CONFIG_DM_CRYPT check
	spi: tegra114: Add missing IRQ check in tegra_spi_probe
	spi: tegra210-quad: Fix missin IRQ check in tegra_qspi_probe
	stack: Constrain and fix stack offset randomization with Clang builds
	arm64/mm: avoid fixmap race condition when create pud mapping
	blk-cgroup: set blkg iostat after percpu stat aggregation
	selftests/x86: Add validity check and allow field splitting
	selftests/sgx: Treat CC as one argument
	crypto: rockchip - ECB does not need IV
	audit: log AUDIT_TIME_* records only from rules
	EVM: fix the evm= __setup handler return value
	crypto: ccree - don't attempt 0 len DMA mappings
	crypto: hisilicon/sec - fix the aead software fallback for engine
	spi: pxa2xx-pci: Balance reference count for PCI DMA device
	hwmon: (pmbus) Add mutex to regulator ops
	hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
	nvme: cleanup __nvme_check_ids
	nvme: fix the check for duplicate unique identifiers
	block: don't delete queue kobject before its children
	PM: hibernate: fix __setup handler error handling
	PM: suspend: fix return value of __setup handler
	spi: spi-zynqmp-gqspi: Handle error for dma_set_mask
	hwrng: atmel - disable trng on failure path
	crypto: sun8i-ss - call finalize with bh disabled
	crypto: sun8i-ce - call finalize with bh disabled
	crypto: amlogic - call finalize with bh disabled
	crypto: gemini - call finalize with bh disabled
	crypto: vmx - add missing dependencies
	clocksource/drivers/timer-ti-dm: Fix regression from errata i940 fix
	clocksource/drivers/exynos_mct: Refactor resources allocation
	clocksource/drivers/exynos_mct: Handle DTS with higher number of interrupts
	clocksource/drivers/timer-microchip-pit64b: Use notrace
	clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init()
	arm64: prevent instrumentation of bp hardening callbacks
	KEYS: trusted: Fix trusted key backends when building as module
	KEYS: trusted: Avoid calling null function trusted_key_exit
	ACPI: APEI: fix return value of __setup handlers
	crypto: ccp - ccp_dmaengine_unregister release dma channels
	crypto: ccree - Fix use after free in cc_cipher_exit()
	hwrng: nomadik - Change clk_disable to clk_disable_unprepare
	hwmon: (pmbus) Add Vin unit off handling
	clocksource: acpi_pm: fix return value of __setup handler
	io_uring: don't check unrelated req->open.how in accept request
	io_uring: terminate manual loop iterator loop correctly for non-vecs
	watch_queue: Fix NULL dereference in error cleanup
	watch_queue: Actually free the watch
	f2fs: fix to enable ATGC correctly via gc_idle sysfs interface
	sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
	sched/core: Export pelt_thermal_tp
	sched/uclamp: Fix iowait boost escaping uclamp restriction
	rseq: Remove broken uapi field layout on 32-bit little endian
	perf/core: Fix address filter parser for multiple filters
	perf/x86/intel/pt: Fix address filter config for 32-bit kernel
	sched/fair: Improve consistency of allowed NUMA balance calculations
	f2fs: fix missing free nid in f2fs_handle_failed_inode
	nfsd: more robust allocation failure handling in nfsd_file_cache_init
	sched/cpuacct: Fix charge percpu cpuusage
	sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
	f2fs: fix to avoid potential deadlock
	btrfs: fix unexpected error path when reflinking an inline extent
	f2fs: fix compressed file start atomic write may cause data corruption
	selftests, x86: fix how check_cc.sh is being invoked
	drivers/base/memory: add memory block to memory group after registration succeeded
	kunit: make kunit_test_timeout compatible with comment
	pinctrl: samsung: Remove EINT handler for Exynos850 ALIVE and CMGP gpios
	media: staging: media: zoran: fix usage of vb2_dma_contig_set_max_seg_size
	media: camss: csid-170: fix non-10bit formats
	media: camss: csid-170: don't enable unused irqs
	media: camss: csid-170: set the right HALT_CMD when disabled
	media: camss: vfe-170: fix "VFE halt timeout" error
	media: staging: media: imx: imx7-mipi-csis: Make subdev name unique
	media: v4l2-mem2mem: Apply DST_QUEUE_OFF_BASE on MMAP buffers across ioctls
	media: mtk-vcodec: potential dereference of null pointer
	media: imx: imx8mq-mipi-csi2: remove wrong irq config write operation
	media: imx: imx8mq-mipi_csi2: fix system resume
	media: bttv: fix WARNING regression on tunerless devices
	media: atmel: atmel-sama7g5-isc: fix ispck leftover
	ASoC: sh: rz-ssi: Drop calling rz_ssi_pio_recv() recursively
	ASoC: codecs: Check for error pointer after calling devm_regmap_init_mmio
	ASoC: xilinx: xlnx_formatter_pcm: Handle sysclk setting
	ASoC: simple-card-utils: Set sysclk on all components
	media: coda: Fix missing put_device() call in coda_get_vdoa_data
	media: meson: vdec: potential dereference of null pointer
	media: hantro: Fix overfill bottom register field name
	media: ov6650: Fix set format try processing path
	media: v4l: Avoid unaligned access warnings when printing 4cc modifiers
	media: ov5648: Don't pack controls struct
	media: aspeed: Correct value for h-total-pixels
	video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen
	video: fbdev: controlfb: Fix COMPILE_TEST build
	video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
	video: fbdev: atmel_lcdfb: fix an error code in atmel_lcdfb_probe()
	video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
	ARM: dts: Fix OpenBMC flash layout label addresses
	firmware: qcom: scm: Remove reassignment to desc following initializer
	ARM: dts: qcom: ipq4019: fix sleep clock
	soc: qcom: rpmpd: Check for null return of devm_kcalloc
	soc: qcom: ocmem: Fix missing put_device() call in of_get_ocmem
	soc: qcom: aoss: remove spurious IRQF_ONESHOT flags
	arm64: dts: qcom: sdm845: fix microphone bias properties and values
	arm64: dts: qcom: sm8250: fix PCIe bindings to follow schema
	arm64: dts: broadcom: bcm4908: use proper TWD binding
	arm64: dts: qcom: sm8150: Correct TCS configuration for apps rsc
	arm64: dts: qcom: sm8350: Correct TCS configuration for apps rsc
	firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not defined
	soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
	ARM: dts: sun8i: v3s: Move the csi1 block to follow address order
	vsprintf: Fix potential unaligned access
	ARM: dts: imx: Add missing LVDS decoder on M53Menlo
	media: mexon-ge2d: fixup frames size in registers
	media: video/hdmi: handle short reads of hdmi info frame.
	media: ti-vpe: cal: Fix a NULL pointer dereference in cal_ctx_v4l2_init_formats()
	media: em28xx: initialize refcount before kref_get
	media: usb: go7007: s2250-board: fix leak in probe()
	media: cedrus: H265: Fix neighbour info buffer size
	media: cedrus: h264: Fix neighbour info buffer size
	ASoC: codecs: rx-macro: fix accessing compander for aux
	ASoC: codecs: rx-macro: fix accessing array out of bounds for enum type
	ASoC: codecs: va-macro: fix accessing array out of bounds for enum type
	ASoC: codecs: wc938x: fix accessing array out of bounds for enum type
	ASoC: codecs: wcd938x: fix kcontrol max values
	ASoC: codecs: wcd934x: fix kcontrol max values
	ASoC: codecs: wcd934x: fix return value of wcd934x_rx_hph_mode_put
	media: v4l2-core: Initialize h264 scaling matrix
	media: ov5640: Fix set format, v4l2_mbus_pixelcode not updated
	selftests/lkdtm: Add UBSAN config
	lib: uninline simple_strntoull() as well
	vsprintf: Fix %pK with kptr_restrict == 0
	uaccess: fix nios2 and microblaze get_user_8()
	ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp()
	soc: mediatek: pm-domains: Add wakeup capacity support in power domain
	mmc: sdhci_am654: Fix the driver data of AM64 SoC
	ASoC: ti: davinci-i2s: Add check for clk_enable()
	ALSA: spi: Add check for clk_enable()
	arm64: dts: ns2: Fix spi-cpol and spi-cpha property
	arm64: dts: broadcom: Fix sata nodename
	printk: fix return value of printk.devkmsg __setup handler
	ASoC: mxs-saif: Handle errors for clk_enable
	ASoC: atmel_ssc_dai: Handle errors for clk_enable
	ASoC: dwc-i2s: Handle errors for clk_enable
	ASoC: soc-compress: prevent the potentially use of null pointer
	memory: emif: Add check for setup_interrupts
	memory: emif: check the pointer temp in get_device_details()
	ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
	arm64: dts: rockchip: Fix SDIO regulator supply properties on rk3399-firefly
	m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined
	media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
	media: vidtv: Check for null return of vzalloc
	ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
	ASoC: wm8350: Handle error for wm8350_register_irq
	ASoC: fsi: Add check for clk_enable
	video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
	media: saa7134: fix incorrect use to determine if list is empty
	ivtv: fix incorrect device_caps for ivtvfb
	ASoC: atmel: Fix error handling in snd_proto_probe
	ASoC: rockchip: i2s: Fix missing clk_disable_unprepare() in rockchip_i2s_probe
	ASoC: SOF: Add missing of_node_put() in imx8m_probe
	ASoC: mediatek: use of_device_get_match_data()
	ASoC: mediatek: mt8192-mt6359: Fix error handling in mt8192_mt6359_dev_probe
	ASoC: rk817: Fix missing clk_disable_unprepare() in rk817_platform_probe
	ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
	ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
	ASoC: fsl_spdif: Disable TX clock when stop
	ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
	ASoC: SOF: Intel: enable DMI L1 for playback streams
	ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
	mmc: davinci_mmc: Handle error for clk_enable
	ASoC: atmel: Fix error handling in sam9x5_wm8731_driver_probe
	ASoC: msm8916-wcd-analog: Fix error handling in pm8916_wcd_analog_spmi_probe
	ASoC: codecs: wcd934x: Add missing of_node_put() in wcd934x_codec_parse_data
	ASoC: amd: Fix reference to PCM buffer address
	ARM: configs: multi_v5_defconfig: re-enable CONFIG_V4L_PLATFORM_DRIVERS
	ARM: configs: multi_v5_defconfig: re-enable DRM_PANEL and FB_xxx
	drm/meson: osd_afbcd: Add an exit callback to struct meson_afbcd_ops
	drm/meson: Make use of the helper function devm_platform_ioremap_resourcexxx()
	drm/meson: split out encoder from meson_dw_hdmi
	drm/meson: Fix error handling when afbcd.ops->init fails
	drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev
	drm/bridge: Add missing pm_runtime_disable() in __dw_mipi_dsi_probe
	drm/bridge: nwl-dsi: Fix PM disable depth imbalance in nwl_dsi_probe
	drm: bridge: adv7511: Fix ADV7535 HPD enablement
	ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern
	drm/v3d/v3d_drv: Check for error num after setting mask
	drm/panfrost: Check for error num after setting mask
	libbpf: Fix possible NULL pointer dereference when destroying skeleton
	bpftool: Only set obj->skeleton on complete success
	udmabuf: validate ubuf->pagecount
	bpf: Fix UAF due to race between btf_try_get_module and load_module
	drm/selftests/test-drm_dp_mst_helper: Fix memory leak in sideband_msg_req_encode_decode
	selftests: bpf: Fix bind on used port
	Bluetooth: btintel: Fix WBS setting for Intel legacy ROM products
	Bluetooth: hci_serdev: call init_rwsem() before p->open()
	mtd: onenand: Check for error irq
	mtd: rawnand: gpmi: fix controller timings setting
	drm/edid: Don't clear formats if using deep color
	drm/edid: Split deep color modes between RGB and YUV444
	ionic: fix type complaint in ionic_dev_cmd_clean()
	ionic: start watchdog after all is setup
	ionic: Don't send reset commands if FW isn't running
	drm/nouveau/acr: Fix undefined behavior in nvkm_acr_hsfw_load_bl()
	drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes()
	drm/amd/pm: return -ENOTSUPP if there is no get_dpm_ultimate_freq function
	net: phy: at803x: move page selection fix to config_init
	selftests/bpf: Normalize XDP section names in selftests
	selftests/bpf/test_xdp_redirect_multi: use temp netns for testing
	ath9k_htc: fix uninit value bugs
	RDMA/core: Set MR type in ib_reg_user_mr
	KVM: PPC: Fix vmx/vsx mixup in mmio emulation
	selftests/net: timestamping: Fix bind_phc check
	i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	i40e: respect metadata on XSK Rx to skb
	igc: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directly
	ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb
	ixgbe: respect metadata on XSK Rx to skb
	power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
	ray_cs: Check ioremap return value
	powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
	KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
	powerpc/perf: Don't use perf_hw_context for trace IMC PMU
	mt76: connac: fix sta_rec_wtbl tag len
	mt76: mt7915: use proper aid value in mt7915_mcu_wtbl_generic_tlv in sta mode
	mt76: mt7915: use proper aid value in mt7915_mcu_sta_basic_tlv
	mt76: mt7921: fix a leftover race in runtime-pm
	mt76: mt7615: fix a leftover race in runtime-pm
	mt76: mt7603: check sta_rates pointer in mt7603_sta_rate_tbl_update
	mt76: mt7615: check sta_rates pointer in mt7615_sta_rate_tbl_update
	ptp: unregister virtual clocks when unregistering physical clock.
	net: dsa: mv88e6xxx: Enable port policy support on 6097
	mac80211: Remove a couple of obsolete TODO
	mac80211: limit bandwidth in HE capabilities
	scripts/dtc: Call pkg-config POSIXly correct
	livepatch: Fix build failure on 32 bits processors
	net: asix: add proper error handling of usb read errors
	i2c: bcm2835: Use platform_get_irq() to get the interrupt
	i2c: bcm2835: Fix the error handling in 'bcm2835_i2c_probe()'
	mtd: mchp23k256: Add SPI ID table
	mtd: mchp48l640: Add SPI ID table
	igc: avoid kernel warning when changing RX ring parameters
	igb: refactor XDP registration
	PCI: aardvark: Fix reading MSI interrupt number
	PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge
	RDMA/rxe: Check the last packet by RXE_END_MASK
	libbpf: Fix signedness bug in btf_dump_array_data()
	cxl/core: Fix cxl_probe_component_regs() error message
	cxl/regs: Fix size of CXL Capability Header Register
	net:enetc: allocate CBD ring data memory using DMA coherent methods
	libbpf: Fix compilation warning due to mismatched printf format
	drm/bridge: dw-hdmi: use safe format when first in bridge chain
	libbpf: Use dynamically allocated buffer when receiving netlink messages
	power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
	HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
	iommu/ipmmu-vmsa: Check for error num after setting mask
	drm/bridge: anx7625: Fix overflow issue on reading EDID
	bpftool: Fix the error when lookup in no-btf maps
	drm/amd/pm: enable pm sysfs write for one VF mode
	drm/amd/display: Add affected crtcs to atomic state for dsc mst unplug
	libbpf: Fix memleak in libbpf_netlink_recv()
	IB/cma: Allow XRC INI QPs to set their local ACK timeout
	dax: make sure inodes are flushed before destroy cache
	selftests: mptcp: add csum mib check for mptcp_connect
	iwlwifi: mvm: Don't call iwl_mvm_sta_from_mac80211() with NULL sta
	iwlwifi: mvm: don't iterate unadded vifs when handling FW SMPS req
	iwlwifi: mvm: align locking in D3 test debugfs
	iwlwifi: yoyo: remove DBGI_SRAM address reset writing
	iwlwifi: Fix -EIO error code that is never returned
	iwlwifi: mvm: Fix an error code in iwl_mvm_up()
	mtd: rawnand: pl353: Set the nand chip node as the flash node
	drm/msm/dp: populate connector of struct dp_panel
	drm/msm/dp: stop link training after link training 2 failed
	drm/msm/dp: always add fail-safe mode into connector mode list
	drm/msm/dsi: Use "ref" fw clock instead of global name for VCO parent
	drm/msm/dsi/phy: fix 7nm v4.0 settings for C-PHY mode
	drm/msm/dpu: add DSPP blocks teardown
	drm/msm/dpu: fix dp audio condition
	dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
	vfio/pci: fix memory leak during D3hot to D0 transition
	vfio/pci: wake-up devices around reset functions
	scsi: fnic: Fix a tracing statement
	scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
	scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
	scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
	scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config()
	scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req()
	scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req()
	scsi: pm8001: Fix NCQ NON DATA command task initialization
	scsi: pm8001: Fix NCQ NON DATA command completion handling
	scsi: pm8001: Fix abort all task initialization
	RDMA/mlx5: Fix the flow of a miss in the allocation of a cache ODP MR
	drm/amd/display: Remove vupdate_int_entry definition
	TOMOYO: fix __setup handlers return values
	power: supply: sbs-charger: Don't cancel work that is not initialized
	ext2: correct max file size computing
	drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
	power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
	scsi: hisi_sas: Change permission of parameter prot_mask
	drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt
	bpf, arm64: Call build_prologue() first in first JIT pass
	bpf, arm64: Feed byte-offset into bpf line info
	xsk: Fix race at socket teardown
	RDMA/irdma: Fix netdev notifications for vlan's
	RDMA/irdma: Fix Passthrough mode in VM
	RDMA/irdma: Remove incorrect masking of PD
	gpu: host1x: Fix a memory leak in 'host1x_remove()'
	libbpf: Skip forward declaration when counting duplicated type names
	powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
	powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
	KVM: x86: Fix emulation in writing cr8
	KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
	hv_balloon: rate-limit "Unhandled message" warning
	i2c: xiic: Make bus names unique
	power: supply: wm8350-power: Handle error for wm8350_register_irq
	power: supply: wm8350-power: Add missing free in free_charger_irq
	IB/hfi1: Allow larger MTU without AIP
	RDMA/core: Fix ib_qp_usecnt_dec() called when error
	PCI: Reduce warnings on possible RW1C corruption
	net: axienet: fix RX ring refill allocation failure handling
	drm/msm/a6xx: Fix missing ARRAY_SIZE() check
	mips: DEC: honor CONFIG_MIPS_FP_SUPPORT=n
	MIPS: Sanitise Cavium switch cases in TLB handler synthesizers
	powerpc/sysdev: fix incorrect use to determine if list is empty
	powerpc/64s: Don't use DSISR for SLB faults
	mfd: mc13xxx: Add check for mc13xxx_irq_request
	libbpf: Unmap rings when umem deleted
	selftests/bpf: Make test_lwt_ip_encap more stable and faster
	platform/x86: huawei-wmi: check the return value of device_create_file()
	scsi: mpt3sas: Fix incorrect 4GB boundary check
	powerpc: 8xx: fix a return value error in mpc8xx_pic_init
	vxcan: enable local echo for sent CAN frames
	ath10k: Fix error handling in ath10k_setup_msa_resources
	mips: cdmm: Fix refcount leak in mips_cdmm_phys_base
	MIPS: RB532: fix return value of __setup handler
	MIPS: pgalloc: fix memory leak caused by pgd_free()
	mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
	power: ab8500_chargalg: Use CLOCK_MONOTONIC
	RDMA/irdma: Prevent some integer underflows
	Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
	RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
	bpf, sockmap: Fix memleak in sk_psock_queue_msg
	bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
	bpf, sockmap: Fix more uncharged while msg has more_data
	bpf, sockmap: Fix double uncharge the mem of sk_msg
	samples/bpf, xdpsock: Fix race when running for fix duration of time
	USB: storage: ums-realtek: fix error code in rts51x_read_mem()
	drm/i915/display: Fix HPD short pulse handling for eDP
	netfilter: flowtable: Fix QinQ and pppoe support for inet table
	mt76: mt7921: fix mt7921_queues_acq implementation
	can: isotp: sanitize CAN ID checks in isotp_bind()
	can: isotp: return -EADDRNOTAVAIL when reading from unbound socket
	can: isotp: support MSG_TRUNC flag when reading from socket
	bareudp: use ipv6_mod_enabled to check if IPv6 enabled
	ibmvnic: fix race between xmit and reset
	af_unix: Fix some data-races around unix_sk(sk)->oob_skb.
	selftests/bpf: Fix error reporting from sock_fields programs
	Bluetooth: hci_uart: add missing NULL check in h5_enqueue
	Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
	Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
	ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
	af_netlink: Fix shift out of bounds in group mask calculation
	i2c: meson: Fix wrong speed use from probe
	netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()
	i2c: mux: demux-pinctrl: do not deactivate a master that is not active
	powerpc/pseries: Fix use after free in remove_phb_dynamic()
	selftests/bpf/test_lirc_mode2.sh: Exit with proper code
	PCI: Avoid broken MSI on SB600 USB devices
	net: bcmgenet: Use stronger register read/writes to assure ordering
	tcp: ensure PMTU updates are processed during fastopen
	openvswitch: always update flow key after nat
	net: dsa: fix panic on shutdown if multi-chip tree failed to probe
	tipc: fix the timer expires after interval 100ms
	mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
	ice: fix 'scheduling while atomic' on aux critical err interrupt
	ice: don't allow to run ice_send_event_to_aux() in atomic ctx
	drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
	kernel/resource: fix kfree() of bootmem memory again
	staging: r8188eu: convert DBG_88E_LEVEL call in hal/rtl8188e_hal_init.c
	staging: r8188eu: release_firmware is not called if allocation fails
	mxser: fix xmit_buf leak in activate when LSR == 0xff
	fsi: scom: Fix error handling
	fsi: scom: Remove retries in indirect scoms
	pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
	pps: clients: gpio: Propagate return value from pps_gpio_probe
	fsi: Aspeed: Fix a potential double free
	misc: alcor_pci: Fix an error handling path
	cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse
	soundwire: intel: fix wrong register name in intel_shim_wake
	clk: qcom: ipq8074: fix PCI-E clock oops
	dmaengine: idxd: check GENCAP config support for gencfg register
	dmaengine: idxd: change bandwidth token to read buffers
	dmaengine: idxd: restore traffic class defaults after wq reset
	iio: mma8452: Fix probe failing when an i2c_device_id is used
	serial: 8250_aspeed_vuart: add PORT_ASPEED_VUART port type
	staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
	pinctrl: renesas: r8a77470: Reduce size for narrow VIN1 channel
	pinctrl: renesas: checker: Fix miscalculation of number of states
	clk: qcom: ipq8074: Use floor ops for SDCC1 clock
	phy: dphy: Correct lpx parameter and its derivatives(ta_{get,go,sure})
	phy: phy-brcm-usb: fixup BCM4908 support
	serial: 8250_mid: Balance reference count for PCI DMA device
	serial: 8250_lpss: Balance reference count for PCI DMA device
	NFS: Use of mapping_set_error() results in spurious errors
	serial: 8250: Fix race condition in RTS-after-send handling
	iio: adc: Add check for devm_request_threaded_irq
	habanalabs: Add check for pci_enable_device
	NFS: Return valid errors from nfs2/3_decode_dirent()
	staging: r8188eu: fix endless loop in recv_func
	dma-debug: fix return value of __setup handlers
	clk: imx7d: Remove audio_mclk_root_clk
	clk: imx: off by one in imx_lpcg_parse_clks_from_dt()
	clk: at91: sama7g5: fix parents of PDMCs' GCLK
	clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
	clk: qcom: clk-rcg2: Update the frac table for pixel clock
	dmaengine: hisi_dma: fix MSI allocate fail when reload hisi_dma
	remoteproc: qcom: Fix missing of_node_put in adsp_alloc_memory_region
	remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
	remoteproc: qcom_q6v5_mss: Fix some leaks in q6v5_alloc_memory_region
	nvdimm/region: Fix default alignment for small regions
	clk: actions: Terminate clk_div_table with sentinel element
	clk: loongson1: Terminate clk_div_table with sentinel element
	clk: hisilicon: Terminate clk_div_table with sentinel element
	clk: clps711x: Terminate clk_div_table with sentinel element
	clk: Fix clk_hw_get_clk() when dev is NULL
	clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
	mailbox: imx: fix crash in resume on i.mx8ulp
	NFS: remove unneeded check in decode_devicenotify_args()
	staging: mt7621-dts: fix LEDs and pinctrl on GB-PC1 devicetree
	staging: mt7621-dts: fix formatting
	staging: mt7621-dts: fix pinctrl properties for ethernet
	staging: mt7621-dts: fix GB-PC2 devicetree
	pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
	pinctrl: mediatek: paris: Fix PIN_CONFIG_BIAS_* readback
	pinctrl: mediatek: paris: Fix "argument" argument type for mtk_pinconf_get()
	pinctrl: mediatek: paris: Fix pingroup pin config state readback
	pinctrl: mediatek: paris: Skip custom extra pin config dump for virtual GPIOs
	pinctrl: microchip sgpio: use reset driver
	pinctrl: microchip-sgpio: lock RMW access
	pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
	pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
	tty: hvc: fix return value of __setup handler
	kgdboc: fix return value of __setup handler
	serial: 8250: fix XOFF/XON sending when DMA is used
	virt: acrn: obtain pa from VMA with PFNMAP flag
	virt: acrn: fix a memory leak in acrn_dev_ioctl()
	kgdbts: fix return value of __setup handler
	firmware: google: Properly state IOMEM dependency
	driver core: dd: fix return value of __setup handler
	jfs: fix divide error in dbNextAG
	netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
	SUNRPC don't resend a task on an offlined transport
	NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
	kdb: Fix the putarea helper function
	perf stat: Fix forked applications enablement of counters
	clk: qcom: gcc-msm8994: Fix gpll4 width
	vsock/virtio: initialize vdev->priv before using VQs
	vsock/virtio: read the negotiated features before using VQs
	vsock/virtio: enable VQs early on probe
	clk: Initialize orphan req_rate
	xen: fix is_xen_pmu()
	net: enetc: report software timestamping via SO_TIMESTAMPING
	net: hns3: fix bug when PF set the duplicate MAC address for VFs
	net: hns3: fix port base vlan add fail when concurrent with reset
	net: hns3: add vlan list lock to protect vlan list
	net: hns3: format the output of the MAC address
	net: hns3: refine the process when PF set VF VLAN
	net: phy: broadcom: Fix brcm_fet_config_init()
	selftests: test_vxlan_under_vrf: Fix broken test case
	NFS: Don't loop forever in nfs_do_recoalesce()
	net: hns3: clean residual vf config after disable sriov
	net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
	qlcnic: dcb: default to returning -EOPNOTSUPP
	net/x25: Fix null-ptr-deref caused by x25_disconnect
	net: sparx5: switchdev: fix possible NULL pointer dereference
	octeontx2-af: initialize action variable
	net: prefer nf_ct_put instead of nf_conntrack_put
	net/sched: act_ct: fix ref leak when switching zones
	NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
	net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
	fs: fd tables have to be multiples of BITS_PER_LONG
	lib/test: use after free in register_test_dev_kmod()
	fs: fix fd table size alignment properly
	LSM: general protection fault in legacy_parse_param
	regulator: rpi-panel: Handle I2C errors/timing to the Atmel
	crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos
	gcc-plugins/stackleak: Exactly match strings instead of prefixes
	pinctrl: npcm: Fix broken references to chip->parent_device
	rcu: Mark writes to the rcu_segcblist structure's ->flags field
	block/bfq_wf2q: correct weight to ioprio
	crypto: xts - Add softdep on ecb
	crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3
	block, bfq: don't move oom_bfqq
	selinux: use correct type for context length
	arm64: module: remove (NOLOAD) from linker script
	selinux: allow FIOCLEX and FIONCLEX with policy capability
	loop: use sysfs_emit() in the sysfs xxx show()
	Fix incorrect type in assignment of ipv6 port for audit
	irqchip/qcom-pdc: Fix broken locking
	irqchip/nvic: Release nvic_base upon failure
	fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
	bfq: fix use-after-free in bfq_dispatch_request
	ACPICA: Avoid walking the ACPI Namespace if it is not there
	lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
	Revert "Revert "block, bfq: honor already-setup queue merges""
	ACPI/APEI: Limit printable size of BERT table data
	PM: core: keep irq flags in device_pm_check_callbacks()
	parisc: Fix handling off probe non-access faults
	nvme-tcp: lockdep: annotate in-kernel sockets
	spi: tegra20: Use of_device_get_match_data()
	atomics: Fix atomic64_{read_acquire,set_release} fallbacks
	locking/lockdep: Iterate lock_classes directly when reading lockdep files
	ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb
	ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit
	sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
	ext4: don't BUG if someone dirty pages without asking ext4 first
	f2fs: fix to do sanity check on curseg->alloc_type
	NFSD: Fix nfsd_breaker_owns_lease() return values
	f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
	btrfs: harden identification of a stale device
	btrfs: make search_csum_tree return 0 if we get -EFBIG
	f2fs: use spin_lock to avoid hang
	f2fs: compress: fix to print raw data size in error path of lz4 decompression
	Adjust cifssb maximum read size
	ntfs: add sanity check on allocation size
	media: staging: media: zoran: move videodev alloc
	media: staging: media: zoran: calculate the right buffer number for zoran_reap_stat_com
	media: staging: media: zoran: fix various V4L2 compliance errors
	media: atmel: atmel-isc-base: report frame sizes as full supported range
	media: ir_toy: free before error exiting
	ASoC: sh: rz-ssi: Make the data structures available before registering the handlers
	ASoC: SOF: Intel: match sdw version on link_slaves_found
	media: imx-jpeg: Prevent decoding NV12M jpegs into single-planar buffers
	media: iommu/mediatek-v1: Free the existed fwspec if the master dev already has
	media: iommu/mediatek: Return ENODEV if the device is NULL
	media: iommu/mediatek: Add device_link between the consumer and the larb devices
	video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
	video: fbdev: w100fb: Reset global state
	video: fbdev: cirrusfb: check pixclock to avoid divide by zero
	video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
	ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
	ARM: dts: bcm2837: Add the missing L1/L2 cache information
	ASoC: madera: Add dependencies on MFD
	media: atomisp_gmin_platform: Add DMI quirk to not turn AXP ELDO2 regulator off on some boards
	media: atomisp: fix dummy_ptr check to avoid duplicate active_bo
	ARM: ftrace: avoid redundant loads or clobbering IP
	ARM: dts: imx7: Use audio_mclk_post_div instead audio_mclk_root_clk
	arm64: defconfig: build imx-sdma as a module
	video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
	video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
	video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit
	ARM: dts: bcm2711: Add the missing L1/L2 cache information
	ASoC: soc-core: skip zero num_dai component in searching dai name
	media: imx-jpeg: fix a bug of accessing array out of bounds
	media: cx88-mpeg: clear interrupt status register before streaming video
	uaccess: fix type mismatch warnings from access_ok()
	lib/test_lockup: fix kernel pointer check for separate address spaces
	ARM: tegra: tamonten: Fix I2C3 pad setting
	ARM: mmp: Fix failure to remove sram device
	ASoC: amd: vg: fix for pm resume callback sequence
	video: fbdev: sm712fb: Fix crash in smtcfb_write()
	media: i2c: ov5648: Fix lockdep error
	media: Revert "media: em28xx: add missing em28xx_close_extension"
	media: hdpvr: initialize dev->worker at hdpvr_register_videodev
	ASoC: Intel: sof_sdw: fix quirks for 2022 HP Spectre x360 13"
	tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
	mmc: host: Return an error when ->enable_sdio_irq() ops is missing
	media: atomisp: fix bad usage at error handling logic
	ALSA: hda/realtek: Add alc256-samsung-headphone fixup
	KVM: x86: Reinitialize context if host userspace toggles EFER.LME
	KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
	KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
	KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU
	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi()
	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb()
	KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls
	KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
	powerpc/kasan: Fix early region not updated correctly
	powerpc/lib/sstep: Fix 'sthcx' instruction
	powerpc/lib/sstep: Fix build errors with newer binutils
	powerpc: Add set_memory_{p/np}() and remove set_memory_attr()
	powerpc: Fix build errors with newer binutils
	drm/dp: Fix off-by-one in register cache size
	drm/i915: Treat SAGV block time 0 as SAGV disabled
	drm/i915: Fix PSF GV point mask when SAGV is not possible
	drm/i915: Reject unsupported TMDS rates on ICL+
	scsi: qla2xxx: Refactor asynchronous command initialization
	scsi: qla2xxx: Implement ref count for SRB
	scsi: qla2xxx: Fix stuck session in gpdb
	scsi: qla2xxx: Fix warning message due to adisc being flushed
	scsi: qla2xxx: Fix scheduling while atomic
	scsi: qla2xxx: Fix premature hw access after PCI error
	scsi: qla2xxx: Fix wrong FDMI data for 64G adapter
	scsi: qla2xxx: Fix warning for missing error code
	scsi: qla2xxx: Fix device reconnect in loop topology
	scsi: qla2xxx: edif: Fix clang warning
	scsi: qla2xxx: Fix T10 PI tag escape and IP guard options for 28XX adapters
	scsi: qla2xxx: Add devids and conditionals for 28xx
	scsi: qla2xxx: Check for firmware dump already collected
	scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
	scsi: qla2xxx: Fix disk failure to rediscover
	scsi: qla2xxx: Fix incorrect reporting of task management failure
	scsi: qla2xxx: Fix hang due to session stuck
	scsi: qla2xxx: Fix missed DMA unmap for NVMe ls requests
	scsi: qla2xxx: Fix N2N inconsistent PLOGI
	scsi: qla2xxx: Fix stuck session of PRLI reject
	scsi: qla2xxx: Reduce false trigger to login
	scsi: qla2xxx: Use correct feature type field during RFF_ID processing
	platform: chrome: Split trace include file
	KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq
	KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast()
	KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
	KVM: Prevent module exit until all VMs are freed
	KVM: x86: fix sending PV IPI
	KVM: SVM: fix panic on out-of-bounds guest IRQ
	ubifs: rename_whiteout: Fix double free for whiteout_ui->data
	ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
	ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
	ubifs: Rename whiteout atomically
	ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
	ubifs: Rectify space amount budget for mkdir/tmpfile operations
	ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
	ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
	ubifs: Fix to add refcount once page is set private
	ubifs: rename_whiteout: correct old_dir size computing
	nvme: allow duplicate NSIDs for private namespaces
	nvme: fix the read-only state for zoned namespaces with unsupposed features
	wireguard: queueing: use CFI-safe ptr_ring cleanup function
	wireguard: socket: free skb in send6 when ipv6 is disabled
	wireguard: socket: ignore v6 endpoints when ipv6 is disabled
	XArray: Fix xas_create_range() when multi-order entry present
	can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
	can: mcba_usb: properly check endpoint type
	can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
	XArray: Update the LRU list in xas_split()
	modpost: restore the warning message for missing symbol versions
	rtc: check if __rtc_read_time was successful
	gfs2: gfs2_setattr_size error path fix
	gfs2: Make sure FITRIM minlen is rounded up to fs block size
	net: hns3: fix the concurrency between functions reading debugfs
	net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
	rxrpc: fix some null-ptr-deref bugs in server_key.c
	rxrpc: Fix call timer start racing with call destruction
	mailbox: imx: fix wakeup failure from freeze mode
	crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
	watch_queue: Free the page array when watch_queue is dismantled
	pinctrl: pinconf-generic: Print arguments for bias-pull-*
	watchdog: rti-wdt: Add missing pm_runtime_disable() in probe function
	net: sparx5: uses, depends on BRIDGE or !BRIDGE
	pinctrl: nuvoton: npcm7xx: Rename DS() macro to DSTR()
	pinctrl: nuvoton: npcm7xx: Use %zu printk format for ARRAY_SIZE()
	ASoC: mediatek: mt6358: add missing EXPORT_SYMBOLs
	ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
	ARM: iop32x: offset IRQ numbers by 1
	block: Fix the maximum minor value is blk_alloc_ext_minor()
	io_uring: fix memory leak of uid in files registration
	riscv module: remove (NOLOAD)
	ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
	vhost: handle error while adding split ranges to iotlb
	spi: Fix Tegra QSPI example
	platform/chrome: cros_ec_typec: Check for EC device
	can: isotp: restore accidentally removed MSG_PEEK feature
	proc: bootconfig: Add null pointer check
	drm/connector: Fix typo in documentation
	scsi: qla2xxx: Add qla2x00_async_done() for async routines
	staging: mt7621-dts: fix pinctrl-0 items to be size-1 items on ethernet
	arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
	ASoC: soc-compress: Change the check for codec_dai
	Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
	tracing: Have type enum modifications copy the strings
	net: add skb_set_end_offset() helper
	net: preserve skb_end_offset() in skb_unclone_keeptruesize()
	mm/mmap: return 1 from stack_guard_gap __setup() handler
	ARM: 9187/1: JIVE: fix return value of __setup handler
	mm/memcontrol: return 1 from cgroup.memory __setup() handler
	mm/usercopy: return 1 from hardened_usercopy __setup() handler
	af_unix: Support POLLPRI for OOB.
	bpf: Adjust BPF stack helper functions to accommodate skip > 0
	bpf: Fix comment for helper bpf_current_task_under_cgroup()
	mmc: rtsx: Use pm_runtime_{get,put}() to handle runtime PM
	dt-bindings: mtd: nand-controller: Fix the reg property description
	dt-bindings: mtd: nand-controller: Fix a comment in the examples
	dt-bindings: spi: mxic: The interrupt property is not mandatory
	dt-bindings: memory: mtk-smi: No need mediatek,larb-id for mt8167
	dt-bindings: pinctrl: pinctrl-microchip-sgpio: Fix example
	ubi: fastmap: Return error code if memory allocation fails in add_aeb()
	ASoC: SOF: Intel: Fix build error without SND_SOC_SOF_PCI_DEV
	ASoC: topology: Allow TLV control to be either read or write
	perf vendor events: Update metrics for SkyLake Server
	media: ov6650: Add try support to selection API operations
	media: ov6650: Fix crop rectangle affected by set format
	spi: mediatek: support tick_delay without enhance_timing
	ARM: dts: spear1340: Update serial node properties
	ARM: dts: spear13xx: Update SPI dma properties
	arm64: dts: ls1043a: Update i2c dma properties
	arm64: dts: ls1046a: Update i2c node dma properties
	um: Fix uml_mconsole stop/go
	docs: sysctl/kernel: add missing bit to panic_print
	openvswitch: Fixed nd target mask field in the flow dump.
	torture: Make torture.sh help message match reality
	n64cart: convert bi_disk to bi_bdev->bd_disk fix build
	mmc: rtsx: Let MMC core handle runtime PM
	mmc: rtsx: Fix build errors/warnings for unused variable
	KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
	iommu/dma: Skip extra sync during unmap w/swiotlb
	iommu/dma: Fold _swiotlb helpers into callers
	iommu/dma: Check CONFIG_SWIOTLB more broadly
	swiotlb: Support aligned swiotlb buffers
	iommu/dma: Account for min_align_mask w/swiotlb
	coredump: Snapshot the vmas in do_coredump
	coredump: Remove the WARN_ON in dump_vma_snapshot
	coredump/elf: Pass coredump_params into fill_note_info
	coredump: Use the vma snapshot in fill_files_note
	PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
	Linux 5.15.33

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id62bd8a22d0bfa7c2096539d253ffce804bed017
2022-04-20 08:18:54 +02:00
Miaohe Lin
025a7ccfb7 mm/mlock: fix two bugs in user_shm_lock()
commit e97824ff663ce3509fe040431c713182c2f058b1 upstream.

user_shm_lock forgets to set allowed to 0 when get_ucounts fails. So the
later user_shm_unlock might do the extra dec_rlimit_ucounts. Also in the
RLIM_INFINITY case, user_shm_lock will success regardless of the value of
memlock where memblock == LONG_MAX && !capable(CAP_IPC_LOCK) should fail.
Fix all of these by changing the code to leave lock_limit at ULONG_MAX aka
RLIM_INFINITY, leave "allowed" initialized to 0 and remove the special case
of RLIM_INFINITY as nothing can be greater than ULONG_MAX.

Credit goes to Eric W. Biederman for proposing simplifying the code and
thus catching the later bug.

Fixes: d7c9e99aee ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
v1: https://lkml.kernel.org/r/20220310132417.41189-1-linmiaohe@huawei.com
v2: https://lkml.kernel.org/r/20220314064039.62972-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220322080918.59861-1-linmiaohe@huawei.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:52 +02:00
Suren Baghdasaryan
0fd37220d8 UPSTREAM: mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 5c26f6ac9416b63d093e29c30e79b3297e425472)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4a6b5602ce7151d1a4b88fac489f86d68089bd4d
2022-03-24 18:44:39 -07:00
Colin Cross
301c56064d UPSTREAM: mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use.  At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.).  Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc.  This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages.  It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes.  It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace.  Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace.  It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas.  The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].

Userspace can set the name for a region of memory by calling

   prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)

Setting the name to NULL clears it.  The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.

Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps.  Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.

The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string.  Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name.  The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.

CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature.  It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.

The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names.  In that design, name
pointers could be shared between vmas.  However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.

One big concern is about fork() performance which would need to strdup
anonymous vma names.  Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4].  I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.

This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name.  Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.

[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/

Changes for prctl(2) manual page (in the options section):

PR_SET_VMA
	Sets an attribute specified in arg2 for virtual memory areas
	starting from the address specified in arg3 and spanning the
	size specified	in arg4. arg5 specifies the value of the attribute
	to be set. Note that assigning an attribute to a virtual memory
	area might prevent it from being merged with adjacent virtual
	memory areas due to the difference in that attribute's value.

	Currently, arg2 must be one of:

	PR_SET_VMA_ANON_NAME
		Set a name for anonymous virtual memory areas. arg5 should
		be a pointer to a null-terminated string containing the
		name. The name length including null byte cannot exceed
		80 bytes. If arg5 is NULL, the name of the appropriate
		anonymous virtual memory areas will be reset. The name
		can contain only printable ascii characters (including
                space), except '[',']','\','$' and '`'.

                This feature is available only if the kernel is built with
                the CONFIG_ANON_VMA_NAME option enabled.

[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
  Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
 added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
 work here was done by Colin Cross, therefore, with his permission, keeping
 him as the author]

Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9a10064f5625d5572c3626c1516e0bebc6c9fe9b)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I53d56d551a7d62f75341304751814294b447c04e
2022-01-18 15:30:27 -08:00
Suren Baghdasaryan
f355f9635d Revert "ANDROID: mm: add a field to store names for private anonymous memory"
This reverts commit 60500a4228.
Replacing out-of-tree implementation with the upstream one.

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic34c8e16d51ccf9f00cb59d2de341e911bcb2828
2022-01-18 14:54:47 -08:00
Lee Jones
a8b636db7d Merge tag 'v5.14-rc1' into android-mainline
Linux 5.14-rc1

Change-Id: I9765cd4581f6683a6fca3580667017fff9cbaa2b
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-07-13 10:02:04 +01:00
Lee Jones
293f275f4d Merge commit df8ba5f160 ("Merge tag 'kgdb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux") into android-mainline
A large step en route to v5.14-rc1

Change-Id: I52bb71dc737044a593d1a9dfd7fe02b31e273ff9
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-07-12 11:00:18 +01:00
Lee Jones
8e658623d4 Merge commit c288d9cd71 ("Merge tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block") into android-mainline
Another small step en route to v5.14-rc1

Change-Id: I24899ab78da7d367574ed69ceaa82ab0837d9556
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-07-12 10:02:27 +01:00
Mike Rapoport
1507f51255 mm: introduce memfd_secret system call to create "secret" memory areas
Introduce "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and not mapped not
only to other processes but in the kernel page tables as well.

The secretmem feature is off by default and the user must explicitly
enable it at the boot time.

Once secretmem is enabled, the user will be able to create a file
descriptor using the memfd_secret() system call.  The memory areas created
by mmap() calls from this file descriptor will be unmapped from the kernel
direct map and they will be only mapped in the page table of the processes
that have access to the file descriptor.

Secretmem is designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
  attack prevention systems) against ROP attacks.  Seceretmem makes
  "simple" ROP insufficient to perform exfiltration, which increases the
  required complexity of the attack.  Along with other protections like
  the kernel stack size limit and address space layout randomization which
  make finding gadgets is really hard, absence of any in-kernel primitive
  for accessing secret memory means the one gadget ROP attack can't work.
  Since the only way to access secret memory is to reconstruct the missing
  mapping entry, the attacker has to recover the physical page and insert
  a PTE pointing to it in the kernel and then retrieve the contents.  That
  takes at least three gadgets which is a level of difficulty beyond most
  standard attacks.

* Prevent cross-process secret userspace memory exposures.  Once the
  secret memory is allocated, the user can't accidentally pass it into the
  kernel to be transmitted somewhere.  The secreremem pages cannot be
  accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws.  In order to access secretmem,
  a kernel-side attack would need to either walk the page tables and
  create new ones, or spawn a new privileged uiserspace process to perform
  secrets exfiltration using ptrace.

The file descriptor based memory has several advantages over the
"traditional" mm interfaces, such as mlock(), mprotect(), madvise().  File
descriptor approach allows explicit and controlled sharing of the memory
areas, it allows to seal the operations.  Besides, file descriptor based
memory paves the way for VMMs to remove the secret memory range from the
userspace hipervisor process, for instance QEMU.  Andy Lutomirski says:

  "Getting fd-backed memory into a guest will take some possibly major
  work in the kernel, but getting vma-backed memory into a guest without
  mapping it in the host user address space seems much, much worse."

memfd_secret() is made a dedicated system call rather than an extension to
memfd_create() because it's purpose is to allow the user to create more
secure memory mappings rather than to simply allow file based access to
the memory.  Nowadays a new system call cost is negligible while it is way
simpler for userspace to deal with a clear-cut system calls than with a
multiplexer or an overloaded syscall.  Moreover, the initial
implementation of memfd_secret() is completely distinct from
memfd_create() so there is no much sense in overloading memfd_create() to
begin with.  If there will be a need for code sharing between these
implementation it can be easily achieved without a need to adjust user
visible APIs.

The secret memory remains accessible in the process context using uaccess
primitives, but it is not exposed to the kernel otherwise; secret memory
areas are removed from the direct map and functions in the
follow_page()/get_user_page() family will refuse to return a page that
belongs to the secret memory area.

Once there will be a use case that will require exposing secretmem to the
kernel it will be an opt-in request in the system call flags so that user
would have to decide what data can be exposed to the kernel.

Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which
affects the system performance.  However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e057
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice".  Hence, it is sufficient to
have secretmem disabled by default with the ability of a system
administrator to enable it at boot time.

Pages in the secretmem regions are unevictable and unmovable to avoid
accidental exposure of the sensitive data via swap or during page
migration.

Since the secretmem mappings are locked in memory they cannot exceed
RLIMIT_MEMLOCK.  Since these mappings are already locked independently
from mlock(), an attempt to mlock()/munlock() secretmem range would fail
and mlockall()/munlockall() will ignore secretmem mappings.

However, unlike mlock()ed memory, secretmem currently behaves more like
long-term GUP: secretmem mappings are unmovable mappings directly consumed
by user space.  With default limits, there is no excessive use of
secretmem and it poses no real problem in combination with
ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.

A page that was a part of the secret memory area is cleared when it is
freed to ensure the data is not exposed to the next user of that page.

The following example demonstrates creation of a secret mapping (error
handling is omitted):

	fd = memfd_secret(0);
	ftruncate(fd, MAP_SIZE);
	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

[akpm@linux-foundation.org: suppress Kconfig whine]

Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:21 -07:00
Linus Torvalds
71bd934101 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "190 patches.

  Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
  vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
  migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
  zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
  core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
  signals, exec, kcov, selftests, compress/decompress, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  ipc/util.c: use binary search for max_idx
  ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
  ipc: use kmalloc for msg_queue and shmid_kernel
  ipc sem: use kvmalloc for sem_undo allocation
  lib/decompressors: remove set but not used variabled 'level'
  selftests/vm/pkeys: exercise x86 XSAVE init state
  selftests/vm/pkeys: refill shadow register after implicit kernel write
  selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
  kcov: add __no_sanitize_coverage to fix noinstr for all architectures
  exec: remove checks in __register_bimfmt()
  x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
  hfsplus: report create_date to kstat.btime
  hfsplus: remove unnecessary oom message
  nilfs2: remove redundant continue statement in a while-loop
  kprobes: remove duplicated strong free_insn_page in x86 and s390
  init: print out unknown kernel parameters
  checkpatch: do not complain about positive return values starting with EPOLL
  checkpatch: improve the indented label test
  checkpatch: scripts/spdxcheck.py now requires python3
  ...
2021-07-02 12:08:10 -07:00
Alistair Popple
cd62734ca6 mm/rmap: split try_to_munlock from try_to_unmap
The behaviour of try_to_unmap_one() is difficult to follow because it
performs different operations based on a fairly large set of flags used in
different combinations.

TTU_MUNLOCK is one such flag.  However it is exclusively used by
try_to_munlock() which specifies no other flags.  Therefore rather than
overload try_to_unmap_one() with unrelated behaviour split this out into
it's own function and remove the flag.

Link: https://lkml.kernel.org/r/20210616105937.23201-4-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:03 -07:00
Linus Torvalds
c54b245d01 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace rlimit handling update from Eric Biederman:
 "This is the work mainly by Alexey Gladkov to limit rlimits to the
  rlimits of the user that created a user namespace, and to allow users
  to have stricter limits on the resources created within a user
  namespace."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  cred: add missing return error code when set_cred_ucounts() failed
  ucounts: Silence warning in dec_rlimit_ucounts
  ucounts: Set ucount_max to the largest positive value the type can hold
  kselftests: Add test to check for rlimit changes in different user namespaces
  Reimplement RLIMIT_MEMLOCK on top of ucounts
  Reimplement RLIMIT_SIGPENDING on top of ucounts
  Reimplement RLIMIT_MSGQUEUE on top of ucounts
  Reimplement RLIMIT_NPROC on top of ucounts
  Use atomic_t for ucounts reference counting
  Add a reference to ucounts for each cred
  Increase size of ucounts to atomic_long_t
2021-06-28 20:39:26 -07:00
Lee Jones
85adc860fd Merge 6efb943b86 Linux 5.13-rc1 into android-mainline
One giant leap, all the way up to 5.13-rc1

Also take the opportunity to re-align (a.k.a. fix a couple of previous
merge conflict fix-up issues) which occurred during this merge-window.

Fixes: 4797acfb9c ("Merge 16b3d0cf5b Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into android-mainline")
Fixes: 92f282f338 ("Merge 8ca5297e7e Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild into android-mainline")

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ie9389f595776e8f66bba6eaf0fa7a3587c6a5749
2021-05-15 09:09:01 +01:00
Zhiyuan Dai
68d68ff6eb mm/mempool: minor coding style tweaks
Various coding style tweaks to various files under mm/

[daizhiyuan@phytium.com.cn: mm/swapfile: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614223624-16055-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/sparse: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614227288-19363-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/vmscan: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614227649-19853-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/compaction: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614228218-20770-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/oom_kill: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614228360-21168-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/shmem: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614228504-21491-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/page_alloc: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614228613-21754-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/filemap: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1614228936-22337-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/mlock: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1613956588-2453-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/frontswap: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1613962668-15045-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/vmalloc: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1613963379-15988-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/memory_hotplug: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1613971784-24878-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/mempolicy: minor coding style tweaks]
  Link: https://lkml.kernel.org/r/1613972228-25501-1-git-send-email-daizhiyuan@phytium.com.cn

Link: https://lkml.kernel.org/r/1614222374-13805-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 11:27:27 -07:00
Alexey Gladkov
d7c9e99aee Reimplement RLIMIT_MEMLOCK on top of ucounts
The rlimit counter is tied to uid in the user_namespace. This allows
rlimit values to be specified in userns even if they are already
globally exceeded by the user. However, the value of the previous
user_namespaces cannot be exceeded.

Changelog

v11:
* Fix issue found by lkp robot.

v8:
* Fix issues found by lkp-tests project.

v7:
* Keep only ucounts for RLIMIT_MEMLOCK checks instead of struct cred.

v6:
* Fix bug in hugetlb_file_setup() detected by trinity.

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/970d50c70c71bfd4496e0e8d2a0a32feebebb350.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-04-30 14:14:02 -05:00
Greg Kroah-Hartman
14f1854cdb Merge fecfd01539 ("Merge tag 'leds-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds") into android-mainline
Steps on the way to 5.12-rc1

Resolves merge conflicts in:
	drivers/dma-buf/dma-heap.c
	drivers/dma-buf/heaps/cma_heap.c
	drivers/dma-buf/heaps/system_heap.c
	include/linux/dma-heap.h

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibb32dbdba5183c9e19f5d1e94016cc1ae9616173
2021-03-07 08:45:40 +01:00
Greg Kroah-Hartman
3db2a71497 Merge 5b47b10e8f ("Merge tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci") into android-mainline
Steps on the way to 5.12-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8330e967324622f71630bd9ee9e19086688d068e
2021-03-07 08:41:46 +01:00
Miaohe Lin
48b03eea32 mm/mlock: stop counting mlocked pages when none vma is found
There will be no vma satisfies addr < vm_end when find_vma() returns NULL.
Thus it's meaningless to traverse the vma list below because we can't
find any vma to count mlocked pages.  Stop counting mlocked pages in this
case to save some vma list traversal cycles.

Link: https://lkml.kernel.org/r/20210204110705.17586-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:01 -08:00
Yu Zhao
46ae6b2cc2 mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()
The parameter is redundant in the sense that it can be potentially
extracted from the "struct page" parameter by page_lru(). We need to
make sure that existing PageActive() or PageUnevictable() remains
until the function returns. A few places don't conform, and simple
reordering fixes them.

This patch may have left page_off_lru() seemingly odd, and we'll take
care of it in the next patch.

Link: https://lore.kernel.org/linux-mm/20201207220949.830352-6-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-6-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:33 -08:00
Greg Kroah-Hartman
44ede8412e Merge 5b200f5789 ("Merge branch 'akpm' (patches from Andrew)") into
android-mainline

Steps on the way to 5.11-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie0812f92e7f8ed8bd3bcf7525008b57ef78e4e01
2020-12-18 14:19:43 +01:00
Alexander Duyck
2a5e4e340b mm/lru: introduce relock_page_lruvec()
Add relock_page_lruvec() to replace repeated same code, no functional
change.

When testing for relock we can avoid the need for RCU locking if we simply
compare the page pgdat and memcg pointers versus those that the lruvec is
holding.  By doing this we can avoid the extra pointer walks and accesses
of the memory cgroup.

In addition we can avoid the checks entirely if lruvec is currently NULL.

[alex.shi@linux.alibaba.com: use page_memcg()]
  Link: https://lkml.kernel.org/r/66d8e79d-7ec6-bfbc-1c82-bf32db3ae5b7@linux.alibaba.com

Link: https://lkml.kernel.org/r/1604566549-62481-19-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 14:48:04 -08:00
Alex Shi
6168d0da2b mm/lru: replace pgdat lru_lock with lruvec lock
This patch moves per node lru_lock into lruvec, thus bring a lru_lock for
each of memcg per node.  So on a large machine, each of memcg don't have
to suffer from per node pgdat->lru_lock competition.  They could go fast
with their self lru_lock.

After move memcg charge before lru inserting, page isolation could
serialize page's memcg, then per memcg lruvec lock is stable and could
replace per node lru lock.

In isolate_migratepages_block(), compact_unlock_should_abort and
lock_page_lruvec_irqsave are open coded to work with compact_control.
Also add a debug func in locking which may give some clues if there are
sth out of hands.

Daniel Jordan's testing show 62% improvement on modified readtwice case on
his 2P * 10 core * 2 HT broadwell box.
https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/

Hugh Dickins helped on the patch polish, thanks!

[alex.shi@linux.alibaba.com: fix comment typo]
  Link: https://lkml.kernel.org/r/5b085715-292a-4b43-50b3-d73dc90d1de5@linux.alibaba.com
[alex.shi@linux.alibaba.com: use page_memcg()]
  Link: https://lkml.kernel.org/r/5a4c2b72-7ee8-2478-fc0e-85eb83aafec4@linux.alibaba.com

Link: https://lkml.kernel.org/r/1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rong Chen <rong.a.chen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 14:48:04 -08:00
Alex Shi
d25b5bd8a8 mm/lru: introduce TestClearPageLRU()
Currently lru_lock still guards both lru list and page's lru bit, that's
ok.  but if we want to use specific lruvec lock on the page, we need to
pin down the page's lruvec/memcg during locking.  Just taking lruvec lock
first may be undermined by the page's memcg charge/migration.  To fix this
problem, we will clear the lru bit out of locking and use it as pin down
action to block the page isolation in memcg changing.

So now a standard steps of page isolation is following:
	1, get_page(); 	       #pin the page avoid to be free
	2, TestClearPageLRU(); #block other isolation like memcg change
	3, spin_lock on lru_lock; #serialize lru list access
	4, delete page from lru list;

This patch start with the first part: TestClearPageLRU, which combines
PageLRU check and ClearPageLRU into a macro func TestClearPageLRU.  This
function will be used as page isolation precondition to prevent other
isolations some where else.  Then there are may !PageLRU page on lru list,
need to remove BUG() checking accordingly.

There 2 rules for lru bit now:
1, the lru bit still indicate if a page on lru list, just in some
   temporary moment(isolating), the page may have no lru bit when
   it's on lru list.  but the page still must be on lru list when the
   lru bit set.
2, have to remove lru bit before delete it from lru list.

As Andrew Morton mentioned this change would dirty cacheline for a page
which isn't on the LRU.  But the loss would be acceptable in Rong Chen
<rong.a.chen@intel.com> report:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 14:48:04 -08:00
Alex Shi
13805a88a9 mm/mlock: remove __munlock_isolate_lru_page()
__munlock_isolate_lru_page() only has one caller, remove it to clean up
and simplify code.

Link: https://lkml.kernel.org/r/1604566549-62481-14-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 14:48:04 -08:00
Alex Shi
3db19aa39b mm/mlock: remove lru_lock on TestClearPageMlocked
In the func munlock_vma_page, comments mentained lru_lock needed for
serialization with split_huge_pages.  But the page must be PageLocked as
well as pages in split_huge_page series funcs.  Thus the PageLocked is
enough to serialize both funcs.

Further more, Hugh Dickins pointed: before splitting in
split_huge_page_to_list, the page was unmap_page() to remove pmd/ptes
which protect the page from munlock.  Thus, no needs to guard
__split_huge_page_tail for mlock clean, just keep the lru_lock there for
isolation purpose.

LKP found a preempt issue on __mod_zone_page_state which need change to
mod_zone_page_state.  Thanks!

Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 14:48:04 -08:00
Greg Kroah-Hartman
ee385f5df9 Merge 5.9-rc6 into android-mainline
Linux 5.9-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3bccdbb773bfc2c604742e6ff5983bf0b61ba0b5
2020-09-21 12:13:45 +02:00
Hugh Dickins
0964730bf4 mlock: fix unevictable_pgs event counts on THP
5.8 commit 5d91f31faf ("mm: swap: fix vmstats for huge page") has
established that vm_events should count every subpage of a THP, including
unevictable_pgs_culled and unevictable_pgs_rescued; but
lru_cache_add_inactive_or_unevictable() was not doing so for
unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
unevictable_pgs mlocked, munlocked, cleared and stranded.

Fix them; but THPs don't go the pagevec way in mlock.c, so no fixes needed
on that path.

Fixes: 5d91f31faf ("mm: swap: fix vmstats for huge page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301408230.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:38 -07:00
Greg Kroah-Hartman
df7e491a37 Merge 4b6c093e21 ("Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block") into android-mainline
Steps on the way to 5.9-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I904678b5c31139b25fb49ee67cbacb4721a3c7bc
2020-08-17 09:19:35 +02:00
Matthew Wilcox (Oracle)
6c357848b4 mm: replace hpage_nr_pages with thp_nr_pages
The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[akpm@linux-foundation.org: fix mm/migrate.c]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:56 -07:00
Greg Kroah-Hartman
a253db8915 Merge ad57a1022f ("Merge tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat") into android-mainline
Steps on the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4
2020-06-24 17:54:12 +02:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Greg Kroah-Hartman
94139142d9 Merge 5.4-rc1-prelrease into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e
2019-10-02 17:58:47 +02:00
Andrey Konovalov
057d338910 mm: untag user pointers passed to memory syscalls
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

This patch allows tagged pointers to be passed to the following memory
syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
mremap, msync, munlock, move_pages.

The mmap and mremap syscalls do not currently accept tagged addresses.
Architectures may interpret the tag as a background colour for the
corresponding vma.

Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:41 -07:00
Greg Kroah-Hartman
879ebb9016 Merge 5.2-rc5 into android-mainline
Linux 5.2-rc5

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-17 08:38:06 +02:00
swkhack
0874bb49bb mm/mlock.c: change count_mm_mlocked_page_nr return type
On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be
negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result
will be wrong.  So change the local variable and return value to
unsigned long to fix the problem.

Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com
Fixes: 0cf2f6f6dc ("mm: mlock: check against vma for actual mlock() size")
Signed-off-by: swkhack <swkhack@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Potyra, Stefan
dedca63504 mm/mlock.c: mlockall error for flag MCL_ONFAULT
If mlockall() is called with only MCL_ONFAULT as flag, it removes any
previously applied lockings and does nothing else.

This behavior is counter-intuitive and doesn't match the Linux man page.

  For mlockall():

  EINVAL Unknown flags were specified or MCL_ONFAULT was specified
  without either MCL_FUTURE or MCL_CURRENT.

Consequently, return the error EINVAL, if only MCL_ONFAULT is passed.
That way, applications will at least detect that they are calling
mlockall() incorrectly.

Link: http://lkml.kernel.org/r/20190527075333.GA6339@er01809n.ebgroup.elektrobit.com
Fixes: b0f205c2a3 ("mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage")
Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Todd Kjos
0f2cb7cf80 Merge branch 'linux-mainline' into android-mainline-tmp
Change-Id: I4380c68c3474026a42ffa9f95c525f9a563ba7a3
2019-05-03 12:22:22 -07:00
Colin Cross
60500a4228 ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory.  When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.

This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma.  vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas.  If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".

The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen.  This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging.  The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.

Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list.  Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.

Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
        5b56d49fc3 ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2019-05-03 10:39:58 -07:00
Andrey Ryabinin
f4b7e272b5 mm: remove zone_lru_lock() function, access ->lru_lock directly
We have common pattern to access lru_lock from a page pointer:
	zone_lru_lock(page_zone(page))

Which is silly, because it unfolds to this:
	&NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]->zone_pgdat->lru_lock
while we can simply do
	&NODE_DATA(page_to_nid(page))->lru_lock

Remove zone_lru_lock() function, since it's only complicate things.  Use
'page_pgdat(page)->lru_lock' pattern instead.

[aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
  Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Dave Jiang
e1fb4a0864 dax: remove VM_MIXEDMAP for fsdax and device dax
This patch is reworked from an earlier patch that Dan has posted:
https://patchwork.kernel.org/patch/10131727/

VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that
the memory page it is dealing with is not typical memory from the linear
map.  The get_user_pages_fast() path, since it does not resolve the vma,
is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we
use that as a VM_MIXEDMAP replacement in some locations.  In the cases
where there is no pte to consult we fallback to using vma_is_dax() to
detect the VM_MIXEDMAP special case.

Now that we have explicit driver pfn_t-flag opt-in/opt-out for
get_user_pages() support for DAX we can stop setting VM_MIXEDMAP.  This
also means we no longer need to worry about safely manipulating vm_flags
in a future where we support dynamically changing the dax mode of a
file.

DAX should also now be supported with madvise_behavior(), vma_merge(),
and copy_page_range().

This patch has been tested against ndctl unit test.  It has also been
tested against xfstests commit: 625515d using fake pmem created by
memmap and no additional issues have been observed.

Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:27 -07:00
Shakeel Butt
9c4e6b1a70 mm, mlock, vmscan: no more skipping pagevecs
When a thread mlocks an address space backed either by file pages which
are currently not present in memory or swapped out anon pages (not in
swapcache), a new page is allocated and added to the local pagevec
(lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
On I/O completion, the thread can wake on a different CPU, the mlock
syscall will then sets the PageMlocked() bit of the page but will not be
able to put that page in unevictable LRU as the page is on the pagevec
of a different CPU.  Even on drain, that page will go to evictable LRU
because the PageMlocked() bit is not checked on pagevec drain.

The page will eventually go to right LRU on reclaim but the LRU stats
will remain skewed for a long time.

This patch puts all the pages, even unevictable, to the pagevecs and on
the drain, the pages will be added on their LRUs correctly by checking
their evictability.  This resolves the mlocked pages on pagevec of other
CPUs issue because when those pagevecs will be drained, the mlocked file
pages will go to unevictable LRU.  Also this makes the race with munlock
easier to resolve because the pagevec drains happen in LRU lock.

However there is still one place which makes a page evictable and does
PageLRU check on that page without LRU lock and needs special attention.
TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().

	#0: __pagevec_lru_add_fn	#1: clear_page_mlock

	SetPageLRU()			if (!TestClearPageMlocked())
					  return
	smp_mb() // <--required
					// inside does PageLRU
	if (!PageMlocked())		if (isolate_lru_page())
	  move to evictable LRU		  putback_lru_page()
	else
	  move to unevictable LRU

In '#1', TestClearPageMlocked() provides full memory barrier semantics
and thus the PageLRU check (inside isolate_lru_page) can not be
reordered before it.

In '#0', without explicit memory barrier, the PageMlocked() check can be
reordered before SetPageLRU().  If that happens, '#0' can put a page in
unevictable LRU and '#1' might have just cleared the Mlocked bit of that
page but fails to isolate as PageLRU fails as '#0' still hasn't set
PageLRU bit of that page.  That page will be stranded on the unevictable
LRU.

There is one (good) side effect though.  Without this patch, the pages
allocated for System V shared memory segment are added to evictable LRUs
even after shmctl(SHM_LOCK) on that segment.  This patch will correctly
put such pages to unevictable LRU.

Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21 15:35:42 -08:00
Mike Rapoport
b7701a5f2e mm: docs: fixup punctuation
so that kernel-doc will properly recognize the parameter and function
descriptions.

Link: http://lkml.kernel.org/r/1516700871-22279-2-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:48 -08:00
Paul E. McKenney
50d4fb7812 mm: Eliminate cond_resched_rcu_qs() in favor of cond_resched()
Now that cond_resched() also provides RCU quiescent states when
needed, it can be used in place of cond_resched_rcu_qs().  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
2017-11-28 16:00:28 -08:00
Shakeel Butt
72b03fcd5d mm: mlock: remove lru_add_drain_all()
lru_add_drain_all() is not required by mlock() and it will drain
everything that has been cached at the time mlock is called.  And that
is not really related to the memory which will be faulted in (and
cached) and mlocked by the syscall itself.

If anything lru_add_drain_all() should be called _after_ pages have been
mlocked and faulted in but even that is not strictly needed because
those pages would get to the appropriate LRUs lazily during the reclaim
path.  Moreover follow_page_pte (gup) will drain the local pcp LRU
cache.

On larger machines the overhead of lru_add_drain_all() in mlock() can be
significant when mlocking data already in memory.  We have observed high
latency in mlock() due to lru_add_drain_all() when the users were
mlocking in memory tmpfs files.

[mhocko@suse.com: changelog fix]
Link: http://lkml.kernel.org/r/20171019222507.2894-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Mel Gorman
8667982014 mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Joonsoo Kim
9472f23c9e mm/mlock.c: use page_zone() instead of page_zone_id()
page_zone_id() is a specialized function to compare the zone for the pages
that are within the section range.  If the section of the pages are
different, page_zone_id() can be different even if their zone is the same.
This wrong usage doesn't cause any actual problem since
__munlock_pagevec_fill() would be called again with failed index.
However, it's better to use more appropriate function here.

Link: http://lkml.kernel.org/r/1503559211-10259-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:47 -07:00