Commit Graph

17853 Commits

Author SHA1 Message Date
Peter Yoon
670c6da635 Merge branch 'android14-5.15' into arpi-5.15.92 2023-02-28 17:56:07 +09:00
Andrey Konovalov
f34aed1750 ANDROID: kasan: fix slab page check in complete_report_info
The backport commit 77a0deb5d5 ("BACKPORT: kasan: fill in cache and
object in complete_report_info") did not resolve the conflict due to the
folio patchset missing in 5.15 correctly: complete_report_info needs to
check PageSlab to make sure that the page is a slab page.

Add a PageSlab check to complete_report_info.

Bug: 254721825
Reported-by: Peter Collingbourne <pcc@google.com>
Fixes: 77a0deb5d5 ("BACKPORT: kasan: fill in cache and object in complete_report_info")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I307ddbd9315134f825b37a0c7254a033453a46ef
2023-02-10 00:01:43 +01:00
Dom Cobley
e9e421392e Merge remote-tracking branch 'stable/linux-5.15.y' into rpi-5.15.y 2023-02-01 17:30:22 +00:00
Greg Kroah-Hartman
e3d8fe0993 Merge 5.15.91 into android14-5.15
Changes in 5.15.91
	memory: tegra: Remove clients SID override programming
	memory: atmel-sdramc: Fix missing clk_disable_unprepare in atmel_ramc_probe()
	memory: mvebu-devbus: Fix missing clk_disable_unprepare in mvebu_devbus_probe()
	dmaengine: ti: k3-udma: Do conditional decrement of UDMA_CHAN_RT_PEER_BCNT_REG
	arm64: dts: imx8mp-phycore-som: Remove invalid PMIC property
	ARM: dts: imx6ul-pico-dwarf: Use 'clock-frequency'
	ARM: dts: imx7d-pico: Use 'clock-frequency'
	ARM: dts: imx6qdl-gw560x: Remove incorrect 'uart-has-rtscts'
	arm64: dts: imx8mm-beacon: Fix ecspi2 pinmux
	ARM: imx: add missing of_node_put()
	HID: intel_ish-hid: Add check for ishtp_dma_tx_map
	arm64: dts: imx8mm-venice-gw7901: fix USB2 controller OC polarity
	soc: imx8m: Fix incorrect check for of_clk_get_by_name()
	reset: uniphier-glue: Use reset_control_bulk API
	reset: uniphier-glue: Fix possible null-ptr-deref
	EDAC/highbank: Fix memory leak in highbank_mc_probe()
	firmware: arm_scmi: Harden shared memory access in fetch_response
	firmware: arm_scmi: Harden shared memory access in fetch_notification
	tomoyo: fix broken dependency on *.conf.default
	RDMA/core: Fix ib block iterator counter overflow
	IB/hfi1: Reject a zero-length user expected buffer
	IB/hfi1: Reserve user expected TIDs
	IB/hfi1: Fix expected receive setup error exit issues
	IB/hfi1: Immediately remove invalid memory from hardware
	IB/hfi1: Remove user expected buffer invalidate race
	affs: initialize fsdata in affs_truncate()
	PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe()
	arm64: dts: qcom: msm8992: Don't use sfpb mutex
	arm64: dts: qcom: msm8992-libra: Add CPU regulators
	arm64: dts: qcom: msm8992-libra: Fix the memory map
	phy: ti: fix Kconfig warning and operator precedence
	NFSD: fix use-after-free in nfsd4_ssc_setup_dul()
	ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60
	amd-xgbe: TX Flow Ctrl Registers are h/w ver dependent
	amd-xgbe: Delay AN timeout during KR training
	bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation
	phy: rockchip-inno-usb2: Fix missing clk_disable_unprepare() in rockchip_usb2phy_power_on()
	net: nfc: Fix use-after-free in local_cleanup()
	net: wan: Add checks for NULL for utdm in undo_uhdlc_init and unmap_si_regs
	net: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
	sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htb
	gpio: use raw spinlock for gpio chip shadowed data
	gpio: mxc: Protect GPIO irqchip RMW with bgpio spinlock
	gpio: mxc: Always set GPIOs used as interrupt source to INPUT mode
	wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid
	pinctrl/rockchip: Use temporary variable for struct device
	pinctrl/rockchip: add error handling for pull/drive register getters
	pinctrl: rockchip: fix reading pull type on rk3568
	net: stmmac: Fix queue statistics reading
	net/sched: sch_taprio: fix possible use-after-free
	l2tp: Serialize access to sk_user_data with sk_callback_lock
	l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
	l2tp: convert l2tp_tunnel_list to idr
	l2tp: close all race conditions in l2tp_tunnel_register()
	octeontx2-pf: Avoid use of GFP_KERNEL in atomic context
	net: usb: sr9700: Handle negative len
	net: mdio: validate parameter addr in mdiobus_get_phy()
	HID: check empty report_list in hid_validate_values()
	HID: check empty report_list in bigben_probe()
	net: stmmac: fix invalid call to mdiobus_get_phy()
	pinctrl: rockchip: fix mux route data for rk3568
	HID: revert CHERRY_MOUSE_000C quirk
	usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait
	usb: gadget: f_fs: Ensure ep0req is dequeued before free_request
	Bluetooth: Fix possible deadlock in rfcomm_sk_state_change
	net: ipa: disable ipa interrupt during suspend
	net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT
	net: mlx5: eliminate anonymous module_init & module_exit
	drm/panfrost: fix GENERIC_ATOMIC64 dependency
	dmaengine: Fix double increment of client_count in dma_chan_get()
	net: macb: fix PTP TX timestamp failure due to packet padding
	virtio-net: correctly enable callback during start_xmit
	l2tp: prevent lockdep issue in l2tp_tunnel_register()
	HID: betop: check shape of output reports
	cifs: fix potential deadlock in cache_refresh_path()
	dmaengine: xilinx_dma: call of_node_put() when breaking out of for_each_child_of_node()
	phy: phy-can-transceiver: Skip warning if no "max-bitrate"
	drm/amd/display: fix issues with driver unload
	nvme-pci: fix timeout request state check
	tcp: avoid the lookup process failing to get sk in ehash table
	octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
	ptdma: pt_core_execute_cmd() should use spinlock
	device property: fix of node refcount leak in fwnode_graph_get_next_endpoint()
	w1: fix deadloop in __w1_remove_master_device()
	w1: fix WARNING after calling w1_process()
	driver core: Fix test_async_probe_init saves device in wrong array
	selftests/net: toeplitz: fix race on tpacket_v3 block close
	net: dsa: microchip: ksz9477: port map correction in ALU table entry register
	thermal/core: Remove duplicate information when an error occurs
	thermal/core: Rename 'trips' to 'num_trips'
	thermal: Validate new state in cur_state_store()
	thermal/core: fix error code in __thermal_cooling_device_register()
	thermal: core: call put_device() only after device_register() fails
	net: stmmac: enable all safety features by default
	tcp: fix rate_app_limited to default to 1
	scsi: iscsi: Fix multiple iSCSI session unbind events sent to userspace
	cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist
	kcsan: test: don't put the expect array on the stack
	cpufreq: Add SM6375 to cpufreq-dt-platdev blocklist
	ASoC: fsl_micfil: Correct the number of steps on SX controls
	net: usb: cdc_ether: add support for Thales Cinterion PLS62-W modem
	drm: Add orientation quirk for Lenovo ideapad D330-10IGL
	s390/debug: add _ASM_S390_ prefix to header guard
	s390: expicitly align _edata and _end symbols on page boundary
	perf/x86/msr: Add Emerald Rapids
	perf/x86/intel/uncore: Add Emerald Rapids
	cpufreq: armada-37xx: stop using 0 as NULL pointer
	ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC
	ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets
	spi: spidev: remove debug messages that access spidev->spi without locking
	KVM: s390: interrupt: use READ_ONCE() before cmpxchg()
	scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
	r8152: add vendor/device ID pair for Microsoft Devkit
	platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD
	platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK
	lockref: stop doing cpu_relax in the cmpxchg loop
	firmware: coreboot: Check size of table entry and use flex-array
	drm/i915: Allow switching away via vga-switcheroo if uninitialized
	Revert "selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID"
	drm/i915: Remove unused variable
	x86: ACPI: cstate: Optimize C3 entry on AMD CPUs
	fs: reiserfs: remove useless new_opts in reiserfs_remount
	sysctl: add a new register_sysctl_init() interface
	kernel/panic: move panic sysctls to its own file
	panic: unset panic_on_warn inside panic()
	ubsan: no need to unset panic_on_warn in ubsan_epilogue()
	kasan: no need to unset panic_on_warn in end_report()
	exit: Add and use make_task_dead.
	objtool: Add a missing comma to avoid string concatenation
	hexagon: Fix function name in die()
	h8300: Fix build errors from do_exit() to make_task_dead() transition
	csky: Fix function name in csky_alignment() and die()
	ia64: make IA64_MCA_RECOVERY bool instead of tristate
	panic: Separate sysctl logic from CONFIG_SMP
	exit: Put an upper limit on how often we can oops
	exit: Expose "oops_count" to sysfs
	exit: Allow oops_limit to be disabled
	panic: Consolidate open-coded panic_on_warn checks
	panic: Introduce warn_limit
	panic: Expose "warn_count" to sysfs
	docs: Fix path paste-o for /sys/kernel/warn_count
	exit: Use READ_ONCE() for all oops/warn limit reads
	Bluetooth: hci_sync: cancel cmd_timer if hci_open failed
	drm/amdgpu: complete gfxoff allow signal during suspend without delay
	scsi: hpsa: Fix allocation size for scsi_host_alloc()
	KVM: SVM: fix tsc scaling cache logic
	module: Don't wait for GOING modules
	tracing: Make sure trace_printk() can output as soon as it can be used
	trace_events_hist: add check for return value of 'create_hist_field'
	ftrace/scripts: Update the instructions for ftrace-bisect.sh
	cifs: Fix oops due to uncleared server->smbd_conn in reconnect
	i2c: mv64xxx: Remove shutdown method from driver
	i2c: mv64xxx: Add atomic_xfer method to driver
	ksmbd: add smbd max io size parameter
	ksmbd: add max connections parameter
	ksmbd: do not sign response to session request for guest login
	ksmbd: downgrade ndr version error message to debug
	ksmbd: limit pdu length size according to connection status
	ovl: fail on invalid uid/gid mapping at copy up
	KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
	KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
	thermal: intel: int340x: Protect trip temperature from concurrent updates
	ipv6: fix reachability confirmation with proxy_ndp
	ARM: 9280/1: mm: fix warning on phys_addr_t to void pointer assignment
	EDAC/device: Respect any driver-supplied workqueue polling value
	EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info
	net: mana: Fix IRQ name - add PCI and queue number
	scsi: ufs: core: Fix devfreq deadlocks
	i2c: designware: use casting of u64 in clock multiplication to avoid overflow
	netlink: prevent potential spectre v1 gadgets
	net: fix UaF in netns ops registration error path
	drm/i915/selftest: fix intel_selftest_modify_policy argument types
	netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
	netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
	netlink: annotate data races around nlk->portid
	netlink: annotate data races around dst_portid and dst_group
	netlink: annotate data races around sk_state
	ipv4: prevent potential spectre v1 gadget in ip_metrics_convert()
	ipv4: prevent potential spectre v1 gadget in fib_metrics_match()
	netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
	netrom: Fix use-after-free of a listening socket.
	net/sched: sch_taprio: do not schedule in taprio_reset()
	sctp: fail if no bound addresses can be used for a given scope
	riscv/kprobe: Fix instruction simulation of JALR
	nvme: fix passthrough csi check
	gpio: mxc: Unlock on error path in mxc_flip_edge()
	ravb: Rename "no_ptp_cfg_active" and "ptp_cfg_active" variables
	net: ravb: Fix lack of register setting after system resumed for Gen3
	net: ravb: Fix possible hang if RIS2_QFF1 happen
	net: mctp: mark socks as dead on unhash, prevent re-add
	thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()
	net/tg3: resolve deadlock in tg3_reset_task() during EEH
	net: mdio-mux-meson-g12a: force internal PHY off on mux switch
	treewide: fix up files incorrectly marked executable
	tools: gpio: fix -c option of gpio-event-mon
	Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode"
	cpufreq: Move to_gov_attr_set() to cpufreq.h
	cpufreq: governor: Use kobject release() method to free dbs_data
	kbuild: Allow kernel installation packaging to override pkg-config
	block: fix and cleanup bio_check_ro
	x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
	netfilter: conntrack: unify established states for SCTP paths
	perf/x86/amd: fix potential integer overflow on shift of a int
	Linux 5.15.91

Change-Id: I3349d802533097ac86e5c680fbd40c00c9719ec7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-02-01 09:38:19 +00:00
Kees Cook
7b98914a6c panic: Consolidate open-coded panic_on_warn checks
commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream.

Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll
their own warnings, and each check "panic_on_warn". Consolidate this
into a single function so that future instrumentation can be added in
a single location.

Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Gow <davidgow@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01 08:27:22 +01:00
Tiezhu Yang
b5c1acaa43 kasan: no need to unset panic_on_warn in end_report()
commit e7ce7500375a63348e1d3a703c8d5003cbe3fea6 upstream.

panic_on_warn is unset inside panic(), so no need to unset it before
calling panic() in end_report().

Link: https://lkml.kernel.org/r/1644324666-15947-6-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01 08:27:20 +01:00
Dom Cobley
bb12d18e60 Merge remote-tracking branch 'stable/linux-5.15.y' into rpi-5.15.y 2023-01-30 14:49:00 +00:00
Greg Kroah-Hartman
17126d43d4 Merge 5.15.90 into android14-5.15
Changes in 5.15.90
	btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
	pNFS/filelayout: Fix coalescing test for single DS
	selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID
	tools/virtio: initialize spinlocks in vring_test.c
	virtio_pci: modify ENOENT to EINVAL
	vduse: Validate vq_num in vduse_validate_config()
	net/ethtool/ioctl: return -EOPNOTSUPP if we have no phy stats
	r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
	RDMA/srp: Move large values to a new enum for gcc13
	btrfs: always report error in run_one_delayed_ref()
	x86/asm: Fix an assembler warning with current binutils
	f2fs: let's avoid panic if extent_tree is not created
	perf/x86/rapl: Treat Tigerlake like Icelake
	fbdev: omapfb: avoid stack overflow warning
	Bluetooth: hci_qca: Fix driver shutdown on closed serdev
	wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
	wifi: mac80211: sdata can be NULL during AMPDU start
	Add exception protection processing for vd in axi_chan_handle_err function
	zonefs: Detect append writes at invalid locations
	nilfs2: fix general protection fault in nilfs_btree_insert()
	efi: fix userspace infinite retry read efivars after EFI runtime services page fault
	ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
	ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform
	drm/amdgpu: disable runtime pm on several sienna cichlid cards(v2)
	drm/amd: Delay removal of the firmware framebuffer
	hugetlb: unshare some PMDs when splitting VMAs
	io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL
	eventpoll: add EPOLL_URING_WAKE poll wakeup flag
	eventfd: provide a eventfd_signal_mask() helper
	io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
	io_uring: improve send/recv error handling
	io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly
	io_uring: add flag for disabling provided buffer recycling
	io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
	io_uring: allow re-poll if we made progress
	io_uring: fix async accept on O_NONBLOCK sockets
	io_uring: ensure that cached task references are always put on exit
	io_uring: remove duplicated calls to io_kiocb_ppos
	io_uring: update kiocb->ki_pos at execution time
	io_uring: do not recalculate ppos unnecessarily
	io_uring/rw: defer fsnotify calls to task context
	xhci-pci: set the dma max_seg_size
	usb: xhci: Check endpoint is valid before dereferencing it
	xhci: Fix null pointer dereference when host dies
	xhci: Add update_hub_device override for PCI xHCI hosts
	xhci: Add a flag to disable USB3 lpm on a xhci root port level.
	usb: acpi: add helper to check port lpm capability using acpi _DSM
	xhci: Detect lpm incapable xHC USB3 roothub ports from ACPI tables
	prlimit: do_prlimit needs to have a speculation check
	USB: serial: option: add Quectel EM05-G (GR) modem
	USB: serial: option: add Quectel EM05-G (CS) modem
	USB: serial: option: add Quectel EM05-G (RS) modem
	USB: serial: option: add Quectel EC200U modem
	USB: serial: option: add Quectel EM05CN (SG) modem
	USB: serial: option: add Quectel EM05CN modem
	staging: vchiq_arm: fix enum vchiq_status return types
	USB: misc: iowarrior: fix up header size for USB_DEVICE_ID_CODEMERCS_IOW100
	misc: fastrpc: Don't remove map on creater_process and device_release
	misc: fastrpc: Fix use-after-free race condition for maps
	usb: core: hub: disable autosuspend for TI TUSB8041
	comedi: adv_pci1760: Fix PWM instruction handling
	ACPI: PRM: Check whether EFI runtime is available
	mmc: sunxi-mmc: Fix clock refcount imbalance during unbind
	mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting
	btrfs: do not abort transaction on failure to write log tree when syncing log
	btrfs: fix race between quota rescan and disable leading to NULL pointer deref
	cifs: do not include page data when checking signature
	thunderbolt: Use correct function to calculate maximum USB3 link rate
	riscv: dts: sifive: fu740: fix size of pcie 32bit memory
	bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD
	staging: mt7621-dts: change some node hex addresses to lower case
	tty: serial: qcom-geni-serial: fix slab-out-of-bounds on RX FIFO buffer
	tty: fix possible null-ptr-defer in spk_ttyio_release
	USB: gadgetfs: Fix race between mounting and unmounting
	USB: serial: cp210x: add SCALANCE LPE-9000 device id
	usb: cdns3: remove fetched trb from cache before dequeuing
	usb: host: ehci-fsl: Fix module alias
	usb: typec: tcpm: Fix altmode re-registration causes sysfs create fail
	usb: typec: altmodes/displayport: Add pin assignment helper
	usb: typec: altmodes/displayport: Fix pin assignment calculation
	usb: gadget: g_webcam: Send color matching descriptor per frame
	usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()
	usb-storage: apply IGNORE_UAS only for HIKSEMI MD202 on RTL9210
	dt-bindings: phy: g12a-usb2-phy: fix compatible string documentation
	dt-bindings: phy: g12a-usb3-pcie-phy: fix compatible string documentation
	serial: pch_uart: Pass correct sg to dma_unmap_sg()
	dmaengine: lgm: Move DT parsing after initialization
	dmaengine: tegra210-adma: fix global intr clear
	dmaengine: idxd: Let probe fail when workqueue cannot be enabled
	serial: amba-pl011: fix high priority character transmission in rs486 mode
	serial: atmel: fix incorrect baudrate setup
	gsmi: fix null-deref in gsmi_get_variable
	mei: me: add meteor lake point M DID
	drm/i915: re-disable RC6p on Sandy Bridge
	drm/i915/display: Check source height is > 0
	drm/amd/display: Fix set scaling doesn's work
	drm/amd/display: Calculate output_color_space after pixel encoding adjustment
	drm/amd/display: Fix COLOR_SPACE_YCBCR2020_TYPE matrix
	drm/amdgpu: drop experimental flag on aldebaran
	fs/ntfs3: Fix attr_punch_hole() null pointer derenference
	arm64: efi: Execute runtime services from a dedicated stack
	efi: rt-wrapper: Add missing include
	Revert "drm/amdgpu: make display pinning more flexible (v2)"
	x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
	tracing: Use alignof__(struct {type b;}) instead of offsetof()
	io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
	io_uring/net: fix fast_iov assignment in io_setup_async_msg()
	net/ulp: use consistent error code when blocking ULP
	net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
	block: mq-deadline: Rename deadline_is_seq_writes()
	Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
	soc: qcom: apr: Make qcom,protection-domain optional again
	mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
	io_uring: Clean up a false-positive warning from GCC 9.3.0
	io_uring: fix double poll leak on repolling
	io_uring/rw: ensure kiocb_end_write() is always called
	io_uring/rw: remove leftover debug statement
	Linux 5.15.90

Change-Id: I5abc2e695f7183a1d3be9d8f62633bb1df8e8a48
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-01-25 16:13:42 +00:00
Andrey Konovalov
6d926a9aae FROMLIST: kasan: reset page tags properly with sampling
[The patch is in the mm-unstable tree.]

The implementation of page_alloc poisoning sampling assumed that
tag_clear_highpage resets page tags for __GFP_ZEROTAGS allocations.
However, this is no longer the case since commit 70c248aca9e7
("mm: kasan: Skip unpoisoning of user pages").

This leads to kernel crashes when MTE-enabled userspace mappings are
used with Hardware Tag-Based KASAN enabled.

Reset page tags for __GFP_ZEROTAGS allocations in post_alloc_hook().

Also clarify and fix related comments.

Fixes: 44383cef54c0 ("kasan: allow sampling page_alloc allocations for HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Peter Collingbourne <pcc@google.com>
Tested-by: Peter Collingbourne <pcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 238286329
Bug: 264310057
Link: https://lore.kernel.org/all/5dbd866714b4839069e2d8469ac45b60953db290.1674592780.git.andreyknvl@google.com/
Change-Id: Iea4234bcf7e35337c8063827b07039583bca9c66
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2023-01-25 00:02:18 +01:00
Andrey Konovalov
25e257e5d5 FROMGIT: kasan: allow sampling page_alloc allocations for HW_TAGS
[The patch is in mm-stable tree.]

As Hardware Tag-Based KASAN is intended to be used in production, its
performance impact is crucial.  As page_alloc allocations tend to be big,
tagging and checking all such allocations can introduce a significant
slowdown.

Add two new boot parameters that allow to alleviate that slowdown:

- kasan.page_alloc.sample, which makes Hardware Tag-Based KASAN tag only
  every Nth page_alloc allocation with the order configured by the second
  added parameter (default: tag every such allocation).

- kasan.page_alloc.sample.order, which makes sampling enabled by the first
  parameter only affect page_alloc allocations with the order equal or
  greater than the specified value (default: 3, see below).

The exact performance improvement caused by using the new parameters
depends on their values and the applied workload.

The chosen default value for kasan.page_alloc.sample.order is 3, which
matches both PAGE_ALLOC_COSTLY_ORDER and SKB_FRAG_PAGE_ORDER.  This is
done for two reasons:

1. PAGE_ALLOC_COSTLY_ORDER is "the order at which allocations are deemed
   costly to service", which corresponds to the idea that only large and
   thus costly allocations are supposed to sampled.

2. One of the workloads targeted by this patch is a benchmark that sends
   a large amount of data over a local loopback connection. Most multi-page
   data allocations in the networking subsystem have the order of
   SKB_FRAG_PAGE_ORDER (or PAGE_ALLOC_COSTLY_ORDER).

When running a local loopback test on a testing MTE-enabled device in sync
mode, enabling Hardware Tag-Based KASAN introduces a ~50% slowdown.
Applying this patch and setting kasan.page_alloc.sampling to a value
higher than 1 allows to lower the slowdown.  The performance improvement
saturates around the sampling interval value of 10 with the default
sampling page order of 3.  This lowers the slowdown to ~20%.  The slowdown
in real scenarios involving the network will likely be better.

Enabling page_alloc sampling has a downside: KASAN misses bad accesses to
a page_alloc allocation that has not been tagged.  This lowers the value
of KASAN as a security mitigation.

However, based on measuring the number of page_alloc allocations of
different orders during boot in a test build, sampling with the default
kasan.page_alloc.sample.order value affects only ~7% of allocations.  The
rest ~93% of allocations are still checked deterministically.

Link: https://lkml.kernel.org/r/129da0614123bb85ed4dd61ae30842b2dd7c903f.1671471846.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Brand <markbrand@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 238286329
Bug: 264310057
(cherry picked from commit 44383cef54c0ce1201f884d83cc2b367bc5aa4f7 git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-stable)
Change-Id: I85f9eb4e93eeddff8f8d06238f433226affca177
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2023-01-25 00:02:08 +01:00
Hugh Dickins
940e8922c1 mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
commit ab0c3f1251b4670978fde0bd54161795a139b060 upstream.

uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma is always set
in that case, so undo the prohibition.  And MADV_COLLAPSE ought to be able
to collapse some page tables in a vma which happens to have anon_vma set
from CoWing elsewhere.

Is anon_vma lock required?  Almost not: if any page other than expected
subpage of the non-anon huge page is found in the page table, collapse is
aborted without making any change.  However, it is possible that an anon
page was CoWed from this extent in another mm or vma, in which case a
concurrent lookup might look here: so keep it away while clearing pmd (but
perhaps we shall go back to using pmd_lock() there in future).

Note that collapse_pte_mapped_thp() is exceptional in freeing a page table
without having cleared its ptes: I'm uneasy about that, and had thought
pte_clear()ing appropriate; but exclusive i_mmap lock does fix the
problem, and we would have to move the mmu_notification if clearing those
ptes.

What this fixes is not a dangerous instability.  But I suggest Cc stable
because uprobes "healing" has regressed in that way, so this should follow
8d3c106e19e8 into those stable releases where it was backported (and may
want adjustment there - I'll supply backports as needed).

Link: https://lkml.kernel.org/r/b740c9fb-edba-92ba-59fb-7a5592e5dfc@google.com
Fixes: 8d3c106e19e8 ("mm/khugepaged: take the right locks for page table retraction")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>    [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:22:49 +01:00
James Houghton
bd9a23a4bb hugetlb: unshare some PMDs when splitting VMAs
[ Upstream commit b30c14cd61025eeea2f2e8569606cd167ba9ad2d ]

PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however,
it is possible that HugeTLB VMAs are split without unsharing the PMDs
first.

Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE
in hugetlb_change_protection [1].  The key there is that
hugetlb_unshare_all_pmds will not attempt to unshare PMDs in
non-PUD_SIZE-aligned sections of the VMA.

It might seem ideal to unshare in hugetlb_vm_op_open, but we need to
unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split
seems natural.

[1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/

Link: https://lkml.kernel.org/r/20230104231910.1464197-1-jthoughton@google.com
Fixes: 6dfeaff93b ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-24 07:22:43 +01:00
Peter Collingbourne
f95873c0e5 Revert "FROMLIST: kasan: allow sampling page_alloc allocations for HW_TAGS"
This reverts commit fe19dff7e6.

Reason for revert:
Observed frequent boot crashes on a device with sampling KASAN enabled.

Bug: 265863271
Change-Id: Ib7860295065ed7aaa36d9e47d8aaa97918c7bc57
Signed-off-by: Peter Collingbourne <pcc@google.com>
2023-01-20 15:11:44 -08:00
Greg Kroah-Hartman
0fd420005c Merge 5.15.89 into android14-5.15
Changes in 5.15.89
	netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bits
	ALSA: control-led: use strscpy in set_led_id()
	ALSA: hda/realtek - Turn on power early
	ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx
	KVM: arm64: Fix S1PTW handling on RO memslots
	KVM: arm64: nvhe: Fix build with profile optimization
	selftests: kvm: Fix a compile error in selftests/kvm/rseq_test.c
	efi: tpm: Avoid READ_ONCE() for accessing the event log
	docs: Fix the docs build with Sphinx 6.0
	net: stmmac: add aux timestamps fifo clearance wait
	perf auxtrace: Fix address filter duplicate symbol selection
	s390/kexec: fix ipl report address for kdump
	ASoC: qcom: lpass-cpu: Fix fallback SD line index handling
	s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops
	s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()
	drm/virtio: Fix GEM handle creation UAF
	drm/i915/gt: Reset twice
	net/mlx5e: Set action fwd flag when parsing tc action goto
	cifs: Fix uninitialized memory read for smb311 posix symlink create
	platform/x86: dell-privacy: Only register SW_CAMERA_LENS_COVER if present
	platform/surface: aggregator: Ignore command messages not intended for us
	platform/x86: dell-privacy: Fix SW_CAMERA_LENS_COVER reporting
	dt-bindings: msm: dsi-controller-main: Fix operating-points-v2 constraint
	drm/msm/adreno: Make adreno quirks not overwrite each other
	dt-bindings: msm: dsi-controller-main: Fix power-domain constraint
	dt-bindings: msm: dsi-controller-main: Fix description of core clock
	dt-bindings: msm: dsi-phy-28nm: Add missing qcom, dsi-phy-regulator-ldo-mode
	platform/x86: ideapad-laptop: Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[]
	drm/msm/dp: do not complete dp_aux_cmd_fifo_tx() if irq is not for aux transfer
	dt-bindings: msm/dsi: Don't require vdds-supply on 10nm PHY
	dt-bindings: msm/dsi: Don't require vcca-supply on 14nm PHY
	platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe
	ixgbe: fix pci device refcount leak
	ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
	bus: mhi: host: Fix race between channel preparation and M0 event
	usb: ulpi: defer ulpi_register on ulpi_read_id timeout
	iommu/iova: Fix alloc iova overflows issue
	iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()
	sched/core: Fix use-after-free bug in dup_user_cpus_ptr()
	netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.
	powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
	x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
	EDAC/device: Fix period calculation in edac_device_reset_delay_period()
	x86/resctrl: Fix task CLOSID/RMID update race
	regulator: da9211: Use irq handler when ready
	scsi: mpi3mr: Refer CONFIG_SCSI_MPI3MR in Makefile
	scsi: ufs: Stop using the clock scaling lock in the error handler
	scsi: ufs: core: WLUN suspend SSU/enter hibern8 fail recovery
	ASoC: wm8904: fix wrong outputs volume after power reactivation
	ALSA: usb-audio: Make sure to stop endpoints before closing EPs
	ALSA: usb-audio: Relax hw constraints for implicit fb sync
	tipc: fix unexpected link reset due to discovery messages
	octeontx2-af: Fix LMAC config in cgx_lmac_rx_tx_enable
	hvc/xen: lock console list traversal
	nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()
	af_unix: selftest: Fix the size of the parameter to connect()
	tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list
	tools/nolibc: x86-64: Use `mov $60,%eax` instead of `mov $60,%rax`
	tools/nolibc: use pselect6 on RISCV
	tools/nolibc/std: move the standard type definitions to std.h
	tools/nolibc/types: split syscall-specific definitions into their own files
	tools/nolibc/arch: split arch-specific code into individual files
	tools/nolibc/arch: mark the _start symbol as weak
	tools/nolibc: Remove .global _start from the entry point code
	tools/nolibc: restore mips branch ordering in the _start block
	tools/nolibc: fix the O_* fcntl/open macro definitions for riscv
	net/sched: act_mpls: Fix warning during failed attribute validation
	net/mlx5: Fix ptp max frequency adjustment range
	net/mlx5e: Don't support encap rules with gbp option
	perf build: Properly guard libbpf includes
	igc: Fix PPS delta between two synchronized end-points
	platform/surface: aggregator: Add missing call to ssam_request_sync_free()
	mm: Always release pages to the buddy allocator in memblock_free_late().
	Documentation: KVM: add API issues section
	KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
	io_uring: lock overflowing for IOPOLL
	arm64: atomics: format whitespace consistently
	arm64: atomics: remove LL/SC trampolines
	arm64: cmpxchg_double*: hazard against entire exchange variable
	efi: fix NULL-deref in init error path
	scsi: mpt3sas: Remove scsi_dma_map() error messages
	io_uring/io-wq: free worker if task_work creation is canceled
	io_uring/io-wq: only free worker if it was allocated for creation
	block: handle bio_split_to_limits() NULL return
	Revert "usb: ulpi: defer ulpi_register on ulpi_read_id timeout"
	pinctrl: amd: Add dynamic debugging for active GPIOs
	Linux 5.15.89

Change-Id: I66c4f269aa7751b2e4aac919f822dfdcb844a69d
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-01-18 13:45:44 +00:00
Greg Kroah-Hartman
efae08b654 Merge 5.15.87 into android14-5.15
Changes in 5.15.87
	usb: dwc3: qcom: Fix memory leak in dwc3_qcom_interconnect_init
	cifs: fix oops during encryption
	Revert "selftests/bpf: Add test for unstable CT lookup API"
	nvme-pci: fix doorbell buffer value endianness
	nvme-pci: fix mempool alloc size
	nvme-pci: fix page size checks
	ACPI: resource: Skip IRQ override on Asus Vivobook K3402ZA/K3502ZA
	ACPI: resource: do IRQ override on LENOVO IdeaPad
	ACPI: resource: do IRQ override on XMG Core 15
	ACPI: resource: do IRQ override on Lenovo 14ALC7
	block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
	ata: ahci: Fix PCS quirk application for suspend
	nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
	nvmet: don't defer passthrough commands with trivial effects to the workqueue
	fs/ntfs3: Validate BOOT record_size
	fs/ntfs3: Add overflow check for attribute size
	fs/ntfs3: Validate data run offset
	fs/ntfs3: Add null pointer check to attr_load_runs_vcn
	fs/ntfs3: Fix memory leak on ntfs_fill_super() error path
	fs/ntfs3: Add null pointer check for inode operations
	fs/ntfs3: Validate attribute name offset
	fs/ntfs3: Validate buffer length while parsing index
	fs/ntfs3: Validate resident attribute name
	fs/ntfs3: Fix slab-out-of-bounds read in run_unpack
	soundwire: dmi-quirks: add quirk variant for LAPBC710 NUC15
	fs/ntfs3: Validate index root when initialize NTFS security
	fs/ntfs3: Use __GFP_NOWARN allocation at wnd_init()
	fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_fill_super()
	fs/ntfs3: Delete duplicate condition in ntfs_read_mft()
	fs/ntfs3: Fix slab-out-of-bounds in r_page
	objtool: Fix SEGFAULT
	powerpc/rtas: avoid device tree lookups in rtas_os_term()
	powerpc/rtas: avoid scheduling in rtas_os_term()
	HID: multitouch: fix Asus ExpertBook P2 P2451FA trackpoint
	HID: plantronics: Additional PIDs for double volume key presses quirk
	pstore: Properly assign mem_type property
	pstore/zone: Use GFP_ATOMIC to allocate zone buffer
	hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
	binfmt: Fix error return code in load_elf_fdpic_binary()
	ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
	ALSA: line6: correct midi status byte when receiving data from podxt
	ALSA: line6: fix stack overflow in line6_midi_transmit
	pnode: terminate at peers of source
	mfd: mt6360: Add bounds checking in Regmap read/write call-backs
	md: fix a crash in mempool_free
	mm, compaction: fix fast_isolate_around() to stay within boundaries
	f2fs: should put a page when checking the summary info
	f2fs: allow to read node block after shutdown
	mmc: vub300: fix warning - do not call blocking ops when !TASK_RUNNING
	tpm: acpi: Call acpi_put_table() to fix memory leak
	tpm: tpm_crb: Add the missed acpi_put_table() to fix memory leak
	tpm: tpm_tis: Add the missed acpi_put_table() to fix memory leak
	SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
	kcsan: Instrument memcpy/memset/memmove with newer Clang
	ASoC: Intel/SOF: use set_stream() instead of set_tdm_slots() for HDAudio
	ASoC/SoundWire: dai: expand 'stream' concept beyond SoundWire
	rcu-tasks: Simplify trc_read_check_handler() atomic operations
	net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO
	net/af_packet: make sure to pull mac header
	media: stv0288: use explicitly signed char
	soc: qcom: Select REMAP_MMIO for LLCC driver
	kest.pl: Fix grub2 menu handling for rebooting
	ktest.pl minconfig: Unset configs instead of just removing them
	jbd2: use the correct print format
	perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D
	perf/x86/intel/uncore: Clear attr_update properly
	arm64: dts: qcom: sdm845-db845c: correct SPI2 pins drive strength
	mmc: sdhci-sprd: Disable CLK_AUTO when the clock is less than 400K
	btrfs: fix resolving backrefs for inline extent followed by prealloc
	ARM: ux500: do not directly dereference __iomem
	arm64: dts: qcom: sdm850-lenovo-yoga-c630: correct I2C12 pins drive strength
	selftests: Use optional USERCFLAGS and USERLDFLAGS
	PM/devfreq: governor: Add a private governor_data for governor
	cpufreq: Init completion before kobject_init_and_add()
	ALSA: patch_realtek: Fix Dell Inspiron Plus 16
	ALSA: hda/realtek: Apply dual codec fixup for Dell Latitude laptops
	fs: dlm: fix sock release if listen fails
	fs: dlm: retry accept() until -EAGAIN or error returns
	mptcp: mark ops structures as ro_after_init
	mptcp: remove MPTCP 'ifdef' in TCP SYN cookies
	dm cache: Fix ABBA deadlock between shrink_slab and dm_cache_metadata_abort
	dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata
	dm thin: Use last transaction's pmd->root when commit failed
	dm thin: resume even if in FAIL mode
	dm thin: Fix UAF in run_timer_softirq()
	dm integrity: Fix UAF in dm_integrity_dtr()
	dm clone: Fix UAF in clone_dtr()
	dm cache: Fix UAF in destroy()
	dm cache: set needs_check flag after aborting metadata
	tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
	perf/core: Call LSM hook after copying perf_event_attr
	of/kexec: Fix reading 32-bit "linux,initrd-{start,end}" values
	KVM: VMX: Resume guest immediately when injecting #GP on ECREATE
	KVM: nVMX: Inject #GP, not #UD, if "generic" VMXON CR0/CR4 check fails
	KVM: nVMX: Properly expose ENABLE_USR_WAIT_PAUSE control to L1
	x86/microcode/intel: Do not retry microcode reloading on the APs
	ftrace/x86: Add back ftrace_expected for ftrace bug reports
	x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK
	x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
	tracing: Fix race where eprobes can be called before the event
	tracing: Fix complicated dependency of CONFIG_TRACER_MAX_TRACE
	tracing/hist: Fix wrong return value in parse_action_params()
	tracing/probes: Handle system names with hyphens
	tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
	staging: media: tegra-video: fix chan->mipi value on error
	staging: media: tegra-video: fix device_node use after free
	ARM: 9256/1: NWFPE: avoid compiler-generated __aeabi_uldivmod
	media: dvb-core: Fix double free in dvb_register_device()
	media: dvb-core: Fix UAF due to refcount races at releasing
	cifs: fix confusing debug message
	cifs: fix missing display of three mount options
	rtc: ds1347: fix value written to century register
	block: mq-deadline: Do not break sequential write streams to zoned HDDs
	md/bitmap: Fix bitmap chunk size overflow issues
	efi: Add iMac Pro 2017 to uefi skip cert quirk
	wifi: wilc1000: sdio: fix module autoloading
	ASoC: jz4740-i2s: Handle independent FIFO flush bits
	ipu3-imgu: Fix NULL pointer dereference in imgu_subdev_set_selection()
	ipmi: fix long wait in unload when IPMI disconnect
	mtd: spi-nor: Check for zero erase size in spi_nor_find_best_erase_type()
	ima: Fix a potential NULL pointer access in ima_restore_measurement_list
	ipmi: fix use after free in _ipmi_destroy_user()
	PCI: Fix pci_device_is_present() for VFs by checking PF
	PCI/sysfs: Fix double free in error path
	riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argument
	riscv: mm: notify remote harts about mmu cache updates
	crypto: n2 - add missing hash statesize
	crypto: ccp - Add support for TEE for PCI ID 0x14CA
	driver core: Fix bus_type.match() error handling in __driver_attach()
	phy: qcom-qmp-combo: fix sc8180x reset
	iommu/amd: Fix ivrs_acpihid cmdline parsing code
	remoteproc: core: Do pm_relax when in RPROC_OFFLINE state
	parisc: led: Fix potential null-ptr-deref in start_task()
	device_cgroup: Roll back to original exceptions after copy failure
	drm/connector: send hotplug uevent on connector cleanup
	drm/vmwgfx: Validate the box size for the snooped cursor
	drm/i915/dsi: fix VBT send packet port selection for dual link DSI
	drm/ingenic: Fix missing platform_driver_unregister() call in ingenic_drm_init()
	ext4: silence the warning when evicting inode with dioread_nolock
	ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop
	ext4: remove trailing newline from ext4_msg() message
	fs: ext4: initialize fsdata in pagecache_write()
	ext4: fix use-after-free in ext4_orphan_cleanup
	ext4: fix undefined behavior in bit shift for ext4_check_flag_values
	ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode
	ext4: add helper to check quota inums
	ext4: fix bug_on in __es_tree_search caused by bad quota inode
	ext4: fix reserved cluster accounting in __es_remove_extent()
	ext4: check and assert if marking an no_delete evicting inode dirty
	ext4: fix bug_on in __es_tree_search caused by bad boot loader inode
	ext4: fix leaking uninitialized memory in fast-commit journal
	ext4: fix uninititialized value in 'ext4_evict_inode'
	ext4: init quota for 'old.inode' in 'ext4_rename'
	ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline
	ext4: fix corruption when online resizing a 1K bigalloc fs
	ext4: fix error code return to user-space in ext4_get_branch()
	ext4: avoid BUG_ON when creating xattrs
	ext4: fix kernel BUG in 'ext4_write_inline_data_end()'
	ext4: fix inode leak in ext4_xattr_inode_create() on an error path
	ext4: initialize quota before expanding inode in setproject ioctl
	ext4: avoid unaccounted block allocation when expanding inode
	ext4: allocate extended attribute value in vmalloc area
	drm/amdgpu: handle polaris10/11 overlap asics (v2)
	drm/amdgpu: make display pinning more flexible (v2)
	block: mq-deadline: Fix dd_finish_request() for zoned devices
	tracing: Fix issue of missing one synthetic field
	ext4: remove unused enum EXT4_FC_COMMIT_FAILED
	ext4: use ext4_debug() instead of jbd_debug()
	ext4: introduce EXT4_FC_TAG_BASE_LEN helper
	ext4: factor out ext4_fc_get_tl()
	ext4: fix potential out of bound read in ext4_fc_replay_scan()
	ext4: disable fast-commit of encrypted dir operations
	ext4: don't set up encryption key during jbd2 transaction
	ext4: add missing validation of fast-commit record lengths
	ext4: fix unaligned memory access in ext4_fc_reserve_space()
	ext4: fix off-by-one errors in fast-commit block filling
	ARM: renumber bits related to _TIF_WORK_MASK
	phy: qcom-qmp-combo: fix out-of-bounds clock access
	btrfs: replace strncpy() with strscpy()
	btrfs: move missing device handling in a dedicate function
	btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
	x86/mce: Get rid of msr_ops
	x86/MCE/AMD: Clear DFR errors found in THR handler
	media: s5p-mfc: Fix to handle reference queue during finishing
	media: s5p-mfc: Clear workbit to handle error condition
	media: s5p-mfc: Fix in register read and write for H264
	perf probe: Use dwarf_attr_integrate as generic DWARF attr accessor
	perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as unsinged data
	ravb: Fix "failed to switch device to config mode" message during unbind
	ext4: goto right label 'failed_mount3a'
	ext4: correct inconsistent error msg in nojournal mode
	mbcache: automatically delete entries from cache on freeing
	ext4: fix deadlock due to mbcache entry corruption
	drm/i915/migrate: don't check the scratch page
	drm/i915/migrate: fix offset calculation
	drm/i915/migrate: fix length calculation
	SUNRPC: ensure the matching upcall is in-flight upon downcall
	btrfs: fix an error handling path in btrfs_defrag_leaves()
	bpf: pull before calling skb_postpull_rcsum()
	drm/panfrost: Fix GEM handle creation ref-counting
	netfilter: nf_tables: consolidate set description
	netfilter: nf_tables: add function to create set stateful expressions
	netfilter: nf_tables: perform type checking for existing sets
	vmxnet3: correctly report csum_level for encapsulated packet
	netfilter: nf_tables: honor set timeout and garbage collection updates
	veth: Fix race with AF_XDP exposing old or uninitialized descriptors
	nfsd: shut down the NFSv4 state objects before the filecache
	net: hns3: add interrupts re-initialization while doing VF FLR
	net: hns3: refactor hns3_nic_reuse_page()
	net: hns3: extract macro to simplify ring stats update code
	net: hns3: fix miss L3E checking for rx packet
	net: hns3: fix VF promisc mode not update when mac table full
	net: sched: fix memory leak in tcindex_set_parms
	qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure
	net: dsa: mv88e6xxx: depend on PTP conditionally
	nfc: Fix potential resource leaks
	vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
	vhost/vsock: Fix error handling in vhost_vsock_init()
	vringh: fix range used in iotlb_translate()
	vhost: fix range used in translate_desc()
	vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
	net/mlx5: E-Switch, properly handle ingress tagged packets on VST
	net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
	net/mlx5: Avoid recovery in probe flows
	net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
	net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr
	net/mlx5e: Always clear dest encap in neigh-update-del
	net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
	net: amd-xgbe: add missed tasklet_kill
	net: ena: Fix toeplitz initial hash value
	net: ena: Don't register memory info on XDP exchange
	net: ena: Account for the number of processed bytes in XDP
	net: ena: Use bitmask to indicate packet redirection
	net: ena: Fix rx_copybreak value update
	net: ena: Set default value for RX interrupt moderation
	net: ena: Update NUMA TPH hint register upon NUMA node update
	net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe
	RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device
	RDMA/mlx5: Fix validation of max_rd_atomic caps for DC
	drm/meson: Reduce the FIFO lines held when AFBC is not used
	filelock: new helper: vfs_inode_has_locks
	ceph: switch to vfs_inode_has_locks() to fix file lock bug
	gpio: sifive: Fix refcount leak in sifive_gpio_probe
	net: sched: atm: dont intepret cls results when asked to drop
	net: sched: cbq: dont intepret cls results when asked to drop
	net: sparx5: Fix reading of the MAC address
	netfilter: ipset: fix hash:net,port,net hang with /0 subnet
	netfilter: ipset: Rework long task execution when adding/deleting entries
	perf tools: Fix resources leak in perf_data__open_dir()
	drm/imx: ipuv3-plane: Fix overlay plane width
	fs/ntfs3: don't hold ni_lock when calling truncate_setsize()
	drivers/net/bonding/bond_3ad: return when there's no aggregator
	octeontx2-pf: Fix lmtst ID used in aura free
	usb: rndis_host: Secure rndis_query check against int overflow
	perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode
	drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
	caif: fix memory leak in cfctrl_linkup_request()
	udf: Fix extension of the last extent in the file
	ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 tablet
	nvme: fix multipath crash caused by flush request when blktrace is enabled
	io_uring: check for valid register opcode earlier
	nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
	nvme: also return I/O command effects from nvme_command_effects
	btrfs: check superblock to ensure the fs was not modified at thaw time
	x86/kexec: Fix double-free of elf header buffer
	x86/bugs: Flush IBP in ib_prctl_set()
	nfsd: fix handling of readdir in v4root vs. mount upcall timeout
	fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
	block: don't allow splitting of a REQ_NOWAIT bio
	io_uring: fix CQ waiting timeout handling
	thermal: int340x: Add missing attribute for data rate base
	riscv: uaccess: fix type of 0 variable on error in get_user()
	riscv, kprobes: Stricter c.jr/c.jalr decoding
	drm/i915/gvt: fix gvt debugfs destroy
	drm/i915/gvt: fix vgpu debugfs clean in remove
	hfs/hfsplus: use WARN_ON for sanity check
	hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handling
	ksmbd: fix infinite loop in ksmbd_conn_handler_loop()
	ksmbd: check nt_len to be at least CIFS_ENCPWD_SIZE in ksmbd_decode_ntlmssp_auth_blob
	Revert "ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007"
	mptcp: dedicated request sock for subflow in v6
	mptcp: use proper req destructor for IPv6
	ext4: don't allow journal inode to have encrypt flag
	selftests: set the BUILD variable to absolute path
	btrfs: make thaw time super block check to also verify checksum
	net: hns3: fix return value check bug of rx copybreak
	mbcache: Avoid nesting of cache->c_list_lock under bit locks
	efi: random: combine bootloader provided RNG seed with RNG protocol output
	io_uring: Fix unsigned 'res' comparison with zero in io_fixup_rw_res()
	drm/mgag200: Fix PLL setup for G200_SE_A rev >=4
	Linux 5.15.87

Change-Id: I06fb376627506652ed60c04d56074956e6e075a0
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-01-18 13:13:15 +00:00
Dom Cobley
8ad43539c6 Merge remote-tracking branch 'stable/linux-5.15.y' into rpi-5.15.y 2023-01-18 11:53:49 +00:00
Aaron Thompson
b8f3b3cffb mm: Always release pages to the buddy allocator in memblock_free_late().
[ Upstream commit 115d9d77bb0f9152c60b6e8646369fa7f6167593 ]

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.

In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().

For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.

One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.

For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:

v6.2-rc2:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

v6.2-rc2 + patch:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816   # +43,949 pages

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18 11:48:57 +01:00
Minchan Kim
5cecdaebbf FROMLIST: BACKPORT: mm: fix is_pinnable_page against on cma page
Pages on CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA
so current is_pinnable_page could miss CMA pages which has MIGRATE_
ISOLATE. It ends up pinning CMA pages as longterm at pin_user_pages
APIs so CMA allocation keep failed until the pin is released.

     CPU 0                                   CPU 1 - Task B

cma_alloc
alloc_contig_range
                                        pin_user_pages_fast(FOLL_LONGTERM)
change pageblock as MIGRATE_ISOLATE
                                        internal_get_user_pages_fast
                                        lockless_pages_from_mm
                                        gup_pte_range
                                        try_grab_folio
                                        is_pinnable_page
                                          return true;
                                        So, pinned the page successfully.
page migration failure with pinned page
                                        ..
                                        .. After 30 sec
                                        unpin_user_page(page)

CMA allocation succeeded after 30 sec.

The CMA allocation path protects the migration type change race
using zone->lock but what GUP path need to know is just whether the
page is on CMA area or not rather than exact migration type.
Thus, we don't need zone->lock but just checks migration type in
either of (MIGRATE_ISOLATE and MIGRATE_CMA).

Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause
rejecting of pinning pages on MIGRATE_ISOLATE pageblocks even
though it's neither CMA nor movable zone if the page is temporarily
unmovable. However, such a migration failure by unexpected temporal
refcount holding is general issue, not only come from MIGRATE_ISOLATE
and the MIGRATE_ISOLATE is also transient state like other temporal
elevated refcount problem.

Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>

Conflicts:
        include/linux/mm.h

1. There is no is_pinnable_page in 5.10

Link: https://lore.kernel.org/all/20220524171525.976723-1-minchan@kernel.org/
Bug: 231227007
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I5cdd2b8eefdd7e89658abd21c32aa84876ad7782
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit e9dd78ebe1c8e9fcc4067e0795326495a16a9c9b)
2023-01-16 01:52:43 +00:00
Andrey Konovalov
fe19dff7e6 FROMLIST: kasan: allow sampling page_alloc allocations for HW_TAGS
[The patch is in mm-unstable tree.]

As Hardware Tag-Based KASAN is intended to be used in production, its
performance impact is crucial.  As page_alloc allocations tend to be big,
tagging and checking all such allocations can introduce a significant
slowdown.

Add two new boot parameters that allow to alleviate that slowdown:

- kasan.page_alloc.sample, which makes Hardware Tag-Based KASAN tag only
  every Nth page_alloc allocation with the order configured by the second
  added parameter (default: tag every such allocation).

- kasan.page_alloc.sample.order, which makes sampling enabled by the first
  parameter only affect page_alloc allocations with the order equal or
  greater than the specified value (default: 3, see below).

The exact performance improvement caused by using the new parameters
depends on their values and the applied workload.

The chosen default value for kasan.page_alloc.sample.order is 3, which
matches both PAGE_ALLOC_COSTLY_ORDER and SKB_FRAG_PAGE_ORDER.  This is
done for two reasons:

1. PAGE_ALLOC_COSTLY_ORDER is "the order at which allocations are deemed
   costly to service", which corresponds to the idea that only large and
   thus costly allocations are supposed to sampled.

2. One of the workloads targeted by this patch is a benchmark that sends
   a large amount of data over a local loopback connection. Most multi-page
   data allocations in the networking subsystem have the order of
   SKB_FRAG_PAGE_ORDER (or PAGE_ALLOC_COSTLY_ORDER).

When running a local loopback test on a testing MTE-enabled device in sync
mode, enabling Hardware Tag-Based KASAN introduces a ~50% slowdown.
Applying this patch and setting kasan.page_alloc.sampling to a value
higher than 1 allows to lower the slowdown.  The performance improvement
saturates around the sampling interval value of 10 with the default
sampling page order of 3.  This lowers the slowdown to ~20%.  The slowdown
in real scenarios involving the network will likely be better.

Enabling page_alloc sampling has a downside: KASAN misses bad accesses to
a page_alloc allocation that has not been tagged.  This lowers the value
of KASAN as a security mitigation.

However, based on measuring the number of page_alloc allocations of
different orders during boot in a test build, sampling with the default
kasan.page_alloc.sample.order value affects only ~7% of allocations.  The
rest ~93% of allocations are still checked deterministically.

Link: https://lkml.kernel.org/r/129da0614123bb85ed4dd61ae30842b2dd7c903f.1671471846.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Brand <markbrand@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 238286329
Bug: 264310057
Link: https://lore.kernel.org/all/129da0614123bb85ed4dd61ae30842b2dd7c903f.1671471846.git.andreyknvl@google.com
Change-Id: Icc7befe61848021c68a12034f426f1c300181ad6
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2023-01-13 20:27:50 +00:00
Minchan Kim
a8962f626f ANDROID: vendor hook to control blk_plug for shrink_lruvec
Add vendor hook to contorl blk plugging for shrink_lruvec.

Merged CL bcf1e503f5ed774dc28126a0f1a8c839717eafac:
ANDROID: adjust vendor hook to control blk_plug

Bug: 255471591
Bug: 238728493
Change-Id: Iba2603ff2e1b62cf2ee8fd6969d8ccd71416a288
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 89fed37332fd48e0cd13b256cd85d6929d5da319)
2023-01-13 18:43:29 +00:00
Minchan Kim
7df45e50a5 ANDROID: vendor hook to control blk_plug for memory reclaim
Add vendor hook to contorl blk plugging.

Bug: 255471591
Bug: 238728493
Change-Id: I96b73cec14f0d2fea46a4828526e6ae5aa5c71b7
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit a17e132ec4f290621666311e73f43202706d2743)
2023-01-13 18:43:29 +00:00
Minchan Kim
ba005d6032 ANDROID: vendor hook to control bh_lru and lru_cache_disable
Add vendor hook for bh_lru and lru_cache_disable

Bug: 238728493
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I81bfad317cf6e8633186ebb3238644306d7a102d
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 74e2ea264cd1895c493b9008b62bfea98dacf3f6)
2023-01-13 18:43:29 +00:00
Minchan Kim
243f54dd3a ANDROID: vendor hook for TLB batching control
Add vendor hook for flushing TLB batching in zap_pte_range.

Merged CL 232bdcbd660b1129b7d8d0de25a563b476eeb522:
ANDROID: pass argument in zap_pte_range vendor hooks

Bug: 238728493
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: If2de5f070dd7b76624961f5a91440bf69a99ca2d
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit d257ef6764f228145d0fca24998162809bb5b9f7)
2023-01-13 18:43:29 +00:00
Minchan Kim
bb6ab2be93 ANDROID: vendor hook to control pagevec flush
The pagevec batching causes lru_add_drain_all which is too expensive
sometimes. This patch adds a new vendor hook to drain the pagevec
immediately depending on the page's type.

Bug: 251881967
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Id17e14e69197993ddad511a40c96e51674c02834
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 2f8253b7e6e563cc19cffa120c72f6f528664103)
2023-01-13 18:43:29 +00:00
Robin Hsu
1eeadb47e0 ANDROID: mm: vh for compaction begin/end
Add vendor hook for compaction begin/end.  The first use would be
to measure compaction durations.

Bug: 229927848
Test: local kernel build test
Signed-off-by: Robin Hsu <robinhsu@google.com>
Change-Id: I3d95434bf49b37199056dc9ddfc36a59a7de17b7
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 13b6bd38bb1f43bfffdb08c8f3a4a20d36ccd670)
2023-01-13 18:43:29 +00:00
Minchan Kim
e3f396c2d4 ANDROID: add vendor_hook to control CMA allocation ratio
CMA first allocation policy for movable makes CMA(Upstream doesn't)
area always full. It's good for memory efficiency since it could use
up CMA available memory most of time. However, it could cause
cma_alloc slow since it causes a lot page migration all the time.

Let's add vendor hook for someone who want to restore CMA allocation
policy to upstream so they will see less page migration in cma_alloc.

If the vendor_hook returns false, the rmqueue_bulk return 0 without
filling pcp->lists so get_populated_pcp_list will return NULL.
Once get_populated_pcp_list returns NULL, __rmqueue_pcplist will retry
the page allocation with original migratetype(currently, original
migratetype couldn't be MIGRATE_CMA) so the retrial will find
available pages from !MIGRATE_CMA free list.

Bug: 231978523
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ia031d9bc6f34085b892a8d9923bf5b9b1794f94a
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 0ca85e35bf5b4a7ff08f00a060c83e4a82380b64)
2023-01-13 18:43:29 +00:00
Dom Cobley
4f7af52a4b Merge remote-tracking branch 'stable/linux-5.15.y' into rpi-5.15.y 2023-01-13 13:14:02 +00:00
NARIBAYASHI Akira
35d8a89862 mm, compaction: fix fast_isolate_around() to stay within boundaries
commit be21b32afe470c5ae98e27e49201158a47032942 upstream.

Depending on the memory configuration, isolate_freepages_block() may scan
pages out of the target range and causes panic.

Panic can occur on systems with multiple zones in a single pageblock.

The reason it is rare is that it only happens in special
configurations.  Depending on how many similar systems there are, it
may be a good idea to fix this problem for older kernels as well.

The problem is that pfn as argument of fast_isolate_around() could be out
of the target range.  Therefore we should consider the case where pfn <
start_pfn, and also the case where end_pfn < pfn.

This problem should have been addressd by the commit 6e2b7044c1 ("mm,
compaction: make fast_isolate_freepages() stay within zone") but there was
an oversight.

 Case1: pfn < start_pfn

  <at memory compaction for node Y>
  |  node X's zone  | node Y's zone
  +-----------------+------------------------------...
   pageblock    ^   ^     ^
  +-----------+-----------+-----------+-----------+...
                ^   ^     ^
                ^   ^      end_pfn
                ^    start_pfn = cc->zone->zone_start_pfn
                 pfn
                <---------> scanned range by "Scan After"

 Case2: end_pfn < pfn

  <at memory compaction for node X>
  |  node X's zone  | node Y's zone
  +-----------------+------------------------------...
   pageblock  ^     ^   ^
  +-----------+-----------+-----------+-----------+...
              ^     ^   ^
              ^     ^    pfn
              ^      end_pfn
               start_pfn
              <---------> scanned range by "Scan Before"

It seems that there is no good reason to skip nr_isolated pages just after
given pfn.  So let perform simple scan from start to end instead of
dividing the scan into "Before" and "After".

Link: https://lkml.kernel.org/r/20221026112438.236336-1-a.naribayashi@fujitsu.com
Fixes: 6e2b7044c1 ("mm, compaction: make fast_isolate_freepages() stay within zone").
Signed-off-by: NARIBAYASHI Akira <a.naribayashi@fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12 11:58:47 +01:00
davidchiang
6986b4c889 ANDROID: mm: Export find_vm_area
Export find_vm_area for obtaining pages of vmalloc'ed memory, which is
required for both GXP and TPU modules.

Bug: 263839332
Change-Id: I1d6c37a5abb6012c3ff295120dd2d3cb2871c820
Signed-off-by: davidchiang <davidchiang@google.com>
2023-01-11 04:31:52 +00:00
Charan Teja Kalla
c41503e313 ANDROID: page_pinner: prevent pp_buffer uninitialized access
There is a race window between page_pinner_inited set and the pp_buffer
initialization which cause accessing the pp_buffer->lock. Avoid this by
moving the pp_buffer initialization to page_ext_ops->init() which sets
the page_pinner_inited only after the pp_buffer is initialized.

Race scenario:
1) init_page_pinner is called --> page_pinner_inited is set.

2) __alloc_contig_migrate_range --> __page_pinner_failure_detect()
accesses the pp_buffer->lock(yet to be initialized).

3) Then the pp_buffer is allocated and initialized.

Below is the issue call stack:
 spin_bug+0x0
 _raw_spin_lock_irqsave+0x3c
 __page_pinner_failure_detect+0x110
 __alloc_contig_migrate_range+0x1c4
 alloc_contig_range+0x130
 cma_alloc+0x170
 dma_alloc_contiguous+0xa0
 __dma_direct_alloc_pages+0x16c
 dma_direct_alloc+0x88

Bug: 259024332
Change-Id: I6849ac4d944498b9a431b47cad7adc7903c9bbaa
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
2023-01-05 19:14:46 +00:00
Jaegeuk Kim
d68a75b1dc Merge "Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.15.y' into android14-5.15" into android14-5.15 2023-01-05 17:17:32 +00:00
Martin Liu
d705ab99ab ANDROID: vendor_hooks: Export direct reclaim trace points
Get direct reclaim info.

Bug: 190795589
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ie66a3c87484a364a918c19b8e044c82f1afd6749
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 12902c9996)
2023-01-05 04:05:51 +00:00
Martin Liu
c20204c67d ANDROID: cma: allow to use CMA in swap-in path
Now, we allow to use CMA pages for certain user space
allocations. One of them is anonymous page fault case.
To align the use case, we should also allow to use CMA
pages in swap-in cases. This could help mitigate OOM
on swap-in cases showing plenty of free CMA left.

logd.klogd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
CPU: 0 PID: 433 Comm: logd.klogd Tainted: G        W  OE     5.10.100-android13-0 #1

Call trace:
 dump_backtrace.cfi_jt+0x0/0x8
 show_stack+0x1c/0x2c
 dump_stack_lvl+0xc0/0x13c
 dump_header+0x54/0x238
 oom_kill_process+0xb0/0x158
 out_of_memory+0x17c/0x328
 __alloc_pages_slowpath+0x5c4/0x8d0
 __alloc_pages_nodemask+0x1bc/0x2e0
 __read_swap_cache_async+0xdc/0x370
 swap_vma_readahead+0x3b4/0x488
 swapin_readahead+0x3c/0x54
 do_swap_page+0x1e0/0xaa0
 handle_pte_fault+0x128/0x1e0
 handle_mm_fault+0x308/0x590
 do_page_fault+0x33c/0x478
 do_translation_fault+0x58/0x11c
 do_mem_abort+0x68/0x144
 el0_da+0x24/0x34
 el0_sync_handler+0xc4/0xec
 el0_sync+0x1c0/0x200
Mem-Info:
active_anon:0 inactive_anon:3222 isolated_anon:62
 active_file:232 inactive_file:428 isolated_file:0
 unevictable:37232 dirty:3 writeback:40
 slab_reclaimable:19943 slab_unreclaimable:281193
 mapped:37126 shmem:2815 pagetables:8981 bounce:0
 free:126007 free_pcp:223 free_cma:123062
Node 0 active_anon:16kB inactive_anon:13160kB active_file:292kB inactive_file:2000kB unevictable:148928kB isolated(anon):0kB isolated(file):0kB mapped:148308kB dirty:12kB writeback:164kB shmem:11260kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:20528kB shadow_call_stack:5200kB all_unreclaimable? no
DMA32 free:14128kB min:7572kB low:22636kB high:37700kB reserved_highatomic:4096KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1913856kB managed:1553276kB mlocked:0kB pagetables:1292kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:2520kB
lowmem_reserve[]: 0 0 0
Normal free:489888kB min:19808kB low:59220kB high:98632kB reserved_highatomic:36864KB active_anon:20kB inactive_anon:12168kB active_file:0kB inactive_file:1640kB unevictable:148928kB writepending:180kB present:4194304kB managed:4063392kB mlocked:148928kB pagetables:34632kB bounce:0kB free_pcp:1928kB local_pcp:0kB free_cma:489752kB
lowmem_reserve[]: 0 0 0
DMA32: 166*4kB (UME) 163*8kB (UMECH) 592*16kB (UMCH) 5*32kB (UC) 2*64kB (C) 2*128kB (C) 2*256kB (C) 1*512kB (C) 1*1024kB (C) 0*2048kB 0*4096kB = 14032kB
Normal: 969*4kB (C) 77*8kB (C) 40*16kB (C) 17*32kB (C) 5*64kB (C) 1*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 118*4096kB (C) = 490476kB
40220 total pagecache pages
30 pages in swap cache
Swap cache stats: add 2634625, delete 2635304, find 160621/2963954
Free swap  = 1473788kB
Total swap = 2097148kB

Bug: 229822798
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ia0bb6f72e52f77f26062e1769bfd92e831f07cab
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 3e591c63b13772a1de0ffed995b482166d27ed71)
2023-01-05 02:16:50 +00:00
Minchan Kim
dd887dbfaa ANDROID: mm: do not count cma_alloc_fail on __GFP_NORETRY
Do not account __GFP_NORETRY allocation failure as cma_alloc_fail
since it's not critical failure(i.e., the caller with __GFP_NORETRY
should always carry on the fallback plan). It's also good for
compatibility POV with upstream since upstream cma_alloc_fail
only counts cma_alloc_fail with !__GFP_NORETRY since upstream
doesn't support __GFP_NORTRY yet.

Bug: 220669548
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I377e6b033c3786e10b6b1c814037a4fc40e20a73
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 8ffc7ff817fe552592daa2b0de1760e3539663f3)
2023-01-05 02:16:50 +00:00
Minchan Kim
bb7b81497d ANDROID: GKI: export cma_get_size
Export cma_get_size to tell cma instance's size, which is needed
to allocate entire pages of the cma.

Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ifb2769f60250ce605236342b950907218e1c28a5
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 7a44906686048bdcecb7dfa4fac02c4ad7f6cd06)
2023-01-05 02:16:50 +00:00
Minchan Kim
1b38e981db ANDROID: mm: cma do not sleep for __GFP_NORETRY
Do not sleep for retrying for __GFP_NORERY since it's failfast
mode approach. User could retry the allocation without the flag
by themselves if they see the failure.

Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ic6a857978fda8e353b9ed770d1e0ba1808fd201e
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 12f48605e8)
2023-01-05 02:16:50 +00:00
Minchan Kim
60d2dad38e ANDROID: mm: cma: skip problematic pageblock
alloc_contig_range is supposed to work on max(MAX_ORDER_NR_PAGES,
or pageblock_nr_pages) granularity aligned range. If it fails
at a page and return error to user, user doesn't know what page
makes the allocation failure and keep retrying another allocation
with new range including the failed page and encountered error
again and again until it could escape the out of the granularity
block. Instead, let's make CMA aware of what pfn was troubled in
previous trial and then continue to work new pageblock out of the
failed page so it doesn't see the repeated error repeatedly.

Currently, this option works for only __GFP_NORETRY case for
safe for existing CMA users.

Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I0959c9df3d4b36408a68920abbb4d52d31026079
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 0e688e972d)
2023-01-05 02:16:50 +00:00
Minchan Kim
c63e78a29a ANDROID: mm: lru_cache_disable skips lru cache drainnig
lru_cache_disable is not trivial cost since it should run work
from every cores in the system. Thus, repeated call of the
function whenever alloc_contig_range in the cma's allocation loop
is called is expensive.

This patch makes the lru_cache_disable smarter in that it will
not run __lru_add_drain_all since it knows the cache was already
disabled by someone else.
With that, user of alloc_contig_range can disable the lru cache
in advance in their context so that subsequent alloc_contig_range
for user's operation will avoid the costly function call.

This patch moves lru_cache APIs from swap.h to swap.c and export
it for vendor users.

Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I23da8599c55db49dc80226285972e4cd80dedcff
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit c8578a3e90)
2023-01-05 02:16:50 +00:00
Minchan Kim
b74eeef409 ANDROID: mm: do not try test_page_isoalte if migration fails
Currently, alloc_contig_range expects that even though a page fails
with -EBUSY from __alloc_contig_migrate_range, it want to check
those failed pages in test_pages_isolated again with hope that
those page would be freed soon so cma allocatoin would be succeeded.
However, it depends on the luck and I found sometimes test_page_isolated
constantly fails at the page repeatedly whenever cma_alloc retried.
Rather than burning out CPU to check the page's status in
test_pages_isolated for GFP_NORETRY allocation, just bail out and
relies on the user what they want to do.
Currently, this option works for only __GFP_NORETRY case for safe
of existing other users.

Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I9211452be06960dc7d8f854537e53b3fc5262c8e
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit c01ce3b5ef)
2023-01-05 02:16:50 +00:00
Minchan Kim
a7f55c5c73 ANDROID: mm: add cma allocation statistics
alloc_contig_range is the core worker function for CMA allocation
so it has every information to be able to understand allocation
latency. For example, how many pages are migrated, how many time
unmap was needed to migrate pages, how many times it encountered
errors by some reasons.

This patch adds such statistics in the alloc_contig_range and
return it to user so user can use those information to analyize
latency. The cma_alloc is first user for the statistics, which
export the statistics as new trace event(i.e., cma_alloc_info).

It was really usefuli to optimize cma allocation work.

Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I7be43cc89d11078e2a324d2d06aada6d8e9e1cc9
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 675e504598)
2023-01-05 02:16:50 +00:00
Minchan Kim
1b2de5aa2d ANDROID: mm: cma: add vendor hoook in cma_alloc()
Add vendor hook for cma_alloc latency measuring.

Bug: 177231781
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ia2dbb26454bd8f03489389b29b9a6c939d3c2bbb
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit c6e85ea56b)
2023-01-05 02:16:50 +00:00
Minchan Kim
1b6d55eb48 ANDROID: mm: build alloc_contig_dump_pages in page_alloc.o
GKI has CONFIG_DYNAMIC_DEBUG_CORE. Thus, to enable only the
specific alloc_contig_dump_pages without needing all pr_debug
in every source files is using -DCONFIG_DYNAMIC_MODULE
when the page_alloc.o compiled.

Bug: 182195592
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I93266eb4161b3653389c737d4c767fd5d1083339
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 8d03e49505)
2023-01-05 02:16:50 +00:00
Minchan Kim
9cb760bc6a FROMLIST: mm: failfast mode with __GFP_NORETRY in alloc_contig_range
Contiguous memory allocation can be stalled due to waiting
on page writeback and/or page lock which causes unpredictable
delay. It's a unavoidable cost for the requestor to get *big*
contiguous memory but it's expensive for *small* contiguous
memory(e.g., order-4) because caller could retry the request
in different range where would have easy migratable pages
without stalling.

This patch introduce __GFP_NORETRY as compaction gfp_mask in
alloc_contig_range so it will fail fast without blocking
when it encounters pages needed waiting.

Bug: 170340257
Bug: 120293424
Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m1362218ebb69e6e10c20d9361008b079745c4e6f
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I42ba8dd5aeb065d936978ab205e4baf84bf9a321
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 20512940b8)
2023-01-05 02:16:50 +00:00
Minchan Kim
eebff8eab2 FROMLIST: mm: cma: introduce gfp flag in cma_alloc instead of no_warn
The upcoming patch will introduce __GFP_NORETRY semantic
in alloc_contig_range which is a failfast mode of the API.
Instead of adding a additional parameter for gfp, replace
no_warn with gfp flag.

To keep old behaviors, it follows the rule below.

  no_warn 			gfp_flags

  false         		GFP_KERNEL
  true          		GFP_KERNEL|__GFP_NOWARN
  gfp & __GFP_NOWARN		GFP_KERNEL | (gfp & __GFP_NOWARN)

Bug: 170340257
Bug: 120293424
Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m36b144ff81fe0a8f0ecaf6813de4819ecc41f8fe
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I1ce020ab5d5fff34eb6464be4632ddef72fb43eb
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 23ba990a3e)
2023-01-05 02:16:50 +00:00
Jaegeuk Kim
b205c093f2 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.15.y' into android14-5.15
* aosp/upstream-f2fs-stable-linux-5.15.y:
  f2fs: let's avoid panic if extent_tree is not created
  f2fs: should use a temp extent_info for lookup
  f2fs: don't mix to use union values in extent_info
  f2fs: initialize extent_cache parameter
  f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()
  fs: account for group membership
  fs: fix acl translation
  fs: support mapped mounts of mapped filesystems
  fs: add i_user_ns() helper
  fs: port higher-level mapping helpers
  fs: remove unused low-level mapping helpers
  fs: use low-level mapping helpers
  docs: update mapping documentation
  fs: account for filesystem mappings
  fs: tweak fsuidgid_has_mapping()
  fs: move mapping helpers
  fs: add is_idmapped_mnt() helper
  Revert "fs: add is_idmapped_mnt() helper"
  Revert "fs: move mapping helpers"
  Revert "fs: tweak fsuidgid_has_mapping()"
  Revert "fs: account for filesystem mappings"
  Revert "docs: update mapping documentation"
  Revert "fs: use low-level mapping helpers"
  Revert "fs: remove unused low-level mapping helpers"
  Revert "fs: add i_user_ns() helper"
  Revert "fs: account for group membership"
  fsverity: simplify fsverity_get_digest()
  fsverity: stop using PG_error to track error status
  fs-verity: use kmap_local_page() instead of kmap()
  highmem: Make __kunmap_{local,atomic}() take const void pointer
  fs-verity: use memcpy_from_page()
  fs-verity: Use struct_size() helper in enable_verity()
  fs-verity: remove unused parameter desc_size in fsverity_create_info()
  fs-verity: define a function to return the integrity protected file digest
  fscrypt: add additional documentation for SM4 support
  fscrypt: remove unused Speck definitions
  fscrypt: Add SM4 XTS/CTS symmetric algorithm support
  blk-crypto: Add support for SM4-XTS blk crypto mode
  fscrypt: add comment for fscrypt_valid_enc_modes_v1()
  blk-crypto: Add a missing include directive
  blk-crypto: move internal only declarations to blk-crypto-internal.h
  blk-crypto: add a blk_crypto_config_supported_natively helper
  blk-crypto: don't use struct request_queue for public interfaces
  fscrypt: pass super_block to fscrypt_put_master_key_activeref()
  fscrypt: add fscrypt_context_for_new_inode
  fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  fscrypt: Add HCTR2 support for filename encryption
  fs: account for group membership
  fs: add i_user_ns() helper
  fs: remove unused low-level mapping helpers
  fs: use low-level mapping helpers
  docs: update mapping documentation
  fs: account for filesystem mappings
  fs: tweak fsuidgid_has_mapping()
  fs: move mapping helpers
  fs: add is_idmapped_mnt() helper

Bug: 256243893
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: Ia7b419ef516dee40be32968cd7c60ce0adabca99
2023-01-04 17:13:29 -08:00
Minchan Kim
f74aca771c BACKPORT: mm: don't be stuck to rmap lock on reclaim path
The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended
under memory pressure if processes keep working on their vmas(e.g., fork,
mmap, munmap).  It makes reclaim path stuck.  In our real workload traces,
we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it
makes other processes entering direct reclaim, which were also stuck on
the lock.

This patch makes lru aging path try_lock mode like shink_page_list so the
reclaim context will keep working with next lru pages without being stuck.
if it found the rmap lock contended, it rotates the page back to head of
lru in both active/inactive lrus to make them consistent behavior, which
is basic starting point rather than adding more heristic.

Since this patch introduces a new "contended" field as out-param along
with try_lock in-param in rmap_walk_control, it's not immutable any longer
if the try_lock is set so remove const keywords on rmap related functions.
Since rmap walking is already expensive operation, I doubt the const
would help sizable benefit( And we didn't have it until 5.17).

In a heavy app workload in Android, trace shows following statistics.  It
almost removes rmap lock contention from reclaim path.

Martin Liu reported:

Before:

   max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
         1632            0            1631   151.542173        31672    209  page_lock_anon_vma_read
          601            0             601   145.544681        28817    198  rmap_walk_file

After:

   max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
          NaN          NaN              NaN          NaN          NaN    0.0             NaN
            0            0                0     0.127645            1     12  rmap_walk_file

[minchan@kernel.org: add comment, per Matthew]
  Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: John Dias <joaodias@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	folio->page

Conflicts:
mm/huge_memory.c was refactored by
commit c5b5a3dd2c mm: thp: refactor NUMA fault handling

(cherry picked from commit 6d4675e601357834dadd2ba1d803f6484596015c)
Bug: 239681156
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I0c63e0291120c8a1b5f2d83b8a7b210cb56c27a2
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit b8762fa2655469fb3f2c40a38f67e6f05afe4136)
2023-01-04 02:28:34 +00:00
Charan Teja Kalla
8ca606e98b FROMLIST: mm: fix use-after free of page_ext after race with memory-offline
The below is one path where race between page_ext and  offline of the
respective memory blocks will cause use-after-free on the access of
page_ext structure.

process1		              process2
---------                             ---------
a)doing /proc/page_owner           doing memory offline
			           through offline_pages.

b)PageBuddy check is failed
thus proceed to get the
page_owner information
through page_ext access.
page_ext = lookup_page_ext(page);

				    migrate_pages();
				    .................
				Since all pages are successfully
				migrated as part of the offline
				operation,send MEM_OFFLINE notification
				where for page_ext it calls:
				offline_page_ext()-->
				__free_page_ext()-->
				   free_page_ext()-->
				     vfree(ms->page_ext)
			           mem_section->page_ext = NULL

c) Check for the PAGE_EXT flags
in the page_ext->flags access
results into the use-after-free(leading
to the translation faults).

As mentioned above, there is really no synchronization between page_ext
access and its freeing in the memory_offline.

The memory offline steps(roughly) on a memory block is as below:
1) Isolate all the pages
2) while(1)
  try free the pages to buddy.(->free_list[MIGRATE_ISOLATE])
3) delete the pages from this buddy list.
4) Then free page_ext.(Note: The struct page is still alive as it is
freed only during hot remove of the memory which frees the memmap, which
steps the user might not perform).

This design leads to the state where struct page is alive but the struct
page_ext is freed, where the later is ideally part of the former which
just representing the page_flags (check [3] for why this design is
chosen).

The above mentioned race is just one example __but the problem persists
in the other paths too involving page_ext->flags access(eg:
page_is_idle())__.

Fix all the paths where offline races with page_ext access by
maintaining synchronization with rcu lock and is achieved in 3 steps:
1) Invalidate all the page_ext's of the sections of a memory block by
storing a flag in the LSB of mem_section->page_ext.

2) Wait till all the existing readers to finish working with the
->page_ext's with synchronize_rcu(). Any parallel process that starts
after this call will not get page_ext, through lookup_page_ext(), for
the block parallel offline operation is being performed.

3) Now safely free all sections ->page_ext's of the block on which
offline operation is being performed.

Note: If synchronize_rcu() takes time then optimizations can be done in
this path through call_rcu()[2].

Thanks to David Hildenbrand for his views/suggestions on the initial
discussion[1] and Pavan kondeti for various inputs on this patch.

[1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicinc.com/
[2] https://lore.kernel.org/all/a26ce299-aed1-b8ad-711e-a49e82bdd180@quicinc.com/T/#u
[3] https://lore.kernel.org/all/6fa6b7aa-731e-891c-3efb-a03d6a700efa@redhat.com/

Bug: 236222283
Bug: 240196534
Link: https://lore.kernel.org/all/1661496993-11473-1-git-send-email-quic_charante@quicinc.com/
Change-Id: Ib439ae19c61a557a5c70ea90e3c4b35a5583ba0d
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Signed-off-by: Minchan Kim <minchan@google.com>
(fixed merge conflicts and still exported lookup_page_ext)
(minchan: fixed page_pinner with new page_ext scheme)
2023-01-04 02:18:52 +00:00
Minchan Kim
e12acd3eef ANDROID: mm: introduce page_pinner
For CMA allocation, it's really critical to migrate a page but
sometimes it fails. One of the reasons is some driver holds a
page refcount for a long time so VM couldn't migrate the page
at that time.

The concern here is there is no way to find the who hold the
refcount of the page effectively. This patch introduces feature
to keep tracking page's pinner. All get_page sites are vulnerable
to pin a page for a long time but the cost to keep track it would
be significat since get_page is the most frequent kernel operation.
Furthermore, the page could be not user page but kernel page which
is not related to the page migration failure.

Thus, this patch keeps tracks of only migration failed pages to
reduce runtime cost. Once page migration fails in CMA allocation
path, those pages are marked as "migration failure" and every
put_page operation against those pages, callstack of the put
are recorded into page_pinner buffer. Later, admin can see
what pages were failed and who released the refcount since the
failure. It really helps effectively to find out longtime refcount
holder to prevent the page migration.

note: page_pinner doesn't guarantee attributing/unattributing are
atomic if they happen at the same time. It's just best effort so
false-positive could happen.

Bug: 183414571
BUg: 240196534
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I603d0c0122734c377db6b1eb95848a6f734173a0
(cherry picked from commit 898cfbf094a2fc13c67fab5b5d3c916f0139833a)
2023-01-04 02:18:52 +00:00
Suren Baghdasaryan
a21b3ffd99 ANDROID: remove unnecessary SPECULATIVE_PAGE_FAULT config dependency
After recent fixes [1], speculative page fault walks are performed with
disabled interrupts, therefore do not depend on ALLOC_SPLIT_PTLOCKS
which would affect them if performed under RCU protection. Remove
unnecessary config dependency.

[1] 5fcb50b0559a ("ANDROID: mm: fix speculative walk which is unsafe under RCU")

Bug: 253557903
Change-Id: Ia1c835c7b08419f8fce61fa4f7e6842fbf786229
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2022-12-29 20:48:14 +00:00
Minchan Kim
98f3cc7ecd ANDROID: mm: freeing MIGRATE_ISOLATE page instantly
Since Android has pcp list for MIGRATE_CMA[1], it could cause
CMA allocation latency due to not freeing the MIGRATE_ISOLATE
page immediately.

Originally, MIGRATE_ISOLATED page is supposed to go buddy list
with skipping pcp list. Otherwise, the page could be reallocated
from pcp list or staying on the pcp list until the pcp is drained
so that CMA keeps retrying since it couldn't find the freed page
from buddy list. That worked before since the CMA pfnblocks changed
only from MIGRATE_CMA to MIGRATE_ISOLATE and free function logic
in page allocator has checked MIGRATE_ISOLATEness on every CMA
pages using below.

  free_unref_page_commit
    if (migratetype >= MIGRATE_PCPTYPES)
      if(is_migrate_isolate(migratetype))
        free_one_page(page);

It worked since enum MIGRATE_CMA was bigger than enum
MIGRATE_PCPTYPES but since [1], the enum MIGRATE_CMA is less than
MIGRATE_PCPTYPES so the logic above doesn't work any more.

It could cause following race

         CPU 0	                          CPU 1
  free_unref_page
  migratetype = get_pfnblock_migratetype()
  set_pcppage_migratetype(MIGRATE_CMA)

                                cma_alloc
				alloc_contig_range
                              	set_migrate_isolate(MIGRATE_ISOLATE)
  add the page into pcp list
  the page could be reallocated

This patch couldn't fix the race completely due to missing zone->lock
in order-0 page free(for performance reason). However, it's not a new
problem so we need to deal with the issue separately.

[1] ANDROID: mm: add cma pcp list

Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ibea20085ce5bfb4b74b83b041f9bda9a380120f9
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit d9e4b67784866047e8cfb5598cdf1ebc0c71f3d9)
2022-12-29 09:09:24 +00:00