kernel_arpi

Author	SHA1	Message	Date
Connor O'Brien	31f50e8abc	FROMGIT: bpf: btf: limit logging of ignored BTF mismatches Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF mismatches are expected and module loading should proceed anyway. Logging with pr_warn() on every one of these "benign" mismatches creates unnecessary noise when many such modules are loaded. Instead, handle this case with a single log warning that BTF info may be unavailable. Mismatches also result in calls to __btf_verifier_log() via __btf_verifier_log_type() or btf_verifier_log_member(), adding several additional lines of logging per mismatched module. Add checks to these paths to skip logging for module BTF mismatches in the "allow mismatch" case. All existing logging behavior is preserved in the default CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case. Signed-off-by: Connor O'Brien <connoro@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230107025331.3240536-1-connoro@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Bug: 260025057 Bug: 256578296 Bug: 258989318 (cherry picked from commit 9cb61e50bf6bf54db712bba6cf20badca4383f96 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master) Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I2f9d19a2027610af2e825fc38fd08f56f4cd7091	2023-01-17 23:18:26 +00:00
Minchan Kim	eebff8eab2	FROMLIST: mm: cma: introduce gfp flag in cma_alloc instead of no_warn The upcoming patch will introduce __GFP_NORETRY semantic in alloc_contig_range which is a failfast mode of the API. Instead of adding a additional parameter for gfp, replace no_warn with gfp flag. To keep old behaviors, it follows the rule below. no_warn gfp_flags false GFP_KERNEL true GFP_KERNEL\|__GFP_NOWARN gfp & __GFP_NOWARN GFP_KERNEL \| (gfp & __GFP_NOWARN) Bug: 170340257 Bug: 120293424 Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m36b144ff81fe0a8f0ecaf6813de4819ecc41f8fe Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I1ce020ab5d5fff34eb6464be4632ddef72fb43eb Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `23ba990a3e`)	2023-01-05 02:16:50 +00:00
Minchan Kim	abafa1328b	BACKPORT: locking: Add missing __sched attributes This patch adds __sched attributes to a few missing places to show blocked function rather than locking function in get_wchan. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220115231657.84828-1-minchan@kernel.org Conflicts: kernel/locking/percpu-rwsem.c 1. conflict <linux/sched/debug.h> Bug: 228243692 Change-Id: Ifb50c13cfdd7484269d9a291a8da515e1cce6a7b (cherry picked from commit c441e934b604a3b5f350a9104124cf6a3ba07a34) Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit da358e264cbed0240debe37fa6867c81f0748bf8)	2023-01-04 02:28:50 +00:00
Greg Kroah-Hartman	5a91f1aa85	Merge 5.15.84 into android14-5.15 Changes in 5.15.84 x86/vdso: Conditionally export __vdso_sgx_enter_enclave() vfs: fix copy_file_range() averts filesystem freeze protection nfp: fix use-after-free in area_cache_get() ASoC: fsl_micfil: explicitly clear software reset bit ASoC: fsl_micfil: explicitly clear CHnF flags ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx() libbpf: Use page size as max_entries when probing ring buffer map pinctrl: meditatek: Startup with the IRQs disabled can: sja1000: fix size of OCR_MODE_MASK define can: mcba_usb: Fix termination command argument net: fec: don't reset irq coalesce settings to defaults on "ip link up" ASoC: cs42l51: Correct PGA Volume minimum value perf: Fix perf_pending_task() UaF nvme-pci: clear the prp2 field when not used ASoC: ops: Correct bounds check for second channel on SX controls net: fec: properly guard irq coalesce setup Linux 5.15.84 Change-Id: I34ef5e73fca9da9a77c89b1f0c7ad4af37b63a79 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2022-12-19 17:29:08 +01:00
Ramji Jiyani	a6eaf3db80	ANDROID: GKI: Only protect exports if KMI symbols are present Only enforce export protection if there are symbols in the unprotected list for the Kernel Module Interface (KMI). This is only relevant for targets like arm64 that have defined ABI symbol lists. This allows non-GKI targets like arm and x86 to continue using GKI source code without disabling the feature for those targets. Bug: 232430739 Test: TH Fixes: `fd1e768866` ("ANDROID: GKI: Protect exports of protected GKI modules") Change-Id: Ie89e8f63eda99d9b7aacd1bb76d036b3ff4ba37c Signed-off-by: Ramji Jiyani <ramjiyani@google.com>	2022-12-19 16:16:00 +00:00
Peter Zijlstra	8bffa95ac1	perf: Fix perf_pending_task() UaF [ Upstream commit 517e6a301f34613bff24a8e35b5455884f2d83d8 ] Per syzbot it is possible for perf_pending_task() to run after the event is free()'d. There are two related but distinct cases: - the task_work was already queued before destroying the event; - destroying the event itself queues the task_work. The first cannot be solved using task_work_cancel() since perf_release() itself might be called from a task_work (____fput), which means the current->task_works list is already empty and task_work_cancel() won't be able to find the perf_pending_task() entry. The simplest alternative is extending the perf_event lifetime to cover the task_work. The second is just silly, queueing a task_work while you know the event is going away makes no sense and is easily avoided by re-arranging how the event is marked STATE_DEAD and ensuring it goes through STATE_OFF on the way down. Reported-by: syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-19 12:36:43 +01:00
Ramji Jiyani	fd1e768866	ANDROID: GKI: Protect exports of protected GKI modules Implement support for protecting the exported symbols of protected GKI modules. Only signed GKI modules are permitted to export symbols listed in the android/abi_gki_protected_exports file. Attempting to export these symbols from an unsigned module will result in the module failing to load, with a 'Permission denied' error message. Bug: 232430739 Test: TH Change-Id: I3e8b330938e116bb2e022d356ac0d55108a84a01 Signed-off-by: Ramji Jiyani <ramjiyani@google.com>	2022-12-16 16:44:54 +00:00
Greg Kroah-Hartman	63696badd9	Merge 5.15.83 into android14-5.15 Changes in 5.15.83 clk: generalize devm_clk_get() a bit clk: Provide new devm_clk helpers for prepared and enabled clocks mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse() arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series arm: dts: rockchip: fix node name for hym8563 rtc arm: dts: rockchip: remove clock-frequency from rtc ARM: dts: rockchip: fix ir-receiver node names arm64: dts: rockchip: fix ir-receiver node names ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name fs: use acquire ordering in __fget_light() ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation ASoC: wm8962: Wait for updated value of WM8962_CLOCKING1 register spi: mediatek: Fix DEVAPC Violation at KO Remove ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188 ASoC: rt711-sdca: fix the latency time of clock stop prepare state machine transitions 9p/fd: Use P9_HDRSZ for header size regulator: slg51000: Wait after asserting CS pin ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event selftests/net: Find nettest in current directory btrfs: send: avoid unaligned encoded writes when attempting to clone range ASoC: soc-pcm: Add NULL check in BE reparenting regulator: twl6030: fix get status of twl6032 regulators fbcon: Use kzalloc() in fbcon_prepare_logo() usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer 9p/xen: check logical size for buffer size net: usb: qmi_wwan: add u-blox 0x1342 composition mm/khugepaged: take the right locks for page table retraction mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths rtc: mc146818-lib: extract mc146818_avoid_UIP rtc: cmos: avoid UIP when writing alarm time rtc: cmos: avoid UIP when reading alarm time cifs: fix use-after-free caused by invalid pointer `hostname` drm/bridge: anx7625: Fix edid_read break case in sp_tx_edid_read() xen/netback: Ensure protocol headers don't fall in the non-linear area xen/netback: do some code cleanup xen/netback: don't call kfree_skb() with interrupts disabled media: videobuf2-core: take mmap_lock in vb2_get_unmapped_area() soundwire: intel: Initialize clock stop timeout Revert "ARM: dts: imx7: Fix NAND controller size-cells" media: v4l2-dv-timings.c: fix too strict blanking sanity checks memcg: fix possible use-after-free in memcg_write_event_control() mm/gup: fix gup_pud_range() for dax Bluetooth: btusb: Add debug message for CSR controllers Bluetooth: Fix crash when replugging CSR fake controllers net: mana: Fix race on per-CQ variable napi work_done KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field drm/vmwgfx: Don't use screen objects when SEV is active drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspend drm/shmem-helper: Remove errant put in error path drm/shmem-helper: Avoid vm_open error paths net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing() HID: usbhid: Add ALWAYS_POLL quirk for some mice HID: hid-lg4ff: Add check for empty lbuf HID: core: fix shift-out-of-bounds in hid_report_raw_event HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10 can: af_can: fix NULL pointer dereference in can_rcv_filter clk: Fix pointer casting to prevent oops in devm_clk_release() gpiolib: improve coding style for local variables gpiolib: check the 'ngpios' property in core gpiolib code gpiolib: fix memory leak in gpiochip_setup_dev() netfilter: nft_set_pipapo: Actually validate intervals in fields after the first one drm/vmwgfx: Fix race issue calling pin_user_pages ieee802154: cc2520: Fix error return code in cc2520_hw_init() ca8210: Fix crash by zero initializing data netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark drm/bridge: ti-sn65dsi86: Fix output polarity setting bug gpio: amd8111: Fix PCI device reference count leak e1000e: Fix TX dispatch condition igb: Allocate MSI-X vector when testing net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835 drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420 af_unix: Get user_ns from in_skb in unix_diag_get_exact(). vmxnet3: correctly report encapsulated LRO packet vmxnet3: use correct intrConf reference when using extended queues Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn() Bluetooth: Fix not cleanup led when bt_init fails net: dsa: ksz: Check return value net: dsa: hellcreek: Check return value net: dsa: sja1105: Check return value selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() net: encx24j600: Add parentheses to fix precedence net: encx24j600: Fix invalid logic in reading of MISTAT register net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: mdiobus: fix double put fwnode in the error path octeontx2-pf: Fix potential memory leak in otx2_init_tc() xen-netfront: Fix NULL sring after live migration net: mvneta: Prevent out of bounds read in mvneta_config_rss() i40e: Fix not setting default xps_cpus after reset i40e: Fix for VF MAC address 0 i40e: Disallow ip4 and ip6 l4_4_bytes NFC: nci: Bounds check struct nfc_target arrays nvme initialize core quirks before calling nvme_init_subsystem gpio/rockchip: fix refcount leak in rockchip_gpiolib_register() net: stmmac: fix "snps,axi-config" node property parsing ip_gre: do not report erspan version on GRE interface net: microchip: sparx5: Fix missing destroy_workqueue of mact_queue net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq net: hisilicon: Fix potential use-after-free in hisi_femac_rx() net: mdio: fix unbalanced fwnode reference count in mdio_device_release() net: hisilicon: Fix potential use-after-free in hix5hd2_rx() tipc: Fix potential OOB in tipc_link_proto_rcv() ipv4: Fix incorrect route flushing when source address is deleted ipv4: Fix incorrect route flushing when table ID 0 is used net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions() tipc: call tipc_lxc_xmit without holding node_read_lock ethernet: aeroflex: fix potential skb leak in greth_init_rings() dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove() xen/netback: fix build warning net: phy: mxl-gpy: fix version reporting net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq() ipv6: avoid use-after-free in ip6_fragment() net: thunderbolt: fix memory leak in tbnet_open() net: mvneta: Fix an out of bounds check macsec: add missing attribute validation for offload s390/qeth: fix various format strings s390/qeth: fix use-after-free in hsci can: esd_usb: Allow REC and TEC to return to zero block: move CONFIG_BLOCK guard to top Makefile io_uring: move to separate directory io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() Linux 5.15.83 Change-Id: I08ef74d6ad8786c191050294dcbf1090908e7c4d Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2022-12-14 13:04:18 +01:00
Jens Axboe	f435c66d23	io_uring: move to separate directory [ Upstream commit ed29b0b4fd835b058ddd151c49d021e28d631ee6 ] In preparation for splitting io_uring up a bit, move it into its own top level directory. It didn't really belong in fs/ anyway, as it's not a file system only API. This adds io_uring/ and moves the core files in there, and updates the MAINTAINERS file for the new location. Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 998b30c3948e ("io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-14 11:37:31 +01:00
Tejun Heo	aad8bbd17a	memcg: fix possible use-after-free in memcg_write_event_control() commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream. memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to `347c4a8747` ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: `347c4a8747` ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> [3.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-14 11:37:19 +01:00
Rick Yiu	757d548ca6	ANDROID: kernel: sched: Export reweight_task Export reweight_task for vendor usage when they are trying to manipulate task prio. After the prio changed, it will need to update its load weight to take effect. Therefore, this function needs to be called from vendor kernel module. It could be used with trace_android_rvh_set_user_nice and trace_android_rvh_setscheduler. Bug: 245675204 Change-Id: I0033518bf1cbd0a8129795743b95340f439d5fe8 Signed-off-by: Rick Yiu <rickyiu@google.com> (cherry picked from commit db144888f871fa149c0d18844faa971a3bf8f4fc)	2022-12-09 08:34:21 +00:00
Greg Kroah-Hartman	b3d82fd3de	Merge 5.15.82 into android14-5.15 Changes in 5.15.82 arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored drm/i915: Create a dummy object for gen6 ppgtt drm/i915/gt: Use i915_vm_put on ppgtt_create error paths erofs: fix order >= MAX_ORDER warning due to crafted negative i_size btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino btrfs: free btrfs_path before copying inodes to userspace spi: spi-imx: Fix spi_bus_clk if requested clock is higher than input clock btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit() drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs() return code drm/amdgpu: update drm_display_info correctly when the edid is read drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read" iio: health: afe4403: Fix oob read in afe4403_read_raw iio: health: `afe4404`: Fix oob read in afe4404_[read\|write]_raw iio: light: rpr0521: add missing Kconfig dependencies bpf, perf: Use subprog name when reporting subprog ksymbol scripts/faddr2line: Fix regression in name resolution on ppc64le ARM: at91: rm9200: fix usb device clock id libbpf: Handle size overflow for ringbuf mmap hwmon: (ltc2947) fix temperature scaling hwmon: (ina3221) Fix shunt sum critical calculation hwmon: (i5500_temp) fix missing pci_disable_device() hwmon: (ibmpex) Fix possible UAF when ibmpex_register_bmc() fails bpf: Do not copy spin lock field from user in bpf_selem_alloc nvmem: rmem: Fix return value check in rmem_read() of: property: decrement node refcount in of_fwnode_get_reference_args() ixgbevf: Fix resource leak in ixgbevf_init_module() i40e: Fix error handling in i40e_init_module() fm10k: Fix error handling in fm10k_init_module() iavf: remove redundant ret variable iavf: Fix error handling in iavf_init_module() e100: Fix possible use after free in e100_xmit_prepare net/mlx5: DR, Rename list field in matcher struct to list_node net/mlx5: DR, Fix uninitialized var warning net/mlx5: Fix uninitialized variable bug in outlen_write() net/mlx5e: Fix use-after-free when reverting termination table can: sja1000_isa: sja1000_isa_probe(): add missing free_sja1000dev() can: cc770: cc770_isa_probe(): add missing free_cc770dev() can: etas_es58x: es58x_init_netdev(): free netdev when register_candev() can: m_can: pci: add missing m_can_class_free_dev() in probe/remove methods can: m_can: Add check for devm_clk_get qlcnic: fix sleep-in-atomic-context bugs caused by msleep aquantia: Do not purge addresses when setting the number of rings wifi: cfg80211: fix buffer overflow in elem comparison wifi: cfg80211: don't allow multi-BSSID in S1G wifi: mac8021: fix possible oob access in ieee80211_get_rate_duration net: phy: fix null-ptr-deref while probe() failed net: ethernet: ti: am65-cpsw: fix error handling in am65_cpsw_nuss_probe() net: net_netdev: Fix error handling in ntb_netdev_init_module() net/9p: Fix a potential socket leak in p9_socket_open net: ethernet: nixge: fix NULL dereference net: wwan: iosm: fix kernel test robot reported error net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type dsa: lan9303: Correct stat name tipc: re-fetch skb cb after tipc_msg_validate net: hsr: Fix potential use-after-free net: mdiobus: fix unbalanced node reference count afs: Fix fileserver probe RTT handling net: tun: Fix use-after-free in tun_detach() packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE sctp: fix memory leak in sctp_stream_outq_migrate() net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed hwmon: (coretemp) Check for null before removing sysfs attrs hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new() riscv: vdso: fix section overlapping under some conditions riscv: mm: Proper page permissions after initmem free ALSA: dice: fix regression for Lexicon I-ONIX FW810S error-injection: Add prompt for function error injection tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3 pinctrl: intel: Save and restore pins in "direct IRQ" mode v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails net: stmmac: Set MAC's flow control register to reflect current settings mmc: mmc_test: Fix removal of debugfs file mmc: core: Fix ambiguous TRIM and DISCARD arg mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check mmc: sdhci-sprd: Fix no reset data and command after voltage switch mmc: sdhci: Fix voltage switch delay drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame drm/amdgpu: enable Vangogh VCN indirect sram mode drm/i915: Fix negative value passed as remaining time drm/i915: Never return 0 if not all requests retired tracing/osnoise: Fix duration type tracing: Fix race where histograms can be called before the event tracing: Free buffers when a used dynamic event is removed io_uring: update res mask in io_poll_check_events io_uring: fix tw losing poll events io_uring: cmpxchg for poll arm refs release io_uring: make poll refs more robust io_uring/poll: fix poll_refs race with cancelation KVM: x86/mmu: Fix race condition in direct_page_fault ASoC: ops: Fix bounds check for _sx controls pinctrl: single: Fix potential division by zero riscv: Sync efi page table's kernel mappings before switching riscv: fix race when vmap stack overflow riscv: kexec: Fixup irq controller broken in kexec crash path nvme: fix SRCU protection of nvme_ns_head list iommu/vt-d: Fix PCI device refcount leak in has_external_pci() iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init() mm: __isolate_lru_page_prepare() in isolate_migratepages_block() mm: migrate: fix THP's mapcount on isolation parisc: Increase FRAME_WARN to 2048 bytes on parisc Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled selftests: net: add delete nexthop route warning test selftests: net: fix nexthop warning cleanup double ip typo ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference ipv4: Fix route deletion when nexthop info is not specified serial: stm32: Factor out GPIO RTS toggling into separate function serial: stm32: Use TC interrupt to deassert GPIO RTS in RS485 mode serial: stm32: Deassert Transmit Enable on ->rs485_config() i2c: npcm7xx: Fix error handling in npcm_i2c_init() i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set ACPI: HMAT: remove unnecessary variable initialization ACPI: HMAT: Fix initiator registration for single-initiator systems Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend" char: tpm: Protect tpm_pm_suspend with locks Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send() ipc/sem: Fix dangling sem_array access in semtimedop race proc: avoid integer type confusion in get_proc_long proc: proc_skip_spaces() shouldn't think it is working on C strings Linux 5.15.82 Change-Id: I1631261fcfd321674c546c41628bb3283e02f1a5 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2022-12-08 16:43:35 +00:00
Linus Torvalds	48642f9431	proc: proc_skip_spaces() shouldn't think it is working on C strings commit bce9332220bd677d83b19d21502776ad555a0e73 upstream. proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-08 11:28:45 +01:00
Linus Torvalds	3eb9213f66	proc: avoid integer type confusion in get_proc_long commit e6cfaf34be9fcd1a8285a294e18986bfc41a409c upstream. proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the rigth type. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-08 11:28:45 +01:00
Steven Rostedt (Google)	417d5ea6e7	tracing: Free buffers when a used dynamic event is removed commit 4313e5a613049dfc1819a6dfb5f94cf2caff9452 upstream. After 65536 dynamic events have been added and removed, the "type" field of the event then uses the first type number that is available (not currently used by other events). A type number is the identifier of the binary blobs in the tracing ring buffer (known as events) to map them to logic that can parse the binary blob. The issue is that if a dynamic event (like a kprobe event) is traced and is in the ring buffer, and then that event is removed (because it is dynamic, which means it can be created and destroyed), if another dynamic event is created that has the same number that new event's logic on parsing the binary blob will be used. To show how this can be an issue, the following can crash the kernel: # cd /sys/kernel/tracing # for i in `seq 65536`; do echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events # done For every iteration of the above, the writing to the kprobe_events will remove the old event and create a new one (with the same format) and increase the type number to the next available on until the type number reaches over 65535 which is the max number for the 16 bit type. After it reaches that number, the logic to allocate a new number simply looks for the next available number. When an dynamic event is removed, that number is then available to be reused by the next dynamic event created. That is, once the above reaches the max number, the number assigned to the event in that loop will remain the same. Now that means deleting one dynamic event and created another will reuse the previous events type number. This is where bad things can happen. After the above loop finishes, the kprobes/foo event which reads the do_sys_openat2 function call's first parameter as an integer. # echo 1 > kprobes/foo/enable # cat /etc/passwd > /dev/null # cat trace cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 # echo 0 > kprobes/foo/enable Now if we delete the kprobe and create a new one that reads a string: # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events And now we can the trace: # cat trace sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="��" cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="��" cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="��" cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="��" bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk��@��4Z��;Y��U And dmesg has: ================================================================== BUG: KASAN: use-after-free in string+0xd4/0x1c0 Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049 CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: <TASK> dump_stack_lvl+0x5b/0x77 print_report+0x17f/0x47b kasan_report+0xad/0x130 string+0xd4/0x1c0 vsnprintf+0x500/0x840 seq_buf_vprintf+0x62/0xc0 trace_seq_printf+0x10e/0x1e0 print_type_string+0x90/0xa0 print_kprobe_event+0x16b/0x290 print_trace_line+0x451/0x8e0 s_show+0x72/0x1f0 seq_read_iter+0x58e/0x750 seq_read+0x115/0x160 vfs_read+0x11d/0x460 ksys_read+0xa9/0x130 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fc2e972ade2 Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2 RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003 RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000 R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 </TASK> The buggy address belongs to the physical page: page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb flags: 0xfffffc0000000(node=0\|zone=1\|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== This was found when Zheng Yejian sent a patch to convert the event type number assignment to use IDA, which gives the next available number, and this bug showed up in the fuzz testing by Yujie Liu and the kernel test robot. But after further analysis, I found that this behavior is the same as when the event type numbers go past the 16bit max (and the above shows that). As modules have a similar issue, but is dealt with by setting a "WAS_ENABLED" flag when a module event is enabled, and when the module is freed, if any of its events were enabled, the ring buffer that holds that event is also cleared, to prevent reading stale events. The same can be done for dynamic events. If any dynamic event that is being removed was enabled, then make sure the buffers they were enabled in are now cleared. Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/ Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function") Depends-on: `5448d44c38` ("tracing: Add unified dynamic event framework") Depends-on: `6212dd2968` ("tracing/kprobes: Use dyn_event framework for kprobe events") Depends-on: `065e63f951` ("tracing: Only have rmmod clear buffers that its events were active in") Depends-on: `575380da8b` ("tracing: Only clear trace buffer on module unload if event was traced") Fixes: `77b44d1b7c` ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event") Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Reported-by: Yujie Liu <yujie.liu@intel.com> Reported-by: kernel test robot <yujie.liu@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-08 11:28:43 +01:00
Steven Rostedt (Google)	52fc245d15	tracing: Fix race where histograms can be called before the event commit ef38c79a522b660f7f71d45dad2d6244bc741841 upstream. commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before the event") fixed an issue where if an event is soft disabled, and the trigger is being added, there's a small window where the event sees that there's a trigger but does not see that it requires reading the event yet, and then calls the trigger with the record == NULL. This could be solved with adding memory barriers in the hot path, or to make sure that all the triggers requiring a record check for NULL. The latter was chosen. Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but the same needs to be done with histograms. Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: `7491e2c442` ("tracing: Add a probe that attaches to trace events") Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-08 11:28:43 +01:00
Daniel Bristot de Oliveira	cb2b0612cd	tracing/osnoise: Fix duration type commit 022632f6c43a86f2135642dccd5686de318e861d upstream. The duration type is a 64 long value, not an int. This was causing some long noise to report wrong values. Change the duration to a 64 bits value. Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Fixes: `bce29ac9ce` ("trace: Add osnoise tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-08 11:28:43 +01:00
Xu Kuohai	e376183167	bpf: Do not copy spin lock field from user in bpf_selem_alloc [ Upstream commit 836e49e103dfeeff670c934b7d563cbd982fce87 ] bpf_selem_alloc function is used by inode_storage, sk_storage and task_storage maps to set map value, for these map types, there may be a spin lock in the map value, so if we use memcpy to copy the whole map value from user, the spin lock field may be initialized incorrectly. Since the spin lock field is zeroed by kzalloc, call copy_map_value instead of memcpy to skip copying the spin lock field to fix it. Fixes: `6ac99e8f23` ("bpf: Introduce bpf sk local storage") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20221114134720.1057939-2-xukuohai@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-08 11:28:39 +01:00
Hou Tao	ee3d37d796	bpf, perf: Use subprog name when reporting subprog ksymbol [ Upstream commit 47df8a2f78bc34ff170d147d05b121f84e252b85 ] Since commit `bfea9a8574` ("bpf: Add name to struct bpf_ksym"), when reporting subprog ksymbol to perf, prog name instead of subprog name is used. The backtrace of bpf program with subprogs will be incorrect as shown below: ffffffffc02deace bpf_prog_e44a3057dcb151f8_overwrite+0x66 ffffffffc02de9f7 bpf_prog_e44a3057dcb151f8_overwrite+0x9f ffffffffa71d8d4e trace_call_bpf+0xce ffffffffa71c2938 perf_call_bpf_enter.isra.0+0x48 overwrite is the entry program and it invokes the overwrite_htab subprog through bpf_loop, but in above backtrace, overwrite program just jumps inside itself. Fixing it by using subprog name when reporting subprog ksymbol. After the fix, the output of perf script will be correct as shown below: ffffffffc031aad2 bpf_prog_37c0bec7d7c764a4_overwrite_htab+0x66 ffffffffc031a9e7 bpf_prog_c7eb827ef4f23e71_overwrite+0x9f ffffffffa3dd8d4e trace_call_bpf+0xce ffffffffa3dc2938 perf_call_bpf_enter.isra.0+0x48 Fixes: `bfea9a8574` ("bpf: Add name to struct bpf_ksym") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20221114095733.158588-1-houtao@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-08 11:28:38 +01:00
Qais Yousef	cd65f53a58	ANDROID: sched/uclamp: Don't enable uclamp_is_used static key by in-kernel requests We do have now in-kernel users of uclamp to implement inheritance. The static_branch_enable() path unconditionally holds the cpus_read_lock() which might_sleep(). The path in binder that implements inheritance happens from in_atomic() context which leads to a splat like this one: [ 147.529960] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:56 [ 147.530196] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2586, name: RenderThread [ 147.530410] INFO: lockdep is turned off. [ 147.530518] Preemption disabled at: [ 147.530521] [<ffffffc008ca2cec>] binder_proc_transaction+0x78/0x41c [ 147.530793] CPU: 8 PID: 2586 Comm: RenderThread Tainted: G S W O 5.15.76-android14-5-00086-gc01afe5d262f #1 [ 147.531214] Call trace: [ 147.531288] dump_backtrace+0xe8/0x134 [ 147.531444] show_stack+0x1c/0x4c [ 147.531598] dump_stack_lvl+0x74/0x94 [ 147.531766] dump_stack+0x14/0x3c [ 147.531920] ___might_sleep+0x210/0x230 [ 147.532094] __might_sleep+0x54/0x84 [ 147.532259] cpus_read_lock+0x2c/0x160 [ 147.532429] static_key_enable+0x1c/0x34 [ 147.532608] __sched_setscheduler+0x2a8/0x99c [ 147.532802] sched_setattr_nocheck+0x1c/0x24 [ 147.532994] binder_do_set_priority+0x31c/0x4a4 [ 147.533195] binder_transaction_priority+0x200/0x3f4 [ 147.533413] binder_proc_transaction+0x220/0x41c [ 147.533618] binder_transaction+0x1df0/0x234c [ 147.533812] binder_thread_write+0xd84/0x2398 [ 147.534007] binder_ioctl_write_read+0x19c/0xb28 [ 147.534212] binder_ioctl+0x344/0x1a3c [ 147.534382] __arm64_sys_ioctl+0x94/0xc8 [ 147.534561] invoke_syscall+0x44/0xf8 [ 147.534729] el0_svc_common+0xc8/0x10c [ 147.534900] do_el0_svc+0x20/0x28 [ 147.535053] el0_svc+0x58/0xe0 [ 147.535198] el0t_64_sync_handler+0x7c/0xe4 [ 147.535386] el0t_64_sync+0x188/0x18c Prevent enabling the lock for !user initiated sched_setattr() operations. Generally we don't expect in-kernel uclamp users. Bug: 259145692 Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Iac5be139b5ffd39f5e1c0431ce253133d81b98cf	2022-12-05 20:29:35 +00:00
Greg Kroah-Hartman	880a2689a3	Merge 5.15.81 into android14-5.15 Changes in 5.15.81 ASoC: fsl_sai: use local device pointer ASoC: fsl_asrc fsl_esai fsl_sai: allow CONFIG_PM=N serial: Add rs485_supported to uart_port serial: fsl_lpuart: Fill in rs485_supported tty: serial: fsl_lpuart: don't break the on-going transfer when global reset sctp: remove the unnecessary sinfo_stream check in sctp_prsctp_prune_unsent sctp: clear out_curr if all frag chunks of current msg are pruned cifs: introduce new helper for cifs_reconnect() cifs: split out dfs code from cifs_reconnect() cifs: support nested dfs links over reconnect cifs: Fix connections leak when tlink setup failed ata: libata-scsi: simplify __ata_scsi_queuecmd() ata: libata-core: do not issue non-internal commands once EH is pending drm/display: Don't assume dual mode adaptors support i2c sub-addressing nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro nvme-pci: disable namespace identifiers for the MAXIO MAP1001 nvme-pci: disable write zeroes on various Kingston SSD nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000 iio: ms5611: Simplify IO callback parameters iio: pressure: ms5611: fixed value compensation bug ceph: do not update snapshot context when there is no new snapshot ceph: avoid putting the realm twice when decoding snaps fails x86/sgx: Create utility to validate user provided offset and length x86/sgx: Add overflow check in sgx_validate_offset_length() binder: validate alloc->mm in ->mmap() handler ceph: Use kcalloc for allocating multiple elements ceph: fix NULL pointer dereference for req->r_session wifi: mac80211: fix memory free error when registering wiphy fail wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support riscv: dts: sifive unleashed: Add PWM controlled LEDs audit: fix undefined behavior in bit shift for AUDIT_BIT wifi: airo: do not assign -1 to unsigned char wifi: mac80211: Fix ack frame idr leak when mesh has no route wifi: ath11k: Fix QCN9074 firmware boot on x86 spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run selftests/bpf: Add verifier test for release_reference() Revert "net: macsec: report real_dev features when HW offloading is enabled" platform/x86: ideapad-laptop: Disable touchpad_switch platform/x86: touchscreen_dmi: Add info for the RCA Cambio W101 v2 2-in-1 platform/x86/intel/pmt: Sapphire Rapids PMT errata fix platform/x86/intel/hid: Add some ACPI device IDs scsi: ibmvfc: Avoid path failures during live migration scsi: scsi_debug: Make the READ CAPACITY response compliant with ZBC drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017) block, bfq: fix null pointer dereference in bfq_bio_bfqg() arm64/syscall: Include asm/ptrace.h in syscall_wrapper header. nvmet: fix memory leak in nvmet_subsys_attr_model_store_locked Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly"" ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue RISC-V: vdso: Do not add missing symbols to version section in linker script MIPS: pic32: treat port as signed integer xfrm: fix "disable_policy" on ipv4 early demux xfrm: replay: Fix ESN wrap around for GSO af_key: Fix send_acquire race with pfkey_register ARM: dts: am335x-pcm-953: Define fixed regulators in root node ASoC: hdac_hda: fix hda pcm buffer overflow issue ASoC: sgtl5000: Reset the CHIP_CLK_CTRL reg on remove ASoC: soc-pcm: Don't zero TDM masks in __soc_pcm_open() x86/hyperv: Restore VP assist page after cpu offlining/onlining scsi: storvsc: Fix handling of srb_status and capacity change events ASoC: max98373: Add checks for devm_kcalloc regulator: core: fix kobject release warning and memory leak in regulator_register() spi: dw-dma: decrease reference count in dw_spi_dma_init_mfld() regulator: core: fix UAF in destroy_regulator() bus: sunxi-rsb: Remove the shutdown callback bus: sunxi-rsb: Support atomic transfers tee: optee: fix possible memory leak in optee_register_device() ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl selftests: mptcp: more stable simult_flows tests selftests: mptcp: fix mibit vs mbit mix up net: liquidio: simplify if expression rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc rxrpc: Use refcount_t rather than atomic_t rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975] net: dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus nfc/nci: fix race with opening and closing net: pch_gbe: fix potential memleak in pch_gbe_tx_queue() 9p/fd: fix issue of list_del corruption in p9_fd_cancel() netfilter: conntrack: Fix data-races around ct mark netfilter: nf_tables: do not set up extensions for end interval iavf: Fix a crash during reset task iavf: Do not restart Tx queues after reset task failure iavf: Fix race condition between iavf_shutdown and iavf_remove ARM: mxs: fix memory leak in mxs_machine_init() ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties net: ethernet: mtk_eth_soc: fix error handling in mtk_open() net/mlx4: Check retval of mlx4_bitmap_init net: mvpp2: fix possible invalid pointer dereference net/qla3xxx: fix potential memleak in ql3xxx_send() octeontx2-af: debugsfs: fix pci device refcount leak net: pch_gbe: fix pci device refcount leak while module exiting nfp: fill splittable of devlink_port_attrs correctly nfp: add port from netdev validation for EEPROM access macsec: Fix invalid error code set Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work() Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register() netfilter: ipset: regression in ip_set_hash_ip.c net/mlx5: Do not query pci info while pci disabled net/mlx5: Fix FW tracer timestamp calculation net/mlx5: Fix handling of entry refcount when command is not issued to FW tipc: set con sock in tipc_conn_alloc tipc: add an extra conn_get in tipc_conn_alloc tipc: check skb_linearize() return value in tipc_disc_rcv() xfrm: Fix oops in __xfrm_state_delete() xfrm: Fix ignored return value in xfrm6_init() net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg() sfc: fix potential memleak in __ef100_hard_start_xmit() net: sparx5: fix error handling in sparx5_port_open() net: sched: allow act_ct to be built without NF_NAT NFC: nci: fix memory leak in nci_rx_data_packet() regulator: twl6030: re-add TWL6032_SUBCLASS bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending() dma-buf: fix racing conflict of dma_heap_add() netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface netfilter: flowtable_offload: add missing locking fs: do not update freeing inode i_io_list dccp/tcp: Reset saddr on failure after inet6?_hash_connect(). ipv4: Fix error return code in fib_table_insert() arcnet: fix potential memory leak in com20020_probe() s390/dasd: fix no record found for raw_track_access nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION nfc: st-nci: fix memory leaks in EVT_TRANSACTION nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTION net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled net: enetc: cache accesses to &priv->si->hw net: enetc: preserve TX ring priority across reconfiguration octeontx2-pf: Add check for devm_kcalloc octeontx2-af: Fix reference count issue in rvu_sdp_init() net: thunderx: Fix the ACPI memory leak s390/crashdump: fix TOD programmable field size lib/vdso: use "grep -E" instead of "egrep" init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash nios2: add FORCE for vmlinuz.gz mmc: sdhci-brcmstb: Re-organize flags mmc: sdhci-brcmstb: Enable Clock Gating to save power mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI KVM: arm64: pkvm: Fixup boot mode to reflect that the kernel resumes from EL1 usb: dwc3: exynos: Fix remove() function usb: cdnsp: Fix issue with Clear Feature Halt Endpoint usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1 ext4: fix use-after-free in ext4_ext_shift_extents arm64: dts: rockchip: lower rk3399-puma-haikou SD controller clock frequency iio: light: apds9960: fix wrong register for gesture gain iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails bus: ixp4xx: Don't touch bit 7 on IXP42x usb: dwc3: gadget: conditionally remove requests usb: dwc3: gadget: Return -ESHUTDOWN on ep disable usb: dwc3: gadget: Clear ep descriptor last nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty gcov: clang: fix the buffer overflow issue mm: vmscan: fix extreme overreclaim and swap floods KVM: x86: nSVM: leave nested mode on vCPU free KVM: x86: forcibly leave nested mode on vCPU reset KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use KVM: x86: add kvm_leave_nested KVM: x86: remove exit_int_info warning in svm_handle_exit x86/tsx: Add a feature bit for TSX control MSR support x86/pm: Add enumeration check before spec MSRs save/restore setup x86/ioremap: Fix page aligned size calculation in __ioremap_caller() Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode ASoC: Intel: bytcht_es8316: Add quirk for the Nanote UMPC-01 tools: iio: iio_generic_buffer: Fix read size serial: 8250: 8250_omap: Avoid RS485 RTS glitch on ->set_termios() Input: goodix - try resetting the controller when no config is set Input: soc_button_array - add use_low_level_irq module parameter Input: soc_button_array - add Acer Switch V 10 to dmi_use_low_level_irq[] Input: i8042 - apply probe defer to more ASUS ZenBook models ASoC: stm32: dfsdm: manage cb buffers cleanup xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too xen/platform-pci: add missing free_irq() in error path platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr() platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017) drm/amdgpu: disable BACO support on more cards zonefs: fix zone report size in __zonefs_io_error() platform/x86: hp-wmi: Ignore Smart Experience App event platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops tcp: configurable source port perturb table size net: usb: qmi_wwan: add Telit 0x103a composition scsi: iscsi: Fix possible memory leak when device_register() failed gpu: host1x: Avoid trying to use GART on Tegra20 dm integrity: flush the journal on suspend dm integrity: clear the journal on suspend fuse: lock inode unconditionally in fuse_fallocate() wifi: wilc1000: validate pairwise and authentication suite offsets wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_OPER_CHANNEL attribute wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_CHANNEL_LIST attribute wifi: wilc1000: validate number of channels genirq/msi: Shutdown managed interrupts with unsatifiable affinities genirq: Always limit the affinity to online CPUs irqchip/gic-v3: Always trust the managed affinity provided by the core code genirq: Take the proposed affinity at face value if force==true btrfs: free btrfs_path before copying root refs to userspace btrfs: free btrfs_path before copying fspath to userspace btrfs: free btrfs_path before copying subvol info to userspace btrfs: zoned: fix missing endianness conversion in sb_write_pointer btrfs: use kvcalloc in btrfs_get_dev_zone_info btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs() drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN drm/amd/display: No display after resume from WB/CB drm/amdgpu: Enable Aldebaran devices to report CU Occupancy drm/amdgpu: always register an MMU notifier for userptr drm/i915: fix TLB invalidation for Gen12 video and compute engines cifs: fix missed refcounting of ipc tcon Linux 5.15.81 Change-Id: I9fe677a5bc83d3484f86aebbe20dd129c7ca3a42 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2022-12-04 12:02:10 +00:00
Luiz Capitulino	7d0c25b5fe	genirq: Take the proposed affinity at face value if force==true From: Marc Zyngier <maz@kernel.org> commit c48c8b829d2b966a6649827426bcdba082ccf922 upstream. Although setting the affinity of an interrupt to a set of CPUs that doesn't have any online CPU is generally frowned apon, there are a few limited cases where such affinity is set from a CPUHP notifier, setting the affinity to a CPU that isn't online yet. The saving grace is that this is always done using the 'force' attribute, which gives a hint that the affinity setting can be outside of the online CPU mask and the callsite set this flag with the knowledge that the underlying interrupt controller knows to handle it. This restores the expected behaviour on Marek's system. Fixes: 33de0aa4bae9 ("genirq: Always limit the affinity to online CPUs") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-02 17:41:12 +01:00
Luiz Capitulino	52a93f2dcf	genirq: Always limit the affinity to online CPUs From: Marc Zyngier <maz@kernel.org> commit 33de0aa4bae982ed6f7c777f86b5af3e627ac937 upstream. [ Fixed small conflicts due to the HK_FLAG_MANAGED_IRQ flag been renamed on upstream ] When booting with maxcpus=<small number> (or even loading a driver while most CPUs are offline), it is pretty easy to observe managed affinities containing a mix of online and offline CPUs being passed to the irqchip driver. This means that the irqchip cannot trust the affinity passed down from the core code, which is a bit annoying and requires (at least in theory) all drivers to implement some sort of affinity narrowing. In order to address this, always limit the cpumask to the set of online CPUs. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220405185040.206297-3-maz@kernel.org Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-02 17:41:11 +01:00
Luiz Capitulino	599cf4b845	genirq/msi: Shutdown managed interrupts with unsatifiable affinities From: Marc Zyngier <maz@kernel.org> commit d802057c7c553ad426520a053da9f9fe08e2c35a upstream. [ This commit is almost a rewrite because it conflicts with Thomas Gleixner's refactoring of this code in v5.17-rc1. I wasn't sure if I should drop all the s-o-bs (including Mark's), but decided to keep as the original commit ] When booting with maxcpus=<small number>, interrupt controllers such as the GICv3 ITS may not be able to satisfy the affinity of some managed interrupts, as some of the HW resources are simply not available. The same thing happens when loading a driver using managed interrupts while CPUs are offline. In order to deal with this, do not try to activate such interrupt if there is no online CPU capable of handling it. Instead, place it in shutdown state. Once a capable CPU shows up, it will be activated. Reported-by: John Garry <john.garry@huawei.com> Reported-by: David Decotigny <ddecotig@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20220405185040.206297-2-maz@kernel.org Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-02 17:41:11 +01:00
Mukesh Ojha	feb2eda5e1	gcov: clang: fix the buffer overflow issue commit a6f810efabfd789d3bbafeacb4502958ec56c5ce upstream. Currently, in clang version of gcov code when module is getting removed gcov_info_add() incorrectly adds the sfn_ptr->counter to all the dst->functions and it result in the kernel panic in below crash report. Fix this by properly handling it. [ 8.899094][ T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000 [ 8.899100][ T599] Mem abort info: [ 8.899102][ T599] ESR = 0x9600004f [ 8.899103][ T599] EC = 0x25: DABT (current EL), IL = 32 bits [ 8.899105][ T599] SET = 0, FnV = 0 [ 8.899107][ T599] EA = 0, S1PTW = 0 [ 8.899108][ T599] FSC = 0x0f: level 3 permission fault [ 8.899110][ T599] Data abort info: [ 8.899111][ T599] ISV = 0, ISS = 0x0000004f [ 8.899113][ T599] CM = 0, WnR = 1 [ 8.899114][ T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000 [ 8.899116][ T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787 [ 8.899124][ T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP [ 8.899265][ T599] Skip md ftrace buffer dump for: 0x1609e0 .... .., [ 8.899544][ T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S OE 5.15.41-android13-8-g38e9b1af6bce #1 [ 8.899547][ T599] Hardware name: XXX (DT) [ 8.899549][ T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--) [ 8.899551][ T599] pc : gcov_info_add+0x9c/0xb8 [ 8.899557][ T599] lr : gcov_event+0x28c/0x6b8 [ 8.899559][ T599] sp : ffffffc00e733b00 [ 8.899560][ T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470 [ 8.899563][ T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000 [ 8.899566][ T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000 [ 8.899569][ T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058 [ 8.899572][ T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785 [ 8.899575][ T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785 [ 8.899577][ T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80 [ 8.899580][ T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000 [ 8.899583][ T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f [ 8.899586][ T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0 [ 8.899589][ T599] Call trace: [ 8.899590][ T599] gcov_info_add+0x9c/0xb8 [ 8.899592][ T599] gcov_module_notifier+0xbc/0x120 [ 8.899595][ T599] blocking_notifier_call_chain+0xa0/0x11c [ 8.899598][ T599] do_init_module+0x2a8/0x33c [ 8.899600][ T599] load_module+0x23cc/0x261c [ 8.899602][ T599] __arm64_sys_finit_module+0x158/0x194 [ 8.899604][ T599] invoke_syscall+0x94/0x2bc [ 8.899607][ T599] el0_svc_common+0x1d8/0x34c [ 8.899609][ T599] do_el0_svc+0x40/0x54 [ 8.899611][ T599] el0_svc+0x94/0x2f0 [ 8.899613][ T599] el0t_64_sync_handler+0x88/0xec [ 8.899615][ T599] el0t_64_sync+0x1b4/0x1b8 [ 8.899618][ T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c) [ 8.899620][ T599] ---[ end trace ed5218e9e5b6e2e6 ]--- Link: https://lkml.kernel.org/r/1668020497-13142-1-git-send-email-quic_mojha@quicinc.com Fixes: `e178a5beb3` ("gcov: clang support") Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> [5.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-02 17:41:09 +01:00
Ramji Jiyani	706d642b1d	ANDROID: GKI: Handle no ABI symbol list for modules ACKs with no ABI symbol lists like mainline, don't let any unsigned modules load as every access is being treated as violation as NO_OF_UNPROTECTED_SYMBOLS will be 0 in this case. Check NO_OF_UNPROTECTED_SYMBOLS and if it's 0, allow every symbol access by unsigned modules; so we can keep the feature enable and also not break any devices. It should never be 0 with kernel branches where KMI_SYMBOL_LISTS have been enabled. Bug: 257458145 Bug: 232430739 Test: TH Fixes: e9669eeb2f45 ("ANDROID: GKI: Add module load time symbol protection") Change-Id: Iab65e1425473e32baaad0d6c7f0d3eb007ae864f Signed-off-by: Ramji Jiyani <ramjiyani@google.com> (cherry picked from commit 8e00226a8fffa10b6383e448af785ce44451688e)	2022-12-01 00:22:47 +00:00
Yu Zhao	baeb9a0025	BACKPORT: mm: multi-gen LRU: kill switch Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that can be disabled include: 0x0001: the multi-gen LRU core 0x0002: walking page table, when arch_has_hw_pte_young() returns true 0x0004: clearing the accessed bit in non-leaf PMD entries, when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y [yYnN]: apply to all the components above E.g., echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0007 echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 NB: the page table walks happen on the scale of seconds under heavy memory pressure, in which case the mmap_lock contention is a lesser concern, compared with the LRU lock contention and the I/O congestion. So far the only well-known case of the mmap_lock contention happens on Android, due to Scudo [1] which allocates several thousand VMAs for merely a few hundred MBs. The SPF and the Maple Tree also have provided their own assessments [2][3]. However, if walking page tables does worsen the mmap_lock contention, the kill switch can be used to disable it. In this case the multi-gen LRU will suffer a minor performance degradation, as shown previously. Clearing the accessed bit in non-leaf PMD entries can also be disabled, since this behavior was not tested on x86 varieties other than Intel and AMD. [1] https://source.android.com/devices/tech/debug/scudo [2] https://lore.kernel.org/r/20220128131006.67712-1-michel@lespinasse.org/ [3] https://lore.kernel.org/r/20220426150616.3937571-1-Liam.Howlett@oracle.com/ Link: https://lkml.kernel.org/r/20220918080010.2920238-11-yuzhao@google.com Change-Id: If3116e6698cc6967b6992c2017962fac6c2d3a11 Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 354ed597442952fb680c9cafc7e4eb8a76f9514c) Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Yu Zhao	7f53b0e704	BACKPORT: mm: multi-gen LRU: support page table walks To further exploit spatial locality, the aging prefers to walk page tables to search for young PTEs and promote hot pages. A kill switch will be added in the next patch to disable this behavior. When disabled, the aging relies on the rmap only. NB: this behavior has nothing similar with the page table scanning in the 2.4 kernel [1], which searches page tables for old PTEs, adds cold pages to swapcache and unmaps them. To avoid confusion, the term "iteration" specifically means the traversal of an entire mm_struct list; the term "walk" will be applied to page tables and the rmap, as usual. An mm_struct list is maintained for each memcg, and an mm_struct follows its owner task to the new memcg when this task is migrated. Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls walk_page_range() with each mm_struct on this list to promote hot pages before it increments max_seq. When multiple page table walkers iterate the same list, each of them gets a unique mm_struct; therefore they can run concurrently. Page table walkers ignore any misplaced pages, e.g., if an mm_struct was migrated, pages it left in the previous memcg will not be promoted when its current memcg is under reclaim. Similarly, page table walkers will not promote pages from nodes other than the one under reclaim. This patch uses the following optimizations when walking page tables: 1. It tracks the usage of mm_struct's between context switches so that page table walkers can skip processes that have been sleeping since the last iteration. 2. It uses generational Bloom filters to record populated branches so that page table walkers can reduce their search space based on the query results, e.g., to skip page tables containing mostly holes or misplaced pages. 3. It takes advantage of the accessed bit in non-leaf PMD entries when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y. 4. It does not zigzag between a PGD table and the same PMD table spanning multiple VMAs. IOW, it finishes all the VMAs within the range of the same PMD table before it returns to a PGD table. This improves the cache performance for workloads that have large numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5. Server benchmark results: Single workload: fio (buffered I/O): no change Single workload: memcached (anon): +[8, 10]% Ops/sec KB/sec patch1-7: 1147696.57 44640.29 patch1-8: 1245274.91 48435.66 Configurations: no change Client benchmark results: kswapd profiles: patch1-7 48.16% lzo1x_1_do_compress (real work) 8.20% page_vma_mapped_walk (overhead) 7.06% _raw_spin_unlock_irq 2.92% ptep_clear_flush 2.53% __zram_bvec_write 2.11% do_raw_spin_lock 2.02% memmove 1.93% lru_gen_look_around 1.56% free_unref_page_list 1.40% memset patch1-8 49.44% lzo1x_1_do_compress (real work) 6.19% page_vma_mapped_walk (overhead) 5.97% _raw_spin_unlock_irq 3.13% get_pfn_page 2.85% ptep_clear_flush 2.42% __zram_bvec_write 2.08% do_raw_spin_lock 1.92% memmove 1.44% alloc_zspage 1.36% memset Configurations: no change Thanks to the following developers for their efforts [3]. kernel test robot <lkp@intel.com> [1] https://lwn.net/Articles/23732/ [2] https://llvm.org/docs/ScudoHardenedAllocator.html [3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/ Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com Change-Id: I7ed3daf288e664e15bfd34991a77467a19a4e39a Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit bd74fdaea146029e4fa12c6de89adbe0779348a9) [ Resolve conflicts in include/linux/memcontrol.h, include/linux/mm_types.h ] Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Yu Zhao	37397878ee	BACKPORT: mm: multi-gen LRU: minimal implementation To avoid confusion, the terms "promotion" and "demotion" will be applied to the multi-gen LRU, as a new convention; the terms "activation" and "deactivation" will be applied to the active/inactive LRU, as usual. The aging produces young generations. Given an lruvec, it increments max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging promotes hot pages to the youngest generation when it finds them accessed through page tables; the demotion of cold pages happens consequently when it increments max_seq. Promotion in the aging path does not involve any LRU list operations, only the updates of the gen counter and lrugen->nr_pages[]; demotion, unless as the result of the increment of max_seq, requires LRU list operations, e.g., lru_deactivate_fn(). The aging has the complexity O(nr_hot_pages), since it is only interested in hot pages. The eviction consumes old generations. Given an lruvec, it increments min_seq when lrugen->lists[] indexed by min_seq%MAX_NR_GENS becomes empty. A feedback loop modeled after the PID controller monitors refaults over anon and file types and decides which type to evict when both types are available from the same generation. The protection of pages accessed multiple times through file descriptors takes place in the eviction path. Each generation is divided into multiple tiers. A page accessed N times through file descriptors is in tier order_base_2(N). Tiers do not have dedicated lrugen->lists[], only bits in page->flags. The aforementioned feedback loop also monitors refaults over all tiers and decides when to protect pages in which tiers (N>1), using the first tier (N=0,1) as a baseline. The first tier contains single-use unmapped clean pages, which are most likely the best choices. In contrast to promotion in the aging path, the protection of a page in the eviction path is achieved by moving this page to the next generation, i.e., min_seq+1, if the feedback loop decides so. This approach has the following advantages: 1. It removes the cost of activation in the buffered access path by inferring whether pages accessed multiple times through file descriptors are statistically hot and thus worth protecting in the eviction path. 2. It takes pages accessed through page tables into account and avoids overprotecting pages accessed multiple times through file descriptors. (Pages accessed through page tables are in the first tier, since N=0.) 3. More tiers provide better protection for pages accessed more than twice through file descriptors, when under heavy buffered I/O workloads. Server benchmark results: Single workload: fio (buffered I/O): +[30, 32]% IOPS BW 5.19-rc1: 2673k 10.2GiB/s patch1-6: 3491k 13.3GiB/s Single workload: memcached (anon): -[4, 6]% Ops/sec KB/sec 5.19-rc1: 1161501.04 45177.25 patch1-6: 1106168.46 43025.04 Configurations: CPU: two Xeon 6154 Mem: total 256G Node 1 was only used as a ram disk to reduce the variance in the results. patch drivers/block/brd.c <<EOF 99,100c99,100 < gfp_flags = GFP_NOIO \| __GFP_ZERO \| __GFP_HIGHMEM; < page = alloc_page(gfp_flags); --- > gfp_flags = GFP_NOIO \| __GFP_ZERO \| __GFP_HIGHMEM \| __GFP_THISNODE; > page = alloc_pages_node(1, gfp_flags, 0); EOF cat >>/etc/systemd/system.conf <<EOF CPUAffinity=numa NUMAPolicy=bind NUMAMask=0 EOF cat >>/etc/memcached.conf <<EOF -m 184320 -s /var/run/memcached/memcached.sock -a 0766 -t 36 -B binary EOF cat fio.sh modprobe brd rd_nr=1 rd_size=113246208 swapoff -a mkfs.ext4 /dev/ram0 mount -t ext4 /dev/ram0 /mnt mkdir /sys/fs/cgroup/user.slice/test echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \ --buffered=1 --ioengine=io_uring --iodepth=128 \ --iodepth_batch_submit=32 --iodepth_batch_complete=32 \ --rw=randread --random_distribution=random --norandommap \ --time_based --ramp_time=10m --runtime=5m --group_reporting cat memcached.sh modprobe brd rd_nr=1 rd_size=113246208 swapoff -a mkswap /dev/ram0 swapon /dev/ram0 memtier_benchmark -S /var/run/memcached/memcached.sock \ -P memcache_binary -n allkeys --key-minimum=1 \ --key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \ --ratio 1:0 --pipeline 8 -d 2000 memtier_benchmark -S /var/run/memcached/memcached.sock \ -P memcache_binary -n allkeys --key-minimum=1 \ --key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \ --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed Client benchmark results: kswapd profiles: 5.19-rc1 40.33% page_vma_mapped_walk (overhead) 21.80% lzo1x_1_do_compress (real work) 7.53% do_raw_spin_lock 3.95% _raw_spin_unlock_irq 2.52% vma_interval_tree_iter_next 2.37% page_referenced_one 2.28% vma_interval_tree_subtree_search 1.97% anon_vma_interval_tree_iter_first 1.60% ptep_clear_flush 1.06% __zram_bvec_write patch1-6 39.03% lzo1x_1_do_compress (real work) 18.47% page_vma_mapped_walk (overhead) 6.74% _raw_spin_unlock_irq 3.97% do_raw_spin_lock 2.49% ptep_clear_flush 2.48% anon_vma_interval_tree_iter_first 1.92% page_referenced_one 1.88% __zram_bvec_write 1.48% memmove 1.31% vma_interval_tree_iter_next Configurations: CPU: single Snapdragon 7c Mem: total 4G ChromeOS MemoryPressure [1] [1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/ Link: https://lkml.kernel.org/r/20220918080010.2920238-7-yuzhao@google.com Change-Id: I30b26b3086ce1879b83b96eb265f8f0dcb16a1fb Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit ac35a490237446b71e3b4b782b1596967edd0aa8) [Resolve confilcts in mm/Kconfig, mm/swap.c, mm/vmscan.c] Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Yu Zhao	d5b2fa1c7b	BACKPORT: mm: multi-gen LRU: groundwork Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in page->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These techniques are commonly used to optimize job scheduling (bin packing) in data centers [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com Change-Id: I7b24d1e9d263e4eb2c2ee23f2eb143824fcb5201 Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4) [ Resolve conflicts in mm/memory.c, mm/memcontrol.c, mm/Kconfig, include/linux/mm_inline.h] Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Kalesh Singh	2635d7d108	Revert "FROMLIST: mm: multi-gen LRU: groundwork" This reverts commit `f88ed5a3d3`. To be replaced with upstream version. Bug: 249601646 Change-Id: I5a206480f838c304fb1c960fec2615894c2421bb Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Kalesh Singh	e931b1d222	Revert "FROMLIST: mm: multi-gen LRU: minimal implementation" This reverts commit `a1537a68c5`. To be replaced with upstream version. Bug: 249601646 Change-Id: I3dfbb3ec56cfdb5a2db7ec00c124dae471cce932 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Kalesh Singh	02dc0d1dda	Revert "FROMLIST: mm: multi-gen LRU: support page table walks" This reverts commit `5280d76d38`. To be replaced with upstream version. Bug: 249601646 Change-Id: I5bf4095117082b6627d182a8d987ca78a18fa392 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Kalesh Singh	8994fcd031	Revert "FROMLIST: mm: multi-gen LRU: kill switch" This reverts commit `76f7f07cbf`. To be replaced with upstream version. Bug: 249601646 Change-Id: I8356c32ce1962efba62ab2fc8a256b751c974c00 Signed-off-by: Kalesh Singh <kaleshsingh@google.com>	2022-11-30 00:28:11 +00:00
Greg Kroah-Hartman	f5ea8b2710	Merge 5.15.80 into android14-5.15 Changes in 5.15.80 mm: hwpoison: refactor refcount check handling mm: hwpoison: handle non-anonymous THP correctly mm: shmem: don't truncate page if memory failure happens ASoC: wm5102: Revert "ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe" ASoC: wm5110: Revert "ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe" ASoC: wm8997: Revert "ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe" ASoC: mt6660: Keep the pm_runtime enables before component stuff in mt6660_i2c_probe ASoC: rt1019: Fix the TDM settings ASoC: wm8962: Add an event handler for TEMP_HP and TEMP_SPK spi: intel: Fix the offset to get the 64K erase opcode ASoC: codecs: jz4725b: add missed Line In power control bit ASoC: codecs: jz4725b: fix reported volume for Master ctl ASoC: codecs: jz4725b: use right control for Capture Volume ASoC: codecs: jz4725b: fix capture selector naming ASoC: Intel: sof_sdw: add quirk variant for LAPBC710 NUC15 selftests/futex: fix build for clang selftests/intel_pstate: fix build for ARCH=x86_64 ASoC: rt1308-sdw: add the default value of some registers drm/amd/display: Remove wrong pipe control lock ACPI: scan: Add LATT2021 to acpi_ignore_dep_ids[] RDMA/efa: Add EFA 0xefa2 PCI ID btrfs: raid56: properly handle the error when unable to find the missing stripe NFSv4: Retry LOCK on OLD_STATEID during delegation return ACPI: x86: Add another system to quirk list for forcing StorageD3Enable firmware: arm_scmi: Cleanup the core driver removal callback i2c: tegra: Allocate DMA memory for DMA engine i2c: i801: add lis3lv02d's I2C address for Vostro 5568 drm/imx: imx-tve: Fix return type of imx_tve_connector_mode_valid btrfs: remove pointless and double ulist frees in error paths of qgroup tests Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm x86/cpu: Add several Intel server CPU model numbers ASoC: codecs: jz4725b: Fix spelling mistake "Sourc" -> "Source", "Routee" -> "Route" mtd: spi-nor: intel-spi: Disable write protection only if asked spi: intel: Use correct mask for flash and protected regions KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet hugetlbfs: don't delete error page from pagecache arm64: dts: qcom: sa8155p-adp: Specify which LDO modes are allowed arm64: dts: qcom: sm8150-xperia-kumano: Specify which LDO modes are allowed arm64: dts: qcom: sm8250-xperia-edo: Specify which LDO modes are allowed arm64: dts: qcom: sm8350-hdk: Specify which LDO modes are allowed spi: stm32: Print summary 'callbacks suppressed' message ARM: dts: at91: sama7g5: fix signal name of pin PB2 ASoC: core: Fix use-after-free in snd_soc_exit() ASoC: tas2770: Fix set_tdm_slot in case of single slot ASoC: tas2764: Fix set_tdm_slot in case of single slot ARM: at91: pm: avoid soft resetting AC DLL serial: 8250: omap: Fix missing PM runtime calls for omap8250_set_mctrl() serial: 8250_omap: remove wait loop from Errata i202 workaround serial: 8250: omap: Fix unpaired pm_runtime_put_sync() in omap8250_remove() serial: 8250: omap: Flush PM QOS work on remove serial: imx: Add missing .thaw_noirq hook tty: n_gsm: fix sleep-in-atomic-context bug in gsm_control_send bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() ASoC: soc-utils: Remove __exit for snd_soc_util_exit() pinctrl: rockchip: list all pins in a possible mux route for PX30 scsi: scsi_transport_sas: Fix error handling in sas_phy_add() block: sed-opal: kmalloc the cmd/resp buffers bpf: Fix memory leaks in __check_func_call arm64: Fix bit-shifting UB in the MIDR_CPU_MODEL() macro siox: fix possible memory leak in siox_device_add() parport_pc: Avoid FIFO port location truncation pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map drm/vc4: kms: Fix IS_ERR() vs NULL check for vc4_kms drm/panel: simple: set bpc field for logic technologies displays drm/drv: Fix potential memory leak in drm_dev_init() drm: Fix potential null-ptr-deref in drm_vblank_destroy_worker() ARM: dts: imx7: Fix NAND controller size-cells arm64: dts: imx8mm: Fix NAND controller size-cells arm64: dts: imx8mn: Fix NAND controller size-cells ata: libata-transport: fix double ata_host_put() in ata_tport_add() ata: libata-transport: fix error handling in ata_tport_add() ata: libata-transport: fix error handling in ata_tlink_add() ata: libata-transport: fix error handling in ata_tdev_add() nfp: change eeprom length to max length enumerators MIPS: fix duplicate definitions for exported symbols MIPS: Loongson64: Add WARN_ON on kexec related kmalloc failed bpf: Initialize same number of free nodes for each pcpu_freelist net: bgmac: Drop free_netdev() from bgmac_enet_remove() mISDN: fix possible memory leak in mISDN_dsp_element_register() net: hinic: Fix error handling in hinic_module_init() net: stmmac: ensure tx function is not running in stmmac_xdp_release() soc: imx8m: Enable OCOTP clock before reading the register net: liquidio: release resources when liquidio driver open failed mISDN: fix misuse of put_device() in mISDN_register_device() net: macvlan: Use built-in RCU list checking net: caif: fix double disconnect client in chnl_net_open() bnxt_en: Remove debugfs when pci_register_driver failed net: mhi: Fix memory leak in mhi_net_dellink() net: dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims xen/pcpu: fix possible memory leak in register_pcpu() net: ionic: Fix error handling in ionic_init_module() net: ena: Fix error handling in ena_init() net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process bridge: switchdev: Fix memory leaks when changing VLAN protocol drbd: use after free in drbd_create_device() platform/x86/intel: pmc: Don't unconditionally attach Intel PMC when virtualized platform/surface: aggregator: Do not check for repeated unsequenced packets cifs: add check for returning value of SMB2_close_init net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open() net/x25: Fix skb leak in x25_lapb_receive_frame() cifs: Fix wrong return value checking when GETFLAGS net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start() net: thunderbolt: Fix error handling in tbnet_init() cifs: add check for returning value of SMB2_set_info_init ftrace: Fix the possible incorrect kernel message ftrace: Optimize the allocation for mcount entries ftrace: Fix null pointer dereference in ftrace_add_mod() ring_buffer: Do not deactivate non-existant pages tracing: Fix memory leak in tracing_read_pipe() tracing/ring-buffer: Have polling block on watermark tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event() tracing: Fix wild-memory-access in register_synth_event() tracing: Fix race where eprobes can be called before the event tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit() tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit() drm/amd/display: Add HUBP surface flip interrupt handler ALSA: usb-audio: Drop snd_BUG_ON() from snd_usbmidi_output_open() ALSA: hda/realtek: fix speakers for Samsung Galaxy Book Pro ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360 Revert "usb: dwc3: disable USB core PHY management" slimbus: qcom-ngd: Fix build error when CONFIG_SLIM_QCOM_NGD_CTRL=y && CONFIG_QCOM_RPROC_COMMON=m slimbus: stream: correct presence rate frequencies speakup: fix a segfault caused by switching consoles USB: bcma: Make GPIO explicitly optional USB: serial: option: add Sierra Wireless EM9191 USB: serial: option: remove old LARA-R6 PID USB: serial: option: add u-blox LARA-R6 00B modem USB: serial: option: add u-blox LARA-L6 modem USB: serial: option: add Fibocom FM160 0x0111 composition usb: add NO_LPM quirk for Realforce 87U Keyboard usb: chipidea: fix deadlock in ci_otg_del_timer usb: cdns3: host: fix endless superspeed hub port reset usb: typec: mux: Enter safe mode only when pins need to be reconfigured iio: adc: at91_adc: fix possible memory leak in at91_adc_allocate_trigger() iio: trigger: sysfs: fix possible memory leak in iio_sysfs_trig_init() iio: adc: mp2629: fix wrong comparison of channel iio: adc: mp2629: fix potential array out of bound access iio: pressure: ms5611: changed hardcoded SPI speed to value limited dm ioctl: fix misbehavior if list_versions races with module loading serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs serial: 8250: Flush DMA Rx on RLSI serial: 8250_lpss: Configure DMA also w/o DMA filter Input: iforce - invert valid length check when fetching device IDs maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() net: phy: marvell: add sleep time after enabling the loopback bit scsi: zfcp: Fix double free of FSF request when qdio send fails iommu/vt-d: Preset Access bit for IOVA in FL non-leaf paging entries iommu/vt-d: Set SRE bit only when hardware has SRS cap firmware: coreboot: Register bus in module init mmc: core: properly select voltage range without power cycle mmc: sdhci-pci-o2micro: fix card detect fail issue caused by CD# debounce timeout mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put() docs: update mediator contact information in CoC doc misc/vmw_vmci: fix an infoleak in vmci_host_do_receive_datagram() perf/x86/intel/pt: Fix sampling using single range output nvme: restrict management ioctls to admin nvme: ensure subsystem reset is single threaded serial: 8250_lpss: Use 16B DMA burst with Elkhart Lake perf: Improve missing SIGTRAP checking ring-buffer: Include dropped pages in counting dirty patches tracing: Fix warning on variable 'struct trace_array' net: use struct_group to copy ip/ipv6 header addresses scsi: target: tcm_loop: Fix possible name leak in tcm_loop_setup_hba_bus() scsi: scsi_debug: Fix possible UAF in sdebug_add_host_helper() kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case Input: i8042 - fix leaking of platform device on module removal macvlan: enforce a consistent minimal mtu tcp: cdg: allow tcp_cdg_release() to be called multiple times kcm: avoid potential race in kcm_tx_work kcm: close race conditions on sk_receive_queue 9p: trans_fd/p9_conn_cancel: drop client lock earlier gfs2: Check sb_bsize_shift after reading superblock gfs2: Switch from strlcpy to strscpy 9p/trans_fd: always use O_NONBLOCK read/write wifi: wext: use flex array destination for memcpy() mm: fs: initialize fsdata passed to write_begin/write_end interface net/9p: use a dedicated spinlock for trans_fd ntfs: fix use-after-free in ntfs_attr_find() ntfs: fix out-of-bounds read in ntfs_attr_find() ntfs: check overflow when iterating ATTR_RECORDs Linux 5.15.80 Change-Id: Idc9aa4c30c528dd194bc813201cbb2c5df8c1d62 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2022-11-28 16:08:50 +00:00
Li Huafei	c49cc2c059	kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case [ Upstream commit 5dd7caf0bdc5d0bae7cf9776b4d739fb09bd5ebb ] In __unregister_kprobe_top(), if the currently unregistered probe has post_handler but other child probes of the aggrprobe do not have post_handler, the post_handler of the aggrprobe is cleared. If this is a ftrace-based probe, there is a problem. In later calls to disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in __disarm_kprobe_ftrace() and may even cause use-after-free: Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2) WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0 Modules linked in: testKprobe_007(-) CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18 [...] Call Trace: <TASK> __disable_kprobe+0xcd/0xe0 __unregister_kprobe_top+0x12/0x150 ? mutex_lock+0xe/0x30 unregister_kprobes.part.23+0x31/0xa0 unregister_kprobe+0x32/0x40 __x64_sys_delete_module+0x15e/0x260 ? do_user_addr_fault+0x2cd/0x6b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] For the kprobe-on-ftrace case, we keep the post_handler setting to identify this aggrprobe armed with kprobe_ipmodify_ops. This way we can disarm it correctly. Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/ Fixes: `0bc11ed5ab` ("kprobes: Allow kprobes coexist with livepatch") Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-26 09:24:50 +01:00
Steven Rostedt (Google)	73cf0ff9a3	ring-buffer: Include dropped pages in counting dirty patches [ Upstream commit 31029a8b2c7e656a0289194ef16415050ae4c4ac ] The function ring_buffer_nr_dirty_pages() was created to find out how many pages are filled in the ring buffer. There's two running counters. One is incremented whenever a new page is touched (pages_touched) and the other is whenever a page is read (pages_read). The dirty count is the number touched minus the number read. This is used to determine if a blocked task should be woken up if the percentage of the ring buffer it is waiting for is hit. The problem is that it does not take into account dropped pages (when the new writes overwrite pages that were not read). And then the dirty pages will always be greater than the percentage. This makes the "buffer_percent" file inaccurate, as the number of dirty pages end up always being larger than the percentage, event when it's not and this causes user space to be woken up more than it wants to be. Add a new counter to keep track of lost pages, and include that in the accounting of dirty pages so that it is actually accurate. Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home Fixes: `2c2b0a78b3` ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-26 09:24:49 +01:00
Marco Elver	35c60b4e8c	perf: Improve missing SIGTRAP checking [ Upstream commit bb88f9695460bec25aa30ba9072595025cf6c8af ] To catch missing SIGTRAP we employ a WARN in __perf_event_overflow(), which fires if pending_sigtrap was already set: returning to user space without consuming pending_sigtrap, and then having the event fire again would re-enter the kernel and trigger the WARN. This, however, seemed to miss the case where some events not associated with progress in the user space task can fire and the interrupt handler runs before the IRQ work meant to consume pending_sigtrap (and generate the SIGTRAP). syzbot gifted us this stack trace: \| WARNING: CPU: 0 PID: 3607 at kernel/events/core.c:9313 __perf_event_overflow \| Modules linked in: \| CPU: 0 PID: 3607 Comm: syz-executor100 Not tainted 6.1.0-rc2-syzkaller-00073-g88619e77b33d #0 \| Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 \| RIP: 0010:__perf_event_overflow+0x498/0x540 kernel/events/core.c:9313 \| <...> \| Call Trace: \| <TASK> \| perf_swevent_hrtimer+0x34f/0x3c0 kernel/events/core.c:10729 \| __run_hrtimer kernel/time/hrtimer.c:1685 [inline] \| __hrtimer_run_queues+0x1c6/0xfb0 kernel/time/hrtimer.c:1749 \| hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 \| local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1096 [inline] \| __sysvec_apic_timer_interrupt+0x17c/0x640 arch/x86/kernel/apic/apic.c:1113 \| sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1107 \| asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649 \| <...> \| </TASK> In this case, syzbot produced a program with event type PERF_TYPE_SOFTWARE and config PERF_COUNT_SW_CPU_CLOCK. The hrtimer manages to fire again before the IRQ work got a chance to run, all while never having returned to user space. Improve the WARN to check for real progress in user space: approximate this by storing a 32-bit hash of the current IP into pending_sigtrap, and if an event fires while pending_sigtrap still matches the previous IP, we assume no progress (false negatives are possible given we could return to user space and trigger again on the same IP). Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs") Reported-by: syzbot+b8ded3e2e2c6adde4990@syzkaller.appspotmail.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221031093513.3032814-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-26 09:24:49 +01:00
Shang XiaoJing	e57daa7503	tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit() commit 22ea4ca9631eb137e64e5ab899e9c89cb6670959 upstream. When test_gen_kprobe_cmd() failed after kprobe_event_gen_cmd_end(), it will goto delete, which will call kprobe_event_delete() and release the corresponding resource. However, the trace_array in gen_kretprobe_test will point to the invalid resource. Set gen_kretprobe_test to NULL after called kprobe_event_delete() to prevent null-ptr-deref. BUG: kernel NULL pointer dereference, address: 0000000000000070 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 246 Comm: modprobe Tainted: G W 6.1.0-rc1-00174-g9522dc5c87da-dirty #248 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0 Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c 01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65 70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000 RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064 R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000 FS: 00007f89eeff6540(0000) GS:ffff88813b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __ftrace_set_clr_event+0x3e/0x60 trace_array_set_clr_event+0x35/0x50 ? 0xffffffffa0000000 kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test] __x64_sys_delete_module+0x206/0x380 ? lockdep_hardirqs_on_prepare+0xd8/0x190 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f89eeb061b7 Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/ Fixes: `64836248dd` ("tracing: Add kprobe event command generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:43 +01:00
Shang XiaoJing	3a41c0f2a5	tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit() commit e0d75267f59d7084e0468bd68beeb1bf9c71d7c0 upstream. When trace_get_event_file() failed, gen_kretprobe_test will be assigned as the error code. If module kprobe_event_gen_test is removed now, the null pointer dereference will happen in kprobe_event_gen_test_exit(). Check if gen_kprobe_test or gen_kretprobe_test is error code or NULL before dereference them. BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 2210 Comm: modprobe Not tainted 6.1.0-rc1-00171-g2159299a3b74-dirty #217 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test] Code: Unable to access opcode bytes at 0xffffffff9ffffff2. RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246 RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000 RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800 R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f56b75be540(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __x64_sys_delete_module+0x206/0x380 ? lockdep_hardirqs_on_prepare+0xd8/0x190 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/ Fixes: `64836248dd` ("tracing: Add kprobe event command generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:43 +01:00
Steven Rostedt (Google)	7291dec4f2	tracing: Fix race where eprobes can be called before the event commit 94eedf3dded5fb472ce97bfaf3ac1c6c29c35d26 upstream. The flag that tells the event to call its triggers after reading the event is set for eprobes after the eprobe is enabled. This leads to a race where the eprobe may be triggered at the beginning of the event where the record information is NULL. The eprobe then dereferences the NULL record causing a NULL kernel pointer bug. Test for a NULL record to keep this from happening. Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: `7491e2c442` ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:43 +01:00
Shang XiaoJing	6517b97134	tracing: Fix wild-memory-access in register_synth_event() commit 1b5f1c34d3f5a664a57a5a7557a50e4e3cc2505c upstream. In register_synth_event(), if set_synth_event_print_fmt() failed, then both trace_remove_event_call() and unregister_trace_event() will be called, which means the trace_event_call will call __unregister_trace_event() twice. As the result, the second unregister will causes the wild-memory-access. register_synth_event set_synth_event_print_fmt failed trace_remove_event_call event_remove if call->event.funcs then __unregister_trace_event (first call) unregister_trace_event __unregister_trace_event (second call) Fix the bug by avoiding to call the second __unregister_trace_event() by checking if the first one is called. general protection fault, probably for non-canonical address 0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127] CPU: 0 PID: 3807 Comm: modprobe Not tainted 6.1.0-rc1-00186-g76f33a7eedb4 #299 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_trace_event+0x6e/0x280 Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b RSP: 0018:ffff88810413f370 EFLAGS: 00010a06 RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000 RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20 RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481 R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122 R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028 FS: 00007f7823e8d540(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __create_synth_event+0x1e37/0x1eb0 create_or_delete_synth_event+0x110/0x250 synth_event_run_command+0x2f/0x110 test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test] synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test] do_one_initcall+0xdb/0x480 do_init_module+0x1cf/0x680 load_module+0x6a50/0x70a0 __do_sys_finit_module+0x12f/0x1c0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com Fixes: `4b147936fa` ("tracing: Add support for 'synthetic' events") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: stable@vger.kernel.org Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:43 +01:00
Shang XiaoJing	07ba4f0603	tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event() commit a4527fef9afe5c903c718d0cd24609fe9c754250 upstream. test_gen_synth_cmd() only free buf in fail path, hence buf will leak when there is no failure. Add kfree(buf) to prevent the memleak. The same reason and solution in test_empty_synth_event(). unreferenced object 0xffff8881127de000 (size 2048): comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s) hex dump (first 32 bytes): 20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20 gen_synth_test 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f pid_t next_pid_ backtrace: [<000000004254801a>] kmalloc_trace+0x26/0x100 [<0000000039eb1cf5>] 0xffffffffa00083cd [<000000000e8c3bc8>] 0xffffffffa00086ba [<00000000c293d1ea>] do_one_initcall+0xdb/0x480 [<00000000aa189e6d>] do_init_module+0x1cf/0x680 [<00000000d513222b>] load_module+0x6a50/0x70a0 [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0 [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90 [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd unreferenced object 0xffff8881127df000 (size 2048): comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s) hex dump (first 32 bytes): 20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73 empty_synth_tes 74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 t pid_t next_pi backtrace: [<000000004254801a>] kmalloc_trace+0x26/0x100 [<00000000d4db9a3d>] 0xffffffffa0008071 [<00000000c31354a5>] 0xffffffffa00086ce [<00000000c293d1ea>] do_one_initcall+0xdb/0x480 [<00000000aa189e6d>] do_init_module+0x1cf/0x680 [<00000000d513222b>] load_module+0x6a50/0x70a0 [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0 [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90 [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: <fengguang.wu@intel.com> Cc: stable@vger.kernel.org Fixes: `9fe41efaca` ("tracing: Add synth event generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:43 +01:00
Steven Rostedt (Google)	8b318f3032	tracing/ring-buffer: Have polling block on watermark commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b upstream. Currently the way polling works on the ring buffer is broken. It will return immediately if there's any data in the ring buffer whereas a read will block until the watermark (defined by the tracefs buffer_percent file) is hit. That is, a select() or poll() will return as if there's data available, but then the following read will block. This is broken for the way select()s and poll()s are supposed to work. Have the polling on the ring buffer also block the same way reads and splice does on the ring buffer. Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Primiano Tucci <primiano@google.com> Cc: stable@vger.kernel.org Fixes: `1e0d6714ac` ("ring-buffer: Do not wake up a splice waiter when page is not full") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:42 +01:00
Wang Yufen	2c21ee020c	tracing: Fix memory leak in tracing_read_pipe() commit 649e72070cbbb8600eb823833e4748f5a0815116 upstream. kmemleak reports this issue: unreferenced object 0xffff888105a18900 (size 128): comm "test_progs", pid 18933, jiffies 4336275356 (age 22801.766s) hex dump (first 32 bytes): 25 73 00 90 81 88 ff ff 26 05 00 00 42 01 58 04 %s......&...B.X. 03 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000560143a1>] __kmalloc_node_track_caller+0x4a/0x140 [<000000006af00822>] krealloc+0x8d/0xf0 [<00000000c309be6a>] trace_iter_expand_format+0x99/0x150 [<000000005a53bdb6>] trace_check_vprintf+0x1e0/0x11d0 [<0000000065629d9d>] trace_event_printf+0xb6/0xf0 [<000000009a690dc7>] trace_raw_output_bpf_trace_printk+0x89/0xc0 [<00000000d22db172>] print_trace_line+0x73c/0x1480 [<00000000cdba76ba>] tracing_read_pipe+0x45c/0x9f0 [<0000000015b58459>] vfs_read+0x17b/0x7c0 [<000000004aeee8ed>] ksys_read+0xed/0x1c0 [<0000000063d3d898>] do_syscall_64+0x3b/0x90 [<00000000a06dda7f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd iter->fmt alloced in tracing_read_pipe() -> .. ->trace_iter_expand_format(), but not freed, to fix, add free in tracing_release_pipe() Link: https://lkml.kernel.org/r/1667819090-4643-1-git-send-email-wangyufen@huawei.com Cc: stable@vger.kernel.org Fixes: `efbbdaa22b` ("tracing: Show real address for trace event arguments") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:42 +01:00
Daniil Tatianin	00f74b1a98	ring_buffer: Do not deactivate non-existant pages commit 56f4ca0a79a9f1af98f26c54b9b89ba1f9bcc6bd upstream. rb_head_page_deactivate() expects cpu_buffer to contain a valid list of ->pages, so verify that the list is actually present before calling it. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru Cc: stable@vger.kernel.org Fixes: `77ae365eca` ("ring-buffer: make lockless") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:42 +01:00
Xiu Jianfeng	1bea037a1a	ftrace: Fix null pointer dereference in ftrace_add_mod() commit 19ba6c8af9382c4c05dc6a0a79af3013b9a35cd0 upstream. The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next} of @ftrace_mode->list are NULL, it's not a valid state to call list_del(). If kstrdup() for @ftrace_mod->{func\|module} fails, it goes to @out_free tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del() will write prev->next and next->prev, where null pointer dereference happens. BUG: kernel NULL pointer dereference, address: 0000000000000008 Oops: 0002 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> ftrace_mod_callback+0x20d/0x220 ? do_filp_open+0xd9/0x140 ftrace_process_regex.isra.51+0xbf/0x130 ftrace_regex_write.isra.52.part.53+0x6e/0x90 vfs_write+0xee/0x3a0 ? __audit_filter_op+0xb1/0x100 ? auditd_test_task+0x38/0x50 ksys_write+0xa5/0xe0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Kernel panic - not syncing: Fatal exception So call INIT_LIST_HEAD() to initialize the list member to fix this issue. Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com Cc: stable@vger.kernel.org Fixes: `673feb9d76` ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:42 +01:00
Wang Wensheng	fadfcf39fb	ftrace: Optimize the allocation for mcount entries commit bcea02b096333dc74af987cb9685a4dbdd820840 upstream. If we can't allocate this size, try something smaller with half of the size. Its order should be decreased by one instead of divided by two. Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: `a790087554` ("ftrace: Allocate the mcount record pages as groups") Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:42 +01:00
Wang Wensheng	5c5f264289	ftrace: Fix the possible incorrect kernel message commit 08948caebe93482db1adfd2154eba124f66d161d upstream. If the number of mcount entries is an integer multiple of ENTRIES_PER_PAGE, the page count showing on the console would be wrong. Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: `5821e1b74f` ("function tracing: fix wrong pos computing when read buffer has been fulfilled") Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:42 +01:00
Xu Kuohai	7ff4fa179e	bpf: Initialize same number of free nodes for each pcpu_freelist [ Upstream commit 4b45cd81f737d79d0fbfc0d320a1e518e7f0bbf0 ] pcpu_freelist_populate() initializes nr_elems / num_possible_cpus() + 1 free nodes for some CPUs, and then possibly one CPU with fewer nodes, followed by remaining cpus with 0 nodes. For example, when nr_elems == 256 and num_possible_cpus() == 32, CPU 0~27 each gets 9 free nodes, CPU 28 gets 4 free nodes, CPU 29~31 get 0 free nodes, while in fact each CPU should get 8 nodes equally. This patch initializes nr_elems / num_possible_cpus() free nodes for each CPU firstly, then allocates the remaining free nodes by one for each CPU until no free nodes left. Fixes: `e19494edab` ("bpf: introduce percpu_freelist") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221110122128.105214-1-xukuohai@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-26 09:24:38 +01:00

1 2 3 4 5 ...

38569 Commits