kernel_arpi

Author	SHA1	Message	Date
Greg Kroah-Hartman	335a32d033	Merge 5.15.75 into android14-5.15 Changes in 5.15.75 Revert "fs: check FMODE_LSEEK to control internal pipe splicing" ALSA: oss: Fix potential deadlock at unregistration ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free() ALSA: usb-audio: Fix potential memory leaks ALSA: usb-audio: Fix NULL dererence at error path ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530 ALSA: hda/realtek: Correct pin configs for ASUS G533Z ALSA: hda/realtek: Add quirk for ASUS GV601R laptop ALSA: hda/realtek: Add Intel Reference SSID to support headset keys mtd: rawnand: atmel: Unmap streaming DMA mappings io_uring/net: don't update msg_name if not provided hv_netvsc: Fix race between VF offering and VF association message from host cifs: destage dirty pages before re-reading them for cache=none cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message iio: dac: ad5593r: Fix i2c read protocol requirements iio: ltc2497: Fix reading conversion results iio: adc: ad7923: fix channel readings for some variants iio: pressure: dps310: Refactor startup procedure iio: pressure: dps310: Reset chip after timeout xhci: dbc: Fix memory leak in xhci_alloc_dbc() usb: add quirks for Lenovo OneLink+ Dock can: kvaser_usb: Fix use of uninitialized completion can: kvaser_usb_leaf: Fix overread with an invalid command can: kvaser_usb_leaf: Fix TX queue out of sync after restart can: kvaser_usb_leaf: Fix CAN state after restart mmc: sdhci-sprd: Fix minimum clock limit i2c: designware: Fix handling of real but unexpected device interrupts fs: dlm: fix race between test_bit() and queue_work() fs: dlm: handle -EBUSY first in lock arg validation HID: multitouch: Add memory barriers quota: Check next/prev free block number after reading from quota file platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure ASoC: wcd9335: fix order of Slimbus unprepare/disable ASoC: wcd934x: fix order of Slimbus unprepare/disable hwmon: (gsc-hwmon) Call of_node_get() before of_find_xxx API net: thunderbolt: Enable DMA paths only after rings are enabled regulator: qcom_rpm: Fix circular deferral regression arm64: topology: move store_cpu_topology() to shared code riscv: topology: fix default topology reporting RISC-V: Make port I/O string accessors actually work parisc: fbdev/stifb: Align graphics memory size to 4MB riscv: Allow PROT_WRITE-only mmap() riscv: Make VM_WRITE imply VM_READ riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb riscv: Pass -mno-relax only on lld < 15.0.0 UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK nvmem: core: Fix memleak in nvmem_register() nvme-multipath: fix possible hang in live ns resize with ANA access nvme-pci: set min_align_mask before calculating max_hw_sectors Revert "drm/amdgpu: use dirty framebuffer helper" dmaengine: mxs: use platform_driver_register drm/virtio: Check whether transferred 2D BO is shmem drm/virtio: Unlock reservations on virtio_gpu_object_shmem_init() error drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb() drm/udl: Restore display mode on resume arm64: errata: Add Cortex-A55 to the repeat tlbi list mm/damon: validate if the pmd entry is present before accessing mm/mmap: undo ->mmap() when arch_validate_flags() fails xen/gntdev: Prevent leaking grants xen/gntdev: Accommodate VMA splitting PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge serial: 8250: Let drivers request full 16550A feature probing serial: 8250: Request full 16550A feature probing for OxSemi PCIe devices NFSD: Protect against send buffer overflow in NFSv3 READDIR NFSD: Protect against send buffer overflow in NFSv2 READ NFSD: Protect against send buffer overflow in NFSv3 READ powercap: intel_rapl: Use standard Energy Unit for SPR Dram RAPL domain powerpc/boot: Explicitly disable usage of SPE instructions slimbus: qcom-ngd: use correct error in message of pdr_add_lookup() failure slimbus: qcom-ngd: cleanup in probe error path scsi: qedf: Populate sysfs attributes for vport gpio: rockchip: request GPIO mux to pinctrl when setting direction pinctrl: rockchip: add pinmux_ops.gpio_set_direction callback fbdev: smscufx: Fix use-after-free in ufx_ops_open() ksmbd: fix endless loop when encryption for response fails ksmbd: Fix wrong return value and message length check in smb2_ioctl() ksmbd: Fix user namespace mapping fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE btrfs: fix race between quota enable and quota rescan ioctl btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer f2fs: complete checkpoints during remount f2fs: flush pending checkpoints when freezing super f2fs: increase the limit for reserve_root f2fs: fix to do sanity check on destination blkaddr during recovery f2fs: fix to do sanity check on summary info hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero jbd2: wake up journal waiters in FIFO order, not LIFO jbd2: fix potential buffer head reference count leak jbd2: fix potential use-after-free in jbd2_fc_wait_bufs jbd2: add miss release buffer head in fc_do_one_pass() ext4: avoid crash when inline data creation follows DIO write ext4: fix null-ptr-deref in ext4_write_info ext4: make ext4_lazyinit_thread freezable ext4: fix check for block being out of directory size ext4: don't increase iversion counter for ea_inodes ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate ext4: place buffer head allocation before handle start ext4: fix dir corruption when ext4_dx_add_entry() fails ext4: fix miss release buffer head in ext4_fc_write_inode ext4: fix potential memory leak in ext4_fc_record_modified_inode() ext4: fix potential memory leak in ext4_fc_record_regions() ext4: update 'state->fc_regions_size' after successful memory allocation livepatch: fix race between fork and KLP transition ftrace: Properly unset FTRACE_HASH_FL_MOD ring-buffer: Allow splice to read previous partially read pages ring-buffer: Have the shortest_full queue be the shortest not longest ring-buffer: Check pending waiters when doing wake ups as well ring-buffer: Add ring_buffer_wake_waiters() ring-buffer: Fix race between reset page and reading page tracing: Disable interrupt or preemption before acquiring arch_spinlock_t tracing: Wake up ring buffer waiters on closing of the file tracing: Wake up waiters when tracing is disabled tracing: Add ioctl() to force ring buffer waiters to wake up tracing: Move duplicate code of trace_kprobe/eprobe.c into header tracing: Add "(fault)" name injection to kernel probes tracing: Fix reading strings from synthetic events thunderbolt: Explicitly enable lane adapter hotplug events at startup efi: libstub: drop pointless get_memory_map() call media: cedrus: Set the platform driver data earlier media: cedrus: Fix endless loop in cedrus_h265_skip_bits() blk-wbt: call rq_qos_add() after wb_normal is initialized KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility KVM: nVMX: Unconditionally purge queued/injected events on nested "exit" KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02 KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS staging: greybus: audio_helper: remove unused and wrong debugfs usage drm/nouveau/kms/nv140-: Disable interlacing drm/nouveau: fix a use-after-free in nouveau_gem_prime_import_sg_table() drm/i915: Fix watermark calculations for gen12+ RC CCS modifier drm/i915: Fix watermark calculations for gen12+ MC CCS modifier drm/i915: Fix watermark calculations for gen12+ CCS+CC modifier drm/amd/display: Fix vblank refcount in vrr transition smb3: must initialize two ACL struct fields to zero selinux: use "grep -E" instead of "egrep" ima: fix blocking of security.ima xattrs of unsupported algorithms userfaultfd: open userfaultfds with O_RDONLY ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers thermal: cpufreq_cooling: Check the policy first in cpufreq_cooling_register() sh: machvec: Use char[] for section boundaries MIPS: SGI-IP27: Free some unused memory MIPS: SGI-IP27: Fix platform-device leak in bridge_platform_create() ARM: 9244/1: dump: Fix wrong pg_level in walk_pmd() ARM: 9247/1: mm: set readonly for MT_MEMORY_RO with ARM_LPAE objtool: Preserve special st_shndx indexes in elf_update_symbol nfsd: Fix a memory leak in an error handling path SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation SUNRPC: Fix svcxdr_init_encode's buflen calculation NFSD: Protect against send buffer overflow in NFSv2 READDIR NFSD: Fix handling of oversized NFSv4 COMPOUND requests wifi: rtlwifi: 8192de: correct checking of IQK reload wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state() leds: lm3601x: Don't use mutex after it was destroyed bpf: Fix reference state management for synchronous callbacks wifi: mac80211: allow bw change during channel switch in mesh bpftool: Fix a wrong type cast in btf_dumper_int spi: mt7621: Fix an error message in mt7621_spi_probe() x86/resctrl: Fix to restore to original value when re-enabling hardware prefetch register xsk: Fix backpressure mechanism on Tx bpf: Disable preemption when increasing per-cpu map_locked bpf: Propagate error from htab_lock_bucket() to userspace bpf: Use this_cpu_{inc\|dec\|inc_return} for bpf_task_storage_busy Bluetooth: btusb: mediatek: fix WMT failure during runtime suspend wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse() wifi: rtw88: add missing destroy_workqueue() on error path in rtw_core_init() selftests/xsk: Avoid use-after-free on ctx spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume() spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime() wifi: rtl8xxxu: Fix skb misuse in TX queue selection spi: meson-spicc: do not rely on busy flag in pow2 clk ops bpf: btf: fix truncated last_member_type_id in btf_struct_resolve wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration wifi: rtl8xxxu: Remove copy-paste leftover in gen2_update_rate_mask wifi: mt76: sdio: fix transmitting packet hangs wifi: mt76: mt7615: add mt7615_mutex_acquire/release in mt7615_sta_set_decap_offload wifi: mt76: mt7915: do not check state before configuring implicit beamform Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/release net: fs_enet: Fix wrong check in do_pd_setup bpf: Ensure correct locking around vulnerable function find_vpid() Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure netfilter: conntrack: fix the gc rescheduling delay netfilter: conntrack: revisit the gc initial rescheduling bias wifi: ath11k: fix number of VHT beamformee spatial streams x86/microcode/AMD: Track patch allocation size explicitly x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype spi: dw: Fix PM disable depth imbalance in dw_spi_bt1_probe spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe skmsg: Schedule psock work if the cached skb exists on the psock i2c: mlxbf: support lock mechanism Bluetooth: hci_core: Fix not handling link timeouts propertly xfrm: Reinject transport-mode packets through workqueue netfilter: nft_fib: Fix for rpath check with VRF devices spi: s3c64xx: Fix large transfers with DMA wifi: rtl8xxxu: Fix AIFS written to REG_EDCA__PARAM vhost/vsock: Use kvmalloc/kvfree for larger packets. eth: alx: take rtnl_lock on resume mISDN: fix use-after-free bugs in l1oip timer handlers sctp: handle the error returned from sctp_auth_asoc_init_active_key tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited spi: Ensure that sg_table won't be used after being freed hwmon: (pmbus/mp2888) Fix sensors readouts for MPS Multi-phase mp2888 controller net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks() bnx2x: fix potential memory leak in bnx2x_tpa_stop() net: wwan: iosm: Call mutex_init before locking it net/ieee802154: reject zero-sized raw_sendmsg() once: add DO_ONCE_SLOW() for sleepable contexts net: mvpp2: fix mvpp2 debugfs leak drm: bridge: adv7511: fix CEC power down control register offset drm: bridge: adv7511: unregister cec i2c device after cec adapter drm/bridge: Avoid uninitialized variable warning drm/mipi-dsi: Detach devices when removing the host drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling drm/bridge: parade-ps8640: Fix regulator supply order drm/dp_mst: fix drm_dp_dpcd_read return value checks drm:pl111: Add of_node_put() when breaking out of for_each_available_child_of_node() ASoC: mt6359: fix tests for platform_get_irq() failure platform/chrome: fix double-free in chromeos_laptop_prepare() platform/chrome: fix memory corruption in ioctl ASoC: tas2764: Allow mono streams ASoC: tas2764: Drop conflicting set_bias_level power setting ASoC: tas2764: Fix mute/unmute platform/x86: msi-laptop: Fix old-ec check for backlight registering platform/x86: msi-laptop: Fix resource cleanup platform/chrome: cros_ec_typec: Correct alt mode index drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume() drm/bridge: megachips: Fix a null pointer dereference bug ASoC: rsnd: Add check for rsnd_mod_power_on ALSA: hda: beep: Simplify keep-power-at-enable behavior drm/bochs: fix blanking drm/omap: dss: Fix refcount leak bugs drm/amdgpu: Fix memory leak in hpd_rx_irq_create_workqueue() mmc: au1xmmc: Fix an error handling path in au1xmmc_probe() ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API drm/msm/dpu: index dpu_kms->hw_vbif using vbif_idx drm/msm/dp: correct 1.62G link rate at dp_catalog_ctrl_config_msa() drm/vmwgfx: Fix memory leak in vmw_mksstat_add_ioctl() ASoC: codecs: tx-macro: fix kcontrol put ASoC: da7219: Fix an error handling path in da7219_register_dai_clks() ALSA: dmaengine: increment buffer pointer atomically mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe() ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe ALSA: hda/hdmi: Don't skip notification handling during PM operation memory: pl353-smc: Fix refcount leak bug in pl353_smc_probe() memory: of: Fix refcount leak bug in of_get_ddr_timings() memory: of: Fix refcount leak bug in of_lpddr3_get_ddr_timings() locks: fix TOCTOU race when granting write lease soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe() soc: qcom: smem_state: Add refcounting for the 'state->of_node' ARM: dts: imx6qdl-kontron-samx6i: hook up DDC i2c bus ARM: dts: turris-omnia: Fix mpp26 pin name and comment ARM: dts: kirkwood: lsxl: fix serial line ARM: dts: kirkwood: lsxl: remove first ethernet port ia64: export memory_add_physaddr_to_nid to fix cxl build error soc/tegra: fuse: Drop Kconfig dependency on TEGRA20_APB_DMA arm64: dts: ti: k3-j7200: fix main pinmux range ARM: dts: exynos: correct s5k6a3 reset polarity on Midas family ARM: Drop CMDLINE_ dependency on ATAGS ext4: don't run ext4lazyinit for read-only filesystems arm64: ftrace: fix module PLTs with mcount ARM: dts: exynos: fix polarity of VBUS GPIO of Origen iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX iio: adc: at91-sama5d2_adc: check return status for pressure and touch iio: adc: at91-sama5d2_adc: lock around oversampling and sample freq iio: adc: at91-sama5d2_adc: disable/prepare buffer on suspend/resume iio: inkern: only release the device node when done with it iio: inkern: fix return value in devm_of_iio_channel_get_by_name() iio: ABI: Fix wrong format of differential capacitance channel ABI. iio: magnetometer: yas530: Change data type of hard_offsets to signed RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkey usb: common: debug: Check non-standard control requests clk: meson: Hold reference returned by of_get_parent() clk: oxnas: Hold reference returned by of_get_parent() clk: qoriq: Hold reference returned by of_get_parent() clk: berlin: Add of_node_put() for of_get_parent() clk: sprd: Hold reference returned by of_get_parent() clk: tegra: Fix refcount leak in tegra210_clock_init clk: tegra: Fix refcount leak in tegra114_clock_init clk: tegra20: Fix refcount leak in tegra20_clock_init HSI: omap_ssi: Fix refcount leak in ssi_probe HSI: omap_ssi_port: Fix dma_map_sg error check media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop tty: xilinx_uartps: Fix the ignore_status media: meson: vdec: add missing clk_disable_unprepare on error in vdec_hevc_start() media: uvcvideo: Fix memory leak in uvc_gpio_parse media: uvcvideo: Use entity get_cur in uvc_ctrl_set media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init RDMA/rxe: Fix "kernel NULL pointer dereference" error RDMA/rxe: Fix the error caused by qp->sk misc: ocxl: fix possible refcount leak in afu_ioctl() fpga: prevent integer overflow in dfl_feature_ioctl_set_irq() dmaengine: hisilicon: Disable channels when unregister hisi_dma dmaengine: hisilicon: Fix CQ head update dmaengine: hisilicon: Add multi-thread support for a DMA channel dyndbg: fix static_branch manipulation dyndbg: fix module.dyndbg handling dyndbg: let query-modname override actual module name dyndbg: drop EXPORTed dynamic_debug_exec_queries clk: qcom: sm6115: Select QCOM_GDSC mtd: devices: docg3: check the return value of devm_ioremap() in the probe phy: amlogic: phy-meson-axg-mipi-pcie-analog: Hold reference returned by of_get_parent() phy: phy-mtk-tphy: fix the phy type setting issue mtd: rawnand: intel: Read the chip-select line from the correct OF node mtd: rawnand: intel: Remove undocumented compatible string mtd: rawnand: fsl_elbc: Fix none ECC mode RDMA/irdma: Align AE id codes to correct flush code and event RDMA/srp: Fix srp_abort() RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall. RDMA/siw: Fix QP destroy to wait for all references dropped. ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting() ata: fix ata_id_has_devslp() ata: fix ata_id_has_ncq_autosense() ata: fix ata_id_has_dipm() mtd: rawnand: meson: fix bit map use in meson_nfc_ecc_correct() md: Replace snprintf with scnprintf md/raid5: Ensure stripe_fill happens on non-read IO with journal md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() RDMA/cm: Use SLID in the work completion as the DLID in responder side IB: Set IOVA/LENGTH on IB_MR in core/uverbs layers xhci: Don't show warning for reinit on known broken suspend usb: gadget: function: fix dangling pnp_string in f_printer.c drivers: serial: jsm: fix some leaks in probe serial: 8250: Toggle IER bits on only after irq has been set up tty: serial: fsl_lpuart: disable dma rx/tx use flags in lpuart_dma_shutdown phy: qualcomm: call clk_disable_unprepare in the error handling staging: vt6655: fix some erroneous memory clean-up loops slimbus: qcom-ngd-ctrl: allow compile testing without QCOM_RPROC_COMMON firmware: google: Test spinlock on panic path to avoid lockups serial: 8250: Fix restoring termios speed after suspend scsi: libsas: Fix use-after-free bug in smp_execute_task_sg() scsi: iscsi: Rename iscsi_conn_queue_work() scsi: iscsi: Add recv workqueue helpers scsi: iscsi: Run recv path from workqueue scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername() clk: qcom: apss-ipq6018: mark apcs_alias0_core_clk as critical clk: qcom: gcc-sm6115: Override default Alpha PLL regs RDMA/rxe: Fix resize_finish() in rxe_queue.c fsi: core: Check error number after calling ida_simple_get mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe() mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq() mfd: lp8788: Fix an error handling path in lp8788_probe() mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init() mfd: fsl-imx25: Fix check for platform_get_irq() errors mfd: sm501: Add check for platform_driver_register() clk: mediatek: mt8183: mfgcfg: Propagate rate changes to parent dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup() usb: mtu3: fix failed runtime suspend in host only mode spmi: pmic-arb: correct duplicate APID to PPID mapping logic clk: vc5: Fix 5P49V6901 outputs disabling when enabling FOD clk: baikal-t1: Fix invalid xGMAC PTP clock divider clk: baikal-t1: Add shared xGMAC ref/ptp clocks internal parent clk: baikal-t1: Add SATA internal ref clock buffer clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration clk: imx: scu: fix memleak on platform_device_add() fails clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe clk: ast2600: BCLK comes from EPLL mailbox: mpfs: fix handling of the reg property mailbox: mpfs: account for mbox offsets while sending mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg powerpc/configs: Properly enable PAPR_SCM in pseries_defconfig powerpc/math_emu/efp: Include module.h powerpc/sysdev/fsl_msi: Add missing of_node_put() powerpc/pci_dn: Add missing of_node_put() powerpc/powernv: add missing of_node_put() in opal_export_attrs() powerpc: Fix fallocate and fadvise64_64 compat parameter combination x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5 powerpc: Fix SPE Power ISA properties for e500v1 platforms powerpc/kprobes: Fix null pointer reference in arch_prepare_kprobe() powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL crypto: sahara - don't sleep when in softirq crypto: hisilicon/zip - fix mismatch in get/set sgl_sge_nr hwrng: arm-smccc-trng - fix NO_ENTROPY handling cgroup: Honor caller's cgroup NS when resolving path hwrng: imx-rngc - Moving IRQ handler registering after imx_rngc_irq_mask_clear() crypto: qat - fix default value of WDT timer crypto: hisilicon/qm - fix missing put dfx access cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset iommu/omap: Fix buffer overflow in debugfs crypto: akcipher - default implementation for setting a private key crypto: ccp - Release dma channels before dmaengine unrgister crypto: inside-secure - Change swab to swab32 crypto: qat - fix DMA transfer direction cifs: return correct error in ->calc_signature() iommu/iova: Fix module config properly tracing: kprobe: Fix kprobe event gen test module on exit tracing: kprobe: Make gen test module work in arm and riscv tracing/osnoise: Fix possible recursive locking in stop_per_cpu_kthreads kbuild: remove the target in signal traps when interrupted kbuild: rpm-pkg: fix breakage when V=1 is used crypto: marvell/octeontx - prevent integer overflows crypto: cavium - prevent integer overflow loading firmware thermal/drivers/qcom/tsens-v0_1: Fix MSM8939 fourth sensor hw_id ACPI: APEI: do not add task_work to kernel thread to avoid memory leak f2fs: fix race condition on setting FI_NO_EXTENT flag f2fs: fix to account FS_CP_DATA_IO correctly selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle fs: dlm: fix race in lowcomms rcu: Avoid triggering strict-GP irq-work when RCU is idle rcu: Back off upon fill_page_cache_func() allocation failure rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE() ACPI: video: Add Toshiba Satellite/Portege Z830 quirk ACPI: tables: FPDT: Don't call acpi_os_map_memory() on invalid phys address cpufreq: intel_pstate: Add Tigerlake support in no-HWP mode MIPS: BCM47XX: Cast memcmp() of function to (void *) powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash ARM: decompressor: Include .data.rel.ro.local ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable x86/entry: Work around Clang __bdos() bug NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data NFSD: fix use-after-free on source server when doing inter-server copy wifi: brcmfmac: fix invalid address access when enabling SCAN log level bpftool: Clear errno after libcap's checks ice: set tx_tstamps when creating new Tx rings via ethtool net: ethernet: ti: davinci_mdio: Add workaround for errata i2329 openvswitch: Fix double reporting of drops in dropwatch openvswitch: Fix overreporting of drops in dropwatch tcp: annotate data-race around tcp_md5sig_pool_populated x86/mce: Retrieve poison range from hardware wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg() thunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround xfrm: Update ipcomp_scratches with NULL when freed iavf: Fix race between iavf_close and iavf_reset_task wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit() Bluetooth: btintel: Mark Intel controller to support LE_STATES quirk regulator: core: Prevent integer underflow wifi: mt76: mt7921: reset msta->airtime_ac while clearing up hw value Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create() Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times can: bcm: check the result of can_send() in bcm_can_tx() wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620 wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620 wifi: rt2x00: set VGC gain for both chains of MT7620 wifi: rt2x00: set SoC wmac clock register wifi: rt2x00: correctly set BBP register 86 for MT7620 hwmon: (sht4x) do not overflow clamping operation on 32-bit platforms net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory Bluetooth: L2CAP: Fix user-after-free r8152: Rate limit overflow messages drm/nouveau/nouveau_bo: fix potential memory leak in nouveau_bo_alloc() drm: Use size_t type for len variable in drm_copy_field() drm: Prevent drm_copy_field() to attempt copying a NULL pointer drm/komeda: Fix handling of atomic commits in the atomic_commit_tail hook gpu: lontium-lt9611: Fix NULL pointer dereference in lt9611_connector_init() drm/amd/display: fix overflow on MIN_I64 definition udmabuf: Set ubuf->sg = NULL if the creation of sg table fails drm: bridge: dw_hdmi: only trigger hotplug event on link change ALSA: usb-audio: Register card at the last interface drm/vc4: vec: Fix timings for VEC modes drm: panel-orientation-quirks: Add quirk for Anbernic Win600 platform/chrome: cros_ec: Notify the PM of wake events during resume platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading ASoC: SOF: pci: Change DMI match info to support all Chrome platforms drm/amdgpu: fix initial connector audio value drm/meson: reorder driver deinit sequence to fix use-after-free bug drm/meson: explicitly remove aggregate driver at module unload time mmc: sdhci-msm: add compatible string check for sdm670 drm/dp: Don't rewrite link config when setting phy test pattern drm/amd/display: Remove interface for periodic interrupt 1 ARM: dts: imx7d-sdb: config the max pressure for tsc2046 ARM: dts: imx6q: add missing properties for sram ARM: dts: imx6dl: add missing properties for sram ARM: dts: imx6qp: add missing properties for sram ARM: dts: imx6sl: add missing properties for sram ARM: dts: imx6sll: add missing properties for sram ARM: dts: imx6sx: add missing properties for sram kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT arm64: dts: imx8mq-librem5: Add bq25895 as max17055's power supply btrfs: dump extra info if one free space cache has more bitmaps than it should btrfs: scrub: try to fix super block errors btrfs: don't print information about space cache or tree every remount ARM: 9242/1: kasan: Only map modules if CONFIG_KASAN_VMALLOC=n clk: zynqmp: Fix stack-out-of-bounds in strncpy` media: cx88: Fix a null-ptr-deref bug in buffer_prepare() media: platform: fix some double free in meson-ge2d and mtk-jpeg and s5p-mfc clk: zynqmp: pll: rectify rate rounding in zynqmp_pll_round_rate usb: host: xhci-plat: suspend and resume clocks usb: host: xhci-plat: suspend/resume clks for brcm dmaengine: ti: k3-udma: Reset UDMA_CHAN_RT byte counters to prevent overflow scsi: 3w-9xxx: Avoid disabling device if failing to enable it nbd: Fix hung when signal interrupts nbd_start_device_ioctl() iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity power: supply: adp5061: fix out-of-bounds read in adp5061_get_chg_type() staging: vt6655: fix potential memory leak blk-throttle: prevent overflow while calculating wait time ata: libahci_platform: Sanity check the DT child nodes number bcache: fix set_at_max_writeback_rate() for multiple attached devices soundwire: cadence: Don't overwrite msg->buf during write commands soundwire: intel: fix error handling on dai registration issues HID: roccat: Fix use-after-free in roccat_read() eventfd: guard wake_up in eventfd fs calls as well md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info() usb: musb: Fix musb_gadget.c rxstate overflow bug arm64: dts: imx8mp: Add snps,gfladj-refclk-lpm-sel quirk to USB nodes usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug Revert "usb: storage: Add quirk for Samsung Fit flash" staging: rtl8723bs: fix potential memory leak in rtw_init_drv_sw() staging: rtl8723bs: fix a potential memory leak in rtw_init_cmd_priv() scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled ext2: Use kvmalloc() for group descriptor array nvme: copy firmware_rev on each init nvmet-tcp: add bounds check on Transfer Tag usb: idmouse: fix an uninit-value in idmouse_open clk: bcm2835: Make peripheral PLLC critical clk: bcm2835: Round UART input clock up perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc io_uring/af_unix: defer registered files gc to io_uring release io_uring: correct pinned_vm accounting io_uring/rw: fix short rw error handling io_uring/rw: fix error'ed retry return values io_uring/rw: fix unexpected link breakage mm: hugetlb: fix UAF in hugetlb_handle_userfault net: ieee802154: return -EINVAL for unknown addr type ALSA: usb-audio: Fix last interface check for registration blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses Revert "net/ieee802154: reject zero-sized raw_sendmsg()" net/ieee802154: don't warn zero-sized raw_sendmsg() drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5 Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5 ext4: continue to expand file system when the target size doesn't reach thermal: intel_powerclamp: Use first online CPU as control_cpu gcov: support GCC 12.1 and newer compilers io-wq: Fix memory leak in worker creation Linux 5.15.75 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id3dfe8285e830d01df7adb33729e78a75c1cac1a	2022-11-02 08:51:19 +01:00
Michal Hocko	cf38a05eb1	rcu: Back off upon fill_page_cache_func() allocation failure [ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ] The fill_page_cache_func() function allocates couple of pages to store kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY) allocation which can fail under memory pressure. The function will, however keep retrying even when the previous attempt has failed. This retrying is in theory correct, but in practice the allocation is invoked from workqueue context, which means that if the memory reclaim gets stuck, these retries can hog the worker for quite some time. Although the workqueues subsystem automatically adjusts concurrency, such adjustment is not guaranteed to happen until the worker context sleeps. And the fill_page_cache_func() function's retry loop is not guaranteed to sleep (see the should_reclaim_retry() function). And we have seen this function cause workqueue lockups: kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s! [...] kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146 kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5 kernel: in-flight: 1917:fill_page_cache_func kernel: pending: dbs_work_handler, free_work, kfree_rcu_monitor Originally, we thought that the root cause of this lockup was several retries with direct reclaim, but this is not yet confirmed. Furthermore, we have seen similar lockups without any heavy memory pressure. This suggests that there are other factors contributing to these lockups. However, it is not really clear that endless retries are desireable. So let's make the fill_page_cache_func() function back off after allocation failure. Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-26 12:35:29 +02:00
Greg Kroah-Hartman	df06ef206f	Merge 5.15.39 into android14-5.15 Changes in 5.15.39 MIPS: Fix CP0 counter erratum detection for R4k CPUs parisc: Merge model and model name into one line in /proc/cpuinfo ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits mmc: core: Set HS clock speed before sending HS CMD13 gpiolib: of: fix bounds check for 'gpio-reserved-ranges' x86/fpu: Prevent FPU state corruption KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id iommu/vt-d: Calculate mask for non-aligned flushes iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range() drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT drm/amdgpu: do not use passthrough mode in Xen dom0 RISC-V: relocate DTB if it's outside memory region Revert "SUNRPC: attempt AF_LOCAL connect on setup" timekeeping: Mark NMI safe time accessors as notrace firewire: fix potential uaf in outbound_phy_packet_callback() firewire: remove check of list iterator against head past the loop body firewire: core: extend card->lock in fw_core_handle_bus_reset net: stmmac: disable Split Header (SPH) for Intel platforms genirq: Synchronize interrupt thread startup ASoC: da7219: Fix change notifications for tone generator frequency ASoC: wm8958: Fix change notifications for DSP controls ASoC: meson: Fix event generation for AUI ACODEC mux ASoC: meson: Fix event generation for G12A tohdmi mux ASoC: meson: Fix event generation for AUI CODEC mux s390/dasd: fix data corruption for ESE devices s390/dasd: prevent double format of tracks for ESE devices s390/dasd: Fix read for ESE with blksize < 4k s390/dasd: Fix read inconsistency for ESE DASD devices can: grcan: grcan_close(): fix deadlock can: isotp: remove re-binding of bound socket can: grcan: use ofdev->dev when allocating DMA memory can: grcan: grcan_probe(): fix broken system id check for errata workaround needs can: grcan: only use the NAPI poll budget for RX nfc: replace improper check device_is_registered() in netlink related functions nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs NFC: netlink: fix sleep in atomic bug when firmware download timeout gpio: visconti: Fix fwnode of GPIO IRQ gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) hwmon: (adt7470) Fix warning on module removal hwmon: (pmbus) disable PEC if not enabled ASoC: dmaengine: Restore NULL prepare_slave_config() callback ASoC: soc-ops: fix error handling iommu/vt-d: Drop stop marker messages iommu/dart: check return value after calling platform_get_resource() net/mlx5e: Fix trust state reset in reload net/mlx5e: Don't match double-vlan packets if cvlan is not set net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release net/mlx5e: Fix the calling of update_buffer_lossy() API net/mlx5: Avoid double clear or set of sync reset requested net/mlx5: Fix deadlock in sync reset flow selftests/seccomp: Don't call read() on TTY from background pgrp SUNRPC release the transport of a relocated task with an assigned transport RDMA/siw: Fix a condition race issue in MPA request processing RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state RDMA/irdma: Reduce iWARP QP destroy time RDMA/irdma: Fix possible crash due to NULL netdev in notifier NFSv4: Don't invalidate inode attributes on delegation return net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() net: dsa: mt7530: add missing of_node_put() in mt7530_setup() net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller net: cpsw: add missing of_node_put() in cpsw_probe_dt() net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() net: emaclite: Add error handling for of_address_to_resource() selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems selftests/net: so_txtime: usage(): fix documentation of default clock drm/msm/dp: remove fail safe mode related code btrfs: do not BUG_ON() on failure to update inode when setting xattr hinic: fix bug of wq out of bound access mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() rxrpc: Enable IPv6 checksums on transport socket selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag bnxt_en: Fix unnecessary dropping of RX packets selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer smsc911x: allow using IRQ0 btrfs: force v2 space cache usage for subpage mount btrfs: always log symlinks in full mode drm/amdgpu: unify BO evicting method in amdgpu_ttm drm/amdgpu: explicitly check for s0ix when evicting resources drm/amdgpu: don't set s3 and s0ix at the same time drm/amdgpu: Ensure HDA function is suspended before ASIC reset gpio: mvebu: drop pwm base assignment kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU fbdev: Make fb_release() return -ENODEV if fbdev was unregistered net/mlx5: Fix slab-out-of-bounds while reading resource dump menu net/mlx5e: Lag, Fix use-after-free in fib event handler net/mlx5e: Lag, Fix fib_info pointer assignment net/mlx5e: Lag, Don't skip fib events on current dst iommu/dart: Add missing module owner to ops structure kvm: selftests: do not use bitfields larger than 32-bits for PTEs KVM: selftests: Silence compiler warning in the kvm_page_table_test x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume KVM: x86: Do not change ICR on write to APIC_SELF_IPI KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised selftest/vm: verify mmap addr in mremap_test selftest/vm: verify remap destination address in mremap_test mmc: rtsx: add 74 Clocks in power on flow Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized" rcu: Fix callbacks processing time limit retaining cond_resched() rcu: Apply callbacks processing time limit only on softirq PCI: pci-bridge-emul: Add description for class_revision field PCI: pci-bridge-emul: Add definitions for missing capabilities registers PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge PCI: aardvark: Clear all MSIs at setup PCI: aardvark: Comment actions in driver remove method PCI: aardvark: Disable bus mastering when unbinding driver PCI: aardvark: Mask all interrupts when unbinding driver PCI: aardvark: Fix memory leak in driver unbind PCI: aardvark: Assert PERST# when unbinding driver PCI: aardvark: Disable link training when unbinding driver PCI: aardvark: Disable common PHY when unbinding driver PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_* PCI: aardvark: Rewrite IRQ code to chained IRQ handler PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ PCI: aardvark: Make MSI irq_chip structures static driver structures PCI: aardvark: Make msi_domain_info structure a static driver structure PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) PCI: aardvark: Refactor unmasking summary MSI interrupt PCI: aardvark: Add support for masking MSI interrupts PCI: aardvark: Fix setting MSI address PCI: aardvark: Enable MSI-X support PCI: aardvark: Add support for ERR interrupt on emulated bridge PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge PCI: aardvark: Add support for PME interrupts PCI: aardvark: Fix support for PME requester on emulated bridge PCI: aardvark: Use separate INTA interrupt for emulated root bridge PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts PCI: aardvark: Don't mask irq when mapping PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy() PCI: aardvark: Update comment about link going down after link-up Linux 5.15.39 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ibd053f0980e7d7eb5659ec6cfa7689c6d9bc416e	2022-06-09 10:28:43 +02:00
Frederic Weisbecker	0060c7bd9e	rcu: Apply callbacks processing time limit only on softirq commit a554ba288845fd3f6f12311fd76a51694233458a upstream. Time limit only makes sense when callbacks are serviced in softirq mode because: _ In case we need to get back to the scheduler, cond_resched_tasks_rcu_qs() is called after each callback. _ In case some other softirq vector needs the CPU, the call to local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about them via a call to do_softirq(). Therefore, make sure the time limit only applies to softirq mode. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [UR: backport to 5.15-stable] Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:26 +02:00
Frederic Weisbecker	2c5029d652	rcu: Fix callbacks processing time limit retaining cond_resched() commit 3e61e95e2d095e308616cba4ffb640f95a480e01 upstream. The callbacks processing time limit makes sure we are not exceeding a given amount of time executing the queue. However its "continue" clause bypasses the cond_resched() call on rcuc and NOCB kthreads, delaying it until we reach the limit, which can be very long... Make sure the scheduler has a higher priority than the time limit. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [UR: backport to 5.15-stable + commit update] Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:26 +02:00
Kalesh Singh	46a5d62344	FROMGIT: EXP rcu: Move expedited grace period (GP) work to RT kthread_worker Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period latency because its workqueues run at SCHED_OTHER, and thus can be delayed by normal processes. This commit avoids these delays by moving the expedited GP work items to a real-time-priority kthread_worker. This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by default on PREEMPT_RT=y kernels which disable expedited grace periods after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1. The results were evaluated on arm64 Android devices (6GB ram) running 5.10 kernel, and capturing trace data in critical user-level code. The table below shows the resulting order-of-magnitude improvements in synchronize_rcu_expedited() latency: ------------------------------------------------------------------------ \| \| workqueues \| kthread_worker \| Diff \| ------------------------------------------------------------------------ \| Count \| 725 \| 688 \| \| ------------------------------------------------------------------------ \| Min Duration (ns) \| 326 \| 447 \| 37.12% \| ------------------------------------------------------------------------ \| Q1 (ns) \| 39,428 \| 38,971 \| -1.16% \| ------------------------------------------------------------------------ \| Q2 - Median (ns) \| 98,225 \| 69,743 \| -29.00% \| ------------------------------------------------------------------------ \| Q3 (ns) \| 342,122 \| 126,638 \| -62.98% \| ------------------------------------------------------------------------ \| Max Duration (ns) \| 372,766,967 \| 2,329,671 \| -99.38% \| ------------------------------------------------------------------------ \| Avg Duration (ns) \| 2,746,353 \| 151,242 \| -94.49% \| ------------------------------------------------------------------------ \| Standard Deviation (ns) \| 19,327,765 \| 294,408 \| \| ------------------------------------------------------------------------ The below table show the range of maximums/minimums for synchronize_rcu_expedited() latency from all experiments: ------------------------------------------------------------------------ \| \| workqueues \| kthread_worker \| Diff \| ------------------------------------------------------------------------ \| Total No. of Experiments \| 25 \| 23 \| \| ------------------------------------------------------------------------ \| Largest Maximum (ns) \| 372,766,967 \| 2,329,671 \| -99.38% \| ------------------------------------------------------------------------ \| Smallest Maximum (ns) \| 38,819 \| 86,954 \| 124.00% \| ------------------------------------------------------------------------ \| Range of Maximums (ns) \| 372,728,148 \| 2,242,717 \| \| ------------------------------------------------------------------------ \| Largest Minimum (ns) \| 88,623 \| 27,588 \| -68.87% \| ------------------------------------------------------------------------ \| Smallest Minimum (ns) \| 326 \| 447 \| 37.12% \| ------------------------------------------------------------------------ \| Range of Minimums (ns) \| 88,297 \| 27,141 \| \| ------------------------------------------------------------------------ Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Tejun Heo <tj@kernel.org> Reported-by: Tim Murray <timmurray@google.com> Reported-by: Wei Wang <wvw@google.com> Tested-by: Kyle Lin <kylelin@google.com> Tested-by: Chunwei Lu <chunweilu@google.com> Tested-by: Lulu Wang <luluw@google.com> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220409003527.1587028-1-kaleshsingh@google.com/ (cherry picked from commit 3902dd17a29bd3ed1ead364a331a1761edb7162b git: //git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git fastexp.2022.04.11a) Bug: 224791892 Change-Id: I4cc5d28f9ae99e44e92afc43ff4db4b71c415d6c	2022-04-19 23:20:40 +00:00
Jun Miao	2f8e463885	UPSTREAM: rcu: Avoid alloc_pages() when recording stack The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...). In general, however, it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain non-preemptive contexts/RT kernel including raw_spin_locks (see gfp.h and `ab00db216c`). Fix it by instructing stackdepot to not expand stack storage via alloc_pages() in case it runs out by using kasan_record_aux_stack_noalloc(). Jianwei Hu reported: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590 softirqs last enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 6 PID: 15319 Comm: python3 Tainted: G W O 5.15-rc7-preempt-rt #1 Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018 Call Trace: show_stack+0x52/0x58 dump_stack+0xa1/0xd6 ___might_sleep.cold+0x11c/0x12d rt_spin_lock+0x3f/0xc0 rmqueue+0x100/0x1460 rmqueue+0x100/0x1460 mark_usage+0x1a0/0x1a0 ftrace_graph_ret_addr+0x2a/0xb0 rmqueue_pcplist.constprop.0+0x6a0/0x6a0 __kasan_check_read+0x11/0x20 __zone_watermark_ok+0x114/0x270 get_page_from_freelist+0x148/0x630 is_module_text_address+0x32/0xa0 __alloc_pages_nodemask+0x2f6/0x790 __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0 create_prof_cpu_mask+0x30/0x30 alloc_pages_current+0xb1/0x150 stack_depot_save+0x39f/0x490 kasan_save_stack+0x42/0x50 kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xa9/0xc0 __call_rcu+0xff/0x9c0 call_rcu+0xe/0x10 put_object+0x53/0x70 __delete_object+0x7b/0x90 kmemleak_free+0x46/0x70 slab_free_freelist_hook+0xb4/0x160 kfree+0xe5/0x420 kfree_const+0x17/0x30 kobject_cleanup+0xaa/0x230 kobject_put+0x76/0x90 netdev_queue_update_kobjects+0x17d/0x1f0 ... ... ksys_write+0xd9/0x180 __x64_sys_write+0x42/0x50 do_syscall_64+0x38/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642 Fixes: `84109ab585` ("rcu: Record kvfree_call_rcu() call stack for KASAN") Fixes: `26e760c9a7` ("rcu: kasan: record and print call_rcu() call stack") Reported-by: Jianwei Hu <jianwei.hu@windriver.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Jun Miao <jun.miao@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> (cherry picked from commit 300c0c5e721834f484b03fa3062602dd8ff48413) Bug: 217222520 Change-Id: I5b7d4f9dfe5da290627599d8d6de3278debb2a13 Signed-off-by: Andrey Konovalov <andreyknvl@google.com>	2022-02-14 15:50:54 +01:00
Paul E. McKenney	c3156dbd50	rcu: Tighten rcu_advance_cbs_nowake() checks commit 614ddad17f22a22e035e2ea37a04815f50362017 upstream. Currently, rcu_advance_cbs_nowake() checks that a grace period is in progress, however, that grace period could end just after the check. This commit rechecks that a grace period is still in progress while holding the rcu_node structure's lock. The grace period cannot end while the current CPU's rcu_node structure's ->lock is held, thus avoiding false positives from the WARN_ON_ONCE(). As Daniel Vacek noted, it is not necessary for the rcu_node structure to have a CPU that has not yet passed through its quiescent state. Tested-by: Guillaume Morin <guillaume@morinfr.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-29 10:58:25 +01:00
Paul E. McKenney	a96ac0688a	rcu: Mark accesses to rcu_state.n_force_qs commit 2431774f04d1050292054c763070021bade7b151 upstream. This commit marks accesses to the rcu_state.n_force_qs. These data races are hard to make happen, but syzkaller was equal to the task. Reported-by: syzbot+e08a83a1940ec3846cd5@syzkaller.appspotmail.com Acked-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-22 09:32:51 +01:00
Peter Zijlstra	d846b69dc7	rcu: Fix rcu_dynticks_curr_cpu_in_eqs() vs noinstr [ Upstream commit 74aece72f95f399dd29363669dc32a1344c8fab4 ] vmlinux.o: warning: objtool: rcu_nmi_enter()+0x36: call to __kasan_check_read() leaves .noinstr.text section noinstr cannot have atomic_() functions in because they're explicitly annotated, use arch_atomic_(). Fixes: `2be57f7328` ("rcu: Weaken ->dynticks accesses and updates") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:16:30 +01:00
Paul E. McKenney	b770efc460	Merge branches 'doc.2021.07.20c', 'fixes.2021.08.06a', 'nocb.2021.07.20c', 'nolibc.2021.07.20c', 'tasks.2021.07.20c', 'torture.2021.07.27a' and 'torturescript.2021.07.27a' into HEAD doc.2021.07.20c: Documentation updates. fixes.2021.08.06a: Miscellaneous fixes. nocb.2021.07.20c: Callback-offloading (NOCB CPU) updates. nolibc.2021.07.20c: Tiny userspace library updates. tasks.2021.07.20c: Tasks RCU updates. torture.2021.07.27a: In-kernel torture-test updates. torturescript.2021.07.27a: Torture-test scripting updates.	2021-08-10 11:00:53 -07:00
Sebastian Andrzej Siewior	d3dd95a885	rcu: Replace deprecated CPU-hotplug functions The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: rcu@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-08-10 10:47:32 -07:00
Liu Song	8211e922de	rcu: Use per_cpu_ptr to get the pointer of per_cpu variable There are a few remaining locations in kernel/rcu that still use "&per_cpu()". This commit replaces them with "per_cpu_ptr(&)", and does not introduce any functional change. Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-08-06 13:41:49 -07:00
Liu Song	eb880949ef	rcu: Remove useless "ret" update in rcu_gp_fqs_loop() Within rcu_gp_fqs_loop(), the "ret" local variable is set to the return value from swait_event_idle_timeout_exclusive(), but "ret" is unconditionally overwritten later in the code. This commit therefore removes this useless assignment. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-08-06 13:41:48 -07:00
Paul E. McKenney	f74126dcbc	rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack The kbuild test project found an oversized stack frame in rcu_gp_kthread() for some kernel configurations. This oversizing was due to a very large amount of inlining, which is unnecessary due to the fact that this code executes infrequently. This commit therefore marks rcu_gp_init() and rcu_gp_fqs_loop noinline_for_stack to conserve stack space. Reported-by: kernel test robot <lkp@intel.com> Tested-by: Rong Chen <rong.a.chen@intel.com> [ paulmck: noinline_for_stack per Nathan Chancellor. ] Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-08-06 13:41:48 -07:00
Paul E. McKenney	2be57f7328	rcu: Weaken ->dynticks accesses and updates Accesses to the rcu_data structure's ->dynticks field have always been fully ordered because it was not possible to prove that weaker ordering was safe. However, with the removal of the rcu_eqs_special_set() function and the advent of the Linux-kernel memory model, it is now easy to show that two of the four original full memory barriers can be weakened to acquire and release operations. The remaining pair must remain full memory barriers. This change makes the memory ordering requirements more evident, and it might well also speed up the to-idle and from-idle fastpaths on some architectures. The following litmus test, adapted from one supplied off-list by Frederic Weisbecker, models the RCU grace-period kthread detecting an idle CPU that is concurrently transitioning to non-idle: C dynticks-from-idle { DYNTICKS=0; (* Initially idle. ) } P0(int X, int DYNTICKS) { int dynticks; int x; // Idle. dynticks = READ_ONCE(DYNTICKS); smp_store_release(DYNTICKS, dynticks + 1); smp_mb(); // Now non-idle x = READ_ONCE(X); } P1(int X, int DYNTICKS) { int dynticks; WRITE_ONCE(X, 1); smp_mb(); dynticks = smp_load_acquire(DYNTICKS); } exists (1:dynticks=0 /\ 0:x=1) Running "herd7 -conf linux-kernel.cfg dynticks-from-idle.litmus" verifies this transition, namely, showing that if the RCU grace-period kthread (P1) sees another CPU as idle (P0), then any memory access prior to the start of the grace period (P1's write to X) will be seen by any RCU read-side critical section following the to-non-idle transition (P0's read from X). This is a straightforward use of full memory barriers to force ordering in a store-buffering (SB) litmus test. The following litmus test, also adapted from the one supplied off-list by Frederic Weisbecker, models the RCU grace-period kthread detecting a non-idle CPU that is concurrently transitioning to idle: C dynticks-into-idle { DYNTICKS=1; (* Initially non-idle. ) } P0(int X, int DYNTICKS) { int dynticks; // Non-idle. WRITE_ONCE(X, 1); dynticks = READ_ONCE(DYNTICKS); smp_store_release(DYNTICKS, dynticks + 1); smp_mb(); // Now idle. } P1(int X, int DYNTICKS) { int x; int dynticks; smp_mb(); dynticks = smp_load_acquire(DYNTICKS); x = READ_ONCE(X); } exists (1:dynticks=2 /\ 1:x=0) Running "herd7 -conf linux-kernel.cfg dynticks-into-idle.litmus" verifies this transition, namely, showing that if the RCU grace-period kthread (P1) sees another CPU as newly idle (P0), then any pre-idle memory access (P0's write to X) will be seen by any code following the grace period (P1's read from X). This is a simple release-acquire pair forcing ordering in a message-passing (MP) litmus test. Of course, if the grace-period kthread detects the CPU as non-idle, it will refrain from reporting a quiescent state on behalf of that CPU, so there are no ordering requirements from the grace-period kthread in that case. However, other subsystems call rcu_is_idle_cpu() to check for CPUs being non-idle from an RCU perspective. That case is also verified by the above litmus tests with the proviso that the sense of the low-order bit of the DYNTICKS counter be inverted. Unfortunately, on x86 smp_mb() is as expensive as a cache-local atomic increment. This commit therefore weakens only the read from ->dynticks. However, the updates are abstracted into a rcu_dynticks_inc() function to ease any future changes that might be needed. [ paulmck: Apply Linus Torvalds feedback. ] Link: https://lore.kernel.org/lkml/20210721202127.2129660-4-paulmck@kernel.org/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-08-06 13:41:48 -07:00
Joel Fernandes (Google)	a86baa69c2	rcu: Remove special bit at the bottom of the ->dynticks counter Commit `b8c17e6664` ("rcu: Maintain special bits at bottom of ->dynticks counter") reserved a bit at the bottom of the ->dynticks counter to defer flushing of TLBs, but this facility never has been used. This commit therefore removes this capability along with the rcu_eqs_special_set() function used to trigger it. Link: https://lore.kernel.org/linux-doc/CALCETrWNPOOdTrFabTDd=H7+wc6xJ9rJceg6OL1S0rTV5pfSsA@mail.gmail.com/ Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> [ paulmck: Forward-port to v5.13-rc1. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-08-06 13:41:48 -07:00
Frederic Weisbecker	cba712beeb	rcu/nocb: Remove NOCB deferred wakeup from rcutree_dead_cpu() At CPU offline time, we must handle any pending wakeup for the nocb_gp kthread linked to the outgoing CPU. Now we are making sure of that twice: 1) From rcu_report_dead() when the outgoing CPU makes the very last local cleanups by itself before switching offline. 2) From rcutree_dead_cpu(). Here the offlining CPU has gone and is truly now offline. Another CPU takes care of post-portem cleaning up and check if the offline CPU had pending wakeup. Both ways are fine but we have to choose one or the other because we don't need to repeat that action. Simply benefit from cache locality and keep only the first solution. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-07-20 13:41:51 -07:00
Frederic Weisbecker	dfcb275402	rcu/nocb: Start moving nocb code to its own plugin file The kernel/rcu/tree_plugin.h file contains not only the plugins for preemptible RCU, but also many other features including rcu_nocbs callback offloading. This offloading has become large and complex, so it is time to put it in its own file. This commit starts that process. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Rename to tree_nocb.h, add Frederic as author. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-07-20 13:41:51 -07:00
Linus Torvalds	28e92f9903	Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Bitmap parsing support for "all" as an alias for all bits - Documentation updates - Miscellaneous fixes, including some that overlap into mm and lockdep - kvfree_rcu() updates - mem_dump_obj() updates, with acks from one of the slab-allocator maintainers - RCU NOCB CPU updates, including limited deoffloading - SRCU updates - Tasks-RCU updates - Torture-test updates * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits) tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states rcu: Add missing __releases() annotation rcu: Remove obsolete rcu_read_unlock() deadlock commentary rcu: Improve comments describing RCU read-side critical sections rcu: Create an unrcu_pointer() to remove __rcu from a pointer srcu: Early test SRCU polling start rcu: Fix various typos in comments rcu/nocb: Unify timers rcu/nocb: Prepare for fine-grained deferred wakeup rcu/nocb: Only cancel nocb timer if not polling rcu/nocb: Delete bypass_timer upon nocb_gp wakeup rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup rcu/nocb: Allow de-offloading rdp leader rcu/nocb: Directly call __wake_nocb_gp() from bypass timer rcu: Don't penalize priority boosting when there is nothing to boost rcu: Point to documentation of ordering guarantees rcu: Make rcu_gp_cleanup() be noinline for tracing rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP ...	2021-07-04 12:58:33 -07:00
Andy Shevchenko	f39650de68	kernel.h: split out panic and oops helpers kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [akpm@linux-foundation.org: thread_info.h needs limits.h] [andriy.shevchenko@linux.intel.com: ia64 fix] Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-01 11:06:04 -07:00
Paul E. McKenney	641faf1b90	Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', 'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD bitmaprange.2021.05.10c: Allow "all" for bitmap ranges. doc.2021.05.10c: Documentation updates. fixes.2021.05.13a: Miscellaneous fixes. kvfree_rcu.2021.05.10c: kvfree_rcu() updates. mmdumpobj.2021.05.10c: mem_dump_obj() updates. nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading. srcu.2021.05.12a: SRCU updates. tasks.2021.05.18a: Tasks-RCU updates. torture.2021.05.10c: Torture-test updates.	2021-05-18 10:56:19 -07:00
Paul E. McKenney	cf868c2af2	rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states Heavy networking load can cause a CPU to execute continuously and indefinitely within ksoftirqd, in which case there will be no voluntary task switches and thus no RCU-tasks quiescent states. This commit therefore causes the exiting rcu_softirq_qs() to provide an RCU-tasks quiescent state. This of course means that __do_softirq() and its callers cannot be invoked from within a tracing trampoline. Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org>	2021-05-18 10:54:51 -07:00
Paul E. McKenney	1893afd634	rcu: Improve comments describing RCU read-side critical sections There are a number of places that call out the fact that preempt-disable regions of code now act as RCU read-side critical sections, where preempt-disable regions of code include irq-disable regions of code, bh-disable regions of code, hardirq handlers, and NMI handlers. However, someone relying solely on (for example) the call_rcu() header comment might well have no idea that preempt-disable regions of code have RCU semantics. This commit therefore updates the header comments for call_rcu(), synchronize_rcu(), rcu_dereference_bh_check(), and rcu_dereference_sched_check() to call out these new(ish) forms of RCU readers. Reported-by: Michel Lespinasse <michel@lespinasse.org> [ paulmck: Apply Matthew Wilcox and Michel Lespinasse feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-13 09:13:23 -07:00
Ingo Molnar	a616aec9aa	rcu: Fix various typos in comments Fix ~12 single-word typos in RCU code comments. [ paulmck: Apply feedback from Randy Dunlap. ] Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-12 12:11:05 -07:00
Frederic Weisbecker	870905169d	rcu/nocb: Prepare for fine-grained deferred wakeup Tuning the deferred wakeup level must be done from a safe wakeup point. Currently those sites are: * ->nocb_timer * user/idle/guest entry * CPU down * softirq/rcuc All of these sites perform the wake up for both RCU_NOCB_WAKE and RCU_NOCB_WAKE_FORCE. In order to merge ->nocb_timer and ->nocb_bypass_timer together, we plan to add a new RCU_NOCB_WAKE_BYPASS that really should be deferred until a timer fires so that we don't wake up the NOCB-gp kthread too early. To prepare for that, this commit specifies the per-callsite wakeup level/limit. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-12 12:10:23 -07:00
Paul E. McKenney	3d3a0d1b50	rcu: Point to documentation of ordering guarantees Add comments to synchronize_rcu() and friends that point to Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:22:54 -07:00
Paul E. McKenney	2f20de99a6	rcu: Make rcu_gp_cleanup() be noinline for tracing Although there are trace events for RCU grace periods, these are only enabled in CONFIG_RCU_TRACE=y kernels. This commit therefore marks rcu_gp_cleanup() noinline in order to provide a function that can be traced that is invoked near the end of each grace period. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:22:54 -07:00
Paul E. McKenney	3ef5a1c382	rcu: Make RCU priority boosting work on single-CPU rcu_node structures When any CPU comes online, it checks to see if an RCU-boost kthread has already been created for that CPU's leaf rcu_node structure, and if not, it creates one. Unfortunately, it also verifies that this leaf rcu_node structure actually has at least one online CPU, and if not, it declines to create the kthread. Although this behavior makes sense during early boot, especially on systems that claim far more CPUs than they actually have, it makes no sense for the first CPU to come online for a given rcu_node structure. There is no point in checking because we know there is a CPU on its way in. The problem is that timing differences can cause this incoming CPU to not yet be reflected in the various bit masks even at rcutree_online_cpu() time, and there is no chance at rcutree_prepare_cpu() time. Plus it would be better to create the RCU-boost kthread at rcutree_prepare_cpu() to handle the case where the CPU is involved in an RCU priority inversion very shortly after it comes online. This commit therefore moves the checking to rcu_prepare_kthreads(), which is called only at early boot, when the check is appropriate. In addition, it makes rcutree_prepare_cpu() invoke rcu_spawn_one_boost_kthread(), which no longer does any checking for online CPUs. With this change, RCU priority boosting tests now pass for short rcutorture runs, even with single-CPU leaf rcu_node structures. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:22:54 -07:00
Paul E. McKenney	8e4b1d2bc1	rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread() Currently, rcu_spawn_core_kthreads() is invoked via an early_initcall(), which works, except that rcu_spawn_gp_kthread() is also invoked via an early_initcall() and rcu_spawn_core_kthreads() relies on adjustments to kthread_prio that are carried out by rcu_spawn_gp_kthread(). There is no guaranttee of ordering among early_initcall() handlers, and thus no guarantee that kthread_prio will be properly checked and range-limited at the time that rcu_spawn_core_kthreads() needs it. In most cases, this bug is harmless. After all, the only reason that rcu_spawn_gp_kthread() adjusts the value of kthread_prio is if the user specified a nonsensical value for this boot parameter, which experience indicates is rare. Nevertheless, a bug is a bug. This commit therefore causes the rcu_spawn_core_kthreads() function to be invoked directly from rcu_spawn_gp_kthread() after any needed adjustments to kthread_prio have been carried out. Fixes: `48d07c04b4` ("rcu: Enable elimination of Tree-RCU softirq processing") Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:22:54 -07:00
Zhouyi Zhou	277ffe1b70	rcu: Improve tree.c comments and add code cleanups This commit cleans up some comments and code in kernel/rcu/tree.c. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:22:53 -07:00
Paul E. McKenney	ce7c169dee	rcu: Remove the unused rcu_irq_exit_preempt() function Commit `9ee01e0f69` ("x86/entry: Clean up idtentry_enter/exit() leftovers") left the rcu_irq_exit_preempt() in place in order to avoid conflicts with the -rcu tree. Now that this change has long since hit mainline, this commit removes the no-longer-used rcu_irq_exit_preempt() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:22:53 -07:00
Frederic Weisbecker	b5befe842e	srcu: Fix broken node geometry after early ssp init An srcu_struct structure that is initialized before rcu_init_geometry() will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once rcu_init_geometry() is called, this hierarchy is compressed as needed for the actual maximum number of CPUs for this system. Later on, that srcu_struct structure is confused, sometimes referring to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead to the new num_possible_cpus() hierarchy. For example, each of its ->mynode fields continues to reference the original leaf rcu_node structures, some of which might no longer exist. On the other hand, srcu_for_each_node_breadth_first() traverses to the new node hierarchy. There are at least two bad possible outcomes to this: 1) a) A callback enqueued early on an srcu_data structure (call it *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf (say 3 levels). b) The grace period ends after rcu_init_geometry() shrinks the nodes level to a single one. srcu_gp_end() walks through the new srcu_node hierarchy without ever reaching the old leaves so the callback is never executed. This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests verification never completes and the boot hangs: [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds. [ 5413.147564] Not tainted 5.12.0-rc4+ #28 [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000 [ 5413.168099] Call Trace: [ 5413.170555] __schedule+0x36c/0x930 [ 5413.174057] ? wait_for_completion+0x88/0x110 [ 5413.178423] schedule+0x46/0xf0 [ 5413.181575] schedule_timeout+0x284/0x380 [ 5413.185591] ? wait_for_completion+0x88/0x110 [ 5413.189957] ? mark_held_locks+0x61/0x80 [ 5413.193882] ? mark_held_locks+0x61/0x80 [ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50 [ 5413.202173] ? wait_for_completion+0x88/0x110 [ 5413.206535] wait_for_completion+0xb4/0x110 [ 5413.210724] ? srcu_torture_stats_print+0x110/0x110 [ 5413.215610] srcu_barrier+0x187/0x200 [ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50 [ 5413.224244] ? rdinit_setup+0x2b/0x2b [ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40 [ 5413.232700] do_one_initcall+0x63/0x310 [ 5413.236541] ? rdinit_setup+0x2b/0x2b [ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80 [ 5413.244912] kernel_init_freeable+0x253/0x28f [ 5413.249273] ? rest_init+0x250/0x250 [ 5413.252846] kernel_init+0xa/0x110 [ 5413.256257] ret_from_fork+0x22/0x30 2) An srcu_struct structure that is initialized before rcu_init_geometry() and used afterward will always have stale rdp->mynode references, resulting in callbacks to be missed in srcu_gp_end(), just like in the previous scenario. This commit therefore causes init_srcu_struct_nodes to initialize the geometry, if needed. This ensures that the srcu_node hierarchy is properly built and distributed from the get-go. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:03:35 -07:00
Frederic Weisbecker	8e9c01c717	srcu: Initialize SRCU after timers Once srcu_init() is called, the SRCU core will make use of delayed workqueues, which rely on timers. However init_timers() is called several steps after rcu_init(). This means that a call_srcu() after rcu_init() but before init_timers() would find itself within a dangerously uninitialized timer core. This commit therefore creates a separate call to srcu_init() after init_timer() completes, which ensures that we stay in early SRCU mode until timers are safe(r). Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:03:35 -07:00
Uladzislau Rezki (Sony)	a78d4a2a10	kvfree_rcu: Refactor kfree_rcu_monitor() Currently we have three functions which depend on each other. Two of them are quite tiny and the last one where the most work is done. All of them are related to queuing RCU batches to reclaim objects after a GP. 1. kfree_rcu_monitor(). It consist of few lines. It acquires a spin-lock and calls kfree_rcu_drain_unlock(). 2. kfree_rcu_drain_unlock(). It also consists of few lines of code. It calls queue_kfree_rcu_work() to queue the batch. If this fails, it rearms the monitor work to try again later. 3. queue_kfree_rcu_work(). This provides the bulk of the functionality, attempting to start a new batch to free objects after a GP. Since there are no external users of functions [2] and [3], both can eliminated by moving all logic directly into [1], which both shrinks and simplifies the code. Also replace comments which start with "/*" to "//" format to make it unified across the file. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony)	d8628f35ba	kvfree_rcu: Fix comments according to current code The kvfree_rcu() function now defers allocations in the common case due to the fact that there is no lockless access to the memory-allocator caches/pools. In addition, in CONFIG_PREEMPT_NONE=y and in CONFIG_PREEMPT_VOLUNTARY=y kernels, there is no reliable way to determine if spinlocks are held. As a result, allocation is deferred in the common case, and the two-argument form of kvfree_rcu() thus uses the "channel 3" queue through all the rcu_head structures. This channel is called referred to as the emergency case in comments, and these comments are now obsolete. This commit therefore updates these comments to reflect the new common-case nature of such emergencies. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony)	7fe1da33f6	kvfree_rcu: Use kfree_rcu_monitor() instead of open-coded variant Replace an open-coded version of the kfree_rcu_monitor() function body with a call to that function. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony)	dd28c9f057	kvfree_rcu: Update "monitor_todo" once a batch is started Before attempting to start a new batch the "monitor_todo" variable is set to "false" and set back to "true" when a previous RCU batch is still in progress. This is at best confusing. Thus change this variable to "false" only when a new batch has been successfully queued, otherwise, just leave it be. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony)	d434c00fa3	kvfree_rcu: Add a bulk-list check when a scheduler is run The rcu_scheduler_active flag is set to RCU_SCHEDULER_RUNNING once the scheduler is up and running. That signal is used in order to check and queue a "monitor work" to reclaim freed objects (if there are any) during early boot. This flag is used by kvfree_rcu() to determine when work can safely be queued, at which point memory passed to earlier invocations of kvfree_rcu() can be processed. However, only "krcp->head" is checked for objects that need to be released, and there are now two more, namely, "krcp->bkvhead[0]" and "krcp->bkvhead[1]". Therefore, check these two additional channels. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Uladzislau Rezki (Sony)	ac7625ebd5	kvfree_rcu: Use [READ/WRITE]_ONCE() macros to access to nr_bkv_objs nr_bkv_objs is a count of the objects in the kvfree_rcu page cache. Accessing it requires holding the ->lock. Switch to READ_ONCE() and WRITE_ONCE() macros to provide lockless access to this counter. This lockless access is used for the shrinker. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Zhang Qiang	d0bfa8b3c4	kvfree_rcu: Release a page cache under memory pressure Add a drain_page_cache() function to drain a per-cpu page cache. The reason behind of it is a system can run into a low memory condition, in that case a page shrinker can ask for its users to free their caches in order to get extra memory available for other needs in a system. When a system hits such condition, a page cache is drained for all CPUs in a system. By default a page cache work is delayed with 5 seconds interval until a memory pressure disappears, if needed it can be changed. See a rcu_delay_page_cache_fill_msec module parameter. Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-05-10 16:00:48 -07:00
Paul E. McKenney	ab6ad3dbdd	Merge branches 'bitmaprange.2021.03.08a', 'fixes.2021.03.15a', 'kvfree_rcu.2021.03.08a', 'mmdumpobj.2021.03.08a', 'nocb.2021.03.15a', 'poll.2021.03.24a', 'rt.2021.03.08a', 'tasks.2021.03.08a', 'torture.2021.03.08a' and 'torturescript.2021.03.22a' into HEAD bitmaprange.2021.03.08a: Allow 3-N for bitmap ranges. fixes.2021.03.15a: Miscellaneous fixes. kvfree_rcu.2021.03.08a: kvfree_rcu() updates. mmdumpobj.2021.03.08a: mem_dump_obj() updates. nocb.2021.03.15a: RCU NOCB CPU updates, including limited deoffloading. poll.2021.03.24a: Polling grace-period interfaces for RCU. rt.2021.03.08a: Realtime-related RCU changes. tasks.2021.03.08a: Tasks-RCU updates. torture.2021.03.08a: Torture-test updates. torturescript.2021.03.22a: Torture-test scripting updates.	2021-03-24 17:20:18 -07:00
Paul E. McKenney	7abb18bd75	rcu: Provide polling interfaces for Tree RCU grace periods There is a need for a non-blocking polling interface for RCU grace periods, so this commit supplies start_poll_synchronize_rcu() and poll_state_synchronize_rcu() for this purpose. Note that the existing get_state_synchronize_rcu() may be used if future grace periods are inevitable (perhaps due to a later call_rcu() invocation). The new start_poll_synchronize_rcu() is to be used if future grace periods might not otherwise happen. Finally, poll_state_synchronize_rcu() provides a lockless check for a grace period having elapsed since the corresponding call to either of the get_state_synchronize_rcu() or start_poll_synchronize_rcu(). As with get_state_synchronize_rcu(), the return value from either get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in to a later call to either poll_state_synchronize_rcu() or the existing (might_sleep) cond_synchronize_rcu(). [ paulmck: Remove redundant smp_mb() per Frederic Weisbecker feedback. ] [ Update poll_state_synchronize_rcu() docbook per Frederic Weisbecker feedback. ] Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-22 08:23:48 -07:00
Frederic Weisbecker	ec711bc12c	rcu/nocb: Only (re-)initialize segcblist when needed on CPU up At the start of a CPU-hotplug operation, the incoming CPU's callback list can be in a number of states: 1. Disabled and empty. This is the case when the boot CPU has not invoked call_rcu(), when a non-boot CPU first comes online, and when a non-offloaded CPU comes back online. In this case, it is both necessary and permissible to initialize ->cblist. Because either the CPU is currently running with interrupts disabled (boot CPU) or is not yet running at all (other CPUs), it is not necessary to acquire ->nocb_lock. In this case, initialization is required. 2. Disabled and non-empty. This cannot occur, because early boot call_rcu() invocations enable the callback list before enqueuing their callback. 3. Enabled, whether empty or not. In this case, the callback list has already been initialized. This case occurs when the boot CPU has executed an early boot call_rcu() and also when an offloaded CPU comes back online. In both cases, there is no need to initialize the callback list: In the boot-CPU case, the CPU has not (yet) gone offline, and in the offloaded case, the rcuo kthreads are taking care of business. Because it is not necessary to initialize the callback list, it is also not necessary to acquire ->nocb_lock. Therefore, checking if the segcblist is enabled suffices. This commit therefore initializes the callback list at rcutree_prepare_cpu() time only if that list is disabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:20:22 -08:00
Frederic Weisbecker	64305db285	rcu/nocb: Forbid NOCB toggling on offline CPUs It makes no sense to de-offload an offline CPU because that CPU will never invoke any remaining callbacks. It also makes little sense to offload an offline CPU because any pending RCU callbacks were migrated when that CPU went offline. Yes, it is in theory possible to use a number of tricks to permit offloading and deoffloading offline CPUs in certain cases, but in practice it is far better to have the simple and deterministic rule "Toggling the offload state of an offline CPU is forbidden". For but one example, consider that an offloaded offline CPU might have millions of callbacks queued. Best to just say "no". This commit therefore forbids toggling of the offloaded state of offline CPUs. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:20:21 -08:00
Frederic Weisbecker	3820b513a2	rcu/nocb: Detect unsafe checks for offloaded rdp Provide CONFIG_PROVE_RCU sanity checks to ensure we are always reading the offloaded state of an rdp in a safe and stable way and prevent from its value to be changed under us. We must either hold the barrier mutex, the cpu-hotplug lock (read or write) or the nocb lock. Local non-preemptible reads are also safe. NOCB kthreads and timers have their own means of synchronization against the offloaded state updaters. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:20:20 -08:00
Uladzislau Rezki (Sony)	ee6ddf5847	kvfree_rcu: Use same set of GFP flags as does single-argument Running an rcuscale stress-suite can lead to "Out of memory" of a system. This can happen under high memory pressure with a small amount of physical memory. For example, a KVM test configuration with 64 CPUs and 512 megabytes can result in OOM when running rcuscale with below parameters: ../kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig CONFIG_NR_CPUS=64 \ --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 \ rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot" --trust-make <snip> [ 12.054448] kworker/1:1H invoked oom-killer: gfp_mask=0x2cc0(GFP_KERNEL\|__GFP_NOWARN), order=0, oom_score_adj=0 [ 12.055303] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510 [ 12.055416] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014 [ 12.056485] Workqueue: events_highpri fill_page_cache_func [ 12.056485] Call Trace: [ 12.056485] dump_stack+0x57/0x6a [ 12.056485] dump_header+0x4c/0x30a [ 12.056485] ? del_timer_sync+0x20/0x30 [ 12.056485] out_of_memory.cold.47+0xa/0x7e [ 12.056485] __alloc_pages_slowpath.constprop.123+0x82f/0xc00 [ 12.056485] __alloc_pages_nodemask+0x289/0x2c0 [ 12.056485] __get_free_pages+0x8/0x30 [ 12.056485] fill_page_cache_func+0x39/0xb0 [ 12.056485] process_one_work+0x1ed/0x3b0 [ 12.056485] ? process_one_work+0x3b0/0x3b0 [ 12.060485] worker_thread+0x28/0x3c0 [ 12.060485] ? process_one_work+0x3b0/0x3b0 [ 12.060485] kthread+0x138/0x160 [ 12.060485] ? kthread_park+0x80/0x80 [ 12.060485] ret_from_fork+0x22/0x30 [ 12.062156] Mem-Info: [ 12.062350] active_anon:0 inactive_anon:0 isolated_anon:0 [ 12.062350] active_file:0 inactive_file:0 isolated_file:0 [ 12.062350] unevictable:0 dirty:0 writeback:0 [ 12.062350] slab_reclaimable:2797 slab_unreclaimable:80920 [ 12.062350] mapped:1 shmem:2 pagetables:8 bounce:0 [ 12.062350] free:10488 free_pcp:1227 free_cma:0 ... [ 12.101610] Out of memory and no killable processes... [ 12.102042] Kernel panic - not syncing: System is deadlocked on memory [ 12.102583] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510 [ 12.102600] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014 <snip> Because kvfree_rcu() has a fallback path, memory allocation failure is not the end of the world. Furthermore, the added overhead of aggressive GFP settings must be balanced against the overhead of the fallback path, which is a cache miss for double-argument kvfree_rcu() and a call to synchronize_rcu() for single-argument kvfree_rcu(). The current choice of GFP_KERNEL\|__GFP_NOWARN can result in longer latencies than a call to synchronize_rcu(), so less-tenacious GFP flags would be helpful. Here is the tradeoff that must be balanced: a) Minimize use of the fallback path, b) Avoid pushing the system into OOM, c) Bound allocation latency to that of synchronize_rcu(), and d) Leave the emergency reserves to use cases lacking fallbacks. This commit therefore changes GFP flags from GFP_KERNEL\|__GFP_NOWARN to GFP_KERNEL\|__GFP_NORETRY\|__GFP_NOMEMALLOC\|__GFP_NOWARN. This combination leaves the emergency reserves alone and can initiate reclaim, but will not invoke the OOM killer. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:18:07 -08:00
Uladzislau Rezki (Sony)	3e7ce7a187	kvfree_rcu: Replace __GFP_RETRY_MAYFAIL by __GFP_NORETRY __GFP_RETRY_MAYFAIL can spend quite a bit of time reclaiming, and this can be wasted effort given that there is a fallback code path in case memory allocation fails. __GFP_NORETRY does perform some light-weight reclaim, but it will fail under OOM conditions, allowing the fallback to be taken as an alternative to hard-OOMing the system. There is a four-way tradeoff that must be balanced: 1) Minimize use of the fallback path; 2) Avoid full-up OOM; 3) Do a light-wait allocation request; 4) Avoid dipping into the emergency reserves. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:18:07 -08:00
Paul E. McKenney	7ffc9ec8ea	kvfree_rcu: Make krc_this_cpu_unlock() use raw_spin_unlock_irqrestore() The krc_this_cpu_unlock() function does a raw_spin_unlock() immediately followed by a local_irq_restore(). This commit saves a line of code by merging them into a raw_spin_unlock_irqrestore(). This transformation also reduces scheduling latency because raw_spin_unlock_irqrestore() responds immediately to a reschedule request. In contrast, local_irq_restore() does a scheduling-oblivious enabling of interrupts. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:18:07 -08:00
Paul E. McKenney	b01b405092	kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu() This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations carried out by the single-argument variant of kvfree_rcu(), thus avoiding this can-sleep code path from dipping into the emergency reserves. Acked-by: Michal Hocko <mhocko@suse.com> Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-03-08 14:18:07 -08:00

1 2 3 4 5 ...

850 Commits