kernel_arpi

Author	SHA1	Message	Date
Minchan Kim	dd887dbfaa	ANDROID: mm: do not count cma_alloc_fail on __GFP_NORETRY Do not count a __GFP_NORETRY allocation failure as cma_alloc_fail, since it is not a critical failure (i.e., a caller using __GFP_NORETRY should always carry on with its fallback plan). It is also good for compatibility with upstream, since upstream cma_alloc_fail only counts failures with !__GFP_NORETRY because upstream doesn't support __GFP_NORETRY yet. Bug: 220669548 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I377e6b033c3786e10b6b1c814037a4fc40e20a73 Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit 8ffc7ff817fe552592daa2b0de1760e3539663f3)	2023-01-05 02:16:50 +00:00
Minchan Kim	bb7b81497d	ANDROID: GKI: export cma_get_size Export cma_get_size to report a cma instance's size, which is needed to allocate all the pages of the cma. Bug: 218731671 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ifb2769f60250ce605236342b950907218e1c28a5 Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit 7a44906686048bdcecb7dfa4fac02c4ad7f6cd06)	2023-01-05 02:16:50 +00:00
Minchan Kim	1b38e981db	ANDROID: mm: cma do not sleep for __GFP_NORETRY Do not sleep to retry for __GFP_NORETRY since it is a failfast approach. Users can retry the allocation without the flag themselves if they see the failure. Bug: 192475091 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ic6a857978fda8e353b9ed770d1e0ba1808fd201e Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `12f48605e8`)	2023-01-05 02:16:50 +00:00
Minchan Kim	60d2dad38e	ANDROID: mm: cma: skip problematic pageblock alloc_contig_range is supposed to work on a range aligned to max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity. If it fails at a page and returns an error to the user, the user doesn't know which page caused the allocation failure and keeps retrying with a new range that still includes the failed page, hitting the same error again and again until the range moves out of the granularity block. Instead, let's make CMA aware of which pfn caused trouble in the previous attempt and then continue with a new pageblock past the failed page, so it doesn't hit the same error repeatedly. Currently, this option works only for the __GFP_NORETRY case, to stay safe for existing CMA users. Bug: 192475091 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I0959c9df3d4b36408a68920abbb4d52d31026079 Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `0e688e972d`)	2023-01-05 02:16:50 +00:00
Minchan Kim	a7f55c5c73	ANDROID: mm: add cma allocation statistics alloc_contig_range is the core worker function for CMA allocation, so it has all the information needed to understand allocation latency: for example, how many pages were migrated, how many times unmap was needed to migrate pages, and how many times it encountered errors for various reasons. This patch adds such statistics to alloc_contig_range and returns them to the user so the user can use that information to analyze latency. cma_alloc is the first user of the statistics; it exports them as a new trace event (i.e., cma_alloc_info). It was really useful for optimizing cma allocation work. Bug: 192475091 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I7be43cc89d11078e2a324d2d06aada6d8e9e1cc9 Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `675e504598`)	2023-01-05 02:16:50 +00:00
Minchan Kim	1b2de5aa2d	ANDROID: mm: cma: add vendor hoook in cma_alloc() Add vendor hook for cma_alloc latency measuring. Bug: 177231781 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ia2dbb26454bd8f03489389b29b9a6c939d3c2bbb Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `c6e85ea56b`)	2023-01-05 02:16:50 +00:00
Minchan Kim	eebff8eab2	FROMLIST: mm: cma: introduce gfp flag in cma_alloc instead of no_warn The upcoming patch will introduce __GFP_NORETRY semantics in alloc_contig_range, which is a failfast mode of the API. Instead of adding an additional parameter for gfp, replace no_warn with a gfp flag. To keep the old behavior, it follows the rule below (no_warn -> gfp_flags): false -> GFP_KERNEL; true -> GFP_KERNEL|__GFP_NOWARN; gfp & __GFP_NOWARN -> GFP_KERNEL | (gfp & __GFP_NOWARN). Bug: 170340257 Bug: 120293424 Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m36b144ff81fe0a8f0ecaf6813de4819ecc41f8fe Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I1ce020ab5d5fff34eb6464be4632ddef72fb43eb Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit `23ba990a3e`)	2023-01-05 02:16:50 +00:00
Greg Kroah-Hartman	3d8ac88867	Merge 5.15.46 into android14-5.15 Changes in 5.15.46 binfmt_flat: do not stop relocating GOT entries prematurely on riscv parisc/stifb: Implement fb_is_primary_device() parisc/stifb: Keep track of hardware path of graphics card RISC-V: Mark IORESOURCE_EXCLUSIVE for reserved mem instead of IORESOURCE_BUSY riscv: Initialize thread pointer before calling C functions riscv: Fix irq_work when SMP is disabled riscv: Wire up memfd_secret in UAPI header riscv: Move alternative length validation into subsection ALSA: hda/realtek - Add new type for ALC245 ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9520 laptop ALSA: hda/realtek - Fix microphone noise on ASUS TUF B550M-PLUS ALSA: usb-audio: Cancel pending work at closing a MIDI substream USB: serial: pl2303: fix type detection for odd device USB: serial: option: add Quectel BG95 modem USB: new quirk for Dell Gen 2 devices usb: isp1760: Fix out-of-bounds array access usb: dwc3: gadget: Move null pinter check to proper place usb: core: hcd: Add support for deferring roothub registration fs/ntfs3: Update valid size if -EIOCBQUEUED fs/ntfs3: Fix fiemap + fix shrink file size (to remove preallocated space) fs/ntfs3: Keep preallocated only if option prealloc enabled fs/ntfs3: Check new size for limits fs/ntfs3: In function ntfs_set_acl_ex do not change inode->i_mode if called from function ntfs_init_acl fs/ntfs3: Fix some memory leaks in an error handling path of 'log_replay()' fs/ntfs3: Update i_ctime when xattr is added fs/ntfs3: Restore ntfs_xattr_get_acl and ntfs_xattr_set_acl functions cifs: fix potential double free during failed mount cifs: when extending a file with falloc we should make files not-sparse xhci: Allow host runtime PM as default for Intel Alder Lake N xHCI platform/x86: intel-hid: fix _DSM function index handling x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails perf/x86/intel: Fix event constraints for ICL x86/kexec: fix memory leak of elf header 
buffer x86/sgx: Set active memcg prior to shmem allocation ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP ptrace: Reimplement PTRACE_KILL by always sending SIGKILL btrfs: add "0x" prefix for unsupported optional features btrfs: return correct error number for __extent_writepage_io() btrfs: repair super block num_devices automatically btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage() iommu/vt-d: Add RPLS to quirk list to skip TE disabling drm/vmwgfx: validate the screen formats drm/virtio: fix NULL pointer dereference in virtio_gpu_conn_get_modes selftests/bpf: Fix vfs_link kprobe definition selftests/bpf: Fix parsing of prog types in UAPI hdr for bpftool sync mwifiex: add mutex lock for call in mwifiex_dfs_chan_sw_work_queue b43legacy: Fix assigning negative value to unsigned variable b43: Fix assigning negative value to unsigned variable ipw2x00: Fix potential NULL dereference in libipw_xmit() ipv6: fix locking issues with loops over idev->addr_list fbcon: Consistently protect deferred_takeover with console_lock() x86/platform/uv: Update TSC sync state for UV5 ACPICA: Avoid cache flush inside virtual machines mac80211: minstrel_ht: fix where rate stats are stored (fixes debugfs output) drm/komeda: return early if drm_universal_plane_init() fails. 
drm/amd/display: Disabling Z10 on DCN31 rcu-tasks: Fix race in schedule and flush work rcu: Make TASKS_RUDE_RCU select IRQ_WORK sfc: ef10: Fix assigning negative value to unsigned variable ALSA: jack: Access input_dev under mutex rtw88: 8821c: fix debugfs rssi value spi: spi-rspi: Remove setting {src,dst}_{addr,addr_width} based on DMA direction tools/power turbostat: fix ICX DRAM power numbers scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock scsi: lpfc: Fix call trace observed during I/O with CMF enabled cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode drm/amd/pm: fix double free in si_parse_power_table() ASoC: rsnd: care default case on rsnd_ssiu_busif_err_status_clear() ASoC: rsnd: care return value from rsnd_node_fixed_index() ath9k: fix QCA9561 PA bias level media: venus: hfi: avoid null dereference in deinit media: pci: cx23885: Fix the error handling in cx23885_initdev() media: cx25821: Fix the warning when removing the module md/bitmap: don't set sb values if can't pass sanity check mmc: jz4740: Apply DMA engine limits to maximum segment size drivers: mmc: sdhci_am654: Add the quirk to set TESTCD bit scsi: megaraid: Fix error check return value of register_chrdev() drm/amdgpu/sdma: Fix incorrect calculations of the wptr of the doorbells scsi: ufs: Use pm_runtime_resume_and_get() instead of pm_runtime_get_sync() scsi: lpfc: Fix resource leak in lpfc_sli4_send_seq_to_ulp() ath11k: disable spectral scan during spectral deinit ASoC: Intel: bytcr_rt5640: Add quirk for the HP Pro Tablet 408 drm/plane: Move range check for format_count earlier drm/amd/pm: fix the compile warning ath10k: skip ath10k_halt during suspend for driver state RESTARTING arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall drm: msm: fix error check return value of irq_of_parse_and_map() scsi: target: tcmu: Fix possible data corruption ipv6: Don't send rs packets to the 
interface of ARPHRD_TUNNEL net/mlx5: fs, delete the FTE when there are no rules attached to it ASoC: dapm: Don't fold register value changes into notifications mlxsw: spectrum_dcb: Do not warn about priority changes mlxsw: Treat LLDP packets as control drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init drm/amdgpu/ucode: Remove firmware load type check in amdgpu_ucode_free_bo regulator: mt6315: Enforce regulator-compatible, not name HID: bigben: fix slab-out-of-bounds Write in bigben_probe of: Support more than one crash kernel regions for kexec -s ASoC: tscs454: Add endianness flag in snd_soc_component_driver scsi: lpfc: Alter FPIN stat accounting logic net: remove two BUG() from skb_checksum_help() s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES perf/amd/ibs: Cascade pmu init functions' return value sched/core: Avoid obvious double update_rq_clock warning spi: stm32-qspi: Fix wait_cmd timeout in APM mode dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default ipmi:ssif: Check for NULL msg when handling events and messages ipmi: Fix pr_fmt to avoid compilation issues rtlwifi: Use pr_warn instead of WARN_ONCE mt76: mt7921: accept rx frames with non-standard VHT MCS10-11 mt76: fix encap offload ethernet type check media: rga: fix possible memory leak in rga_probe media: coda: limit frame interval enumeration to supported encoder frame sizes media: hantro: HEVC: unconditionnaly set pps_{cb/cr}_qp_offset values media: ccs-core.c: fix failure to call clk_disable_unprepare media: imon: reorganize serialization media: cec-adap.c: fix is_configuring state usbnet: Run unregister_netdev() before unbind() again openrisc: start CPU timer early in boot nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags ASoC: rt5645: Fix errorenous cleanup order nbd: Fix hung on disconnect request if socket is closed before drm/amd/pm: update smartshift powerboost 
calc for smu12 drm/amd/pm: update smartshift powerboost calc for smu13 net: phy: micrel: Allow probing without .driver_data media: exynos4-is: Fix compile warning media: hantro: Stop using H.264 parameter pic_num ASoC: max98357a: remove dependency on GPIOLIB ASoC: rt1015p: remove dependency on GPIOLIB ACPI: CPPC: Assume no transition latency if no PCCT nvme: set non-mdts limits in nvme_scan_work can: mcp251xfd: silence clang's -Wunaligned-access warning x86/microcode: Add explicit CPU vendor dependency net: ipa: ignore endianness if there is no header m68k: atari: Make Atari ROM port I/O write macros return void rxrpc: Return an error to sendmsg if call failed rxrpc, afs: Fix selection of abort codes afs: Adjust ACK interpretation to try and cope with NAT eth: tg3: silence the GCC 12 array-bounds warning char: tpm: cr50_i2c: Suppress duplicated error message in .remove() selftests/bpf: fix btf_dump/btf_dump due to recent clang change gfs2: use i_lock spin_lock for inode qadata scsi: target: tcmu: Avoid holding XArray lock when calling lock_page IB/rdmavt: add missing locks in rvt_ruc_loopback ARM: dts: ox820: align interrupt controller node name with dtschema ARM: dts: socfpga: align interrupt controller node name with dtschema ARM: dts: s5pv210: align DMA channels with dtschema arm64: dts: qcom: msm8994: Fix the cont_splash_mem address arm64: dts: qcom: msm8994: Fix BLSP[12]_DMA channels count PM / devfreq: rk3399_dmc: Disable edev on remove() crypto: ccree - use fine grained DMA mapping dir soc: ti: ti_sci_pm_domains: Check for null return of devm_kcalloc fs: jfs: fix possible NULL pointer dereference in dbFree() arm64: dts: qcom: sdm845-xiaomi-beryllium: fix typo in panel's vddio-supply property ALSA: usb-audio: Add quirk bits for enabling/disabling generic implicit fb ALSA: usb-audio: Move generic implicit fb quirk entries into quirks.c ARM: OMAP1: clock: Fix UART rate reporting algorithm powerpc/fadump: Fix fadump to work with a different endian capture kernel 
fat: add ratelimit to fat_ent_bread() pinctrl: renesas: rzn1: Fix possible null-ptr-deref in sh_pfc_map_resources() ARM: versatile: Add missing of_node_put in dcscb_init ARM: dts: exynos: add atmel,24c128 fallback to Samsung EEPROM ARM: hisi: Add missing of_node_put after of_find_compatible_node cpufreq: Avoid unnecessary frequency updates due to mismatch powerpc/rtas: Keep MSR[RI] set when calling RTAS PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store() KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting alpha: fix alloc_zeroed_user_highpage_movable() tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate powerpc/powernv/vas: Assign real address to rx_fifo in vas_rx_win_attr powerpc/xics: fix refcount leak in icp_opal_init() powerpc/powernv: fix missing of_node_put in uv_init() macintosh/via-pmu: Fix build failure when CONFIG_INPUT is disabled powerpc/iommu: Add missing of_node_put in iommu_init_early_dart smb3: check for null tcon RDMA/hfi1: Prevent panic when SDMA is disabled Input: gpio-keys - cancel delayed work only in case of GPIO drm: fix EDID struct for old ARM OABI format drm/bridge_connector: enable HPD by default if supported dt-bindings: display: sitronix, st7735r: Fix backlight in example drm/vmwgfx: Fix an invalid read ath11k: acquire ab->base_lock in unassign when finding the peer by addr drm: bridge: it66121: Fix the register page length ath9k: fix ar9003_get_eepmisc drm/edid: fix invalid EDID extension block filtering drm/bridge: adv7511: clean up CEC adapter when probe fails drm: bridge: icn6211: Fix register layout drm: bridge: icn6211: Fix HFP_HSW_HBP_HI and HFP_MIN handling mtd: spinand: gigadevice: fix Quad IO for GD5F1GQ5UExxG spi: qcom-qspi: Add minItems to interconnect-names ASoC: mediatek: Fix error handling in mt8173_max98090_dev_probe ASoC: mediatek: Fix missing of_node_put in mt2701_wm8960_machine_probe x86/delay: Fix the wrong asm constraint in delay_loop() drm/vc4: hvs: Fix frame count register 
readout drm/mediatek: Fix mtk_cec_mask() drm/vc4: hvs: Reset muxes at probe time drm/vc4: txp: Don't set TXP_VSTART_AT_EOF drm/vc4: txp: Force alpha to be 0xff if it's disabled libbpf: Don't error out on CO-RE relos for overriden weak subprogs x86/PCI: Fix ALi M1487 (IBC) PIRQ router link value interpretation mptcp: reset the packet scheduler on PRIO change nl80211: show SSID for P2P_GO interfaces drm/komeda: Fix an undefined behavior bug in komeda_plane_add() drm: mali-dp: potential dereference of null pointer spi: spi-ti-qspi: Fix return value handling of wait_for_completion_timeout scftorture: Fix distribution of short handler delays net: dsa: mt7530: 1G can also support 1000BASE-X link mode ixp4xx_eth: fix error check return value of platform_get_irq() NFC: NULL out the dev->rfkill to prevent UAF efi: Add missing prototype for efi_capsule_setup_info device property: Check fwnode->secondary when finding properties device property: Allow error pointer to be passed to fwnode APIs target: remove an incorrect unmap zeroes data deduction drbd: fix duplicate array initializer EDAC/dmc520: Don't print an error for each unconfigured interrupt line mtd: rawnand: denali: Use managed device resources HID: hid-led: fix maximum brightness for Dream Cheeky HID: elan: Fix potential double free in elan_input_configured drm/bridge: Fix error handling in analogix_dp_probe regulator: da9121: Fix uninit-value in da9121_assign_chip_model() drm/mediatek: dpi: Use mt8183 output formats for mt8192 signal: Deliver SIGTRAP on perf event asynchronously if blocked sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq sched/psi: report zeroes for CPU full at the system level spi: img-spfi: Fix pm_runtime_get_sync() error checking cpufreq: Fix possible race in cpufreq online error path printk: use atomic updates for klogd work printk: add missing memory barrier to wake_up_klogd() printk: wake waiters for safe and NMI contexts ath9k_htc: fix potential out of bounds access with invalid 
rxstatus->rs_keyix media: i2c: max9286: Use dev_err_probe() helper media: i2c: max9286: Use "maxim,gpio-poc" property media: i2c: max9286: fix kernel oops when removing module media: hantro: Empty encoder capture buffers by default drm/panel: simple: Add missing bus flags for Innolux G070Y2-L01 ALSA: pcm: Check for null pointer of pointer substream before dereferencing it mtdblock: warn if opened on NAND inotify: show inotify mask flags in proc fdinfo fsnotify: fix wrong lockdep annotations spi: rockchip: Stop spi slave dma receiver when cs inactive spi: rockchip: Preset cs-high and clk polarity in setup progress spi: rockchip: fix missing error on unsupported SPI_CS_HIGH of: overlay: do not break notify on NOTIFY_{OK\|STOP} selftests/damon: add damon to selftests root Makefile drm/msm/dp: Modify prototype of encoder based API drm/msm/hdmi: switch to drm_bridge_connector drm/msm/dpu: adjust display_v_end for eDP and DP scsi: iscsi: Fix harmless double shift bug scsi: ufs: qcom: Fix ufs_qcom_resume() scsi: ufs: core: Exclude UECxx from SFR dump list drm/v3d: Fix null pointer dereference of pointer perfmon selftests/resctrl: Fix null pointer dereference on open failed libbpf: Fix logic for finding matching program for CO-RE relocation mtd: spi-nor: core: Check written SR value in spi_nor_write_16bit_sr_and_check() x86/pm: Fix false positive kmemleak report in msr_build_context() mtd: rawnand: cadence: fix possible null-ptr-deref in cadence_nand_dt_probe() mtd: rawnand: intel: fix possible null-ptr-deref in ebu_nand_probe() x86/speculation: Add missing prototype for unpriv_ebpf_notify() ASoC: rk3328: fix disabling mclk on pclk probe failure perf tools: Add missing headers needed by util/data.h drm/msm/disp/dpu1: set vbif hw config to NULL to avoid use after memory free during pm runtime resume drm/msm/dp: stop event kernel thread when DP unbind drm/msm/dp: fix error check return value of irq_of_parse_and_map() drm/msm/dp: reset DP controller before transmit phy test 
pattern drm/msm/dp: do not stop transmitting phy test pattern during DP phy compliance test drm/msm/dsi: fix error checks and return values for DSI xmit functions drm/msm/hdmi: check return value after calling platform_get_resource_byname() drm/msm/hdmi: fix error check return value of irq_of_parse_and_map() drm/msm: add missing include to msm_drv.c drm/panel: panel-simple: Fix proper bpc for AM-1280800N3TZQW-T00H kunit: fix debugfs code to use enum kunit_status, not bool drm/rockchip: vop: fix possible null-ptr-deref in vop_bind() spi: cadence-quadspi: fix Direct Access Mode disable for SoCFPGA perf tools: Use Python devtools for version autodetection rather than runtime virtio_blk: fix the discard_granularity and discard_alignment queue limits nl80211: don't hold RTNL in color change request x86: Fix return value of __setup handlers irqchip/exiu: Fix acknowledgment of edge triggered interrupts irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value x86/mm: Cleanup the control_va_addr_alignment() __setup handler arm64: fix types in copy_highpage() regulator: core: Fix enable_count imbalance with EXCLUSIVE_GET drm/msm/dsi: fix address for second DSI PHY on SDM660 drm/msm/dp: fix event thread stuck in wait_event after kthread_stop() drm/msm/mdp5: Return error code in mdp5_pipe_release when deadlock is detected drm/msm/mdp5: Return error code in mdp5_mixer_release when deadlock is detected drm/msm: return an error pointer in msm_gem_prime_get_sg_table() media: uvcvideo: Fix missing check to determine if element is found in list arm64: stackleak: fix current_top_of_stack() iomap: iomap_write_failed fix spi: spi-fsl-qspi: check return value after calling platform_get_resource_byname() Revert "cpufreq: Fix possible race in cpufreq online error path" regulator: qcom_smd: Fix up PM8950 regulator configuration samples: bpf: Don't fail for a missing VMLINUX_BTF when VMLINUX_H is provided perf/amd/ibs: Use 
interrupt regs ip for stack unwinding ath11k: Don't check arvif->is_started before sending management frames wilc1000: fix crash observed in AP mode with cfg80211_register_netdevice() HID: amd_sfh: Modify the bus name HID: amd_sfh: Modify the hid name ASoC: fsl: Use dev_err_probe() helper ASoC: fsl: Fix refcount leak in imx_sgtl5000_probe ASoC: imx-hdmi: Fix refcount leak in imx_hdmi_probe ASoC: mxs-saif: Fix refcount leak in mxs_saif_probe regulator: pfuze100: Fix refcount leak in pfuze_parse_regulators_dt dma-direct: factor out a helper for DMA_ATTR_NO_KERNEL_MAPPING allocations dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages ASoC: samsung: Use dev_err_probe() helper ASoC: samsung: Fix refcount leak in aries_audio_probe block: Fix the bio.bi_opf comment kselftest/cgroup: fix test_stress.sh to use OUTPUT dir scripts/faddr2line: Fix overlapping text section failures media: aspeed: Fix an error handling path in aspeed_video_probe() media: exynos4-is: Fix PM disable depth imbalance in fimc_is_probe mt76: mt7921: Fix the error handling path of mt7921_pci_probe() mt76: do not attempt to reorder received 802.3 packets without agg session media: st-delta: Fix PM disable depth imbalance in delta_probe media: atmel: atmel-isc: Fix PM disable depth imbalance in atmel_isc_probe media: i2c: rdacm2x: properly set subdev entity function media: exynos4-is: Change clk_disable to clk_disable_unprepare media: pvrusb2: fix array-index-out-of-bounds in pvr2_i2c_core_init media: vsp1: Fix offset calculation for plane cropping media: atmel: atmel-sama5d2-isc: fix wrong mask in YUYV format check media: hantro: HEVC: Fix tile info buffer value computation Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeout Bluetooth: use hdev lock in activate_scan for hci_is_adv_monitoring Bluetooth: use hdev lock for accept_list and reject_list in conn req nvme: set dma alignment to dword m68k: math-emu: Fix dependencies of math emulation support sctp: read 
sk->sk_bound_dev_if once in sctp_rcv() net: hinic: add missing destroy_workqueue in hinic_pf_to_mgmt_init ASoC: ti: j721e-evm: Fix refcount leak in j721e_soc_probe_ kselftest/arm64: bti: force static linking media: ov7670: remove ov7670_power_off from ov7670_remove media: i2c: ov5648: fix wrong pointer passed to IS_ERR() and PTR_ERR() media: staging: media: rkvdec: Make use of the helper function devm_platform_ioremap_resource() media: rkvdec: h264: Fix dpb_valid implementation media: rkvdec: h264: Fix bit depth wrap in pps packet regulator: scmi: Fix refcount leak in scmi_regulator_probe ext4: reject the 'commit' option on ext2 filesystems drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init drm: msm: fix possible memory leak in mdp5_crtc_cursor_set() x86/sev: Annotate stack change in the #VC handler drm/msm: don't free the IRQ if it was not requested selftests/bpf: Add missed ima_setup.sh in Makefile drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path drm/i915: Fix CFI violation with show_dynamic_id() thermal/drivers/bcm2711: Don't clamp temperature at zero thermal/drivers/broadcom: Fix potential NULL dereference in sr_thermal_probe thermal/core: Fix memory leak in __thermal_cooling_device_register() thermal/drivers/imx_sc_thermal: Fix refcount leak in imx_sc_thermal_probe bfq: Relax waker detection for shared queues bfq: Allow current waker to defend against a tentative one ASoC: wm2000: fix missing clk_disable_unprepare() on error in wm2000_anc_transition() PM: domains: Fix initialization of genpd's next_wakeup net: macb: Fix PTP one step sync support NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx ASoC: max98090: Move check for invalid values before casting in max98090_put_enab_tlv() net: stmmac: selftests: Use kcalloc() instead of kzalloc() net: stmmac: fix out-of-bounds access in a selftest hv_netvsc: Fix potential dereference of NULL pointer hwmon: (pmbus) Check PEC support before reading other registers rxrpc: Fix listen() 
setting the bar too high for the prealloc rings rxrpc: Don't try to resend the request if we're receiving the reply rxrpc: Fix overlapping ACK accounting rxrpc: Don't let ack.previousPacket regress rxrpc: Fix decision on when to generate an IDLE ACK net: huawei: hinic: Use devm_kcalloc() instead of devm_kzalloc() hinic: Avoid some over memory allocation net: dsa: restrict SMSC_LAN9303_I2C kconfig net/smc: postpone sk_refcnt increment in connect() dma-direct: factor out dma_set_{de,en}crypted helpers dma-direct: don't call dma_set_decrypted for remapped allocations dma-direct: always leak memory that can't be re-encrypted dma-direct: don't over-decrypt memory arm64: dts: rockchip: Move drive-impedance-ohm to emmc phy on rk3399 arm64: dts: mt8192: Fix nor_flash status disable typo PCI/ACPI: Allow D3 only if Root Port can signal and wake from D3 memory: samsung: exynos5422-dmc: Avoid some over memory allocation ARM: dts: BCM5301X: update CRU block description ARM: dts: BCM5301X: Update pin controller node name ARM: dts: suniv: F1C100: fix watchdog compatible soc: qcom: smp2p: Fix missing of_node_put() in smp2p_parse_ipc soc: qcom: smsm: Fix missing of_node_put() in smsm_parse_ipc PCI: cadence: Fix find_first_zero_bit() limit PCI: rockchip: Fix find_first_zero_bit() limit PCI: mediatek: Fix refcount leak in mtk_pcie_subsys_powerup() PCI: dwc: Fix setting error return on MSI DMA mapping failure ARM: dts: ci4x10: Adapt to changes in imx6qdl.dtsi regarding fec clocks soc: qcom: llcc: Add MODULE_DEVICE_TABLE() KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault crypto: qat - set CIPHER capability for QAT GEN2 crypto: qat - set COMPRESSION capability for QAT GEN2 crypto: qat - set CIPHER capability for DH895XCC crypto: qat - set COMPRESSION capability for DH895XCC platform/chrome: cros_ec: fix error handling in cros_ec_register() ARM: dts: imx6dl-colibri: Fix I2C pinmuxing 
platform/chrome: Re-introduce cros_ec_cmd_xfer and use it for ioctls can: xilinx_can: mark bit timing constants as const ARM: dts: stm32: Fix PHY post-reset delay on Avenger96 ARM: dts: bcm2835-rpi-zero-w: Fix GPIO line name for Wifi/BT ARM: dts: bcm2837-rpi-cm3-io3: Fix GPIO line names for SMPS I2C ARM: dts: bcm2837-rpi-3-b-plus: Fix GPIO line name of power LED ARM: dts: bcm2835-rpi-b: Fix GPIO line names misc: ocxl: fix possible double free in ocxl_file_register_afu crypto: marvell/cesa - ECB does not IV gpiolib: of: Introduce hook for missing gpio-ranges pinctrl: bcm2835: implement hook for missing gpio-ranges arm: mediatek: select arch timer for mt7629 pinctrl/rockchip: support deferring other gpio params pinctrl: mediatek: mt8195: enable driver on mtk platforms arm64: dts: qcom: qrb5165-rb5: Fix can-clock node name Drivers: hv: vmbus: Fix handling of messages with transaction ID of zero powerpc/fadump: fix PT_LOAD segment for boot memory area mfd: ipaq-micro: Fix error check return value of platform_get_irq() scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac() soc: bcm: Check for NULL return of devm_kzalloc() arm64: dts: ti: k3-am64-mcu: remove incorrect UART base clock rates ASoC: sh: rz-ssi: Check return value of pm_runtime_resume_and_get() ASoC: sh: rz-ssi: Propagate error codes returned from platform_get_irq_byname() ASoC: sh: rz-ssi: Release the DMA channels in rz_ssi_probe() error path firmware: arm_scmi: Fix list protocols enumeration in the base protocol nvdimm: Fix firmware activation deadlock scenarios nvdimm: Allow overwrite in the presence of disabled dimms pinctrl: mvebu: Fix irq_of_parse_and_map() return value drivers/base/node.c: fix compaction sysfs file leak dax: fix cache flush on PMD-mapped pages drivers/base/memory: fix an unlikely reference counting issue in __add_memory_block() firmware: arm_ffa: Fix uuid parameter to ffa_partition_probe firmware: arm_ffa: Remove incorrect assignment of driver_data list: introduce 
list_is_head() helper and re-use it in list.h list: fix a data-race around ep->rdllist drm/msm/dpu: fix error check return value of irq_of_parse_and_map() powerpc/8xx: export 'cpm_setbrg' for modules pinctrl: renesas: r8a779a0: Fix GPIO function on I2C-capable pins pinctrl: renesas: core: Fix possible null-ptr-deref in sh_pfc_map_resources() powerpc/idle: Fix return value of __setup() handler powerpc/4xx/cpm: Fix return value of __setup() handler RDMA/hns: Add the detection for CMDQ status in the device initialization process arm64: dts: marvell: espressobin-ultra: fix SPI-NOR config arm64: dts: marvell: espressobin-ultra: enable front USB3 port ASoC: atmel-pdmic: Remove endianness flag on pdmic component ASoC: atmel-classd: Remove endianness flag on class d component proc: fix dentry/inode overinstantiating under /proc/${pid}/net ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() PCI: imx6: Fix PERST# start-up sequence tty: fix deadlock caused by calling printk() under tty_port->lock crypto: sun8i-ss - rework handling of IV crypto: sun8i-ss - handle zero sized sg crypto: cryptd - Protect per-CPU resource by disabling BH. 
ARM: dts: at91: sama7g5: remove interrupt-parent from gic node hugetlbfs: fix hugetlbfs_statfs() locking Input: sparcspkr - fix refcount leak in bbc_beep_probe PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits PCI: microchip: Fix potential race in interrupt handling hwrng: omap3-rom - fix using wrong clk_disable() in omap_rom_rng_runtime_resume() powerpc/64: Only WARN if __pa()/__va() called with bad addresses powerpc/perf: Fix the threshold compare group constraint for power10 powerpc/perf: Fix the threshold compare group constraint for power9 macintosh: via-pmu and via-cuda need RTC_LIB powerpc/xive: Add some error handling code to 'xive_spapr_init()' powerpc/xive: Fix refcount leak in xive_spapr_init powerpc/fsl_rio: Fix refcount leak in fsl_rio_setup mfd: davinci_voicecodec: Fix possible null-ptr-deref davinci_vc_probe() nfsd: destroy percpu stats counters after reply cache shutdown mailbox: forward the hrtimer if not queued and under a lock RDMA/hfi1: Prevent use of lock before it is initialized KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer Input: stmfts - do not leave device disabled in stmfts_input_open OPP: call of_node_put() on error path in _bandwidth_supported() f2fs: support fault injection for dquot_initialize() f2fs: fix to do sanity check on inline_dots inode f2fs: fix dereference of stale list iterator after loop body iommu/amd: Enable swiotlb in all cases iommu/mediatek: Fix 2 HW sharing pgtable issue iommu/mediatek: Add list_del in mtk_iommu_remove iommu/mediatek: Remove clk_disable in mtk_iommu_remove iommu/mediatek: Add mutex for m4u_group and m4u_dom in data i2c: at91: use dma safe buffers cpufreq: mediatek: Use module_init and add module_exit cpufreq: mediatek: Unregister platform device on exit iommu/arm-smmu-v3-sva: Fix mm use-after-free MIPS: Loongson: Use hwmon_device_register_with_groups() to register hwmon iommu/mediatek: Fix NULL pointer dereference when printing dev_name i2c: at91: Initialize dma_buf in 
at91_twi_xfer() dmaengine: idxd: Fix the error handling path in idxd_cdev_register() NFS: Do not report EINTR/ERESTARTSYS as mapping errors NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS NFS: Don't report ENOSPC write errors twice NFS: Do not report flush errors in nfs_write_end() NFS: Don't report errors from nfs_pageio_complete() more than once NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout NFS: Further fixes to the writeback error handling video: fbdev: clcdfb: Fix refcount leak in clcdfb_of_vram_setup dmaengine: stm32-mdma: remove GISR1 register dmaengine: stm32-mdma: fix chan initialization in stm32_mdma_irq_handler() iommu/amd: Increase timeout waiting for GA log enablement i2c: npcm: Fix timeout calculation i2c: npcm: Correct register access width i2c: npcm: Handle spurious interrupts i2c: rcar: fix PM ref counts in probe error paths perf build: Fix btf__load_from_kernel_by_id() feature check perf c2c: Use stdio interface if slang is not supported perf jevents: Fix event syntax error caused by ExtSel video: fbdev: vesafb: Fix a use-after-free due early fb_info cleanup NFS: Always initialise fattr->label in nfs_fattr_alloc() NFS: Create a new nfs_alloc_fattr_with_label() function NFS: Convert GFP_NOFS to GFP_KERNEL NFSv4.1 mark qualified async operations as MOVEABLE tasks f2fs: fix to avoid f2fs_bug_on() in dec_valid_node_count() f2fs: fix to do sanity check on block address in f2fs_do_zero_range() f2fs: fix to clear dirty inode in f2fs_evict_inode() f2fs: fix deadloop in foreground GC f2fs: don't need inode lock for system hidden quota f2fs: fix to do sanity check on total_data_blocks f2fs: don't use casefolded comparison for "." and ".." 
f2fs: fix fallocate to use file_modified to update permissions consistently f2fs: fix to do sanity check for inline inode objtool: Fix objtool regression on x32 systems objtool: Fix symbol creation wifi: mac80211: fix use-after-free in chanctx code iwlwifi: mvm: fix assert 1F04 upon reconfig fs-writeback: writeback_sb_inodes：Recalculate 'wrote' according skipped pages efi: Do not import certificates from UEFI Secure Boot for T2 Macs bfq: Avoid false marking of bic as stably merged bfq: Avoid merging queues with different parents bfq: Split shared queues on move between cgroups bfq: Update cgroup information before merging bio bfq: Drop pointless unlock-lock pair bfq: Remove pointless bfq_init_rq() calls bfq: Track whether bfq_group is still online bfq: Get rid of __bio_blkcg() usage bfq: Make sure bfqg for which we are queueing requests is online ext4: mark group as trimmed only if it was fully scanned ext4: fix use-after-free in ext4_rename_dir_prepare ext4: fix race condition between ext4_write and ext4_convert_inline_data ext4: fix warning in ext4_handle_inode_extension ext4: fix bug_on in ext4_writepages ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state ext4: fix bug_on in __es_tree_search ext4: verify dir block before splitting it ext4: avoid cycles in directory h-tree ACPI: property: Release subnode properties with data nodes tty: goldfish: Introduce gf_ioread32()/gf_iowrite32() tracing: Fix potential double free in create_var_ref() tracing: Initialize integer variable to prevent garbage return value drm/amdgpu: add beige goby PCI ID PCI/PM: Fix bridge_d3_blacklist[] Elo i2 overwrite of Gigabyte X299 PCI: qcom: Fix runtime PM imbalance on probe errors PCI: qcom: Fix unbalanced PHY init on probe errors staging: r8188eu: prevent ->Ssid overflow in rtw_wx_set_scan() mm, compaction: fast_find_migrateblock() should return pfn in the target zone s390/perf: obtain sie_block from the right address s390/stp: clock_delta should be signed dlm: fix 
plock invalid read dlm: uninitialized variable on error in dlm_listen_for_all() dlm: fix missing lkb refcount handling ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock scsi: dc395x: Fix a missing check on list iterator scsi: ufs: qcom: Add a readl() to make sure ref_clk gets enabled landlock: Add clang-format exceptions landlock: Format with clang-format selftests/landlock: Add clang-format exceptions selftests/landlock: Normalize array assignment selftests/landlock: Format with clang-format samples/landlock: Add clang-format exceptions samples/landlock: Format with clang-format landlock: Fix landlock_add_rule(2) documentation selftests/landlock: Make tests build with old libc selftests/landlock: Extend tests for minimal valid attribute size selftests/landlock: Add tests for unknown access rights selftests/landlock: Extend access right tests to directories selftests/landlock: Fully test file rename with "remove" access selftests/landlock: Add tests for O_PATH landlock: Change landlock_add_rule(2) argument check ordering landlock: Change landlock_restrict_self(2) check ordering selftests/landlock: Test landlock_create_ruleset(2) argument check ordering landlock: Define access_mask_t to enforce a consistent access mask size landlock: Reduce the maximum number of layers to 16 landlock: Create find_rule() from unmask_layers() landlock: Fix same-layer rule unions drm/amdgpu/cs: make commands with 0 chunks illegal behaviour. 
drm/nouveau/subdev/bus: Ratelimit logging for fault errors drm/etnaviv: check for reaped mapping in etnaviv_iommu_unmap_gem drm/nouveau/clk: Fix an incorrect NULL check on list iterator drm/nouveau/kms/nv50-: atom: fix an incorrect NULL check on list iterator drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX drm/i915/dsi: fix VBT send packet port selection for ICL+ md: fix an incorrect NULL check in does_sb_need_changing md: fix an incorrect NULL check in md_reload_sb mtd: cfi_cmdset_0002: Move and rename chip_check/chip_ready/chip_good_for_write mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N media: coda: Fix reported H264 profile media: coda: Add more H264 levels for CODA960 ima: remove the IMA_TEMPLATE Kconfig option Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug RDMA/hfi1: Fix potential integer multiplication overflow errors mmc: core: Allows to override the timeout value for ioctl() path csky: patch_text: Fixup last cpu should be master irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x irqchip: irq-xtensa-mx: fix initial IRQ affinity thermal: devfreq_cooling: use local ops instead of global ops cfg80211: declare MODULE_FIRMWARE for regulatory.db mac80211: upgrade passive scan to active scan on DFS channels after beacon rx um: Use asm-generic/dma-mapping.h um: chan_user: Fix winch_tramp() return value um: Fix out-of-bounds read in LDT setup kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] ftrace: Clean up hash direct_functions on register failures ksmbd: fix outstanding credits related bugs iommu/msm: Fix an incorrect NULL check on list iterator iommu/dma: Fix iova map result check bug Revert "mm/cma.c: remove redundant cma_mutex lock" mm/page_alloc: always attempt to allocate at least one page during bulk allocation nodemask.h: fix compilation error with GCC12 hugetlb: fix huge_pmd_unshare address update mm/memremap: fix missing call to 
untrack_pfn() in pagemap_range() xtensa/simdisk: fix proc_read_simdisk() rtl818x: Prevent using not initialized queues ASoC: rt5514: Fix event generation for "DSP Voice Wake Up" control carl9170: tx: fix an incorrect use of list iterator stm: ltdc: fix two incorrect NULL checks on list iterator bcache: improve multithreaded bch_btree_check() bcache: improve multithreaded bch_sectors_dirty_init() bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: avoid journal no-space deadlock by reserving 1 journal bucket serial: pch: don't overwrite xmit->buf[0] by x_char tilcdc: tilcdc_external: fix an incorrect NULL check on list iterator gma500: fix an incorrect NULL check on list iterator arm64: dts: qcom: ipq8074: fix the sleep clock frequency arm64: tegra: Add missing DFLL reset on Tegra210 clk: tegra: Add missing reset deassertion phy: qcom-qmp: fix struct clk leak on probe errors ARM: dts: s5pv210: Remove spi-cs-high on panel in Aries ARM: pxa: maybe fix gpio lookup tables SMB3: EBADF/EIO errors in rename/open caused by race condition in smb2_compound_op docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0 dt-bindings: gpio: altera: correct interrupt-cells vdpasim: allow to enable a vq repeatedly blk-iolatency: Fix inflight count imbalances and IO hangs on offline coresight: core: Fix coresight device probe failure issue phy: qcom-qmp: fix reset-controller leak on probe errors net: ipa: fix page free in ipa_endpoint_trans_release() net: ipa: fix page free in ipa_endpoint_replenish_one() kseltest/cgroup: Make test_stress.sh work if run interactively list: test: Add a test for list_is_head() Revert "random: use static branch for crng_ready()" staging: r8188eu: delete rtw_wx_read/write32() RDMA/hns: Remove the num_cqc_timer variable RDMA/rxe: Generate a completion for unsupported/invalid opcode MIPS: IP27: Remove incorrect `cpu_has_fpu' override MIPS: IP30: Remove incorrect `cpu_has_fpu' override ext4: only allow 
test_dummy_encryption when supported interconnect: qcom: sc7180: Drop IP0 interconnects interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate fs: add two trivial lookup helpers exportfs: support idmapped mounts fs/ntfs3: Fix invalid free in log_replay md: Don't set mddev private to NULL in raid0 pers->free md: fix double free of io_acct_set bioset md: bcache: check the return value of kzalloc() in detached_dev_do_request() pinctrl/rockchip: support setting input-enable param block: fix bio_clone_blkg_association() to associate with proper blkcg_gq Linux 5.15.46 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I7b65df29c22a01b81a94cd844867a18e73098a15	2022-07-13 11:40:42 +02:00
Dong Aisheng	c142bddf37	Revert "mm/cma.c: remove redundant cma_mutex lock" commit 60a60e32cf91169840abcb4a80f0b0df31708ba7 upstream. This reverts commit `a4efc174b3` which introduced a regression issue that when there're multiple processes allocating dma memory in parallel by calling dma_alloc_coherent(), it may fail sometimes as follows: Error log: cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16 cma: number of available pages: 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+ 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+ 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages When issue happened, we saw there were still 33161 pages (129M) free CMA memory and a lot available free slots for 148 pages in CMA bitmap that we want to allocate. When dumping memory info, we found that there was also ~342M normal memory, but only 1352K CMA memory left in buddy system while a lot of pageblocks were isolated. Memory info log: Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0 Normal: 784kB (UECI) 17728kB (UMECI) 133516kB (UMECI) 36032kB (UMECI) 6564kB (UMCI) 36128kB (UMECI) 16256kB (UMCI) 6512kB (EI) 81024kB (UEI) 42048kB (MI) 84096kB (EI) 88192kB (UI) 316384kB (EI) 832768kB (M) = 489288kB The root cause of this issue is that since commit `a4efc174b3` ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent memory allocation. It's possible that the memory range process A trying to alloc has already been isolated by the allocation of process B during memory migration. 
The problem here is that the memory range isolated during one allocation by start_isolate_page_range() could be much bigger than the real size we want to allocate, because the range is aligned to MAX_ORDER_NR_PAGES. Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES is big (e.g. 32M with max_order 14) and CMA memory is relatively small (e.g. 128M), there are only 4 MAX_ORDER slots, so all CMA memory may easily have been isolated by other processes when one tries to allocate memory using dma_alloc_coherent(). Since the current CMA code scans the whole available CMA memory only once, dma_alloc_coherent() may easily fail due to contention with other processes. This patch simply falls back to the original method of using cma_mutex to make alloc_contig_range() run sequentially to avoid the issue. Link: https://lkml.kernel.org/r/20220509094551.3596244-1-aisheng.dong@nxp.com Link: https://lore.kernel.org/all/20220315144521.3810298-2-aisheng.dong@nxp.com/ Fixes: `a4efc174b3` ("mm/cma.c: remove redundant cma_mutex lock") Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [5.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-09 10:23:27 +02:00
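The figures quoted in the revert message can be checked with a short sketch. It assumes 4 KiB pages and MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1), which is how kernels of this era defined it; the helper names `parse_free_ranges` and `max_order_slots` are hypothetical and exist only for this illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on the ARMv7 example


def parse_free_ranges(dump: str) -> list[tuple[int, int]]:
    """Parse 'count@offset+count@offset+...' into (count, offset) pairs."""
    return [(int(c), int(o)) for c, o in re.findall(r"(\d+)@(\d+)", dump)]


def max_order_slots(cma_bytes: int, max_order: int) -> int:
    """Number of MAX_ORDER-sized blocks that fit in a CMA area.

    Assumes MAX_ORDER_NR_PAGES == 1 << (max_order - 1).
    """
    block_bytes = (1 << (max_order - 1)) * PAGE_SIZE
    return cma_bytes // block_bytes


# Free ranges copied from the error log in the commit message.
DUMP = (
    "3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+"
    "108@40852+44@41108+20@41196+108@41364+108@41620+108@42900+108@43156+"
    "483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+"
    "2304@55040+35@58141+20@58220+20@58284+7188@58348+84@66220+"
    "7276@66452+227@74525+6371@75549"
)

ranges = parse_free_ranges(DUMP)
free_pages = sum(count for count, _ in ranges)
# Free runs large enough for the failed 148-page request.
big_enough = [r for r in ranges if r[0] >= 148]

print(free_pages)                      # total free pages in the bitmap
print(len(big_enough))                 # runs that could hold 148 pages
print(max_order_slots(128 << 20, 14))  # MAX_ORDER slots in a 128 MiB area
```

The bitmap therefore had plenty of room for the request; the failure came from whole MAX_ORDER blocks being isolated by concurrent allocations, which is what serializing with cma_mutex avoids.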
Lee Jones	85adc860fd	Merge `6efb943b86` Linux 5.13-rc1 into android-mainline One giant leap, all the way up to 5.13-rc1 Also take the opportunity to re-align (a.k.a. fix a couple of previous merge conflict fix-up issues) which occurred during this merge-window. Fixes: `4797acfb9c` ("Merge `16b3d0cf5b` Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into android-mainline") Fixes: `92f282f338` ("Merge `8ca5297e7e` Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild into android-mainline") Signed-off-by: Lee Jones <lee.jones@linaro.org> Change-Id: Ie9389f595776e8f66bba6eaf0fa7a3587c6a5749	2021-05-15 09:09:01 +01:00
Minchan Kim	78fa51503f	mm: use proper type for cma_[alloc|release] size_t in cma_alloc is confusing since it makes people think it's a byte count, not pages. Change it to unsigned long[1]. The unsigned int in cma_release is also not right so change it. Since we have unsigned long in cma_release, free_contig_range should also respect it. [1] `67a2e213e7`, mm: cma: fix incorrect type conversion for size during dma allocation Link: https://lore.kernel.org/linux-mm/20210324043434.GP1719932@casper.infradead.org/ Link: https://lkml.kernel.org/r/20210331164018.710560-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:24 -07:00
Minchan Kim	3aab8ae7aa	mm: cma: add the CMA instance name to cma trace events There were missing places to add cma instance name. To identify each CMA instance, let's add the name for every cma trace. This patch also changes the existing cma_trace_alloc to cma_trace_finish since we have cma_alloc_start[1]. [1] https://lore.kernel.org/linux-mm/20210324160740.15901-1-georgi.djakov@linaro.org Link: https://lkml.kernel.org/r/20210330220237.748899-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Liam Mark <lmark@codeaurora.org> Cc: Georgi Djakov <georgi.djakov@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:24 -07:00
Minchan Kim	43ca106fa8	mm: cma: support sysfs Since CMA is getting used more widely, it's more important to keep monitoring CMA statistics for system health since it's directly related to user experience. This patch introduces sysfs statistics for CMA, in order to provide some basic monitoring of the CMA allocator. * the number of CMA page successful allocations * the number of CMA page allocation failures These two values allow the user to calculate the allocation failure rate for each CMA area. e.g. /sys/kernel/mm/cma/WIFI/alloc_pages_[success|fail] /sys/kernel/mm/cma/SENSOR/alloc_pages_[success|fail] /sys/kernel/mm/cma/BLUETOOTH/alloc_pages_[success|fail] The cma_stat was intentionally allocated by dynamic allocation to harmonize with kobject lifetime management. https://lore.kernel.org/linux-mm/YCOAmXqt6dZkCQYs@kroah.com/ Link: https://lkml.kernel.org/r/20210324230759.2213957-1-minchan@kernel.org Link: https://lore.kernel.org/linux-mm/20210316100433.17665-1-colin.king@canonical.com/ Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: John Dias <joaodias@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:24 -07:00
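As a rough illustration of how the two sysfs counters described above could be combined, the sketch below computes a per-area failure rate. The file names `alloc_pages_success` and `alloc_pages_fail` and the `/sys/kernel/mm/cma` root come from the commit message; the function names are hypothetical, and the code assumes each file holds a single decimal integer.

```python
from pathlib import Path


def cma_failure_rate(area_dir: Path) -> float:
    """Fraction of page allocations that failed for one CMA area."""
    success = int((area_dir / "alloc_pages_success").read_text())
    fail = int((area_dir / "alloc_pages_fail").read_text())
    total = success + fail
    # An area that has never been used reports a rate of 0.0.
    return fail / total if total else 0.0


def all_failure_rates(root: Path = Path("/sys/kernel/mm/cma")) -> dict[str, float]:
    """Map each CMA area name under `root` to its failure rate."""
    if not root.is_dir():
        # Kernel built without CMA sysfs support, or not Linux at all.
        return {}
    return {
        area.name: cma_failure_rate(area)
        for area in sorted(root.iterdir())
        if area.is_dir()
    }
```

Calling `all_failure_rates()` on a kernel with this patch would return one entry per CMA area, such as the WIFI, SENSOR and BLUETOOTH areas named in the example paths.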
Liam Mark	7bc1aec5e2	mm: cma: add trace events for CMA alloc perf testing Add cma and migrate trace events to enable CMA allocation performance to be measured via ftrace. [georgi.djakov@linaro.org: add the CMA instance name to the cma_alloc_start trace event] Link: https://lkml.kernel.org/r/20210326155414.25006-1-georgi.djakov@linaro.org Link: https://lkml.kernel.org/r/20210324160740.15901-1-georgi.djakov@linaro.org Signed-off-by: Liam Mark <lmark@codeaurora.org> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:24 -07:00
Baolin Wang	63f83b31f4	mm: cma: use pr_err_ratelimited for CMA warning If no extra CMA memory is reserved, the log buffer can easily be filled by CMA failure warnings when devices call dmam_alloc_coherent() to allocate DMA memory. Use pr_err_ratelimited() instead to reduce the duplicate CMA warnings. Link: https://lkml.kernel.org/r/ce2251ef49e1727a9a40531d1996660b05462bd2.1615279825.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:24 -07:00
Minchan Kim	bbb269206f	mm: vmstat: add cma statistics Since CMA is used more widely, it's worth having CMA allocation statistics in vmstat. With them, we can tell how aggressively the system uses CMA allocation and how often it fails. Link: https://lkml.kernel.org/r/20210302183346.3707237-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: John Dias <joaodias@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:24 -07:00
Mike Kravetz	0ef7dcac99	mm/cma: change cma mutex to irq safe spinlock Patch series "make hugetlb put_page safe for all calling contexts", v5. This effort is the result of a recent bug report [1]. Syzbot found a potential deadlock in the hugetlb put_page/free_huge_page_path. WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected Since the free_huge_page_path already has code to 'hand off' page free requests to a workqueue, a suggestion was proposed to make the in_irq() detection accurate by always enabling PREEMPT_COUNT [2]. The outcome of that discussion was that the hugetlb put_page path (free_huge_page) should be properly fixed and safe for all calling contexts. [1] https://lore.kernel.org/linux-mm/000000000000f1c03b05bc43aadc@google.com/ [2] http://lkml.kernel.org/r/20210311021321.127500-1-mike.kravetz@oracle.com This patch (of 8): cma_release is currently a sleepable operation because the bitmap manipulation is protected by cma->lock mutex. Hugetlb code which relies on cma_release for CMA backed (giga) hugetlb pages, however, needs to be irq safe. The lock doesn't protect any sleepable operation so it can be changed to a (irq aware) spin lock. The bitmap processing should be quite fast in the typical case, but if cma sizes grow to TB then we will likely need to replace the lock by a more optimized bitmap implementation. 
Link: https://lkml.kernel.org/r/20210409205254.242291-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20210409205254.242291-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <songmuchun@bytedance.com> Cc: David Rientjes <rientjes@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Waiman Long <longman@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:21 -07:00
Greg Kroah-Hartman	14f1854cdb	Merge `fecfd01539` ("Merge tag 'leds-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds") into android-mainline Steps on the way to 5.12-rc1 Resolves merge conflicts in: drivers/dma-buf/dma-heap.c drivers/dma-buf/heaps/cma_heap.c drivers/dma-buf/heaps/system_heap.c include/linux/dma-heap.h Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ibb32dbdba5183c9e19f5d1e94016cc1ae9616173	2021-03-07 08:45:40 +01:00
Patrick Daly	a052d4d13d	mm: cma: print region name on failure Print the name of the CMA region for convenience. This is useful information to have when cma_alloc() fails. [pdaly@codeaurora.org: print the "count" variable] Link: https://lkml.kernel.org/r/20210209142414.12768-1-georgi.djakov@linaro.org Link: https://lkml.kernel.org/r/20210208115200.20286-1-georgi.djakov@linaro.org Signed-off-by: Patrick Daly <pdaly@codeaurora.org> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-26 09:41:00 -08:00
David Hildenbrand	072355c1cf	mm/cma: expose all pages to the buddy if activation of an area fails Right now, if activation fails, we might already have exposed some pages to the buddy for CMA use (although they will never get actually used by CMA), and some pages won't be exposed to the buddy at all. Let's check for "single zone" early and on error, don't expose any pages for CMA use - instead, expose them to the buddy available for any use. Simply call free_reserved_page() on every single page - easier than going via free_reserved_area(), converting back and forth between pfns and virt addresses. In addition, make sure to fixup totalcma_pages properly. Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...": [ 0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.006893] cma: Reserved 2048 MiB at 0x0000000100000000 [ 0.006893] hugetlb_cma: reserved 2048 MiB on node 0 ... [ 0.175433] cma: CMA area hugetlb0 could not be activated Before this patch: # cat /proc/meminfo MemTotal: 5867348 kB MemFree: 5692808 kB MemAvailable: 5542516 kB ... CmaTotal: 2097152 kB CmaFree: 1884160 kB After this patch: # cat /proc/meminfo MemTotal: 6077308 kB MemFree: 5904208 kB MemAvailable: 5747968 kB ... CmaTotal: 0 kB CmaFree: 0 kB Note: cma_init_reserved_mem() makes sure that we always cover full pageblocks / MAX_ORDER - 1 pages. Link: https://lkml.kernel.org/r/20210127101813.6370-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-26 09:41:00 -08:00
Roman Gushchin	df2ff39e78	mm: cma: allocate cma areas bottom-up Currently cma areas without a fixed base are allocated close to the end of the node. This placement is sub-optimal because of compaction: it brings pages into the cma area. In particular, it can bring in hot executable pages, even if there is a plenty of free memory on the machine. This results in cma allocation failures. Instead let's place cma areas close to the beginning of a node. In this case the compaction will help to free cma areas, resulting in better cma allocation success rates. If there is enough memory let's try to allocate bottom-up starting with 4GB to exclude any possible interference with DMA32. On smaller machines or in a case of a failure, stick with the old behavior. 16GB vm, 2GB cma area: With this patch: [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G [ 0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.002930] cma: Reserved 2048 MiB at 0x0000000100000000 [ 0.002931] hugetlb_cma: reserved 2048 MiB on node 0 Without this patch: [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G [ 0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000 [ 0.002934] hugetlb_cma: reserved 2048 MiB on node 0 v2: - switched to memblock_set_bottom_up(true), by Mike - start with 4GB, by Mike [guro@fb.com: whitespace fix, per Mike] Link: https://lkml.kernel.org/r/20201221170551.GB3428478@carbon.DHCP.thefacebook.com [guro@fb.com: fix 32-bit warnings] Link: https://lkml.kernel.org/r/20201223163537.GA4011967@carbon.DHCP.thefacebook.com [guro@fb.com: fix 32-bit systems] [akpm@linux-foundation.org: build fix] Link: https://lkml.kernel.org/r/20201217201214.3414100-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Mike 
Rapoport <rppt@linux.ibm.com> Cc: Wonhyuk Yang <vvghjk1234@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-02-26 09:41:00 -08:00
Greg Kroah-Hartman	8c3b398d8c	Merge `ac73e3dc8a` ("Merge branch 'akpm' (patches from Andrew)") into android-mainline Steps on the way to 5.11-rc1 Change-Id: I23957617a1e123aa05d3c1d48ea24e6acd131bdd Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2020-12-17 07:57:30 +01:00
Charan Teja Reddy	b8ca396f98	mm: cma: improve pr_debug log in cma_release() It is required to print 'count' of pages, along with the pages, passed to cma_release to debug the cases of mismatched count value passed between cma_alloc() and cma_release() from a code path. As an example, consider the below scenario: 1) CMA pool size is 4MB and 2) User doing the erroneous step of allocating 2 pages but freeing 1 page in a loop from this CMA pool. Step 2 causes cma_alloc() to return NULL at some point because of an -ENOMEM condition. The current pr_debug logs do not give information about these types of allocation patterns because the count value is not printed in cma_release(). The count value is already printed in the trace logs; extend the same to the pr_debug logs. [akpm@linux-foundation.org: fix printk warning] Link: https://lkml.kernel.org/r/1606318341-29521-1-git-send-email-charante@codeaurora.org Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-12-15 12:13:46 -08:00
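The mismatched-count scenario in the commit message can be reproduced with a toy first-fit bitmap. This is only a sketch: it assumes 4 KiB pages, one bitmap bit per page and no alignment constraint, which is simpler than the real CMA bitmap, and both function names are hypothetical.

```python
def find_zero_area(bitmap: list[bool], count: int) -> int:
    """Return the first index of `count` consecutive free pages, or -1."""
    run = 0
    for i, used in enumerate(bitmap):
        run = 0 if used else run + 1
        if run == count:
            return i - count + 1
    return -1


def leaky_loop(pool_pages: int, alloc_count: int, free_count: int) -> int:
    """Allocate `alloc_count` pages, free only `free_count`, and repeat.

    Returns how many allocations succeeded before the pool ran dry.
    Only meaningful for free_count < alloc_count (otherwise nothing leaks).
    """
    bitmap = [False] * pool_pages
    successes = 0
    while True:
        start = find_zero_area(bitmap, alloc_count)
        if start < 0:
            return successes  # the point where cma_alloc() would return NULL
        for i in range(start, start + alloc_count):
            bitmap[i] = True
        # The buggy caller releases fewer pages than it allocated.
        for i in range(start, start + free_count):
            bitmap[i] = False
        successes += 1


# 4 MB pool of 4 KiB pages, allocate 2 pages, free 1, as in the example.
print(leaky_loop(pool_pages=4 * 1024 * 1024 // 4096, alloc_count=2, free_count=1))
```

Every iteration strands one page next to a single free page that is too small for the next request, so the pool is exhausted long before a log of allocations alone would suggest; printing `count` in cma_release() makes the mismatch visible.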
Lecopzer Chen	a4efc174b3	mm/cma.c: remove redundant cma_mutex lock The cma_mutex which protects alloc_contig_range() first appeared in commit `7ee793a62f` ("cma: Remove potential deadlock situation"); at that time, there was no guarantee about the behavior of concurrency inside alloc_contig_range(). After commit `2c7452a075` ("mm/page_isolation.c: make start_isolate_page_range() fail if already isolated") > However, two subsystems (CMA and gigantic > huge pages for example) could attempt operations on the same range. If > this happens, one thread may 'undo' the work another thread is doing. > This can result in pageblocks being incorrectly left marked as > MIGRATE_ISOLATE and therefore not available for page allocation. The concurrency inside alloc_contig_range() was clarified. Now we can find that hugepage and virtio call alloc_contig_range() without any lock, thus cma_mutex is "redundant" in cma_alloc() now. Link: https://lkml.kernel.org/r/20201020102241.3729-1-lecopzer.chen@mediatek.com Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: YJ Chiang <yj.chiang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-12-15 12:13:46 -08:00
Chris Goldsworthy	73eda8ec74	ANDROID: mm: cma: retry allocations in cma_alloc CMA allocations will fail if 'pinned' pages are in a CMA area, since we cannot migrate pinned pages. The _refcount of a struct page being greater than _mapcount for that page can cause pinning for anonymous pages. This is because try_to_unmap(), which (1) is called in the CMA allocation path, and (2) decrements both _refcount and _mapcount for a page, will stop unmapping a page from VMAs once the _mapcount for a page reaches 0. This implies that after try_to_unmap() has finished successfully for a page where _refcount > _mapcount, that _refcount will be greater than 0. Later in the CMA allocation path in migrate_page_move_mapping(), we will have one more reference count than intended for anonymous pages, meaning the allocation will fail for that page. One example of where _refcount can be greater than _mapcount for a page we would not expect to be pinned is inside of copy_one_pte(), which is called during a fork. For ptes for which pte_present(pte) == true, copy_one_pte() will increment the _refcount field followed by the _mapcount field of a page. If the process doing copy_one_pte() is context switched out after incrementing _refcount but before incrementing _mapcount, then the page will be temporarily pinned. So, inside of cma_alloc(), instead of giving up when alloc_contig_range() returns -EBUSY after having scanned a whole CMA-region bitmap, perform retries with sleeps to give the system an opportunity to unpin any pinned pages. Additionally, based off feedback by Minchan Kim, add the ability to exit early if a fatal signal is pending (this is a delta from the mailing-list version of this patch). 
Bug: 168521646 Link: https://lore.kernel.org/lkml/1596682582-29139-2-git-send-email-cgoldswo@codeaurora.org/ Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Co-developed-by: Susheel Khiani <skhiani@codeaurora.org> Signed-off-by: Susheel Khiani <skhiani@codeaurora.org> Co-developed-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Change-Id: I2f0c8388f9163e0decd631d9ae07bb6ad9ab79c8	2020-10-01 11:54:10 -07:00
Greg Kroah-Hartman	418b4bd4a0	Merge `dc06fe51d2` ("Merge tag 'rtc-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux") into android-mainline Steps on the way to 5.9-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iceded779988ff472863b7e1c54e22a9fa6383a30	2020-08-13 09:09:55 +02:00
Mike Kravetz	3a5139f1c5	cma: don't quit at first error when activating reserved areas The routine cma_init_reserved_areas is designed to activate all reserved cma areas. It quits when it first encounters an error. This can leave some areas in a state where they are reserved but not activated. There is no feedback to code which performed the reservation. Attempting to allocate memory from areas in such a state will result in a BUG. Modify cma_init_reserved_areas to always attempt to activate all areas. The called routine, cma_activate_area is responsible for leaving the area in a valid state. No one is making active use of returned error codes, so change the routine to void. How to reproduce: This example uses kernelcore, hugetlb and cma as an easy way to reproduce. However, this is a more general cma issue. Two node x86 VM 16GB total, 8GB per node Kernel command line parameters, kernelcore=4G hugetlb_cma=8G Related boot time messages, hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node cma: Reserved 4096 MiB at 0x0000000100000000 hugetlb_cma: reserved 4096 MiB on node 0 cma: Reserved 4096 MiB at 0x0000000300000000 hugetlb_cma: reserved 4096 MiB on node 1 cma: CMA area hugetlb could not be activated # echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... Call Trace: bitmap_find_next_zero_area_off+0x51/0x90 cma_alloc+0x1a5/0x310 alloc_fresh_huge_page+0x78/0x1a0 alloc_pool_huge_page+0x6f/0xf0 set_max_huge_pages+0x10c/0x250 nr_hugepages_store_common+0x92/0x120 ? 
__kmalloc+0x171/0x270 kernfs_fop_write+0xc1/0x1a0 vfs_write+0xc7/0x1f0 ksys_write+0x5f/0xe0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `c64be2bb1c` ("drivers: add Contiguous Memory Allocator") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Barry Song <song.bao.hua@hisilicon.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200730163123.6451-1-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:57 -07:00
Barry Song	18e98e56f4	mm: cma: fix the name of CMA areas Patch series "mm: fix the names of general cma and hugetlb cma", v2. The current code of CMA can only work when users pass a const string as name parameter. we need to fix the way to handle names in CMA. On the other hand, to avoid name conflicts after enabling CMA_DEBUGFS, each hugetlb should get a different CMA name. This patch (of 2): If users give a name saved in stack, the current code will generate magic pointer. if users don't give a name(NULL), kasprintf() will always return NULL as we are at the early stage. that means cma_init_reserved_mem() will return -ENOMEM if users set name parameter as NULL. [natechancellor@gmail.com: return cma->name directly in cma_get_name] Link: https://github.com/ClangBuiltLinux/linux/issues/1063 Link: http://lkml.kernel.org/r/20200623015840.621964-1-natechancellor@gmail.com Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Roman Gushchin <guro@fb.com> Link: http://lkml.kernel.org/r/20200616223131.33828-2-song.bao.hua@hisilicon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:57 -07:00
Jianqun Xu	835832ba01	mm/cma.c: fix NULL pointer dereference when cma could not be activated In some cases the cma area cannot be activated, but cma_alloc may still be called on it, and the kernel then crashes with a NULL pointer dereference. Add a bitmap validity check in cma_alloc to avoid this issue. Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200615010123.15596-1-jay.xu@rock-chips.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:57 -07:00
Greg Kroah-Hartman	ca9814bc63	Merge 5.8-rc4 into android-mainline Linux 5.8-rc4 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iccdf79fdb94208b33796eca02bb813482e646ab1	2020-07-06 09:05:59 +02:00
Barry Song	40366bd70b	mm/cma.c: use exact_nid true to fix possible per-numa cma leak Calling cma_declare_contiguous_nid() with false exact_nid for per-numa reservation can easily cause cma leak and various confusion. For example, mm/hugetlb.c is trying to reserve per-numa cma for gigantic pages. But it can easily leak cma and make users confused when system has memoryless nodes. In case the system has 4 numa nodes, and only numa node0 has memory. if we set hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for 4 different numa nodes. since exact_nid=false in current code, all 4 numa nodes will get cma successfully from node0, but hugetlb_cma[1 to 3] will never be available to hugepage will only allocate memory from hugetlb_cma[0]. In case the system has 4 numa nodes, both numa node0&2 has memory, other nodes have no memory. if we set hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for 4 different numa nodes. since exact_nid=false in current code, all 4 numa nodes will get cma successfully from node0 or 2, but hugetlb_cma[1] and [3] will never be available to hugepage as mm/hugetlb.c will only allocate memory from hugetlb_cma[0] and hugetlb_cma[2]. This causes permanent leak of the cma areas which are supposed to be used by memoryless node. Of cource we can workaround the issue by letting mm/hugetlb.c scan all cma areas in alloc_gigantic_page() even node_mask includes node0 only. that means when node_mask includes node0 only, we can get page from hugetlb_cma[1] to hugetlb_cma[3]. But this will cause kernel crash in free_gigantic_page() while it wants to free page by: cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order) On the other hand, exact_nid=false won't consider numa distance, it might be not that useful to leverage cma areas on remote nodes. I feel it is much simpler to make exact_nid true to make everything clear. After that, memoryless nodes won't be able to reserve per-numa CMA from other nodes which have memory. 
Fixes: `cf11e85fc0` ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Aslan Bakirov <aslan@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andreas Schaufler <andreas.schaufler@gmx.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Rik van Riel <riel@surriel.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-07-03 16:15:25 -07:00
Greg Kroah-Hartman	5631819eaf	Merge `5b8b9d0c6d` ("Merge branch 'akpm' (patches from Andrew)") into android-mainline Steps along the way to the 5.7-rc1 merge. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iaf237a174205979344cfa76274198e87e2ba7799	2020-04-11 12:04:04 +02:00
Aslan Bakirov	8676af1ff2	mm: cma: NUMA node interface I've noticed that there is no interface exposed by CMA which would let me declare contiguous memory on a particular NUMA node. This patchset adds the ability to try to allocate contiguous memory on a specific node. It will fall back to other nodes if the specified one doesn't work. Implement a new method for declaring contiguous memory on a particular node and keep cma_declare_contiguous() as a wrapper. [akpm@linux-foundation.org: build fix] Signed-off-by: Aslan Bakirov <aslan@fb.com> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@kernel.org> Cc: Andreas Schaufler <andreas.schaufler@gmx.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Rik van Riel <riel@surriel.com> Cc: Joonsoo Kim <js1304@gmail.com> Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-10 15:36:21 -07:00
Greg Kroah-Hartman	d3a196a371	Merge 5.5-rc1 into android-mainline Linux 5.5-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237	2019-12-09 12:12:00 +01:00
Yunfeng Ye	2184f9928a	mm/cma.c: switch to bitmap_zalloc() for cma bitmap allocation kzalloc() is used for cma bitmap allocation in cma_activate_area(), switch to bitmap_zalloc() for clarity. Link: http://lkml.kernel.org/r/895d4627-f115-c77a-d454-c0a196116426@huawei.com Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Yue Hu <huyue2@yulong.com> Cc: Peng Fan <peng.fan@nxp.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ryohei Suzuki <ryh.szk.cmnty@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-12-01 12:59:09 -08:00
Sandeep Patil	716306e82c	ANDROID: GKI: export cma symbols for cma heap as a module Bug: 140294230 Test: builds Change-Id: I04c12174934c24a704d5c1e5be3e7e948c777a78 Signed-off-by: Sandeep Patil <sspatil@google.com>	2019-09-24 12:03:45 -07:00
Doug Berger	c633324e31	mm/cma.c: fail if fixed declaration can't be honored The description of cma_declare_contiguous() indicates that if the 'fixed' argument is true the reserved contiguous area must be exactly at the address of the 'base' argument. However, the function currently allows the 'base', 'size', and 'limit' arguments to be silently adjusted to meet alignment constraints. This commit enforces the documented behavior through explicit checks that return an error if the region does not fit within a specified region. Link: http://lkml.kernel.org/r/1561422051-16142-1-git-send-email-opendmb@gmail.com Fixes: `5ea3b1b2f8` ("cma: add placement specifier for "cma=" kernel parameter") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Yue Hu <huyue2@yulong.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Peng Fan <peng.fan@nxp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-16 19:23:21 -07:00
Ryohei Suzuki	929f92f780	mm/cma.c: fix a typo ("alloc_cma" -> "cma_alloc") in cma_release() comments A comment referred to a non-existent function alloc_cma(), which should have been cma_alloc(). Link: http://lkml.kernel.org/r/20190712085549.5920-1-ryh.szk.cmnty@gmail.com Signed-off-by: Ryohei Suzuki <ryh.szk.cmnty@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-16 19:23:21 -07:00
Thomas Gleixner	8607a96520	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your optional any later version of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520075212.713472955@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-24 17:37:54 +02:00
Yue Hu	1df3a33907	mm/cma.c: fix crash on CMA allocation if bitmap allocation fails `f022d8cb7e` ("mm: cma: Don't crash on allocation if CMA area can't be activated") fixes the crash issue when activation fails via setting cma->count as 0, same logic exists if bitmap allocation fails. Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:47 -07:00
Yue Hu	2b59e01a3a	mm/cma.c: fix the bitmap status to show failed allocation reason Currently one bit in cma bitmap represents number of pages rather than one page, cma->count means cma size in pages. So to find available pages via find_next_zero_bit()/find_next_bit() we should use cma size not in pages but in bits although current free pages number is correct due to zero value of order_per_bit. Once order_per_bit is changed the bitmap status will be incorrect. The size input in cma_debug_show_areas() is not correct. It will affect the available pages at some position to debug the failure issue. This is an example with order_per_bit = 1 Before this change: [ 4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages After this change: [ 4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages Obviously the bitmap status before is incorrect. Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:46 -07:00
Mike Rapoport	8a770c2a83	memblock: emphasize that memblock_alloc_range() returns a physical address Rename memblock_alloc_range() to memblock_phys_alloc_range() to emphasize that it returns a physical address. While on it, remove the 'enum memblock_flags' parameter from this function as its only user anyway sets it to MEMBLOCK_NONE, which is the default for the most of memblock allocations. Link: http://lkml.kernel.org/r/1548057848-15136-6-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-03-12 10:04:01 -07:00
Peng Fan	0d3bd18a5e	mm/cma.c: cma_declare_contiguous: correct err handling In case cma_init_reserved_mem failed, need to free the memblock allocated by memblock_reserve or memblock_alloc_range. Quote Catalin's comments: https://lkml.org/lkml/2019/2/26/482 Kmemleak is supposed to work with the memblock_{alloc,free} pair and it ignores the memblock_reserve() as a memblock_alloc() implementation detail. It is, however, tolerant to memblock_free() being called on a sub-range or just a different range from a previous memblock_alloc(). So the original patch looks fine to me. FWIW: Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-03-05 21:07:21 -08:00
Andrey Konovalov	2813b9c029	kasan, mm, arm64: tag non slab memory allocated via pagealloc Tag-based KASAN doesn't check memory accesses through pointers tagged with 0xff. When page_address is used to get pointer to memory that corresponds to some page, the tag of the resulting pointer gets set to 0xff, even though the allocated memory might have been tagged differently. For slab pages it's impossible to recover the correct tag to return from page_address, since the page might contain multiple slab objects tagged with different values, and we can't know in advance which one of them is going to get accessed. For non slab pages however, we can recover the tag in page_address, since the whole page was marked with the same tag. This patch adds tagging to non slab memory allocated with pagealloc. To set the tag of the pointer returned from page_address, the tag gets stored to page->flags when the memory gets allocated. Link: http://lkml.kernel.org/r/d758ddcef46a5abc9970182b9137e2fbee202a2c.1544099024.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-12-28 12:11:44 -08:00
Marek Szyprowski	6518202970	mm/cma: remove unsupported gfp_mask parameter from cma_alloc() cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter. This will help to avoid giving false feeling that this function supports standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer, what has already been an issue: see commit `dd65a941f6` ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag"). Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Michał Nazarewicz <mina86@mina86.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:32 -07:00
Joonsoo Kim	d883c6cf3b	Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" This reverts the following commits that change CMA design in MM. `3d2054ad8c` ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") `1d47a3ec09` ("mm/cma: remove ALLOC_CMA") `bad8c6c0b1` ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported a following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to whole node span for future CMA initialization, and, normal memory is wrongly freed here. I submitted the fix and it seems to work, but, another problem happened. It's so late time to fix the later problem so I decide to reverting the series. 
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-05-24 10:07:50 -07:00
Joonsoo Kim	bad8c6c0b1	mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE Patch series "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE", v2. 0. History This patchset is the follow-up of the discussion about the "Introduce ZONE_CMA (v7)" [1]. Please reference it if more information is needed. 1. What does this patch do? This patch changes the management way for the memory of the CMA area in the MM subsystem. Currently the memory of the CMA area is managed by the zone where their pfn is belong to. However, this approach has some problems since MM subsystem doesn't have enough logic to handle the situation that different characteristic memories are in a single zone. To solve this issue, this patch try to manage all the memory of the CMA area by using the MOVABLE zone. In MM subsystem's point of view, characteristic of the memory on the MOVABLE zone and the memory of the CMA area are the same. So, managing the memory of the CMA area by using the MOVABLE zone will not have any problem. 2. Motivation There are some problems with current approach. See following. Although these problem would not be inherent and it could be fixed without this conception change, it requires many hooks addition in various code path and it would be intrusive to core MM and would be really error-prone. Therefore, I try to solve them with this new approach. Anyway, following is the problems of the current implementation. o CMA memory utilization First, following is the freepage calculation logic in MM. - For movable allocation: freepage = total freepage - For unmovable allocation: freepage = total freepage - CMA freepage Freepages on the CMA area is used after the normal freepages in the zone where the memory of the CMA area is belong to are exhausted. 
At that moment that the number of the normal freepages is zero, so - For movable allocation: freepage = total freepage = CMA freepage - For unmovable allocation: freepage = 0 If unmovable allocation comes at this moment, allocation request would fail to pass the watermark check and reclaim is started. After reclaim, there would exist the normal freepages so freepages on the CMA areas would not be used. FYI, there is another attempt [2] trying to solve this problem in lkml. And, as far as I know, Qualcomm also has out-of-tree solution for this problem. Useless reclaim: There is no logic to distinguish CMA pages in the reclaim path. Hence, CMA page is reclaimed even if the system just needs the page that can be usable for the kernel allocation. Atomic allocation failure: This is also related to the fallback allocation policy for the memory of the CMA area. Consider the situation that the number of the normal freepages is zero since the bunch of the movable allocation requests come. Kswapd would not be woken up due to following freepage calculation logic. - For movable allocation: freepage = total freepage = CMA freepage If atomic unmovable allocation request comes at this moment, it would fails due to following logic. - For unmovable allocation: freepage = total freepage - CMA freepage = 0 It was reported by Aneesh [3]. Useless compaction: Usual high-order allocation request is unmovable allocation request and it cannot be served from the memory of the CMA area. In compaction, migration scanner try to migrate the page in the CMA area and make high-order page there. As mentioned above, it cannot be usable for the unmovable allocation request so it's just waste. 3. Current approach and new approach Current approach is that the memory of the CMA area is managed by the zone where their pfn is belong to. However, these memory should be distinguishable since they have a strong limitation. So, they are marked as MIGRATE_CMA in pageblock flag and handled specially. 
However, as mentioned in section 2, the MM subsystem doesn't have enough logic to deal with this special pageblock so many problems raised. New approach is that the memory of the CMA area is managed by the MOVABLE zone. MM already have enough logic to deal with special zone like as HIGHMEM and MOVABLE zone. So, managing the memory of the CMA area by the MOVABLE zone just naturally work well because constraints for the memory of the CMA area that the memory should always be migratable is the same with the constraint for the MOVABLE zone. There is one side-effect for the usability of the memory of the CMA area. The use of MOVABLE zone is only allowed for a request with GFP_HIGHMEM && GFP_MOVABLE so now the memory of the CMA area is also only allowed for this gfp flag. Before this patchset, a request with GFP_MOVABLE can use them. IMO, It would not be a big issue since most of GFP_MOVABLE request also has GFP_HIGHMEM flag. For example, file cache page and anonymous page. However, file cache page for blockdev file is an exception. Request for it has no GFP_HIGHMEM flag. There is pros and cons on this exception. In my experience, blockdev file cache pages are one of the top reason that causes cma_alloc() to fail temporarily. So, we can get more guarantee of cma_alloc() success by discarding this case. Note that there is no change in admin POV since this patchset is just for internal implementation change in MM subsystem. Just one minor difference for admin is that the memory stat for CMA area will be printed in the MOVABLE zone. That's all. 4. Result Following is the experimental result related to utilization problem. 
8 CPUs, 1024 MB, VIRTUAL MACHINE make -j16 <Before> CMA area: 0 MB 512 MB Elapsed-time: 92.4 186.5 pswpin: 82 18647 pswpout: 160 69839 <After> CMA : 0 MB 512 MB Elapsed-time: 93.1 93.4 pswpin: 84 46 pswpout: 183 92 akpm: "kernel test robot" reported a 26% improvement in vm-scalability.throughput: http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop [1]: lkml.kernel.org/r/1491880640-9944-1-git-send-email-iamjoonsoo.kim@lge.com [2]: https://lkml.org/lkml/2014/10/15/623 [3]: http://www.spinics.net/lists/linux-mm/msg100562.html Link: http://lkml.kernel.org/r/1512114786-5085-2-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:32 -07:00
Randy Dunlap	514c603249	headers: untangle kmemleak.h from mm.h Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious reason. It looks like it's only a convenience, so remove kmemleak.h from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't already #include it. Also remove <linux/kmemleak.h> from source files that do not use it. This is tested on i386 allmodconfig and x86_64 allmodconfig. It would be good to run it through the 0day bot for other $ARCHes. I have neither the horsepower nor the storage space for the other $ARCHes. Update: This patch has been extensively build-tested by both the 0day bot & kisskb/ozlabs build farms. Both of them reported 2 build failures for which patches are included here (in v2). [ slab.h is the second most used header file after module.h; kernel.h is right there with slab.h. There could be some minor error in the counting due to some #includes having comments after them and I didn't combine all of those. ] [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr] Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures] Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:27 -07:00
Mike Rapoport	e8b098fc57	mm: kernel-doc: add missing parameter descriptions Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:27 -07:00
Pintu Agarwal	5984af1082	mm/cma.c: change pr_info to pr_err for cma_alloc fail log It was observed that under cma_alloc fail log, pr_info was used instead of pr_err. This will lead to problems if printk debug level is set to below 7. In this case the cma_alloc failure log will not be captured in the log and it will be difficult to debug. Simply replace the pr_info with pr_err to capture failure log. Link: http://lkml.kernel.org/r/1507650633-4430-1-git-send-email-pintu.ping@gmail.com Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jaewon Kim <jaewon31.kim@samsung.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:03 -08:00
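
Note on commit 2b59e01a3a above: its message says that one bit in the CMA bitmap covers 2^order_per_bit pages, so free runs must be measured in bits and then scaled to pages. The following is a minimal Python model of that arithmetic, not kernel code. The function names `free_runs` and `format_runs` and the list-of-ints bitmap are hypothetical, chosen only for illustration; the output string imitates the "pages@start" format quoted in the commit message.

```python
def free_runs(bitmap, order_per_bit):
    """Return (runs, total_free_pages) for a CMA-style allocation bitmap.

    bitmap is a sequence of 0/1 values, one per bitmap bit (0 means free).
    Each bit stands for 2**order_per_bit pages, so a run of free bits is
    scaled by that factor to get a page count.
    """
    runs = []
    total = 0
    i = 0
    n = len(bitmap)  # number of bits, i.e. page count >> order_per_bit
    while i < n:
        if bitmap[i]:
            i += 1
            continue
        start = i
        while i < n and not bitmap[i]:
            i += 1
        pages = (i - start) << order_per_bit
        runs.append((pages, start))
        total += pages
    return runs, total


def format_runs(runs, total, total_pages):
    """Render runs as 'pages@start_bit' joined by '+', plus a summary."""
    parts = "+".join(f"{pages}@{start}" for pages, start in runs)
    return f"{parts}=> {total} free of {total_pages} total pages"


# A 16-page area with order_per_bit = 1 has an 8-bit bitmap.
bits = [1, 0, 1, 1, 0, 0, 0, 1]
runs, total = free_runs(bits, order_per_bit=1)
print(format_runs(runs, total, total_pages=16))
```

With order_per_bit = 0 the same bitmap yields half as many free pages, which mirrors the doubling of each run visible between the "before" and "after" log lines in the commit message.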

87 Commits