Changes in 5.15.17
KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlock
HID: Ignore battery for Elan touchscreen on HP Envy X360 15t-dr100
HID: uhid: Fix worker destroying device without any protection
HID: wacom: Reset expected and received contact counts at the same time
HID: wacom: Ignore the confidence flag when a touch is removed
HID: wacom: Avoid using stale array indicies to read contact count
ALSA: core: Fix SSID quirk lookup for subvendor=0
f2fs: fix to do sanity check on inode type during garbage collection
f2fs: fix to do sanity check in is_alive()
f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
mtd: rawnand: gpmi: Add ERR007117 protection for nfc_apply_timings
mtd: rawnand: gpmi: Remove explicit default gpmi clock setting for i.MX6
mtd: Fixed breaking list in __mtd_del_partition.
mtd: rawnand: davinci: Don't calculate ECC when reading page
mtd: rawnand: davinci: Avoid duplicated page read
mtd: rawnand: davinci: Rewrite function description
mtd: rawnand: Export nand_read_page_hwecc_oob_first()
mtd: rawnand: ingenic: JZ4740 needs 'oob_first' read page function
riscv: Get rid of MAXPHYSMEM configs
RISC-V: Use common riscv_cpuid_to_hartid_mask() for both SMP=y and SMP=n
riscv: try to allocate crashkern region from 32bit addressible memory
riscv: Don't use va_pa_offset on kdump
riscv: use hart id instead of cpu id on machine_kexec
riscv: mm: fix wrong phys_ram_base value for RV64
x86/gpu: Reserve stolen memory for first integrated Intel GPU
tools/nolibc: x86-64: Fix startup code bug
crypto: x86/aesni - don't require alignment of data
tools/nolibc: i386: fix initial stack alignment
tools/nolibc: fix incorrect truncation of exit code
rtc: cmos: take rtc_lock while reading from CMOS
net: phy: marvell: add Marvell specific PHY loopback
ksmbd: uninitialized variable in create_socket()
ksmbd: fix guest connection failure with nautilus
ksmbd: add support for smb2 max credit parameter
ksmbd: move credit charge deduction under processing request
ksmbd: limits exceeding the maximum allowable outstanding requests
ksmbd: add reserved room in ipc request/response
media: cec: fix a deadlock situation
media: ov8865: Disable only enabled regulators on error path
media: v4l2-ioctl.c: readbuffers depends on V4L2_CAP_READWRITE
media: flexcop-usb: fix control-message timeouts
media: mceusb: fix control-message timeouts
media: em28xx: fix control-message timeouts
media: cpia2: fix control-message timeouts
media: s2255: fix control-message timeouts
media: dib0700: fix undefined behavior in tuner shutdown
media: redrat3: fix control-message timeouts
media: pvrusb2: fix control-message timeouts
media: stk1160: fix control-message timeouts
media: cec-pin: fix interrupt en/disable handling
can: softing_cs: softingcs_probe(): fix memleak on registration failure
mei: hbm: fix client dma reply status
iio: adc: ti-adc081c: Partial revert of removal of ACPI IDs
iio: trigger: Fix a scheduling whilst atomic issue seen on tsc2046
lkdtm: Fix content of section containing lkdtm_rodata_do_nothing()
bus: mhi: pci_generic: Graceful shutdown on freeze
bus: mhi: core: Fix reading wake_capable channel configuration
bus: mhi: core: Fix race while handling SYS_ERR at power up
cxl/pmem: Fix reference counting for delayed work
arm64: errata: Fix exec handling in erratum 1418040 workaround
ARM: dts: at91: update alternate function of signal PD20
iommu/io-pgtable-arm-v7s: Add error handle for page table allocation failure
gpu: host1x: Add back arm_iommu_detach_device()
drm/tegra: Add back arm_iommu_detach_device()
virtio/virtio_mem: handle a possible NULL as a memcpy parameter
dma_fence_array: Fix PENDING_ERROR leak in dma_fence_array_signaled()
PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller
mm_zone: add function to check if managed dma zone exists
dma/pool: create dma atomic pool only if dma zone has managed pages
mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
ath11k: add string type to search board data in board-2.bin for WCN6855
shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode
drm/ttm: Put BO in its memory manager's lru list
Bluetooth: L2CAP: Fix not initializing sk_peer_pid
drm/bridge: display-connector: fix an uninitialized pointer in probe()
drm: fix null-ptr-deref in drm_dev_init_release()
drm/panel: kingdisplay-kd097d04: Delete panel on attach() failure
drm/panel: innolux-p079zca: Delete panel on attach() failure
drm/rockchip: dsi: Fix unbalanced clock on probe error
drm/rockchip: dsi: Hold pm-runtime across bind/unbind
drm/rockchip: dsi: Disable PLL clock on bind error
drm/rockchip: dsi: Reconfigure hardware on resume()
Bluetooth: virtio_bt: fix memory leak in virtbt_rx_handle()
Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
clk: bcm-2835: Pick the closest clock rate
clk: bcm-2835: Remove rounding up the dividers
drm/vc4: hdmi: Set a default HSM rate
drm/vc4: hdmi: Move the HSM clock enable to runtime_pm
drm/vc4: hdmi: Make sure the controller is powered in detect
drm/vc4: hdmi: Make sure the controller is powered up during bind
drm/vc4: hdmi: Rework the pre_crtc_configure error handling
drm/vc4: crtc: Make sure the HDMI controller is powered when disabling
wcn36xx: ensure pairing of init_scan/finish_scan and start_scan/end_scan
wcn36xx: Indicate beacon not connection loss on MISSED_BEACON_IND
drm/vc4: hdmi: Enable the scrambler on reconnection
libbpf: Free up resources used by inner map definition
wcn36xx: Fix DMA channel enable/disable cycle
wcn36xx: Release DMA channel descriptor allocations
wcn36xx: Put DXE block into reset before freeing memory
wcn36xx: populate band before determining rate on RX
wcn36xx: fix RX BD rate mapping for 5GHz legacy rates
ath11k: Send PPDU_STATS_CFG with proper pdev mask to firmware
bpftool: Fix memory leak in prog_dump()
mtd: hyperbus: rpc-if: Check return value of rpcif_sw_init()
media: videobuf2: Fix the size printk format
media: atomisp: add missing media_device_cleanup() in atomisp_unregister_entities()
media: atomisp: fix punit_ddr_dvfs_enable() argument for mrfld_power up case
media: atomisp: fix inverted logic in buffers_needed()
media: atomisp: do not use err var when checking port validity for ISP2400
media: atomisp: fix inverted error check for ia_css_mipi_is_source_port_valid()
media: atomisp: fix ifdefs in sh_css.c
media: atomisp: add NULL check for asd obtained from atomisp_video_pipe
media: atomisp: fix enum formats logic
media: atomisp: fix uninitialized bug in gmin_get_pmic_id_and_addr()
media: aspeed: fix mode-detect always time out at 2nd run
media: em28xx: fix memory leak in em28xx_init_dev
media: aspeed: Update signal status immediately to ensure sane hw state
arm64: dts: amlogic: meson-g12: Fix GPU operating point table node name
arm64: dts: amlogic: Fix SPI NOR flash node name for ODROID N2/N2+
arm64: dts: meson-gxbb-wetek: fix HDMI in early boot
arm64: dts: meson-gxbb-wetek: fix missing GPIO binding
fs: dlm: don't call kernel_getpeername() in error_report()
memory: renesas-rpc-if: Return error in case devm_ioremap_resource() fails
Bluetooth: stop proccessing malicious adv data
ath11k: Fix ETSI regd with weather radar overlap
ath11k: clear the keys properly via DISABLE_KEY
ath11k: reset RSN/WPA present state for open BSS
spi: hisi-kunpeng: Fix the debugfs directory name incorrect
tee: fix put order in teedev_close_context()
fs: dlm: fix build with CONFIG_IPV6 disabled
drm/dp: Don't read back backlight mode in drm_edp_backlight_enable()
drm/vboxvideo: fix a NULL vs IS_ERR() check
arm64: dts: renesas: cat875: Add rx/tx delays
media: dmxdev: fix UAF when dvb_register_device() fails
crypto: atmel-aes - Reestablish the correct tfm context at dequeue
crypto: qce - fix uaf on qce_aead_register_one
crypto: qce - fix uaf on qce_ahash_register_one
crypto: qce - fix uaf on qce_skcipher_register_one
arm64: dts: qcom: sc7280: Fix incorrect clock name
mtd: hyperbus: rpc-if: fix bug in rpcif_hb_remove
cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
cpufreq: qcom-hw: Fix probable nested interrupt handling
ARM: dts: stm32: fix dtbs_check warning on ili9341 dts binding on stm32f429 disco
libbpf: Fix potential misaligned memory access in btf_ext__new()
libbpf: Fix glob_syms memory leak in bpf_linker
libbpf: Fix using invalidated memory in bpf_linker
crypto: qat - remove unnecessary collision prevention step in PFVF
crypto: qat - make pfvf send message direction agnostic
crypto: qat - fix undetected PFVF timeout in ACK loop
ath11k: Use host CE parameters for CE interrupts configuration
arm64: dts: ti: k3-j721e: correct cache-sets info
tty: serial: atmel: Check return code of dmaengine_submit()
tty: serial: atmel: Call dma_async_issue_pending()
mfd: atmel-flexcom: Remove #ifdef CONFIG_PM_SLEEP
mfd: atmel-flexcom: Use .resume_noirq
bfq: Do not let waker requests skip proper accounting
libbpf: Silence uninitialized warning/error in btf_dump_dump_type_data
media: i2c: imx274: fix s_frame_interval runtime resume not requested
media: i2c: Re-order runtime pm initialisation
media: i2c: ov8865: Fix lockdep error
media: rcar-csi2: Correct the selection of hsfreqrange
media: imx-pxp: Initialize the spinlock prior to using it
media: si470x-i2c: fix possible memory leak in si470x_i2c_probe()
media: mtk-vcodec: call v4l2_m2m_ctx_release first when file is released
media: hantro: Hook up RK3399 JPEG encoder output
media: coda: fix CODA960 JPEG encoder buffer overflow
media: venus: correct low power frequency calculation for encoder
media: venus: core: Fix a potential NULL pointer dereference in an error handling path
media: venus: core: Fix a resource leak in the error handling path of 'venus_probe()'
net: stmmac: Add platform level debug register dump feature
thermal/drivers/imx: Implement runtime PM support
igc: AF_XDP zero-copy metadata adjust breaks SKBs on XDP_PASS
netfilter: bridge: add support for pppoe filtering
powerpc: Avoid discarding flags in system_call_exception()
arm64: dts: qcom: msm8916: fix MMC controller aliases
drm/vmwgfx: Remove the deprecated lower mem limit
drm/vmwgfx: Fail to initialize on broken configs
cgroup: Trace event cgroup id fields should be u64
ACPI: EC: Rework flushing of EC work while suspended to idle
thermal/drivers/imx8mm: Enable ADC when enabling monitor
drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()
libbpf: Clean gen_loader's attach kind.
crypto: caam - save caam memory to support crypto engine retry mechanism.
arm64: dts: ti: k3-am642: Fix the L2 cache sets
arm64: dts: ti: k3-j7200: Fix the L2 cache sets
arm64: dts: ti: k3-j721e: Fix the L2 cache sets
arm64: dts: ti: k3-j7200: Correct the d-cache-sets info
tty: serial: uartlite: allow 64 bit address
serial: amba-pl011: do not request memory region twice
mtd: core: provide unique name for nvmem device
floppy: Fix hang in watchdog when disk is ejected
staging: rtl8192e: return error code from rtllib_softmac_init()
staging: rtl8192e: rtllib_module: fix error handle case in alloc_rtllib()
Bluetooth: btmtksdio: fix resume failure
bpf: Fix the test_task_vma selftest to support output shorter than 1 kB
sched/fair: Fix detection of per-CPU kthreads waking a task
sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacity
bpf: Adjust BTF log size limit.
bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD)
bpf: Remove config check to enable bpf support for branch records
arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
samples/bpf: Install libbpf headers when building
samples/bpf: Clean up samples/bpf build failes
samples: bpf: Fix xdp_sample_user.o linking with Clang
samples: bpf: Fix 'unknown warning group' build warning on Clang
media: dib8000: Fix a memleak in dib8000_init()
media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach()
media: si2157: Fix "warm" tuner state detection
wireless: iwlwifi: Fix a double free in iwl_txq_dyn_alloc_dma
sched/rt: Try to restart rt period timer when rt runtime exceeded
ath10k: Fix the MTU size on QCA9377 SDIO
Bluetooth: refactor set_exp_feature with a feature table
Bluetooth: MGMT: Use hci_dev_test_and_{set,clear}_flag
Bluetooth: btusb: Handle download_firmware failure cases
drm/amd/display: Fix bug in debugfs crc_win_update entry
drm/amd/display: Fix out of bounds access on DNC31 stream encoder regs
drm/msm/gpu: Don't allow zero fence_id
drm/msm/dp: displayPort driver need algorithm rational
rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
wcn36xx: Fix max channels retrieval
drm/msm/dsi: fix initialization in the bonded DSI case
mwifiex: Fix possible ABBA deadlock
xfrm: fix a small bug in xfrm_sa_len()
x86/uaccess: Move variable into switch case statement
selftests: clone3: clone3: add case CLONE3_ARGS_NO_TEST
selftests: harness: avoid false negatives if test has no ASSERTs
crypto: stm32/cryp - fix CTR counter carry
crypto: stm32/cryp - fix xts and race condition in crypto_engine requests
crypto: stm32/cryp - check early input data
crypto: stm32/cryp - fix double pm exit
crypto: stm32/cryp - fix lrw chaining mode
crypto: stm32/cryp - fix bugs and crash in tests
crypto: stm32 - Revert broken pm_runtime_resume_and_get changes
crypto: hisilicon/qm - fix incorrect return value of hisi_qm_resume()
ath11k: Fix deleting uninitialized kernel timer during fragment cache flush
spi: Fix incorrect cs_setup delay handling
ARM: dts: gemini: NAS4220-B: fis-index-block with 128 KiB sectors
perf/arm-cmn: Fix CPU hotplug unregistration
media: dw2102: Fix use after free
media: msi001: fix possible null-ptr-deref in msi001_probe()
media: coda/imx-vdoa: Handle dma_set_coherent_mask error codes
ath11k: Fix a NULL pointer dereference in ath11k_mac_op_hw_scan()
net: dsa: hellcreek: Fix insertion of static FDB entries
net: dsa: hellcreek: Add STP forwarding rule
net: dsa: hellcreek: Allow PTP P2P measurements on blocked ports
net: dsa: hellcreek: Add missing PTP via UDP rules
arm64: dts: qcom: c630: Fix soundcard setup
arm64: dts: qcom: ipq6018: Fix gpio-ranges property
drm/msm/dpu: fix safe status debugfs file
drm/bridge: ti-sn65dsi86: Set max register for regmap
gpu: host1x: select CONFIG_DMA_SHARED_BUFFER
drm/tegra: gr2d: Explicitly control module reset
drm/tegra: vic: Fix DMA API misuse
media: hantro: Fix probe func error path
xfrm: interface with if_id 0 should return error
xfrm: state and policy should fail if XFRMA_IF_ID 0
ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding
usb: ftdi-elan: fix memory leak on device disconnect
arm64: dts: marvell: cn9130: add GPIO and SPI aliases
arm64: dts: marvell: cn9130: enable CP0 GPIO controllers
ARM: dts: armada-38x: Add generic compatible to UART nodes
mt76: mt7921: drop offload_flags overwritten
wilc1000: fix double free error in probe()
rtw88: add quirk to disable pci caps on HP 250 G7 Notebook PC
rtw88: Disable PCIe ASPM while doing NAPI poll on 8821CE
iwlwifi: mvm: fix 32-bit build in FTM
iwlwifi: mvm: test roc running status bits before removing the sta
iwlwifi: mvm: perform 6GHz passive scan after suspend
iwlwifi: mvm: set protected flag only for NDP ranging
mmc: meson-mx-sdhc: add IRQ check
mmc: meson-mx-sdio: add IRQ check
block: fix error unwinding in device_add_disk
selinux: fix potential memleak in selinux_add_opt()
um: fix ndelay/udelay defines
um: rename set_signals() to um_set_signals()
um: virt-pci: Fix 32-bit compile
lib/logic_iomem: Fix 32-bit build
lib/logic_iomem: Fix operation on 32-bit
um: virtio_uml: Fix time-travel external time propagation
Bluetooth: L2CAP: Fix using wrong mode
bpftool: Enable line buffering for stdout
backlight: qcom-wled: Validate enabled string indices in DT
backlight: qcom-wled: Pass number of elements to read to read_u32_array
backlight: qcom-wled: Fix off-by-one maximum with default num_strings
backlight: qcom-wled: Override default length with qcom,enabled-strings
backlight: qcom-wled: Use cpu_to_le16 macro to perform conversion
backlight: qcom-wled: Respect enabled-strings in set_brightness
software node: fix wrong node passed to find nargs_prop
Bluetooth: hci_qca: Stop IBS timer during BT OFF
x86/boot/compressed: Move CLANG_FLAGS to beginning of KBUILD_CFLAGS
crypto: octeontx2 - prevent underflow in get_cores_bmap()
regulator: qcom-labibb: OCP interrupts are not a failure while disabled
hwmon: (mr75203) fix wrong power-up delay value
x86/mce/inject: Avoid out-of-bounds write when setting flags
io_uring: remove double poll on poll update
serial: 8250_bcm7271: Propagate error codes from brcmuart_probe()
ACPI: scan: Create platform device for BCM4752 and LNV4752 ACPI nodes
pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region()
pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region()
power: reset: mt6397: Check for null res pointer
net/xfrm: IPsec tunnel mode fix inner_ipproto setting in sec_path
net: ethernet: mtk_eth_soc: fix return values and refactor MDIO ops
net: dsa: fix incorrect function pointer check for MRP ring roles
netfilter: ipt_CLUSTERIP: fix refcount leak in clusterip_tg_check()
bpf, sockmap: Fix return codes from tcp_bpf_recvmsg_parser()
bpf, sockmap: Fix double bpf_prog_put on error case in map_link
bpf: Don't promote bogus looking registers after null check.
bpf: Fix verifier support for validation of async callbacks
bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt().
netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone
net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets
ppp: ensure minimum packet size in ppp_write()
rocker: fix a sleeping in atomic bug
staging: greybus: audio: Check null pointer
fsl/fman: Check for null pointer after calling devm_ioremap
Bluetooth: hci_bcm: Check for error irq
Bluetooth: hci_qca: Fix NULL vs IS_ERR_OR_NULL check in qca_serdev_probe
net/smc: Reset conn->lgr when link group registration fails
usb: dwc3: qcom: Fix NULL vs IS_ERR checking in dwc3_qcom_probe
usb: dwc2: do not gate off the hardware if it does not support clock gating
usb: dwc2: gadget: initialize max_speed from params
usb: gadget: u_audio: Subdevice 0 for capture ctls
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_init
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_get_str_desc
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_huion_init
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_frame_init_v1_buttonpad
debugfs: lockdown: Allow reading debugfs files that are not world readable
drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb
serial: liteuart: fix MODULE_ALIAS
serial: stm32: move tx dma terminate DMA to shutdown
x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
net/mlx5e: Fix page DMA map/unmap attributes
net/mlx5e: Fix wrong usage of fib_info_nh when routes with nexthop objects are used
net/mlx5e: Don't block routes with nexthop objects in SW
Revert "net/mlx5e: Block offload of outer header csum for UDP tunnels"
Revert "net/mlx5e: Block offload of outer header csum for GRE tunnel"
net/mlx5e: Fix matching on modified inner ip_ecn bits
net/mlx5: Fix access to sf_dev_table on allocation failure
net/mlx5e: Sync VXLAN udp ports during uplink representor profile change
net/mlx5: Set command entry semaphore up once got index free
lib/mpi: Add the return value check of kcalloc()
Bluetooth: L2CAP: uninitialized variables in l2cap_sock_setsockopt()
mptcp: fix per socket endpoint accounting
mptcp: fix opt size when sending DSS + MP_FAIL
mptcp: fix a DSS option writing error
spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe
octeontx2-af: Increment ptp refcount before use
ax25: uninitialized variable in ax25_setsockopt()
netrom: fix api breakage in nr_setsockopt()
regmap: Call regmap_debugfs_exit() prior to _init()
net: mscc: ocelot: fix incorrect balancing with down LAG ports
can: mcp251xfd: add missing newline to printed strings
tpm: add request_locality before write TPM_INT_ENABLE
tpm_tis: Fix an error handling path in 'tpm_tis_core_init()'
can: softing: softing_startstop(): fix set but not used variable warning
can: xilinx_can: xcan_probe(): check for error irq
can: rcar_canfd: rcar_canfd_channel_probe(): make sure we free CAN network device
pcmcia: fix setting of kthread task states
net/sched: flow_dissector: Fix matching on zone id for invalid conns
net: openvswitch: Fix matching zone id for invalid conns arriving from tc
net: openvswitch: Fix ct_state nat flags for conns arriving from tc
iwlwifi: mvm: Use div_s64 instead of do_div in iwl_mvm_ftm_rtt_smoothing()
bnxt_en: Refactor coredump functions
bnxt_en: move coredump functions into dedicated file
bnxt_en: use firmware provided max timeout for messages
net: mcs7830: handle usb read errors properly
ext4: avoid trim error on fs with small groups
ASoC: Intel: sof_sdw: fix jack detection on HP Spectre x360 convertible
ALSA: jack: Add missing rwsem around snd_ctl_remove() calls
ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls
ALSA: hda: Add missing rwsem around snd_ctl_remove() calls
ALSA: hda: Fix potential deadlock at codec unbinding
RDMA/bnxt_re: Scan the whole bitmap when checking if "disabling RCFW with pending cmd-bit"
RDMA/hns: Validate the pkey index
scsi: pm80xx: Update WARN_ON check in pm8001_mpi_build_cmd()
clk: renesas: rzg2l: Check return value of pm_genpd_init()
clk: renesas: rzg2l: propagate return value of_genpd_add_provider_simple()
clk: imx8mn: Fix imx8mn_clko1_sels
powerpc/prom_init: Fix improper check of prom_getprop()
ASoC: uniphier: drop selecting non-existing SND_SOC_UNIPHIER_AIO_DMA
ASoC: codecs: wcd938x: add SND_SOC_WCD938_SDW to codec list instead
RDMA/rtrs-clt: Fix the initial value of min_latency
ALSA: hda: Make proper use of timecounter
dt-bindings: thermal: Fix definition of cooling-maps contribution property
powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
powerpc/modules: Don't WARN on first module allocation attempt
powerpc/32s: Fix shift-out-of-bounds in KASAN init
clocksource: Avoid accidental unstable marking of clocksources
ALSA: oss: fix compile error when OSS_DEBUG is enabled
ALSA: usb-audio: Drop superfluous '0' in Presonus Studio 1810c's ID
misc: at25: Make driver OF independent again
char/mwave: Adjust io port register size
binder: fix handling of error during copy
binder: avoid potential data leakage when copying txn
openrisc: Add clone3 ABI wrapper
iommu: Extend mutex lock scope in iommu_probe_device()
iommu/io-pgtable-arm: Fix table descriptor paddr formatting
scsi: core: Fix scsi_device_max_queue_depth()
scsi: ufs: Fix race conditions related to driver data
RDMA/qedr: Fix reporting max_{send/recv}_wr attrs
PCI/MSI: Fix pci_irq_vector()/pci_irq_get_affinity()
powerpc/powermac: Add additional missing lockdep_register_key()
iommu/arm-smmu-qcom: Fix TTBR0 read
RDMA/core: Let ib_find_gid() continue search even after empty entry
RDMA/cma: Let cma_resolve_ib_dev() continue search even after empty entry
ASoC: rt5663: Handle device_property_read_u32_array error codes
of: unittest: fix warning on PowerPC frame size warning
of: unittest: 64 bit dma address test requires arch support
clk: stm32: Fix ltdc's clock turn off by clk_disable_unused() after system enter shell
mips: add SYS_HAS_CPU_MIPS64_R5 config for MIPS Release 5 support
mips: fix Kconfig reference to PHYS_ADDR_T_64BIT
dmaengine: pxa/mmp: stop referencing config->slave_id
iommu/amd: Restore GA log/tail pointer on host resume
iommu/amd: X2apic mode: re-enable after resume
iommu/amd: X2apic mode: setup the INTX registers on mask/unmask
iommu/amd: X2apic mode: mask/unmask interrupts on suspend/resume
iommu/amd: Remove useless irq affinity notifier
ASoC: Intel: catpt: Test dmaengine_submit() result before moving on
iommu/iova: Fix race between FQ timeout and teardown
ASoC: mediatek: mt8195: correct default value
of: fdt: Aggregate the processing of "linux,usable-memory-range"
efi: apply memblock cap after memblock_add()
scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()
phy: uniphier-usb3ss: fix unintended writing zeros to PHY register
ASoC: mediatek: Check for error clk pointer
powerpc/64s: Mask NIP before checking against SRR0
powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
phy: cadence: Sierra: Fix to get correct parent for mux clocks
ASoC: samsung: idma: Check of ioremap return value
misc: lattice-ecp3-config: Fix task hung when firmware load failed
ASoC: mediatek: mt8195: correct pcmif BE dai control flow
arm64: tegra: Remove non existent Tegra194 reset
mips: lantiq: add support for clk_set_parent()
mips: bcm63xx: add support for clk_set_parent()
powerpc/xive: Add missing null check after calling kmalloc
ASoC: fsl_mqs: fix MODULE_ALIAS
ALSA: hda/cs8409: Increase delay during jack detection
ALSA: hda/cs8409: Fix Jack detection after resume
RDMA/cxgb4: Set queue pair state when being queried
clk: qcom: gcc-sc7280: Mark gcc_cfg_noc_lpass_clk always enabled
ASoC: imx-card: Need special setting for ak4497 on i.MX8MQ
ASoC: imx-card: Fix mclk calculation issue for akcodec
ASoC: imx-card: improve the sound quality for low rate
ASoC: fsl_asrc: refine the check of available clock divider
clk: bm1880: remove kfrees on static allocations
of: base: Fix phandle argument length mismatch error message
of/fdt: Don't worry about non-memory region overlap for no-map
MIPS: boot/compressed/: add __ashldi3 to target for ZSTD compression
MIPS: compressed: Fix build with ZSTD compression
mailbox: fix gce_num of mt8192 driver data
ARM: dts: omap3-n900: Fix lp5523 for multi color
leds: lp55xx: initialise output direction from dts
Bluetooth: Fix debugfs entry leak in hci_register_dev()
Bluetooth: Fix memory leak of hci device
drm/panel: Delete panel on mipi_dsi_attach() failure
Bluetooth: Fix removing adv when processing cmd complete
fs: dlm: filter user dlm messages for kernel locks
drm/lima: fix warning when CONFIG_DEBUG_SG=y & CONFIG_DMA_API_DEBUG=y
selftests/bpf: Fix memory leaks in btf_type_c_dump() helper
selftests/bpf: Destroy XDP link correctly
selftests/bpf: Fix bpf_object leak in skb_ctx selftest
ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply
drm/bridge: dw-hdmi: handle ELD when DRM_BRIDGE_ATTACH_NO_CONNECTOR
drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR
media: atomisp: fix try_fmt logic
media: atomisp: set per-device's default mode
media: atomisp-ov2680: Fix ov2680_set_fmt() clobbering the exposure
media: atomisp: check before deference asd variable
ARM: shmobile: rcar-gen2: Add missing of_node_put()
batman-adv: allow netlink usage in unprivileged containers
media: atomisp: handle errors at sh_css_create_isp_params()
ath11k: Fix crash caused by uninitialized TX ring
usb: dwc3: meson-g12a: fix shared reset control use
USB: ehci_brcm_hub_control: Improve port index sanitizing
usb: gadget: f_fs: Use stream_open() for endpoint files
psi: Fix PSI_MEM_FULL state when tasks are in memstall and doing reclaim
drm: panel-orientation-quirks: Add quirk for the Lenovo Yoga Book X91F/L
HID: magicmouse: Report battery level over USB
HID: apple: Do not reset quirks when the Fn key is not found
media: b2c2: Add missing check in flexcop_pci_isr:
libbpf: Accommodate DWARF/compiler bug with duplicated structs
ethernet: renesas: Use div64_ul instead of do_div
EDAC/synopsys: Use the quirk for version instead of ddr version
arm64: dts: qcom: sm8350: Shorten camera-thermal-bottom name
soc: imx: gpcv2: Synchronously suspend MIX domains
ARM: imx: rename DEBUG_IMX21_IMX27_UART to DEBUG_IMX27_UART
drm/amd/display: check top_pipe_to_program pointer
drm/amdgpu/display: set vblank_disable_immediate for DC
soc: ti: pruss: fix referenced node in error message
mlxsw: pci: Add shutdown method in PCI driver
drm/amd/display: add else to avoid double destroy clk_mgr
drm/bridge: megachips: Ensure both bridges are probed before registration
mxser: keep only !tty test in ISR
tty: serial: imx: disable UCR4_OREN in .stop_rx() instead of .shutdown()
gpiolib: acpi: Do not set the IRQ type if the IRQ is already in use
HSI: core: Fix return freed object in hsi_new_client
crypto: jitter - consider 32 LSB for APT
mwifiex: Fix skb_over_panic in mwifiex_usb_recv()
rsi: Fix use-after-free in rsi_rx_done_handler()
rsi: Fix out-of-bounds read in rsi_read_pkt()
ath11k: Avoid NULL ptr access during mgmt tx cleanup
media: venus: avoid calling core_clk_setrate() concurrently during concurrent video sessions
regulator: da9121: Prevent current limit change when enabled
drm/vmwgfx: Release ttm memory if probe fails
drm/vmwgfx: Introduce a new placement for MOB page tables
ACPI / x86: Drop PWM2 device on Lenovo Yoga Book from always present table
ACPI: Change acpi_device_always_present() into acpi_device_override_status()
ACPI / x86: Allow specifying acpi_device_override_status() quirks by path
ACPI / x86: Add not-present quirk for the PCI0.SDHB.BRC1 device on the GPD win
arm64: dts: ti: j7200-main: Fix 'dtbs_check' serdes_ln_ctrl node
arm64: dts: ti: j721e-main: Fix 'dtbs_check' in serdes_ln_ctrl node
usb: uhci: add aspeed ast2600 uhci support
floppy: Add max size check for user space request
x86/mm: Flush global TLB when switching to trampoline page-table
drm: rcar-du: Fix CRTC timings when CMM is used
media: uvcvideo: Increase UVC_CTRL_CONTROL_TIMEOUT to 5 seconds.
media: rcar-vin: Update format alignment constraints
media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach()
media: atomisp: fix "variable dereferenced before check 'asd'"
media: m920x: don't use stack on USB reads
thunderbolt: Runtime PM activate both ends of the device link
arm64: dts: renesas: Fix thermal bindings
iwlwifi: mvm: synchronize with FW after multicast commands
iwlwifi: mvm: avoid clearing a just saved session protection id
rcutorture: Avoid soft lockup during cpu stall
ath11k: avoid deadlock by change ieee80211_queue_work for regd_update_work
ath10k: Fix tx hanging
net-sysfs: update the queue counts in the unregistration path
net: phy: prefer 1000baseT over 1000baseKX
gpio: aspeed: Convert aspeed_gpio.lock to raw_spinlock
gpio: aspeed-sgpio: Convert aspeed_sgpio.lock to raw_spinlock
selftests/ftrace: make kprobe profile testcase description unique
ath11k: Avoid false DEADLOCK warning reported by lockdep
ARM: dts: qcom: sdx55: fix IPA interconnect definitions
x86/mce: Allow instrumentation during task work queueing
x86/mce: Mark mce_panic() noinstr
x86/mce: Mark mce_end() noinstr
x86/mce: Mark mce_read_aux() noinstr
net: bonding: debug: avoid printing debug logs when bond is not notifying peers
kunit: Don't crash if no parameters are generated
bpf: Do not WARN in bpf_warn_invalid_xdp_action()
drm/amdkfd: Fix error handling in svm_range_add
HID: quirks: Allow inverting the absolute X/Y values
HID: i2c-hid-of: Expose the touchscreen-inverted properties
media: igorplugusb: receiver overflow should be reported
media: rockchip: rkisp1: use device name for debugfs subdir name
media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach()
mmc: tmio: reinit card irqs in reset routine
mmc: core: Fixup storing of OCR for MMC_QUIRK_NONSTD_SDIO
drm/amd/amdgpu: fix psp tmr bo pin count leak in SRIOV
drm/amd/amdgpu: fix gmc bo pin count leak in SRIOV
audit: ensure userspace is penalized the same as the kernel when under pressure
arm64: dts: ls1028a-qds: move rtc node to the correct i2c bus
arm64: tegra: Adjust length of CCPLEX cluster MMIO region
crypto: ccp - Move SEV_INIT retry for corrupted data
crypto: hisilicon/hpre - fix memory leak in hpre_curve25519_src_init()
PM: runtime: Add safety net to supplier device release
cpufreq: Fix initialization of min and max frequency QoS requests
usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0
mt76: mt7615: fix possible deadlock while mt7615_register_ext_phy()
mt76: do not pass the received frame with decryption error
mt76: mt7615: improve wmm index allocation
ath9k_htc: fix NULL pointer dereference at ath9k_htc_rxep()
ath9k_htc: fix NULL pointer dereference at ath9k_htc_tx_get_packet()
ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream
rtw88: 8822c: update rx settings to prevent potential hw deadlock
PM: AVS: qcom-cpr: Use div64_ul instead of do_div
iwlwifi: fix leaks/bad data after failed firmware load
iwlwifi: remove module loading failure message
iwlwifi: mvm: Fix calculation of frame length
iwlwifi: mvm: fix AUX ROC removal
iwlwifi: pcie: make sure prph_info is set when treating wakeup IRQ
mmc: sdhci-pci-gli: GL9755: Support for CD/WP inversion on OF platforms
block: check minor range in device_add_disk()
um: registers: Rename function names to avoid conflicts and build problems
ath11k: Fix napi related hang
Bluetooth: btintel: Add missing quirks and msft ext for legacy bootloader
Bluetooth: vhci: Set HCI_QUIRK_VALID_LE_STATES
xfrm: rate limit SA mapping change message to user space
drm/etnaviv: consider completed fence seqno in hang check
jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
ACPICA: actypes.h: Expand the ACPI_ACCESS_ definitions
ACPICA: Utilities: Avoid deleting the same object twice in a row
ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
ACPICA: Fix wrong interpretation of PCC address
ACPICA: Hardware: Do not flush CPU cache when entering S4 and S5
mmc: mtk-sd: Use readl_poll_timeout instead of open-coded polling
drm/amdgpu: fixup bad vram size on gmc v8
amdgpu/pm: Make sysfs pm attributes as read-only for VFs
ACPI: battery: Add the ThinkPad "Not Charging" quirk
ACPI: CPPC: Check present CPUs for determining _CPC is valid
btrfs: remove BUG_ON() in find_parent_nodes()
btrfs: remove BUG_ON(!eie) in find_parent_nodes
net: mdio: Demote probed message to debug print
mac80211: allow non-standard VHT MCS-10/11
dm btree: add a defensive bounds check to insert_at()
dm space map common: add bounds check to sm_ll_lookup_bitmap()
bpf/selftests: Fix namespace mount setup in tc_redirect
mlxsw: pci: Avoid flow control for EMAD packets
net: phy: marvell: configure RGMII delays for 88E1118
net: gemini: allow any RGMII interface mode
regulator: qcom_smd: Align probe function with rpmh-regulator
serial: pl010: Drop CR register reset on set_termios
serial: pl011: Drop CR register reset on set_termios
serial: core: Keep mctrl register state and cached copy in sync
random: do not throw away excess input to crng_fast_load
net/mlx5: Update log_max_qp value to FW max capability
net/mlx5e: Unblock setting vid 0 for VF in case PF isn't eswitch manager
parisc: Avoid calling faulthandler_disabled() twice
can: flexcan: allow to change quirks at runtime
can: flexcan: rename RX modes
can: flexcan: add more quirks to describe RX path capabilities
x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs
powerpc/6xx: add missing of_node_put
powerpc/powernv: add missing of_node_put
powerpc/cell: add missing of_node_put
powerpc/btext: add missing of_node_put
powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
ASoC: imx-hdmi: add put_device() after of_find_device_by_node()
i2c: i801: Don't silently correct invalid transfer size
powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
i2c: mpc: Correct I2C reset procedure
clk: meson: gxbb: Fix the SDM_EN bit for MPLL0 on GXBB
powerpc/powermac: Add missing lockdep_register_key()
KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
w1: Misuse of get_user()/put_user() reported by sparse
nvmem: core: set size for sysfs bin file
dm: fix alloc_dax error handling in alloc_dev
interconnect: qcom: rpm: Prevent integer overflow in rate
scsi: ufs: Fix a kernel crash during shutdown
scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIV
scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup
ALSA: seq: Set upper limit of processed events
MIPS: Loongson64: Use three arguments for slti
powerpc/40x: Map 32Mbytes of memory at startup
selftests/powerpc/spectre_v2: Return skip code when miss_percent is high
powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
udf: Fix error handling in udf_new_inode()
MIPS: OCTEON: add put_device() after of_find_device_by_node()
irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters
selftests/powerpc: Add a test of sigreturning to the kernel
MIPS: Octeon: Fix build errors using clang
scsi: sr: Don't use GFP_DMA
scsi: mpi3mr: Fixes around reply request queues
ASoC: mediatek: mt8192-mt6359: fix device_node leak
phy: phy-mtk-tphy: add support efuse setting
ASoC: mediatek: mt8173: fix device_node leak
ASoC: mediatek: mt8183: fix device_node leak
habanalabs: skip read fw errors if dynamic descriptor invalid
phy: mediatek: Fix missing check in mtk_mipi_tx_probe
mailbox: change mailbox-mpfs compatible string
seg6: export get_srh() for ICMP handling
icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
udp6: Use Segment Routing Header for dest address if present
rpmsg: core: Clean up resources on announce_create failure.
ifcvf/vDPA: fix misuse virtio-net device config size for blk dev
crypto: omap-aes - Fix broken pm_runtime_and_get() usage
crypto: stm32/crc32 - Fix kernel BUG triggered in probe()
crypto: caam - replace this_cpu_ptr with raw_cpu_ptr
ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
tpm: fix potential NULL pointer access in tpm_del_char_device
tpm: fix NPE on probe for missing device
mfd: tps65910: Set PWR_OFF bit during driver probe
spi: uniphier: Fix a bug that doesn't point to private data correctly
xen/gntdev: fix unmap notification order
md: Move alloc/free acct bioset in to personality
HID: magicmouse: Fix an error handling path in magicmouse_probe()
fuse: Pass correct lend value to filemap_write_and_wait_range()
serial: Fix incorrect rs485 polarity on uart open
cputime, cpuacct: Include guest time in user time in cpuacct.stat
sched/cpuacct: Fix user/system in shown cpuacct.usage*
tracing/kprobes: 'nmissed' not showed correctly for kretprobe
tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
remoteproc: imx_rproc: Fix a resource leak in the remove function
iwlwifi: mvm: Increase the scan timeout guard to 30 seconds
s390/mm: fix 2KB pgtable release race
device property: Fix fwnode_graph_devcon_match() fwnode leak
drm/tegra: submit: Add missing pm_runtime_mark_last_busy()
drm/etnaviv: limit submit sizes
drm/amd/display: Fix the uninitialized variable in enable_stream_features()
drm/nouveau/kms/nv04: use vzalloc for nv04_display
drm/bridge: analogix_dp: Make PSR-exit block less
parisc: Fix lpa and lpa_user defines
powerpc/64s/radix: Fix huge vmap false positive
scsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalance
drm/amdgpu: don't do resets on APUs which don't support it
drm/i915/display/ehl: Update voltage swing table
PCI: xgene: Fix IB window setup
PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
PCI: pci-bridge-emul: Make expansion ROM Base Address register read-only
PCI: pci-bridge-emul: Properly mark reserved PCIe bits in PCI config space
PCI: pci-bridge-emul: Fix definitions of reserved bits
PCI: pci-bridge-emul: Correctly set PCIe capabilities
PCI: pci-bridge-emul: Set PCI_STATUS_CAP_LIST for PCIe device
xfrm: fix policy lookup for ipv6 gre packets
xfrm: fix dflt policy check when there is no policy configured
btrfs: fix deadlock between quota enable and other quota operations
btrfs: check the root node for uptodate before returning it
btrfs: respect the max size in the header when activating swap file
ext4: make sure to reset inode lockdep class when quota enabling fails
ext4: make sure quota gets properly shutdown on error
ext4: fix a possible ABBA deadlock due to busy PA
ext4: initialize err_blk before calling __ext4_get_inode_loc
ext4: fix fast commit may miss tracking range for FALLOC_FL_ZERO_RANGE
ext4: set csum seed in tmp inode while migrating to extents
ext4: Fix BUG_ON in ext4_bread when write quota data
ext4: use ext4_ext_remove_space() for fast commit replay delete range
ext4: fast commit may miss tracking unwritten range during ftruncate
ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
ext4: fix an use-after-free issue about data=journal writeback mode
ext4: don't use the orphan list when migrating an inode
tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
ath11k: qmi: avoid error messages when dma allocation fails
drm/radeon: fix error handling in radeon_driver_open_kms
of: base: Improve argument length mismatch error
firmware: Update Kconfig help text for Google firmware
can: mcp251xfd: mcp251xfd_tef_obj_read(): fix typo in error message
media: rcar-csi2: Optimize the selection PHTW register
drm/vc4: hdmi: Make sure the device is powered with CEC
media: correct MEDIA_TEST_SUPPORT help text
Documentation: coresight: Fix documentation issue
Documentation: dmaengine: Correctly describe dmatest with channel unset
Documentation: ACPI: Fix data node reference documentation
Documentation, arch: Remove leftovers from raw device
Documentation, arch: Remove leftovers from CIFS_WEAK_PW_HASH
Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
Documentation: fix firewire.rst ABI file path error
Bluetooth: btusb: Return error code when getting patch status failed
net: usb: Correct reset handling of smsc95xx
Bluetooth: hci_sync: Fix not setting adv set duration
scsi: core: Show SCMD_LAST in text form
scsi: ufs: ufs-mediatek: Fix error checking in ufs_mtk_init_va09_pwr_ctrl()
RDMA/cma: Remove open coding of overflow checking for private_data_len
dmaengine: uniphier-xdmac: Fix type of address variables
dmaengine: idxd: fix wq settings post wq disable
RDMA/hns: Modify the mapping attribute of doorbell to device
RDMA/rxe: Fix a typo in opcode name
dmaengine: stm32-mdma: fix STM32_MDMA_CTBR_TSEL_MASK
Revert "net/mlx5: Add retry mechanism to the command entry index allocation"
powerpc/cell: Fix clang -Wimplicit-fallthrough warning
powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
block: fix async_depth sysfs interface for mq-deadline
block: Fix fsync always failed if once failed
drm/vc4: crtc: Drop feed_txp from state
drm/vc4: Fix non-blocking commit getting stuck forever
drm/vc4: crtc: Copy assigned channel to the CRTC
bpftool: Remove inclusion of utilities.mak from Makefiles
bpftool: Fix indent in option lists in the documentation
xdp: check prog type before updating BPF link
bpf: Fix mount source show for bpffs
bpf: Mark PTR_TO_FUNC register initially with zero offset
perf evsel: Override attr->sample_period for non-libpfm4 events
ipv4: update fib_info_cnt under spinlock protection
ipv4: avoid quadratic behavior in netns dismantle
mlx5: Don't accidentally set RTO_ONLINK before mlx5e_route_lookup_ipv4_get()
net/fsl: xgmac_mdio: Add workaround for erratum A-009885
net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries
riscv: dts: microchip: mpfs: Drop empty chosen node
drm/vmwgfx: Remove explicit transparent hugepages support
drm/vmwgfx: Remove unused compile options
f2fs: fix remove page failed in invalidate compress pages
f2fs: fix to avoid panic in is_alive() if metadata is inconsistent
f2fs: compress: fix potential deadlock of compress file
f2fs: fix to reserve space for IO align feature
f2fs: fix to check available space of CP area correctly in update_ckpt_flags()
crypto: octeontx2 - uninitialized variable in kvf_limits_store()
af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress
clk: Emit a stern warning with writable debugfs enabled
clk: si5341: Fix clock HW provider cleanup
pinctrl/rockchip: fix gpio device creation
gpio: mpc8xxx: Fix IRQ check in mpc8xxx_probe
gpio: idt3243x: Fix IRQ check in idt_gpio_probe
net/smc: Fix hung_task when removing SMC-R devices
net: axienet: increase reset timeout
net: axienet: Wait for PhyRstCmplt after core reset
net: axienet: reset core on initialization prior to MDIO access
net: axienet: add missing memory barriers
net: axienet: limit minimum TX ring size
net: axienet: Fix TX ring slot available check
net: axienet: fix number of TX ring slots for available check
net: axienet: fix for TX busy handling
net: axienet: increase default TX ring size to 128
bitops: protect find_first_{,zero}_bit properly
um: gitignore: Add kernel/capflags.c
HID: vivaldi: fix handling devices not using numbered reports
rtc: pxa: fix null pointer dereference
vdpa/mlx5: Fix wrong configuration of virtio_version_1_0
virtio_ring: mark ring unused on error
taskstats: Cleanup the use of task->exit_code
inet: frags: annotate races around fqdir->dead and fqdir->high_thresh
netns: add schedule point in ops_exit_list()
iwlwifi: fix Bz NMI behaviour
xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
gre: Don't accidentally set RTO_ONLINK in gre_fill_metadata_dst()
libcxgb: Don't accidentally set RTO_ONLINK in cxgb_find_route()
perf script: Fix hex dump character output
dmaengine: at_xdmac: Don't start transactions at tx_submit level
dmaengine: at_xdmac: Start transfer for cyclic channels in issue_pending
dmaengine: at_xdmac: Print debug message after realeasing the lock
dmaengine: at_xdmac: Fix concurrency over xfers_list
dmaengine: at_xdmac: Fix lld view setting
dmaengine: at_xdmac: Fix at_xdmac_lld struct definition
perf tools: Drop requirement for libstdc++.so for libopencsd check
perf probe: Fix ppc64 'perf probe add events failed' case
devlink: Remove misleading internal_flags from health reporter dump
arm64: dts: qcom: msm8996: drop not documented adreno properties
net: fix sock_timestamping_bind_phc() to release device
net: bonding: fix bond_xmit_broadcast return value error bug
net: ipa: fix atomic update in ipa_endpoint_replenish()
net_sched: restore "mpu xxx" handling
net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port
bcmgenet: add WOL IRQ check
net: wwan: Fix MRU mismatch issue which may lead to data connection lost
net: ethernet: mtk_eth_soc: fix error checking in mtk_mac_config()
net: ocelot: Fix the call to switchdev_bridge_port_offload
net: sfp: fix high power modules without diagnostic monitoring
net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
net: mscc: ocelot: fix using match before it is set
dt-bindings: display: meson-dw-hdmi: add missing sound-name-prefix property
dt-bindings: display: meson-vpu: Add missing amlogic,canvas property
dt-bindings: watchdog: Require samsung,syscon-phandle for Exynos7
sch_api: Don't skip qdisc attach on ingress
scripts/dtc: dtx_diff: remove broken example from help text
lib82596: Fix IRQ check in sni_82596_probe
mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault
bonding: Fix extraction of ports from the packet headers
lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test
scripts: sphinx-pre-install: add required ctex dependency
scripts: sphinx-pre-install: Fix ctex support on Debian
Linux 5.15.17
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6ddef7c3463bfc127b34c39ebcf5d286d3117931
1547 lines
47 KiB
C
1547 lines
47 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
|
|
#ifndef _LINUX_MMZONE_H
|
|
#define _LINUX_MMZONE_H
|
|
|
|
#ifndef __ASSEMBLY__
|
|
#ifndef __GENERATING_BOUNDS_H
|
|
|
|
#include <linux/spinlock.h>
|
|
#include <linux/list.h>
|
|
#include <linux/wait.h>
|
|
#include <linux/bitops.h>
|
|
#include <linux/cache.h>
|
|
#include <linux/threads.h>
|
|
#include <linux/numa.h>
|
|
#include <linux/init.h>
|
|
#include <linux/seqlock.h>
|
|
#include <linux/nodemask.h>
|
|
#include <linux/pageblock-flags.h>
|
|
#include <linux/page-flags-layout.h>
|
|
#include <linux/atomic.h>
|
|
#include <linux/mm_types.h>
|
|
#include <linux/page-flags.h>
|
|
#include <linux/local_lock.h>
|
|
#include <asm/page.h>
|
|
|
|
/* Free memory management - zoned buddy allocator. */
|
|
#ifndef CONFIG_FORCE_MAX_ZONEORDER
|
|
#define MAX_ORDER 11
|
|
#else
|
|
#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
|
|
#endif
|
|
#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
|
|
|
|
/*
|
|
* PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
|
|
* costly to service. That is between allocation orders which should
|
|
* coalesce naturally under reasonable reclaim pressure and those which
|
|
* will not.
|
|
*/
|
|
#define PAGE_ALLOC_COSTLY_ORDER 3
|
|
|
|
enum migratetype {
|
|
MIGRATE_UNMOVABLE,
|
|
MIGRATE_MOVABLE,
|
|
MIGRATE_RECLAIMABLE,
|
|
#ifdef CONFIG_CMA
|
|
/*
|
|
* MIGRATE_CMA migration type is designed to mimic the way
|
|
* ZONE_MOVABLE works. Only movable pages can be allocated
|
|
* from MIGRATE_CMA pageblocks and page allocator never
|
|
* implicitly change migration type of MIGRATE_CMA pageblock.
|
|
*
|
|
* The way to use it is to change migratetype of a range of
|
|
* pageblocks to MIGRATE_CMA which can be done by
|
|
* __free_pageblock_cma() function. What is important though
|
|
* is that a range of pageblocks must be aligned to
|
|
* MAX_ORDER_NR_PAGES should biggest page be bigger than
|
|
* a single pageblock.
|
|
*/
|
|
MIGRATE_CMA,
|
|
#endif
|
|
MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
|
|
MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
|
|
#ifdef CONFIG_MEMORY_ISOLATION
|
|
MIGRATE_ISOLATE, /* can't allocate from here */
|
|
#endif
|
|
MIGRATE_TYPES
|
|
};
|
|
|
|
/* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
|
|
extern const char * const migratetype_names[MIGRATE_TYPES];
|
|
|
|
#ifdef CONFIG_CMA
|
|
# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
|
|
# define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA)
|
|
# define get_cma_migrate_type() MIGRATE_CMA
|
|
#else
|
|
# define is_migrate_cma(migratetype) false
|
|
# define is_migrate_cma_page(_page) false
|
|
# define get_cma_migrate_type() MIGRATE_MOVABLE
|
|
#endif
|
|
|
|
static inline bool is_migrate_movable(int mt)
|
|
{
|
|
return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
|
|
}
|
|
|
|
#define for_each_migratetype_order(order, type) \
|
|
for (order = 0; order < MAX_ORDER; order++) \
|
|
for (type = 0; type < MIGRATE_TYPES; type++)
|
|
|
|
extern int page_group_by_mobility_disabled;
|
|
|
|
#define MIGRATETYPE_MASK ((1UL << PB_migratetype_bits) - 1)
|
|
|
|
#define get_pageblock_migratetype(page) \
|
|
get_pfnblock_flags_mask(page, page_to_pfn(page), MIGRATETYPE_MASK)
|
|
|
|
struct free_area {
|
|
struct list_head free_list[MIGRATE_TYPES];
|
|
unsigned long nr_free;
|
|
};
|
|
|
|
static inline struct page *get_page_from_free_area(struct free_area *area,
|
|
int migratetype)
|
|
{
|
|
return list_first_entry_or_null(&area->free_list[migratetype],
|
|
struct page, lru);
|
|
}
|
|
|
|
static inline bool free_area_empty(struct free_area *area, int migratetype)
|
|
{
|
|
return list_empty(&area->free_list[migratetype]);
|
|
}
|
|
|
|
struct pglist_data;
|
|
|
|
/*
|
|
* Add a wild amount of padding here to ensure data fall into separate
|
|
* cachelines. There are very few zone structures in the machine, so space
|
|
* consumption is not a concern here.
|
|
*/
|
|
#if defined(CONFIG_SMP)
|
|
struct zone_padding {
|
|
char x[0];
|
|
} ____cacheline_internodealigned_in_smp;
|
|
#define ZONE_PADDING(name) struct zone_padding name;
|
|
#else
|
|
#define ZONE_PADDING(name)
|
|
#endif
|
|
|
|
#ifdef CONFIG_NUMA
|
|
enum numa_stat_item {
|
|
NUMA_HIT, /* allocated in intended node */
|
|
NUMA_MISS, /* allocated in non intended node */
|
|
NUMA_FOREIGN, /* was intended here, hit elsewhere */
|
|
NUMA_INTERLEAVE_HIT, /* interleaver preferred this zone */
|
|
NUMA_LOCAL, /* allocation from local node */
|
|
NUMA_OTHER, /* allocation from other node */
|
|
NR_VM_NUMA_EVENT_ITEMS
|
|
};
|
|
#else
|
|
#define NR_VM_NUMA_EVENT_ITEMS 0
|
|
#endif
|
|
|
|
enum zone_stat_item {
|
|
/* First 128 byte cacheline (assuming 64 bit words) */
|
|
NR_FREE_PAGES,
|
|
NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
|
|
NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
|
|
NR_ZONE_ACTIVE_ANON,
|
|
NR_ZONE_INACTIVE_FILE,
|
|
NR_ZONE_ACTIVE_FILE,
|
|
NR_ZONE_UNEVICTABLE,
|
|
NR_ZONE_WRITE_PENDING, /* Count of dirty, writeback and unstable pages */
|
|
NR_MLOCK, /* mlock()ed pages found and moved off LRU */
|
|
/* Second 128 byte cacheline */
|
|
NR_BOUNCE,
|
|
NR_ZSPAGES, /* allocated in zsmalloc */
|
|
NR_FREE_CMA_PAGES,
|
|
NR_VM_ZONE_STAT_ITEMS };
|
|
|
|
enum node_stat_item {
|
|
NR_LRU_BASE,
|
|
NR_INACTIVE_ANON = NR_LRU_BASE, /* must match order of LRU_[IN]ACTIVE */
|
|
NR_ACTIVE_ANON, /* " " " " " */
|
|
NR_INACTIVE_FILE, /* " " " " " */
|
|
NR_ACTIVE_FILE, /* " " " " " */
|
|
NR_UNEVICTABLE, /* " " " " " */
|
|
NR_SLAB_RECLAIMABLE_B,
|
|
NR_SLAB_UNRECLAIMABLE_B,
|
|
NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
|
|
NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
|
|
WORKINGSET_NODES,
|
|
WORKINGSET_REFAULT_BASE,
|
|
WORKINGSET_REFAULT_ANON = WORKINGSET_REFAULT_BASE,
|
|
WORKINGSET_REFAULT_FILE,
|
|
WORKINGSET_ACTIVATE_BASE,
|
|
WORKINGSET_ACTIVATE_ANON = WORKINGSET_ACTIVATE_BASE,
|
|
WORKINGSET_ACTIVATE_FILE,
|
|
WORKINGSET_RESTORE_BASE,
|
|
WORKINGSET_RESTORE_ANON = WORKINGSET_RESTORE_BASE,
|
|
WORKINGSET_RESTORE_FILE,
|
|
WORKINGSET_NODERECLAIM,
|
|
NR_ANON_MAPPED, /* Mapped anonymous pages */
|
|
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
|
|
only modified from process context */
|
|
NR_FILE_PAGES,
|
|
NR_FILE_DIRTY,
|
|
NR_WRITEBACK,
|
|
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
|
|
NR_SHMEM, /* shmem pages (included tmpfs/GEM pages) */
|
|
NR_SHMEM_THPS,
|
|
NR_SHMEM_PMDMAPPED,
|
|
NR_FILE_THPS,
|
|
NR_FILE_PMDMAPPED,
|
|
NR_ANON_THPS,
|
|
NR_VMSCAN_WRITE,
|
|
NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */
|
|
NR_DIRTIED, /* page dirtyings since bootup */
|
|
NR_WRITTEN, /* page writings since bootup */
|
|
NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */
|
|
NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */
|
|
NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */
|
|
NR_KERNEL_STACK_KB, /* measured in KiB */
|
|
#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
|
|
NR_KERNEL_SCS_KB, /* measured in KiB */
|
|
#endif
|
|
NR_PAGETABLE, /* used for pagetables */
|
|
#ifdef CONFIG_SWAP
|
|
NR_SWAPCACHE,
|
|
#endif
|
|
NR_VM_NODE_STAT_ITEMS
|
|
};
|
|
|
|
/*
|
|
* Returns true if the item should be printed in THPs (/proc/vmstat
|
|
* currently prints number of anon, file and shmem THPs. But the item
|
|
* is charged in pages).
|
|
*/
|
|
static __always_inline bool vmstat_item_print_in_thp(enum node_stat_item item)
|
|
{
|
|
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
|
|
return false;
|
|
|
|
return item == NR_ANON_THPS ||
|
|
item == NR_FILE_THPS ||
|
|
item == NR_SHMEM_THPS ||
|
|
item == NR_SHMEM_PMDMAPPED ||
|
|
item == NR_FILE_PMDMAPPED;
|
|
}
|
|
|
|
/*
|
|
* Returns true if the value is measured in bytes (most vmstat values are
|
|
* measured in pages). This defines the API part, the internal representation
|
|
* might be different.
|
|
*/
|
|
static __always_inline bool vmstat_item_in_bytes(int idx)
|
|
{
|
|
/*
|
|
* Global and per-node slab counters track slab pages.
|
|
* It's expected that changes are multiples of PAGE_SIZE.
|
|
* Internally values are stored in pages.
|
|
*
|
|
* Per-memcg and per-lruvec counters track memory, consumed
|
|
* by individual slab objects. These counters are actually
|
|
* byte-precise.
|
|
*/
|
|
return (idx == NR_SLAB_RECLAIMABLE_B ||
|
|
idx == NR_SLAB_UNRECLAIMABLE_B);
|
|
}
|
|
|
|
/*
|
|
* We do arithmetic on the LRU lists in various places in the code,
|
|
* so it is important to keep the active lists LRU_ACTIVE higher in
|
|
* the array than the corresponding inactive lists, and to keep
|
|
* the *_FILE lists LRU_FILE higher than the corresponding _ANON lists.
|
|
*
|
|
* This has to be kept in sync with the statistics in zone_stat_item
|
|
* above and the descriptions in vmstat_text in mm/vmstat.c
|
|
*/
|
|
#define LRU_BASE 0
|
|
#define LRU_ACTIVE 1
|
|
#define LRU_FILE 2
|
|
|
|
enum lru_list {
|
|
LRU_INACTIVE_ANON = LRU_BASE,
|
|
LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
|
|
LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
|
|
LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
|
|
LRU_UNEVICTABLE,
|
|
NR_LRU_LISTS
|
|
};
|
|
|
|
#define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
|
|
|
|
#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
|
|
|
|
static inline bool is_file_lru(enum lru_list lru)
|
|
{
|
|
return (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE);
|
|
}
|
|
|
|
static inline bool is_active_lru(enum lru_list lru)
|
|
{
|
|
return (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE);
|
|
}
|
|
|
|
#define ANON_AND_FILE 2
|
|
|
|
enum lruvec_flags {
|
|
LRUVEC_CONGESTED, /* lruvec has many dirty pages
|
|
* backed by a congested BDI
|
|
*/
|
|
};
|
|
|
|
struct lruvec {
|
|
struct list_head lists[NR_LRU_LISTS];
|
|
/* per lruvec lru_lock for memcg */
|
|
spinlock_t lru_lock;
|
|
/*
|
|
* These track the cost of reclaiming one LRU - file or anon -
|
|
* over the other. As the observed cost of reclaiming one LRU
|
|
* increases, the reclaim scan balance tips toward the other.
|
|
*/
|
|
unsigned long anon_cost;
|
|
unsigned long file_cost;
|
|
/* Non-resident age, driven by LRU movement */
|
|
atomic_long_t nonresident_age;
|
|
/* Refaults at the time of last reclaim cycle */
|
|
unsigned long refaults[ANON_AND_FILE];
|
|
/* Various lruvec state flags (enum lruvec_flags) */
|
|
unsigned long flags;
|
|
#ifdef CONFIG_MEMCG
|
|
struct pglist_data *pgdat;
|
|
#endif
|
|
};
|
|
|
|
/* Isolate unmapped pages */
|
|
#define ISOLATE_UNMAPPED ((__force isolate_mode_t)0x2)
|
|
/* Isolate for asynchronous migration */
|
|
#define ISOLATE_ASYNC_MIGRATE ((__force isolate_mode_t)0x4)
|
|
/* Isolate unevictable pages */
|
|
#define ISOLATE_UNEVICTABLE ((__force isolate_mode_t)0x8)
|
|
|
|
/* LRU Isolation modes. */
|
|
typedef unsigned __bitwise isolate_mode_t;
|
|
|
|
enum zone_watermarks {
|
|
WMARK_MIN,
|
|
WMARK_LOW,
|
|
WMARK_HIGH,
|
|
NR_WMARK
|
|
};
|
|
|
|
/*
|
|
* One per migratetype for each PAGE_ALLOC_COSTLY_ORDER plus one additional
|
|
* for pageblock size for THP if configured.
|
|
*/
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
#define NR_PCP_THP 1
|
|
#else
|
|
#define NR_PCP_THP 0
|
|
#endif
|
|
#define NR_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1 + NR_PCP_THP))
|
|
|
|
/*
|
|
* Shift to encode migratetype and order in the same integer, with order
|
|
* in the least significant bits.
|
|
*/
|
|
#define NR_PCP_ORDER_WIDTH 8
|
|
#define NR_PCP_ORDER_MASK ((1<<NR_PCP_ORDER_WIDTH) - 1)
|
|
|
|
#define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
|
|
#define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost)
|
|
#define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost)
|
|
#define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost)
|
|
|
|
/* Fields and list protected by pagesets local_lock in page_alloc.c */
|
|
struct per_cpu_pages {
|
|
int count; /* number of pages in the list */
|
|
int high; /* high watermark, emptying needed */
|
|
int batch; /* chunk size for buddy add/remove */
|
|
short free_factor; /* batch scaling factor during free */
|
|
#ifdef CONFIG_NUMA
|
|
short expire; /* When 0, remote pagesets are drained */
|
|
#endif
|
|
|
|
/* Lists of pages, one per migrate type stored on the pcp-lists */
|
|
struct list_head lists[NR_PCP_LISTS];
|
|
};
|
|
|
|
struct per_cpu_zonestat {
|
|
#ifdef CONFIG_SMP
|
|
s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
|
|
s8 stat_threshold;
|
|
#endif
|
|
#ifdef CONFIG_NUMA
|
|
/*
|
|
* Low priority inaccurate counters that are only folded
|
|
* on demand. Use a large type to avoid the overhead of
|
|
* folding during refresh_cpu_vm_stats.
|
|
*/
|
|
unsigned long vm_numa_event[NR_VM_NUMA_EVENT_ITEMS];
|
|
#endif
|
|
};
|
|
|
|
struct per_cpu_nodestat {
|
|
s8 stat_threshold;
|
|
s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
|
|
};
|
|
|
|
#endif /* !__GENERATING_BOUNDS.H */
|
|
|
|
enum zone_type {
|
|
/*
|
|
* ZONE_DMA and ZONE_DMA32 are used when there are peripherals not able
|
|
* to DMA to all of the addressable memory (ZONE_NORMAL).
|
|
* On architectures where this area covers the whole 32 bit address
|
|
* space ZONE_DMA32 is used. ZONE_DMA is left for the ones with smaller
|
|
* DMA addressing constraints. This distinction is important as a 32bit
|
|
* DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
|
|
* platforms may need both zones as they support peripherals with
|
|
* different DMA addressing limitations.
|
|
*/
|
|
#ifdef CONFIG_ZONE_DMA
|
|
ZONE_DMA,
|
|
#endif
|
|
#ifdef CONFIG_ZONE_DMA32
|
|
ZONE_DMA32,
|
|
#endif
|
|
/*
|
|
* Normal addressable memory is in ZONE_NORMAL. DMA operations can be
|
|
* performed on pages in ZONE_NORMAL if the DMA devices support
|
|
* transfers to all addressable memory.
|
|
*/
|
|
ZONE_NORMAL,
|
|
#ifdef CONFIG_HIGHMEM
|
|
/*
|
|
* A memory area that is only addressable by the kernel through
|
|
* mapping portions into its own address space. This is for example
|
|
* used by i386 to allow the kernel to address the memory beyond
|
|
* 900MB. The kernel will set up special mappings (page
|
|
* table entries on i386) for each page that the kernel needs to
|
|
* access.
|
|
*/
|
|
ZONE_HIGHMEM,
|
|
#endif
|
|
/*
|
|
* ZONE_MOVABLE is similar to ZONE_NORMAL, except that it contains
|
|
* movable pages with few exceptional cases described below. Main use
|
|
* cases for ZONE_MOVABLE are to make memory offlining/unplug more
|
|
* likely to succeed, and to locally limit unmovable allocations - e.g.,
|
|
* to increase the number of THP/huge pages. Notable special cases are:
|
|
*
|
|
* 1. Pinned pages: (long-term) pinning of movable pages might
|
|
* essentially turn such pages unmovable. Therefore, we do not allow
|
|
* pinning long-term pages in ZONE_MOVABLE. When pages are pinned and
|
|
* faulted, they come from the right zone right away. However, it is
|
|
* still possible that address space already has pages in
|
|
* ZONE_MOVABLE at the time when pages are pinned (i.e. user has
|
|
* touches that memory before pinning). In such case we migrate them
|
|
* to a different zone. When migration fails - pinning fails.
|
|
* 2. memblock allocations: kernelcore/movablecore setups might create
|
|
* situations where ZONE_MOVABLE contains unmovable allocations
|
|
* after boot. Memory offlining and allocations fail early.
|
|
* 3. Memory holes: kernelcore/movablecore setups might create very rare
|
|
* situations where ZONE_MOVABLE contains memory holes after boot,
|
|
* for example, if we have sections that are only partially
|
|
* populated. Memory offlining and allocations fail early.
|
|
* 4. PG_hwpoison pages: while poisoned pages can be skipped during
|
|
* memory offlining, such pages cannot be allocated.
|
|
* 5. Unmovable PG_offline pages: in paravirtualized environments,
|
|
* hotplugged memory blocks might only partially be managed by the
|
|
* buddy (e.g., via XEN-balloon, Hyper-V balloon, virtio-mem). The
|
|
* parts not manged by the buddy are unmovable PG_offline pages. In
|
|
* some cases (virtio-mem), such pages can be skipped during
|
|
* memory offlining, however, cannot be moved/allocated. These
|
|
* techniques might use alloc_contig_range() to hide previously
|
|
* exposed pages from the buddy again (e.g., to implement some sort
|
|
* of memory unplug in virtio-mem).
|
|
* 6. ZERO_PAGE(0), kernelcore/movablecore setups might create
|
|
* situations where ZERO_PAGE(0) which is allocated differently
|
|
* on different platforms may end up in a movable zone. ZERO_PAGE(0)
|
|
* cannot be migrated.
|
|
* 7. Memory-hotplug: when using memmap_on_memory and onlining the
|
|
* memory to the MOVABLE zone, the vmemmap pages are also placed in
|
|
* such zone. Such pages cannot be really moved around as they are
|
|
* self-stored in the range, but they are treated as movable when
|
|
* the range they describe is about to be offlined.
|
|
*
|
|
* In general, no unmovable allocations that degrade memory offlining
|
|
* should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
|
|
* have to expect that migrating pages in ZONE_MOVABLE can fail (even
|
|
* if has_unmovable_pages() states that there are no unmovable pages,
|
|
* there can be false negatives).
|
|
*/
|
|
ZONE_MOVABLE,
|
|
#ifdef CONFIG_ZONE_DEVICE
|
|
ZONE_DEVICE,
|
|
#endif
|
|
__MAX_NR_ZONES
|
|
|
|
};
|
|
|
|
#ifndef __GENERATING_BOUNDS_H
|
|
|
|
#define ASYNC_AND_SYNC 2
|
|
|
|
struct zone {
|
|
/* Read-mostly fields */
|
|
|
|
/* zone watermarks, access with *_wmark_pages(zone) macros */
|
|
unsigned long _watermark[NR_WMARK];
|
|
unsigned long watermark_boost;
|
|
|
|
unsigned long nr_reserved_highatomic;
|
|
|
|
/*
|
|
* We don't know if the memory that we're going to allocate will be
|
|
* freeable or/and it will be released eventually, so to avoid totally
|
|
* wasting several GB of ram we must reserve some of the lower zone
|
|
* memory (otherwise we risk to run OOM on the lower zones despite
|
|
* there being tons of freeable ram on the higher zones). This array is
|
|
* recalculated at runtime if the sysctl_lowmem_reserve_ratio sysctl
|
|
* changes.
|
|
*/
|
|
long lowmem_reserve[MAX_NR_ZONES];
|
|
|
|
#ifdef CONFIG_NUMA
|
|
int node;
|
|
#endif
|
|
struct pglist_data *zone_pgdat;
|
|
struct per_cpu_pages __percpu *per_cpu_pageset;
|
|
struct per_cpu_zonestat __percpu *per_cpu_zonestats;
|
|
/*
|
|
* the high and batch values are copied to individual pagesets for
|
|
* faster access
|
|
*/
|
|
int pageset_high;
|
|
int pageset_batch;
|
|
|
|
#ifndef CONFIG_SPARSEMEM
|
|
/*
|
|
* Flags for a pageblock_nr_pages block. See pageblock-flags.h.
|
|
* In SPARSEMEM, this map is stored in struct mem_section
|
|
*/
|
|
unsigned long *pageblock_flags;
|
|
#endif /* CONFIG_SPARSEMEM */
|
|
|
|
/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
|
|
unsigned long zone_start_pfn;
|
|
|
|
/*
|
|
* spanned_pages is the total pages spanned by the zone, including
|
|
* holes, which is calculated as:
|
|
* spanned_pages = zone_end_pfn - zone_start_pfn;
|
|
*
|
|
* present_pages is physical pages existing within the zone, which
|
|
* is calculated as:
|
|
* present_pages = spanned_pages - absent_pages(pages in holes);
|
|
*
|
|
* present_early_pages is present pages existing within the zone
|
|
* located on memory available since early boot, excluding hotplugged
|
|
* memory.
|
|
*
|
|
* managed_pages is present pages managed by the buddy system, which
|
|
* is calculated as (reserved_pages includes pages allocated by the
|
|
* bootmem allocator):
|
|
* managed_pages = present_pages - reserved_pages;
|
|
*
|
|
* cma pages is present pages that are assigned for CMA use
|
|
* (MIGRATE_CMA).
|
|
*
|
|
* So present_pages may be used by memory hotplug or memory power
|
|
* management logic to figure out unmanaged pages by checking
|
|
* (present_pages - managed_pages). And managed_pages should be used
|
|
* by page allocator and vm scanner to calculate all kinds of watermarks
|
|
* and thresholds.
|
|
*
|
|
* Locking rules:
|
|
*
|
|
* zone_start_pfn and spanned_pages are protected by span_seqlock.
|
|
* It is a seqlock because it has to be read outside of zone->lock,
|
|
* and it is done in the main allocator path. But, it is written
|
|
* quite infrequently.
|
|
*
|
|
* The span_seq lock is declared along with zone->lock because it is
|
|
* frequently read in proximity to zone->lock. It's good to
|
|
* give them a chance of being in the same cacheline.
|
|
*
|
|
* Write access to present_pages at runtime should be protected by
|
|
* mem_hotplug_begin/end(). Any reader who can't tolerant drift of
|
|
* present_pages should get_online_mems() to get a stable value.
|
|
*/
|
|
atomic_long_t managed_pages;
|
|
unsigned long spanned_pages;
|
|
unsigned long present_pages;
|
|
#if defined(CONFIG_MEMORY_HOTPLUG)
|
|
unsigned long present_early_pages;
|
|
#endif
|
|
#ifdef CONFIG_CMA
|
|
unsigned long cma_pages;
|
|
#endif
|
|
|
|
const char *name;
|
|
|
|
#ifdef CONFIG_MEMORY_ISOLATION
|
|
/*
|
|
* Number of isolated pageblock. It is used to solve incorrect
|
|
* freepage counting problem due to racy retrieving migratetype
|
|
* of pageblock. Protected by zone->lock.
|
|
*/
|
|
unsigned long nr_isolate_pageblock;
|
|
#endif
|
|
|
|
#ifdef CONFIG_MEMORY_HOTPLUG
|
|
/* see spanned/present_pages for more description */
|
|
seqlock_t span_seqlock;
|
|
#endif
|
|
|
|
int initialized;
|
|
|
|
/* Write-intensive fields used from the page allocator */
|
|
ZONE_PADDING(_pad1_)
|
|
|
|
/* free areas of different sizes */
|
|
struct free_area free_area[MAX_ORDER];
|
|
|
|
/* zone flags, see below */
|
|
unsigned long flags;
|
|
|
|
/* Primarily protects free_area */
|
|
spinlock_t lock;
|
|
|
|
/* Write-intensive fields used by compaction and vmstats. */
|
|
ZONE_PADDING(_pad2_)
|
|
|
|
/*
|
|
* When free pages are below this point, additional steps are taken
|
|
* when reading the number of free pages to avoid per-cpu counter
|
|
* drift allowing watermarks to be breached
|
|
*/
|
|
unsigned long percpu_drift_mark;
|
|
|
|
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
|
|
/* pfn where compaction free scanner should start */
|
|
unsigned long compact_cached_free_pfn;
|
|
/* pfn where compaction migration scanner should start */
|
|
unsigned long compact_cached_migrate_pfn[ASYNC_AND_SYNC];
|
|
unsigned long compact_init_migrate_pfn;
|
|
unsigned long compact_init_free_pfn;
|
|
#endif
|
|
|
|
#ifdef CONFIG_COMPACTION
|
|
/*
|
|
* On compaction failure, 1<<compact_defer_shift compactions
|
|
* are skipped before trying again. The number attempted since
|
|
* last failure is tracked with compact_considered.
|
|
* compact_order_failed is the minimum compaction failed order.
|
|
*/
|
|
unsigned int compact_considered;
|
|
unsigned int compact_defer_shift;
|
|
int compact_order_failed;
|
|
#endif
|
|
|
|
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
|
|
/* Set to true when the PG_migrate_skip bits should be cleared */
|
|
bool compact_blockskip_flush;
|
|
#endif
|
|
|
|
bool contiguous;
|
|
|
|
ZONE_PADDING(_pad3_)
|
|
/* Zone statistics */
|
|
atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
|
|
atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS];
|
|
} ____cacheline_internodealigned_in_smp;
|
|
|
|
enum pgdat_flags {
|
|
PGDAT_DIRTY, /* reclaim scanning has recently found
|
|
* many dirty file pages at the tail
|
|
* of the LRU.
|
|
*/
|
|
PGDAT_WRITEBACK, /* reclaim scanning has recently found
|
|
* many pages under writeback
|
|
*/
|
|
PGDAT_RECLAIM_LOCKED, /* prevents concurrent reclaim */
|
|
};
|
|
|
|
enum zone_flags {
|
|
ZONE_BOOSTED_WATERMARK, /* zone recently boosted watermarks.
|
|
* Cleared when kswapd is woken.
|
|
*/
|
|
ZONE_RECLAIM_ACTIVE, /* kswapd may be scanning the zone. */
|
|
};
|
|
|
|
static inline unsigned long zone_managed_pages(struct zone *zone)
|
|
{
|
|
return (unsigned long)atomic_long_read(&zone->managed_pages);
|
|
}
|
|
|
|
static inline unsigned long zone_cma_pages(struct zone *zone)
|
|
{
|
|
#ifdef CONFIG_CMA
|
|
return zone->cma_pages;
|
|
#else
|
|
return 0;
|
|
#endif
|
|
}
|
|
|
|
static inline unsigned long zone_end_pfn(const struct zone *zone)
|
|
{
|
|
return zone->zone_start_pfn + zone->spanned_pages;
|
|
}
|
|
|
|
static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
|
|
{
|
|
return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
|
|
}
|
|
|
|
static inline bool zone_is_initialized(struct zone *zone)
|
|
{
|
|
return zone->initialized;
|
|
}
|
|
|
|
static inline bool zone_is_empty(struct zone *zone)
|
|
{
|
|
return zone->spanned_pages == 0;
|
|
}
|
|
|
|
/*
|
|
* Return true if [start_pfn, start_pfn + nr_pages) range has a non-empty
|
|
* intersection with the given zone
|
|
*/
|
|
static inline bool zone_intersects(struct zone *zone,
|
|
unsigned long start_pfn, unsigned long nr_pages)
|
|
{
|
|
if (zone_is_empty(zone))
|
|
return false;
|
|
if (start_pfn >= zone_end_pfn(zone) ||
|
|
start_pfn + nr_pages <= zone->zone_start_pfn)
|
|
return false;
|
|
|
|
return true;
|
|
}
|
|
|
|
/*
|
|
* The "priority" of VM scanning is how much of the queues we will scan in one
|
|
* go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
|
|
* queues ("queue_length >> 12") during an aging round.
|
|
*/
|
|
#define DEF_PRIORITY 12
|
|
|
|
/* Maximum number of zones on a zonelist */
|
|
#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
|
|
|
|
enum {
|
|
ZONELIST_FALLBACK, /* zonelist with fallback */
|
|
#ifdef CONFIG_NUMA
|
|
/*
|
|
* The NUMA zonelists are doubled because we need zonelists that
|
|
* restrict the allocations to a single node for __GFP_THISNODE.
|
|
*/
|
|
ZONELIST_NOFALLBACK, /* zonelist without fallback (__GFP_THISNODE) */
|
|
#endif
|
|
MAX_ZONELISTS
|
|
};
|
|
|
|
/*
|
|
* This struct contains information about a zone in a zonelist. It is stored
|
|
* here to avoid dereferences into large structures and lookups of tables
|
|
*/
|
|
struct zoneref {
|
|
struct zone *zone; /* Pointer to actual zone */
|
|
int zone_idx; /* zone_idx(zoneref->zone) */
|
|
};
|
|
|
|
/*
|
|
* One allocation request operates on a zonelist. A zonelist
|
|
* is a list of zones, the first one is the 'goal' of the
|
|
* allocation, the other zones are fallback zones, in decreasing
|
|
* priority.
|
|
*
|
|
* To speed the reading of the zonelist, the zonerefs contain the zone index
|
|
* of the entry being read. Helper functions to access information given
|
|
* a struct zoneref are
|
|
*
|
|
* zonelist_zone() - Return the struct zone * for an entry in _zonerefs
|
|
* zonelist_zone_idx() - Return the index of the zone for an entry
|
|
* zonelist_node_idx() - Return the index of the node for an entry
|
|
*/
|
|
struct zonelist {
|
|
struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
|
|
};
|
|
|
|
/*
|
|
* The array of struct pages for flatmem.
|
|
* It must be declared for SPARSEMEM as well because there are configurations
|
|
* that rely on that.
|
|
*/
|
|
extern struct page *mem_map;
|
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
struct deferred_split {
|
|
spinlock_t split_queue_lock;
|
|
struct list_head split_queue;
|
|
unsigned long split_queue_len;
|
|
};
|
|
#endif
|
|
|
|
/*
|
|
* On NUMA machines, each NUMA node would have a pg_data_t to describe
|
|
* it's memory layout. On UMA machines there is a single pglist_data which
|
|
* describes the whole memory.
|
|
*
|
|
* Memory statistics and page replacement data structures are maintained on a
|
|
* per-zone basis.
|
|
*/
|
|
typedef struct pglist_data {
|
|
/*
|
|
* node_zones contains just the zones for THIS node. Not all of the
|
|
* zones may be populated, but it is the full list. It is referenced by
|
|
* this node's node_zonelists as well as other node's node_zonelists.
|
|
*/
|
|
struct zone node_zones[MAX_NR_ZONES];
|
|
|
|
/*
|
|
* node_zonelists contains references to all zones in all nodes.
|
|
* Generally the first zones will be references to this node's
|
|
* node_zones.
|
|
*/
|
|
struct zonelist node_zonelists[MAX_ZONELISTS];
|
|
|
|
int nr_zones; /* number of populated zones in this node */
|
|
#ifdef CONFIG_FLATMEM /* means !SPARSEMEM */
|
|
struct page *node_mem_map;
|
|
#ifdef CONFIG_PAGE_EXTENSION
|
|
struct page_ext *node_page_ext;
|
|
#endif
|
|
#endif
|
|
#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
|
|
/*
|
|
* Must be held any time you expect node_start_pfn,
|
|
* node_present_pages, node_spanned_pages or nr_zones to stay constant.
|
|
* Also synchronizes pgdat->first_deferred_pfn during deferred page
|
|
* init.
|
|
*
|
|
* pgdat_resize_lock() and pgdat_resize_unlock() are provided to
|
|
* manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
|
|
* or CONFIG_DEFERRED_STRUCT_PAGE_INIT.
|
|
*
|
|
* Nests above zone->lock and zone->span_seqlock
|
|
*/
|
|
spinlock_t node_size_lock;
|
|
#endif
|
|
unsigned long node_start_pfn;
|
|
unsigned long node_present_pages; /* total number of physical pages */
|
|
unsigned long node_spanned_pages; /* total size of physical page
|
|
range, including holes */
|
|
int node_id;
|
|
wait_queue_head_t kswapd_wait;
|
|
wait_queue_head_t pfmemalloc_wait;
|
|
struct task_struct *kswapd; /* Protected by
|
|
mem_hotplug_begin/end() */
|
|
int kswapd_order;
|
|
enum zone_type kswapd_highest_zoneidx;
|
|
|
|
int kswapd_failures; /* Number of 'reclaimed == 0' runs */
|
|
|
|
#ifdef CONFIG_COMPACTION
|
|
int kcompactd_max_order;
|
|
enum zone_type kcompactd_highest_zoneidx;
|
|
wait_queue_head_t kcompactd_wait;
|
|
struct task_struct *kcompactd;
|
|
bool proactive_compact_trigger;
|
|
#endif
|
|
/*
|
|
* This is a per-node reserve of pages that are not available
|
|
* to userspace allocations.
|
|
*/
|
|
unsigned long totalreserve_pages;
|
|
|
|
#ifdef CONFIG_NUMA
|
|
/*
|
|
* node reclaim becomes active if more unmapped pages exist.
|
|
*/
|
|
unsigned long min_unmapped_pages;
|
|
unsigned long min_slab_pages;
|
|
#endif /* CONFIG_NUMA */
|
|
|
|
/* Write-intensive fields used by page reclaim */
|
|
ZONE_PADDING(_pad1_)
|
|
|
|
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
|
|
/*
|
|
* If memory initialisation on large machines is deferred then this
|
|
* is the first PFN that needs to be initialised.
|
|
*/
|
|
unsigned long first_deferred_pfn;
|
|
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
|
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
struct deferred_split deferred_split_queue;
|
|
#endif
|
|
|
|
/* Fields commonly accessed by the page reclaim scanner */
|
|
|
|
/*
|
|
* NOTE: THIS IS UNUSED IF MEMCG IS ENABLED.
|
|
*
|
|
* Use mem_cgroup_lruvec() to look up lruvecs.
|
|
*/
|
|
struct lruvec __lruvec;
|
|
|
|
unsigned long flags;
|
|
|
|
ZONE_PADDING(_pad2_)
|
|
|
|
/* Per-node vmstats */
|
|
struct per_cpu_nodestat __percpu *per_cpu_nodestats;
|
|
atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS];
|
|
} pg_data_t;
|
|
|
|
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
|
|
#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
|
|
#ifdef CONFIG_FLATMEM
|
|
#define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr))
|
|
#else
|
|
#define pgdat_page_nr(pgdat, pagenr) pfn_to_page((pgdat)->node_start_pfn + (pagenr))
|
|
#endif
|
|
#define nid_page_nr(nid, pagenr) pgdat_page_nr(NODE_DATA(nid),(pagenr))
|
|
|
|
#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
|
|
#define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid))
|
|
|
|
static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat)
|
|
{
|
|
return pgdat->node_start_pfn + pgdat->node_spanned_pages;
|
|
}
|
|
|
|
static inline bool pgdat_is_empty(pg_data_t *pgdat)
|
|
{
|
|
return !pgdat->node_start_pfn && !pgdat->node_spanned_pages;
|
|
}
|
|
|
|
#include <linux/memory_hotplug.h>
|
|
|
|
void build_all_zonelists(pg_data_t *pgdat);
|
|
void wakeup_kswapd(struct zone *zone, gfp_t gfp_mask, int order,
|
|
enum zone_type highest_zoneidx);
|
|
bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
|
|
int highest_zoneidx, unsigned int alloc_flags,
|
|
long free_pages);
|
|
bool zone_watermark_ok(struct zone *z, unsigned int order,
|
|
unsigned long mark, int highest_zoneidx,
|
|
unsigned int alloc_flags);
|
|
bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
|
|
unsigned long mark, int highest_zoneidx);
|
|
/*
|
|
* Memory initialization context, use to differentiate memory added by
|
|
* the platform statically or via memory hotplug interface.
|
|
*/
|
|
enum meminit_context {
|
|
MEMINIT_EARLY,
|
|
MEMINIT_HOTPLUG,
|
|
};
|
|
|
|
extern void init_currently_empty_zone(struct zone *zone, unsigned long start_pfn,
|
|
unsigned long size);
|
|
|
|
extern void lruvec_init(struct lruvec *lruvec);
|
|
|
|
static inline struct pglist_data *lruvec_pgdat(struct lruvec *lruvec)
|
|
{
|
|
#ifdef CONFIG_MEMCG
|
|
return lruvec->pgdat;
|
|
#else
|
|
return container_of(lruvec, struct pglist_data, __lruvec);
|
|
#endif
|
|
}
|
|
|
|
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
|
|
int local_memory_node(int node_id);
|
|
#else
|
|
static inline int local_memory_node(int node_id) { return node_id; };
|
|
#endif
|
|
|
|
/*
|
|
* zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
|
|
*/
|
|
#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
|
|
|
|
#ifdef CONFIG_ZONE_DEVICE
|
|
static inline bool zone_is_zone_device(struct zone *zone)
|
|
{
|
|
return zone_idx(zone) == ZONE_DEVICE;
|
|
}
|
|
#else
|
|
static inline bool zone_is_zone_device(struct zone *zone)
|
|
{
|
|
return false;
|
|
}
|
|
#endif
|
|
|
|
/*
|
|
* Returns true if a zone has pages managed by the buddy allocator.
|
|
* All the reclaim decisions have to use this function rather than
|
|
* populated_zone(). If the whole zone is reserved then we can easily
|
|
* end up with populated_zone() && !managed_zone().
|
|
*/
|
|
static inline bool managed_zone(struct zone *zone)
|
|
{
|
|
return zone_managed_pages(zone);
|
|
}
|
|
|
|
/* Returns true if a zone has memory */
|
|
static inline bool populated_zone(struct zone *zone)
|
|
{
|
|
return zone->present_pages;
|
|
}
|
|
|
|
#ifdef CONFIG_NUMA
|
|
static inline int zone_to_nid(struct zone *zone)
|
|
{
|
|
return zone->node;
|
|
}
|
|
|
|
static inline void zone_set_nid(struct zone *zone, int nid)
|
|
{
|
|
zone->node = nid;
|
|
}
|
|
#else
|
|
static inline int zone_to_nid(struct zone *zone)
|
|
{
|
|
return 0;
|
|
}
|
|
|
|
static inline void zone_set_nid(struct zone *zone, int nid) {}
|
|
#endif
|
|
|
|
extern int movable_zone;
|
|
|
|
static inline int is_highmem_idx(enum zone_type idx)
|
|
{
|
|
#ifdef CONFIG_HIGHMEM
|
|
return (idx == ZONE_HIGHMEM ||
|
|
(idx == ZONE_MOVABLE && movable_zone == ZONE_HIGHMEM));
|
|
#else
|
|
return 0;
|
|
#endif
|
|
}
|
|
|
|
#ifdef CONFIG_ZONE_DMA
|
|
bool has_managed_dma(void);
|
|
#else
|
|
static inline bool has_managed_dma(void)
|
|
{
|
|
return false;
|
|
}
|
|
#endif
|
|
|
|
/**
|
|
* is_highmem - helper function to quickly check if a struct zone is a
|
|
* highmem zone or not. This is an attempt to keep references
|
|
* to ZONE_{DMA/NORMAL/HIGHMEM/etc} in general code to a minimum.
|
|
* @zone: pointer to struct zone variable
|
|
* Return: 1 for a highmem zone, 0 otherwise
|
|
*/
|
|
static inline int is_highmem(struct zone *zone)
|
|
{
|
|
#ifdef CONFIG_HIGHMEM
|
|
return is_highmem_idx(zone_idx(zone));
|
|
#else
|
|
return 0;
|
|
#endif
|
|
}
|
|
|
|
/* These two functions are used to setup the per zone pages min values */
|
|
struct ctl_table;
|
|
|
|
int min_free_kbytes_sysctl_handler(struct ctl_table *, int, void *, size_t *,
|
|
loff_t *);
|
|
int watermark_scale_factor_sysctl_handler(struct ctl_table *, int, void *,
|
|
size_t *, loff_t *);
|
|
extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
|
|
int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int, void *,
|
|
size_t *, loff_t *);
|
|
int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *, int,
|
|
void *, size_t *, loff_t *);
|
|
int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
|
|
void *, size_t *, loff_t *);
|
|
int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
|
|
void *, size_t *, loff_t *);
|
|
int numa_zonelist_order_handler(struct ctl_table *, int,
|
|
void *, size_t *, loff_t *);
|
|
extern int percpu_pagelist_high_fraction;
|
|
extern char numa_zonelist_order[];
|
|
#define NUMA_ZONELIST_ORDER_LEN 16
|
|
|
|
#ifndef CONFIG_NUMA
|
|
|
|
extern struct pglist_data contig_page_data;
|
|
static inline struct pglist_data *NODE_DATA(int nid)
|
|
{
|
|
return &contig_page_data;
|
|
}
|
|
#define NODE_MEM_MAP(nid) mem_map
|
|
|
|
#else /* CONFIG_NUMA */
|
|
|
|
#include <asm/mmzone.h>
|
|
|
|
#endif /* !CONFIG_NUMA */
|
|
|
|
extern struct pglist_data *first_online_pgdat(void);
|
|
extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
|
|
extern struct zone *next_zone(struct zone *zone);
|
|
|
|
/**
|
|
* for_each_online_pgdat - helper macro to iterate over all online nodes
|
|
* @pgdat: pointer to a pg_data_t variable
|
|
*/
|
|
#define for_each_online_pgdat(pgdat) \
|
|
for (pgdat = first_online_pgdat(); \
|
|
pgdat; \
|
|
pgdat = next_online_pgdat(pgdat))
|
|
/**
|
|
* for_each_zone - helper macro to iterate over all memory zones
|
|
* @zone: pointer to struct zone variable
|
|
*
|
|
* The user only needs to declare the zone variable, for_each_zone
|
|
* fills it in.
|
|
*/
|
|
#define for_each_zone(zone) \
|
|
for (zone = (first_online_pgdat())->node_zones; \
|
|
zone; \
|
|
zone = next_zone(zone))
|
|
|
|
#define for_each_populated_zone(zone) \
|
|
for (zone = (first_online_pgdat())->node_zones; \
|
|
zone; \
|
|
zone = next_zone(zone)) \
|
|
if (!populated_zone(zone)) \
|
|
; /* do nothing */ \
|
|
else
|
|
|
|
static inline struct zone *zonelist_zone(struct zoneref *zoneref)
|
|
{
|
|
return zoneref->zone;
|
|
}
|
|
|
|
static inline int zonelist_zone_idx(struct zoneref *zoneref)
|
|
{
|
|
return zoneref->zone_idx;
|
|
}
|
|
|
|
static inline int zonelist_node_idx(struct zoneref *zoneref)
|
|
{
|
|
return zone_to_nid(zoneref->zone);
|
|
}
|
|
|
|
struct zoneref *__next_zones_zonelist(struct zoneref *z,
|
|
enum zone_type highest_zoneidx,
|
|
nodemask_t *nodes);
|
|
|
|
/**
|
|
* next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
|
|
* @z: The cursor used as a starting point for the search
|
|
* @highest_zoneidx: The zone index of the highest zone to return
|
|
* @nodes: An optional nodemask to filter the zonelist with
|
|
*
|
|
* This function returns the next zone at or below a given zone index that is
|
|
* within the allowed nodemask using a cursor as the starting point for the
|
|
* search. The zoneref returned is a cursor that represents the current zone
|
|
* being examined. It should be advanced by one before calling
|
|
* next_zones_zonelist again.
|
|
*
|
|
* Return: the next zone at or below highest_zoneidx within the allowed
|
|
* nodemask using a cursor within a zonelist as a starting point
|
|
*/
|
|
static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
|
|
enum zone_type highest_zoneidx,
|
|
nodemask_t *nodes)
|
|
{
|
|
if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx))
|
|
return z;
|
|
return __next_zones_zonelist(z, highest_zoneidx, nodes);
|
|
}
|
|
|
|
/**
|
|
* first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
|
|
* @zonelist: The zonelist to search for a suitable zone
|
|
* @highest_zoneidx: The zone index of the highest zone to return
|
|
* @nodes: An optional nodemask to filter the zonelist with
|
|
*
|
|
* This function returns the first zone at or below a given zone index that is
|
|
* within the allowed nodemask. The zoneref returned is a cursor that can be
|
|
* used to iterate the zonelist with next_zones_zonelist by advancing it by
|
|
* one before calling.
|
|
*
|
|
* When no eligible zone is found, zoneref->zone is NULL (zoneref itself is
|
|
* never NULL). This may happen either genuinely, or due to concurrent nodemask
|
|
* update due to cpuset modification.
|
|
*
|
|
* Return: Zoneref pointer for the first suitable zone found
|
|
*/
|
|
static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
|
|
enum zone_type highest_zoneidx,
|
|
nodemask_t *nodes)
|
|
{
|
|
return next_zones_zonelist(zonelist->_zonerefs,
|
|
highest_zoneidx, nodes);
|
|
}
|
|
|
|
/**
|
|
* for_each_zone_zonelist_nodemask - helper macro to iterate over valid zones in a zonelist at or below a given zone index and within a nodemask
|
|
* @zone: The current zone in the iterator
|
|
* @z: The current pointer within zonelist->_zonerefs being iterated
|
|
* @zlist: The zonelist being iterated
|
|
* @highidx: The zone index of the highest zone to return
|
|
* @nodemask: Nodemask allowed by the allocator
|
|
*
|
|
* This iterator iterates though all zones at or below a given zone index and
|
|
* within a given nodemask
|
|
*/
|
|
#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
|
|
for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \
|
|
zone; \
|
|
z = next_zones_zonelist(++z, highidx, nodemask), \
|
|
zone = zonelist_zone(z))
|
|
|
|
#define for_next_zone_zonelist_nodemask(zone, z, highidx, nodemask) \
|
|
for (zone = z->zone; \
|
|
zone; \
|
|
z = next_zones_zonelist(++z, highidx, nodemask), \
|
|
zone = zonelist_zone(z))
|
|
|
|
|
|
/**
|
|
* for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
|
|
* @zone: The current zone in the iterator
|
|
* @z: The current pointer within zonelist->zones being iterated
|
|
* @zlist: The zonelist being iterated
|
|
* @highidx: The zone index of the highest zone to return
|
|
*
|
|
* This iterator iterates though all zones at or below a given zone index.
|
|
*/
|
|
#define for_each_zone_zonelist(zone, z, zlist, highidx) \
|
|
for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL)
|
|
|
|
#ifdef CONFIG_SPARSEMEM
|
|
#include <asm/sparsemem.h>
|
|
#endif
|
|
|
|
#ifdef CONFIG_FLATMEM
|
|
#define pfn_to_nid(pfn) (0)
|
|
#endif
|
|
|
|
#ifdef CONFIG_SPARSEMEM
|
|
|
|
/*
|
|
* PA_SECTION_SHIFT physical address to/from section number
|
|
* PFN_SECTION_SHIFT pfn to/from section number
|
|
*/
|
|
#define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
|
|
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
|
|
|
|
#define NR_MEM_SECTIONS (1UL << SECTIONS_SHIFT)
|
|
|
|
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
|
|
#define PAGE_SECTION_MASK (~(PAGES_PER_SECTION-1))
|
|
|
|
#define SECTION_BLOCKFLAGS_BITS \
|
|
((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
|
|
|
|
#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
|
|
#error Allocator MAX_ORDER exceeds SECTION_SIZE
|
|
#endif
|
|
|
|
static inline unsigned long pfn_to_section_nr(unsigned long pfn)
|
|
{
|
|
return pfn >> PFN_SECTION_SHIFT;
|
|
}
|
|
static inline unsigned long section_nr_to_pfn(unsigned long sec)
|
|
{
|
|
return sec << PFN_SECTION_SHIFT;
|
|
}
|
|
|
|
#define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
|
|
#define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
|
|
|
|
#define SUBSECTION_SHIFT 21
|
|
#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
|
|
|
|
#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
|
|
#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
|
|
#define PAGE_SUBSECTION_MASK (~(PAGES_PER_SUBSECTION-1))
|
|
|
|
#if SUBSECTION_SHIFT > SECTION_SIZE_BITS
|
|
#error Subsection size exceeds section size
|
|
#else
|
|
#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
|
|
#endif
|
|
|
|
#define SUBSECTION_ALIGN_UP(pfn) ALIGN((pfn), PAGES_PER_SUBSECTION)
|
|
#define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
|
|
|
|
struct mem_section_usage {
|
|
#ifdef CONFIG_SPARSEMEM_VMEMMAP
|
|
DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
|
|
#endif
|
|
/* See declaration of similar field in struct zone */
|
|
unsigned long pageblock_flags[0];
|
|
};
|
|
|
|
void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
|
|
|
|
struct page;
|
|
struct page_ext;
|
|
struct mem_section {
|
|
/*
|
|
* This is, logically, a pointer to an array of struct
|
|
* pages. However, it is stored with some other magic.
|
|
* (see sparse.c::sparse_init_one_section())
|
|
*
|
|
* Additionally during early boot we encode node id of
|
|
* the location of the section here to guide allocation.
|
|
* (see sparse.c::memory_present())
|
|
*
|
|
* Making it a UL at least makes someone do a cast
|
|
* before using it wrong.
|
|
*/
|
|
unsigned long section_mem_map;
|
|
|
|
struct mem_section_usage *usage;
|
|
#ifdef CONFIG_PAGE_EXTENSION
|
|
/*
|
|
* If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
|
|
* section. (see page_ext.h about this.)
|
|
*/
|
|
struct page_ext *page_ext;
|
|
unsigned long pad;
|
|
#endif
|
|
/*
|
|
* WARNING: mem_section must be a power-of-2 in size for the
|
|
* calculation and use of SECTION_ROOT_MASK to make sense.
|
|
*/
|
|
};
|
|
|
|
#ifdef CONFIG_SPARSEMEM_EXTREME
|
|
#define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof (struct mem_section))
|
|
#else
|
|
#define SECTIONS_PER_ROOT 1
|
|
#endif
|
|
|
|
#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
|
|
#define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
|
|
#define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
|
|
|
|
#ifdef CONFIG_SPARSEMEM_EXTREME
|
|
extern struct mem_section **mem_section;
|
|
#else
|
|
extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
|
|
#endif
|
|
|
|
static inline unsigned long *section_to_usemap(struct mem_section *ms)
|
|
{
|
|
return ms->usage->pageblock_flags;
|
|
}
|
|
|
|
static inline struct mem_section *__nr_to_section(unsigned long nr)
|
|
{
|
|
#ifdef CONFIG_SPARSEMEM_EXTREME
|
|
if (!mem_section)
|
|
return NULL;
|
|
#endif
|
|
if (!mem_section[SECTION_NR_TO_ROOT(nr)])
|
|
return NULL;
|
|
return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
|
|
}
|
|
extern size_t mem_section_usage_size(void);
|
|
|
|
/*
|
|
* We use the lower bits of the mem_map pointer to store
|
|
* a little bit of information. The pointer is calculated
|
|
* as mem_map - section_nr_to_pfn(pnum). The result is
|
|
* aligned to the minimum alignment of the two values:
|
|
* 1. All mem_map arrays are page-aligned.
|
|
* 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
|
|
* lowest bits. PFN_SECTION_SHIFT is arch-specific
|
|
* (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
|
|
* worst combination is powerpc with 256k pages,
|
|
* which results in PFN_SECTION_SHIFT equal 6.
|
|
* To sum it up, at least 6 bits are available.
|
|
*/
|
|
#define SECTION_MARKED_PRESENT (1UL<<0)
|
|
#define SECTION_HAS_MEM_MAP (1UL<<1)
|
|
#define SECTION_IS_ONLINE (1UL<<2)
|
|
#define SECTION_IS_EARLY (1UL<<3)
|
|
#define SECTION_TAINT_ZONE_DEVICE (1UL<<4)
|
|
#define SECTION_MAP_LAST_BIT (1UL<<5)
|
|
#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1))
|
|
#define SECTION_NID_SHIFT 6
|
|
|
|
static inline struct page *__section_mem_map_addr(struct mem_section *section)
|
|
{
|
|
unsigned long map = section->section_mem_map;
|
|
map &= SECTION_MAP_MASK;
|
|
return (struct page *)map;
|
|
}
|
|
|
|
static inline int present_section(struct mem_section *section)
|
|
{
|
|
return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
|
|
}
|
|
|
|
static inline int present_section_nr(unsigned long nr)
|
|
{
|
|
return present_section(__nr_to_section(nr));
|
|
}
|
|
|
|
static inline int valid_section(struct mem_section *section)
|
|
{
|
|
return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
|
|
}
|
|
|
|
static inline int early_section(struct mem_section *section)
|
|
{
|
|
return (section && (section->section_mem_map & SECTION_IS_EARLY));
|
|
}
|
|
|
|
static inline int valid_section_nr(unsigned long nr)
|
|
{
|
|
return valid_section(__nr_to_section(nr));
|
|
}
|
|
|
|
static inline int online_section(struct mem_section *section)
|
|
{
|
|
return (section && (section->section_mem_map & SECTION_IS_ONLINE));
|
|
}
|
|
|
|
static inline int online_device_section(struct mem_section *section)
|
|
{
|
|
unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE;
|
|
|
|
return section && ((section->section_mem_map & flags) == flags);
|
|
}
|
|
|
|
static inline int online_section_nr(unsigned long nr)
|
|
{
|
|
return online_section(__nr_to_section(nr));
|
|
}
|
|
|
|
#ifdef CONFIG_MEMORY_HOTPLUG
|
|
void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
|
|
void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
|
|
#endif
|
|
|
|
static inline struct mem_section *__pfn_to_section(unsigned long pfn)
|
|
{
|
|
return __nr_to_section(pfn_to_section_nr(pfn));
|
|
}
|
|
|
|
extern unsigned long __highest_present_section_nr;
|
|
|
|
static inline int subsection_map_index(unsigned long pfn)
|
|
{
|
|
return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION;
|
|
}
|
|
|
|
#ifdef CONFIG_SPARSEMEM_VMEMMAP
|
|
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
|
|
{
|
|
int idx = subsection_map_index(pfn);
|
|
|
|
return test_bit(idx, ms->usage->subsection_map);
|
|
}
|
|
#else
|
|
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
|
|
{
|
|
return 1;
|
|
}
|
|
#endif
|
|
|
|
#ifndef CONFIG_HAVE_ARCH_PFN_VALID
|
|
/**
|
|
* pfn_valid - check if there is a valid memory map entry for a PFN
|
|
* @pfn: the page frame number to check
|
|
*
|
|
* Check if there is a valid memory map entry aka struct page for the @pfn.
|
|
* Note, that availability of the memory map entry does not imply that
|
|
* there is actual usable memory at that @pfn. The struct page may
|
|
* represent a hole or an unusable page frame.
|
|
*
|
|
* Return: 1 for PFNs that have memory map entries and 0 otherwise
|
|
*/
|
|
static inline int pfn_valid(unsigned long pfn)
|
|
{
|
|
struct mem_section *ms;
|
|
|
|
/*
|
|
* Ensure the upper PAGE_SHIFT bits are clear in the
|
|
* pfn. Else it might lead to false positives when
|
|
* some of the upper bits are set, but the lower bits
|
|
* match a valid pfn.
|
|
*/
|
|
if (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
|
|
return 0;
|
|
|
|
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
|
|
return 0;
|
|
ms = __nr_to_section(pfn_to_section_nr(pfn));
|
|
if (!valid_section(ms))
|
|
return 0;
|
|
/*
|
|
* Traditionally early sections always returned pfn_valid() for
|
|
* the entire section-sized span.
|
|
*/
|
|
return early_section(ms) || pfn_section_valid(ms, pfn);
|
|
}
|
|
#endif
|
|
|
|
static inline int pfn_in_present_section(unsigned long pfn)
|
|
{
|
|
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
|
|
return 0;
|
|
return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
|
|
}
|
|
|
|
static inline unsigned long next_present_section_nr(unsigned long section_nr)
|
|
{
|
|
while (++section_nr <= __highest_present_section_nr) {
|
|
if (present_section_nr(section_nr))
|
|
return section_nr;
|
|
}
|
|
|
|
return -1;
|
|
}
|
|
|
|
/*
|
|
* These are _only_ used during initialisation, therefore they
|
|
* can use __initdata ... They could have names to indicate
|
|
* this restriction.
|
|
*/
|
|
#ifdef CONFIG_NUMA
|
|
#define pfn_to_nid(pfn) \
|
|
({ \
|
|
unsigned long __pfn_to_nid_pfn = (pfn); \
|
|
page_to_nid(pfn_to_page(__pfn_to_nid_pfn)); \
|
|
})
|
|
#else
|
|
#define pfn_to_nid(pfn) (0)
|
|
#endif
|
|
|
|
void sparse_init(void);
|
|
#else
|
|
#define sparse_init() do {} while (0)
|
|
#define sparse_index_init(_sec, _nid) do {} while (0)
|
|
#define pfn_in_present_section pfn_valid
|
|
#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
|
|
#endif /* CONFIG_SPARSEMEM */
|
|
|
|
#endif /* !__GENERATING_BOUNDS.H */
|
|
#endif /* !__ASSEMBLY__ */
|
|
#endif /* _LINUX_MMZONE_H */
|