Changes in 5.15.44
HID: amd_sfh: Add support for sensor discovery
KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
ice: fix crash at allocation failure
ACPI: sysfs: Fix BERT error region memory mapping
MAINTAINERS: co-maintain random.c
MAINTAINERS: add git tree for random.c
lib/crypto: blake2s: include as built-in
lib/crypto: blake2s: move hmac construction into wireguard
lib/crypto: sha1: re-roll loops to reduce code size
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
random: document add_hwgenerator_randomness() with other input functions
random: remove unused irq_flags argument from add_interrupt_randomness()
random: use BLAKE2s instead of SHA1 in extraction
random: do not sign extend bytes for rotation when mixing
random: do not re-init if crng_reseed completes before primary init
random: mix bootloader randomness into pool
random: harmonize "crng init done" messages
random: use IS_ENABLED(CONFIG_NUMA) instead of ifdefs
random: early initialization of ChaCha constants
random: avoid superfluous call to RDRAND in CRNG extraction
random: don't reset crng_init_cnt on urandom_read()
random: fix typo in comments
random: cleanup poolinfo abstraction
random: cleanup integer types
random: remove incomplete last_data logic
random: remove unused extract_entropy() reserved argument
random: rather than entropy_store abstraction, use global
random: remove unused OUTPUT_POOL constants
random: de-duplicate INPUT_POOL constants
random: prepend remaining pool constants with POOL_
random: cleanup fractional entropy shift constants
random: access input_pool_data directly rather than through pointer
random: selectively clang-format where it makes sense
random: simplify arithmetic function flow in account()
random: continually use hwgenerator randomness
random: access primary_pool directly rather than through pointer
random: only call crng_finalize_init() for primary_crng
random: use computational hash for entropy extraction
random: simplify entropy debiting
random: use linear min-entropy accumulation crediting
random: always wake up entropy writers after extraction
random: make credit_entropy_bits() always safe
random: remove use_input_pool parameter from crng_reseed()
random: remove batched entropy locking
random: fix locking in crng_fast_load()
random: use RDSEED instead of RDRAND in entropy extraction
random: get rid of secondary crngs
random: inline leaves of rand_initialize()
random: ensure early RDSEED goes through mixer on init
random: do not xor RDRAND when writing into /dev/random
random: absorb fast pool into input pool after fast load
random: use simpler fast key erasure flow on per-cpu keys
random: use hash function for crng_slow_load()
random: make more consistent use of integer types
random: remove outdated INT_MAX >> 6 check in urandom_read()
random: zero buffer after reading entropy from userspace
random: fix locking for crng_init in crng_reseed()
random: tie batched entropy generation to base_crng generation
random: remove ifdef'd out interrupt bench
random: remove unused tracepoints
random: add proper SPDX header
random: deobfuscate irq u32/u64 contributions
random: introduce drain_entropy() helper to declutter crng_reseed()
random: remove useless header comment
random: remove whitespace and reorder includes
random: group initialization wait functions
random: group crng functions
random: group entropy extraction functions
random: group entropy collection functions
random: group userspace read/write functions
random: group sysctl functions
random: rewrite header introductory comment
random: defer fast pool mixing to worker
random: do not take pool spinlock at boot
random: unify early init crng load accounting
random: check for crng_init == 0 in add_device_randomness()
random: pull add_hwgenerator_randomness() declaration into random.h
random: clear fast pool, crng, and batches in cpuhp bring up
random: round-robin registers as ulong, not u32
random: only wake up writers after zap if threshold was passed
random: cleanup UUID handling
random: unify cycles_t and jiffies usage and types
random: do crng pre-init loading in worker rather than irq
random: give sysctl_random_min_urandom_seed a more sensible value
random: don't let 644 read-only sysctls be written to
random: replace custom notifier chain with standard one
random: use SipHash as interrupt entropy accumulator
random: make consistent usage of crng_ready()
random: reseed more often immediately after booting
random: check for signal and try earlier when generating entropy
random: skip fast_init if hwrng provides large chunk of entropy
random: treat bootloader trust toggle the same way as cpu trust toggle
random: re-add removed comment about get_random_{u32,u64} reseeding
random: mix build-time latent entropy into pool at init
random: do not split fast init input in add_hwgenerator_randomness()
random: do not allow user to keep crng key around on stack
random: check for signal_pending() outside of need_resched() check
random: check for signals every PAGE_SIZE chunk of /dev/[u]random
random: allow partial reads if later user copies fail
random: make random_get_entropy() return an unsigned long
random: document crng_fast_key_erasure() destination possibility
random: fix sysctl documentation nits
init: call time_init() before rand_initialize()
ia64: define get_cycles macro for arch-override
s390: define get_cycles macro for arch-override
parisc: define get_cycles macro for arch-override
alpha: define get_cycles macro for arch-override
powerpc: define get_cycles macro for arch-override
timekeeping: Add raw clock fallback for random_get_entropy()
m68k: use fallback for random_get_entropy() instead of zero
riscv: use fallback for random_get_entropy() instead of zero
mips: use fallback for random_get_entropy() instead of just c0 random
arm: use fallback for random_get_entropy() instead of zero
nios2: use fallback for random_get_entropy() instead of zero
x86/tsc: Use fallback for random_get_entropy() instead of zero
um: use fallback for random_get_entropy() instead of zero
sparc: use fallback for random_get_entropy() instead of zero
xtensa: use fallback for random_get_entropy() instead of zero
random: insist on random_get_entropy() existing in order to simplify
random: do not use batches when !crng_ready()
random: use first 128 bits of input as fast init
random: do not pretend to handle premature next security model
random: order timer entropy functions below interrupt functions
random: do not use input pool from hard IRQs
random: help compiler out with fast_mix() by using simpler arguments
siphash: use one source of truth for siphash permutations
random: use symbolic constants for crng_init states
random: avoid initializing twice in credit race
random: move initialization out of reseeding hot path
random: remove ratelimiting for in-kernel unseeded randomness
random: use proper jiffies comparison macro
random: handle latent entropy and command line from random_init()
random: credit architectural init the exact amount
random: use static branch for crng_ready()
random: remove extern from functions in header
random: use proper return types on get_random_{int,long}_wait()
random: make consistent use of buf and len
random: move initialization functions out of hot pages
random: move randomize_page() into mm where it belongs
random: unify batched entropy implementations
random: convert to using fops->read_iter()
random: convert to using fops->write_iter()
random: wire up fops->splice_{read,write}_iter()
random: check for signals after page of pool writes
ALSA: ctxfi: Add SB046x PCI ID
Linux 5.15.44
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2d874cba14f13379fc5f874e72634c9179b28742
commit 3191dd5a1179ef0fad5a050a1702ae98b6251e8f upstream.
For the irq randomness fast pool, rather than having to use expensive
atomics, which were visibly the most expensive thing in the entire irq
handler, simply take care of the extreme edge case of resetting count to
zero in the cpuhp online handler, just after workqueues have been
reenabled. This simplifies the code a bit and lets us use vanilla
variables rather than atomics, and performance should be improved.
As well, very early on when the CPU comes up, while interrupts are still
disabled, we clear out the per-cpu crng and its batches, so that it
always starts with fresh randomness.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.35
drm/amd/display: Add pstate verification and recovery for DCN31
drm/amd/display: Fix p-state allow debug index on dcn31
hamradio: defer 6pack kfree after unregister_netdev
hamradio: remove needs_free_netdev to avoid UAF
cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
ACPI: processor idle: Check for architectural support for LPI
ACPI: processor idle: Allow playing dead in C3 state
ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
btrfs: remove unused parameter nr_pages in add_ra_bio_pages()
btrfs: remove no longer used counter when reading data page
btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
soc: qcom: aoss: Expose send for generic usecase
dt-bindings: net: qcom,ipa: add optional qcom,qmp property
net: ipa: request IPA register values be retained
btrfs: release correct delalloc amount in direct IO write path
ALSA: core: Add snd_card_free_on_error() helper
ALSA: sis7019: Fix the missing error handling
ALSA: ali5451: Fix the missing snd_card_free() call at probe error
ALSA: als300: Fix the missing snd_card_free() call at probe error
ALSA: als4000: Fix the missing snd_card_free() call at probe error
ALSA: atiixp: Fix the missing snd_card_free() call at probe error
ALSA: au88x0: Fix the missing snd_card_free() call at probe error
ALSA: aw2: Fix the missing snd_card_free() call at probe error
ALSA: azt3328: Fix the missing snd_card_free() call at probe error
ALSA: bt87x: Fix the missing snd_card_free() call at probe error
ALSA: ca0106: Fix the missing snd_card_free() call at probe error
ALSA: cmipci: Fix the missing snd_card_free() call at probe error
ALSA: cs4281: Fix the missing snd_card_free() call at probe error
ALSA: cs5535audio: Fix the missing snd_card_free() call at probe error
ALSA: echoaudio: Fix the missing snd_card_free() call at probe error
ALSA: emu10k1x: Fix the missing snd_card_free() call at probe error
ALSA: ens137x: Fix the missing snd_card_free() call at probe error
ALSA: es1938: Fix the missing snd_card_free() call at probe error
ALSA: es1968: Fix the missing snd_card_free() call at probe error
ALSA: fm801: Fix the missing snd_card_free() call at probe error
ALSA: galaxy: Fix the missing snd_card_free() call at probe error
ALSA: hdsp: Fix the missing snd_card_free() call at probe error
ALSA: hdspm: Fix the missing snd_card_free() call at probe error
ALSA: ice1724: Fix the missing snd_card_free() call at probe error
ALSA: intel8x0: Fix the missing snd_card_free() call at probe error
ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
ALSA: korg1212: Fix the missing snd_card_free() call at probe error
ALSA: lola: Fix the missing snd_card_free() call at probe error
ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
ALSA: maestro3: Fix the missing snd_card_free() call at probe error
ALSA: oxygen: Fix the missing snd_card_free() call at probe error
ALSA: riptide: Fix the missing snd_card_free() call at probe error
ALSA: rme32: Fix the missing snd_card_free() call at probe error
ALSA: rme9652: Fix the missing snd_card_free() call at probe error
ALSA: rme96: Fix the missing snd_card_free() call at probe error
ALSA: sc6000: Fix the missing snd_card_free() call at probe error
ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
ALSA: via82xx: Fix the missing snd_card_free() call at probe error
ALSA: usb-audio: Cap upper limits of buffer/period bytes for implicit fb
ALSA: nm256: Don't call card private_free at probe error path
drm/msm: Add missing put_task_struct() in debugfs path
firmware: arm_scmi: Remove clear channel call on the TX channel
memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
Revert "ath11k: mesh: add support for 256 bitmap in blockack frames in 11ax"
firmware: arm_scmi: Fix sorting of retrieved clock rates
media: rockchip/rga: do proper error checking in probe
SUNRPC: Fix the svc_deferred_event trace class
net/sched: flower: fix parsing of ethertype following VLAN header
veth: Ensure eth header is in skb's linear part
gpiolib: acpi: use correct format characters
cifs: release cached dentries only if mount is complete
net: mdio: don't defer probe forever if PHY IRQ provider is missing
mlxsw: i2c: Fix initialization error flow
net/sched: fix initialization order when updating chain 0 head
net: dsa: felix: suppress -EPROBE_DEFER errors
net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
net/sched: taprio: Check if socket flags are valid
cfg80211: hold bss_lock while updating nontrans_list
netfilter: nft_socket: make cgroup match work in input too
drm/msm: Fix range size vs end confusion
drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
drm/msm/dp: add fail safe mode outside of event_mutex context
net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
scsi: pm80xx: Enable upper inbound, outbound queues
scsi: iscsi: Move iscsi_ep_disconnect()
scsi: iscsi: Fix offload conn cleanup when iscsid restarts
scsi: iscsi: Fix endpoint reuse regression
scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
scsi: iscsi: Fix unbound endpoint error handling
sctp: Initialize daddr on peeled off socket
netfilter: nf_tables: nft_parse_register can return a negative value
ALSA: ad1889: Fix the missing snd_card_free() call at probe error
ALSA: mtpav: Don't call card private_free at probe error path
io_uring: move io_uring_rsrc_update2 validation
io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
io_uring: verify pad field is 0 in io_get_ext_arg
testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
ALSA: usb-audio: Increase max buffer size
ALSA: usb-audio: Limit max buffer and period sizes per time
perf tools: Fix misleading add event PMU debug message
macvlan: Fix leaking skb in source mode with nodst option
net: ftgmac100: access hardware register after clock ready
nfc: nci: add flush_workqueue to prevent uaf
cifs: potential buffer overflow in handling symlinks
dm mpath: only use ktime_get_ns() in historical selector
vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
block: fix offset/size check in bio_trim()
drm/amd: Add USBC connector ID
btrfs: fix fallocate to use file_modified to update permissions consistently
btrfs: do not warn for free space inode in cow_file_range
drm/amdgpu: conduct a proper cleanup of PDB bo
drm/amdgpu/gmc: use PCI BARs for APUs in passthrough
drm/amd/display: fix audio format not updated after edid updated
drm/amd/display: FEC check in timing validation
drm/amd/display: Update VTEM Infopacket definition
drm/amdkfd: Fix Incorrect VMIDs passed to HWS
drm/amdgpu/vcn: improve vcn dpg stop procedure
drm/amdkfd: Check for potential null return of kmalloc_array()
Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
PCI: hv: Propagate coherence from VMbus device to PCI device
Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
scsi: target: tcmu: Fix possible page UAF
scsi: lpfc: Fix queue failures when recovering from PCI parity error
scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
net: micrel: fix KS8851_MLL Kconfig
ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
gpu: ipu-v3: Fix dev_dbg frequency output
regulator: wm8994: Add an off-on delay for WM8994 variant
arm64: alternatives: mark patch_alternative() as `noinstr`
tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry
net: axienet: setup mdio unconditionally
Drivers: hv: balloon: Disable balloon and hot-add accordingly
net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
spi: cadence-quadspi: fix protocol setup for non-1-1-X operations
drm/amd/display: Enable power gating before init_pipes
drm/amd/display: Revert FEC check in validation
drm/amd/display: Fix allocate_mst_payload assert on resume
drbd: set QUEUE_FLAG_STABLE_WRITES
scsi: mpt3sas: Fail reset operation if config request timed out
scsi: mvsas: Add PCI ID of RocketRaid 2640
scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan
drivers: net: slip: fix NPD bug in sl_tx_timeout()
io_uring: zero tag on rsrc removal
io_uring: use nospec annotation for more indexes
perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant
mm/secretmem: fix panic when growing a memfd_secret
mm, page_alloc: fix build_zonerefs_node()
mm: fix unexpected zeroed page mapping with zram swap
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded
SUNRPC: Fix NFSD's request deferral on RDMA transports
memory: renesas-rpc-if: fix platform-device leak in error path
gcc-plugins: latent_entropy: use /dev/urandom
cifs: verify that tcon is valid before dereference in cifs_kill_sb
ath9k: Properly clear TX status area before reporting to mac80211
ath9k: Fix usage of driver-private space in tx_info
btrfs: fix root ref counts in error handling in btrfs_get_root_ref
btrfs: mark resumed async balance as writing
ALSA: hda/realtek: Add quirk for Clevo PD50PNT
ALSA: hda/realtek: add quirk for Lenovo Thinkpad X12 speakers
ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size
ipv6: fix panic when forwarding a pkt with no in6 dev
drm/amd/display: don't ignore alpha property on pre-multiplied mode
drm/amdgpu: Enable gfxoff quirk on MacBook Pro
x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
x86/tsx: Disable TSX development mode at boot
genirq/affinity: Consider that CPUs on nodes can be unbalanced
tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
ARM: davinci: da850-evm: Avoid NULL pointer dereference
dm integrity: fix memory corruption when tag_size is less than digest size
i2c: dev: check return value when calling dev_set_name()
smp: Fix offline cpu check in flush_smp_call_function_queue()
i2c: pasemi: Wait for write xfers to finish
dt-bindings: net: snps: remove duplicate name
timers: Fix warning condition in __run_timers()
dma-direct: avoid redundant memory sync for swiotlb
drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL
cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
soc: qcom: aoss: Fix missing put_device call in qmp_get
net: ipa: fix a build dependency
cpufreq: intel_pstate: ITMT support for overclocked system
ax25: add refcount in ax25_dev to avoid UAF bugs
ax25: fix reference count leaks of ax25_dev
ax25: fix UAF bugs of net_device caused by rebinding operation
ax25: Fix refcount leaks caused by ax25_cb_del()
ax25: fix UAF bug in ax25_send_control()
ax25: fix NPD bug in ax25_disconnect
ax25: Fix NULL pointer dereferences in ax25 timers
ax25: Fix UAF bugs in ax25 timers
Linux 5.15.35
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0dd9eaea7f977df42b0a5b9cb9043c879f62718b
commit b7ba6d8dc3569e49800ef0136799f26f43e237e8 upstream.
Currently the setting of the 'cpu' member of struct cpuhp_cpu_state in
cpuhp_create() is too late as it is used earlier in _cpu_up().
If kzalloc_node() in __smpboot_create_thread() fails then the rollback will
be done with st->cpu==0 causing CPU0 to be erroneously set to be dying,
causing the scheduler to get mightily confused and throw its toys out of
the pram.
However the cpu number is actually available directly, so simply remove
the 'cpu' member and avoid the problem in the first place.
Fixes: 2ea46c6fc9 ("cpumask/hotplug: Fix cpu_dying() state tracking")
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220411152233.474129-2-steven.price@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.6
scsi: sd: Fix sd_do_mode_sense() buffer length handling
ACPI: Get acpi_device's parent from the parent field
ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
USB: serial: pl2303: fix GC type detection
USB: serial: option: add Telit LE910S1 0x9200 composition
USB: serial: option: add Fibocom FM101-GL variants
usb: dwc2: gadget: Fix ISOC flow for elapsed frames
usb: dwc2: hcd_queue: Fix use of floating point literal
usb: dwc3: leave default DMA for PCI devices
usb: dwc3: core: Revise GHWPARAMS9 offset
usb: dwc3: gadget: Ignore NoStream after End Transfer
usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
usb: dwc3: gadget: Fix null pointer exception
net: usb: Correct PHY handling of smsc95xx
net: nexthop: fix null pointer dereference when IPv6 is not enabled
usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe
usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
usb: xhci: tegra: Check padctrl interrupt presence in device tree
usb: hub: Fix usb enumeration issue due to address0 race
usb: hub: Fix locking issues with address0_mutex
binder: fix test regression due to sender_euid change
ALSA: ctxfi: Fix out-of-range access
ALSA: hda/realtek: Add quirk for ASRock NUC Box 1100
ALSA: hda/realtek: Fix LED on HP ProBook 435 G7
media: cec: copy sequence field for the reply
Revert "parisc: Fix backtrace to always include init funtion names"
HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
staging/fbtft: Fix backlight
staging: greybus: Add missing rwsem around snd_ctl_remove() calls
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
staging: r8188eu: Fix breakage introduced when 5G code was removed
staging: r8188eu: use GFP_ATOMIC under spinlock
staging: r8188eu: fix a memory leak in rtw_wx_read32()
fuse: release pipe buf after last use
xen: don't continue xenstore initialization in case of errors
xen: detect uninitialized xenbus in xenbus_init
io_uring: correct link-list traversal locking
io_uring: fail cancellation for EXITING tasks
io_uring: fix link traversal locking
drm/amdgpu: IH process reset count when restart
drm/amdgpu/pm: fix powerplay OD interface
drm/nouveau: recognise GA106
ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
ksmbd: contain default data stream even if xattr is empty
ksmbd: fix memleak in get_file_stream_info()
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
tracing/uprobe: Fix uprobe_perf_open probes iteration
tracing: Fix pid filtering when triggers are attached
mmc: sdhci-esdhc-imx: disable CMDQ support
mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
mdio: aspeed: Fix "Link is Down" issue
arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd
cpufreq: intel_pstate: Fix active mode offline/online EPP handling
powerpc/32: Fix hardlockup on vmap stack overflow
iomap: Fix inline extent handling in iomap_readpage
NFSv42: Fix pagecache invalidation after COPY/CLONE
PCI: aardvark: Deduplicate code in advk_pcie_rd_conf()
PCI: aardvark: Implement re-issuing config requests on CRS response
PCI: aardvark: Simplify initialization of rootcap on virtual bridge
PCI: aardvark: Fix link training
drm/amd/display: Fix OLED brightness control on eDP
proc/vmcore: fix clearing user buffer by properly using clear_user()
ASoC: SOF: Intel: hda: fix hotplug when only codec is suspended
netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY
netfilter: ctnetlink: do not erase error code with EINVAL
netfilter: ipvs: Fix reuse connection if RS weight is 0
netfilter: flowtable: fix IPv6 tunnel addr match
media: v4l2-core: fix VIDIOC_DQEVENT handling on non-x86
firmware: arm_scmi: Fix null de-reference on error path
ARM: dts: BCM5301X: Fix I2C controller interrupt
ARM: dts: BCM5301X: Add interrupt properties to GPIO node
ARM: dts: bcm2711: Fix PCIe interrupts
ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer
ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling
ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
ASoC: codecs: wcd938x: fix volatile register range
ASoC: codecs: wcd934x: return error code correctly from hw_params
ASoC: codecs: lpass-rx-macro: fix HPHR setting CLSH mask
net: ieee802154: handle iftypes as u32
firmware: arm_scmi: Fix base agent discover response
firmware: arm_scmi: pm: Propagate return value to caller
ASoC: stm32: i2s: fix 32 bits channel length without mclk
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks
scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo()
scsi: mpt3sas: Fix kernel panic during drive powercycle test
scsi: mpt3sas: Fix system going into read-only mode
scsi: mpt3sas: Fix incorrect system timestamp
drm/vc4: fix error code in vc4_create_object()
drm/aspeed: Fix vga_pw sysfs output
net: marvell: prestera: fix brige port operation
net: marvell: prestera: fix double free issue on err path
HID: input: Fix parsing of HID_CP_CONSUMER_CONTROL fields
HID: input: set usage type to key on keycode remap
HID: magicmouse: prevent division by 0 on scroll
iavf: Prevent changing static ITR values if adaptive moderation is on
iavf: Fix refreshing iavf adapter stats on ethtool request
iavf: Fix VLAN feature flags after VFR
x86/pvh: add prototype for xen_pvh_init()
xen/pvh: add missing prototype to header
ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336 codec
mptcp: fix delack timer
mptcp: use delegate action to schedule 3rd ack retrans
af_unix: fix regression in read after shutdown
firmware: smccc: Fix check for ARCH_SOC_ID not implemented
ipv6: fix typos in __ip6_finish_output()
nfp: checking parameter process for rx-usecs/tx-usecs is invalid
net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls
net: ipv6: add fib6_nh_release_dsts stub
net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
ice: fix vsi->txq_map sizing
ice: avoid bpf_prog refcount underflow
scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
scsi: scsi_debug: Zero clear zones at reset write pointer
erofs: fix deadlock when shrink erofs slab
i2c: virtio: disable timeout handling
net/smc: Ensure the active closing peer first closes clcsock
mlxsw: spectrum: Protect driver from buggy firmware
net: ipa: directly disable ipa-setup-ready interrupt
net: ipa: separate disabling setup from modem stop
net: ipa: kill ipa_cmd_pipeline_clear()
net: marvell: mvpp2: increase MTU limit when XDP enabled
cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
nvmet-tcp: fix incomplete data digest send
drm/hyperv: Fix device removal on Gen1 VMs
arm64: uaccess: avoid blocking within critical sections
net/ncsi : Add payload to be 32-bit aligned to fix dropped packets
PM: hibernate: use correct mode for swsusp_close()
drm/amd/display: Fix DPIA outbox timeout after GPU reset
drm/amd/display: Set plane update flags for all planes in reset
tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
lan743x: fix deadlock in lan743x_phy_link_status_change()
net: phylink: Force link down and retrigger resolve on interface change
net: phylink: Force retrigger in case of latched link-fail indicator
net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
net/smc: Fix loop in smc_listen
nvmet: use IOCB_NOWAIT only if the filesystem supports it
igb: fix netpoll exit with traffic
MIPS: loongson64: fix FTLB configuration
MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
tls: splice_read: fix record type check
tls: splice_read: fix accessing pre-processed records
tls: fix replacing proto_ops
net: stmmac: Disable Tx queues when reconfiguring the interface
net/sched: sch_ets: don't peek at classes beyond 'nbands'
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
net: vlan: fix underflow for the real_dev refcnt
net/smc: Don't call clcsock shutdown twice when smc shutdown
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
net: hns3: fix incorrect components info of ethtool --reset command
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
locking/rwsem: Make handoff bit handling more consistent
perf: Ignore sigtrap for tracepoints destined for other tasks
sched/scs: Reset task stack state in bringup_cpu()
iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568
iommu/vt-d: Fix unmap_pages support
f2fs: quota: fix potential deadlock
f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
riscv: dts: microchip: fix board compatible
riscv: dts: microchip: drop duplicated MMC/SDHC node
cifs: nosharesock should not share socket with future sessions
ceph: properly handle statfs on multifs setups
iommu/amd: Clarify AMD IOMMUv2 initialization messages
vdpa_sim: avoid putting an uninitialized iova_domain
vhost/vsock: fix incorrect used length reported to the guest
ksmbd: Fix an error handling path in 'smb2_sess_setup()'
tracing: Check pid filtering when creating events
cifs: nosharesock should be set on new server
io_uring: fix soft lockup when call __io_remove_buffers
firmware: arm_scmi: Fix type error assignment in voltage protocol
firmware: arm_scmi: Fix type error in sensor protocol
docs: accounting: update delay-accounting.rst reference
blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release()
block: avoid to quiesce queue in elevator_init_mq
drm/amdgpu/gfx10: add wraparound gpu counter check for APUs as well
drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
Linux 5.15.6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibe65221ba285038e25de36ad3659e0ce201408c2
[ Upstream commit dce1ca0525bfdc8a69a9343bc714fbc19a2f04b3 ]
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use. When shadow call stacks (SCS) are in use
the task's saved SCS SP is left pointing at an arbitrary point within
the task's shadow call stack.
When a CPU is offlined than onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c9298 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to so so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUS with:
| while true; do
| for C in /sys/devices/system/cpu/cpu*/online; do
| echo 0 > $C;
| echo 1 > $C;
| done
| done
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
CPU hotplug callbacks can fail and cause a rollback to the previous
state. These failures are silent and therefore hard to debug.
Add pr_debug() to the up and down paths which provide information about the
error code, the CPU and the failed state. The debug printks can be enabled
via kernel command line or sysfs.
[ tglx: Adopt to current mainline, massage printk and changelog ]
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210409055316.1709-1-dongli.zhang@oracle.com
kernel/cpu.c:57: warning: cannot understand function prototype: 'struct cpuhp_cpu_state '
kernel/cpu.c:115: warning: cannot understand function prototype: 'struct cpuhp_step '
kernel/cpu.c:146: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* cpuhp_invoke_callback _ Invoke the callbacks for a given state
kernel/cpu.c:75: warning: Function parameter or member 'fail' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'cpu' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'node' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'last' not described in 'cpuhp_cpu_state'
kernel/cpu.c:130: warning: Function parameter or member 'list' not described in 'cpuhp_step'
kernel/cpu.c:130: warning: Function parameter or member 'multi_instance' not described in 'cpuhp_step'
kernel/cpu.c:158: warning: No description found for return value of 'cpuhp_invoke_callback'
kernel/cpu.c:1188: warning: No description found for return value of 'cpu_device_down'
kernel/cpu.c:1400: warning: No description found for return value of 'cpu_device_up'
kernel/cpu.c:1425: warning: No description found for return value of 'bringup_hibernate_cpu'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210809223825.24512-1-rdunlap@infradead.org
Pull CPU hotplug fix from Thomas Gleixner:
"A fix for the CPU hotplug and cpusets interaction:
cpusets delegate the hotplug work to a workqueue to prevent a lock
order inversion vs. the CPU hotplug lock. The work is not flushed
before the hotplug operation returns which creates user visible
inconsistent state. Prevent this by flushing the work after dropping
CPU hotplug lock and before releasing the outer mutex which serializes
the CPU hotplug related sysfs interface operations"
* tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Cure the cpusets trainwreck
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov@redhat.com>
Reported-by: Joshua Baker <jobaker@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexey Klimov <aklimov@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
This reverts commit e6120dd58d.
The original issue has been solved. This commit is now superflouous.
Bug: 120444461
Cc: Thierry Strudel <tstrudel@google.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I549ef73c423cd3c4bc6d30e8e1ca723549c3b95e
Vincent reported that for states with a NULL startup/teardown function
we do not call cpuhp_invoke_callback() (because there is none) and as
such we'll not update the cpu_dying() state.
The stale cpu_dying() can eventually lead to triggering BUG().
Rectify this by updating cpu_dying() in the exact same places the
hotplug machinery tracks its directional state, namely
cpuhp_set_state() and cpuhp_reset_state().
Reported-by: Vincent Donnefort <vincent.donnefort@arm.com>
Suggested-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net
Factorizing and unifying cpuhp callback range invocations, especially for
the hotunplug path, where two different ways of decrementing were used. The
first one, decrements before the callback is called:
cpuhp_thread_fun()
state = st->state;
st->state--;
cpuhp_invoke_callback(state);
The second one, after:
take_down_cpu()|cpuhp_down_callbacks()
cpuhp_invoke_callback(st->state);
st->state--;
This is problematic for rolling back the steps in case of error, as
depending on the decrement, the rollback will start from N or N-1. It also
makes tracing inconsistent, between steps run in the cpuhp thread and
the others.
Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty
steps.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-4-vincent.donnefort@arm.com
The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter fails, no atomic
state can be rolled back.
DEAD callbacks too can't fail and disallow recovery. As a consequence,
during hotunplug, the fail injection interface should prohibit all states
from CPUHP_BRINGUP_CPU to CPUHP_ONLINE.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-3-vincent.donnefort@arm.com
Steps on the way to 5.12-rc1
Resolves conflicts in:
kernel/sched/cpufreq_schedutil.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I06d90f919467f3e7e8970aaedbb872a10eb699ff
Now that CPU pause is gone, this is a lot more manageable. The
remaining conflicts were caused mostly by vendor hooks and Android-specific
tweaks to the EAS topology code, but easily fixable by hand.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I3665a2d78cb0b8eca6ba5110e90dc7f72030805e
This reverts commit 1734af6299.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: If72719521011ece234d0b176e91709d4f4c6f17b
This reverts commit 683010f555.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I7f3f6003e962e61111831571602c380136a11d59
This reverts commit e19b8ce907.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ibc5e502844d5e2e72cc48eede1032c89b16f5ccf
This reverts commit 1d3a64fbd2.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I201898fe111d5da25c82b0f6a7b32cd0b1d6f237
This reverts commit 782131fed0.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iefd741900143a60b945b7523bf0495a07cff8234
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
Incorporate a vendor hook in the resume cpus path
so that vendor specific activities may take place.
Bug: 161210528
Change-Id: I74d03247491b004e891dbcfe06a478d00a95ba9f
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
In the resume_cpus() path, cpus cannot be taken
advantage of until the cpus write lock is acquired,
and cpus are activated and domains rebuilt. This
can incurr significant delay in the unpause operation.
Additionally, if scheduled through the kworker thread,
the wait time for rebuilding sched domains becomes
large due to a busy system that can prevent the kworker
from executing.
Activate the cpus and call the cpuset_hotplug_workfn
directly within resume_cpus prior to getting the cpus
write lock, thereby eliminating delays associated
with scheduling this activity.
Bug: 161210528
Change-Id: Ie2521f28ed9078b22d421d792f08413016d4dd62
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Signed-off-by: Todd Kjos <tkjos@google.com>
paused_cpus intending to force CPUs to go idle as quickly as possible,
adding a migration step, to drain the rq from any running task.
Two steps are actually needed. The first one, "lazy", will run before the
cpu_active_mask has been synced. The second one will run after. It is
possible for another CPU, to observe an outdated version of that mask and
to enqueue a task on a rq that has just been marked inactive. The second
migration is there to catch any of those spurious move, while the first
one will drain the rq as quickly as possible to let the CPU reach an idle
state.
Bug: 161210528
Change-Id: Ie26c2e4c42665dd61d41a899a84536e56bf2b887
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
pause_cpus intends to have a way to force a CPU to go idle and to resume
as quickly as possible, with as little disruption as possible on the
system. This is a way of saving energy or meet thermal constraints, for
which a full CPU hotunplug is too slow. A paused CPU is simply deactivated
from the scheduler point of view. This corresponds to the first hotunplug
step.
Each pause operation still needs some heavy synchronization. Allowing to
pause several CPUs in one go mitigate that issue.
Paused CPUs can be resumed with resume_cpus(), which also takes a cpumask
as an input.
Few limitations:
* It isn't possible to pause a CPU which is running SCHED_DEADLINE task.
* A paused CPU will be removed from any cpuset it is part of. Resuming
the CPU won't put back this CPU in the cpuset if using cgroup1.
Cgroup2 doesn't have this limitation.
* per-CPU kthreads are still allowed to run on a paused CPU.
Bug: 161210528
Change-Id: I1f5cb28190f8ec979bb8640a89b022f2f7266bcf
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
In the event of a partial _cpu_down, (i.e. _cpu_down(target) where
target > CPUHP_AP_OFFLINE), the cpu_online_mask won't be aligned with
cpu_active_mask. This is an issue when trying to offline the last CPU
from cpu_active_mask, while num_online_cpus() > 1.
Protect against this case by checking num_active_cpus() instead of
num_online_cpus().
Bug: 161210528
Change-Id: Ibe7d9ef69e5f91e99be0d98076614a7654bda094
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
With the new mechanism which kicks tasks off the outgoing CPU at the end of
schedule() the situation on an outgoing CPU right before the stopper thread
brings it down completely is:
- All user tasks and all unbound kernel threads have either been migrated
away or are not running and the next wakeup will move them to a online CPU.
- All per CPU kernel threads, except cpu hotplug thread and the stopper
thread have either been unbound or parked by the responsible CPU hotplug
callback.
That means that at the last step before the stopper thread is invoked the
cpu hotplug thread is the last legitimate running task on the outgoing
CPU.
Add a final wait step right before the stopper thread is kicked which
ensures that any still running tasks on the way to park or on the way to
kick themself of the CPU are either sleeping or gone.
This allows to remove the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task aside of
the stopper thread then it will explode with the appropriate fireworks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
Commit c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU") tried to speed-up hotplug of
SCHED_NORMAL tasks by temporarily elevating them to SCHED_FIFO. But
while at it, it also prevented hotplug from SCHED_IDLE, SCHED_BATCH or
SCHED_DEADLINE for no apparent reason.
Since this is a userspace-visible change, and is unlikely to actually be
needed, change the patch logic to only optimize for SCHED_NORMAL tasks
and leave the others untouched.
Bug: 169238689
Fixes: c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU")
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4d9e88b15fee56e7d234826e2eaea306a69328bb
CPU hotplug operations take place in preemptible context. This leaves
the hotplugging thread at the mercy of overall system load and CPU
availability. If the hotplugging thread does not get an opportunity
to execute after it has already begun a hotplug operation, CPUs can
end up being stuck in a quasi online state. In the worst case a CPU
can be stuck in a state where the migration thread is parked while
another task is executing and changing affinity in a loop. This
combination can result in unbounded execution time for the running
task until the hotplugging thread gets the chance to run to complete
the hotplug operation.
Fix the said problem by ensuring that hotplug can only occur from
threads belonging to the RT sched class. This allows the hotplugging
thread priority on the CPU no matter what the system load or the
number of available CPUs are. If a SCHED_NORMAL task attempts to
hotplug a CPU, we temporarily elevate it's scheduling policy to RT.
Furthermore, we disallow hotplugging operations to begin if the
calling task belongs to the idle and deadline classes or those that
use the SCHED_BATCH policy.
Bug: 169238689
Change-Id: Idbb1384626e6ddff46c0d2ce752eee68396c78af
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[psodagud@codeaurora.org: Fixed compilation issues]
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Pull scheduler updates from Ingo Molnar:
"The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve
scalability and reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it
with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements"
* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
sched: Replace rq::wake_list
sched: Add rq::ttwu_pending
irq_work, smp: Allow irq_work on call_single_queue
smp: Optimize send_call_function_single_ipi()
smp: Move irq_work_run() out of flush_smp_call_function_queue()
smp: Optimize flush_smp_call_function_queue()
sched: Fix smp_call_function_single_async() usage for ILB
sched/core: Offload wakee task activation if it the wakee is descheduling
sched/core: Optimize ttwu() spinning on p->on_cpu
sched: Defend cfs and rt bandwidth quota against overflow
sched/cpuacct: Fix charge cpuacct.usage_sys
sched/fair: Replace zero-length array with flexible-array
sched/pelt: Sync util/runnable_sum with PELT window when propagating
sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
sched/fair: Optimize enqueue_task_fair()
sched: Make scheduler_ipi inline
sched: Clean up scheduler_ipi()
sched/core: Simplify sched_init()
...
The single user could have called freeze_secondary_cpus() directly.
Since this function was a source of confusion, remove it as it's
just a pointless wrapper.
While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.
Done automatically via:
git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
In the CPU-offline process, it calls mmdrop() after idle entry and the
subsequent call to cpuhp_report_idle_dead(). Once execution passes the
call to rcu_report_dead(), RCU is ignoring the CPU, which results in
lockdep complaining when mmdrop() uses RCU from either memcg or
debugobjects below.
Fix it by cleaning up the active_mm state from BP instead. Every arch
which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit()
from AP. The only exception is parisc because it switches them to
&init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()),
but the patch will still work there because it calls mmgrab(&init_mm) in
smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu().
WARNING: suspicious RCU usage
-----------------------------
kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!
other info that might help us debug this:
RCU used illegally from offline CPU!
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
get_work_pool+0x110/0x150
__queue_work+0x1bc/0xca0
queue_work_on+0x114/0x120
css_release+0x9c/0xc0
percpu_ref_put_many+0x204/0x230
free_pcp_prepare+0x264/0x570
free_unref_page+0x38/0xf0
__mmdrop+0x21c/0x2c0
idle_task_exit+0x170/0x1b0
pnv_smp_cpu_kill_self+0x38/0x2e0
cpu_die+0x48/0x64
arch_cpu_idle_dead+0x30/0x50
do_idle+0x2f4/0x470
cpu_startup_entry+0x38/0x40
start_secondary+0x7a8/0xa80
start_secondary_resume+0x10/0x14
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
In a quest to make the huge -rc1 merge easier to handle and bisect,
merge the first chunk of 5.7-rc1 patches into android-mainline.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib54436e9515660a4c0c25c49c21bfb399eb57921
Pull core SMP updates from Thomas Gleixner:
"CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low
level functions cpu_up/down() are now confined to the core code and
not longer accessible from random code"
* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
cpu/hotplug: Hide cpu_up/down()
cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
torture: Replace cpu_up/down() with add/remove_cpu()
firmware: psci: Replace cpu_up/down() with add/remove_cpu()
xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
parisc: Replace cpu_up/down() with add/remove_cpu()
sparc: Replace cpu_up/down() with add/remove_cpu()
powerpc: Replace cpu_up/down() with add/remove_cpu()
x86/smp: Replace cpu_up/down() with add/remove_cpu()
arm64: hibernate: Use bringup_hibernate_cpu()
cpu/hotplug: Provide bringup_hibernate_cpu()
arm64: Use reboot_cpu instead of hardconding it to 0
arm64: Don't use disable_nonboot_cpus()
ARM: Use reboot_cpu instead of hardcoding it to 0
ARM: Don't use disable_nonboot_cpus()
ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
cpu/hotplug: Create a new function to shutdown nonboot cpus
cpu/hotplug: Add new {add,remove}_cpu() functions
sched/core: Remove rq.hrtick_csd_pending
...