Changes in 5.15.6
scsi: sd: Fix sd_do_mode_sense() buffer length handling
ACPI: Get acpi_device's parent from the parent field
ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
USB: serial: pl2303: fix GC type detection
USB: serial: option: add Telit LE910S1 0x9200 composition
USB: serial: option: add Fibocom FM101-GL variants
usb: dwc2: gadget: Fix ISOC flow for elapsed frames
usb: dwc2: hcd_queue: Fix use of floating point literal
usb: dwc3: leave default DMA for PCI devices
usb: dwc3: core: Revise GHWPARAMS9 offset
usb: dwc3: gadget: Ignore NoStream after End Transfer
usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
usb: dwc3: gadget: Fix null pointer exception
net: usb: Correct PHY handling of smsc95xx
net: nexthop: fix null pointer dereference when IPv6 is not enabled
usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe
usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
usb: xhci: tegra: Check padctrl interrupt presence in device tree
usb: hub: Fix usb enumeration issue due to address0 race
usb: hub: Fix locking issues with address0_mutex
binder: fix test regression due to sender_euid change
ALSA: ctxfi: Fix out-of-range access
ALSA: hda/realtek: Add quirk for ASRock NUC Box 1100
ALSA: hda/realtek: Fix LED on HP ProBook 435 G7
media: cec: copy sequence field for the reply
Revert "parisc: Fix backtrace to always include init funtion names"
HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
staging/fbtft: Fix backlight
staging: greybus: Add missing rwsem around snd_ctl_remove() calls
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
staging: r8188eu: Fix breakage introduced when 5G code was removed
staging: r8188eu: use GFP_ATOMIC under spinlock
staging: r8188eu: fix a memory leak in rtw_wx_read32()
fuse: release pipe buf after last use
xen: don't continue xenstore initialization in case of errors
xen: detect uninitialized xenbus in xenbus_init
io_uring: correct link-list traversal locking
io_uring: fail cancellation for EXITING tasks
io_uring: fix link traversal locking
drm/amdgpu: IH process reset count when restart
drm/amdgpu/pm: fix powerplay OD interface
drm/nouveau: recognise GA106
ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
ksmbd: contain default data stream even if xattr is empty
ksmbd: fix memleak in get_file_stream_info()
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
tracing/uprobe: Fix uprobe_perf_open probes iteration
tracing: Fix pid filtering when triggers are attached
mmc: sdhci-esdhc-imx: disable CMDQ support
mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
mdio: aspeed: Fix "Link is Down" issue
arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd
cpufreq: intel_pstate: Fix active mode offline/online EPP handling
powerpc/32: Fix hardlockup on vmap stack overflow
iomap: Fix inline extent handling in iomap_readpage
NFSv42: Fix pagecache invalidation after COPY/CLONE
PCI: aardvark: Deduplicate code in advk_pcie_rd_conf()
PCI: aardvark: Implement re-issuing config requests on CRS response
PCI: aardvark: Simplify initialization of rootcap on virtual bridge
PCI: aardvark: Fix link training
drm/amd/display: Fix OLED brightness control on eDP
proc/vmcore: fix clearing user buffer by properly using clear_user()
ASoC: SOF: Intel: hda: fix hotplug when only codec is suspended
netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY
netfilter: ctnetlink: do not erase error code with EINVAL
netfilter: ipvs: Fix reuse connection if RS weight is 0
netfilter: flowtable: fix IPv6 tunnel addr match
media: v4l2-core: fix VIDIOC_DQEVENT handling on non-x86
firmware: arm_scmi: Fix null de-reference on error path
ARM: dts: BCM5301X: Fix I2C controller interrupt
ARM: dts: BCM5301X: Add interrupt properties to GPIO node
ARM: dts: bcm2711: Fix PCIe interrupts
ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer
ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling
ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
ASoC: codecs: wcd938x: fix volatile register range
ASoC: codecs: wcd934x: return error code correctly from hw_params
ASoC: codecs: lpass-rx-macro: fix HPHR setting CLSH mask
net: ieee802154: handle iftypes as u32
firmware: arm_scmi: Fix base agent discover response
firmware: arm_scmi: pm: Propagate return value to caller
ASoC: stm32: i2s: fix 32 bits channel length without mclk
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
ARM: socfpga: Fix crash with CONFIG_FORTIFY_SOURCE
drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks
scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo()
scsi: mpt3sas: Fix kernel panic during drive powercycle test
scsi: mpt3sas: Fix system going into read-only mode
scsi: mpt3sas: Fix incorrect system timestamp
drm/vc4: fix error code in vc4_create_object()
drm/aspeed: Fix vga_pw sysfs output
net: marvell: prestera: fix bridge port operation
net: marvell: prestera: fix double free issue on err path
HID: input: Fix parsing of HID_CP_CONSUMER_CONTROL fields
HID: input: set usage type to key on keycode remap
HID: magicmouse: prevent division by 0 on scroll
iavf: Prevent changing static ITR values if adaptive moderation is on
iavf: Fix refreshing iavf adapter stats on ethtool request
iavf: Fix VLAN feature flags after VFR
x86/pvh: add prototype for xen_pvh_init()
xen/pvh: add missing prototype to header
ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336 codec
mptcp: fix delack timer
mptcp: use delegate action to schedule 3rd ack retrans
af_unix: fix regression in read after shutdown
firmware: smccc: Fix check for ARCH_SOC_ID not implemented
ipv6: fix typos in __ip6_finish_output()
nfp: checking parameter process for rx-usecs/tx-usecs is invalid
net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls
net: ipv6: add fib6_nh_release_dsts stub
net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
ice: fix vsi->txq_map sizing
ice: avoid bpf_prog refcount underflow
scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
scsi: scsi_debug: Zero clear zones at reset write pointer
erofs: fix deadlock when shrink erofs slab
i2c: virtio: disable timeout handling
net/smc: Ensure the active closing peer first closes clcsock
mlxsw: spectrum: Protect driver from buggy firmware
net: ipa: directly disable ipa-setup-ready interrupt
net: ipa: separate disabling setup from modem stop
net: ipa: kill ipa_cmd_pipeline_clear()
net: marvell: mvpp2: increase MTU limit when XDP enabled
cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
nvmet-tcp: fix incomplete data digest send
drm/hyperv: Fix device removal on Gen1 VMs
arm64: uaccess: avoid blocking within critical sections
net/ncsi : Add payload to be 32-bit aligned to fix dropped packets
PM: hibernate: use correct mode for swsusp_close()
drm/amd/display: Fix DPIA outbox timeout after GPU reset
drm/amd/display: Set plane update flags for all planes in reset
tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
lan743x: fix deadlock in lan743x_phy_link_status_change()
net: phylink: Force link down and retrigger resolve on interface change
net: phylink: Force retrigger in case of latched link-fail indicator
net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
net/smc: Fix loop in smc_listen
nvmet: use IOCB_NOWAIT only if the filesystem supports it
igb: fix netpoll exit with traffic
MIPS: loongson64: fix FTLB configuration
MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
tls: splice_read: fix record type check
tls: splice_read: fix accessing pre-processed records
tls: fix replacing proto_ops
net: stmmac: Disable Tx queues when reconfiguring the interface
net/sched: sch_ets: don't peek at classes beyond 'nbands'
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
net: vlan: fix underflow for the real_dev refcnt
net/smc: Don't call clcsock shutdown twice when smc shutdown
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
net: hns3: fix incorrect components info of ethtool --reset command
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
locking/rwsem: Make handoff bit handling more consistent
perf: Ignore sigtrap for tracepoints destined for other tasks
sched/scs: Reset task stack state in bringup_cpu()
iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568
iommu/vt-d: Fix unmap_pages support
f2fs: quota: fix potential deadlock
f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
riscv: dts: microchip: fix board compatible
riscv: dts: microchip: drop duplicated MMC/SDHC node
cifs: nosharesock should not share socket with future sessions
ceph: properly handle statfs on multifs setups
iommu/amd: Clarify AMD IOMMUv2 initialization messages
vdpa_sim: avoid putting an uninitialized iova_domain
vhost/vsock: fix incorrect used length reported to the guest
ksmbd: Fix an error handling path in 'smb2_sess_setup()'
tracing: Check pid filtering when creating events
cifs: nosharesock should be set on new server
io_uring: fix soft lockup when call __io_remove_buffers
firmware: arm_scmi: Fix type error assignment in voltage protocol
firmware: arm_scmi: Fix type error in sensor protocol
docs: accounting: update delay-accounting.rst reference
blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release()
block: avoid to quiesce queue in elevator_init_mq
drm/amdgpu/gfx10: add wraparound gpu counter check for APUs as well
drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
Linux 5.15.6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibe65221ba285038e25de36ad3659e0ce201408c2
[ Upstream commit dce1ca0525bfdc8a69a9343bc714fbc19a2f04b3 ]
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks (SCS) are in use the task's saved SCS SP is left
pointing at an arbitrary point within the task's shadow call stack.
When a CPU is offlined then onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c9298 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to do so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for the idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
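For reference, a minimal sketch of the fix described above, with the reset
done at the start of bringup_cpu() (helper names per this changelog; the
rest of the function is elided):

  static int bringup_cpu(unsigned int cpu)
  {
      struct task_struct *idle = idle_thread_get(cpu);
      int ret;

      /*
       * Reset stale stack state from the last time this CPU was online.
       */
      scs_task_reset(idle);
      kasan_unpoison_task_stack(idle);

      /* ... existing irq setup and arch bringup follow unchanged ... */
      ret = __cpu_up(cpu, idle);
      return ret;
  }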
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUs with:
| while true; do
|   for C in /sys/devices/system/cpu/cpu*/online; do
|     echo 0 > $C;
|     echo 1 > $C;
|   done
| done
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
CPU hotplug callbacks can fail and cause a rollback to the previous
state. These failures are silent and therefore hard to debug.
Add pr_debug() to the up and down paths which provide information about the
error code, the CPU and the failed state. The debug printks can be enabled
via kernel command line or sysfs.
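A sketch of the added diagnostics (format string and placement assumed from
this description; cpuhp_get_step() is the existing state-lookup helper):

  ret = cpuhp_up_callbacks(cpu, st, target);
  if (ret) {
      pr_debug("CPU UP failed (%d) CPU %u state %s (%d)\n",
               ret, cpu, cpuhp_get_step(st->state)->name, st->state);
  }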
[ tglx: Adopt to current mainline, massage printk and changelog ]
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210409055316.1709-1-dongli.zhang@oracle.com
kernel/cpu.c:57: warning: cannot understand function prototype: 'struct cpuhp_cpu_state '
kernel/cpu.c:115: warning: cannot understand function prototype: 'struct cpuhp_step '
kernel/cpu.c:146: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* cpuhp_invoke_callback _ Invoke the callbacks for a given state
kernel/cpu.c:75: warning: Function parameter or member 'fail' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'cpu' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'node' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'last' not described in 'cpuhp_cpu_state'
kernel/cpu.c:130: warning: Function parameter or member 'list' not described in 'cpuhp_step'
kernel/cpu.c:130: warning: Function parameter or member 'multi_instance' not described in 'cpuhp_step'
kernel/cpu.c:158: warning: No description found for return value of 'cpuhp_invoke_callback'
kernel/cpu.c:1188: warning: No description found for return value of 'cpu_device_down'
kernel/cpu.c:1400: warning: No description found for return value of 'cpu_device_up'
kernel/cpu.c:1425: warning: No description found for return value of 'bringup_hibernate_cpu'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210809223825.24512-1-rdunlap@infradead.org
Pull CPU hotplug fix from Thomas Gleixner:
"A fix for the CPU hotplug and cpusets interaction:
cpusets delegate the hotplug work to a workqueue to prevent a lock
order inversion vs. the CPU hotplug lock. The work is not flushed
before the hotplug operation returns which creates user visible
inconsistent state. Prevent this by flushing the work after dropping
CPU hotplug lock and before releasing the outer mutex which serializes
the CPU hotplug related sysfs interface operations"
* tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Cure the cpusets trainwreck
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
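A sketch of the resulting helper, invoked from _cpu_up()/_cpu_down() after
cpu_hotplug_lock has been dropped (naming assumed from this changelog):

  static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
  {
      /*
       * cpusets delegate hotplug operations to a worker to "solve" the
       * lock order problem. Wait for the worker, but only if tasks are
       * not frozen (suspend, hibernate) as that would wait forever.
       */
      if (!tasks_frozen)
          cpuset_wait_for_hotplug();
  }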
Add a large comment which explains why this is done and why this is not a
dumping ground for the hack of the day to work around half-thought-out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexey and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov@redhat.com>
Reported-by: Joshua Baker <jobaker@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexey Klimov <aklimov@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
This reverts commit e6120dd58d.
The original issue has been solved. This commit is now superfluous.
Bug: 120444461
Cc: Thierry Strudel <tstrudel@google.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I549ef73c423cd3c4bc6d30e8e1ca723549c3b95e
Vincent reported that for states with a NULL startup/teardown function
we do not call cpuhp_invoke_callback() (because there is none) and as
such we'll not update the cpu_dying() state.
The stale cpu_dying() can eventually lead to triggering BUG().
Rectify this by updating cpu_dying() in the exact same places the
hotplug machinery tracks its directional state, namely
cpuhp_set_state() and cpuhp_reset_state().
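A sketch of one of the two helpers with the update folded in (shape assumed
from this changelog; cpuhp_reset_state() gets the mirror-image change):

  static inline enum cpuhp_state
  cpuhp_set_state(int cpu, struct cpuhp_cpu_state *st, enum cpuhp_state target)
  {
      enum cpuhp_state prev_state = st->state;
      bool bringup = st->state < target;

      st->target = target;
      st->single = false;
      st->bringup = bringup;
      /* Keep the dying mask in sync with the direction of travel. */
      if (cpu_dying(cpu) != !bringup)
          set_cpu_dying(cpu, !bringup);

      return prev_state;
  }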
Reported-by: Vincent Donnefort <vincent.donnefort@arm.com>
Suggested-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net
Factorizing and unifying cpuhp callback range invocations, especially for
the hotunplug path, where two different ways of decrementing were used. The
first one decrements before the callback is called:
cpuhp_thread_fun()
    state = st->state;
    st->state--;
    cpuhp_invoke_callback(state);
The second one, after:
take_down_cpu()|cpuhp_down_callbacks()
    cpuhp_invoke_callback(st->state);
    st->state--;
This is problematic for rolling back the steps in case of error, as
depending on the decrement, the rollback will start from N or N-1. It also
makes tracing inconsistent between steps run in the cpuhp thread and
the others.
Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty
steps.
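A sketch of the unified invocation (helper names assumed from this series);
cpuhp_next_state() advances st->state in exactly one place, in the right
direction, and skips states that have no callback:

  static int cpuhp_invoke_callback_range(bool bringup, unsigned int cpu,
                                         struct cpuhp_cpu_state *st,
                                         enum cpuhp_state target)
  {
      enum cpuhp_state state;
      int err = 0;

      while (cpuhp_next_state(bringup, &state, st, target)) {
          err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
          if (err)
              break;
      }

      return err;
  }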
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-4-vincent.donnefort@arm.com
The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter fails, no atomic
state can be rolled back.
DEAD callbacks, too, can't fail and so disallow recovery. As a consequence,
during hotunplug, the fail injection interface should prohibit all states
from CPUHP_BRINGUP_CPU to CPUHP_ONLINE.
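A sketch of the corresponding check in the fail-injection store handler
(exact condition assumed from the description above):

  /* In write_cpuhp_fail(): reject fail states that would land in the
   * non-rollbackable range while the CPU is being torn down. */
  if (fail <= CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU)
      return -EINVAL;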
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-3-vincent.donnefort@arm.com
Steps on the way to 5.12-rc1
Resolves conflicts in:
kernel/sched/cpufreq_schedutil.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I06d90f919467f3e7e8970aaedbb872a10eb699ff
Now that CPU pause is gone, this is a lot more manageable. The
remaining conflicts were caused mostly by vendor hooks and Android-specific
tweaks to the EAS topology code, but were easily fixable by hand.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I3665a2d78cb0b8eca6ba5110e90dc7f72030805e
This reverts commit 1734af6299.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: If72719521011ece234d0b176e91709d4f4c6f17b
This reverts commit 683010f555.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I7f3f6003e962e61111831571602c380136a11d59
This reverts commit e19b8ce907.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ibc5e502844d5e2e72cc48eede1032c89b16f5ccf
This reverts commit 1d3a64fbd2.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I201898fe111d5da25c82b0f6a7b32cd0b1d6f237
This reverts commit 782131fed0.
CPU Pause causes major merge conflicts with the 5.11 scheduler changes
(migrate-disable specifically), so lets revert Pause temporarily as it
is not needed urgently in android-mainline.
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iefd741900143a60b945b7523bf0495a07cff8234
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
Incorporate a vendor hook in the resume cpus path
so that vendor specific activities may take place.
Bug: 161210528
Change-Id: I74d03247491b004e891dbcfe06a478d00a95ba9f
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
In the resume_cpus() path, cpus cannot be taken
advantage of until the cpus write lock is acquired,
and cpus are activated and domains rebuilt. This
can incur significant delay in the unpause operation.
Additionally, if scheduled through the kworker thread,
the wait time for rebuilding sched domains becomes
large due to a busy system that can prevent the kworker
from executing.
Activate the cpus and call the cpuset_hotplug_workfn
directly within resume_cpus prior to getting the cpus
write lock, thereby eliminating delays associated
with scheduling this activity.
Bug: 161210528
Change-Id: Ie2521f28ed9078b22d421d792f08413016d4dd62
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Signed-off-by: Todd Kjos <tkjos@google.com>
pause_cpus() intends to force CPUs to go idle as quickly as possible, so
a migration step is added to drain the rq of any running task.
Two steps are actually needed. The first one, "lazy", will run before the
cpu_active_mask has been synced. The second one will run after. It is
possible for another CPU to observe an outdated version of that mask and
to enqueue a task on a rq that has just been marked inactive. The second
migration is there to catch any such spurious moves, while the first
one will drain the rq as quickly as possible to let the CPU reach an idle
state.
Bug: 161210528
Change-Id: Ie26c2e4c42665dd61d41a899a84536e56bf2b887
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
pause_cpus() intends to provide a way to force a CPU to go idle and to resume
as quickly as possible, with as little disruption as possible on the
system. This is a way of saving energy or meeting thermal constraints, for
which a full CPU hotunplug is too slow. A paused CPU is simply deactivated
from the scheduler's point of view. This corresponds to the first hotunplug
step.
Each pause operation still needs some heavy synchronization. Allowing
several CPUs to be paused in one go mitigates that issue.
Paused CPUs can be resumed with resume_cpus(), which also takes a cpumask
as an input (see the usage sketch after the list below).
Few limitations:
* It isn't possible to pause a CPU which is running SCHED_DEADLINE task.
* A paused CPU will be removed from any cpuset it is part of. Resuming
the CPU won't put it back in the cpuset when using cgroup1.
Cgroup2 doesn't have this limitation.
* per-CPU kthreads are still allowed to run on a paused CPU.
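A hedged usage sketch of the API described above (pause_cpus() and
resume_cpus() each taking a cpumask; exact signatures assumed from this
changelog):

  #include <linux/cpu.h>
  #include <linux/cpumask.h>
  #include <linux/slab.h>

  static int pause_resume_example(void)
  {
      cpumask_var_t mask;
      int ret;

      if (!alloc_cpumask_var(&mask, GFP_KERNEL))
          return -ENOMEM;

      cpumask_clear(mask);
      cpumask_set_cpu(2, mask);        /* CPUs we want to force idle */
      cpumask_set_cpu(3, mask);

      ret = pause_cpus(mask);          /* deactivate: cheaper than unplug */
      if (!ret)
          ret = resume_cpus(mask);     /* reactivate the same set */

      free_cpumask_var(mask);
      return ret;
  }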
Bug: 161210528
Change-Id: I1f5cb28190f8ec979bb8640a89b022f2f7266bcf
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
In the event of a partial _cpu_down, (i.e. _cpu_down(target) where
target > CPUHP_AP_OFFLINE), the cpu_online_mask won't be aligned with
cpu_active_mask. This is an issue when trying to offline the last CPU
from cpu_active_mask, while num_online_cpus() > 1.
Protect against this case by checking num_active_cpus() instead of
num_online_cpus().
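A sketch of the guard (placement assumed; the point is that the online and
active masks can disagree after a partial _cpu_down()):

  /* The last *active* CPU must stay up, even while num_online_cpus() > 1. */
  if (num_active_cpus() == 1 && cpu_active(cpu))
      return -EBUSY;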
Bug: 161210528
Change-Id: Ibe7d9ef69e5f91e99be0d98076614a7654bda094
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
With the new mechanism which kicks tasks off the outgoing CPU at the end of
schedule() the situation on an outgoing CPU right before the stopper thread
brings it down completely is:
- All user tasks and all unbound kernel threads have either been migrated
away or are not running and the next wakeup will move them to an online CPU.
- All per CPU kernel threads, except the cpu hotplug thread and the stopper
thread, have either been unbound or parked by the responsible CPU hotplug
callback.
That means that at the last step before the stopper thread is invoked the
cpu hotplug thread is the last legitimate running task on the outgoing
CPU.
Add a final wait step right before the stopper thread is kicked which
ensures that any still running tasks on the way to park or on the way to
kick themselves off the CPU are either sleeping or gone.
This allows to remove the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task aside of
the stopper thread then it will explode with the appropriate fireworks.
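A sketch of the wait step expressed as a hotplug state (upstream naming
assumed; the teardown callback sleeps until only the pinned hotplug and
stopper threads remain on the outgoing rq):

  /* In the cpuhp_hp_states[] table: */
  [CPUHP_AP_SCHED_WAIT_EMPTY] = {
      .name               = "sched:waitempty",
      .startup.single     = NULL,
      .teardown.single    = sched_cpu_wait_empty,
  },

  int sched_cpu_wait_empty(unsigned int cpu)
  {
      balance_hotplug_wait();
      return 0;
  }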
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
Commit c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU") tried to speed-up hotplug of
SCHED_NORMAL tasks by temporarily elevating them to SCHED_FIFO. But
while at it, it also prevented hotplug from SCHED_IDLE, SCHED_BATCH or
SCHED_DEADLINE for no apparent reason.
Since this is a userspace-visible change, and is unlikely to actually be
needed, change the patch logic to only optimize for SCHED_NORMAL tasks
and leave the others untouched.
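A minimal sketch of the narrowed check (hypothetical helper name; the real
change lives in the Android hotplug path):

  static void hotplug_elevate_policy(struct task_struct *p)
  {
      struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

      /* Boost only plain SCHED_NORMAL callers; leave SCHED_IDLE,
       * SCHED_BATCH and SCHED_DEADLINE tasks untouched. */
      if (p->policy == SCHED_NORMAL)
          sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
  }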
Bug: 169238689
Fixes: c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU")
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4d9e88b15fee56e7d234826e2eaea306a69328bb
CPU hotplug operations take place in preemptible context. This leaves
the hotplugging thread at the mercy of overall system load and CPU
availability. If the hotplugging thread does not get an opportunity
to execute after it has already begun a hotplug operation, CPUs can
end up being stuck in a quasi online state. In the worst case a CPU
can be stuck in a state where the migration thread is parked while
another task is executing and changing affinity in a loop. This
combination can result in unbounded execution time for the running
task until the hotplugging thread gets the chance to run to complete
the hotplug operation.
Fix the said problem by ensuring that hotplug can only occur from
threads belonging to the RT sched class. This gives the hotplugging
thread priority on the CPU no matter what the system load or the
number of available CPUs is. If a SCHED_NORMAL task attempts to
hotplug a CPU, we temporarily elevate its scheduling policy to RT.
Furthermore, we disallow hotplugging operations to begin if the
calling task belongs to the idle and deadline classes or those that
use the SCHED_BATCH policy.
Bug: 169238689
Change-Id: Idbb1384626e6ddff46c0d2ce752eee68396c78af
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[psodagud@codeaurora.org: Fixed compilation issues]
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Pull scheduler updates from Ingo Molnar:
"The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve
scalability and reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it
with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements"
* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
sched: Replace rq::wake_list
sched: Add rq::ttwu_pending
irq_work, smp: Allow irq_work on call_single_queue
smp: Optimize send_call_function_single_ipi()
smp: Move irq_work_run() out of flush_smp_call_function_queue()
smp: Optimize flush_smp_call_function_queue()
sched: Fix smp_call_function_single_async() usage for ILB
sched/core: Offload wakee task activation if it the wakee is descheduling
sched/core: Optimize ttwu() spinning on p->on_cpu
sched: Defend cfs and rt bandwidth quota against overflow
sched/cpuacct: Fix charge cpuacct.usage_sys
sched/fair: Replace zero-length array with flexible-array
sched/pelt: Sync util/runnable_sum with PELT window when propagating
sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
sched/fair: Optimize enqueue_task_fair()
sched: Make scheduler_ipi inline
sched: Clean up scheduler_ipi()
sched/core: Simplify sched_init()
...
The single user could have called freeze_secondary_cpus() directly.
Since this function was a source of confusion, remove it as it's
just a pointless wrapper.
While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.
Done automatically via:
git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
In the CPU-offline process, it calls mmdrop() after idle entry and the
subsequent call to cpuhp_report_idle_dead(). Once execution passes the
call to rcu_report_dead(), RCU is ignoring the CPU, which results in
lockdep complaining when mmdrop() uses RCU from either memcg or
debugobjects below.
Fix it by cleaning up the active_mm state from BP instead. Every arch
which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit()
from AP. The only exception is parisc because it switches them to
&init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()),
but the patch will still work there because it calls mmgrab(&init_mm) in
smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu().
WARNING: suspicious RCU usage
-----------------------------
kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!
other info that might help us debug this:
RCU used illegally from offline CPU!
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
get_work_pool+0x110/0x150
__queue_work+0x1bc/0xca0
queue_work_on+0x114/0x120
css_release+0x9c/0xc0
percpu_ref_put_many+0x204/0x230
free_pcp_prepare+0x264/0x570
free_unref_page+0x38/0xf0
__mmdrop+0x21c/0x2c0
idle_task_exit+0x170/0x1b0
pnv_smp_cpu_kill_self+0x38/0x2e0
cpu_die+0x48/0x64
arch_cpu_idle_dead+0x30/0x50
do_idle+0x2f4/0x470
cpu_startup_entry+0x38/0x40
start_secondary+0x7a8/0xa80
start_secondary_resume+0x10/0x14
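A sketch of the BP-side cleanup (shape per the fix as summarized above):

  int finish_cpu(unsigned int cpu)
  {
      struct task_struct *idle = idle_thread_get(cpu);
      struct mm_struct *mm = idle->active_mm;

      /*
       * idle_task_exit() on the dying CPU already switched to init_mm;
       * drop the stale active_mm reference from the BP, where RCU is
       * still watching the CPU.
       */
      if (mm != &init_mm)
          idle->active_mm = &init_mm;
      mmdrop(mm);
      return 0;
  }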
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
In a quest to make the huge -rc1 merge easier to handle and bisect,
merge the first chunk of 5.7-rc1 patches into android-mainline.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib54436e9515660a4c0c25c49c21bfb399eb57921
Pull core SMP updates from Thomas Gleixner:
"CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low
level functions cpu_up/down() are now confined to the core code and
no longer accessible from random code"
* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
cpu/hotplug: Hide cpu_up/down()
cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
torture: Replace cpu_up/down() with add/remove_cpu()
firmware: psci: Replace cpu_up/down() with add/remove_cpu()
xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
parisc: Replace cpu_up/down() with add/remove_cpu()
sparc: Replace cpu_up/down() with add/remove_cpu()
powerpc: Replace cpu_up/down() with add/remove_cpu()
x86/smp: Replace cpu_up/down() with add/remove_cpu()
arm64: hibernate: Use bringup_hibernate_cpu()
cpu/hotplug: Provide bringup_hibernate_cpu()
arm64: Use reboot_cpu instead of hardconding it to 0
arm64: Don't use disable_nonboot_cpus()
ARM: Use reboot_cpu instead of hardcoding it to 0
ARM: Don't use disable_nonboot_cpus()
ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
cpu/hotplug: Create a new function to shutdown nonboot cpus
cpu/hotplug: Add new {add,remove}_cpu() functions
sched/core: Remove rq.hrtick_csd_pending
...
A recent change to freeze_secondary_cpus() which added an early abort if a
wakeup is pending missed the fact that the function is also invoked for
shutdown, reboot and kexec via disable_nonboot_cpus().
In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
the purpose is to terminate the currently running kernel.
Add a 'suspend' argument which is only set when the freeze is in context of
a suspend operation. If not set, any pending wakeup event is ignored.
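A sketch of the resulting split (wrapper shape assumed from this changelog):

  extern int __freeze_secondary_cpus(int primary, bool suspend);

  /* Suspend path: a pending wakeup event must abort the freeze. */
  static inline int freeze_secondary_cpus(int primary)
  {
      return __freeze_secondary_cpus(primary, true);
  }

  /* Shutdown/reboot/kexec path: ignore wakeup events; the point is to
   * terminate the currently running kernel. */
  static inline void disable_nonboot_cpus(void)
  {
      __freeze_secondary_cpus(0, false);
  }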
Fixes: a66d955e91 ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
Use separate functions for the device core to bring a CPU up and down.
Users outside the device core must use add/remove_cpu() which will take
care of extra housekeeping work like keeping sysfs in sync.
Make cpu_up/down() static and replace the extra layer of indirection.
[ tglx: Removed the extra wrapper functions and adjusted function names ]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
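A sketch of the new external entry points (per this changelog they route
through the device core so that sysfs state stays in sync):

  int add_cpu(unsigned int cpu)
  {
      int ret;

      lock_device_hotplug();
      ret = device_online(get_cpu_device(cpu));
      unlock_device_hotplug();

      return ret;
  }

  int remove_cpu(unsigned int cpu)
  {
      int ret;

      lock_device_hotplug();
      ret = device_offline(get_cpu_device(cpu));
      unlock_device_hotplug();

      return ret;
  }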