8a901b005112d29984d533911a4a0dfbbafff15f
196 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a8b5dc3032 |
Merge 5.15.17 into android13-5.15
Changes in 5.15.17
KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlock
HID: Ignore battery for Elan touchscreen on HP Envy X360 15t-dr100
HID: uhid: Fix worker destroying device without any protection
HID: wacom: Reset expected and received contact counts at the same time
HID: wacom: Ignore the confidence flag when a touch is removed
HID: wacom: Avoid using stale array indicies to read contact count
ALSA: core: Fix SSID quirk lookup for subvendor=0
f2fs: fix to do sanity check on inode type during garbage collection
f2fs: fix to do sanity check in is_alive()
f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
mtd: rawnand: gpmi: Add ERR007117 protection for nfc_apply_timings
mtd: rawnand: gpmi: Remove explicit default gpmi clock setting for i.MX6
mtd: Fixed breaking list in __mtd_del_partition.
mtd: rawnand: davinci: Don't calculate ECC when reading page
mtd: rawnand: davinci: Avoid duplicated page read
mtd: rawnand: davinci: Rewrite function description
mtd: rawnand: Export nand_read_page_hwecc_oob_first()
mtd: rawnand: ingenic: JZ4740 needs 'oob_first' read page function
riscv: Get rid of MAXPHYSMEM configs
RISC-V: Use common riscv_cpuid_to_hartid_mask() for both SMP=y and SMP=n
riscv: try to allocate crashkern region from 32bit addressible memory
riscv: Don't use va_pa_offset on kdump
riscv: use hart id instead of cpu id on machine_kexec
riscv: mm: fix wrong phys_ram_base value for RV64
x86/gpu: Reserve stolen memory for first integrated Intel GPU
tools/nolibc: x86-64: Fix startup code bug
crypto: x86/aesni - don't require alignment of data
tools/nolibc: i386: fix initial stack alignment
tools/nolibc: fix incorrect truncation of exit code
rtc: cmos: take rtc_lock while reading from CMOS
net: phy: marvell: add Marvell specific PHY loopback
ksmbd: uninitialized variable in create_socket()
ksmbd: fix guest connection failure with nautilus
ksmbd: add support for smb2 max credit parameter
ksmbd: move credit charge deduction under processing request
ksmbd: limits exceeding the maximum allowable outstanding requests
ksmbd: add reserved room in ipc request/response
media: cec: fix a deadlock situation
media: ov8865: Disable only enabled regulators on error path
media: v4l2-ioctl.c: readbuffers depends on V4L2_CAP_READWRITE
media: flexcop-usb: fix control-message timeouts
media: mceusb: fix control-message timeouts
media: em28xx: fix control-message timeouts
media: cpia2: fix control-message timeouts
media: s2255: fix control-message timeouts
media: dib0700: fix undefined behavior in tuner shutdown
media: redrat3: fix control-message timeouts
media: pvrusb2: fix control-message timeouts
media: stk1160: fix control-message timeouts
media: cec-pin: fix interrupt en/disable handling
can: softing_cs: softingcs_probe(): fix memleak on registration failure
mei: hbm: fix client dma reply status
iio: adc: ti-adc081c: Partial revert of removal of ACPI IDs
iio: trigger: Fix a scheduling whilst atomic issue seen on tsc2046
lkdtm: Fix content of section containing lkdtm_rodata_do_nothing()
bus: mhi: pci_generic: Graceful shutdown on freeze
bus: mhi: core: Fix reading wake_capable channel configuration
bus: mhi: core: Fix race while handling SYS_ERR at power up
cxl/pmem: Fix reference counting for delayed work
arm64: errata: Fix exec handling in erratum
|
||
|
|
378723bd01 |
sched/rt: Try to restart rt period timer when rt runtime exceeded
[ Upstream commit 9b58e976b3b391c0cf02e038d53dd0478ed3013c ]
When rt_runtime is modified from -1 to a valid control value, it may
cause the task to be throttled all the time. Operations like the following
will trigger the bug. E.g:
1. echo -1 > /proc/sys/kernel/sched_rt_runtime_us
2. Run a FIFO task named A that executes while(1)
3. echo 950000 > /proc/sys/kernel/sched_rt_runtime_us
When rt_runtime is -1, The rt period timer will not be activated when task
A enqueued. And then the task will be throttled after setting rt_runtime to
950,000. The task will always be throttled because the rt period timer is
not activated.
Fixes:
|
||
|
|
f0a317610a |
ANDROID: sched: Export symbol for vendor RT hook funcion
Export task_may_not_preempt. Bug: 174030348 Change-Id: I71b50f876306811f008414096043b883dc43b4d5 Signed-off-by: Rick Yiu <rickyiu@google.com> Signed-off-by: Will McVicker <willmcvicker@google.com> Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
|
966869fb2a |
Merge 5.15.5 into android13-5.15
Changes in 5.15.5
arm64: zynqmp: Do not duplicate flash partition label property
arm64: zynqmp: Fix serial compatible string
clk: sunxi-ng: Unregister clocks/resets when unbinding
ARM: dts: sunxi: Fix OPPs node name
arm64: dts: allwinner: h5: Fix GPU thermal zone node name
arm64: dts: allwinner: a100: Fix thermal zone node name
staging: wfx: ensure IRQ is ready before enabling it
ARM: dts: BCM5301X: Fix nodes names
ARM: dts: BCM5301X: Fix MDIO mux binding
ARM: dts: NSP: Fix mpcore, mmc node names
arm64: dts: broadcom: bcm4908: Move reboot syscon out of bus
scsi: pm80xx: Fix memory leak during rmmod
scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
ASoC: mediatek: mt8195: Add missing of_node_put()
arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
arm64: dts: hisilicon: fix arm,sp805 compatible string
RDMA/bnxt_re: Check if the vlan is valid before reporting
bus: ti-sysc: Add quirk handling for reinit on context lost
bus: ti-sysc: Use context lost quirk for otg
usb: musb: tusb6010: check return value after calling platform_get_resource()
usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
ARM: dts: ux500: Skomer regulator fixes
staging: rtl8723bs: remove possible deadlock when disconnect (v2)
staging: rtl8723bs: remove a second possible deadlock
staging: rtl8723bs: remove a third possible deadlock
ARM: BCM53016: Specify switch ports for Meraki MR32
arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
arm64: dts: qcom: ipq8074: Fix qcom,controlled-remotely property
arm64: dts: qcom: sdm845: Fix qcom,controlled-remotely property
arm64: dts: freescale: fix arm,sp805 compatible string
arm64: dts: ls1012a: Add serial alias for ls1012a-rdb
RDMA/rxe: Separate HW and SW l/rkeys
ASoC: SOF: Intel: hda-dai: fix potential locking issue
scsi: core: Fix scsi_mode_sense() buffer length handling
ALSA: usb-audio: disable implicit feedback sync for Behringer UFX1204 and UFX1604
clk: imx: imx6ul: Move csi_sel mux to correct base register
ASoC: es8316: Use IRQF_NO_AUTOEN when requesting the IRQ
ASoC: rt5651: Use IRQF_NO_AUTOEN when requesting the IRQ
ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
scsi: advansys: Fix kernel pointer leak
scsi: smartpqi: Add controller handshake during kdump
arm64: dts: imx8mm-kontron: Fix reset delays for ethernet PHY
ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336 codec
ASoC: Intel: soc-acpi: add missing quirk for TGL SDCA single amp
ASoC: Intel: sof_sdw: add missing quirk for Dell SKU 0A45
firmware_loader: fix pre-allocated buf built-in firmware use
HID: multitouch: disable sticky fingers for UPERFECT Y
ALSA: usb-audio: Add support for the Pioneer DJM 750MK2 Mixer/Soundcard
ARM: dts: omap: fix gpmc,mux-add-data type
usb: host: ohci-tmio: check return value after calling platform_get_resource()
ASoC: rt5682: fix a little pop while playback
ARM: dts: ls1021a: move thermal-zones node out of soc/
ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
ALSA: ISA: not for M68K
iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel option
tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
MIPS: sni: Fix the build
scsi: scsi_debug: Fix out-of-bound read in resp_readcap16()
scsi: scsi_debug: Fix out-of-bound read in resp_report_tgtpgs()
scsi: target: Fix ordered tag handling
scsi: target: Fix alua_tg_pt_gps_count tracking
iio: imu: st_lsm6dsx: Avoid potential array overflow in st_lsm6dsx_set_odr()
RDMA/core: Use kvzalloc when allocating the struct ib_port
scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine
scsi: lpfc: Fix link down processing to address NULL pointer dereference
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
memory: tegra20-emc: Add runtime dependency on devfreq governor module
powerpc/5200: dts: fix memory node unit name
ARM: dts: qcom: fix memory and mdio nodes naming for RB3011
arm64: dts: qcom: Fix node name of rpm-msg-ram device nodes
ALSA: gus: fix null pointer dereference on pointer block
ALSA: usb-audio: fix null pointer dereference on pointer cs_desc
clk: at91: sama7g5: remove prescaler part of master clock
iommu/dart: Initialize DART_STREAMS_ENABLE
powerpc/dcr: Use cmplwi instead of 3-argument cmpli
powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
sh: check return code of request_irq
maple: fix wrong return value of maple_bus_init().
f2fs: fix up f2fs_lookup tracepoints
f2fs: fix to use WHINT_MODE
f2fs: fix wrong condition to trigger background checkpoint correctly
sh: fix kconfig unmet dependency warning for FRAME_POINTER
sh: math-emu: drop unused functions
sh: define __BIG_ENDIAN for math-emu
f2fs: compress: disallow disabling compress on non-empty compressed file
f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
clk: ingenic: Fix bugs with divided dividers
clk/ast2600: Fix soc revision for AHB
clk: qcom: gcc-msm8996: Drop (again) gcc_aggre1_pnoc_ahb_clk
KVM: arm64: Fix host stage-2 finalization
mips: BCM63XX: ensure that CPU_SUPPORTS_32BIT_KERNEL is set
MIPS: boot/compressed/: add __bswapdi2() to target for ZSTD decompression
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
sched/fair: Prevent dead task groups from regaining cfs_rq's
perf/x86/vlbr: Add c->flags to vlbr event constraints
blkcg: Remove extra blkcg_bio_issue_init
tracing/histogram: Do not copy the fixed-size char array field over the field size
perf bpf: Avoid memory leak from perf_env__insert_btf()
perf bench futex: Fix memory leak of perf_cpu_map__new()
perf tests: Remove bash construct from record+zstd_comp_decomp.sh
drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame
bpf: Fix inner map state pruning regression.
samples/bpf: Fix summary per-sec stats in xdp_sample_user
samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu
selftests: net: switch to socat in the GSO GRE test
net/ipa: ipa_resource: Fix wrong for loop range
tcp: Fix uninitialized access in skb frags array for Rx 0cp.
tracing: Add length protection to histogram string copies
nl80211: fix radio statistics in survey dump
mac80211: fix monitor_sdata RCU/locking assertions
net: ipa: HOLB register sometimes must be written twice
net: ipa: disable HOLB drop when updating timer
selftests: gpio: fix gpio compiling error
net: bnx2x: fix variable dereferenced before check
bnxt_en: reject indirect blk offload when hw-tc-offload is off
tipc: only accept encrypted MSG_CRYPTO msgs
sock: fix /proc/net/sockstat underflow in sk_clone_lock()
net/smc: Make sure the link_id is unique
NFSD: Fix exposure in nfsd4_decode_bitmap()
iavf: Fix return of set the new channel count
iavf: check for null in iavf_fix_features
iavf: free q_vectors before queues in iavf_disable_vf
iavf: don't clear a lock we don't hold
iavf: Fix failure to exit out from last all-multicast mode
iavf: prevent accidental free of filter structure
iavf: validate pointers
iavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset
iavf: Fix for setting queues to 0
iavf: Restore VLAN filters after link down
bpf: Fix toctou on read-only map's constant scalar tracking
MIPS: generic/yamon-dt: fix uninitialized variable error
mips: bcm63xx: add support for clk_get_parent()
mips: lantiq: add support for clk_get_parent()
gpio: rockchip: needs GENERIC_IRQ_CHIP to fix build errors
platform/x86: hp_accel: Fix an error handling path in 'lis3lv02d_probe()'
platform/x86: think-lmi: Abort probe on analyze failure
udp: Validate checksum in udp_read_sock()
btrfs: make 1-bit bit-fields of scrub_page unsigned int
RDMA/core: Set send and receive CQ before forwarding to the driver
net/mlx5e: kTLS, Fix crash in RX resync flow
net/mlx5e: Wait for concurrent flow deletion during neigh/fib events
net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdev
net/mlx5e: nullify cq->dbg pointer in mlx5_debug_cq_remove()
net/mlx5: Update error handler for UCTX and UMEM
net/mlx5: E-Switch, rebuild lag only when needed
net/mlx5e: CT, Fix multiple allocations and memleak of mod acts
net/mlx5: Lag, update tracker when state change event received
net/mlx5: E-Switch, return error if encap isn't supported
scsi: ufs: core: Improve SCSI abort handling
scsi: core: sysfs: Fix hang when device state is set via sysfs
scsi: ufs: core: Fix task management completion timeout race
scsi: ufs: core: Fix another task management completion race
net: mvmdio: fix compilation warning
net: sched: act_mirred: drop dst for the direction from egress to ingress
net: dpaa2-eth: fix use-after-free in dpaa2_eth_remove
net: virtio_net_hdr_to_skb: count transport header in UFO
i40e: Fix correct max_pkt_size on VF RX queue
i40e: Fix NULL ptr dereference on VSI filter sync
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix ping is lost after configuring ADq on VF
RDMA/mlx4: Do not fail the registration on port stats
i40e: Fix warning message and call stack during rmmod i40e driver
i40e: Fix creation of first queue by omitting it if is not power of two
i40e: Fix display error code in dmesg
NFC: reorganize the functions in nci_request
NFC: reorder the logic in nfc_{un,}register_device
NFC: add NCI_UNREG flag to eliminate the race
e100: fix device suspend/resume
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
tools build: Fix removal of feature-sync-compare-and-swap feature detection
riscv: fix building external modules
KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
powerpc: clean vdso32 and vdso64 directories
powerpc/pseries: rename numa_dist_table to form2_distances
powerpc/pseries: Fix numa FORM2 parsing fallback code
pinctrl: qcom: sdm845: Enable dual edge errata
pinctrl: qcom: sm8350: Correct UFS and SDC offsets
perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
s390/kexec: fix return code handling
blk-cgroup: fix missing put device in error path from blkg_conf_pref()
dmaengine: remove debugfs #ifdef
tun: fix bonding active backup with arp monitoring
Revert "mark pstore-blk as broken"
pstore/blk: Use "%lu" to format unsigned long
hexagon: export raw I/O routines for modules
hexagon: clean up timer-regs.h
tipc: check for null after calling kmemdup
ipc: WARN if trying to remove ipc object which is absent
shm: extend forced shm destroy to support objects from several IPC nses
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
hugetlb, userfaultfd: fix reservation restore on userfaultfd error
kmap_local: don't assume kmap PTEs are linear arrays in memory
mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
x86/boot: Pull up cmdline preparation and early param parsing
x86/sgx: Fix free page accounting
x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
KVM: x86: Assume a 64-bit hypercall for guests with protected state
KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()
KVM: x86/mmu: include EFER.LMA in extended mmu role
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
powerpc/signal32: Fix sigset_t copy
powerpc/xive: Change IRQ domain to a tree domain
powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
ata: libata: improve ata_read_log_page() error message
ata: libata: add missing ata_identify_page_supported() calls
scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c'
s390/setup: avoid reserving memory above identity mapping
s390/boot: simplify and fix kernel memory layout setup
s390/vdso: filter out -mstack-guard and -mstack-size
s390/kexec: fix memory leak of ipl report buffer
s390/dump: fix copying to user-space of swapped kdump oldmem
block: Check ADMIN before NICE for IOPRIO_CLASS_RT
fbdev: Prevent probing generic drivers if a FB is already registered
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
drm/cma-helper: Release non-coherent memory with dma_free_noncoherent()
printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
udf: Fix crash after seekdir
spi: fix use-after-free of the add_lock mutex
net: stmmac: socfpga: add runtime suspend/resume callback for stratix10 platform
Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size
btrfs: fix memory ordering between normal and ordered work functions
fs: handle circular mappings correctly
net: stmmac: Fix signed/unsigned wreckage
parisc/sticon: fix reverse colors
cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
mac80211: fix radiotap header generation
mac80211: drop check for DONT_REORDER in __ieee80211_select_queue
drm/amd/display: Update swizzle mode enums
drm/amd/display: Limit max DSC target bpp for specific monitors
drm/i915/guc: Fix outstanding G2H accounting
drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
drm/i915/guc: Workaround reset G2H is received after schedule done G2H
drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context
drm/i915/guc: Unwind context requests in reverse order
drm/udl: fix control-message timeout
drm/prime: Fix use after free in mmap with drm_gem_ttm_mmap
drm/nouveau: Add a dedicated mutex for the clients list
drm/nouveau: use drm_dev_unplug() during device removal
drm/nouveau: clean up all clients on device removal
drm/i915/dp: Ensure sink rate values are always valid
drm/i915/dp: Ensure max link params are always valid
drm/i915: Fix type1 DVI DP dual mode adapter heuristic for modern platforms
drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
drm/amd/pm: avoid duplicate powergate/ungate setting
signal: Implement force_fatal_sig
exit/syscall_user_dispatch: Send ordinary signals on failure
signal/powerpc: On swapcontext failure force SIGSEGV
signal/s390: Use force_sigsegv in default_trap_handler
signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
signal: Don't always set SA_IMMUTABLE for forced signals
signal: Replace force_fatal_sig with force_exit_sig when in doubt
hugetlbfs: flush TLBs correctly after huge_pmd_unshare
RDMA/netlink: Add __maybe_unused to static inline in C file
bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
selinux: fix NULL-pointer dereference when hashtab allocation fails
ASoC: DAPM: Cover regression by kctl change notification fix
ASoC: rsnd: fixup DMAEngine API
usb: max-3421: Use driver data instead of maintaining a list of bound devices
ice: Fix VF true promiscuous mode
ice: Delete always true check of PF pointer
fs: export an inode_update_time helper
btrfs: update device path inode time instead of bd_inode
net: add and use skb_unclone_keeptruesize() helper
x86/Kconfig: Fix an unused variable error in dell-smm-hwmon
ALSA: hda: hdac_ext_stream: fix potential locking issues
ALSA: hda: hdac_stream: fix potential locking issue in snd_hdac_stream_assign()
Linux 5.15.5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If86a02ba2cf9af765d9838ada3b9a2cbcea9a08d
|
||
|
|
512e21c150 |
sched/fair: Prevent dead task groups from regaining cfs_rq's
[ Upstream commit b027789e5e50494c2325cc70c8642e7fd6059479 ] Kevin is reporting crashes which point to a use-after-free of a cfs_rq in update_blocked_averages(). Initial debugging revealed that we've live cfs_rq's (on_list=1) in an about to be kfree()'d task group in free_fair_sched_group(). However, it was unclear how that can happen. His kernel config happened to lead to a layout of struct sched_entity that put the 'my_q' member directly into the middle of the object which makes it incidentally overlap with SLUB's freelist pointer. That, in combination with SLAB_FREELIST_HARDENED's freelist pointer mangling, leads to a reliable access violation in form of a #GP which made the UAF fail fast. Michal seems to have run into the same issue[1]. He already correctly diagnosed that commit |
||
|
|
81a0b58261 |
ANDROID: sched: add hook to rto_next_cpu
Restricted vendor hook to modify the cpu selected in rto_next_cpu, which is needed for the implementation of CPU Pause. Bug: 205164003 Change-Id: I0dc675e54f7f116d538840262fbb0ba6d28246f4 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
|
027f8bd863 |
Revert "Revert "ANDROID: sched: avoid migrating when softint on tgt cpu should be short""
This reverts commit
|
||
|
|
1085eff98a |
ANDROID: sched: Add restrict vendor hooks for balance_rt()
Add rvh called android_rvh_sched_balance_rt to influence balance_rt() from vendor modules. Bug: 178572414 Change-Id: I555c8ebcf5a3a5d8e3ab881ab9aa507f325285c2 Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> |
||
|
|
7889eed917 |
Merge 54a728dc5e ("Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
A little step towards 5.14-rc1 Signed-off-by: Lee Jones <lee.jones@linaro.org> Change-Id: I2573a6df9f4e7b67194327ac6db6082a574d2809 |
||
|
|
fecfcbc288 |
sched/rt: Fix RT utilization tracking during policy change
RT keeps track of the utilization on a per-rq basis with the structure
avg_rt. This utilization is updated during task_tick_rt(),
put_prev_task_rt() and set_next_task_rt(). However, when the current
running task changes its policy, set_next_task_rt() which would usually
take care of updating the utilization when the rq starts running RT tasks,
will not see a such change, leaving the avg_rt structure outdated. When
that very same task will be dequeued later, put_prev_task_rt() will then
update the utilization, based on a wrong last_update_time, leading to a
huge spike in the RT utilization signal.
The signal would eventually recover from this issue after few ms. Even if
no RT tasks are run, avg_rt is also updated in __update_blocked_others().
But as the CPU capacity depends partly on the avg_rt, this issue has
nonetheless a significant impact on the scheduler.
Fix this issue by ensuring a load update when a running task changes
its policy to RT.
Fixes:
|
||
|
|
21f56ffe44 |
sched: Introduce sched_class::pick_task()
Because sched_class::pick_next_task() also implies sched_class::set_next_task() (and possibly put_prev_task() and newidle_balance) it is not state invariant. This makes it unsuitable for remote task selection. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [Vineeth: folded fixes] Signed-off-by: Vineeth Remanan Pillai <viremana@linux.microsoft.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.437092775@infradead.org |
||
|
|
5cb9eaa3d2 |
sched: Wrap rq::lock access
In preparation of playing games with rq->lock, abstract the thing using an accessor. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.136465446@infradead.org |
||
|
|
4797acfb9c |
Merge 16b3d0cf5b Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into android-mainline
A little step en route to v5.13-rc1 Signed-off-by: Lee Jones <lee.jones@linaro.org> Change-Id: Ic2fb8aa220023572c96907aebce0a675333ef29f |
||
|
|
3b03706fa6 |
sched: Fix various typos
Fix ~42 single-word typos in scheduler code comments. We have accumulated a few fun ones over the years. :-) Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: linux-kernel@vger.kernel.org |
||
|
|
1e5781383b |
Merge 657bd90c93 ("Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Steps on the way to 5.12-rc1
Resolves conflicts in:
kernel/sched/cpufreq_schedutil.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I06d90f919467f3e7e8970aaedbb872a10eb699ff
|
||
|
|
65bcf072e2 |
sched: Use task_current() instead of 'rq->curr == p'
Use the task_current() function where appropriate. No functional change. Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030173223.GA52339@rlk |
||
|
|
4196c1dafc |
Revert "ANDROID: sched: avoid migrating when softint on tgt cpu should be short"
This reverts commit
|
||
|
|
0dd08d5801 |
Merge adb35e8dc9 ("Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Now that CPU pause is gone, this is a lot more manageable. The remaining conflicts were caused mostly by vendor hooks and Android-specific tweaks to the EAS topology code, but easily fixable by hand. Signed-off-by: Quentin Perret <qperret@google.com> Change-Id: I3665a2d78cb0b8eca6ba5110e90dc7f72030805e |
||
|
|
ebce8ec6bd |
ANDROID: sched: rt: rearrange invocation of find_lowest_rq() vendor hook
Right now, invocation of find_lowest_rq() vendor hook is made before error checks and also, cpupri_find() isn't exported either. It would be appropriate to move invocation of find_lowest_rq() vendor hook after error checks are done & calling cpupri_find(). Bug: 173559623 Change-Id: I298dffd39be0451b0b154930ace4e16763c6e78d Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org> |
||
|
|
8d19443b0b |
ANDROID: sched: avoid migrating when softint on tgt cpu should be short
The scheduling change to avoid putting RT threads on cores that are handling softint's was catching cases where there was no reason to believe the softint would take a long time, resulting in unnecessary migration overhead. This patch reduces the migration to cases where the core has a softint that is actually likely to take a long time, as opposed to the RCU, SCHED, and TIMER softints that are rather quick. Bug: 31752786 Bug: 168521633 Change-Id: Ib4e179f1e15c736b2fdba31070494e357e9fbbe2 Signed-off-by: John Dias <joaodias@google.com> [elavila: Amend commit text for AOSP, port to mainline] Signed-off-by: J. Avila <elavila@google.com> |
||
|
|
3adfd8e344 |
ANDROID: sched: avoid placing RT threads on cores handling softirqs
In certain audio use cases, scheduling RT threads on cores that are handling softirqs can lead to glitches. Prevent this behavior. Bug: 31501544 Bug: 168521633 Change-Id: I99dd7aaa12c11270b28dbabea484bcc8fb8ba0c1 Signed-off-by: John Dias <joaodias@google.com> [elavila: Port to mainline, amend commit text] Signed-off-by: J. Avila <elavila@google.com> |
||
|
|
3aef1551e9 |
sched: Remove select_task_rq()'s sd_flag parameter
Only select_task_rq_fair() uses that parameter to do an actual domain search, other classes only care about what kind of wakeup is happening (fork, exec, or "regular") and thus just translate the flag into a wakeup type. WF_TTWU and WF_EXEC have just been added, use these along with WF_FORK to encode the wakeup types we care about. For select_task_rq_fair(), we can simply use the shiny new WF_flag : SD_flag mapping. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102184514.2733-3-valentin.schneider@arm.com |
||
|
|
12fa97c64d | Merge branch 'sched/migrate-disable' | ||
|
|
a7c81556ec |
sched: Fix migrate_disable() vs rt/dl balancing
In order to minimize the interference of migrate_disable() on lower priority tasks, which can be deprived of runtime due to being stuck below a higher priority task. Teach the RT/DL balancers to push away these higher priority tasks when a lower priority task gets selected to run on a freshly demoted CPU (pull). This adds migration interference to the higher priority task, but restores bandwidth to system that would otherwise be irrevocably lost. Without this it would be possible to have all tasks on the system stuck on a single CPU, each task preempted in a migrate_disable() section with a single high priority task running. This way we can still approximate running the M highest priority tasks on the system. Migrating the top task away is (ofcourse) still subject to migrate_disable() too, which means the lower task is subject to an interference equivalent to the worst case migrate_disable() section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.499155098@infradead.org |
||
|
|
95158a89dd |
sched,rt: Use the full cpumask for balancing
We want migrate_disable() tasks to get PULLs in order for them to PUSH away the higher priority task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.310519774@infradead.org |
||
|
|
14e292f8d4 |
sched,rt: Use cpumask_any*_distribute()
Replace a bunch of cpumask_any*() instances with cpumask_any*_distribute(), by injecting this little bit of random in cpu selection, we reduce the chance two competing balance operations working off the same lowest_mask pick the same CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org |
||
|
|
120455c514 |
sched: Fix hotplug vs CPU bandwidth control
Since we now migrate tasks away before DYING, we should also move bandwidth unthrottle, otherwise we can gain tasks from unthrottle after we expect all tasks to be gone already. Also; it looks like the RT balancers don't respect cpu_active() and instead rely on rq->online in part, complete this. This too requires we do set_rq_offline() earlier to match the cpu_active() semantics. (The bigger patch is to convert RT to cpu_active() entirely) Since set_rq_online() is called from sched_cpu_activate(), place set_rq_offline() in sched_cpu_deactivate(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.639538965@infradead.org |
||
|
|
bf3d991a7d |
ANDROID: sched: Add trace hook for rt throttle dump
Create a trace hook when RT tasks are throttled. This allows vendors to debug long RT runs. Bug: 172264047 Change-Id: I534959f8e8d714463aac2f9f1c5627d2e735f543 Signed-off-by: Sai Harshini Nimmala <snimmala@codeaurora.org> |
||
|
|
43c31ac0e6 |
sched: Remove relyance on STRUCT_ALIGNMENT
Florian reported that all of kernel/sched/ is rebuild when
CONFIG_BLK_DEV_INITRD is changed, which, while not a bug is
unexpected. This is due to us including vmlinux.lds.h.
Jakub explained that the problem is that we put the alignment
requirement on the type instead of on a variable. Type alignment is a
minimum, the compiler is free to pick any larger alignment for a
specific instance of the type (eg. the variable).
So force the type alignment on all individual variable definitions and
remove the undesired dependency on vmlinux.lds.h.
Fixes:
|
||
|
|
934fc3314b |
sched/cpupri: Remap CPUPRI_NORMAL to MAX_RT_PRIO-1
This makes the mapping continuous and frees up 100 for other usage.
Prev mapping:
p->rt_priority p->prio newpri cpupri
-1 -1 (CPUPRI_INVALID)
100 0 (CPUPRI_NORMAL)
1 98 98 1
...
49 50 50 49
50 49 49 50
...
99 0 0 99
New mapping:
p->rt_priority p->prio newpri cpupri
-1 -1 (CPUPRI_INVALID)
99 0 (CPUPRI_NORMAL)
1 98 98 1
...
49 50 50 49
50 49 49 50
...
99 0 0 99
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
||
|
|
67d3ed5765 |
Merge 'v5.10-rc1' into android-mainline
Linux 5.10-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iace3fc84a00d3023c75caa086a266de17dc1847c |
||
|
|
33def8498f |
treewide: Convert macro and uses of __section(foo) to __section("foo")
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
00d6a8a7ee |
Merge e4cbce4d13 ("Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Baby steps for 5.9-rc1 Resolves some kernel/sched/ merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I88cf5411ac7251f9795d9c50cb18b0df5bf0bcd6 |
||
|
|
16330b3560 |
ANDROID: Add vendor hooks to the scheduler
Add vendor hooks for vendor-specific scheduling.
android_rvh_select_task_rq_rt:
To perform vendor-specific RT task placement.
android_rvh_select_fallback_rq:
To restrict cpu usage.
android_rvh_scheduler_tick:
To collect periodic scheduling information and to schedule tasks.
android_rvh_enqueue_tas/android_rvh_dequeue_task:
For vendor to be aware of the task schedule in/out.
android_rvh_can_migrate_task:
To limit task migration based on vendor requirements.
android_rvh_find_lowest_rq:
To find the lowest rq for RT task with vendor-specific way.
Bug: 155241766
Change-Id: I926458b0a911d564e5932e200125b12406c2deee
Signed-off-by: Park Bumgyu <bumgyu.park@samsung.com>
|
||
|
|
a87e749e8f |
sched: Remove struct sched_class::next field
Now that the sched_class descriptors are defined in order via the linker script vmlinux.lds.h, there's no reason to have a "next" pointer to the previous priroity structure. The order of the sturctures can be aligned as an array, and used to index and find the next sched_class descriptor. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191219214558.845353593@goodmis.org |
||
|
|
590d697963 |
sched: Force the address order of each sched class descriptor
In order to make a micro optimization in pick_next_task(), the order of the sched class descriptor address must be in the same order as their priority to each other. That is: &idle_sched_class < &fair_sched_class < &rt_sched_class < &dl_sched_class < &stop_sched_class In order to guarantee this order of the sched class descriptors, add each one into their own data section and force the order in the linker script. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain |
||
|
|
cb8e59cc87 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
|
||
|
|
d505b8af58 |
sched: Defend cfs and rt bandwidth quota against overflow
When users write some huge number into cpu.cfs_quota_us or cpu.rt_runtime_us, overflow might happen during to_ratio() shifts of schedulable checks. to_ratio() could be altered to avoid unnecessary internal overflow, but min_cfs_quota_period is less than 1 << BW_SHIFT, so a cutoff would still be needed. Set a cap MAX_BW for cfs_quota_us and rt_runtime_us to prevent overflow. Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Link: https://lkml.kernel.org/r/20200425105248.60093-1-changhuaixin@linux.alibaba.com |
||
|
|
32927393dc |
sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
d94a9df490 |
sched/rt: Remove unnecessary push for unfit tasks
In task_woken_rt() and switched_to_rto() we try trigger push-pull if the
task is unfit.
But the logic is found lacking because if the task was the only one
running on the CPU, then rt_rq is not in overloaded state and won't
trigger a push.
The necessity of this logic was under a debate as well, a summary of
the discussion can be found in the following thread:
https://lore.kernel.org/lkml/20200226160247.iqvdakiqbakk2llz@e107158-lin.cambridge.arm.com/
Remove the logic for now until a better approach is agreed upon.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes:
|
||
|
|
98ca645f82 |
sched/rt: Allow pulling unfitting task
When implemented RT Capacity Awareness; the logic was done such that if
a task was running on a fitting CPU, then it was sticky and we would try
our best to keep it there.
But as Steve suggested, to adhere to the strict priority rules of RT
class; allow pulling an RT task to unfitting CPU to ensure it gets a
chance to run ASAP.
LINK: https://lore.kernel.org/lkml/20200203111451.0d1da58f@oasis.local.home/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes:
|
||
|
|
a1bd02e1f2 |
sched/rt: Optimize cpupri_find() on non-heterogenous systems
By introducing a new cpupri_find_fitness() function that takes the
fitness_fn as an argument and only called when asym_system static key is
enabled.
cpupri_find() is now a wrapper function that calls cpupri_find_fitness()
passing NULL as a fitness_fn, hence disabling the logic that handles
fitness by default.
LINK: https://lore.kernel.org/lkml/c0772fca-0a4b-c88d-fdf2-5715fcf8447b@arm.com/
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes:
|
||
|
|
b28bc1e002 |
sched/rt: Re-instate old behavior in select_task_rq_rt()
When RT Capacity Aware support was added, the logic in select_task_rq_rt
was modified to force a search for a fitting CPU if the task currently
doesn't run on one.
But if the search failed, and the search was only triggered to fulfill
the fitness request; we could end up selecting a new CPU unnecessarily.
Fix this and re-instate the original behavior by ensuring we bail out
in that case.
This behavior change only affected asymmetric systems that are using
util_clamp to implement capacity aware. None asymmetric systems weren't
affected.
LINK: https://lore.kernel.org/lkml/20200218041620.GD28029@codeaurora.org/
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes:
|
||
|
|
b4fb015eef |
sched/rt: Optimize checking group RT scheduler constraints
Group RT scheduler contains protection against setting zero runtime for
cgroup with RT tasks. Right now function tg_set_rt_bandwidth() iterates
over all CPU cgroups and calls tg_has_rt_tasks() for any cgroup which
runtime is zero (not only for changed one). Default RT runtime is zero,
thus tg_has_rt_tasks() will is called for almost at CPU cgroups.
This protection already is slightly racy: runtime limit could be changed
between cpu_cgroup_can_attach() and cpu_cgroup_attach() because changing
cgroup attribute does not lock cgroup_mutex while attach does not lock
rt_constraints_mutex. Changing task scheduler class also races with
changing rt runtime: check in __sched_setscheduler() isn't protected.
Function tg_has_rt_tasks() iterates over all threads in the system.
This gives NR_CGROUPS * NR_TASKS operations under single tasklist_lock
locked for read tg_set_rt_bandwidth(). Any concurrent attempt of locking
tasklist_lock for write (for example fork) will stuck with disabled irqs.
This patch makes two optimizations:
1) Remove locking tasklist_lock and iterate only tasks in cgroup
2) Call tg_has_rt_tasks() iff rt runtime changes from non-zero to zero
All changed code is under CONFIG_RT_GROUP_SCHED.
Testcase:
# mkdir /sys/fs/cgroup/cpu/test{1..10000}
# echo 0 | tee /sys/fs/cgroup/cpu/test*/cpu.rt_runtime_us
At the same time without patch fork time will be >100ms:
# perf trace -e clone --duration 100 stress-ng --fork 1
Also remote ping will show timings >100ms caused by irq latency.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/157996383820.4651.11292439232549211693.stgit@buzz
|
||
|
|
804d402fb6 |
sched/rt: Make RT capacity-aware
Capacity Awareness refers to the fact that on heterogeneous systems (like Arm big.LITTLE), the capacity of the CPUs is not uniform, hence when placing tasks we need to be aware of this difference of CPU capacities. In such scenarios we want to ensure that the selected CPU has enough capacity to meet the requirement of the running task. Enough capacity means here that capacity_orig_of(cpu) >= task.requirement. The definition of task.requirement is dependent on the scheduling class. For CFS, utilization is used to select a CPU that has >= capacity value than the cfs_task.util. capacity_orig_of(cpu) >= cfs_task.util DL isn't capacity aware at the moment but can make use of the bandwidth reservation to implement that in a similar manner CFS uses utilization. The following patchset implements that: https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@santannapisa.it/ capacity_orig_of(cpu)/SCHED_CAPACITY >= dl_deadline/dl_runtime For RT we don't have a per task utilization signal and we lack any information in general about what performance requirement the RT task needs. But with the introduction of uclamp, RT tasks can now control that by setting uclamp_min to guarantee a minimum performance point. ATM the uclamp value are only used for frequency selection; but on heterogeneous systems this is not enough and we need to ensure that the capacity of the CPU is >= uclamp_min. Which is what implemented here. capacity_orig_of(cpu) >= rt_task.uclamp_min Note that by default uclamp.min is 1024, which means that RT tasks will always be biased towards the big CPUs, which make for a better more predictable behavior for the default case. Must stress that the bias acts as a hint rather than a definite placement strategy. For example, if all big cores are busy executing other RT tasks we can't guarantee that a new RT task will be placed there. On non-heterogeneous systems the original behavior of RT should be retained. Similarly if uclamp is not selected in the config. [ mingo: Minor edits to comments. ] Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191009104611.15363-1-qais.yousef@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
a0e813f26e |
sched/core: Further clarify sched_class::set_next_task()
It turns out there really is something special to the first
set_next_task() invocation. In specific the 'change' pattern really
should not cause balance callbacks.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Fixes:
|
||
|
|
98c2f700ed |
sched/core: Simplify sched_class::pick_next_task()
Now that the indirect class call never uses the last two arguments of pick_next_task(), remove them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.660595546@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
6e2df0581f |
sched: Fix pick_next_task() vs 'change' pattern race
Commit |
||
|
|
7f2444d38f |
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Thomas Gleixner:
"Timers and timekeeping updates:
- A large overhaul of the posix CPU timer code which is a preparation
for moving the CPU timer expiry out into task work so it can be
properly accounted on the task/process.
An update to the bogus permission checks will come later during the
merge window as feedback was not complete before heading of for
travel.
- Switch the timerqueue code to use cached rbtrees and get rid of the
homebrewn caching of the leftmost node.
- Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
single function
- Implement the separation of hrtimers to be forced to expire in hard
interrupt context even when PREEMPT_RT is enabled and mark the
affected timers accordingly.
- Implement a mechanism for hrtimers and the timer wheel to protect
RT against priority inversion and live lock issues when a (hr)timer
which should be canceled is currently executing the callback.
Instead of infinitely spinning, the task which tries to cancel the
timer blocks on a per cpu base expiry lock which is held and
released by the (hr)timer expiry code.
- Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
resulting in faster access to timekeeping functions.
- Updates to various clocksource/clockevent drivers and their device
tree bindings.
- The usual small improvements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
posix-cpu-timers: Fix permission check regression
posix-cpu-timers: Always clear head pointer on dequeue
hrtimer: Add a missing bracket and hide `migration_base' on !SMP
posix-cpu-timers: Make expiry_active check actually work correctly
posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
tick: Mark sched_timer to expire in hard interrupt context
hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
posix-cpu-timers: Utilize timerqueue for storage
posix-cpu-timers: Move state tracking to struct posix_cputimers
posix-cpu-timers: Deduplicate rlimit handling
posix-cpu-timers: Remove pointless comparisons
posix-cpu-timers: Get rid of 64bit divisions
posix-cpu-timers: Consolidate timer expiry further
posix-cpu-timers: Get rid of zero checks
rlimit: Rewrite non-sensical RLIMIT_CPU comment
posix-cpu-timers: Respect INFINITY for hard RTTIME limit
posix-cpu-timers: Switch thread group sampling to array
posix-cpu-timers: Restructure expiry array
posix-cpu-timers: Remove cputime_expires
...
|
||
|
|
3a245c0f11 |
posix-cpu-timers: Move expiry cache into struct posix_cputimers
The expiry cache belongs into the posix_cputimers container where the other cpu timers information is. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de |