Changes in 5.15.81
ASoC: fsl_sai: use local device pointer
ASoC: fsl_asrc fsl_esai fsl_sai: allow CONFIG_PM=N
serial: Add rs485_supported to uart_port
serial: fsl_lpuart: Fill in rs485_supported
tty: serial: fsl_lpuart: don't break the on-going transfer when global reset
sctp: remove the unnecessary sinfo_stream check in sctp_prsctp_prune_unsent
sctp: clear out_curr if all frag chunks of current msg are pruned
cifs: introduce new helper for cifs_reconnect()
cifs: split out dfs code from cifs_reconnect()
cifs: support nested dfs links over reconnect
cifs: Fix connections leak when tlink setup failed
ata: libata-scsi: simplify __ata_scsi_queuecmd()
ata: libata-core: do not issue non-internal commands once EH is pending
drm/display: Don't assume dual mode adaptors support i2c sub-addressing
nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH
nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro
nvme-pci: disable namespace identifiers for the MAXIO MAP1001
nvme-pci: disable write zeroes on various Kingston SSD
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000
iio: ms5611: Simplify IO callback parameters
iio: pressure: ms5611: fixed value compensation bug
ceph: do not update snapshot context when there is no new snapshot
ceph: avoid putting the realm twice when decoding snaps fails
x86/sgx: Create utility to validate user provided offset and length
x86/sgx: Add overflow check in sgx_validate_offset_length()
binder: validate alloc->mm in ->mmap() handler
ceph: Use kcalloc for allocating multiple elements
ceph: fix NULL pointer dereference for req->r_session
wifi: mac80211: fix memory free error when registering wiphy fail
wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support
riscv: dts: sifive unleashed: Add PWM controlled LEDs
audit: fix undefined behavior in bit shift for AUDIT_BIT
wifi: airo: do not assign -1 to unsigned char
wifi: mac80211: Fix ack frame idr leak when mesh has no route
wifi: ath11k: Fix QCN9074 firmware boot on x86
spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run
selftests/bpf: Add verifier test for release_reference()
Revert "net: macsec: report real_dev features when HW offloading is enabled"
platform/x86: ideapad-laptop: Disable touchpad_switch
platform/x86: touchscreen_dmi: Add info for the RCA Cambio W101 v2 2-in-1
platform/x86/intel/pmt: Sapphire Rapids PMT errata fix
platform/x86/intel/hid: Add some ACPI device IDs
scsi: ibmvfc: Avoid path failures during live migration
scsi: scsi_debug: Make the READ CAPACITY response compliant with ZBC
drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017)
block, bfq: fix null pointer dereference in bfq_bio_bfqg()
arm64/syscall: Include asm/ptrace.h in syscall_wrapper header.
nvmet: fix memory leak in nvmet_subsys_attr_model_store_locked
Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly""
ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue
RISC-V: vdso: Do not add missing symbols to version section in linker script
MIPS: pic32: treat port as signed integer
xfrm: fix "disable_policy" on ipv4 early demux
xfrm: replay: Fix ESN wrap around for GSO
af_key: Fix send_acquire race with pfkey_register
ARM: dts: am335x-pcm-953: Define fixed regulators in root node
ASoC: hdac_hda: fix hda pcm buffer overflow issue
ASoC: sgtl5000: Reset the CHIP_CLK_CTRL reg on remove
ASoC: soc-pcm: Don't zero TDM masks in __soc_pcm_open()
x86/hyperv: Restore VP assist page after cpu offlining/onlining
scsi: storvsc: Fix handling of srb_status and capacity change events
ASoC: max98373: Add checks for devm_kcalloc
regulator: core: fix kobject release warning and memory leak in regulator_register()
spi: dw-dma: decrease reference count in dw_spi_dma_init_mfld()
regulator: core: fix UAF in destroy_regulator()
bus: sunxi-rsb: Remove the shutdown callback
bus: sunxi-rsb: Support atomic transfers
tee: optee: fix possible memory leak in optee_register_device()
ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl
selftests: mptcp: more stable simult_flows tests
selftests: mptcp: fix mibit vs mbit mix up
net: liquidio: simplify if expression
rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc
rxrpc: Use refcount_t rather than atomic_t
rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
net: dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus
nfc/nci: fix race with opening and closing
net: pch_gbe: fix potential memleak in pch_gbe_tx_queue()
9p/fd: fix issue of list_del corruption in p9_fd_cancel()
netfilter: conntrack: Fix data-races around ct mark
netfilter: nf_tables: do not set up extensions for end interval
iavf: Fix a crash during reset task
iavf: Do not restart Tx queues after reset task failure
iavf: Fix race condition between iavf_shutdown and iavf_remove
ARM: mxs: fix memory leak in mxs_machine_init()
ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties
net: ethernet: mtk_eth_soc: fix error handling in mtk_open()
net/mlx4: Check retval of mlx4_bitmap_init
net: mvpp2: fix possible invalid pointer dereference
net/qla3xxx: fix potential memleak in ql3xxx_send()
octeontx2-af: debugsfs: fix pci device refcount leak
net: pch_gbe: fix pci device refcount leak while module exiting
nfp: fill splittable of devlink_port_attrs correctly
nfp: add port from netdev validation for EEPROM access
macsec: Fix invalid error code set
Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work()
Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register()
netfilter: ipset: regression in ip_set_hash_ip.c
net/mlx5: Do not query pci info while pci disabled
net/mlx5: Fix FW tracer timestamp calculation
net/mlx5: Fix handling of entry refcount when command is not issued to FW
tipc: set con sock in tipc_conn_alloc
tipc: add an extra conn_get in tipc_conn_alloc
tipc: check skb_linearize() return value in tipc_disc_rcv()
xfrm: Fix oops in __xfrm_state_delete()
xfrm: Fix ignored return value in xfrm6_init()
net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg()
sfc: fix potential memleak in __ef100_hard_start_xmit()
net: sparx5: fix error handling in sparx5_port_open()
net: sched: allow act_ct to be built without NF_NAT
NFC: nci: fix memory leak in nci_rx_data_packet()
regulator: twl6030: re-add TWL6032_SUBCLASS
bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending()
dma-buf: fix racing conflict of dma_heap_add()
netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface
netfilter: flowtable_offload: add missing locking
fs: do not update freeing inode i_io_list
dccp/tcp: Reset saddr on failure after inet6?_hash_connect().
ipv4: Fix error return code in fib_table_insert()
arcnet: fix potential memory leak in com20020_probe()
s390/dasd: fix no record found for raw_track_access
nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION
nfc: st-nci: fix memory leaks in EVT_TRANSACTION
nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTION
net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled
net: enetc: cache accesses to &priv->si->hw
net: enetc: preserve TX ring priority across reconfiguration
octeontx2-pf: Add check for devm_kcalloc
octeontx2-af: Fix reference count issue in rvu_sdp_init()
net: thunderx: Fix the ACPI memory leak
s390/crashdump: fix TOD programmable field size
lib/vdso: use "grep -E" instead of "egrep"
init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash
nios2: add FORCE for vmlinuz.gz
mmc: sdhci-brcmstb: Re-organize flags
mmc: sdhci-brcmstb: Enable Clock Gating to save power
mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
KVM: arm64: pkvm: Fixup boot mode to reflect that the kernel resumes from EL1
usb: dwc3: exynos: Fix remove() function
usb: cdnsp: Fix issue with Clear Feature Halt Endpoint
usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1
ext4: fix use-after-free in ext4_ext_shift_extents
arm64: dts: rockchip: lower rk3399-puma-haikou SD controller clock frequency
iio: light: apds9960: fix wrong register for gesture gain
iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails
bus: ixp4xx: Don't touch bit 7 on IXP42x
usb: dwc3: gadget: conditionally remove requests
usb: dwc3: gadget: Return -ESHUTDOWN on ep disable
usb: dwc3: gadget: Clear ep descriptor last
nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
gcov: clang: fix the buffer overflow issue
mm: vmscan: fix extreme overreclaim and swap floods
KVM: x86: nSVM: leave nested mode on vCPU free
KVM: x86: forcibly leave nested mode on vCPU reset
KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
KVM: x86: add kvm_leave_nested
KVM: x86: remove exit_int_info warning in svm_handle_exit
x86/tsx: Add a feature bit for TSX control MSR support
x86/pm: Add enumeration check before spec MSRs save/restore setup
x86/ioremap: Fix page aligned size calculation in __ioremap_caller()
Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode
ASoC: Intel: bytcht_es8316: Add quirk for the Nanote UMPC-01
tools: iio: iio_generic_buffer: Fix read size
serial: 8250: 8250_omap: Avoid RS485 RTS glitch on ->set_termios()
Input: goodix - try resetting the controller when no config is set
Input: soc_button_array - add use_low_level_irq module parameter
Input: soc_button_array - add Acer Switch V 10 to dmi_use_low_level_irq[]
Input: i8042 - apply probe defer to more ASUS ZenBook models
ASoC: stm32: dfsdm: manage cb buffers cleanup
xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
xen/platform-pci: add missing free_irq() in error path
platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr()
platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
drm/amdgpu: disable BACO support on more cards
zonefs: fix zone report size in __zonefs_io_error()
platform/x86: hp-wmi: Ignore Smart Experience App event
platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops
tcp: configurable source port perturb table size
net: usb: qmi_wwan: add Telit 0x103a composition
scsi: iscsi: Fix possible memory leak when device_register() failed
gpu: host1x: Avoid trying to use GART on Tegra20
dm integrity: flush the journal on suspend
dm integrity: clear the journal on suspend
fuse: lock inode unconditionally in fuse_fallocate()
wifi: wilc1000: validate pairwise and authentication suite offsets
wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_OPER_CHANNEL attribute
wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_CHANNEL_LIST attribute
wifi: wilc1000: validate number of channels
genirq/msi: Shutdown managed interrupts with unsatifiable affinities
genirq: Always limit the affinity to online CPUs
irqchip/gic-v3: Always trust the managed affinity provided by the core code
genirq: Take the proposed affinity at face value if force==true
btrfs: free btrfs_path before copying root refs to userspace
btrfs: free btrfs_path before copying fspath to userspace
btrfs: free btrfs_path before copying subvol info to userspace
btrfs: zoned: fix missing endianness conversion in sb_write_pointer
btrfs: use kvcalloc in btrfs_get_dev_zone_info
btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs()
drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN
drm/amd/display: No display after resume from WB/CB
drm/amdgpu: Enable Aldebaran devices to report CU Occupancy
drm/amdgpu: always register an MMU notifier for userptr
drm/i915: fix TLB invalidation for Gen12 video and compute engines
cifs: fix missed refcounting of ipc tcon
Linux 5.15.81
Change-Id: I9fe677a5bc83d3484f86aebbe20dd129c7ca3a42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
From: Marc Zyngier <maz@kernel.org>
commit c48c8b829d2b966a6649827426bcdba082ccf922 upstream.
Although setting the affinity of an interrupt to a set of CPUs that doesn't
have any online CPU is generally frowned apon, there are a few limited
cases where such affinity is set from a CPUHP notifier, setting the
affinity to a CPU that isn't online yet.
The saving grace is that this is always done using the 'force' attribute,
which gives a hint that the affinity setting can be outside of the online
CPU mask and the callsite set this flag with the knowledge that the
underlying interrupt controller knows to handle it.
This restores the expected behaviour on Marek's system.
Fixes: 33de0aa4bae9 ("genirq: Always limit the affinity to online CPUs")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com
Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org
Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From: Marc Zyngier <maz@kernel.org>
commit 33de0aa4bae982ed6f7c777f86b5af3e627ac937 upstream.
[ Fixed small conflicts due to the HK_FLAG_MANAGED_IRQ flag been
renamed on upstream ]
When booting with maxcpus=<small number> (or even loading a driver
while most CPUs are offline), it is pretty easy to observe managed
affinities containing a mix of online and offline CPUs being passed
to the irqchip driver.
This means that the irqchip cannot trust the affinity passed down
from the core code, which is a bit annoying and requires (at least
in theory) all drivers to implement some sort of affinity narrowing.
In order to address this, always limit the cpumask to the set of
online CPUs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220405185040.206297-3-maz@kernel.org
Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.39
MIPS: Fix CP0 counter erratum detection for R4k CPUs
parisc: Merge model and model name into one line in /proc/cpuinfo
ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers
ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes
mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
mmc: core: Set HS clock speed before sending HS CMD13
gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
x86/fpu: Prevent FPU state corruption
KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id
iommu/vt-d: Calculate mask for non-aligned flushes
iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()
drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
drm/amdgpu: do not use passthrough mode in Xen dom0
RISC-V: relocate DTB if it's outside memory region
Revert "SUNRPC: attempt AF_LOCAL connect on setup"
timekeeping: Mark NMI safe time accessors as notrace
firewire: fix potential uaf in outbound_phy_packet_callback()
firewire: remove check of list iterator against head past the loop body
firewire: core: extend card->lock in fw_core_handle_bus_reset
net: stmmac: disable Split Header (SPH) for Intel platforms
genirq: Synchronize interrupt thread startup
ASoC: da7219: Fix change notifications for tone generator frequency
ASoC: wm8958: Fix change notifications for DSP controls
ASoC: meson: Fix event generation for AUI ACODEC mux
ASoC: meson: Fix event generation for G12A tohdmi mux
ASoC: meson: Fix event generation for AUI CODEC mux
s390/dasd: fix data corruption for ESE devices
s390/dasd: prevent double format of tracks for ESE devices
s390/dasd: Fix read for ESE with blksize < 4k
s390/dasd: Fix read inconsistency for ESE DASD devices
can: grcan: grcan_close(): fix deadlock
can: isotp: remove re-binding of bound socket
can: grcan: use ofdev->dev when allocating DMA memory
can: grcan: grcan_probe(): fix broken system id check for errata workaround needs
can: grcan: only use the NAPI poll budget for RX
nfc: replace improper check device_is_registered() in netlink related functions
nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
NFC: netlink: fix sleep in atomic bug when firmware download timeout
gpio: visconti: Fix fwnode of GPIO IRQ
gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)
hwmon: (adt7470) Fix warning on module removal
hwmon: (pmbus) disable PEC if not enabled
ASoC: dmaengine: Restore NULL prepare_slave_config() callback
ASoC: soc-ops: fix error handling
iommu/vt-d: Drop stop marker messages
iommu/dart: check return value after calling platform_get_resource()
net/mlx5e: Fix trust state reset in reload
net/mlx5e: Don't match double-vlan packets if cvlan is not set
net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release
net/mlx5e: Fix the calling of update_buffer_lossy() API
net/mlx5: Avoid double clear or set of sync reset requested
net/mlx5: Fix deadlock in sync reset flow
selftests/seccomp: Don't call read() on TTY from background pgrp
SUNRPC release the transport of a relocated task with an assigned transport
RDMA/siw: Fix a condition race issue in MPA request processing
RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
RDMA/irdma: Reduce iWARP QP destroy time
RDMA/irdma: Fix possible crash due to NULL netdev in notifier
NFSv4: Don't invalidate inode attributes on delegation return
net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init()
net: dsa: mt7530: add missing of_node_put() in mt7530_setup()
net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux()
net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller
net: cpsw: add missing of_node_put() in cpsw_probe_dt()
net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
net: emaclite: Add error handling for of_address_to_resource()
selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems
selftests/net: so_txtime: usage(): fix documentation of default clock
drm/msm/dp: remove fail safe mode related code
btrfs: do not BUG_ON() on failure to update inode when setting xattr
hinic: fix bug of wq out of bound access
mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
rxrpc: Enable IPv6 checksums on transport socket
selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational
bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag
bnxt_en: Fix unnecessary dropping of RX packets
selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
smsc911x: allow using IRQ0
btrfs: force v2 space cache usage for subpage mount
btrfs: always log symlinks in full mode
drm/amdgpu: unify BO evicting method in amdgpu_ttm
drm/amdgpu: explicitly check for s0ix when evicting resources
drm/amdgpu: don't set s3 and s0ix at the same time
drm/amdgpu: Ensure HDA function is suspended before ASIC reset
gpio: mvebu: drop pwm base assignment
kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU
fbdev: Make fb_release() return -ENODEV if fbdev was unregistered
net/mlx5: Fix slab-out-of-bounds while reading resource dump menu
net/mlx5e: Lag, Fix use-after-free in fib event handler
net/mlx5e: Lag, Fix fib_info pointer assignment
net/mlx5e: Lag, Don't skip fib events on current dst
iommu/dart: Add missing module owner to ops structure
kvm: selftests: do not use bitfields larger than 32-bits for PTEs
KVM: selftests: Silence compiler warning in the kvm_page_table_test
x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
KVM: x86: Do not change ICR on write to APIC_SELF_IPI
KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised
selftest/vm: verify mmap addr in mremap_test
selftest/vm: verify remap destination address in mremap_test
mmc: rtsx: add 74 Clocks in power on flow
Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
rcu: Fix callbacks processing time limit retaining cond_resched()
rcu: Apply callbacks processing time limit only on softirq
PCI: pci-bridge-emul: Add description for class_revision field
PCI: pci-bridge-emul: Add definitions for missing capabilities registers
PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge
PCI: aardvark: Clear all MSIs at setup
PCI: aardvark: Comment actions in driver remove method
PCI: aardvark: Disable bus mastering when unbinding driver
PCI: aardvark: Mask all interrupts when unbinding driver
PCI: aardvark: Fix memory leak in driver unbind
PCI: aardvark: Assert PERST# when unbinding driver
PCI: aardvark: Disable link training when unbinding driver
PCI: aardvark: Disable common PHY when unbinding driver
PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_*
PCI: aardvark: Rewrite IRQ code to chained IRQ handler
PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ
PCI: aardvark: Make MSI irq_chip structures static driver structures
PCI: aardvark: Make msi_domain_info structure a static driver structure
PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node)
PCI: aardvark: Refactor unmasking summary MSI interrupt
PCI: aardvark: Add support for masking MSI interrupts
PCI: aardvark: Fix setting MSI address
PCI: aardvark: Enable MSI-X support
PCI: aardvark: Add support for ERR interrupt on emulated bridge
PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge
PCI: aardvark: Add support for PME interrupts
PCI: aardvark: Fix support for PME requester on emulated bridge
PCI: aardvark: Use separate INTA interrupt for emulated root bridge
PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts
PCI: aardvark: Don't mask irq when mapping
PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy()
PCI: aardvark: Update comment about link going down after link-up
Linux 5.15.39
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibd053f0980e7d7eb5659ec6cfa7689c6d9bc416e
commit 8707898e22fd665bc1d7b18b809be4b56ce25bdd upstream.
A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:
setserial
open("/dev/ttyXXX")
request_irq()
do_stuff()
-> serial interrupt
-> wake(irq_thread)
desc->threads_active++;
close()
free_irq()
kthread_stop(irq_thread)
synchronize_irq() <- hangs because desc->threads_active != 0
The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.
This problem was introduced with commit 519cc8652b, which addressed a
interrupt sharing issue in the PCIe code.
Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.
To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.
This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.
Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().
This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.
[ tglx: Amended changelog ]
Fixes: 519cc8652b ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vendor kernel modules may implement irq balancers, which could
take irq desc lock of an irq and then based on current affinity
mask or affinity hint, reconfigure the affinity of that irq.
For example : For an irq, for which affinity is broken i.e. all
the cpus in its affinity mask have gone offline. For such irqs,
we might want to reset the affinity, when the original set of
affined cpus, come back online. desc->affinity_hint can be used
for figuring out the original affinity. So, the sequence for doing
this becomes:
desc = irq_to_desc(i);
raw_spin_lock(&desc->lock);
affinity = desc->affinity_hint;
raw_spin_unlock(&desc->lock);
irq_set_affinity_hint(i, affinity);
Here, we need to release the desc lock before calling the exported
api irq_set_affinity_hint(). This creates a window where, after
unlocking desc lock and before calling irq_set_affinity_hint(),
where this setting can race with other irq_set_affinity_hint()
callers. So, export irq_do_set_affinity() symbol to provide an
api, which can be called with desc lock held.
Bug: 187157600
Change-Id: Ifad88bfaa1e7eec09c3fe5a9dd7d1d421362b41e
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
(cherry picked from commit d88c1e77fd)
Perf modules abuse irq_set_affinity_hint() to set the affinity of system
PMU interrupts just because irq_set_affinity() was not exported.
The fact that irq_set_affinity_hint() actually sets the affinity is a
non-documented side effect and the name is clearly saying it's a hint.
To clean this up, export the real affinity setter.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210518093117.968251441@linutronix.de
The whole call to note_interrupt() can be avoided or return early when
interrupts would be marked accordingly. For IPI handlers which always
return HANDLED the whole procedure is pretty pointless to begin with.
Add a IRQF_NO_DEBUG flag and mark the interrupt accordingly if supplied
when the interrupt is requested.
When noirqdebug is set on the kernel commandline, then the interrupt is
marked unconditionally so that there is only one condition in the hotpath
to evaluate.
[ clg: Add changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/7a8ad02f-63a8-c1aa-fdd1-39d973593d02@kaod.org
Pull irq updates from Thomas Gleixner:
"The usual updates from the irq departement:
Core changes:
- Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can
be cleaned up which either use a seperate mechanism to prevent
auto-enable at request time or have a racy mechanism which disables
the interrupt right after request.
- Get rid of the last usage of irq_create_identity_mapping() and
remove the interface.
- An overhaul of tasklet_disable().
Most usage sites of tasklet_disable() are in task context and
usually in cleanup, teardown code pathes. tasklet_disable()
spinwaits for a tasklet which is currently executed. That's not
only a problem for PREEMPT_RT where this can lead to a live lock
when the disabling task preempts the softirq thread. It's also
problematic in context of virtualization when the vCPU which runs
the tasklet is scheduled out and the disabling code has to spin
wait until it's scheduled back in.
There are a few code pathes which invoke tasklet_disable() from
non-sleepable context. For these a new disable variant which still
spinwaits is provided which allows to switch tasklet_disable() to a
sleep wait mechanism. For the atomic use cases this does not solve
the live lock issue on PREEMPT_RT. That is mitigated by blocking on
the RT specific softirq lock.
- The PREEMPT_RT specific implementation of softirq processing and
local_bh_disable/enable().
On RT enabled kernels soft interrupt processing happens always in
task context and all interrupt handlers, which are not explicitly
marked to be invoked in hard interrupt context are forced into task
context as well. This allows to protect against softirq processing
with a per CPU lock, which in turn allows to make BH disabled
regions preemptible.
Most of the softirq handling code is still shared. The RT/non-RT
specific differences are addressed with a set of inline functions
which provide the context specific functionality. The
local_bh_disable() / local_bh_enable() mechanism are obviously
seperate.
- The usual set of small improvements and cleanups
Driver changes:
- New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt
controllers
- Extended functionality for MStar, STM32 and SC7280 irq chips
- Enhanced robustness for ARM GICv3/4.1 drivers
- The usual set of cleanups and improvements all over the place"
* tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
irqchip/gic-v3: Do not enable irqs when handling spurious interrups
dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller
irqchip: Add support for IDT 79rc3243x interrupt controller
irqdomain: Drop references to recusive irqdomain setup
irqdomain: Get rid of irq_create_strict_mappings()
irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
ARM: PXA: Kill use of irq_create_strict_mappings()
irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection
irqchip/tb10x: Use 'fallthrough' to eliminate a warning
genirq: Reduce irqdebug cacheline bouncing
kernel: Initialize cpumask before parsing
irqchip/wpcm450: Drop COMPILE_TEST
irqchip/irq-mst: Support polarity configuration
irqchip: Add driver for WPCM450 interrupt controller
dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic
dt-bindings: qcom,pdc: Add compatible for sc7280
irqchip/stm32: Add usart instances exti direct event support
irqchip/gic-v3: Fix OF_BAD_ADDR error handling
irqchip/sifive-plic: Mark two global variables __ro_after_init
...
With interrupt force threading all device interrupt handlers are invoked
from kernel threads. Contrary to hard interrupt context the invocation only
disables bottom halfs, but not interrupts. This was an oversight back then
because any code like this will have an issue:
thread(irq_A)
irq_handler(A)
spin_lock(&foo->lock);
interrupt(irq_B)
irq_handler(B)
spin_lock(&foo->lock);
This has been triggered with networking (NAPI vs. hrtimers) and console
drivers where printk() happens from an interrupt which interrupted the
force threaded handler.
Now people noticed and started to change the spin_lock() in the handler to
spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the
interrupt request which in turn breaks RT.
Fix the root cause and not the symptom and disable interrupts before
invoking the force threaded handler which preserves the regular semantics
and the usefulness of the interrupt force threading as a general debugging
tool.
For not RT this is not changing much, except that during the execution of
the threaded handler interrupts are delayed until the handler
returns. Vs. scheduling and softirq processing there is no difference.
For RT kernels there is no issue.
Fixes: 8d32a307e4 ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
Many drivers don't want interrupts enabled automatically via request_irq().
So they are handling this issue by either way of the below two:
(1)
irq_set_status_flags(irq, IRQ_NOAUTOEN);
request_irq(dev, irq...);
(2)
request_irq(dev, irq...);
disable_irq(irq);
The code in the second way is silly and unsafe. In the small time gap
between request_irq() and disable_irq(), interrupts can still come.
The code in the first way is safe though it's subobtimal.
Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to
request_irq() and request_nmi(). It prevents the automatic enabling of the
requested interrupt/nmi in the same safe way as #1 above. With that the
various usage sites of #1 and #2 above can be simplified and corrected.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: dmitry.torokhov@gmail.com
Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com
This function uses irq_to_desc() and is going to be used by modules to
replace the open coded irq_to_desc() (ab)usage. The final goal is to remove
the export of irq_to_desc() so driver cannot fiddle with it anymore.
Move it into the core code and fixup the usage sites to include the proper
header.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201210194042.548936472@linutronix.de
Add a function to allow the affinity of an interrupt be switched to
managed, such that interrupts allocated for platform devices may be
managed.
This new interface has certain limitations, and attempts to use it in the
following circumstances will fail:
- For when the kernel is configured for generic IRQ reservation mode (in
config GENERIC_IRQ_RESERVATION_MODE). The reason being that it could
conflict with managed vs. non-managed interrupt accounting.
- The interrupt is already started, which should not be the case during
init
- The interrupt is already configured as managed, which means double init
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1606905417-183214-2-git-send-email-john.garry@huawei.com
A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.
Clean this up properly, and define a proper enum for the notification
mode. Now we have:
- TWA_NONE. This is 0, same as before the original change, meaning no
notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
notification.
Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.
Fixes: e91b481623 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull sched/fifo updates from Ingo Molnar:
"This adds the sched_set_fifo*() encapsulation APIs to remove static
priority level knowledge from non-scheduler code.
The three APIs for non-scheduler code to set SCHED_FIFO are:
- sched_set_fifo()
- sched_set_fifo_low()
- sched_set_normal()
These are two FIFO priority levels: default (high), and a 'low'
priority level, plus sched_set_normal() to set the policy back to
non-SCHED_FIFO.
Since the changes affect a lot of non-scheduler code, we kept this in
a separate tree"
* tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
sched,tracing: Convert to sched_set_fifo()
sched: Remove sched_set_*() return value
sched: Remove sched_setscheduler*() EXPORTs
sched,psi: Convert to sched_set_fifo_low()
sched,rcutorture: Convert to sched_set_fifo_low()
sched,rcuperf: Convert to sched_set_fifo_low()
sched,locktorture: Convert to sched_set_fifo()
sched,irq: Convert to sched_set_fifo()
sched,watchdog: Convert to sched_set_fifo()
sched,serial: Convert to sched_set_fifo()
sched,powerclamp: Convert to sched_set_fifo()
sched,ion: Convert to sched_set_normal()
sched,powercap: Convert to sched_set_fifo*()
sched,spi: Convert to sched_set_fifo*()
sched,mmc: Convert to sched_set_fifo*()
sched,ivtv: Convert to sched_set_fifo*()
sched,drm/scheduler: Convert to sched_set_fifo*()
sched,msm: Convert to sched_set_fifo*()
sched,psci: Convert to sched_set_fifo*()
sched,drbd: Convert to sched_set_fifo*()
...
John reported that on a RK3288 system the perf per CPU interrupts are all
affine to CPU0 and provided the analysis:
"It looks like what happens is that because the interrupts are not per-CPU
in the hardware, armpmu_request_irq() calls irq_force_affinity() while
the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
IRQF_NOBALANCING.
Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
irq_setup_affinity() which returns early because IRQF_PERCPU and
IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case. As contrary to the
initial analysis not all interrupt chip drivers implement an activate
callback, the safe cure is to make the deferred interrupt affinity setting
at activation time opt-in.
Implement the necessary core logic and make the two irqchip implementations
for which this is required opt-in. In hindsight this would have been the
right thing to do, but ...
Fixes: baedb87d1b ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
Setting interrupt affinity on inactive interrupts is inconsistent when
hierarchical irq domains are enabled. The core code should just store the
affinity and not call into the irq chip driver for inactive interrupts
because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which
causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in
the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask
- Update the effective affinity to reflect that so user space has
a consistent view
- Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly
because the affinity setting is established in the irq chip when the
interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled
by the architecture. Doing it unconditionally would break legacy irq chip
implementations.
For hierarchial irq domains this works correctly as none of the drivers can
have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Reported-by: Ali Saidi <alisaidi@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
Because SCHED_FIFO is a broken scheduler model (see previous patches)
take away the priority field, the kernel can't possibly make an
informed decision.
Effectively no change.
Cc: tglx@linutronix.de
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
irq_data_get_irq_chip() can return NULL, however it is expected that this
never happens. If a buggy driver leads to NULL being returned from
irq_data_get_irq_chip(), warn about it instead of crashing the machine.
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To: linux-arm-kernel@lists.infradead.org
The handling of notify->work did not properly maintain notify->kref in two
cases:
1) where the work was already scheduled, another irq_set_affinity_locked()
would get the ref and (no-op-ly) schedule the work. Thus when
irq_affinity_notify() ran, it would drop the original ref but not the
additional one.
2) when cancelling the (old) work in irq_set_affinity_notifier(), if there
was outstanding work a ref had been got for it but was never put.
Fix both by checking the return values of the work handling functions
(schedule_work() for (1) and cancel_work_sync() for (2)) and put the
extra ref if the return value indicates preexisting work.
Fixes: cd7eab44e9 ("genirq: Add IRQ affinity notifiers")
Fixes: 59c39840f5 ("genirq: Prevent use-after-free and work list corruption")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Link: https://lkml.kernel.org/r/24f5983f-2ab5-e83a-44ee-a45b5f9300f5@solarflare.com
Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code,
which catches cases where the affinity setting is not done on the CPU which
is the current target of the interrupt, triggers during CPU hotplug stress
testing.
It turns out that the warning which was added with the commit addressing
the MSI affinity race unearthed yet another long standing bug.
If user space writes a bogus affinity mask, i.e. it contains no online CPUs,
then it calls irq_select_affinity_usr(). This was introduced for ALPHA in
eee45269b0 ("[PATCH] Alpha: convert to generic irq framework (generic part)")
and subsequently made available for all architectures in
1840475676 ("genirq: Expose default irq affinity mask (take 3)")
which introduced the circumvention of the affinity setting restrictions for
interrupt which cannot be moved in process context.
The whole exercise is bogus in various aspects:
1) If the interrupt is already started up then there is absolutely
no point to honour a bogus interrupt affinity setting from user
space. The interrupt is already assigned to an online CPU and it
does not make any sense to reassign it to some other randomly
chosen online CPU.
2) If the interupt is not yet started up then there is no point
either. A subsequent startup of the interrupt will invoke
irq_setup_affinity() anyway which will chose a valid target CPU.
So the only correct solution is to just return -EINVAL in case user space
wrote an affinity mask which does not contain any online CPUs, except for
ALPHA which has it's own magic sauce for this.
Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de
There's some confusion around if an irq that's disabled with disable_irq()
can still wake the system from sleep states such as "suspend to RAM".
Clarify this in the kernel documentation for irq_set_irq_wake() so that
it's clear that an irq can be disabled and still wake the system if it has
been marked for wakeup.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lkml.kernel.org/r/20200206191521.94559-1-swboyd@chromium.org
The affinity of managed interrupts is completely handled in the kernel and
cannot be changed via the /proc/irq/* interfaces from user space. As the
kernel tries to spread out interrupts evenly accross CPUs on x86 to prevent
vector exhaustion, it can happen that a managed interrupt whose affinity
mask contains both isolated and housekeeping CPUs is routed to an isolated
CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
on the isolated CPU.
Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
logic in the interrupt affinity selection code.
The subparameter indicates to the interrupt affinity selection logic that
it should try to avoid the above scenario.
This isolation is best effort and only effective if the automatically
assigned interrupt mask of a device queue contains isolated and
housekeeping CPUs. If housekeeping CPUs are online then such interrupts are
directed to the housekeeping CPU so that IO submitted on the housekeeping
CPU cannot disturb the isolated CPU.
If a queue's affinity mask contains only isolated CPUs then this parameter
has no effect on the interrupt routing decision, though interrupts are only
happening when tasks running on those isolated CPUs submit IO. IO submitted
on housekeeping CPUs has no influence on those queues.
If the affinity mask contains both housekeeping and isolated CPUs, but none
of the contained housekeeping CPUs is online, then the interrupt is also
routed to an isolated CPU. Interrupts are only delivered when one of the
isolated CPUs in the affinity mask submits IO. If one of the contained
housekeeping CPUs comes online, the CPU hotplug logic migrates the
interrupt automatically back to the upcoming housekeeping CPU. Depending on
the type of interrupt controller, this can require that at least one
interrupt is delivered to the isolated CPU in order to complete the
migration.
[ tglx: Removed unused parameter, added and edited comments/documentation
and rephrased the changelog so it contains more details. ]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com
Requesting a threaded IRQ with handler=NULL and !ONESHOT fails, but the
error message does not include the IRQ line name, which makes it harder to
find the offending driver.
Print the IRQ line name to clarify where the error comes from. Use the same
format as the other pr_err() above in the same function.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191105140854.27893-1-luca@lucaceresoli.net
Pull core irq updates from Thomas Gleixner:
"Updates from the irq departement:
- Update the interrupt spreading code so it handles numa node with
different CPU counts properly.
- A large overhaul of the ARM GiCv3 driver to support new PPI and SPI
ranges.
- Conversion of all alloc_fwnode() users to use physical addresses
instead of virtual addresses so the virtual addresses are not
leaked. The physical address is sufficient to identify the
associated interrupt chip.
- Add support for Marvel MMP3, Amlogic Meson SM1 interrupt chips.
- Enforce interrupt threading at compile time if RT is enabled.
- Small updates and improvements all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
irqchip/gic-v3-its: Fix LPI release for Multi-MSI devices
irqchip/uniphier-aidet: Use devm_platform_ioremap_resource()
irqdomain: Add the missing assignment of domain->fwnode for named fwnode
irqchip/mmp: Coexist with GIC root IRQ controller
irqchip/mmp: Mask off interrupts from other cores
irqchip/mmp: Add missing chained_irq_{enter,exit}()
irqchip/mmp: Do not use of_address_to_resource() to get mux regs
irqchip/meson-gpio: Add support for meson sm1 SoCs
dt-bindings: interrupt-controller: New binding for the meson sm1 SoCs
genirq/affinity: Remove const qualifier from node_to_cpumask argument
genirq/affinity: Spread vectors on node according to nr_cpu ratio
genirq/affinity: Improve __irq_build_affinity_masks()
irqchip: Remove dev_err() usage after platform_get_irq()
irqchip: Add include guard to irq-partition-percpu.h
irqchip/mmp: Do not call irq_set_default_host() on DT platforms
irqchip/gic-v3-its: Remove the redundant set_bit for lpi_map
irqchip/gic-v3: Add quirks for HIP06/07 invalid GICD_TYPER erratum 161010803
irqchip/gic: Skip DT quirks when evaluating IIDR-based quirks
irqchip/gic-v3: Warn about inconsistent implementations of extended ranges
irqchip/gic-v3: Add EPPI range support
...
Pull x96 apic updates from Thomas Gleixner:
"Updates for the x86 APIC interrupt handling and APIC timer:
- Fix a long standing issue with spurious interrupts which was caused
by the big vector management rework a few years ago. Robert Hodaszi
provided finally enough debug data and an excellent initial failure
analysis which allowed to understand the underlying issues.
This contains a change to the core interrupt management code which
is required to handle this correctly for the APIC/IO_APIC. The core
changes are NOOPs for most architectures except ARM64. ARM64 is not
impacted by the change as confirmed by Marc Zyngier.
- Newer systems allow to disable the PIT clock for power saving
causing panic in the timer interrupt delivery check of the IO/APIC
when the HPET timer is not enabled either. While the clock could be
turned on this would cause an endless whack a mole game to chase
the proper register in each affected chipset.
These systems provide the relevant frequencies for TSC, CPU and the
local APIC timer via CPUID and/or MSRs, which allows to avoid the
PIT/HPET based calibration. As the calibration code is the only
usage of the legacy timers on modern systems and is skipped anyway
when the frequencies are known already, there is no point in
setting up the PIT and actually checking for the interrupt delivery
via IO/APIC.
To achieve this on a wide variety of platforms, the CPUID/MSR based
frequency readout has been made more robust, which also allowed to
remove quite some workarounds which turned out to be not longer
required. Thanks to Daniel Drake for analysis, patches and
verification"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Seperate unused system vectors from spurious entry again
x86/irq: Handle spurious interrupt after shutdown gracefully
x86/ioapic: Implement irq_get_irqchip_state() callback
genirq: Add optional hardware synchronization for shutdown
genirq: Fix misleading synchronize_irq() documentation
genirq: Delay deactivation in free_irq()
x86/timer: Skip PIT initialization on modern chipsets
x86/apic: Use non-atomic operations when possible
x86/apic: Make apic_bsp_setup() static
x86/tsc: Set LAPIC timer period to crystal clock frequency
x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'
x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
free_irq() ensures that no hardware interrupt handler is executing on a
different CPU before actually releasing resources and deactivating the
interrupt completely in a domain hierarchy.
But that does not catch the case where the interrupt is on flight at the
hardware level but not yet serviced by the target CPU. That creates an
interesing race condition:
CPU 0 CPU 1 IRQ CHIP
interrupt is raised
sent to CPU1
Unable to handle
immediately
(interrupts off,
deep idle delay)
mask()
...
free()
shutdown()
synchronize_irq()
release_resources()
do_IRQ()
-> resources are not available
That might be harmless and just trigger a spurious interrupt warning, but
some interrupt chips might get into a wedged state.
Utilize the existing irq_get_irqchip_state() callback for the
synchronization in free_irq().
synchronize_hardirq() is not using this mechanism as it might actually
deadlock unter certain conditions, e.g. when called with interrupts
disabled and the target CPU is the one on which the synchronization is
invoked. synchronize_irq() uses it because that function cannot be called
from non preemtible contexts as it might sleep.
No functional change intended and according to Marc the existing GIC
implementations where the driver supports the callback should be able
to cope with that core change. Famous last words.
Fixes: 464d12309e ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de
When interrupts are shutdown, they are immediately deactivated in the
irqdomain hierarchy. While this looks obviously correct there is a subtle
issue:
There might be an interrupt in flight when free_irq() is invoking the
shutdown. This is properly handled at the irq descriptor / primary handler
level, but the deactivation might completely disable resources which are
required to acknowledge the interrupt.
Split the shutdown code and deactivate the interrupt after synchronization
in free_irq(). Fixup all other usage sites where this is not an issue to
invoke the combined shutdown_and_deactivate() function instead.
This still might be an issue if the interrupt in flight servicing is
delayed on a remote CPU beyond the invocation of synchronize_irq(), but
that cannot be handled at that level and needs to be handled in the
synchronize_irq() context.
Fixes: f8264e3496 ("irqdomain: Introduce new interfaces to support hierarchy irqdomains")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.098196390@linutronix.de
Pull printk updates from Petr Mladek:
- Allow state reset of printk_once() calls.
- Prevent crashes when dereferencing invalid pointers in vsprintf().
Only the first byte is checked for simplicity.
- Make vsprintf warnings consistent and inlined.
- Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
modifiers.
- Some clean up of vsprintf and test_printf code.
* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
lib/vsprintf: Make function pointer_string static
vsprintf: Limit the length of inlined error messages
vsprintf: Avoid confusion between invalid address and value
vsprintf: Prevent crash when dereferencing invalid pointers
vsprintf: Consolidate handling of unknown pointer specifiers
vsprintf: Factor out %pO handler as kobject_string()
vsprintf: Factor out %pV handler as va_format()
vsprintf: Factor out %p[iI] handler as ip_addr_string()
vsprintf: Do not check address of well-known strings
vsprintf: Consistent %pK handling for kptr_restrict == 0
vsprintf: Shuffle restricted_pointer()
printk: Tie printk_once / printk_deferred_once into .data.once for reset
treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
lib/test_printf: Switch to bitmap_zalloc()
When irq_set_affinity_notifier() replaces the notifier, then the
reference count on the old notifier is dropped which causes it to be
freed. But nothing ensures that the old notifier is not longer queued
in the work list. If it is queued this results in a use after free and
possibly in work list corruption.
Ensure that the work is canceled before the reference is dropped.
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: marc.zyngier@arm.com
Link: https://lkml.kernel.org/r/1553439424-6529-1-git-send-email-psodagud@codeaurora.org
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
With -Wimplicit-fallthrough added to CFLAGS:
kernel/irq/manage.c: In function ‘irq_do_set_affinity’:
kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
cpumask_copy(desc->irq_common_data.affinity, mask);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/irq/manage.c:199:2: note: here
case IRQ_SET_MASK_OK_NOCOPY:
^~~~
Annotate it.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190228213714.GA9246@embeddedor
Pull irqchip updates from Marc Zyngier
- Core pseudo-NMI handling code
- Allow the default irq domain to be retrieved
- A new interrupt controller for the Loongson LS1X platform
- Affinity support for the SiFive PLIC
- Better support for the iMX irqsteer driver
- NUMA aware memory allocations for GICv3
- A handful of other fixes (i8259, GICv3, PLIC)
Add support for percpu_devid interrupts treated as NMIs.
Percpu_devid NMIs need to be setup/torn down on each CPU they target.
The same restrictions as for global NMIs still apply for percpu_devid NMIs.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>