In commit d71a92631c ("ANDROID: add support for Clang's Control Flow
Integrity (CFI)") some new symbols were exported, but they should have
been set as _GPL symbols.
Fix this up by properly.
Bug: 145210207
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6ecbb0f3b33f7c02c9b75bb7d80c35ce80e553f3
In commit ff5bf35998 ("ANDROID: bpf: validate bpf_func when BPF_JIT is
enabled with CFI") a new symbol was exported, but it should have been
set as a _GPL symbol.
Fix this up by properly.
Bug: 145210207
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7239bb8e0ef329cd7eac6afcd06c341b17ea680b
Changes in 5.4.36
ext4: fix extent_status fragmentation for plain files
f2fs: fix to avoid memory leakage in f2fs_listxattr
net, ip_tunnel: fix interface lookup with no key
arm64: errata: Hide CTR_EL0.DIC on systems affected by Neoverse-N1 #1542419
arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419
arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space
arm64: Silence clang warning on mismatched value/register sizes
tools/testing/nvdimm: Fix compilation failure without CONFIG_DEV_DAX_PMEM_COMPAT
watchdog: reset last_hw_keepalive time at start
scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login
scsi: lpfc: Fix crash after handling a pci error
scsi: lpfc: Fix crash in target side cable pulls hitting WAIT_FOR_UNREG
scsi: libfc: If PRLI rejected, move rport to PLOGI state
ceph: return ceph_mdsc_do_request() errors from __get_parent()
ceph: don't skip updating wanted caps when cap is stale
pwm: rcar: Fix late Runtime PM enablement
nvme-tcp: fix possible crash in write_zeroes processing
scsi: iscsi: Report unbind session event when the target has been removed
tools/test/nvdimm: Fix out of tree build
ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map()
nvme: fix deadlock caused by ANA update wrong locking
drm/amd/display: Update stream adjust in dc_stream_adjust_vmin_vmax
dma-direct: fix data truncation in dma_direct_get_required_mask()
kernel/gcov/fs.c: gcov_seq_next() should increase position index
selftests: kmod: fix handling test numbers above 9
ipc/util.c: sysvipc_find_ipc() should increase position index
kconfig: qconf: Fix a few alignment issues
lib/raid6/test: fix build on distros whose /bin/sh is not bash
s390/cio: generate delayed uevent for vfio-ccw subchannels
s390/cio: avoid duplicated 'ADD' uevents
loop: Better discard support for block devices
Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
powerpc/pseries: Fix MCE handling on pseries
nvme: fix compat address handling in several ioctls
pwm: renesas-tpu: Fix late Runtime PM enablement
pwm: bcm2835: Dynamically allocate base
perf/core: Disable page faults when getting phys address
drm/amd/display: Calculate scaling ratios on every medium/full update
ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN MPWIN895CL tablet
ALSA: usb-audio: Add Pioneer DJ DJM-250MK2 quirk
xhci: Ensure link state is U3 after setting USB_SS_PORT_LS_U3
xhci: Wait until link state trainsits to U0 after setting USB_SS_PORT_LS_U0
xhci: Finetune host initiated USB3 rootport link suspend and resume
drm/amd/display: Not doing optimize bandwidth if flip pending.
PCI/PM: Add pcie_wait_for_link_delay()
libbpf: Fix readelf output parsing on powerpc with recent binutils
PCI: pciehp: Prevent deadlock on disconnect
ASoC: SOF: trace: fix unconditional free in trace release
tracing/selftests: Turn off timeout setting
virtio-blk: improve virtqueue error to BLK_STS
scsi: smartpqi: fix controller lockup observed during force reboot
scsi: smartpqi: fix call trace in device discovery
scsi: smartpqi: fix problem with unique ID for physical device
PCI/ASPM: Allow re-enabling Clock PM
PCI/PM: Add missing link delays required by the PCIe spec
cxgb4: fix adapter crash due to wrong MC size
cxgb4: fix large delays in PTP synchronization
ipv4: Update fib_select_default to handle nexthop objects
ipv6: fix restrict IPV6_ADDRFORM operation
macsec: avoid to set wrong mtu
macvlan: fix null dereference in macvlan_device_event()
mlxsw: Fix some IS_ERR() vs NULL bugs
net: bcmgenet: correct per TX/RX ring statistics
net/mlx4_en: avoid indirect call in TX completion
net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node
net: openvswitch: ovs_ct_exit to be done under ovs_lock
net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array
net/x25: Fix x25_neigh refcnt leak when receiving frame
sched: etf: do not assume all sockets are full blown
selftests: Fix suppress test in fib_tests.sh
tcp: cache line align MAX_TCP_HEADER
team: fix hang in team_mode_get()
vrf: Fix IPv6 with qdisc and xfrm
net: dsa: b53: Lookup VID in ARL searches when VLAN is enabled
net: dsa: b53: Fix valid setting for MDB entries
net: dsa: b53: Fix ARL register definitions
net: dsa: b53: Rework ARL bin logic
net: dsa: b53: b53_arl_rw_op() needs to select IVL or SVL
vxlan: use the correct nlattr array in NL_SET_ERR_MSG_ATTR
geneve: use the correct nlattr array in NL_SET_ERR_MSG_ATTR
xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish
vrf: Check skb for XFRM_TRANSFORMED flag
KEYS: Avoid false positive ENOMEM error on key read
ALSA: hda: Remove ASUS ROG Zenith from the blacklist
ALSA: usb-audio: Add static mapping table for ALC1220-VB-based mobos
ALSA: usb-audio: Add connector notifier delegation
iio: core: remove extra semi-colon from devm_iio_device_register() macro
iio: st_sensors: rely on odr mask to know if odr can be set
iio: adc: stm32-adc: fix sleep in atomic context
iio: adc: ti-ads8344: properly byte swap value
iio: xilinx-xadc: Fix ADC-B powerdown
iio: xilinx-xadc: Fix clearing interrupt when enabling trigger
iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode
iio: xilinx-xadc: Make sure not exceed maximum samplerate
USB: sisusbvga: Change port variable from signed to unsigned
USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE
USB: early: Handle AMD's spec-compliant identifiers, too
USB: core: Fix free-while-in-use bug in the USB S-Glibrary
USB: hub: Fix handling of connect changes during sleep
USB: hub: Revert commit bd0e6c9614 ("usb: hub: try old enumeration scheme first for high speed devices")
tty: serial: owl: add "much needed" clk_prepare_enable()
vmalloc: fix remap_vmalloc_range() bounds checks
staging: gasket: Fix incongruency in handling of sysfs entries creation
coredump: fix null pointer dereference on coredump
mm/hugetlb: fix a addressing exception caused by huge_pte_offset
mm/ksm: fix NULL pointer dereference when KSM zero page is enabled
tools/vm: fix cross-compile build
ALSA: usx2y: Fix potential NULL dereference
ALSA: hda/realtek - Fix unexpected init_amp override
ALSA: hda/realtek - Add new codec supported for ALC245
ALSA: hda/hdmi: Add module option to disable audio component binding
ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif
ALSA: usb-audio: Filter out unsupported sample rates on Focusrite devices
tpm/tpm_tis: Free IRQ if probing fails
tpm: fix wrong return value in tpm_pcr_extend
tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send()
KVM: s390: Return last valid slot if approx index is out-of-bounds
KVM: Check validity of resolved slot when searching memslots
KVM: VMX: Enable machine check support for 32bit targets
tty: hvc: fix buffer overflow during hvc_alloc().
tty: rocket, avoid OOB access
usb-storage: Add unusual_devs entry for JMicron JMS566
signal: Avoid corrupting si_pid and si_uid in do_notify_parent
audit: check the length of userspace generated audit records
ASoC: dapm: fixup dapm kcontrol widget
mac80211: populate debugfs only after cfg80211 init
SUNRPC: Fix backchannel RPC soft lockups
iwlwifi: pcie: actually release queue memory in TVQM
iwlwifi: mvm: beacon statistics shouldn't go backwards
iwlwifi: mvm: limit maximum queue appropriately
iwlwifi: mvm: Do not declare support for ACK Enabled Aggregation
iwlwifi: mvm: fix inactive TID removal return value usage
cifs: fix uninitialised lease_key in open_shroot()
ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y
powerpc/setup_64: Set cache-line-size based on cache-block-size
staging: comedi: dt2815: fix writing hi byte of analog output
staging: comedi: Fix comedi_device refcnt leak in comedi_open
vt: don't hardcode the mem allocation upper bound
vt: don't use kmalloc() for the unicode screen buffer
staging: vt6656: Don't set RCR_MULTICAST or RCR_BROADCAST by default.
staging: vt6656: Fix calling conditions of vnt_set_bss_mode
staging: vt6656: Fix drivers TBTT timing counter.
staging: vt6656: Fix pairwise key entry save.
staging: vt6656: Power save stop wake_up_count wrap around.
cdc-acm: close race betrween suspend() and acm_softint
cdc-acm: introduce a cool down
UAS: no use logging any details in case of ENODEV
UAS: fix deadlock in error handling and PM flushing work
fpga: dfl: pci: fix return value of cci_pci_sriov_configure
usb: dwc3: gadget: Fix request completion check
usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset()
usb: typec: tcpm: Ignore CC and vbus changes in PORT_RESET change
usb: typec: altmode: Fix typec_altmode_get_partner sometimes returning an invalid pointer
xhci: Fix handling halted endpoint even if endpoint ring appears empty
xhci: prevent bus suspend if a roothub port detected a over-current condition
xhci: Don't clear hub TT buffer on ep0 protocol stall
serial: sh-sci: Make sure status register SCxSR is read in correct sequence
Revert "serial: uartps: Fix uartps_major handling"
Revert "serial: uartps: Use the same dynamic major number for all ports"
Revert "serial: uartps: Fix error path when alloc failed"
Revert "serial: uartps: Do not allow use aliases >= MAX_UART_INSTANCES"
Revert "serial: uartps: Change uart ID port allocation"
Revert "serial: uartps: Move Port ID to device data structure"
Revert "serial: uartps: Register own uart console and driver structures"
powerpc/kuap: PPC_KUAP_DEBUG should depend on PPC_KUAP
powerpc/mm: Fix CONFIG_PPC_KUAP_DEBUG on PPC32
compat: ARM64: always include asm-generic/compat.h
s390/mm: fix page table upgrade vs 2ndary address mode accesses
Linux 5.4.36
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idd8d97e7f00eb7389e184fc4186c4c0dd14f1704
commit 763dafc520 upstream.
Commit 7561252892 ("audit: always check the netlink payload length
in audit_receive_msg()") fixed a number of missing message length
checks, but forgot to check the length of userspace generated audit
records. The good news is that you need CAP_AUDIT_WRITE to submit
userspace audit records, which is generally only given to trusted
processes, so the impact should be limited.
Cc: stable@vger.kernel.org
Fixes: 7561252892 ("audit: always check the netlink payload length in audit_receive_msg()")
Reported-by: syzbot+49e69b4d71a420ceda3e@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61e713bdca upstream.
Christof Meerwald <cmeerw@cmeerw.org> writes:
> Hi,
>
> this is probably related to commit
> 7a0cf09494 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.
>
> A simple test program (based on the sample from the signalfd man page):
>
> #include <sys/signalfd.h>
> #include <signal.h>
> #include <unistd.h>
> #include <spawn.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define handle_error(msg) \
> do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int main(int argc, char *argv[])
> {
> sigset_t mask;
> int sfd;
> struct signalfd_siginfo fdsi;
> ssize_t s;
>
> sigemptyset(&mask);
> sigaddset(&mask, SIGCHLD);
>
> if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> handle_error("sigprocmask");
>
> pid_t chldpid;
> char *chldargv[] = { "./sfdclient", NULL };
> posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
>
> sfd = signalfd(-1, &mask, 0);
> if (sfd == -1)
> handle_error("signalfd");
>
> for (;;) {
> s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> if (s != sizeof(struct signalfd_siginfo))
> handle_error("read");
>
> if (fdsi.ssi_signo == SIGCHLD) {
> printf("Got SIGCHLD %d %d %d %d\n",
> fdsi.ssi_status, fdsi.ssi_code,
> fdsi.ssi_uid, fdsi.ssi_pid);
> return 0;
> } else {
> printf("Read unexpected signal\n");
> }
> }
> }
>
>
> and a multi-threaded client to test with:
>
> #include <unistd.h>
> #include <pthread.h>
>
> void *f(void *arg)
> {
> sleep(100);
> }
>
> int main()
> {
> pthread_t t[8];
>
> for (int i = 0; i != 8; ++i)
> {
> pthread_create(&t[i], NULL, f, NULL);
> }
> }
>
> I tried to do a bit of debugging and what seems to be happening is
> that
>
> /* From an ancestor pid namespace? */
> if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
>
> fails inside task_pid_nr_ns because the check for "pid_alive" fails.
>
> This code seems to be called from do_notify_parent and there we
> actually have "tsk != current" (I am assuming both are threads of the
> current process?)
I instrumented the code with a warning and received the following backtrace:
> WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> Modules linked in:
> CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> FS: 0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> Call Trace:
> send_signal+0x1c8/0x310
> do_notify_parent+0x50f/0x550
> release_task.part.21+0x4fd/0x620
> do_exit+0x6f6/0xaf0
> do_group_exit+0x42/0xb0
> get_signal+0x13b/0xbb0
> do_signal+0x2b/0x670
> ? __audit_syscall_exit+0x24d/0x2b0
> ? rcu_read_lock_sched_held+0x4d/0x60
> ? kfree+0x24c/0x2b0
> do_syscall_64+0x176/0x640
> ? trace_hardirqs_off_thunk+0x1a/0x1c
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
The immediate problem is as Christof noticed that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.
The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie. So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.
So change do_notify_parent to call __send_signal directly.
Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30. This fix only
backports to 7a0cf09494 ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.
Cc: stable@vger.kernel.org
Fixes: 6588c1e3ff ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Ref: 921cf9f630 ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Reported-by: Christof Meerwald <cmeerw@cmeerw.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3296fb372 ]
We hit following warning when running tests on kernel
compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y:
WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200
CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3
RIP: 0010:__get_user_pages_fast+0x1a4/0x200
...
Call Trace:
perf_prepare_sample+0xff1/0x1d90
perf_event_output_forward+0xe8/0x210
__perf_event_overflow+0x11a/0x310
__intel_pmu_pebs_event+0x657/0x850
intel_pmu_drain_pebs_nhm+0x7de/0x11d0
handle_pmi_common+0x1b2/0x650
intel_pmu_handle_irq+0x17b/0x370
perf_event_nmi_handler+0x40/0x60
nmi_handle+0x192/0x590
default_do_nmi+0x6d/0x150
do_nmi+0x2f9/0x3c0
nmi+0x8e/0xd7
While __get_user_pages_fast() is IRQ-safe, it calls access_ok(),
which warns on:
WARN_ON_ONCE(!in_task() && !pagefault_disabled())
Peter suggested disabling page faults around __get_user_pages_fast(),
which gets rid of the warning in access_ok() call.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200407141427.3184722-1-jolsa@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cdcda0d1f8 ]
The upper 32-bit physical address gets truncated inadvertently
when dma_direct_get_required_mask() invokes phys_to_dma_direct().
This results in dma_addressing_limited() return incorrect value
when used in platforms with LPAE enabled.
Fix it here by explicitly type casting 'max_pfn' to phys_addr_t
in order to prevent overflow of intermediate value while evaluating
'(max_pfn - 1) << PAGE_SHIFT'.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.4.35
ext4: use non-movable memory for superblock readahead
watchdog: sp805: fix restart handler
xsk: Fix out of boundary write in __xsk_rcv_memcpy
arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0
arm, bpf: Fix offset overflow for BPF_MEM BPF_DW
objtool: Fix switch table detection in .text.unlikely
scsi: sg: add sg_remove_request in sg_common_write
ALSA: hda: Honor PM disablement in PM freeze and thaw_noirq ops
ARM: dts: imx6: Use gpc for FEC interrupt controller to fix wake on LAN.
kbuild, btf: Fix dependencies for DEBUG_INFO_BTF
netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type
irqchip/mbigen: Free msi_desc on device teardown
ALSA: hda: Don't release card at firmware loading error
xsk: Add missing check on user supplied headroom size
of: unittest: kmemleak on changeset destroy
of: unittest: kmemleak in of_unittest_platform_populate()
of: unittest: kmemleak in of_unittest_overlay_high_level()
of: overlay: kmemleak in dup_and_fixup_symbol_prop()
x86/Hyper-V: Unload vmbus channel in hv panic callback
x86/Hyper-V: Trigger crash enlightenment only once during system crash.
x86/Hyper-V: Report crash register data or kmsg before running crash kernel
x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
x86/Hyper-V: Report crash data in die() when panic_on_oops is set
afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus()
afs: Fix decoding of inline abort codes from version 1 status records
afs: Fix rename operation status delivery
afs: Fix afs_d_validate() to set the right directory version
afs: Fix race between post-modification dir edit and readdir/d_revalidate
block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup
block, bfq: make reparent_leaf_entity actually work only on leaf entities
block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline
rbd: avoid a deadlock on header_rwsem when flushing notifies
rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
drm/ttm: flush the fence on the bo after we individualize the reservation object
clk: Don't cache errors from clk_ops::get_phase()
clk: at91: usb: continue if clk_hw_round_rate() return zero
net/mlx5e: Enforce setting of a single FEC mode
f2fs: fix the panic in do_checkpoint()
ARM: dts: rockchip: fix vqmmc-supply property name for rk3188-bqedison2qc
arm64: dts: allwinner: a64: Fix display clock register range
power: supply: bq27xxx_battery: Silence deferred-probe error
clk: tegra: Fix Tegra PMC clock out parents
arm64: tegra: Add PCIe endpoint controllers nodes for Tegra194
arm64: tegra: Fix Tegra194 PCIe compatible string
arm64: dts: clearfog-gt-8k: set gigabit PHY reset deassert delay
soc: imx: gpc: fix power up sequencing
dma-coherent: fix integer overflow in the reserved-memory dma allocation
rtc: 88pm860x: fix possible race condition
NFS: alloc_nfs_open_context() must use the file cred when available
NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
NFSv4.2: error out when relink swapfile
ARM: dts: rockchip: fix lvds-encoder ports subnode for rk3188-bqedison2qc
KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
f2fs: fix to show norecovery mount option
phy: uniphier-usb3ss: Add Pro5 support
NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails
f2fs: Fix mount failure due to SPO after a successful online resize FS
f2fs: Add a new CP flag to help fsck fix resize SPO issues
s390/cpuinfo: fix wrong output when CPU0 is offline
hibernate: Allow uswsusp to write to swap
btrfs: add RCU locks around block group initialization
powerpc/prom_init: Pass the "os-term" message to hypervisor
powerpc/maple: Fix declaration made after definition
s390/cpum_sf: Fix wrong page count in error message
ext4: do not commit super on read-only bdev
um: ubd: Prevent buffer overrun on command completion
cifs: Allocate encryption header through kmalloc
mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
drm/nouveau/svm: check for SVM initialized before migrating
drm/nouveau/svm: fix vma range check for migration
include/linux/swapops.h: correct guards for non_swap_entry()
percpu_counter: fix a data race at vm_committed_as
compiler.h: fix error in BUILD_BUG_ON() reporting
KVM: s390: vsie: Fix possible race when shadowing region 3 tables
drm/nouveau: workaround runpm fail by disabling PCI power management on certain intel bridges
leds: core: Fix warning message when init_data
x86: ACPI: fix CPU hotplug deadlock
csky: Fixup cpu speculative execution to IO area
drm/amdkfd: kfree the wrong pointer
NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
csky: Fixup get wrong psr value from phyical reg
f2fs: fix NULL pointer dereference in f2fs_write_begin()
ACPICA: Fixes for acpiExec namespace init file
um: falloc.h needs to be directly included for older libc
drm/vc4: Fix HDMI mode validation
iommu/virtio: Fix freeing of incomplete domains
iommu/vt-d: Fix mm reference leak
SUNRPC: fix krb5p mount to provide large enough buffer in rq_rcvsize
ext2: fix empty body warnings when -Wextra is used
iommu/vt-d: Silence RCU-list debugging warning in dmar_find_atsr()
iommu/vt-d: Fix page request descriptor size
ext2: fix debug reference to ext2_xattr_cache
sunrpc: Fix gss_unwrap_resp_integ() again
csky: Fixup init_fpu compile warning with __init
power: supply: axp288_fuel_gauge: Broaden vendor check for Intel Compute Sticks.
libnvdimm: Out of bounds read in __nd_ioctl()
iommu/amd: Fix the configuration of GCR3 table root pointer
f2fs: fix to wait all node page writeback
drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
net: dsa: bcm_sf2: Fix overflow checks
dma-debug: fix displaying of dma allocation type
fbdev: potential information leak in do_fb_ioctl()
ARM: dts: sunxi: Fix DE2 clocks register range
iio: si1133: read 24-bit signed integer for measurement
fbmem: Adjust indentation in fb_prepare_logo and fb_blank
tty: evh_bytechan: Fix out of bounds accesses
locktorture: Print ratio of acquisitions, not failures
mtd: rawnand: free the nand_device object
mtd: spinand: Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
docs: Fix path to MTD command line partition parser
mtd: lpddr: Fix a double free in probe()
mtd: phram: fix a double free issue in error path
KEYS: Don't write out to userspace while holding key semaphore
bpf: fix buggy r0 retval refinement for tracing helpers
bpf: Test_verifier, bpf_get_stack return value add <0
bpf: Test_progs, add test to catch retval refine error handling
bpf, test_verifier: switch bpf_get_stack's 0 s> r8 test
Linux 5.4.35
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I702aba533097c8533c12561c7f1a51f3a96f6f09
[ no upstream commit ]
See the glory details in 100605035e ("bpf: Verifier, do_refine_retval_range
may clamp umin to 0 incorrectly") for why 849fa50662 ("bpf/verifier: refine
retval R0 state for bpf_get_stack helper") is buggy. The whole series however
is not suitable for stable since it adds significant amount [0] of verifier
complexity in order to add 32bit subreg tracking. Something simpler is needed.
Unfortunately, reverting 849fa50662 ("bpf/verifier: refine retval R0 state
for bpf_get_stack helper") or just cherry-picking 100605035e ("bpf: Verifier,
do_refine_retval_range may clamp umin to 0 incorrectly") is not an option since
it will break existing tracing programs badly (at least those that are using
bpf_get_stack() and bpf_probe_read_str() helpers). Not fixing it in stable is
also not an option since on 4.19 kernels an error will cause a soft-lockup due
to hitting dead-code sanitized branch since we don't hard-wire such branches
in old kernels yet. But even then for 5.x 849fa50662 ("bpf/verifier: refine
retval R0 state for bpf_get_stack helper") would cause wrong bounds on the
verifier simluation when an error is hit.
In one of the earlier iterations of mentioned patch series for upstream there
was the concern that just using smax_value in do_refine_retval_range() would
nuke bounds by subsequent <<32 >>32 shifts before the comparison against 0 [1]
which eventually led to the 32bit subreg tracking in the first place. While I
initially went for implementing the idea [1] to pattern match the two shift
operations, it turned out to be more complex than actually needed, meaning, we
could simply treat do_refine_retval_range() similarly to how we branch off
verification for conditionals or under speculation, that is, pushing a new
reg state to the stack for later verification. This means, instead of verifying
the current path with the ret_reg in [S32MIN, msize_max_value] interval where
later bounds would get nuked, we split this into two: i) for the success case
where ret_reg can be in [0, msize_max_value], and ii) for the error case with
ret_reg known to be in interval [S32MIN, -1]. Latter will preserve the bounds
during these shift patterns and can match reg < 0 test. test_progs also succeed
with this approach.
[0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/
[1] https://lore.kernel.org/bpf/158015334199.28573.4940395881683556537.stgit@john-XPS-13-9370/T/#m2e0ad1d5949131014748b6daa48a3495e7f0456d
Fixes: 849fa50662 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
Reported-by: Lorenzo Fontana <fontanalorenz@gmail.com>
Reported-by: Leonardo Di Donato <leodidonato@gmail.com>
Reported-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80c503e0e6 upstream.
The __torture_print_stats() function in locktorture.c carefully
initializes local variable "min" to statp[0].n_lock_acquired, but
then compares it to statp[i].n_lock_fail. Given that the .n_lock_fail
field should normally be zero, and given the initialization, it seems
reasonable to display the maximum and minimum number acquisitions
instead of miscomputing the maximum and minimum number of failures.
This commit therefore switches from failures to acquisitions.
And this turns out to be not only a day-zero bug, but entirely my
own fault. I hate it when that happens!
Fixes: 0af3fe1efa ("locktorture: Add a lock-torture kernel module")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 286c21de32 ]
pageno is an int and the PAGE_SHIFT shift is done on an int,
overflowing if the memory is bigger than 2G
This can be reproduced using for example a reserved-memory of 4G
reserved-memory {
#address-cells = <2>;
#size-cells = <2>;
ranges;
reserved_dma: buffer@0 {
compatible = "shared-dma-pool";
no-map;
reg = <0x5 0x00000000 0x1 0x0>;
};
};
Signed-off-by: Kevin Grandemange <kevin.grandemange@allegrodvt.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.4.34
amd-xgbe: Use __napi_schedule() in BH context
hsr: check protocol version in hsr_newlink()
l2tp: Allow management of tunnels and session in user namespace
net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
net: ipv6: do not consider routes via gateways for anycast address check
net: phy: micrel: use genphy_read_status for KSZ9131
net: qrtr: send msgs from local of same id as broadcast
net: revert default NAPI poll timeout to 2 jiffies
net: tun: record RX queue in skb before do_xdp_generic()
net: dsa: mt7530: move mt7623 settings out off the mt7530
net: ethernet: mediatek: move mt7623 settings out off the mt7530
net/mlx5: Fix frequent ioread PCI access during recovery
net/mlx5e: Add missing release firmware call
net/mlx5e: Fix nest_level for vlan pop action
net/mlx5e: Fix pfnum in devlink port attribute
net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
Revert "ACPI: EC: Do not clear boot_ec_is_ecdt in acpi_ec_add()"
ovl: fix value of i_ino for lower hardlink corner case
scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic
platform/chrome: cros_ec_rpmsg: Fix race with host event
jbd2: improve comments about freeing data buffers whose page mapping is NULL
acpi/nfit: improve bounds checking for 'func'
perf report: Fix no branch type statistics report issue
pwm: pca9685: Fix PWM/GPIO inter-operation
net/bpfilter: remove superfluous testing message
ext4: fix incorrect group count in ext4_fill_super error message
ext4: fix incorrect inodes per group in error message
clk: at91: sam9x60: fix usb clock parents
clk: at91: usb: use proper usbs_mask
ARM: dts: imx7-colibri: fix muxing of usbc_det pin
arm64: dts: librem5-devkit: add a vbus supply to usb0
usb: dwc3: gadget: Don't clear flags before transfer ended
ASoC: Intel: mrfld: fix incorrect check on p->sink
ASoC: Intel: mrfld: return error codes when an error occurs
ALSA: hda/realtek - Enable the headset mic on Asus FX505DT
ALSA: usb-audio: Filter error from connector kctl ops, too
ALSA: usb-audio: Don't override ignore_ctl_error value from the map
ALSA: usb-audio: Don't create jack controls for PCM terminals
ALSA: usb-audio: Check mapping at creating connector controls, too
arm64: vdso: don't free unallocated pages
keys: Fix proc_keys_next to increase position index
tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
btrfs: check commit root generation in should_ignore_root
nl80211: fix NL80211_ATTR_FTM_RESPONDER policy
mac80211: fix race in ieee80211_register_hw()
mac80211_hwsim: Use kstrndup() in place of kasprintf()
net/mlx5e: Encapsulate updating netdev queues into a function
net/mlx5e: Rename hw_modify to preactivate
net/mlx5e: Use preactivate hook to set the indirection table
drm/amd/powerplay: force the trim of the mclk dpm_levels if OD is enabled
drm/amdgpu: fix the hw hang during perform system reboot and reset
i2c: designware: platdrv: Remove DPM_FLAG_SMART_SUSPEND flag on BYT and CHT
ext4: do not zeroout extents beyond i_disksize
irqchip/ti-sci-inta: Fix processing of masked irqs
x86/resctrl: Preserve CDP enable over CPU hotplug
x86/resctrl: Fix invalid attempt at removing the default resource group
scsi: target: remove boilerplate code
scsi: target: fix hang when multiple threads try to destroy the same iscsi session
x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
Linux 5.4.34
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ice46175a7478217e00e649fa26ee8631985746f1
Changes in 5.4.33
ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id mode
bpf: Fix deadlock with rq_lock in bpf_send_signal()
iwlwifi: mvm: Fix rate scale NSS configuration
Input: tm2-touchkey - add support for Coreriver TC360 variant
soc: fsl: dpio: register dpio irq handlers after dpio create
rxrpc: Abstract out the calculation of whether there's Tx space
rxrpc: Fix call interruptibility handling
net: stmmac: platform: Fix misleading interrupt error msg
net: vxge: fix wrong __VA_ARGS__ usage
hinic: fix a bug of waitting for IO stopped
hinic: fix the bug of clearing event queue
hinic: fix out-of-order excution in arm cpu
hinic: fix wrong para of wait_for_completion_timeout
hinic: fix wrong value of MIN_SKB_LEN
selftests/net: add definition for SOL_DCCP to fix compilation errors for old libc
cxgb4/ptp: pass the sign of offset delta in FW CMD
drm/scheduler: fix rare NULL ptr race
cfg80211: Do not warn on same channel at the end of CSA
qlcnic: Fix bad kzalloc null test
i2c: st: fix missing struct parameter description
i2c: pca-platform: Use platform_irq_get_optional
media: rc: add keymap for Videostrong KII Pro
cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
staging: wilc1000: avoid double unlocking of 'wilc->hif_cs' mutex
media: venus: hfi_parser: Ignore HEVC encoding for V1
firmware: arm_sdei: fix double-lock on hibernate with shared events
null_blk: Fix the null_add_dev() error path
null_blk: Handle null_add_dev() failures properly
null_blk: fix spurious IO errors after failed past-wp access
media: imx: imx7_mipi_csis: Power off the source when stopping streaming
media: imx: imx7-media-csi: Fix video field handling
xhci: bail out early if driver can't accress host in resume
ACPI: EC: Do not clear boot_ec_is_ecdt in acpi_ec_add()
x86: Don't let pgprot_modify() change the page encryption bit
dma-mapping: Fix dma_pgprot() for unencrypted coherent pages
block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
debugfs: Check module state before warning in {full/open}_proxy_open()
irqchip/versatile-fpga: Handle chained IRQs properly
time/sched_clock: Expire timer in hardirq context
media: allegro: fix type of gop_length in channel_create message
sched: Avoid scale real weight down to zero
selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
PCI/switchtec: Fix init_completion race condition with poll_wait()
block, bfq: move forward the getting of an extra ref in bfq_bfqq_move
media: i2c: video-i2c: fix build errors due to 'imply hwmon'
libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()
pstore/platform: fix potential mem leak if pstore_init_fs failed
gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty
gfs2: Don't demote a glock until its revokes are written
cpufreq: imx6q: fix error handling
x86/boot: Use unsigned comparison for addresses
efi/x86: Ignore the memory attributes table on i386
genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()
block: Fix use-after-free issue accessing struct io_cq
media: i2c: ov5695: Fix power on and off sequences
usb: dwc3: core: add support for disabling SS instances in park mode
irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency
md: check arrays is suspended in mddev_detach before call quiesce operations
firmware: fix a double abort case with fw_load_sysfs_fallback
spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion
locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
block, bfq: fix use-after-free in bfq_idle_slice_timer_body
btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
btrfs: remove a BUG_ON() from merge_reloc_roots()
btrfs: restart relocate_tree_blocks properly
btrfs: track reloc roots based on their commit root bytenr
ASoC: fix regwmask
ASoC: dapm: connect virtual mux with default value
ASoC: dpcm: allow start or stop during pause for backend
ASoC: topology: use name_prefix for new kcontrol
usb: gadget: f_fs: Fix use after free issue as part of queue failure
usb: gadget: composite: Inform controller driver of self-powered
ALSA: usb-audio: Add mixer workaround for TRX40 and co
ALSA: hda: Add driver blacklist
ALSA: hda: Fix potential access overflow in beep helper
ALSA: ice1724: Fix invalid access for enumerated ctl items
ALSA: pcm: oss: Fix regression by buffer overflow fix
ALSA: hda/realtek: Enable mute LED on an HP system
ALSA: hda/realtek - a fake key event is triggered by running shutup
ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
ALSA: hda/realtek - Add quirk for Lenovo Carbon X1 8th gen
ALSA: hda/realtek - Add quirk for MSI GL63
media: venus: firmware: Ignore secure call error on first resume
media: hantro: Read be32 words starting at every fourth byte
media: ti-vpe: cal: fix disable_irqs to only the intended target
media: ti-vpe: cal: fix a kernel oops when unloading module
seccomp: Add missing compat_ioctl for notify
acpi/x86: ignore unspecified bit positions in the ACPI global lock field
ACPICA: Allow acpi_any_gpe_status_set() to skip one GPE
ACPI: PM: s2idle: Refine active GPEs check
thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
nvmet-tcp: fix maxh2cdata icresp parameter
nvme-fc: Revert "add module to ops template to allow module references"
efi/x86: Add TPM related EFI tables to unencrypted mapping checks
PCI: pciehp: Fix indefinite wait on sysfs requests
PCI/ASPM: Clear the correct bits when enabling L1 substates
PCI: Add boot interrupt quirk mechanism for Xeon chipsets
PCI: qcom: Fix the fixup of PCI_VENDOR_ID_QCOM
PCI: endpoint: Fix for concurrent memory allocation in OB address region
sched/fair: Fix enqueue_task_fair warning
tpm: Don't make log failures fatal
tpm: tpm1_bios_measurements_next should increase position index
tpm: tpm2_bios_measurements_next should increase position index
KEYS: reaching the keys quotas correctly
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
genirq/debugfs: Add missing sanity checks to interrupt injection
irqchip/versatile-fpga: Apply clear-mask earlier
io_uring: remove bogus RLIMIT_NOFILE check in file registration
pstore: pstore_ftrace_seq_next should increase position index
MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3
MIPS: OCTEON: irq: Fix potential NULL pointer dereference
PM / Domains: Allow no domain-idle-states DT property in genpd when parsing
PM: sleep: wakeup: Skip wakeup_source_sysfs_remove() if device is not there
ath9k: Handle txpower changes even when TPC is disabled
signal: Extend exec_id to 64bits
x86/tsc_msr: Use named struct initializers
x86/tsc_msr: Fix MSR_FSB_FREQ mask for Cherry Trail devices
x86/tsc_msr: Make MSR derived TSC frequency more accurate
x86/entry/32: Add missing ASM_CLAC to general_protection entry
platform/x86: asus-wmi: Support laptops where the first battery is named BATT
KVM: nVMX: Properly handle userspace interrupt window request
KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
KVM: s390: vsie: Fix delivery of addressing exceptions
KVM: x86: Allocate new rmap and large page tracking when moving memslot
KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
KVM: x86: Gracefully handle __vmalloc() failure during VM allocation
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: VMX: fix crash cleanup when KVM wasn't used
smb3: fix performance regression with setting mtime
CIFS: Fix bug which the return value by asynchronous read is error
mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers
mtd: spinand: Do not erase the block before writing a bad block marker
btrfs: Don't submit any btree write bio if the fs has errors
Btrfs: fix crash during unmount due to race with delayed inode workers
btrfs: reloc: clean dirty subvols if we fail to start a transaction
btrfs: set update the uuid generation as soon as possible
btrfs: drop block from cache on error in relocation
btrfs: fix missing file extent item for hole after ranged fsync
btrfs: unset reloc control if we fail to recover
btrfs: fix missing semaphore unlock in btrfs_sync_file
btrfs: use nofs allocations for running delayed items
remoteproc: qcom_q6v5_mss: Don't reassign mpss region on shutdown
remoteproc: qcom_q6v5_mss: Reload the mba region on coredump
remoteproc: Fix NULL pointer dereference in rproc_virtio_notify
crypto: rng - Fix a refcounting bug in crypto_rng_reset()
crypto: mxs-dcp - fix scatterlist linearization for hash
erofs: correct the remaining shrink objects
io_uring: honor original task RLIMIT_FSIZE
mmc: sdhci-of-esdhc: fix esdhc_reset() for different controller versions
powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init()
tools: gpio: Fix out-of-tree build regression
net: qualcomm: rmnet: Allow configuration updates to existing devices
arm64: dts: allwinner: h6: Fix PMU compatible
sched/core: Remove duplicate assignment in sched_tick_remote()
arm64: dts: allwinner: h5: Fix PMU compatible
mm, memcg: do not high throttle allocators based on wraparound
dm writecache: add cond_resched to avoid CPU hangs
dm integrity: fix a crash with unusually large tag size
dm verity fec: fix memory leak in verity_fec_dtr
dm clone: Add overflow check for number of regions
dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions()
XArray: Fix xas_pause for large multi-index entries
xarray: Fix early termination of xas_for_each_marked
crypto: caam/qi2 - fix chacha20 data size error
crypto: caam - update xts sector size for large input length
crypto: ccree - protect against empty or NULL scatterlists
crypto: ccree - only try to map auth tag if needed
crypto: ccree - dec auth tag size from cryptlen map
scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
scsi: ufs: fix Auto-Hibern8 error detection
scsi: lpfc: Fix lpfc_io_buf resource leak in lpfc_get_scsi_buf_s4 error path
ARM: dts: exynos: Fix polarity of the LCD SPI bus on UniversalC210 board
arm64: dts: ti: k3-am65: Add clocks to dwc3 nodes
arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
selftests: vm: drop dependencies on page flags from mlock2 tests
selftests/vm: fix map_hugetlb length used for testing read and write
selftests/powerpc: Add tlbie_test in .gitignore
vfio: platform: Switch to platform_get_irq_optional()
drm/i915/gem: Flush all the reloc_gpu batch
drm/etnaviv: rework perfmon query infrastructure
drm: Remove PageReserved manipulation from drm_pci_alloc
drm/amdgpu/powerplay: using the FCLK DPM table to set the MCLK
drm/amdgpu: unify fw_write_wait for new gfx9 asics
powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailable
nfsd: fsnotify on rmdir under nfsd/clients/
NFS: Fix use-after-free issues in nfs_pageio_add_request()
NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
ext4: fix a data race at inode->i_blocks
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
ocfs2: no need try to truncate file beyond i_size
perf tools: Support Python 3.8+ in Makefile
s390/diag: fix display of diagnose call statistics
Input: i8042 - add Acer Aspire 5738z to nomux list
ftrace/kprobe: Show the maxactive number on kprobe_events
clk: ingenic/jz4770: Exit with error if CGU init failed
clk: ingenic/TCU: Fix round_rate returning error
kmod: make request_module() return an error when autoloading is disabled
cpufreq: powernv: Fix use-after-free
hfsplus: fix crash and filesystem corruption when deleting files
libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
ipmi: fix hung processes in __get_guid()
xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries
powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
powerpc/64: Setup a paca before parsing device tree etc.
powerpc/xive: Fix xmon support on the PowerNV platform
powerpc/kprobes: Ignore traps that happened in real mode
powerpc/64: Prevent stack protection in early boot
scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
powerpc: Make setjmp/longjmp signature standard
arm64: Always force a branch protection mode when the compiler has one
dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
dm clone: replace spin_lock_irqsave with spin_lock_irq
dm clone: Fix handling of partial region discards
dm clone: Add missing casts to prevent overflows and data corruption
scsi: lpfc: Add registration for CPU Offline/Online events
scsi: lpfc: Fix Fabric hostname registration if system hostname changes
scsi: lpfc: Fix configuration of BB credit recovery in service parameters
scsi: lpfc: Fix broken Credit Recovery after driver load
Revert "drm/dp_mst: Remove VCPI while disabling topology mgr"
drm/dp_mst: Fix clearing payload state on topology disable
drm/amdgpu: fix gfx hang during suspend with video playback (v2)
drm/i915/icl+: Don't enable DDI IO power on a TypeC port in TBT mode
powerpc/kasan: Fix kasan_remap_early_shadow_ro()
mmc: sdhci: Convert sdhci_set_timeout_irq() to non-static
mmc: sdhci: Refactor sdhci_set_timeout()
bpf: Fix tnum constraints for 32-bit comparisons
mfd: dln2: Fix sanity checking for endpoints
efi/x86: Fix the deletion of variables in mixed mode
ASoC: stm32: sai: Add missing cleanup
scsi: lpfc: fix inlining of lpfc_sli4_cleanup_poll_list()
Linux 5.4.33
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6c37e2c64801a572781c46fc5883bcc74e6a7a1a
[ Upstream commit 604dca5e3a ]
The BPF verifier tried to track values based on 32-bit comparisons by
(ab)using the tnum state via 581738a681 ("bpf: Provide better register
bounds after jmp32 instructions"). The idea is that after a check like
this:
if ((u32)r0 > 3)
exit
We can't meaningfully constrain the arithmetic-range-based tracking, but
we can update the tnum state to (value=0,mask=0xffff'ffff'0000'0003).
However, the implementation from 581738a681 didn't compute the tnum
constraint based on the fixed operand, but instead derives it from the
arithmetic-range-based tracking. This means that after the following
sequence of operations:
if (r0 >= 0x1'0000'0001)
exit
if ((u32)r0 > 7)
exit
The verifier assumed that the lower half of r0 is in the range (0, 0)
and apply the tnum constraint (value=0,mask=0xffff'ffff'0000'0000) thus
causing the overall tnum to be (value=0,mask=0x1'0000'0000), which was
incorrect. Provide a fixed implementation.
Fixes: 581738a681 ("bpf: Provide better register bounds after jmp32 instructions")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200330160324.15259-3-daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d7d27cfc5c upstream.
Patch series "module autoloading fixes and cleanups", v5.
This series fixes a bug where request_module() was reporting success to
kernel code when module autoloading had been completely disabled via
'echo > /proc/sys/kernel/modprobe'.
It also addresses the issues raised on the original thread
(https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
bydocumenting the modprobe sysctl, adding a self-test for the empty path
case, and downgrading a user-reachable WARN_ONCE().
This patch (of 4):
It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string.
This can be preferable to setting it to a nonexistent file since it
avoids the overhead of an attempted execve(), avoids potential
deadlocks, and avoids the call to security_kernel_module_request() and
thus on SELinux-based systems eliminates the need to write SELinux rules
to dontaudit module_request.
However, when module autoloading is disabled in this way,
request_module() returns 0. This is broken because callers expect 0 to
mean that the module was successfully loaded.
Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway.
But improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:
if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
fs = __get_fs_type(name, len);
WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
}
This is easily reproduced with:
echo > /proc/sys/kernel/modprobe
mount -t NONEXISTENT none /
It causes:
request_module fs-NONEXISTENT succeeded, but still no fs?
WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
[...]
This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails. So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.
I've also sent patches to document and test this case.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Ben Hutchings <benh@debian.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1e7fd6462 upstream.
Replace the 32bit exec_id with a 64bit exec_id to make it impossible
to wrap the exec_id counter. With care an attacker can cause exec_id
wrap and send arbitrary signals to a newly exec'd parent. This
bypasses the signal sending checks if the parent changes their
credentials during exec.
The severity of this problem can been seen that in my limited testing
of a 32bit exec_id it can take as little as 19s to exec 65536 times.
Which means that it can take as little as 14 days to wrap a 32bit
exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7
days. Even my slower timing is in the uptime of a typical server.
Which means self_exec_id is simply a speed bump today, and if exec
gets noticably faster self_exec_id won't even be a speed bump.
Extending self_exec_id to 64bits introduces a problem on 32bit
architectures where reading self_exec_id is no longer atomic and can
take two read instructions. Which means that is is possible to hit
a window where the read value of exec_id does not match the written
value. So with very lucky timing after this change this still
remains expoiltable.
I have updated the update of exec_id on exec to use WRITE_ONCE
and the read of exec_id in do_notify_parent to use READ_ONCE
to make it clear that there is no locking between these two
locations.
Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
Fixes: 2.3.23pre2
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e98eac6ff1 upstream.
A recent change to freeze_secondary_cpus() which added an early abort if a
wakeup is pending missed the fact that the function is also invoked for
shutdown, reboot and kexec via disable_nonboot_cpus().
In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
the purpose is to terminate the currently running kernel.
Add a 'suspend' argument which is only set when the freeze is in context of
a suspend operation. If not set then an eventually pending wakeup event is
ignored.
Fixes: a66d955e91 ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe61468b2c upstream.
When a cfs rq is throttled, the latter and its child are removed from the
leaf list but their nr_running is not changed which includes staying higher
than 1. When a task is enqueued in this throttled branch, the cfs rqs must
be added back in order to ensure correct ordering in the list but this can
only happens if nr_running == 1.
When cfs bandwidth is used, we call unconditionnaly list_add_leaf_cfs_rq()
when enqueuing an entity to make sure that the complete branch will be
added.
Similarly unthrottle_cfs_rq() can stop adding cfs in the list when a parent
is throttled. Iterate the remaining entity to ensure that the complete
branch will be added in the list.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org #v5.1+
Link: https://lkml.kernel.org/r/20200306135257.25044-1-vincent.guittot@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 25016bd7f4 ]
Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep
triggered a warning:
[ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
...
[ ] Call Trace:
[ ] lock_is_held_type+0x5d/0x150
[ ] ? rcu_lockdep_current_cpu_online+0x64/0x80
[ ] rcu_read_lock_any_held+0xac/0x100
[ ] ? rcu_read_lock_held+0xc0/0xc0
[ ] ? __slab_free+0x421/0x540
[ ] ? kasan_kmalloc+0x9/0x10
[ ] ? __kmalloc_node+0x1d7/0x320
[ ] ? kvmalloc_node+0x6f/0x80
[ ] __bfs+0x28a/0x3c0
[ ] ? class_equal+0x30/0x30
[ ] lockdep_count_forward_deps+0x11a/0x1a0
The warning got triggered because lockdep_count_forward_deps() call
__bfs() without current->lockdep_recursion being set, as a result
a lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.
Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency, therefore add the
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps()
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87f2d1c662 ]
irq_domain_alloc_irqs_hierarchy() has 3 call sites in the compilation unit
but only one of them checks for the pointer which is being dereferenced
inside the called function. Move the check into the function. This allows
for catching the error instead of the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at 0x0
LR is at gpiochip_hierarchy_irq_domain_alloc+0x11f/0x140
...
[<c06c23ff>] (gpiochip_hierarchy_irq_domain_alloc)
[<c0462a89>] (__irq_domain_alloc_irqs)
[<c0462dad>] (irq_create_fwspec_mapping)
[<c06c2251>] (gpiochip_to_irq)
[<c06c1c9b>] (gpiod_to_irq)
[<bf973073>] (gpio_irqs_init [gpio_irqs])
[<bf974048>] (gpio_irqs_exit+0xecc/0xe84 [gpio_irqs])
Code: bad PC value
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200306174720.82604-1-alexander.sverdlin@nokia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26cf52229e ]
During our testing, we found a case that shares no longer
working correctly, the cgroup topology is like:
/sys/fs/cgroup/cpu/A (shares=102400)
/sys/fs/cgroup/cpu/A/B (shares=2)
/sys/fs/cgroup/cpu/A/B/C (shares=1024)
/sys/fs/cgroup/cpu/D (shares=1024)
/sys/fs/cgroup/cpu/D/E (shares=1024)
/sys/fs/cgroup/cpu/D/E/F (shares=1024)
The same benchmark is running in group C & F, no other tasks are
running, the benchmark is capable to consumed all the CPUs.
We suppose the group C will win more CPU resources since it could
enjoy all the shares of group A, but it's F who wins much more.
The reason is because we have group B with shares as 2, since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
so A->cfs_rq.load.weight become very small.
And in calc_group_shares() we calculate shares as:
load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
shares = (tg_shares * load) / tg_weight;
Since the 'cfs_rq->load.weight' is too small, the load become 0
after scale down, although 'tg_shares' is 102400, shares of the se
which stand for group A on root cfs_rq become 2.
While the se of D on root cfs_rq is far more bigger than 2, so it
wins the battle.
Thus when scale_load_down() scale real weight down to 0, it's no
longer telling the real story, the caller will have the wrong
information and the calculation will be buggy.
This patch add check in scale_load_down(), so the real weight will
be >= MIN_SHARES after scale, after applied the group C wins as
expected.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c8bd58812 ]
To minimize latency, PREEMPT_RT kernels expires hrtimers in preemptible
softirq context by default. This can be overriden by marking the timer's
expiry with HRTIMER_MODE_HARD.
sched_clock_timer is missing this annotation: if its callback is preempted
and the duration of the preemption exceeds the wrap around time of the
underlying clocksource, sched clock will get out of sync.
Mark the sched_clock_timer for expiry in hard interrupt context.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200309181529.26558-1-a.darwish@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17c4a2ae15 ]
When dma_mmap_coherent() sets up a mapping to unencrypted coherent memory
under SEV encryption and sometimes under SME encryption, it will actually
set up an encrypted mapping rather than an unencrypted, causing devices
that DMAs from that memory to read encrypted contents. Fix this.
When force_dma_unencrypted() returns true, the linear kernel map of the
coherent pages have had the encryption bit explicitly cleared and the
page content is unencrypted. Make sure that any additional PTEs we set
up to these pages also have the encryption bit cleared by having
dma_pgprot() return a protection with the encryption bit cleared in this
case.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20200304114527.3636-3-thomas_os@shipmail.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1bc7896e9e ]
When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
#17 [ffffc9002219fc90] schedule at ffffffff81a3e219
#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068
The above call stack is actually very similar to an issue
reported by Commit eac9153f2b ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.
The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153f2b, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
/* stress_test.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define THREAD_COUNT 1000
char *filename;
void *worker(void *p)
{
void *ptr;
int fd;
char *pptr;
fd = open(filename, O_RDONLY);
if (fd < 0)
return NULL;
while (1) {
struct timespec ts = {0, 1000 + rand() % 2000};
ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
usleep(1);
if (ptr == MAP_FAILED) {
printf("failed to mmap\n");
break;
}
munmap(ptr, 4096 * 64);
usleep(1);
pptr = malloc(1);
usleep(1);
pptr[0] = 1;
usleep(1);
free(pptr);
usleep(1);
nanosleep(&ts, NULL);
}
close(fd);
return NULL;
}
int main(int argc, char *argv[])
{
void *ptr;
int i;
pthread_t threads[THREAD_COUNT];
if (argc < 2)
return 0;
filename = argv[1];
for (i = 0; i < THREAD_COUNT; i++) {
if (pthread_create(threads + i, NULL, worker, NULL)) {
fprintf(stderr, "Error creating thread\n");
return 0;
}
}
for (i = 0; i < THREAD_COUNT; i++)
pthread_join(threads[i], NULL);
return 0;
}
and the following command:
1. run `stress_test /bin/ls` in one windown
2. hack bcc trace.py with the following change:
# --- a/tools/trace.py
# +++ b/tools/trace.py
@@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
__data.tgid = __tgid;
__data.pid = __pid;
bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
+ bpf_send_signal(10);
%s
%s
%s.perf_submit(%s, &__data, sizeof(__data));
3. in a different window run
./trace.py -p $(pidof stress_test) t:sched:sched_switch
The deadlock can be reproduced in our production system.
Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.
I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
[ 32.832450] -> #1 (&p->pi_lock){-.-.}:
[ 32.833100] _raw_spin_lock_irqsave+0x44/0x80
[ 32.833696] task_rq_lock+0x2c/0xa0
[ 32.834182] task_sched_runtime+0x59/0xd0
[ 32.834721] thread_group_cputime+0x250/0x270
[ 32.835304] thread_group_cputime_adjusted+0x2e/0x70
[ 32.835959] do_task_stat+0x8a7/0xb80
[ 32.836461] proc_single_show+0x51/0xb0
...
[ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
[ 32.840275] __lock_acquire+0x1358/0x1a20
[ 32.840826] lock_acquire+0xc7/0x1d0
[ 32.841309] _raw_spin_lock_irqsave+0x44/0x80
[ 32.841916] __lock_task_sighand+0x79/0x160
[ 32.842465] do_send_sig_info+0x35/0x90
[ 32.842977] bpf_send_signal+0xa/0x10
[ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
[ 32.844301] trace_call_bpf+0x115/0x270
[ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0
[ 32.845411] perf_trace_sched_switch+0x10f/0x180
[ 32.846014] __schedule+0x45d/0x880
[ 32.846483] schedule+0x5f/0xd0
...
[ 32.853148] Chain exists of:
[ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
[ 32.853148]
[ 32.854451] Possible unsafe locking scenario:
[ 32.854451]
[ 32.855173] CPU0 CPU1
[ 32.855745] ---- ----
[ 32.856278] lock(&rq->lock);
[ 32.856671] lock(&p->pi_lock);
[ 32.857332] lock(&rq->lock);
[ 32.857999] lock(&(&sighand->siglock)->rlock);
Deadlock happens on CPU0 when it tries to acquire &sighand->siglock
but it has been held by CPU1 and CPU1 tries to grab &rq->lock
and cannot get it.
This is not exactly the callstack in our production environment,
but sympotom is similar and both locks are using spin_lock_irqsave()
to acquire the lock, and both involves rq_lock. The fix to delay
sending signal when irq is disabled also fixed this issue.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add EXPORT_SYMBOL_GPL entries for irq_chip_retrigger_hierarchy()
and irq_chip_set_vcpu_affinity_parent() so that we can allow
drivers like the qcom-pdc driver to be loadable as a module.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 153049053
Change-Id: Ie0202adb6f02fee19897d1b18df978c95ff58118
Add export for irq_domain_update_bus_token() so that
we can allow drivers like the qcom-pdc driver to be
loadable as a module.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 153049053
Change-Id: I395859c3e6fe7e8ceb4075c84267fbb68fbb1938
These changes build upon the existing Android kernel wakeup reason code
to:
* improve the positioning of suspend abort logging calls in suspend flow
* add logging of abnormal wakeup reasons like unexpected HW IRQs and
IRQs configured as both wake-enabled and no-suspend
* add support for capturing deferred-processing threaded nested IRQs as
wakeup reasons rather than their synchronously-processed parents
Bug: 150970830
Bug: 140217217
Bug: 120445600
Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Change-Id: I903b811a0fe11a605a25815c3a341668a23de700
Changes in 5.4.31
nvme-rdma: Avoid double freeing of async event data
kconfig: introduce m32-flag and m64-flag
drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017
drm/bochs: downgrade pci_request_region failure from error to warning
initramfs: restore default compression behavior
drm/amdgpu: fix typo for vcn1 idle check
tools/power turbostat: Fix gcc build warnings
tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks
tools/power turbostat: Fix 32-bit capabilities warning
net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow
XArray: Fix xa_find_next for large multi-index entries
padata: fix uninitialized return value in padata_replace()
brcmfmac: abort and release host after error
misc: rtsx: set correct pcr_ops for rts522A
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
PCI: sysfs: Revert "rescan" file renames
coresight: do not use the BIT() macro in the UAPI header
mei: me: add cedar fork device ids
nvmem: check for NULL reg_read and reg_write before dereferencing
extcon: axp288: Add wakeup support
power: supply: axp288_charger: Add special handling for HP Pavilion x2 10
Revert "dm: always call blk_queue_split() in dm_process_bio()"
ALSA: hda/ca0132 - Add Recon3Di quirk to handle integrated sound on EVGA X99 Classified motherboard
soc: mediatek: knows_txdone needs to be set in Mediatek CMDQ helper
net/mlx5e: kTLS, Fix wrong value in record tracker enum
iwlwifi: consider HE capability when setting LDPC
iwlwifi: yoyo: don't add TLV offset when reading FIFOs
iwlwifi: dbg: don't abort if sending DBGC_SUSPEND_RESUME fails
rxrpc: Fix sendmsg(MSG_WAITALL) handling
IB/hfi1: Ensure pq is not left on waitlist
tcp: fix TFO SYNACK undo to avoid double-timestamp-undo
watchdog: iTCO_wdt: Export vendorsupport
watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional
i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device
net: Fix Tx hash bound checking
padata: always acquire cpu_hotplug_lock before pinst->lock
mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
Linux 5.4.31
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9c87409ae13ad2da7a90be98586a85904a5cdb33
commit 38228e8848 upstream.
lockdep complains when padata's paths to update cpumasks via CPU hotplug
and sysfs are both taken:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo ff > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
======================================================
WARNING: possible circular locking dependency detected
5.4.0-rc8-padata-cpuhp-v3+ #1 Not tainted
------------------------------------------------------
bash/205 is trying to acquire lock:
ffffffff8286bcd0 (cpu_hotplug_lock.rw_sem){++++}, at: padata_set_cpumask+0x2b/0x120
but task is already holding lock:
ffff8880001abfa0 (&pinst->lock){+.+.}, at: padata_set_cpumask+0x26/0x120
which lock already depends on the new lock.
padata doesn't take cpu_hotplug_lock and pinst->lock in a consistent
order. Which should be first? CPU hotplug calls into padata with
cpu_hotplug_lock already held, so it should have priority.
Fixes: 6751fb3c0e ("padata: Use get_online_cpus/put_online_cpus")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string. This can be preferable
to setting it to a nonexistent file since it avoids the overhead of an
attempted execve(), avoids potential deadlocks, and avoids the call to
security_kernel_module_request() and thus on SELinux-based systems
eliminates the need to write SELinux rules to dontaudit module_request.
However, when module autoloading is disabled in this way,
request_module() returns 0. This is broken because callers expect 0 to
mean that the module was successfully loaded.
Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway. But
improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:
if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
fs = __get_fs_type(name, len);
WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
}
This is easily reproduced with:
echo > /proc/sys/kernel/modprobe
mount -t NONEXISTENT none /
It causes:
request_module fs-NONEXISTENT succeeded, but still no fs?
WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
[...]
This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails. So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.
I've also sent patches to document and test this case.
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: NeilBrown <neilb@suse.com>
Link: https://lore.kernel.org/r/20200318230515.171692-2-ebiggers@kernel.org
Bug: 151589316
Change-Id: I5e04f85e12a4f85da23e53bc11da1ade565abcd6
Signed-off-by: Eric Biggers <ebiggers@google.com>
A previous change added a test on the wrong config flag; rename
CFI to CFI_CLANG.
Bug: 145210207
Change-Id: Id8aead2eb2c75ad6442d10165f6cb86ccfb9c2f9
Signed-off-by: Alistair Delva <adelva@google.com>
When RT Capacity Aware support was added, the logic in select_task_rq_rt
was modified to force a search for a fitting CPU if the task currently
doesn't run on one.
But if the search failed, and the search was only triggered to fulfill
the fitness request; we could end up selecting a new CPU unnecessarily.
Fix this and re-instate the original behavior by ensuring we bail out
in that case.
This behavior change only affected asymmetric systems that are using
util_clamp to implement capacity aware. None asymmetric systems weren't
affected.
Bug: 120440300
LINK: https://lore.kernel.org/lkml/20200218041620.GD28029@codeaurora.org/
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-3-qais.yousef@arm.com
(cherry picked from commit b28bc1e002)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ie3e9e56fcf7d92a1552828b20a5d9d827c030099
When searching for the best lowest_mask with a fitness_fn passed, make
sure we record the lowest_level that returns a valid lowest_mask so that
we can use that as a fallback in case we fail to find a fitting CPU at
all levels.
The intention in the original patch was not to allow a down migration to
unfitting CPU. But this missed the case where we are already running on
unfitting one.
With this change now RT tasks can still move between unfitting CPUs when
they're already running on such CPU.
And as Steve suggested; to adhere to the strict priority rules of RT, if
a task is already running on a fitting CPU but due to priority it can't
run on it, allow it to downmigrate to unfitting CPU so it can run.
Bug: 120440300
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-2-qais.yousef@arm.com
Link: https://lore.kernel.org/lkml/20200203142712.a7yvlyo2y3le5cpn@e107158-lin/
(cherry picked from commit d9cb236b94)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ifa10286907644a01e75e37c9f523188fd7b43469
Changes in 5.4.29
mmc: core: Allow host controllers to require R1B for CMD6
mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard
mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command
mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
ACPI: PM: s2idle: Rework ACPI events synchronization
cxgb4: fix throughput drop during Tx backpressure
cxgb4: fix Txq restart check during backpressure
geneve: move debug check after netdev unregister
hsr: fix general protection fault in hsr_addr_is_self()
ipv4: fix a RCU-list lock in inet_dump_fib()
macsec: restrict to ethernet devices
mlxsw: pci: Only issue reset when system is ready
mlxsw: spectrum_mr: Fix list iteration in error path
net/bpfilter: fix dprintf usage for /dev/kmsg
net: cbs: Fix software cbs to consider packet sending time
net: dsa: Fix duplicate frames flooded by learning
net: dsa: mt7530: Change the LINK bit to reflect the link status
net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
net: ena: Add PCI shutdown handler to allow safe kexec
net: mvneta: Fix the case where the last poll did not process all rx
net/packet: tpacket_rcv: avoid a producer race condition
net: phy: dp83867: w/a for fld detect threshold bootstrapping issue
net: phy: mdio-bcm-unimac: Fix clock handling
net: phy: mdio-mux-bcm-iproc: check clk_prepare_enable() return value
net: qmi_wwan: add support for ASKEY WWHC050
net/sched: act_ct: Fix leak of ct zone template on replace
net_sched: cls_route: remove the right filter from hashtable
net_sched: hold rtnl lock in tcindex_partial_destroy_work()
net_sched: keep alloc_hash updated after hash allocation
net: stmmac: dwmac-rk: fix error path in rk_gmac_probe
NFC: fdp: Fix a signedness bug in fdp_nci_send_patch()
r8169: re-enable MSI on RTL8168c
slcan: not call free_netdev before rtnl_unlock in slcan_open
tcp: also NULL skb->dev when copy was needed
tcp: ensure skb->dev is NULL before leaving TCP stack
tcp: repair: fix TCP_QUEUE_SEQ implementation
vxlan: check return value of gro_cells_init()
bnxt_en: Fix Priority Bytes and Packets counters in ethtool -S.
bnxt_en: fix memory leaks in bnxt_dcbnl_ieee_getets()
bnxt_en: Return error if bnxt_alloc_ctx_mem() fails.
bnxt_en: Free context memory after disabling PCI in probe error path.
bnxt_en: Reset rings if ring reservation fails during open()
net: ip_gre: Separate ERSPAN newlink / changelink callbacks
net: ip_gre: Accept IFLA_INFO_DATA-less configuration
hsr: use rcu_read_lock() in hsr_get_node_{list/status}()
hsr: add restart routine into hsr_get_node_list()
hsr: set .netnsok flag
net/mlx5: DR, Fix postsend actions write length
net/mlx5e: Enhance ICOSQ WQE info fields
net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
net/mlx5e: Do not recover from a non-fatal syndrome
cgroup-v1: cgroup_pidlist_next should update position index
nfs: add minor version to nfs_server_key for fscache
cpupower: avoid multiple definition with gcc -fno-common
drivers/of/of_mdio.c:fix of_mdiobus_register()
cgroup1: don't call release_agent when it is ""
dt-bindings: net: FMan erratum A050385
arm64: dts: ls1043a: FMan erratum A050385
fsl/fman: detect FMan erratum A050385
drm/amd/display: update soc bb for nv14
drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20
drm/exynos: Fix cleanup of IOMMU related objects
iommu/vt-d: Silence RCU-list debugging warnings
s390/qeth: don't reset default_out_queue
s390/qeth: handle error when backing RX buffer
scsi: ipr: Fix softlockup when rescanning devices in petitboot
mac80211: Do not send mesh HWMP PREQ if HWMP is disabled
dpaa_eth: Remove unnecessary boolean expression in dpaa_get_headroom
sxgbe: Fix off by one in samsung driver strncpy size arg
net: hns3: fix "tc qdisc del" failed issue
iommu/vt-d: Fix debugfs register reads
iommu/vt-d: Populate debugfs if IOMMUs are detected
iwlwifi: mvm: fix non-ACPI function
i2c: hix5hd2: add missed clk_disable_unprepare in remove
Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
Input: fix stale timestamp on key autorepeat events
Input: synaptics - enable RMI on HP Envy 13-ad105ng
Input: avoid BIT() macro usage in the serio.h UAPI header
IB/rdmavt: Free kernel completion queue when done
RDMA/core: Fix missing error check on dev_set_name()
gpiolib: Fix irq_disable() semantics
RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SET
RDMA/mad: Do not crash if the rdma device does not have a umad interface
ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
ceph: fix memory leak in ceph_cleanup_snapid_map()
ARM: dts: dra7: Add bus_dma_limit for L3 bus
ARM: dts: omap5: Add bus_dma_limit for L3 bus
x86/ioremap: Fix CONFIG_EFI=n build
perf probe: Fix to delete multiple probe event
perf probe: Do not depend on dwfl_module_addrsym()
rtlwifi: rtl8188ee: Fix regression due to commit d1d1a96bdb
tools: Let O= makes handle a relative path with -C option
scripts/dtc: Remove redundant YYLOC global declaration
scsi: sd: Fix optimal I/O size for devices that change reported values
nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type
mac80211: drop data frames without key on encrypted links
mac80211: mark station unauthorized before key removal
mm/swapfile.c: move inode_lock out of claim_swapfile
drivers/base/memory.c: indicate all memory blocks as removable
mm/sparse: fix kernel crash with pfn_section_valid check
mm: fork: fix kernel_stack memcg stats for various stack implementations
gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
bpf: Fix cgroup ref leak in cgroup_bpf_inherit on out-of-memory
RDMA/core: Ensure security pkey modify is not lost
afs: Fix handling of an abort from a service handler
genirq: Fix reference leaks on irq affinity notifiers
xfrm: handle NETDEV_UNREGISTER for xfrm device
vti[6]: fix packet tx through bpf_redirect() in XinY cases
RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
RDMA/mlx5: Block delay drop to unprivileged users
xfrm: fix uctx len check in verify_sec_ctx_len
xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
xfrm: policy: Fix doulbe free in xfrm_policy_timer
afs: Fix client call Rx-phase signal handling
afs: Fix some tracing details
afs: Fix unpinned address list during probing
ieee80211: fix HE SPR size calculation
mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX
netfilter: flowtable: reload ip{v6}h in nf_flow_tuple_ip{v6}
netfilter: nft_fwd_netdev: validate family and chain type
netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status()
bpf, x32: Fix bug with JMP32 JSET BPF_X checking upper bits
bpf: Initialize storage pointers to NULL to prevent freeing garbage pointer
bpf/btf: Fix BTF verification of enum members in struct/union
bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free
ARM: dts: sun8i-a83t-tbs-a711: Fix USB OTG mode detection
vti6: Fix memory leak of skb if input policy check fails
r8169: fix PHY driver check on platforms w/o module softdeps
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
bpf: Undo incorrect __reg_bound_offset32 handling
USB: serial: option: add support for ASKEY WWHC050
USB: serial: option: add BroadMobi BM806U
USB: serial: option: add Wistron Neweb D19Q1
USB: cdc-acm: restore capability check order
USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
usb: musb: fix crash with highmen PIO and usbmon
media: flexcop-usb: fix endpoint sanity check
media: usbtv: fix control-message timeouts
staging: kpc2000: prevent underflow in cpld_reconfigure()
staging: rtl8188eu: Add ASUS USB-N10 Nano B1 to device table
staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
ahci: Add Intel Comet Lake H RAID PCI ID
libfs: fix infoleak in simple_attr_read()
media: ov519: add missing endpoint sanity checks
media: dib0700: fix rc endpoint lookup
media: stv06xx: add missing descriptor sanity checks
media: xirlink_cit: add missing descriptor sanity checks
media: v4l2-core: fix a use-after-free bug of sd->devnode
net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
Linux 5.4.29
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iebce1f1b95935de9229e2deb83dae66cf8661a88
commit f2d67fec0b upstream.
Anatoly has been fuzzing with kBdysch harness and reported a hang in
one of the outcomes:
0: (b7) r0 = 808464432
1: (7f) r0 >>= r0
2: (14) w0 -= 808464432
3: (07) r0 += 808464432
4: (b7) r1 = 808464432
5: (de) if w1 s<= w0 goto pc+0
R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0
6: (07) r0 += -2144337872
7: (14) w0 -= -1607454672
8: (25) if r0 > 0x30303030 goto pc+0
R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0
9: (76) if w0 s>= 0x303030 goto pc+2
12: (95) exit
from 8 to 9: safe
from 5 to 6: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0
6: (07) r0 += -2144337872
7: (14) w0 -= -1607454672
8: (25) if r0 > 0x30303030 goto pc+0
R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0
9: safe
from 8 to 9: safe
verification time 589 usec
stack depth 0
processed 17 insns (limit 1000000) [...]
The underlying program was xlated as follows:
# bpftool p d x i 9
0: (b7) r0 = 808464432
1: (7f) r0 >>= r0
2: (14) w0 -= 808464432
3: (07) r0 += 808464432
4: (b7) r1 = 808464432
5: (de) if w1 s<= w0 goto pc+0
6: (07) r0 += -2144337872
7: (14) w0 -= -1607454672
8: (25) if r0 > 0x30303030 goto pc+0
9: (76) if w0 s>= 0x303030 goto pc+2
10: (05) goto pc-1
11: (05) goto pc-1
12: (95) exit
The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we're
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
Taking different examples to make the issue more obvious: in this example
we're probing bounds on a completely unknown scalar variable in r1:
[...]
5: R0_w=inv1 R1_w=inv(id=0) R10=fp0
5: (18) r2 = 0x4000000000
7: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R10=fp0
7: (18) r3 = 0x2000000000
9: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R10=fp0
9: (18) r4 = 0x400
11: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R10=fp0
11: (18) r5 = 0x200
13: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
13: (2d) if r1 > r2 goto pc+4
R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
14: R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
14: (ad) if r1 < r3 goto pc+3
R0_w=inv1 R1_w=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
15: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
15: (2e) if w1 > w4 goto pc+2
R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
16: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
16: (ae) if w1 < w5 goto pc+1
R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
[...]
We're first probing lower/upper bounds via jmp64, later we do a similar
check via jmp32 and examine the resulting var_off there. After fall-through
in insn 14, we get the following bounded r1 with 0x7fffffffff unknown marked
bits in the variable section.
Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000:
max: 0b100000000000000000000000000000000000000 / 0x4000000000
var: 0b111111111111111111111111111111111111111 / 0x7fffffffff
min: 0b010000000000000000000000000000000000000 / 0x2000000000
Now, in insn 15 and 16, we perform a similar probe with lower/upper bounds
in jmp32.
Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and
w1 <= 0x400 and w1 >= 0x200:
max: 0b100000000000000000000000000000000000000 / 0x4000000000
var: 0b111111100000000000000000000000000000000 / 0x7f00000000
min: 0b010000000000000000000000000000000000000 / 0x2000000000
The lower/upper bounds haven't changed since they have high bits set in
u64 space and the jmp32 tests can only refine bounds in the low bits.
However, for the var part the expectation would have been 0x7f000007ff
or something less precise up to 0x7fffffffff. A outcome of 0x7f00000000
is not correct since it would contradict the earlier probed bounds
where we know that the result should have been in [0x200,0x400] in u32
space. Therefore, tests with such info will lead to wrong verifier
assumptions later on like falsely predicting conditional jumps to be
always taken, etc.
The issue here is that __reg_bound_offset32()'s implementation from
commit 581738a681 ("bpf: Provide better register bounds after jmp32
instructions") makes an incorrect range assumption:
static void __reg_bound_offset32(struct bpf_reg_state *reg)
{
u64 mask = 0xffffFFFF;
struct tnum range = tnum_range(reg->umin_value & mask,
reg->umax_value & mask);
struct tnum lo32 = tnum_cast(reg->var_off, 4);
struct tnum hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);
reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range));
}
In the above walk-through example, __reg_bound_offset32() as-is chose
a range after masking with 0xffffffff of [0x0,0x0] since umin:0x2000000000
and umax:0x4000000000 and therefore the lo32 part was clamped to 0x0 as
well. However, in the umin:0x2000000000 and umax:0x4000000000 range above
we'd end up with an actual possible interval of [0x0,0xffffffff] for u32
space instead.
In case of the original reproducer, the situation looked as follows at
insn 5 for r0:
[...]
5: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0
0x30303030 0x13030302f
5: (de) if w1 s<= w0 goto pc+0
R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020; 0x10000001f)) R1_w=invP808464432 R10=fp0
0x30303030 0x13030302f
[...]
After the fall-through, we similarly forced the var_off result into
the wrong range [0x30303030,0x3030302f] suggesting later on that fixed
bits must only be of 0x30303020 with 0x10000001f unknowns whereas such
assumption can only be made when both bounds in hi32 range match.
Originally, I was thinking to fix this by moving reg into a temp reg and
use proper coerce_reg_to_size() helper on the temp reg where we can then
based on that define the range tnum for later intersection:
static void __reg_bound_offset32(struct bpf_reg_state *reg)
{
struct bpf_reg_state tmp = *reg;
struct tnum lo32, hi32, range;
coerce_reg_to_size(&tmp, 4);
range = tnum_range(tmp.umin_value, tmp.umax_value);
lo32 = tnum_cast(reg->var_off, 4);
hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);
reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range));
}
In the case of the concrete example, this gives us a more conservative unknown
section. Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and
w1 <= 0x400 and w1 >= 0x200:
max: 0b100000000000000000000000000000000000000 / 0x4000000000
var: 0b111111111111111111111111111111111111111 / 0x7fffffffff
min: 0b010000000000000000000000000000000000000 / 0x2000000000
However, above new __reg_bound_offset32() has no effect on refining the
knowledge of the register contents. Meaning, if the bounds in hi32 range
mismatch we'll get the identity function given the range reg spans
[0x0,0xffffffff] and we cast var_off into lo32 only to later on binary
or it again with the hi32.
Likewise, if the bounds in hi32 range match, then we mask both bounds
with 0xffffffff, use the resulting umin/umax for the range to later
intersect the lo32 with it. However, _prior_ called __reg_bound_offset()
did already such intersection on the full reg and we therefore would only
repeat the same operation on the lo32 part twice.
Given this has no effect and the original commit had false assumptions,
this patch reverts the code entirely which is also more straight forward
for stable trees: apparently 581738a681 got auto-selected by Sasha's
ML system and misclassified as a fix, so it got sucked into v5.4 where
it should never have landed. A revert is low-risk also from a user PoV
since it requires a recent kernel and llc to opt-into -mcpu=v3 BPF CPU
to generate jmp32 instructions. A proper bounds refinement would need a
significantly more complex approach which is currently being worked, but
no stable material [0]. Hence revert is best option for stable. After the
revert, the original reported program gets rejected as follows:
1: (7f) r0 >>= r0
2: (14) w0 -= 808464432
3: (07) r0 += 808464432
4: (b7) r1 = 808464432
5: (de) if w1 s<= w0 goto pc+0
R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0
6: (07) r0 += -2144337872
7: (14) w0 -= -1607454672
8: (25) if r0 > 0x30303030 goto pc+0
R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x3fffffff)) R1_w=invP808464432 R10=fp0
9: (76) if w0 s>= 0x303030 goto pc+2
R0=invP(id=0,umax_value=3158063,var_off=(0x0; 0x3fffff)) R1=invP808464432 R10=fp0
10: (30) r0 = *(u8 *)skb[808464432]
BPF_LD_[ABS|IND] uses reserved fields
processed 11 insns (limit 1000000) [...]
[0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/T/
Fixes: 581738a681 ("bpf: Provide better register bounds after jmp32 instructions")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200330160324.15259-2-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>