Pull ARM updates from Russell King:
- remove a now unnecessary usage of the KERNEL_DS for
sys_oabi_epoll_ctl()
- update my email address in a number of drivers
- decompressor EFI updates from Ard Biesheuvel
- module unwind section handling updates
- sparsemem Kconfig cleanups
- make act_mm macro respect THREAD_SIZE
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8980/1: Allow either FLATMEM or SPARSEMEM on the multiplatform build
ARM: 8979/1: Remove redundant ARCH_SPARSEMEM_DEFAULT setting
ARM: 8978/1: mm: make act_mm() respect THREAD_SIZE
ARM: decompressor: run decompressor in place if loaded via UEFI
ARM: decompressor: move GOT into .data for EFI enabled builds
ARM: decompressor: defer loading of the contents of the LC0 structure
ARM: decompressor: split off _edata and stack base into separate object
ARM: decompressor: move headroom variable out of LC0
ARM: 8976/1: module: allow arch overrides for .init section names
ARM: 8975/1: module: fix handling of unwind init sections
ARM: 8974/1: use SPARSMEM_STATIC when SPARSEMEM is enabled
ARM: 8971/1: replace the sole use of a symbol with its definition
ARM: 8969/1: decompressor: simplify libfdt builds
Update rmk's email address in various drivers
ARM: compat: remove KERNEL_DS usage in sys_oabi_epoll_ctl()
Pull arm64 updates from Will Deacon:
"A sizeable pile of arm64 updates for 5.8.
Summary below, but the big two features are support for Branch Target
Identification and Clang's Shadow Call stack. The latter is currently
arm64-only, but the high-level parts are all in core code so it could
easily be adopted by other architectures pending toolchain support
Branch Target Identification (BTI):
- Support for ARMv8.5-BTI in both user- and kernel-space. This allows
branch targets to limit the types of branch from which they can be
called and additionally prevents branching to arbitrary code,
although kernel support requires a very recent toolchain.
- Function annotation via SYM_FUNC_START() so that assembly functions
are wrapped with the relevant "landing pad" instructions.
- BPF and vDSO updates to use the new instructions.
- Addition of a new HWCAP and exposure of BTI capability to userspace
via ID register emulation, along with ELF loader support for the
BTI feature in .note.gnu.property.
- Non-critical fixes to CFI unwind annotations in the sigreturn
trampoline.
Shadow Call Stack (SCS):
- Support for Clang's Shadow Call Stack feature, which reserves
platform register x18 to point at a separate stack for each task
that holds only return addresses. This protects function return
control flow from buffer overruns on the main stack.
- Save/restore of x18 across problematic boundaries (user-mode,
hypervisor, EFI, suspend, etc).
- Core support for SCS, should other architectures want to use it
too.
- SCS overflow checking on context-switch as part of the existing
stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
CPU feature detection:
- Removed numerous "SANITY CHECK" errors when running on a system
with mismatched AArch32 support at EL1. This is primarily a concern
for KVM, which disabled support for 32-bit guests on such a system.
- Addition of new ID registers and fields as the architecture has
been extended.
Perf and PMU drivers:
- Minor fixes and cleanups to system PMU drivers.
Hardware errata:
- Unify KVM workarounds for VHE and nVHE configurations.
- Sort vendor errata entries in Kconfig.
Secure Monitor Call Calling Convention (SMCCC):
- Update to the latest specification from Arm (v1.2).
- Allow PSCI code to query the SMCCC version.
Software Delegated Exception Interface (SDEI):
- Unexport a bunch of unused symbols.
- Minor fixes to handling of firmware data.
Pointer authentication:
- Add support for dumping the kernel PAC mask in vmcoreinfo so that
the stack can be unwound by tools such as kdump.
- Simplification of key initialisation during CPU bringup.
BPF backend:
- Improve immediate generation for logical and add/sub instructions.
vDSO:
- Minor fixes to the linker flags for consistency with other
architectures and support for LLVM's unwinder.
- Clean up logic to initialise and map the vDSO into userspace.
ACPI:
- Work around for an ambiguity in the IORT specification relating to
the "num_ids" field.
- Support _DMA method for all named components rather than only PCIe
root complexes.
- Minor other IORT-related fixes.
Miscellaneous:
- Initialise debug traps early for KGDB and fix KDB cacheflushing
deadlock.
- Minor tweaks to early boot state (documentation update, set
TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
- Refactoring and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
KVM: arm64: Check advertised Stage-2 page size capability
arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
ACPI/IORT: Remove the unused __get_pci_rid()
arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
arm64/cpufeature: Introduce ID_MMFR5 CPU register
arm64/cpufeature: Introduce ID_DFR1 CPU register
arm64/cpufeature: Introduce ID_PFR2 CPU register
arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
arm64: mm: Add asid_gen_match() helper
firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
arm64: vdso: Fix CFI directives in sigreturn trampoline
arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
...
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups, with an emphasis on removing obsolete/dead code"
* tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/spinlock: Remove obsolete ticket spinlock macros and types
x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit
x86/apb_timer: Drop unused declaration and macro
x86/apb_timer: Drop unused TSC calibration
x86/io_apic: Remove unused function mp_init_irq_at_boot()
x86/mm: Stop printing BRK addresses
x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()
x86/nmi: Remove edac.h include leftover
mm: Remove MPX leftovers
x86/mm/mmap: Fix -Wmissing-prototypes warnings
x86/early_printk: Remove unused includes
crash_dump: Remove no longer used saved_max_pfn
x86/smpboot: Remove the last ICPU() macro
Pull SMP updates from Ingo Molnar:
"Misc cleanups in the SMP hotplug and cross-call code"
* tag 'smp-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Remove __freeze_secondary_cpus()
cpu/hotplug: Remove disable_nonboot_cpus()
cpu/hotplug: Fix a typo in comment "broadacasted"->"broadcasted"
smp: Use smp_call_func_t in on_each_cpu()
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Add AMD Fam17h RAPL support
- Introduce CAP_PERFMON to kernel and user space
- Add Zhaoxin CPU support
- Misc fixes and cleanups
Tooling changes:
- perf record:
Introduce '--switch-output-event' to use arbitrary events to be
setup and read from a side band thread and, when they take place a
signal be sent to the main 'perf record' thread, reusing the core
for '--switch-output' to take perf.data snapshots from the ring
buffer used for '--overwrite', e.g.:
# perf record --overwrite -e sched:* \
--switch-output-event syscalls:*connect* \
workload
will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
connect syscalls.
Add '--num-synthesize-threads' option to control degree of
parallelism of the synthesize_mmap() code which is scanning
/proc/PID/task/PID/maps and can be time consuming. This mimics
pre-existing behaviour in 'perf top'.
- perf bench:
Add a multi-threaded synthesize benchmark and kallsyms parsing
benchmark.
- Intel PT support:
Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
Allow using Intel PT to synthesize callchains for regular events.
Add support for synthesizing branch stacks for regular events
(cycles, instructions, etc) from Intel PT data.
Misc changes:
- Updated perf vendor events for power9 and Coresight.
- Add flamegraph.py script via 'perf flamegraph'
- Misc other changes, fixes and cleanups - see the Git log for details
Also, since over the last couple of years perf tooling has matured and
decoupled from the kernel perf changes to a large degree, going
forward Arnaldo is going to send perf tooling changes via direct pull
requests"
* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
perf/x86/rapl: Add AMD Fam17h RAPL support
perf/x86/rapl: Make perf_probe_msr() more robust and flexible
perf/x86/rapl: Flip logic on default events visibility
perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
perf/x86/rapl: Move RAPL support to common x86 code
perf/core: Replace zero-length array with flexible-array
perf/x86: Replace zero-length array with flexible-array
perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
perf/x86/rapl: Add Ice Lake RAPL support
perf flamegraph: Use /bin/bash for report and record scripts
perf cs-etm: Move definition of 'traceid_list' global variable from header file
libsymbols kallsyms: Move hex2u64 out of header
libsymbols kallsyms: Parse using io api
perf bench: Add kallsyms parsing
perf: cs-etm: Update to build with latest opencsd version.
perf symbol: Fix kernel symbol address display
perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
...
Pull locking updates from Ingo Molnar:
"The biggest change to core locking facilities in this cycle is the
introduction of local_lock_t - this primitive comes from the -rt
project and identifies CPU-local locking dependencies normally handled
opaquely beind preempt_disable() or local_irq_save/disable() critical
sections.
The generated code on mainline kernels doesn't change as a result, but
still there are benefits: improved debugging and better documentation
of data structure accesses.
The new local_lock_t primitives are introduced and then utilized in a
couple of kernel subsystems. No change in functionality is intended.
There's also other smaller changes and cleanups"
* tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
zram: Use local lock to protect per-CPU data
zram: Allocate struct zcomp_strm as per-CPU memory
connector/cn_proc: Protect send_msg() with a local lock
squashfs: Make use of local lock in multi_cpu decompressor
mm/swap: Use local_lock for protection
radix-tree: Use local_lock for protection
locking: Introduce local_lock()
locking/lockdep: Replace zero-length array with flexible-array
locking/rtmutex: Remove unused rt_mutex_cmpxchg_relaxed()
Pull RCU updates from Ingo Molnar:
"The RCU updates for this cycle were:
- RCU-tasks update, including addition of RCU Tasks Trace for BPF use
and TASKS_RUDE_RCU
- kfree_rcu() updates.
- Remove scheduler locking restriction
- RCU CPU stall warning updates.
- Torture-test updates.
- Miscellaneous fixes and other updates"
* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
rcu: Allow for smp_call_function() running callbacks from idle
rcu: Provide rcu_irq_exit_check_preempt()
rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
rcu: Provide __rcu_is_watching()
rcu: Provide rcu_irq_exit_preempt()
rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
rcu/tree: Mark the idle relevant functions noinstr
x86: Replace ist_enter() with nmi_enter()
x86/mce: Send #MC singal from task work
x86/entry: Get rid of ist_begin/end_non_atomic()
sched,rcu,tracing: Avoid tracing before in_nmi() is correct
sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
lockdep: Always inline lockdep_{off,on}()
hardirq/nmi: Allow nested nmi_enter()
arm64: Prepare arch_nmi_enter() for recursion
printk: Disallow instrumenting print_nmi_enter()
printk: Prepare for nested printk_nmi_enter()
rcutorture: Convert ULONG_CMP_LT() to time_before()
torture: Add a --kasan argument
torture: Save a few lines by using config_override_param initially
...
Pull kprobes updates from Ingo Molnar:
"Various kprobes updates, mostly centered around cleaning up the
no-instrumentation logic.
Instead of the current per debug facility blacklist, use the more
generic .noinstr.text approach, combined with a 'noinstr' marker for
functions.
Also add instrumentation_begin()/end() to better manage the exact
place in entry code where instrumentation may be used.
And add a kprobes blacklist for modules"
* tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Prevent probes in .noinstr.text section
vmlinux.lds.h: Create section for protection against instrumentation
samples/kprobes: Add __kprobes and NOKPROBE_SYMBOL() for handlers.
kprobes: Support NOKPROBE_SYMBOL() in modules
kprobes: Support __kprobes blacklist in modules
kprobes: Lock kprobe_mutex while showing kprobe_blacklist
Pull printk updates from Petr Mladek:
- Benjamin Herrenschmidt solved a problem with non-matched console
aliases by first checking consoles defined on the command line. It is
a more conservative approach than the previous attempts.
- Benjamin also made sure that the console accessible via /dev/console
always has CON_CONSDEV flag.
- Andy Shevchenko added the %ptT modifier for printing struct time64_t.
It extends the existing %ptR handling for struct rtc_time.
- Bruno Meneguele fixed /dev/kmsg error value returned by unsupported
SEEK_CUR.
- Tetsuo Handa removed unused pr_cont_once().
... and a few small fixes.
* tag 'printk-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Remove pr_cont_once()
printk: handle blank console arguments passed in.
kernel/printk: add kmsg SEEK_CUR handling
printk: Fix a typo in comment "interator"->"iterator"
usb: pulse8-cec: Switch to use %ptT
ARM: bcm2835: Switch to use %ptT
lib/vsprintf: Print time64_t in human readable format
lib/vsprintf: update comment about simple_strto<foo>() functions
printk: Correctly set CON_CONSDEV even when preferred console was not registered
printk: Fix preferred console selection with multiple matches
printk: Move console matching logic into a separate function
printk: Convert a use of sprintf to snprintf in console_unlock
Pull pstore updates from Kees Cook:
"Fixes and new features for pstore.
This is a pretty big set of changes (relative to past pstore pulls),
but it has been in -next for a while. The biggest change here is the
ability to support a block device as a pstore backend, which has been
desired for a while. A lot of additional fixes and refactorings are
also included, mostly in support of the new features.
- refactor pstore locking for safer module unloading (Kees Cook)
- remove orphaned records from pstorefs when backend unloaded (Kees
Cook)
- refactor dump_oops parameter into max_reason (Pavel Tatashin)
- introduce pstore/zone for common code for contiguous storage
(WeiXiong Liao)
- introduce pstore/blk for block device backend (WeiXiong Liao)
- introduce mtd backend (WeiXiong Liao)"
* tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (35 commits)
mtd: Support kmsg dumper based on pstore/blk
pstore/blk: Introduce "best_effort" mode
pstore/blk: Support non-block storage devices
pstore/blk: Provide way to query pstore configuration
pstore/zone: Provide way to skip "broken" zone for MTD devices
Documentation: Add details for pstore/blk
pstore/zone,blk: Add ftrace frontend support
pstore/zone,blk: Add console frontend support
pstore/zone,blk: Add support for pmsg frontend
pstore/blk: Introduce backend for block devices
pstore/zone: Introduce common layer to manage storage zones
ramoops: Add "max-reason" optional field to ramoops DT node
pstore/ram: Introduce max_reason and convert dump_oops
pstore/platform: Pass max_reason to kmesg dump
printk: Introduce kmsg_dump_reason_str()
printk: honor the max_reason field in kmsg_dumper
printk: Collapse shutdown types into a single dump reason
pstore/ftrace: Provide ftrace log merging routine
pstore/ram: Refactor ftrace buffer merging
pstore/ram: Refactor DT size parsing
...
Pull crypto updates from Herbert Xu:
"API:
- Introduce crypto_shash_tfm_digest() and use it wherever possible.
- Fix use-after-free and race in crypto_spawn_alg.
- Add support for parallel and batch requests to crypto_engine.
Algorithms:
- Update jitter RNG for SP800-90B compliance.
- Always use jitter RNG as seed in drbg.
Drivers:
- Add Arm CryptoCell driver cctrng.
- Add support for SEV-ES to the PSP driver in ccp"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
crypto: hisilicon - fix driver compatibility issue with different versions of devices
crypto: engine - do not requeue in case of fatal error
crypto: cavium/nitrox - Fix a typo in a comment
crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
crypto: hisilicon/qm - add debugfs to the QM state machine
crypto: hisilicon/qm - add debugfs for QM
crypto: stm32/crc32 - protect from concurrent accesses
crypto: stm32/crc32 - don't sleep in runtime pm
crypto: stm32/crc32 - fix multi-instance
crypto: stm32/crc32 - fix run-time self test issue.
crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
crypto: hisilicon/zip - Use temporary sqe when doing work
crypto: hisilicon - add device error report through abnormal irq
crypto: hisilicon - remove codes of directly report device errors through MSI
crypto: hisilicon - QM memory management optimization
crypto: hisilicon - unify initial value assignment into QM
...
Pull scheduler fix from Thomas Gleixner:
"A single scheduler fix preventing a crash in NUMA balancing.
The current->mm check is not reliable as the mm might be temporary due
to use_mm() in a kthread. Check for PF_KTHREAD explictly"
* tag 'sched-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Don't NUMA balance for kthreads
Pull networking fixes from David Miller:
"Another week, another set of bug fixes:
1) Fix pskb_pull length in __xfrm_transport_prep(), from Xin Long.
2) Fix double xfrm_state put in esp{4,6}_gro_receive(), also from Xin
Long.
3) Re-arm discovery timer properly in mac80211 mesh code, from Linus
Lüssing.
4) Prevent buffer overflows in nf_conntrack_pptp debug code, from
Pablo Neira Ayuso.
5) Fix race in ktls code between tls_sw_recvmsg() and
tls_decrypt_done(), from Vinay Kumar Yadav.
6) Fix crashes on TCP fallback in MPTCP code, from Paolo Abeni.
7) More validation is necessary of untrusted GSO packets coming from
virtualization devices, from Willem de Bruijn.
8) Fix endianness of bnxt_en firmware message length accesses, from
Edwin Peer.
9) Fix infinite loop in sch_fq_pie, from Davide Caratti.
10) Fix lockdep splat in DSA by setting lockless TX in netdev features
for slave ports, from Vladimir Oltean.
11) Fix suspend/resume crashes in mlx5, from Mark Bloch.
12) Fix use after free in bpf fmod_ret, from Alexei Starovoitov.
13) ARP retransmit timer guard uses wrong offset, from Hongbin Liu.
14) Fix leak in inetdev_init(), from Yang Yingliang.
15) Don't try to use inet hash and unhash in l2tp code, results in
crashes. From Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
l2tp: add sk_family checks to l2tp_validate_socket
l2tp: do not use inet_hash()/inet_unhash()
net: qrtr: Allocate workqueue before kernel_bind
mptcp: remove msk from the token container at destruction time.
mptcp: fix race between MP_JOIN and close
mptcp: fix unblocking connect()
net/sched: act_ct: add nat mangle action only for NAT-conntrack
devinet: fix memleak in inetdev_init()
virtio_vsock: Fix race condition in virtio_transport_recv_pkt
drivers/net/ibmvnic: Update VNIC protocol version reporting
NFC: st21nfca: add missed kfree_skb() in an error path
neigh: fix ARP retransmit timer guard
bpf, selftests: Add a verifier test for assigning 32bit reg states to 64bit ones
bpf, selftests: Verifier bounds tests need to be updated
bpf: Fix a verifier issue when assigning 32bit reg states to 64bit ones
bpf: Fix use-after-free in fmod_ret check
net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta()
net/mlx5e: Fix MLX5_TC_CT dependencies
net/mlx5e: Properly set default values when disabling adaptive moderation
net/mlx5e: Fix arch depending casting issue in FEC
...
kmsg_dump() allows to dump kmesg buffer for various system events: oops,
panic, reboot, etc. It provides an interface to register a callback
call for clients, and in that callback interface there is a field
"max_reason", but it was getting ignored when set to any "reason"
higher than KMSG_DUMP_OOPS unless "always_kmsg_dump" was passed as
kernel parameter.
Allow clients to actually control their "max_reason", and keep the
current behavior when "max_reason" is not set.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/lkml/20200515184434.8470-3-keescook@chromium.org/
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2020-05-29
The following pull-request contains BPF updates for your *net* tree.
We've added 6 non-merge commits during the last 7 day(s) which contain
a total of 4 files changed, 55 insertions(+), 34 deletions(-).
The main changes are:
1) minor verifier fix for fmod_ret progs, from Alexei.
2) af_xdp overflow check, from Bjorn.
3) minor verifier fix for 32bit assignment, from John.
4) powerpc has non-overlapping addr space, from Petr.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With the latest trunk llvm (llvm 11), I hit a verifier issue for
test_prog subtest test_verif_scale1.
The following simplified example illustrate the issue:
w9 = 0 /* R9_w=inv0 */
r8 = *(u32 *)(r1 + 80) /* __sk_buff->data_end */
r7 = *(u32 *)(r1 + 76) /* __sk_buff->data */
......
w2 = w9 /* R2_w=inv0 */
r6 = r7 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */
r6 += r2 /* R6_w=inv(id=0) */
r3 = r6 /* R3_w=inv(id=0) */
r3 += 14 /* R3_w=inv(id=0) */
if r3 > r8 goto end
r5 = *(u32 *)(r6 + 0) /* R6_w=inv(id=0) */
<== error here: R6 invalid mem access 'inv'
...
end:
In real test_verif_scale1 code, "w9 = 0" and "w2 = w9" are in
different basic blocks.
In the above, after "r6 += r2", r6 becomes a scalar, which eventually
caused the memory access error. The correct register state should be
a pkt pointer.
The inprecise register state starts at "w2 = w9".
The 32bit register w9 is 0, in __reg_assign_32_into_64(),
the 64bit reg->smax_value is assigned to be U32_MAX.
The 64bit reg->smin_value is 0 and the 64bit register
itself remains constant based on reg->var_off.
In adjust_ptr_min_max_vals(), the verifier checks for a known constant,
smin_val must be equal to smax_val. Since they are not equal,
the verifier decides r6 is a unknown scalar, which caused later failure.
The llvm10 does not have this issue as it generates different code:
w9 = 0 /* R9_w=inv0 */
r8 = *(u32 *)(r1 + 80) /* __sk_buff->data_end */
r7 = *(u32 *)(r1 + 76) /* __sk_buff->data */
......
r6 = r7 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */
r6 += r9 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */
r3 = r6 /* R3_w=pkt(id=0,off=0,r=0,imm=0) */
r3 += 14 /* R3_w=pkt(id=0,off=14,r=0,imm=0) */
if r3 > r8 goto end
...
To fix the above issue, we can include zero in the test condition for
assigning the s32_max_value and s32_min_value to their 64-bit equivalents
smax_value and smin_value.
Further, fix the condition to avoid doing zero extension bounds checks
when s32_min_value <= 0. This could allow for the case where bounds
32-bit bounds (-1,1) get incorrectly translated to (0,1) 64-bit bounds.
When in-fact the -1 min value needs to force U32_MAX bound.
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/159077331983.6014.5758956193749002737.stgit@john-Precision-5820-Tower
Current RCU hard relies on smp_call_function() callbacks running from
interrupt context. A pending optimization is going to break that, it
will allow idle CPUs to run the callbacks from the idle loop. This
avoids raising the IPI on the requesting CPU and avoids handling an
exception on the receiving CPU.
Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
provided it is the idle task.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
Pull cgroup fixes from Tejun Heo:
- Reverted stricter synchronization for cgroup recursive stats which
was prepping it for event counter usage which never got merged. The
change was causing performation regressions in some cases.
- Restore bpf-based device-cgroup operation even when cgroup1 device
cgroup is disabled.
- An out-param init fix.
* 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: Cleanup cgroup eBPF device filter code
xattr: fix uninitialized out-param
Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
There will likely be exception handlers that can sleep, which rules
out the usual approach of invoking rcu_nmi_enter() on entry and also
rcu_nmi_exit() on all exit paths. However, the alternative approach of
just not calling anything can prevent RCU from coaxing quiescent states
from nohz_full CPUs that are looping in the kernel: RCU must instead
IPI them explicitly. It would be better to enable the scheduler tick
on such CPUs to interact with RCU in a lighter-weight manner, and this
enabling is one of the things that rcu_nmi_enter() currently does.
What is needed is something that helps RCU coax quiescent states while
not preventing subsequent sleeps. This commit therefore splits out the
nohz_full scheduler-tick enabling from the rest of the rcu_nmi_enter()
logic into a new function named rcu_irq_enter_check_tick().
[ tglx: Renamed the function and made it a nop when context tracking is off ]
[ mingo: Fixed a CONFIG_NO_HZ_FULL assumption, harmonized and fixed all the
comment blocks and cleaned up rcu_nmi_enter()/exit() definitions. ]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200521202116.996113173@linutronix.de
Stefano reported a crash with using SQPOLL with io_uring:
BUG: kernel NULL pointer dereference, address: 00000000000003b0
CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11
RIP: 0010:task_numa_work+0x4f/0x2c0
Call Trace:
task_work_run+0x68/0xa0
io_sq_thread+0x252/0x3d0
kthread+0xf9/0x130
ret_from_fork+0x35/0x40
which is task_numa_work() oopsing on current->mm being NULL.
The task work is queued by task_tick_numa(), which checks if current->mm is
NULL at the time of the call. But this state isn't necessarily persistent,
if the kthread is using use_mm() to temporarily adopt the mm of a task.
Change the task_tick_numa() check to exclude kernel threads in general,
as it doesn't make sense to attempt ot balance for kthreads anyway.
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dk
Pull scheduler fixes from Thomas Gleixner:
"A set of fixes for the scheduler:
- Fix handling of throttled parents in enqueue_task_fair() completely.
The recent fix overlooked a corner case where the first iteration
terminates due to an entity already being on the runqueue which
makes the list management incomplete and later triggers the
assertion which checks for completeness.
- Fix a similar problem in unthrottle_cfs_rq().
- Show the correct uclamp values in procfs which prints the effective
value twice instead of requested and effective"
* tag 'sched-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
sched/debug: Fix requested task uclamp values shown in procfs
sched/fair: Fix enqueue_task_fair() warning some more
If uboot passes a blank string to console_setup then it results in
a trashed memory. Ultimately, the kernel crashes during freeing up
the memory.
This fix checks if there is a blank parameter being
passed to console_setup from uboot. In case it detects that
the console parameter is blank then it doesn't setup the serial
device and it gracefully exits.
Link: https://lore.kernel.org/r/20200522065306.83-1-shreyas.joshi@biamp.com
Signed-off-by: Shreyas Joshi <shreyas.joshi@biamp.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek@suse.com: Better format the commit message and code, remove unnecessary brackets.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Userspace libraries, e.g. glibc's dprintf(), perform a SEEK_CUR operation
over any file descriptor requested to make sure the current position isn't
pointing to junk due to previous manipulation of that same fd. And whenever
that fd doesn't have support for such operation, the userspace code expects
-ESPIPE to be returned.
However, when the fd in question references the /dev/kmsg interface, the
current kernel code state returns -EINVAL instead, causing an unexpected
behavior in userspace: in the case of glibc, when -ESPIPE is returned it
gets ignored and the call completes successfully, while returning -EINVAL
forces dprintf to fail without performing any action over that fd:
if (_IO_SEEKOFF (fp, (off64_t)0, _IO_seek_cur, _IOS_INPUT|_IOS_OUTPUT) ==
_IO_pos_BAD && errno != ESPIPE)
return NULL;
With this patch we make sure to return the correct value when SEEK_CUR is
requested over kmsg and also add some kernel doc information to formalize
this behavior.
Link: https://lore.kernel.org/r/20200317103344.574277-1-bmeneg@redhat.com
Cc: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org,
Cc: David.Laight@ACULAB.COM
Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200507185804.GA15036@embeddedor
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200511201227.GA14041@embeddedor
Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
are quite close and follow the same sequence for enqueuing an entity in the
cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
enqueue_task_fair(). This fixes a problem already faced with the latter and
add an optimization in the last for_each_sched_entity loop.
Fixes: fe61468b2c (sched/fair: Fix enqueue_task_fair warning)
Reported-by Tao Zhou <zohooouoto@zoho.com.cn>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20200513135528.4742-1-vincent.guittot@linaro.org
sched/fair: Fix enqueue_task_fair warning some more
The recent patch, fe61468b2c (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list. In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.
Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.
Fixes: fe61468b2c ("sched/fair: Fix enqueue_task_fair warning")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20200512135222.GC2201@lorien.usersys.redhat.com
Same as rcu_is_watching() but without the preempt_disable/enable() pair
inside the function. It is merked noinstr so it ends up in the
non-instrumentable text section.
This is useful for non-preemptible code especially in the low level entry
section. Using rcu_is_watching() there results in a call to the
preempt_schedule_notrace() thunk which triggers noinstr section warnings in
objtool.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512213810.518709291@linutronix.de
Interrupts and exceptions invoke rcu_irq_enter() on entry and need to
invoke rcu_irq_exit() before they either return to the interrupted code or
invoke the scheduler due to preemption.
The general assumption is that RCU idle code has to have preemption
disabled so that a return from interrupt cannot schedule. So the return
from interrupt code invokes rcu_irq_exit() and preempt_schedule_irq().
If there is any imbalance in the rcu_irq/nmi* invocations or RCU idle code
had preemption enabled then this goes unnoticed until the CPU goes idle or
some other RCU check is executed.
Provide rcu_irq_exit_preempt() which can be invoked from the
interrupt/exception return code in case that preemption is enabled. It
invokes rcu_irq_exit() and contains a few sanity checks in case that
CONFIG_PROVE_RCU is enabled to catch such issues directly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.364456424@linutronix.de
The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
"irq" parameter that indicates whether these functions have been invoked from
an irq handler (irq==true) or an NMI handler (irq==false).
However, recent changes have applied notrace to a few critical functions
such that rcu_nmi_enter_common() and rcu_nmi_exit_common() many now rely on
in_nmi(). Note that in_nmi() works no differently than before, but rather
that tracing is now prohibited in code regions where in_nmi() would
incorrectly report NMI state.
Therefore remove the "irq" parameter and inline rcu_nmi_enter_common() and
rcu_nmi_exit_common() into rcu_nmi_enter() and rcu_nmi_exit(),
respectively.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.617130349@linutronix.de
ARM stores unwind information for .init.text in sections named
.ARM.extab.init.text and .ARM.exidx.init.text. Since those aren't
currently recognized as init sections, they're allocated along with the
core section, and relocation fails if the core and the init section are
allocated from different regions and can't reach other.
final section addresses:
...
0x7f800000 .init.text
..
0xcbb54078 .ARM.exidx.init.text
..
section 16 reloc 0 sym '': relocation 42 out of range (0xcbb54078 ->
0x7f800000)
Allow architectures to override the section name so that ARM can fix
this.
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>