Patch series "kasan: switch tag-based modes to stack ring from per-object
metadata", v3.
This series makes the tag-based KASAN modes use a ring buffer for storing
stack depot handles for alloc/free stack traces for slab objects instead
of per-object metadata. This ring buffer is referred to as the stack
ring.
On each alloc/free of a slab object, the tagged address of the object and
the current stack trace are recorded in the stack ring.
On each bug report, if the accessed address belongs to a slab object, the
stack ring is scanned for matching entries. The newest entries are used
to print the alloc/free stack traces in the report: one entry for alloc
and one for free.
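As a rough userspace sketch of the scheme (a C model only, not the kernel's actual implementation; the entry layout, STACK_RING_SIZE, and function names are illustrative):

```c
/*
 * Minimal userspace model of the stack ring idea. One global ring,
 * a single atomic write index, old entries overwritten in order.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_RING_SIZE 8  /* illustrative; the real ring is much larger */

struct stack_ring_entry {
	void *ptr;            /* tagged object address */
	unsigned long stack;  /* stack depot handle */
	bool is_free;         /* alloc or free event */
};

static struct stack_ring_entry stack_ring[STACK_RING_SIZE];
static atomic_size_t stack_ring_pos;  /* ever-increasing write index */

/* On each alloc/free, record one event; the oldest entry is evicted. */
static void stack_ring_record(void *ptr, unsigned long stack, bool is_free)
{
	size_t pos = atomic_fetch_add(&stack_ring_pos, 1);
	struct stack_ring_entry *e = &stack_ring[pos % STACK_RING_SIZE];

	e->ptr = ptr;
	e->stack = stack;
	e->is_free = is_free;
}

/* On a bug report, scan backwards so the newest matching entry wins. */
static struct stack_ring_entry *stack_ring_find(void *ptr, bool is_free)
{
	size_t end = atomic_load(&stack_ring_pos);
	size_t scanned = end < STACK_RING_SIZE ? end : STACK_RING_SIZE;

	for (size_t i = 0; i < scanned; i++) {
		struct stack_ring_entry *e =
			&stack_ring[(end - 1 - i) % STACK_RING_SIZE];
		if (e->ptr == ptr && e->is_free == is_free)
			return e;
	}
	return NULL;
}
```

All writers contend only on the single atomic index, which is the contention concern discussed below; readers never block writers.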
The advantages of this approach over storing stack trace handles in
per-object metadata with the tag-based KASAN modes:
- Allows finding relevant stack traces for use-after-free bugs without
using quarantine for freed memory. (Currently, if the object was
reallocated multiple times, the report contains the latest alloc/free
stack traces, not necessarily the ones relevant to the buggy allocation.)
- Allows better identifying and marking use-after-free bugs, effectively
making the CONFIG_KASAN_TAGS_IDENTIFY functionality always-on.
- Has fixed memory overhead.
The disadvantage:
- If the affected object was allocated/freed long before the bug happened
and the stack trace events were purged from the stack ring, the report
will have no stack traces.
Discussion
==========
The proposed implementation of the stack ring uses a single ring buffer
for the whole kernel. This might lead to contention due to atomic
accesses to the ring buffer index on multicore systems.
At this point, it is unknown whether the performance impact from this
contention would be significant compared to the slowdown introduced by
collecting stack traces; the latter is planned to be sped up, see the
section below.
For now, the proposed implementation is deemed to be good enough, but this
might need to be revisited once the stack collection becomes faster.
A considered alternative is to keep a separate ring buffer for each CPU
and then iterate over all of them when printing a bug report. This
approach requires somehow figuring out which of the stack rings has the
freshest stack traces for an object if multiple stack rings have them.
Further plans
=============
This series is a part of an effort to make KASAN stack trace collection
suitable for production. This requires stack trace collection to be fast
and memory-bounded.
The planned steps are:
1. Speed up stack trace collection (potentially, by using SCS;
patches on-hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (this series).
3. Add a memory-bounded mode to stack depot or provide an alternative
memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize
the performance impact.
This patch (of 34):
__kasan_metadata_size() calculates the size of the redzone for objects in
a slab cache.
When accounting for presence of kasan_free_meta in the redzone, this
function only compares free_meta_offset with 0. But free_meta_offset
could also be equal to KASAN_NO_FREE_META, which indicates that
kasan_free_meta is not present at all.
Add a comparison with KASAN_NO_FREE_META into __kasan_metadata_size().
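The shape of the fix can be modeled in userspace (the struct, the metadata sizes, and the KASAN_NO_FREE_META value below are illustrative stand-ins, not the kernel's definitions):

```c
/*
 * Userspace model of the __kasan_metadata_size() fix: free metadata
 * contributes to the redzone size only when free_meta_offset is a real
 * offset, not the KASAN_NO_FREE_META marker.
 */
#include <assert.h>
#include <stddef.h>

#define KASAN_NO_FREE_META ((unsigned int)-1)  /* marker: no free meta */

#define ALLOC_META_SIZE 16  /* stand-in for sizeof(struct kasan_alloc_meta) */
#define FREE_META_SIZE  16  /* stand-in for sizeof(struct kasan_free_meta) */

struct kasan_cache {
	unsigned int alloc_meta_offset;
	unsigned int free_meta_offset;
};

/* Redzone bytes occupied by KASAN metadata for one object. */
static size_t kasan_metadata_size(const struct kasan_cache *c)
{
	return (c->alloc_meta_offset ? ALLOC_META_SIZE : 0) +
	       /* The fix: also treat KASAN_NO_FREE_META as "absent". */
	       ((c->free_meta_offset &&
		 c->free_meta_offset != KASAN_NO_FREE_META) ?
		FREE_META_SIZE : 0);
}
```

Without the extra comparison, a cache with free_meta_offset == KASAN_NO_FREE_META would wrongly report FREE_META_SIZE extra redzone bytes.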
Link: https://lkml.kernel.org/r/cover.1662411799.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 254721825
(cherry picked from commit ca77f290cff1dfa095d71ae16cc7cda8ee6df495)
Change-Id: I1feae36ac8435c0ffab4e72bcb03f03b689a0677
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
(Backport: fix conflicts with 608d4c5f9f in arch/x86/um/Makefile.)
Make KASAN run on User Mode Linux on x86_64.
The UML-specific KASAN initializer uses mmap to map the ~16TB of shadow
memory to the location defined by KASAN_SHADOW_OFFSET. kasan_init()
utilizes constructors to initialize KASAN before main().
The location of the KASAN shadow memory, starting at
KASAN_SHADOW_OFFSET, can be configured using the KASAN_SHADOW_OFFSET
option. The default location of this offset is 0x100000000000, which
keeps it out-of-the-way even on UML setups with more "physical" memory.
For low-memory setups, 0x7fff8000 can be used instead, which fits in an
immediate and is therefore faster, as suggested by Dmitry Vyukov. There
is usually enough free space at this location; however, it is a config
option so that it can be easily changed if needed.
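The initialization scheme can be sketched in plain userspace C (this is only a model: the real code maps ~16TB at KASAN_SHADOW_OFFSET with MAP_FIXED, while the size and the NULL hint below are deliberately tiny illustrative values):

```c
/*
 * Userspace sketch of the UML KASAN init scheme: a constructor maps
 * the shadow region before main() runs.
 */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#define SHADOW_SIZE (1UL << 20)  /* illustrative; UML reserves ~16TB */

static void *shadow_base;
static bool kasan_ready;

/* Runs before main(), mirroring how kasan_init() uses constructors. */
__attribute__((constructor))
static void kasan_init(void)
{
	/* MAP_NORESERVE keeps a huge reservation cheap until touched. */
	shadow_base = mmap(NULL, SHADOW_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
			   -1, 0);
	if (shadow_base != MAP_FAILED)
		kasan_ready = true;
}
```

By the time main() runs, the shadow region already exists, so instrumented code can update shadow bytes immediately.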
Note that, unlike KASAN on other architectures, vmalloc allocations
still use the shadow memory allocated upfront, rather than allocating
and freeing it per vmalloc allocation.
If another architecture chooses to go down the same path, we should
replace the checks for CONFIG_UML with something more generic, such
as:
- A CONFIG_KASAN_NO_SHADOW_ALLOC option, which architectures could set
- or, a way of having architecture-specific versions of these vmalloc
and module shadow memory allocation options.
Also note that, while UML supports both KASAN in inline mode
(CONFIG_KASAN_INLINE) and static linking (CONFIG_STATIC_LINK), it does
not support both at the same time.
Signed-off-by: Patricia Alfonso <trishalfonso@google.com>
Co-developed-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Bug: 254721825
(cherry picked from commit 5b301409e8bc5d7fad2ee138be44c5c529dd0874)
Change-Id: I7b6a5dfef80dd8e4684db738c35d80e6405b2f8b
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Commit c275c5c6d5 ("kasan: disable freed user page poisoning with HW
tags") added __GFP_SKIP_KASAN_POISON to GFP_HIGHUSER_MOVABLE. A similar
argument can be made about unpoisoning, so also add
__GFP_SKIP_KASAN_UNPOISON to user pages. To ensure the user page is
still accessible via page_address() without a kasan fault, reset the
page->flags tag.
With the above changes, there is no need for the arm64
tag_clear_highpage() to reset the page->flags tag.
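The role of the page->flags tag can be modeled in userspace (the bit layout, helper names, and the assumption of a 64-bit address with a free top byte are all illustrative):

```c
/*
 * Userspace model of tagged page_address(): the KASAN tag stored in
 * page->flags is inserted into the top byte of the returned pointer,
 * so resetting the flags tag to the match-all value keeps the page
 * accessible without a tag-check fault.
 */
#include <assert.h>
#include <stdint.h>

#define KASAN_TAG_KERNEL 0xff  /* match-all tag */
#define TAG_SHIFT 56

struct page {
	uint64_t flags;  /* top byte models the KASAN tag field */
	uint64_t virt;   /* untagged virtual address */
};

static void page_kasan_tag_set(struct page *p, uint8_t tag)
{
	p->flags = (p->flags & ~(0xffULL << TAG_SHIFT)) |
		   ((uint64_t)tag << TAG_SHIFT);
}

static void page_kasan_tag_reset(struct page *p)
{
	page_kasan_tag_set(p, KASAN_TAG_KERNEL);
}

/* Rebuild the tagged pointer from page->flags, as page_to_virt() does. */
static uint64_t page_address_tagged(const struct page *p)
{
	uint8_t tag = p->flags >> TAG_SHIFT;

	return (p->virt & ~(0xffULL << TAG_SHIFT)) |
	       ((uint64_t)tag << TAG_SHIFT);
}
```

With unpoisoning skipped, the memory keeps the match-all tag, so the flags tag must be reset to match.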
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20220610152141.2148929-3-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254721825
(cherry picked from commit 70c248aca9e7efa85a6664d5ab56c17c326c958f)
Change-Id: Ie5c13ce38e5c030cb77d63326c9cfd72bd668239
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
__kasan_unpoison_pages() colours the memory with a random tag and stores
it in page->flags in order to re-create the tagged pointer via
page_to_virt() later. When the tag from the page->flags is read, ensure
that the in-memory tags are already visible by re-ordering the
page_kasan_tag_set() after kasan_unpoison(). The former already has
barriers in place through try_cmpxchg(). On the reader side, the order
is ensured by the address dependency between page->flags and the memory
access.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20220610152141.2148929-2-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 254721825
(cherry picked from commit ed0a6d1d973e9763989b44913ae1bd2a5d5d5777)
Change-Id: I21744380b630fa5c8c6174ca0d4f063ff75b6acd
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
HW_TAGS KASAN skips zeroing page_alloc allocations backing vmalloc
mappings via __GFP_SKIP_ZERO. Instead, these pages are zeroed via
kasan_unpoison_vmalloc() by passing the KASAN_VMALLOC_INIT flag.
The problem is that __kasan_unpoison_vmalloc() does not zero pages when
either the kasan_vmalloc_enabled() or the is_vmalloc_or_module_addr()
check fails.
Thus:
1. Change __vmalloc_node_range() to only set KASAN_VMALLOC_INIT when
__GFP_SKIP_ZERO is set.
2. Change __kasan_unpoison_vmalloc() to always zero pages when the
KASAN_VMALLOC_INIT flag is set.
3. Add WARN_ON() asserts to check that KASAN_VMALLOC_INIT cannot be set
in other early return paths of __kasan_unpoison_vmalloc().
Also clean up the comment in __kasan_unpoison_vmalloc().
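The control flow after the fix can be collapsed into a small userspace model (a simplification that merges the steps above; the flag name mirrors the kernel's but everything else is illustrative):

```c
/*
 * Userspace model of the fixed flow: the caller sets
 * KASAN_VMALLOC_INIT only when it skipped zeroing via __GFP_SKIP_ZERO,
 * and the unpoison path then always honors the flag, even when the
 * KASAN-specific work is skipped.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KASAN_VMALLOC_INIT 0x1

static bool kasan_vmalloc_on;  /* models kasan_vmalloc_enabled() */

/* Step 1: only request deferred init when page zeroing was skipped. */
static unsigned int vmalloc_kasan_flags(bool gfp_skip_zero)
{
	return gfp_skip_zero ? KASAN_VMALLOC_INIT : 0;
}

/* Step 2: zero whenever KASAN_VMALLOC_INIT is set, on every path. */
static void unpoison_vmalloc(char *mem, size_t size, unsigned int flags)
{
	if (flags & KASAN_VMALLOC_INIT)
		memset(mem, 0, size);  /* zero even if tagging is off */
	if (!kasan_vmalloc_on)
		return;
	/* ... the real kernel unpoisons shadow/tags here ... */
}
```

Before the fix, an early return could skip the memset, leaking uninitialized memory when __GFP_SKIP_ZERO had been used.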
Link: https://lkml.kernel.org/r/4bc503537efdc539ffc3f461c1b70162eea31cf6.1654798516.git.andreyknvl@google.com
Fixes: 23689e91fb22 ("kasan, vmalloc: add vmalloc tagging for HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 254721825
(cherry picked from commit 6c2f761dad7851d8088b91063ccaea3c970efe78)
Change-Id: I07d3f8dc3cd28f43852a04b741a1c0b5a65a4ff9
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
...........
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
__might_resched.cold+0x13b/0x173
rt_spin_lock+0x5b/0xf0
___cache_free+0xa5/0x180
qlist_free_all+0x7a/0x160
per_cpu_remove_cache+0x5f/0x70
smp_call_function_many_cond+0x4c4/0x4f0
on_each_cpu_cond_mask+0x49/0xc0
kasan_quarantine_remove_cache+0x54/0xf0
kasan_cache_shrink+0x9/0x10
kmem_cache_shrink+0x13/0x20
acpi_os_purge_cache+0xe/0x20
acpi_purge_cached_objects+0x21/0x6d
acpi_initialize_objects+0x15/0x3b
acpi_init+0x130/0x5ba
do_one_initcall+0xe5/0x5b0
kernel_init_freeable+0x34f/0x3ad
kernel_init+0x1e/0x140
ret_from_fork+0x22/0x30
When kmem_cache_shrink() is called, an IPI is triggered and
___cache_free() runs in IPI interrupt context, where a local lock or
spinlock is acquired. On a PREEMPT_RT kernel, these locks are replaced
with sleepable rt-spinlocks, which triggers the problem above.
Fix it by moving qlist_free_all() from IPI interrupt context to task
context when PREEMPT_RT is enabled.
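The shape of the fix can be modeled in userspace (the structures and names below are illustrative stand-ins, not the kernel's quarantine code):

```c
/*
 * Userspace model of the fix: in IPI context a PREEMPT_RT kernel must
 * not take sleeping locks, so the per-CPU quarantine list is only
 * detached there and freed later from task context.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct qlist {
	size_t n;  /* number of quarantined objects (model only) */
};

static struct qlist percpu_quarantine;
static struct qlist remove_backlog;  /* drained later in task context */
static size_t freed;

static void qlist_free_all(struct qlist *q)
{
	freed += q->n;  /* stands in for ___cache_free() per object */
	q->n = 0;
}

/* Runs in IPI context: must not sleep on PREEMPT_RT. */
static void per_cpu_remove_cache(bool preempt_rt)
{
	if (preempt_rt) {
		remove_backlog = percpu_quarantine;  /* detach only */
		percpu_quarantine.n = 0;
	} else {
		qlist_free_all(&percpu_quarantine);  /* may take locks */
	}
}

/* Runs later from task context on PREEMPT_RT. */
static void drain_backlog(void)
{
	qlist_free_all(&remove_backlog);
}
```

The IPI handler now only detaches the list; the lock-taking free happens where sleeping is legal.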
[akpm@linux-foundation.org: reduce ifdeffery]
Link: https://lkml.kernel.org/r/20220401134649.2222485-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 254721825
(cherry picked from commit 07d067e4f2ceb72b9f681995cc53828caaba9e6e)
Change-Id: I0a0555265d2e05f31cf2cd58b318d3602ba3cc6a
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
(Backport: no conflict, neighboring lines changes.)
On hardware with features like arm64 MTE or SPARC ADI, an access fault
can be triggered at sub-page granularity. Depending on how the
fault_in_writeable() function is used, the caller can get into a
live-lock by continuously retrying the fault-in on an address different
from the one where the uaccess failed.
In the majority of cases progress is ensured by the following
conditions:
1. copy_to_user_nofault() guarantees at least one byte access if the
user address is not faulting.
2. The fault_in_writeable() loop is resumed from the first address that
could not be accessed by copy_to_user_nofault().
If the loop iteration is restarted from an earlier (initial) point, the
loop is repeated with the same conditions and it would live-lock.
Introduce an arch-specific probe_subpage_writeable() and call it from
the newly added fault_in_subpage_writeable() function. The arch code
with sub-page faults will have to implement the specific probing
functionality.
Note that no other fault_in_subpage_*() functions are added since they
have no callers currently susceptible to a live-lock.
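A userspace model of why the sub-page probe prevents the live-lock (the probe is modeled as a callback and the page-granular fault-in is stubbed out; the real return-value handling in the kernel differs in detail):

```c
/*
 * Userspace model of fault_in_subpage_writeable(): page-granular
 * fault-in can succeed while a sub-page (e.g. MTE tag) fault remains,
 * so an arch probe re-checks writability and the function reports the
 * range as not faulted in when the probe fails, letting the caller
 * bail out instead of retrying forever.
 */
#include <assert.h>
#include <stddef.h>

/* Stub: pretend page-granular fault-in succeeds for the whole range. */
static size_t fault_in_writeable(char *uaddr, size_t size)
{
	(void)uaddr;
	(void)size;
	return 0;  /* 0 bytes left to fault in */
}

/* Arch hook: 0 if every byte is writable at sub-page granularity. */
typedef int (*subpage_probe_t)(char *uaddr, size_t size);

/* Returns the number of bytes NOT faulted in. */
static size_t fault_in_subpage_writeable(char *uaddr, size_t size,
					 subpage_probe_t probe)
{
	size_t faulted_in = size - fault_in_writeable(uaddr, size);

	if (probe(uaddr, faulted_in) == 0)
		return size - faulted_in;
	return size;  /* sub-page fault: nothing is guaranteed */
}

/* Example probes for the model. */
static int probe_all_ok(char *uaddr, size_t size)
{
	(void)uaddr; (void)size;
	return 0;
}

static int probe_tag_fault(char *uaddr, size_t size)
{
	(void)uaddr; (void)size;
	return -1;
}
```

When the probe fails, the caller sees that no progress is possible and can return an error rather than loop.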
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20220423100751.1870771-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 254721825
(cherry picked from commit da32b5817253697671af961715517bfbb308a592)
Change-Id: I8362937496a2a8709686af9f97009b00a21b1f5d
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Split out the part of __kasan_report() that prints things into
print_report(). One of the subsequent patches makes another error handler
use print_report() as well.
Includes lower-level changes:
- Allow addr_has_metadata() accepting a tagged address.
- Drop the const qualifier from the fields of kasan_access_info to
avoid excessive type casts.
- Change the type of the address argument of __kasan_report() and
end_report() to void * to reduce the number of type casts.
Link: https://lkml.kernel.org/r/9be3ed99dd24b9c4e1c4a848b69a0c6ecefd845e.1646237226.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 254721825
(cherry picked from commit 9d7b7dd946924de43021f57a8bee122ff0744d93)
Change-Id: I0e2c4f9d721938fb4b3756384772cc38589d11d8
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
(Backport: fix conflicts due to 261a7a2ac9 having been backported
before this patch.)
With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK the kernel crashes:
Unable to handle kernel paging request at virtual address ffff7000028f2000
...
swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
[ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
Hardware name: linux,dummy-virt (DT)
pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
pc : kasan_check_range+0x90/0x1a0
lr : memcpy+0x88/0xf4
sp : ffff80001378fe20
...
Call trace:
kasan_check_range+0x90/0x1a0
pcpu_page_first_chunk+0x3f0/0x568
setup_per_cpu_areas+0xb8/0x184
start_kernel+0x8c/0x328
The vm area used in vm_area_register_early() has no KASAN shadow memory.
Let's add a new kasan_populate_early_vm_area_shadow() function to
populate the vm area shadow memory to fix the issue.
[wangkefeng.wang@huawei.com: fix redefinition of 'kasan_populate_early_vm_area_shadow']
Link: https://lkml.kernel.org/r/20211011123211.3936196-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20210910053354.26721-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Marco Elver <elver@google.com> [KASAN]
Acked-by: Andrey Konovalov <andreyknvl@gmail.com> [KASAN]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 254721825
(cherry picked from commit 3252b1d8309ea42bc6329d9341072ecf1c9505c0)
Change-Id: Ic7008c3e00741e91ba6cac42b9995f83b5aed5cf
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Changes in 5.15.83
clk: generalize devm_clk_get() a bit
clk: Provide new devm_clk helpers for prepared and enabled clocks
mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series
arm: dts: rockchip: fix node name for hym8563 rtc
arm: dts: rockchip: remove clock-frequency from rtc
ARM: dts: rockchip: fix ir-receiver node names
arm64: dts: rockchip: fix ir-receiver node names
ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name
fs: use acquire ordering in __fget_light()
ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels
ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation
ASoC: wm8962: Wait for updated value of WM8962_CLOCKING1 register
spi: mediatek: Fix DEVAPC Violation at KO Remove
ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188
ASoC: rt711-sdca: fix the latency time of clock stop prepare state machine transitions
9p/fd: Use P9_HDRSZ for header size
regulator: slg51000: Wait after asserting CS pin
ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event
selftests/net: Find nettest in current directory
btrfs: send: avoid unaligned encoded writes when attempting to clone range
ASoC: soc-pcm: Add NULL check in BE reparenting
regulator: twl6030: fix get status of twl6032 regulators
fbcon: Use kzalloc() in fbcon_prepare_logo()
usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer
9p/xen: check logical size for buffer size
net: usb: qmi_wwan: add u-blox 0x1342 composition
mm/khugepaged: take the right locks for page table retraction
mm/khugepaged: fix GUP-fast interaction by sending IPI
mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
rtc: mc146818-lib: extract mc146818_avoid_UIP
rtc: cmos: avoid UIP when writing alarm time
rtc: cmos: avoid UIP when reading alarm time
cifs: fix use-after-free caused by invalid pointer `hostname`
drm/bridge: anx7625: Fix edid_read break case in sp_tx_edid_read()
xen/netback: Ensure protocol headers don't fall in the non-linear area
xen/netback: do some code cleanup
xen/netback: don't call kfree_skb() with interrupts disabled
media: videobuf2-core: take mmap_lock in vb2_get_unmapped_area()
soundwire: intel: Initialize clock stop timeout
Revert "ARM: dts: imx7: Fix NAND controller size-cells"
media: v4l2-dv-timings.c: fix too strict blanking sanity checks
memcg: fix possible use-after-free in memcg_write_event_control()
mm/gup: fix gup_pud_range() for dax
Bluetooth: btusb: Add debug message for CSR controllers
Bluetooth: Fix crash when replugging CSR fake controllers
net: mana: Fix race on per-CQ variable napi work_done
KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
drm/vmwgfx: Don't use screen objects when SEV is active
drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspend
drm/shmem-helper: Remove errant put in error path
drm/shmem-helper: Avoid vm_open error paths
net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
HID: usbhid: Add ALWAYS_POLL quirk for some mice
HID: hid-lg4ff: Add check for empty lbuf
HID: core: fix shift-out-of-bounds in hid_report_raw_event
HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10
can: af_can: fix NULL pointer dereference in can_rcv_filter
clk: Fix pointer casting to prevent oops in devm_clk_release()
gpiolib: improve coding style for local variables
gpiolib: check the 'ngpios' property in core gpiolib code
gpiolib: fix memory leak in gpiochip_setup_dev()
netfilter: nft_set_pipapo: Actually validate intervals in fields after the first one
drm/vmwgfx: Fix race issue calling pin_user_pages
ieee802154: cc2520: Fix error return code in cc2520_hw_init()
ca8210: Fix crash by zero initializing data
netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark
drm/bridge: ti-sn65dsi86: Fix output polarity setting bug
gpio: amd8111: Fix PCI device reference count leak
e1000e: Fix TX dispatch condition
igb: Allocate MSI-X vector when testing
net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835
drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420
af_unix: Get user_ns from in_skb in unix_diag_get_exact().
vmxnet3: correctly report encapsulated LRO packet
vmxnet3: use correct intrConf reference when using extended queues
Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn()
Bluetooth: Fix not cleanup led when bt_init fails
net: dsa: ksz: Check return value
net: dsa: hellcreek: Check return value
net: dsa: sja1105: Check return value
selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload
mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add()
net: encx24j600: Add parentheses to fix precedence
net: encx24j600: Fix invalid logic in reading of MISTAT register
net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
net: mdiobus: fix double put fwnode in the error path
octeontx2-pf: Fix potential memory leak in otx2_init_tc()
xen-netfront: Fix NULL sring after live migration
net: mvneta: Prevent out of bounds read in mvneta_config_rss()
i40e: Fix not setting default xps_cpus after reset
i40e: Fix for VF MAC address 0
i40e: Disallow ip4 and ip6 l4_4_bytes
NFC: nci: Bounds check struct nfc_target arrays
nvme initialize core quirks before calling nvme_init_subsystem
gpio/rockchip: fix refcount leak in rockchip_gpiolib_register()
net: stmmac: fix "snps,axi-config" node property parsing
ip_gre: do not report erspan version on GRE interface
net: microchip: sparx5: Fix missing destroy_workqueue of mact_queue
net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq
net: hisilicon: Fix potential use-after-free in hisi_femac_rx()
net: mdio: fix unbalanced fwnode reference count in mdio_device_release()
net: hisilicon: Fix potential use-after-free in hix5hd2_rx()
tipc: Fix potential OOB in tipc_link_proto_rcv()
ipv4: Fix incorrect route flushing when source address is deleted
ipv4: Fix incorrect route flushing when table ID 0 is used
net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
tipc: call tipc_lxc_xmit without holding node_read_lock
ethernet: aeroflex: fix potential skb leak in greth_init_rings()
dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
xen/netback: fix build warning
net: phy: mxl-gpy: fix version reporting
net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
ipv6: avoid use-after-free in ip6_fragment()
net: thunderbolt: fix memory leak in tbnet_open()
net: mvneta: Fix an out of bounds check
macsec: add missing attribute validation for offload
s390/qeth: fix various format strings
s390/qeth: fix use-after-free in hsci
can: esd_usb: Allow REC and TEC to return to zero
block: move CONFIG_BLOCK guard to top Makefile
io_uring: move to separate directory
io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
Linux 5.15.83
Change-Id: I08ef74d6ad8786c191050294dcbf1090908e7c4d
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream.
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a8747 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently dropped
the file type check with it, allowing any file to slip through. With the
invariants broken, the d_name and parent accesses can now race against
renames and removals of arbitrary files and cause use-after-frees.
Fix the bug by resurrecting the file type check in __file_cft(). Now that
cgroupfs is implemented through kernfs, checking the file operations needs
to go through a layer of indirection. Instead, let's check the superblock
and dentry type.
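The shape of the check can be modeled in userspace (the structs are illustrative stand-ins; CGROUP_SUPER_MAGIC matches the v1 magic from include/uapi/linux/magic.h, and the kernfs flag value is illustrative):

```c
/*
 * Userspace model of the fix: trust d_name/parent only after verifying
 * both the superblock (cgroupfs magic) and the dentry type (a kernfs
 * regular file), closing the hole that let arbitrary fds through.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CGROUP_SUPER_MAGIC 0x27e0eb
#define KERNFS_FILE 0x2

struct super_block { unsigned long s_magic; };
struct kernfs_node { unsigned int flags; };
struct dentry {
	struct super_block *sb;
	struct kernfs_node *kn;  /* NULL if not a kernfs dentry */
};

/* Accept only regular files on a cgroup filesystem. */
static bool is_valid_cgroup_event_file(const struct dentry *d)
{
	if (d->sb->s_magic != CGROUP_SUPER_MAGIC)
		return false;
	return d->kn && (d->kn->flags & KERNFS_FILE);
}
```

Both conditions are needed: the magic check rejects foreign filesystems, and the node-type check rejects directories and symlinks on cgroupfs itself.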
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
Fixes: 347c4a8747 ("memcg: remove cgroup_event->cft")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>