31f50e8abc2eec22ba0f78d25f8a69cde6d65ba1
38569 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
31f50e8abc |
FROMGIT: bpf: btf: limit logging of ignored BTF mismatches
Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF mismatches are expected and module loading should proceed anyway. Logging with pr_warn() on every one of these "benign" mismatches creates unnecessary noise when many such modules are loaded. Instead, handle this case with a single log warning that BTF info may be unavailable. Mismatches also result in calls to __btf_verifier_log() via __btf_verifier_log_type() or btf_verifier_log_member(), adding several additional lines of logging per mismatched module. Add checks to these paths to skip logging for module BTF mismatches in the "allow mismatch" case. All existing logging behavior is preserved in the default CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case. Signed-off-by: Connor O'Brien <connoro@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230107025331.3240536-1-connoro@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Bug: 260025057 Bug: 256578296 Bug: 258989318 (cherry picked from commit 9cb61e50bf6bf54db712bba6cf20badca4383f96 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master) Signed-off-by: Connor O'Brien <connoro@google.com> Change-Id: I2f9d19a2027610af2e825fc38fd08f56f4cd7091 |
||
|
|
eebff8eab2 |
FROMLIST: mm: cma: introduce gfp flag in cma_alloc instead of no_warn
The upcoming patch will introduce __GFP_NORETRY semantic
in alloc_contig_range which is a failfast mode of the API.
Instead of adding a additional parameter for gfp, replace
no_warn with gfp flag.
To keep old behaviors, it follows the rule below.
no_warn gfp_flags
false GFP_KERNEL
true GFP_KERNEL|__GFP_NOWARN
gfp & __GFP_NOWARN GFP_KERNEL | (gfp & __GFP_NOWARN)
Bug: 170340257
Bug: 120293424
Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m36b144ff81fe0a8f0ecaf6813de4819ecc41f8fe
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I1ce020ab5d5fff34eb6464be4632ddef72fb43eb
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit
|
||
|
|
abafa1328b |
BACKPORT: locking: Add missing __sched attributes
This patch adds __sched attributes to a few missing places to show blocked function rather than locking function in get_wchan. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220115231657.84828-1-minchan@kernel.org Conflicts: kernel/locking/percpu-rwsem.c 1. conflict <linux/sched/debug.h> Bug: 228243692 Change-Id: Ifb50c13cfdd7484269d9a291a8da515e1cce6a7b (cherry picked from commit c441e934b604a3b5f350a9104124cf6a3ba07a34) Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Richard Chang <richardycc@google.com> (cherry picked from commit da358e264cbed0240debe37fa6867c81f0748bf8) |
||
|
|
5a91f1aa85 |
Merge 5.15.84 into android14-5.15
Changes in 5.15.84 x86/vdso: Conditionally export __vdso_sgx_enter_enclave() vfs: fix copy_file_range() averts filesystem freeze protection nfp: fix use-after-free in area_cache_get() ASoC: fsl_micfil: explicitly clear software reset bit ASoC: fsl_micfil: explicitly clear CHnF flags ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx() libbpf: Use page size as max_entries when probing ring buffer map pinctrl: meditatek: Startup with the IRQs disabled can: sja1000: fix size of OCR_MODE_MASK define can: mcba_usb: Fix termination command argument net: fec: don't reset irq coalesce settings to defaults on "ip link up" ASoC: cs42l51: Correct PGA Volume minimum value perf: Fix perf_pending_task() UaF nvme-pci: clear the prp2 field when not used ASoC: ops: Correct bounds check for second channel on SX controls net: fec: properly guard irq coalesce setup Linux 5.15.84 Change-Id: I34ef5e73fca9da9a77c89b1f0c7ad4af37b63a79 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
a6eaf3db80 |
ANDROID: GKI: Only protect exports if KMI symbols are present
Only enforce export protection if there are symbols in the
unprotected list for the Kernel Module Interface (KMI).
This is only relevant for targets like arm64 that have
defined ABI symbol lists. This allows non-GKI targets
like arm and x86 to continue using GKI source code
without disabling the feature for those targets.
Bug: 232430739
Test: TH
Fixes:
|
||
|
|
8bffa95ac1 |
perf: Fix perf_pending_task() UaF
[ Upstream commit 517e6a301f34613bff24a8e35b5455884f2d83d8 ] Per syzbot it is possible for perf_pending_task() to run after the event is free()'d. There are two related but distinct cases: - the task_work was already queued before destroying the event; - destroying the event itself queues the task_work. The first cannot be solved using task_work_cancel() since perf_release() itself might be called from a task_work (____fput), which means the current->task_works list is already empty and task_work_cancel() won't be able to find the perf_pending_task() entry. The simplest alternative is extending the perf_event lifetime to cover the task_work. The second is just silly, queueing a task_work while you know the event is going away makes no sense and is easily avoided by re-arranging how the event is marked STATE_DEAD and ensuring it goes through STATE_OFF on the way down. Reported-by: syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
fd1e768866 |
ANDROID: GKI: Protect exports of protected GKI modules
Implement support for protecting the exported symbols of protected GKI modules. Only signed GKI modules are permitted to export symbols listed in the android/abi_gki_protected_exports file. Attempting to export these symbols from an unsigned module will result in the module failing to load, with a 'Permission denied' error message. Bug: 232430739 Test: TH Change-Id: I3e8b330938e116bb2e022d356ac0d55108a84a01 Signed-off-by: Ramji Jiyani <ramjiyani@google.com> |
||
|
|
63696badd9 |
Merge 5.15.83 into android14-5.15
Changes in 5.15.83 clk: generalize devm_clk_get() a bit clk: Provide new devm_clk helpers for prepared and enabled clocks mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse() arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series arm: dts: rockchip: fix node name for hym8563 rtc arm: dts: rockchip: remove clock-frequency from rtc ARM: dts: rockchip: fix ir-receiver node names arm64: dts: rockchip: fix ir-receiver node names ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name fs: use acquire ordering in __fget_light() ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation ASoC: wm8962: Wait for updated value of WM8962_CLOCKING1 register spi: mediatek: Fix DEVAPC Violation at KO Remove ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188 ASoC: rt711-sdca: fix the latency time of clock stop prepare state machine transitions 9p/fd: Use P9_HDRSZ for header size regulator: slg51000: Wait after asserting CS pin ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event selftests/net: Find nettest in current directory btrfs: send: avoid unaligned encoded writes when attempting to clone range ASoC: soc-pcm: Add NULL check in BE reparenting regulator: twl6030: fix get status of twl6032 regulators fbcon: Use kzalloc() in fbcon_prepare_logo() usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer 9p/xen: check logical size for buffer size net: usb: qmi_wwan: add u-blox 0x1342 composition mm/khugepaged: take the right locks for page table retraction mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths rtc: mc146818-lib: extract mc146818_avoid_UIP rtc: cmos: avoid UIP when writing alarm time rtc: cmos: avoid UIP when reading alarm time cifs: fix use-after-free caused by invalid pointer `hostname` drm/bridge: anx7625: Fix edid_read break case in sp_tx_edid_read() xen/netback: Ensure protocol headers don't fall in the non-linear area xen/netback: do some code cleanup xen/netback: don't call kfree_skb() with interrupts disabled media: videobuf2-core: take mmap_lock in vb2_get_unmapped_area() soundwire: intel: Initialize clock stop timeout Revert "ARM: dts: imx7: Fix NAND controller size-cells" media: v4l2-dv-timings.c: fix too strict blanking sanity checks memcg: fix possible use-after-free in memcg_write_event_control() mm/gup: fix gup_pud_range() for dax Bluetooth: btusb: Add debug message for CSR controllers Bluetooth: Fix crash when replugging CSR fake controllers net: mana: Fix race on per-CQ variable napi work_done KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field drm/vmwgfx: Don't use screen objects when SEV is active drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspend drm/shmem-helper: Remove errant put in error path drm/shmem-helper: Avoid vm_open error paths net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing() HID: usbhid: Add ALWAYS_POLL quirk for some mice HID: hid-lg4ff: Add check for empty lbuf HID: core: fix shift-out-of-bounds in hid_report_raw_event HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10 can: af_can: fix NULL pointer dereference in can_rcv_filter clk: Fix pointer casting to prevent oops in devm_clk_release() gpiolib: improve coding style for local variables gpiolib: check the 'ngpios' property in core gpiolib code gpiolib: fix memory leak in gpiochip_setup_dev() netfilter: nft_set_pipapo: Actually validate intervals in fields after the first one drm/vmwgfx: Fix race issue calling pin_user_pages ieee802154: cc2520: Fix error return code in cc2520_hw_init() ca8210: Fix crash by zero initializing data netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark drm/bridge: ti-sn65dsi86: Fix output polarity setting bug gpio: amd8111: Fix PCI device reference count leak e1000e: Fix TX dispatch condition igb: Allocate MSI-X vector when testing net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835 drm: bridge: dw_hdmi: fix preference of RGB modes over YUV420 af_unix: Get user_ns from in_skb in unix_diag_get_exact(). vmxnet3: correctly report encapsulated LRO packet vmxnet3: use correct intrConf reference when using extended queues Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn() Bluetooth: Fix not cleanup led when bt_init fails net: dsa: ksz: Check return value net: dsa: hellcreek: Check return value net: dsa: sja1105: Check return value selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() net: encx24j600: Add parentheses to fix precedence net: encx24j600: Fix invalid logic in reading of MISTAT register net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: mdiobus: fix double put fwnode in the error path octeontx2-pf: Fix potential memory leak in otx2_init_tc() xen-netfront: Fix NULL sring after live migration net: mvneta: Prevent out of bounds read in mvneta_config_rss() i40e: Fix not setting default xps_cpus after reset i40e: Fix for VF MAC address 0 i40e: Disallow ip4 and ip6 l4_4_bytes NFC: nci: Bounds check struct nfc_target arrays nvme initialize core quirks before calling nvme_init_subsystem gpio/rockchip: fix refcount leak in rockchip_gpiolib_register() net: stmmac: fix "snps,axi-config" node property parsing ip_gre: do not report erspan version on GRE interface net: microchip: sparx5: Fix missing destroy_workqueue of mact_queue net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq net: hisilicon: Fix potential use-after-free in hisi_femac_rx() net: mdio: fix unbalanced fwnode reference count in mdio_device_release() net: hisilicon: Fix potential use-after-free in hix5hd2_rx() tipc: Fix potential OOB in tipc_link_proto_rcv() ipv4: Fix incorrect route flushing when source address is deleted ipv4: Fix incorrect route flushing when table ID 0 is used net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions() tipc: call tipc_lxc_xmit without holding node_read_lock ethernet: aeroflex: fix potential skb leak in greth_init_rings() dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove() xen/netback: fix build warning net: phy: mxl-gpy: fix version reporting net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq() ipv6: avoid use-after-free in ip6_fragment() net: thunderbolt: fix memory leak in tbnet_open() net: mvneta: Fix an out of bounds check macsec: add missing attribute validation for offload s390/qeth: fix various format strings s390/qeth: fix use-after-free in hsci can: esd_usb: Allow REC and TEC to return to zero block: move CONFIG_BLOCK guard to top Makefile io_uring: move to separate directory io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() Linux 5.15.83 Change-Id: I08ef74d6ad8786c191050294dcbf1090908e7c4d Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
f435c66d23 |
io_uring: move to separate directory
[ Upstream commit ed29b0b4fd835b058ddd151c49d021e28d631ee6 ] In preparation for splitting io_uring up a bit, move it into its own top level directory. It didn't really belong in fs/ anyway, as it's not a file system only API. This adds io_uring/ and moves the core files in there, and updates the MAINTAINERS file for the new location. Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 998b30c3948e ("io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
aad8bbd17a |
memcg: fix possible use-after-free in memcg_write_event_control()
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream. memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to |
||
|
|
757d548ca6 |
ANDROID: kernel: sched: Export reweight_task
Export reweight_task for vendor usage when they are trying to manipulate task prio. After the prio changed, it will need to update its load weight to take effect. Therefore, this function needs to be called from vendor kernel module. It could be used with trace_android_rvh_set_user_nice and trace_android_rvh_setscheduler. Bug: 245675204 Change-Id: I0033518bf1cbd0a8129795743b95340f439d5fe8 Signed-off-by: Rick Yiu <rickyiu@google.com> (cherry picked from commit db144888f871fa149c0d18844faa971a3bf8f4fc) |
||
|
|
b3d82fd3de |
Merge 5.15.82 into android14-5.15
Changes in 5.15.82
arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored
drm/i915: Create a dummy object for gen6 ppgtt
drm/i915/gt: Use i915_vm_put on ppgtt_create error paths
erofs: fix order >= MAX_ORDER warning due to crafted negative i_size
btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino
btrfs: free btrfs_path before copying inodes to userspace
spi: spi-imx: Fix spi_bus_clk if requested clock is higher than input clock
btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker
btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()
drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs() return code
drm/amdgpu: update drm_display_info correctly when the edid is read
drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read"
iio: health: afe4403: Fix oob read in afe4403_read_raw
iio: health:
|
||
|
|
48642f9431 |
proc: proc_skip_spaces() shouldn't think it is working on C strings
commit bce9332220bd677d83b19d21502776ad555a0e73 upstream. proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
3eb9213f66 |
proc: avoid integer type confusion in get_proc_long
commit e6cfaf34be9fcd1a8285a294e18986bfc41a409c upstream. proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the rigth type. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
417d5ea6e7 |
tracing: Free buffers when a used dynamic event is removed
commit 4313e5a613049dfc1819a6dfb5f94cf2caff9452 upstream.
After 65536 dynamic events have been added and removed, the "type" field
of the event then uses the first type number that is available (not
currently used by other events). A type number is the identifier of the
binary blobs in the tracing ring buffer (known as events) to map them to
logic that can parse the binary blob.
The issue is that if a dynamic event (like a kprobe event) is traced and
is in the ring buffer, and then that event is removed (because it is
dynamic, which means it can be created and destroyed), if another dynamic
event is created that has the same number that new event's logic on
parsing the binary blob will be used.
To show how this can be an issue, the following can crash the kernel:
# cd /sys/kernel/tracing
# for i in `seq 65536`; do
echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
# done
For every iteration of the above, the writing to the kprobe_events will
remove the old event and create a new one (with the same format) and
increase the type number to the next available on until the type number
reaches over 65535 which is the max number for the 16 bit type. After it
reaches that number, the logic to allocate a new number simply looks for
the next available number. When an dynamic event is removed, that number
is then available to be reused by the next dynamic event created. That is,
once the above reaches the max number, the number assigned to the event in
that loop will remain the same.
Now that means deleting one dynamic event and created another will reuse
the previous events type number. This is where bad things can happen.
After the above loop finishes, the kprobes/foo event which reads the
do_sys_openat2 function call's first parameter as an integer.
# echo 1 > kprobes/foo/enable
# cat /etc/passwd > /dev/null
# cat trace
cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
# echo 0 > kprobes/foo/enable
Now if we delete the kprobe and create a new one that reads a string:
# echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
And now we can the trace:
# cat trace
sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
And dmesg has:
==================================================================
BUG: KASAN: use-after-free in string+0xd4/0x1c0
Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x77
print_report+0x17f/0x47b
kasan_report+0xad/0x130
string+0xd4/0x1c0
vsnprintf+0x500/0x840
seq_buf_vprintf+0x62/0xc0
trace_seq_printf+0x10e/0x1e0
print_type_string+0x90/0xa0
print_kprobe_event+0x16b/0x290
print_trace_line+0x451/0x8e0
s_show+0x72/0x1f0
seq_read_iter+0x58e/0x750
seq_read+0x115/0x160
vfs_read+0x11d/0x460
ksys_read+0xa9/0x130
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc2e972ade2
Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
</TASK>
The buggy address belongs to the physical page:
page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
This was found when Zheng Yejian sent a patch to convert the event type
number assignment to use IDA, which gives the next available number, and
this bug showed up in the fuzz testing by Yujie Liu and the kernel test
robot. But after further analysis, I found that this behavior is the same
as when the event type numbers go past the 16bit max (and the above shows
that).
As modules have a similar issue, but is dealt with by setting a
"WAS_ENABLED" flag when a module event is enabled, and when the module is
freed, if any of its events were enabled, the ring buffer that holds that
event is also cleared, to prevent reading stale events. The same can be
done for dynamic events.
If any dynamic event that is being removed was enabled, then make sure the
buffers they were enabled in are now cleared.
Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home
Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
Depends-on:
|
||
|
|
52fc245d15 |
tracing: Fix race where histograms can be called before the event
commit ef38c79a522b660f7f71d45dad2d6244bc741841 upstream.
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.
This could be solved with adding memory barriers in the hot path, or to
make sure that all the triggers requiring a record check for NULL. The
latter was chosen.
Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but
the same needs to be done with histograms.
Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
cb2b0612cd |
tracing/osnoise: Fix duration type
commit 022632f6c43a86f2135642dccd5686de318e861d upstream.
The duration type is a 64 long value, not an int. This was
causing some long noise to report wrong values.
Change the duration to a 64 bits value.
Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Fixes:
|
||
|
|
e376183167 |
bpf: Do not copy spin lock field from user in bpf_selem_alloc
[ Upstream commit 836e49e103dfeeff670c934b7d563cbd982fce87 ]
bpf_selem_alloc function is used by inode_storage, sk_storage and
task_storage maps to set map value, for these map types, there may
be a spin lock in the map value, so if we use memcpy to copy the whole
map value from user, the spin lock field may be initialized incorrectly.
Since the spin lock field is zeroed by kzalloc, call copy_map_value
instead of memcpy to skip copying the spin lock field to fix it.
Fixes:
|
||
|
|
ee3d37d796 |
bpf, perf: Use subprog name when reporting subprog ksymbol
[ Upstream commit 47df8a2f78bc34ff170d147d05b121f84e252b85 ] Since commit |
||
|
|
cd65f53a58 |
ANDROID: sched/uclamp: Don't enable uclamp_is_used static key by in-kernel requests
We do have now in-kernel users of uclamp to implement inheritance. The static_branch_enable() path unconditionally holds the cpus_read_lock() which might_sleep(). The path in binder that implements inheritance happens from in_atomic() context which leads to a splat like this one: [ 147.529960] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:56 [ 147.530196] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2586, name: RenderThread [ 147.530410] INFO: lockdep is turned off. [ 147.530518] Preemption disabled at: [ 147.530521] [<ffffffc008ca2cec>] binder_proc_transaction+0x78/0x41c [ 147.530793] CPU: 8 PID: 2586 Comm: RenderThread Tainted: G S W O 5.15.76-android14-5-00086-gc01afe5d262f #1 [ 147.531214] Call trace: [ 147.531288] dump_backtrace+0xe8/0x134 [ 147.531444] show_stack+0x1c/0x4c [ 147.531598] dump_stack_lvl+0x74/0x94 [ 147.531766] dump_stack+0x14/0x3c [ 147.531920] ___might_sleep+0x210/0x230 [ 147.532094] __might_sleep+0x54/0x84 [ 147.532259] cpus_read_lock+0x2c/0x160 [ 147.532429] static_key_enable+0x1c/0x34 [ 147.532608] __sched_setscheduler+0x2a8/0x99c [ 147.532802] sched_setattr_nocheck+0x1c/0x24 [ 147.532994] binder_do_set_priority+0x31c/0x4a4 [ 147.533195] binder_transaction_priority+0x200/0x3f4 [ 147.533413] binder_proc_transaction+0x220/0x41c [ 147.533618] binder_transaction+0x1df0/0x234c [ 147.533812] binder_thread_write+0xd84/0x2398 [ 147.534007] binder_ioctl_write_read+0x19c/0xb28 [ 147.534212] binder_ioctl+0x344/0x1a3c [ 147.534382] __arm64_sys_ioctl+0x94/0xc8 [ 147.534561] invoke_syscall+0x44/0xf8 [ 147.534729] el0_svc_common+0xc8/0x10c [ 147.534900] do_el0_svc+0x20/0x28 [ 147.535053] el0_svc+0x58/0xe0 [ 147.535198] el0t_64_sync_handler+0x7c/0xe4 [ 147.535386] el0t_64_sync+0x188/0x18c Prevent enabling the lock for !user initiated sched_setattr() operations. Generally we don't expect in-kernel uclamp users. Bug: 259145692 Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Iac5be139b5ffd39f5e1c0431ce253133d81b98cf |
||
|
|
880a2689a3 |
Merge 5.15.81 into android14-5.15
Changes in 5.15.81 ASoC: fsl_sai: use local device pointer ASoC: fsl_asrc fsl_esai fsl_sai: allow CONFIG_PM=N serial: Add rs485_supported to uart_port serial: fsl_lpuart: Fill in rs485_supported tty: serial: fsl_lpuart: don't break the on-going transfer when global reset sctp: remove the unnecessary sinfo_stream check in sctp_prsctp_prune_unsent sctp: clear out_curr if all frag chunks of current msg are pruned cifs: introduce new helper for cifs_reconnect() cifs: split out dfs code from cifs_reconnect() cifs: support nested dfs links over reconnect cifs: Fix connections leak when tlink setup failed ata: libata-scsi: simplify __ata_scsi_queuecmd() ata: libata-core: do not issue non-internal commands once EH is pending drm/display: Don't assume dual mode adaptors support i2c sub-addressing nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro nvme-pci: disable namespace identifiers for the MAXIO MAP1001 nvme-pci: disable write zeroes on various Kingston SSD nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000 iio: ms5611: Simplify IO callback parameters iio: pressure: ms5611: fixed value compensation bug ceph: do not update snapshot context when there is no new snapshot ceph: avoid putting the realm twice when decoding snaps fails x86/sgx: Create utility to validate user provided offset and length x86/sgx: Add overflow check in sgx_validate_offset_length() binder: validate alloc->mm in ->mmap() handler ceph: Use kcalloc for allocating multiple elements ceph: fix NULL pointer dereference for req->r_session wifi: mac80211: fix memory free error when registering wiphy fail wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support riscv: dts: sifive unleashed: Add PWM controlled LEDs audit: fix undefined behavior in bit shift for AUDIT_BIT wifi: airo: do not assign -1 to unsigned char wifi: mac80211: Fix ack frame idr leak when mesh has no route wifi: ath11k: Fix QCN9074 firmware boot on x86 spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run selftests/bpf: Add verifier test for release_reference() Revert "net: macsec: report real_dev features when HW offloading is enabled" platform/x86: ideapad-laptop: Disable touchpad_switch platform/x86: touchscreen_dmi: Add info for the RCA Cambio W101 v2 2-in-1 platform/x86/intel/pmt: Sapphire Rapids PMT errata fix platform/x86/intel/hid: Add some ACPI device IDs scsi: ibmvfc: Avoid path failures during live migration scsi: scsi_debug: Make the READ CAPACITY response compliant with ZBC drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017) block, bfq: fix null pointer dereference in bfq_bio_bfqg() arm64/syscall: Include asm/ptrace.h in syscall_wrapper header. nvmet: fix memory leak in nvmet_subsys_attr_model_store_locked Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly"" ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue RISC-V: vdso: Do not add missing symbols to version section in linker script MIPS: pic32: treat port as signed integer xfrm: fix "disable_policy" on ipv4 early demux xfrm: replay: Fix ESN wrap around for GSO af_key: Fix send_acquire race with pfkey_register ARM: dts: am335x-pcm-953: Define fixed regulators in root node ASoC: hdac_hda: fix hda pcm buffer overflow issue ASoC: sgtl5000: Reset the CHIP_CLK_CTRL reg on remove ASoC: soc-pcm: Don't zero TDM masks in __soc_pcm_open() x86/hyperv: Restore VP assist page after cpu offlining/onlining scsi: storvsc: Fix handling of srb_status and capacity change events ASoC: max98373: Add checks for devm_kcalloc regulator: core: fix kobject release warning and memory leak in regulator_register() spi: dw-dma: decrease reference count in dw_spi_dma_init_mfld() regulator: core: fix UAF in destroy_regulator() bus: sunxi-rsb: Remove the shutdown callback bus: sunxi-rsb: Support atomic transfers tee: optee: fix possible memory leak in optee_register_device() ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl selftests: mptcp: more stable simult_flows tests selftests: mptcp: fix mibit vs mbit mix up net: liquidio: simplify if expression rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc rxrpc: Use refcount_t rather than atomic_t rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975] net: dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus nfc/nci: fix race with opening and closing net: pch_gbe: fix potential memleak in pch_gbe_tx_queue() 9p/fd: fix issue of list_del corruption in p9_fd_cancel() netfilter: conntrack: Fix data-races around ct mark netfilter: nf_tables: do not set up extensions for end interval iavf: Fix a crash during reset task iavf: Do not restart Tx queues after reset task failure iavf: Fix race condition between iavf_shutdown and iavf_remove ARM: mxs: fix memory leak in mxs_machine_init() ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties net: ethernet: mtk_eth_soc: fix error handling in mtk_open() net/mlx4: Check retval of mlx4_bitmap_init net: mvpp2: fix possible invalid pointer dereference net/qla3xxx: fix potential memleak in ql3xxx_send() octeontx2-af: debugsfs: fix pci device refcount leak net: pch_gbe: fix pci device refcount leak while module exiting nfp: fill splittable of devlink_port_attrs correctly nfp: add port from netdev validation for EEPROM access macsec: Fix invalid error code set Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work() Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register() netfilter: ipset: regression in ip_set_hash_ip.c net/mlx5: Do not query pci info while pci disabled net/mlx5: Fix FW tracer timestamp calculation net/mlx5: Fix handling of entry refcount when command is not issued to FW tipc: set con sock in tipc_conn_alloc tipc: add an extra conn_get in tipc_conn_alloc tipc: check skb_linearize() return value in tipc_disc_rcv() xfrm: Fix oops in __xfrm_state_delete() xfrm: Fix ignored return value in xfrm6_init() net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg() sfc: fix potential memleak in __ef100_hard_start_xmit() net: sparx5: fix error handling in sparx5_port_open() net: sched: allow act_ct to be built without NF_NAT NFC: nci: fix memory leak in nci_rx_data_packet() regulator: twl6030: re-add TWL6032_SUBCLASS bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending() dma-buf: fix racing conflict of dma_heap_add() netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface netfilter: flowtable_offload: add missing locking fs: do not update freeing inode i_io_list dccp/tcp: Reset saddr on failure after inet6?_hash_connect(). ipv4: Fix error return code in fib_table_insert() arcnet: fix potential memory leak in com20020_probe() s390/dasd: fix no record found for raw_track_access nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION nfc: st-nci: fix memory leaks in EVT_TRANSACTION nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTION net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled net: enetc: cache accesses to &priv->si->hw net: enetc: preserve TX ring priority across reconfiguration octeontx2-pf: Add check for devm_kcalloc octeontx2-af: Fix reference count issue in rvu_sdp_init() net: thunderx: Fix the ACPI memory leak s390/crashdump: fix TOD programmable field size lib/vdso: use "grep -E" instead of "egrep" init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash nios2: add FORCE for vmlinuz.gz mmc: sdhci-brcmstb: Re-organize flags mmc: sdhci-brcmstb: Enable Clock Gating to save power mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI KVM: arm64: pkvm: Fixup boot mode to reflect that the kernel resumes from EL1 usb: dwc3: exynos: Fix remove() function usb: cdnsp: Fix issue with Clear Feature Halt Endpoint usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1 ext4: fix use-after-free in ext4_ext_shift_extents arm64: dts: rockchip: lower rk3399-puma-haikou SD controller clock frequency iio: light: apds9960: fix wrong register for gesture gain iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails bus: ixp4xx: Don't touch bit 7 on IXP42x usb: dwc3: gadget: conditionally remove requests usb: dwc3: gadget: Return -ESHUTDOWN on ep disable usb: dwc3: gadget: Clear ep descriptor last nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty gcov: clang: fix the buffer overflow issue mm: vmscan: fix extreme overreclaim and swap floods KVM: x86: nSVM: leave nested mode on vCPU free KVM: x86: forcibly leave nested mode on vCPU reset KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use KVM: x86: add kvm_leave_nested KVM: x86: remove exit_int_info warning in svm_handle_exit x86/tsx: Add a feature bit for TSX control MSR support x86/pm: Add enumeration check before spec MSRs save/restore setup x86/ioremap: Fix page aligned size calculation in __ioremap_caller() Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode ASoC: Intel: bytcht_es8316: Add quirk for the Nanote UMPC-01 tools: iio: iio_generic_buffer: Fix read size serial: 8250: 8250_omap: Avoid RS485 RTS glitch on ->set_termios() Input: goodix - try resetting the controller when no config is set Input: soc_button_array - add use_low_level_irq module parameter Input: soc_button_array - add Acer Switch V 10 to dmi_use_low_level_irq[] Input: i8042 - apply probe defer to more ASUS ZenBook models ASoC: stm32: dfsdm: manage cb buffers cleanup xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too xen/platform-pci: add missing free_irq() in error path platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr() platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017) drm/amdgpu: disable BACO support on more cards zonefs: fix zone report size in __zonefs_io_error() platform/x86: hp-wmi: Ignore Smart Experience App event platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops tcp: configurable source port perturb table size net: usb: qmi_wwan: add Telit 0x103a composition scsi: iscsi: Fix possible memory leak when device_register() failed gpu: host1x: Avoid trying to use GART on Tegra20 dm integrity: flush the journal on suspend dm integrity: clear the journal on suspend fuse: lock inode unconditionally in fuse_fallocate() wifi: wilc1000: validate pairwise and authentication suite offsets wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_OPER_CHANNEL attribute wifi: wilc1000: validate length of IEEE80211_P2P_ATTR_CHANNEL_LIST attribute wifi: wilc1000: validate number of channels genirq/msi: Shutdown managed interrupts with unsatifiable affinities genirq: Always limit the affinity to online CPUs irqchip/gic-v3: Always trust the managed affinity provided by the core code genirq: Take the proposed affinity at face value if force==true btrfs: free btrfs_path before copying root refs to userspace btrfs: free btrfs_path before copying fspath to userspace btrfs: free btrfs_path before copying subvol info to userspace btrfs: zoned: fix missing endianness conversion in sb_write_pointer btrfs: use kvcalloc in btrfs_get_dev_zone_info btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs() drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN drm/amd/display: No display after resume from WB/CB drm/amdgpu: Enable Aldebaran devices to report CU Occupancy drm/amdgpu: always register an MMU notifier for userptr drm/i915: fix TLB invalidation for Gen12 video and compute engines cifs: fix missed refcounting of ipc tcon Linux 5.15.81 Change-Id: I9fe677a5bc83d3484f86aebbe20dd129c7ca3a42 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
7d0c25b5fe |
genirq: Take the proposed affinity at face value if force==true
From: Marc Zyngier <maz@kernel.org> commit c48c8b829d2b966a6649827426bcdba082ccf922 upstream. Although setting the affinity of an interrupt to a set of CPUs that doesn't have any online CPU is generally frowned apon, there are a few limited cases where such affinity is set from a CPUHP notifier, setting the affinity to a CPU that isn't online yet. The saving grace is that this is always done using the 'force' attribute, which gives a hint that the affinity setting can be outside of the online CPU mask and the callsite set this flag with the knowledge that the underlying interrupt controller knows to handle it. This restores the expected behaviour on Marek's system. Fixes: 33de0aa4bae9 ("genirq: Always limit the affinity to online CPUs") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
52a93f2dcf |
genirq: Always limit the affinity to online CPUs
From: Marc Zyngier <maz@kernel.org> commit 33de0aa4bae982ed6f7c777f86b5af3e627ac937 upstream. [ Fixed small conflicts due to the HK_FLAG_MANAGED_IRQ flag been renamed on upstream ] When booting with maxcpus=<small number> (or even loading a driver while most CPUs are offline), it is pretty easy to observe managed affinities containing a mix of online and offline CPUs being passed to the irqchip driver. This means that the irqchip cannot trust the affinity passed down from the core code, which is a bit annoying and requires (at least in theory) all drivers to implement some sort of affinity narrowing. In order to address this, always limit the cpumask to the set of online CPUs. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220405185040.206297-3-maz@kernel.org Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
599cf4b845 |
genirq/msi: Shutdown managed interrupts with unsatifiable affinities
From: Marc Zyngier <maz@kernel.org> commit d802057c7c553ad426520a053da9f9fe08e2c35a upstream. [ This commit is almost a rewrite because it conflicts with Thomas Gleixner's refactoring of this code in v5.17-rc1. I wasn't sure if I should drop all the s-o-bs (including Mark's), but decided to keep as the original commit ] When booting with maxcpus=<small number>, interrupt controllers such as the GICv3 ITS may not be able to satisfy the affinity of some managed interrupts, as some of the HW resources are simply not available. The same thing happens when loading a driver using managed interrupts while CPUs are offline. In order to deal with this, do not try to activate such interrupt if there is no online CPU capable of handling it. Instead, place it in shutdown state. Once a capable CPU shows up, it will be activated. Reported-by: John Garry <john.garry@huawei.com> Reported-by: David Decotigny <ddecotig@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20220405185040.206297-2-maz@kernel.org Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
feb2eda5e1 |
gcov: clang: fix the buffer overflow issue
commit a6f810efabfd789d3bbafeacb4502958ec56c5ce upstream.
Currently, in clang version of gcov code when module is getting removed
gcov_info_add() incorrectly adds the sfn_ptr->counter to all the
dst->functions and it result in the kernel panic in below crash report.
Fix this by properly handling it.
[ 8.899094][ T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000
[ 8.899100][ T599] Mem abort info:
[ 8.899102][ T599] ESR = 0x9600004f
[ 8.899103][ T599] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8.899105][ T599] SET = 0, FnV = 0
[ 8.899107][ T599] EA = 0, S1PTW = 0
[ 8.899108][ T599] FSC = 0x0f: level 3 permission fault
[ 8.899110][ T599] Data abort info:
[ 8.899111][ T599] ISV = 0, ISS = 0x0000004f
[ 8.899113][ T599] CM = 0, WnR = 1
[ 8.899114][ T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000
[ 8.899116][ T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787
[ 8.899124][ T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP
[ 8.899265][ T599] Skip md ftrace buffer dump for: 0x1609e0
....
..,
[ 8.899544][ T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S OE 5.15.41-android13-8-g38e9b1af6bce #1
[ 8.899547][ T599] Hardware name: XXX (DT)
[ 8.899549][ T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 8.899551][ T599] pc : gcov_info_add+0x9c/0xb8
[ 8.899557][ T599] lr : gcov_event+0x28c/0x6b8
[ 8.899559][ T599] sp : ffffffc00e733b00
[ 8.899560][ T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470
[ 8.899563][ T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000
[ 8.899566][ T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000
[ 8.899569][ T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058
[ 8.899572][ T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785
[ 8.899575][ T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785
[ 8.899577][ T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80
[ 8.899580][ T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000
[ 8.899583][ T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f
[ 8.899586][ T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0
[ 8.899589][ T599] Call trace:
[ 8.899590][ T599] gcov_info_add+0x9c/0xb8
[ 8.899592][ T599] gcov_module_notifier+0xbc/0x120
[ 8.899595][ T599] blocking_notifier_call_chain+0xa0/0x11c
[ 8.899598][ T599] do_init_module+0x2a8/0x33c
[ 8.899600][ T599] load_module+0x23cc/0x261c
[ 8.899602][ T599] __arm64_sys_finit_module+0x158/0x194
[ 8.899604][ T599] invoke_syscall+0x94/0x2bc
[ 8.899607][ T599] el0_svc_common+0x1d8/0x34c
[ 8.899609][ T599] do_el0_svc+0x40/0x54
[ 8.899611][ T599] el0_svc+0x94/0x2f0
[ 8.899613][ T599] el0t_64_sync_handler+0x88/0xec
[ 8.899615][ T599] el0t_64_sync+0x1b4/0x1b8
[ 8.899618][ T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c)
[ 8.899620][ T599] ---[ end trace ed5218e9e5b6e2e6 ]---
Link: https://lkml.kernel.org/r/1668020497-13142-1-git-send-email-quic_mojha@quicinc.com
Fixes:
|
||
|
|
706d642b1d |
ANDROID: GKI: Handle no ABI symbol list for modules
ACKs with no ABI symbol lists like mainline,
don't let any unsigned modules load as every
access is being treated as violation as
NO_OF_UNPROTECTED_SYMBOLS will be 0 in this case.
Check NO_OF_UNPROTECTED_SYMBOLS and if it's 0,
allow every symbol access by unsigned modules;
so we can keep the feature enable and also not
break any devices. It should never be 0 with
kernel branches where KMI_SYMBOL_LISTS have been
enabled.
Bug: 257458145
Bug: 232430739
Test: TH
Fixes: e9669eeb2f45 ("ANDROID: GKI: Add module load time symbol protection")
Change-Id: Iab65e1425473e32baaad0d6c7f0d3eb007ae864f
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
(cherry picked from commit 8e00226a8fffa10b6383e448af785ce44451688e)
|
||
|
|
baeb9a0025 |
BACKPORT: mm: multi-gen LRU: kill switch
Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
can be disabled include:
0x0001: the multi-gen LRU core
0x0002: walking page table, when arch_has_hw_pte_young() returns
true
0x0004: clearing the accessed bit in non-leaf PMD entries, when
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
[yYnN]: apply to all the components above
E.g.,
echo y >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0007
echo 5 >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0005
NB: the page table walks happen on the scale of seconds under heavy memory
pressure, in which case the mmap_lock contention is a lesser concern,
compared with the LRU lock contention and the I/O congestion. So far the
only well-known case of the mmap_lock contention happens on Android, due
to Scudo [1] which allocates several thousand VMAs for merely a few
hundred MBs. The SPF and the Maple Tree also have provided their own
assessments [2][3]. However, if walking page tables does worsen the
mmap_lock contention, the kill switch can be used to disable it. In this
case the multi-gen LRU will suffer a minor performance degradation, as
shown previously.
Clearing the accessed bit in non-leaf PMD entries can also be disabled,
since this behavior was not tested on x86 varieties other than Intel and
AMD.
[1] https://source.android.com/devices/tech/debug/scudo
[2] https://lore.kernel.org/r/20220128131006.67712-1-michel@lespinasse.org/
[3] https://lore.kernel.org/r/20220426150616.3937571-1-Liam.Howlett@oracle.com/
Link: https://lkml.kernel.org/r/20220918080010.2920238-11-yuzhao@google.com
Change-Id: If3116e6698cc6967b6992c2017962fac6c2d3a11
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 354ed597442952fb680c9cafc7e4eb8a76f9514c)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
|
||
|
|
7f53b0e704 |
BACKPORT: mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page tables
to search for young PTEs and promote hot pages. A kill switch will be
added in the next patch to disable this behavior. When disabled, the
aging relies on the rmap only.
NB: this behavior has nothing similar with the page table scanning in the
2.4 kernel [1], which searches page tables for old PTEs, adds cold pages
to swapcache and unmaps them.
To avoid confusion, the term "iteration" specifically means the traversal
of an entire mm_struct list; the term "walk" will be applied to page
tables and the rmap, as usual.
An mm_struct list is maintained for each memcg, and an mm_struct follows
its owner task to the new memcg when this task is migrated. Given an
lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot pages
before it increments max_seq.
When multiple page table walkers iterate the same list, each of them gets
a unique mm_struct; therefore they can run concurrently. Page table
walkers ignore any misplaced pages, e.g., if an mm_struct was migrated,
pages it left in the previous memcg will not be promoted when its current
memcg is under reclaim. Similarly, page table walkers will not promote
pages from nodes other than the one under reclaim.
This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
page table walkers can skip processes that have been sleeping since
the last iteration.
2. It uses generational Bloom filters to record populated branches so
that page table walkers can reduce their search space based on the
query results, e.g., to skip page tables containing mostly holes or
misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
spanning multiple VMAs. IOW, it finishes all the VMAs within the
range of the same PMD table before it returns to a PGD table. This
improves the cache performance for workloads that have large
numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.
Server benchmark results:
Single workload:
fio (buffered I/O): no change
Single workload:
memcached (anon): +[8, 10]%
Ops/sec KB/sec
patch1-7: 1147696.57 44640.29
patch1-8: 1245274.91 48435.66
Configurations:
no change
Client benchmark results:
kswapd profiles:
patch1-7
48.16% lzo1x_1_do_compress (real work)
8.20% page_vma_mapped_walk (overhead)
7.06% _raw_spin_unlock_irq
2.92% ptep_clear_flush
2.53% __zram_bvec_write
2.11% do_raw_spin_lock
2.02% memmove
1.93% lru_gen_look_around
1.56% free_unref_page_list
1.40% memset
patch1-8
49.44% lzo1x_1_do_compress (real work)
6.19% page_vma_mapped_walk (overhead)
5.97% _raw_spin_unlock_irq
3.13% get_pfn_page
2.85% ptep_clear_flush
2.42% __zram_bvec_write
2.08% do_raw_spin_lock
1.92% memmove
1.44% alloc_zspage
1.36% memset
Configurations:
no change
Thanks to the following developers for their efforts [3].
kernel test robot <lkp@intel.com>
[1] https://lwn.net/Articles/23732/
[2] https://llvm.org/docs/ScudoHardenedAllocator.html
[3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/
Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
Change-Id: I7ed3daf288e664e15bfd34991a77467a19a4e39a
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit bd74fdaea146029e4fa12c6de89adbe0779348a9)
[ Resolve conflicts in include/linux/memcontrol.h,
include/linux/mm_types.h ]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
|
||
|
|
37397878ee |
BACKPORT: mm: multi-gen LRU: minimal implementation
To avoid confusion, the terms "promotion" and "demotion" will be applied
to the multi-gen LRU, as a new convention; the terms "activation" and
"deactivation" will be applied to the active/inactive LRU, as usual.
The aging produces young generations. Given an lruvec, it increments
max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging promotes
hot pages to the youngest generation when it finds them accessed through
page tables; the demotion of cold pages happens consequently when it
increments max_seq. Promotion in the aging path does not involve any LRU
list operations, only the updates of the gen counter and
lrugen->nr_pages[]; demotion, unless as the result of the increment of
max_seq, requires LRU list operations, e.g., lru_deactivate_fn(). The
aging has the complexity O(nr_hot_pages), since it is only interested in
hot pages.
The eviction consumes old generations. Given an lruvec, it increments
min_seq when lrugen->lists[] indexed by min_seq%MAX_NR_GENS becomes empty.
A feedback loop modeled after the PID controller monitors refaults over
anon and file types and decides which type to evict when both types are
available from the same generation.
The protection of pages accessed multiple times through file descriptors
takes place in the eviction path. Each generation is divided into
multiple tiers. A page accessed N times through file descriptors is in
tier order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
bits in page->flags. The aforementioned feedback loop also monitors
refaults over all tiers and decides when to protect pages in which tiers
(N>1), using the first tier (N=0,1) as a baseline. The first tier
contains single-use unmapped clean pages, which are most likely the best
choices. In contrast to promotion in the aging path, the protection of a
page in the eviction path is achieved by moving this page to the next
generation, i.e., min_seq+1, if the feedback loop decides so. This
approach has the following advantages:
1. It removes the cost of activation in the buffered access path by
inferring whether pages accessed multiple times through file
descriptors are statistically hot and thus worth protecting in the
eviction path.
2. It takes pages accessed through page tables into account and avoids
overprotecting pages accessed multiple times through file
descriptors. (Pages accessed through page tables are in the first
tier, since N=0.)
3. More tiers provide better protection for pages accessed more than
twice through file descriptors, when under heavy buffered I/O
workloads.
Server benchmark results:
Single workload:
fio (buffered I/O): +[30, 32]%
IOPS BW
5.19-rc1: 2673k 10.2GiB/s
patch1-6: 3491k 13.3GiB/s
Single workload:
memcached (anon): -[4, 6]%
Ops/sec KB/sec
5.19-rc1: 1161501.04 45177.25
patch1-6: 1106168.46 43025.04
Configurations:
CPU: two Xeon 6154
Mem: total 256G
Node 1 was only used as a ram disk to reduce the variance in the
results.
patch drivers/block/brd.c <<EOF
99,100c99,100
< gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
< page = alloc_page(gfp_flags);
---
> gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE;
> page = alloc_pages_node(1, gfp_flags, 0);
EOF
cat >>/etc/systemd/system.conf <<EOF
CPUAffinity=numa
NUMAPolicy=bind
NUMAMask=0
EOF
cat >>/etc/memcached.conf <<EOF
-m 184320
-s /var/run/memcached/memcached.sock
-a 0766
-t 36
-B binary
EOF
cat fio.sh
modprobe brd rd_nr=1 rd_size=113246208
swapoff -a
mkfs.ext4 /dev/ram0
mount -t ext4 /dev/ram0 /mnt
mkdir /sys/fs/cgroup/user.slice/test
echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max
echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs
fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \
--buffered=1 --ioengine=io_uring --iodepth=128 \
--iodepth_batch_submit=32 --iodepth_batch_complete=32 \
--rw=randread --random_distribution=random --norandommap \
--time_based --ramp_time=10m --runtime=5m --group_reporting
cat memcached.sh
modprobe brd rd_nr=1 rd_size=113246208
swapoff -a
mkswap /dev/ram0
swapon /dev/ram0
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \
--ratio 1:0 --pipeline 8 -d 2000
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \
--ratio 0:1 --pipeline 8 --randomize --distinct-client-seed
Client benchmark results:
kswapd profiles:
5.19-rc1
40.33% page_vma_mapped_walk (overhead)
21.80% lzo1x_1_do_compress (real work)
7.53% do_raw_spin_lock
3.95% _raw_spin_unlock_irq
2.52% vma_interval_tree_iter_next
2.37% page_referenced_one
2.28% vma_interval_tree_subtree_search
1.97% anon_vma_interval_tree_iter_first
1.60% ptep_clear_flush
1.06% __zram_bvec_write
patch1-6
39.03% lzo1x_1_do_compress (real work)
18.47% page_vma_mapped_walk (overhead)
6.74% _raw_spin_unlock_irq
3.97% do_raw_spin_lock
2.49% ptep_clear_flush
2.48% anon_vma_interval_tree_iter_first
1.92% page_referenced_one
1.88% __zram_bvec_write
1.48% memmove
1.31% vma_interval_tree_iter_next
Configurations:
CPU: single Snapdragon 7c
Mem: total 4G
ChromeOS MemoryPressure [1]
[1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/
Link: https://lkml.kernel.org/r/20220918080010.2920238-7-yuzhao@google.com
Change-Id: I30b26b3086ce1879b83b96eb265f8f0dcb16a1fb
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ac35a490237446b71e3b4b782b1596967edd0aa8)
[Resolve confilcts in mm/Kconfig, mm/swap.c, mm/vmscan.c]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
|
||
|
|
d5b2fa1c7b |
BACKPORT: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in page->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These techniques are commonly used to optimize job scheduling (bin packing) in data centers [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com Change-Id: I7b24d1e9d263e4eb2c2ee23f2eb143824fcb5201 Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4) [ Resolve conflicts in mm/memory.c, mm/memcontrol.c, mm/Kconfig, include/linux/mm_inline.h] Bug: 249601646 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> |
||
|
|
2635d7d108 |
Revert "FROMLIST: mm: multi-gen LRU: groundwork"
This reverts commit
|
||
|
|
e931b1d222 |
Revert "FROMLIST: mm: multi-gen LRU: minimal implementation"
This reverts commit
|
||
|
|
02dc0d1dda |
Revert "FROMLIST: mm: multi-gen LRU: support page table walks"
This reverts commit
|
||
|
|
8994fcd031 |
Revert "FROMLIST: mm: multi-gen LRU: kill switch"
This reverts commit
|
||
|
|
f5ea8b2710 |
Merge 5.15.80 into android14-5.15
Changes in 5.15.80 mm: hwpoison: refactor refcount check handling mm: hwpoison: handle non-anonymous THP correctly mm: shmem: don't truncate page if memory failure happens ASoC: wm5102: Revert "ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe" ASoC: wm5110: Revert "ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe" ASoC: wm8997: Revert "ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe" ASoC: mt6660: Keep the pm_runtime enables before component stuff in mt6660_i2c_probe ASoC: rt1019: Fix the TDM settings ASoC: wm8962: Add an event handler for TEMP_HP and TEMP_SPK spi: intel: Fix the offset to get the 64K erase opcode ASoC: codecs: jz4725b: add missed Line In power control bit ASoC: codecs: jz4725b: fix reported volume for Master ctl ASoC: codecs: jz4725b: use right control for Capture Volume ASoC: codecs: jz4725b: fix capture selector naming ASoC: Intel: sof_sdw: add quirk variant for LAPBC710 NUC15 selftests/futex: fix build for clang selftests/intel_pstate: fix build for ARCH=x86_64 ASoC: rt1308-sdw: add the default value of some registers drm/amd/display: Remove wrong pipe control lock ACPI: scan: Add LATT2021 to acpi_ignore_dep_ids[] RDMA/efa: Add EFA 0xefa2 PCI ID btrfs: raid56: properly handle the error when unable to find the missing stripe NFSv4: Retry LOCK on OLD_STATEID during delegation return ACPI: x86: Add another system to quirk list for forcing StorageD3Enable firmware: arm_scmi: Cleanup the core driver removal callback i2c: tegra: Allocate DMA memory for DMA engine i2c: i801: add lis3lv02d's I2C address for Vostro 5568 drm/imx: imx-tve: Fix return type of imx_tve_connector_mode_valid btrfs: remove pointless and double ulist frees in error paths of qgroup tests Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm x86/cpu: Add several Intel server CPU model numbers ASoC: codecs: jz4725b: Fix spelling mistake "Sourc" -> "Source", "Routee" -> "Route" mtd: spi-nor: intel-spi: Disable write protection only if asked spi: intel: Use correct mask for flash and protected regions KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet hugetlbfs: don't delete error page from pagecache arm64: dts: qcom: sa8155p-adp: Specify which LDO modes are allowed arm64: dts: qcom: sm8150-xperia-kumano: Specify which LDO modes are allowed arm64: dts: qcom: sm8250-xperia-edo: Specify which LDO modes are allowed arm64: dts: qcom: sm8350-hdk: Specify which LDO modes are allowed spi: stm32: Print summary 'callbacks suppressed' message ARM: dts: at91: sama7g5: fix signal name of pin PB2 ASoC: core: Fix use-after-free in snd_soc_exit() ASoC: tas2770: Fix set_tdm_slot in case of single slot ASoC: tas2764: Fix set_tdm_slot in case of single slot ARM: at91: pm: avoid soft resetting AC DLL serial: 8250: omap: Fix missing PM runtime calls for omap8250_set_mctrl() serial: 8250_omap: remove wait loop from Errata i202 workaround serial: 8250: omap: Fix unpaired pm_runtime_put_sync() in omap8250_remove() serial: 8250: omap: Flush PM QOS work on remove serial: imx: Add missing .thaw_noirq hook tty: n_gsm: fix sleep-in-atomic-context bug in gsm_control_send bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() ASoC: soc-utils: Remove __exit for snd_soc_util_exit() pinctrl: rockchip: list all pins in a possible mux route for PX30 scsi: scsi_transport_sas: Fix error handling in sas_phy_add() block: sed-opal: kmalloc the cmd/resp buffers bpf: Fix memory leaks in __check_func_call arm64: Fix bit-shifting UB in the MIDR_CPU_MODEL() macro siox: fix possible memory leak in siox_device_add() parport_pc: Avoid FIFO port location truncation pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map drm/vc4: kms: Fix IS_ERR() vs NULL check for vc4_kms drm/panel: simple: set bpc field for logic technologies displays drm/drv: Fix potential memory leak in drm_dev_init() drm: Fix potential null-ptr-deref in drm_vblank_destroy_worker() ARM: dts: imx7: Fix NAND controller size-cells arm64: dts: imx8mm: Fix NAND controller size-cells arm64: dts: imx8mn: Fix NAND controller size-cells ata: libata-transport: fix double ata_host_put() in ata_tport_add() ata: libata-transport: fix error handling in ata_tport_add() ata: libata-transport: fix error handling in ata_tlink_add() ata: libata-transport: fix error handling in ata_tdev_add() nfp: change eeprom length to max length enumerators MIPS: fix duplicate definitions for exported symbols MIPS: Loongson64: Add WARN_ON on kexec related kmalloc failed bpf: Initialize same number of free nodes for each pcpu_freelist net: bgmac: Drop free_netdev() from bgmac_enet_remove() mISDN: fix possible memory leak in mISDN_dsp_element_register() net: hinic: Fix error handling in hinic_module_init() net: stmmac: ensure tx function is not running in stmmac_xdp_release() soc: imx8m: Enable OCOTP clock before reading the register net: liquidio: release resources when liquidio driver open failed mISDN: fix misuse of put_device() in mISDN_register_device() net: macvlan: Use built-in RCU list checking net: caif: fix double disconnect client in chnl_net_open() bnxt_en: Remove debugfs when pci_register_driver failed net: mhi: Fix memory leak in mhi_net_dellink() net: dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims xen/pcpu: fix possible memory leak in register_pcpu() net: ionic: Fix error handling in ionic_init_module() net: ena: Fix error handling in ena_init() net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process bridge: switchdev: Fix memory leaks when changing VLAN protocol drbd: use after free in drbd_create_device() platform/x86/intel: pmc: Don't unconditionally attach Intel PMC when virtualized platform/surface: aggregator: Do not check for repeated unsequenced packets cifs: add check for returning value of SMB2_close_init net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open() net/x25: Fix skb leak in x25_lapb_receive_frame() cifs: Fix wrong return value checking when GETFLAGS net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start() net: thunderbolt: Fix error handling in tbnet_init() cifs: add check for returning value of SMB2_set_info_init ftrace: Fix the possible incorrect kernel message ftrace: Optimize the allocation for mcount entries ftrace: Fix null pointer dereference in ftrace_add_mod() ring_buffer: Do not deactivate non-existant pages tracing: Fix memory leak in tracing_read_pipe() tracing/ring-buffer: Have polling block on watermark tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event() tracing: Fix wild-memory-access in register_synth_event() tracing: Fix race where eprobes can be called before the event tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit() tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit() drm/amd/display: Add HUBP surface flip interrupt handler ALSA: usb-audio: Drop snd_BUG_ON() from snd_usbmidi_output_open() ALSA: hda/realtek: fix speakers for Samsung Galaxy Book Pro ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360 Revert "usb: dwc3: disable USB core PHY management" slimbus: qcom-ngd: Fix build error when CONFIG_SLIM_QCOM_NGD_CTRL=y && CONFIG_QCOM_RPROC_COMMON=m slimbus: stream: correct presence rate frequencies speakup: fix a segfault caused by switching consoles USB: bcma: Make GPIO explicitly optional USB: serial: option: add Sierra Wireless EM9191 USB: serial: option: remove old LARA-R6 PID USB: serial: option: add u-blox LARA-R6 00B modem USB: serial: option: add u-blox LARA-L6 modem USB: serial: option: add Fibocom FM160 0x0111 composition usb: add NO_LPM quirk for Realforce 87U Keyboard usb: chipidea: fix deadlock in ci_otg_del_timer usb: cdns3: host: fix endless superspeed hub port reset usb: typec: mux: Enter safe mode only when pins need to be reconfigured iio: adc: at91_adc: fix possible memory leak in at91_adc_allocate_trigger() iio: trigger: sysfs: fix possible memory leak in iio_sysfs_trig_init() iio: adc: mp2629: fix wrong comparison of channel iio: adc: mp2629: fix potential array out of bound access iio: pressure: ms5611: changed hardcoded SPI speed to value limited dm ioctl: fix misbehavior if list_versions races with module loading serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs serial: 8250: Flush DMA Rx on RLSI serial: 8250_lpss: Configure DMA also w/o DMA filter Input: iforce - invert valid length check when fetching device IDs maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() net: phy: marvell: add sleep time after enabling the loopback bit scsi: zfcp: Fix double free of FSF request when qdio send fails iommu/vt-d: Preset Access bit for IOVA in FL non-leaf paging entries iommu/vt-d: Set SRE bit only when hardware has SRS cap firmware: coreboot: Register bus in module init mmc: core: properly select voltage range without power cycle mmc: sdhci-pci-o2micro: fix card detect fail issue caused by CD# debounce timeout mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put() docs: update mediator contact information in CoC doc misc/vmw_vmci: fix an infoleak in vmci_host_do_receive_datagram() perf/x86/intel/pt: Fix sampling using single range output nvme: restrict management ioctls to admin nvme: ensure subsystem reset is single threaded serial: 8250_lpss: Use 16B DMA burst with Elkhart Lake perf: Improve missing SIGTRAP checking ring-buffer: Include dropped pages in counting dirty patches tracing: Fix warning on variable 'struct trace_array' net: use struct_group to copy ip/ipv6 header addresses scsi: target: tcm_loop: Fix possible name leak in tcm_loop_setup_hba_bus() scsi: scsi_debug: Fix possible UAF in sdebug_add_host_helper() kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case Input: i8042 - fix leaking of platform device on module removal macvlan: enforce a consistent minimal mtu tcp: cdg: allow tcp_cdg_release() to be called multiple times kcm: avoid potential race in kcm_tx_work kcm: close race conditions on sk_receive_queue 9p: trans_fd/p9_conn_cancel: drop client lock earlier gfs2: Check sb_bsize_shift after reading superblock gfs2: Switch from strlcpy to strscpy 9p/trans_fd: always use O_NONBLOCK read/write wifi: wext: use flex array destination for memcpy() mm: fs: initialize fsdata passed to write_begin/write_end interface net/9p: use a dedicated spinlock for trans_fd ntfs: fix use-after-free in ntfs_attr_find() ntfs: fix out-of-bounds read in ntfs_attr_find() ntfs: check overflow when iterating ATTR_RECORDs Linux 5.15.80 Change-Id: Idc9aa4c30c528dd194bc813201cbb2c5df8c1d62 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
c49cc2c059 |
kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
[ Upstream commit 5dd7caf0bdc5d0bae7cf9776b4d739fb09bd5ebb ]
In __unregister_kprobe_top(), if the currently unregistered probe has
post_handler but other child probes of the aggrprobe do not have
post_handler, the post_handler of the aggrprobe is cleared. If this is
a ftrace-based probe, there is a problem. In later calls to
disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is
NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in
__disarm_kprobe_ftrace() and may even cause use-after-free:
Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2)
WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0
Modules linked in: testKprobe_007(-)
CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18
[...]
Call Trace:
<TASK>
__disable_kprobe+0xcd/0xe0
__unregister_kprobe_top+0x12/0x150
? mutex_lock+0xe/0x30
unregister_kprobes.part.23+0x31/0xa0
unregister_kprobe+0x32/0x40
__x64_sys_delete_module+0x15e/0x260
? do_user_addr_fault+0x2cd/0x6b0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
For the kprobe-on-ftrace case, we keep the post_handler setting to
identify this aggrprobe armed with kprobe_ipmodify_ops. This way we
can disarm it correctly.
Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/
Fixes:
|
||
|
|
73cf0ff9a3 |
ring-buffer: Include dropped pages in counting dirty patches
[ Upstream commit 31029a8b2c7e656a0289194ef16415050ae4c4ac ]
The function ring_buffer_nr_dirty_pages() was created to find out how many
pages are filled in the ring buffer. There's two running counters. One is
incremented whenever a new page is touched (pages_touched) and the other
is whenever a page is read (pages_read). The dirty count is the number
touched minus the number read. This is used to determine if a blocked task
should be woken up if the percentage of the ring buffer it is waiting for
is hit.
The problem is that it does not take into account dropped pages (when the
new writes overwrite pages that were not read). And then the dirty pages
will always be greater than the percentage.
This makes the "buffer_percent" file inaccurate, as the number of dirty
pages end up always being larger than the percentage, event when it's not
and this causes user space to be woken up more than it wants to be.
Add a new counter to keep track of lost pages, and include that in the
accounting of dirty pages so that it is actually accurate.
Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home
Fixes:
|
||
|
|
35c60b4e8c |
perf: Improve missing SIGTRAP checking
[ Upstream commit bb88f9695460bec25aa30ba9072595025cf6c8af ] To catch missing SIGTRAP we employ a WARN in __perf_event_overflow(), which fires if pending_sigtrap was already set: returning to user space without consuming pending_sigtrap, and then having the event fire again would re-enter the kernel and trigger the WARN. This, however, seemed to miss the case where some events not associated with progress in the user space task can fire and the interrupt handler runs before the IRQ work meant to consume pending_sigtrap (and generate the SIGTRAP). syzbot gifted us this stack trace: | WARNING: CPU: 0 PID: 3607 at kernel/events/core.c:9313 __perf_event_overflow | Modules linked in: | CPU: 0 PID: 3607 Comm: syz-executor100 Not tainted 6.1.0-rc2-syzkaller-00073-g88619e77b33d #0 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 | RIP: 0010:__perf_event_overflow+0x498/0x540 kernel/events/core.c:9313 | <...> | Call Trace: | <TASK> | perf_swevent_hrtimer+0x34f/0x3c0 kernel/events/core.c:10729 | __run_hrtimer kernel/time/hrtimer.c:1685 [inline] | __hrtimer_run_queues+0x1c6/0xfb0 kernel/time/hrtimer.c:1749 | hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 | local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1096 [inline] | __sysvec_apic_timer_interrupt+0x17c/0x640 arch/x86/kernel/apic/apic.c:1113 | sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1107 | asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649 | <...> | </TASK> In this case, syzbot produced a program with event type PERF_TYPE_SOFTWARE and config PERF_COUNT_SW_CPU_CLOCK. The hrtimer manages to fire again before the IRQ work got a chance to run, all while never having returned to user space. Improve the WARN to check for real progress in user space: approximate this by storing a 32-bit hash of the current IP into pending_sigtrap, and if an event fires while pending_sigtrap still matches the previous IP, we assume no progress (false negatives are possible given we could return to user space and trigger again on the same IP). Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs") Reported-by: syzbot+b8ded3e2e2c6adde4990@syzkaller.appspotmail.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221031093513.3032814-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
e57daa7503 |
tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
commit 22ea4ca9631eb137e64e5ab899e9c89cb6670959 upstream.
When test_gen_kprobe_cmd() failed after kprobe_event_gen_cmd_end(), it
will goto delete, which will call kprobe_event_delete() and release the
corresponding resource. However, the trace_array in gen_kretprobe_test
will point to the invalid resource. Set gen_kretprobe_test to NULL
after called kprobe_event_delete() to prevent null-ptr-deref.
BUG: kernel NULL pointer dereference, address: 0000000000000070
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 246 Comm: modprobe Tainted: G W
6.1.0-rc1-00174-g9522dc5c87da-dirty #248
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0
Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c
01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65
70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f
RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000
RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064
R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000
FS: 00007f89eeff6540(0000) GS:ffff88813b600000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__ftrace_set_clr_event+0x3e/0x60
trace_array_set_clr_event+0x35/0x50
? 0xffffffffa0000000
kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test]
__x64_sys_delete_module+0x206/0x380
? lockdep_hardirqs_on_prepare+0xd8/0x190
? syscall_enter_from_user_mode+0x1c/0x50
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89eeb061b7
Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/
Fixes:
|
||
|
|
3a41c0f2a5 |
tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
commit e0d75267f59d7084e0468bd68beeb1bf9c71d7c0 upstream.
When trace_get_event_file() failed, gen_kretprobe_test will be assigned
as the error code. If module kprobe_event_gen_test is removed now, the
null pointer dereference will happen in kprobe_event_gen_test_exit().
Check if gen_kprobe_test or gen_kretprobe_test is error code or NULL
before dereference them.
BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 2210 Comm: modprobe Not tainted
6.1.0-rc1-00171-g2159299a3b74-dirty #217
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test]
Code: Unable to access opcode bytes at 0xffffffff9ffffff2.
RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246
RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000
RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c
RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800
R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f56b75be540(0000) GS:ffff88813bc00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__x64_sys_delete_module+0x206/0x380
? lockdep_hardirqs_on_prepare+0xd8/0x190
? syscall_enter_from_user_mode+0x1c/0x50
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/
Fixes:
|
||
|
|
7291dec4f2 |
tracing: Fix race where eprobes can be called before the event
commit 94eedf3dded5fb472ce97bfaf3ac1c6c29c35d26 upstream.
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
6517b97134 |
tracing: Fix wild-memory-access in register_synth_event()
commit 1b5f1c34d3f5a664a57a5a7557a50e4e3cc2505c upstream.
In register_synth_event(), if set_synth_event_print_fmt() failed, then
both trace_remove_event_call() and unregister_trace_event() will be
called, which means the trace_event_call will call
__unregister_trace_event() twice. As the result, the second unregister
will causes the wild-memory-access.
register_synth_event
set_synth_event_print_fmt failed
trace_remove_event_call
event_remove
if call->event.funcs then
__unregister_trace_event (first call)
unregister_trace_event
__unregister_trace_event (second call)
Fix the bug by avoiding to call the second __unregister_trace_event() by
checking if the first one is called.
general protection fault, probably for non-canonical address
0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range
[0xdead000000000120-0xdead000000000127]
CPU: 0 PID: 3807 Comm: modprobe Not tainted
6.1.0-rc1-00186-g76f33a7eedb4 #299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_trace_event+0x6e/0x280
Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48
b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02
00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b
RSP: 0018:ffff88810413f370 EFLAGS: 00010a06
RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000
RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20
RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481
R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122
R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028
FS: 00007f7823e8d540(0000) GS:ffff888119e00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__create_synth_event+0x1e37/0x1eb0
create_or_delete_synth_event+0x110/0x250
synth_event_run_command+0x2f/0x110
test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test]
synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test]
do_one_initcall+0xdb/0x480
do_init_module+0x1cf/0x680
load_module+0x6a50/0x70a0
__do_sys_finit_module+0x12f/0x1c0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com
Fixes:
|
||
|
|
07ba4f0603 |
tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
commit a4527fef9afe5c903c718d0cd24609fe9c754250 upstream.
test_gen_synth_cmd() only free buf in fail path, hence buf will leak
when there is no failure. Add kfree(buf) to prevent the memleak. The
same reason and solution in test_empty_synth_event().
unreferenced object 0xffff8881127de000 (size 2048):
comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s)
hex dump (first 32 bytes):
20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20 gen_synth_test
20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f pid_t next_pid_
backtrace:
[<000000004254801a>] kmalloc_trace+0x26/0x100
[<0000000039eb1cf5>] 0xffffffffa00083cd
[<000000000e8c3bc8>] 0xffffffffa00086ba
[<00000000c293d1ea>] do_one_initcall+0xdb/0x480
[<00000000aa189e6d>] do_init_module+0x1cf/0x680
[<00000000d513222b>] load_module+0x6a50/0x70a0
[<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
[<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
[<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
unreferenced object 0xffff8881127df000 (size 2048):
comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s)
hex dump (first 32 bytes):
20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73 empty_synth_tes
74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 t pid_t next_pi
backtrace:
[<000000004254801a>] kmalloc_trace+0x26/0x100
[<00000000d4db9a3d>] 0xffffffffa0008071
[<00000000c31354a5>] 0xffffffffa00086ce
[<00000000c293d1ea>] do_one_initcall+0xdb/0x480
[<00000000aa189e6d>] do_init_module+0x1cf/0x680
[<00000000d513222b>] load_module+0x6a50/0x70a0
[<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
[<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
[<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
8b318f3032 |
tracing/ring-buffer: Have polling block on watermark
commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b upstream.
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splice does on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Primiano Tucci <primiano@google.com>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
2c21ee020c |
tracing: Fix memory leak in tracing_read_pipe()
commit 649e72070cbbb8600eb823833e4748f5a0815116 upstream.
kmemleak reports this issue:
unreferenced object 0xffff888105a18900 (size 128):
comm "test_progs", pid 18933, jiffies 4336275356 (age 22801.766s)
hex dump (first 32 bytes):
25 73 00 90 81 88 ff ff 26 05 00 00 42 01 58 04 %s......&...B.X.
03 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000560143a1>] __kmalloc_node_track_caller+0x4a/0x140
[<000000006af00822>] krealloc+0x8d/0xf0
[<00000000c309be6a>] trace_iter_expand_format+0x99/0x150
[<000000005a53bdb6>] trace_check_vprintf+0x1e0/0x11d0
[<0000000065629d9d>] trace_event_printf+0xb6/0xf0
[<000000009a690dc7>] trace_raw_output_bpf_trace_printk+0x89/0xc0
[<00000000d22db172>] print_trace_line+0x73c/0x1480
[<00000000cdba76ba>] tracing_read_pipe+0x45c/0x9f0
[<0000000015b58459>] vfs_read+0x17b/0x7c0
[<000000004aeee8ed>] ksys_read+0xed/0x1c0
[<0000000063d3d898>] do_syscall_64+0x3b/0x90
[<00000000a06dda7f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
iter->fmt alloced in
tracing_read_pipe() -> .. ->trace_iter_expand_format(), but not
freed, to fix, add free in tracing_release_pipe()
Link: https://lkml.kernel.org/r/1667819090-4643-1-git-send-email-wangyufen@huawei.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
00f74b1a98 |
ring_buffer: Do not deactivate non-existant pages
commit 56f4ca0a79a9f1af98f26c54b9b89ba1f9bcc6bd upstream.
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of
->pages, so verify that the list is actually present before calling it.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
1bea037a1a |
ftrace: Fix null pointer dereference in ftrace_add_mod()
commit 19ba6c8af9382c4c05dc6a0a79af3013b9a35cd0 upstream.
The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next}
of @ftrace_mode->list are NULL, it's not a valid state to call list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free
tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del()
will write prev->next and next->prev, where null pointer dereference
happens.
BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
<TASK>
ftrace_mod_callback+0x20d/0x220
? do_filp_open+0xd9/0x140
ftrace_process_regex.isra.51+0xbf/0x130
ftrace_regex_write.isra.52.part.53+0x6e/0x90
vfs_write+0xee/0x3a0
? __audit_filter_op+0xb1/0x100
? auditd_test_task+0x38/0x50
ksys_write+0xa5/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception
So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
fadfcf39fb |
ftrace: Optimize the allocation for mcount entries
commit bcea02b096333dc74af987cb9685a4dbdd820840 upstream.
If we can't allocate this size, try something smaller with half of the
size. Its order should be decreased by one instead of divided by two.
Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
5c5f264289 |
ftrace: Fix the possible incorrect kernel message
commit 08948caebe93482db1adfd2154eba124f66d161d upstream.
If the number of mcount entries is an integer multiple of
ENTRIES_PER_PAGE, the page count showing on the console would be wrong.
Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
7ff4fa179e |
bpf: Initialize same number of free nodes for each pcpu_freelist
[ Upstream commit 4b45cd81f737d79d0fbfc0d320a1e518e7f0bbf0 ]
pcpu_freelist_populate() initializes nr_elems / num_possible_cpus() + 1
free nodes for some CPUs, and then possibly one CPU with fewer nodes,
followed by remaining cpus with 0 nodes. For example, when nr_elems == 256
and num_possible_cpus() == 32, CPU 0~27 each gets 9 free nodes, CPU 28 gets
4 free nodes, CPU 29~31 get 0 free nodes, while in fact each CPU should get
8 nodes equally.
This patch initializes nr_elems / num_possible_cpus() free nodes for each
CPU firstly, then allocates the remaining free nodes by one for each CPU
until no free nodes left.
Fixes:
|