kernel_arpi

Author	SHA1	Message	Date
Nicholas Piggin	5dce84f475	powerpc/tm: Fix more userspace r13 corruption [ Upstream commit 9d71165d3934e607070c4e48458c0cf161b1baea ] Commit `cf13435b73` ("powerpc/tm: Fix userspace r13 corruption") fixes a problem in treclaim where a SLB miss can occur on the thread_struct->ckpt_regs while SCRATCH0 is live with the saved user r13 value, clobbering it with the kernel r13 and ultimately resulting in kernel r13 being stored in ckpt_regs. There is an equivalent problem in trechkpt where the user r13 value is loaded into r13 from chkpt_regs to be recheckpointed, but a SLB miss could occur on ckpt_regs accesses after that, which will result in r13 being clobbered with a kernel value and that will get recheckpointed and then restored to user registers. The same memory page is accessed right before this critical window where a SLB miss could cause corruption, so hitting the bug requires the SLB entry be removed within a small window of instructions, which is possible if a SLB related MCE hits there. PAPR also permits the hypervisor to discard this SLB entry (because slb_shadow->persistent is only set to SLB_NUM_BOLTED) although it's not known whether any implementations would do this (KVM does not). So this is an extremely unlikely bug, only found by inspection. Fix this by also storing user r13 in a temporary location on the kernel stack and don't change the r13 register from kernel r13 until the RI=0 critical section that does not fault. The SCRATCH0 change is not strictly part of the fix, it's only used in the RI=0 section so it does not have the same problem as the previous SCRATCH0 bug. Fixes: `98ae22e15b` ("powerpc: Add helper functions for transactional memory context switching") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220311024733.48926-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-12 16:35:02 +02:00
Nicholas Piggin	ed8a5d63a0	powerpc: flexible GPR range save/restore macros [ Upstream commit aebd1fb45c622e9a2b06fb70665d084d3a8d6c78 ] Introduce macros that operate on a (start, end) range of GPRs, which reduces lines of code and need to do mental arithmetic while reading the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-12 16:35:02 +02:00
Liam Howlett	0c1d781d6b	powerpc/prom_init: Fix kernel config grep commit 6886da5f49e6d86aad76807a93f3eef5e4f01b10 upstream. When searching for config options, use the KCONFIG_CONFIG shell variable so that builds using non-standard config locations work. Fixes: `26deb04342` ("powerpc: prepare string/mem functions for KASAN") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220624011745.4060795-1-Liam.Howlett@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-07 17:53:24 +02:00
Naveen N. Rao	ab0b6dc5e1	powerpc/ftrace: Remove ftrace init tramp once kernel init is complete commit 84ade0a6655bee803d176525ef457175cbf4df22 upstream. Stop using the ftrace trampoline for init section once kernel init is complete. Fixes: `67361cf807` ("powerpc/ftrace: Handle large kernel configs") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220516071422.463738-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-02 16:41:14 +02:00
Andrew Donnellan	c1cfae46c5	powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address commit 7bc08056a6dabc3a1442216daf527edf61ac24b6 upstream. Add a special case to block_rtas_call() to allow the ibm,platform-dump RTAS call through the RTAS filter if the buffer address is 0. According to PAPR, ibm,platform-dump is called with a null buffer address to notify the platform firmware that processing of a particular dump is finished. Without this, on a pseries machine with CONFIG_PPC_RTAS_FILTER enabled, an application such as rtas_errd that is attempting to retrieve a dump will encounter an error at the end of the retrieval process. Fixes: `bd59380c5b` ("powerpc/rtas: Restrict RTAS requests from userspace") Cc: stable@vger.kernel.org Reported-by: Sathvika Vasireddy <sathvika@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220614134952.156010-1-ajd@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-29 09:03:30 +02:00
Naveen N. Rao	fe643b5afd	powerpc: Enable execve syscall exit tracepoint commit ec6d0dde71d760aa60316f8d1c9a1b0d99213529 upstream. On execve[at], we are zero'ing out most of the thread register state including gpr[0], which contains the syscall number. Due to this, we fail to trigger the syscall exit tracepoint properly. Fix this by retaining gpr[0] in the thread register state. Before this patch: # tail /sys/kernel/debug/tracing/trace cat-123 [000] ..... 61.449351: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) cat-124 [000] ..... 62.428481: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) echo-125 [000] ..... 65.813702: sys_execve(filename: 7fffa6b23378, argv: 7fffa6b233a0, envp: 7fffa6b233b0) echo-125 [000] ..... 65.822214: sys_execveat(fd: 0, filename: 1009ac48, argv: 7ffff65d0c98, envp: 7ffff65d0ca8, flags: 0) After this patch: # tail /sys/kernel/debug/tracing/trace cat-127 [000] ..... 100.416262: sys_execve(filename: 7fffa41b3448, argv: 7fffa41b33e0, envp: 7fffa41b33f8) cat-127 [000] ..... 100.418203: sys_execve -> 0x0 echo-128 [000] ..... 103.873968: sys_execve(filename: 7fffa41b3378, argv: 7fffa41b33a0, envp: 7fffa41b33b0) echo-128 [000] ..... 103.875102: sys_execve -> 0x0 echo-128 [000] ..... 103.882097: sys_execveat(fd: 0, filename: 1009ac48, argv: 7fffd10d2148, envp: 7fffd10d2158, flags: 0) echo-128 [000] ..... 103.883225: sys_execveat -> 0x0 Cc: stable@vger.kernel.org Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Sumit Dubey2 <Sumit.Dubey2@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609103328.41306-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-29 09:03:30 +02:00
He Ying	70d6d6874d	powerpc/kasan: Silence KASAN warnings in __get_wchan() [ Upstream commit a1b29ba2f2c171b9bea73be993bfdf0a62d37d15 ] The following KASAN warning was reported in our kernel. BUG: KASAN: stack-out-of-bounds in get_wchan+0x188/0x250 Read of size 4 at addr d216f958 by task ps/14437 CPU: 3 PID: 14437 Comm: ps Tainted: G O 5.10.0 #1 Call Trace: [daa63858] [c0654348] dump_stack+0x9c/0xe4 (unreliable) [daa63888] [c035cf0c] print_address_description.constprop.3+0x8c/0x570 [daa63908] [c035d6bc] kasan_report+0x1ac/0x218 [daa63948] [c00496e8] get_wchan+0x188/0x250 [daa63978] [c0461ec8] do_task_stat+0xce8/0xe60 [daa63b98] [c0455ac8] proc_single_show+0x98/0x170 [daa63bc8] [c03cab8c] seq_read_iter+0x1ec/0x900 [daa63c38] [c03cb47c] seq_read+0x1dc/0x290 [daa63d68] [c037fc94] vfs_read+0x164/0x510 [daa63ea8] [c03808e4] ksys_read+0x144/0x1d0 [daa63f38] [c005b1dc] ret_from_syscall+0x0/0x38 --- interrupt: c00 at 0x8fa8f4 LR = 0x8fa8cc The buggy address belongs to the page: page:98ebcdd2 refcount:0 mapcount:0 mapping:00000000 index:0x2 pfn:0x1216f flags: 0x0() raw: 00000000 00000000 01010122 00000000 00000002 00000000 ffffffff 00000000 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: d216f800: 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 d216f880: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >d216f900: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 ^ d216f980: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 d216fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 After looking into this issue, I find the buggy address belongs to the task stack region. It seems KASAN has something wrong. I look into the code of __get_wchan in x86 architecture and find the same issue has been resolved by the commit `f7d27c35dd` ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to powerpc architecture too. As Andrey Ryabinin said, get_wchan() is racy by design, it may access volatile stack of running task, thus it may access redzone in a stack frame and cause KASAN to warn about this. Use READ_ONCE_NOCHECK() to silence these warnings. Reported-by: Wanming Hu <huwanming@huaweil.com> Signed-off-by: He Ying <heying24@huawei.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121014418.155675-1-heying24@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-22 14:21:55 +02:00
Alexey Kardashevskiy	82a2059a11	powerpc/mm: Switch obsolete dssall to .long commit d51f86cfd8e378d4907958db77da3074f6dce3ba upstream. The dssall ("Data Stream Stop All") instruction is obsolete altogether with other Data Cache Instructions since ISA 2.03 (year 2006). LLVM IAS does not support it but PPC970 seems to be using it. This switches dssall to .long as there is no much point in fixing LLVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-14 18:36:27 +02:00
Michael Ellerman	2a0165d278	powerpc/32: Fix overread/overwrite of thread_struct via ptrace commit 8e1278444446fc97778a5e5c99bca1ce0bbc5ec9 upstream. The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process. To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion. The kernel then maps that index to a particular register in its own data structures and gets/sets the value. The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels. The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area. Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness. To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced. Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array. On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields than can be overwritten, including some pointers. As such it may be exploitable. It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced: https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/ Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special-case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future. Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled. Fixes: `87fec0514f` ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds") Cc: stable@vger.kernel.org # v3.13+ Reported-by: Ariel Miculas <ariel.miculas@belden.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-14 18:36:27 +02:00
Randy Dunlap	c8a9b3defa	powerpc/idle: Fix return value of __setup() handler [ Upstream commit b793a01000122d2bd133ba451a76cc135b5e162c ] __setup() handlers should return 1 to obsolete_checksetup() in init/main.c to indicate that the boot option has been handled. A return of 0 causes the boot option/value to be listed as an Unknown kernel parameter and added to init's (limited) argument or environment strings. Also, error return codes don't mean anything to obsolete_checksetup() -- only non-zero (usually 1) or zero. So return 1 from powersave_off(). Fixes: `302eca184f` ("[POWERPC] cell: use ppc_md->power_save instead of cbe_idle_loop") Reported-by: Igor Zhbanov <izh1979@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220502192925.19954-1-rdunlap@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:23:10 +02:00
Hari Bathini	2db3a8f541	powerpc/fadump: fix PT_LOAD segment for boot memory area [ Upstream commit 15eb77f873255cf9f4d703b63cfbd23c46579654 ] Boot memory area is setup as separate PT_LOAD segment in the vmcore as it is moved by f/w, on crash, to a destination address provided by the kernel. Having separate PT_LOAD segment helps in handling the different physical address and offset for boot memory area in the vmcore. Commit `ced1bf52f4` ("powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements") inadvertly broke this pre-condition for cases where some of the first kernel memory is available adjacent to boot memory area. This scenario is rare but possible when memory for fadump could not be reserved adjacent to boot memory area owing to memory hole or such. Reading memory from a vmcore exported in such scenario provides incorrect data. Fix it by ensuring no other region is folded into boot memory area. Fixes: `ced1bf52f4` ("powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements") Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220406093839.206608-2-hbathini@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:23:07 +02:00
Laurent Dufour	5ca40fcf0d	powerpc/rtas: Keep MSR[RI] set when calling RTAS [ Upstream commit b6b1c3ce06ca438eb24e0f45bf0e63ecad0369f5 ] RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big endian mode (MSR[SF,LE] unset). The change in MSR is done in enter_rtas() in a relatively complex way, since the MSR value could be hardcoded. Furthermore, a panic has been reported when hitting the watchdog interrupt while running in RTAS, this leads to the following stack trace: watchdog: CPU 24 Hard LOCKUP watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago) ... Supported: No, Unreleased kernel CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000 REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020 CFAR: 000000000000011c IRQMASK: 1 GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010 GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034 GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008 GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40 GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000 NIP [000000001fb41050] 0x1fb41050 LR [000000001fb4104c] 0x1fb4104c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Oops: Unrecoverable System Reset, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ... Supported: No, Unreleased kernel CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000 REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020 CFAR: 000000000000011c IRQMASK: 1 GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010 GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034 GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008 GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40 GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000 NIP [000000001fb41050] 0x1fb41050 LR [000000001fb4104c] 0x1fb4104c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 3ddec07f638c34a2 ]--- This happens because MSR[RI] is unset when entering RTAS but there is no valid reason to not set it here. RTAS is expected to be called with MSR[RI] as specified in PAPR+ section "7.2.1 Machine State": R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect its own critical regions from recursion by setting the MSR[RI] bit to 0 when in the critical regions. Fixing this by reviewing the way MSR is compute before calling RTAS. Now a hardcoded value meaning real mode, 32 bits big endian mode and Recoverable Interrupt is loaded. In the case MSR[S] is set, it will remain set while entering RTAS as only urfid can unset it (thanks Fabiano). In addition a check is added in do_enter_rtas() to detect calls made with MSR[RI] unset, as we are forcing it on later. This patch has been tested on the following machines: Power KVM Guest P8 S822L (host Ubuntu kernel 5.11.0-49-generic) PowerVM LPAR P8 9119-MME (FW860.A1) p9 9008-22L (FW950.00) P10 9080-HEX (FW1010.00) Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:22:42 +02:00
Alexey Kardashevskiy	151322c24e	powerpc/64: Add UADDR64 relocation support commit d799769188529abc6cbf035a10087a51f7832b6b upstream. When ld detects unaligned relocations, it emits R_PPC64_UADDR64 relocations instead of R_PPC64_RELATIVE. Currently R_PPC64_UADDR64 are detected by arch/powerpc/tools/relocs_check.sh and expected not to work. Below is a simple chunk to trigger this behaviour (this disables optimization for the demonstration purposes only, this also happens with -O1/-O2 when CONFIG_PRINTK_INDEX=y, for example): \#pragma GCC push_options \#pragma GCC optimize ("O0") struct entry { const char *file; int line; } __attribute__((packed)); static const struct entry e1 = { .file = __FILE__, .line = __LINE__ }; static const struct entry e2 = { .file = __FILE__, .line = __LINE__ }; ... prom_printf("e1=%s %lx %lx\n", e1.file, (unsigned long) e1.file, mfmsr()); prom_printf("e2=%s %lx\n", e2.file, (unsigned long) e2.file); \#pragma GCC pop_options This adds support for UADDR64 for 64bit. This reuses __dynamic_symtab from the 32bit code which supports more relocation types already. Because RELACOUNT includes only R_PPC64_RELATIVE, this replaces it with RELASZ which is the size of all relocation records. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20220309061822.168173-1-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:44 +02:00
Andreas Gruenbacher	923f05a660	gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} commit bb523b406c849eef8f265a07cd7f320f1f177743 upstream Turn fault_in_pages_{readable,writeable} into versions that return the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename the functions to fault_in_{readable,writeable} to make sure this change doesn't silently break things. Neither of these functions is entirely trivial and it doesn't seem useful to inline them, so move them to mm/gup.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:28 +02:00
Hangyu Hua	2a71e3ecd8	powerpc/secvar: fix refcount leak in format_show() [ Upstream commit d601fd24e6964967f115f036a840f4f28488f63f ] Refcount leak will happen when format_show returns failure in multiple cases. Unified management of of_node_put can fix this problem. Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220302021959.10959-1-hbh25y@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-13 20:59:08 +02:00
Sourabh Jain	06ee48a4fc	powerpc: Set crashkernel offset to mid of RMA region [ Upstream commit 7c5ed82b800d8615cdda00729e7b62e5899f0b13 ] On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources. The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources. Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual. This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-13 20:59:03 +02:00
Andreas Gruenbacher	ec5ebfd1ce	powerpc/kvm: Fix kvm_use_magic_page commit 0c8eb2884a42d992c7726539328b7d3568f22143 upstream. When switching from __get_user to fault_in_pages_readable, commit `9f9eae5ce7` broke kvm_use_magic_page: like __get_user, fault_in_pages_readable returns 0 on success. Fixes: `9f9eae5ce7` ("powerpc/kvm: Prefer fault_in_pages_readable function") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:57 +02:00
Christophe Leroy	0e0b570564	powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE commit 9bb162fa26ed76031ed0e7dbc77ccea0bf977758 upstream. Allthough kernel text is always mapped with BATs, we still have inittext mapped with pages, so TLB miss handling is required when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set. The final solution should be to set a BAT that also maps inittext but that BAT then needs to be cleared at end of init, and it will require more changes to be able to do it properly. As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big deal so let's fix it simply for now to enable easy stable application. Fixes: `035b19a15a` ("powerpc/32s: Always map kernel text and rodata with BATs") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-23 12:03:14 +01:00
Nicholas Piggin	498e6604a3	powerpc/64s: Mask SRR0 before checking against the masked NIP [ Upstream commit aee101d7b95a03078945681dd7f7ea5e4a1e7686 ] Commit 314f6c23dd8d ("powerpc/64s: Mask NIP before checking against SRR0") masked off the low 2 bits of the NIP value in the interrupt stack frame in case they are non-zero and mis-compare against a SRR0 register value of a CPU which always reads back 0 from the 2 low bits which are reserved. This now causes the opposite problem that an implementation which does implement those bits in SRR0 will mis-compare against the masked NIP value in which they have been cleared. QEMU is one such implementation, and this is allowed by the architecture. This can be triggered by sigfuz by setting low bits of PT_NIP in the signal context. Fix this for now by masking the SRR0 bits as well. Cleaner is probably to sanitise these values before putting them in registers or stack, but this is the quick and backportable fix. Fixes: 314f6c23dd8d ("powerpc/64s: Mask NIP before checking against SRR0") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220117134403.2995059-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-02-01 17:27:09 +01:00
Christophe Leroy	c894d2f9fd	powerpc/32: Fix boot failure with GCC latent entropy plugin commit bba496656a73fc1d1330b49c7f82843836e9feb1 upstream. Boot fails with GCC latent entropy plugin enabled. This is due to early boot functions trying to access 'latent_entropy' global data while the kernel is not relocated at its final destination yet. As there is no way to tell GCC to use PTRRELOC() to access it, disable latent entropy plugin in early_32.o and feature-fixups.o and code-patching.o Fixes: `38addce8b6` ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217 Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.1640178426.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-01 17:27:06 +01:00
Hari Bathini	625be247e8	powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic [ Upstream commit 06e629c25daa519be620a8c17359ae8fc7a2e903 ] In panic path, fadump is triggered via a panic notifier function. Before calling panic notifier functions, smp_send_stop() gets called, which stops all CPUs except the panic'ing CPU. Commit `8389b37dff` ("powerpc: stop_this_cpu: remove the cpu from the online map.") and again commit `bab26238bb` ("powerpc: Offline CPU in stop_this_cpu()") started marking CPUs as offline while stopping them. So, if a kernel has either of the above commits, vmcore captured with fadump via panic path would not process register data for all CPUs except the panic'ing CPU. Sample output of crash-utility with such vmcore: # crash vmlinux vmcore ... KERNEL: vmlinux DUMPFILE: vmcore [PARTIAL DUMP] CPUS: 1 DATE: Wed Nov 10 09:56:34 EST 2021 UPTIME: 00:00:42 LOAD AVERAGE: 2.27, 0.69, 0.24 TASKS: 183 NODENAME: XXXXXXXXX RELEASE: 5.15.0+ VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021 MACHINE: ppc64le (2500 Mhz) MEMORY: 8 GB PANIC: "Kernel panic - not syncing: sysrq triggered crash" PID: 3394 COMMAND: "bash" TASK: c0000000150a5f80 [THREAD_INFO: c0000000150a5f80] CPU: 1 STATE: TASK_RUNNING (PANIC) crash> p -x __cpu_online_mask __cpu_online_mask = $1 = { bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} } crash> crash> crash> p -x __cpu_active_mask __cpu_active_mask = $2 = { bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} } crash> While this has been the case since fadump was introduced, the issue was not identified for two probable reasons: - In general, the bulk of the vmcores analyzed were from crash due to exception. - The above did change since commit `8341f2f222` ("sysrq: Use panic() to force a crash") started using panic() instead of deferencing NULL pointer to force a kernel crash. But then commit `de6e5d3841` ("powerpc: smp_send_stop do not offline stopped CPUs") stopped marking CPUs as offline till kernel commit `bab26238bb` ("powerpc: Offline CPU in stop_this_cpu()") reverted that change. To ensure post processing register data of all other CPUs happens as intended, let panic() function take the crash friendly path (read crash_smp_send_stop()) with the help of crash_kexec_post_notifiers option. Also, as register data for all CPUs is captured by f/w, skip IPI callbacks here for fadump, to avoid any complications in finding the right backtraces. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:05:02 +01:00
Hari Bathini	953cacfaf3	powerpc: handle kdump appropriately with crash_kexec_post_notifiers option [ Upstream commit 219572d2fc4135b5ce65c735d881787d48b10e71 ] Kdump can be triggered after panic_notifers since commit `f06e5153f4` ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers") introduced crash_kexec_post_notifiers option. But using this option would mean smp_send_stop(), that marks all other CPUs as offline, gets called before kdump is triggered. As a result, kdump routines fail to save other CPUs' registers. To fix this, kdump friendly crash_smp_send_stop() function was introduced with kernel commit `0ee59413c9` ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path"). Override this kdump friendly weak function to handle crash_kexec_post_notifiers option appropriately on powerpc. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> [Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:05:01 +01:00
Christophe Leroy	e7dd8ba6fc	powerpc/40x: Map 32Mbytes of memory at startup [ Upstream commit 06e7cbc29e97b4713b4ea6def04ae8501a7d1a59 ] As reported by Carlo, 16Mbytes is not enough with modern kernels that tend to be a bit big, so map another 16M page at boot. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:05:01 +01:00
Michael Ellerman	f374976a7e	powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING [ Upstream commit a4ac0d249a5db80e79d573db9e4ad29354b643a8 ] setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled. Fixes the following W=1 warning when CONFIG_PROFILING=n: linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:58 +01:00
Nicholas Piggin	52ce10c187	powerpc/watchdog: Fix missed watchdog reset due to memory ordering race [ Upstream commit 5dad4ba68a2483fc80d70b9dc90bbe16e1f27263 ] It is possible for all CPUs to miss the pending cpumask becoming clear, and then nobody resetting it, which will cause the lockup detector to stop working. It will eventually expire, but watchdog_smp_panic will avoid doing anything if the pending mask is clear and it will never be reset. Order the cpumask clear vs the subsequent test to close this race. Add an extra check for an empty pending mask when the watchdog fires and finds its bit still clear, to try to catch any other possible races or bugs here and keep the watchdog working. The extra test in arch_touch_nmi_watchdog is required to prevent the new warning from firing off. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Debugged-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:57 +01:00
Julia Lawall	aa8c270145	powerpc/btext: add missing of_node_put [ Upstream commit a1d2b210ffa52d60acabbf7b6af3ef7e1e69cda0 ] for_each_node_by_type performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // <smpl> @@ local idexpression n; expression e; @@ for_each_node_by_type(n,...) { ... ( of_node_put(n); \| e = n \| + of_node_put(n); ? break; ) ... } ... when != n // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1448051604-25256-6-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:57 +01:00
Michael Ellerman	fca58a4344	powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings [ Upstream commit fd1eaaaaa6864b5fb8f99880fcefb49760b8fe4e ] When CONFIG_PPC_RFI_SRR_DEBUG=y we check the SRR values before returning from interrupts. This is done in asm using EMIT_BUG_ENTRY, and passing BUGFLAG_WARNING. However that fails to create an exception table entry for the warning, and so do_program_check() fails the exception table search and proceeds to call _exception(), resulting in an oops like: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 2 PID: 1204 Comm: sigreturn_unali Tainted: P 5.16.0-rc2-00194-g91ca3d4f77c5 #12 NIP: c00000000000c5b0 LR: 0000000000000000 CTR: 0000000000000000 ... NIP [c00000000000c5b0] system_call_common+0x150/0x268 LR [0000000000000000] 0x0 Call Trace: [c00000000db73e10] [c00000000000c558] system_call_common+0xf8/0x268 (unreliable) ... Instruction dump: 7cc803a6 888d0931 2c240000 4082001c 38800000 988d0931 e8810170 e8a10178 7c9a03a6 7cbb03a6 7d7a02a6 e9810170 <7f0b6088> 7d7b02a6 e9810178 7f0b6088 We should instead use EMIT_WARN_ENTRY, which creates an exception table entry for the warning, allowing the warning to be correctly recognised, and the code to resume after printing the warning. Note however that because this warning is buried deep in the interrupt return path, we are not able to recover from it (due to MSR_RI being clear), so we still end up in die() with an unrecoverable exception. Fixes: `59dc5bfca0` ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221135101.2085547-2-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:16 +01:00
Michael Ellerman	5bb2d955e8	powerpc/64s: Mask NIP before checking against SRR0 [ Upstream commit 314f6c23dd8d417281eb9e8a516dd98036f2e7b3 ] When CONFIG_PPC_RFI_SRR_DEBUG=y we check that NIP and SRR0 match when returning from interrupts. This can trigger falsely if NIP has either of its two low bits set via sigreturn or ptrace, while SRR0 has its low two bits masked in hardware. As a quick fix make sure to mask the low bits before doing the check. Fixes: `59dc5bfca0` ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20211221135101.2085547-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:16 +01:00
Christophe Leroy	2d17ab8874	powerpc/modules: Don't WARN on first module allocation attempt [ Upstream commit f1797e4de1146009c888bcf8b6bb6648d55394f1 ] module_alloc() first tries to allocate module text within 24 bits direct jump from kernel text, and tries a wider allocation if first one fails. When first allocation fails the following is observed in kernel logs: vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null) CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G W 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4 [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c [e2a53c10] [c00338c0] module_alloc+0xa0/0xac [e2a53c40] [c027a368] load_module+0x2ae0/0x8148 [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130 [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28 ... Add __GFP_NOWARN flag to first allocation so that no warning appears when it fails. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Fixes: `2ec13df167` ("powerpc/modules: Load modules closer to kernel text") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:08 +01:00
Peiwei Hu	49a237c73a	powerpc/prom_init: Fix improper check of prom_getprop() [ Upstream commit 869fb7e5aecbc163003f93f36dcc26d0554319f6 ] prom_getprop() can return PROM_ERROR. Binary operator can not identify it. Fixes: `94d2dde738` ("[POWERPC] Efika: prune fixups and make them more carefull") Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:06 +01:00
Mark Rutland	0c1cf578a8	powerpc: Avoid discarding flags in system_call_exception() [ Upstream commit 08b0af5b2affbe7419853e8dd1330e4b3fe27125 ] Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Thus, when setting flags we must use an atomic operation rather than a plain read-modify-write sequence, as a plain read-modify-write may discard flags which are concurrently set by a remote thread, e.g. // task A // task B tmp = A->thread_info.flags; set_tsk_thread_flag(A, NEWFLAG_B); tmp \|= NEWFLAG_A; A->thread_info.flags = tmp; arch/powerpc/kernel/interrupt.c's system_call_exception() sets _TIF_RESTOREALL in the thread info flags with a read-modify-write, which may result in other flags being discarded. Elsewhere in the file it uses clear_bits() to atomically remove flag bits, so use set_bits() here for consistency with those. There may be reasons (e.g. instrumentation) that prevent the use of set_thread_flag() and clear_thread_flag() here, which would otherwise be preferable. Fixes: `ae7aaecc3f` ("powerpc/64s: system call rfscv workaround for TM bugs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Eirik Fuller <efuller@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20211129130653.2037928-10-mark.rutland@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:22 +01:00
Russell Currey	9b9f256714	powerpc/module_64: Fix livepatching for RO modules commit 8734b41b3efe0fc6082c1937b0e88556c396dc96 upstream. Livepatching a loaded module involves applying relocations through apply_relocate_add(), which attempts to write to read-only memory when CONFIG_STRICT_MODULE_RWX=y. Work around this by performing these writes through the text poke area by using patch_instruction(). R_PPC_REL24 is the only relocation type generated by the kpatch-build userspace tool or klp-convert kernel tree that I observed applying a relocation to a post-init module. A more comprehensive solution is planned, but using patch_instruction() for R_PPC_REL24 on should serve as a sufficient fix. This does have a performance impact, I observed ~15% overhead in module_load() on POWER8 bare metal with checksum verification off. Fixes: `c35717c71e` ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX") Cc: stable@vger.kernel.org # v5.14+ Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> [mpe: Check return codes from patch_instruction()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-22 09:32:49 +01:00
Christophe Leroy	c4e3ff8b8b	powerpc/32: Fix hardlockup on vmap stack overflow commit 5bb60ea611db1e04814426ed4bd1c95d1487678e upstream. Since the commit `c118c7303a` ("powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct") a vmap stack overflow results in a hard lockup. This is because emergency_ctx is still addressed with its virtual address allthough data MMU is not active anymore at that time. Fix it by using a physical address instead. Fixes: `c118c7303a` ("powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ce30364fb7ccda489272af4a1612b6aa147e1d23.1637227521.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:04:44 +01:00
Eric W. Biederman	686bf79203	signal: Replace force_fatal_sig with force_exit_sig when in doubt commit fcb116bc43c8c37c052530ead79872f8b2615711 upstream. Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals. This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept. In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis. Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: `a3616a3c02` ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Backlund <tmb@iki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-25 09:49:07 +01:00
Eric W. Biederman	02d28b5fdb	signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV) commit e21294a7aaae32c5d7154b187113a04db5852e37 upstream. Now that force_fatal_sig exists it is unnecessary and a bit confusing to use force_sigsegv in cases where the simpler force_fatal_sig is wanted. So change every instance we can to make the code clearer. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Backlund <tmb@iki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-25 09:49:06 +01:00
Eric W. Biederman	c7b7868dba	signal/powerpc: On swapcontext failure force SIGSEGV commit 83a1f27ad773b1d8f0460d3a676114c7651918cc upstream. If the register state may be partial and corrupted instead of calling do_exit, call force_sigsegv(SIGSEGV). Which properly kills the process with SIGSEGV and does not let any more userspace code execute, instead of just killing one thread of the process and potentially confusing everything. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.") Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt") Link: https://lkml.kernel.org/r/20211020174406.17889-7-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Thomas Backlund <tmb@iki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-25 09:49:06 +01:00
Nicholas Piggin	c3b0ab956d	printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces commit 5d5e4522a7f404d1a96fd6c703989d32a9c9568d upstream. printk from NMI context relies on irq work being raised on the local CPU to print to console. This can be a problem if the NMI was raised by a lockup detector to print lockup stack and regs, because the CPU may not enable irqs (because it is locked up). Introduce printk_trigger_flush() that can be called another CPU to try to get those messages to the console, call that where printk_safe_flush was previously called. Fixes: `93d102f094` ("printk: remove safe buffers") Cc: stable@vger.kernel.org # 5.15 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211107045116.1754411-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-25 09:48:45 +01:00
Christophe Leroy	de04ee7d7d	powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX commit 1e35eba4055149c578baf0318d2f2f89ea3c44a0 upstream. As spotted and explained in commit c12ab8dbc492 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST"), the selection of STRICT_KERNEL_RWX without selecting DEBUG_RODATA_TEST has spotted the lack of the DIRTY bit in the pinned kernel data TLBs. This problem should have been detected a lot earlier if things had been working as expected. But due to an incredible level of chance or mishap, this went undetected because of a set of bugs: In fact the DTLBs were not pinned, because instead of setting the reserve bit in MD_CTR, it was set in MI_CTR that is the register for ITLBs. But then, another huge bug was there: the physical address was reset to 0 at the boundary between RO and RW areas, leading to the same physical space being mapped at both 0xc0000000 and 0xc8000000. This had by miracle no consequence until now because the entry was not really pinned so it was overwritten soon enough to go undetected. Of course, now that we really pin the DTLBs, it must be fixed as well. Fixes: `f76c8f6d25` ("powerpc/8xx: Add function to set pinned TLBs") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Depends-on: c12ab8dbc492 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a21e9a057fe2d247a535aff0d157a54eefee017a.1636963688.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-25 09:48:44 +01:00
Christophe Leroy	7cc16be1ae	powerpc/signal32: Fix sigset_t copy commit 5499802b2284331788a440585869590f1bd63f7f upstream. The conversion from __copy_from_user() to __get_user() by commit `d3ccc97815` ("powerpc/signal: Use __get_user() to copy sigset_t") introduced a regression in __get_user_sigset() for powerpc/32. The bug was subsequently moved into unsafe_get_user_sigset(). The bug is due to the copied 64 bit value being truncated to 32 bits while being assigned to dst->sig[0] The regression was reported by users of the Xorg packages distributed in Debian/powerpc -- "The symptoms are that the fb screen goes blank, with the backlight remaining on and no errors logged in /var/log; wdm (or startx) run with no effect (I tried logging in in the blind, with no effect). And they are hard to kill, requiring 'kill -KILL ...'" Fix the regression by copying each word of the sigset, not only the first one. __get_user_sigset() was tentatively optimised to copy 64 bits at once in order to minimise KUAP unlock/lock impact, but the unsafe variant doesn't suffer that, so it can just copy words. Fixes: `887f3ceb51` ("powerpc/signal32: Convert do_setcontext[_tm]() to user access block") Cc: stable@vger.kernel.org # v5.13+ Reported-by: Finn Thain <fthain@linux-m68k.org> Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99ef38d61c0eb3f79c68942deb0c35995a93a777.1636966353.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-25 09:48:44 +01:00
Masahiro Yamada	a0995ebe4e	powerpc: clean vdso32 and vdso64 directories [ Upstream commit 964c33cd0be621b291b5d253d8731eb2680082cb ] Since commit `bce74491c3` ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o"), "make ARCH=powerpc clean" does not clean up the arch/powerpc/kernel/{vdso32,vdso64} directories. Use the subdir- trick to let "make clean" descend into them. Fixes: `bce74491c3` ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109185015.615517-1-masahiroy@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-25 09:48:40 +01:00
Christophe Leroy	23274bd8d7	powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST [ Upstream commit c12ab8dbc492b992e1ea717db933cee568780c47 ] Until now, all tests involving CONFIG_STRICT_KERNEL_RWX were done with DEBUG_RODATA_TEST to check the result. But now that CONFIG_STRICT_KERNEL_RWX is selected by default, it came without CONFIG_DEBUG_RODATA_TEST and led to the following Oops [ 6.830908] Freeing unused kernel image (initmem) memory: 352K [ 6.840077] BUG: Unable to handle kernel data access on write at 0xc1285200 [ 6.846836] Faulting instruction address: 0xc0004b6c [ 6.851745] Oops: Kernel access of bad area, sig: 11 [#1] [ 6.857075] BE PAGE_SIZE=16K PREEMPT CMPC885 [ 6.861348] SAF3000 DIE NOTIFICATION [ 6.864830] CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0-rc5-s3k-dev-02255-g2747d7b7916f #451 [ 6.873429] NIP: c0004b6c LR: c0004b60 CTR: 00000000 [ 6.878419] REGS: c902be60 TRAP: 0300 Not tainted (5.15.0-rc5-s3k-dev-02255-g2747d7b7916f) [ 6.886852] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 53000335 XER: 8000ff40 [ 6.893564] DAR: c1285200 DSISR: 82000000 [ 6.893564] GPR00: 0c000000 c902bf20 c20f4000 08000000 00000001 04001f00 c1800000 00000035 [ 6.893564] GPR08: ff0001ff c1280000 00000002 c0004b60 00001000 00000000 c0004b1c 00000000 [ 6.893564] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6.893564] GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c1060000 [ 6.932034] NIP [c0004b6c] kernel_init+0x50/0x138 [ 6.936682] LR [c0004b60] kernel_init+0x44/0x138 [ 6.941245] Call Trace: [ 6.943653] [c902bf20] [c0004b60] kernel_init+0x44/0x138 (unreliable) [ 6.950022] [c902bf30] [c001122c] ret_from_kernel_thread+0x5c/0x64 [ 6.956135] Instruction dump: [ 6.959060] 48ffc521 48045469 4800d8cd 3d20c086 89295fa0 2c090000 41820058 480796c9 [ 6.966890] 4800e48d 3d20c128 39400002 3fe0c106 <91495200> 3bff8000 4806fa1d 481f7d75 [ 6.974902] ---[ end trace 1e397bacba4aa610 ]--- 0xc1285200 corresponds to 'system_state' global var that the kernel is trying to set to SYSTEM_RUNNING. This var is above the RO/RW limit so it shouldn't Oops. It oopses because the dirty bit is missing. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3d5800b0bbcd7b19761b98f50421358667b45331.1635520232.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-25 09:48:30 +01:00
Nicholas Piggin	d05dc4bdc3	powerpc/64s/interrupt: Fix check_return_regs_valid() false positive commit 4a5cb51f3db4be547225a4bce7a43d41b231382b upstream. The check_return_regs_valid() can cause a false positive if the return regs are marked as norestart and they are an HSRR type interrupt, because the low bit in the bottom of regs->trap causes interrupt type matching to fail. This can occcur for example on bare metal with a HV privileged doorbell interrupt that causes a signal, but do_signal returns early because get_signal() fails, and takes the "No signal to deliver" path. In this case no signal was delivered so the return location is not changed so return SRRs are not invalidated, yet set_trap_norestart is called, which messes up the match. Building go-1.16.6 is known to reproduce this. Fix it by using the TRAP() accessor which masks out the low bit. Fixes: `6eaaf9de35` ("powerpc/64s/interrupt: Check and fix srr_valid without crashing") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211026122531.3599918-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:19 +01:00
Nicholas Piggin	317cc5bacf	powerpc/32e: Ignore ESR in instruction storage interrupt handler commit 81291383ffde08b23bce75e7d6b2575ce9d3475c upstream. A e5500 machine running a 32-bit kernel sometimes hangs at boot, seemingly going into an infinite loop of instruction storage interrupts. The ESR (Exception Syndrome Register) has a value of 0x800000 (store) when this happens, which is likely set by a previous store. An instruction TLB miss interrupt would then leave ESR unchanged, and if no PTE exists it calls directly to the instruction storage interrupt handler without changing ESR. access_error() does not cause a segfault due to a store to a read-only vma because is_exec is true. Most subsequent fault handling does not check for a write fault on a read-only vma, and might do strange things like create a writeable PTE or call page_mkwrite on a read only vma or file. It's not clear what happens here to cause the infinite faulting in this case, a fault handler failure or low level PTE or TLB handling. In any case this can be fixed by having the instruction storage interrupt zero regs->dsisr rather than storing the ESR value to it. Fixes: `a01a3f2ddb` ("powerpc: remove arguments from fault handler functions") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Jacques de Laval <jacques.delaval@protonmail.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Jacques de Laval <jacques.delaval@protonmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211028133043.4159501-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:19 +01:00
Nathan Lynch	ddd95eb01a	powerpc: fix unbalanced node refcount in check_kvm_guest() [ Upstream commit 56537faf8821e361d739fc5ff58c9c40f54a1d4c ] When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it returns without performing a matching put for the lookup, leaving the node's reference count elevated. Add the necessary call to of_node_put(), rearranging the code slightly to avoid repetition or goto. Fixes: `107c55005f` ("powerpc/pseries: Add KVM guest doorbell restrictions") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210928124550.132020-1-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:16:51 +01:00
Linus Torvalds	0a3221b658	Merge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bug exposed by a previous fix, where running guests with certain SMT topologies could crash the host on Power8. - Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is enabled. Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider. * tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/smp: do not decrement idle task preempt count in CPU offline powerpc/idle: Don't corrupt back chain when going idle	2021-10-21 15:30:09 -10:00
Nathan Lynch	787252a10d	powerpc/smp: do not decrement idle task preempt count in CPU offline With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we get: BUG: scheduling while atomic: swapper/1/0/0x00000000 no locks held by swapper/1/0. CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100 Call Trace: dump_stack_lvl+0xac/0x108 __schedule_bug+0xac/0xe0 __schedule+0xcf8/0x10d0 schedule_idle+0x3c/0x70 do_idle+0x2d8/0x4a0 cpu_startup_entry+0x38/0x40 start_secondary+0x2ec/0x3a0 start_secondary_prolog+0x10/0x14 This is because powerpc's arch_cpu_idle_dead() decrements the idle task's preempt count, for reasons explained in commit `a7c2bb8279` ("powerpc: Re-enable preemption before cpu_die()"), specifically "start_secondary() expects a preempt_count() of 0." However, since commit `2c669ef697` ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug") and commit `f1a0a376ca` ("sched/core: Initialize the idle task with preemption disabled"), that justification no longer holds. The idle task isn't supposed to re-enable preemption, so remove the vestigial preempt_enable() from the CPU offline path. Tested with pseries and powernv in qemu, and pseries on PowerVM. Fixes: `2c669ef697` ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com	2021-10-20 21:38:01 +11:00
Michael Ellerman	496c5fe25c	powerpc/idle: Don't corrupt back chain when going idle In isa206_idle_insn_mayloss() we store various registers into the stack red zone, which is allowed. However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again, to 0(r1), which corrupts the stack back chain. We used to do the same in isa206_idle_insn_mayloss() itself, but we fixed that in `73287caa92` ("powerpc64/idle: Fix SP offsets when saving GPRs"), however we missed that the macro also corrupts the back chain. Corrupting the back chain is bad for debuggability but doesn't necessarily cause a bug. However we recently changed the stack handling in some KVM code, and it now relies on the stack back chain being valid when it returns. The corruption causes that code to return with r1 pointing somewhere in kernel data, at some point LR is restored from the stack and we branch to NULL or somewhere else invalid. Only affects Power8 hosts running KVM guests, with dynamic_mt_modes enabled (which it is by default). The fixes tag below points to the commit that changed the KVM stack handling, exposing this bug. The actual corruption of the back chain has always existed since `948cf67c47` ("powerpc: Add NAP mode support on Power7 in HV mode"). Fixes: `9b4416c509` ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au	2021-10-20 21:37:58 +11:00
Linus Torvalds	efb52a7d95	Merge tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A bit of a big batch, partly because I didn't send any last week, and also just because the BPF fixes happened to land this week. Summary: - Fix a regression hit by the IPR SCSI driver, introduced by the recent addition of MSI domains on pseries. - A big series including 8 BPF fixes, some with potential security impact and the rest various code generation issues. - Fix our program check assembler entry path, which was accidentally jumping into a gas macro and generating strange stack frames, which could confuse find_bug(). - A couple of fixes, and related changes, to fix corner cases in our machine check handling. - Fix our DMA IOMMU ops, which were not always returning the optimal DMA mask, leading to at least one device falling back to 32-bit DMA when it shouldn't. - A fix for KUAP handling on 32-bit Book3S. - Fix crashes seen when kdumping on some pseries systems. Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem, Christoph Hellwig, Johan Almbladh, Stan Johnson" * tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init powerpc/32s: Fix kuap_kernel_restore() powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler powerpc/64s: Fix unrecoverable MCE calling async handler from NMI powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG powerpc/64: warn if local irqs are enabled in NMI or hardirq context powerpc/traps: do not enable irqs in _exception powerpc/64s: fix program check interrupt emergency stack path powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END powerpc/bpf ppc32: Fix JMP32_JSET_K powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC powerpc/security: Add a helper to query stf_barrier type powerpc/bpf: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf: Fix BPF_MOD when imm == 1 powerpc/bpf: Validate branch ranges powerpc/lib: Add helper to check if offset is within conditional branch range powerpc/iommu: Report the correct most efficient DMA mask for PCI devices	2021-10-10 10:12:42 -07:00
Nicholas Piggin	f08fb25bc6	powerpc/64s: Fix unrecoverable MCE calling async handler from NMI The machine check handler is not considered NMI on 64s. The early handler is the true NMI handler, and then it schedules the machine_check_exception handler to run when interrupts are enabled. This works fine except the case of an unrecoverable MCE, where the true NMI is taken when MSR[RI] is clear, it can not recover, so it calls machine_check_exception directly so something might be done about it. Calling an async handler from NMI context can result in irq state and other things getting corrupted. This can also trigger the BUG at arch/powerpc/include/asm/interrupt.h:168 BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE)); Fix this by making an _async version of the handler which is called in the normal case, and a NMI version that is called for unrecoverable interrupts. Fixes: `2b43dd7653` ("powerpc/64: enable MSR[EE] in irq replay pt_regs") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-6-npiggin@gmail.com	2021-10-07 19:54:55 +11:00
Nicholas Piggin	ff058a8ada	powerpc/64: warn if local irqs are enabled in NMI or hardirq context This can help catch bugs such as the one fixed by the previous change to prevent _exception() from enabling irqs. ppc32 could have a similar warning but it has no good config option to debug this stuff (the test may be overkill to add for production kernels). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-4-npiggin@gmail.com	2021-10-07 19:54:55 +11:00

1 2 3 4 5 ...

8203 Commits