Files
kernel_arpi/arch/s390/include/asm/lowcore.h
Martin Schwidefsky 0aaba41b58 s390: remove all code using the access register mode
The vdso code for the getcpu() and the clock_gettime() call use the access
register mode to access the per-CPU vdso data page with the current code.

An alternative to the complicated AR mode is to use the secondary space
mode. This makes the vdso faster and quite a bit simpler. The downside is
that the uaccess code has to be changed quite a bit.

Which instructions are used depends on the machine and what kind of uaccess
operation is requested. The instruction dictates which ASCE value needs
to be loaded into %cr1 and %cr7.

The different cases:

* User copy with MVCOS for z10 and newer machines
  The MVCOS instruction can copy between the primary space (aka user) and
  the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
  ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
  loaded in %cr1.

* User copy with MVCP/MVCS for older machines
  To be able to execute the MVCP/MVCS instructions the kernel needs to
  switch to primary mode. The control register %cr1 has to be set to the
  kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
  on set_fs(KERNEL_DS) vs set_fs(USER_DS).

* Data access in the user address space for strnlen / futex
  To use "normal" instruction with data from the user address space the
  secondary space mode is used. The kernel needs to switch to primary mode,
  %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
  kernel ASCE, dependent on set_fs.

To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
tries to be lazy about it. E.g. for multiple user copies in a row with
MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
done only once. On return to user space a CPU bit is checked that loads the
vdso ASCE again.

To enable and disable the data access via the secondary space two new
functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
that a context is in secondary space uaccess mode is stored in the
mm_segment_t value for the task. The code of an interrupt may use set_fs
as long as it returns to the previous state it got with get_fs with another
call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
set_fs with the current mm_segment_t value for the task.

For CPUs with MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode, lazy    |  user     |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

For CPUs without MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode lazy     |  kernel   |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

The lines with "lazy" refer to the state after a copy via the secondary
space with a delayed reload of %cr1 and %cr7.

There are three hardware address spaces that can cause a DAT exception,
primary, secondary and home space. The exception can be related to
four different fault types: user space fault, vdso fault, kernel fault,
and the gmap faults.

Dependent on the set_fs state and normal vs. sacf mode there are a number
of fault combinations:

1) user address space fault via the primary ASCE
2) gmap address space fault via the primary ASCE
3) kernel address space fault via the primary ASCE for machines with
   MVCOS and set_fs(KERNEL_DS)
4) vdso address space faults via the secondary ASCE with an invalid
   address while running in secondary space in problem state
5) user address space fault via the secondary ASCE for user-copy
   based on the secondary space mode, e.g. futex_ops or strnlen_user
6) kernel address space fault via the secondary ASCE for user-copy
   with secondary space mode with set_fs(KERNEL_DS)
7) kernel address space fault via the primary ASCE for user-copy
   with secondary space mode with set_fs(USER_DS) on machines without
   MVCOS.
8) kernel address space fault via the home space ASCE

Replace user_space_fault() with a new function get_fault_type() that
can distinguish all four different fault types.

With these changes the futex atomic ops from the kernel and the
strnlen_user will get a little bit slower, as well as the old style
uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
fast as before. On the positive side, the user space vdso code is a
lot faster and Linux ceases to use the complicated AR mode.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 11:01:47 +01:00

203 lines
6.1 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright IBM Corp. 1999, 2012
* Author(s): Hartmut Penner <hp@de.ibm.com>,
* Martin Schwidefsky <schwidefsky@de.ibm.com>,
* Denis Joseph Barrow,
*/
#ifndef _ASM_S390_LOWCORE_H
#define _ASM_S390_LOWCORE_H
#include <linux/types.h>
#include <asm/ptrace.h>
#include <asm/cpu.h>
#include <asm/types.h>
#define LC_ORDER 1
#define LC_PAGES 2
struct lowcore {
__u8 pad_0x0000[0x0014-0x0000]; /* 0x0000 */
__u32 ipl_parmblock_ptr; /* 0x0014 */
__u8 pad_0x0018[0x0080-0x0018]; /* 0x0018 */
__u32 ext_params; /* 0x0080 */
__u16 ext_cpu_addr; /* 0x0084 */
__u16 ext_int_code; /* 0x0086 */
__u16 svc_ilc; /* 0x0088 */
__u16 svc_code; /* 0x008a */
__u16 pgm_ilc; /* 0x008c */
__u16 pgm_code; /* 0x008e */
__u32 data_exc_code; /* 0x0090 */
__u16 mon_class_num; /* 0x0094 */
__u8 per_code; /* 0x0096 */
__u8 per_atmid; /* 0x0097 */
__u64 per_address; /* 0x0098 */
__u8 exc_access_id; /* 0x00a0 */
__u8 per_access_id; /* 0x00a1 */
__u8 op_access_id; /* 0x00a2 */
__u8 ar_mode_id; /* 0x00a3 */
__u8 pad_0x00a4[0x00a8-0x00a4]; /* 0x00a4 */
__u64 trans_exc_code; /* 0x00a8 */
__u64 monitor_code; /* 0x00b0 */
__u16 subchannel_id; /* 0x00b8 */
__u16 subchannel_nr; /* 0x00ba */
__u32 io_int_parm; /* 0x00bc */
__u32 io_int_word; /* 0x00c0 */
__u8 pad_0x00c4[0x00c8-0x00c4]; /* 0x00c4 */
__u32 stfl_fac_list; /* 0x00c8 */
__u8 pad_0x00cc[0x00e8-0x00cc]; /* 0x00cc */
__u64 mcck_interruption_code; /* 0x00e8 */
__u8 pad_0x00f0[0x00f4-0x00f0]; /* 0x00f0 */
__u32 external_damage_code; /* 0x00f4 */
__u64 failing_storage_address; /* 0x00f8 */
__u8 pad_0x0100[0x0110-0x0100]; /* 0x0100 */
__u64 breaking_event_addr; /* 0x0110 */
__u8 pad_0x0118[0x0120-0x0118]; /* 0x0118 */
psw_t restart_old_psw; /* 0x0120 */
psw_t external_old_psw; /* 0x0130 */
psw_t svc_old_psw; /* 0x0140 */
psw_t program_old_psw; /* 0x0150 */
psw_t mcck_old_psw; /* 0x0160 */
psw_t io_old_psw; /* 0x0170 */
__u8 pad_0x0180[0x01a0-0x0180]; /* 0x0180 */
psw_t restart_psw; /* 0x01a0 */
psw_t external_new_psw; /* 0x01b0 */
psw_t svc_new_psw; /* 0x01c0 */
psw_t program_new_psw; /* 0x01d0 */
psw_t mcck_new_psw; /* 0x01e0 */
psw_t io_new_psw; /* 0x01f0 */
/* Save areas. */
__u64 save_area_sync[8]; /* 0x0200 */
__u64 save_area_async[8]; /* 0x0240 */
__u64 save_area_restart[1]; /* 0x0280 */
/* CPU flags. */
__u64 cpu_flags; /* 0x0288 */
/* Return psws. */
psw_t return_psw; /* 0x0290 */
psw_t return_mcck_psw; /* 0x02a0 */
/* CPU accounting and timing values. */
__u64 sync_enter_timer; /* 0x02b0 */
__u64 async_enter_timer; /* 0x02b8 */
__u64 mcck_enter_timer; /* 0x02c0 */
__u64 exit_timer; /* 0x02c8 */
__u64 user_timer; /* 0x02d0 */
__u64 guest_timer; /* 0x02d8 */
__u64 system_timer; /* 0x02e0 */
__u64 hardirq_timer; /* 0x02e8 */
__u64 softirq_timer; /* 0x02f0 */
__u64 steal_timer; /* 0x02f8 */
__u64 last_update_timer; /* 0x0300 */
__u64 last_update_clock; /* 0x0308 */
__u64 int_clock; /* 0x0310 */
__u64 mcck_clock; /* 0x0318 */
__u64 clock_comparator; /* 0x0320 */
__u64 boot_clock[2]; /* 0x0328 */
/* Current process. */
__u64 current_task; /* 0x0338 */
__u64 kernel_stack; /* 0x0340 */
/* Interrupt, panic and restart stack. */
__u64 async_stack; /* 0x0348 */
__u64 panic_stack; /* 0x0350 */
__u64 restart_stack; /* 0x0358 */
/* Restart function and parameter. */
__u64 restart_fn; /* 0x0360 */
__u64 restart_data; /* 0x0368 */
__u64 restart_source; /* 0x0370 */
/* Address space pointer. */
__u64 kernel_asce; /* 0x0378 */
__u64 user_asce; /* 0x0380 */
__u64 vdso_asce; /* 0x0388 */
/*
* The lpp and current_pid fields form a
* 64-bit value that is set as program
* parameter with the LPP instruction.
*/
__u32 lpp; /* 0x0390 */
__u32 current_pid; /* 0x0394 */
/* SMP info area */
__u32 cpu_nr; /* 0x0398 */
__u32 softirq_pending; /* 0x039c */
__u32 preempt_count; /* 0x03a0 */
__u32 spinlock_lockval; /* 0x03a4 */
__u32 spinlock_index; /* 0x03a8 */
__u32 fpu_flags; /* 0x03ac */
__u64 percpu_offset; /* 0x03b0 */
__u64 vdso_per_cpu_data; /* 0x03b8 */
__u64 machine_flags; /* 0x03c0 */
__u64 gmap; /* 0x03c8 */
__u8 pad_0x03d0[0x0e00-0x03d0]; /* 0x03d0 */
/*
* 0xe00 contains the address of the IPL Parameter Information
* block. Dump tools need IPIB for IPL after dump.
* Note: do not change the position of any fields in 0x0e00-0x0f00
*/
__u64 ipib; /* 0x0e00 */
__u32 ipib_checksum; /* 0x0e08 */
__u64 vmcore_info; /* 0x0e0c */
__u8 pad_0x0e14[0x0e18-0x0e14]; /* 0x0e14 */
__u64 os_info; /* 0x0e18 */
__u8 pad_0x0e20[0x0f00-0x0e20]; /* 0x0e20 */
/* Extended facility list */
__u64 stfle_fac_list[32]; /* 0x0f00 */
__u8 pad_0x1000[0x11b0-0x1000]; /* 0x1000 */
/* Pointer to the machine check extended save area */
__u64 mcesad; /* 0x11b0 */
/* 64 bit extparam used for pfault/diag 250: defined by architecture */
__u64 ext_params2; /* 0x11B8 */
__u8 pad_0x11c0[0x1200-0x11C0]; /* 0x11C0 */
/* CPU register save area: defined by architecture */
__u64 floating_pt_save_area[16]; /* 0x1200 */
__u64 gpregs_save_area[16]; /* 0x1280 */
psw_t psw_save_area; /* 0x1300 */
__u8 pad_0x1310[0x1318-0x1310]; /* 0x1310 */
__u32 prefixreg_save_area; /* 0x1318 */
__u32 fpt_creg_save_area; /* 0x131c */
__u8 pad_0x1320[0x1324-0x1320]; /* 0x1320 */
__u32 tod_progreg_save_area; /* 0x1324 */
__u32 cpu_timer_save_area[2]; /* 0x1328 */
__u32 clock_comp_save_area[2]; /* 0x1330 */
__u8 pad_0x1338[0x1340-0x1338]; /* 0x1338 */
__u32 access_regs_save_area[16]; /* 0x1340 */
__u64 cregs_save_area[16]; /* 0x1380 */
__u8 pad_0x1400[0x1800-0x1400]; /* 0x1400 */
/* Transaction abort diagnostic block */
__u8 pgm_tdb[256]; /* 0x1800 */
__u8 pad_0x1900[0x2000-0x1900]; /* 0x1900 */
} __packed;
#define S390_lowcore (*((struct lowcore *) 0))
extern struct lowcore *lowcore_ptr[];
static inline void set_prefix(__u32 address)
{
asm volatile("spx %0" : : "m" (address) : "memory");
}
static inline __u32 store_prefix(void)
{
__u32 address;
asm volatile("stpx %0" : "=m" (address));
return address;
}
#endif /* _ASM_S390_LOWCORE_H */