Rather than blindly copying the register state between the hyp and host
vCPU structures, abstract this code into some helpers which are called
only for non-protected VMs running under pKVM. To faciliate host access
to guest registers within a get/put sequence, introduce a new
'sync_state' hypercall to provide access to the registers of a
non-protected VM when handling traps.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I5b0d874d2d2184c4da95a91c0b9b57af500cbce3
While userspace enables single-step, if the Software Step state at the
last guest exit was "Active-pending", clear PSTATE.SS on guest entry
to restore the state.
Currently, KVM sets PSTATE.SS to 1 on every guest entry while userspace
enables single-step for the vCPU (with KVM_GUESTDBG_SINGLESTEP).
It means KVM always makes the vCPU's Software Step state
"Active-not-pending" on the guest entry, which lets the VCPU perform
single-step (then Software Step exception is taken). This could cause
extra single-step (without returning to userspace) if the Software Step
state at the last guest exit was "Active-pending" (i.e. the last
exit was triggered by an asynchronous exception after the single-step
is performed, but before the Software Step exception is taken.
See "Figure D2-3 Software step state machine" and "D2.12.7 Behavior
in the active-pending state" in ARM DDI 0487I.a for more info about
this behavior).
Fix this by clearing PSTATE.SS on guest entry if the Software Step state
at the last exit was "Active-pending" so that KVM restore the state (and
the exception is taken before further single-step is performed).
Fixes: 337b99bf7e ("KVM: arm64: guest debug, add support for single-step")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220917010600.532642-3-reijiw@google.com
(cherry picked from commit 370531d1e95be57c62fdf065fb04fd8db7ade8f9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I214f1064e8e233facfee08254c42db7c8c707cf0
The unwinding code doesn't really belong to the exit handling
code. Instead, move it to a file (conveniently named stacktrace.c
to confuse the reviewer), and move all the stacktrace-related
stuff there.
It will be joined by more code very soon.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20220727142906.1856759-3-maz@kernel.org
(cherry picked from commit 9f5fee05f6897d0fe0e3a44ade71bb85cd97b2ef)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I00812f13bc4efc13ef61476fa28b27d881a9f039
With CONFIG_RANDOMIZE_BASE=y vmlinux addresses will resolve incorrectly
from kallsyms. Fix this by adding the KASLR offset before printing the
symbols.
Fixes: 6ccf9cb557bd ("KVM: arm64: Symbolize the nVHE HYP addresses")
Reported-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220715235824.2549012-1-kaleshsingh@google.com
(cherry picked from commit ed6313a93fd11d2015ad17046f3c418bf6a8dab1)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I1b93e2c97ad1927b1dcba2a4bc43637f9427c41f
The host kernel uses the WFIT flag to remember that a vcpu has used
this instruction and wake it up as required. Move it to the state
set, as nothing in the hypervisor uses this information.
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit eebc538d8e07e0ec759823664cbe2011a8bd885d)
[willdeacon@: Use kvm_vcpu_block() instead of kvm_vcpu_halt() in context]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iccd6cca64cfb3920be878fa883255e3d6ba2a01c
In order to enable HCR_EL2.TID3 for AArch32 guests KVM needs to handle
traps where ESR_EL2.EC=0x8, which corresponds to an attempted VMRS
access from an ID group register. Specifically, the MVFR{0-2} registers
are accessed this way from AArch32. Conveniently, these registers are
architecturally mapped to MVFR{0-2}_EL1 in AArch64. Furthermore, KVM
already handles reads to these aliases in AArch64.
Plumb VMRS read traps through to the general AArch64 system register
handler.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-5-oupton@google.com
(cherry picked from commit 9369bc5c5e35985f38d04bd98c6d28a032e84b17)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9cc5c2ff532454ea13bdb8b5ca6076133974284a
Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and
can cause aliasing issues when symbolizing the addresses.
With the necessary symbols now in kallsyms we can symbolize nVHE
addresses using the %p print format specifier:
[ 98.916444][ T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-7-kaleshsingh@google.com
(cherry picked from commit 6ccf9cb557bd32073b0d68baed97f1bd8a40ff1d)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Iccb04ef986b7a68ad97bb0598c9a47fdcfe5ba64
For WFxT instructions used with very small delays, it is not
unlikely that the deadline is already expired by the time we
reach the WFx handling code.
Check for this condition as soon as possible, and return to the
guest immediately if we can.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-7-maz@kernel.org
(cherry picked from commit a3fb59651449d8bd4dc4ed5413888819932c740b)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I03090182ab73793c54601d8beb09be3ec090a053
When trapping a blocking WFIT instruction, take it into account when
computing the deadline of the background timer.
The state is tracked with a new vcpu flag, and is gated by a new
CPU capability, which isn't currently enabled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-6-maz@kernel.org
(cherry picked from commit 89f5074c503b6b6f181c0240c931f67bcaf266e9)
[willdeacon@: Use kvm_vcpu_block() instead of kvm_vcpu_halt() in context]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9819233e1cb0df8a4bfb5018a47249b16fecae49
Move the put and reload of the vGIC out of the block/unblock callbacks
and into a dedicated WFI helper. Functionally, this is nearly a nop as
the block hook is called at the very beginning of kvm_vcpu_block(), and
the only code in kvm_vcpu_block() after the unblock hook is to update the
halt-polling controls, i.e. can only affect the next WFI.
Back when the arch (un)blocking hooks were added by commits 3217f7c25b
("KVM: Add kvm_arch_vcpu_{un}blocking callbacks) and d35268da66
("arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block"),
the hooks were invoked only when KVM was about to "block", i.e. schedule
out the vCPU. The use case at the time was to schedule a timer in the
host based on the earliest timer in the guest in order to wake the
blocking vCPU when the emulated guest timer fired. Commit accb99bcd0
("KVM: arm/arm64: Simplify bg_timer programming") reworked the timer
logic to be even more precise, by waiting until the vCPU was actually
scheduled out, and so move the timer logic from the (un)blocking hooks to
vcpu_load/put.
In the meantime, the hooks gained usage for enabling vGIC v4 doorbells in
commit df9ba95993 ("KVM: arm/arm64: GICv4: Use the doorbell interrupt
as an unblocking source"), and added related logic for the VMCR in commit
5eeaf10eec ("KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block").
Finally, commit 07ab0f8d9a ("KVM: Call kvm_arch_vcpu_blocking early
into the blocking sequence") hoisted the (un)blocking hooks so that they
wrapped KVM's halt-polling logic in addition to the core "block" logic.
In other words, the original need for arch hooks to take action _only_
in the block path is long since gone.
Cc: Oliver Upton <oupton@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 6109c5a6ab7f38ed8e1beb06a90aa83884c18700)
[willdeacon@: Fix context conflict in kvm/arm.c due to move of
kvm_vcpu_initialized() definition]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I1deeb18d3134a74e565738757cdacee06e53baab
This function only works for loaded vcpus and no more information
is needed by hyp. This removes the need to access potentially
unsafe host memory.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Id705e9d8f1d147d474cb81af4ce974bbe45f3614
Changes in 5.15.22
drm/i915: Disable DSB usage for now
selinux: fix double free of cond_list on error paths
audit: improve audit queue handling when "audit=1" on cmdline
ipc/sem: do not sleep with a spin lock held
spi: stm32-qspi: Update spi registering
ASoC: hdmi-codec: Fix OOB memory accesses
ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()
ASoC: ops: Reject out of bounds values in snd_soc_put_volsw_sx()
ASoC: ops: Reject out of bounds values in snd_soc_put_xr_sx()
ALSA: usb-audio: Correct quirk for VF0770
ALSA: hda: Fix UAF of leds class devs at unbinding
ALSA: hda: realtek: Fix race at concurrent COEF updates
ALSA: hda/realtek: Add quirk for ASUS GU603
ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
btrfs: don't start transaction for scrub if the fs is mounted read-only
btrfs: fix deadlock between quota disable and qgroup rescan worker
btrfs: fix use-after-free after failure to create a snapshot
Revert "fs/9p: search open fids first"
drm/nouveau: fix off by one in BIOS boundary checking
drm/i915/adlp: Fix TypeC PHY-ready status readout
drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
drm/amd/display: watermark latencies is not enough on DCN31
drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
mm/debug_vm_pgtable: remove pte entry from the page table
mm/pgtable: define pte_index so that preprocessor could recognize it
mm/kmemleak: avoid scanning potential huge holes
block: bio-integrity: Advance seed correctly for larger interval sizes
dma-buf: heaps: Fix potential spectre v1 gadget
IB/hfi1: Fix AIP early init panic
Revert "fbcon: Disable accelerated scrolling"
fbcon: Add option to enable legacy hardware acceleration
mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()
Revert "ASoC: mediatek: Check for error clk pointer"
KVM: arm64: Avoid consuming a stale esr value when SError occur
KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
RDMA/cma: Use correct address when leaving multicast group
RDMA/ucma: Protect mc during concurrent multicast leaves
RDMA/siw: Fix refcounting leak in siw_create_qp()
IB/rdmavt: Validate remote_addr during loopback atomic tests
RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
RDMA/mlx4: Don't continue event handler after memory allocation failure
ALSA: usb-audio: initialize variables that could ignore errors
ALSA: hda: Fix signedness of sscanf() arguments
ALSA: hda: Skip codec shutdown in case the codec is not registered
iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
spi: bcm-qspi: check for valid cs before applying chip select
spi: mediatek: Avoid NULL pointer crash in interrupt
spi: meson-spicc: add IRQ check in meson_spicc_probe
spi: uniphier: fix reference count leak in uniphier_spi_probe()
IB/hfi1: Fix tstats alloc and dealloc
IB/cm: Release previously acquired reference counter in the cm_id_priv
net: ieee802154: hwsim: Ensure proper channel selection at probe time
net: ieee802154: mcr20a: Fix lifs/sifs periods
net: ieee802154: ca8210: Stop leaking skb's
netfilter: nft_reject_bridge: Fix for missing reply from prerouting
net: ieee802154: Return meaningful error codes from the netlink helpers
net/smc: Forward wakeup to smc socket waitqueue after fallback
net: stmmac: dwmac-visconti: No change to ETHER_CLOCK_SEL for unexpected speed request.
net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()
net: macsec: Fix offload support for NETDEV_UNREGISTER event
net: macsec: Verify that send_sci is on when setting Tx sci explicitly
net: stmmac: dump gmac4 DMA registers correctly
net: stmmac: ensure PTP time register reads are consistent
drm/kmb: Fix for build errors with Warray-bounds
drm/i915/overlay: Prevent divide by zero bugs in scaling
drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
ASoC: fsl: Add missing error handling in pcm030_fabric_probe
ASoC: xilinx: xlnx_formatter_pcm: Make buffer bytes multiple of period bytes
ASoC: simple-card: fix probe failure on platform component
ASoC: cpcap: Check for NULL pointer after calling of_get_child_by_name
ASoC: max9759: fix underflow in speaker_gain_control_put()
ASoC: codecs: wcd938x: fix incorrect used of portid
ASoC: codecs: lpass-rx-macro: fix sidetone register offsets
ASoC: codecs: wcd938x: fix return value of mixer put function
pinctrl: sunxi: Fix H616 I2S3 pin data
pinctrl: intel: Fix a glitch when updating IRQ flags on a preconfigured line
pinctrl: intel: fix unexpected interrupt
pinctrl: bcm2835: Fix a few error paths
scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
gve: fix the wrong AdminQ buffer queue index check
bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
selftests/exec: Remove pipe from TEST_GEN_FILES
selftests: futex: Use variable MAKE instead of make
tools/resolve_btfids: Do not print any commands when building silently
e1000e: Separate ADP board type from TGP
rtc: cmos: Evaluate century appropriate
kvm: add guest_state_{enter,exit}_irqoff()
kvm/arm64: rework guest entry logic
perf: Copy perf_event_attr::sig_data on modification
perf stat: Fix display of grouped aliased events
perf/x86/intel/pt: Fix crash with stop filters in single-range mode
x86/perf: Default set FREEZE_ON_SMI for all
EDAC/altera: Fix deferred probing
EDAC/xgene: Fix deferred probing
ext4: prevent used blocks from being allocated during fast commit replay
ext4: modify the logic of ext4_mb_new_blocks_simple
ext4: fix error handling in ext4_restore_inline_data()
ext4: fix error handling in ext4_fc_record_modified_inode()
ext4: fix incorrect type issue during replay_del_range
net: dsa: mt7530: make NET_DSA_MT7530 select MEDIATEK_GE_PHY
cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
tools include UAPI: Sync sound/asound.h copy with the kernel sources
gpio: idt3243x: Fix an ignored error return from platform_get_irq()
gpio: mpc8xxx: Fix an ignored error return from platform_get_irq()
selftests: nft_concat_range: add test for reload with no element add/del
selftests: netfilter: check stateless nat udp checksum fixup
Linux 5.15.22
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9143b858b768a8497c1df9440a74d8c105c32271
commit 1229630af88620f6e3a621a1ebd1ca14d9340df7 upstream.
Prior to commit defe21f49b ("KVM: arm64: Move PC rollback on SError to
HYP"), when an SError is synchronised due to another exception, KVM
handles the SError first. If the guest survives, the instruction that
triggered the original exception is re-exectued to handle the first
exception. HVC is treated as a special case as the instruction wouldn't
normally be re-exectued, as its not a trap.
Commit defe21f49b didn't preserve the behaviour of the 'return 1'
that skips the rest of handle_exit().
Since commit defe21f49b, KVM will try to handle the SError and the
original exception at the same time. When the exception was an HVC,
fixup_guest_exit() has already rolled back ELR_EL2, meaning if the
guest has virtual SError masked, it will execute and handle the HVC
twice.
Restore the original behaviour.
Fixes: defe21f49b ("KVM: arm64: Move PC rollback on SError to HYP")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127122052.1584324-4-james.morse@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to deal with state synchronisation between EL1 and EL2,
we use the following setup:
- On exit from EL2, the state is forcefully marked clean.
- Should a trap be handled, the state is synchronised and immediately
marked dirty
- On vcpu_put(), the state is also marked dirty, since it can be
modified by userspace
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 209580772
Change-Id: I47a889ca5432566f236de4630d81753348632f8a
Signed-off-by: Will Deacon <willdeacon@google.com>
* kvm-arm64/misc-5.15:
: Misc improvements for 5.15:
:
: - Account the number of VMID-wide TLB invalidations as
: remote TLB flushes
: - Fix comments in the VGIC code
: - Cleanup the PMU IMPDEF identification
: - Streamline the TGRAN2 usage
: - Avoid advertising a 52bit IPA range for non-64KB configs
: - Avoid spurious signalling when a HW-mapped interrupt is in the
: A+P state on entry, and in the P state on exit, but that the
: physical line is not pending anymore.
: - Bunch of minor cleanups
KVM: arm64: Trim guest debug exception handling
Signed-off-by: Marc Zyngier <maz@kernel.org>
The switch-case for handling guest debug exception covers
all the debug exception classes, but functionally, doesn't
do anything with them other than ESR_ELx_EC_WATCHPT_LOW.
Moreover, even though handled well, the 'default' case
could be confusing from a security point of view, stating
that the guests' actions can potentially flood the syslog.
But in reality, the code is unreachable.
Hence, trim down the function to only handle the case with
ESR_ELx_EC_WATCHPT_LOW with a simple 'if' check.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210823223940.1878930-1-rananta@google.com
When protected mode is enabled, the host is unable to access most parts
of the EL2 hypervisor image, including 'hyp_physvirt_offset' and the
contents of the hypervisor's '.rodata.str' section. Unfortunately,
nvhe_hyp_panic_handler() tries to read from both of these locations when
handling a BUG() triggered at EL2; the former for converting the ELR to
a physical address and the latter for displaying the name of the source
file where the BUG() occurred.
Hack the EL2 panic asm to pass both physical and virtual ELR values to
the host and utilise the newly introduced CONFIG_NVHE_EL2_DEBUG so that
we disable stage-2 protection for the host before returning to the EL1
panic handler. If the debug option is not enabled, display the address
instead of the source file:line information.
Cc: Andrew Scull <ascull@google.com>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210813130336.8139-1-will@kernel.org
To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.
If the panic was due to a BUG(), look up the metadata to log the file
and line, if available, otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.
__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.
The following is an example of the logs generated by a BUG in nVHE hyp.
[ 46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[ 46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[ 46.755824] Kernel panic - not syncing: HYP panic:
[ 46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.755824] VCPU:0000d93a880d0000
[ 46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[ 46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 46.758366] Call trace:
[ 46.758601] dump_backtrace+0x0/0x1b0
[ 46.758856] show_stack+0x18/0x70
[ 46.759057] dump_stack+0xd0/0x12c
[ 46.759236] panic+0x16c/0x334
[ 46.759426] arm64_kernel_unmapped_at_el0+0x0/0x30
[ 46.759661] kvm_arch_vcpu_ioctl_run+0x134/0x750
[ 46.759936] kvm_vcpu_ioctl+0x2f0/0x970
[ 46.760156] __arm64_sys_ioctl+0xa8/0xec
[ 46.760379] el0_svc_common.constprop.0+0x60/0x120
[ 46.760627] do_el0_svc+0x24/0x90
[ 46.760766] el0_svc+0x2c/0x54
[ 46.760915] el0_sync_handler+0x1a4/0x1b0
[ 46.761146] el0_sync+0x170/0x180
[ 46.761889] SMP: stopping secondary CPUs
[ 46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[ 46.763142] PHYS_OFFSET: 0xffffa9f680000000
[ 46.763359] CPU features: 0x00240022,61806008
[ 46.763651] Memory Limit: none
[ 46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[ 46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.813867] VCPU:0000d93a880d0000 ]---
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com
kvm_coproc.h used to serve as a compatibility layer for the files
shared between the 32 and 64 bit ports.
Another one bites the dust...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Instead of handling the "PC rollback on SError during HVC" at EL1 (which
requires disclosing PC to a potentially untrusted kernel), let's move
this fixup to ... fixup_guest_exit(), which is where we do all fixups.
Isn't that neat?
Signed-off-by: Marc Zyngier <maz@kernel.org>
In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
There is no need to feed the result of kvm_vcpu_trap_il_is32bit()
to kvm_skip_instr(), as only AArch32 has a variable length ISA, and
this helper can equally be called from kvm_skip_instr32(), reducing
the complexity at all the call sites.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
On SMC trap, the prefered return address is set to that of the SMC
instruction itself. It is thus wrong to try and roll it back when
an SError occurs while trapping on SMC. It is still necessary on
HVC though, as HVC doesn't cause a trap, and sets ELR to returning
*after* the HVC.
It also became apparent that there is no 16bit encoding for an AArch32
HVC instruction, meaning that the displacement is always 4 bytes,
no matter what the ISA is. Take this opportunity to simplify it.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM/arm64 updates for Linux 5.9:
- Split the VHE and nVHE hypervisor code bases, build the EL2 code
separately, allowing for the VHE code to now be built with instrumentation
- Level-based TLB invalidation support
- Restructure of the vcpu register storage to accomodate the NV code
- Pointer Authentication available for guests on nVHE hosts
- Simplification of the system register table parsing
- MMU cleanups and fixes
- A number of post-32bit cleanups and other fixes
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
structure. For historical reasons, many kvm-related function parameters
retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
patch does a unified cleanup of these remaining redundant parameters.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200623131418.31473-3-tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm/arm32 isn't supported since commit 541ad0150c ("arm: Remove
32bit KVM host support"). So HSR isn't meaningful since then. This
renames HSR to ESR accordingly. This shouldn't cause any functional
changes:
* Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() to make the
function names self-explanatory.
* Rename variables from @hsr to @esr to make them self-explanatory.
Note that the renaming on uapi and tracepoint will cause ABI changes,
which we should avoid. Specificly, there are 4 related source files
in this regard:
* arch/arm64/include/uapi/asm/kvm.h (struct kvm_debug_exit_arch::hsr)
* arch/arm64/kvm/handle_exit.c (struct kvm_debug_exit_arch::hsr)
* arch/arm64/kvm/trace_arm.h (tracepoints)
* arch/arm64/kvm/trace_handle_exit.h (tracepoints)
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Andrew Scull <ascull@google.com>
Link: https://lore.kernel.org/r/20200630015705.103366-1-gshan@redhat.com
The current way we deal with PtrAuth is a bit heavy handed:
- We forcefully save the host's keys on each vcpu_load()
- Handling the PtrAuth trap forces us to go all the way back
to the exit handling code to just set the HCR bits
Overall, this is pretty cumbersome. A better approach would be
to handle it the same way we deal with the FPSIMD registers:
- On vcpu_load() disable PtrAuth for the guest
- On first use, save the host's keys, enable PtrAuth in the
guest
Crucially, this can happen as a fixup, which is done very early
on exit. We can then reenter the guest immediately without
leaving the hypervisor role.
Another thing is that it simplify the rest of the host handling:
exiting all the way to the host means that the only possible
outcome for this trap is to inject an UNDEF.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When using the PtrAuth feature in a guest, we need to save the host's
keys before allowing the guest to program them. For that, we dump
them in a per-CPU data structure (the so called host context).
But both call sites that do this are in preemptible context,
which may end up in disaster should the vcpu thread get preempted
before reentering the guest.
Instead, save the keys eagerly on each vcpu_load(). This has an
increased overhead, but is at least safe.
Cc: stable@vger.kernel.org
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
We currently intertwine the KVM PSCI implementation with the general
dispatch of hypercall handling, which makes perfect sense because PSCI
is the only category of hypercalls we support.
However, as we are about to support additional hypercalls, factor out
this functionality into a separate hypercall handler file.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[steven.price@arm.com: rebased]
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.
This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull KVM updates from Paolo Bonzini:
"ARM:
- selftests improvements
- large PUD support for HugeTLB
- single-stepping fixes
- improved tracing
- various timer and vGIC fixes
x86:
- Processor Tracing virtualization
- STIBP support
- some correctness fixes
- refactorings and splitting of vmx.c
- use the Hyper-V range TLB flush hypercall
- reduce order of vcpu struct
- WBNOINVD support
- do not use -ftrace for __noclone functions
- nested guest support for PAUSE filtering on AMD
- more Hyper-V enlightenments (direct mode for synthetic timers)
PPC:
- nested VFIO
s390:
- bugfixes only this time"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: x86: Add CPUID support for new instruction WBNOINVD
kvm: selftests: ucall: fix exit mmio address guessing
Revert "compiler-gcc: disable -ftracer for __noclone functions"
KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
KVM: Make kvm_set_spte_hva() return int
KVM: Replace old tlb flush function with new one to flush a specified range.
KVM/MMU: Add tlb flush with range helper function
KVM/VMX: Add hv tlb range flush support
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM: x86: Disable Intel PT when VMXON in L1 guest
KVM: x86: Set intercept for Intel PT MSRs read/write
KVM: x86: Implement Intel PT MSRs read/write emulation
...
When we emulate a guest instruction, we don't advance the hardware
singlestep state machine, and thus the guest will receive a software
step exception after a next instruction which is not emulated by the
host.
We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
whether userspace requested a single step, and fake a debug exception
from within the kernel. Other times, we advance the HW singlestep state
rely on the HW to generate the exception for us. Thus, the observed step
behaviour differs for host and guest.
Let's make this simpler and consistent by always advancing the HW
singlestep state machine when we skip an instruction. Thus we can rely
on the hardware to generate the singlestep exception for us, and never
need to explicitly check for an active-pending step, nor do we need to
fake a debug exception from the guest.
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In subsequent patches we're going to expose ptrauth to the host kernel
and userspace, but things are a bit trickier for guest kernels. For the
time being, let's hide ptrauth from KVM guests.
Regardless of how well-behaved the guest kernel is, guest userspace
could attempt to use ptrauth instructions, triggering a trap to EL2,
resulting in noise from kvm_handle_unknown_ec(). So let's write up a
handler for the PAC trap, which silently injects an UNDEF into the
guest, as if the feature were really missing.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>
This commit adds a paranoid check when entering the guest to make sure
we don't attempt running guest code in an equally or more privilged mode
than the hypervisor. We also catch other accidental programming of the
SPSR_EL2 which results in an illegal exception return and report this
safely back to the user.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The new SMC Calling Convention (v1.1) allows for a reduced overhead
when calling into the firmware, and provides a new feature discovery
mechanism.
Make it visible to KVM guests.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When handling an SMC trap, the "preferred return address" is set
to that of the SMC, and not the next PC (which is a departure from
the behaviour of an SMC that isn't trapped).
Increment PC in the handler, as the guest is otherwise forever
stuck...
Cc: stable@vger.kernel.org
Fixes: acfb3b883f ("arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls")
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM doesn't follow the SMCCC when it comes to unimplemented calls,
and inject an UNDEF instead of returning an error. Since firmware
calls are now used for security mitigation, they are becoming more
common, and the undef is counter productive.
Instead, let's follow the SMCCC which states that -1 must be returned
to the caller when getting an unknown function number.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.
With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...
The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVMs existing
'dsb, mrs-daifclr, isb' sequence.
Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
For SError that interrupt a guest and are routed to EL2 the existing
behaviour is to inject an impdef SError into the guest.
Add code to handle RAS SError based on the ESR. For uncontained and
uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
errors compromise the host too. All other error types are contained:
For the fatal errors the vCPU can't make progress, so we inject a virtual
SError. We ignore contained errors where we can make progress as if
we're lucky, we may not hit them again.
If only some of the CPUs support RAS the guest will see the cpufeature
sanitised version of the id registers, but we may still take RAS SError
on this CPU. Move the SError handling out of handle_exit() into a new
handler that runs before we can be preempted. This allows us to use
this_cpu_has_cap(), via arm64_is_ras_serror().
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When an SError arrives during single-step both the SError and debug
exceptions may be pending when the step is completed, and the
architecture doesn't define the ordering of the two. This means that we
can observe en SError even though we've just completed a step, without
receiving a debug exception. In that case the DBG_SPSR_SS bit will have
flipped as the instruction executed. After handling the abort in
handle_exit() we test to see if the bit is clear and we were
single-stepping before deciding if we need to exit to user space.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
If we are using guest debug to single-step the guest, we need to ensure
that we exit after emulating the instruction. This only affects
instructions completely emulated by the kernel. For instructions
emulated in userspace, we need to exit and return to complete the
emulation.
The kvm_arm_handle_step_debug() helper sets up the necessary exit
state if needed.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>