Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
     stage-1 page tables) to align with the architecture. This avoids
     possibly taking an SEA at EL2 on the page table walk or using an
     architecturally UNKNOWN fault IPA

   - Use acquire/release semantics in the KVM FF-A proxy to avoid
     reading a stale value for the FF-A version

   - Fix KVM guest driver to match PV CPUID hypercall ABI

   - Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
     selftests, which is the only memory type for which atomic
     instructions are architecturally guaranteed to work

  s390:

   - Don't use %pK for debug printing and tracepoints

  x86:

   - Use a separate lockdep subclass when acquiring KVM's per-CPU posted
     interrupts wakeup lock in the schedule-out path, i.e. when adding a
     vCPU to the list of vCPUs to wake, to work around a false-positive
     lockdep deadlock report. The schedule-out code runs with a scheduler
     lock that the wakeup handler takes in the opposite order; however,
     it does so with IRQs disabled and therefore cannot run concurrently
     with a wakeup

   - Explicitly zero-initialize on-stack CPUID unions

   - Allow building irqbypass.ko as a module when kvm.ko is a module

   - Wrap relatively expensive sanity check with KVM_PROVE_MMU

   - Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses

  selftests:

   - Add more scenarios to the MONITOR/MWAIT test

   - Add option to rseq test to override /dev/cpu_dma_latency

   - Bring list of exit reasons up to date

   - Clean up the Makefile to list tests that are valid on all
     architectures only once

  Other:

   - Documentation fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: arm64: Use acquire/release to communicate FF-A version negotiation
  KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable
  KVM: arm64: selftests: Introduce and use hardware-definition macros
  KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive
  KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI wakeup list
  KVM: x86: Explicitly zero-initialize on-stack CPUID unions
  KVM: Allow building irqbypass.ko as a module when kvm.ko is a module
  KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMU
  KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency
  KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
  Documentation: kvm: remove KVM_CAP_MIPS_TE
  Documentation: kvm: organize capabilities in the right section
  Documentation: kvm: fix some definition lists
  Documentation: kvm: drop "Capability" heading from capabilities
  Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE
  Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE
  selftests: kvm: list once tests that are valid on all architectures
  selftests: kvm: bring list of exit reasons up to date
  selftests: kvm: revamp MONITOR/MWAIT tests
  KVM: arm64: Don't translate FAR if invalid/unsafe
  ...
Linus Torvalds 2025-04-08 13:47:55 -07:00
commit 0e8863244e
29 changed files with 964 additions and 786 deletions

File diff suppressed because it is too large


@ -121,6 +121,15 @@
#define ESR_ELx_FSC_SEA_TTW(n) (0x14 + (n))
#define ESR_ELx_FSC_SECC (0x18)
#define ESR_ELx_FSC_SECC_TTW(n) (0x1c + (n))
#define ESR_ELx_FSC_ADDRSZ (0x00)
/*
* Annoyingly, the negative levels for Address size faults aren't laid out
* contiguously (or in the desired order)
*/
#define ESR_ELx_FSC_ADDRSZ_nL(n) ((n) == -1 ? 0x25 : 0x2C)
#define ESR_ELx_FSC_ADDRSZ_L(n) ((n) < 0 ? ESR_ELx_FSC_ADDRSZ_nL(n) : \
(ESR_ELx_FSC_ADDRSZ + (n)))
/* Status codes for individual page table levels */
#define ESR_ELx_FSC_ACCESS_L(n) (ESR_ELx_FSC_ACCESS + (n))
@ -161,8 +170,6 @@
#define ESR_ELx_Xs_MASK (GENMASK_ULL(4, 0))
/* ISS field definitions for exceptions taken in to Hyp */
#define ESR_ELx_FSC_ADDRSZ (0x00)
#define ESR_ELx_FSC_ADDRSZ_L(n) (ESR_ELx_FSC_ADDRSZ + (n))
#define ESR_ELx_CV (UL(1) << 24)
#define ESR_ELx_COND_SHIFT (20)
#define ESR_ELx_COND_MASK (UL(0xF) << ESR_ELx_COND_SHIFT)
@ -464,6 +471,39 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
(esr == ESR_ELx_FSC_ACCESS_L(0));
}
static inline bool esr_fsc_is_addr_sz_fault(unsigned long esr)
{
esr &= ESR_ELx_FSC;
return (esr == ESR_ELx_FSC_ADDRSZ_L(3)) ||
(esr == ESR_ELx_FSC_ADDRSZ_L(2)) ||
(esr == ESR_ELx_FSC_ADDRSZ_L(1)) ||
(esr == ESR_ELx_FSC_ADDRSZ_L(0)) ||
(esr == ESR_ELx_FSC_ADDRSZ_L(-1));
}
static inline bool esr_fsc_is_sea_ttw(unsigned long esr)
{
esr = esr & ESR_ELx_FSC;
return (esr == ESR_ELx_FSC_SEA_TTW(3)) ||
(esr == ESR_ELx_FSC_SEA_TTW(2)) ||
(esr == ESR_ELx_FSC_SEA_TTW(1)) ||
(esr == ESR_ELx_FSC_SEA_TTW(0)) ||
(esr == ESR_ELx_FSC_SEA_TTW(-1));
}
static inline bool esr_fsc_is_secc_ttw(unsigned long esr)
{
esr = esr & ESR_ELx_FSC;
return (esr == ESR_ELx_FSC_SECC_TTW(3)) ||
(esr == ESR_ELx_FSC_SECC_TTW(2)) ||
(esr == ESR_ELx_FSC_SECC_TTW(1)) ||
(esr == ESR_ELx_FSC_SECC_TTW(0)) ||
(esr == ESR_ELx_FSC_SECC_TTW(-1));
}
/* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
static inline bool esr_iss_is_eretax(unsigned long esr)
{


@ -305,7 +305,12 @@ static __always_inline unsigned long kvm_vcpu_get_hfar(const struct kvm_vcpu *vc
static __always_inline phys_addr_t kvm_vcpu_get_fault_ipa(const struct kvm_vcpu *vcpu)
{
return ((phys_addr_t)vcpu->arch.fault.hpfar_el2 & HPFAR_MASK) << 8;
u64 hpfar = vcpu->arch.fault.hpfar_el2;
if (unlikely(!(hpfar & HPFAR_EL2_NS)))
return INVALID_GPA;
return FIELD_GET(HPFAR_EL2_FIPA, hpfar) << 12;
}
static inline u64 kvm_vcpu_get_disr(const struct kvm_vcpu *vcpu)
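
For reference, a standalone sketch (not part of the patch) of how the rewritten kvm_vcpu_get_fault_ipa() derives the fault IPA: HPFAR_EL2.NS (bit 63, RES0 in Non-secure) is hijacked as a "HPFAR is valid" flag, and the FIPA field (bits 47:4, per the sysreg description added later in this diff) is shifted up by 12 to produce a page-aligned IPA. GENMASK/FIELD_GET below are simplified stand-ins for the kernel helpers, and the sample register value is made up.

/* build with: cc -o fault_ipa fault_ipa.c */
#include <stdint.h>
#include <stdio.h>

#define GENMASK64(h, l)    (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define FIELD_GET64(m, v)  (((v) & (m)) >> __builtin_ctzll(m))

#define HPFAR_EL2_NS       (1ULL << 63)      /* reused as "fault IPA is valid" */
#define HPFAR_EL2_FIPA     GENMASK64(47, 4)
#define INVALID_GPA        (~0ULL)           /* sentinel, mirrors the kernel's idea */

static uint64_t fault_ipa(uint64_t hpfar)
{
	if (!(hpfar & HPFAR_EL2_NS))
		return INVALID_GPA;          /* fault IPA unknown or unsafe to use */
	return FIELD_GET64(HPFAR_EL2_FIPA, hpfar) << 12;
}

int main(void)
{
	uint64_t hpfar = HPFAR_EL2_NS | (0x89abcULL << 4);  /* hypothetical value */

	printf("fault IPA = 0x%llx\n", (unsigned long long)fault_ipa(hpfar));
	return 0;
}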


@ -14,7 +14,7 @@
* Was this synchronous external abort a RAS notification?
* Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
*/
static inline int kvm_handle_guest_sea(phys_addr_t addr, u64 esr)
static inline int kvm_handle_guest_sea(void)
{
/* apei_claim_sea(NULL) expects to mask interrupts itself */
lockdep_assert_irqs_enabled();


@ -12,6 +12,16 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
static inline bool __fault_safe_to_translate(u64 esr)
{
u64 fsc = esr & ESR_ELx_FSC;
if (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr))
return false;
return !(fsc == ESR_ELx_FSC_EXTABT && (esr & ESR_ELx_FnV));
}
static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
{
int ret;
@ -44,34 +54,50 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
return true;
}
/*
* Checks for the conditions when HPFAR_EL2 is written, per ARM ARM R_FKLWR.
*/
static inline bool __hpfar_valid(u64 esr)
{
/*
* CPUs affected by ARM erratum #834220 may incorrectly report a
* stage-2 translation fault when a stage-1 permission fault occurs.
*
* Re-walk the page tables to determine if a stage-1 fault actually
* occurred.
*/
if (cpus_have_final_cap(ARM64_WORKAROUND_834220) &&
esr_fsc_is_translation_fault(esr))
return false;
if (esr_fsc_is_translation_fault(esr) || esr_fsc_is_access_flag_fault(esr))
return true;
if ((esr & ESR_ELx_S1PTW) && esr_fsc_is_permission_fault(esr))
return true;
return esr_fsc_is_addr_sz_fault(esr);
}
static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
{
u64 hpfar, far;
u64 hpfar;
far = read_sysreg_el2(SYS_FAR);
fault->far_el2 = read_sysreg_el2(SYS_FAR);
fault->hpfar_el2 = 0;
if (__hpfar_valid(esr))
hpfar = read_sysreg(hpfar_el2);
else if (unlikely(!__fault_safe_to_translate(esr)))
return true;
else if (!__translate_far_to_hpfar(fault->far_el2, &hpfar))
return false;
/*
* The HPFAR can be invalid if the stage 2 fault did not
* happen during a stage 1 page table walk (the ESR_EL2.S1PTW
* bit is clear) and one of the two following cases are true:
* 1. The fault was due to a permission fault
* 2. The processor carries errata 834220
*
* Therefore, for all non S1PTW faults where we either have a
* permission fault or the errata workaround is enabled, we
* resolve the IPA using the AT instruction.
* Hijack HPFAR_EL2.NS (RES0 in Non-secure) to indicate a valid
* HPFAR value.
*/
if (!(esr & ESR_ELx_S1PTW) &&
(cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
esr_fsc_is_permission_fault(esr))) {
if (!__translate_far_to_hpfar(far, &hpfar))
return false;
} else {
hpfar = read_sysreg(hpfar_el2);
}
fault->far_el2 = far;
fault->hpfar_el2 = hpfar;
fault->hpfar_el2 = hpfar | HPFAR_EL2_NS;
return true;
}


@ -730,10 +730,10 @@ static void do_ffa_version(struct arm_smccc_res *res,
hyp_ffa_version = ffa_req_version;
}
if (hyp_ffa_post_init())
if (hyp_ffa_post_init()) {
res->a0 = FFA_RET_NOT_SUPPORTED;
else {
has_version_negotiated = true;
} else {
smp_store_release(&has_version_negotiated, true);
res->a0 = hyp_ffa_version;
}
unlock:
@ -809,7 +809,8 @@ bool kvm_host_ffa_handler(struct kvm_cpu_context *host_ctxt, u32 func_id)
if (!is_ffa_call(func_id))
return false;
if (!has_version_negotiated && func_id != FFA_VERSION) {
if (func_id != FFA_VERSION &&
!smp_load_acquire(&has_version_negotiated)) {
ffa_to_smccc_error(&res, FFA_RET_INVALID_PARAMETERS);
goto out_handled;
}
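
For context, a standalone C11 sketch (not part of the patch) of the release/acquire pairing the FF-A proxy now relies on: the negotiated version is written before the flag is published with a release store, so any reader that observes the flag set via an acquire load also sees the up-to-date version rather than a stale value. The names are illustrative; the kernel code uses smp_store_release()/smp_load_acquire() on has_version_negotiated.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t negotiated_version;       /* plain data, published below */
static atomic_bool version_negotiated;

static void writer(uint32_t version)
{
	negotiated_version = version;
	/* counterpart of smp_store_release() */
	atomic_store_explicit(&version_negotiated, true, memory_order_release);
}

static bool reader(uint32_t *version)
{
	/* counterpart of smp_load_acquire() */
	if (!atomic_load_explicit(&version_negotiated, memory_order_acquire))
		return false;
	*version = negotiated_version;    /* guaranteed not to be stale */
	return true;
}

int main(void)
{
	uint32_t v = 0;

	writer(0x10001);                  /* illustrative version encoding */
	return reader(&v) && v == 0x10001 ? 0 : 1;
}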


@ -578,7 +578,14 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
return;
}
addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
/*
* Yikes, we couldn't resolve the fault IPA. This should reinject an
* abort into the host when we figure out how to do that.
*/
BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
ret = host_stage2_idmap(addr);
BUG_ON(ret && ret != -EAGAIN);
}


@ -1794,9 +1794,28 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
gfn_t gfn;
int ret, idx;
/* Synchronous External Abort? */
if (kvm_vcpu_abt_issea(vcpu)) {
/*
* For RAS the host kernel may handle this abort.
* There is no need to pass the error into the guest.
*/
if (kvm_handle_guest_sea())
kvm_inject_vabt(vcpu);
return 1;
}
esr = kvm_vcpu_get_esr(vcpu);
/*
* The fault IPA should be reliable at this point as we're not dealing
* with an SEA.
*/
ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
if (KVM_BUG_ON(ipa == INVALID_GPA, vcpu->kvm))
return -EFAULT;
is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
if (esr_fsc_is_translation_fault(esr)) {
@ -1818,18 +1837,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
}
}
/* Synchronous External Abort? */
if (kvm_vcpu_abt_issea(vcpu)) {
/*
* For RAS the host kernel may handle this abort.
* There is no need to pass the error into the guest.
*/
if (kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_esr(vcpu)))
kvm_inject_vabt(vcpu);
return 1;
}
trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
kvm_vcpu_get_hfar(vcpu), fault_ipa);


@ -3536,3 +3536,10 @@ Field 5 F
Field 4 P
Field 3:0 Align
EndSysreg
Sysreg HPFAR_EL2 3 4 6 0 4
Field 63 NS
Res0 62:48
Field 47:4 FIPA
Res0 3:0
EndSysreg


@ -95,7 +95,7 @@ static int handle_validity(struct kvm_vcpu *vcpu)
vcpu->stat.exit_validity++;
trace_kvm_s390_intercept_validity(vcpu, viwhy);
KVM_EVENT(3, "validity intercept 0x%x for pid %u (kvm 0x%pK)", viwhy,
KVM_EVENT(3, "validity intercept 0x%x for pid %u (kvm 0x%p)", viwhy,
current->pid, vcpu->kvm);
/* do not warn on invalid runtime instrumentation mode */


@ -3161,7 +3161,7 @@ void kvm_s390_gisa_clear(struct kvm *kvm)
if (!gi->origin)
return;
gisa_clear_ipm(gi->origin);
VM_EVENT(kvm, 3, "gisa 0x%pK cleared", gi->origin);
VM_EVENT(kvm, 3, "gisa 0x%p cleared", gi->origin);
}
void kvm_s390_gisa_init(struct kvm *kvm)
@ -3177,7 +3177,7 @@ void kvm_s390_gisa_init(struct kvm *kvm)
hrtimer_setup(&gi->timer, gisa_vcpu_kicker, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
memset(gi->origin, 0, sizeof(struct kvm_s390_gisa));
gi->origin->next_alert = (u32)virt_to_phys(gi->origin);
VM_EVENT(kvm, 3, "gisa 0x%pK initialized", gi->origin);
VM_EVENT(kvm, 3, "gisa 0x%p initialized", gi->origin);
}
void kvm_s390_gisa_enable(struct kvm *kvm)
@ -3218,7 +3218,7 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
process_gib_alert_list();
hrtimer_cancel(&gi->timer);
gi->origin = NULL;
VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
VM_EVENT(kvm, 3, "gisa 0x%p destroyed", gisa);
}
void kvm_s390_gisa_disable(struct kvm *kvm)
@ -3467,7 +3467,7 @@ int __init kvm_s390_gib_init(u8 nisc)
}
}
KVM_EVENT(3, "gib 0x%pK (nisc=%d) initialized", gib, gib->nisc);
KVM_EVENT(3, "gib 0x%p (nisc=%d) initialized", gib, gib->nisc);
goto out;
out_unreg_gal:


@ -1022,7 +1022,7 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
}
mutex_unlock(&kvm->lock);
VM_EVENT(kvm, 3, "SET: max guest address: %lu", new_limit);
VM_EVENT(kvm, 3, "New guest asce: 0x%pK",
VM_EVENT(kvm, 3, "New guest asce: 0x%p",
(void *) kvm->arch.gmap->asce);
break;
}
@ -3466,7 +3466,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm_s390_gisa_init(kvm);
INIT_LIST_HEAD(&kvm->arch.pv.need_cleanup);
kvm->arch.pv.set_aside = NULL;
KVM_EVENT(3, "vm 0x%pK created by pid %u", kvm, current->pid);
KVM_EVENT(3, "vm 0x%p created by pid %u", kvm, current->pid);
return 0;
out_err:
@ -3529,7 +3529,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_s390_destroy_adapters(kvm);
kvm_s390_clear_float_irqs(kvm);
kvm_s390_vsie_destroy(kvm);
KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
KVM_EVENT(3, "vm 0x%p destroyed", kvm);
}
/* Section: vcpu related */
@ -3650,7 +3650,7 @@ static int sca_switch_to_extended(struct kvm *kvm)
free_page((unsigned long)old_sca);
VM_EVENT(kvm, 2, "Switched to ESCA (0x%pK -> 0x%pK)",
VM_EVENT(kvm, 2, "Switched to ESCA (0x%p -> 0x%p)",
old_sca, kvm->arch.sca);
return 0;
}
@ -4027,7 +4027,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
goto out_free_sie_block;
}
VM_EVENT(vcpu->kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK",
VM_EVENT(vcpu->kvm, 3, "create cpu %d at 0x%p, sie block at 0x%p",
vcpu->vcpu_id, vcpu, vcpu->arch.sie_block);
trace_kvm_s390_create_vcpu(vcpu->vcpu_id, vcpu, vcpu->arch.sie_block);


@ -56,7 +56,7 @@ TRACE_EVENT(kvm_s390_create_vcpu,
__entry->sie_block = sie_block;
),
TP_printk("create cpu %d at 0x%pK, sie block at 0x%pK",
TP_printk("create cpu %d at 0x%p, sie block at 0x%p",
__entry->id, __entry->vcpu, __entry->sie_block)
);
@ -255,7 +255,7 @@ TRACE_EVENT(kvm_s390_enable_css,
__entry->kvm = kvm;
),
TP_printk("enabling channel I/O support (kvm @ %pK)\n",
TP_printk("enabling channel I/O support (kvm @ %p)\n",
__entry->kvm)
);


@ -1472,8 +1472,13 @@ struct kvm_arch {
struct once nx_once;
#ifdef CONFIG_X86_64
/* The number of TDP MMU pages across all roots. */
#ifdef CONFIG_KVM_PROVE_MMU
/*
* The number of TDP MMU pages across all roots. Used only to sanity
* check that KVM isn't leaking TDP MMU pages.
*/
atomic64_t tdp_mmu_pages;
#endif
/*
* List of struct kvm_mmu_pages being used as roots.


@ -1427,8 +1427,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
}
break;
case 0xa: { /* Architectural Performance Monitoring */
union cpuid10_eax eax;
union cpuid10_edx edx;
union cpuid10_eax eax = { };
union cpuid10_edx edx = { };
if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
@ -1444,8 +1444,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
if (kvm_pmu_cap.version)
edx.split.anythread_deprecated = 1;
edx.split.reserved1 = 0;
edx.split.reserved2 = 0;
entry->eax = eax.full;
entry->ebx = kvm_pmu_cap.events_mask;
@ -1763,7 +1761,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
break;
/* AMD Extended Performance Monitoring and Debug */
case 0x80000022: {
union cpuid_0x80000022_ebx ebx;
union cpuid_0x80000022_ebx ebx = { };
entry->ecx = entry->edx = 0;
if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
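
As an aside, a standalone sketch (not part of the patch) of what the "= { }" initializers buy: every field of the on-stack union that the code does not explicitly set starts out as zero instead of stack garbage, which is why the explicit clearing of reserved fields above can be dropped. The union below is a toy stand-in, not the real cpuid10_edx layout.

#include <stdint.h>
#include <stdio.h>

union toy_cpuid_edx {
	struct {
		uint32_t num_counters_fixed:5;
		uint32_t bit_width_fixed:8;
		uint32_t reserved1:2;
		uint32_t anythread_deprecated:1;
		uint32_t reserved2:16;
	} split;
	uint32_t full;
};

int main(void)
{
	union toy_cpuid_edx edx = { };            /* all bits start at 0 */

	edx.split.anythread_deprecated = 1;       /* set only what is meant to be set */
	printf("edx.full = 0x%08x\n", edx.full);  /* reserved fields stay 0 */
	return 0;
}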


@ -40,7 +40,9 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS);
kvm_tdp_mmu_zap_invalidated_roots(kvm, false);
WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
#ifdef CONFIG_KVM_PROVE_MMU
KVM_MMU_WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
#endif
WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
/*
@ -325,13 +327,17 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
#ifdef CONFIG_KVM_PROVE_MMU
atomic64_inc(&kvm->arch.tdp_mmu_pages);
#endif
}
static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, -1);
#ifdef CONFIG_KVM_PROVE_MMU
atomic64_dec(&kvm->arch.tdp_mmu_pages);
#endif
}
/**


@ -31,6 +31,8 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
*/
static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
#define PI_LOCK_SCHED_OUT SINGLE_DEPTH_NESTING
static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
{
return &(to_vmx(vcpu)->pi_desc);
@ -89,9 +91,20 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
* current pCPU if the task was migrated.
*/
if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu);
/*
* In addition to taking the wakeup lock for the regular/IRQ
* context, tell lockdep it is being taken for the "sched out"
* context as well. vCPU loads happens in task context, and
* this is taking the lock of the *previous* CPU, i.e. can race
* with both the scheduler and the wakeup handler.
*/
raw_spin_lock(spinlock);
spin_acquire(&spinlock->dep_map, PI_LOCK_SCHED_OUT, 0, _RET_IP_);
list_del(&vmx->pi_wakeup_list);
raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
spin_release(&spinlock->dep_map, _RET_IP_);
raw_spin_unlock(spinlock);
}
dest = cpu_physical_id(cpu);
@ -148,11 +161,23 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct pi_desc old, new;
unsigned long flags;
local_irq_save(flags);
lockdep_assert_irqs_disabled();
raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
/*
* Acquire the wakeup lock using the "sched out" context to workaround
* a lockdep false positive. When this is called, schedule() holds
* various per-CPU scheduler locks. When the wakeup handler runs, it
* holds this CPU's wakeup lock while calling try_to_wake_up(), which
* can eventually take the aforementioned scheduler locks, which causes
* lockdep to assume there is deadlock.
*
* Deadlock can't actually occur because IRQs are disabled for the
* entirety of the sched_out critical section, i.e. the wakeup handler
* can't run while the scheduler locks are held.
*/
raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu),
PI_LOCK_SCHED_OUT);
list_add_tail(&vmx->pi_wakeup_list,
&per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
@ -176,8 +201,6 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
*/
if (pi_test_on(&new))
__apic_send_IPI_self(POSTED_INTR_WAKEUP_VECTOR);
local_irq_restore(flags);
}
static bool vmx_needs_pi_wakeup(struct kvm_vcpu *vcpu)


@ -11786,6 +11786,8 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
if (kvm_mpx_supported())
kvm_load_guest_fpu(vcpu);
kvm_vcpu_srcu_read_lock(vcpu);
r = kvm_apic_accept_events(vcpu);
if (r < 0)
goto out;
@ -11799,6 +11801,8 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
mp_state->mp_state = vcpu->arch.mp_state;
out:
kvm_vcpu_srcu_read_unlock(vcpu);
if (kvm_mpx_supported())
kvm_put_guest_fpu(vcpu);
vcpu_put(vcpu);


@ -95,7 +95,7 @@ void __init kvm_arm_target_impl_cpu_init(void)
for (i = 0; i < max_cpus; i++) {
arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_CPUS_FUNC_ID,
i, &res);
i, 0, 0, &res);
if (res.a0 != SMCCC_RET_SUCCESS) {
pr_warn("Discovering target implementation CPUs failed\n");
goto mem_free;
@ -103,7 +103,7 @@ void __init kvm_arm_target_impl_cpu_init(void)
target[i].midr = res.a1;
target[i].revidr = res.a2;
target[i].aidr = res.a3;
};
}
if (!cpu_errata_set_target_impl(max_cpus, target)) {
pr_warn("Failed to set target implementation CPUs\n");


@ -2382,7 +2382,7 @@ static inline bool kvm_is_visible_memslot(struct kvm_memory_slot *memslot)
struct kvm_vcpu *kvm_get_running_vcpu(void);
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
bool kvm_arch_has_irq_bypass(void);
int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,
struct irq_bypass_producer *);
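
For context, a standalone sketch (not part of the patch) of why the "#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS" checks become IS_ENABLED() once the option is tristate: a =m selection defines CONFIG_HAVE_KVM_IRQ_BYPASS_MODULE rather than CONFIG_HAVE_KVM_IRQ_BYPASS, so a plain #ifdef would silently compile the bypass hooks out for modular builds. The fallback defines below are a crude stand-in for the kernel's IS_ENABLED(), which handles undefined config macros on its own; build with -DCONFIG_HAVE_KVM_IRQ_BYPASS=1 (=y) or -DCONFIG_HAVE_KVM_IRQ_BYPASS_MODULE=1 (=m) to try both cases.

#include <stdio.h>

#ifndef CONFIG_HAVE_KVM_IRQ_BYPASS
#define CONFIG_HAVE_KVM_IRQ_BYPASS 0
#endif
#ifndef CONFIG_HAVE_KVM_IRQ_BYPASS_MODULE
#define CONFIG_HAVE_KVM_IRQ_BYPASS_MODULE 0
#endif

/* simplified model of IS_ENABLED(): true for =y or =m */
#define MY_IS_ENABLED(opt, opt_module) ((opt) || (opt_module))

int main(void)
{
	if (MY_IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS,
			  CONFIG_HAVE_KVM_IRQ_BYPASS_MODULE))
		printf("irq bypass hooks built in (=y or =m)\n");
	else
		printf("irq bypass hooks compiled out\n");
	return 0;
}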


@ -50,8 +50,18 @@ LIBKVM_riscv += lib/riscv/ucall.c
# Non-compiled test targets
TEST_PROGS_x86 += x86/nx_huge_pages_test.sh
# Compiled test targets valid on all architectures with libkvm support
TEST_GEN_PROGS_COMMON = demand_paging_test
TEST_GEN_PROGS_COMMON += dirty_log_test
TEST_GEN_PROGS_COMMON += guest_print_test
TEST_GEN_PROGS_COMMON += kvm_binary_stats_test
TEST_GEN_PROGS_COMMON += kvm_create_max_vcpus
TEST_GEN_PROGS_COMMON += kvm_page_table_test
TEST_GEN_PROGS_COMMON += set_memory_region_test
# Compiled test targets
TEST_GEN_PROGS_x86 = x86/cpuid_test
TEST_GEN_PROGS_x86 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_x86 += x86/cpuid_test
TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test
TEST_GEN_PROGS_x86 += x86/feature_msrs_test
@ -119,27 +129,21 @@ TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
TEST_GEN_PROGS_x86 += access_tracking_perf_test
TEST_GEN_PROGS_x86 += coalesced_io_test
TEST_GEN_PROGS_x86 += demand_paging_test
TEST_GEN_PROGS_x86 += dirty_log_test
TEST_GEN_PROGS_x86 += dirty_log_perf_test
TEST_GEN_PROGS_x86 += guest_memfd_test
TEST_GEN_PROGS_x86 += guest_print_test
TEST_GEN_PROGS_x86 += hardware_disable_test
TEST_GEN_PROGS_x86 += kvm_create_max_vcpus
TEST_GEN_PROGS_x86 += kvm_page_table_test
TEST_GEN_PROGS_x86 += memslot_modification_stress_test
TEST_GEN_PROGS_x86 += memslot_perf_test
TEST_GEN_PROGS_x86 += mmu_stress_test
TEST_GEN_PROGS_x86 += rseq_test
TEST_GEN_PROGS_x86 += set_memory_region_test
TEST_GEN_PROGS_x86 += steal_time
TEST_GEN_PROGS_x86 += kvm_binary_stats_test
TEST_GEN_PROGS_x86 += system_counter_offset_test
TEST_GEN_PROGS_x86 += pre_fault_memory_test
# Compiled outputs used by test targets
TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
@ -158,22 +162,16 @@ TEST_GEN_PROGS_arm64 += arm64/no-vgic-v3
TEST_GEN_PROGS_arm64 += access_tracking_perf_test
TEST_GEN_PROGS_arm64 += arch_timer
TEST_GEN_PROGS_arm64 += coalesced_io_test
TEST_GEN_PROGS_arm64 += demand_paging_test
TEST_GEN_PROGS_arm64 += dirty_log_test
TEST_GEN_PROGS_arm64 += dirty_log_perf_test
TEST_GEN_PROGS_arm64 += guest_print_test
TEST_GEN_PROGS_arm64 += get-reg-list
TEST_GEN_PROGS_arm64 += kvm_create_max_vcpus
TEST_GEN_PROGS_arm64 += kvm_page_table_test
TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
TEST_GEN_PROGS_arm64 += memslot_perf_test
TEST_GEN_PROGS_arm64 += mmu_stress_test
TEST_GEN_PROGS_arm64 += rseq_test
TEST_GEN_PROGS_arm64 += set_memory_region_test
TEST_GEN_PROGS_arm64 += steal_time
TEST_GEN_PROGS_arm64 += kvm_binary_stats_test
TEST_GEN_PROGS_s390 = s390/memop
TEST_GEN_PROGS_s390 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_s390 += s390/memop
TEST_GEN_PROGS_s390 += s390/resets
TEST_GEN_PROGS_s390 += s390/sync_regs_test
TEST_GEN_PROGS_s390 += s390/tprot
@ -182,27 +180,14 @@ TEST_GEN_PROGS_s390 += s390/debug_test
TEST_GEN_PROGS_s390 += s390/cpumodel_subfuncs_test
TEST_GEN_PROGS_s390 += s390/shared_zeropage_test
TEST_GEN_PROGS_s390 += s390/ucontrol_test
TEST_GEN_PROGS_s390 += demand_paging_test
TEST_GEN_PROGS_s390 += dirty_log_test
TEST_GEN_PROGS_s390 += guest_print_test
TEST_GEN_PROGS_s390 += kvm_create_max_vcpus
TEST_GEN_PROGS_s390 += kvm_page_table_test
TEST_GEN_PROGS_s390 += rseq_test
TEST_GEN_PROGS_s390 += set_memory_region_test
TEST_GEN_PROGS_s390 += kvm_binary_stats_test
TEST_GEN_PROGS_riscv = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_riscv += riscv/sbi_pmu_test
TEST_GEN_PROGS_riscv += riscv/ebreak_test
TEST_GEN_PROGS_riscv += arch_timer
TEST_GEN_PROGS_riscv += coalesced_io_test
TEST_GEN_PROGS_riscv += demand_paging_test
TEST_GEN_PROGS_riscv += dirty_log_test
TEST_GEN_PROGS_riscv += get-reg-list
TEST_GEN_PROGS_riscv += guest_print_test
TEST_GEN_PROGS_riscv += kvm_binary_stats_test
TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
TEST_GEN_PROGS_riscv += kvm_page_table_test
TEST_GEN_PROGS_riscv += set_memory_region_test
TEST_GEN_PROGS_riscv += steal_time
SPLIT_TESTS += arch_timer


@ -199,7 +199,7 @@ static bool guest_set_ha(void)
if (hadbs == 0)
return false;
tcr = read_sysreg(tcr_el1) | TCR_EL1_HA;
tcr = read_sysreg(tcr_el1) | TCR_HA;
write_sysreg(tcr, tcr_el1);
isb();


@ -62,6 +62,67 @@
MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))
/* TCR_EL1 specific flags */
#define TCR_T0SZ_OFFSET 0
#define TCR_T0SZ(x) ((UL(64) - (x)) << TCR_T0SZ_OFFSET)
#define TCR_IRGN0_SHIFT 8
#define TCR_IRGN0_MASK (UL(3) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_NC (UL(0) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_WBWA (UL(1) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_WT (UL(2) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_WBnWA (UL(3) << TCR_IRGN0_SHIFT)
#define TCR_ORGN0_SHIFT 10
#define TCR_ORGN0_MASK (UL(3) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_NC (UL(0) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_WBWA (UL(1) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_WT (UL(2) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_WBnWA (UL(3) << TCR_ORGN0_SHIFT)
#define TCR_SH0_SHIFT 12
#define TCR_SH0_MASK (UL(3) << TCR_SH0_SHIFT)
#define TCR_SH0_INNER (UL(3) << TCR_SH0_SHIFT)
#define TCR_TG0_SHIFT 14
#define TCR_TG0_MASK (UL(3) << TCR_TG0_SHIFT)
#define TCR_TG0_4K (UL(0) << TCR_TG0_SHIFT)
#define TCR_TG0_64K (UL(1) << TCR_TG0_SHIFT)
#define TCR_TG0_16K (UL(2) << TCR_TG0_SHIFT)
#define TCR_IPS_SHIFT 32
#define TCR_IPS_MASK (UL(7) << TCR_IPS_SHIFT)
#define TCR_IPS_52_BITS (UL(6) << TCR_IPS_SHIFT)
#define TCR_IPS_48_BITS (UL(5) << TCR_IPS_SHIFT)
#define TCR_IPS_40_BITS (UL(2) << TCR_IPS_SHIFT)
#define TCR_IPS_36_BITS (UL(1) << TCR_IPS_SHIFT)
#define TCR_HA (UL(1) << 39)
#define TCR_DS (UL(1) << 59)
/*
* AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
*/
#define PTE_ATTRINDX(t) ((t) << 2)
#define PTE_ATTRINDX_MASK GENMASK(4, 2)
#define PTE_ATTRINDX_SHIFT 2
#define PTE_VALID BIT(0)
#define PGD_TYPE_TABLE BIT(1)
#define PUD_TYPE_TABLE BIT(1)
#define PMD_TYPE_TABLE BIT(1)
#define PTE_TYPE_PAGE BIT(1)
#define PTE_SHARED (UL(3) << 8) /* SH[1:0], inner shareable */
#define PTE_AF BIT(10)
#define PTE_ADDR_MASK(page_shift) GENMASK(47, (page_shift))
#define PTE_ADDR_51_48 GENMASK(15, 12)
#define PTE_ADDR_51_48_SHIFT 12
#define PTE_ADDR_MASK_LPA2(page_shift) GENMASK(49, (page_shift))
#define PTE_ADDR_51_50_LPA2 GENMASK(9, 8)
#define PTE_ADDR_51_50_LPA2_SHIFT 8
void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
struct kvm_vcpu *aarch64_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
struct kvm_vcpu_init *init, void *guest_code);
@ -102,12 +163,6 @@ enum {
(v) == VECTOR_SYNC_LOWER_64 || \
(v) == VECTOR_SYNC_LOWER_32)
/* Access flag */
#define PTE_AF (1ULL << 10)
/* Access flag update enable/disable */
#define TCR_EL1_HA (1ULL << 39)
void aarch64_get_supported_page_sizes(uint32_t ipa, uint32_t *ipa4k,
uint32_t *ipa16k, uint32_t *ipa64k);


@ -72,13 +72,13 @@ static uint64_t addr_pte(struct kvm_vm *vm, uint64_t pa, uint64_t attrs)
uint64_t pte;
if (use_lpa2_pte_format(vm)) {
pte = pa & GENMASK(49, vm->page_shift);
pte |= FIELD_GET(GENMASK(51, 50), pa) << 8;
attrs &= ~GENMASK(9, 8);
pte = pa & PTE_ADDR_MASK_LPA2(vm->page_shift);
pte |= FIELD_GET(GENMASK(51, 50), pa) << PTE_ADDR_51_50_LPA2_SHIFT;
attrs &= ~PTE_ADDR_51_50_LPA2;
} else {
pte = pa & GENMASK(47, vm->page_shift);
pte = pa & PTE_ADDR_MASK(vm->page_shift);
if (vm->page_shift == 16)
pte |= FIELD_GET(GENMASK(51, 48), pa) << 12;
pte |= FIELD_GET(GENMASK(51, 48), pa) << PTE_ADDR_51_48_SHIFT;
}
pte |= attrs;
@ -90,12 +90,12 @@ static uint64_t pte_addr(struct kvm_vm *vm, uint64_t pte)
uint64_t pa;
if (use_lpa2_pte_format(vm)) {
pa = pte & GENMASK(49, vm->page_shift);
pa |= FIELD_GET(GENMASK(9, 8), pte) << 50;
pa = pte & PTE_ADDR_MASK_LPA2(vm->page_shift);
pa |= FIELD_GET(PTE_ADDR_51_50_LPA2, pte) << 50;
} else {
pa = pte & GENMASK(47, vm->page_shift);
pa = pte & PTE_ADDR_MASK(vm->page_shift);
if (vm->page_shift == 16)
pa |= FIELD_GET(GENMASK(15, 12), pte) << 48;
pa |= FIELD_GET(PTE_ADDR_51_48, pte) << 48;
}
return pa;
@ -128,7 +128,8 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t flags)
{
uint8_t attr_idx = flags & 7;
uint8_t attr_idx = flags & (PTE_ATTRINDX_MASK >> PTE_ATTRINDX_SHIFT);
uint64_t pg_attr;
uint64_t *ptep;
TEST_ASSERT((vaddr % vm->page_size) == 0,
@ -147,18 +148,21 @@ static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
ptep = addr_gpa2hva(vm, vm->pgd) + pgd_index(vm, vaddr) * 8;
if (!*ptep)
*ptep = addr_pte(vm, vm_alloc_page_table(vm), 3);
*ptep = addr_pte(vm, vm_alloc_page_table(vm),
PGD_TYPE_TABLE | PTE_VALID);
switch (vm->pgtable_levels) {
case 4:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pud_index(vm, vaddr) * 8;
if (!*ptep)
*ptep = addr_pte(vm, vm_alloc_page_table(vm), 3);
*ptep = addr_pte(vm, vm_alloc_page_table(vm),
PUD_TYPE_TABLE | PTE_VALID);
/* fall through */
case 3:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pmd_index(vm, vaddr) * 8;
if (!*ptep)
*ptep = addr_pte(vm, vm_alloc_page_table(vm), 3);
*ptep = addr_pte(vm, vm_alloc_page_table(vm),
PMD_TYPE_TABLE | PTE_VALID);
/* fall through */
case 2:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pte_index(vm, vaddr) * 8;
@ -167,7 +171,11 @@ static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
TEST_FAIL("Page table levels must be 2, 3, or 4");
}
*ptep = addr_pte(vm, paddr, (attr_idx << 2) | (1 << 10) | 3); /* AF */
pg_attr = PTE_AF | PTE_ATTRINDX(attr_idx) | PTE_TYPE_PAGE | PTE_VALID;
if (!use_lpa2_pte_format(vm))
pg_attr |= PTE_SHARED;
*ptep = addr_pte(vm, paddr, pg_attr);
}
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
@ -293,20 +301,20 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
case VM_MODE_P48V48_64K:
case VM_MODE_P40V48_64K:
case VM_MODE_P36V48_64K:
tcr_el1 |= 1ul << 14; /* TG0 = 64KB */
tcr_el1 |= TCR_TG0_64K;
break;
case VM_MODE_P52V48_16K:
case VM_MODE_P48V48_16K:
case VM_MODE_P40V48_16K:
case VM_MODE_P36V48_16K:
case VM_MODE_P36V47_16K:
tcr_el1 |= 2ul << 14; /* TG0 = 16KB */
tcr_el1 |= TCR_TG0_16K;
break;
case VM_MODE_P52V48_4K:
case VM_MODE_P48V48_4K:
case VM_MODE_P40V48_4K:
case VM_MODE_P36V48_4K:
tcr_el1 |= 0ul << 14; /* TG0 = 4KB */
tcr_el1 |= TCR_TG0_4K;
break;
default:
TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
@ -319,35 +327,35 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
case VM_MODE_P52V48_4K:
case VM_MODE_P52V48_16K:
case VM_MODE_P52V48_64K:
tcr_el1 |= 6ul << 32; /* IPS = 52 bits */
tcr_el1 |= TCR_IPS_52_BITS;
ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->pgd) << 2;
break;
case VM_MODE_P48V48_4K:
case VM_MODE_P48V48_16K:
case VM_MODE_P48V48_64K:
tcr_el1 |= 5ul << 32; /* IPS = 48 bits */
tcr_el1 |= TCR_IPS_48_BITS;
break;
case VM_MODE_P40V48_4K:
case VM_MODE_P40V48_16K:
case VM_MODE_P40V48_64K:
tcr_el1 |= 2ul << 32; /* IPS = 40 bits */
tcr_el1 |= TCR_IPS_40_BITS;
break;
case VM_MODE_P36V48_4K:
case VM_MODE_P36V48_16K:
case VM_MODE_P36V48_64K:
case VM_MODE_P36V47_16K:
tcr_el1 |= 1ul << 32; /* IPS = 36 bits */
tcr_el1 |= TCR_IPS_36_BITS;
break;
default:
TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
}
sctlr_el1 |= (1 << 0) | (1 << 2) | (1 << 12) /* M | C | I */;
/* TCR_EL1 |= IRGN0:WBWA | ORGN0:WBWA | SH0:Inner-Shareable */;
tcr_el1 |= (1 << 8) | (1 << 10) | (3 << 12);
tcr_el1 |= (64 - vm->va_bits) /* T0SZ */;
sctlr_el1 |= SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_I;
tcr_el1 |= TCR_IRGN0_WBWA | TCR_ORGN0_WBWA | TCR_SH0_INNER;
tcr_el1 |= TCR_T0SZ(vm->va_bits);
if (use_lpa2_pte_format(vm))
tcr_el1 |= (1ul << 59) /* DS */;
tcr_el1 |= TCR_DS;
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_SCTLR_EL1), sctlr_el1);
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_TCR_EL1), tcr_el1);
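
For reference, a standalone sketch (not part of the patch) of the leaf PTE attributes the selftests now install: a valid page descriptor with the access flag set, the requested MAIR attribute index, and SH[1:0] forced to Inner Shareable for the non-LPA2 format (with LPA2, bits 9:8 instead carry output-address bits 51:50, so the shareability field is omitted). The macro values are copied from the header hunk above; the attribute index passed in main() is illustrative and depends on the MAIR layout in use.

#include <stdint.h>
#include <stdio.h>

#define BIT64(n)        (1ULL << (n))
#define PTE_VALID       BIT64(0)
#define PTE_TYPE_PAGE   BIT64(1)
#define PTE_ATTRINDX(t) ((uint64_t)(t) << 2)
#define PTE_SHARED      (3ULL << 8)     /* SH[1:0] = Inner Shareable */
#define PTE_AF          BIT64(10)

static uint64_t leaf_pte_attrs(unsigned int attr_idx, int lpa2)
{
	uint64_t pg_attr = PTE_AF | PTE_ATTRINDX(attr_idx) |
			   PTE_TYPE_PAGE | PTE_VALID;

	if (!lpa2)
		pg_attr |= PTE_SHARED;
	return pg_attr;
}

int main(void)
{
	printf("non-LPA2 attrs: 0x%llx\n", (unsigned long long)leaf_pte_attrs(4, 0));
	printf("LPA2 attrs:     0x%llx\n", (unsigned long long)leaf_pte_attrs(4, 1));
	return 0;
}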


@ -2019,9 +2019,8 @@ static struct exit_reason {
KVM_EXIT_STRING(RISCV_SBI),
KVM_EXIT_STRING(RISCV_CSR),
KVM_EXIT_STRING(NOTIFY),
#ifdef KVM_EXIT_MEMORY_NOT_PRESENT
KVM_EXIT_STRING(MEMORY_NOT_PRESENT),
#endif
KVM_EXIT_STRING(LOONGARCH_IOCSR),
KVM_EXIT_STRING(MEMORY_FAULT),
};
/*


@ -196,25 +196,27 @@ static void calc_min_max_cpu(void)
static void help(const char *name)
{
puts("");
printf("usage: %s [-h] [-u]\n", name);
printf("usage: %s [-h] [-u] [-l latency]\n", name);
printf(" -u: Don't sanity check the number of successful KVM_RUNs\n");
printf(" -l: Set /dev/cpu_dma_latency to suppress deep sleep states\n");
puts("");
exit(0);
}
int main(int argc, char *argv[])
{
int r, i, snapshot, opt, fd = -1, latency = -1;
bool skip_sanity_check = false;
int r, i, snapshot;
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
u32 cpu, rseq_cpu;
int opt;
while ((opt = getopt(argc, argv, "hu")) != -1) {
while ((opt = getopt(argc, argv, "hl:u")) != -1) {
switch (opt) {
case 'u':
skip_sanity_check = true;
case 'l':
latency = atoi_paranoid(optarg);
break;
case 'h':
default:
@ -243,6 +245,20 @@ int main(int argc, char *argv[])
pthread_create(&migration_thread, NULL, migration_worker,
(void *)(unsigned long)syscall(SYS_gettid));
if (latency >= 0) {
/*
* Writes to cpu_dma_latency persist only while the file is
* open, i.e. it allows userspace to provide guaranteed latency
* while running a workload. Keep the file open until the test
* completes, otherwise writing cpu_dma_latency is meaningless.
*/
fd = open("/dev/cpu_dma_latency", O_RDWR);
TEST_ASSERT(fd >= 0, __KVM_SYSCALL_ERROR("open() /dev/cpu_dma_latency", fd));
r = write(fd, &latency, 4);
TEST_ASSERT(r >= 1, "Error setting /dev/cpu_dma_latency");
}
for (i = 0; !done; i++) {
vcpu_run(vcpu);
TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
@ -278,6 +294,9 @@ int main(int argc, char *argv[])
"rseq CPU = %d, sched CPU = %d", rseq_cpu, cpu);
}
if (fd > 0)
close(fd);
/*
* Sanity check that the test was able to enter the guest a reasonable
* number of times, e.g. didn't get stalled too often/long waiting for
@ -293,8 +312,8 @@ int main(int argc, char *argv[])
TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
"Only performed %d KVM_RUNs, task stalled too much?\n\n"
" Try disabling deep sleep states to reduce CPU wakeup latency,\n"
" e.g. via cpuidle.off=1 or setting /dev/cpu_dma_latency to '0',\n"
" or run with -u to disable this sanity check.", i);
" e.g. via cpuidle.off=1 or via -l <latency>, or run with -u to\n"
" disable this sanity check.", i);
pthread_join(migration_thread, NULL);
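
For context, a standalone sketch (not part of the patch) of the /dev/cpu_dma_latency pattern the test adopts: the requested PM QoS latency is written as a raw 32-bit value and only stays in effect while the file descriptor remains open, so the descriptor is held across the latency-sensitive work. Requires sufficient privileges; error handling is minimal.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t latency_us = 0;   /* 0 keeps CPUs out of deep idle states */
	int fd = open("/dev/cpu_dma_latency", O_RDWR);

	if (fd < 0) {
		perror("open /dev/cpu_dma_latency");
		return 1;
	}
	if (write(fd, &latency_us, sizeof(latency_us)) != sizeof(latency_us)) {
		perror("write");
		close(fd);
		return 1;
	}

	/* ... run the latency-sensitive workload while fd stays open ... */
	sleep(10);

	close(fd);                /* the latency constraint is dropped here */
	return 0;
}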


@ -7,6 +7,7 @@
#include "kvm_util.h"
#include "processor.h"
#include "kselftest.h"
#define CPUID_MWAIT (1u << 3)
@ -14,6 +15,8 @@ enum monitor_mwait_testcases {
MWAIT_QUIRK_DISABLED = BIT(0),
MISC_ENABLES_QUIRK_DISABLED = BIT(1),
MWAIT_DISABLED = BIT(2),
CPUID_DISABLED = BIT(3),
TEST_MAX = CPUID_DISABLED * 2 - 1,
};
/*
@ -35,11 +38,19 @@ do { \
testcase, vector); \
} while (0)
static void guest_monitor_wait(int testcase)
static void guest_monitor_wait(void *arg)
{
int testcase = (int) (long) arg;
u8 vector;
GUEST_SYNC(testcase);
u64 val = rdmsr(MSR_IA32_MISC_ENABLE) & ~MSR_IA32_MISC_ENABLE_MWAIT;
if (!(testcase & MWAIT_DISABLED))
val |= MSR_IA32_MISC_ENABLE_MWAIT;
wrmsr(MSR_IA32_MISC_ENABLE, val);
__GUEST_ASSERT(this_cpu_has(X86_FEATURE_MWAIT) == !(testcase & MWAIT_DISABLED),
"Expected CPUID.MWAIT %s\n",
(testcase & MWAIT_DISABLED) ? "cleared" : "set");
/*
* Arbitrarily MONITOR this function, SVM performs fault checks before
@ -50,19 +61,6 @@ static void guest_monitor_wait(int testcase)
vector = kvm_asm_safe("mwait", "a"(guest_monitor_wait), "c"(0), "d"(0));
GUEST_ASSERT_MONITOR_MWAIT("MWAIT", testcase, vector);
}
static void guest_code(void)
{
guest_monitor_wait(MWAIT_DISABLED);
guest_monitor_wait(MWAIT_QUIRK_DISABLED | MWAIT_DISABLED);
guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED | MWAIT_DISABLED);
guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED);
guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED | MWAIT_QUIRK_DISABLED | MWAIT_DISABLED);
guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED | MWAIT_QUIRK_DISABLED);
GUEST_DONE();
}
@ -74,56 +72,64 @@ int main(int argc, char *argv[])
struct kvm_vm *vm;
struct ucall uc;
int testcase;
char test[80];
TEST_REQUIRE(this_cpu_has(X86_FEATURE_MWAIT));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_DISABLE_QUIRKS2));
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_MWAIT);
ksft_print_header();
ksft_set_plan(12);
for (testcase = 0; testcase <= TEST_MAX; testcase++) {
vm = vm_create_with_one_vcpu(&vcpu, guest_monitor_wait);
vcpu_args_set(vcpu, 1, (void *)(long)testcase);
disabled_quirks = 0;
if (testcase & MWAIT_QUIRK_DISABLED) {
disabled_quirks |= KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS;
strcpy(test, "MWAIT can fault");
} else {
strcpy(test, "MWAIT never faults");
}
if (testcase & MISC_ENABLES_QUIRK_DISABLED) {
disabled_quirks |= KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT;
strcat(test, ", MISC_ENABLE updates CPUID");
} else {
strcat(test, ", no CPUID updates");
}
vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, disabled_quirks);
if (!(testcase & MISC_ENABLES_QUIRK_DISABLED) &&
(!!(testcase & CPUID_DISABLED) ^ !!(testcase & MWAIT_DISABLED)))
continue;
if (testcase & CPUID_DISABLED) {
strcat(test, ", CPUID clear");
vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_MWAIT);
} else {
strcat(test, ", CPUID set");
vcpu_set_cpuid_feature(vcpu, X86_FEATURE_MWAIT);
}
if (testcase & MWAIT_DISABLED)
strcat(test, ", MWAIT disabled");
while (1) {
vcpu_run(vcpu);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
switch (get_ucall(vcpu, &uc)) {
case UCALL_SYNC:
testcase = uc.args[1];
break;
case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
goto done;
/* Detected in vcpu_run */
break;
case UCALL_DONE:
goto done;
ksft_test_result_pass("%s\n", test);
break;
default:
TEST_FAIL("Unknown ucall %lu", uc.cmd);
goto done;
break;
}
disabled_quirks = 0;
if (testcase & MWAIT_QUIRK_DISABLED)
disabled_quirks |= KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS;
if (testcase & MISC_ENABLES_QUIRK_DISABLED)
disabled_quirks |= KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT;
vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, disabled_quirks);
/*
* If the MISC_ENABLES quirk (KVM neglects to update CPUID to
* enable/disable MWAIT) is disabled, toggle the ENABLE_MWAIT
* bit in MISC_ENABLES accordingly. If the quirk is enabled,
* the only valid configuration is MWAIT disabled, as CPUID
* can't be manually changed after running the vCPU.
*/
if (!(testcase & MISC_ENABLES_QUIRK_DISABLED)) {
TEST_ASSERT(testcase & MWAIT_DISABLED,
"Can't toggle CPUID features after running vCPU");
continue;
}
vcpu_set_msr(vcpu, MSR_IA32_MISC_ENABLE,
(testcase & MWAIT_DISABLED) ? 0 : MSR_IA32_MISC_ENABLE_MWAIT);
}
done:
kvm_vm_free(vm);
}
ksft_finished();
return 0;
}


@ -75,7 +75,7 @@ config KVM_COMPAT
depends on KVM && COMPAT && !(S390 || ARM64 || RISCV)
config HAVE_KVM_IRQ_BYPASS
bool
tristate
select IRQ_BYPASS_MANAGER
config HAVE_KVM_VCPU_ASYNC_IOCTL


@ -149,7 +149,7 @@ irqfd_shutdown(struct work_struct *work)
/*
* It is now safe to release the object's resources
*/
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
irq_bypass_unregister_consumer(&irqfd->consumer);
#endif
eventfd_ctx_put(irqfd->eventfd);
@ -274,7 +274,7 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
write_seqcount_end(&irqfd->irq_entry_sc);
}
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
void __attribute__((weak)) kvm_arch_irq_bypass_stop(
struct irq_bypass_consumer *cons)
{
@ -424,7 +424,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
if (events & EPOLLIN)
schedule_work(&irqfd->inject);
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
if (kvm_arch_has_irq_bypass()) {
irqfd->consumer.token = (void *)irqfd->eventfd;
irqfd->consumer.add_producer = kvm_arch_irq_bypass_add_producer;
@ -609,14 +609,14 @@ void kvm_irq_routing_update(struct kvm *kvm)
spin_lock_irq(&kvm->irqfds.lock);
list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
/* Under irqfds.lock, so can read irq_entry safely */
struct kvm_kernel_irq_routing_entry old = irqfd->irq_entry;
#endif
irqfd_update(kvm, irqfd);
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
if (irqfd->producer &&
kvm_arch_irqfd_route_changed(&old, &irqfd->irq_entry)) {
int ret = kvm_arch_update_irqfd_routing(