From fd3bc912d3d1ca3049beb9f905ee68df3b82282b Mon Sep 17 00:00:00 2001
From: Dave Martin
Date: Fri, 28 Sep 2018 14:39:26 +0100
Subject: KVM: Documentation: Document arm64 core registers in detail

Since the sizes of individual members of the core arm64 registers vary, the list of register encodings that make sense is not a simple linear sequence.

To clarify which encodings to use, this patch adds a brief list to the documentation.

Signed-off-by: Dave Martin
Reviewed-by: Julien Grall
Reviewed-by: Peter Maydell
Signed-off-by: Marc Zyngier
---
 Documentation/virtual/kvm/api.txt | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7de9eee73fcd..2d4f7ce5e967 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2107,6 +2107,30 @@ contains elements ranging from 32 to 128 bits. The index is a 32bit
 value in the kvm_regs structure seen as a 32bit array.
   0x60x0 0000 0010
 
+Specifically:
+    Encoding            Register  Bits  kvm_regs member
+----------------------------------------------------------------
+  0x6030 0000 0010 0000 X0          64  regs.regs[0]
+  0x6030 0000 0010 0002 X1          64  regs.regs[1]
+    ...
+  0x6030 0000 0010 003c X30         64  regs.regs[30]
+  0x6030 0000 0010 003e SP          64  regs.sp
+  0x6030 0000 0010 0040 PC          64  regs.pc
+  0x6030 0000 0010 0042 PSTATE      64  regs.pstate
+  0x6030 0000 0010 0044 SP_EL1      64  sp_el1
+  0x6030 0000 0010 0046 ELR_EL1     64  elr_el1
+  0x6030 0000 0010 0048 SPSR_EL1    64  spsr[KVM_SPSR_EL1] (alias SPSR_SVC)
+  0x6030 0000 0010 004a SPSR_ABT    64  spsr[KVM_SPSR_ABT]
+  0x6030 0000 0010 004c SPSR_UND    64  spsr[KVM_SPSR_UND]
+  0x6030 0000 0010 004e SPSR_IRQ    64  spsr[KVM_SPSR_IRQ]
+  0x6060 0000 0010 0050 SPSR_FIQ    64  spsr[KVM_SPSR_FIQ]
+  0x6040 0000 0010 0054 V0         128  fp_regs.vregs[0]
+  0x6040 0000 0010 0058 V1         128  fp_regs.vregs[1]
+    ...
+  0x6040 0000 0010 00d0 V31        128  fp_regs.vregs[31]
+  0x6020 0000 0010 00d4 FPSR        32  fp_regs.fpsr
+  0x6020 0000 0010 00d5 FPCR        32  fp_regs.fpcr
+
 arm64 CCSIDR registers are demultiplexed by CSSELR value:
   0x6020 0000 0011 00
-- 
cgit v1.2.3

From 395f562f2b4cf9aef0db540d460b859fcde110b6 Mon Sep 17 00:00:00 2001
From: Dave Martin
Date: Tue, 15 Jan 2019 17:02:08 +0000
Subject: KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG

KVM_GET_ONE_REG and KVM_SET_ONE_REG return some error codes that are not documented (but hopefully not surprising either). To give an indication of what these may mean, this patch adds brief documentation.

Signed-off-by: Dave Martin
Signed-off-by: Marc Zyngier
---
 Documentation/virtual/kvm/api.txt | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 2d4f7ce5e967..cd920dd1195c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1871,6 +1871,9 @@ Architectures: all
 Type: vcpu ioctl
 Parameters: struct kvm_one_reg (in)
 Returns: 0 on success, negative value on failure
+Errors:
+  ENOENT:   no such register
+  EINVAL:   other errors, such as bad size encoding for a known register
 
 struct kvm_one_reg {
        __u64 id;
@@ -2192,6 +2195,9 @@ Architectures: all
 Type: vcpu ioctl
 Parameters: struct kvm_one_reg (in and out)
 Returns: 0 on success, negative value on failure
+Errors:
+  ENOENT:   no such register
+  EINVAL:   other errors, such as bad size encoding for a known register
 
 This ioctl allows to receive the value of a single register implemented
 in a vcpu.
The register to read is indicated by the "id" field of the -- cgit v1.2.3 From 50036ad06b7f31f7312b43752185e37cf1d0b663 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Fri, 28 Sep 2018 14:39:27 +0100 Subject: KVM: arm64/sve: Document KVM API extensions for SVE This patch adds sections to the KVM API documentation describing the extensions for supporting the Scalable Vector Extension (SVE) in guests. Signed-off-by: Dave Martin Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 132 +++++++++++++++++++++++++++++++++++++- 1 file changed, 129 insertions(+), 3 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index cd920dd1195c..68509dee23e8 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1873,6 +1873,7 @@ Parameters: struct kvm_one_reg (in) Returns: 0 on success, negative value on failure Errors:  ENOENT:   no such register +  EPERM:    register access forbidden for architecture-dependent reasons  EINVAL:   other errors, such as bad size encoding for a known register struct kvm_one_reg { @@ -2127,13 +2128,20 @@ Specifically: 0x6030 0000 0010 004c SPSR_UND 64 spsr[KVM_SPSR_UND] 0x6030 0000 0010 004e SPSR_IRQ 64 spsr[KVM_SPSR_IRQ] 0x6060 0000 0010 0050 SPSR_FIQ 64 spsr[KVM_SPSR_FIQ] - 0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] - 0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] + 0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] (*) + 0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] (*) ... - 0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] + 0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] (*) 0x6020 0000 0010 00d4 FPSR 32 fp_regs.fpsr 0x6020 0000 0010 00d5 FPCR 32 fp_regs.fpcr +(*) These encodings are not accepted for SVE-enabled vcpus. See + KVM_ARM_VCPU_INIT. + + The equivalent register content can be accessed via bits [127:0] of + the corresponding SVE Zn registers instead for vcpus that have SVE + enabled (see below). 
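The core-register encodings tabulated above can be derived rather than memorised: the low 16 bits hold the offset of the kvm_regs member counted in 32-bit words, and the size field (0x..20.., 0x..30.., 0x..40..) reflects the member's width. The C sketch below reproduces several rows of the table. It is a sketch under stated assumptions: the KVM_REG_* constants are local copies of the uapi values, and struct kvm_regs_mirror is a hypothetical hand-written mirror of the arm64 struct kvm_regs, so that it builds on any host with a C11 compiler. On an arm64 host, <linux/kvm.h> and offsetof(struct kvm_regs, ...) would be used directly instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of uapi constants (assumed to match <linux/kvm.h> and the
 * arm64 <asm/kvm.h>), so the sketch builds without arm64 headers. */
#define KVM_REG_ARM64     0x6000000000000000ULL
#define KVM_REG_SIZE_U32  0x0020000000000000ULL
#define KVM_REG_SIZE_U64  0x0030000000000000ULL
#define KVM_REG_SIZE_U128 0x0040000000000000ULL
#define KVM_REG_ARM_CORE  (0x0010ULL << 16)

/* Hypothetical mirror of struct user_fpsimd_state. */
struct fpsimd_mirror {
	_Alignas(16) uint64_t vregs[32][2]; /* 32 x 128-bit V registers */
	uint32_t fpsr;
	uint32_t fpcr;
	uint32_t reserved[2];
};

/* Hypothetical mirror of the arm64 struct kvm_regs. */
struct kvm_regs_mirror {
	uint64_t regs[31];            /* X0..X30 */
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
	uint64_t sp_el1;
	uint64_t elr_el1;
	uint64_t spsr[5];
	struct fpsimd_mirror fp_regs; /* 16-byte aligned: starts at byte 336 */
};

/* Build a core register ID from a byte offset into kvm_regs and the
 * size flag that matches the member's width. */
static uint64_t core_reg_id(size_t byte_offset, uint64_t size_flag)
{
	assert(byte_offset % sizeof(uint32_t) == 0);
	return KVM_REG_ARM64 | size_flag | KVM_REG_ARM_CORE |
	       (uint64_t)(byte_offset / sizeof(uint32_t));
}

#define CORE_REG(member, size_flag) \
	core_reg_id(offsetof(struct kvm_regs_mirror, member), (size_flag))
```

For example, CORE_REG(fp_regs.vregs[0], KVM_REG_SIZE_U128) evaluates to 0x6040000000100054, matching the V0 row, because the 16-byte alignment of fp_regs places V0 at byte 336 (word 0x54).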
+ arm64 CCSIDR registers are demultiplexed by CSSELR value: 0x6020 0000 0011 00 @@ -2143,6 +2151,61 @@ arm64 system registers have the following id bit patterns: arm64 firmware pseudo-registers have the following bit pattern: 0x6030 0000 0014 +arm64 SVE registers have the following bit patterns: + 0x6080 0000 0015 00 Zn bits[2048*slice + 2047 : 2048*slice] + 0x6050 0000 0015 04 Pn bits[256*slice + 255 : 256*slice] + 0x6050 0000 0015 060 FFR bits[256*slice + 255 : 256*slice] + 0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register + +Access to slices beyond the maximum vector length configured for the +vcpu (i.e., where 16 * slice >= max_vq (**)) will fail with ENOENT. + +These registers are only accessible on vcpus for which SVE is enabled. +See KVM_ARM_VCPU_INIT for details. + +In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not +accessible until the vcpu's SVE configuration has been finalized +using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT +and KVM_ARM_VCPU_FINALIZE for more information about this procedure. + +KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector +lengths supported by the vcpu to be discovered and configured by +userspace. When transferred to or from user memory via KVM_GET_ONE_REG +or KVM_SET_ONE_REG, the value of this register is of type __u64[8], and +encodes the set of vector lengths as follows: + +__u64 vector_lengths[8]; + +if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX && + ((vector_lengths[(vq - 1) / 64] >> ((vq - 1) % 64)) & 1)) + /* Vector length vq * 16 bytes supported */ +else + /* Vector length vq * 16 bytes not supported */ + +(**) The maximum value vq for which the above condition is true is +max_vq. This is the maximum vector length available to the guest on +this vcpu, and determines which register slices are visible through +this ioctl interface. + +(See Documentation/arm64/sve.txt for an explanation of the "vq" +nomenclature.) 
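The vector-length test above is written as pseudocode. A compilable version is sketched below, together with the max_vq and slice rules. It assumes KVM_ARM64_SVE_VQ_MIN = 1 and KVM_ARM64_SVE_VQ_MAX = 512, which give the eight __u64 words quoted above; the KVM_REG_* constants are local copies, the Zn ID layout is assumed to match the arm64 uapi KVM_REG_ARM64_SVE_ZREG(n, slice) macro, and the helper names vls_has_vq(), vls_max_vq(), sve_num_slices() and sve_zreg_id() are hypothetical. Because each Zn slice covers 2048 bits, i.e. 16 quadwords, slice i is accessible only while 16 * i < max_vq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed UAPI values; words = index of the last word used + 1. */
#define KVM_ARM64_SVE_VQ_MIN 1
#define KVM_ARM64_SVE_VQ_MAX 512
#define KVM_ARM64_SVE_VLS_WORDS \
	((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)

/* Local copies of the register-ID fields used for Zn slices. */
#define KVM_REG_ARM64      0x6000000000000000ULL
#define KVM_REG_SIZE_U2048 0x0080000000000000ULL
#define KVM_REG_ARM64_SVE  (0x15ULL << 16)

/* True if vector length vq * 16 bytes is set in the VLS bitmap. */
static bool vls_has_vq(const uint64_t vls[KVM_ARM64_SVE_VLS_WORDS],
		       unsigned int vq)
{
	if (vq < KVM_ARM64_SVE_VQ_MIN || vq > KVM_ARM64_SVE_VQ_MAX)
		return false;
	unsigned int bit = vq - KVM_ARM64_SVE_VQ_MIN;
	return (vls[bit / 64] >> (bit % 64)) & 1;
}

/* Largest vq present in the bitmap (max_vq), or 0 if the bitmap is empty. */
static unsigned int vls_max_vq(const uint64_t vls[KVM_ARM64_SVE_VLS_WORDS])
{
	for (unsigned int vq = KVM_ARM64_SVE_VQ_MAX; vq >= KVM_ARM64_SVE_VQ_MIN; vq--)
		if (vls_has_vq(vls, vq))
			return vq;
	return 0;
}

/* Slices visible through the ioctl interface: those with 16 * slice < max_vq. */
static unsigned int sve_num_slices(unsigned int max_vq)
{
	return (max_vq + 15) / 16;
}

/* ID of slice 'slice' of Zn: Z-register base 0, n in bits 9..5, slice in 4..0. */
static uint64_t sve_zreg_id(unsigned int n, unsigned int slice)
{
	assert(n < 32 && slice < 32);
	return KVM_REG_ARM64 | KVM_REG_SIZE_U2048 | KVM_REG_ARM64_SVE |
	       ((uint64_t)n << 5) | slice;
}
```

For a bitmap containing only vq = 1, 2, 4 and 8 (vls[0] == 0x8b), vls_max_vq() returns 8 and sve_num_slices(8) is 1, so only slice 0 of each Zn, Pn and FFR register would be accessible.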
+ +KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT. +KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that +the host supports. + +Userspace may subsequently modify it if desired until the vcpu's SVE +configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). + +Apart from simply removing all vector lengths from the host set that +exceed some value, support for arbitrarily chosen sets of vector lengths +is hardware-dependent and may not be available. Attempting to configure +an invalid set of vector lengths via KVM_SET_ONE_REG will fail with +EINVAL. + +After the vcpu's SVE configuration is finalized, further attempts to +write this register will fail with EPERM. + MIPS registers are mapped using the lower 32 bits. The upper 16 of that is the register group type: @@ -2197,6 +2260,7 @@ Parameters: struct kvm_one_reg (in and out) Returns: 0 on success, negative value on failure Errors:  ENOENT:   no such register +  EPERM:    register access forbidden for architecture-dependent reasons  EINVAL:   other errors, such as bad size encoding for a known register This ioctl allows to receive the value of a single register implemented @@ -2690,6 +2754,33 @@ Possible features: - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU. Depends on KVM_CAP_ARM_PMU_V3. + - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only). + Depends on KVM_CAP_ARM_SVE. + Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): + + * After KVM_ARM_VCPU_INIT: + + - KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the + initial value of this pseudo-register indicates the best set of + vector lengths possible for a vcpu on this host. 
+ + * Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): + + - KVM_RUN and KVM_GET_REG_LIST are not available; + + - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access + the scalable architectural SVE registers + KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or + KVM_REG_ARM64_SVE_FFR; + + - KVM_REG_ARM64_SVE_VLS may optionally be written using + KVM_SET_ONE_REG, to modify the set of vector lengths available + for the vcpu. + + * After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): + + - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can + no longer be written using KVM_SET_ONE_REG. 4.83 KVM_ARM_PREFERRED_TARGET @@ -3904,6 +3995,41 @@ number of valid entries in the 'entries' array, which is then filled. 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved, userspace should not expect to get any particular value there. +4.119 KVM_ARM_VCPU_FINALIZE + +Capability: KVM_CAP_ARM_SVE +Architectures: arm, arm64 +Type: vcpu ioctl +Parameters: int feature (in) +Returns: 0 on success, -1 on error +Errors: + EPERM: feature not enabled, needs configuration, or already finalized + EINVAL: unknown feature + +Recognised values for feature: + arm64 KVM_ARM_VCPU_SVE + +Finalizes the configuration of the specified vcpu feature. + +The vcpu must already have been initialised, enabling the affected feature, by +means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in +features[]. + +For affected vcpu features, this is a mandatory step that must be performed +before the vcpu is fully usable. + +Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be +configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration +that should be performed and how to do it are feature-dependent.
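The ordering rules spread across KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE amount to a two-state lifecycle for an SVE-enabled vcpu. The sketch below is a hypothetical model of those rules, not KVM code: the enum and function names are invented for illustration, it covers only vcpus created with KVM_ARM_VCPU_SVE, and it returns the errno value (EPERM) that the documentation gives for each disallowed call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are part of the KVM UAPI. */
enum sve_vcpu_state { SVE_INITIALISED, SVE_FINALIZED };

enum sve_vcpu_op {
	OP_RUN,           /* KVM_RUN */
	OP_GET_REG_LIST,  /* KVM_GET_REG_LIST */
	OP_ACCESS_ZPFFR,  /* KVM_{GET,SET}_ONE_REG on Zn, Pn or FFR */
	OP_GET_VLS,       /* KVM_GET_ONE_REG on KVM_REG_ARM64_SVE_VLS */
	OP_SET_VLS,       /* KVM_SET_ONE_REG on KVM_REG_ARM64_SVE_VLS */
	OP_FINALIZE,      /* KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE) */
};

/* Returns 0 if the operation is allowed in 'state', else the errno the
 * documentation describes for it. */
static int sve_op_result(enum sve_vcpu_state state, enum sve_vcpu_op op)
{
	assert(state == SVE_INITIALISED || state == SVE_FINALIZED);
	bool finalized = (state == SVE_FINALIZED);

	switch (op) {
	case OP_RUN:
	case OP_GET_REG_LIST:
	case OP_ACCESS_ZPFFR:
		return finalized ? 0 : EPERM;  /* need finalization first */
	case OP_GET_VLS:
		return 0;                      /* readable after INIT */
	case OP_SET_VLS:
	case OP_FINALIZE:
		return finalized ? EPERM : 0;  /* immutable / already finalized */
	}
	return EINVAL;
}
```

A VMM following these rules would read (and optionally rewrite) KVM_REG_ARM64_SVE_VLS while in SVE_INITIALISED, then call KVM_ARM_VCPU_FINALIZE exactly once before touching the SVE registers or running the vcpu.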
+ +Other calls that depend on a particular feature being finalized, such as +KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with +-EPERM unless the feature has already been finalized by means of a +KVM_ARM_VCPU_FINALIZE call. + +See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization +using this ioctl. + 5. The kvm_run structure ------------------------ -- cgit v1.2.3 From c110ae578ca0a10064dfbda3d786d6a733b9fe69 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 28 Mar 2019 17:24:03 +0100 Subject: kvm: move KVM_CAP_NR_MEMSLOTS to common code All architectures except MIPS were defining it in the same way, and memory slots are handled entirely by common code so there is no point in keeping the definition per-architecture. Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 67068c47c591..b62ad0d94234 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1117,9 +1117,8 @@ struct kvm_userspace_memory_region { This ioctl allows the user to create, modify or delete a guest physical memory slot. Bits 0-15 of "slot" specify the slot id and this value should be less than the maximum number of user memory slots supported per -VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS, -if this capability is supported by the architecture. Slots may not -overlap in guest physical address space. +VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS. +Slots may not overlap in guest physical address space. If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot" specifies the address space which is being modified. 
They must be -- cgit v1.2.3 From 13209ad0395c4de7fa48108b1dac72e341d5c089 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Fri, 28 Dec 2018 09:33:35 +0100 Subject: KVM: s390: add MSA9 to cpumodel This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is added to the list of facilities that will be enabled when there is no cpu model involved as MSA9 requires no additional handling from userspace, e.g. for migration. Please note that a cpu model enabled user space can and will have the final decision on the facility bits for a guest. Signed-off-by: Christian Borntraeger Acked-by: Janosch Frank Reviewed-by: Collin Walling Reviewed-by: David Hildenbrand --- Documentation/virtual/kvm/devices/vm.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt index 95ca68d663a4..4ffb82b02468 100644 --- a/Documentation/virtual/kvm/devices/vm.txt +++ b/Documentation/virtual/kvm/devices/vm.txt @@ -141,7 +141,8 @@ struct kvm_s390_vm_cpu_subfunc { u8 pcc[16]; # valid with Message-Security-Assist-Extension 4 u8 ppno[16]; # valid with Message-Security-Assist-Extension 5 u8 kma[16]; # valid with Message-Security-Assist-Extension 8 - u8 reserved[1808]; # reserved for future instructions + u8 kdsa[16]; # valid with Message-Security-Assist-Extension 9 + u8 reserved[1792]; # reserved for future instructions }; Parameters: address of a buffer to load the subfunction blocks from. -- cgit v1.2.3 From 4bd774e57b29f5bbf296d1daf69cc761e1e75fa8 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Thu, 11 Apr 2019 17:09:59 +0100 Subject: KVM: arm64/sve: Simplify KVM_REG_ARM64_SVE_VLS array sizing A complicated DIV_ROUND_UP() expression is currently written out explicitly in multiple places in order to specify the size of the bitmap exchanged with userspace to represent the value of the KVM_REG_ARM64_SVE_VLS pseudo-register.
Userspace currently has no direct way to work this out either: for documentation purposes, the size is just quoted as 8 u64s. To make this more intuitive, this patch replaces these with a single define, which is also exported to userspace as KVM_ARM64_SVE_VLS_WORDS. Since the number of words in a bitmap is just the index of the last word used + 1, this patch expresses the bound that way instead. This should make it clearer what is being expressed. For userspace convenience, the minimum and maximum possible vector lengths relevant to the KVM ABI are exposed to UAPI as KVM_ARM64_SVE_VQ_MIN, KVM_ARM64_SVE_VQ_MAX. Since the only direct use for these at present is manipulation of KVM_REG_ARM64_SVE_VLS, no corresponding _VL_ macros are defined. They could be added later if a need arises. Since use of DIV_ROUND_UP() was the only reason for including in guest.c, this patch also removes that #include. Suggested-by: Andrew Jones Signed-off-by: Dave Martin Reviewed-by: Andrew Jones Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 68509dee23e8..03df379a02b0 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2171,13 +2171,15 @@ and KVM_ARM_VCPU_FINALIZE for more information about this procedure. KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector lengths supported by the vcpu to be discovered and configured by userspace. 
When transferred to or from user memory via KVM_GET_ONE_REG -or KVM_SET_ONE_REG, the value of this register is of type __u64[8], and -encodes the set of vector lengths as follows: +or KVM_SET_ONE_REG, the value of this register is of type +__u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as +follows: -__u64 vector_lengths[8]; +__u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS]; if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX && - ((vector_lengths[(vq - 1) / 64] >> ((vq - 1) % 64)) & 1)) + ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >> + ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1)) /* Vector length vq * 16 bytes supported */ else /* Vector length vq * 16 bytes not supported */ -- cgit v1.2.3 From 9df2d660c7f37aed7244ec0b920c0749dbb69167 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Fri, 12 Apr 2019 13:28:05 +0100 Subject: KVM: Clarify capability requirements for KVM_ARM_VCPU_FINALIZE Userspace is only supposed to use KVM_ARM_VCPU_FINALIZE when there is some vcpu feature that can actually be finalized. This means that documenting KVM_ARM_VCPU_FINALIZE as available or not depending on the capabilities present is not helpful. This patch amends the documentation to describe availability in terms of which capability is required for each finalizable feature instead. In any case, userspace sees the same error (EINVAL) regardless of whether the given feature is not present or KVM_ARM_VCPU_FINALIZE is not implemented at all. No functional change. 
Suggested-by: Andrew Jones Signed-off-by: Dave Martin Reviewed-by: Andrew Jones Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 03df379a02b0..5519df0d3ed0 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3999,17 +3999,16 @@ userspace should not expect to get any particular value there. 4.119 KVM_ARM_VCPU_FINALIZE -Capability: KVM_CAP_ARM_SVE Architectures: arm, arm64 Type: vcpu ioctl Parameters: int feature (in) Returns: 0 on success, -1 on error Errors: EPERM: feature not enabled, needs configuration, or already finalized - EINVAL: unknown feature + EINVAL: feature unknown or not present Recognised values for feature: - arm64 KVM_ARM_VCPU_SVE + arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE) Finalizes the configuration of the specified vcpu feature. -- cgit v1.2.3 From fe365b4ea6c0df3eb44d636c32c5210ae1e58364 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Fri, 12 Apr 2019 12:59:47 +0100 Subject: KVM: Clarify KVM_{SET,GET}_ONE_REG error code documentation The current error code documentation for KVM_GET_ONE_REG and KVM_SET_ONE_REG could be read as implying that all architectures implement these error codes, or that KVM guarantees which error code is returned in a particular situation. Because this is not really the case, this patch waters down the documentation explicitly to remove such guarantees. EPERM is marked as arm64-specific, since for now arm64 really is the only architecture that yields this error code for the finalization-required case. Keeping this as a distinct error code is useful however for debugging due to the statefulness of the API in this instance. No functional change. 
Suggested-by: Andrew Jones Fixes: 395f562f2b4c ("KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG") Fixes: 50036ad06b7f ("KVM: arm64/sve: Document KVM API extensions for SVE") Signed-off-by: Dave Martin Reviewed-by: Andrew Jones Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 5519df0d3ed0..818ac97fdabc 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1873,8 +1873,10 @@ Parameters: struct kvm_one_reg (in) Returns: 0 on success, negative value on failure Errors:  ENOENT:   no such register -  EPERM:    register access forbidden for architecture-dependent reasons -  EINVAL:   other errors, such as bad size encoding for a known register +  EINVAL:   invalid register ID, or no such register +  EPERM:    (arm64) register access not allowed before vcpu finalization +(These error codes are indicative only: do not rely on a specific error +code being returned in a specific situation.) struct kvm_one_reg { __u64 id; @@ -2260,10 +2262,12 @@ Architectures: all Type: vcpu ioctl Parameters: struct kvm_one_reg (in and out) Returns: 0 on success, negative value on failure -Errors: +Errors include:  ENOENT:   no such register -  EPERM:    register access forbidden for architecture-dependent reasons -  EINVAL:   other errors, such as bad size encoding for a known register +  EINVAL:   invalid register ID, or no such register +  EPERM:    (arm64) register access not allowed before vcpu finalization +(These error codes are indicative only: do not rely on a specific error +code being returned in a specific situation.) This ioctl allows to receive the value of a single register implemented in a vcpu. 
The register to read is indicated by the "id" field of the -- cgit v1.2.3 From 43b8e1f08938c0fd3b8924e846dba89863badc2f Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Fri, 12 Apr 2019 13:25:38 +0100 Subject: KVM: arm64: Clarify access behaviour for out-of-range SVE register slice IDs The existing documentation for which SVE register slice IDs are considered out-of-range, and what happens when userspace tries to access them, is cryptic. This patch rewords the text with the aim of making it a bit easier to understand. No functional change. Suggested-by: Andrew Jones Signed-off-by: Dave Martin Reviewed-by: Andrew Jones Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 818ac97fdabc..e410a9f0f0d4 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2159,8 +2159,9 @@ arm64 SVE registers have the following bit patterns: 0x6050 0000 0015 060 FFR bits[256*slice + 255 : 256*slice] 0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register -Access to slices beyond the maximum vector length configured for the -vcpu (i.e., where 16 * slice >= max_vq (**)) will fail with ENOENT. +Access to register IDs where 2048 * slice >= 128 * max_vq will fail with +ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit +quadwords: see (**) below. These registers are only accessible on vcpus for which SVE is enabled. See KVM_ARM_VCPU_INIT for details. -- cgit v1.2.3 From a22fa321d13b0264976cbbc1d22f4c27c41d3642 Mon Sep 17 00:00:00 2001 From: Amit Daniel Kachhap Date: Tue, 23 Apr 2019 10:12:36 +0530 Subject: KVM: arm64: Add userspace flag to enable pointer authentication Now that the building blocks of pointer authentication are present, let's add userspace flags KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC.
These flags will enable pointer authentication for the KVM guest on a per-vcpu basis through the ioctl KVM_ARM_VCPU_INIT. These features allow the KVM guest to handle pointer authentication instructions; if they are not set, the instructions are treated as undefined. The necessary documentation is added to reflect these changes. Reviewed-by: Dave Martin Signed-off-by: Amit Daniel Kachhap Cc: Mark Rutland Cc: Marc Zyngier Cc: Christoffer Dall Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e410a9f0f0d4..32afe7f5c35a 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2761,6 +2761,16 @@ Possible features: - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU. Depends on KVM_CAP_ARM_PMU_V3. + - KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication + for arm64 only. + Both KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC + must be requested or neither must be requested. + + - KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication + for arm64 only. + Both KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC + must be requested or neither must be requested. + - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only). Depends on KVM_CAP_ARM_SVE. Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): -- cgit v1.2.3 From a243c16d18be130b17cf1064e9115de73bfdff5a Mon Sep 17 00:00:00 2001 From: Amit Daniel Kachhap Date: Tue, 23 Apr 2019 10:12:37 +0530 Subject: KVM: arm64: Add capability to advertise ptrauth for guest This patch advertises the capability of two cpu features called address pointer authentication and generic pointer authentication. These capabilities depend upon system support for pointer authentication and VHE mode.
The current arm64 KVM partially implements pointer authentication, and support for address and generic authentication is tied together. However, separate ABI requirements for each of them are added so that any future isolated implementation will not require any ABI changes. Signed-off-by: Amit Daniel Kachhap Cc: Mark Rutland Cc: Marc Zyngier Cc: Christoffer Dall Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api.txt | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 32afe7f5c35a..fac1887f25b5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2763,13 +2763,19 @@ Possible features: - KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication for arm64 only. - Both KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC - must be requested or neither must be requested. + Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS. + If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are + both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and + KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be + requested. - KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication for arm64 only. - Both KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC - must be requested or neither must be requested. + Depends on KVM_CAP_ARM_PTRAUTH_GENERIC. + If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are + both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and + KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be + requested. - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only). Depends on KVM_CAP_ARM_SVE.
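Taken together, the two pointer-authentication entries impose a per-feature capability dependency plus a pairing rule. A userspace VMM could pre-check a requested feature set before calling KVM_ARM_VCPU_INIT, as sketched below. ptrauth_request_ok() and its parameters are hypothetical, and the capability flags are assumed to have been obtained beforehand (for example with KVM_CHECK_EXTENSION).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pre-check of a ptrauth feature request against the
 * documented rules; names are illustrative, not part of the KVM UAPI. */
static bool ptrauth_request_ok(bool cap_address, bool cap_generic,
			       bool want_address, bool want_generic)
{
	/* Each feature depends on its own capability. */
	if (want_address && !cap_address)
		return false;
	if (want_generic && !cap_generic)
		return false;
	/* With both capabilities present, request both or neither. */
	if (cap_address && cap_generic && want_address != want_generic)
		return false;
	return true;
}
```

When only one of the two capabilities is reported, the pairing rule does not apply and the single available feature may be requested on its own.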
-- cgit v1.2.3 From 90c73795afa24890bd2ae4f3b359de04b4147d37 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:27 +0200 Subject: KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is the basic framework for the new KVM device supporting the XIVE native exploitation mode. The user interface exposes a new KVM device to be created by QEMU, only available when running on an L0 hypervisor. Support for nested guests is not available yet. The XIVE device reuses the device structure of the XICS-on-XIVE device as they have a lot in common. That could possibly change in the future if the need arises. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 Documentation/virtual/kvm/devices/xive.txt (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt new file mode 100644 index 000000000000..fdbd2ff92a88 --- /dev/null +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -0,0 +1,19 @@ +POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1) +========================================================== + +Device types supported: + KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1 + +This device acts as a VM interrupt controller. It provides the KVM +interface to configure the interrupt sources of a VM in the underlying +POWER9 XIVE interrupt controller. + +Only one XIVE instance may be instantiated. A guest XIVE device +requires a POWER9 host and the guest OS should have support for the +XIVE native exploitation interrupt mode. If not, it should run using +the legacy interrupt mode, referred as XICS (POWER7/8). + +* Groups: + + 1.
KVM_DEV_XIVE_GRP_CTRL + Provides global controls on the device -- cgit v1.2.3 From eacc56bb9de3e6830ddc169553772cd6de59ee4c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:28 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as full support for the XIVE native exploitation mode is not yet available. When this is the case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 67068c47c591..e38eb17b7be6 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -4504,6 +4504,15 @@ struct kvm_sync_regs { struct kvm_vcpu_events events; }; +6.75 KVM_CAP_PPC_IRQ_XIVE + +Architectures: ppc +Target: vcpu +Parameters: args[0] is the XIVE device fd + args[1] is the XIVE CPU number (server ID) for this vcpu + +This capability connects the vcpu to an in-kernel XIVE device. + 7.
Capabilities that can be enabled on VMs ------------------------------------------ -- cgit v1.2.3 From 4131f83c3d64e591014dad14c7f8070c538b9422 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:29 +0200 Subject: KVM: PPC: Book3S HV: XIVE: add a control to initialize a source MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The XIVE KVM device maintains a list of interrupt sources for the VM which are allocated in the pool of generic interrupts (IPIs) of the main XIVE IC controller. These are used for the CPU IPIs as well as for virtual device interrupts. The IRQ number space is defined by QEMU. The XIVE device reuses the source structures of the XICS-on-XIVE device for the source blocks (2-level tree) and for the source interrupts. Under XIVE native, the source interrupt caches mostly configuration information and is less used than under the XICS-on-XIVE device in which hcalls are still necessary at run-time. When a source is initialized in KVM, an IPI interrupt source is simply allocated at the OPAL level and then MASKED. KVM only needs to know about its type: LSI or MSI. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation/virtual') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index fdbd2ff92a88..cd8bfc37b72e 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -17,3 +17,18 @@ the legacy interrupt mode, referred as XICS (POWER7/8). 1. KVM_DEV_XIVE_GRP_CTRL Provides global controls on the device + + 2. KVM_DEV_XIVE_GRP_SOURCE (write only) + Initializes a new source in the XIVE device and masks it.
+  Attributes:
+    Interrupt source number  (64-bit)
+  The kvm_device_attr.addr points to a __u64 value:
+  bits:     | 63   ....  2 |   1   |   0
+  values:   |    unused    | level | type
+  - type:  0:MSI 1:LSI
+  - level: assertion level in case of an LSI.
+  Errors:
+    -E2BIG:  Interrupt source number is out of range
+    -ENOMEM: Could not create a new source block
+    -EFAULT: Invalid user pointer for attr->addr.
+    -ENXIO:  Could not allocate underlying HW interrupt
--
cgit v1.2.3


From e8676ce50e224d507946b1c535bc13584e6b49ff Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:30 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add a control to configure a source
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This control will be used by the H_INT_SET_SOURCE_CONFIG hcall from
QEMU to configure the target of a source and also to restore the
configuration of a source when migrating the VM.

The XIVE source interrupt structure is extended with the value of the
Effective Interrupt Source Number. The EISN is the interrupt number
pushed in the event queue that the guest OS will use to dispatch
events internally. Caching the EISN value in KVM makes it easier to
check whether a reconfiguration is actually needed.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index cd8bfc37b72e..33c64b2cdbe8 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -32,3 +32,24 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
     -ENOMEM: Could not create a new source block
     -EFAULT: Invalid user pointer for attr->addr.
     -ENXIO:  Could not allocate underlying HW interrupt
+
+  3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
+  Configures source targeting
+  Attributes:
+    Interrupt source number  (64-bit)
+  The kvm_device_attr.addr points to a __u64 value:
+  bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
+  values:   |    eisn       | mask |  server | priority
+  - priority: 0-7 interrupt priority level
+  - server: CPU number chosen to handle the interrupt
+  - mask: mask flag (unused)
+  - eisn: Effective Interrupt Source Number
+  Errors:
+    -ENOENT: Unknown source number
+    -EINVAL: Not initialized source number
+    -EINVAL: Invalid priority
+    -EINVAL: Invalid CPU number.
+    -EFAULT: Invalid user pointer for attr->addr.
+    -ENXIO:  CPU event queues not configured or configuration of the
+             underlying HW interrupt failed
+    -EBUSY:  No CPU available to serve interrupt
--
cgit v1.2.3


From 13ce3297c5766b9541b6a7a255794c5168a7ae1a Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:31 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

These controls will be used by the H_INT_SET_QUEUE_CONFIG and
H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
Event Queue in the XIVE IC. They will also be used to restore the
configuration of the XIVE EQs and to capture the internal run-time
state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
the EQ toggle bit and EQ index, which are updated by the XIVE IC when
event notifications are enqueued in the EQ.

The value of the guest physical address of the event queue is saved in
the XIVE internal xive_q structure for later use, namely when migration
needs to mark the EQ pages dirty to capture a consistent memory state
of the VM.

Note that H_INT_SET_QUEUE_CONFIG does not require the extra OPAL call
setting the EQ toggle bit and EQ index to configure the EQ, but
restoring the EQ state does.
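The KVM_DEV_XIVE_GRP_SOURCE_CONFIG value from the previous patch and the EQ descriptor identifier defined in this patch share the same low 32 bits: the server number in bits 31..3 and the priority in bits 2..0. The C sketch below packs and unpacks both words following the documented bit layouts. The macro and function names are hypothetical and are not taken from the kernel or QEMU headers. The ioctl that hands the value to the device through kvm_device_attr.addr is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for the XIVE device attribute values:
 *
 *   KVM_DEV_XIVE_GRP_SOURCE_CONFIG (value at kvm_device_attr.addr):
 *     bits 63..33 eisn | bit 32 mask | bits 31..3 server | bits 2..0 priority
 *
 *   KVM_DEV_XIVE_GRP_EQ_CONFIG (EQ descriptor identifier):
 *     bits 63..32 unused | bits 31..3 server | bits 2..0 priority
 */
#define XIVE_PRIO_MASK    0x7ULL        /* bits 2..0 */
#define XIVE_SERVER_SHIFT 3
#define XIVE_SERVER_MASK  0x1FFFFFFFULL /* 29 bits, stored in 31..3 */
#define XIVE_MASKED_SHIFT 32
#define XIVE_EISN_SHIFT   33
#define XIVE_EISN_MASK    0x7FFFFFFFULL /* 31 bits, stored in 63..33 */

/* Low 32 bits common to both layouts. */
static uint64_t xive_pack_server_prio(uint32_t server, uint8_t prio)
{
    assert(prio <= 7);                  /* priorities are 0-7 */
    assert(server <= XIVE_SERVER_MASK); /* must fit in 29 bits */
    return ((uint64_t)server << XIVE_SERVER_SHIFT) | prio;
}

static uint64_t xive_pack_source_config(uint32_t server, uint8_t prio,
                                        int masked, uint32_t eisn)
{
    assert(eisn <= XIVE_EISN_MASK);     /* must fit in 31 bits */
    return ((uint64_t)eisn << XIVE_EISN_SHIFT) |
           ((uint64_t)(masked != 0) << XIVE_MASKED_SHIFT) |
           xive_pack_server_prio(server, prio);
}

/* EQ descriptor identifier: the (server, priority) tuple, upper half zero. */
static uint64_t xive_pack_eq_id(uint32_t server, uint8_t prio)
{
    return xive_pack_server_prio(server, prio);
}

/* Field accessors, e.g. for decoding a value captured for migration. */
static uint8_t xive_prio(uint64_t v)
{
    return (uint8_t)(v & XIVE_PRIO_MASK);
}

static uint32_t xive_server(uint64_t v)
{
    return (uint32_t)((v >> XIVE_SERVER_SHIFT) & XIVE_SERVER_MASK);
}

static int xive_masked(uint64_t v)
{
    return (int)((v >> XIVE_MASKED_SHIFT) & 1);
}

static uint32_t xive_eisn(uint64_t v)
{
    return (uint32_t)((v >> XIVE_EISN_SHIFT) & XIVE_EISN_MASK);
}
```

For example, server 5, priority 6, unmasked, with EISN 0x10 packs to 0x200000002E. The EQ identifier for the same (server, priority) pair is 0x2E.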
Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 34 ++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 33c64b2cdbe8..cc13bfd5cf53 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -53,3 +53,37 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
     -ENXIO:  CPU event queues not configured or configuration of the
              underlying HW interrupt failed
     -EBUSY:  No CPU available to serve interrupt
+
+  4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
+  Configures an event queue of a CPU
+  Attributes:
+    EQ descriptor identifier (64-bit)
+  The EQ descriptor identifier is a tuple (server, priority) :
+  bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
+  values:   |    unused     |  server | priority
+  The kvm_device_attr.addr points to :
+    struct kvm_ppc_xive_eq {
+        __u32 flags;
+        __u32 qshift;
+        __u64 qaddr;
+        __u32 qtoggle;
+        __u32 qindex;
+        __u8  pad[40];
+    };
+  - flags: queue flags
+    KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
+        forces notification without using the coalescing mechanism
+        provided by the XIVE END ESBs.
+  - qshift: queue size (power of 2)
+  - qaddr: real address of queue
+  - qtoggle: current queue toggle bit
+  - qindex: current queue index
+  - pad: reserved for future use
+  Errors:
+    -ENOENT: Invalid CPU number
+    -EINVAL: Invalid priority
+    -EINVAL: Invalid flags
+    -EINVAL: Invalid queue size
+    -EINVAL: Invalid queue address
+    -EFAULT: Invalid user pointer for attr->addr.
+    -EIO:    Configuration of the underlying HW failed
--
cgit v1.2.3


From 5ca806474859a0e94584b3a63f9509a25758408e Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:32 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add a global reset control
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This control is to be used by the H_INT_RESET hcall from QEMU. Its
purpose is to clear all configuration of the sources and EQs. This is
necessary in case of a kexec (for a kdump kernel, for instance) to make
sure that no configuration is left over from the previous boot setup,
so that the new kernel can start safely from a clean state.

Queue 7 is ignored when the XIVE device is configured to run in single
escalation mode, because priority 7 is used by escalations.

The XIVE VP is kept enabled as the vCPU is still active and connected
to the XIVE device.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index cc13bfd5cf53..429cbc4cf960 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -17,6 +17,11 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
   1. KVM_DEV_XIVE_GRP_CTRL
   Provides global controls on the device
+  Attributes:
+    1.1 KVM_DEV_XIVE_RESET (write only)
+    Resets the interrupt controller configuration for sources and event
+    queues. To be used by kexec and kdump.
+    Errors: none
 
   2. KVM_DEV_XIVE_GRP_SOURCE (write only)
   Initializes a new source in the XIVE device and mask it.
--
cgit v1.2.3


From 7b46b6169ab80f8f415a0ca2ea4aa7f1afdcc4f3 Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:33 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add a control to sync the sources
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This control will be used by the H_INT_SYNC hcall from QEMU to flush
event notifications on the XIVE IC owning the source.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 429cbc4cf960..1e7f19d7594b 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -92,3 +92,11 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
     -EINVAL: Invalid queue address
     -EFAULT: Invalid user pointer for attr->addr.
     -EIO:    Configuration of the underlying HW failed
+
+  5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
+  Synchronize the source to flush event notifications
+  Attributes:
+    Interrupt source number  (64-bit)
+  Errors:
+    -ENOENT: Unknown source number
+    -EINVAL: Not initialized source number
--
cgit v1.2.3


From e6714bd1671da9d8dfb5332075df251b746fd0fd Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:34 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When migration of a VM is initiated, a first copy of the RAM is
transferred to the destination before the VM is stopped, but there is
no guarantee that the EQ pages in which the event notifications are
queued have not been modified.

To make sure migration will capture a consistent memory state, the
XIVE device should perform a XIVE quiesce sequence to stop the flow of
event notifications and stabilize the EQs. This is the purpose of the
KVM_DEV_XIVE_EQ_SYNC control, which also marks the EQ pages dirty to
force their transfer.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 1e7f19d7594b..7ffd4c7be7b5 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -23,6 +23,12 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
     queues. To be used by kexec and kdump.
     Errors: none
 
+    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
+    Sync all the sources and queues and mark the EQ pages dirty. This
+    to make sure that a consistent memory state is captured when
+    migrating the VM.
+    Errors: none
+
   2. KVM_DEV_XIVE_GRP_SOURCE (write only)
   Initializes a new source in the XIVE device and mask it.
   Attributes:
@@ -100,3 +106,26 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
   Errors:
     -ENOENT: Unknown source number
     -EINVAL: Not initialized source number
+
+* Migration:
+
+  Saving the state of a VM using the XIVE native exploitation mode
+  should follow a specific sequence. When the VM is stopped :
+
+  1. Mask all sources (PQ=01) to stop the flow of events.
+
+  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
+  flush any in-flight event notification and to stabilize the EQs. At
+  this stage, the EQ pages are marked dirty to make sure they are
+  transferred in the migration sequence.
+
+  3. Capture the state of the source targeting, the EQs configuration
+  and the state of thread interrupt context registers.
+
+  Restore is similar :
+
+  1. Restore the EQ configuration. As targeting depends on it.
+  2. Restore targeting
+  3. Restore the thread interrupt contexts
+  4. Restore the source states
+  5. Let the vCPU run
--
cgit v1.2.3


From e4945b9da52b36052b7c509ca31c5ead1d165b24 Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:35 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add get/set accessors for the VP XIVE state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The state of the thread interrupt management registers needs to be
collected for migration. These registers are cached under the
'xive_saved_state.w01' field of the VCPU when the vCPU context is
pulled from the HW thread. An OPAL call retrieves the backup of the
IPB register in the underlying XIVE NVT structure and merges it in the
KVM state.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/api.txt          |  1 +
 Documentation/virtual/kvm/devices/xive.txt | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e38eb17b7be6..5b505520a616 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1985,6 +1985,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TLB3PS            | 32
   PPC   | KVM_REG_PPC_EPTCFG            | 32
   PPC   | KVM_REG_PPC_ICP_STATE         | 64
+  PPC   | KVM_REG_PPC_VP_STATE          | 128
   PPC   | KVM_REG_PPC_TB_OFFSET         | 64
   PPC   | KVM_REG_PPC_SPMC1             | 32
   PPC   | KVM_REG_PPC_SPMC2             | 32
diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 7ffd4c7be7b5..525d1eebcf34 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -107,6 +107,23 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
     -ENOENT: Unknown source number
     -EINVAL: Not initialized source number
 
+* VCPU state
+
+  The XIVE IC maintains VP interrupt state in an internal structure
+  called the NVT. When a VP is not dispatched on a HW processor
+  thread, this structure can be updated by HW if the VP is the target
+  of an event notification.
+
+  It is important for migration to capture the cached IPB from the NVT
+  as it synthesizes the priorities of the pending interrupts. We
+  capture a bit more to report debug information.
+
+  KVM_REG_PPC_VP_STATE (2 * 64bits)
+  bits:     |  63  ....  32  |  31  ....  0  |
+  values:   |   TIMA word0   |   TIMA word1  |
+  bits:     | 127       ..........       64  |
+  values:   |            unused              |
+
 * Migration:
 
   Saving the state of a VM using the XIVE native exploitation mode
--
cgit v1.2.3


From 39e9af3de5ca936098bc80ebe14401426673c208 Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:37 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add a TIMA mapping
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Each thread has an associated Thread Interrupt Management context
composed of a set of registers. These registers let the thread handle
priority management and interrupt acknowledgment. The most important
are:

    - Interrupt Pending Buffer (IPB)
    - Current Processor Priority (CPPR)
    - Notification Source Register (NSR)

They are exposed to software in four different pages, each providing a
view with a different privilege. The first page is for the physical
thread context and the second for the hypervisor. Only the third
(operating system) and the fourth (user level) are exposed to the
guest.

A custom VM fault handler will populate the VMA with the appropriate
pages, which should only be the OS page for now.
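As a rough illustration of the page policy described in this commit message, the sketch classifies a fault by its page index within the TIMA mapping. It assumes the four views are laid out one page each, in the order listed (physical thread, hypervisor, OS, user), starting at page 0 of the mapping. The enum and function names are hypothetical and do not come from the kernel's fault handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page indices within the TIMA mapping (assumed order). */
enum tima_view {
    TIMA_VIEW_HW   = 0, /* physical thread context */
    TIMA_VIEW_HV   = 1, /* hypervisor */
    TIMA_VIEW_OS   = 2, /* operating system */
    TIMA_VIEW_USER = 3, /* user level */
};

/* Views the documentation describes as exposed to the guest. */
static bool tima_view_is_guest_visible(unsigned long pgoff)
{
    return pgoff == TIMA_VIEW_OS || pgoff == TIMA_VIEW_USER;
}

/*
 * Pages the fault handler would populate "for now": only the OS view.
 * Every other index, including out-of-range ones, is refused.
 */
static bool tima_fault_populates(unsigned long pgoff)
{
    return pgoff == TIMA_VIEW_OS;
}
```

Keeping the two predicates separate makes it explicit that the user-level view is documented as guest-visible but, per the commit text, is not yet populated.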
Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 525d1eebcf34..0cd7847ec38a 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -13,6 +13,29 @@ requires a POWER9 host and the guest OS should have support for the
 XIVE native exploitation interrupt mode. If not, it should run using
 the legacy interrupt mode, referred as XICS (POWER7/8).
 
+* Device Mappings
+
+  The KVM device exposes different MMIO ranges of the XIVE HW which
+  are required for interrupt management. These are exposed to the
+  guest in VMAs populated with a custom VM fault handler.
+
+  1. Thread Interrupt Management Area (TIMA)
+
+  Each thread has an associated Thread Interrupt Management context
+  composed of a set of registers. These registers let the thread
+  handle priority management and interrupt acknowledgment. The most
+  important are :
+
+      - Interrupt Pending Buffer (IPB)
+      - Current Processor Priority (CPPR)
+      - Notification Source Register (NSR)
+
+  They are exposed to software in four different pages each proposing
+  a view with a different privilege. The first page is for the
+  physical thread context and the second for the hypervisor. Only the
+  third (operating system) and the fourth (user level) are exposed the
+  guest.
+
 * Groups:
 
   1. KVM_DEV_XIVE_GRP_CTRL
--
cgit v1.2.3


From 6520ca64cde71b75dae54f3fcb33517a93d82486 Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:38 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Each source is associated with an Event State Buffer (ESB) with an
even/odd pair of pages which provide commands to manage the source:
to trigger, to EOI, to turn off the source for instance.

The custom VM fault handler will deduce the guest IRQ number from the
offset of the fault, and the ESB page of the associated XIVE interrupt
will be inserted into the VMA using the internal structure caching
information on the interrupts.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 0cd7847ec38a..69ee62d3d4dc 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -36,6 +36,13 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
   third (operating system) and the fourth (user level) are exposed the
   guest.
 
+  2. Event State Buffer (ESB)
+
+  Each source is associated with an Event State Buffer (ESB) with
+  either a pair of even/odd pair of pages which provides commands to
+  manage the source: to trigger, to EOI, to turn off the source for
+  instance.
+
 * Groups:
 
   1. KVM_DEV_XIVE_GRP_CTRL
--
cgit v1.2.3


From 232b984b7d55e68971962f07f1dd1d1eb1be52e0 Mon Sep 17 00:00:00 2001
From: Cédric Le Goater
Date: Thu, 18 Apr 2019 12:39:39 +0200
Subject: KVM: PPC: Book3S HV: XIVE: Add passthrough support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The KVM XICS-over-XIVE device and the proposed KVM XIVE native device
implement an IRQ space for the guest using the generic IPI interrupts
of the XIVE IC controller. These interrupts are allocated at the OPAL
level and "mapped" into the guest IRQ number space in the range
0-0x1FFF. Interrupt management is performed in the XIVE way: using
loads and stores on the addresses of the XIVE IPI interrupt ESB pages.

Both KVM devices share the same internal structure caching information
on the interrupts, among which the xive_irq_data struct containing the
addresses of the IPI ESB pages and an extra one in case of
pass-through. The latter contains the addresses of the ESB pages of
the underlying HW controller interrupts, PHB4 in all cases for now.

A guest, when running in the XICS legacy interrupt mode, lets the KVM
XICS-over-XIVE device "handle" interrupt management, that is, perform
the loads and stores on the addresses of the ESB pages of the guest
interrupts. However, when running in XIVE native exploitation mode,
the KVM XIVE native device exposes the interrupt ESB pages to the
guest and lets the guest perform the loads and stores directly.

The VMA exposing the ESB pages makes use of a custom VM fault handler
whose role is to populate the VMA with appropriate pages. When a fault
occurs, the guest IRQ number is deduced from the offset, and the ESB
pages of the associated XIVE IPI interrupt are inserted in the VMA
(using the internal structure caching information on the interrupts).
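The offset-to-IRQ deduction described above can be sketched as follows. The sketch assumes two consecutive pages per guest IRQ, starting at page 0 of the ESB mapping, with the even page used for triggers and the odd page for EOI. That page assignment and all names here are assumptions for illustration and do not describe the kernel's fault handler. The range check uses the 0-0x1FFF guest IRQ space stated in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: even page = trigger, odd page = EOI, two pages per IRQ. */
enum esb_page_kind { ESB_TRIGGER_PAGE, ESB_EOI_PAGE };

struct esb_fault_target {
    uint64_t guest_irq;      /* guest IRQ number deduced from the offset */
    enum esb_page_kind kind; /* which page of the pair was touched */
};

#define GUEST_IRQ_MAX 0x1FFFu /* guest IRQ number space 0-0x1FFF */

/* page_offset: faulting page index relative to the start of the mapping. */
static struct esb_fault_target esb_fault_decode(uint64_t page_offset)
{
    struct esb_fault_target t;

    t.guest_irq = page_offset / 2;
    t.kind = (page_offset % 2) ? ESB_EOI_PAGE : ESB_TRIGGER_PAGE;
    return t;
}

/* A fault outside the guest IRQ space should not be served. */
static bool esb_irq_in_range(uint64_t guest_irq)
{
    return guest_irq <= GUEST_IRQ_MAX;
}
```

guest_irq is kept 64-bit so that a very large offset cannot wrap around into the valid range before the range check.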

Supporting device passthrough in the guest running in XIVE native
exploitation mode adds some extra refinements, because the ESB pages
of a different HW controller (PHB4) need to be exposed to the guest
along with the initial IPI ESB pages of the XIVE IC controller. But
the overall mechanism is the same.

When the device HW irqs are mapped into or unmapped from the guest
IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped(), are called to record or clear the
passthrough interrupt information and to perform the switch.

The approach taken by this patch is to clear the ESB pages of the
guest IRQ number being mapped and let the VM fault handler repopulate
them. The handler will insert the ESB page corresponding to the HW
interrupt of the device being passed-through, or the initial IPI ESB
page if the device is being removed.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
---
 Documentation/virtual/kvm/devices/xive.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
index 69ee62d3d4dc..9a24a4525253 100644
--- a/Documentation/virtual/kvm/devices/xive.txt
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -43,6 +43,25 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
   manage the source: to trigger, to EOI, to turn off the source for
   instance.
 
+  3. Device pass-through
+
+  When a device is passed-through into the guest, the source
+  interrupts are from a different HW controller (PHB4) and the ESB
+  pages exposed to the guest should accommadate this change.
+
+  The passthru_irq helpers, kvmppc_xive_set_mapped() and
+  kvmppc_xive_clr_mapped() are called when the device HW irqs are
+  mapped into or unmapped from the guest IRQ number space. The KVM
+  device extends these helpers to clear the ESB pages of the guest IRQ
+  number being mapped and then lets the VM fault handler repopulate.
+  The handler will insert the ESB page corresponding to the HW
+  interrupt of the device being passed-through or the initial IPI ESB
+  page if the device has being removed.
+
+  The ESB remapping is fully transparent to the guest and the OS
+  device driver. All handling is done within VFIO and the above
+  helpers in KVM-PPC.
+
 * Groups:
 
   1. KVM_DEV_XIVE_GRP_CTRL
--
cgit v1.2.3


From 3a1e5e4a2c7a1ec63e6d70a7e921f62bcbb57b85 Mon Sep 17 00:00:00 2001
From: Radim Krčmář
Date: Mon, 29 Apr 2019 15:25:35 +0200
Subject: Revert "KVM: doc: Document the life cycle of a VM and its resources"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This reverts commit 919f6cd8bb2fe7151f8aecebc3b3d1ca2567396e.

The patch was applied twice. The first commit is
eca6be566d47029f945a5f8e1c94d374e31df2ca.

Reported-by: Cornelia Huck
Signed-off-by: Radim Krčmář
Signed-off-by: Paolo Bonzini
---
 Documentation/virtual/kvm/api.txt | 17 -----------------
 1 file changed, 17 deletions(-)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b62ad0d94234..26dc1280b49b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -69,23 +69,6 @@ by and on behalf of the VM's process may not be freed/unaccounted when
 the VM is shut down.
 
 
-It is important to note that althought VM ioctls may only be issued from
-the process that created the VM, a VM's lifecycle is associated with its
-file descriptor, not its creator (process). In other words, the VM and
-its resources, *including the associated address space*, are not freed
-until the last reference to the VM's file descriptor has been released.
-For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
-not be freed until both the parent (original) process and its child have
-put their references to the VM's file descriptor.
-
-Because a VM's resources are not freed until the last reference to its
-file descriptor is released, creating additional references to a VM via
-via fork(), dup(), etc... without careful consideration is strongly
-discouraged and may have unwanted side effects, e.g. memory allocated
-by and on behalf of the VM's process may not be freed/unaccounted when
-the VM is shut down.
-
-
 3. Extensions
 -------------
 
--
cgit v1.2.3


From 65c4189de8c1d995f6bc2cc96b22206405466b53 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Wed, 17 Apr 2019 15:28:44 +0200
Subject: KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size

If a memory slot's size is not a multiple of 64 pages (256K), then
the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
either requires the requested page range to go beyond memslot->npages,
or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
requires log->num_pages to be both in range and aligned.

To allow this case, allow log->num_pages not to be a multiple of 64 if
it ends exactly on the last page of the slot.

Reported-by: Peter Xu
Fixes: 98938aa8edd6 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
Signed-off-by: Paolo Bonzini
---
 Documentation/virtual/kvm/api.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 26dc1280b49b..675cb0bea903 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3812,8 +3812,9 @@ The ioctl clears the dirty status of pages in a memory slot, according to
 the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap field.
 Bit 0 of the bitmap corresponds to page "first_page" in the memory slot,
 and num_pages is the size in bits of the input bitmap.
-Both first_page and num_pages must be a multiple of 64. For each bit
-that is set in the input bitmap, the corresponding page is marked "clean"
+first_page must be a multiple of 64; num_pages must also be a multiple of
+64 unless first_page + num_pages is the size of the memory slot. For each
+bit that is set in the input bitmap, the corresponding page is marked "clean"
 in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
 (for example via write-protection, or by clearing the dirty bit in
 a page table entry).
--
cgit v1.2.3


From d7547c55cbe7471255ca51f14bcd4699f5eaabe5 Mon Sep 17 00:00:00 2001
From: Peter Xu
Date: Wed, 8 May 2019 17:15:47 +0800
Subject: KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2

The previous KVM_CAP_MANUAL_DIRTY_LOG_PROTECT has problems that
prevent userspace from using it correctly. Obsolete the old one and
introduce a new capability bit for it.

Suggested-by: Paolo Bonzini
Signed-off-by: Peter Xu
Signed-off-by: Paolo Bonzini
---
 Documentation/virtual/kvm/api.txt | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

(limited to 'Documentation/virtual')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 675cb0bea903..47a5eb00bc53 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -330,7 +330,7 @@ They must be less than the value that KVM_CHECK_EXTENSION returns for the
 KVM_CAP_MULTI_ADDRESS_SPACE capability.
 
 The bits in the dirty bitmap are cleared before the ioctl returns, unless
-KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information,
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information,
 see the description of the capability.
 
 4.9 KVM_SET_MEMORY_ALIAS
@@ -3791,7 +3791,7 @@ to I/O ports.
 
 4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
 
-Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
+Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
 Architectures: x86
 Type: vm ioctl
 Parameters: struct kvm_dirty_log (in)
@@ -3824,10 +3824,10 @@ the address space for which you want to return the dirty bitmap.
 They must be less than the value that KVM_CHECK_EXTENSION returns for the
 KVM_CAP_MULTI_ADDRESS_SPACE capability.
 
-This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
+This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
 is enabled; for more information, see the description of the capability.
 However, it can always be used as long as KVM_CHECK_EXTENSION confirms
-that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
+that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present.
 
 4.118 KVM_GET_SUPPORTED_HV_CPUID
 
@@ -4780,7 +4780,7 @@ and injected exceptions.
 * For the new DR6 bits, note that bit 16 is set iff the #DB exception
   will clear DR6.RTM.
 
-7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
+7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
 
 Architectures: all
 Parameters: args[0] whether feature should be enabled or not
@@ -4803,6 +4803,11 @@ while userspace can see false reports of dirty pages.
 Manual reprotection helps reducing this time, improving guest performance
 and reducing the number of dirty log false positives.
 
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
+it hard or impossible to use it correctly. The availability of
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
+Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.
 
 8. Other capabilities.
 ----------------------
--
cgit v1.2.3
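The alignment rule documented by the KVM_CLEAR_DIRTY_LOG fix can be expressed as a small predicate. This is a minimal sketch, assuming that first_page and num_pages are counted in pages relative to the start of the memory slot and that the caller knows the slot size (slot_npages). The function name is hypothetical. The in-range check follows the commit message's statement that num_pages must be both in range and aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check mirroring the documented KVM_CLEAR_DIRTY_LOG rule:
 *   - first_page must be a multiple of 64;
 *   - [first_page, first_page + num_pages) must lie inside the slot;
 *   - num_pages must be a multiple of 64 unless the range ends exactly
 *     on the last page of the slot.
 */
static bool clear_dirty_log_range_ok(uint64_t first_page, uint32_t num_pages,
                                     uint64_t slot_npages)
{
    if (first_page % 64 != 0)
        return false;
    /* Compare against the remaining size so the sum cannot overflow. */
    if (first_page > slot_npages || num_pages > slot_npages - first_page)
        return false;
    if (num_pages % 64 != 0 && first_page + num_pages != slot_npages)
        return false;
    return true;
}
```

For a 100-page slot, first_page = 64 with num_pages = 36 passes because the range ends on the last page of the slot. num_pages = 30 at the same offset is rejected.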