SVM AVIC/IOMMU AVIC improves interrupt performance on Zen/Zen+/Zen 2 and later processors

* Some of the technical info may be wrong, as I am not an expert, which is why I try to include as many sources as I can.

Earlier in the week I posted a thread on Reddit about IOMMU AVIC getting some fixes/improvements that allow for easier general usage.

After some discussion it would seem the performance difference I found was due to something else. My gut feeling, after testing and reviewing the documentation to the best of my understanding, is that it's due to SVM AVIC, which can still provide a speed-up in interrupt performance.

The documentation I am referring to is the AMD64 Architecture Programmer's Manual Volume 2: System Programming. The following passage is the reason I think SVM AVIC is working:

AVIC Enable—Virtual Interrupt Control, Bit 31. The AVIC hardware support may be enabled on
a per virtual processor basis. This bit determines whether or not AVIC is enabled for a particular virtual
processor. Any guest configured to use AVIC must also enable RVI (nested paging). Enabling AVIC
implicitly disables the V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR fields in the
VMCB Control Word.

For people who can make sense of it, here is the AMD I/O Virtualization Technology (IOMMU) Specification as well.

To enable AVIC, keep the following in mind:

  • avic=1 npt=1 need to be added to the kvm_amd module options, e.g. options kvm-amd nested=0 avic=1 npt=1. NPT is required (see the sketch after this list for an example modprobe.d file and how to verify the options took effect).

  • If using a Windows guest, the Hyper-V stimer + synic enlightenments are incompatible with AVIC. If you are worried about timer performance (don't be :slight_smile:), just ensure you have hypervclock and invtsc exposed in your CPU features:

    <cpu mode="host-passthrough" check="none">
     <feature policy="require" name="invtsc"/>
    </cpu>
    <clock offset="utc">
      <timer name="hypervclock" present="yes"/>
    </clock>
    
  • AVIC is deactivated when x2APIC is enabled. This change is coming in Linux 5.7, so you will want to remove x2apic from your CPUID like so:

    <cpu mode="host-passthrough" check="none">
     <feature policy="disable" name="x2apic"/>
    </cpu>
    
  • AVIC does not work with nested virtualization.
    Either disable nesting via the kvm_amd module options (nested=0) or remove svm from your CPUID like so:

    <cpu mode="host-passthrough" check="none">
     <feature policy="disable" name="svm"/>
    </cpu>
    
  • AVIC needs the PIT tickpolicy to be set to discard:
    <timer name='pit' tickpolicy='discard'/>

  • Some other Hyper-V enlightenments can get in the way of AVIC working optimally. vapic provides paravirtualized EOI processing, which conflicts with what SVM AVIC provides:

In particular, this enlightenment allows paravirtualized (exit-less) EOI processing.

hv-tlbflush/hv-ipi would likely also interfere, but these weren't tested as they are also things SVM AVIC helps to accelerate.
Nesting-related enlightenments weren't tested but don't look like they should cause problems.
hv-reset/hv-vendor-id/hv-crash/hv-vpindex/hv-spinlocks/hv-relaxed also look to be fine.
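For reference, here is a minimal sketch of the module option setup from the first bullet above and how to check that it took effect. The file name kvm-amd.conf is just an example (any *.conf under /etc/modprobe.d/ works), and the sysfs checks assume the kvm_amd module is loaded:

    # /etc/modprobe.d/kvm-amd.conf (example file name)
    options kvm-amd nested=0 avic=1 npt=1

    # After a reboot (or reloading kvm_amd with all VMs shut down), verify:
    cat /sys/module/kvm_amd/parameters/avic   # expect 1 (or Y on some kernels)
    cat /sys/module/kvm_amd/parameters/npt    # expect 1 (or Y on some kernels)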

The patches to get things working (with some important fixes) were merged in Linux kernel 5.6.

I made a patch for 5.5.13, tested by applying it against the 5.5.13 Arch/stable git/Fedora sources (it may work on older 5.5.x releases, but I haven't tested that; you will also want this patch if it hasn't been backported by Greg Kroah-Hartman) - https://pastebin.com/FmEc81zu

The patch was made using the merged changes from the KVM git tracking repo. It also includes the GA Log tracepoint patch and these two fixes:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?h=for-linus&id=93fd9666c269877fffb74e14f52792d9c000c1f2
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?h=for-linus&id=7943f4acea3caf0b6d5b6cdfce7d5a2b4a9aa608
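If you want to double-check that AVIC is actually being exercised at runtime, kvm_amd also exposes AVIC-related tracepoints (e.g. kvm:kvm_avic_incomplete_ipi and kvm:kvm_avic_unaccelerated_access, plus the GA Log tracepoint added by the patch above). A rough sketch using perf, assuming your perf build can see the kvm tracepoints:

    # Count hits on the AVIC tracepoints system-wide for 10 seconds while the guest is busy
    perf stat -e 'kvm:kvm_avic_*' -a sleep 10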

Background info

AVIC (Advanced Virtual Interrupt Controller) is AMD's implementation of APIC virtualization, similar to Intel's APICv. The main benefit for us casual/advanced users is that it aims to improve interrupt performance. And unlike with Intel, it's not limited to HEDT/server processors.

These patches added AVIC support some time ago:

KVM: x86: Introduce SVM AVIC support

iommu/AMD: Introduce IOMMU AVIC support

However, until now it hasn't been easy to use, as it had some limitations, best explained by Suravee Suthikulpanit from AMD, who implemented the initial patch and follow-ups:

kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode

The ‘commit 67034bb9dd5e (“KVM: SVM: Add irqchip_split() checks before enabling AVIC”)’ was introduced to fix miscellaneous boot-hang issues when enable AVIC. This is mainly due to AVIC hardware doest not #vmexit on write to LAPIC EOI register resulting in-kernel PIC and IOAPIC to wait and do not inject new interrupts (e.g. PIT, RTC). This limits AVIC to only work with kernel_irqchip=split mode, which is not currently enabled by default, and also required user-space to support split irqchip model, which might not be the case.
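For context, the kernel_irqchip=split mode mentioned in that quote is a QEMU machine property (libvirt exposes the same thing through its <ioapic> driver setting). With the 5.6 changes above you shouldn't need to force it, but as a sketch only, with the rest of the command line omitted:

    # Split irqchip model: the LAPIC stays in the kernel while the PIC/IOAPIC/PIT
    # are emulated in QEMU userspace
    qemu-system-x86_64 -machine q35,accel=kvm,kernel_irqchip=split ...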

Please see the original Reddit thread for full details on things like the performance differences measured in terms of latency. If people want me to copy the thread over here, just let me know. Just remember it will make this post very long :smiley:

@wendell @gnif - This might interest you :wink:
Hello @futurefade :blush:

Edit 1 -

After some further investigating, it is likely that IOMMU AVIC isn't working as I thought in my original post on Reddit, just as Aiberia on Reddit suspected. The performance difference I saw was from SVM AVIC.

Below are details from my investigation:

Using perf kvm --host top -p `pidof qemu-system-x86_64`, here is what I found:

Linux -

   0.12%  [kvm_amd]  [k] avic_vcpu_put.part.0
   0.10%  [kvm_amd]  [k] avic_vcpu_load
   0.02%  [kvm_amd]  [k] avic_incomplete_ipi_interception
   0.01%  [kvm_amd]  [k] svm_deliver_avic_intr
   
   2.83%  [kernel]  [k] iommu_completion_wait
   0.87%  [kernel]  [k] __iommu_queue_command_sync
   0.16%  [kernel]  [k] amd_iommu_update_ga
   0.03%  [kernel]  [k] iommu_flush_irt

Windows -

   0.61%  [kvm_amd]  [k] svm_deliver_avic_intr
   0.05%  [kvm_amd]  [k] avic_vcpu_put.part.0
   0.02%  [kvm_amd]  [k] avic_vcpu_load
   0.14%  [kvm]      [k] kvm_emulate_wrmsr         

amd_iommu_update_ga refers to this function.

svm_deliver_avic_intr refers to this function.

Edit 2 -
Added some more info on requirements for AVIC

Edit 3 -
Update on Windows AVIC IOMMU & vapic/enlightenments.

Windows AVIC IOMMU is now working as of this patch, but performance doesn't appear to be completely stable at the moment.

Edit 4 -
The patch above has been merged in Linux 5.6.13/5.4.41. To continue to use SVM AVIC, either revert the patch from edit 3 or don't upgrade your kernel.
Another thing to note: with AVIC IOMMU there seem to be problems with some PCIe devices causing the guest not to boot. In testing this was a Mellanox ConnectX-3 card; for Aiber from Reddit it was his Samsung 970 (not sure of the exact model), while my own Samsung 970 Evo has worked, so it appears to be a YMMV kind of thing until we know the cause of the issue.
If you want more detail on the testing and have Discord, see this post I made in the VFIO Discord.


Heya, just wanted to report back some findings on kernel 5.6.13.

It seems like enabling the Hyper-V enlightenment vapic fixes Windows 10 locking up on boot. Though doesn't that conflict with what AVIC does? It does seem like IOMMU AVIC is enabled despite this enlightenment, based on your perf top log example.

If you try to pass through a network controller, like an Intel X550T 10G NIC, your VM will lock up. I knew that would be an issue, because it also locks up in my Linux VM on kernel 5.6.12 if I try to pass it through. I had to set the IOAPIC driver to qemu to make it work (see the fragment below). It seems like AVIC IOMMU doesn't like network NICs.
USB and graphics cards seem to pass through nicely.
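For anyone wanting to try the same workaround, this is the fragment I mean; as far as I understand, driver="qemu" is what makes libvirt request the split irqchip model from QEMU. Only the relevant part of <features> is shown:

    <features>
      <!-- userspace (QEMU) IOAPIC, i.e. the split irqchip model -->
      <ioapic driver="qemu"/>
    </features>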

I do have one issue: it seems like native_queued_spin_lock_slowpath is using the majority of my pinned CPUs. A quick Google search suggested a network issue? I'll sort this one out in the meantime.

Edit 1: Not network related; probably a quirk or something that was always there. I do not recommend upgrading to 5.6.13 with this workaround. You might run into the same issue I have, i.e. native_queued_spin_lock_slowpath using up all your VM's pinned CPUs.

I'll go back to 5.6.12 to sanity check myself. One more note: my second Windows 10 VM doesn't have this issue as severely as my first Windows 10 VM does, though this one has 36 threads available and doesn't have NVMe passthrough.

Update 2: It doesn't seem to exhibit the same issue, mainly due to the fact that IOMMU AVIC is disabled. Sooo… I guess a couple more minor kernel bumps before it's stable.

@Kayant12 @FutureFade

I am just trying this out after a friend pointed me to the Reddit post. My Windows VM unfortunately locks up or goes into a boot loop with the following config; can you guys help me out?

I am on the 5.8 kernel, by the way:

<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
  <name>gaming</name>
  <uuid>be73bb69-b6b0-4ecb-8694-aa9460a0c11f</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">32768000</memory>
  <currentMemory unit="KiB">32768000</currentMemory>
  <vcpu placement="static">12</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="6"/>
    <vcpupin vcpu="1" cpuset="18"/>
    <vcpupin vcpu="2" cpuset="7"/>
    <vcpupin vcpu="3" cpuset="19"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="20"/>
    <vcpupin vcpu="6" cpuset="9"/>
    <vcpupin vcpu="7" cpuset="21"/>
    <vcpupin vcpu="8" cpuset="10"/>
    <vcpupin vcpu="9" cpuset="22"/>
    <vcpupin vcpu="10" cpuset="11"/>
    <vcpupin vcpu="11" cpuset="23"/>
    <emulatorpin cpuset="5,17"/>
  </cputune>
  <os>
    <type arch="x86_64" machine="pc-q35-5.0">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/gaming_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="6" threads="2"/>
    <feature policy="require" name="topoext"/>
    <feature policy="require" name="invtsc"/>
    <feature policy="disable" name="amd-stibp"/>
    <feature policy="disable" name="monitor"/>
    <feature policy="disable" name="x2apic"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="discard"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none" io="native" discard="unmap" detect_zeroes="unmap"/>
      <source file="/mnt/SecondSSD/Virtualization/libvirt/images/gaming.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <boot order="2"/>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="pci" index="9" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0xb"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x3"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:61:a9:08"/>
      <source network="default"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <channel type="spicevmc">
      <target type="virtio" name="com.redhat.spice.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <input type="keyboard" bus="virtio">
      <address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
    </input>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
      <image compression="off"/>
      <gl enable="no" rendernode="/dev/dri/by-path/pci-0000:0b:00.0-render"/>
    </graphics>
    <video>
      <model type="none"/>
    </video>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0a" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="2"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="3"/>
    </redirdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </memballoon>
    <shmem name="looking-glass">
      <model type="ivshmem-plain"/>
      <size unit="M">32</size>
      <address type="pci" domain="0x0000" bus="0x09" slot="0x01" function="0x0"/>
    </shmem>
  </devices>
  <qemu:commandline>
    <qemu:arg value="-device"/>
    <qemu:arg value="ich9-intel-hda,bus=pcie.0,addr=0x1b"/>
    <qemu:arg value="-device"/>
    <qemu:arg value="hda-micro,audiodev=hda"/>
    <qemu:arg value="-audiodev"/>
    <qemu:arg value="pa,id=hda,server=unix:/run/user/1000/pulse/native"/>
  </qemu:commandline>
</domain>

@Kayant12 I have been reading all your posts here and on Reddit, and although I am grateful, I cannot say I am happy too… :crazy_face:
I'm a bit confused at the moment and would like to ask a few questions:
Can you please post your XML, kernel parameters and, if any, the qemu script you use for starting the VM?
Also, can you please explain what your goal was in the first place?
I have low CPU performance in my VM (AMD 5900X) and ended up here trying to solve this. Ideally I would like to increase performance without losing the ability to enable Hyper-V under Windows (I think disabling SVM results in this).

Can you please advise?