tl;dr: I’m having network performance issues on my KVM guests, possibly related to SR-IOV / VFIO / IRQ handling?
I’ve been seeing a lot of these in the host logs ever since I started using SR-IOV for networking in my guests:
[3587751.438487] irq 150: Affinity broken due to vector space exhaustion.
[3639410.959080] irq 150: Affinity broken due to vector space exhaustion.
[3639410.960207] irq 150: Affinity broken due to vector space exhaustion.
[3639783.045026] irq 150: Affinity broken due to vector space exhaustion.
[3639783.045639] irq 149: Affinity broken due to vector space exhaustion.
[3639783.046873] irq 149: Affinity broken due to vector space exhaustion.
[3639783.047425] irq 150: Affinity broken due to vector space exhaustion.
[3639830.916299] irq 150: Affinity broken due to vector space exhaustion.
[3639830.917404] irq 150: Affinity broken due to vector space exhaustion.
[3639865.884657] irq 160: Affinity broken due to vector space exhaustion.
[3639865.885711] irq 160: Affinity broken due to vector space exhaustion.
I’m not finding much in the way of a fix, or even an explanation of what’s going on. I tried to make sense of it using this very old doc (from the 2.4 days), but at this hour I’m getting lost pretty easily.
The guests seem to “bog down” when pushing much over 20 megabytes/sec despite having a 10-gigabit connection. Heavy network traffic produces those log messages on the host. The IRQs referenced are assigned to the VFIO driver, with specific PCI IDs pointing at my network card’s virtual functions.
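For reference, this is roughly how I’ve been matching the IRQ numbers from the logs back to the VFs on the host (the PCI address below is just a placeholder for one of my VFs):

grep -E '^ *(149|150|160):' /proc/interrupts
readlink /sys/bus/pci/devices/0000:03:10.0/driver   # points at vfio-pci for a passed-through VF
lspci -nn -s 0000:03:10.0                           # confirms it’s one of the NIC’s virtual functions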
It could also be unrelated to the performance issues. Is virtio networking an option?
Does irqbalance do anything for you?
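A quick way to check whether irqbalance is even in play, and whether stopping it changes anything (assuming a systemd distro with irqbalance installed as a service; the IRQ number is one from your logs):

systemctl status irqbalance     # is it running at all?
systemctl stop irqbalance       # temporarily stop it, then re-run the network test
cat /proc/irq/150/smp_affinity  # which CPUs that IRQ is currently allowed to land on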
(Background: CPUs have hardware interrupts, where a device literally has a pin on the CPU reserved for that purpose, plus software interrupts; in total, hardware and internal interrupts combined, there can be at most 255 of them, a limitation of the x86 architecture. With PCIe and a modern APIC there’s something called MSI-X; that makes interrupt processing slower, but each device gets up to 2048 possible interrupt vectors to itself. In theory that means a network card could have e.g. 1000 different queues delivering packets from different IPs/ports to different cores, each with its own interrupt line.)
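If you want to see how many MSI/MSI-X vectors a device has actually been granted on the host, the standard sysfs view is enough (the PCI address is a placeholder for one of your VFs):

ls /sys/bus/pci/devices/0000:03:10.0/msi_irqs | wc -l   # one entry per MSI/MSI-X vector currently allocated
lspci -vv -s 0000:03:10.0 | grep -i msi-x               # the capability line shows the table size the hardware advertises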
7 VFs; the card supports up to 63. The card is an Intel X520, using the stock Linux ixgbe driver.
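For the record, this is how the VFs get created on my setup (the interface name is a placeholder for my X520 port; the sysfs files are the standard SR-IOV ones):

cat /sys/class/net/enp3s0f0/device/sriov_totalvfs    # reports 63 on this card
echo 7 > /sys/class/net/enp3s0f0/device/sriov_numvfs # creates the 7 VFs (needs to be set back to 0 before changing it)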
Virtio seems like a kludge compared to SR-IOV. I was happy to move away from using virtio and a host bridge.
Yes, q35 for the guests. The guests are pinned to specific CPU cores; I wonder if that’s part of the problem. I pinned CPUs based on which NUMA nodes own the GPUs, but the network card is on a different node, if I recall correctly.
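Here’s roughly how I’ve been checking the topology (the PCI address and domain name are placeholders for mine):

cat /sys/bus/pci/devices/0000:03:00.0/numa_node   # which NUMA node the NIC’s physical function sits on
lscpu | grep -i 'numa node'                       # which CPUs belong to each node
virsh vcpupin guest1                              # current vCPU-to-host-CPU pinning for a guest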
Hmm, there were a bunch of bugs around interrupt handling when q35-4.0 first came out, but that was a few months back and should be fixed by now. Maybe switching back and forth between machine-type versions helps here.
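If you want to experiment with that, it’s easy to see which machine-type versions your QEMU offers and which one a guest is currently using (the domain name is a placeholder):

qemu-system-x86_64 -machine help | grep q35   # available pc-q35-* versions
virsh dumpxml guest1 | grep machine           # which one the guest currently uses
virsh edit guest1                             # change the machine= attribute to try an older/newer version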
Can you check whether you have MSI or MSI-X in the guest’s lspci -vv output?
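Something like this inside the guest should show it (the address is a placeholder for the VF as the guest sees it); look at the MSI and MSI-X capability lines and whether they say Enable+ or Enable-. You may need to run it as root, otherwise lspci can’t read the capability list:

lspci -vv -s 00:05.0 | grep -i msi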
I don’t understand why it’s even bothering to allocate so many legacy interrupts; it seems crazy. It shouldn’t be doing that at all.
I’m assuming both host and guest are running recent, mostly vanilla 5.4/5.5 kernels (for the host/guest drivers)?
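For completeness, the quick checks I’d run on both sides (interface names are placeholders):

uname -r              # kernel version, on host and guest
ethtool -i enp3s0f0   # on the host: should report driver: ixgbe plus its version
ethtool -i enp1s0     # in the guest: should report driver: ixgbevf for an X520 VF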