Long story short, I have a gaming VM that I’m trying to get maximum performance out of. One of the ways I’m achieving that is by steering all interrupt sources away from the cores that run the VM, so interrupts only land on the host cores. (I’m using cores 0-3 for host stuff.)
This is easy enough:
# Send all interrupts to cores 0-3 by default
echo 0000000f | tee /proc/irq/default_smp_affinity
# Map all current interrupt sources to cores 0-3
echo 0-3 | tee /proc/irq/*/smp_affinity_list
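To double-check that the masks actually took, you can read everything straight back:

# Prints filename:value for every IRQ's current affinity list
grep . /proc/irq/*/smp_affinity_list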
This works for everything EXCEPT the interrupts coming from my NVMe SSDs, which appear to be pre-mapped evenly across all my cores.
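You can see the per-queue mapping with something like this (a rough sketch; the IRQ numbers will differ per system):

# Print each nvme interrupt and the cores it's currently allowed to fire on
for irq in $(grep nvme /proc/interrupts | cut -d: -f1); do
    echo "IRQ $irq -> $(cat /proc/irq/$irq/smp_affinity_list)"
done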
If I try to change the affinity of these interrupt sources, I get this error:
bash: line 1: echo: write error: Input/output error
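I’m guessing the kernel considers these affinities fixed somehow. If your kernel was built with CONFIG_GENERIC_IRQ_DEBUGFS (an assumption on my part; not every distro kernel enables it), the per-IRQ debug info should show why a mask can’t be changed. The IRQ number below is just a placeholder:

# Needs root and a mounted debugfs; look for IRQD_AFFINITY_MANAGED in the flags
grep -i managed /sys/kernel/debug/irq/irqs/45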
Is there simply nothing I can do about these? Any time disk activity goes up, my virtualization cores get pulled away to service these interrupts, causing stutters in my VM.
I’m on kernel 5.12.1.
Hardware:
Gigabyte Aorus X570 Master
Ryzen 5950X
3x Corsair Force Series MP510 960GB (LVM on top of md raid-0)
I asked this question on Serverfault months ago but I haven’t received any answers, so I’m hopeful somebody on this forum may have some insight. I just started a bounty on the Serverfault post, if somebody wants to grab 250 points.
Have you considered hitting the kernel mailing list with this question? I know mailing lists are outdated and cumbersome to deal with, but maybe you can find and talk to the person who actually wrote the nvme interrupt code.
A quick Google search suggests that Keith Busch might be familiar with where to head.
If you find a solution, we’d appreciate an update!
From the irqbalance man page, on IRQBALANCE_BANNED_CPUS:

Provides a mask of CPUs which irqbalance should ignore and never assign interrupts to. If not specified, irqbalance uses the mask of isolated and adaptive-ticks CPUs on the system as the default value.

Have you looked into setting this environment variable?
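If it helps, I believe it goes in irqbalance’s environment file, something like this (paths vary by distro, and the mask is my guess at what you want: fffffff0 bans CPUs 4-31, leaving only cores 0-3 eligible):

# /etc/default/irqbalance (Debian/Ubuntu) or /etc/sysconfig/irqbalance (Fedora/RHEL)
# Hex mask of CPUs irqbalance must never assign interrupts to
IRQBALANCE_BANNED_CPUS=fffffff0

# Then restart the daemon so it picks the setting up
systemctl restart irqbalance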
That actually hadn’t occurred to me, I’ll do that!
I don’t think that will do anything. As far as I understand, irqbalance is a userland daemon that automatically balances IRQs across cores using the same /proc/irq interface. If I can’t do it manually, I don’t see why irqbalance would succeed.
I’ll give it a shot anyway though, just to be sure.
Meanwhile though, I found the documentation for the isolcpus kernel command line option:
managed_irq
Isolate from being targeted by managed interrupts which have an interrupt mask containing isolated CPUs. The affinity of managed interrupts is handled by the kernel and cannot be changed via the /proc/irq/* interfaces.
So I tried adding this to my kernel command line: isolcpus=managed_irq,8-31. Aaaaand… it did absolutely nothing. Interrupts are still spread evenly across all my cores, even the ones whose affinity I’m allowed to change.
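In case anyone wants to reproduce this, here’s how I’d sanity-check it after a reboot (the IRQ number is just a placeholder):

# The running kernel's command line; should contain isolcpus=managed_irq,8-31
cat /proc/cmdline
# Spot-check where one of the nvme IRQs is actually allowed to fire now
cat /proc/irq/45/effective_affinity_list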
So I have very little working knowledge of IRQ stuff (I only started looking into it when you brought it up the other day), but I wonder if the IRQ machinery is vestigial for NVMe and we’re looking in the wrong place, since some adaptive/hybrid polling scheme might be what’s actually in use now.
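If you want to poke at that theory, I think these are the relevant knobs (a sketch; nvme0n1 is just an example device):

# 1 means polled I/O is enabled for this queue, 0 means it still uses interrupts
cat /sys/block/nvme0n1/queue/io_poll
# The nvme driver only allocates dedicated poll queues when this is set above 0
cat /sys/module/nvme/parameters/poll_queues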
Here’s what brought this up:
Unfortunately I don’t know enough yet to form a better question.