Threadripper 5000 VM performance issues - do I need to worry about CCX'es?

I’m setting up a TR 5965WX (24c/48t) based hypervisor and I’m having some consistency issues. Performance inside VMs is all over the place. I tried pinning CPUs but that made it even worse, although there's a chance I did a very bad job at pinning them.

I have added the following config to my VM:

<vcpu placement="static">24</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="6"/>
    <vcpupin vcpu="1" cpuset="30"/>
    <vcpupin vcpu="2" cpuset="7"/>
    <vcpupin vcpu="3" cpuset="31"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="32"/>
    <vcpupin vcpu="6" cpuset="9"/>
    <vcpupin vcpu="7" cpuset="33"/>
    <vcpupin vcpu="8" cpuset="10"/>
    <vcpupin vcpu="9" cpuset="34"/>
    <vcpupin vcpu="10" cpuset="11"/>
    <vcpupin vcpu="11" cpuset="35"/>
    <vcpupin vcpu="12" cpuset="12"/>
    <vcpupin vcpu="13" cpuset="36"/>
    <vcpupin vcpu="14" cpuset="13"/>
    <vcpupin vcpu="15" cpuset="37"/>
    <vcpupin vcpu="16" cpuset="14"/>
    <vcpupin vcpu="17" cpuset="38"/>
    <vcpupin vcpu="18" cpuset="15"/>
    <vcpupin vcpu="19" cpuset="39"/>
    <vcpupin vcpu="20" cpuset="16"/>
    <vcpupin vcpu="21" cpuset="40"/>
    <vcpupin vcpu="22" cpuset="17"/>
    <vcpupin vcpu="23" cpuset="41"/>
    <emulatorpin cpuset="2-5,26-29"/>
  </cputune>

and performance is VERY bad. I believe this config could be hitting across two CCXes, but at the same time I'm doing a lot of PCIe passthrough and I'm not sure whether the TR 5000 architecture still binds certain PCIe slots to certain CCXes (i.e. whether it makes a difference which PCIe devices are passed through to the VM in the context of which CCXes the VM is pinned to).
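
In case it's relevant, this is roughly how I've been checking device locality from sysfs (the PCI address 0000:41:00.0 below is just a placeholder for one of the passed-through devices, substitute your own):

# NUMA node the device is attached to (-1 means no specific affinity)
cat /sys/bus/pci/devices/0000:41:00.0/numa_node

# host CPUs considered local to the device
cat /sys/bus/pci/devices/0000:41:00.0/local_cpulist

# full cache/PCIe topology overview (from the hwloc package)
lstopo-no-graphics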

Does anyone have experience with performance optimizations for TR 5000?

Here’s my lscpu -e:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 7021.0928 1800.0000 2397.8511
  1    0      0    1 1:1:1:0          yes 7021.0928 1800.0000 1914.3700
  2    0      0    2 2:2:2:0          yes 7021.0928 1800.0000 1799.9821
  3    0      0    3 3:3:3:0          yes 7021.0928 1800.0000 1916.1400
  4    0      0    4 4:4:4:0          yes 7021.0928 1800.0000 1800.0000
  5    0      0    5 5:5:5:0          yes 7021.0928 1800.0000 2112.3550
  6    0      0    6 8:8:8:1          yes 7021.0928 1800.0000 1800.0000
  7    0      0    7 9:9:9:1          yes 7021.0928 1800.0000 2804.7649
  8    0      0    8 10:10:10:1       yes 7021.0928 1800.0000 1800.0000
  9    0      0    9 11:11:11:1       yes 7021.0928 1800.0000 1800.0000
 10    0      0   10 12:12:12:1       yes 7021.0928 1800.0000 1800.0000
 11    0      0   11 13:13:13:1       yes 7021.0928 1800.0000 1800.0000
 12    0      0   12 16:16:16:2       yes 7021.0928 1800.0000 1800.0000
 13    0      0   13 17:17:17:2       yes 7021.0928 1800.0000 1800.0000
 14    0      0   14 18:18:18:2       yes 7021.0928 1800.0000 1800.0000
 15    0      0   15 19:19:19:2       yes 7021.0928 1800.0000 3042.5891
 16    0      0   16 20:20:20:2       yes 7021.0928 1800.0000 2800.0000
 17    0      0   17 21:21:21:2       yes 7021.0928 1800.0000 1800.0000
 18    0      0   18 24:24:24:3       yes 7021.0928 1800.0000 1800.0000
 19    0      0   19 25:25:25:3       yes 7021.0928 1800.0000 1800.0000
 20    0      0   20 26:26:26:3       yes 7021.0928 1800.0000 1798.4550
 21    0      0   21 27:27:27:3       yes 7021.0928 1800.0000 1800.0000
 22    0      0   22 28:28:28:3       yes 7021.0928 1800.0000 1797.6340
 23    0      0   23 29:29:29:3       yes 7021.0928 1800.0000 1800.0000
 24    0      0    0 0:0:0:0          yes 7021.0928 1800.0000 2169.8999
 25    0      0    1 1:1:1:0          yes 7021.0928 1800.0000 1800.0000
 26    0      0    2 2:2:2:0          yes 7021.0928 1800.0000 1800.0000
 27    0      0    3 3:3:3:0          yes 7021.0928 1800.0000 2143.9541
 28    0      0    4 4:4:4:0          yes 7021.0928 1800.0000 2145.1350
 29    0      0    5 5:5:5:0          yes 7021.0928 1800.0000 1800.0000
 30    0      0    6 8:8:8:1          yes 7021.0928 1800.0000 1800.0000
 31    0      0    7 9:9:9:1          yes 7021.0928 1800.0000 2800.1069
 32    0      0    8 10:10:10:1       yes 7021.0928 1800.0000 1800.0000
 33    0      0    9 11:11:11:1       yes 7021.0928 1800.0000 1800.0000
 34    0      0   10 12:12:12:1       yes 7021.0928 1800.0000 1851.3370
 35    0      0   11 13:13:13:1       yes 7021.0928 1800.0000 1800.0000
 36    0      0   12 16:16:16:2       yes 7021.0928 1800.0000 1800.0000
 37    0      0   13 17:17:17:2       yes 7021.0928 1800.0000 1800.0000
 38    0      0   14 18:18:18:2       yes 7021.0928 1800.0000 1800.0000
 39    0      0   15 19:19:19:2       yes 7021.0928 1800.0000 1800.0000
 40    0      0   16 20:20:20:2       yes 7021.0928 1800.0000 1800.0000
 41    0      0   17 21:21:21:2       yes 7021.0928 1800.0000 1800.0000
 42    0      0   18 24:24:24:3       yes 7021.0928 1800.0000 1800.0000
 43    0      0   19 25:25:25:3       yes 7021.0928 1800.0000 1798.7371
 44    0      0   20 26:26:26:3       yes 7021.0928 1800.0000 1796.5959
 45    0      0   21 27:27:27:3       yes 7021.0928 1800.0000 1799.0460
 46    0      0   22 28:28:28:3       yes 7021.0928 1800.0000 1791.1230
 47    0      0   23 29:29:29:3       yes 7021.0928 1800.0000 1796.6670

By inconsistent performance I mean that when, let's say, I'm playing video in software (x11 driver in mplayer), it works fine for like 40 seconds, then suddenly the whole VM falls on its face for 3-4 seconds, and then everything is alright again for the next dozen seconds. I have a few “benchmarks” that are quite good at showing inconsistencies in CPU performance, like playing a giant GIF file in gwenview (where the GIF framerate is CPU bound, so when CPU performance is inconsistent the fps varies strongly). Overall performance is okay, but you can sometimes feel that something just trips for a second.

I experimented with

<vcpu placement="static">24</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="12"/>
    <vcpupin vcpu="1" cpuset="36"/>
    <vcpupin vcpu="2" cpuset="13"/>
    <vcpupin vcpu="3" cpuset="37"/>
    <vcpupin vcpu="4" cpuset="14"/>
    <vcpupin vcpu="5" cpuset="38"/>
    <vcpupin vcpu="6" cpuset="15"/>
    <vcpupin vcpu="7" cpuset="39"/>
    <vcpupin vcpu="8" cpuset="16"/>
    <vcpupin vcpu="9" cpuset="40"/>
    <vcpupin vcpu="10" cpuset="17"/>
    <vcpupin vcpu="11" cpuset="41"/>
    <vcpupin vcpu="12" cpuset="18"/>
    <vcpupin vcpu="13" cpuset="42"/>
    <vcpupin vcpu="14" cpuset="19"/>
    <vcpupin vcpu="15" cpuset="43"/>
    <vcpupin vcpu="16" cpuset="20"/>
    <vcpupin vcpu="17" cpuset="44"/>
    <vcpupin vcpu="18" cpuset="21"/>
    <vcpupin vcpu="19" cpuset="45"/>
    <vcpupin vcpu="20" cpuset="22"/>
    <vcpupin vcpu="21" cpuset="46"/>
    <vcpupin vcpu="22" cpuset="23"/>
    <vcpupin vcpu="23" cpuset="47"/>
  </cputune>

and it's more or less just as bad.
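
For reference, based on the lscpu output above, a pinning that stays entirely inside a single CCX (cores 6-11 plus their SMT siblings 30-35, i.e. L3 group 1) would look roughly like this (it only gives 12 vCPUs and I haven't properly tested it yet, so treat it as a sketch):

<vcpu placement="static">12</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="6"/>
    <vcpupin vcpu="1" cpuset="30"/>
    <vcpupin vcpu="2" cpuset="7"/>
    <vcpupin vcpu="3" cpuset="31"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="32"/>
    <vcpupin vcpu="6" cpuset="9"/>
    <vcpupin vcpu="7" cpuset="33"/>
    <vcpupin vcpu="8" cpuset="10"/>
    <vcpupin vcpu="9" cpuset="34"/>
    <vcpupin vcpu="10" cpuset="11"/>
    <vcpupin vcpu="11" cpuset="35"/>
    <!-- emulator threads kept on CCX 0 (cores 0-5 / 24-29), away from the guest vCPUs -->
    <emulatorpin cpuset="0-5,24-29"/>
  </cputune>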

After further troubleshooting I think it's related to storage issues. When I copy test files to local storage (as in the VM root disk, which is a standard raw SATA image), performance is quite consistent. However, accessing drives that are attached directly to the VM via SATA controller PCIe passthrough results in weird inconsistencies.
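
The next thing I want to rule out is where the passed-through SATA controller's interrupts actually land on the host, i.e. whether they end up on the same cores the VM is pinned to. Roughly like this (the vfio entries only show up while the VM is running, and the IRQ number 123 is just a placeholder taken from the first column of /proc/interrupts):

# per-CPU interrupt counts for the assigned device (vfio-msi/msix lines)
grep -i vfio /proc/interrupts

# which host CPUs a given interrupt line is allowed to run on
cat /proc/irq/123/smp_affinity_list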