Yes, I just realized that this is your post.
Yes, with kernels 6.2 and 6.3. I’m also waiting for the fix.
It works for me with the kernel parameter “amdgpu.sg_display=0”.
Thanks for that, will try it today!
Do you use 16 cores, or 8 cores with 16 threads? I get 97% single-thread and exactly 50% multi-thread performance.
I use 4 cores per CCD. I also tried using only one CCD, but the latency doesn’t get any better; currently I’m at about 60 ns memory latency in Aida64.
<vcpu placement='static' current='16'>32</vcpu>
<iothreads>1</iothreads>
<cputune>
<vcpupin vcpu='0' cpuset='1'/>
<vcpupin vcpu='1' cpuset='17'/>
<vcpupin vcpu='2' cpuset='9'/>
<vcpupin vcpu='3' cpuset='25'/>
<vcpupin vcpu='4' cpuset='2'/>
<vcpupin vcpu='5' cpuset='18'/>
<vcpupin vcpu='6' cpuset='10'/>
<vcpupin vcpu='7' cpuset='26'/>
<vcpupin vcpu='8' cpuset='3'/>
<vcpupin vcpu='9' cpuset='19'/>
<vcpupin vcpu='10' cpuset='11'/>
<vcpupin vcpu='11' cpuset='27'/>
<vcpupin vcpu='12' cpuset='4'/>
<vcpupin vcpu='13' cpuset='20'/>
<vcpupin vcpu='14' cpuset='12'/>
<vcpupin vcpu='15' cpuset='28'/>
<emulatorpin cpuset='7,23'/>
<iothreadpin iothread='1' cpuset='9,25'/>
</cputune>
<os firmware='efi'>
<type arch='x86_64' machine='pc-q35-8.0'>hvm</type>
<firmware>
<feature enabled='no' name='enrolled-keys'/>
<feature enabled='yes' name='secure-boot'/>
</firmware>
<loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/x64/OVMF_CODE.secboot.4m.fd</loader>
<nvram template='/usr/share/edk2/x64/OVMF_VARS.4m.fd'>/var/lib/libvirt/qemu/nvram/win11-offg_VARS.fd</nvram>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<hyperv mode='custom'>
<relaxed state='on'/>
<vapic state='on'/>
<spinlocks state='on' retries='8191'/>
<vpindex state='on'/>
<synic state='on'/>
<stimer state='on'/>
<reset state='on'/>
<vendor_id state='on' value='1234567890ab'/>
<frequencies state='on'/>
</hyperv>
<pmu state='off'/>
<vmport state='off'/>
<smm state='on'/>
</features>
<cpu mode='host-passthrough' check='none' migratable='off'>
<topology sockets='1' dies='2' cores='8' threads='2'/>
<cache mode='passthrough'/>
<feature policy='require' name='topoext'/>
<feature policy='require' name='invtsc'/>
</cpu>
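For anyone adapting the pinning above to a different set of cores, here is a minimal Python sketch that generates the same vcpupin lines. It assumes host CPU N and N+16 are SMT siblings (as the lscpu -e output later in this thread shows for that host); check your own machine before reusing it. The function name vcpupin_lines and its parameters are made up for illustration.

```python
def vcpupin_lines(host_cores, sibling_offset=16):
    """Build <vcpupin> lines so each guest thread pair lands on one host core.

    host_cores: host core numbers in the order the guest should see them.
    sibling_offset: assumed distance between a host CPU and its SMT sibling.
    """
    lines = []
    for i, core in enumerate(host_cores):
        # Guest vCPUs 2*i and 2*i+1 are pinned to the two threads of one core.
        for thread, cpu in enumerate((core, core + sibling_offset)):
            lines.append(f"<vcpupin vcpu='{2 * i + thread}' cpuset='{cpu}'/>")
    return lines


# Core order used in the config above: alternating between the two CCDs.
CORES = [1, 9, 2, 10, 3, 11, 4, 12]
PIN_LINES = vcpupin_lines(CORES)
```

Printing "\n".join(PIN_LINES) should reproduce the sixteen vcpupin lines from the config above.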
Works! Finally running on kernel 6.3, which supposedly has optimizations for X3D CPUs.
Cinebench single thread always runs way slower in my VM. Same with Geekbench 6. 85% is optimistic in my case.
You get 60 ns latency inside the VM? That is amazing. I am at about 60 ns on bare metal.
Yes, inside the VM.
You’re right, I hadn’t tested R23 single thread yet. Single-thread R23 is not quite there yet.
I am using a 40-buck air cooler right now. It’s a good one, but when I get my AIO next week, I might reach 2000 points in single thread.
This is again a different configuration. It has the same 1% lows (Metro Exodus) as a single-CCD configuration but better multi-thread performance.
<vcpu placement='static' current='16'>32</vcpu>
<vcpus>
<vcpu id='0' enabled='yes' hotpluggable='no'/>
<vcpu id='1' enabled='yes' hotpluggable='yes'/>
<vcpu id='2' enabled='yes' hotpluggable='yes'/>
<vcpu id='3' enabled='yes' hotpluggable='yes'/>
<vcpu id='4' enabled='yes' hotpluggable='yes'/>
<vcpu id='5' enabled='yes' hotpluggable='yes'/>
<vcpu id='6' enabled='yes' hotpluggable='yes'/>
<vcpu id='7' enabled='yes' hotpluggable='yes'/>
<vcpu id='8' enabled='no' hotpluggable='yes'/>
<vcpu id='9' enabled='no' hotpluggable='yes'/>
<vcpu id='10' enabled='no' hotpluggable='yes'/>
<vcpu id='11' enabled='no' hotpluggable='yes'/>
<vcpu id='12' enabled='no' hotpluggable='yes'/>
<vcpu id='13' enabled='no' hotpluggable='yes'/>
<vcpu id='14' enabled='no' hotpluggable='yes'/>
<vcpu id='15' enabled='no' hotpluggable='yes'/>
<vcpu id='16' enabled='no' hotpluggable='yes'/>
<vcpu id='17' enabled='no' hotpluggable='yes'/>
<vcpu id='18' enabled='no' hotpluggable='yes'/>
<vcpu id='19' enabled='no' hotpluggable='yes'/>
<vcpu id='20' enabled='no' hotpluggable='yes'/>
<vcpu id='21' enabled='no' hotpluggable='yes'/>
<vcpu id='22' enabled='no' hotpluggable='yes'/>
<vcpu id='23' enabled='no' hotpluggable='yes'/>
<vcpu id='24' enabled='yes' hotpluggable='yes'/>
<vcpu id='25' enabled='yes' hotpluggable='yes'/>
<vcpu id='26' enabled='yes' hotpluggable='yes'/>
<vcpu id='27' enabled='yes' hotpluggable='yes'/>
<vcpu id='28' enabled='yes' hotpluggable='yes'/>
<vcpu id='29' enabled='yes' hotpluggable='yes'/>
<vcpu id='30' enabled='yes' hotpluggable='yes'/>
<vcpu id='31' enabled='yes' hotpluggable='yes'/>
</vcpus>
<iothreads>1</iothreads>
<cputune>
<emulatorpin cpuset='8,24'/>
<iothreadpin iothread='1' cpuset='6,22'/>
</cputune>
<os firmware='efi'>
<type arch='x86_64' machine='pc-q35-8.0'>hvm</type>
<firmware>
<feature enabled='no' name='enrolled-keys'/>
<feature enabled='yes' name='secure-boot'/>
</firmware>
<loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/x64/OVMF_CODE.secboot.4m.fd</loader>
<nvram template='/usr/share/edk2/x64/OVMF_VARS.4m.fd'>/var/lib/libvirt/qemu/nvram/win11-offg-clone2_VARS.fd</nvram>
</os>
<features>
<acpi/>
<apic/>
<hyperv mode='custom'>
<relaxed state='on'/>
<vapic state='on'/>
<spinlocks state='on' retries='8191'/>
<vpindex state='on'/>
<synic state='on'/>
<stimer state='on'/>
<reset state='on'/>
<vendor_id state='on' value='1234567890ab'/>
<frequencies state='on'/>
</hyperv>
<pmu state='off'/>
<vmport state='off'/>
<smm state='on'/>
</features>
<cpu mode='host-passthrough' check='none' migratable='off'>
<topology sockets='1' dies='2' cores='8' threads='2'/>
<cache mode='passthrough'/>
<feature policy='require' name='topoext'/>
<feature policy='require' name='invtsc'/>
<feature policy='disable' name='monitor'/>
</cpu>
<clock offset='localtime'>
<timer name='hypervclock' present='yes'/>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
Yes, it looks like you are severely thermal throttled.
I am only passing 16 threads, one from each physical core, yet I get better multi-thread performance.
I am on a $70 air cooler.
<cputune>
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="1"/>
<vcpupin vcpu="2" cpuset="2"/>
<vcpupin vcpu="3" cpuset="3"/>
<vcpupin vcpu="4" cpuset="4"/>
<vcpupin vcpu="5" cpuset="5"/>
<vcpupin vcpu="6" cpuset="6"/>
<vcpupin vcpu="7" cpuset="7"/>
<vcpupin vcpu="8" cpuset="8"/>
<vcpupin vcpu="9" cpuset="9"/>
<vcpupin vcpu="10" cpuset="10"/>
<vcpupin vcpu="11" cpuset="11"/>
<vcpupin vcpu="12" cpuset="12"/>
<vcpupin vcpu="13" cpuset="13"/>
<vcpupin vcpu="14" cpuset="14"/>
<vcpupin vcpu="15" cpuset="15"/>
<iothreadpin iothread="1" cpuset="24-27"/>
<iothreadpin iothread="2" cpuset="28-31"/>
</cputune>
I cannot figure out why my single-thread score is so low compared to yours.
Someone mentioned that this high score is due to skew in the emulated clock. I’m posting it here anyway for your reference. GPU: RTX 4070 Ti
I tried to recreate your results. I just get slightly worse memory latency, but everything else is about the same.
Have you disabled SMT in the BIOS, and which power management settings are you using in Windows?
Test config:
<memoryBacking>
<hugepages/>
</memoryBacking>
<vcpu placement='static'>16</vcpu>
<iothreads>1</iothreads>
<cputune>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='1'/>
<vcpupin vcpu='2' cpuset='2'/>
<vcpupin vcpu='3' cpuset='3'/>
<vcpupin vcpu='4' cpuset='4'/>
<vcpupin vcpu='5' cpuset='5'/>
<vcpupin vcpu='6' cpuset='6'/>
<vcpupin vcpu='7' cpuset='7'/>
<vcpupin vcpu='8' cpuset='8'/>
<vcpupin vcpu='9' cpuset='9'/>
<vcpupin vcpu='10' cpuset='10'/>
<vcpupin vcpu='11' cpuset='11'/>
<vcpupin vcpu='12' cpuset='12'/>
<vcpupin vcpu='13' cpuset='13'/>
<vcpupin vcpu='14' cpuset='14'/>
<vcpupin vcpu='15' cpuset='15'/>
<iothreadpin iothread='1' cpuset='16'/>
</cputune>
<os firmware='efi'>
<type arch='x86_64' machine='pc-q35-8.0'>hvm</type>
<firmware>
<feature enabled='no' name='enrolled-keys'/>
<feature enabled='yes' name='secure-boot'/>
</firmware>
<loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/x64/OVMF_CODE.secboot.4m.fd</loader>
<nvram template='/usr/share/edk2/x64/OVMF_VARS.4m.fd'>/var/lib/libvirt/qemu/nvram/win11-offg-clone4_VARS.fd</nvram>
</os>
<features>
<acpi/>
<apic/>
<hyperv mode='custom'>
<relaxed state='on'/>
<vapic state='on'/>
<spinlocks state='on' retries='8191'/>
<vpindex state='on'/>
<synic state='on'/>
<stimer state='on'/>
<reset state='on'/>
<vendor_id state='on' value='1234567890ab'/>
<frequencies state='on'/>
</hyperv>
<pmu state='off'/>
<vmport state='off'/>
<smm state='on'/>
</features>
<cpu mode='host-passthrough' check='none' migratable='on'>
<topology sockets='1' dies='1' cores='16' threads='1'/>
<cache mode='passthrough'/>
<feature policy='require' name='topoext'/>
<feature policy='require' name='invtsc'/>
<feature policy='require' name='x2apic'/>
</cpu>
<clock offset='localtime'>
<timer name='hypervclock' present='yes'/>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
Result:
I would have guessed that this config uses 16 cores, as Aida64 shows, but the host only uses 8 cores.
Edit: BS, these are the cores and 16-31 are the siblings, but coreinfo shows only 32 MB of L3 cache for your config…??!
Processor signature: 00A60F12
Logical to Physical Processor Map:
*--------------- Physical Processor 0
-*-------------- Physical Processor 1
--*------------- Physical Processor 2
---*------------ Physical Processor 3
----*----------- Physical Processor 4
-----*---------- Physical Processor 5
------*--------- Physical Processor 6
-------*-------- Physical Processor 7
--------*------- Physical Processor 8
---------*------ Physical Processor 9
----------*----- Physical Processor 10
-----------*---- Physical Processor 11
------------*--- Physical Processor 12
-------------*-- Physical Processor 13
--------------*- Physical Processor 14
---------------* Physical Processor 15
Logical Processor to Socket Map:
**************** Socket 0
Logical Processor to NUMA Node Map:
**************** NUMA Node 0
No NUMA nodes.
Logical Processor to Cache Map:
**-------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-------------- Unified Cache 0, Level 2, 1 MB, Assoc 8, LineSize 64
**************** Unified Cache 1, Level 3, 32 MB, Assoc 16, LineSize 64
--**------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--**------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--**------------ Unified Cache 2, Level 2, 1 MB, Assoc 8, LineSize 64
----**---------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
----**---------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
----**---------- Unified Cache 3, Level 2, 1 MB, Assoc 8, LineSize 64
------**-------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
------**-------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
------**-------- Unified Cache 4, Level 2, 1 MB, Assoc 8, LineSize 64
--------**------ Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
--------**------ Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
--------**------ Unified Cache 5, Level 2, 1 MB, Assoc 8, LineSize 64
----------**---- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
----------**---- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
----------**---- Unified Cache 6, Level 2, 1 MB, Assoc 8, LineSize 64
------------**-- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------------**-- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------------**-- Unified Cache 7, Level 2, 1 MB, Assoc 8, LineSize 64
--------------** Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
--------------** Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
--------------** Unified Cache 8, Level 2, 1 MB, Assoc 8, LineSize 64
Logical Processor to Group Map:
**************** Group 0
That’s my config:
Logical to Physical Processor Map:
**-------------- Physical Processor 0 (Hyperthreaded)
--**------------ Physical Processor 1 (Hyperthreaded)
----**---------- Physical Processor 2 (Hyperthreaded)
------**-------- Physical Processor 3 (Hyperthreaded)
--------**------ Physical Processor 4 (Hyperthreaded)
----------**---- Physical Processor 5 (Hyperthreaded)
------------**-- Physical Processor 6 (Hyperthreaded)
--------------** Physical Processor 7 (Hyperthreaded)
Logical Processor to Socket Map:
**************** Socket 0
Logical Processor to NUMA Node Map:
**************** NUMA Node 0
No NUMA nodes.
Logical Processor to Cache Map:
**-------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**-------------- Unified Cache 0, Level 2, 1 MB, Assoc 8, LineSize 64
********-------- Unified Cache 1, Level 3, 32 MB, Assoc 16, LineSize 64
--**------------ Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--**------------ Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--**------------ Unified Cache 2, Level 2, 1 MB, Assoc 8, LineSize 64
----**---------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
----**---------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
----**---------- Unified Cache 3, Level 2, 1 MB, Assoc 8, LineSize 64
------**-------- Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
------**-------- Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
------**-------- Unified Cache 4, Level 2, 1 MB, Assoc 8, LineSize 64
--------**------ Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
--------**------ Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
--------**------ Unified Cache 5, Level 2, 1 MB, Assoc 8, LineSize 64
--------******** Unified Cache 6, Level 3, 32 MB, Assoc 16, LineSize 64
----------**---- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
----------**---- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
----------**---- Unified Cache 7, Level 2, 1 MB, Assoc 8, LineSize 64
------------**-- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------------**-- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------------**-- Unified Cache 8, Level 2, 1 MB, Assoc 8, LineSize 64
--------------** Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
--------------** Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
--------------** Unified Cache 9, Level 2, 1 MB, Assoc 8, LineSize 64
Logical Processor to Group Map:
**************** Group 0
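To compare the two Coreinfo dumps without counting asterisks by hand, a small Python sketch can pull out which logical processors share each L3 cache. It assumes the line format shown above (processor mask first, then the cache description); the name l3_sharing is made up for illustration, and the sample lines are copied from the second dump.

```python
def l3_sharing(coreinfo_text):
    """Return one list per L3 cache with the logical processors that share it."""
    result = []
    for line in coreinfo_text.splitlines():
        if "Level 3" not in line:
            continue
        mask = line.split()[0]  # e.g. "********--------"
        result.append([i for i, ch in enumerate(mask) if ch == "*"])
    return result


SAMPLE = """\
********-------- Unified Cache 1, Level 3, 32 MB, Assoc 16, LineSize 64
--------******** Unified Cache 6, Level 3, 32 MB, Assoc 16, LineSize 64
"""

L3_GROUPS = l3_sharing(SAMPLE)  # two groups: processors 0-7 and 8-15
```

Fed the first dump instead, the same function would return a single group covering all 16 logical processors, which matches the "only 32 MB L3" observation.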
Correct.
I don’t disable SMT in the BIOS, but I intend to pass just 1 thread per physical core to the VM. I don’t need multicore efficiency. I just use the VM for gaming.
Here is the core activity when I run multi-thread Cinebench in the Windows guest.
Here is the output of lscpu -e, which confirms that 16-31 are siblings and mostly idle.
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 6070.0000 400.0000 5224.9160
1 0 0 1 1:1:1:0 yes 6070.0000 400.0000 5224.9209
2 0 0 2 2:2:2:0 yes 6070.0000 400.0000 5224.9282
3 0 0 3 3:3:3:0 yes 6070.0000 400.0000 5220.9941
4 0 0 4 4:4:4:0 yes 6070.0000 400.0000 5220.9800
5 0 0 5 5:5:5:0 yes 6070.0000 400.0000 5220.9932
6 0 0 6 6:6:6:0 yes 6070.0000 400.0000 5220.9951
7 0 0 7 7:7:7:0 yes 6070.0000 400.0000 5220.9771
8 0 0 8 8:8:8:1 yes 6070.0000 400.0000 5090.8501
9 0 0 9 9:9:9:1 yes 6070.0000 400.0000 5090.8262
10 0 0 10 10:10:10:1 yes 6070.0000 400.0000 5090.8442
11 0 0 11 11:11:11:1 yes 6070.0000 400.0000 5090.8462
12 0 0 12 12:12:12:1 yes 6070.0000 400.0000 5090.8530
13 0 0 13 13:13:13:1 yes 6070.0000 400.0000 5090.8408
14 0 0 14 14:14:14:1 yes 6070.0000 400.0000 5090.8481
15 0 0 15 15:15:15:1 yes 6070.0000 400.0000 5090.8379
16 0 0 0 0:0:0:0 yes 6070.0000 400.0000 5214.4019
17 0 0 1 1:1:1:0 yes 6070.0000 400.0000 400.0000
18 0 0 2 2:2:2:0 yes 6070.0000 400.0000 400.0000
19 0 0 3 3:3:3:0 yes 6070.0000 400.0000 400.0000
20 0 0 4 4:4:4:0 yes 6070.0000 400.0000 400.0000
21 0 0 5 5:5:5:0 yes 6070.0000 400.0000 400.0000
22 0 0 6 6:6:6:0 yes 6070.0000 400.0000 400.0000
23 0 0 7 7:7:7:0 yes 6070.0000 400.0000 400.0000
24 0 0 8 8:8:8:1 yes 6070.0000 400.0000 400.0000
25 0 0 9 9:9:9:1 yes 6070.0000 400.0000 5091.0811
26 0 0 10 10:10:10:1 yes 6070.0000 400.0000 400.0000
27 0 0 11 11:11:11:1 yes 6070.0000 400.0000 5092.7329
28 0 0 12 12:12:12:1 yes 6070.0000 400.0000 400.0000
29 0 0 13 13:13:13:1 yes 6070.0000 400.0000 400.0000
30 0 0 14 14:14:14:1 yes 6070.0000 400.0000 400.0000
31 0 0 15 15:15:15:1 yes 6070.0000 400.0000 400.0000
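For anyone who wants to derive the sibling pairs from lscpu -e instead of reading them off by eye, here is a minimal Python sketch. It assumes the whitespace-separated layout shown above, with CPU and CORE columns in the header. The function name siblings_by_core is made up for illustration, and the sample rows are copied from the output above.

```python
from collections import defaultdict


def siblings_by_core(lscpu_text):
    """Group logical CPU numbers by the CORE column of `lscpu -e` output."""
    rows = [ln.split() for ln in lscpu_text.strip().splitlines() if ln.strip()]
    header, data = rows[0], rows[1:]
    cpu_col, core_col = header.index("CPU"), header.index("CORE")
    groups = defaultdict(list)
    for row in data:
        groups[int(row[core_col])].append(int(row[cpu_col]))
    return dict(groups)


SAMPLE = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 6070.0000 400.0000 5224.9160
8 0 0 8 8:8:8:1 yes 6070.0000 400.0000 5090.8501
16 0 0 0 0:0:0:0 yes 6070.0000 400.0000 5214.4019
24 0 0 8 8:8:8:1 yes 6070.0000 400.0000 400.0000
"""

SIBLINGS = siblings_by_core(SAMPLE)  # {0: [0, 16], 8: [8, 24]}
```

Taking the first entry of each list gives one thread per physical core, which is the set passed to the VM in the pinning above.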
Additionally, I use amd_pstate=passive as a boot parameter. I guess it may affect single-thread performance. I verified that the boost of the host threads is not affected, but the VM threads do not seem to boost as high. I chose to keep this parameter as it does lower the idle temperature.
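A quick way to confirm which parameters the running kernel actually booted with is to read /proc/cmdline. Below is a minimal Python sketch; the helper name kernel_params is made up, and the simple whitespace split is an assumption that ignores quoted values containing spaces.

```python
def kernel_params(cmdline):
    """Parse a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Example string only; on a real host, read the text from /proc/cmdline.
EXAMPLE = "quiet amd_pstate=passive amdgpu.sg_display=0"
PARAMS = kernel_params(EXAMPLE)
```

On a live system, kernel_params(open("/proc/cmdline").read()).get("amd_pstate") would return the value the kernel was booted with, or None if the parameter is absent.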
Geekbench 6 on the host.
https://browser.geekbench.com/v6/cpu/1334085
Geekbench 6 on the guest.
https://browser.geekbench.com/v6/cpu/1390536
PS: Curve Optimizer may play a role. What CO value do you use? How much Max Boost Frequency Offset do you use?
I set -15 on the best 4 cores and -10 on the rest of them. +200 MHz Max Offset.
Just PBO enabled and CPU voltage on auto. I will deal with that as soon as I get my AIO.
The highest load I saw in Ryzen Master with my cooler was 150 W, so there are still 20 watts left.
I found that going from the Ultimate power profile to Balanced in Windows is worth 100 more points of single-core performance in R23.
OK, it works, but not fully…
So, I no longer have that white overlay over my whole desktop once I log in, but I cannot run my VMs…
One that was working perfectly with 6.1 now fails to start with this message:
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2023-05-28T13:02:33.049123Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0e:00.0","id":"hostdev4","bus":"pci.6","addr":"0x0","rombar":0}: vfio 0000:0e:00.0: group 25 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.
I am passing through a NIC that is in a group with other devices, and there is no way I can pass those through too… It works with 6.1, though.
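To see which devices share the group named in the error and which driver each one is bound to, you can walk sysfs. This is a minimal Python sketch, assuming the usual /sys/kernel/iommu_groups/<group>/devices layout; the function names are made up for illustration, and the check is deliberately simple (it flags anything not bound to vfio-pci, including bridges that may not actually need rebinding).

```python
from pathlib import Path


def iommu_group_devices(group, root="/sys/kernel/iommu_groups"):
    """Map each PCI address in an IOMMU group to its bound driver (or None)."""
    devices = {}
    for dev in sorted((Path(root) / str(group) / "devices").iterdir()):
        driver = dev / "driver"  # symlink to the driver directory when bound
        devices[dev.name] = driver.resolve().name if driver.exists() else None
    return devices


def not_bound_to_vfio(group, root="/sys/kernel/iommu_groups"):
    """List devices in the group whose driver is anything other than vfio-pci."""
    return [addr for addr, drv in iommu_group_devices(group, root).items()
            if drv != "vfio-pci"]
```

Calling not_bound_to_vfio(25) on the affected host would list the devices in group 25 that are candidates for the "not viable" complaint.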
Sorry, no experience with that. I’m guessing you’ve already used the ACS override patch?
I got lucky with my board. It’s optimal for VFIO; everything is in a separate group. Thanks to @jxdking for the recommendation. I had only looked at X670E and didn’t have B650 on the radar.