I have been on this site for years, but never created an account until now. Please let me know if I am posting in the wrong forum.
I recently built a new, way overkill, Threadripper machine that is a cross between a gaming server and a workstation. I have local and remote access to the VMs it runs.
My issue is regarding the idle power usage and what I can change in the BIOS or in Proxmox to help lower it, but still get decent boost clocks for my gaming VMs. Idle wattage right now is 230 watts with 1 VM sitting at idle. I had an Epyc 7443p machine that idled at basically half of that with 6 VMs running.
I am using the amd_pstate driver in active mode and I've played around with the Proxmox scaling governors and EPP hint, but they don't seem to do much for idle power. I do see idle wattage at 180 watts after rebooting with no VMs running. On boot, the VFIO driver is bound to my 7900xtx and Nouveau is used on the gt1030. Booting the VM that the 7900xtx is passed through to bumps the wattage up to 230 watts, despite the card getting its driver in the guest OS and the GPU's fans spinning down. Radeon software says the GPU is at around 30 watts, but my experience with prior builds has been that system wattage goes down after the GPU is passed through and gets its actual driver in the guest OS.
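For anyone following along, here is a minimal sketch of how the driver mode, governor and EPP hint mentioned above could be read back from sysfs on the host. The sysfs file names are the standard cpufreq/amd_pstate ones; the `show_pstate` function name and the `SYSFS_ROOT` override are made up for this example and are not part of any tool.

```shell
#!/bin/sh
# Print the amd_pstate mode, cpufreq driver, governor and EPP hint for one CPU.
# SYSFS_ROOT is a hypothetical override used only so the function can be
# pointed at a copy of the tree; on a real host it defaults to /sys.
show_pstate() {
    root="${SYSFS_ROOT:-/sys}"
    cpu="${1:-0}"
    base="$root/devices/system/cpu"
    for f in "amd_pstate/status" \
             "cpu$cpu/cpufreq/scaling_driver" \
             "cpu$cpu/cpufreq/scaling_governor" \
             "cpu$cpu/cpufreq/energy_performance_preference"; do
        if [ -r "$base/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$base/$f")"
        else
            # File missing: different driver, older kernel, or not a Linux host.
            printf '%s: (not available)\n' "$f"
        fi
    done
}

show_pstate 0
```

Running it before and after changing the governor or EPP hint is a quick way to confirm that a setting actually stuck.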
Any help would be much appreciated.
Hardware:
CPU - 7960X
Mobo - Asrock TRX50 WS
RAM - 2x32GB 6000MHz ECC (soon to be 4x32GB)
OS Drive - Crucial 250GB Sata (Proxmox, LVM)
VM OS Drives - Samsung 960GB PM9A3 x2 (Windows 11 x2 and EndeavourOS, LVM)
VM Storage Drives - Samsung 2TB 980 Pro (LVM), 1TB 860 Evo, 1TB 870 Evo
GPU1 - 7900XTX passed through to Windows 11 #1 (for gaming)
GPU2 - gt1030 passed through to Windows 11 #2 (audio production, editing)
GPU3 (currently not installed) - 7700xt passed through to EndeavourOS (for gaming)
PCIe device - Startech PEXUSB3S44V (4 USB host controllers for passthrough)
Maybe it's overhead of the VM itself? What happens if you boot up a similar VM but with no GPU passed through?
Overall I'm not too surprised by the idle power. My 7950x box takes 100W idle, 120W with light load (like playing YouTube or so). This is already down 30W from what it was recently, I think thanks to kernel upgrades. The more chiplets, the higher the idle use, AFAIK.
I just booted up a second VM with the gt1030 passed through and it went from 230 watts to basically 240 watts. I can try rebooting the host and then booting one of the VMs without a GPU, but even 180 watts with no VMs running seems high to me. Like I said, I had a lot of this hardware on an Epyc 7443p platform and it idled at around 120 to 130 watts with 6 HDDs spun up.
Think it might be possible that running the 6000MHz EXPO profile on my RAM is hogging power? I didn't notice it being so high until after I enabled it, and I don't have the power usage logged yet on this build to go back and look.
It is high, but in line with my expectations given that a 7950x pulls 100W. That is with 3 M.2s, one 3060 and a ConnectX-3 NIC.
It might, a bit. I have no registered memory, but on my system there is a small difference between 6000 MT/s and 5200 MT/s JEDEC speeds. Also, VSOC will be raised with EXPO, which will increase power a bit.
I just rebooted and have no VMs running. Idle power is 180 - 186 watts. Prior to rebooting, I enabled the multi-refresh rate thing in Windows so it renders at 60Hz on the 7900xtx instead of 120Hz when there is nothing going on. It dropped about 12 watts with the VM running.
I really feel like something isn't tuned properly here. I am not all that versed in how the C-states and stuff work, but I just noticed when running powertop that I only have C1 and C2 as states on the Idle Stats tab. Shouldn't there be more than that?
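As a cross-check on what powertop shows, the idle states the kernel actually exposes can be listed straight from the cpuidle sysfs directory. This is only a sketch: the `name` and `disable` files are standard cpuidle attributes, while the `list_cstates` function name and the `SYSFS_ROOT` override are invented for the example.

```shell
#!/bin/sh
# List the idle states the kernel exposes for one CPU, with their disable flag.
# SYSFS_ROOT is a hypothetical override for pointing at a copy of the tree.
list_cstates() {
    root="${SYSFS_ROOT:-/sys}"
    cpu="${1:-0}"
    for d in "$root/devices/system/cpu/cpu$cpu/cpuidle"/state*; do
        # Skip the unexpanded glob when no cpuidle states exist.
        [ -r "$d/name" ] || continue
        printf '%s %s disabled=%s\n' \
            "${d##*/}" \
            "$(cat "$d/name")" \
            "$(cat "$d/disable" 2>/dev/null || echo '?')"
    done
}

list_cstates 0
```

If the list here matches what powertop reports, the set of states comes from the firmware/kernel side rather than from powertop's display.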
I don't know what is different now, but the one VM is pulling 300 watts at idle with the 7900xtx added back in. Note, the above config was prior to adding it and the USB controller back in, but the wattage was super high both ways. The Windows guest is in Balanced power mode and it appears that the GPU is clocking down. Task Manager shows very little usage, with Steam being the main thing using RAM, but next to no CPU usage.
See below. Note that powertop is just showing the top ~10 rows.
VM Running
root@pve3:~# cpupower frequency-info
analyzing CPU 0:
driver: amd-pstate-epp
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 545 MHz - 5.67 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 545 MHz and 545 MHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 545 MHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
Boost States: 0
Total States: 3
Pstate-P0: 4200MHz
Pstate-P1: 2200MHz
Pstate-P2: 1500MHz
powertop
Summary: 3936.6 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 8.4% CPU use
Every 2.0s: grep "cpu MHz" /proc/cpuinfo pve3: Thu Dec 19 14:08:09 2024
cpu MHz : 545.000
cpu MHz : 4799.856
cpu MHz : 4785.872
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 4794.211
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 4755.990
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 4694.330
cpu MHz : 4767.088
cpu MHz : 4771.754
cpu MHz : 4779.843
cpu MHz : 545.000
cpu MHz : 4785.342
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 4799.805
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 4796.127
cpu MHz : 4790.330
cpu MHz : 4798.965
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 4783.829
cpu MHz : 4781.904
cpu MHz : 4776.916
cpu MHz : 545.000
cpu MHz : 4762.992
cpu MHz : 4780.768
cpu MHz : 4773.240
cpu MHz : 545.000
cpu MHz : 4785.842
cpu MHz : 545.000
cpu MHz : 545.000
VM not running
root@pve3:~# cpupower frequency-info
analyzing CPU 0:
driver: amd-pstate-epp
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 545 MHz - 5.67 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 545 MHz and 545 MHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 545 MHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
Boost States: 0
Total States: 3
Pstate-P0: 4200MHz
Pstate-P1: 2200MHz
Pstate-P2: 1500MHz
powertop
Summary: 207.1 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 5.4% CPU use
Every 2.0s: grep "cpu MHz" /proc/cpuinfo pve3: Thu Dec 19 14:06:31 2024
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 2599.967
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 2597.811
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 2597.146
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 545.000
cpu MHz : 2666.895
cpu MHz : 545.000
cpu MHz : 545.000
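Comparing two long `cpu MHz` dumps like the ones above by eye is tedious, so a small summary can help. The sketch below counts how many logical CPUs report a clock above a threshold; the `count_boosted` name and the 1000 MHz default are arbitrary choices for illustration, not anything taken from cpupower or powertop.

```shell
#!/bin/sh
# Count logical CPUs whose reported clock is above a threshold (in MHz).
# Usage: count_boosted [threshold_mhz] [cpuinfo_file]
count_boosted() {
    threshold="${1:-1000}"
    file="${2:-/proc/cpuinfo}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    awk -F: -v t="$threshold" '
        # Lines look like "cpu MHz : 545.000"; the value is the second field.
        /^cpu MHz/ { total++; if ($2 + 0 > t) high++ }
        END { printf "%d of %d logical CPUs above %d MHz\n", high, total, t }
    ' "$file"
}
```

Something like `watch -n 2 'sh -c ". ./count_boosted.sh; count_boosted 1000"'` (with the function saved to a file of that hypothetical name) would give a one-line view of how many threads the VM is keeping awake.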
Yeah, clearly running the VM is making the CPU run at high frequencies more often. I see this on my system too (120W to 170W). Switching to the powersave governor may help a bit: cpupower frequency-set -g powersave. Maybe someone with a TRX50 system can chime in, but compared to my 7950X system, I think your power use makes sense. With a VM running I get 170W, you get 230W, but on a 4 CCD vs. 2 CCD CPU. It is what it is. Idle power is the Achilles heel of AMD's current architecture.
Damn, I really hope I can find a way to drop it down. I'm already on the powersave governor.
I am just doing things remotely right now, but I am going to poke around the BIOS tonight to see if anything looks off. I did a BIOS update yesterday, which seemed to set a lot of stuff to default. I went back through and changed the virtualization stuff, along with enabling EXPO, which I never had enabled prior to the BIOS update. I'm wondering if I left C-States disabled or something, 'cause I swear it was idling lower than this a few days ago.
FWIW, I briefly had a 7950X and Supermicro H13SAE-MF board prior to this build, but it was using the iGPU in place of the gt1030. I had the 7700xt and 7900xtx both passed through at PCIe 4.0 x8 each. That was idling at less than 100 watts, so you might still have a little efficiency on the table. I never ran R23 on the host, but 6c12t in the Windows VM would still hit 5.4GHz and was getting a 20k score, so it was doing well at idle and load with all the powersave stuff set up and a lower PPT set in the BIOS.
I was getting frustrated trying to make things work with not enough PCIe lanes on AM5, so I requested a return and went with Threadripper.
Yeah, I could optimize power use for sure. Mostly I have 2 crappy Kingston NV2 SSDs that seem to not support ASPM and have high power use at idle. Plus a pretty old ConnectX-3 NIC, which does support ASPM but is probably still pretty inefficient. Just one device without ASPM will keep the CPU from idling very deep, AFAIK.
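To see which devices actually have ASPM enabled, the link-control lines in `lspci -vv` output can be summarized per device. This is a rough sketch that assumes the usual `LnkCtl: ASPM ...;` text layout printed by lspci; the `aspm_report` function name is made up, and lspci generally needs root to show the PCIe capability block.

```shell
#!/bin/sh
# Summarize the ASPM state per PCIe device from `lspci -vv` output on stdin.
# Usage (as root): lspci -vv | aspm_report
aspm_report() {
    awk '
        # Device header lines start with the bus address (hex digits).
        /^[0-9a-f]/ { dev = $0 }
        # "LnkCtl:" carries the current ASPM setting; "LnkCtl2:" does not match.
        /LnkCtl:/ && /ASPM/ {
            state = $0
            sub(/.*ASPM /, "", state)   # drop everything up to "ASPM "
            sub(/;.*/, "", state)       # drop the rest of the line
            printf "%s => ASPM %s\n", dev, state
        }
    '
}
```

The kernel-wide policy is a separate knob and can be read with `cat /sys/module/pcie_aspm/parameters/policy`, where the active policy is shown in square brackets.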
Your cpupower output shows governor performance though?
Ah, yep, it does. I have a crontab that sets it to powersave at boot. I have been messing with it and must've forgotten to swap it back. Changing it back did drop the wattage by 20 watts, which is about what I've seen on multiple platforms. For some reason, after booting the VM with the GPU removed, though, it is no longer at 230 watts when the VM is running. It was at around 300 prior to powersave mode and 280 afterwards. That is with or without the GPU passed through. I rebooted Proxmox and it's still doing the same thing. I just really feel like something is wonky.
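Since the governor silently ending up on performance was the catch here, this is a sketch of one way the boot-time setting could be applied through sysfs. It is not the poster's actual crontab, which was not shown in the thread; the `set_governor` name, the `SYSFS_ROOT` override and the example cron line in the comment are all assumptions.

```shell
#!/bin/sh
# Write the requested governor to every CPU's scaling_governor file.
# SYSFS_ROOT is a hypothetical override for pointing at a copy of the tree.
# A root crontab entry along these lines could call the stock tool instead
# (assumed path, adjust to wherever cpupower lives on the host):
#   @reboot /usr/bin/cpupower frequency-set -g powersave
set_governor() {
    gov="$1"
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support or when not running as root.
        [ -w "$f" ] || continue
        echo "$gov" > "$f"
    done
}

# Example (as root): set_governor powersave
```

Afterwards, `cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c` shows at a glance whether every thread picked up the same governor.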
I've recently upgraded to the same hardware as you, except I have 128GB of RAM and a 4090. I don't run Proxmox on it, just Windows 11. I can try to measure idle power usage on mine if you tell me how you measure it. Do you have a power meter between the wall socket and the PC?
I have Home Assistant running on one of my servers with an MQTT broker that can bring in data from Zigbee devices. The Threadripper machine is plugged into a smart Zigbee outlet, which sends power stats back to Home Assistant for viewing and trending.
If you're on Windows, you may be able to just look at package power in software like HWiNFO, but yeah, a smart outlet or something works really well for machines that are on 24/7. That's why I am so concerned with power usage.
I've only ever been able to get some thermals and even then, not on all devices. That's across multiple boards, both AMD and Intel. I haven't even been able to get fan speeds or DIMM temps.