DELL Precision 7920 with 2 x XEON 6226R, 128GB RAM, dual RTX A4000, WIN11PROforWS - very laggy with 8 screens

Hi all,
I am trying to find any possible resolution to my issues regarding my DELL PRECISION 7920 Workstation:
Quick specs:
2 x XEON 6226R
2 x 4 x 16GB RAM ECC 2934MT/s
2 x RTX A4000 GPU
1 x X710 DA2 SFP+ NIC
1 x Ultra-Speed Drive Quad card with 2 x M.2 512GB PCIe NVMe Class 50 in RAID1 for OS
Windows 11 Pro for Workstations
BIOS settings: NUMA enabled, HT enabled, all CPU cores enabled

The system is still under ProSupport, and both GPUs were officially bought via DELL, so the ProSupport coverage remains valid. However, ProSupport did not find any issues with my setup.

I am running 8 screens connected to this WS. Please see the images below for the setup (6 x 2560x1440 + 1 x 3440x1440 + 1 x 3840x1600; where possible running 10-bit color, all monitors set to max refresh rate).
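For scale, the total desktop area that Windows has to composite for this layout can be tallied from the resolutions listed above. This is only a rough back-of-the-envelope sketch in Python; the helper name `total_pixels` and the comparison against a 3840x2160 (4K UHD) panel are illustrative assumptions, not something measured on the workstation.

```python
# Resolutions as listed in the post: (count, width, height).
MONITORS = [
    (6, 2560, 1440),
    (1, 3440, 1440),
    (1, 3840, 1600),
]

# Reference panel used for comparison only (assumption: 4K UHD, 3840x2160).
UHD_4K_PIXELS = 3840 * 2160


def total_pixels(monitors):
    """Sum the pixel count over all monitors in the layout."""
    return sum(count * width * height for count, width, height in monitors)


if __name__ == "__main__":
    pixels = total_pixels(MONITORS)
    print(f"Total desktop pixels: {pixels:,}")
    print(f"Equivalent 4K UHD panels: {pixels / UHD_4K_PIXELS:.2f}")
```

With these numbers the layout comes to 33,216,000 pixels, roughly the area of four 4K panels. Pixel count alone says nothing about refresh rate or color depth, so it is only a first-order indication of the compositing load.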


The workstation was freshly installed in May 2024; everything is up to date and running 23H2 of Windows 11 Pro for Workstations. System performance is set to Ultimate Performance, and most settings are applied via GPOs.

I am running a lot of RDP sessions and web-based sessions, a ton of Edge tabs, and Office apps.

The system is showing CPU utilization of 1-5% and around 16-32GB of RAM consumed. The GPUs are running at around 5-15%.

And now, for the problem description: the workstation is very sluggish and the graphical performance is very poor. When dragging Edge across the screen, it feels very choppy and laggy, as if the system were under heavy load. I cannot figure out why this is occurring. After a few days I constantly need to reboot the machine; even simple things like YouTube video playback are choppy. Audio is OK, but the system feels like a mammoth. Nothing special is running on the system except for the ESET anti-malware solution. No simulations or any CPU/GPU-heavy tasks.
Previously, DELL refused support because the GPUs had not been officially supplied; now, with two cards bought directly from DELL, we ended up with a suggestion to reinstall Windows.
I do not think this will get us any resolution. Diagnostics are showing no issues. All parts, including the BIOS, are up to date.
I did some further testing, such as enabling/disabling HT and NUMA: no change. I have also split the two GPUs between CPU0 and CPU1: no change. The system behaves the same even if both GPUs are running under CPU0.

Can it be, that the system is just too OLD and those CPUs are causing these bottlenecks?

I am sorry for my complicated description, but I am desperate to find any possible cause of my issues. I invested a lot of money into this WS with a fresh pair of RTX A4000s, but still no resolution.

Thank you very much for the support.
Regards
Boris.

Hi @borisgarami , welcome to Level1Techs!

First, I would make sure not to do anything that runs afoul of the rules for your expensive service contracts.

Next you should identify the root cause for your symptoms (sluggishness).

What stands out to me in your configuration is the use of eight monitors. I assume 4 are connected to each GPU.
Have you tried turning off/disconnecting monitors to see if this affects your symptoms? What if you just run 7 monitors? 6? … 4?
What if you used only monitors connected to a single GPU? What if evenly split across both GPUs?
If the symptoms disappear, at which step? Do they return if you undo the last step?

Another thing to consider is the location of the GPUs in your computer case. You have a powerful dual CPU system. While these promise twice the performance of a single CPU, workloads that require both CPUs to work together can make this hard to achieve. This is because the communication between the CPUs can become the bottleneck.
Your computer has PCIe slots that are connected either to CPU 1 or CPU 2.
I would talk with Dell support about which slots are best used with your two GPUs, i.e. should both GPUs connect to slots of a single CPU, or should they connect to different CPUs?

It kind of feels like the memory load on the GPU might be too high, or the GPU is stuck in 2D mode. Can you run HWMonitor while it is sluggish and check what the GPU is doing? Just make a screenshot of the GPU part like this:

I also suggest putting the windows power mode to high performance to rule out any power management things kicking in.

I don’t know much about Windows, but @borisgarami might want to check with Microsoft to see if they have a particular version of Windows designed for multiple CPUs. I recall reading an article on a specific version of Windows 10. @nutral, running Hwmonitor isn’t a bad idea. @borisgarami, welcome to the forum.

He said he’s running Windows 11 Pro for Workstations, which is the Windows version for running multiple physical CPUs.

I thought there was a different version when multiple NUMA nodes were involved. This goes to show how much I know about Windows. Sorry, @borisgarami, I wasn’t much help. I hope you find a solution.

Thank you for your reply. I have already contacted DELL ProSupport, so they should suggest the optimal GPU-to-PCIe slot mapping.

I have not tested lowering the monitor count yet, but I will do so if nothing else is found.

Thank you.
Boris

Thank you for the warm welcome and no issue at all, I am thankful for any help!

Thanks a lot. I am attaching a few screenshots. I hope it is OK to post those screenshots directly here.

Basically, the machine has been running for one day and it is becoming unusable. I am also noticing that MS Edge sometimes turns all windows black and then returns to normal. Is this related to some specific issue? In any case, all windows are very laggy when moving or resizing them. To me it really feels as if I were working via RDP on a machine under heavy load.

I have also gone through this document, and from my experience and point of view the workstation is configured as it is supposed to be. Also, the 6226R has 3 UPI links enabled, so it really should not be bottlenecked:
DELL PRECISION 7920 Tech. Guide Book

Thank you once again.
Boris.


Screenshot 2024-09-03 180721
Screenshot 2024-09-03 180728

The GPU still seems fine. You can really rule it out by running a 3D stress test in the background.

What is really happening is that 2 cores are pegged at 100%. So it looks like the CPU is not working that hard, but an application is only running on 1 or 2 threads. Can you use Task Manager to check which application that is? With as many cores as you have, it would only show up as a 3-5% CPU load.
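The arithmetic behind that can be sketched quickly. This assumes 16 cores per Xeon Gold 6226R with Hyper-Threading enabled, i.e. 64 logical processors in total; the function names are made up for illustration, and the sketch assumes the per-process CPU column is normalized across all logical processors.

```python
def logical_processors(sockets, cores_per_socket, threads_per_core):
    """Number of logical processors the OS schedules on."""
    return sockets * cores_per_socket * threads_per_core


def overall_cpu_percent(busy_threads, logical_cpus):
    """Overall utilization when `busy_threads` logical CPUs sit at 100% and the rest idle."""
    return 100.0 * busy_threads / logical_cpus


def cores_worth(overall_percent, logical_cpus):
    """Convert a share of total CPU (in percent) into logical CPUs' worth of work."""
    return overall_percent / 100.0 * logical_cpus


if __name__ == "__main__":
    # Assumption: 2 sockets x 16 cores x 2 threads (HT enabled).
    cpus = logical_processors(sockets=2, cores_per_socket=16, threads_per_core=2)
    print(f"Logical processors: {cpus}")
    print(f"Two pegged threads show up as {overall_cpu_percent(2, cpus):.1f}% overall")
    print(f"A process at 1.6% is doing {cores_worth(1.6, cpus):.2f} logical CPUs of work")
```

So on a machine like this, a process listed at only about 1.6% may already be saturating one logical processor, which is why the overall figure looks idle.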

What could be happening is that the application is bottlenecked by cache/memory, and that is causing the issue. This could be a browser, for example; your CPU is not that fast in single-threaded applications.

To solve it we need to know which application it is. One option would be to set the priority of that application lower, in the hope that it improves the usability of the computer.

Thanks.
For the 3D stress test, we ran a stress test via DELL SupportAssist, getting around 120-240 fps on all 4 screens per GPU. So it was fully robust and stable.

For the processes, I am running a ton of MS Edge windows. Please see the attached screenshots. Is it possible to fix this somehow? Is there any way to make Edge use multiple cores?

Yes, I am assuming that the single-core performance of these CPUs does not compare to today's CPUs.

If this is the case and I am limited this much by those CPUs, I would probably need to move to a new workstation, which is the very last thing I am planning to do.

I appreciate your help.
Have a nice day,
Boris.


What Nvidia driver are you running?
Have you tried reinstalling the Nvidia studio driver using the clean install option?

P.S. be sure to reboot between driver changes

This is more of a Windows issue: the Desktop Window Manager (DWM) is fully using one CPU core and a lot of memory. This also slows down all the windows, because that process is bogged down.

I’ve read up a little on it (you can google "dwm high cpu/high memory").
It can be caused by an Intel GPU driver issue, which shouldn’t be the case here because you are using NVIDIA Quadros.

Or it can be caused by the driver used for Remote Desktop. If your issue goes away when you are not using Remote Desktop, then that is probably it.

I would also suggest turning off fast boot, so that your computer starts fresh each time.

Hi,

I am running 556.12, currently the official package from DELL. However, there is no change when running the latest one from the NVIDIA website.

When installing this pair of RTX cards, the workstation went through an offline cleanup: offline and in safe mode, DDU was run, and afterwards a fresh install of the NVIDIA drivers was deployed. No change at all.

Thanks
Boris

Hi,

If you mean FAST STARTUP, I am not using that feature at all; it is always disabled. The DELL BIOS is set to the thorough boot process.
Screenshot 2024-09-03 220728

So, the prioritization of Edge for example will not make any change, correct?

Thanks
Boris.

Ehm, prioritization of Edge will only change things related to Edge. But the Desktop Window Manager is the real issue. You can test whether it resolves itself by turning RDP off.

If it is RDP then you can switch the rdp display driver
https://www.reddit.com/r/sysadmin/comments/jni86j/psa_if_youre_having_issues_connecting_via_rdp/

Slightly OT, but I find that running a huge number of monitors does not work very well, even when I have crazy core count CPUs.

I’m now running two PCs side by side instead, with 6 monitors total. It’s a bit more of a pain, admittedly.

Also, Thorium runs faster than Chrome or Edge; try https://browserbench.org/
This thread may help since you seem focused on browser perf, snappiness, etc. https://forum.level1techs.com/t/any-total-gurus-on-machine-snappyness-extreme-general-desktop-use-browsing/

Yeah, not sure how you ended up on that.

NVIDIA is showing 552.86 as the latest production branch with full support for A4000 cards. You can run Game Ready drivers, but be prepared for this type of shenanigans.

I am running 5/6 monitors with a Quadro/GeForce pairing and dual CPUs, with each GPU assigned to its own CPU based on PCIe slot.

Had to clean install NVIDIA drivers a few times and do some other things to make it work right. I posted about it here once…

Found it

In the power options, try setting the minimum-CPU-state to 5% or 10%.