Speed Junky optimizations wanted

OS: Windows 10 Pro for Workstations
Mobo: Asus sWRX80
CPU: TR PRO 16C
C drive: Samsung 980 Pro 1TB
Optane P5800X 800GB + PrimoCache
Ramdisk (not yet configured)
GPU: AMD Radeon Pro W6800
LAN speeds: 500/500 Mbps

Question: Aside from a 32- or 64-core CPU plus a Sabrent SSD, what can be done to further speed up this system or minimize its latency?

I’ve considered software for managing ping times, since I don’t want to pay more for internet, but I’m somewhat apprehensive. What’s the catch?

Use case is homelab, general productivity, and some streaming video.

Windows sucks for Threadrippers. Or so I’ve heard our overlords say. Maybe try running Linux as the host with Windows in a VM on top of it to maximize performance.

Try reading about CPU process pinning, so that the processes you are running don’t get scheduled all over the place. I don’t have the use case or equipment to learn more.
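For a rough idea of what pinning looks like in practice, here is a minimal sketch. It assumes Linux and Python 3, where `os.sched_getaffinity` and `os.sched_setaffinity` are available; Python does not provide them on Windows, where Task Manager's affinity setting or `start /affinity` does a similar job. The function name `pin_current_process` is made up for illustration.

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given logical CPUs (Linux only).

    Returns the affinity set actually in effect afterwards.
    """
    allowed = os.sched_getaffinity(0)   # CPUs this process may run on now
    wanted = set(cpus) & allowed        # drop CPUs we are not allowed to use
    if not wanted:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, wanted)     # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

Calling `pin_current_process([2, 3])` would keep the process on logical CPUs 2 and 3, provided both are available. Whether that helps is something to measure rather than assume.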

If latency matters for a workload and bandwidth or total processing power doesn’t, you can confine that workload to a NUMA node, or even to specific CCXs, and keep everything else off it. This is a common thing to do for a gaming VM on TR/Epyc builds. Also make sure the slot holding the GPU or related PCIe hardware is attached to that NUMA node. Having a few high-performance SSDs attached to the same cores can also make a difference, as long as the PCIe bus isn’t getting congested in ways I don’t fully understand yet.
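To check which NUMA node a GPU or SSD hangs off, Linux exposes a `numa_node` file per PCI device in sysfs. A minimal sketch, assuming a Linux host and Python 3 (the helper name `pci_numa_nodes` is hypothetical):

```python
import os


def pci_numa_nodes(root="/sys/bus/pci/devices"):
    """Map each PCI address to the NUMA node the kernel reports for it.

    A value of -1 means the kernel has no locality information.
    """
    nodes = {}
    if not os.path.isdir(root):
        return nodes                     # not Linux, or sysfs not mounted
    for addr in sorted(os.listdir(root)):
        path = os.path.join(root, addr, "numa_node")
        try:
            with open(path) as f:
                nodes[addr] = int(f.read().strip())
        except (OSError, ValueError):
            continue                     # no readable numa_node file
    return nodes
```

Matching the addresses against `lspci` output tells you which entry is the GPU.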

When handing out cores to specific tasks, also keep note of how the L3 cache is split or shared across cores and CCXs, which changes with CPU generation. If a task’s core shares a cache with other cores that are doing other work, you’ll have cache contention.
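On Linux you can read the L3 layout straight from sysfs instead of guessing from spec sheets. A minimal sketch, assuming Linux and Python 3; the function names `parse_cpu_list` and `l3_groups` are made up for this example:

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8-11" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_groups(root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one L3 cache."""
    groups = set()
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index*", "level")
    for level_path in glob.glob(pattern):
        with open(level_path) as f:
            if f.read().strip() != "3":
                continue                 # only level-3 caches are of interest
        shared = os.path.join(os.path.dirname(level_path), "shared_cpu_list")
        with open(shared) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)
```

Each tuple returned is one set of logical CPUs behind the same L3, so a latency-sensitive task would ideally get a whole tuple to itself.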

Otherwise, seriously evaluate whether it would save time, mental effort and money just to have a separate non-NUMA high-performance machine.

Looks like I have some homework to do, but at least I have an idea where to start now.

A good resource to use outright (for VMs on Linux) or to look into for inspiration is: https://github.com/spheenik/vfio-isolate

It handles the major things that people mess with to decrease latency. That said, you always want to benchmark to confirm that you actually get a benefit.

Also keep in mind NUMA nodes and whether RAM is interleaved. Basically, each NUMA node on TR/Epyc has two memory channels. All those channels can be interleaved to give the bandwidth of “octo-channel RAM”, but that also incurs the latency penalty of crossing nodes. A VM confined to a NUMA node for minimum latency would only want to use the two channels physically connected to that node. I have no idea what is involved in setting that up and verifying that it’s doing what you think. Supposedly the kernel sort of does this automatically if you don’t have RAM interleaving set in the BIOS and you keep NUMA-node-confined applications within the space available from the closest RAM.
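As a starting point for verifying the setup, you can at least see how many NUMA nodes the kernel exposes and how much RAM sits behind each. A minimal sketch, assuming Linux and Python 3, with a hypothetical helper name `numa_nodes`; `numactl --hardware` prints similar information if it is installed:

```python
import glob
import os
import re


def numa_nodes(root="/sys/devices/system/node"):
    """Return {node_id: (cpulist_text, mem_total_kb)} as reported by sysfs."""
    info = {}
    for node_dir in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(os.path.basename(node_dir)[len("node"):])
        with open(os.path.join(node_dir, "cpulist")) as f:
            cpus = f.read().strip()
        mem_kb = None
        with open(os.path.join(node_dir, "meminfo")) as f:
            for line in f:
                # Lines look like: "Node 0 MemTotal:  32768000 kB"
                m = re.search(r"MemTotal:\s+(\d+)\s+kB", line)
                if m:
                    mem_kb = int(m.group(1))
                    break
        info[node_id] = (cpus, mem_kb)
    return info
```

If only node 0 shows up, the firmware is presenting the machine as a single node and there is nothing to confine a workload to. With several nodes visible, something like `numactl --cpunodebind=0 --membind=0 <command>` is one way to keep both the CPUs and the memory of a process on one node, though the benefit should still be benchmarked.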

Basically, multi-NUMA-node systems are really just 4 computers closely woven together, and going for minimum latency involves deliberately trying to peel them back apart and separate them.

That sounds complicated but VERY cool. At the moment I only have 16 cores so I’m not sure if this will help or hurt. Definitely worth looking into though.