AMD Epyc Milan Workstation Questions

Hi guys, new poster here, just pitching in with my setup:

  • ASRock Rack ROMED6U-2L2T
  • EPYC 7402P 24-core processor
  • 128 GB ECC unbranded DRAM (2x 32 GB + 4x 16 GB; the ROMED6U has 6 DIMM slots)
  • Proxmox VE as the main OS
  • Radeon RX 5700 XT passed through to my main Windows VM (for gaming)
  • Sapphire GPRO e9260 8 GB passed through to my main OSX VM (for work)
  • 2x Fresco Logic FL1100 USB 3.0 host controllers, one passed through to each of the main VMs, connected via two of the ROMED6U Slimline connectors to external PCIe x8 slots
  • 1x dual M.2 NVMe x8 adapter, bifurcated into 2x x4 for two NVMe SSDs
  • 1x x4 NVMe PCIe adapter
  • OS running from the onboard NVMe slot

I am using the 7402P configured as a single NUMA node (NPS1).

Since both my main VMs need latency as low as possible, I pin the KVM vCPU threads to dedicated cores and shield them from the host OS threads.
The OSX machine has 8 dedicated threads, while the Windows VM has 12. I also pin and isolate the emulator thread for performance reasons, and isolate the VM interrupts as well. I will be posting the details in a separate thread.
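In the meantime, here is a minimal sketch of the general pinning idea in Python (not my exact setup: the QEMU PID and host core list are placeholders, and on my builds the vCPU threads show up in /proc with names like "CPU 0/KVM"):

```python
import os
import re

# Placeholder values - substitute the QEMU PID of your VM and the
# isolated host cores you reserved for it.
QEMU_PID = 12345
HOST_CORES = [4, 5, 6, 7, 8, 9, 10, 11]

# Walk the QEMU process's threads: pin each vCPU thread ("CPU N/KVM")
# to its own reserved core, and park everything else (emulator / IO
# threads) on the first reserved core.
vcpu = 0
for tid in sorted(int(t) for t in os.listdir(f"/proc/{QEMU_PID}/task")):
    with open(f"/proc/{QEMU_PID}/task/{tid}/comm") as f:
        name = f.read().strip()
    if re.match(r"CPU \d+/KVM", name):
        os.sched_setaffinity(tid, {HOST_CORES[vcpu]})
        vcpu += 1
    else:
        os.sched_setaffinity(tid, {HOST_CORES[0]})
```

It is roughly what people usually do with taskset in a Proxmox hookscript, just spelled out.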

I am running the Milan beta BIOS L3.01 even though I ended up with a Rome CPU. The plan was to get a 7443P, but availability in the past months has been nonexistent, and after two orders were voided (for one of which I still have to get my money back) I ‘settled’ on the 24-core Rome part.

I have been running ‘in production’ for a couple of weeks, and have been fiddling with the system for a couple of months now.
The system is air cooled with:

  • 1x Noctua NH-U9 TR4-SP3
  • 2x Noctua NF-A14 PWM in pull configuration
  • 2x 200mm Thermaltake fans in push configuration

I am using 2x Thermaltake V21 cases stacked. I could probably have fit everything into a single V21, but thermals would have been horrific.

Since I am running Proxmox on the host and passing through all the GPUs, I control the initial start of the VMs using a Stream Deck.
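In case anyone wants to replicate that, one simple way to wire a button up is to have it call the Proxmox VE REST API; a rough sketch (host, node name, VM ID and API token are placeholders):

```python
import requests

# Placeholders - substitute your own host, node name, VM ID and API token.
PVE_HOST = "https://192.168.1.10:8006"
NODE = "pve"
VMID = 100
TOKEN = "root@pam!streamdeck=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Ask Proxmox to start the VM; stopping is the same call with /status/stop.
r = requests.post(
    f"{PVE_HOST}/api2/json/nodes/{NODE}/qemu/{VMID}/status/start",
    headers={"Authorization": f"PVEAPIToken={TOKEN}"},
    verify=False,  # default self-signed cert; point verify at your CA if you have one
)
r.raise_for_status()
print(r.json())
```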

So far I’m a very happy camper. The only ‘problem’ I have encountered is that my 16 GB DIMMs (all four of them) are not seen by the BIOS, but are seen and usable by the OS … I have tried to contact ASRock support but got a silly answer back …


No, not really. I think you’ve done enough. Thank you, man ! :+1:


Welcome, @MadMatt ! Nice to have you here, the more the merrier ! :grin:

I’ll be sure to look for it, I’m very interested. I’ll keep my questions until then.

For the vast majority of workloads, including rendering, 4 channels with a 7443p is definitely enough. Even 2 channels might be enough, but I’m not as sure about that.

You mention 3D rendering, but will you do that on the CPU? GPUs usually do it better, though I don’t know whether they are still that much better value at today’s prices.

Edit: Ok I see now you mentioned GPU vs CPU earlier in our discussion, and that you use both. I believe 4ch will still be enough, but you can always do as I suggest below and make a thread about your specific workload. There are many around here that seem to be good at bandwidth math :slight_smile:

I suggest you describe in more detail what you plan to do with your machine, particularly the tasks that are important for you to do fast. That could even be a topic for a thread of its own, since there are probably many experts in the forum that know a lot about what different tasks require, but they might not follow this thread since it is focused on specific chips.

For workloads like rendering, bandwidth is more important than latency. Bandwidth scales with the number of memory channels, but only until the compute units (be they CPU cores, GPUs, or whatever) are saturated and working at full speed. Beyond that point the memory has to wait anyway; excess bandwidth is useless.

You can also think of it like this. A 7443P has 24 cores. The largest Milan chips have 64. AMD thinks 8 channels is enough for those chips, i.e. 8 cores per channel. For your 7443P with 4 channels of memory you would have 6 cores per channel, so your cores will still be better fed with data than those of the top Milan chips.
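To put rough numbers on it (assuming DDR4-3200, about 25.6 GB/s of theoretical bandwidth per channel; sustained numbers will be lower):

```python
# Rough per-core bandwidth comparison, assuming DDR4-3200 (~25.6 GB/s per channel).
CHANNEL_GBPS = 25.6

configs = {
    "7443P, 4 channels": (24, 4),
    "7443P, 8 channels": (24, 8),
    "64-core Milan, 8 channels": (64, 8),
}

for name, (cores, channels) in configs.items():
    total = channels * CHANNEL_GBPS
    print(f"{name}: {total:.0f} GB/s total, {total / cores:.1f} GB/s per core")
```

Even with only 4 channels, the 24-core part ends up with more theoretical bandwidth per core (~4.3 GB/s) than a fully populated 64-core Milan (~3.2 GB/s).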

Demands are slightly different if you need low latency. Latency can be improved by tying specific memory channels to the CPU cores that are physically closest to them; this is done with the NPS setting (“NUMA Nodes Per Socket”) in the BIOS setup. This is also what we have discussed most in this thread when talking about memory optimization. If you do this, then 4 channels have some drawbacks compared to 8, but the drawbacks can be mitigated. Either way this will not be important if you mainly do rendering.
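If you want to verify what a given NPS setting actually produced, `numactl --hardware` shows it, or a quick sysfs read like this (Linux only):

```python
import glob
import os

# List each NUMA node with its CPUs and memory. With NPS1 you should
# see a single node; NPS2/NPS4 split the cores and channels further.
for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(f"{node}/cpulist") as f:
        cpus = f.read().strip()
    with open(f"{node}/meminfo") as f:
        mem_kib = next(line for line in f if "MemTotal" in line).split()[-2]
    print(f"{os.path.basename(node)}: CPUs {cpus}, {int(mem_kib) // 1024} MiB")
```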

Sounds a bit scammy to me, especially if it is a full machine.


Wow, very interesting build, thanks for posting!

I am considering building a Rome-based VM and file server around that particular motherboard, the ROMED6U-2L2T. It would use my current CPU, a 7252, after I upgrade my current H12SSL-based rig to Milan (likely a 7313P) some time later this year. Then I’ll know who to ask for tips :slight_smile:

Also nice to see a GPRO card in production. I have the GPRO 4300 (roughly the same as a Radeon Pro WX 4100). I did have some problems with it, but I think I’ll wait for your other thread before asking about your experience with yours, so as not to drag this thread off topic :slight_smile:


Thanks @Nefastor, I have created a separate thread here: Epyc on Asrock ROMED6U-2L2T - Proxmox Build


@oegat, I had the 7252 for about a month while waiting for the 7402P (thanks, Amazon), and other than the limited number of cores, which would not let me safely isolate the two low-latency VMs, it performed admirably, with better thermals and power consumption than its bigger brother …
As for the GPRO card, it was the best solution I could find for a dedicated GPU that used only one slot and was natively supported in OSX … the only downside being the scalper price I had to pay for it :slight_smile:


The 7443P pricing is very much an outlier though - it’s $1337, just to spell out leet! The dual-socket-capable 7443 is $2010 by comparison, and the 7443P is actually cheaper than the 16-core/32-thread 7343.

STH has some good tables in its article:

https://www.servethehome.com/wp-content/uploads/2021/03/AMD-EPYC-7003-Series-SKU-Comparison-with-EPYC-7002-side-by-side-no-F-SKUs.jpg

TBH all my server boards do this; it’s just that most of my other servers have Noctuas, not 7.5K RPM monsters. I really wouldn’t want to put a finger anywhere near these case fans - ‘will it blend?’

On a warm boot, this fan ramp dies down after ~10 s, then it drops to a mere 58 dB for ~1 min before settling to idle (my office is ~48 dB anyway due to the other machines), but you really wouldn’t want to share an office with anyone who had this under their desk, unless they never turned it on/off.


Well, this is just the SKU list; it doesn’t tell us anything about the performance jump, especially for use cases where turbo clocks 0.5 GHz higher destroy Rome CPUs in all benchmarks, particularly single-core, for apps like After Effects, 3D simulation, and so on.

For me both latency and raw bandwidth performance are important, as different apps use them differently. That is why I think the 7443P is amazing value when you compare it to what you get with Threadripper Pro and desktop CPUs.

I think I will settle for 4 sticks and add 4 more later.

From what I’ve read, I think you’re looking at about 15-20% higher IPC for Milan. I’m not sure what your definition of ‘destroy’ is performance-wise - mine is ‘at least 50% faster’, but it would seem most folk are freer with superlatives than me :wink:

Milan seems to offer more speed for only slightly more money - a win in my view - and the 7443P is indeed, IMO, a great value proposition. I suspect I’ll be picking one up soon.

I just read the article you linked again and went through the comments - I was actually pleased someone put it right. Go there and read them.

I think Rome CPUs have to come down in price A LOT.

I don’t know why you’d consider Rome at all now unless you could get a really cheap used one (I consider the £170 I paid for an 8-core 7262 with 128MB of L3 a bargain), or you need a SKU with fewer cores than the Milans start at. Or I suppose if you need a CPU in a real hurry…

@Nefastor to my understanding, the limit is not in the standards or the protocol stack, but between the CPU and Ethernet adapter. When 10GbE emerged in the early 2000s, no CPU was fast enough to keep up with the interrupt rate. So the hardware makers got clever (I think Sun was first). The 10GbE Ethernet adapters present to the CPU as one virtual (in hardware) Ethernet adapter per CPU. This spreads the interrupts across the CPUs, and reduces the rate to a manageable level.

I can see this with the Intel NICs that I have at present (X550, X710, X720(?)).

You can override this behavior on the Intel NICs (at least) if you do not want all your CPUs servicing Ethernet interrupts.

The joker here is that for a single socket/connection, all the interrupts for that connection seem to stay on one CPU, so we are bottlenecked by the interrupt rate.
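You can watch this happen by looking at how the NIC’s queue interrupts are spread in /proc/interrupts; a quick sketch (the interface name is a placeholder - use whatever string your NIC’s queues show up with):

```python
# Show how a NIC's queue interrupts are distributed across CPUs.
# IFACE is a placeholder - match whatever appears in /proc/interrupts
# (e.g. "enp65s0" or "eth0-TxRx", depending on the driver).
IFACE = "eth0"

with open("/proc/interrupts") as f:
    cpus = f.readline().split()            # header line: CPU0 CPU1 ...
    for line in f:
        if IFACE not in line:
            continue
        parts = line.split()
        counts = parts[1:1 + len(cpus)]
        # For a single TCP connection you will typically see only one
        # queue's counter climbing, i.e. one CPU taking all the interrupts.
        busy = {cpu: c for cpu, c in zip(cpus, counts) if c != "0"}
        print(f"IRQ {parts[0].rstrip(':')} ({parts[-1]}): {busy}")
```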

(Again, for the datacenter folk, this is a non-issue, and could count as desirable.)

I went through a great deal of trouble on a prior project to grease a single-connection transfer (used Linux native AIO, large buffers, and aligned everything). Got up to ~500 MB/s transfer rates, as I recall.
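For anyone wondering what “large buffers and aligned everything” looks like in practice, here is a much simpler sketch than what I actually used - plain O_DIRECT reads instead of native AIO, and the file path, block size, and EOF handling are illustrative only:

```python
import mmap
import os
import time

PATH = "/tank/testfile.bin"        # placeholder - point at a large file on the array
BLOCK = 16 * 1024 * 1024           # large requests amortize per-call overhead

# An anonymous mmap gives a page-aligned buffer, which O_DIRECT requires.
buf = mmap.mmap(-1, BLOCK)

fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
total, start = 0, time.monotonic()
try:
    while True:
        # The offset stays block-aligned as long as reads return full blocks;
        # a short read at EOF ends the loop (most filesystems tolerate this
        # with O_DIRECT, but the final tail is the fiddly part in real code).
        n = os.preadv(fd, [buf], total)
        total += n
        if n < BLOCK:
            break
finally:
    os.close(fd)

elapsed = time.monotonic() - start
print(f"{total / elapsed / 1e6:.0f} MB/s over {total / 1e9:.2f} GB")
```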

Later (for another project) it occurred to me to look for other folk who had solved the problem. Found there was a great deal of work that went into SMB3 (and thus into Windows, Linux, and Samba). Looks like this landed in the early 2010s, so your bump in Windows file transfer rate could easily have been due to a Windows update (and SMB3).

When testing with older NICs (on either end), for a single file transfer I could see several Linux kernel worker threads very active. Under the hood, that single transfer was spread across several connections, and used the full 10Gbps rate.

There are also improvements at the Ethernet adapter level if you have newer hardware. With newer NICs on both ends and current software (CentOS 8 / Samba), the load drops a lot. (I looked only briefly, as my application already met and exceeded requirements.)

Also, while I do not live in France, I too have had gigabit internet for several years.

Note that none of the above applies to mere 1GbE. :slight_smile:

@KeithMyers note that the Thermaltake P3 is an open-sided case (suitable for a test rig for hardware folk), and lacks the usual flow-through airflow of even a desktop case. Thus my concern. Thinking of mounting a single fan just to generally push airflow over the motherboard.

There are some finned power(?) converters at the back center of the motherboard that in still air are getting just over 70 °C. That drops quite a lot with a little airflow.

@Preston_Bannister Yes, that would be a concern I suppose.
I always cram 10lbs of stuff into even large cases and that is a squeeze, so I engineer for lots of air movement and volume exchange. Noise concerns be damned.

Whether the existing airflow is just good to begin with or I just don’t max out the 1 GbE LAN controller on my EPYCD8 motherboard, I see 40 °C for both the motherboard temp and the card-side temp. I don’t have a LAN-interface temperature in the IPMI interface, so I’m guessing the card-side temp is the closest. And that temp is with 4 GPUs sitting over the LAN chip.

Even with the Aquantia 10 GbE chip on the TR board I stay in the 40 °C range with the same 4 GPUs covering up the heatsink.

Hello,

I would like to build a server for a home lab. The CPU will be a 7313P. I plan to use ESXi 7. What’s the best choice for this: the Supermicro H12SSL-NT-B or the ASRock ROMED8-2T?

It is my understanding that these boards are not officially supported by VMware.

If they’re not supported, will the onboard 10G networking work?

I’m a network novice … but why do you need special drivers for 10 GbE?

I didn’t have to get any special 10 GbE drivers for either of my 10 GbE-capable motherboards; the OS’s standard drivers work fine.
Aquantia AQC107 chipset and Intel X550 chipset.

You don’t, really - it’s just that gigabit Ethernet has been around for so long now that even your toaster has drivers for it.