AMD Epyc Milan Workstation Questions

Hi guys, new poster here, just pitching in with my setup:

  • ASRock Rack ROMED6U-2L2T
  • EPYC 7402P 24-core processor
  • 128 GB ECC unbranded DRAM (2x 32 GB + 4x 16 GB; the ROMED6U has 6 DIMM slots)
  • Proxmox VE as the main OS
  • Radeon RX 5700 XT passed through to my main Windows VM (for gaming)
  • Sapphire GPRO e9260 8 GB passed through to my main OSX VM (for work)
  • 2x Fresco Logic FL1100 USB 3.0 host controllers, one passed through to each of the main VMs, connected via two of the ROMED6U Slimline connectors to external PCIe x8 slots
  • 1x dual M.2 NVMe x8 adapter, bifurcated into 2x x4 for two NVMe SSDs
  • 1x x4 NVMe PCIe adapter
  • OS running from the onboard NVMe slot

I am using the 7402P configured as a single NUMA node (NPS1).

Since both my main VMs need latency as low as possible, I pin the KVM vCPU threads to dedicated cores and shield them from the host OS threads.
The OSX machine has 8 dedicated threads, while the Windows VM has 12. I also pin and isolate the emulator thread for performance reasons, and isolate the VM interrupts as well. I will be posting the details in a separate thread.
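In the meantime, here is a minimal sketch of the general pinning idea in Python (not my exact setup: the QEMU PID and host core list are placeholders, and on my builds the vCPU threads show up in /proc with names like "CPU 0/KVM"):

```python
import os
import re

# Placeholder values - substitute the QEMU PID of your VM and the
# isolated host cores you reserved for it.
QEMU_PID = 12345
HOST_CORES = [4, 5, 6, 7, 8, 9, 10, 11]

# Walk the QEMU process's threads: pin each vCPU thread ("CPU N/KVM")
# to its own reserved core, and park everything else (emulator / IO
# threads) on the first reserved core.
vcpu = 0
for tid in sorted(int(t) for t in os.listdir(f"/proc/{QEMU_PID}/task")):
    with open(f"/proc/{QEMU_PID}/task/{tid}/comm") as f:
        name = f.read().strip()
    if re.match(r"CPU \d+/KVM", name):
        os.sched_setaffinity(tid, {HOST_CORES[vcpu]})
        vcpu += 1
    else:
        os.sched_setaffinity(tid, {HOST_CORES[0]})
```

It is roughly what people usually do with taskset in a Proxmox hookscript, just spelled out.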

I am running the Milan beta BIOS L3.01 even though I ended up with a Rome CPU. The plan was to get a 7443P, but availability in the past months has been nonexistent, and after two orders were voided (for one of which I still have to get my money back) I ‘settled’ on the 24-core Rome part.

I have been running ‘in production’ for a couple of weeks, and have been fiddling with the system for a couple of months now.
The system is air cooled with:

  • 1x Noctua NH-U9 TR4-SP3
  • 2x Noctua NF-A14 PWM in pull configuration
  • 2x 200mm Thermaltake fans in push configuration

I am using 2x Thermaltake V21 cases stacked. I could probably have fit everything into a single V21, but thermals would have been horrific.

Since I am running Proxmox on the host and passing through all the GPUs, I control the initial start of the VMs using a Stream Deck.
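In case anyone wants to replicate that, one simple way to wire a button up is to have it call the Proxmox VE REST API; a rough sketch (host, node name, VM ID and API token are placeholders):

```python
import requests

# Placeholders - substitute your own host, node name, VM ID and API token.
PVE_HOST = "https://192.168.1.10:8006"
NODE = "pve"
VMID = 100
TOKEN = "root@pam!streamdeck=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Ask Proxmox to start the VM; stopping is the same call with /status/stop.
r = requests.post(
    f"{PVE_HOST}/api2/json/nodes/{NODE}/qemu/{VMID}/status/start",
    headers={"Authorization": f"PVEAPIToken={TOKEN}"},
    verify=False,  # default self-signed cert; point verify at your CA if you have one
)
r.raise_for_status()
print(r.json())
```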

So far I’m a very happy camper. The only ‘problem’ I have encountered is that my 16 GB DIMMs (all four of them) are not seen by the BIOS, but are seen and usable by the OS … I have tried to contact ASRock support but got a silly answer back …


No, not really. I think you’ve done enough. Thank you, man ! :+1:


Welcome, @MadMatt ! Nice to have you here, the more the merrier ! :grin:

I’ll be sure to look for it, I’m very interested. I’ll keep my questions until then.

For the vast majority of workloads, including rendering, 4 channels with a 7443p is definitely enough. Even 2 channels might be enough, but I’m not as sure about that.

You mention 3D rendering, but will you do that on the CPU? GPUs usually do it better, though I don’t know whether they are still that much better value at today’s prices.

Edit: Ok I see now you mentioned GPU vs CPU earlier in our discussion, and that you use both. I believe 4ch will still be enough, but you can always do as I suggest below and make a thread about your specific workload. There are many around here that seem to be good at bandwidth math :slight_smile:

I suggest you describe in more detail what you plan to do with your machine, particularly the tasks that are important for you to do fast. That could even be a topic for a thread of its own, since there are probably many experts in the forum that know a lot about what different tasks require, but they might not follow this thread since it is focused on specific chips.

For workloads like rendering, bandwidth is more important than latency. Bandwidth scales with the number of memory channels, but only until the compute units (be they CPU cores, GPUs, or whatever) are saturated and working at full speed. Beyond that point the memory has to wait anyway; excess bandwidth is useless.

You can also think of it like this. A 7443P has 24 cores. The largest Milan chips have 64. AMD thinks 8 channels is enough for those chips, i.e. 8 cores per channel. For your 7443P with 4 channels of memory you would have 6 cores per channel, so your cores will still be better fed with data than those of the top Milan chips.
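To put rough numbers on it (assuming DDR4-3200, about 25.6 GB/s of theoretical bandwidth per channel; sustained numbers will be lower):

```python
# Rough per-core bandwidth comparison, assuming DDR4-3200 (~25.6 GB/s per channel).
CHANNEL_GBPS = 25.6

configs = {
    "7443P, 4 channels": (24, 4),
    "7443P, 8 channels": (24, 8),
    "64-core Milan, 8 channels": (64, 8),
}

for name, (cores, channels) in configs.items():
    total = channels * CHANNEL_GBPS
    print(f"{name}: {total:.0f} GB/s total, {total / cores:.1f} GB/s per core")
```

Even with only 4 channels, the 24-core part ends up with more theoretical bandwidth per core (~4.3 GB/s) than a fully populated 64-core Milan (~3.2 GB/s).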

Demands are slightly different if you need low latency. Latency can be improved by tying specific memory channels to the CPU cores that are physically closest to them; this is done with the NPS setting (“NUMA Nodes Per Socket”) in the BIOS setup. This is also what we have discussed most in this thread when talking about memory optimization. If you do this, then 4 channels have some drawbacks compared to 8, but the drawbacks can be mitigated. Either way this will not be important if you mainly do rendering.
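If you want to verify what a given NPS setting actually produced, `numactl --hardware` shows it, or a quick sysfs read like this (Linux only):

```python
import glob
import os

# List each NUMA node with its CPUs and memory. With NPS1 you should
# see a single node; NPS2/NPS4 split the cores and channels further.
for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(f"{node}/cpulist") as f:
        cpus = f.read().strip()
    with open(f"{node}/meminfo") as f:
        mem_kib = next(line for line in f if "MemTotal" in line).split()[-2]
    print(f"{os.path.basename(node)}: CPUs {cpus}, {int(mem_kib) // 1024} MiB")
```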

Sounds a bit scammy to me, especially if it is a full machine.


Wow, very interesting build, thanks for posting!

I am considering building a Rome-based VM and file server around that particular motherboard, the ROMED6U-2L2T. It would use my current CPU, a 7252, after I upgrade my current H12SSL-based rig to Milan (likely a 7313P) some time later this year. Then I’ll know who to ask for tips :slight_smile:

Also nice to see a GPRO card in production. I have the GPRO 4300 (roughly the same as a Radeon Pro WX 4100). I did have some problems with it, but I think I’ll wait for your other thread before asking about your experience with yours, so as not to drag this thread off topic :slight_smile:


Thanks @Nefastor, I have created a separate thread here: Epyc on Asrock ROMED6U-2L2T - Proxmox Build


@oegat, I had the 7252 for about a month while waiting for the 7402P (thanks, Amazon), and other than the limited number of cores, which would not let me safely isolate the two low-latency VMs, it performed admirably, with better thermals and power consumption than its bigger brother …
As for the GPRO card, it was the best solution I could find for a dedicated GPU that used only one slot and was natively supported in OSX … the only downside being the scalper price I had to pay for it :slight_smile:


The 7443P pricing is very much an outlier though - it’s $1337, just to spell out leet! The dual-socket-capable 7443 is $2010 by comparison, and the 7443P is actually cheaper than the 16-core/32-thread 7343.

STH has some good tables in its article:

https://www.servethehome.com/wp-content/uploads/2021/03/AMD-EPYC-7003-Series-SKU-Comparison-with-EPYC-7002-side-by-side-no-F-SKUs.jpg

TBH all my server boards do this; it’s just that most of my other servers have Noctuas, not 7.5K RPM monsters. I really wouldn’t want to put a finger anywhere near these case fans - ‘will it blend?’

On a warm boot, this fan ramp dies down after ~10 s, then it drops to a mere 58 dB for ~1 min before settling to idle (my office is ~48 dB anyway due to the other machines), but you really wouldn’t want to share an office with anyone who had this under their desk, unless they never turned it on/off.


Well, this is just the SKU list; it doesn’t tell us anything about the performance jump, especially for use cases where turbo clocks 0.5 GHz higher destroy Rome CPUs in all benchmarks, particularly single-core, for apps like After Effects, 3D simulation, and so on.

For me both latency and raw bandwidth performance are important, as different apps use them differently. That is why I think the 7443P is amazing value when you compare it to what you get with Threadripper Pro and desktop CPUs.

I think I will settle for 4 sticks and add 4 more later.

From what I’ve read, I think you’re looking at about 15-20% higher IPC for Milan. I’m not sure what your definition of ‘destroy’ is performance-wise - mine is ‘at least 50% faster’, but it would seem most folk are freer with superlatives than me :wink:

Milan seems to offer more speed for only slightly more money - a win in my view - and the 7443P is indeed, IMO, a great value proposition. I suspect I’ll be picking one up soon.

I just read the article you linked again and went through the comments - I was actually pleased someone put it right. Go there and read them.

I think Rome CPUs have to come down in price A LOT.

I don’t know why you’d consider Rome at all now unless you could get a really cheap used one (I consider the £170 I paid for an 8-core 7262 with 128MB of L3 a bargain), or you need a SKU with fewer cores than the Milans start at. Or I suppose if you need a CPU in a real hurry…

@Nefastor to my understanding, the limit is not in the standards or the protocol stack, but between the CPU and Ethernet adapter. When 10GbE emerged in the early 2000s, no CPU was fast enough to keep up with the interrupt rate. So the hardware makers got clever (I think Sun was first). The 10GbE Ethernet adapters present to the CPU as one virtual (in hardware) Ethernet adapter per CPU. This spreads the interrupts across the CPUs, and reduces the rate to a manageable level.

I can see this with the Intel NICs that I have at present (X550, X710, X720(?)).

You can override this behavior on the Intel NICs (at least) if you do not want all your CPUs servicing Ethernet interrupts.

The joker here is that for a single socket/connection, all the interrupts for that connection seem to stay on one CPU, so we are bottlenecked by the interrupt rate.
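You can watch this happen by looking at how the NIC’s queue interrupts are spread in /proc/interrupts; a quick sketch (the interface name is a placeholder - use whatever string your NIC’s queues show up with):

```python
# Show how a NIC's queue interrupts are distributed across CPUs.
# IFACE is a placeholder - match whatever appears in /proc/interrupts
# (e.g. "enp65s0" or "eth0-TxRx", depending on the driver).
IFACE = "eth0"

with open("/proc/interrupts") as f:
    cpus = f.readline().split()            # header line: CPU0 CPU1 ...
    for line in f:
        if IFACE not in line:
            continue
        parts = line.split()
        counts = parts[1:1 + len(cpus)]
        # For a single TCP connection you will typically see only one
        # queue's counter climbing, i.e. one CPU taking all the interrupts.
        busy = {cpu: c for cpu, c in zip(cpus, counts) if c != "0"}
        print(f"IRQ {parts[0].rstrip(':')} ({parts[-1]}): {busy}")
```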

(Again, for the datacenter folk, this is a non-issue, and could count as desirable.)

I went through a great deal of trouble on a prior project to grease a single-connection transfer (used Linux native AIO, large buffers, and aligned everything). Got up to ~500 MB/s transfer rates, as I recall.
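For anyone wondering what “large buffers and aligned everything” looks like in practice, here is a much simpler sketch than what I actually used - plain O_DIRECT reads instead of native AIO, and the file path, block size, and EOF handling are illustrative only:

```python
import mmap
import os
import time

PATH = "/tank/testfile.bin"        # placeholder - point at a large file on the array
BLOCK = 16 * 1024 * 1024           # large requests amortize per-call overhead

# An anonymous mmap gives a page-aligned buffer, which O_DIRECT requires.
buf = mmap.mmap(-1, BLOCK)

fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
total, start = 0, time.monotonic()
try:
    while True:
        # The offset stays block-aligned as long as reads return full blocks;
        # a short read at EOF ends the loop (most filesystems tolerate this
        # with O_DIRECT, but the final tail is the fiddly part in real code).
        n = os.preadv(fd, [buf], total)
        total += n
        if n < BLOCK:
            break
finally:
    os.close(fd)

elapsed = time.monotonic() - start
print(f"{total / elapsed / 1e6:.0f} MB/s over {total / 1e9:.2f} GB")
```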

Later (for another project) it occurred to me to look for other folk who had solved the problem. Found there was a great deal of work that went into SMB3 (and thus into Windows, Linux, and Samba). Looks like this landed in the early 2010s, so your bump in Windows file transfer rate could easily have been due to a Windows update (and SMB3).

When testing with older NICs (on either end), for a single file transfer I could see several Linux kernel worker threads very active. Under the hood, that single transfer was spread across several connections, and used the full 10Gbps rate.

There are also improvements at the Ethernet adapter level if you have newer hardware. With newer NICs on both ends and current software (CentOS 8 / Samba), the load drops a lot. (I looked only briefly, as my application already met and exceeded requirements.)

Also, while I do not live in France, I too have had gigabit internet for several years.

Note that none of the above applies to mere 1GbE. :slight_smile:

@KeithMyers note that the Thermaltake P3 is an open-sided case (suitable for a test rig for hardware folk), and lacks the usual flow-through airflow of even a desktop case. Thus my concern. Thinking of mounting a single fan just to generally push airflow over the motherboard.

There are some finned power(?) converters at the back center of the motherboard that in still air are getting just over 70 °C. That drops quite a lot with a little airflow.

@Preston_Bannister Yes, that would be a concern I suppose.
I always cram 10lbs of stuff into even large cases and that is a squeeze, so I engineer for lots of air movement and volume exchange. Noise concerns be damned.

Whether the existing airflow is just good to begin with or I just don’t max out the 1 GbE LAN controller on my EPYCD8 motherboard, I see 40 °C for both the motherboard temp and the card-side temp. I don’t have a LAN-interface temperature in the IPMI interface, so I’m guessing the card-side temp is the closest. And that temp is with 4 GPUs sitting over the LAN chip.

Even with the Aquantia 10 GbE chip on the TR board I stay in the 40 °C range with the same 4 GPUs covering up the heatsink.

Hello,

I would like to build a server for a home lab. The CPU will be a 7313P. I plan to use ESXi 7. What’s the best choice for this: the Supermicro H12SSL-NT-B or the ASRock ROMED8-2T?

It is my understanding that these boards are not officially supported by VMware.

If they’re not supported, will the onboard 10G networking work?

I’m a network novice … but why do you need special drivers for 10 GbE?

I didn’t have to get any special 10 GbE drivers for either of my 10 GbE-capable motherboards; the OS’s standard drivers work fine.
Aquantia AQC107 chipset and Intel X550 chipset.

You don’t, really - it’s just that gigabit Ethernet has been around for so long now that even your toaster has drivers for it.