AMD Epyc Milan Workstation Questions

Yes, it’s been stable for 9 months or so. I too am more inclined to think the re-seating of the CPU did the trick, rather than the UPS. I also did not follow the instructions correctly when seating the CPU the first time, as I did not fully trust the setting of my torque screwdriver at the time. I think I secured it a bit too loosely - a sudden loss of contact on a pin sounds like something that could trigger an emergency power-off, and that response might even be hardwired, as the BMC might be too slow to react. (Though @Nefastor is probably the one to ask once we enter the realm of electrophysics.)

The second time I made a better effort to hit the correct torque, and the issue never happened again. So I’m inclined to think a handling error was responsible for my shutoff issues.

I have never updated the BIOS; I believe the BIOS date is around February 2021 (I don’t have access to the machine right now so I don’t remember the version - I do remember it was claimed to support Milan, though). Also note that I have no Milan CPU in it, but a Rome chip (7252). I am still waiting for a 7443P (like many others in this thread).

1 Like

Modern PSUs typically handle dirty power lines quite well, so I’d also be inclined to think the UPS didn’t change anything - but you never know without a scope hooked up 24/7 to capture events flagged by a software watchdog.

I’ve never had an unstable EPYC system, but cleaning and reseating the CPU/RAM and power connectors would be the first thing I’d try: with over 4000 contacts, it only takes one unlucky one with a sub-micron piece of dust or grease to change the electrical characteristics of that circuit. I’ve watched several YouTube videos where people used the Threadripper torque tool incorrectly (not tightening until it clicks) and then complained about issues like not all RAM channels or PCIe slots working - these sockets need some care.

2 Likes

@amduser In my case it turned out to be a faulty PSU (brand new), which also damaged the MoBo. Both were swapped out under warranty, and since then it has been rock solid!

Proper seating of the CPU is - as mentioned in the comments - indeed a thing. Luckily I was sent a Threadripper torque tool, and that really helps with tightening it down correctly.

1 Like

New beta firmware for the Milan-X processors is available for the ASRock Rack ROMED8-2T (ASRock Rack > ROMED8-2T).
Anyone want to test it for extra/new functionality?

1 Like

Well it’s been 11 months since I started building this workstation and I still don’t have either of the two 7443Ps I ordered here in the UK. I’m getting very tempted just to import one from the US where there seem to be literally 100s in stock!

1 Like

I cancelled all my pre-orders and ordered a 7443P from Newegg as they claimed to have them in stock. I had a bit of a nervous wait while the status was stuck on ‘packaging’ for a couple of days, but it does now seem to be in transit! :)

4 Likes

congrats!

1 Like

Congratz. You won’t be sorry. Been loving mine for the past 9 months.

1 Like

Thanks - not long now, I’ve got an inbound ETA from the courier. Very much looking forward to my first make -j48 :)

3 Likes

I’ve also got an ETA now! My Norwegian supplier received my 7443P earlier today and shipped it soon after. I actually got a heads-up last week that it was on its way, but I figured I’d not jinx it by celebrating too early :)

I’ll get a 7443P, an Arctic Freezer 4U-SP3 (the largest cooler that supports front-to-back airflow with EPYC), and another 4x 16 GB DIMMs to bring the memory up to 8 channels.

I haven’t reported back in this thread for a while. I got my second GPU a month ago, a used A4000 for slightly below retail price despite the GPU shortage (“Quadro - the way Nvidia meant us to be scalped”…). So with these parts my build will finally be more or less complete.

The only drawback is that I’m currently in the middle of moving house, which means I won’t have as much time for play and experimentation as I would have had if I’d gotten it last summer, while Covid was still on. Oh well.

2 Likes

Got home to find my 7443P waiting for me - took me about 25 mins to swap CPUs, mostly spent cleaning off / reapplying thermal paste!

My BIOS/BMC got a bit spooked by the CPU change, which messed up the NVMe/SATA settings on the SlimSAS 8i connectors. That meant I initially booted without my U.2 drive, which caused a bit of head scratching as to why Proxmox was broken.

Anyway, just got it going - did a wide build of LLVM from source, and I’m pretty impressed to say the least :) I just need to save up for some more RAM to be able to link 44 targets at once :D
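
In the meantime, here’s roughly the back-of-the-envelope maths I use to decide how many link jobs the machine can take - a minimal sketch, assuming about 2 GB of RAM per concurrent link job (just a rule of thumb; debug info or LTO can need far more):

```python
# Rough sketch: pick a parallel link-job count for a big C++ build (like LLVM)
# from available RAM, assuming ~2 GB per concurrent link job. That figure is an
# assumption / rule of thumb, not a measurement.
import os

GB_PER_LINK_JOB = 2  # assumed memory footprint of one link job

def mem_available_gb() -> float:
    """Read MemAvailable from /proc/meminfo (Linux only) and convert kB -> GB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemAvailable not found")

compile_jobs = os.cpu_count() or 1                              # compiles are CPU-bound
link_jobs = max(1, int(mem_available_gb() // GB_PER_LINK_JOB))  # links are RAM-bound

print(f"make -j{compile_jobs}, but cap concurrent links at {link_jobs}")
# With LLVM's CMake build, that cap can be passed as -DLLVM_PARALLEL_LINK_JOBS=<n>
# so the compile jobs stay wide while the linkers don't eat all the RAM.
```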

2 Likes

Update: my 7443p has arrived!

With it came an Arctic Freezer 4U-SP3 cooler, as I feared that the Supermicro SNK-P0064AP4 I already own would end up noisy with a hotter CPU. The latter’s peak fan speed is 3800 rpm (vs 2300 rpm for each of the Arctic’s two fans), and it does make a lot of noise at full speed.

Anyhow, the new cooler is fairly heavy even for its size, at 1288 g. Many Noctuas with a similar fan size are lighter. And I use a tower chassis, meaning the cooler will put some stress on the motherboard structure. The question is - could the cooler be too heavy for safe vertical mounting?

The cooler is advertised as a 4U server cooler; however, most such cases are horizontal (rackmount). And Supermicro would never recommend or comment on anything other than their own coolers. So I wonder whether the stress on the motherboard will be a problem during continuous operation.

The Arctic Freezer 4U-SP3 weighs 1288 g and is 152 mm tall. Compare that to the Supermicro SNK-P0064AP at ~750 g and 120 mm. Approximating the centre of gravity to sit a bit above the middle for both (due to the heatpipes), I compute that the static torque (is that the relevant concept?) exerted by the cooler on the mainboard will be about 2.2 times higher for the Arctic than for the Supermicro cooler. All assuming vertical (tower) mounting.
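
To make that concrete, here is the back-of-the-envelope calculation - a minimal sketch where the 0.55 centre-of-gravity fraction is my assumption (a bit above mid-height because of the heatpipes and fin stack), and the masses/heights are the spec figures above:

```python
# Static moment each cooler exerts about the socket plane when the motherboard
# is vertical (tower case). COG_FRACTION is an assumed value, not a measurement.
G = 9.81             # gravitational acceleration, m/s^2
COG_FRACTION = 0.55  # assumed: centre of gravity slightly above half the height

def socket_moment(mass_kg: float, height_m: float) -> float:
    """Cantilever moment (N*m) about the socket plane for a vertically mounted board."""
    return mass_kg * G * COG_FRACTION * height_m

arctic = socket_moment(1.288, 0.152)      # Arctic Freezer 4U-SP3
supermicro = socket_moment(0.750, 0.120)  # Supermicro SNK-P0064AP (~750 g)

print(f"Arctic:     {arctic:.2f} N*m")            # ~1.1 N*m
print(f"Supermicro: {supermicro:.2f} N*m")        # ~0.5 N*m
print(f"Ratio:      {arctic / supermicro:.1f}x")  # ~2.2x
```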

The motherboard I use is a Supermicro H12SSL-I. Similar to the earlier discussion about PCIe slot reinforcement, I suspect server boards might be less reinforced around the heatsink mounting than consumer boards.

So I turn to you guys for a sanity check, before going further. Any thoughts on this? Does anyone have experience that may reinforce or refute my worry?

My system is in a Supermicro tower chassis, but with the smaller SM cooler. The SP3 socket seems to have a pretty chunky steel top frame which the heatsink is bolted to. I probably wouldn’t ship it with a massive cooler installed due to the risk of shock loading, but the static load of a big tower cooler wouldn’t worry me personally: 1.2 kg at 0.15 m is only about 1.8 Nm spread over 4 screws, and the heatsink’s CoG is lower than that, as you suggest.

2 Likes

Thanks for your input! My thinking is similar: as long as I don’t transport it carelessly, it should be OK. Supermicro ships towers with their own cooler mounted, and since mine would create about double the static stress, that is roughly equivalent to SM accepting a shipping stress of twice their cooler’s static load - which is by no means an unreasonable limit.

Looking at the socket shape, I notice that SP3/sWRX8 is at a disadvantage with respect to cooler weight compared to sTRX4. Because of the CPU orientation, the vertical distance between the mounting screws (to the backplate) is about half as long on SP3 as on regular (non-Pro) Threadripper. That means double the stress on the socket, simply due to the orientation!

However, the shorter of the two screw distances on SP3 is still similar to the corresponding distance on pretty much every consumer socket to date. Edit: actually, no, this is not the case - most consumer cooler backplates have their screws spread along the vertical direction, distributing the torque optimally for tower mounting. So SP3 is at a disadvantage compared to consumer sockets as well.
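
To illustrate with round numbers why the screw spacing matters so much - the spacings below are made up for illustration, not measured socket dimensions:

```python
# For a given cantilever moment M from the cooler, the extra tension pulled
# through the upper pair of mounting screws is roughly M / d, treating the
# lower pair as the pivot. Halving the spacing d doubles the screw load.
# The spacings are illustrative round numbers, not real socket measurements.

def upper_screw_tension(moment_nm: float, spacing_m: float) -> float:
    """Approximate extra tension (N) on the upper screw pair for vertical spacing d."""
    return moment_nm / spacing_m

M = 1.1  # N*m, roughly the Arctic's static moment from the earlier estimate

for label, d in [("wide vertical spacing, e.g. 80 mm", 0.080),
                 ("half that spacing, e.g. 40 mm", 0.040)]:
    print(f"{label}: ~{upper_screw_tension(M, d):.0f} N extra on the upper screws")
```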

The extreme case in terms of weight seems to be the ProSiphon Elite, which is 164 mm tall, weighs 2 kg, and has a fairly high CoG. It does not fit a standing SP3 board for another reason - its orientation is constrained by the fluid it uses to move heat around. Factoring in the orientation, it probably puts a similar stress on an sTRX4 mainboard as the Arctic would on an SP3 mainboard. So it looks like I am touching the boundary of what has been tested on the market so far (though we don’t know the physical limit).

Have you ruled out water cooling your system? It’s not cheap but a custom loop would be both cool and quiet if you spec it right.

I’ve just converted my 6900 XT, which pulls 400 W, and a 12700K, which hits 240 W, so a 200 W EPYC doesn’t sound challenging, especially with its huge IHS.

Yes, I’d say I’ve already ruled out water cooling - too much hassle, and I have no experience with it. I’d definitely try the Supermicro cooler before getting into something like that - and as you say, it’s only 200 W, which is not that bad. The SM cooler is actually specced for 280 W (they changed the spec from 240-something when the 280 W chips started showing up :) ).

Anyhow, I’m possibly re-planning my build a bit: I recently learned that ASRock Rack now has an 8-DIMM variant of their micro-ATX board: ASRock Rack > ROMED8U-2T

I just ordered one. It’s essentially the board that @MadMatt uses, but with one fewer PCIe slot and a full set of DIMM slots! I was originally planning to keep the Supermicro ATX board for my workstation and get a smaller ASRock board to build a server around the 7252. However, now I’m considering making the workstation mATX instead, since I seem to need only 3 PCIe slots, and keeping the SM board for the server. Previously, memory was the showstopper for an mATX workstation for me, and now ASRock has presented a solution to that dilemma! Now I have to think a bit about which CPU to put where :)

Edit: the ASRock board has only one USB 3.1 controller onboard, while my Supermicro board has two. I currently dedicate one of my two controllers to a Windows VM, which I would not be able to do with the ASRock board. I wonder whether there is a cheap way to add a second USB controller without using a PCIe slot (those would all be taken). SlimSAS cables are still quite expensive. I could perhaps use an M.2 slot somehow…
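
For reference, this is roughly how I check which USB controllers a board exposes and whether each one sits in its own IOMMU group before deciding what can go to the VM - a quick sketch assuming a Linux host with the IOMMU enabled and the standard sysfs layout:

```python
# List USB controllers (PCI class 0x0c03) and their IOMMU groups, to see which
# ones could be passed through to a VM on their own.
import glob
import os

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    with open(os.path.join(dev, "class")) as f:
        pci_class = f.read().strip()
    if not pci_class.startswith("0x0c03"):  # 0x0c03 = USB controller
        continue
    group_link = os.path.join(dev, "iommu_group")
    group = os.path.basename(os.readlink(group_link)) if os.path.islink(group_link) else "n/a"
    print(f"{os.path.basename(dev)}  class={pci_class}  iommu_group={group}")
```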

I’m heavily considering switching to EPYC, as I’ve had so many damn issues with Intel and AMD consumer parts over the last few months, and since I have to return my TR PRO system to work, EPYC seems to be the way to go for me as a daily driver for work, gaming, and whatever else I throw at the system. I just need to find what parts work for me and what price range I’d be looking at…

My vendor has the 7473X listed for £3240 and the 75F3 for £4884 (which do seem kinda pricey for a home WS/desktop), so if there are any other options you guys think would be a suitable upgrade from the TR PRO 3945WX I have to give back, I’d be happy to hear them!

The things I need the system to do:

  • Run some games on the weekend/days off
  • Work well with my Optane drives + NVMe RAID without any issues
  • Work well with high heat and long hours of computing simulations for my day job
  • Handle USB/PCIe hotplug, as I like to swap out GPUs/PCIe cards to do benchmarks for customers

How many NVMe drives are we talking about here? And are they M.2 or U.2?
M.2 drives are fairly easy to attach with an adapter and bifurcation, but U.2 is a hell filled with expensive cables that may or may not work if you want PCIe 4.0 - unless you get one of those 2U 24x NVMe chassis.
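
If you do go the adapter/cable route, one quick sanity check on Linux (assuming the usual sysfs layout) is to read back the negotiated link speed and width for each NVMe drive - a marginal Gen4 cable will often train down to Gen3/Gen2 rather than fail outright:

```python
# Print the negotiated PCIe link speed/width for each NVMe controller, so a
# cable or riser that trained down (e.g. 8 GT/s instead of 16 GT/s) shows up.
import glob
import os

def read(path: str) -> str:
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "unknown"

for ctrl in sorted(glob.glob("/sys/class/nvme/nvme*")):
    pci_dev = os.path.join(ctrl, "device")  # symlink to the underlying PCI device
    speed = read(os.path.join(pci_dev, "current_link_speed"))
    width = read(os.path.join(pci_dev, "current_link_width"))
    model = read(os.path.join(ctrl, "model"))
    print(f"{os.path.basename(ctrl)}: {model}  {speed} x{width}")
```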

Also, the 7443P boosts to 4.0 GHz if you’re not running all cores at full throttle. It’s way more than enough for most games when paired with an adequate GPU.

Check the P-series SKUs. Much more reasonable for home server use. Look for ones with a high cTDP.

1 Like

2x Optane P5800X / 1x 4800X / 6x 980 PRO

I already have the cables and drives from my previous TR PRO build, which I’ll bring with me to the new build.

I’ll be putting a Radeon W6800 in the system, as I want to run 6x high-res monitors and still have enough power to play some games.