AMD Epyc Milan Workstation Questions

I didn’t get one either (got the naked CPU, not realizing a torque driver was required and part of the boxed CPU). However, I don’t think you actually need one. Let me explain : when you tighten the socket’s three screws, you will clearly feel a point at which resistance increases a lot. I stopped screwing when I got there and I’ve had no issue.

I’ve got some mechanical engineering experience, including processor sockets. My educated guess is that AMD isn’t dumb enough to design a socket that can be overtightened to destruction. That means the screws and the threads they go into are designed to bottom out at the perfect distance to create a good contact on all pins. The torque wrench is probably there merely to make life easier for datacenter techs and factory workers who might have to install hundreds of EPYCs in short order.

There’s also an interesting aspect to consider : the heatsink will also press the CPU onto the socket’s pins, and arguably with more even strength since it will compress the entire heat spreader (and not just the periphery, like the socket does). This uses four spring-loaded screws that are also designed to bottom out at a specific distance. Springs being fairly predictable, this will result in a known amount of force applied to the CPU. I wonder if AMD took this into account when they designed the socket. I’d say, probably.

Anyway, trust me on this : unless you’ve got gorilla arms and no sensitivity at all in your fingers, you’re not going to damage your socket or CPU.

That being said, I won’t dissuade you from “playing it safe”, especially if you’re just looking for an excuse to buy that sweet Wera torque wrench :heart_eyes:


Hello Princess,

It all depends on what you mean by “gaming”. From your question, I’m going to assume that (like me) you’re just looking to enjoy a good game after work is done. Meaning you’re not a teenager looking to compete in Fortnite with a 360 Hz pro gaming monitor :crazy_face: (if that is the case, then yeah, the EPYC is probably not up to your standards).

For context, I’m using an LG 43UD79 monitor (43’’, 4K, 60 Hz) so I’m inherently limited to 4K and 60 FPS. If you’re in a similar situation, then you’ll have no problem gaming on EPYC. Mine is the 7282, which is running at “only” 2.8 GHz. My GPU is an RTX 3090 FE.

I haven’t run any gaming benchmarks, however I’ve been moving my game installation folders to my NAS and, for shoots and giggles, I’ve been trying to play those games (without reinstalling them) on the EPYC while keeping them stored on the NAS. Loading times are identical if you’re running 10 Gb Ethernet.

So, as I’ve posted before (I think), Cyberpunk 2077 runs at 4K / 60 FPS in Ultra quality. There are sometimes dips to 55 FPS, but that’s in the really dense areas and it’s just a blip. I definitely wouldn’t notice if the FPS wasn’t displayed.

I have also tried the following games :

  • Watch Dogs
  • Watch Dogs 2
  • Borderlands 3

All smooth as butter in 4K / 60 FPS / Vsync / ultra quality. Nothing to complain about at all.

From experience, though, I have to say I’ve never met a game that was CPU-bottlenecked. All my machines tend to have a lot of RAM and more channels than the usual two, and I think this helps a whole lot more than single-threaded performance. But like I said, a 2.8 GHz Rome is enough to play anything including the latest games, so I’m certain that a faster-clocked Milan, with its IPC increase, will not leave you wanting.

Sweet, thanks so much for your reply!

Yeah, I am gaming on a 7900X at the moment, and gaming was not the top priority when I selected the CPU for my current workstation. However, I do want to play games in my spare time ;). 60 FPS in Cyberpunk with everything on Ultra sounds more than solid enough.

If you ever do benchmark, please share your results - otherwise I am satisfied that I will likely go for a Milan when the 7443Ps become available. :slight_smile:

Looks like my H12SSL-NT will be with me early next week - just a shame the case I ordered at the same time (Supermicro CSE-747) has no ETA at the minute!

I decided to go with that Wera torque-screwdriver in the end because I like decent tools and even torquing is a good idea IMO with all those thousands of pins to connect.

I also switched from my first choice of a Noctua NH-U14S-TR4-SP3 to the same Supermicro one as oegat. I guess I’m a glutton for punishment at the hands of Supermicro’s lead times!


I got all my parts a few days ago, but work has kept me from starting the build until now. This will be the weekend’s project. I just now inverted the beQuiet case, which was quite a project in itself.

For the torque screws I went with one of these, however it’s probably overkill, esp. judging from @Nefastor’s report above - the Wera would have sufficed.

The screwdriver counts in Nm. The internet tells me 16.1 kgf cm = 1.58 Nm, does it sound correct? I would hate to get it wrong when I made the effort to get a special device :stuck_out_tongue:
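The conversion checks out. Here is a minimal sketch of the arithmetic, assuming the standard definition of the kilogram-force (1 kgf = 9.80665 N); the function names are made up for illustration:

```python
# Standard gravity in m/s^2; by definition 1 kgf = 9.80665 N.
STANDARD_GRAVITY = 9.80665


def kgf_cm_to_nm(kgf_cm: float) -> float:
    """Convert a torque from kgf·cm to N·m."""
    # kgf -> N via standard gravity, cm -> m via division by 100.
    return kgf_cm * STANDARD_GRAVITY / 100.0


def nm_to_kgf_cm(nm: float) -> float:
    """Convert a torque from N·m back to kgf·cm."""
    return nm * 100.0 / STANDARD_GRAVITY


if __name__ == "__main__":
    # Torque value quoted in this thread for the socket screws.
    print(f"16.1 kgf·cm = {kgf_cm_to_nm(16.1):.2f} N·m")
```

Rounded to two decimals this gives 1.58 N·m, which matches the figure found online.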

I’ll hopefully be able to report on the noise levels in a couple of days :slight_smile: Then we’ll start to see if I have to re-think my plan. The 7252 can probably go with a Noctua fan instead, but Milan might be needing the 3k8 rpm… we’ll see.

Just wanted to check-in, I’ll be back with more reports over the next few days.


My 7302p (set to the highest power draw) has absolutely no issues with my 92mm Noctua NH-U9 TR4-SP3, other than having a “wrong” orientation on a server board which rustles my autism.

The only time you need to worry about fans is when overclocking the big boy threadrippers or 5000 series Ryzens.


Hello guys. I’m back with an update regarding my startup issue :

First off, I was wrong : the code isn’t 64 it’s B4. The 6 and B are easy to mistake on 7-segment displays. 0xB4 is documented in the board’s manual, it’s DXE_USB_HOTPLUG.
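To illustrate why the two are easy to confuse, here is a small sketch using the conventional segment naming (a = top, b = upper right, c = lower right, d = bottom, e = lower left, f = upper left, g = middle). It assumes the common rendering where hex B is drawn as a lowercase "b"; the exact glyphs on any given board's display are an assumption:

```python
# Lit segments for each glyph on a 7-segment display.
SEGMENTS = {
    "6": set("acdefg"),  # "6" drawn with its top bar
    "b": set("cdefg"),   # hex B, usually drawn as lowercase "b"
}


def differing_segments(x: str, y: str) -> set:
    """Return the segments that are lit in one glyph but not the other."""
    return SEGMENTS[x] ^ SEGMENTS[y]


print(sorted(differing_segments("6", "b")))  # ['a']
```

Only the top segment separates the two glyphs, and some displays draw the 6 without that top bar, in which case the two look the same.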

A quick search yielded a 2013 post on a forum regarding a different server board with the same issue I’m facing. The poster managed to get rid of the problem by changing keyboard and monitor.

Since the only thing plugged into my ROMED8-2T USB ports at this moment is a KVM switch, I’m going to assume that it’s the problem. I’ve tried booting with the KVM selecting this PC and with selecting another PC, there isn’t a difference. There is no determinism I can discern, though power-cycling the PC does improve the chance that it’ll boot. I’m thinking my KVM is the likely culprit here. It’s not a very expensive one, though it works well.

I’ll keep you updated on this issue.


TL;DR : Asrock’s Milan BIOS is not mature yet ! You should read the rest anyway !

Curiouser and curiouser…

Even with the KVM disconnected and no USB device present, I still get the B4 error. The only USB device in the system at that point is the IPMI itself.

I hadn’t used the IPMI in a while since the machine lives on my desk right now. I connected its network cable again and was very surprised to find out it didn’t ask the router for an IP address. To save time, I decided to clear the CMOS and start over. Now the IPMI behaves as it should and I can log into it. What I’m now calling “the B4 issue” is still there, however.

I decided to take this opportunity to tackle another issue : the total lack of fan control on this motherboard. As best I can piece it together, fan control used to be in the BIOS but Asrock decided to move it to the BMC. But because they appear to be far less professional than I remember them, they shipped the ROMED8-2T with a new BIOS (no fan control) and an old firmware (no fan control either).

My board shipped with BMC firmware 1.00. I updated it to 1.10, which doesn’t have fan control either. I also tried 1.16, which is a beta version for Milan support, but this one won’t flash at all : it uploads to the board but then it stays stuck on “processing” and you never get to actually flash it.

Finally, I found a version 1.11 firmware image on a forum. Apparently Asrock sent that elusive firmware version to some guy who asked technical support for fan control. I had to create a fake Gmail account to sign up to Dropbox just so I could download the file. Flashed it. It works.

And it does have fan control ! And yes, I’ve tried it, it works ! So that’s one problem solved…

But the “B4 issue” remains.

Early on, I had also installed the Milan beta BIOS (L3.11). Interestingly, within the BIOS, it shows as L3.11 but the BMC reports it as P1.30 (production v1.30, I guess). This might just be that the “lab” BIOS still used an old name string, purely “cosmetic”, and it didn’t prevent the machine from working… but I decided to flash the BIOS back to v1.30 : after all, I’m using a Rome processor.

Flashing the BIOS took a lot longer than I remembered. At first I almost thought it had hung, like the BMC 1.16 update… but eventually it did ask me to confirm and then flashed correctly.

And this appears to have solved the B4 issue. I’ve just done a dozen reboots (restart from Windows as well as shutdown + power button) and it didn’t hang on B4 even once. The KVM wasn’t the issue after all.

Now, because I’ve done a CMOS reset (and reflashed the BIOS) it’s also possible that one of my BIOS settings could have been causing the B4 issue. I’ll be looking into that. However the only settings I remember changing were :

  • “wait on BMC” during boot : can’t be the cause of the problem, because that problem happens even when the BMC is already running.
  • “external GPU” instead of “Auto” : initially selected to solve a conflict between Windows and the KVM but I eventually solved that by just disabling the internal graphics in Windows.

So, my conclusion is that Asrock’s firmware and/or BIOS for Milan are not yet mature. I can’t install the first one, and the second one either clashes with Rome BMC firmware or is somehow incompatible with the Rome USB controllers.

If anyone is interested, here is the BMC v1.11 firmware. It works for me but, of course, use at your own peril :

And here’s a link to the forum thread that led me to it :

I’d love to know if anyone has managed to install Asrock’s v1.16 BMC firmware. From what I’ve read everywhere else, no one has managed to flash it.


I had a lengthy mail conversation with Asrock tech support about flashing this board to support Milan CPUs. Apparently, the 1.16 BMC firmware will ONLY work if you FIRST update the BIOS to the latest version (3.11) WITHOUT a CPU present in the system, then seat a MILAN CPU and boot once, and only then update the BMC to 1.16.

As I’m still waiting for my Milan CPU (7313p) to arrive, I haven’t been able to test it.


Very interesting.

I had assumed that the Milan BIOS / firmware would also support Rome, but from what they told you it seems that might not be the case. It may be that you need to flash this motherboard every time you change from Rome to Milan and back. That could be worth keeping track of for when those parts end up for 50 bucks on eBay ten years from now ! :sweat_smile:

Could you share with us (once you get your Milan) whether or not the BMC 1.16 firmware has fan control ?

Yeah it varies and you can’t make assumptions because of the corners these companies love to design themselves into.

My Tyan S8030 with a 7302p works fine with the 4.0 firmware, which supports Milan.

However the board can’t support 1st gen Naples because of some sort of size restriction that would require its own firmware, which Tyan is understandably not interested in trying to juggle.

Isn’t there also a discrepancy between their PCIe complexes ? I seem to recall Naples had “only” 112 lanes (and is PCIe 3.0)

This is the explanation I got straight from Tyan via email asking about it. Otherwise I would have gone from Naples to eventually Milan, rather than my current Rome chip

1st and 2nd generation EPYC processors require separate BIOS images to be created and maintained. If two BIOS images were developed they could technically fit within a larger BIOS chip with some sort of switch logic to detect which branch to load during POST. However, given that the S8030 was just recently launched and the 1st generation AMD EPYC processors are quite old now and no longer the focus of most customers, we decided to not develop a BIOS branch capable of running the 1st generation EPYC 7001 parts.
Philip Maher
Tyan Product Planning and Marketing

So my impression is that the reason we see motherboards with Naples/Rome compatibility and Rome/Milan compatibility, but not Naples/Rome/Milan compatibility, is BIOS size and developer willingness, rather than PCIe or some other fundamental hardware problem.


Possibly. And even if they could make a board with a BIOS chip large enough to fit three different versions there wouldn’t be a market for it.

This makes me wonder : could they design a motherboard with the BIOS on SD card ? What I like about the EPYC is that there’s no chipset on the board, it’s just a big “break-out” to connectors. It would be interesting if they could exploit that to make a socket last maybe 10 generations of processor. That way you could always pick an old processor on eBay and pair it with a new motherboard.

I was restoring an X58 machine recently and discovered that it’s much easier to find an old processor than an old motherboard. The Chinese even made a niche business out of creating new X58 and X79 motherboards using salvaged chips. But again, this is probably not a big enough market for any board manufacturer. Still, I’d like the e-waste reduction.

I’ve got a NAS running off a Core-i7 975 (1st gen) with 48 GB of RAM (that required a BIOS update, normally it’s limited to 24 GB). That CPU is fast enough to sustain 10 Gb/s data transfers to and from an array of 8 SAS drives. Can you imagine what an EPYC Rome would be able to do years down the line when the time comes to recycle it into a server ? :heart_eyes: But for that you’d need a working motherboard, and those fail more easily than processors.

Anyway, I’m ranting… it’s time to go to sleep !
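For a sense of scale on the NAS figures above, here is a rough back-of-the-envelope sketch. It assumes decimal units, the raw 10 Gb/s line rate with no protocol overhead, and load spread evenly over the 8 drives with no parity; none of those details are stated in the post, so treat the numbers as an upper bound:

```python
def line_rate_mb_per_s(gigabits_per_s: float) -> float:
    """Raw link rate in megabytes per second (decimal units, 8 bits per byte)."""
    return gigabits_per_s * 1000 / 8


link_mb_s = line_rate_mb_per_s(10)  # 10 Gb/s Ethernet
per_drive_mb_s = link_mb_s / 8      # assumed even spread over 8 SAS drives

print(f"link: {link_mb_s} MB/s, per drive: {per_drive_mb_s} MB/s")
```

That works out to 1250 MB/s on the wire, or about 156 MB/s per drive if all eight share the load equally.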

H12SSL-I / EPYC 7252 (120w) / beQuiet 802 first impressions

Now I’ve gathered enough data for a first report.

The rig is assembled, running Xubuntu 20.04.2 on baremetal since this morning. TL;DR it works as expected, no notable surprises.

Hardware list

Supermicro H12SSL-I (1GbE, SATA + slimSAS ports)
EPYC 7252 (8 x 3.1-3.2GHz)
Supermicro SNK-P0064AP4 CPU heatsink (<= 3800rpm)
4x Samsung 16GB 3200MT/s (M393A2K43DB3-CWE)
beQuiet Silent Base 802 with stock fans
Corsair AX850
Gigabyte AORUS 7000s 1TB NVMe (4.0)
Geforce 210 (old leftover card, kept only for temporary use like this)

Assembly notes

Apart from my choice to invert the layout of the beQuiet case, to have the CPU at the bottom and the PCIe slots at the top, there was nothing special. The inversion process was more cumbersome than I expected though.

When mounting the CPU I used a torque screwdriver set to 1.58Nm (= 16.1 kg f cm) according to instruction - however I did not pull until it clicked - instead it was sort of like this:

I did the same, and the screwdriver hadn’t clicked when I stopped. I don’t know how far from the set torque I was. So I can confirm @Nefastor’s observation.

The Gigabyte NVMe drive was carefully selected for its fairly flat heatsink, with the hope that it would fit underneath a longer graphics card. It fits under the GF210 for sure. This will be the boot drive, I plan to add more flash storage for home directories and VMs later.

What I also realized when building is that the Silent Base 802 chassis is quite wide. This is probably not optimal for server-style airflow. A narrower chassis would in theory be more wind-tunnel-like, even though most are far from Supermicro’s own chassis in terms of air channeling.

BIOS / OS install / BMC issues
(maybe relevant for @jtredux)

The board came with BMC Firmware 01.00.32, and BIOS 2.0. The BIOS is the latest and is supposed to be ready for Milan.

I started by trying to get an understanding of the firmware before attempting any OS install. I actually found fewer real issues than I expected. The main “issue” was more the expectation of issues fooling me than a real issue - here’s how:

I had read about people having problems with external GPUs on the board’s predecessor, H11SSL-*. E.g. here. Then I installed an old Radeon 5450, set BIOS to boot from “offboard” VGA. This led the 5450’s output to stay black, while the onboard VGA simply said that “iKVM not supported by external VGA” (or similar). So I assumed that it was not possible to boot with an external card as primary. I realized only later that my card is far too old to have a UEFI-compatible VGA BIOS… Switching the PCIe slot’s OPROM setting to “Legacy” solved it: now the POST messages go to the external VGA, and I can enter BIOS setup from it.

Anyway that little adventure took several hours to sort out. What complicated it was that for the OPROM=Legacy setting for devices to be available, one has to enable “dual” mode in the boot settings tab. However a better solution is to not use 10+ years old GPUs :slight_smile:

Notes about firmware & settings:

  • It is not possible to connect to iKVM when booting from an external GPU. So BIOS settings must be changed on the screen connected to the GPU in that case. However booting from onboard VGA, and simply letting the OS take over the external GPU at boot, should work (haven’t tested yet).
  • I could not boot from the NVMe drive at default BIOS settings. However changing the drive’s firmware setting from “Vendor-specific…” to “AMI Default” (or similar wordings, don’t remember) fixed it. I don’t know yet whether it has performance consequences.
  • No driver issues so far in Xubuntu 20.04.2
  • Fan and temp controls seem to be BMC’s job, but I haven’t investigated
  • All devices can be set to be initialized in EFI mode, Legacy mode, or not at all (disabled).
  • All the relevant PCIe bifurcation settings exist

Thermals & noise

I did a quick test of CPU-related thermals from Linux, by running mprime (prime95) stresstest for 20 minutes. Here are the temp readings from BMC around their peak:

As you can see, CPU VRMs peak at 50 centigrade at full load, critical limit is 100. With a hotter CPU I might need point-cooling for the VRMs, but for now they seem fine. At this point the machine was impressively silent, and the CPU fan was at about 1600rpm. So there is definitely lots of headroom left for the CPU given this cooler.

(Max cooling is too loud for desktop use though - I know as all fans run at max during post)

I’ll return with IOMMU groupings and such tomorrow. I might also install Win10 on baremetal some day soon, just to check compatibility.


Thanks for more interesting data and opinions. Unfortunately I learnt today that I won’t be able to get the Supermicro 747 case I was after (unless I can wait until July!), so will probably go with something very similar to yours. I don’t like the extra empty space over the top of the expansion slots in pretty much all of the gamer-focused desktop cases.

Yeah, I have no doubts that Noctua’s tower coolers can handle most SP3 things we throw at them. When I mentioned the possibility of using a Noctua fan I meant replacing the fan on the Supermicro cooler. However now I know that the stock cooler on SNK-P0064AP4 is silent enough at similar fan speeds. Also the fan is bolted to the shroud that holds it, rather than screwed, so it seems less trivial to replace it than I thought.

Btw you mention highest power draw, but according to AMD specs 7302p has no cTDP range, only a single 155w rating. Are there undocumented cTDP options? What is your setting? I ask because in the H12SSL bios I found a configurable power limit for the CPU, that is given as an integer. I haven’t tried, but I wonder if I can change the TDP for the 7252, despite the spec not mentioning cTDP.

I’ve experienced similar things with Supermicro. For my last workstation, my H8DG6-O (skt G34) mainboard had an IOMMU-related BIOS bug that broke PCI passthrough with xen. I got emailed a beta BIOS from Supermicro support that fixed the problem. However, that BIOS never became official, the last official one on the product page is still the buggy one. It’s nice of them of course, to provide a BIOS hack to a single customer with a specific problem - but it would lead to lots of fixes being obtainable only through some guy on a forum, esp after the board’s EOL :slight_smile:

Or go with a cheap replacement case until you can get the SM one? I notice that the 747 is discontinued on SM homepage. Also, it was listed on the compatibility list for H11SSL boards, but not for H12SSL - the latter has no workstation-class case on its HCL. Perhaps a consequence of SM making a workstation brand out of TR PRO.

I start to see the problem with spacious cases when I start thinking about airflow. Though I also find the 8-9cm fans in the 747 smaller than needed. (I keep thinking that it is airflow and noise that got it named after the iconic Boeing :slight_smile: )

4U can do better IMO, but there seems to exist little between the airducted fast-fan professional 4U cases, and the gamer related hi-volume ATX ones.


Part of the reason I went with the 747 was because I thought it would also fix the problem of finding a 1.2kW+ PSU which also seem hard to come by at the moment.

From what I can see, there is a newer 747 case, and even a workstation that uses it, but the case now comes with redundant 2kW PSUs. I might see if I can get a quote for one of those, but I’m guessing that SM aren’t immune to the PSU shortages that seem to be affecting all retail supplies of >1kW PSUs. I still haven’t found a review of SM’s new TRPro workstation - aren’t they shipping either?


Yes - lots of dead space above the PCIe slots and CPU - I’d like the front->back flow of a rack-mount case in tower format, which is why I picked the 747 in the first place.

The 92mm fans in the 747 can do 9k rpm apparently - yes, that won’t be quiet! But when not burning 1kW in the PCIe slots, the case is apparently <38dB so not totally crazy compared to a 2U GPU server :slight_smile:


There are some cheap and nasty 4U cases - my current EPYC is in one, but no proper airflow, just some 120mm fans right on the front of the case 200+mm away from the PCIe slots, so barely a gentle breeze over the I/O even on full speed.

I suspect I will be buying a temporary case/PSU and trying again to source a 747 later in the year.

Of course, I forgot about the PSU. Judging from the power headroom you are planning I understand that cooling will be a concern.

This is even the situation with my beQuiet case, that has the old-school (and for my use case unnecessary) spacing for a tower of HDDs between the fans and the mainboard. I would have preferred it to be shallower, and have the fans closer to the actual hardware. The Seasonic Syncro (the other case I considered) would probably have been slightly better in this respect, but only a tiny bit.

I don’t know where to find official numbers from AMD, but this pdf from Lenovo should be good enough; see page 41 for a list. It is also a good reference for tuning EPYC Rome.

My 7302p default is 155w, but can be bumped to 180w.

If you set the number above the max of what the chip supports, it just automatically goes to the chip’s max.

Looks like the 7252 can be bumped to 150w
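The clamping behaviour described above can be sketched as follows. The wattages are only the ones quoted in this thread (7302P: 155 W default, 180 W max; 7252: 120 W default, 150 W max); the table and function names are hypothetical, and this is not how the BIOS actually implements it:

```python
# (default, max) configurable TDP in watts, as reported in this thread.
CTDP_WATTS = {
    "7302P": (155, 180),
    "7252": (120, 150),
}


def effective_power_limit(model: str, requested_w: int) -> int:
    """Clamp a requested power limit to the chip's maximum."""
    _default_w, max_w = CTDP_WATTS[model]
    # Requests above the chip's max silently fall back to the max.
    return min(requested_w, max_w)


print(effective_power_limit("7302P", 240))  # 180
print(effective_power_limit("7252", 150))   # 150
```

The lower bound is deliberately not modelled, since nobody in the thread has reported what happens when the value is set below the default.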
