AMD Epyc Milan Workstation Questions

The product page says there is room for “optional” 3x 12cm fans at the front and at the top. The top ones are probably for the watercooling option that SM offers (which in turn suggests the CPU fan is not necessary for cooling the VRMs, though a top radiator would probably pull some air over them), and the front ones are for intake. The product picture I linked previously seems to show 2x intake fans, but they sit lower than the VRMs, similarly to how a regular chassis with a couple of 5.25" drive bays at the top would have it.

So I take it you don’t think putting the fans sideways on the NH-U9 would be a good idea. You are probably right. I hope some new front-to-back cooling options will emerge; the cooler industry is perhaps waiting to see what happens with the next Threadripper iteration first. I hope AMD will stay with this new orientation of the socket, as I feel the orientation of the regular TR sockets was a mistake. The new one seems much more efficient, but only provided new coolers are made for it.

I’m gearing up to that. I’ve sent a message to Asrock regarding the hibernation issue and I think I’ve gone as far as I can on that front. I’ve actually created a separate thread for it.

I’m considering a temporary fix for that : using VMware on top of Windows. That way I could suspend VMs and simulate hibernation. I don’t know if it’ll be as comfortable to use, but it could fit my use-cases.

I’ve already installed some fans at the front of my case : three Noctua 140 mm “Chromax” units. Plugged into the motherboard, they appear to be stuck at 500 RPM, though.

I’m installing some software, and I’ll try a little of everything : games, DaVinci Resolve, perhaps SolidWorks, STM32 software development tools, perhaps also Visual Studio. I’m not much of a benchmarks guy. For now, I mostly want to check compatibility with desktop apps. I’ll stick with the GTX970. If everything works, I’ll move on to the RTX3090. The UPS I’m using for this is only 900 VA, so it should handle the load, but barely.

Keep in mind that the only EPYC I have is the 7282, a 120 W part. It’s probably not going to stress the VRM but maybe we can extrapolate VRM temperatures for higher TDP chips.

Software’s installing, and I’m also decompressing huge archives over my network :

It’s been running like this for a while. The VRM heatsink is room-temperature.

It’s not unexpected : this is a 120 W part on a board that can take a 280 W processor. I don’t think it’s going to get warm at all. Note that the airflow from the front fans is barely noticeable. As in : you need to lick your finger and place it near the VRM heatsink to even notice it. Needless to say, at that speed you can’t even hear them.

Likewise, the CPU heatsink is cold to the touch, all the way to the lowest fins.

I chose the 120 W to go in an always-on NAS, as I want to keep its overall power consumption as low as possible… as well as fan noise. I’d say it’s mission accomplished.

Going back to that hibernation itch of mine… the system idles at 70 W (the 80+ Platinum PSU surely helps) and I have ECC memory, so I may not actually need it to hibernate : I could just leave it on 24/7. This machine was always going to be on 18 hours a day anyway.
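For a sense of scale, here’s what always-on at that idle draw adds up to over a year (a quick sketch; the electricity price is an assumed placeholder, not my actual tariff):

```python
# What always-on costs over a year at ~70 W idle. The price per kWh
# is an assumed placeholder; substitute your own tariff.
idle_watts = 70
kwh_per_year = idle_watts * 24 * 365 / 1000   # 613.2 kWh
price_per_kwh = 0.20                          # assumed, €/kWh
print(f"~{kwh_per_year:.0f} kWh/year, ~{kwh_per_year * price_per_kwh:.0f} €/year")
```

Call it a bit over 600 kWh a year, which makes "just leave it on" a question of electricity price more than anything else.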

3 Likes

Thanks, yeah, that was my thought with the different cTDP states. The load-temperature function is probably exponential, so it might be tricky to extrapolate, but we can probably get a rough direction.

I just realized you mentioned that your UPS reports the power draw from the wall socket. Please track that too when testing; it would probably help the extrapolation.

Sounds reasonable. If the heatsink temp doesn’t change with different cTDP settings, then we might not learn much without a thirstier CPU. Still curious whether you get any variability.

If I may add my personal opinion about this : I think our paranoia about VRM heat comes from “gamer” hardware marketing.

I’ve already mentioned I’m an electrical engineer. Modern VRM designs are extremely efficient to begin with : well over 90 %, even for the crappiest. It’s fair to say that no matter the processor and motherboard quality, the VRM is only going to dissipate around 5 % of the power going into the processor. Even with a 280 W processor, that’s 14 W wasted in the VRM, at most.

Now look at consumer-grade motherboards : Anandtech did a piece on Threadripper boards a while back that had a whole page on VRMs :

Most of those boards use a 16-phase VRM design, with 16 FET stages rated at 90 A each, so 1440 A total. Assuming that’s all generating Vcore at 1.1 V, we’re looking at motherboards that can deliver close to 1.6 kilowatts to the CPU socket. Which is patently insane.

The target demo is obviously Der8auer, JayzTwoCents and every YouTuber crazy enough to pour liquid nitrogen on a running processor for fun and profit. I think it’s safe to say that’s not us.

But even so. Assuming 5 % waste in the VRM, that’s only going to mean 80 W of heat to deal with, spread over a lot of FETs. There’s no way you need a crazy huge RGB-encrusted heatsink just to deal with that. But it sure looks cool, and it’s one way to stand out from the crowd and make Joe Gamer spend a hundred bucks more on a motherboard that just has five bucks more worth of aluminum on it.
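The arithmetic in the last few paragraphs, in a few lines (all figures are the rough assumptions above: stage ratings from the Anandtech piece, an assumed Vcore, and the ~5 % loss estimate, not measurements):

```python
# Worked version of the numbers above. All figures are the post's
# rough assumptions (stage ratings, Vcore, 5 % loss), not measurements.
phases = 16
amps_per_stage = 90          # A, rated current per FET stage
vcore = 1.1                  # V, assumed core voltage
loss_fraction = 0.05         # ~5 % dissipated in the VRM itself

max_amps = phases * amps_per_stage        # 1440 A total
max_watts = max_amps * vcore              # ~1584 W deliverable
vrm_heat_at_max = max_watts * loss_fraction
vrm_heat_280w_cpu = 280 * loss_fraction   # realistic worst case

print(f"{max_amps} A, ~{max_watts:.0f} W to the socket, "
      f"~{vrm_heat_at_max:.0f} W of VRM heat at that maximum, "
      f"~{vrm_heat_280w_cpu:.0f} W with a 280 W CPU")
```

That reproduces the ~1.6 kW, ~80 W and 14 W figures : even at the board’s theoretical maximum, the heat to dissipate is modest, and a real CPU never gets close to it.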

Back to the ROMED8-2T : it’s a thick board. It kinda shocked me at first, but I’d say it’s at least a 12-layer PCB. That’s rather expensive and hard to justify on a consumer product where you want high margins. It does three things for us, though :

  • first, that’s probably the only way you can make an ATX-size board with 128 PCIe lanes and 10 Gb Ethernet channels. I’m pretty sure all those consumer-grade boards are EATX because they needed the real estate to route 128 lanes on fewer layers.

  • second, this motherboard is stiff. It doesn’t bend under its own weight the way my Asus X99 Deluxe does.

  • third, more layers means more copper. In circuit board design we definitely use power layers as heatsinks because they are just large foils of copper.

And I’m guessing that’s why the VRM is going to stay cold : A/ a properly designed VRM used within spec doesn’t get hot in the first place and B/ between the thick board and the heatsink it’s got all the thermal management it actually needs.

That being said, that’s just my opinion and I’m ranting while software installs run in the background. Have any of you guys ever seen a toasty VRM ? What was the context ?

2 Likes

OK, not exactly the most academic of tests, but for shits and giggles… I ran Cyberpunk 2077 in 4K on the poor GTX970. Here’s what it looks like in infrared :

Of note :

Room temperature is 18 °C.

Power draw at the plug rose to 227 W. Copacetic : the 970 has a 150 W TDP and the game doesn’t really need 16 CPU cores.

At this point the machine has been running for several hours. The two DIMMs above the processor are a few degrees warmer than the two below it, so there’s definitely a little heat coming off the EPYC; all sticks are still under 40 °C.

The X550’s heatsink now reads 48 °C; it’s the hottest thing on the board, as you can see in the picture. That probably has to do with the fact that one of its ports is connected to a 10 Gb/s NAS, and it confirms my initial fear that this part would require additional cooling.

The Sabrent SSD is a good surprise, consistently under 40 °C without a heatsink. It really seems to benefit a lot from the front fans.

The VRM is at 26 °C.

If you have a CPU stress-test that’s easy to setup, I’ll try it. I think I remember one but I haven’t done it in a long time.

3 Likes

You can try running Cinebench or Prime95 as a CPU stress test, and for GPU FurMark is a good test.

So. Google led me to PassMark. They have a “burn-in test” to stress-test systems for stability. I ran it for CPU and RAM, and of course set everything to the maxxxxx. Here is where it stabilized :

Of note :

I think I’m right about the additional power planes acting as heatsinks. Notice that the circuit board’s temperature is always uniform. Also, it rises with load. All that copper is really spreading the heat. This is a very good thing : PCBs are laminates, and a hot spot on a board can lead to delamination through uneven thermal expansion. That’s not likely to happen here.

The RAM shot up to 47 °C. The temperature gap between the sticks above and below the CPU heatsink has narrowed, however. It makes sense : the heatsink fins start above the upper edge of the DIMMs and the case is vented at the top, so most of the heat coming off the CPU will miss them. 47 °C appears to be the temperature those sticks run at when used at 100 % with no airflow. It probably helps that these are dual-rank DIMMs and therefore have more packages to spread the heat. At any rate, 47 °C is perfectly acceptable for any silicon-based IC.

The CPU heatsink finally got a tiny bit warm to the touch. The FLIR tells an interesting story : those heatpipes really work well, the temperature of the fins is the same as that of the base of the heatsink. The fin sides (which are closed) are significantly cooler.

I tried shooting the VRM from the edge to get a reading on the components under the heatsink, but this isn’t reliable (I should have bought an even higher resolution bolometer, but those are way too expensive). What I can say is that at some point near the heatsink I saw 32 °C. It may have been the temperature of a VRM FET. The 1 to 1.5 °C difference with the heatsink is easily explained by the heatsink’s thermal resistance, which is lower than I expected. The BMC reports a “board temperature” of 33 °C, which may be measured near the VRM.
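As a rough sanity check on that gap, ΔT = P × R_th gives an effective FET-to-heatsink resistance (the loss figure here is an assumption, ~5 % of this 120 W CPU, not a measurement):

```python
# Sanity check on the FET-to-heatsink gap via delta_T = P * R_th.
# The loss figure is an assumption (~5 % of a 120 W CPU), not measured.
cpu_watts = 120
vrm_loss = 0.05 * cpu_watts    # ~6 W total, spread across all stages
delta_t = 1.5                  # °C, observed FET-to-heatsink gap
r_th = delta_t / vrm_loss      # effective thermal resistance
print(f"~{vrm_loss:.0f} W of loss -> R_th ~ {r_th:.2f} °C/W")
```

A fraction of a degree per watt is plausible for a flat heatsink bolted over several FET packages, so the small gap is consistent with a healthy VRM, not a measurement error.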

I tried the SSD stress-test and, unfortunately, it didn’t go well. The poor thing throttled almost immediately and got real warm.

This puppy is really going to need its heatsink, I’m afraid. It couldn’t go past 3.5 GB/s, even though @wendell tested it at over twice that speed.

3 Likes

While I’m sure the steady-state load on those VRMs doesn’t need anywhere near as much current as they provide headroom for with all those phases, the transients are probably quite huge, and given the tiny voltages these are running at, they probably need ultra-low-impedance power stages to stop the CPU browning out when the load changes suddenly.

Another reason that ROMED8-2T is so thick and expensive is that it’s PCIe-4 compliant. The EPYCD8-2T is basically the PCIe-3 version and is £100 less for pretty much the same functionality.

1 Like

I’m thinking you’re correct on all counts.

I just realized I needed to add a point of comparison for all those FLIR photos I’ve been posting. Here’s my current workstation, as it’s been running for 7 years (minus the 3090, which has only been here for around four months).

This is a Core i7-5820K, cooled by an AIO watercooler, on an Asus X99 Deluxe.

The hot point is the NVMe SSD, mounted vertically with a heatsink. It’s PCIe 3.0.

Close in temperature is the RAM : there are 8 sticks of 8 GB, closely packed, with almost zero airflow (only the radiator fans above, and they don’t exactly work hard). So, clearly, DDR4 can run for years at around 50 °C with no ill effect.

The waterblock / pump on the CPU reads 45 °C : I’ve set the AIO for silence over performance.

The 3090’s body reads 30-35 °C. It’s idle (I’m just running browsers and Office documents) and both fans are stopped, which explains the warmth.

Don’t comment on the cable management : I’ve been working on this machine recently, it’s not supposed to be tidy :grinning:

1 Like

I’m really going to have to try very hard not to ‘accidentally’ spend some of my RAM budget on a FLIR camera :smiley:

I would hate to think what the IR pics of the laptop I’m composing this on would look like - its CPU bounces off 99 °C under load!

I think it’s easy to get used to thinking anything ‘warm’ is a problem, but after a bit of research about the HDDs in my NAS, I came to the conclusion that I wouldn’t be doing them any favours if by some miracle I did manage to keep them at <5 °C above ambient. Now I use SMART/IPMI to keep them at an average of 38 °C, but the fans don’t go crazy until they get towards the upper limits specified on the drives’ datasheets.

2 Likes

Well, I don’t know your RAM budget, but the camera I’m using is a FLIR One Pro. It goes for 450 to 500 €. And no, I didn’t buy it just to check up on my PCs :grinning:

2 Likes

I have a new Supermicro H12SSL-CT waiting to be installed in place of an old dual-CPU Xeon machine. Unfortunately, the 7313P I also ordered will likely not show up for some time, otherwise I could have checked hibernation for you. I don’t even have the RAM yet, but I figure that will probably arrive sooner than the CPU.

Sounds like a nice combo; I am also leaning towards 16 cores to start. The P-series CPUs are in the next release wave, from what I understood. However, I believe the 7313 non-P is out, but that’s $170 more if we believe the recommended prices… still cheaper than the 3955x though, if in a hurry.

@Nefastor thanks for the measurements, I’ll get back to those after work!

I’d appreciate that. I’m very much afraid that it won’t support hibernation either. I searched “ACPI” in its user manual and it says “ACPI power management (S5)”.

If that means it supports only the S5 state then it’s no good. S5 is just “soft off”.

@oegat I looked up the H12SSL on Amazon and found photos of all the variants in one row. This answers the question of which connectors are mounted on which.

It appears I was wrong : you can get the single-drive SATA ports on all variants except the non-SAS / 10G Ethernet one. So I guess that the “8 + 8*” in the product matrix means “8 single-drive ports + 8 ports on the SFF8654 connector”.

1 Like

More thermal camera fun… I finally swapped the GPUs between my current workstation and this tentative EPYC workstation. Now, I think it’s common knowledge that the RTX 3090 FE has a good cooler, and honestly I don’t have any complaints, but it does run hot and it is a bit scary. You will definitely burn your fingers if you touch it in the wrong place while it’s running.

Here’s what it looks like a couple minutes after I pulled it out of my X99, where it was only showing the desktop and no GPU-accelerated software was running :

(room temperature is 18 °C)

Keep in mind, its fans do not start until it runs something harder than Chrome and Office.

And now it’s in. You really need to install all three bracket screws; this card is way too heavy otherwise. It tilts into the PCIe socket something scary. What you’re seeing on the left is the UPS showing idle power draw from the wall, and yes, that’s only 63 W for the whole system. It varies between 60 and 75 W with no clear average; call it 67.5 W.

Now for the real crazy stuff. I launched Cyberpunk 2077 and set all graphics options to the maximum (aptly-named “Psycho” in some cases). RTX ON, of course. Looks perfectly lovely. Now the UPS reads 455 W and here’s what it looks like on thermal :

If anyone still wondered why I chose to install that GPU well away from the motherboard…

What’s interesting is that NVIDIA clearly prioritized silence : this GPU is obviously operating at max TDP (from previous tests in this thread we know this machine rarely uses more than 100 W, so the additional 350 W are all for the GPU). Yet the fans barely make any noise and the airflow is a very gentle breeze.
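The rough arithmetic behind that claim (the PSU efficiency is an assumed figure for an 80+ Platinum unit at this load, not a measurement):

```python
# Backing out GPU power from the wall readings. PSU efficiency is an
# assumed figure for an 80+ Platinum unit at this load, not measured.
wall_load = 455        # W at the UPS while running Cyberpunk
rest_of_system = 100   # W, rough ceiling for everything but the GPU
psu_efficiency = 0.92  # assumed

gpu_wall = wall_load - rest_of_system     # W drawn at the wall for the GPU
gpu_dc = gpu_wall * psu_efficiency        # W actually delivered to the card
print(f"~{gpu_wall} W at the wall, ~{gpu_dc:.0f} W DC-side")
```

Close enough to the 3090 FE’s published 350 W power limit, given how rough the 100 W baseline is.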

This leads to some… uncomfortably high temperatures :

60 °C at the front, 80 °C at the back. This board should come with several “warning : hot surface” labels.

And you can tell it’s really sucking in the watts just by looking at the PCIe extension :

The first ribbon is regular cables and carries the 75 W of power a PCIe slot can provide. All the other ribbons are data. That power ribbon does feel warm to the touch. Remember that LTT video where they strung three meters of PCIe extensions and it worked ? Well, it probably wouldn’t have with a 3090.

Other than that, I’m pleased to confirm that the 3090 runs without issue on EPYC with Windows 10.

Oh yeah, I’d like to add : while this GPU occupies 3 slots, its actual thickness is 2.5 slots. That, combined with the low airflow required by its fans, means it should be safe to run several of these right next to each other. I’d make sure to have some good case airflow, though.

2 Likes

The adventure continues…

First, a ROMED8-2T issue : the first PCIe slot appears to work at x8, not x16. That’s according to GPU-Z, which I have no reason to distrust. The motherboard’s manual says it’s an x16 slot, though :

Since I’m using a PCIe extension, it was very easy for me to switch the GPU over to slot 2, which I have set up as x16. And this time GPU-Z does report it’s working at x16, 4.0 speed. So I guess Asrock has some explaining to do…

I’ve gone crazy (crazier ?) and closed the case :

(the tiny monitor is for the BMC, it’s the only thing I have that has a VGA input)

Because the case has a tempered glass side panel, I can’t use the FLIR directly (glass blocks infrared radiation). Luckily, there are a few holes I could shoot through, both at the front and rear. I fired up (pun intended) Cyberpunk once again and let it sizzle for a good half hour to see where things would settle. I was pleasantly surprised :

  • 57 °C at the rear vent
  • 53 °C at the PCI connector
  • 70 °C at the backplate (was 80 °C with the case open)

The side panel really seems to channel the airflow from the front case fans as well as can be expected. Those fans are still running at only 500 RPM.

The glass gets a little warm but nothing to worry about :

Remember, that’s the temperature of the glass, the FLIR can’t “see” the graphics card through it.

There’s a 21 mm gap between the GPU and the glass, and it appears to be enough. I really wasn’t sure when I thought up this machine.

As soon as I stopped Cyberpunk, power draw fell to 65 W. So maybe I don’t need to hibernate this machine and can just leave it always on.

1 Like

@Nefastor First I must say that I too started trying to think up enough reasons to need one of those heat cameras :slight_smile:

Nice to see that the VRMs are not reaching anything remotely alarming in temps. I don’t know what to expect for the 7443P then, factoring in the wattage difference from the 7282, but your calculation suggests we are looking at a sort of linear increase of VRM power loss with CPU TDP (not exponential, as I speculated; I kind of forgot that it is TDP we scale against, not clockspeed × voltage as when overclocking). So the sustained heat budget should probably be fine.
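Concretely : at a roughly constant conversion efficiency the loss is just a fixed fraction of CPU power, so it scales linearly (the 5 % loss fraction is the same assumption used earlier in the thread; TDPs are AMD’s published figures):

```python
# At a roughly constant conversion efficiency, VRM loss is just a
# fixed fraction of CPU power: linear, not exponential. The 5 % loss
# fraction is the same assumption used earlier in the thread.
loss_fraction = 0.05
for name, tdp_watts in [("EPYC 7282", 120),
                        ("EPYC 7443P", 200),
                        ("board maximum", 280)]:
    print(f"{name}: ~{tdp_watts * loss_fraction:.0f} W lost in the VRM")
```

So going from the 7282 to a 7443P should only roughly double the VRM heat, from a very low base.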

And there is no bifurcation setting or so in BIOS that could be messed up?

The 70-80 °C on the backplate might be partially due to its position in the case; I suppose it is not a place that is likely to be reached by the ambient airstream, in contrast to regular PCIe slot mounting, where the backside usually gets some of it.

I suppose the 3090 FE is the kind that exhausts most or all of the hot air through its bracket, rather than back into the case ? Should be a good thing in a build like this.

Phew, the glass won’t melt then :slight_smile:

Those images match the ones on Supermicro’s homepage, so I suppose they are accurate. It is still a bit weird with the 2 orange single SATA ports on all boards but the NT; I don’t really see what they map to in the spec lists (if they are part of the 8, where are the other 6 ?)… anyway, as long as there is the cable option to attach some SATA, I think the NT board is the most interesting.

I notice that the M.2 slots, just as on the ASRock, have the potential to block longer PCIe cards if the NVMe heatsinks are too large. I don’t know whether that will be a real issue. I probably need to put GPUs in the slots as usual, since I need at least 2 (Linux + Windows). Neither will be top-tier though, more like <= 3060-tier.

1 Like

Nope. The BIOS is very explicit about that : each slot is listed as x16, and you need to manually set them to either x8x8 or x4x4x4x4. Something else is disabling those 8 PCIe lanes on slot 1. Since they should be routed straight to the CPU, it could be a problem with the OS (but I haven’t found any indication of that in the Windows device manager) or some weird incompatibility.

The other day, I was repurposing my 2009 workstation (a Core i7-975 / X58 platform) into a NAS and I ran into just such a weird situation : the processor has three RAM channels, but if I install an Intel X540 in any of its PCIe slots, the second RAM channel gets disabled. The X540 is PCIe 2.0 x8; there’s no reason it should do this, it’s an Intel product on an Intel machine, and yet I couldn’t fix it. I had to install an X550 instead (PCIe 3.0 x4) and the issue didn’t manifest. Not an optimal use of resources, but better than losing access to 33 % of the machine’s RAM !

I have a GTX460 lying around. Tomorrow I’ll try it on the Asrock’s slot 1 and we’ll see what’s what.

If you can afford one, you totally should. It opens up a new world of perception. I know a guy in construction who uses one to detect faulty heat insulation in homes. Just go outside on a cold day or at night, point it at your house, and see exactly where you’re losing heat. You can also detect poor electrical cables through walls. It’s good for any sort of machine maintenance. It really takes the guesswork out of a lot of situations. I wish I could see infrared with my own eyes. It might even let us detect Predators, should they be real and among us :sweat_smile:

It seems FLIR makes a cheaper version of the one I’m using. It costs around 250. It’s got a lower-resolution bolometer, but honestly, with IR you don’t need megapixels. Especially since those FLIR cams fuse the heat data on top of a normal image taken by a regular camera. It helps, because pure IR images are too mushy to make out exact components.

Actually it’s right in the middle of its own airstream. This is a case designed for a dual-PC setup (it’s a game streamer thing, I’ve been told). The area where I put this GPU is intended for a complete Mini-ITX PC with its own GPU. It seemed like a useful feature when I ordered it, and boy was I right.

The fans are a bit weird on the 3090 : the one close to the I/O plate pulls air from inside the PC and exhausts it at the back. The other fan pulls air through the second half of the card, which is just a heatsink. People have been concerned that in a normal installation it would blow hot GPU air at the processor and RAM. Honestly, in my X99 machine this hasn’t been a problem. It really seems like most of the heat goes out of the PC, and that second “internal” fan doesn’t do much.

2 Likes