I built a dual-Epyc 7773X server! It's FAST, and now in V3.0 (update 2, with pics)

You can do something like this, with a plate heat exchanger inside, then have a rain barrel outside.
These can exchange about 200k BTU/hour.
Just make sure to add an air stone and bubbler to your rain barrel or it will eventually get too hot.
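To put a number on why the air stone / bubbler matters, here's a rough back-of-the-envelope sketch in Python (the 200 L barrel volume and ~800 W heat load are assumptions for illustration, not figures from this build):

```python
# Rough estimate of how quickly a rain barrel warms up when used as the heat sink
# for a water loop. Barrel size and heat load are assumptions for illustration.
BARREL_LITERS = 200            # assumed ~55 gallon rain barrel
HEAT_LOAD_W = 800              # assumed steady heat dump from the server
WATER_HEAT_CAPACITY = 4186     # J per kg per degC

water_kg = BARREL_LITERS * 1.0             # ~1 kg per liter
joules_per_degC = water_kg * WATER_HEAT_CAPACITY
seconds_per_degC = joules_per_degC / HEAT_LOAD_W

print(f"Temperature rise: ~{3600 / seconds_per_degC:.1f} degC per hour")
# -> about 3.4 degC per hour if nothing removes heat from the barrel,
#    which is why the air stone / bubbler (and evaporation) matter on long runs.
```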

1 Like

Hello everyone,

I really couldn’t help but chime in on the discussion about using Epyc CPUs for CFD work (we are using Milan here, though). As an aerodynamicist who has been running a similar setup for the past three years, I’ve found them to be quite stable in general. We faced the same challenge with those identical coolers as well. Really, you want them rotated 90 degrees so they exhaust out the back of the case.

I agree with the post about these motherboards being designed for racks with front-to-back flow and coolers that don’t blow in that direction. In my experience, adding more fans and cooling isn’t the ideal solution; sometimes, a simple duct can be more effective. We’ve had success with 3D printing them for this kind of case, but many materials and techniques can work.

From looking at your computer, it seems to me that the high-heat CPU exhaust is blowing over the VRMs. I would suggest either ducting intake air to cool them directly or venting the CPU exhaust out the back and away from the VRMs. Personally, I prefer the latter option. Adding fans to mix cold air with the exhaust air is a less-than-ideal solution that might work, though I prefer to keep things cooler than that. When we design cooling systems, “hot things hot and cold things cold” is a central philosophy, so just mixing good and bad air to make everything lukewarm isn’t really my favorite.

On a different note, I’m also curious (excited, even) about what problem you’re working on that has such a long solve time yet fits in such a small amount of memory. I don’t want to pry into confidential information, but I’d love to hear about it if you’re willing to share more.

1 Like

The plate heat exchanger is major overkill, but they work extremely well; I used them in industrial medical equipment. You don’t even need all that much water flow in the external circuit, and city water temperature is fine.

I’m doing marine CFD: incompressible RANS with VOF and wall functions. Approx. 10 GB of RAM per million cells! On not-too-large 5-DOF calculations, I need 1 day to simulate 1 second of physical time…
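For a sense of scale, those two ratios multiply out quickly. A quick sketch (the 50-million-cell mesh and 30 s of physical time below are just round example numbers, not my actual case):

```python
# Back-of-the-envelope sizing from the ratios above:
# ~10 GB of RAM per million cells, ~1 day of wall time per second of physical time.
GB_PER_MILLION_CELLS = 10
DAYS_PER_PHYSICAL_SECOND = 1

def estimate(cells_millions: float, physical_seconds: float) -> None:
    ram_gb = cells_millions * GB_PER_MILLION_CELLS
    days = physical_seconds * DAYS_PER_PHYSICAL_SECOND
    print(f"{cells_millions:g} M cells -> ~{ram_gb:g} GB RAM, "
          f"{physical_seconds:g} s of physics -> ~{days:g} days of wall time")

estimate(cells_millions=50, physical_seconds=30)   # example numbers only
# -> 50 M cells -> ~500 GB RAM, 30 s of physics -> ~30 days of wall time
```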

A relatively easy way to duct air to a particular spot is to use the lines from:
https://www.loc-line.com/
The 3/4 inch system may work well for you.

I usually pick up some sufficiently temperature-rated plastic. If you are in the US, most cities have a TAP Plastics; it’s easy enough to cut with scissors and then glue together with epoxy.

If the exhaust air coming off the CPU is 80 degrees and it’s blowing over the parts whose temperatures you’re working very hard to keep low, then a duct that exposes those parts to near-ambient air is a huge improvement, even at a modest flow rate.
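To put a rough number on that: to first order, forced-convection heat removal scales with the temperature difference between the part and the air hitting it, so feeding the VRMs near-ambient air instead of CPU exhaust is worth a lot. A sketch with assumed temperatures (the 100 degC part temperature is just an example):

```python
# First-order comparison: at a given airflow, heat removed scales with
# (T_part - T_air), i.e. Newton's law of cooling with a fixed coefficient.
T_PART = 100.0      # assumed VRM surface temperature, degC (example only)
T_AMBIENT = 25.0    # ducted room air
T_EXHAUST = 80.0    # air pre-heated by the CPU cooler

gain = (T_PART - T_AMBIENT) / (T_PART - T_EXHAUST)
print(f"Ambient air removes ~{gain:.1f}x the heat of 80 degC exhaust air")
# -> ~3.8x, before even touching fan speed or duct cross-section.
```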

2 Likes

Hi guys,

It’s been a few weeks, and a lot has happened (besides my birthday :wink: )!

First, as I wrote previously, I installed Arctic Freezer SP3 4U coolers on the CPUs, and Noctua Industrial 3000RPM fans in a purely front-to-back air flow.

It barely worked to properly cool the VRMs. More precisely, it didn’t cool the VRMs any better than the previous installation (mostly bottom-to-top flow, with the little 60 mm fans blowing directly on the VRM radiator), while being unbearably louder! And the “downwind” RAM VRMs ran much hotter as well.

A few more lessons learned: the quality of the Arctic coolers is nowhere near as good as the Noctua stuff. The top fin was already barely attached on one of the coolers when they arrived, and it came completely off after sliding the fan on just once. Same thing for the thermal paste: I had one cooler mounted with Arctic paste and the other with Noctua paste, and there was almost a 10°C difference between the CPU temperatures (and no, it wasn’t because one CPU cooler blows hot air into the next in this configuration: it was the one feeding on cool air that ran hotter!).

Anyway, last episode in this story… the motherboard crashed while doing a big calculation.
It wouldn’t even POST. IPMI was still working, so I was able to check that the different components of the computer were still talking to the MB. I was also able to re-flash the BIOS, after which the machine POSTed again and let me access the BIOS. The Linux boot loader also worked, but now the machine keeps crashing while loading Linux, both from the SSD and from a flash drive. The CPLD LED on the board keeps flashing, and the IPMI doesn’t find the CPLD firmware (but I’m not sure that wasn’t already the case before…). I’m in touch with Gigabyte support, waiting to see if they’ll send me the .rcu file to re-flash the CPLD firmware (if there is one for this board…).
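In case it helps anyone doing the same kind of remote diagnosis, something like this pulls the event log and chassis state from the BMC even when the host won’t POST (a minimal sketch; the BMC address and credentials are placeholders):

```python
import subprocess

# Placeholders: point these at your own BMC address and credentials.
BMC_ARGS = ["-I", "lanplus", "-H", "192.168.1.50", "-U", "admin", "-P", "password"]

def ipmi(*args: str) -> str:
    """Run an ipmitool command against the BMC and return its output."""
    result = subprocess.run(["ipmitool", *BMC_ARGS, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

# The System Event Log often records the last fault before a crash or failed POST.
print(ipmi("sel", "elist"))

# Basic BMC and power-state health; reachable even when the host won't POST.
print(ipmi("mc", "info"))
print(ipmi("chassis", "status"))
```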

For those who have the same board, does your Firmware page in the IPMI look the same?

All in all, I’ve now been two weeks without the machine on which I should be doing calculations 24/7, and it’s not good for the project…

Once again, your insight and your help will be very much appreciated.

Have a great day,
David

1 Like

Well, here’s the reply from Gigabyte: “we do not provide CPLD firmware updates, contact your supplier to arrange repair service”… :sob:

The board is out of stock everywhere in the world as far as I can tell, so my supplier can’t do an advance RMA, and I’m stuck without my calculation resource for weeks…

I would have no problem buying a new MB to speed up the process, but right now only the Asrock Rack Rome2D16-2T can be found. And while I love that it has 2 M.2 connectors, I have doubts about the 3 (instead of 4) CPU-to-CPU interconnect links, and also about the thermal characteristics: at least their spec doesn’t try to sugarcoat it…

Is that even per CPU, or total?

Any opinion on the relative merits of the two boards?

That would be 240W cTDP per CPU.

The problem is that these dual-Epyc ‘server’ boards are designed for server cases. Those cases have a row of fans very close to the first CPU socket, and those fans spin at very high speed (noisy).

What we need is a workstation design as shown in the photo.

On the right we have the dual Xeon ‘workstation’ board. It has two massive heatsinks for the CPU VRM.

OTOH, the CPU VRM on the dual-Epyc board is sandwiched between the CPU sockets. When you install massive coolers such as the Noctua NH-U14S or Arctic Freezer 4U, they essentially block airflow to the VRM. That is why your VRM got so hot when the 2nd CPU was under extreme load.

Replacing the case with a Fractal Torrent will not help. Adding the Noctua 60mm fan does not help much because it doesn’t move a lot of air. But if you mount a small fan on top of the VRM running at very high speed (~10,000 RPM), the VRM will drop to the low 80s under extreme loads.

Unfortunately, adding such a fan introduces noise under load, but it is the best solution short of water cooling.

When you get the replacement board, install only one CPU. That would be the socket toward the back of the case. Then run your application. You will see that the CPU VRM runs very cool. That is because there is no massive heatsink blocking the air path.

With all 128 cores running OpenSSL and two RTX A4500 GPUs running gpuburn, here are the temperature readings. Fan3 is the tiny VRM fan running at 10K RPM. That keeps the CPU VRM at 82°C.

The rest of the fans are spinning at very low speed.
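If you want to keep an eye on a sensor like that during long runs without watching the BMC page, a small polling loop on the host works. A sketch using psutil's hwmon readings; the "VRM" label match and the 90 °C threshold are assumptions, since label names vary by board and driver:

```python
import time
import psutil  # reads the Linux hwmon sensors (/sys/class/hwmon)

VRM_LIMIT_C = 90     # assumed warning threshold, adjust to taste
POLL_SECONDS = 10

while True:
    for chip, readings in psutil.sensors_temperatures().items():
        for entry in readings:
            label = entry.label or chip
            # Matching on "vrm" is an assumption; sensor names vary by board/driver.
            if "vrm" in label.lower():
                status = "WARN" if entry.current >= VRM_LIMIT_C else "ok"
                print(f"[{status}] {chip}/{label}: {entry.current:.0f} C")
    time.sleep(POLL_SECONDS)
```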

1 Like

I cannot see how an overheating VRM could lead to the BIOS and the CPLD firmware getting nuked… unless the VRM controller happened to glitch out and issue SPI writes, maybe? Seems very far-fetched. Something does not add up here.

I agree, but since I’m under time pressure, and the Gigabyte board isn’t available anywhere, I ordered an ASRock Rack Rome2D16-2T. I’ll benchmark it when it arrives, and I’ll let you guys know what I find out…

1 Like

Hi guys,

So I installed the ASRock MB in my machine.

I’ll write a complete report, with pics and all, a bit later, when I have time.

In the meantime, here’s a little teaser / short summary:

  • I’m not sure the Gigabyte MB was dead :flushed:
  • But the ASRock makes my calculations 30% faster (I’m not even sure how that’s possible) :open_mouth:
  • And it accepts a setting at 280W cTDP :sunglasses:
  • And I won’t have to worry about VRM temps again: there’s no VRM temp sensor on the board!!! :laughing: :thinking:

Back soon for the full report !
Have a great weekend, everyone!

3 Likes

In IPMI sensors, or from the nct6779-isa device? My ASRockRack EPYC board has the VRM temps in the IPMI sensors.

In the IPMI.
There’s: TEMP_CPU1, CPU2, MB1, MB2, card_side, LAN, and one for each DDR4.
That’s it!
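If anyone wants to double-check what their own BMC exposes from the OS side, dumping the temperature SDRs and looking for a VRM entry is enough. A sketch using in-band ipmitool (needs root and the ipmi_si / ipmi_devintf kernel modules; the "VRM" substring is just a guess at the label):

```python
import subprocess

# In-band query, run on the host itself (root + IPMI kernel modules required).
out = subprocess.run(["ipmitool", "sdr", "type", "Temperature"],
                     capture_output=True, text=True, check=True).stdout
print(out)

# "VRM" in the sensor name is only a guess; vendors label these differently.
vrm_lines = [line for line in out.splitlines() if "vrm" in line.lower()]
print("VRM sensors found:" if vrm_lines else "No VRM temperature sensor exposed by this BMC")
for line in vrm_lines:
    print("  " + line)
```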

The motherboard might have an older IPMI revision. Normally CPU VRM temperature is reported.

My Asrock Rack ROMED8 board doesn’t report VRM temps either in the IPMI or via the sensors. Up to date BIOS revision too.

Can you provide a screen output of ipmitool? My H12DSi board reports the CPU VRM. Thanks!

Interesting, my ROMED6U-2L2T reports it both in the IPMI and to HWinfo64 as of BIOS version 3.30.