
Case and CPU Coolers for EPYC 7413

I’m building a workstation for CFD modeling with two EPYC 7413s and a MZ72-HB0 motherboard. I have seen the video where Wendell used the Fractal Torrent case for the MZ72-HB0 motherboard. However, Gamers Nexus showed the case to be very noisy while being excellent at cooling. I was hoping to do a build that has both good thermals and is relatively quiet, if that is possible. Would the Fractal Meshify 2 XL be a good substitute? Reviews show it cooling well while being quieter. Are the stand-offs in that case for E-ATX motherboards the same as in the Fractal Torrent? Are there any other cases to consider? I was also interested in the Thermaltake X9 case, but it appears to be no longer in production; are there similar cases for E-ATX motherboards in the same configuration?

For CPU coolers, is an air cooler (e.g. the NH-U12S) the best choice? Are there any problems with the CPU air cooler being oriented 90 degrees from the front-to-back airflow in the case? Would AIO coolers be a better solution for this build?

A further question: the EPYC 7413 has a TDP of 180W with a cTDP range of 165W to 200W, while the EPYC 7443 has a TDP of 200W. If the cTDP is set to 200W on the EPYC 7413, will the CPU performance be closer to that of an EPYC 7443? I’m trying to understand what increasing the cTDP provides.

I built a system with a Fractal Meshify 2 XL this week actually, but only with a single EPYC 74F3 (240W TDP) with NH-U14S TR4-SP3 cooler on an ASRockRack ROMED8-2T board. I’m using the fans which came with the case, and additionally a Noctua NF-A14 in the roof, and 2 NF-A8’s to cool the motherboard VRMs and NVMe SSDs.

I’m personally very happy with the case and the Noctua air cooler. I had no problems with cooler orientation for performance. The U14S has about 1cm of clearance with the side panel from the top of the fan, the heat pipes are shorter - about 1.5cm of clearance. My main concern was how much weight the cooler exerts on the socket, as I had heard of a mounting bolt coming out of a SP3 socket before (although could be from overtightening rather than levering), so I supported it a bit with cable ties under tension.

AIO should be even better, and less strain on the socket.

The loudest noise seems to come from my PSU (EVGA supernova 1000 T2) when the case/CPU fans are set low. With the case/CPU fans set to 100%, they are distracting, but I wouldn’t say “loud” - but of course very subjective. I can’t tell if the Noctua fans are any quieter than the Fractal Design ones, they seem very similar.

With them set at 50%, they’re a little louder than my PSU fan, noticeable only in a very quiet room. At 20%, the PSU is lowest.

The top of the case has fan mounting rails rather than just holes, so you can position top fans anywhere you’d like.

I actually did some very basic load/temperature testing a couple of days ago:

Definitions:

  • Load: 0% = idle, 100% = 48 threads running the POVray benchmark with the performance governor
  • Governor: CPU frequency scaling governor set by cpupower on all cores.
  • Case fan / CPU fan: Percentage, as set in ipmitool.
  • Temperature: Highest Tccd temperature reported by Linux k10temp, at 20.5°C ambient. Add or subtract a degree or two, as readings fluctuate a fair amount.
  • 30 minutes allowed to stabilise temperatures between tests.

Results (hottest to coolest):

Load                   | Governor    | Case fan | CPU fan | Tccd °C
100% @ 3.5GHz          | performance | 20%      | 20%     | 63.0
100% @ 3.5GHz          | performance | 100%     | 100%    | 54.2
100% on 6 cores @ 4GHz | performance | 20%      | 20%     | 50.8
100% on 6 cores @ 4GHz | performance | 50%      | 50%     | 49.5
100% on 6 cores @ 4GHz | performance | 100%     | 100%    | 45.8
100% on 1 core @ 4GHz  | performance | 20%      | 20%     | 43.0
100% on 1 core @ 4GHz  | performance | 50%      | 50%     | 42.2
0% @ 3.2GHz            | performance | 20%      | 20%     | 32.8
0% @ 3.2GHz            | performance | 50%      | 50%     | 30.8
0% @ 3.2GHz            | performance | 100%     | 100%    | 29.2
0% @ 1.5GHz            | powersave   | 20%      | 20%     | 29.0
0% @ 1.5GHz            | powersave   | 50%      | 50%     | 26.2
0% @ 1.5GHz            | powersave   | 100%     | 100%    | 25.2
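For a rough sense of what the fan speeds buy under full load, the same figures can be expressed as temperature rise over ambient (a trivial Python sketch; values copied from the table, 20.5°C ambient):

```python
# Delta over ambient for the all-core POVray runs (figures from the table)
ambient = 20.5  # degrees C during the tests
tccd_full_load = {"20% fans": 63.0, "100% fans": 54.2}  # hottest Tccd

for fans, tccd in tccd_full_load.items():
    print(fans, "->", round(tccd - ambient, 1), "C over ambient")
# 20% fans -> 42.5 C over ambient
# 100% fans -> 33.7 C over ambient
```

So going from 20% to 100% fans shaves roughly a fifth off the rise over ambient at full load, at the cost of the noise discussed below.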

I wouldn’t expect any problems at all mounting an SSI EEB/E-ATX board; the stand-offs are all removable if you need to reposition them (a handy little tool comes in the accessory box), and there are rubber covers over the cable holes in the tray for both standard ATX and E-ATX. Plenty of room at the bottom for angled front-panel connectors too.

Adding a photo so you can see my mounting hole / Noctua situation in the Meshify 2 XL:

Not sure about the cTDP question. I was under the impression that if you raised the cTDP above the limit for the model, then it just used the model limit anyway.


My understanding is that this is correct for Milan. I believe it was sorted out somewhere in this long thread.


Clarification question: does this mean that 3.5GHz was where you ended up when loading all threads with POVray?

I guess the card at the bottom is an NVMe bifurcation card, but which one? I don’t recognize it, though I also don’t know the market.

Also, how come you put the little Noctua in the front a bit lower than the VRM heatsink? I’m also curious what VRM temps you get when fully loading the system.

3.5GHz, yes. AMD say the all-core figure is 3.2GHz, but I’ve seen it vary from 3.5GHz with POVray to 3.7GHz with openssl benchmarks; I guess it depends on actual power usage rather than just what the OS determines to be load. Either way, very happy with the F-series Milan.

Yup, a quad M.2 PCIe gen4 card - the Gigabyte AORUS Gen4 AIC - just a lane-splitter with a fan, no PLX. I actually have two of these and am having hassles with both at the moment: some ports on the cards consistently work perfectly, while others either sync at gen4 but get PCIe AER/parity errors, or sync at gen3 and work perfectly. Not sure what is happening yet; currently in a process of elimination.

Because in that photo I didn’t put the other little Noctua next to it yet =)
I can’t work out whether the VRM temperature is reported among the IPMI or nct6779 sensors; the hottest readings are IPMI “Onboard LAN Temp” at 43°C and “Card Side Temp” at 37°C. But the VRM heatsink doesn’t even feel warm to the touch when running POVray, phoronix-test-suite stress-run y-cruncher, or OpenSSL benchmarks.

edit: First Geekbench run I did with this system (catchy “Model” name =)) : To Be Filled By O.E.M. To Be Filled By O.E.M. - Geekbench Browser


Thanks. I did not know of that AORUS NVMe adapter card. I could see it was not the corresponding Asus model (Hyper M.2 gen4), as the Asus has its slots oriented diagonally while yours seem to be aligned with the slot. I think your card has the better thermal design, putting all the NVMe drives in the same orientation relative to the built-in fan. You use a Noctua to cool them; does that mean you switched off the built-in blower fan? If so, was this because of noise? (Or did the built-in fan not suffice?)

I myself am waiting for an Asus Hyper M2 card, that I ordered for my Epyc rig (currently running a 7252 cpu on a Supermicro mobo).

Thanks. As I don’t have a Milan chip yet, I’m trying to get a feel for what kind of frequencies to expect in practice for different scenarios. The speeds one gets are apparently very dependent on the workload. Where did you get the info about the all-core turbo being 3.2GHz? I have had a hard time extracting any such specs from AMD directly. Edit: Nevermind, I realize that 3.2GHz is the listed base frequency! Which is of course what to expect as a minimum given load on all cores.

Re the temp sensors, does it seem like the board lacks VRM temp sensors, or is it rather that the sensors have cryptic names in IPMI? The IPMI-based sensors on my Supermicro H12SSL-I are properly named; here is a post where I list them. I don’t know if it can give you any hint about the meanings of your sensors (probably not, as the manufacturers may wire the sensors differently despite using similar BMC chips).

It was a temporary setup for testing, giving easy access to the SSDs. Without the cover, the airflow from the integrated blower fan isn’t guided over the SSDs, so I added the temporary Noctua. I have two of the cards with SSDs now, with the covers on, and cannot hear the fans. They have a large piece of copper to spread the heat also, I’m quite happy with the cooling.

At least with the version of the BMC firmware I have (01.10.00 / 2020-09-29), there are no sensors labelled “VRM” in IPMI, but I still don’t know where the “Card Side Temp” and “MB Temp” sensors are. I haven’t tried upgrading the firmware yet since I’ve had no problems otherwise. Here’s the output of ipmitool sensor | grep degrees at idle:

MB Temp          | 26.000     | degrees C  | ok    | na        | na        | na        | 55.000    | na        | na        
Card Side Temp   | 42.000     | degrees C  | ok    | na        | na        | na        | 68.000    | na        | na        
CPU Temp         | 31.000     | degrees C  | ok    | na        | na        | na        | 93.000    | 94.000    | na        
TR1 Temp         | na         | degrees C  | na    | na        | na        | na        | 65.000    | na        | na        
Onboard LAN Temp | 49.000     | degrees C  | ok    | na        | na        | na        | 103.000   | 104.000   | na        
DDR4_A Temp      | 32.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_B Temp      | 32.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_C Temp      | 32.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_D Temp      | 31.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_E Temp      | 31.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_F Temp      | 33.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_G Temp      | 33.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_H Temp      | 32.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na   
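In case it helps anyone poking at the same board, the hottest sensor can be pulled out of `ipmitool sensor` output with a few lines of Python. This is just a sketch; `sample` is a shortened copy of the dump above, and in practice you would feed in the real command output:

```python
# Shortened sample of `ipmitool sensor` output (pipe-delimited fields:
# name | value | unit | status | thresholds...)
sample = """\
MB Temp          | 26.000     | degrees C  | ok    | na | na | na | 55.000  | na      | na
Card Side Temp   | 42.000     | degrees C  | ok    | na | na | na | 68.000  | na      | na
Onboard LAN Temp | 49.000     | degrees C  | ok    | na | na | na | 103.000 | 104.000 | na
"""

def hottest(sensor_output):
    """Return (sensor name, value) of the hottest 'degrees C' reading."""
    readings = {}
    for line in sensor_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 3 and fields[2] == "degrees C" and fields[1] != "na":
            readings[fields[0]] = float(fields[1])
    return max(readings.items(), key=lambda kv: kv[1])

print(hottest(sample))  # ('Onboard LAN Temp', 49.0)
```

Feeding it live data is a matter of `subprocess.run(["ipmitool", "sensor"], ...)` or a shell pipe; sensors reporting "na" are skipped.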

Thank you very much for your replies.

I’m leaning towards using NH-U12S coolers, but like xzpfzxds, I’m wary of the torque that an air cooler will place on the CPU/motherboard. Did you feel your cooler put enough stress on the motherboard that bracing it was necessary?

Is the motherboard’s VRM very hot during use? I have read that it can be a problem when using a server motherboard in a workstation configuration. Did you need additional airflow over the VRM?

What RAM are you using? Is there a reliability/performance difference between Samsung, Supermicro, NEMIX, etc. RAM? I have previously used NEMIX RAM and it worked well, and it’s cheaper than other brands, but it makes me wonder if it has the same reliability/performance as other brands.

I see, nothing in that list particularly looks like it could be the VRMs. “Card Side Temp” is the only one it could be, IMO, but then why call it that? Anyway, thanks for reporting.

That’s good to hear, as thin blower fans are always a bit of a gamble. Is that four SSDs in each card?

This issue was discussed in a few threads here over the last few months; I don’t have the links at hand unfortunately. Generally it seems to be less of a problem than feared, and it can be solved with extra spot cooling like @xzpfzxds set up. My own Rome CPU (7252) is too weak to put any serious stress on my VRMs, so for now I use only the regular consumer-case low-pressure fan wall in my beQuiet Silent Base 802. In the link in my previous post you can see some details and also my temp delta between idle and full CPU load, but mind that it is from a mere 125W CPU. I also have the CPU fan pulling air over the VRMs, which you won’t have with the Noctua.

I use Samsung-branded RAM on recommendation from my retailer. I believe the RAM branded Supermicro is made by Hynix nowadays (if I remember right). I don’t think there is a serious difference in reliability or performance; what can matter is dual- vs. single-rank. I went with dual-rank as it theoretically gives better performance in some scenarios, at least as long as you have one memory slot per channel (as is the case with the MZ72-HB0).

Does anyone have feedback on using the H12DSI-NT6 (or H12DSI-N6) motherboards, or has anyone done a review of them? My order for a MZ72-HB0 has been delayed by a month and I’m concerned about it slipping further. The H12DSI-NT6 appears to be in stock, but I have never seen a review of the board.

Thanks, I hope the Asus card I have on order is no worse in that respect. I’m curious: how many SSDs do you have in those cards, and how do you use them / what for? What filesystem(s) do you use with them?

My question is because I am pondering how I’m going to build my own array for storing /home and some VM backing, on my Epyc workstation in construction.

I’m using 6 × WD SN850 2TB, each partitioned as ESP, 1.7T ZFS, 0.3T md.
The ZFS partitions are in one pool as 3 striped two-way mirror vdevs, so about 5T of effective space. I split each mirror across the two adaptors, so I can lose any drive, or an entire adaptor.
The md partitions are in raid0 for 1.8T of scratch space.
For VMs I use either raw disk images (+ ZFS snapshots) or qcow2 files (if i need suspend/resume).
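As a sanity check, the capacity arithmetic for that layout (Python; the numbers come straight from the partitioning above):

```python
drives = 6
zfs_part_tb = 1.7  # ZFS partition per drive
md_part_tb = 0.3   # md partition per drive

# 3 two-way mirror vdevs striped together: usable space is one side of each pair
zfs_usable = (drives // 2) * zfs_part_tb
# md raid0 across all six small partitions: everything is usable
md_usable = drives * md_part_tb

print(round(zfs_usable, 1))  # 5.1  ("about 5T of effective space")
print(round(md_usable, 1))   # 1.8  (the raid0 scratch space)
```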


Nice, my plan is basically a smaller scale of that. I will accept the adapter card as a possible single point of failure, and put up to four drives in it. I also plan to use ZFS, since it fits my home ecosystem and would simplify backups. I plan to use files rather than zvols for VM backing, as you describe.

I was thinking of going with a simple 2×2TB mirror vdev to start, leaving room for expansion. Given disk prices, 2+1 raidz (with 1TB drives) is also a possibility, but I think I prefer simplicity. Have you done any benchmarks?
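For what it’s worth, the usable space works out the same for both options being weighed here; only the redundancy and rebuild characteristics differ. A toy comparison (Python; drive counts and sizes as above):

```python
def mirror_usable(drives, size_tb):
    # an n-way mirror stores one copy's worth of data
    return size_tb

def raidz1_usable(drives, size_tb):
    # raidz1 spends one drive's worth of space on parity
    return (drives - 1) * size_tb

print(mirror_usable(2, 2.0))   # 2.0 TB from the 2x2TB mirror
print(raidz1_usable(3, 1.0))   # 2.0 TB from 2+1 raidz with 1TB drives
```

Same headline capacity, but the mirror resilvers faster and can be grown a pair at a time, which fits the "prefer simplicity" instinct.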

Also, I’m curious what motivated your choice of the SN850. I don’t really know what parameters to track other than speeds and guaranteed TBW. I read something about internal shuffling of data within the device that could be more or less suited to ZFS, and something about behaviour in the event of a power failure playing a role (I do have a UPS).* Anyhow, I don’t really know what to look for in the specs. If you have any leads I’m happy to hear them!

*Clarification edit: the italicized lines refer to unknown ssds, not to the SN850. I realize now that might not have been clear! I know nothing about the SN850 in particular, hence my asking :slight_smile:

My impression is that there are two more or less hard upper limits: the power budget given by (c)TDP, and the specified max clock. This suggests that the 7413 will not go over its spec’d 3.6GHz regardless of cTDP. With a few cores loaded, then of course 7443(p) with its 4.0GHz would be faster.

However, if you load all cores, then it seems to depend a lot on the workload whether the TDP limit is hit before all cores reach max speed. This suggests that if you plan to load all cores and your workload would hit the 200W limit before 3.6GHz is reached, then it should not matter whether you have the 7443 or the 7413.

The question then would be, whether the workload you care most about would reach above 3.6GHz on a 7443. There are some owners of 7443(p) in this forum, that might have experience with the relevant workloads.
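That two-limit picture can be sketched as a toy model (Python): the achieved clock is whichever binds first, the spec’d max boost or the highest frequency whose total package power fits inside the (c)TDP. The power curve below is invented purely for illustration and is not measured EPYC data:

```python
def achieved_clock(cores, tdp_w, fmax_ghz, watts_per_core):
    # Walk down from the max boost in 0.1 GHz steps until the total
    # package power fits the budget; fmax itself is the other hard cap.
    f = fmax_ghz
    while f > 0 and cores * watts_per_core(f) > tdp_w:
        f = round(f - 0.1, 1)
    return f

# Invented quadratic power curve, for illustration only
toy_curve = lambda f_ghz: 1.5 * f_ghz ** 2

print(achieved_clock(24, 200, 3.6, toy_curve))  # all cores: power-limited below fmax
print(achieved_clock(6, 200, 3.6, toy_curve))   # few cores: fmax is the binding cap
```

With all 24 cores loaded the toy model is power-limited well below 3.6GHz, so raising the budget helps; with only 6 cores loaded it sits at the frequency cap, where extra cTDP buys nothing.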


Choice of the WD SN850 was for good read throughput/IOPS, being PCIe 4, and reasonable write durability and price for the capacity/performance. It’s a development/testing machine and my usual workload is not write-heavy, so I didn’t see the need for something like Intel DC P-series.

Had some experience with some SN850’s in some very write-heavy workloads too, and haven’t killed them yet :slight_smile: (~1.3-1.5PB written).

I have some numbers handy from testing with 4 SSDs on one card; the 2 cards with 6 SSDs were a slight improvement. I didn’t really test heavily because I have no idea what I’m doing with fio, and the numbers were all over the place with different fio parameters and ZFS attributes, but real-world VM/database performance easily met my needs anyway :slight_smile:

I used the md device as a base-line to see what to expect. I have no idea why ZFS reads are faster than md raid-0 reads. Also no idea why random writes are so low.

test                                  | throughput  | IOPS
[1] raw md (no fs) / sequential read  | 26.0 GiB/s  | 426k
[2] raw md (no fs) / sequential write | 8.739 GiB/s | 140k
[3] ZFS / sequential read 16k         | 27.8 GiB/s  | 1824k
[4] ZFS / sequential write 16k        | 2.177 GiB/s | 877k
[5] ZFS / random read 16k             | 2.099 GiB/s | 134k
[6] ZFS / random read 4k              | 0.495 GiB/s | 126k
[7] ZFS / random write 16k            | 0.166 GiB/s | 10.9k
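The throughput and IOPS columns can be cross-checked against each other, since IOPS ≈ throughput / block size. A quick Python sanity check against rows [1] and [3]:

```python
GIB = 1024 ** 3
KIB = 1024

def iops(throughput_gib_s, bs_kib):
    # operations per second = bytes per second / bytes per operation
    return throughput_gib_s * GIB / (bs_kib * KIB)

print(round(iops(26.0, 64) / 1000))  # 426  -> matches row [1]'s 426k
print(round(iops(27.8, 16) / 1000))  # 1822 -> close to row [3]'s 1824k
```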
fio commands

[1] raw md (no fs) / sequential read

fio \
--name=test \
--time_based \
--runtime=60 \
--size=1000G \
--filename=/dev/md127 \
--direct=1 \
--bs=64k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--rw=read \
--group_reporting=1

[2] raw md (no fs) / sequential write

fio \
--name=test \
--time_based \
--runtime=60 \
--size=1000G \
--filename=/dev/md127 \
--direct=1 \
--bs=64k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--rw=write \
--group_reporting=1

[3] ZFS / sequential read

fio \
--name=test \
--rw=read \
--time_based \
--runtime=60 \
--ramp_time=2s \
--size=50G \
--direct=0 \
--verify=0 \
--bs=16k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--end_fsync=1 \
--group_reporting=1

[4] ZFS / sequential write

fio \
--name=test \
--rw=write \
--time_based \
--runtime=60 \
--ramp_time=2s \
--size=10G \
--direct=0 \
--verify=0 \
--bs=16k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--end_fsync=1 \
--group_reporting=1

[5] ZFS / random reads 16k

fio \
--name=test \
--rw=randread \
--time_based \
--runtime=60 \
--ramp_time=2s \
--size=50G \
--direct=0 \
--verify=0 \
--bs=16k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--end_fsync=1 \
--group_reporting=1

[6] ZFS / random reads 4k

fio \
--name=test \
--rw=randread \
--time_based \
--runtime=60 \
--ramp_time=2s \
--size=50G \
--direct=0 \
--verify=0 \
--bs=4k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--end_fsync=1 \
--group_reporting=1

[7] ZFS / random writes 16k

fio \
--name=test \
--rw=randwrite \
--time_based \
--runtime=60 \
--ramp_time=2s \
--size=50G \
--direct=0 \
--verify=0 \
--bs=16k \
--iodepth=16 \
--numjobs=24 \
--ioengine=libaio \
--end_fsync=1 \
--group_reporting=1

Great to hear your experiences, and thanks for the benchmarks! Those WD disks sound like an option for me. I also looked at Samsung PM9A1, which have similar TBW ratings and seem to be marketed more as enterprisey (without a c, for once).

For the fio settings, I read claims that libaio should go with direct=0 (not 1). That said, I did some testing on a single-sata-ssd zfs volume (see below) with direct= {0 and 1}, and did not get obvious differences in results.

I would suspect that the ARC is active regardless of fio trying to bypass caches. If I run the same command as you on a ZFS volume backed by a single Samsung 860 Evo 1TB SATA SSD, but with 5GB files instead of 50GB (space limitations :stuck_out_tongue: ), I got up to 4.7GB/s for 16k sequential reads - many times what the SATA link can theoretically sustain :slight_smile:

In your case the ARC is likely less of an issue, as you use bigger files. But still, by default it is capped at 1/2 of total RAM if not tuned differently. However, I believe you can get around it: fio does not delete the files when done, and if I reboot and run the same command in the same dir I get more reasonable 450-550MB/s read speeds from my SATA-connected 860 Evo. This tallies well with it being the ARC: when the files are first laid out a substantial portion ends up in the ARC, but after a reboot the ARC is empty (at least unless you have ZFS 2.x).
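To put the cache-size point in numbers (Python; the one-half figure is the untuned `zfs_arc_max` default mentioned above, and the 256 GiB machine is just an example):

```python
def default_arc_max_gib(ram_gib):
    # Untuned OpenZFS caps the ARC at half of physical RAM
    return ram_gib / 2

# With, say, 256 GiB of RAM the ARC can grow to 128 GiB, so a 50 GiB
# fio working set fits in cache entirely - and the 5 GiB files used on
# the SATA box fit many times over.
print(default_arc_max_gib(256))  # 128.0
```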

Today another post appeared, also describing unnaturally high seq read speeds with zfs. I think the above speaks to @thetrick 's question in that thread.
