AMD Epyc Milan Workstation Questions

Asrock Rack ROMED8-2T with onboard X550-T2 , both ports connected at 10G, 20 C ambient, slight airflow from 3 silent fans at front of case :

My Mellanox ConnectX-4 Lx add-in cards are similar (2x optical 10G) : 55 C and 57 C

1 Like

https://www.mindfactory.de/Hardware/Arbeitsspeicher+(RAM)/DDR5+ECC+Module.html

unpleasant prices, but first retailer where I was able to spot some RDIMMS. Around +75-100%/GB compared to DDR4 RDIMMs. Have fun populating all 12 channels on Genoa :slight_smile:

1 Like

Hey fellow ROMED8-2T (and 7443p) users - I am currently running with 4 RAM sticks of 16 GB each (Samsung, DDR4-3200, CL22-22-22, reg ECC, 2 Rank → M393A2K43DB3-CWE) running beautifully in 4 channels.

I am dealing with a project where I (sooner than intended) need to up that a “little”, to 256 GB total at least. I am considering to go for 4 sticks of 64 GB, same type of RAM, same manufacturer (so I can eventually add yet another 4 as needed). Samsung has two options in its portfolio, each labeled as “Samsung RDIMM 64GB, DDR4-3200, CL22-22-22, reg ECC, dual rank, x4”:

  1. M393A8G40AB2-CWE
  2. M393A8G40BB4-CWE

The second option goes for 1.3 times the price of the first. Technically, digging into data sheets, even the chips appear to be the same except for a “v1” for the first option and “v2” for the second option in their type IDs. Option one was apparently first listed in about 2019 while option two apparently appeared in 2021.

Any ideas and/or experiences in terms of compatibility or other related issues?

Fun side question: I intend for my 16 GB sticks to become spare parts, but would it theoretically be possible to run this thing in 8 channel mode fully populated with 4x 64 and 4x 16 GB sticks? I remember servers populated with a mix of 1 GB and 4 GB sticks “in the old days” being able to do stuff like this.

From my investigations, I find no difference in specs or pricing for those two part numbers.

AFAIK, you can’t run asymmetric channels for memory on Epyc.
The modules all have to be the same.

1 Like

Update: I Got the replacement for my DOA front fan for my Arctic Freezer TR4-SP3, after a long wait. Adding it lowered the temps under full load from 64c to 59c in back-to-back testing at same ambient conditions, so you were right it makes quite a difference with both fans.

Although I still see up to 65c with both fans at 1500rpm, i.e. they don’t reach their max - so the default BMC fan settings of the ROMED8U-2T might not be well optimized for this cooler.

Looks nice - how did you attach the fans to the cooler, are those screws or some kind of straps?

Also did you compare temps with the Noctua fans vs. original fans on the AF cooler?

My ROMED8U-2T has an Intel X710, and I seldom see it go above 40c given normal ambient conditions (20-25c). At idle its now at 38c, with the idle CPU at 33c.

It’s only attached to a 1GbE network though, so I’m not sure if that’s a relevant comparison. I don’t have any 10GbE peers to connect it to yet.

From specs the X710 is supposed to run a bit cooler than both the Broadcom 10Gbe chip you have, and the Intel X550-T2.

Thermals consideration was my original reason to go with the H12SSL-i instead of the H12SSL-NT. I feared the onboard 10GbE on the latter would run too hot, given that I use desktop chassis. From your post it looks like my assumption was correct. You still use a Supermicro chassis, right? That should have sufficient cooling, maybe ask Supermicro whether your temps are in the expected range?

Memory timings on the ROMED8U-2T

I finally got around to fiddling with the memory speed settings on my ASRock Rack ROMED8U-2T. I did not try any literal overclock, as the default 3200MT/s are matched to the IF of my 7443p, and I don’t foresee myself wanting to run the sticks at higher clocks in any realistic scenario (unless convinced otherwise). But I did play around with timings, mostly to find out whether it would work at all. I’m a total noob on memory tweaking, so I tried only some basic stuff.

Caveat - no way to verify running timings

First, my ROMED8U-2T has a problem that I didn’t see on my H12SSL - it seems impossible to read the SPD info from the memory modules. I could not get them with decode-dimms in Linux, neither with MemTest86 - the latter only reports the numbers as missing.

Therefore the only way for me to verify a change of clocks is to run MLC and look for a performance difference.

Anyway, I have essentially the same sticks as @s-m-e , M393A2K43DB3-CWE, 16Gb 3200MT/s Dual Rank. My memory from the other board where I could read the SPD, is that they run at CL22-22-22-?? where ?? might have been 52 (did not manage to google it).
Edit: Hwinfo64 in Windows could read the timings despite SPD information being unavailable: they are reported as 24-22-22-52.

Test procedure

Here are my previous screenshots of the memory timing options:

BIOS menu


I first lowered the CAS Latency (Tcl) in steps of two, down to CL16. Then I got a message at POST that two of the dimms did not pass the self-test. Which means that 6 of them did, suggesting CL16 is not impossible with the right set of dimms.

Then I went up to CL18, and started to change the other primary timings. I settled for 18-18-18-36, wich I’m running now. Every change I did gave rise to a stable difference in MLC results, so I assume that changing the settings worked.

Stability

Passed:

  • 4 rounds of MemTest86 (freeware version)
  • One full round of MemTest86+ (FOSS)
  • About 1hr of mprime (prime95), mixed FFT sizes
  • No ECC errors reported after 2 days

Performance

MLC results only.

Default (3200MT/s, 24-22-22-52)
Intel(R) Memory Latency Checker - v3.9
Measuring idle latencies (in ns)...
                Numa node
Numa node            0
       0          94.8

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      144877.1
3:1 Reads-Writes :      134949.7
2:1 Reads-Writes :      136703.5
1:1 Reads-Writes :      137479.7
Stream-triad like:      140258.9

Measuring Memory Bandwidths between nodes within system 
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0        144865.2

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  555.29   144735.1
 00002  552.16   144774.0
 00008  551.47   145408.6
 00015  545.08   145517.4
 00050  504.37   145663.3
 00100  387.87   146326.6
 00200  133.94   100345.0
 00300  124.09    69892.2
 00400  119.85    53663.7
 00500  117.53    43612.7
 00700  115.41    31760.8
 01000  113.95    22645.1
 01300  112.08    17668.3
 01700  106.92    13725.5
 02500  106.73     9578.5
 03500  105.94     7037.3
 05000  105.78     5121.1
 09000  105.23     3125.4
 20000  104.99     1745.6

Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT  latency        22.6
Local Socket L2->L2 HITM latency        23.7
Tweaked (3200MT/s, 18-18-18-34)
Intel(R) Memory Latency Checker - v3.9
Measuring idle latencies (in ns)...
                Numa node
Numa node            0
       0          89.9

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      147305.1
3:1 Reads-Writes :      138777.5
2:1 Reads-Writes :      140462.3
1:1 Reads-Writes :      141285.1
Stream-triad like:      143622.2

Measuring Memory Bandwidths between nodes within system 
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0        147259.7

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  523.11   147208.4
 00002  521.15   147108.9
 00008  532.44   146583.0
 00015  530.16   146652.3
 00050  492.74   147099.2
 00100  347.49   149592.6
 00200  128.43   100264.8
 00300  118.34    69817.3
 00400  114.41    53619.9
 00500  112.13    43582.8
 00700  110.17    31723.3
 01000  108.72    22626.7
 01300  107.31    17668.1
 01700  101.82    13739.7
 02500  102.47     9599.2
 03500  101.95     7059.3
 05000  101.69     5144.6
 09000  103.32     3134.0
 20000  101.01     1768.0

Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT  latency        22.7
Local Socket L2->L2 HITM latency        23.8

About a 5% decrease in latency.

Conclusions

It’s possible to lower the timings on ASRock ROMED8U-2T without the system complaining just for the sake of it. What I haven’t tried are any subtimings, as my understanding of memory timing tweaking is quite shallow. But I’m happy for suggestions!

I don’t have concrete workloads which I know of where these settings would make a difference. It’s fun to optimize for the OCD of it, and I would probably run lower-than-spec timings if I keep this board for my main work machine, “because I can”. However there are still some aspects of it that are less optimal (e.g. the lack of a VRM temp sensor) and might lead me to put it in my server instead.

Btw:

If I were to check this, how could I verify the resulting FCLK on Linux?

2 Likes

I’ve never bothered trying to verify overclocking in Linux, because doing so would have me fall down a fractal rabbit hole and end up spending days researching extra shit.

I just want to get my overclocking done and move on, so what I do is just install windows on an old sata drive and boot that while testing. Literally all my testing tools are for windows anyways.

Ryzen master should show it. HWinfo should also show fclk, probably. CPU-Z may also report it as northbridge frequency.

1 Like

Ok that was the step I was hoping I could skip :slight_smile:

I might actually do that. I have an old 120Gb ssd laying around, might as well install Windows and some test tools on it, and then keep it around for occasions like this.

sudo lshw -C memory

 *-bank:3
      description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 3600 MHz (0.3 ns)
      product: F4-3200C14-8GTZ
      vendor: G Skill Intl
      physical id: 3
      serial: 00000000
      slot: DIMM_B2
      size: 8GiB
      width: 64 bits
      clock: 3600MHz (0.3ns)

Just divide the reported clock by 2 and that is FCLK

Isn’t that just showing the frequency of the memory module? The FCLK (infinity fabric clock) is usually in sync, but it can be uncoupled and be different, which is what we want to check for.

No, it isn’t in this case. Look again. The memory module is XMP 3200 rated. I have overclocked the memory to 3600 via 1800 FCLK.

The memory is running in 1:1:1 synchronous mode as that is always going to work at anything less than 3600 speed.

Crap shoot trying to run 3800 or more for most memory controllers.

I still am not clear why you think the reported clock here is FCLK.

We are looking for something that correctly reports the infinity fabric clock when it is NOT synchronized with memory clock.

I am not stating that this explicitly reports FCLK.

I am stating that the FCLK is derived from the reported MCLK because I set the FCLK to 1800 Mhz to get my MCLK to run at 3600 Mhz in synchronous mode.

The only tool I know of that directly reports Fabric, Uncore and Memory clock is Ryzen SMU.

Ryzen SMU

Ryzen SMU output

2 Likes

Ah, now that looks exactly like what we’re looking for, thanks. Hopefully it has enough data to work with recent epyc chips.

So I gave https://github.com/leogx9r/ryzen_smu a try, but ultimately got PM Tables are not supported on this platform. for my epyc 7443p. /sys/kernel/ryzen_smu_drv/pm_table does not show up, and dmesg shows a load of errors.

@Wendell I noticed you pop up in this github issue https://github.com/hattedsquirrel/ryzen_monitor/issues/2, did you ever get get any results helping support other CPU’s?

2 Likes

Thanks for this @KeithMyers!

I tried it too now. The readme that came with the source claims it supports Milan, but it didn’t work for me either, on 7443p and kernel 5.15.

First time I modprobe, I get nothing in dmesg, though lsmod shows the moule loaded. I rmmod and modprobe again, then I get a slightly different message than @Log got (but perhaps with the same ultimate meaning):

[  527.148502] ryzen_smu: CPUID: family 0x19, model 0x1, stepping 0x1, package 0x4
[  527.148555] ryzen_smu: SMU v45.59.100
[  527.149176] ryzen_smu: Unknown PM table version: 0x002D0208
[  527.149178] ryzen_smu: Failed to probe the PM table -- disabling feature (249)
[  527.149185] sysfs: cannot create duplicate filename '/kernel/ryzen_smu_drv'

Followed by a bunch of stack traces. The included test.py script failed both times.

Edit: it looks like the stack trace errors might be related to loading it a second time, rather than the failure to read the PM table. Either way, it does not seem to work properly.

In the meanwhile I did install Windows 10 on a spare drive, though I could not get Ryzen Master to install properly. Neither cpu-z or Hwinfo64 reports Infinity Fabric explicitly as far as I can see - though perhaps it can be inferred from other stats.

Well shoot, I was expecting those to be compatible by now.

It’s going to be a while before I have the time to look into thinks again.

Yes, provided access to a couple different configs. Was gonna do a video on it but things sort of trailed off and idk if it was really much accomplished. Going to be the same sorts of issues with tr5k

I asked for Epyc Rome and Milan support over a year ago in this issue. Ryzen_smu support for Rome and Milan Epyc cpus
But I got the response that since the dev had no access to that hardware, it would likely never happen.

I thought maybe after a year the issue might have gained some traction. Guess not.

Ok, I believe I found the Infinity Fabric info in Hwinfo64 - it was hidden under the sensors view:

image

It is listed separately from the Memory Controller Clock and Memory Clock (not shown in the picture), which is hopeful (for the program’s ability to answer our question).

When setting memory clock to 1600/1467/1333, we get:

Memory Clock: as set
FCLK/IF: a dynamic number <= Memory Clock
Memory Controller Clock: always in sync with FCLK/IF

However, when I set Memory Clock to 1667, I get exactly the same behaviour as if set to 1600:

Memory Clock = 1600
FCLK/IF = MCC <= 1600

So it seems I can’t get the memory to run above 1600MHz in the first place. :frowning:

Perhaps there is a hard limit to this imposed by the CPU chip. I believe that in practice, despite being listed as 1600, the Memory Clock is dynamically in sync with IF and MCC, as various AMD documentation suggests.

1 Like