Quick question: has anyone tested and shown a significant reduction in memory latency on Epyc Rome chips when using 2933MHz RAM (Coupled Mode) vs 3200MHz (Uncoupled Mode)? I have been trying to figure this out by checking all the reviewers from the era, but servethehome, phoronix, level1 @wendell, etc. didn't really seem to benchmark this.
For those unfamiliar with the topic, Epyc Rome ran its Infinity Fabric clock lower than the max supported RAM speed of 3200MHz, which added latency. If you ran 2933MHz RAM, the fabric and memory clocks ran in sync, which the AMD docs call "Coupled Mode", and that lowered latency. This was later fixed on Epyc Milan.
If someone has 3200 RAM on a Rome platform, can they tune it down to 2933 in the BIOS and see the difference, or does it require actual 2933 sticks?
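In other words, on a box that already has 3200 sticks, something like this is what I'm imagining (assuming the BIOS exposes a memory clock override; the hugepage count is just an example):

# run once with the RAM at its rated 3200, then set 2933 in the BIOS and run again
echo 4000 > /proc/sys/vm/nr_hugepages    # reserve hugepages for MLC
./mlc --idle_latency                     # quick check: idle latency only
./mlc                                    # or the full suite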
CPU Epyc 7F52 8x 32GB RAM
That's 2933MT/s RAM overclocked to 3200MT/s, but the IMC still runs at 2933MT/s. I can't find a "Coupled Mode" option, but I think the system is using it automatically.
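If you want to double-check what the DIMMs are actually configured to run at after a BIOS change, dmidecode shows rated vs configured speed (field names vary a bit between dmidecode versions):

dmidecode -t memory | grep -i speed    # "Speed" = rated, "Configured Memory Speed" = in use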
################ 3200MT/s ####################
root@pve:/home/user/Linux# echo 4000 > /proc/sys/vm/nr_hugepages
root@pve:/home/user/Linux# ./mlc
Intel(R) Memory Latency Checker - v3.11b
Measuring idle latencies for random access (in ns)...
                Numa node
Numa node            0
       0         119.1
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 146471.3
3:1 Reads-Writes : 138848.3
2:1 Reads-Writes : 140488.5
1:1 Reads-Writes : 141013.6
Stream-triad like: 142600.9
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0      146168.9
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 163.79 145131.8
00002 164.70 144807.0
00008 164.17 144670.7
00015 164.96 144364.9
00050 161.97 144561.8
00100 147.30 116455.2
00200 133.99 66158.1
00300 131.09 46819.9
00400 128.03 35871.7
00500 127.30 29244.8
00700 125.61 21285.5
01000 125.11 15165.6
01300 125.10 11802.2
01700 125.37 9174.3
02500 126.34 6396.9
03500 126.79 4713.3
05000 127.53 3452.0
09000 128.50 2139.5
20000 129.47 1233.7
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 125.3
Local Socket L2->L2 HITM latency 125.5
################ 2933MT/s ####################
root@pve:/home/user/Linux# ./mlc
Intel(R) Memory Latency Checker - v3.11b
Measuring idle latencies for random access (in ns)...
                Numa node
Numa node            0
       0         118.8
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads : 143089.7
3:1 Reads-Writes : 137099.9
2:1 Reads-Writes : 139180.9
1:1 Reads-Writes : 140306.7
Stream-triad like: 139655.8
Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0      143965.7
Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject Latency Bandwidth
Delay (ns) MB/sec
==========================
00000 163.53 142517.5
00002 163.87 142287.2
00008 163.83 142029.4
00015 163.47 141836.7
00050 161.77 141864.5
00100 137.57 115385.5
00200 124.49 65952.2
00300 130.69 46506.1
00400 128.59 35551.7
00500 127.14 28874.8
00700 126.37 20991.5
01000 125.29 14992.6
01300 125.67 11691.7
01700 125.39 9101.9
02500 126.23 6364.2
03500 126.63 4690.9
05000 127.59 3432.2
09000 128.23 2130.8
20000 129.40 1229.0
Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT latency 125.3
Local Socket L2->L2 HITM latency 125.4
root@pve:/home/user/Linux#
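If anyone wants to compare runs like these without eyeballing the whole dump, saving each run to a file and grepping the key lines is enough (file names are just examples):

./mlc > mlc_3200.txt                          # one run at each speed
./mlc > mlc_2933.txt
grep -A3 "idle latencies" mlc_*.txt           # idle latency tables
grep -E "ALL Reads|Stream-triad" mlc_*.txt    # peak bandwidth lines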
I don't have numbers, but with a 2700X I saw lower latency in a benchmark at 2933 vs 3200, and slightly higher bandwidth with 3200 in the same benchmark, but no real-world difference in boot speed, general OS use, or games (I used 3200).
3200 was the highest I could use and 2933 was a convenient drop-down option when testing (it sounded more interesting than 3000).