[SOLVED] 5995wx on ASUS WRX80E-SAGE with 1TB memory installed goes into an infinite Q-Code loop

It would be interesting for Wendell or someone else with a dual-socket DDR5 system to run the Comsol benchmark provided earlier.

Something like a 32-core EPYC Genoa might really shine on this workload because of the significant single-threaded portions, in addition to the faster memory and more channels.

2x8173M + 12x HYNIX HMA84GR7DJR4N-XN:

3 Likes

Oh pls do not apologize. I am totally ok with it :).

I’m very happy that everyone got their 1TB working :D.

Great job by my local Asus team to get this resolved in less than 5 days after we made a ruckus. haha.

If anyone from Asus is looking at this post, thank you so much and I apologize sincerely for any trouble that we have caused :smiley:

3 Likes

One observation:
I noticed that after flashing the 1106 BIOS, the lstopo layout changed for me.
Before the upgrade (as I remember), the PCI slots were all separate, to the left of Package L#0, but now the slots are assigned to different NUMA nodes.
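
For anyone who wants to cross-check what lstopo draws, here is a minimal sketch that reads the kernel's own PCI-to-NUMA mapping from sysfs. It assumes Linux; the function name `pci_numa_nodes` is made up for this example, and a value of -1 means the kernel reports no node affinity for that device.

```python
from pathlib import Path


def pci_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to the NUMA node the kernel reports for it."""
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes  # not Linux, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        try:
            nodes[dev.name] = int((dev / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # device has no readable numa_node entry
    return nodes


if __name__ == "__main__":
    for addr, node in pci_numa_nodes().items():
        print(f"{addr}  NUMA node {node}")  # -1 = no affinity reported
```

Running it before and after a BIOS flash and diffing the output would show whether the slot-to-node assignment really changed.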

2 Likes

That’s almost a 20% increase in performance over the previous BIOS, impressive. It also sort of negates my comment about the 5965WX being almost as fast as the 5995WX: now that an apples-to-apples comparison is being made, the 5995WX is ~25% faster in this workload.

Very much so. I might make a separate thread for this because I have several thousand similar simulations to run and am in the market for new hardware.
I wasn’t sure how many takers I would get to actually run the benchmark because of the heavy memory footprint.

3 Likes

The 1106 BIOS definitely solves the boot issue with 1024 GB of RAM, but I’m also experiencing the slowdown when running memory-intensive tasks. In my case the task is DL model inference with PyTorch, and the results are as follows (the speed is printed every 15 seconds):

  • With 8x64 GB RAM: runs at about 59-62 img/s for the whole test (about 10 minutes or so).
  • With 8x128 GB RAM: starts at about 61-64 img/s, but after a few minutes the speed abruptly drops to 5-6 img/s and stays low for the rest of the test. This happens every time, just at a slightly different point in the run. I have set fixed thread affinity to prevent any funky NUMA issues, and also limited the task to 32 threads, but nothing helps.

The real kicker is that this is just a test, and it used at most about 16 GB, so nowhere near the memory capacity.
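
To pin down when the drop happens from one run to the next, something like the following could be run over the logged speeds. This is only a sketch: the function name, the `warmup` and `drop_ratio` parameters, and the sample readings are all made up for illustration (the readings are shaped like the 8x128 GB run, not actual measurements).

```python
from statistics import median


def find_throughput_drop(samples, warmup=4, drop_ratio=0.5):
    """Return the index of the first sample below drop_ratio * baseline, or None.

    samples: throughput readings (img/s), one per logging interval.
    warmup: number of initial readings whose median defines the baseline.
    """
    if len(samples) <= warmup:
        return None
    baseline = median(samples[:warmup])
    for i in range(warmup, len(samples)):
        if samples[i] < drop_ratio * baseline:
            return i
    return None


# Illustrative readings only, NOT measured data: one value per 15 s interval.
readings = [62, 63, 61, 64] * 3 + [6, 5, 6, 5]
idx = find_throughput_drop(readings)
if idx is not None:
    print(f"drop at sample {idx}, about {idx * 15} s into the run")
```

Logging the DIMM temperature alongside each reading would make it easy to see whether the drop lines up with a particular temperature.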

I also bought and tested an ASRock WRX80 Creator R2.0 motherboard with a 5955WX, but it behaved the same as the old ASUS BIOS: endless boot loop with 1024 GB RAM.

In conclusion, I would stay away from 1 TB RAM with 5000 series Threadripper PRO. I haven’t tested 1024 GB RAM with a 3000 series Threadripper yet, but I’m going to do this soon and will report back if I have any issues.

2 Likes

Have you ruled out memory temps causing the slowdown? A similar issue on the same board happened further up in the thread.

1 Like

My current conclusion is the same: 5995wx + 1TB is still unstable.

On the other hand, my 3995wx + 1TB (8 x Samsung M393AAG40M32-CEC0) + 1003 system doesn’t have any issue so far. It’s running 24/7 and super fast. I’m not sure if 3995wx + 1TB will work properly on 1106, but at least I would say it works very well with 1003.

Temperature is definitely important for stability, but I think the fundamental reason for the slowdown of the 5995wx + 1TB is not related to the temperature of the modules. I’m using a gigantic 12-inch fan to make a typhoon around the modules. When I experienced a slowdown, HWInfo told me that the temps of the modules were around 65 degrees Celsius.

I read some useful info here in the past, so here is my experience for people who are trying to make this work.
I have an ASUS WRX80E-SAGE WIFI II with the 1106 BIOS, a 5995WX, and 8x Samsung 128 GB modules, so 1 TB in total. It was unstable and I found the reason: the memory must be cooled. As soon as the memory temp goes above 60C, the system starts freezing. After several attempts with fans, I had to use water cooling for the memory, and now it finally works properly.
Use HWMonitor to check RAM temperatures (though it shows only four temps instead of eight) and you can easily confirm that the Samsung memory starts freezing at 62-63C.
Another thing: when you change the RAM config (install or remove some RAM), you need to reboot the PC several times until it becomes stable. It seems like the BIOS is trying to find optimal timings or something like that.
Anyway, I can confirm that 5995WX + 1TB RAM can be stable under heavy load if you cool the hardware properly.

4 Likes

Thank you for the comment and for providing crucial information. If this is a temperature-related problem, I still don’t understand why the instability at high temp (> 60C) is CPU-dependent: my 3995wx + 1TB doesn’t have an issue (for my case, it is slowdown rather than freezing) even when temps of sticks are around 75C. I’m wondering if 5000wx has a tighter standard for memory module temperature.

Some quick updates:

Yes, the slowdown was due to the DIMM temperature: my 5000wx slows down when the DIMM temp (shown in HWinfo) hits 67C, and stays slow until the temp drops back to 62C.
I recently got a chance to use a Supermicro M12SWA-TF, and with this board and 5995wx+1TB, I experienced the same slowdown. This means that the WRX80E-SAGE SE is innocent (sorry, ASUS).

I think the 67C dimm temp boundary is only for 5000wx. I tested again with my 3995wx to look at the temperature dependency, but this guy runs just fine even at 85C :astonished:
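
The behaviour I saw looks like a simple hysteresis: slow from 67C until the temp is back at 62C. Here is a minimal sketch of that model, useful for marking the slow stretches in a logged temperature trace. The 67/62 thresholds are just my observations above, not a documented spec; the function name and the sample trace are made up for illustration.

```python
def throttle_states(temps_c, trip_c=67.0, release_c=62.0):
    """Label each DIMM temperature sample as throttled (True) or not (False).

    Hysteresis: throttling starts once the temp reaches trip_c and only
    ends when the temp has fallen back to release_c or below.
    """
    throttled = False
    states = []
    for t in temps_c:
        if not throttled and t >= trip_c:
            throttled = True
        elif throttled and t <= release_c:
            throttled = False
        states.append(throttled)
    return states


# Illustrative trace only, NOT logged data.
trace = [60, 64, 66, 67, 65, 63, 62, 64, 66]
print(throttle_states(trace))
```

Note that 65C and 63C still count as slow in this model once 67C has been hit, which would explain seeing a slowdown while the sensors read around 65C.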

1 Like

The M12SWA-TF system with no slowdown. The benchmark finished in 27m 6s. Max DIMM temp was 62C. Using the fan slightly away from the MB couldn’t prevent hitting 67C, and in that case the bench finished in about 1.5 hours.

I have to find a better way to cool the DIMMs.

1 Like

A 3D-printable blower adapter should work great. The blower can be low RPM.

1 Like

A funny photo :grinning:
My solution is two Thermaltake Pacific A2 and two Corsair iCUE H60i Pro XT. After some minor DIY I managed to connect them. I use water cooling for the CPU too (because I have a lot of hot 4090 GPUs and an air CPU cooler cannot work properly), so I have a bit more space around the memory.
Memory temp is around 55-57C under load.

1 Like

This would only be true for low rank RDIMMs, one perhaps two ranks; the quad and octo rank DIMMs would definitely stress the IMC more than LRDIMMs

1 Like

I have an issue with 8 x 32 GB sticks: I get a code 15 error, with memory retraining on every boot. I can boot into the BIOS but cannot install Windows. I have tried the usual approach of installing modules into each slot individually and building up from there, with the same result: a BSOD reporting a hardware issue when trying to install Windows. I have the newest BIOS installed on the version 2 Sage motherboard. The CPU is a 5975WX and the RAM is Corsair Dominator Platinum DDR4 3200 MHz C16, all new. This is only my second PC build and I’m still learning, so any help is appreciated. I have uploaded a video to YouTube showing the problem and the BIOS screen.