Poor MATLAB Performance on Threadripper 5995WX System (Win10)

Reporting some initial tests:

Enabling the MJS profile via the Admin Center gave an improvement of 25-35%, depending on the run. I then enabled CorePrio and its NUMA Dissociater, which gave another 20% improvement. So far so good, but this is still nowhere close to where the system should be performing. The benchmarks I’m running should ideally take around 2 hours, but they are taking 10-12 hours even after the improvements outlined above. Sorry about being vague, but the code I am running is proprietary, so I can’t share much detail about what it actually does or share any files from it. @alpha754293, my code uses some functions that are not compatible with the “Threads” environment, so I am currently on the “Processes” one.

One interesting thing I noted is that the performance degradation gets worse with longer tests. Performance is pretty good at the beginning of the optimization run, but it starts degrading severely around an hour into the test. HWiNFO64 reports that CPU temperatures are fine (maxing out at around 71-72 °C), but I read this thread ([SOLVED] 5995wx on ASUS WRX80E-SAGE with 1TB memory installed goes into an infinite Q-Code loop - #73 by wendell) and I’m getting suspicious that the memory could be getting too hot and throttling. Unfortunately, I don’t think the Precision 7865 is set up to report DIMM temperatures, as I am not seeing any of that information in HWiNFO64 or Hardware Monitor. I am also not sure how well the info from the other thread applies here, since I am running just 8x32 GB and can’t see the DIMM temperatures via software.

Basically, the performance breakdown across different runs of my optimization code looks roughly like this:

1000 generation optimization - 400-450 seconds
5000 generation optimization - 2,000-3,000 seconds
10000 generation optimization - 40,000-60,000 seconds (!!)

If there were no throttling anywhere, the time taken for completion should be proportional to the number of generations. Scaling the 1000 generation result by 10 puts the “ideal” number for the 10000 generation optimization at around 4500 seconds or less. Funny thing is, with the fixes via the Admin Center and CorePrio, the CPU usage does consistently hit 100% and stay there as expected, which is why I now suspect memory throttling.
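
As a rough check of the scaling argument above, here is a minimal sketch that extrapolates the 1000 generation timing linearly and compares it with the reported ranges. It is written in Python purely for the arithmetic and is not part of the MATLAB code; the function names `linear_expectation` and `slowdown` are made up for illustration, and the only inputs are the three timing ranges listed above.

```python
# Reported wall-clock ranges from the runs above: generations -> (low, high) seconds
runs = {
    1000: (400, 450),
    5000: (2000, 3000),
    10000: (40000, 60000),
}


def linear_expectation(generations, baseline_gens=1000):
    """Extrapolate the baseline run's time range linearly to `generations`."""
    base_lo, base_hi = runs[baseline_gens]
    scale = generations / baseline_gens
    return (base_lo * scale, base_hi * scale)


def slowdown(generations):
    """Observed / expected time, as a (best case, worst case) pair."""
    exp_lo, exp_hi = linear_expectation(generations)
    obs_lo, obs_hi = runs[generations]
    # Best case: fastest observed run vs. slowest expectation, and vice versa.
    return (obs_lo / exp_hi, obs_hi / exp_lo)


for gens in sorted(runs):
    exp_lo, exp_hi = linear_expectation(gens)
    best, worst = slowdown(gens)
    print(f"{gens:>6} generations: expected {exp_lo:.0f}-{exp_hi:.0f} s, "
          f"slowdown {best:.1f}x-{worst:.1f}x")
```

By this estimate the 5000 generation run is still roughly in line with linear scaling (about 0.9x-1.5x), while the 10000 generation run is about 9x-15x slower than the 4000-4500 seconds that linear scaling would predict.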

Any thoughts?