So here is the deal:
I am running some burn-in tests on a Ryzen 1700X system with 2x16 GB of 2400 MHz ECC memory. This machine will be crunching numbers 24/7 once I am done setting it up, so I am testing it by running LinX (aka IBT, or Intel Linpack with a patch to run on AMD) with ~28 GB of memory.
This is an excellent test that stresses the FPU, all levels of cache and the RAM.
The problem is that I am seeing error messages in the Windows Error Log that seem to indicate corrected errors in the L3 cache. The system seems stable, but there have been over 300 corrected errors (supposedly in the L3$) in 24 hours of LinX.
The system is not overclocked. I have updated the UEFI on the motherboard (ASRock X370 Taichi) to 3.20 and set all fans to full speed.
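To put that count in perspective, here is a minimal Python sketch that turns it into a rate. The only inputs are the figures quoted in this post (roughly 300 corrected errors over 24 hours of LinX); the function name is made up for illustration, and the true count is somewhat above 300, so treat the result as a lower bound.

```python
def errors_per_hour(error_count: int, hours: float) -> float:
    """Average number of corrected errors logged per hour of testing."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return error_count / hours


# Figures from this post: ~300 corrected errors over 24 hours of LinX.
rate = errors_per_hour(300, 24)
print(f"{rate:.1f} corrected errors per hour")            # 12.5
print(f"one corrected error every {60 / rate:.1f} min")   # 4.8
```

So on average the log gets a corrected error roughly every five minutes while LinX is running.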
My question is: what is the best course of action in a situation like this? Since there is no overclocking going on, the only cause I can think of is faulty hardware.
RMA-ing is possible, but because of the unusual way this system was procured (nothing illegal), I have to be absolutely sure that the component we send back is indeed faulty. The CPU currently in the system is itself an RMA replacement from AMD; the previous CPU was affected by the SEGV problem under Linux. I would hate to have to go for another CPU RMA.
BTW I am having the same issue as this guy: https://community.amd.com/thread/222508