I recently built a Threadripper machine with ECC memory and have verified that both Windows and Linux report my RAM as ECC-enabled. However, I want to verify that ECC is actually working, since I got cheapish RAM off of eBay that I don't necessarily trust (some no-name brand with Hynix dies).
For example:
C:\Users\user>wmic memphysical get memoryerrorcorrection
MemoryErrorCorrection
6 (this is Multi-bit ECC)
Also:
C:\Users\user>wmic memorychip get datawidth, totalwidth
DataWidth TotalWidth
64 128
64 128
64 128
64 128
(Since TotalWidth > DataWidth, I should be fine, right?)
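To make the wmic numbers above less opaque, here is a small Python sketch that decodes them. The value table follows Microsoft's documentation for the MemoryErrorCorrection property of Win32_PhysicalMemoryArray; the helper names are made up for illustration. Keep in mind this only restates what the firmware reports: it says the platform claims ECC, not that corrections are actually happening. (For reference, a standard ECC DIMM is 72 bits wide: 64 data bits plus 8 check bits.)

```python
# Meanings of the MemoryErrorCorrection value, per Microsoft's
# Win32_PhysicalMemoryArray class documentation.
MEMORY_ERROR_CORRECTION = {
    0: "Reserved",
    1: "Other",
    2: "Unknown",
    3: "None",
    4: "Parity",
    5: "Single-bit ECC",
    6: "Multi-bit ECC",
    7: "CRC",
}


def describe_error_correction(code: int) -> str:
    """Translate the numeric wmic value into its documented meaning."""
    return MEMORY_ERROR_CORRECTION.get(code, f"Undefined ({code})")


def has_check_bits(data_width: int, total_width: int) -> bool:
    """True when a module reports more bits than its data bus.

    Extra bits suggest ECC/parity storage, but this is only what the
    firmware reports; it does not prove errors are being corrected.
    """
    return total_width > data_width


if __name__ == "__main__":
    print(describe_error_correction(6))  # prints "Multi-bit ECC"
    # Widths copied from the wmic output (one row per DIMM).
    rows = [(64, 128)] * 4
    print(all(has_check_bits(d, t) for d, t in rows))  # prints "True"
```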
Also, dmesg indicates that ECC is enabled, as does AIDA64. So it looks to be enabled.
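On the Linux side, the place where ECC activity actually shows up is the EDAC subsystem, which exposes per-memory-controller counters of corrected (ce_count) and uncorrected (ue_count) errors under /sys/devices/system/edac/mc/. A minimal Python sketch for reading them, assuming an EDAC driver such as amd64_edac is loaded; read_edac_counts is a hypothetical helper name:

```python
from __future__ import annotations

from pathlib import Path

# Default location of the Linux EDAC memory-controller directories.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} for each mc* directory.

    An empty dict means no EDAC memory controller is registered, in which
    case corrected errors are not being reported to the OS at all.
    """
    if not root.is_dir():
        return {}
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        corrected = int((mc / "ce_count").read_text().strip())
        uncorrected = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (corrected, uncorrected)
    return counts
```

Run it before and after a test: if the corrected count never moves, either no errors happened or they are not being reported.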
How exactly can I verify this, though? After digging around for hours, it looks like a pro version of memtest86+ is the only way to inject single-bit errors into the system, but that costs like $50. Are there any free tools out there for this?
Start overclocking the RAM. Then errors should show up just fine in memtest86+ (the free version, which comes with most Linux boot installers).
PassMark MemTest86 is the paid version. Memtest86+ is a free, source-available version that has been abandoned for several years but still works. The confusion is common.
However, I couldn't boot into Arch with unstable RAM. It either booted or it didn't; there was nothing in between. Do you have any pointers for overclocking RAM to an unstable but still bootable state? I ran 10 hours of testing with the stress tool as described, with 0 errors.
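One reason a stress run can report 0 errors even on marginal RAM is that ECC corrects single-bit flips before the test program ever sees them; the only trace is the corrected-error counter. A sketch of comparing two snapshots of those counters, assuming they were taken from the EDAC ce_count/ue_count files before and after the run; new_errors is a hypothetical helper:

```python
from __future__ import annotations


def new_errors(
    before: dict[str, tuple[int, int]],
    after: dict[str, tuple[int, int]],
) -> dict[str, tuple[int, int]]:
    """Return per-controller (corrected, uncorrected) increases.

    Controllers whose counters did not change are left out, so an empty
    result means the run produced no reported errors.
    """
    delta = {}
    for mc, (ce_after, ue_after) in after.items():
        ce_before, ue_before = before.get(mc, (0, 0))
        if (ce_after, ue_after) != (ce_before, ue_before):
            delta[mc] = (ce_after - ce_before, ue_after - ue_before)
    return delta
```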
I overclocked it using the XMP profile for 2800, since the next step up would not boot. I then kept tightening the timings until it would no longer boot, then backed off by one step so it would boot again. Still no errors reported.
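The tightening procedure above amounts to a linear search for the last bootable value. As a sketch, with boots standing in for the manual "set it in the BIOS and reboot" step (a hypothetical callback, not a real tool):

```python
from typing import Callable


def tightest_bootable(start: int, floor: int, boots: Callable[[int], bool]) -> int:
    """Lower a timing one step at a time until it stops booting.

    Assumes `start` itself boots. Returns the last value that booted,
    i.e. one step looser than the first failure; `floor` is the lowest
    value worth trying.
    """
    best = start
    value = start - 1
    while value >= floor and boots(value):
        best = value
        value -= 1
    return best
```

This finds the edge of booting rather than the edge of stability, so the value it lands on can still pass hours of testing.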
If you hit a hard wall without adjusting RAM voltage, try setting the RAM voltage to 1.4 V and trying again. Don't worry about high RAM voltage. In fact, creeping it up to 1.5 V will likely get it warm enough to start causing instability.
Also, set the SoC voltage to 1.1 V, which is where you should leave it; it is safe 24/7.
Don't go higher than 1.15 V for SoC (anything lower is still safe); going higher won't do anything for stability. Above 1.20 V can degrade the SoC via high LLC, and above 1.25 V WILL degrade it.
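The SoC voltage guidance above, written as a small lookup for quick reference. The thresholds are this poster's rules of thumb, not an AMD specification, and the function is purely illustrative:

```python
def soc_voltage_advice(volts: float) -> str:
    """Classify a SoC voltage using the thresholds given in this thread.

    These cut-offs are a forum rule of thumb, not an official spec.
    """
    if volts > 1.25:
        return "will degrade"
    if volts > 1.20:
        return "can degrade"
    if volts > 1.15:
        return "no stability benefit"
    return "safe"
```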
ProcODT is also important for stability; the best setting is specific to the motherboard/CPU/RAM combo and is not necessarily found automatically. You can adjust this to get errors too.
Someone correct me if I'm wrong, but AFAIK you could also use something like Rowhammer, which is supposed to introduce RAM errors; those should then be reported and/or corrected.
I'm not super familiar with overclocking memory. I tried the XMP profile for 2800 and increased the DRAM voltage to 1.4 V; no errors yet using stress. It still won't boot at 2866.
If you can get me screenshots of the main and memory-related BIOS options, I may be able to help you out; otherwise I'm flying blind. Too many screenshots is better than too few.
I'm a bit surprised that your ECC RAM has an XMP profile. With mine it was manual the whole way, which is fine since I changed so much.
I’m phoneposting, but I’ll check back late tonight.
Oh, I must have missed that you have a Taichi; I thought you had something else. Or maybe ASRock uses the same scheme for multiple boards.
If you look at the column of numbers that you can't change, that's actually what the board is running at. What happened is that it couldn't make your entered numbers work, so it ignored them. You're actually running at 2666 20-19-19-19.
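When comparing what the board actually applied against what was entered, it can help to convert cycle-based timings into nanoseconds, since the same CL means a different absolute latency at different speeds. DDR transfers twice per clock, so one clock lasts 2000 / (data rate in MT/s) ns. A short Python sketch (function name hypothetical):

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """First-word CAS latency in nanoseconds.

    DDR moves data twice per clock, so the clock period in ns is
    2000 / data_rate (MT/s); multiply by the number of CL cycles.
    """
    return cl_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # The 2666 CL20 configuration the board actually fell back to.
    print(f"{cas_latency_ns(20, 2666):.2f} ns")  # prints "15.00 ns"
```

The other primary timings are also counted in clock cycles and convert the same way.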
Late tonight I'll cobble together some of my other help posts into a guide. We'll get a good OC foundation first; then it'll be easier to create small instability.
Are you sure? If I switch my frequency or RAM timings in the BIOS (right side), I can detect the changes in the OS (admittedly only the frequency; I'm not sure how to check timings in Linux).
Unless Linux is lying to me. I'm doing a memtest86+ run at 3200 MHz (memtest detects the higher frequency): no errors so far, and it's almost done. What's going on here?
Edit: I thought 3200 was unstable enough, since I was able to boot into Arch, but it froze.
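On Linux, the configured memory speed can be read from the SMBIOS tables with `dmidecode -t memory` (run as root). It reports only what the firmware exposes, so it shows speed but not sub-timings. A minimal parser sketch; configured_speeds is a hypothetical name, and the regex accepts both "Configured Memory Speed" and the older "Configured Clock Speed" field label:

```python
from __future__ import annotations

import re

# Matches lines such as "Configured Memory Speed: 2666 MT/s". Empty slots
# report "Unknown", which has no digits and is therefore skipped.
_SPEED_RE = re.compile(
    r"^\s*Configured (?:Memory|Clock) Speed:\s*(\d+)\s*(?:MT/s|MHz)",
    re.MULTILINE,
)


def configured_speeds(dmidecode_output: str) -> list[int]:
    """Extract the configured speed of each populated DIMM."""
    return [int(m.group(1)) for m in _SPEED_RE.finditer(dmidecode_output)]
```

Feed it captured output, e.g. `subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True).stdout`.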
If you have Windows, run Ryzen Timing Checker, which is the only non-BIOS way to check that I trust. I vaguely recall seeing incorrect readings in other tools, but that was early in the Ryzen compatibility phase, a long time ago.
Another thing that can happen is that the BIOS may not be able to work things out on the first try, but subsequent tries (with incremental changes to the auto settings) can work.
Nice. The BIOS was likely still trying to auto-adjust settings, meaning they were changing on each boot. At least we've verified I'm not yet delusional.
Also, the BIOS hates certain timings being odd or even when they get low enough, and will change your "attempt this" setting, as you saw. You can actually try for 14, even if it hates 15.
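One common reason a Ryzen BIOS rewrites an odd CAS latency is Gear Down Mode, which rounds odd tCL up to the next even value. Whether that is what happened on this board is an assumption; the sketch below only models that single rule (function name hypothetical):

```python
def effective_cl(requested: int, gear_down: bool) -> int:
    """Predict the CAS latency the board will actually apply.

    Assumption: with Gear Down Mode enabled, an odd CL is rounded up to
    the next even number; otherwise the request is used as-is.
    """
    if gear_down and requested % 2 == 1:
        return requested + 1
    return requested
```

Under this model, asking for 15 lands on 16, while 14 is kept, which is why trying the even value one step tighter can work.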
Interesting, I'll have to try that. What's the next step here? No failures in memtest86+ at 3200 MHz. Now I'm running some tests with Phoronix at the suggestion of @wendell, with no errors yet at 2866 MHz.
I'm not entirely sure which tests to run, but I've been running mbw for about half an hour now.
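mbw is a memory bandwidth benchmark, so it mainly reports throughput rather than checking data integrity. For contrast, the core idea of a pattern test (write a known pattern, read it back, count mismatches) can be sketched in Python. This is illustrative only: Python cannot target specific physical addresses, and with working ECC a corrected single-bit flip never reaches the program, so the EDAC counters remain the real signal. Both function names are hypothetical:

```python
def pattern_pass(size_bytes: int, pattern: int) -> int:
    """Fill a buffer with one byte pattern, read it back, count mismatches."""
    buf = bytearray([pattern]) * size_bytes
    expected = bytes([pattern]) * size_bytes
    if buf == expected:
        return 0
    # Only reached if something changed the buffer contents.
    return sum(1 for got, want in zip(buf, expected) if got != want)


def run_patterns(size_bytes: int = 64 * 1024 * 1024) -> int:
    """Run solid and alternating-bit patterns; return total mismatches."""
    return sum(pattern_pass(size_bytes, p) for p in (0x00, 0xFF, 0x55, 0xAA))
```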