Diagnosing memtest86 errors

I have an old system that I converted into a TrueNAS server. It originally had 32GB of RAM (4x8), which I recently doubled by getting the same set (although it turns out it was a different speed).

I ran memtest86 for about 14 hours and got 1 error. This seemed odd as I’ve heard that most bad RAM reveals errors in the first few minutes of memtesting.

I tested the original RAM for another 8 hours, with 0 errors.

I have now been testing the new RAM by itself for about 1.6 hours (0 errors so far), in the old RAM’s slots, to see if it was a slot issue.

What other issues could cause it?
Could it be the fact that the old RAM is 1333 and the new RAM is 1600?
Could it be a hair or dust on one of the pins?
Should I return it and get all new RAM?

To be clear, the “new” RAM means newly purchased. It’s actually like 11 years old, as I wanted to get a set that matched the original RAM.

Was this ECC memory? And did you take a screenshot of the errors?

We need to get a bit specific here. The memtest86 program has different tests that it does, each testing different types of memory failures. Once it loops through all the tests it does a second pass of all the tests, then a third, etc. The failure will be in a specific test, and knowing which one would help us narrow down potential causes.
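
To make the pass/test structure concrete, here is a toy sketch in Python. It works on an ordinary bytearray rather than physical RAM, and it is not how memtest86 is implemented. The function names, the `StuckBitBuffer` fault simulator, and the log fields are all made up for illustration.

```python
# Toy illustration only: exercises a Python bytearray, not physical RAM.

def own_address(buf):
    """Write each cell's own (truncated) offset, then read it back."""
    errors = []
    for i in range(len(buf)):
        buf[i] = i & 0xFF
    for i in range(len(buf)):
        if buf[i] != (i & 0xFF):
            errors.append((i, i & 0xFF, buf[i]))
    return errors


def moving_inversions(buf, pattern=0x55):
    """Fill with a pattern, then walk up and down verifying and inverting it."""
    errors = []
    inverse = pattern ^ 0xFF
    for i in range(len(buf)):
        buf[i] = pattern
    for i in range(len(buf)):            # upward: check pattern, write inverse
        if buf[i] != pattern:
            errors.append((i, pattern, buf[i]))
        buf[i] = inverse
    for i in reversed(range(len(buf))):  # downward: check inverse, restore pattern
        if buf[i] != inverse:
            errors.append((i, inverse, buf[i]))
        buf[i] = pattern
    return errors


class StuckBitBuffer(bytearray):
    """Hypothetical fault: bit 0 of the cell at offset 10 is always stored as 1."""
    stuck_at = 10

    def __setitem__(self, index, value):
        if index == self.stuck_at:
            value |= 0x01
        super().__setitem__(index, value)


def run_passes(buf, tests, passes=2):
    """Run every test in order, then start over, logging where each error occurred."""
    log = []
    for pass_no in range(1, passes + 1):
        for test in tests:
            for offset, expected, actual in test(buf):
                log.append({"pass": pass_no, "test": test.__name__,
                            "offset": offset, "expected": expected, "actual": actual})
    return log


if __name__ == "__main__":
    for entry in run_passes(StuckBitBuffer(256), [own_address, moving_inversions]):
        print(entry)
```

The point of the log is the same as the real error screen: a fault that shows up in the same test on every pass looks very different from a single error that never repeats.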

The test list is here: Memtest86+ | The Open-Source Memory Testing Tool. FYI, memtest86 and memtest86+ are different programs; I’m assuming you’re using memtest86+, though IIRC the test numbers are the same.

I’m guessing that the failure was in the bit fade test, which can take a long time to run. From the memory testing I have done, most bad memory fails in the early tests rather than passing everything except bit fade. (Though I’ve had weird issues, like memory corruption during suspend on a specific stick.)
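
For anyone wondering why bit fade is slow, the idea is roughly: fill memory with a pattern, leave it idle, then check whether any bits changed. A minimal Python sketch of that idea follows. It is a toy on a bytearray, not the real implementation, and `hold_seconds` is a made-up parameter; the real test idles far longer than you would want in a demo.

```python
import time


def bit_fade(buf, hold_seconds=0.0):
    """Fill with all zeros, wait, verify; then repeat with all ones."""
    errors = []
    for pattern in (0x00, 0xFF):
        for i in range(len(buf)):
            buf[i] = pattern
        time.sleep(hold_seconds)  # idle period: this is where weak cells would "fade"
        for i in range(len(buf)):
            if buf[i] != pattern:
                errors.append((i, pattern, buf[i]))
    return errors


if __name__ == "__main__":
    print(bit_fade(bytearray(1024), hold_seconds=0.1))
```

Most of the runtime is the waiting, which is why this test takes so long compared with the pattern tests.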

I’d suggest you leave it running again for 8-12 hours and take a screenshot of any errors that pop up. It will show you the memory address and the test, then you can rerun that specific test to validate the results. If you’ve only seen that one error that one time there is a chance that it’s cosmic rays.
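
If the screenshot shows the expected and actual values for the failing address, XORing them tells you exactly which bits flipped. A small Python helper for that, assuming you copy the two hex values off the screen by hand (the values in the example are invented, not from this thread):

```python
def flipped_bits(expected, actual):
    """Return the positions (0 = least significant) of bits that differ."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    # Hypothetical values as they might be read off an error screen.
    print(flipped_bits(0xFFFFFFFF, 0xFFFFFFEF))
```

A single flipped bit that never recurs fits the one-off explanation. The same bit position failing repeatedly, or at several addresses, would point more toward a real hardware problem.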

Two options: interference, or timings that only cause issues in some situations. Heat is one factor that can affect this, and voltage is another. Generally, you won’t really know what’s going on with a modern system unless you put a load on the IMC (integrated memory controller) while running a RAM test. Otherwise you can pass the test completely and then run into issues once the system is under load.

Thanks for the reply! It’s not ECC memory (which I know is preferred for servers, but that’s a whole other conversation), it’s ripjaw X series. And stupidly I didn’t take a screenshot of the error, but going forward I definitely will.

I’m not using memtest86+ as I had some issues running it. So I’m just using regular memtest86, and also I had to use v4.3.7 as my motherboard doesn’t support the UEFI version.

I’ve now been running all 8 sticks again for about 9 hours, no error so far. I’ll leave it for a few more and see if another error happens again. Currently it’s on test #4, moving inversions - 8 bit pattern.

Well I ran it again for about 18 hours, then I ran the bit fade test for a couple hours more. No errors at all. Frankly it doesn’t ease my mind, as I still have no idea what caused the initial error! I can only hope it was some dust in the socket that accumulated over the years that PC has sat dormant, that was dislodged the second time I reseated all the sticks. Or an errant cosmic ray flipping a bit.

Cosmic rays are a pretty likely bogeyman to blame. You’ve done nearly all the testing you can to eliminate any actual issue with the sticks. The only alternative is to test when there is more load on the memory, as @stratego suggested, but I’m not sure of good tools for that.

https://blogs.oracle.com/linux/post/attack-of-the-cosmic-rays

You can use TM5 with the Extreme profile; that would be a great help.

As for cosmic rays, let’s just say that more likely than not they aren’t at fault. Usually the issue is timings combined with temperature, and more load on the IMC can often make such errors show up faster. That said, one error across three runs of memtest86 should be stable enough for most people, though in the long run you could end up with corrupted system files if you’re unlucky enough.
