I apologise in advance if this is a stupid question, but I have been googling for quite a while and have found conflicting information.
I have a number of desktops, and I’m looking at using one to practice some VMware and Windows Server stuff to help me progress into the sysadmin space for work.
Basically I have two machines I could use. One is a socket 1366 system with a six-core Xeon X5650 and 12 GB of RAM, and the other is an FX-8350 with 8 GB of RAM.
I know ECC memory is normally unsupported on consumer desktop systems, but I’ve seen some talk of it working on AMD systems and older Intel systems like my socket 1366 board. I want to use ECC because there is a lot of cheap ECC DDR3 on eBay, which would be useful for the sort of work I’m planning to do.
The thing is (in my experience) that when a manufacturer lists “ECC-RAM compatible” in their motherboard specifications, this doesn’t necessarily mean that the system will run in “ECC-mode”. In other words, the system will boot with ECC memory installed but it might not correct errors. Unless the board supports ECC memory in working ECC-mode, this compatibility won’t help you.
However, it is possible that this board actually does support working ECC and is not just compatible. A look in your board’s manual might give you some further insight. The manual for one of my boards, for example, says that “ECC will only work with a Xeon CPU” (it’s an Intel C232 board).
Since your CPU also supports ECC, chances are good that you will be successful with your AMD system. If you have the option to order some ECC RAM to test it, I would simply try it out. Most shops will probably let you send it back if you don’t want to keep the memory.
Validation, on the other hand, is a big can of worms. Usually the newer Linux kernels will report it in the logs, but if not you will need some other tools. What worked best for me was PassMark’s fork of MemTest86 (the free edition). It will show a simple “Yes” or “No” right after booting the live CD. If you can overclock the memory on that board, you can also try that. If you go too high on the frequency, the memory will not be able to keep up and errors will occur, which ideally should be corrected and reported to the OS.
Thanks for the info. So you would suggest that I still need to test the memory even if it boots? Am I looking to create errors via overclocking, or is there some other way to ensure ECC is working?
Like I said, you can try to verify in many ways. Usually you are all good if the Linux kernel reports ECC enabled in the logs. Overclocking is a nice way of triggering real-life errors and the mentioned live-CD is also a possibility that worked well for me.
That’s what I have read as well. On the other hand, I have a workstation (not server) board with a chipset (C232) that is designed for ECC support, a CPU (Xeon) that is designed for ECC as well, and RAM that the motherboard manufacturer lists in its compatibility list as working ECC RAM, and I still get a different number there. What I get is 128 bits, even if I move the two modules into separate memory channels.
When I boot the mentioned live CD I get “ECC: Yes”, so I assume it is working fine, but I haven’t verified it with the newer Linux kernels yet. Of course the live CD could just do a simple “Is the data width > 64?” kind of check and call it ECC if true. Back when I built the system, the then-current kernel wouldn’t print anything related to ECC.
What I am trying to say is that a width of something other than 72 bits doesn’t necessarily mean that it’s not working, but if you see 72 bits, then I think it’s safe to assume that ECC is working. Of course the best way of verifying this is always to generate errors and have error correction reports logged by the kernel, because the mere fact that the extra bits are available doesn’t necessarily mean that the system will also correct errors.
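To make the width comparison concrete, here is a small Python sketch that pulls the “Total Width” and “Data Width” fields out of `dmidecode -t memory` output. The function names and the sample text are made up for illustration, and it assumes dmidecode prints the widths as “NN bits” with Total Width listed before Data Width. As said above, a total width larger than the data width is only a hint, not proof that errors actually get corrected.

```python
import re

# Made-up sample in the shape of `dmidecode -t memory` output (illustration only).
SAMPLE = """\
Memory Device
    Total Width: 72 bits
    Data Width: 64 bits
Memory Device
    Total Width: Unknown
    Data Width: Unknown
Memory Device
    Total Width: 64 bits
    Data Width: 64 bits
"""

WIDTH_RE = re.compile(r"(Total|Data) Width:\s*(\d+)\s*bits")


def parse_widths(dmidecode_text):
    """Return (total_bits, data_bits) for every module that reports numeric widths."""
    widths = []
    total = None
    for line in dmidecode_text.splitlines():
        match = WIDTH_RE.match(line.strip())
        if not match:
            continue  # empty slots report "Unknown" and are skipped
        kind, bits = match.group(1), int(match.group(2))
        if kind == "Total":
            total = bits
        elif total is not None:
            widths.append((total, bits))
            total = None
    return widths


def has_check_bits(total_bits, data_bits):
    """True if the module exposes more bits than it uses for data (a hint only)."""
    return total_bits > data_bits


for total_bits, data_bits in parse_widths(SAMPLE):
    print(total_bits, data_bits, has_check_bits(total_bits, data_bits))
```

On a real machine you would feed it the text from `sudo dmidecode -t memory` instead of `SAMPLE`.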
Note to myself: Boot from a live Linux and check logs for ECC messages.
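For that check, besides reading the kernel log, the Linux EDAC subsystem exposes error counters in sysfs when a matching EDAC driver is loaded. Below is a rough Python sketch that reads them. It assumes the usual layout under /sys/devices/system/edac/mc with one mcN directory per memory controller; the function name is my own. An empty result only means no EDAC driver registered a controller, not necessarily that ECC is off.

```python
from pathlib import Path

# Usual sysfs location of EDAC memory controllers (assumed layout).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)}; empty if nothing is found."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found (driver not loaded or unsupported).")
    for name, (corrected, uncorrected) in found.items():
        print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

A corrected count that goes up while you push the memory (for example by overclocking it) is the kind of evidence that ECC is really correcting and reporting errors.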