I’ve been having issues with my NAS for a while: two of the drives have been reporting checksum errors. It’s only been one error per drive, but the system also restarts unexpectedly every few weeks. I figured I’d replace the drives, so I replaced one and started the resilver.
It gets up to about 96%, then the system restarts and the whole process starts over. It does show 96 errors; I’m not sure whether that’s high or what it means exactly, but I’m assuming it’s not good to have errors.
It’s in a RAIDZ2 config, so I don’t want to replace both drives at the same time, or I could lose everything. I’m in the process of getting a new NAS to create a backup, but I don’t have it yet. Also, I did not burn in the new drive beforehand because I wasn’t aware that was recommended until after I started searching for solutions.
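To make the reasoning about not pulling both drives at once concrete, here is a minimal Python sketch of the redundancy arithmetic. It assumes a single RAIDZ2 vdev (two drives’ worth of parity); the function names `remaining_redundancy` and `describe` are made up for illustration and are not part of ZFS or TrueNAS.

```python
RAIDZ2_PARITY = 2  # RAIDZ2 stores two drives' worth of parity per vdev


def remaining_redundancy(parity: int, unavailable: int) -> int:
    """Return how many more drives the vdev can lose and still be readable.

    A negative result means more drives are missing than parity can cover.
    """
    return parity - unavailable


def describe(parity: int, unavailable: int) -> str:
    """Turn the redundancy margin into a short human-readable state."""
    margin = remaining_redundancy(parity, unavailable)
    if margin < 0:
        return "data unavailable"
    if margin == 0:
        return "readable, but no redundancy left"
    if unavailable == 0:
        return "healthy"
    return "degraded, still protected"


if __name__ == "__main__":
    # Walk through pulling 0, 1, 2 and 3 drives from a RAIDZ2 vdev.
    for pulled in range(4):
        print(pulled, "drive(s) out:", describe(RAIDZ2_PARITY, pulled))
```

With one drive out for replacement there is still one drive of margin. With both suspect drives out at once the margin is zero, so any further read error or drive failure during the resilver could not be repaired.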
Is there anything I can do to have the process finish?
System Specs
Motherboard : MSI PRO B660M-A Micro-ATX
RAM : 2x 16GB DDR4 3600 CORSAIR Vengeance
CPU : Intel Core i5-12600K
Storage : 8x Seagate 10TB IronWolf SATA III, 1x 500GB SSD (for running TrueNAS)
Expansion cards : 1x 10Gb PCIe network card (onboard had issues), 1x M.2 PCIe to 6x SATA adapter card (motherboard doesn’t have enough SATA ports)
Are the two drives with the checksum errors running off the motherboard SATA controller? I had an issue on my NAS where two disks on the motherboard controller would throw random checksum errors and degrade the array. I can’t remember what I did to fix it, but I can try to work it out when I get back. You could try a BIOS update.
However, given that it’s crashing, it’s probably a memory stability issue. Try turning off the XMP profile and running the RAM at stock speeds.
Thanks for the reply. Funny thing is, I’ve already tried both disabling XMP and updating the BIOS. After I updated the BIOS I could not boot into TrueNAS; it threw a “guard1 fail” error, so I had to revert it. I’ve run MemTest overnight before and no issues came up. I believe both drives are plugged directly into the motherboard.
I put the old drive back in and it started to resilver. So far no errors have shown up, but it’s only at 40%. I’ll report back once it’s done.
The resilver with the old drive got to more than 90% without any errors, then the system restarted on its own and got stuck in a boot loop with the following error:
The error could be due to a memory issue, so I’m going to leave the system offline until I get my new system in a few weeks. Hopefully that works.