Before I dive into my requirements, I just want to give some background on why my thinking is the way it is.
I have a very basic NAS setup. It’s an Asus PN51 (I think) mini PC with a 4-bay Mediasonic USB-to-SATA drive enclosure.
Until a few days ago there were 2 drives in it: one 18TB and one 14TB. I was using the ext4 filesystem on each and pooling the drives using mergerfs.
I wasn’t concerned about backups because, for the important data, I was doing a 1:1 sync between the NAS and my personal PC, which has 2x4TB drives pooled using StableBit DrivePool. I was just starting to learn about restic and was setting up another pool of 5x4TB drives to back up items I don’t need directly on my local machine.
About a month or so ago I set up a cron job on my NAS to reboot it every weekend. After that I noticed that on reboot the machine would drop into maintenance mode with one of the drives failing to mount. It was always the same drive: the 14TB one. I kept ignoring this since I didn’t really understand what was going on or what the issue could be, especially since the machine would start as expected, without any issues, if I just exited maintenance mode.
Then last week I saw a message in the logs telling me to run the fsck command to fix issues. When I ran it I saw a lot of errors, and it asked whether I wanted to ignore them; I chose to ignore, and fsck tried to fix things. I got varied comments on what the underlying issue could be, so I disconnected the drive and ran a surface scan on my Windows machine using MiniTool Partition Wizard to check for errors. And lo and behold, there were bad sectors on this drive.
Thinking fsck may have fixed the files in the bad sectors, and given that I needed to replace that drive anyway, I started backing up the important data again to my 1:1 copy. After the copy finished on my local machine, I logged into Fedora, which I had set up as a dual boot since I want to learn more about Linux and eventually move to it. On login I think I saw a pop-up saying one of my drives was unstable and that I should replace it immediately.
I tried searching for the notification again but couldn’t find it, so I booted back into Windows and ran CrystalDiskInfo. There I could see a warning, and one of the properties (unrecoverable something) showed a number around 90 or 100. So I did a full surface scan of this drive and, to my shock, even this one had bad sectors.
Luckily, with the 14TB I was able to apply for a refund, so the money was sorted, but I was really worried about my data, especially since I had only recently set up Immich and started uploading a lot of my personal data into it.
After sorting out the move of data from the 14TB onto the 18TB, I tried a quick diff between the local and NAS content that had a 1:1 copy. Sadly, I started seeing that some files didn’t match. Then I looked into the Immich folder and saw that some of my images had been corrupted.
Now, luckily, for the personal images and videos I do have a backup on my phone and a copy on the NAS drive, but I now need to figure out a way to find all the files that are potentially corrupted on my NAS.
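For context, this is roughly the kind of check I have in mind for finding the damaged files: hash every file in the local copy and compare it with the file at the same relative path on the NAS. It is only a rough sketch, and the function names are my own placeholders, not from any tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(local_root, nas_root):
    """Compare each file under local_root with the same relative path under nas_root."""
    local_root, nas_root = Path(local_root), Path(nas_root)
    report = {"mismatch": [], "missing_on_nas": []}
    for local_file in sorted(p for p in local_root.rglob("*") if p.is_file()):
        relative = local_file.relative_to(local_root)
        nas_file = nas_root / relative
        if not nas_file.is_file():
            report["missing_on_nas"].append(relative.as_posix())
        elif file_sha256(local_file) != file_sha256(nas_file):
            # The two copies differ; this alone does not say which one is bad.
            report["mismatch"].append(relative.as_posix())
    return report
```

Calling `compare_trees` with the two top-level folders would return the relative paths that differ or are missing on the NAS. Since both my local drive and the NAS drive had bad sectors, a mismatch only tells me the copies disagree; I would still need a third copy (my phone, for the photos) to decide which one is good.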
This brings me to my query and the reason for my post:
Is there a way to set up a pooled structure of the 2 drives and also set something up that could detect “bit rot”? I was looking into btrfs (single and raid0), but it looks like with either, if I lose 1 drive I lose all the data. I think I get the parity checks, but I want to be able to add drives of any capacity to a pool and use all of it. I read a lot of posts about btrfs and saw a lot of folks suggesting zfs instead. But from what I can see, zfs needs 1 drive for parity? As of today I’m not in a position to buy a drive identical to what my pool would have.
I would prefer to be able to use the full capacity on my NAS, with the ability to detect any corruption in files and fix it automatically. I’m mentally exhausted, hence this post. I would really appreciate some guidance on what direction to take. A lot of the jargon used with btrfs and zfs goes over my head, and the more I read the more confused I seem to get.
I’m willing to weigh the pros and cons of any suggestion. I was able to procure a new drive with the refund money but haven’t set anything up on it yet. I want to first decide what I want, and then once it’s set up I’ll start moving/balancing data onto it.
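To make the “detect corruption” part concrete, here is a minimal sketch of the behaviour I am after, done by hand: record a checksum for every file once, then re-check later and flag anything that changed. The function names and the JSON manifest format are my own assumptions, not from btrfs, zfs, or any other tool, and this only detects problems; fixing them would still need a second good copy.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON; keep this file outside the scanned folder."""
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(manifest_path):
    return json.loads(Path(manifest_path).read_text())


def find_changed(root, manifest):
    """Return (relative_path, reason) for files that are gone or whose hash changed."""
    root = Path(root)
    changed = []
    for relative, recorded in sorted(manifest.items()):
        current = root / relative
        if not current.is_file():
            changed.append((relative, "missing"))
        elif sha256_of(current) != recorded:
            changed.append((relative, "hash differs"))
    return changed
```

One caveat with this approach is that a file I edited on purpose would also show up as “hash differs”, so it really only suits data that should never change, like photos and videos.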