Need guidance on setting up drives with the ability to handle "bit rot", without needing a parity drive if possible

Before I dive into my requirements, I just want to give some background on why my thinking is the way it is.

I have a very basic NAS setup. It’s an Asus PN51 (I think) mini-PC with a USB-to-SATA 4-bay Mediasonic drive enclosure.
Until a few days ago there were two drives in it: an 18TB and a 14TB. I was using the ext4 filesystem on each and pooling the drives with mergerfs.

I wasn’t too concerned about backup because, for the important data, I was doing a 1:1 sync between the NAS and my personal PC, which has 2x4TB drives pooled using StableBit DrivePool. I was just starting to learn about restic and was setting up another pool of 5x4TB drives to back up items I don’t need directly on my local machine.

About a month or so ago I set up a cron job on my NAS to reboot every weekend. After that I noticed that on reboot the machine would drop into maintenance mode with one of the drives failing to mount. It was always the same drive: the 14TB. I kept ignoring this since I didn’t really understand what was going on or what the issue could be, especially since if I just exited maintenance mode the machine would start as expected without any issues.

Then last week I saw in the logs a prompt to run fsck to fix issues. When I ran the command I saw a lot of errors; it asked if I wanted to ignore them, I said yes, and fsck tried to fix things. I got varied comments on what the potential issue could be, so I disconnected the drive and ran a surface scan on my Windows machine using MiniTool Partition Wizard to check for errors. And lo and behold, there were bad sectors on this drive. Thinking fsck may have fixed the files sitting on the bad sectors, and since I needed to replace that drive anyway, I started backing up the important data again to my 1:1 copy.

After the copy, on my local machine I logged into Fedora, which I had set up as a dual boot since I wanted to learn more about Linux and eventually move to it. On login I saw a pop-up saying one of my drives was unstable and I should replace it immediately. I tried to find the notification again but couldn’t, so I went back into Windows and ran CrystalDiskInfo. There I could see a warning, and on one of the attributes (unrecoverable something) the value was around 90 or 100. So I did a full surface scan of this drive too and, to my shock, even this one had bad sectors.

Luckily I was able to get a refund on the 14TB, so the money side was sorted, but I was really worried about my data, especially since I had only recently set up Immich and started uploading a lot of my personal data into it.

After moving the data from the 14TB onto the 18TB, I tried a quick diff between the local copy and the NAS content that was supposed to be a 1:1 match. Sadly, some files weren’t matching. Then I looked into the Immich folder and saw that some of my images had got corrupted.

Luckily, for the personal images and videos I do have a backup on my phone and a copy on the NAS drive, but I now basically need to figure out a way to find all the files that are potentially corrupted on my NAS.

This brings me to my query and the reason for my post.

Is there a way to set up a pooled structure across the 2 drives and also set something up that can detect “bit rot”? I was looking into btrfs (single and raid0), but it looks like with either, if I lose 1 drive I lose all the data. I think I get the idea of parity checks, but I want to be able to add drives of any capacity to a pool and use their full capacity. I’ve read a lot of posts around btrfs and saw a lot of folks suggesting ZFS, but from what I can see ZFS needs 1 drive for parity? As of today I can’t afford to buy another drive identical to the ones my pool would have.

I’d prefer to be able to use the full capacity on my NAS, with the ability to detect any corruption in files and fix it automatically. I’m mentally exhausted, hence this post. I would really appreciate some guidance on what direction to take. A lot of the jargon used with btrfs and ZFS seems to go over my head, and the more I read the more confused I get.
I’m willing to weigh the pros and cons of any suggestion. I was able to procure a new drive with the refund money but haven’t set anything up on it yet. I want to first decide on what I want, and then after I set it up I’ll start moving/balancing data onto it.

There is will and there is a way.

Without using ZFS (or btrfs), I’ve seen people use little programs to calculate a checksum of each file and then store it in the same directory or as part of the file metadata (e.g. in extended attributes if using ext4). Personally I find ZFS simplifies the workflow by a lot. I haven’t looked into btrfs.

This will enable you to detect bit rot but not correct it. But I assume people follow a 3-2-1 backup strategy (or at least keep two copies). When you detect bit rot (which is a very rare event these days), you can recover the corrupted file from the other copies.
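
As a rough sketch of that manual approach with plain coreutils (untested here, paths are placeholders):
# build a checksum manifest for everything under the data directory
cd /mnt/data && find . -type f -exec sha256sum {} + > ~/checksums.sha256
# later: re-verify; only files whose contents changed are reported
cd /mnt/data && sha256sum --check --quiet ~/checksums.sha256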

The first part sounds interesting, but if ZFS does this for me I’d prefer that. Is it possible to create a ZFS pool without a parity drive?

I would want to avoid having to buy another 18TB drive just for parity (money is a bit of a concern). Also, I’m assuming ZFS would stripe the data, which means that without parity, if I lose 1 drive I lose everything, right?
I was reading a few threads around btrfs and ZFS, and my only concern is that people kept talking about performance issues and the fact that you need to be quite technical to be able to debug any issues you face with ZFS. That’s what scares me a little.

You can set up each drive on its own, without parity and without pooling, in either ZFS or btrfs.

Then, if either drive goes kaput, it won’t bring down the other.

Just set them up as individual pools.
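
Something along these lines, for example (device names are placeholders, use your own /dev/disk/by-id paths):
# ZFS: one pool per drive, no striping, no parity
sudo zpool create poolA /dev/disk/by-id/ata-DRIVE_A
sudo zpool create poolB /dev/disk/by-id/ata-DRIVE_B
# or the btrfs equivalent, one filesystem per drive
sudo mkfs.btrfs -L diskA /dev/sdb1
sudo mkfs.btrfs -L diskB /dev/sdc1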

You probably don’t want ARC caching in ZFS for media drives, so you can probably reduce it if you use ZFS.
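
If you do go ZFS, capping the ARC is just a module parameter; a sketch (the 2 GiB value is an arbitrary example):
# persistent: cap ARC at 2 GiB (value is in bytes)
echo "options zfs zfs_arc_max=2147483648" | sudo tee /etc/modprobe.d/zfs.conf
# or change it at runtime
echo 2147483648 | sudo tee /sys/module/zfs/parameters/zfs_arc_max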

You also don’t need the advanced performance tuning of zfs, with things like compression and such, because media doesn’t compress well

Although I’m not technical, or an expert, I am a massive fan of ZFS and use it at home. I do like the pools, caching, redundancy and parity, but for yourself these are detractors, not benefits.

BTRFS might be easier, to be honest, and as I mentioned, put each drive in a pool of one so they don’t knock each other out.

Thank you… When I first started looking into this I was contemplating setting up the two drives with btrfs individually and then using mergerfs to pool them.

My expectation from this setup was that for every new file I put into it, it would do its checks and store that information. Then, if something goes wrong, it would check against the existing information (metadata, I think?) and let me know if there’s something wrong.
Would what I’m proposing do this for me automatically, or would I need to do something extra to find out about issues and get a list of the exact files that got impacted?

Also, let’s say you were to set up something like this yourself: what is the thing you’d miss the most?

I haven’t used mergerfs, but it’s apparently quite stable

Sounds like a good idea

BTRFS and ZFS are checksumming, copy-on-write systems, and save a checksum for each block that is written. Any time the block is read, the checksum is also checked, and IIRC a failure cancels the read, like “unable to open file” or some such.

I don’t think the reason for the read failure is communicated back to the GUI app, just a simple failure.

The corrupted files I have detected only showed up when I checked via the CLI during regular maintenance, and the system said (in ZFS) something like “unrecoverable error in block x” or file x.
But, as I use redundancy (with parity) normally the system corrects without saying anything… A failed read, is checked against another copy, and if the other copy is good, the system replaces the damaged one, serves up a good copy, and quietly goes about its business.
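
For reference, the CLI check I mean is roughly this (pool name is a placeholder):
sudo zpool scrub tank        # re-reads everything and verifies checksums
sudo zpool status -v tank    # error counters, plus a list of files with permanent errors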

I would miss the “easy” RAID setup and manipulation with a single drive, I guess, but I have used single drives before, for example as data drives in laptops, or in a Pi media server.

ZFS can still do error detection, and can still snapshot and replicate.

I understand the value of the checksum, to show if a file is bad, like a DVD rip, so you know to just re-rip the DVD, and not keep backing up/storing a damaged file

But I haven’t merged individual drives outside of ZFS, just used them individually, like one mounted at /home/troop/movies0-M and the other at /home/troop/moviesN-Z.

But, as I use redundancy (with parity) normally the system corrects without saying anything… A failed read, is checked against another copy, and if the other copy is good, the system replaces the damaged one, serves up a good copy, and quietly goes about its business.

Hopefully by next year I might be in a better position. If so, I will definitely look into setting up ZFS with parity.

For now I think I will go ahead and play around with btrfs on the new drive and see the capabilities.

Thanks for your guidance

ZFS mirror or setting dataset property to copies=2, no parity space needed.

Hmm. Others have commented on common ways to prevent bit rot.

Linking insightful wikipedia article on data degradation.

I want to get into the question posed in the subject.

To detect data degradation, or even repair it, some additional data needs to be kept. This data is often referred to as parity data (linking a Wikipedia article on the parity bit explaining the basics).

So, to detect data degradation you need to store some parity data to detect data corruption, some more to repair it.
There are multiple technical solutions to this. Modern file systems store this parity data transparently (e.g. ZFS, btrfs). If you don’t want to use these, there are other software tools that calculate parity data explicitly and store it separately from the file system (linking the Wikipedia article on Parchive; no endorsement, I have no experience with this).
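
From its documentation, par2 usage looks roughly like this (untested by me, file names are placeholders):
par2 create -r10 video.mkv      # writes .par2 recovery volumes with ~10% redundancy
par2 verify video.mkv.par2      # detects corruption in the protected file
par2 repair video.mkv.par2      # repairs it from the recovery volumes, if possible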

The point is that you need to store parity data to be able to detect or prevent data degradation. If you currently don’t have it (and it sounds like your current setup doesn’t) you need to invest in additional storage.

The topic of data preservation is generally well understood, with a myriad of technologies and methods for preventing data degradation. Also, data will degrade over time without management.
Anyone hosting data “at home” should be aware and have a tested strategy on how to mitigate data degradation - everything else will lead to data loss given enough time (enter 3-2-1 backup strategy).

Many folks on this forum like and use zfs and/or btrfs filesystems because these offer efficient ways of dealing with this problem (transparent parity calculations, send/receive backup operations).

ZFS mirror or setting dataset property to copies=2, no parity space needed.

When you say mirror, I’m assuming it’s something like RAID1, where I need to have 2 identical drives and one gets duplicated onto the other?
If so, current budget constraints mean I am unable to set something like this up. I’m currently testing out btrfs, though.

So, to detect data degradation you need to store some parity data to detect data corruption, some more to repair it.

I’m currently trying to set up btrfs on the new drive and then move all the data from the existing drive onto it, then convert that drive from ext4 to btrfs too. After that I was planning to use mergerfs to pool both drives, with btrfs in single mode for data and metadata set as dup on each drive.
Assuming that’s how btrfs works, I should be able to detect issues if I run a scrub once a week or so, right? As I said, I plan to keep a 1:1 copy of the important data on my local machine, and I’m also looking at restic to back up to another drive pool (all of this would still be local/on-site; no option for offsite until I’m able to convince my mom to keep a low-powered machine on when she’s not using it :smiley:).
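
My rough idea for that weekly check is something like this (mount point is a placeholder; I still need to test it):
sudo btrfs scrub start -B /mnt/pool_disk    # -B runs in the foreground so cron can catch the result
sudo btrfs scrub status /mnt/pool_disk      # shows checksum/verify error counts
sudo btrfs device stats /mnt/pool_disk      # cumulative per-device error counters
# root crontab entry, e.g. every Sunday at 03:00:
# 0 3 * * 0 /usr/bin/btrfs scrub start -B /mnt/pool_disk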

If you currently don’t have it (and it sounds like your current setup doesn’t) you need to invest in additional storage.

I definitely plan on buying another drive for parity around next year. Quick question though: with my current setup as explained above, what should I be looking at? I’ve read not to use the raid5 option of btrfs. The only possible solution I can think of is maybe using SnapRAID (I think that’s what it’s called) to configure parity?
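
From what I’ve read so far, the SnapRAID config would look roughly like this (paths made up, I haven’t tried it yet):
# /etc/snapraid.conf (sketch)
parity  /mnt/parity_drive/snapraid.parity
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
data d1 /mnt/disk1/
data d2 /mnt/disk2/
# then periodically:
# snapraid sync    # update parity after files are added/changed
# snapraid scrub   # verify the array against its checksums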

everything else will lead to data loss given enough time (enter 3-2-1 backup strategy).

In my case I did have a 1:1 copy, but it was just my bad luck that 2 drives in different systems both got bad sectors around the same time. This is exactly why I am looking for a solution that would give me an early indication of data corruption. I’m hoping btrfs is the right fit.

Damn!! So I was doing an rsync to copy all data from drive 1 into drive 2 and started facing some issues. Not sure if this is the right place to ask these queries or if I should create a new post; I’ll put it here, but if this needs a new post I’ll create that too… I’m now worried the first drive is also corrupted… maybe?

This is how I created the btrfs filesystem:
sudo mkfs.btrfs --data single --metadata dup --csum blake2 -L msdisk2_btrfs /dev/sdc1

This is the rsync command I used
sudo rsync -avxHAXWE --numeric-ids --info=progress2 /mnt/msdisk1/personal/ /mnt/tmp/personal/ --log-file ~/rsync_msdisk1_to_msdisk2_run3_personal.log

# rsync error
rsync: [receiver] write failed on "/mnt/tmp/personal/memories/library/common/2022/09-September/2022-09-10/VID_20220910_205619.mp4": Read-only file system (30)
rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.2.7]
rsync: [sender] write error: Broken pipe (32)
# error in journalctl
journalctl -xb | less
Jul 05 21:50:04 marge kernel: critical target error, dev sdb, sector 32891738392 op 0x1:(WRITE) flags 0x103000 phys_seg 2 prio class 2
Jul 05 21:50:04 marge kernel: Buffer I/O error on dev sdb1, logical block 4111467043, lost async page write
Jul 05 21:50:04 marge kernel: Buffer I/O error on dev sdb1, logical block 4111467044, lost async page write
Jul 05 21:50:04 marge kernel: EXT4-fs error (device sdb1): ext4_check_bdev_write_error:223: comm rsync: Error while async write back metadata

Jul 05 21:57:12 marge kernel: critical target error, dev sdc, sector 2943534848 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 2
Jul 05 21:57:12 marge kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 26, rd 0, flush 0, corrupt 0, gen 0
Jul 05 21:57:12 marge kernel: BTRFS: error (device sdc1) in btrfs_commit_transaction:2493: errno=-5 IO failure (Error while writing out transaction)
Jul 05 21:57:12 marge kernel: BTRFS info (device sdc1: state E): forced readonly
Jul 05 21:57:12 marge kernel: BTRFS warning (device sdc1: state E): Skipping commit of aborted transaction.
Jul 05 21:57:12 marge kernel: BTRFS error (device sdc1: state EA): Transaction aborted (error -5)
Jul 05 21:57:12 marge kernel: BTRFS: error (device sdc1: state EA) in cleanup_transaction:1990: errno=-5 IO failure

# btrfs dev stats
sudo btrfs dev stats /dev/sdc1
[/dev/sdc1].write_io_errs    26
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0

# btrfs check
sudo btrfs check --force --progress /dev/sdc1
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/sdc1
UUID: dcbd18f7-e454-40d3-873a-1c07513325e0
[1/7] checking root items                      (0:00:09 elapsed, 1006205 items checked)
[2/7] checking extents                         (0:01:50 elapsed, 902917 items checked)
[3/7] checking free space tree                 (0:00:08 elapsed, 1475 items checked)
[4/7] checking fs roots                        (0:03:42 elapsed, 10305 items checked)
[5/7] checking csums (without verifying data)  (0:02:49 elapsed, 1333966 items checked)
[6/7] checking root refs                       (0:00:00 elapsed, 3 items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
found 1579216732160 bytes used, no error found
total csum bytes: 12222058496
total tree bytes: 14793244672
total fs tree bytes: 169312256
total extent tree bytes: 69550080
btree space waste bytes: 2215697805
file data blocks allocated: 1564423487488
 referenced 1564423487488

sudo mount /dev/sdc1 /mnt/tmp
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.
# Checking condition of the X18 18TB drive
sudo fsck -n /dev/sdb1
fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
/dev/sdb1 contains a filesystem with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 255983640 extent tree (at level 2) could be narrower.  Optimise? no

Inode 255983641 extent tree (at level 2) could be narrower.  Optimise? no

Inode 255983645 extent tree (at level 2) could be narrower.  Optimise? no

Inode 255983776 extent tree (at level 2) could be narrower.  Optimise? no

Inode 255986684 extent tree (at level 2) could be narrower.  Optimise? no

Inode 256016500 extent tree (at level 2) could be narrower.  Optimise? no

Inode 256016501 extent tree (at level 2) could be narrower.  Optimise? no

Inode 256016502 extent tree (at level 2) could be narrower.  Optimise? no

Inode 256254850 extent tree (at level 2) could be narrower.  Optimise? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 560451/274661376 files (3.3% non-contiguous), 3954545993/4394581504 blocks


sudo fsck -cp /dev/sdb1
fsck from util-linux 2.38.1
badblocks: last block too large - 4394581503
/dev/sdb1: Updating bad block inode.
/dev/sdb1: Inode 255983640 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 255983641 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 255983645 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 255983776 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 255986684 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 256016500 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 256016501 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 256016502 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: Inode 256254850 extent tree (at level 2) could be narrower.  IGNORED.
/dev/sdb1: 560451/274661376 files (3.3% non-contiguous), 3954545993/4394581504 blocks


sudo smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.0-39-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR50FVMZ
LU WWN Device Id: 5 000c50 0f19c2374
Firmware Version: SN04
User Capacity:    18,000,207,937,536 bytes [18.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jul  6 00:54:30 2025 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1583) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   069   064   044    Pre-fail  Always       -       8176872
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       92831092
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5894
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       19
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       541174136961
190 Airflow_Temperature_Cel 0x0022   054   049   000    Old_age   Always       -       46 (Min/Max 38/51)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17636
194 Temperature_Celsius     0x0022   046   051   000    Old_age   Always       -       46 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   100   000    Old_age   Offline      -       2916 (97 188 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       46778575048
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       59046003475

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

One thing I seem to notice is that the RAM goes from 13GB free to around 100MB free. Could this be the cause, or is it mostly the enclosure causing the hard disk issues? :frowning: … Hope it’s not, because it’s something I bought on Wendell’s recommendation.

❯ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.4Gi       9.4Gi       3.8Mi       4.6Gi        13Gi
Swap:             0B          0B          0B
❯ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.4Gi       146Mi       3.8Mi        13Gi        13Gi
Swap:             0B          0B          0B
❯ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.4Gi       149Mi       3.8Mi        13Gi        13Gi
Swap:             0B          0B          0B

Hope someone can guide me. Freaking out a bit here :frowning:

Surely possible. I meant without parity drives when I suggested ZFS in my previous post. Since you’re interested in it, here is a bit more detail.

Create a single-drive vdev for one of your HDDs. Create another single-drive vdev for the other HDD. ZFS will merge them into a pool automatically. Data will be stored and distributed across the two vdevs, depending on remaining free space.

Every day or few days, you rsync from your primary storage to this ZFS pool. Every month you do a scrub (with the command ‘zpool scrub <pool name>’). Any bit rot will be reported. In the very rare case that you see corrupted files, rsync them manually from your primary storage again.
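
In command form, the routine is roughly (pool and path names are placeholders):
rsync -aHAX /mnt/primary/data/ /backup_pool/data/   # periodic 1:1 copy
sudo zpool scrub backup_pool                        # monthly integrity check
zpool status -v backup_pool                         # reports any files with unrecoverable errors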

I’ve been running a similar setup for 3+ years now. The ZFS pool resides in a USB enclosure and is powered down 99% of the time. Not only does this prolong its service lifespan, but such a setup also comes with 100% space efficiency (no wasted HDDs).

To hold up the scheme, you’d better follow the 3-2-1 backup strategy, or at least keep two separate copies of your data.

ZFS will not allow you to create a “loose” vdev, nor will it automatically merge things into a pool. (I’ve never used TrueNAS; perhaps its GUI has functionality that pretends this is happening?)

Command line ZFS will only create vdevs as part of pools – i.e. you create pools (not vdevs) and tell it which drives to include and how to build the vdevs that make up the pool from them.

So you can, for example, create separate pools from two drives, each containing just a single-drive vdev. The OP could do this and use one pool/drive for active data and one for backups, for example. (If one drive fails the other pool would still work; both pools would detect bit rot but neither would be able to auto-repair it.)

So you have the same type of error on both drives? Looks like a controller (or memory?) issue to me.

The “critical target error” seems to be btrfs-specific? I only get a single hit doing a web search. Not much to go on, unfortunately.

To put my previous statements another way (I was a little dramatic to make it interesting to read): you can create a ZFS pool with a single-drive vdev, then add a new single-drive vdev to the existing pool. You end up with a pool of two single-drive vdevs without a parity disk. Sure you can.

Here is the result of the bare-bones ZFS command lines:

I use two loopback devices to emulate two HDDs of different sizes. Without loss of generality, you can do it with two physical HDDs.
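
A sketch of that kind of demo (names and sizes are made up):
truncate -s 1G /tmp/hdd1.img && truncate -s 2G /tmp/hdd2.img
sudo losetup -f --show /tmp/hdd1.img    # e.g. /dev/loop0
sudo losetup -f --show /tmp/hdd2.img    # e.g. /dev/loop1
sudo zpool create demo /dev/loop0       # pool with one single-drive vdev
sudo zpool add demo /dev/loop1          # add a second single-drive vdev, no parity
zpool list -v demo                      # shows both vdevs and the combined capacity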

This is not what the OP was asking for help with. Also, two pools in one USB enclosure, one pool for primary storage and a second pool for backup storage, is a poorly conceived idea IMO.

Of course. That was never in question. No need to create strawmen here. I just wanted to make clear that what you wrote, which implies you cannot create multiple pools since vdevs are automatically merged into one pool, is wrong.

It’s exactly the kind of help OP was requesting:

The OP ideally wants to have bit rot detection and repair without any parity storage, which is impossible. He also doesn’t want a solution where, if one drive dies, he loses all data. So we need to supply alternatives so that the OP can decide where to go with this.

Two one-drive pools is one option which seems quite suitable to me.

Now I’m totally lost as to what you were originally trying to argue for. I showed you that you can create a single pool of multiple single-drive vdevs. So I’m done.

Don’t attempt to respond and clarify what you meant further for me. I’m not interested. Cheers.

That’s just a basic RAID0 config. No problem adding as many vdevs as you like. ZFS just wants block devices in whatever form they come.
You can even make a pool with multiple vdevs consisting of different partitions of the same drive. But in that case, setting copies=2 or 3 on your dataset is the more sane approach if you want the entire data-integrity package with just one drive (excluding drive failure).
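
e.g. (dataset name is a placeholder):
sudo zfs set copies=2 pool/photos   # store two copies of every block on the same pool
zfs get copies pool/photos
# note: only applies to data written after the property is set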

Well, for the benefit of anyone else who may not be well versed in ZFS, and in case any confusion was created by vic’s and my exchange:

A single pool made out of two “striped” single-drive vdevs, as demonstrated by vic, is different from two pools each made out of a single-drive vdev.

With the former you get all space from both drives available in the single pool, but a drive failure destroys all data.

With the latter, the pools are independent: each pool’s capacity corresponds to the underlying drive’s, and if you have a drive failure only the data on its pool is lost. This is similar to just having two drives with “normal” file systems on them, but with ZFS’s checksumming, snapshotting, send/recv, compression, encryption etc.