Try adding your new devices with the same sector size as the existing ones, specifying it with the ashift=<> option.
I think this is the limiting issue - the pool is 4 RAIDZ2 vdevs, while the special device is a 3-way mirror. Also, my test VM had all its disks created the same way and still got the same error with this pool configuration.
Hi, there seems to be confusion about this:
In order to add / remove / modify the special VDEVs
- Given top-level VDEVs and special VDEVs have same ashift
which configuration is correct?
a) Top-level VDEVs and special VDEVs need to both be Mirror or both be same RAID Z
b) Top-level VDEVs can be RAID Z or Mirror but special VDEVs can only be Mirror
c) Top-level VDEVs and special VDEVs can ONLY BOTH be mirror
The correct answer would be "b)".
Thanks!
As mentioned, it's b). However, in order to remove any vdev, including special, ALL vdevs in the pool must be mirrors. They must also have the same ashift.
Any vdev can be added to the pool at any time.
Mirror vdevs can have disks added or removed from them, though the final disk in the vdev cannot be removed (if there are non-mirror or mixed ashift vdevs in the pool).
In the future, raidz vdevs will allow expansion by adding a disk at any time.
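As a sketch of how the rules above look in practice (the pool name `tank`, the device names, and the vdev name `mirror-1` are all placeholders):

```shell
# Sketch only; "tank", "sdx"/"sdy" and "mirror-1" are placeholder names.
# Pool-wide ashift property (OpenZFS 2.x):
zpool get ashift tank
# Per-vdev ashift as recorded in the pool configuration:
zdb -C tank | grep ashift
# Adding a vdev is always possible:
zpool add tank mirror sdx sdy
# Removing one only succeeds when all data vdevs are mirrors with matching ashift:
zpool remove tank mirror-1
```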
A special vdev can theoretically be removed, but this is not to be done casually. You should have a full backup confirmed good before trying it, as there remains the slim possibility of killing the pool if something goes wrong like here: Pool corruption after export when using special vdev · Issue #10612 · openzfs/zfs · GitHub
Also note, my understanding is that removing a vdev does a weird thing where the data is basically left on a sort of virtual vdev, adding a layer of indirection for blocks that pointed to it. Usually not something that really matters in practice, but it is not optimal and will permanently follow the pool from then on.
Thanks for that GitHub link. The issue is rather concerning.
The last thread entry is Nov 19, 2021:
thatChadM commented Nov 19, 2021:
Tried recreating the failure exporting a pool with a special vdev losing the special vdev during the export but so far I haven't been able to. Appears this is a somewhat random event since my digging so far only found 2 instances of it occurring so far: @don-brady's event earlier in this ticket and when I ran into it.
Since then there has been no action or updates on the issue.
From reading the entire GitHub thread, it would seem that there are two cases where exporting and importing a pool with special vdevs (the problem likely lies specifically in the import) corrupted the pool, because the import did not recognize the special vdevs as special.
The original poster tested and reproduced the problem many times with many different combinations of controllers and disks, with and without NVMe.
Perhaps a good idea might be to test the export and import of a special vdev pool for oneself first, before putting data on it.
I will do that: I am preparing the LSI 3008 HBA and the three Toshiba HDDs as Z1, and the two Kingston NVMes (partitioned, with special and slog on the same devices) as a mirror. Before I migrate all my data to the new pool, I will export the empty pool and import it.
And see what happens.
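That rehearsal can also be done cheaply with a throwaway file-backed pool before committing real disks. A sketch, assuming root access, ZFS installed, and placeholder file paths plus the made-up pool name `scratch`:

```shell
# Rehearsal sketch: build a disposable pool with a special mirror,
# then export and re-import it. Requires root; all names are placeholders.
truncate -s 1G /tmp/d1 /tmp/d2 /tmp/d3 /tmp/s1 /tmp/s2
zpool create scratch raidz1 /tmp/d1 /tmp/d2 /tmp/d3 \
    special mirror /tmp/s1 /tmp/s2
zpool export scratch
zpool import -d /tmp scratch
zpool status scratch   # verify the special mirror came back as "special"
zpool destroy scratch  # clean up
```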
Agreed on the testing, and on a confirmed backup before any major change. ZFS is a weird mix: a very stable core, but with neat, still-beta features that you don't want to shake out without a fallback.
Jim Salter is also known as mercenary_sysadmin, and you have probably seen his various guides on ZFS. He's also the author of sanoid/syncoid, and manages many VMs on ZFS professionally. He's not infallible, but he generally knows reasonably well what he's doing, so any issue he has is definitely one to pay attention to. That said, no one else can really reproduce the issue; there's no telling what sort of dark and weird code paths he's managing to touch, or how. Bugs are constantly being fixed in ZFS - in fact, here's a neat summary of Ryao's recent work using a static analyzer to clean up a lot of loose ends that few would ever be able to hit.
I'm not sure if Jim has tested it recently, and he probably isn't inclined to, as he doesn't see a performance benefit (which I disagree with), so I'm not sure if he can still reproduce the issue himself.
The overwhelming majority of people I see mention using special vdevs have no issues, same with myself.
Technically, striped mirrors are also fine for special vdevs: stripe(mirror, mirror, mirror) and so on.
With that Optane sale I picked up 2 x 118 GB P1600X drives to use for my metadata + small blocks. I currently have this on a 3-way mirror of 256 GB SATA SSDs.
block 25.7T 9.02T 16.7T - - 0% 35% 1.00x ONLINE /mnt
  mirror-0 12.7T 4.51T 8.21T - - 0% 35.5% - ONLINE
    9c49ef16-716d-11ec-8ca0-e41d2d7f4000 - - - - - - - - ONLINE
    9c68698c-716d-11ec-8ca0-e41d2d7f4000 - - - - - - - - ONLINE
  mirror-1 12.7T 4.50T 8.22T - - 0% 35.4% - ONLINE
    9c1f9065-716d-11ec-8ca0-e41d2d7f4000 - - - - - - - - ONLINE
    9c09833f-716d-11ec-8ca0-e41d2d7f4000 - - - - - - - - ONLINE
special - - - - - - - - -
  mirror-2 238G 10.6G 227G - - 5% 4.43% - ONLINE
    9ae1c0c6-716d-11ec-8ca0-e41d2d7f4000 - - - - - - - - ONLINE
    9af5dd71-716d-11ec-8ca0-e41d2d7f4000 - - - - - - - - ONLINE
    ef72e0f5-fdd0-4b42-9ea2-b6459171972b - - - - - - - - ONLINE
Now, I think I know the answer, but let me ask: is there any way to add the 2 Optanes to the special mirror and then remove the 3 SSDs? Or, because the drives currently in the mirror are larger than the 118 GB Optanes, can I not add them? And if that's the case, do I need to export the data and rebuild the zpool to get the Optanes in the mix?
Thanks
zpool add <poolname> special mirror <optane1> <optane2>
zpool remove <poolname> mirror-2
So I add another special mirror that's the 2 Optanes, and I'm guessing that will show up as something like mirror-3 under the special section. It will then have 2 mirror vdevs holding the same metadata? And then I can remove the first 3-way SSD mirror?
Sadly Optane in Germany is not really cheap.
I went with WD SN700s for the special VDEVs, as they are cheap and have a very high write endurance. They are not as insanely write-proof as Optane, but SLC NAND is the closest you can get.
NAME SIZE ALLOC FREE
asphodel 15.4T 5.51T 9.86T
  mirror-0 14.5T 5.49T 9.06T
    ata-TOSHIBA_MG08ACA16TE_XXXXXXXAFWTG 14.6T - -
    ata-TOSHIBA_MG08ACA16TE_XXXXXXXDFVGG 14.6T - -
special - - -
  mirror-1 848G 24.2G 824G
    nvme-WD_Red_SN700_1000GB_22012N8XXXXX-part1 850G - -
    nvme-WD_Red_SN700_1000GB_22012N8XXXXX-part1 850G - -
special_small_blocks=64K, and the default recordsize is 128K. The speedup is very noticeable (for example, zdb -PLbbbs just flies instead of taking a minute).
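The routing rule behind that setting can be sketched as a quick calculation: with `special_small_blocks=64K`, any block no larger than 64K lands on the special vdev, while full 128K records stay on the data vdevs. Here `route_block` is a made-up helper for illustration, not a real ZFS command:

```shell
# Illustration only; route_block mimics the special_small_blocks routing
# rule in shell arithmetic, it is not a real ZFS command.
threshold=$((64 * 1024))   # special_small_blocks=64K

route_block() {
  # Blocks at or below the threshold are allocated on the special vdev.
  if [ "$1" -le "$threshold" ]; then
    echo "special"
  else
    echo "data"
  fi
}

route_block 16384    # 16K block of a small file -> prints "special"
route_block 131072   # full 128K record         -> prints "data"
```

So with the default 128K recordsize, regular file data stays on the HDDs, while metadata and sub-64K blocks hit the NVMe mirror.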
As per best practice, I have to wait a couple of months before adding the third SN700 to the special mirror, to stagger the device aging.
OK, so I've added the new mirror-3, but I don't see any data on it?
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
bigblock 25.8T 9.08T 16.7T - - 0% 35% 1.00x ONLINE /mnt
  mirror-0 12.7T 4.53T 8.18T - - 0% 35.6% - ONLINE
    9c49ef16-716d-11ec-8ca0-e41d2d7f4000 12.7T - - - - - - - ONLINE
    9c68698c-716d-11ec-8ca0-e41d2d7f4000 12.7T - - - - - - - ONLINE
  mirror-1 12.7T 4.52T 8.20T - - 0% 35.5% - ONLINE
    9c1f9065-716d-11ec-8ca0-e41d2d7f4000 12.7T - - - - - - - ONLINE
    9c09833f-716d-11ec-8ca0-e41d2d7f4000 12.7T - - - - - - - ONLINE
special - - - - - - - - -
  mirror-2 238G 21.7G 216G - - 5% 9.12% - ONLINE
    9ae1c0c6-716d-11ec-8ca0-e41d2d7f4000 238G - - - - - - - ONLINE
    9af5dd71-716d-11ec-8ca0-e41d2d7f4000 238G - - - - - - - ONLINE
    ef72e0f5-fdd0-4b42-9ea2-b6459171972b 238G - - - - - - - ONLINE
  mirror-3 110G 184K 110G - - 0% 0.00% - ONLINE
    nvme1n1 110G - - - - - - - ONLINE
    nvme2n1 110G - - - - - - - ONLINE
Will that data move when I run the remove mirror-2 command? I've taken backups, so I'm not too scared - but I'm a little scared.
ZFS never moves data by itself, by definition. In a so-called evacuation process (zpool remove), the contents of the evacuated device will be distributed across all remaining vdevs. Those 22G should be allocated towards mirror-3.
The worst thing that could happen is that ZFS distributes the data across your data vdevs.
You should be fine as long as you don't lose power while the data is in flight, and as long as your system is stable.
There is no in-flight. The "evacuee" stays active and running until everything is finished and checksummed. I tested this by pulling the plug myself (a "hey, let's test this ZFS" afternoon). It just treats the evacuation/removal as cancelled, and you have to re-enter the remove command (or resume the process; it's a bit like a scrub, with the same syntax and the same listing under zpool status). Only after everything is done is the transaction committed to the pool.
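The cancel/resume behaviour described above maps to real commands; a sketch with the placeholder pool name `tank`:

```shell
# Sketch; "tank" is a placeholder pool name.
zpool remove tank mirror-2   # start evacuating the old special mirror
zpool status tank            # removal progress is listed, much like a scrub
zpool remove -s tank         # -s cancels an in-progress removal
```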
Built in safe guards do not prevent unexpected behavior is all I am saying.
Data corruption happens when power disappears.
Does anyone know what search terms I have to use to find the card at 7:49 in the video? I haven't been able to find any quad-slot cards for $20, and I haven't found one made by Gigabyte.
I believe that low profile blue card is the GIGABYTE CMT4030. Good luck finding one for a decent price.
"low profile pcie 4 slot|port m.2 riser x16" is probably what you need to find similar things. Here is a double-sided Linkreal card:
https://www.aliexpress.us/item/3256803691464173.html
https://www.newegg.com/p/17Z-00TX-000B5
Do note it's Gen 3.0.