I’ve been searching around for recommendations but haven’t had any luck, so I figured I’d post in the place most likely to have people who would know.
Backstory -
My brother-in-law gave me eight ST4000DM000 Seagate 4TB white-label drives that were retired from a network-attached DVR. Three of the drives were already dead by the time I got them, so I set up the remaining five in a RAIDZ1 configuration. Unfortunately, one of those drives died within the first month or two. I shut off the NAS in the hope that it was safer for my data than leaving it running with no remaining failure protection.
The fork in the road -
I have the ability to empty an 8TB external SMR drive and move the NAS data to it. I would need to shuck it and prepare it first so the write speed isn’t complete trash. Doing this would let me put the replacement drive of the same model that I just scored for $20 into service and transition to RAIDZ2, since I really don’t trust these things that much. I just don’t know if it would be better/safer to resilver onto the replacement drive first.
SMR is always trash in ZFS. Resilver means shuffling around the entire pool → lots of random writes. And the slowest drive sets the bar in an array.
You get worse performance while the pool is degraded. I’d do the resilver first; resilvering is always the first thing to do.
BUT: I’d rather wait a day or a week than plug an SMR drive into a RAIDZ resilver. If the pool isn’t mission-critical, I’d power off the server. And I’d order a spare as well, because if you expect other drives to fail, you want the spare to kick in ASAP, and then you need a new drive anyway.
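For reference, the swap itself is only a couple of commands. A rough sketch, assuming the pool is called tank and using made-up device IDs:

```sh
# See which drive is FAULTED/UNAVAIL (match serial numbers to bays if you can)
zpool status tank

# Swap in the replacement and kick off the resilver
# (device paths are placeholders for your actual disks)
zpool replace tank /dev/disk/by-id/ata-ST4000DM000_OLDSERIAL \
                   /dev/disk/by-id/ata-ST4000DM000_NEWSERIAL

# Watch the resilver progress
zpool status -v tank
```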
The SMR drive will never see a pool. It would only be used to copy the data off the pool so I can destroy and recreate it as RAIDZ2 instead of RAIDZ1.
I am aware the pool will experience degraded performance due to the on-the-fly parity calculations to rebuild the data. But it will save me time in the long run, since I intend to remove the data from the pool to change its configuration anyway. That is, unless I am missing something and I can reduce capacity and change to Z2 without needing to destroy the pool and build a new one.
In that case… yeah, back up and recreate the pool with the new config. Sequential writes to an (external, non-ZFS) SMR drive should be fine.
If you are running old drives without a spare, this will always happen. Always have a spare ready. My backup pool/disks are also my (cold) spares.
If you want to do Z2 with the pool, use a spare. It costs a disk, but given the experience you described, it probably won’t be dead weight in the long run.
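Adding one is just another zpool add. A rough sketch with a made-up pool name and device path:

```sh
# Register a hot spare with the pool so it can jump in on the next failure
zpool add tank spare /dev/disk/by-id/ata-ST4000DM000_SPARESERIAL

# It shows up under its own "spares" section
zpool status tank
```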
edit: You can send a pool snapshot as a file to the other disk. Then restoring the pool is just a send/receive, which should be faster, and you keep the whole pool (datasets, snapshots, properties), not just the files. And it’s “SMR-friendly”: sequential, serialized work.
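Something along these lines (untested sketch; the pool name, mount point, and snapshot name are made up):

```sh
# Recursive snapshot of the whole pool
zfs snapshot -r tank@migrate

# Serialize everything (datasets, snapshots, properties) into one file
# on the non-ZFS SMR disk (one big sequential write)
zfs send -R tank@migrate > /mnt/smr8tb/tank-migrate.zfs

# After destroying and recreating the pool as RAIDZ2:
zfs receive -F tank < /mnt/smr8tb/tank-migrate.zfs
```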
Yes, ideally I would run RAIDZ2 with a cold spare or two, and potentially even a hot spare; I just don’t have enough drives. Originally, my plan was to go with Z2 on 6 drives and keep two cold spares when I was told I would be getting 8 drives, but when 3 turned out to be dead I changed my plans. Ultimately this NAS is my test/toy to learn from before I am finally able to upgrade my X99 desktop (there is still no HEDT worth upgrading to; it’s all lite-workstation gear far beyond reasonable budgets). At that point I plan to get a 16-core CPU, 256GB of DDR4 ECC, an HBA or two, and at least 8 drives, if not more. Oh, and a pair of SFP+ or QSFP ports, depending on what the fastest practical option is at the time.
This pool snapshot idea is interesting. I am not terribly worried about transfer speeds though, as the NAS does not store small files, just large videos and full-disk images for my PC backup. I also know that the transfer rate to the drive is fine when it’s fresh; it’s already filled to the brim with non-data files. But I know that if I just delete those files and try to re-write the entire drive, it will not be the same experience I had the first time. I also know it is not possible to do the prep on Seagate externals, so they need to be shucked. The drives are at least 6 months out of warranty, and having so many USB drives hooked up to my desktop makes boot time a nightmare, so shucking them would be a good thing anyway.
Trial by fire is always the best way to learn things… although it’s inherently rather uncomfortable. But having done a fire drill of replacing a drive and resilvering a pool, you’ll be well prepared next time.
I use mirrors. If I’m out of drives, I remove a mirror from the pool and use the remaining drive as a spare. Mirrors are great and fast.
Q = quad, meaning 4x. So SFP+ = 10G and QSFP+ = 40G. I’m getting SFP28 myself, the 25Gbit option, because that’s the newer stuff and more future-proof. QSFP28 is 4x25 = 100G.
Ahh yes, that’s right, SFP28 is what I was talking about. I couldn’t remember it, and when I googled SFP vs SFP+ to remember the third one, QSFP was what popped up. Ironically, it mentioned QSFP28 but not SFP28. I should have made the connection. haha
I understand RAID 1, but how do mirrors work with ZFS? Or is it really as simple as software RAID 1 with the Z file system?
Or is it something different entirely, like my friend who uses Unraid and confuses me every time he talks about the stuff he does with it, because it breaks the mold of what my brain expects? haha
It’s like RAIDZ, but without all the troubles of parity RAID (like no vdev removal, slow resilver or no RAIDZ expansion).
You make a mirror vdev out of every two drives you have. Two mirrors are a RAID10, so to say, and you can just keep adding mirror after mirror. You lose capacity compared to RAIDZ, but you gain performance and flexibility. Two new 12TB drives? No problem, add a mirror vdev. Old 4TB drive died and no spare? Remove that mirror and you have a new spare for the remaining 4TB mirrors.
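In zpool terms it’s roughly this (disk names are placeholders; top-level vdev removal needs a reasonably recent OpenZFS):

```sh
# Two-drive pool, one mirror vdev
zpool create tank mirror diskA diskB

# Two new drives? Stripe in another mirror vdev
zpool add tank mirror diskC diskD

# Need a spare and can live without one mirror? Evacuate and remove that vdev
zpool remove tank mirror-1
zpool status tank   # shows the evacuation progress
```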
I don’t know what I’ll have or need in 5 years, or whether all my drives will survive. I like having options.
Ok, I’m still confused. Do you have many different root shares of mirrored drives and simply move data around between them if one of the mirrored drives fails? If it’s striped at all, how does that work? I guess I still don’t entirely have my mind wrapped around vdevs, as I’ve never had more than 1 in 1 pool. I know there are more complicated configurations, but finances don’t let me have enough toys for the hands-on experience. At least not in a practical way.
With ZFS you have one pool, and each pool consists of vdevs. A vdev is a “device group” in a RAID config of your choosing, either mirror or RAIDZ. Each vdev gets a share of the data, so you get load balancing and a performance increase for every additional vdev.
With e.g. 12 drives you can do one vdev with a 12-wide RAIDZ, two vdevs with 6-wide RAIDZ or 6 vdevs in mirrors.
If you need more capacity, you add a new vdev. That’s how ZFS works. And all filesystems in the pool immediately see the added capacity (e.g. +12TB). It’s global and seamless.
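Spelled out for those same 12 drives (pool and disk names are placeholders):

```sh
# One 12-wide RAIDZ vdev
zpool create tank raidz1 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12

# Two 6-wide RAIDZ vdevs (data gets striped across both)
zpool create tank raidz1 d1 d2 d3 d4 d5 d6 raidz1 d7 d8 d9 d10 d11 d12

# Six mirror vdevs
zpool create tank mirror d1 d2 mirror d3 d4 mirror d5 d6 \
                  mirror d7 d8 mirror d9 d10 mirror d11 d12

# More capacity later? Add another vdev; every filesystem sees the space right away
zpool add tank mirror d13 d14
```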
Ok, I guess I’m just not making the link of how the multiple vdevs are handled within the pool. In hardware terms, would it be like JBOD? Like RAID 0? Clearly it’s not RAID 1, or 6 mirrored vdevs would only have the capacity of a single drive.
The way it’s sounding, the pool would be like the motherboard RAID and the vdevs would be like hardware NVMe RAID cards.
90% correct. It’s basically a RAID0 where each vdev acts as a disk, if that makes sense. So if you have 6 RAIDZ1 vdevs, it’s in essence a RAID 50 (or a RAID 50000 if you add a 0 for each additional group; this setup wasn’t around when the names were invented).
But RAID50 or RAID10 are more or less common terms.
Nice analogy. But with ZFS you don’t lose your data, unlike nesting a RAID card inside “BIOS” RAID… never do that lol.
Ok, so I do like the sound of this. But can you still expand the pool with mirror vdevs if you opt for RAIDZ1 or RAIDZ2, if that’s even possible and you’re a really paranoid madman? haha
Basically, I find it weird that you would be able to expand a virtual RAIDZ (the pool) but not a RAIDZ of physical devices (the vdev). And what would be stopping someone from just making every physical device its own vdev to effectively have an expandable RAIDZ in the first place? Or is that a thing and I just didn’t realize it?
Also, yeah, it’s purely an analogy. There are multiple reasons not to nest like that: cost, often lacking the ability to get stats on individual drives in hardware RAID configurations, etc.
If you do not wish to talk shop, I’ll stop asking questions I can google; simply let me know.
You can. But as soon as you have a RAIDZ vdev, you can’t remove vdevs anymore. “Device removal” aka vdev removal only works if everything is mirrors. And that includes special metadata vdevs as well. A price you pay for RAIDZ.
You can always use the zpool expand feature if you want to replace disks with higher-capacity ones. But then you’ve got old but working disks lying around that you can’t use, which is why I never found that feature very useful. Every running HDD is additional IOPS and MB/s for the pool.
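If you ever do go that route, it’s basically this (pool and device names are placeholders):

```sh
# Let vdevs grow automatically once all of their member disks are bigger
zpool set autoexpand=on tank

# Replace each old disk with a bigger one, one at a time,
# waiting for the resilver to finish in between
zpool replace tank old4tb-disk new8tb-disk

# If autoexpand was off during the swaps, expand manually per device
zpool online -e tank new8tb-disk
```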
What I really need to do is scour my area for literal e-waste, like a populated 12+ bay first-gen SAS machine with 300GB HDDs that can’t support reasonably sized drives, free for pickup. Something I can actually build stuff on, even if it will never serve an actual purpose.
A simulator would be epic. Like PC Building Simulator, but for ZFS. haha Although that wouldn’t offer the ability to test the outcomes.
My first ZFS setup was me grabbing every USB thumb drive I could find at home and connecting them to daisy-chained ancient USB hubs. Nowadays I’m lazier and just use VMs for testing stuff.
As long as it’s a block device, ZFS eats everything, from thumb drives and ramdisks to iSCSI shares.
You can totally do a mirror vdev out of a ramdisk and an iSCSI share, but you shouldn’t.
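You don’t even need real devices to practice on: file-backed vdevs are enough for a throwaway lab pool. A sketch with made-up paths and names; destroy it when you’re done:

```sh
# Five sparse 1GB "disks"
truncate -s 1G /tmp/zfs-lab-{1..5}.img
zpool create labpool raidz1 /tmp/zfs-lab-1.img /tmp/zfs-lab-2.img \
    /tmp/zfs-lab-3.img /tmp/zfs-lab-4.img /tmp/zfs-lab-5.img

# Fire drill: fail a "disk", replace it, watch the resilver
truncate -s 1G /tmp/zfs-lab-spare.img
zpool offline labpool /tmp/zfs-lab-3.img
zpool replace labpool /tmp/zfs-lab-3.img /tmp/zfs-lab-spare.img
zpool status labpool

# Clean up
zpool destroy labpool
```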
Good idea, though I’d have to buy hardware specifically for that, and that’s not something I’m interested in. Although having a dozen thumb drives would be handy. Not sure what I’d do with the crazy USB hub afterward, tho. Oh yeah, Ventoy would replace the handiness of those thumb drives anyway; so, back to a lack of interest in buying stuff to do it. haha
So I finally got around to wiping that drive, and I started the file copy around 8 hours ago. Given my gigabit network connection, I think it probably would have been wiser for data integrity to get the drive in the array replaced first; that would probably go faster than 110 MB/s. Or I should have at least put the 8TB drive into the NAS. Oh, and Windows decided to Windows and queued up nearly 10 files for simultaneous writes once it got to the smaller ones (2 to 7GB each, plus KB-sized files for each of the larger ones), causing the write speed to the SMR drive to tank to USB 2 speeds.