ZFS migration and backup

I have a question about the best way to migrate and back up my data. I’ve been looking around and actually found a lot of answers, but they differ from (even contradict) each other, and I can’t say 100% that they match my use case.

So, here’s what I’m doing:

  • PC1 (Ubuntu Mate 20) has pool A (1 raidz1 vdev). It’s smb-shared to my LAN.
  • PC2 (Ubuntu Mate 22), just built, about to create pool B (1 raidz2 vdev, much bigger)
  • I want to copy everything from pool A to pool B, so that having access to either PC1 or PC2 lets network users read the data (the same way they currently read pool A). At that point, PC2 will replace PC1 as my always-on home server.
  • After it’s done, I want to sporadically update pool A from pool B. Remember, PC2 will be the new main server, while PC1 will be turned on and off at my discretion. All life will be happening on PC2/pool B; PC1/pool A will just be a backup, but I’d want it to stay just as readable (so when PC1 is on it can save the network trip, for instance, but also in case PC2 is down in some emergency).

So my questions:

  1. What’s the best way to copy all data from pool A to pool B so as to replace PC1 with PC2 as server afterwards?
  2. What’s the best way to do the sporadic updates on the copy that’s left on pool A?
  3. Bonus: considering PC1 will now be mostly switched off, is there a way to automatically start the computer, run the update, and shut it down again after it’s finished, so I can move from sporadic to periodic updates?

For (1) I’ve mostly found the answer is zfs snap then zfs send/receive, but it wasn’t clear to me whether this will just back up a snapshot to be restored later or whether it will create an identical, “live” dataset in pool B that I can immediately share via smb and use as I was using pool A.

For (2), I found multiple solutions and third-party tools (more ZFS snap+send, rsync, syncoid, unison…), with people describing all sorts of successes and failures. Again, I’m confused about which tool best serves my purpose here.

Thanks in advance for any answers!


I use sanoid (which includes syncoid) to do pretty much what you’re looking for. It’s just a script that automates zfs send/receive, so you can achieve the same thing manually if you want. Doing it manually also gives you more options, like if you want to move or copy a dataset while also changing some of the settings on the new dataset.
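
For reference, a minimal sketch of what a syncoid run can look like (the pool, dataset, and host names here are made up, not taken from your setup):

# hypothetical names: recursively replicate poolA/data to poolB/backups/data on pc2
syncoid -r poolA/data root@pc2:poolB/backups/data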


Thank you for your reply. So, to confirm:

  • zfs send/receive (directly or through sanoid) will just give me a living copy of the data
  • it will not overwrite the properties of the new pool

I’m asking the latter because one of the issues people using zfs send/rcv reported is that properties like recordsize would be retained during the migration, which would be bad in my case: I made my first pool when I was barely starting with ZFS and left everything on default values (recordsize 128K, no compression, xattr on…). My new pool will be tuned in many ways compared to that. Will zfs send write the data in the destination pool according to that pool’s properties, or will it pass on the dataset from pool A with all its (now undesirable) properties?

By default it will copy the dataset with its properties; however, you can specify new properties as part of the command when it creates the new (copied) dataset.

You can specify whether the new dataset will be read/write or read-only, and there’s no difference in functionality between the copy and the original. However, if you’re synchronising a master pool to a backup pool, it probably doesn’t make sense to have the backup be read/write. If you want to be able to write data to both pools and have that data replicated across both of them, then some kind of cluster file system might be a better option.
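
As a rough sketch (dataset and host names invented for illustration), overriding properties and making the copy read-only at receive time could look something like this:

# -o on the receive side overrides properties on the dataset it creates
zfs snapshot -r poolA/data@migrate
zfs send -R poolA/data@migrate | ssh pc2 zfs receive -o compression=lz4 -o readonly=on poolB/data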


Thanks for your reply. Upon further reading, it seems zfs send/receive will let me shrink the recordsize property, but will ignore it if I try to expand it (sadly, my case)… can anyone confirm? Is the alternative just to cp, losing any metadata in exchange for writing the data with the new parameters? Or will zfs send indeed accept and perform the increase in recordsize, compression, etc.?

You can set whichever options you like; I don’t think there are any limitations.


If you send the snapshot to remote dataset xy, the properties of xy apply. It does not replicate the dataset properties.

E.g. if you have uncompressed data on machine A and send the snapshot to machine B that has compression enabled for that dataset, it will compress the data on the receiving end.

And once everything is on the second machine, you want to look into incremental send/receive. It will only transfer the delta between both datasets and is amazingly fast. You just state the old snapshot and then the new snapshot in the send/receive command and ZFS figures out the differences on a block level.
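
A minimal sketch of an incremental run (snapshot and dataset names are hypothetical; @old here is a snapshot that was already sent in full earlier):

zfs snapshot poolB/data@new
zfs send -i poolB/data@old poolB/data@new | ssh pc1 zfs receive -F poolA/backup/data
# -F rolls back any changes made on the backup side since @old before applying the delta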

send/receive is basically wire-speed, or as fast as your drives can sequentially read/write data. It’s always max speed. Yesterday my pool delta was 4GB (didn’t do much this week) and sending the most recent snapshot of my 15T pool took 6 seconds. It’s frighteningly fast.


Klara has a webinar on OpenZFS data replication you might be interested in. You can take anything that Klara Systems says about OpenZFS as gospel; they really know their stuff.


If you write the go-to book on the topic, sit on the OpenZFS project leadership for a decade, and do this all day, you get people like Allan Jude from Klara. Always good to listen to him. Wendell had him on for a chat in a video or two.


Thanks - will watch!

I realized I was just reading this article a couple of days ago without realizing exactly who was behind it :smiley:


So, I watched the webinar, and now I’m almost convinced I cannot achieve what I want in step 1 with zfs send/receive. However, it will work for step 2.

The reason is that they make very clear that send/receive works at the block level and makes sure the blocks in the destination are identical to the original ones. It also tracks blocks for future incremental replications. Hence, altering the block structure doesn’t seem possible. Admittedly, people who know what they’re doing have identically configured file systems to begin with, so they don’t discuss issues like mine much :stuck_out_tongue:
Furthermore, this bug report thread seems to confirm that a sent snapshot will not be written with a different recordsize at the destination. The final argument is: receiving a snapshot is not rewriting the data, so properties such as recordsize (which you can change even in the original pool) are irrelevant, as they apply only to new writes. Received snapshots are “old writes”, and the end result is the same as creating pool B with recordsize 128K, copying the data, and then changing it to 1M.

What apparently I can do is write the data anew (scp? smb?) onto pool B, then send it to pool A (which would need to be emptied and have the -o options applied, or better, destroyed and recreated in line with pool B), and continue to update it with send/receive (probably via syncoid) in the future.
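
Something along these lines, I imagine (paths, hosts, and dataset names below are just placeholders, not commands I have actually run):

# on PC1: rewrite the data through the normal write path so pool B’s recordsize/compression apply
rsync -aHAX /poolA/share/ root@pc2:/poolB/share/
# on PC2: seed the recreated backup pool on PC1
zfs snapshot -r poolB/share@seed
zfs send -R poolB/share@seed | ssh pc1 zfs receive poolA/share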

[As an aside: I took a couple of older drives to try to test this myself, but I have no clue how to check the result. zfs get recordsize will just give you the parameter value last set, which applies to new writes, but it is not informative about how existing data was written. And I’m literally clueless as to how to read the various forms of zdb output. If anyone knows how to find the actual block sizes used by existing data with zdb, I’m happy to experiment further.]

Update: still no clue how to read zdb, but I made a few “on disk” file size comparisons with du -B 1:

Initial setup
testA: all defaults (no compression, recordsize 128K, xattr=on, etc)
testB: intended setup for my new pool (lz4, recordsize 1M, xattr=sa, etc).

I created “testdata” and put one ~480MB zip file in it. Snapshot and send/receive to testB. Result:

sudo zfs send testA/testdata@2 | sudo zfs receive testB/backups/testdata

du -B 1 /testA/testdata/2020-10-17_13-30-06.zip 
480838144	/testA/testdata/2020-10-17_13-30-06.zip

du -B 1 /testB/backups/testdata/2020-10-17_13-30-06.zip 
480788992	/testB/backups/testdata/2020-10-17_13-30-06.zip

Next, I cp the zip file from its original source directly onto testB. Result:

du -B 1 /testB/alternadata/2020-10-17_13-30-06.zip 
480551424	/testB/alternadata/2020-10-17_13-30-06.zip

which is smaller than both. However, the two testdata copies are not identical. Then again, I have too many different variables at the same time. I made an identical dataset in testA, “uncompressed”, and then turned off compression in testB. Then I snapshot-send-receive the new dataset:

sudo zfs send testA/uncompressed@1 | sudo zfs receive testB/backups/uncompressed

du -B 1 /testA/uncompressed/2020-10-17_13-30-06.zip 
480838144	/testA/uncompressed/2020-10-17_13-30-06.zip

du -B 1 /testB/backups/uncompressed/2020-10-17_13-30-06.zip 
480838144	/testB/backups/uncompressed/2020-10-17_13-30-06.zip

So, it would seem the original differences came from compression, while compression+recordsize explain the further differences between send and cp. However, cp into testB without compression is actually bigger than anything else…
Anyway, with compression out of the way, I made yet another dataset with the same file, to test the -o modifier. What I get:

sudo zfs send testA/addedopts@1 | sudo zfs receive -o recordsize=1M testB/backups/addedopts

du -B 1 /testA/addedopts/2020-10-17_13-30-06.zip 
480838144	/testA/addedopts/2020-10-17_13-30-06.zip

du -B 1 /testB/backups/addedopts/2020-10-17_13-30-06.zip 
480838144	/testB/backups/addedopts/2020-10-17_13-30-06.zip

At this point, I’m quite convinced I cannot get out of the existing recordsize with a send/receive operation, and would instead need to cp (or rsync, etc.) the data to pool B so that it has the intended properties, then back it up into a reborn pool A.

I don’t know if anyone else actually cared, but since I made the thread anyway, I thought I’d share how far I got, in case a search engine tricks someone into this thread.

Thanks to everyone chipping in!

Well you didn’t overcomplicate this at all :slight_smile:

You should use terms such as Main server and Backup server, and you should use them as such, not just give them those names. One PC should be the Main server and the other one should only ever be used for backing up the pool. Do not do what you wrote you want to do, which is having two pools with the same data and then editing data on each pool; they will end up with discrepancies and you’ll be saving data all over the place. Only one pool should be SMB shared; you don’t even have to mount the backup pool, but you can.

So rethink the purpose of having two PCs. If the purpose is backup, then fine; if the purpose is High Availability, then ZFS isn’t an HA filesystem and you can’t do that.

I don’t know what problems other people mentioned on forums, but that is irrelevant.
Another thing: the 1MB bug isn’t proof of anything that you concluded; it’s just proof that there was a bug on one specific OS in one specific use case, which I didn’t read.

Don’t worry about block size. Worry about block size when you’re tuning a database and creating a new Dataset for the DB. For now, 128K is just fine; you don’t need 1M. If you really wanted 1M for a movie-streaming Dataset then sure, you could enable it, but don’t worry about it.
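
And if you ever do want that, it’s a per-Dataset setting anyway, something like (names made up):

zfs create -o recordsize=1M poolB/movies
zfs get recordsize poolB/movies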

So, to quote Nike: Just Do It.

zfs send | zfs recv is all you need.

Incremental zfs sends are also possible and yes syncoid can probably do them.

https://www.headdesk.me/ZFS_incremental_replication

If you just type zdb on FreeBSD, you can see the ashift of a vdev.

If you type zfs get recordsize pool/dataset, it will give you the recordsize of a Dataset.

A good guide

https://illumos.org/books/zfs-admin/preface.html#preface

An example of incremental send would look like this

zfs send -i rpool/USERDATA/user0_sx0jq9@inc rpool/USERDATA/user0_sx0jq9@inc4 | ssh user3853@localhost zfs recv -F rpool/USERDATA/user3853_2bv9ey/snapnomount

Where you take a snapshot and then send it, and the next time you take a new snapshot and send the incremental one with the -i flag. Of course, you need recursion for taking snapshots of child datasets and sending them, via the -r and -R flags; sometimes, if you use multiple flags, they need to be in a logical order, otherwise it doesn’t work.
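
For example (made-up names), a recursive incremental run of the whole tree could look like this:

zfs snapshot -r poolB/data@daily2
zfs send -R -i poolB/data@daily1 poolB/data@daily2 | ssh pc1 zfs recv -F poolA/backup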

Are you trying to “sudo use these names” me? :stuck_out_tongue:

Well, I must have written it in a complicated way indeed, since that’s not what I intended to do. To recap: I’m moving from pool A to pool B, then pool B will be the only “live” one, and pool A (or a new pool using those drives) will receive backups. :wink:

Other than that, thanks for the effort in trying to contribute. I have already found my answers, though, as reported in my last posts, and I have implemented step 1 successfully. Essentially, the correct way to achieve my 1st step was to copy the data into the new pool, thus obtaining a zfs dataset with the right properties, which can now be backed up through zfs send/rcv to a backup pool.

“It’s not a bug, it’s a feature” :wink: In the sources I cite, the bug report discussion, and the Klara webinar linked by @dazagrt, it is made very clear that zfs send/receive is meant to respect the sent dataset in every respect, and that receiving a snapshot is not “writing” data in the ZFS sense; therefore the parameters governing how a pool/dataset handles writes (which include recordsize but also many others) are irrelevant to how the zfs-received dataset is treated.

In sum, to everyone who wants to do exactly the same as I did and stumbles upon this thread:

  1. the way to migrate a zfs dataset (or, really, any data you have stored in whichever way) to a new zpool and make sure it is stored according to your settings for the new pool is to copy it to the new pool (with or without some modifiers depending on whether you want to keep permissions, ownership, etc).
  2. zfs send / receive -i will perform the backups as intended, as stated by @Dexter_Kane and @gogoFC