Backing up a ZFS Pool

Hi everyone,

I spent the last week setting up a few ZFS pools for storage. I have a mirrored pool in my PC as well as a striped pool of the same size in a remote PC for backing everything up.

I am currently looking into the best way to back up my pool's content to the backup pool.

I have seen on Aaron Toponce's blog that ZFS offers send and receive commands, so I could do something like:

# zfs send pool/dataset@tuesday | ssh [email protected] "zfs receive pool/backup"

However, as far as I can tell, these commands are completely agnostic of what data is already on the backup server.

# zfs send pool/dataset@wednesday | ssh [email protected] "zfs receive pool/backup"

Running the above with a snapshot taken after Tuesday's would not transmit only the delta, i.e. the changes between the two snapshots, but the complete data in the pool again, right?

What do you all see as the best way to back up a ZFS pool incrementally? Or would I be better off using a dedicated cross-platform backup tool like restic?

Restic is really good, so that is a valid option.

However, can’t you just use zfs send -i oldsnapshot newsnapshot for incremental sends? https://openzfs.github.io/openzfs-docs/man/8/zfs-send.8.html
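Reusing the snapshot names from the original post (and assuming @tuesday has already been received on the backup side), an incremental send would look something like this sketch:

```shell
# Take the new snapshot, then send only the delta between the two.
# Dataset, host, and snapshot names here match the example above.
zfs snapshot pool/dataset@wednesday
zfs send -i pool/dataset@tuesday pool/dataset@wednesday | \
    ssh [email protected] "zfs receive pool/backup"
```

Only the blocks that changed between @tuesday and @wednesday go over the wire, not the whole pool.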

I heard good things, but I am a bit hesitant to use it, since it is still in development and has not seen a 1.0 release yet. Have you used it? Does it seem stable enough? Nothing is worse than needing to restore a backup and having it fail!

Interesting, I will have a look! Thanks for the link!

If you are just backing up a pool, you might also look at Syncoid, from the Sanoid project, which can be set up for regular syncing, as long as both sides use ZFS.


I use it, and it’s worked fine for me. I don’t have anything too crazy (~600 GB, ~80k files), and the only complaint I have is that it gets expensive to prune down the repository size on Backblaze, due to the quantity of API calls.

@SgtAwesomesauce also uses it, and has actually contributed code.


I had 2.5TB at one point and it didn’t even sneeze.

I’ve since cut that down because Backblaze costs were adding up. :sweat_smile:

We actually just merged a redone prune into master not long ago. You might find it significantly cheaper since it uses your cache rather than hitting the API for every query.
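For reference, pruning in restic is usually driven by a retention policy via restic forget; a sketch (the repository URL and keep counts are placeholders, adjust to taste):

```shell
# Drop snapshots outside the retention policy, then prune unreferenced
# data from the repository in one pass.
restic -r b2:my-bucket:/restic forget \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
    --prune
```

With the reworked prune this should hit the B2 API far less often.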

*I had no part in this, I was just following the progress.


If you go by that policy, you’re going to have a bad time.



As TheCakeIsNaOH mentioned, you can play around with sending different snapshots with -i and -I.

-I sends ALL snapshots present on the sending dataset between the older snapshot and the newer snapshot it references.

-i can be thought of as referencing a common older snapshot and sending whatever delta is needed to recreate the newer snapshot on the receiving machine; intermediate snapshots are not included, the receiver just ends up with the newer one.
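A sketch of the difference, with made-up snapshot names (assume @mon, @tue, and @wed all exist on the sender, and @mon is already on the receiver):

```shell
# -i: one delta; the receiver ends up with @wed (plus the common @mon).
zfs send -i pool/dataset@mon pool/dataset@wed | \
    ssh [email protected] "zfs receive pool/backup"

# -I: same range, but every intermediate snapshot (@tue) is replicated too.
zfs send -I pool/dataset@mon pool/dataset@wed | \
    ssh [email protected] "zfs receive pool/backup"
```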

If you are sending a dataset on the local network, and thus don’t need any security, you can forgo ssh. Here are some examples. In both cases I highly recommend using tmux, which will preserve your session even if you close the terminal and lets you easily reattach later.

Also note that these are per dataset. If you want to send the whole pool, all datasets and all snapshots as a one time thing, look into -R, which I think can also be used with -i and -I, but no clue how that works out.

No SSH

Start the receiving machine first:

This listens on port 9090, has a 1 GB buffer, and uses 128 kB chunks, which should ideally match or be an even multiple of your ZFS dataset's recordsize (default 128K). This will likely tell you that you need to force things, but for safety I’m not including that in a command that others may copy and paste blindly. Add -F after the receive:

mbuffer -s 128k -m 1G -I 9090 | zfs receive data/filesystem

Now run this on the sending machine:

The IP address is the address of the machine that is receiving.

zfs send -i pool/dataset@oldsnapshot pool/dataset@newsnapshot | mbuffer -s 128k -m 1G -O 10.0.0.1:9090

Using SSH

No fucking clue. I’ve never actually used ssh and zfs send since all my shit is local, so I don’t have the magic command written down. I’m told you should also use mbuffer with ssh. Again, no idea how.

The big brain way

It’s best to set up Sanoid or pyznap to manage this automatically. They make use of ssh and mbuffer, and I believe they can even save and resume the progress of an interrupted send.
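For reference, a Syncoid invocation for a setup like the one in the question might look something like this (hostnames and dataset names are placeholders; Syncoid picks the right incremental base and handles mbuffer itself):

```shell
# Replicate a dataset to the backup machine over ssh; Syncoid figures
# out which snapshots are common and sends only the delta.
syncoid pool/dataset [email protected]:pool/backup
```

Typically you would run this from cron or a systemd timer for regular syncing.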


I’m new to ZFS and have been playing with it lately. I tried the ZFS send/receive commands locally (from root pool to another pool on the same machine) and got a little caught up in having duplicate mount points since the dataset I was sending had a manually set mountpoint.

I think you can add some options to the command to fix that issue, but I decided to give Syncoid/Sanoid a try before continuing down that path. Getting Sanoid set up was super easy. I have not tried setting up Syncoid yet, but from what I’ve read it can keep datasets synchronized over the network and should involve a lot less setup than manually running scripts with ZFS commands.
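For anyone following along, Sanoid's config is a simple INI-style file; a minimal sketch (the dataset name and retention numbers here are placeholders):

```ini
; /etc/sanoid/sanoid.conf -- minimal sketch, adjust to your pool layout
[pool/dataset]
        use_template = production
        recursive = yes

[template_production]
        hourly = 24
        daily = 7
        monthly = 3
        autosnap = yes
        autoprune = yes
```

Sanoid then takes and prunes snapshots on its own schedule; Syncoid replicates whatever snapshots exist.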


Thank you I will have a look!

Sounds good! I think I will try it out. Backup is important to me since I do not want to lose my data. I have around 12 TB I want to back up currently, so it should handle that fine.

Fair enough! Thing is, nothing is worse than needing to restore from a backup and then the backup not working. That is the only reason I am so picky here. My impression is that most backup software still needs some work: features are often missing, and you find a lot of threads about problems. Restic seems like a really good candidate since the main developer appears genuinely passionate about it and people are still actively working on it; it’s not one of those stale projects. For me it is really between restic and Sanoid/Syncoid for native ZFS functionality!

Yeah, I am currently looking into Sanoid and Syncoid as well. I like the idea of utilizing native ZFS features like snapshots! Thanks for the hint about the mountpoint; I will see how Syncoid handles this.
For testing I currently have the same setup as you, every pool local. But in the future I will put the backup pools in a dedicated computer.

I’ve had multiple suggestions shot down by the restic maintainers because they would have required a repository format change. The maintainer team values stability above all else, even to the detriment of space or performance efficiency.

Additionally, we’re rounding the corner to a 1.0 release, hopefully in the next year or so. IIRC, there’s a list of what needs to be done somewhere.


Thank you for this information. I am sure I will find a use for restic, but for now I have set up Sanoid and Syncoid, since I want to use some of the backed-up data on the backup PC as well, and being able to just mount the filesystem on that PC is a huge advantage.

I have marked the answer mentioning the software as the solution!
