After the Alternatives to Windows Defender thread, I am realising I need a much better setup for storing my backups.
What settings and snapshot tools/scripts would you use for a dataset that is storing backup files?
What settings would you use for a dataset that’s storing system images? I am distinguishing between the two as I don’t really want to be carrying snapshots for 2TB system images.
I operate on a 3/2/1 model for backups, and don’t snapshot my local backups.
What I do is use Restic to back up to a dataset on my NAS. That backup then gets shipped to Backblaze B2 with rclone.
The benefit of the two-stage approach, rather than having Restic push straight through the rclone REST backend, is that you get fast backups to local storage, and anything new gets shipped to the cloud asynchronously afterwards.
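Roughly, the flow looks like this; the repository path, bucket name, and the `b2remote` rclone remote are just placeholders for whatever you have configured:

```bash
# Stage 1: fast local backup into a restic repository on the NAS dataset
restic -r /mnt/tank/backups/restic backup /home /etc

# Stage 2: ship the repository to Backblaze B2 asynchronously
# ("b2remote" and the bucket name are assumed rclone config entries)
rclone sync /mnt/tank/backups/restic b2remote:my-backup-bucket/restic
```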
I use syncoid for ZFS replication between backup machines, and rsync to get the data there if the source does not use ZFS. No special settings other than encryption and compression=zstd-1.
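As a rough sketch of that setup (pool, dataset, and host names are placeholders):

```bash
# Receiving dataset: nothing tuned beyond encryption and zstd-1 compression
zfs create -o encryption=on -o keyformat=passphrase \
    -o compression=zstd-1 tank/backups

# ZFS-to-ZFS replication with syncoid
syncoid tank/backups backuphost:tank/backups

# Non-ZFS sources just land their data with rsync
rsync -a --delete /srv/data/ nas:/mnt/tank/backups/data/
```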
Why not? That’s the perfect application for snapshots.
To make it efficient you need to ensure that the backup process updates the existing image file on the backup server, rather than creating a new one and deleting the old one (e.g. rsync --inplace). Then the only space used by each snapshot is the blocks that actually changed between it and the previous snapshot.
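A sketch of what that could look like, assuming the images land in a `tank/images` dataset on the NAS (paths and names are placeholders):

```bash
# Update the existing image in place so unchanged blocks stay shared
# between snapshots; --no-whole-file keeps the delta algorithm on even
# if the destination looks like a local path (e.g. an NFS mount)
rsync --inplace --partial --no-whole-file \
    workstation.img nas:/mnt/tank/images/workstation.img

# Snapshot after each run; each snapshot only pins the blocks that changed
ssh nas zfs snapshot tank/images@$(date +%F)
```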
What determines the block size here?
It’ll be anywhere between the sector size of the disk on the system you’re imaging (usually 512 B or 4 kB) and the ZFS recordsize of the dataset (128 kB by default), depending on where the changed blocks fall in the file and how they were aggregated into ZFS transaction groups (txgs).
If you modify one 512 B sector between snapshots, it’s possible that not much more than that is consumed by ZFS (beyond the necessary internal merkle-tree structures), since the recordsize is a maximum.
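If you want to see how this plays out on a real dataset, both the recordsize and the per-snapshot space usage are easy to inspect (the dataset name is a placeholder):

```bash
# The recordsize cap on how much a small change can cost
zfs get recordsize tank/images

# How much space each snapshot uniquely holds
zfs list -t snapshot -o name,used,referenced -r tank/images
```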
Sorry for the delayed response. Does this mean that this changed-block detection only works if the file name doesn’t change?
Changing a filename only changes the block containing that directory entry, updating its checksum and its parents’ checksums. That’s efficient, and zfs send/recv of that type of change is much more efficient than rsyncing it, because rsync has no way of knowing that the contents have not changed.
Copying the entire contents of a file to a new file with a different name and then deleting the original, however, would be sent as a change the size of the stored file (after ZFS compression and recordsize overheads). That won’t be efficient.
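If you want to see what a given change actually costs, a dry-run incremental send reports the estimated stream size (snapshot names are placeholders):

```bash
# Dry-run incremental send: -n sends nothing, -v prints the estimated
# stream size, so you can compare a rename against a copy-and-delete
zfs send -n -v -i tank/images@before tank/images@after
```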