I’m curious to hear people’s strategies for verifying the integrity of offline backups, e.g. external hard drives or magnetic tapes (on Linux). I’d be particularly interested in solutions which use basic tools like tar, dump, xfsdump, but I’m open to hearing about other software as long as it’s stable and open source.
One idea (which I have not tried): if the medium is a hard drive, you could put a dm-integrity volume on the drive and then dump your backup to that. If you do this, you’ll find out on restore whether there has been any silent data corruption. You can also check the whole drive for integrity by reading it with dd into /dev/null, though that’s a bit inefficient, particularly if the drive isn’t full.
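Roughly, using integritysetup from the cryptsetup package (untested sketch; /dev/sdX and the mapping name are placeholders):

    # write dm-integrity metadata to the drive (destroys existing data),
    # then open it as a mapped device
    integritysetup format /dev/sdX
    integritysetup open /dev/sdX backupint

    # dump the backup straight onto the integrity device
    tar -cf /dev/mapper/backupint /home

    # later: read the whole device back; silent corruption surfaces as read errors
    dd if=/dev/mapper/backupint of=/dev/null bs=1M status=progress

    integritysetup close backupint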
BSD has a tool called mtree (which I think has been ported to Linux) which can store a list of file paths and their checksums, and can later take such a list and verify the files against it. Perhaps that could be used as well, as an alternative to dm-integrity. Though if tar/dump/xfsdump (something other than just a filesystem) is being used, I’m not sure how one would verify the backup without restoring it…
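With the BSD mtree flags it would be something like this (the Linux port’s options may differ slightly; paths are placeholders):

    # record paths plus sha256 digests for everything under the backup directory
    mtree -c -K sha256digest -p /mnt/backup > backup.mtree

    # later: re-walk the tree and report any file whose checksum no longer matches
    mtree -p /mnt/backup < backup.mtree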
Of course this depends on what you’re using to back up. I’m happy that zfs replication does its job, but that’s no use where zfs isn’t your platform.
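For reference, the basic shape of that (pool and dataset names are placeholders):

    # snapshot, then replicate to the backup pool; the received blocks get normal
    # zfs checksums, so a later scrub on backuppool verifies the copy
    zfs snapshot tank/data@2024-06-01
    zfs send tank/data@2024-06-01 | zfs receive backuppool/data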
I also use borg - it can run integrity checks on its archives itself, and it can also mount an archive as a filesystem, which makes spot checks easy.
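Something like (repo path and archive name are placeholders):

    # borg's own consistency check, including re-reading the chunk data
    borg check --verify-data /path/to/repo

    # mount a single archive read-only and spot-check it against the live data
    borg mount /path/to/repo::my-archive /mnt/borg
    diff -r /home/me/docs /mnt/borg/home/me/docs
    borg umount /mnt/borg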
I can’t say I do this often, but if I’m experimenting with a utility I’ve not used recently I might spot check files before and after a restore, mostly to verify I did it correctly. If you just ran an rsync, it’s also possible (and the least ‘marking your own homework’) to do the same on the target volume and diff the source against the target.
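For example (paths are placeholders):

    rsync -a /data/ /mnt/backup/data/

    # re-run with checksums, itemised output and dry-run: anything printed differs
    rsync -acni /data/ /mnt/backup/data/

    # or a plain recursive compare
    diff -r /data /mnt/backup/data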
I don’t do this myself, but I could; a zpool scrub on the filesystem will detect errors via failing checksums, but on a single drive it can’t fix them (assuming anything is readable at all).
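i.e. something like:

    # re-reads every block and verifies checksums; on a single-disk pool errors
    # are reported but can't be repaired
    zpool scrub backuppool
    zpool status -v backuppool   # shows progress and lists any files with errors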
There is also (with caveats) the zfs copies={n} property - https://jrs-s.net/2016/05/02/zfs-copies-equals-n/ - but as the article’s title says, it’s not a substitute for device redundancy; it’s aimed at bit rot, which does seem to be what you’re interested in.
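Setting it is just (dataset name is a placeholder):

    # keep two copies of every block on the same device; only affects data written
    # after the property is set, and won't survive a whole-drive failure
    zfs set copies=2 backuppool/dumps
    zfs get copies backuppool/dumps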
There are a few different things that can go wrong, and which of them you’re worried about changes the strategy a bit. If the concern is the backup program (tar, zfs send, whatever) generating an incorrect backup file, a test restore is the only way you’re going to find it.
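e.g. with tar, a throwaway restore plus a diff against the live data (paths are placeholders, and expect some noise for files that changed since the backup was taken):

    mkdir /tmp/restore-test
    tar -xf /mnt/backup/home.tar -C /tmp/restore-test
    diff -r /home /tmp/restore-test/home

    # GNU tar can also compare an archive directly against the filesystem
    tar --compare -f /mnt/backup/home.tar -C /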
Assuming you’re confident in your backup program, some other strategies open up. If you save the file to a filesystem that checksums data, like zfs, a scrub gets you the file check “for free”, but managing external drives with zfs can be a pain.
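If you do go that route, it’s roughly (device path and pool name are placeholders):

    # single-disk pool on the external drive; everything written to it is checksummed
    zpool create extbackup /dev/disk/by-id/usb-Example_Drive
    cp /var/backups/home.tar /extbackup/
    zpool scrub extbackup
    zpool status extbackup

    # export before unplugging, import again next time it's attached
    zpool export extbackup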
What I do is use gpg to encrypt and sign the backups as they’re copied to the external drive. That way a lost disk is less scary, though the encrypt step could be skipped and you could just sign them. The idea is to pipe from the backup program through gpg rather than running gpg on the file after it’s written: on a filesystem like ext4 you’re exposed to bit rot once the file is sitting on disk, and I have had files fail a signature check just after writing, so you don’t want to sign an already corrupted file. Verifying the signature later then confirms the file is unchanged on disk, with any corruption resulting in a bad signature.
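Concretely, something along these lines (key id and paths are placeholders):

    # sign (and encrypt) the stream before it ever hits the external drive
    tar -cf - /home | gpg --sign --encrypt -r backup@example.org \
        -o /mnt/backup/home.tar.gpg

    # later: gpg verifies the signature while decrypting; discard the output if
    # you only care about the integrity check
    gpg --decrypt /mnt/backup/home.tar.gpg > /dev/null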
This can be adapted to plain checksums (sha256sum), but make sure you are piping the data rather than summing it afterwards on a “suspect” disk.
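For example (paths are placeholders):

    # tee writes the archive to the drive while sha256sum hashes the same bytes
    # in flight; sed swaps the "-" stdin marker for the real filename
    tar -cf - /home | tee /mnt/backup/home.tar | sha256sum \
        | sed 's|-$|home.tar|' > /mnt/backup/home.tar.sha256

    # later, on the external drive:
    cd /mnt/backup && sha256sum -c home.tar.sha256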