Feedback on my data backup strategy wanted

I think I’ve got a pretty good handle on protecting my data, but I’m looking for smart people to point out any issues. This does not cover my bulk data like movies, etc., as those are not worth backing up to this level. For this data set, think family photos, documents, and so on.

  • How the Data is stored

The data is stored on my Supermicro-based TrueNAS box: the pool consists of 3 x 4TB mirrors made up of SATA and SAS disks, 2 x Intel DC S3700 SSDs in a mirror for metadata, and a Samsung 970 EVO Plus for L2ARC. The server has 64GB of ECC RAM and redundant power supplies, backed by a double-conversion UPS and a standby generator. I always stay a few versions behind and update my secondary NAS first to test the updates. I am using encryption with a passphrase that does not auto-unlock; I must enter the password manually (protects against theft).
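For clarity, the pool topology is roughly equivalent to a single zpool create like the sketch below (device names are placeholders for illustration, not the actual devices):

```python
import subprocess

# Rough sketch of the pool topology as one zpool create command.
# Device names (sda..sdh, nvme0n1) are placeholders only.
subprocess.run(
    ["zpool", "create", "tank",
     "mirror", "sda", "sdb",              # 3 x 4TB mirror vdevs,
     "mirror", "sdc", "sdd",              # mixed SATA and SAS disks
     "mirror", "sde", "sdf",
     "special", "mirror", "sdg", "sdh",   # Intel DC S3700 pair for metadata
     "cache", "nvme0n1"],                 # Samsung 970 EVO Plus as L2ARC
    check=True,
)
```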

  • Snapshots

I take snapshots every 5 minutes, which thin down to hourly, daily, weekly, and monthly, and then expire at 1 year.
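For illustration, the thinning boils down to keeping the newest snapshot per time bucket as snapshots age; the real schedule is handled by TrueNAS periodic snapshot tasks, but the policy looks something like this sketch:

```python
from datetime import datetime, timedelta

# Sketch of the thinning idea: keep everything recent, then only the
# newest snapshot per hour/day/week/month as snapshots age, and drop
# anything older than a year. Illustrative only; TrueNAS periodic
# snapshot tasks do the real work.
def snapshots_to_keep(snaps: list[datetime], now: datetime) -> set[datetime]:
    keep, seen = set(), set()
    for ts in sorted(snaps, reverse=True):            # newest first
        age = now - ts
        if age > timedelta(days=365):
            continue                                   # expired
        if age <= timedelta(hours=1):
            bucket = ("5min", ts.isoformat())          # keep all recent snaps
        elif age <= timedelta(days=1):
            bucket = ("hourly", ts.strftime("%Y-%m-%d %H"))
        elif age <= timedelta(weeks=1):
            bucket = ("daily", ts.strftime("%Y-%m-%d"))
        elif age <= timedelta(days=30):
            bucket = ("weekly", ts.strftime("%Y week %W"))
        else:
            bucket = ("monthly", ts.strftime("%Y-%m"))
        if bucket not in seen:                         # newest per bucket wins
            seen.add(bucket)
            keep.add(ts)
    return keep
```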

  • Replication/Primary Local Backups

All of the above snapshots are replicated to my secondary TrueNAS box, which is actually a VM on my ESXi host. This turns those snapshots into backups. It also means my data is always available and at most 5 minutes out of date if my primary NAS runs into a problem and I need to get at my data. The second NAS is also encrypted.
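Under the hood the replication is just incremental ZFS send/receive of those snapshots; conceptually it looks like the sketch below (pool, dataset, and host names are placeholders to show the mechanism, not my exact config):

```python
import subprocess

# Conceptual sketch of incremental snapshot replication: send only the
# delta between the last common snapshot and the newest one to the
# secondary box. Names are placeholders, not my exact setup.
def replicate_incremental(prev_snap: str, new_snap: str) -> None:
    send = subprocess.Popen(
        ["zfs", "send", "-i", prev_snap, new_snap],
        stdout=subprocess.PIPE,
    )
    subprocess.run(
        ["ssh", "backup-nas", "zfs", "receive", "-F", "tank2/data"],
        stdin=send.stdout,
        check=True,
    )
    send.stdout.close()
    send.wait()

replicate_incremental("tank/data@auto-2024-01-01_12-00",
                      "tank/data@auto-2024-01-01_12-05")
```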

  • Cloud Backup

The data is backed up using Arq, which runs in a VM on my ESXi host. I am using their Arq Premium service, which includes software and storage. The data is stored for just 5 days and the backup runs nightly. All data is encrypted and available via the Arq web portal, so I can easily restore a file to a different computer from anywhere.

  • Long Term on-site backup

Also using Arq, and also nightly, the data is backed up to the secondary TrueNAS box with a very limited file selection (no software/games, for example) and 10-year retention set.

  • Long Term off-site off-line backups

I have borg set up in a Debian VM. Every month I rotate a SATA disk in a protective case which has a borg backup stored on it. I keep one drive in my desk drawer and another at my wife’s work, and I swap them out. The drives are encrypted, and since I’m using borg there is some software diversity.
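The monthly run against whichever disk is currently plugged in is basically just a borg create followed by a prune, roughly like this sketch (the repo path, source path, passphrase handling, and prune policy are placeholders, not my real values):

```python
import os
import subprocess

# Rough sketch of the monthly borg run against the rotation disk that is
# currently plugged in. Paths and the passphrase are placeholders.
def monthly_borg_backup(repo: str = "/mnt/rotation-disk/borg-repo") -> None:
    env = {**os.environ, "BORG_PASSPHRASE": "placeholder-passphrase"}
    subprocess.run(
        ["borg", "create", "--stats", "--compression", "zstd",
         f"{repo}::data-{{now:%Y-%m-%d}}", "/mnt/data"],
        env=env, check=True,
    )
    # Prune policy is illustrative; pick whatever retention fits the drives.
    subprocess.run(
        ["borg", "prune", "--keep-monthly", "12", repo],
        env=env, check=True,
    )
```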

  • Problems

First, both the primary and secondary NAS are located in the same room and the same rack. This is fine for a local backup, but it would be better if the secondary were in a different part of the house.

Second, Arq as the long-term on-site backup isn’t as ideal as borg. Borg will be around forever, and I doubt Arq will be. I don’t want to switch it to borg because then I’d lose some software redundancy, but I still have the cloud backup, so it’s probably fine. I already ran into this problem in the past with Synology Hyper Backup, and now I have to keep an 800GB backup file around for as long as I want those backups, with no way to thin it out.

Third, while the off-site hard drive is a great idea, it’s unreliable. For a while both my wife and I worked from home, and I had nowhere to take it. Now it’s okay because my wife is back at the office, but if she works from home again, I’m out of luck.

What do you think? How could I improve?

Impressive, to me anyway. I meet people online who are unaware of the utility of frequent snapshots.

I thought about going every 1 minute and thinning to 15 minutes, but that seemed excessive. So far 5 minutes has served me pretty well, and having that replicated data just 5 minutes out is great.

First of all, test each copy! I mean, perform an emergency data recovery from each backup/location and make yourself a summary of how quickly and how cleanly you recovered your data. I have already seen some miraculous backup combinations that, when it came down to it, didn’t work.
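Even a crude drill log helps, e.g. something along these lines (the restore commands here are placeholders; substitute the real restore steps for each copy):

```python
import subprocess
import time
from datetime import date

# Crude restore-drill log: run the restore command for each backup
# location, time it, and append the outcome to a text file. The
# commands below are placeholders only.
DRILLS = {
    "secondary-nas": ["rsync", "-a", "backup-nas:/mnt/tank2/data/test-file", "/tmp/"],
    "borg-disk": ["borg", "extract", "/mnt/rotation-disk/borg-repo::data-2024-01-01",
                  "data/test-file"],
}

with open("restore-drill-log.txt", "a") as log:
    for name, cmd in DRILLS.items():
        start = time.time()
        ok = subprocess.run(cmd).returncode == 0
        log.write(f"{date.today()} {name}: {'OK' if ok else 'FAILED'} "
                  f"in {time.time() - start:.0f}s\n")
```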

Also think about data consistency: introduce a uniform integrity-check mechanism so that you can be sure that each copy is always identical to the others and to the source, and that you haven’t picked up bit rot in one of your copies.
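A simple way to do that is a single checksum manifest built from the source and checked against every copy, e.g. this minimal sketch (assuming each copy can be mounted or restored as a plain directory tree; paths are placeholders):

```python
import hashlib
from pathlib import Path

# Minimal sketch of a uniform integrity check: build a SHA-256 manifest
# of the source tree, then compare every copy against it. Reads whole
# files into memory, so it is only suitable for modest file sizes.
def manifest(root: Path) -> dict[str, str]:
    return {
        str(f.relative_to(root)): hashlib.sha256(f.read_bytes()).hexdigest()
        for f in sorted(root.rglob("*")) if f.is_file()
    }

source = manifest(Path("/mnt/data"))
for copy in [Path("/mnt/secondary/data"), Path("/mnt/restore-test/data")]:
    other = manifest(copy)
    bad = [p for p in source if other.get(p) != source[p]]
    print(copy, "OK" if not bad else f"{len(bad)} mismatched or missing files")
```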

Just don’t forget to delete old snapshots from time to time. If you’re doing recursive snapshots too, you end up with 100k+ snapshots really quickly, performance will suffer, and deleting snapshots (especially old ones with a lot of delta on the dataset) will generate quite a bit of load on the system.
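A quick way to keep an eye on that is just counting what `zfs list` reports per dataset, e.g. this rough sketch:

```python
import subprocess
from collections import Counter

# Quick sketch: count how many snapshots each dataset has accumulated,
# using plain `zfs list` output.
out = subprocess.run(
    ["zfs", "list", "-H", "-t", "snapshot", "-o", "name"],
    capture_output=True, text=True, check=True,
).stdout
counts = Counter(line.split("@")[0] for line in out.splitlines() if line)
for dataset, n in counts.most_common():
    print(f"{dataset}: {n} snapshots")
```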

Well, there is a lot that can happen in one year in terms of changed data. Not so much for static data, but my pool has enough “movement” that this becomes a problem far sooner than a year.

Otherwise, yeah…looks solid.

They auto-expire; so far I’ve not noticed any slipping through like what would happen with Synology.

I have a few separate datasets for data with a high churn rate. /Data is my main important data, but then I have /share, which is stuff I still care about but is temporary; those snaps all expire after around a month. Then I have /temp, which has just 1 day’s worth of snaps. If I’m going to download something and mess with it, like an install ISO of some software, it gets nowhere near /data.