Optimal compression for (borg) backups

Has anyone found optimal compression settings for (borg) backups?

The main contenders are lz4 (the default) for speed, lzma for maximum compression, and Facebook’s newer, configurable zstd, which seems to have a very good compression-to-speed tradeoff.

From `borg help compression`:

Valid compression specifiers are:

none
Do not compress.

lz4
Use lz4 compression. Very high speed, very low compression. (default)

zstd[,L]
Use zstd (“zstandard”) compression, a modern wide-range algorithm.
If you do not explicitly give the compression level L (ranging from 1
to 22), it will use level 3.
Archives compressed with zstd are not compatible with borg < 1.1.4.

zlib[,L]
Use zlib (“gz”) compression. Medium speed, medium compression.
If you do not explicitly give the compression level L (ranging from 0
to 9), it will use level 6.
Giving level 0 (which means “no compression”, but still has zlib protocol
overhead) is usually pointless; you’d better use “none” compression.

lzma[,L]
Use lzma (“xz”) compression. Low speed, high compression.
If you do not explicitly give the compression level L (ranging from 0
to 9), it will use level 6.
Giving levels above 6 is pointless and counterproductive because it does
not compress better due to the buffer size used by borg - but it wastes
lots of CPU cycles and RAM.

auto,C[,L]
Use a built-in heuristic to decide per chunk whether to compress or not.
The heuristic tries with lz4 whether the data is compressible.
For incompressible data, it will not use compression (uses “none”).
For compressible data, it uses the given C[,L] compression - with C[,L]
being any valid compression specifier.

My thinking is that the optimal setting is the one that maximizes the compression ratio while maintaining transfer rates high enough not to become a bottleneck, i.e. compression should be faster than the target backup disk and the USB/network interface.

Thoughts? Testing methodologies?

I’m not sure about your skill with scripting or bash, but you could probably write a bash script that does the following:

  1. print uncompressed file size and current time
  2. compress file
  3. print compressed file size and current time

This is basic, but you can add some arithmetic to it that prints the compression ratio and total time taken; see the sketch below.
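For example, something along these lines (just a sketch, assuming GNU stat and bc are available and that each compressor command writes to stdout; the script name and arguments are made up for illustration):

```bash
#!/usr/bin/env bash
# Sketch: time one or more compressor commands on a single file and report
# compression ratio and throughput. Assumes GNU stat and bc.
set -euo pipefail

file="$1"; shift                     # file to test
orig=$(stat -c%s "$file")            # uncompressed size in bytes

for cmd in "$@"; do                  # e.g. "gzip -6 -c" "xz -6 -c" "zstd -3 -c"
    start=$(date +%s.%N)
    comp=$($cmd < "$file" | wc -c)   # compressed size in bytes
    end=$(date +%s.%N)
    ratio=$(echo "scale=2; $orig / $comp" | bc)
    rate=$(echo "scale=1; $orig / ($end - $start) / 1000000" | bc)
    printf '%-12s ratio %-6s %s MB/s\n' "$cmd" "$ratio" "$rate"
done
```

Invoked as e.g. `./compress-bench.sh big.tar "gzip -6 -c" "xz -6 -c" "zstd -3 -c" "lz4 -c"`.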


As you have already noticed, some compression algorithms are more geared to some tasks. There are three main dimensions to a compression algorithm: encoding speed, compression ratio, and decoding speed. lzma focuses on compression ratio and sacrifices encoding and decoding speed, while lz4 is the reverse. zstd aims to be a deflate replacement, so it has similar characteristics to deflate but outperforms it in all cases. These are all asymmetric algorithms, meaning that encoding is much slower than decoding; this is not true for all compression algorithms, as some are symmetric.

So what you want is to find the algorithm that gives the best compression ratio while staying above your minimum compression/decompression speed limits. You want something that can decompress faster than you can write to disk and something that can compress faster than you can transfer it over your network. So start by finding those limits, and once you know them you can test different compression algorithms to find the one that gives the best compression ratio.
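As a rough sketch of measuring those baselines (assumes GNU dd; the mount point and host name are placeholders):

```bash
# Sequential write speed of the backup disk (placeholder mount point):
dd if=/dev/zero of=/mnt/backup/ddtest bs=1M count=2048 oflag=direct status=progress
rm /mnt/backup/ddtest

# Raw network throughput to a remote backup host (placeholder host name):
dd if=/dev/zero bs=1M count=2048 | ssh backuphost 'cat > /dev/null'
```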

Note that the speeds/ratios change depending on the type of data. Some files compress well, and this lowers the demands on compression speed, so you need to test on data which is representative of what you are backing up. There are huge differences here: some types of files easily become 10 times smaller, while others barely change in size; that is just the reality of things. (Common stuff that doesn’t compress well: jpg, png, mp3, movies, zip files and other data which is already compressed. Stuff like executables and text compresses very well.)

Also, after finding the limits in your system, see how close you are to hitting them and check whether the bottlenecks really are where you think they are. Maybe it is worth upgrading the hardware instead, or paying for a better connection (if it is over the internet), as these give solid improvements, while compression gains will fluctuate depending on the data being compressed at any given time.


I think I’ll do just that: write some bash scripts to find the bottleneck in my USB3/dock/HDD backup target, using `borg benchmark crud` (https://borgbackup.readthedocs.io/en/stable/usage/benchmark.html).

Then I’ll make a loop adapted from this example, testing different compression algorithms and levels. I’m new at this, so right now I’m thinking I’ll need a repo for each test case so that each `borg create` is a first backup, without the benefit of deduplication or caches.
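Something along these lines, perhaps (just a sketch; the source directory and repo locations are placeholders, and it assumes GNU time is installed):

```bash
#!/usr/bin/env bash
# Sketch: one fresh, unencrypted repo per compression setting, so deduplication
# and caches never help a later run. SRC and BASE are placeholder paths.
set -euo pipefail

SRC="$HOME/testdata"
BASE="/tmp/borg-bench"

for comp in none lz4 zstd,1 zstd,3 zstd,6 zstd,9 lzma,6; do
    repo="$BASE/repo-$comp"
    borg init --encryption=none "$repo"
    /usr/bin/time -f "$comp: %e s" \
        borg create --compression "$comp" --stats "$repo::test" "$SRC"
done
```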

Considering every borg user would benefit from optimizing settings for their case, I wonder why an automated borg tune doesn’t exist.

@Spiller It’s all going to be local for now, but speaking of hardware upgrades, I just got a 10 TB drive for the backups and I’m blown away that HDDs can do >200 MB/s now! I’m used to ~100 MB/s as the top speed in ideal sequential scenarios; at least that’s what it was for my 500 GB drives.

That is certainly impressive; a gigabit connection wouldn’t keep up. LZMA is way too slow to handle that as well, so zstd is probably a good bet here.

It took me a while to find it again, but this is a great site for checking out compression algorithms:
http://quixdb.github.io/squash-benchmark/
Be sure to pick a dataset that is at least 10 MB, as there are a bunch of small ones, and take the results with a grain of salt, but it is a great way to find alternative algorithms. Density looks interesting as a contender for lz4, for example.


I don’t know how easy it is to add an arbitrary algorithm to Borg, however.


Time for some surprising/disappointing results.

Synthetic benchmarks

`borg benchmark crud`

First column: [C]reate, [R]ead, [U]pdate, [D]elete
Second column: [Z]ero file, [R]andom file

C-Z-BIG         212.70 MB/s (10 * 100.00 MB all-zero files: 4.70s)
R-Z-BIG         320.26 MB/s (10 * 100.00 MB all-zero files: 3.12s)
U-Z-BIG        1608.99 MB/s (10 * 100.00 MB all-zero files: 0.62s)
D-Z-BIG        2893.47 MB/s (10 * 100.00 MB all-zero files: 0.35s)
C-R-BIG         115.99 MB/s (10 * 100.00 MB random files: 8.62s)
R-R-BIG         184.58 MB/s (10 * 100.00 MB random files: 5.42s)
U-R-BIG        1902.28 MB/s (10 * 100.00 MB random files: 0.53s)
D-R-BIG         365.61 MB/s (10 * 100.00 MB random files: 2.74s)
C-Z-MEDIUM      150.24 MB/s (1000 * 1.00 MB all-zero files: 6.66s)
R-Z-MEDIUM      380.77 MB/s (1000 * 1.00 MB all-zero files: 2.63s)
U-Z-MEDIUM     4389.63 MB/s (1000 * 1.00 MB all-zero files: 0.23s)
D-Z-MEDIUM     9237.31 MB/s (1000 * 1.00 MB all-zero files: 0.11s)
C-R-MEDIUM      108.29 MB/s (1000 * 1.00 MB random files: 9.23s)
R-R-MEDIUM      153.02 MB/s (1000 * 1.00 MB random files: 6.53s)
U-R-MEDIUM     4613.61 MB/s (1000 * 1.00 MB random files: 0.22s)
D-R-MEDIUM      288.00 MB/s (1000 * 1.00 MB random files: 3.47s)
C-Z-SMALL        33.25 MB/s (10000 * 10.00 kB all-zero files: 3.01s)
R-Z-SMALL       153.04 MB/s (10000 * 10.00 kB all-zero files: 0.65s)
U-Z-SMALL        84.67 MB/s (10000 * 10.00 kB all-zero files: 1.18s)
D-Z-SMALL       515.81 MB/s (10000 * 10.00 kB all-zero files: 0.19s)
C-R-SMALL        23.46 MB/s (10000 * 10.00 kB random files: 4.26s)
R-R-SMALL       144.29 MB/s (10000 * 10.00 kB random files: 0.69s)
U-R-SMALL        59.00 MB/s (10000 * 10.00 kB random files: 1.69s)
D-R-SMALL       142.62 MB/s (10000 * 10.00 kB random files: 0.70s)

This benchmark was done on the local SSD, so there is no bottleneck from a USB/network interface or HDD. The C-R-BIG result of ~115 MB/s seems like the best-case scenario for real data. The all-zero files get completely deduplicated, so they are not meaningful here. For small random files it’s much slower, at only ~20 MB/s.

So it would appear that USB/HDD won’t be much of a bottleneck after all.

Real data benchmarks

I also tested compression ratio and throughput with a script on my own data. The results vary a lot depending on the data. Again, everything was on the local SSD.
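For reference, the ratio can be pulled straight out of borg’s own statistics, along these lines (just a sketch, assuming borg ≥ 1.1 and jq; the repo and archive names are placeholders):

```bash
# Sketch: compression ratio from borg's per-archive statistics.
borg info --json "$repo::test" \
  | jq -r '.archives[0].stats | "ratio: \(.original_size / .compressed_size)"'
```

The MB/s rate is then the original size divided by the wall-clock time of the `borg create` run.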

Backing up a ~35 MB directory of mostly documents (openoffice, pdf):

algorithm   compression ratio   rate [MB/s]
none        0.99                80.60
lz4         1.13                68.96
zstd,1      1.16                62.90
zstd,3      1.19                56.88
zstd,6      1.20                39.99
zstd,9      1.20                37.24
lzma        1.21                 4.76

Backing up /etc:

algorithm   compression ratio   rate [MB/s]
none        0.99                28.17
lz4         2.72                10.21
zstd,1      4.09                 6.54
zstd,3      4.31                 6.09
zstd,6      4.49                 5.10
zstd,9      4.67                 4.02
lzma        5.41                 0.50

lzma really doesn’t seem worth it: it gets only a bit more compression than zstd but is roughly 10× slower. Maybe a low-level zstd is more favorable than lz4, or maybe not; the drop in rate is usually slightly more than the gain from increased compression. For the documents set, for example, going from lz4 to zstd,3 gains about 5% in ratio (1.13 → 1.19) but costs about 18% in rate (68.96 → 56.88 MB/s). In the end, it doesn’t seem particularly compelling either way.

All this time I thought “slow” HDDs would be the bottleneck. Looks like the bottleneck is borg! I wonder how much it could improve with multi-threading/parallelization.

Other factors which also have an impact

  • The rates do drop further when backing up to a slower/older external HDD over USB3. I didn’t test my new backup drive yet as badblocks is still working on it.
  • Encrypted (repokey-blake2) repositories drop the rate slightly as well. It’s not a lot, and it’s also not too realistic to use no encryption, so I don’t worry about this much.
  • Slower computers also show slower rates. The above benchmarks were done on a Ryzen 7 2700X. On an i5-5300U the real data benchmark rates were ~75% of the above results.

Tentative conclusion

I’m tempted to go with zstd, just out of curiosity for a new fancy algorithm. Maybe level 3, which happens to be borg’s default setting for zstd. The Density compression algorithm mentioned above is also interesting, but I definitely don’t want to complicate my life with borg. The closer to defaults, the better.
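So the eventual command line would presumably look something like this (the repo and source paths are placeholders; the second variant uses the `auto` heuristic from `borg help compression` to skip incompressible chunks):

```bash
# zstd level 3 is borg's default for zstd; paths and archive name are placeholders.
borg create --compression zstd,3 --stats /mnt/backup/repo::'{hostname}-{now}' ~/data

# Or probe each chunk with lz4 first and only compress the compressible ones:
borg create --compression auto,zstd,3 --stats /mnt/backup/repo::'{hostname}-{now}' ~/data
```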

Out of curiosity, how do these numbers compare to making a tar archive of the folders and to making a copy using cp? And if you compress that tar archive, does the compression ratio improve?

For the directory with documents:

`/usr/bin/time tar -czvf` resulted in a 1.165 compression ratio and 34.2 MB/s (output size divided by elapsed time).

`/usr/bin/time tar -cjvf` gives a 1.211 compression ratio and 8.37 MB/s.

For /etc:

`/usr/bin/time tar -czvf` resulted in a 5.378 compression ratio and 8.96 MB/s.

`/usr/bin/time tar -cjvf` gives a 7.347 compression ratio and 2.41 MB/s.

So the compression ratios seem to be in the right ballpark.

Nothing was copied though. It was all on the local SSD as in the benchmarks above. Did you want me to try something else?

I was actually more curious about uncompressed data rates; tar should be one of the fastest ways to transform a folder structure into a file stream, right? No deduplication overhead or anything, just reading a file and writing it again with a tiny amount of metadata, in a loop.
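Something like this is roughly what I mean (just a sketch; the directory path is a placeholder, and streaming through wc forces the data to actually be read while also giving the byte count):

```bash
#!/usr/bin/env bash
# Sketch: raw tar streaming throughput for a directory.
set -euo pipefail

dir="${1:-$HOME/Documents}"
start=$(date +%s.%N)
bytes=$(tar -cf - "$dir" | wc -c)
end=$(date +%s.%N)
echo "scale=1; $bytes / ($end - $start) / 1000000" | bc   # MB/s
```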

I tried switching back to Linux to try it myself, and `tar -cf test.tar ~/` was only 77.3 MB/s, which is very similar to your documents backup without compression. That is very disappointing considering it is a Samsung 960 Pro 512GB, which should be very fast. `dd`-ing a 6 GB video file did give a more reasonable 1.1 GB/s, so my SSD isn’t broken at least…

This just makes me more curious about what is causing the bottleneck. Is it the file system being slow? Is it non-sequential reads due to files being located randomly on the SSD? I don’t think it is borg.

Oh, yes, I also tried uncompressed tar to the local SSD. Did you try a few times? The rate varies a lot between trials. I’m not sure whether having `badblocks` constantly reading >100 MB/s from an HDD on another SATA port could have some effect.

`tar -cf test.tar` on a 2.4 GB directory gave readings between 170 MB/s and 530 MB/s, the latter of which is the rated speed of my Micron 1100 2TB SATA SSD.

With the small 35 MB documents directory I used above for the benchmarks, it finishes so quickly that the average comes out to 1.8 GB/s, again with fluctuations of roughly 2×.

I would think tar should be faster since it doesn’t have borg's other tasks like hashing, chunking, deduplicating and whatever else borg does.

I tried it with 50 GB of random folders so I only tried once.

Test of tarring ~/.cache, containing ~79k files totaling 4.3 GB:

27.84 s
14.05 s
14.16 s
15.01 s
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
28.28 s
14.17 s
13.48 s
... drop_caches
28.27 s

Fairly consistent at 28 seconds with cold caches, which works out to a higher ~157 MB/s, and 14 seconds when the data comes from the page cache and the SSD isn’t really touched. The SSD is specified at 3500 MB/s read and 2100 MB/s write, so I would expect much better even under subpar conditions.

I will give this a closer look later this week I guess.