Here’s something that has managed to escape my attention since June 2020: as of openzfs/zfs pull request #10315 on GitHub (“Add a binning histogram of blocks to zdb” by sailnfool; see also #9158), zdb gives you a histogram of block sizes when you use -bb or greater (-bbb, -bbbb, etc.), so you can quickly determine what size you can get away with setting the special vdev small block redirection (the special_small_blocks property) to.
Normally the output is in “human readable” numbers, like “64K”, but if you use -P the output uses full numbers, like “65536”. Using -L supposedly cuts down on the time needed to generate the data, because it disables leak detection, so zdb isn’t trying to walk through and verify that everything is consistent.
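For reference, the sort of invocation I mean looks like this; “tank” is just a placeholder pool name, and zdb can take quite a while on a big pool since it traverses every block:

```
# Block statistics with the size histogram (-bbb), skipping leak
# detection and space map loading (-L), printing exact byte counts (-P).
# "tank" is a placeholder pool name.
zdb -bbb -L -P tank
```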
My main pool’s output looks like this:
```
block   psize                  lsize                  asize
 size   Count  Size   Cum.     Count  Size   Cum.     Count  Size   Cum.
  512:   833K  416M   416M      833K  416M   416M         0     0      0
   1K:  1.04M 1.22G  1.62G     1.04M 1.22G  1.62G         0     0      0
   2K:   889K 2.31G  3.94G      889K 2.31G  3.94G         0     0      0
   4K:  2.63M 11.6G  15.6G      963K 5.31G  9.25G         0     0      0
   8K:  2.01M 21.8G  37.4G     1.61M 18.6G  27.8G     4.83M 50.6G  50.6G
  16K:  1.44M 30.5G  67.9G     2.15M 40.6G  68.4G     2.77M 65.0G   116G
  32K:  1.46M 65.3G   133G      802K 36.3G   105G     1.49M 67.1G   183G
  64K:  3.08M  310G   443G     1.01M 93.3G   198G     1.64M  151G   334G
 128K:   401M 50.2T  50.6T      405M 50.7T  50.9T      403M 75.6T  75.9T
 256K:   602K  214G  50.8T      798K  287G  51.1T      734K  259G  76.1T
 512K:   451K  320G  51.2T      585K  411G  51.5T      542K  385G  76.5T
   1M:   331K  485G  51.6T      264K  371G  51.9T      385K  540G  77.1T
   2M:   373K 1.03T  52.7T      153K  432G  52.3T      360K 1.03T  78.1T
   4M:  7.14M 28.6T  81.2T     7.58M 30.3T  82.6T     7.35M 43.9T   122T
   8M:      0     0  81.2T         0     0  82.6T         0     0   122T
  16M:      0     0  81.2T         0     0  82.6T         0     0   122T
```
I believe the asize Cum. ( ͡° ͜ʖ ͡°) column is what we are concerned with here. A few observations:
- LSIZE is logical size, which is the original block size before compression or any funny business.
- PSIZE is physical size, which is how much space the block itself consumes on disk (after compression).
- ASIZE is allocation size: the total physical space consumed to write out the block, plus indexing overhead. It includes padding and parity, and obeys the vdev ashift as the lower limit and the dataset recordsize as the upper limit.
- The asize columns for the 512, 1K, 2K, and 4K rows are all zero. 512, 1K, and 2K make sense, because my ashift for everything in the pool is 12. Because the 4K row is zero too, I guess the rows mean “counts for block sizes below this number”, but I’m not sure yet. It could be some sort of padding weirdness.
- Pending a better understanding of wtf these numbers are doing exactly, at the very least I can easily redirect blocks up to 32K (or 64K?), because even the 64K cumulative asize is only 334G and my special vdevs are 1TB enterprise Intel SATA SSDs. Hopefully I’m mistaken and I can set it to 64K (there’s a sketch of the property change after this list). Either way this is awesome; I didn’t realize I could get so much of the worst-performing blocks off my HDDs.
- When blocks get above 64K (128K?), the block count shoots to the moon. This is what makes me think the 4K row is weird or I’m reading it wrong, because what seems to be happening is that the vast majority of my blocks are at the default 128K recordsize. That means I need to refactor my pool’s recordsizes, because the majority of the data is archived video that would benefit from larger block sizes (see the recordsize sketch after this list).
- At the end the cumulative asize jumps again, from 78.1T to 122T at the 4M row, so I guess it’s just tacking on the rest of the free space? This is weird.
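For completeness, here’s roughly what the small block redirection would look like if I do land on 64K. The dataset name is a placeholder, and the property only affects blocks written after it’s set, so existing data stays on the HDDs until it’s rewritten:

```
# Send blocks of 64K and smaller to the special vdev(s).
# "tank/stuff" is a placeholder dataset name.
zfs set special_small_blocks=64K tank/stuff

# Sanity-check the value.
zfs get special_small_blocks tank/stuff
```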
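And the recordsize refactor for the archived video would be something like this. Again the dataset name is a placeholder and 1M is just an example value; recordsize also only applies to newly written files, so the existing video would have to be copied or send/received to pick it up:

```
# Larger records for big sequential video files.
# "tank/video" is a placeholder dataset; 1M is an example value.
zfs set recordsize=1M tank/video
zfs get recordsize tank/video
```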
As always, I’m probably wrong on some of this so feel free to correct me, and take what I say with a grain of salt. This is just my “current working knowledge”.