This is a thread for posting and discussing ZFS updates and newly added features for those interested, rather than making a new thread for each one.
Updates:
Compatibility and bug fixes for this one.
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.8
Includes a `$PATH` fix for deb-utils (#14339).
Well, that was fast: the last release had an issue with encryption causing errors to be reported when there weren't any. This release reverts that particular patch.
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.9
Waiting for Ubuntu Lunar, I will get zfs 2.1.9 from there.
The internet says there is no ZFS option anymore in the recent Lunar Lobster beta ISO installer. Seems like Canonical is phasing out ZFS support.
So much for ZFS-on-root easy mode. Ditching Flatpak and ZFS at the same time would make me reconsider Ubuntu on the two machines I'm running it on.
And still no 6.2 kernel support for ZFS. That really fucks up basically all leading-edge distros; my Tumbleweed has been on 6.2.x for what feels like ages now.
Can't wait for 2.1.10 or even 2.2.
New features sound nice and all, like progress on RAIDZ for special vdevs or shared L2ARC. But RAIDZ for storing metadata and small blocks just screams "write amplification" - the same problem RAIDZ already has with small recordsize/volblocksize data.
The most interesting thing for me is an ARC change that separates the data and metadata MFU and MRU lists, which really optimizes the eviction policy for metadata and comes with a tunable parameter. This will help many pools keep their metadata in memory, especially pools with little memory or above-average amounts of metadata.
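For anyone who wants to peek at this on their own box, something along these lines should work on Linux (the `zfs_arc_meta_balance` name is what I understand the new tunable to be called, so treat it as an assumption and check the module parameters documentation for your release):

```
# Current ARC split between data and metadata
grep -E '^(data_size|metadata_size)' /proc/spl/kstat/zfs/arcstats

# New balance tunable (name assumed: zfs_arc_meta_balance); larger values
# make the ARC favor keeping metadata over data during eviction
cat /sys/module/zfs/parameters/zfs_arc_meta_balance
echo 1000 | sudo tee /sys/module/zfs/parameters/zfs_arc_meta_balance
```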
Note that while ZFS v2.1.10 has been released, you should hold off on updating as there appears to be a data corruption bug that may be caused by an intermittent race condition, and thus wasn't caught: Data corruption with 519851122b1703b8 ("ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()") · Issue #14753 · openzfs/zfs · GitHub
So you'll need to revert the specific commit or wait for v2.1.11 for official kernel 6.2 compatibility.
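If you're building from source and really want 2.1.10 with kernel 6.2 now, reverting that commit before building is one way to do it (just a sketch, using the short hash from the issue title):

```
# Build 2.1.10 with the suspect commit reverted (hash taken from issue #14753)
git clone https://github.com/openzfs/zfs.git
cd zfs
git checkout zfs-2.1.10
git revert 519851122b1703b8
# then build/package as usual for your distro, e.g.:
sh autogen.sh && ./configure && make -j"$(nproc)"
```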
This is a very interesting update, lots of new stuff including block cloning.
As always, I'd strongly suggest waiting a few months for a minor point release (e.g. "2.2.1") to come out before upgrading, unless you have something that can take the risk of testing. There are often some unpleasant bugs that get found once a major release is made.
* Block cloning - `/bin/cp` on Linux will try to create clones automatically.
* Linux container support - `renameat(2)`, support for overlayfs, idmapped mounts in a user namespace, and namespace delegation support for containers.
* Scrub error log - `zpool status` will report all filesystems, snapshots, and clones affected by a shared corrupt block. `zpool scrub -e` can be used to scrub only the known damaged blocks in the error log to perform a fast, targeted repair when possible.
* Corrective `zfs receive`, which can be used to heal corrupted data in filesystems, snapshots, and clones when a replica of the data already exists in the form of a backup send stream.
* Documentation - OpenZFS documentation for Linux and FreeBSD.
* Change log - Complete v2.1.0 - v2.2.0 change log. Thanks to 202 contributors!
* Module options - The default values for the module options were selected to yield good performance for the majority of workloads and configurations. They should not need to be tuned for most systems but are available for performance analysis and tuning. See the module parameters documentation for the complete list of the options and what they control.
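A couple of these map straight onto commands; here's a rough sketch (pool and file names are placeholders, and block cloning also needs the `block_cloning` pool feature enabled, so double-check against the docs linked above):

```
# Check whether the pool has the block cloning feature (pool name is a placeholder)
zpool get feature@block_cloning tank
zpool upgrade tank                   # enables newly supported features on an existing pool

# With block cloning active, an ordinary copy can become a clone instead of duplicating blocks
cp --reflink=auto bigfile bigfile.copy

# Scrub only the blocks recorded in the pool's error log, not the whole pool
zpool status -v tank                 # lists datasets/snapshots hit by known bad blocks
zpool scrub -e tank
```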
Offtopic: It took me way too long to figure out how to align the expand and collapse formatting to the prior bullet points.
# Header
* Bullet1 - These have **no** spaces in front of the *
* Bullet2
* Bullet3
<details open>
<summary>Dropdown1 which loads expanded</summary>

  * Item1 - This first item must have a blank line above it, and **two** spaces before the *
  * Item2
  * Item3
  * Item4
  * Item5
</details>
<details>
<summary>Dropdown2 which loads collapsed</summary>

  * Item1
  * Item2
  * Item3
  * Item4
  * Item5
</details>
This results in:
Bullet1 - These have no spaces in front of the *
Bullet2
Bullet3
Is this a revamp? I remember this feature from shortly after some 2.0 release, and I'm pretty sure I used the -c flag in the past. Or is this something new?
Scrubbing 200T+ isn't trivial, so I'm sure many pools will benefit from this. It's situational, but if you face that situation, you can save yourself days.
I'm not sure what differences and possibilities it will bring. Optimizing and tweaking the heterogeneous vdev arrangements we often see in homelab scenarios? Not sure.
This is really game-changing, and I've seen the devs talking about it at the OpenZFS leadership meetings. Gone are the days when your ARC evicts metadata seemingly at random and in large chunks. You could get this under control with tuning parameters, but having this adaptability out of the box is great for us homelabbers and basically removes the need for most special metadata vdevs, because the ARC will now take metadata just as seriously as data, with counters and weights in the algorithm.
You want cached metadata? You can rely on your ARC to do this out of the box. Amazing work from Alexander Motin and others.
LZ4's early abort has been the holy grail of the last decade. ZSTD is great, but wasted so much CPU on barely compressible data.
I'm probably changing all my remaining LZ4 datasets to ZSTD; I only kept LZ4 because of its early abort on barely compressible data.
Great news for the average compressratio of basically any pool out there.
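Switching is just a per-dataset property change; keep in mind only newly written blocks use the new algorithm, so existing data has to be rewritten (or sent/received) to benefit. Dataset names below are placeholders:

```
# Move a dataset from lz4 to zstd (plain "zstd" uses the default level 3)
zfs set compression=zstd tank/media
# Or pin a specific level
zfs set compression=zstd-9 tank/backups

# Old blocks keep their old compression; watch the ratio evolve as data is rewritten
zfs get compression,compressratio tank/media
```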
`zfs send` has always had a `-c` for compression, but `zfs receive` didn't?
Yeah, I probably confused recv with send. The recv `-c` stands for corrective instead of compression.
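For anyone else who mixes the two up, roughly (a sketch; names are placeholders, and the corrective-receive invocation is my reading of the 2.2 feature, so check `man zfs-receive` for your version):

```
# zfs send -c: send blocks as they are compressed on disk, saving CPU and bandwidth
zfs send -c tank/data@snap1 | ssh backup zfs receive backup/data

# corrective receive (2.2+): feed a known-good stream of the same snapshot back in
# to heal corrupted blocks in place, instead of destroying and restoring
zfs receive -c tank/data@snap1 < /backups/data_snap1.stream
```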
Here's some interesting and amusing info on how the early abort feature works: edit: I can't tell if you edited this since I read it, or I just skipped the las... | Hacker News
Posted by rincebrain
To add something amusing but constructive - it turns out that one pass of Brotli is even better than using LZ4+zstd-1 for predicting this, but integrating a compression algorithm just as an early pass filter seemed like overkill. Heh.
It's actually even funnier than that.
It tries LZ4, and if that fails, then it tries zstd-1, and if that fails, then it doesnāt try your higher level. If either of the first two succeed, it just jumps to the thing you actually requested.
Because it turns out, just using LZ4 is insanely faster than using zstd-1 is insanely faster than any of the higher levels, and you burn a lot of CPU time on incompressible data if you're wrong; using zstd-1 as a second pass helps you avoid the false positive rate from LZ4 having different compression characteristics sometimes.
Source: I wrote the feature.
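To make that ordering concrete, here's a rough userspace analogy of the heuristic using the `lz4` and `zstd` CLI tools (purely illustrative; the in-kernel code obviously doesn't work like this, and the ~12.5% savings threshold is made up):

```
#!/usr/bin/env bash
# Rough analogy of zstd early abort: two cheap passes decide whether the
# expensive requested level is worth running at all.
f="$1"
orig=$(stat -c%s "$f")
worth_it() { [ "$1" -lt $(( orig * 7 / 8 )) ]; }    # "compressible" = saved at least ~12.5%

if worth_it "$(lz4 -1 -c "$f" | wc -c)"; then
    zstd -19 -c "$f" > "$f.zst"            # LZ4 says compressible: pay for the high level
elif worth_it "$(zstd -1 -c "$f" | wc -c)"; then
    zstd -19 -c "$f" > "$f.zst"            # LZ4 was fooled, but zstd-1 disagrees
else
    echo "incompressible, storing $f as-is"   # both cheap passes failed: skip the expensive level
fi
```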
I'm not enough of a maths nerd to dig deeper into those algorithms, but I remember seeing the LZ4 early-abort mechanic as a helper algorithm to boost ZSTD performance. If it gets the job done, so be it… LZ4's speed is insane.
Can't wait for QAT (which is also mentioned in the linked thread) to support LZ4 and ZSTD. Or other standardized hardware offloading, be it CPU, GPU or whatever. If a freaking 6-core Ryzen (or its attached iGPU) could offload ZSTD compression like QAT can, that's your perfect storage server CPU right there: compressing 100 Gbit/s without the need to boost any of the cores.
Those lazy coders… writing code and developing new features just to avoid copy&pasting some URL from your browser into some forum.
ZFS Interface for Accelerators (ZIA) is an upcoming feature that brings hardware offloading to a lot of the stack. So if you want QAT or DPUs to turbo-charge your CPU-intensive ZFS workload, that's it.
Compression, checksumming, and RAIDZ are the most prominent beneficiaries.
Developed for OpenZFS by Los Alamos National Laboratory… they have 3.2 TB/s of InfiniBand and 100 PB of storage according to the slides. So yeah, you need some offloading if you don't want to buy and run thousands of CPU cores just for that. With 100 PB, every 0.01 of compressratio equals a disk shelf worth of drives (0.01 of 100 PB is about 1 PB).
I thought RAIDZ expansion made it into this update - am I wrong?
It's only in testing, not in the master repository, as far as I'm aware. Getting closer, but not quite there yet.
Even when it does land, you'll want to wait a few more months for others to painfully run into the edge cases first.
Yeah… looks good. It's been quite the odyssey ever since it was sponsored by the FreeBSD Foundation. But you don't want to roll out this kind of thing unless you are 100% confident. We're talking about ZFS, after all.
Also keep in mind that RAIDZ expansion leaves a permanent footprint in memory (it uses RAM), which shrinks over time as you modify and delete stuff… a small price considering the next-level wizardry necessary to make this possible at all while still keeping all of RAIDZ's features, like fixing the RAID5 write hole.
I personally don't have a use case for this. I don't think making a wide RAIDZ vdev even wider is a good choice, because you eventually end up with a single 20-wide vdev instead of 3x 7-wide, and adding a whole new vdev gets more expensive the more you use RAIDZ expansion.
But for pools of 3-7 disks it's great, and that's probably where people will use it.
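For reference, the interface being worked on (as I understand it from the PR discussion; it's not in any release yet, so this is just a sketch with placeholder names) is attaching an extra disk to an existing raidz vdev:

```
# Proposed RAIDZ expansion usage: grow an existing raidz vdev by one disk
zpool attach tank raidz1-0 /dev/disk/by-id/ata-NEWDISK
zpool status tank    # shows the expansion/reflow progress
```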
I'm personally of the same inclination, as the feature seems to mostly be demanded by those who want to perform a risky operation but don't even have backups for the age-old "backup, check your goddamn backup, destroy, recreate, restore from backup" dance. I'm certainly not against the feature, but I'm hoping people don't footgun themselves.
What I actually look forward to regarding the feature is there being one less reason for people to complain pointlessly. Fortunately for those who want to derail online forum discussions, they'll always have "license issues" as the gift that keeps on giving.
In other FS news, in the upcoming 6.7 kernel:
Btrfs will supposedly have its raid5/6 write hole addressed: Btrfs updates for 6.7 with new raid-stripe-tree | Hacker News - but I don't know enough about what's involved to say if that's really true.
bcachefs is apparently getting merged as well.
I saw some news about it on Phoronix a couple of weeks/months ago. It looked convincing, but sacrifices performance as a compromise. I run BTRFS on 3 machines… no need for parity RAID there.
But that ASUS consumer NVMe NAS offers ext4 and BTRFS, so we're seeing BTRFS spread, and having parity RAID as an option is always good. Synology certainly lost their edge with SHR, and I bet we'll see many more BTRFS NAS products in the future. ZFS isn't for everyone, and that's totally fine.
I admire the dedication… but there is still a lot of work to do in terms of features before I would consider it. I'm a spoiled ZFS kid.
My personal focus is on Ceph right now, and I really want that Crimson OSD update (soon). It's the Ceph equivalent of ZFS Direct I/O, boosting NVMe capability by quite a bit. The CPU cycles spent on NVMe drives are insanity… 2-4 cores per NVMe drive on heavy writes. The metric used is IOPS/core, and that tells you a lot about where the bottlenecks are.