Storage setup for a small workstation: bcache, ZFS, or LVM-cache in 2022?

Hey all,
I need advice!
Current setup:

  • 12TB HDD + 500GB SSD bcache + SSD for OS (Ubuntu Server 18.04 LTS)

Problem:

  • Hard to upgrade the storage; adding or replacing disks is not plug’n’play.
  • Backup via cronjob using rsync to a NAS; not truly incremental (rsync just replaces files if newer).

Goal/Requirements:

  • MS Storage Spaces-style setup
  • No need for disks to have the same size
  • SSD read cache → we mostly write once but read often.
  • Easy snapshotting via cron.
  • Volume/directory-based parity selection (a plus, not a hard requirement!)

The workstation is mainly used for ML/DL training and is equipped with 4 GPUs. Storage is mostly local because the building doesn’t have 10G Ethernet. I am upgrading to 22.04 and thinking about improving the storage setup for the future.

Question:
What is the state-of-the-art local storage approach on Linux when you want JBOD with an SSD read cache?

Regards
n0s3!

I’m only aware of two general techniques that can handle this: BTRFS, or mergerfs + snapraid (I’ve never used mergerfs + snapraid myself).

For BTRFS, the mirror mode (called RAID 1) really does create two copies of each block on two disks; the trick is that if your RAID 1 has five disks, it balances these copies across all of them. However, it fills the disks with the most free space first, and it can’t use the two copies for faster reads, so you get the redundancy of the copies without the performance improvement. (If you have a BTRFS “RAID 1” with two 8TB drives and two 4TB drives, your 4TB drives are practically doing nothing until you’ve written 4TB to the array. A real waste of performance.)
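
For reference, creating and growing such a pool is only a few commands. A minimal sketch, assuming four data drives and a mount point at /mnt/pool (all device names and paths are placeholders, adjust to your system):

    # Mixed-size BTRFS "RAID 1": mirror both data (-d) and metadata (-m)
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mount /dev/sdb /mnt/pool

    # Growing later is close to plug'n'play: add the disk, then rebalance
    btrfs device add /dev/sdf /mnt/pool
    btrfs balance start /mnt/pool

    # Read-only snapshots are cheap and cron-friendly
    # (the .snapshots directory must already exist)
    btrfs subvolume snapshot -r /mnt/pool /mnt/pool/.snapshots/$(date +%F)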

For mergerfs + snapraid you basically have N independent disks, and each file is stored whole on a single disk. A cronjob then uses snapraid to compute parity onto another disk, which can be used to (manually) recover the contents of a lost disk. You may get better aggregate performance if your files are spread across multiple disks, but it’s really best suited for rarely read data like Linux ISOs. It is, however, by far the most flexible option if you want per-volume/per-directory parity.
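
To make that concrete, here’s a rough sketch with made-up disk names and paths; the mergerfs line goes in /etc/fstab, the rest in /etc/snapraid.conf and the crontab:

    # /etc/fstab: pool /mnt/disk1..3 into one tree; create new files
    # on the branch with the most free space (category.create=mfs)
    /mnt/disk* /mnt/storage fuse.mergerfs allow_other,category.create=mfs 0 0

    # /etc/snapraid.conf: one parity disk protects all data disks
    parity  /mnt/parity1/snapraid.parity
    content /var/snapraid.content
    content /mnt/disk1/.snapraid.content
    data d1 /mnt/disk1/
    data d2 /mnt/disk2/
    data d3 /mnt/disk3/

    # crontab: recompute parity nightly, scrub 10% of the array weekly
    0 3 * * *   snapraid sync
    0 5 * * 0   snapraid scrub -p 10

Note the parity is only as fresh as the last sync: anything written since then is unprotected until the next cron run.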

Instead of the two above, I’d be tempted to make a ZFS pool out of multiple mirror pairs. The disks within a mirror need to match in size (a mirror only gives you the capacity of its smallest disk), but each pair can be a different size. You get the redundancy and read speed of mirrors, plus built-in SSD read caching if you choose (L2ARC). ZFS has some limitations in terms of your requirements, but the performance and stability are top notch.
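
A minimal sketch of such a pool, with placeholder pool/device names and made-up snapshot dates:

    # Two mirror pairs striped together; the pairs may differ in size
    zpool create tank \
        mirror /dev/sdb /dev/sdc \
        mirror /dev/sdd /dev/sde
    zfs create tank/data

    # SSD read cache (L2ARC) and later growth are each one command
    zpool add tank cache /dev/nvme0n1p4
    zpool add tank mirror /dev/sdf /dev/sdg

    # Snapshots from cron, plus real incremental send/recv to the NAS
    zfs snapshot tank/data@$(date +%F)
    zfs send -i tank/data@2022-05-01 tank/data@2022-05-02 | ssh nas zfs recv backup/data

That last line would also fix your rsync complaint: an incremental zfs send only transfers the blocks that changed between the two snapshots.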

If you want it to be plug and play, maybe look into TrueNAS SCALE? I haven’t used it myself, but AFAIK it has a nice ZFS GUI that makes all of this work nicely. You can run your main workload in a container or VM.