I’ll put headlines on this to make it more palatable; it’s clear we have some XY problems here.
on ZFS with Linux and packaging
I’ve used ZFS on FreeBSD, Ubuntu, Gentoo, Arch, Debian … I’ve been changing distros and keeping the filesystem. I’m currently on Debian testing (kind of like a rolling release Debian if you can imagine it) and Arch, both with LTS kernel (currently 5.4).
I haven’t really tried CentOS recently (old, strange, heavily patched versions of everything), or Ubuntu Server (started feeling too patched up, with everything micro-forked, and too opinionated for my taste in a distro).
I prefer the rolling release model, where I control when to upgrade what and how much effort to put in, rather than a distro that holds updates back when I do have the time and then demands a major upgrade when I don’t, all while withholding features from me - no thank you.
One can always download and build ZFS by hand to fit the kernel version they’re using. But DKMS is a system of hooks for your package manager that does that for you, i.e. it builds the spl and zfs kernel modules from sources (or partially built sources). The OpenZFS devs keep track of next and rc kernel releases, and in my experience they do a better job than most driver/hardware maintainers of out-of-tree code. It’s a big community of actual devs, so the releases usually track each other closely enough that updated OpenZFS sources are published on the same day as the kernel. However, they’re still separate releases, usually living in separate repos downstream, and if you’re using prepackaged/precompiled ZFS your package manager might barf for a day or so until the distro maintainers get their act together.
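For a rough idea of what DKMS does on your behalf when a kernel or zfs update lands (the module version and package names here are placeholders for whatever your distro ships):

    # which module versions DKMS knows about, and whether they're built for your kernels
    dkms status
    # build and install the zfs module for the currently running kernel
    # (the version string matches whatever zfs-dkms source tree sits under /usr/src)
    dkms install zfs/2.1.11 -k "$(uname -r)"
    # confirm the module actually loads
    modprobe zfs && modinfo zfs | head

Normally the package manager’s hooks run the equivalent of this automatically on every kernel or zfs-dkms upgrade; doing it by hand is mostly for when something went sideways.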
Both Debian and Arch (which are what I’m using with ZFS at home today) are hugely popular; for Arch there’s the archzfs package repo where you get binary ZFS packages (no need for DKMS). I use DKMS on Debian (no special repo; I don’t know if the OpenZFS folks are doing binary packages for Debian these days). I prefer not compiling my own software if the distro has a release, mostly out of principle - if distros/maintainers can’t figure out and deal with obscure configure flags and compiler errors for you, then what’s the point of using distro packages.
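On the Arch side, the archzfs setup is roughly a pacman repo entry plus importing their signing key - treat this as a sketch and check the archzfs docs for the current URL, key, and package names:

    # addition to /etc/pacman.conf (verify against the archzfs project docs)
    [archzfs]
    Server = https://archzfs.com/$repo/$arch

    # then, after importing their signing key:
    pacman -Sy
    pacman -S zfs-linux-lts    # prebuilt module matching the LTS kernel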
on disk layouts
You could store the OS separately from your data, or lean on a bootloader that reads your kernels straight off the pool, but this is what I have:
All my disks are partitioned using GPT to have:
- a 256M FAT32 EFI partition
- a separate 2GB /boot partition as ext4 for the kernel(s) and initramfs images.
- empty space until the 10G offset (felt like a good idea after that one time grub2 wouldn’t fit in my superblock all those years ago; 10G == 2 cents these days, so worth it).
- rest of the space as either ZFS or btrfs. I’m doing this strange thing where I have a specific dataset/subvolume for “root” in both cases, with just the OS stuff underneath it - no VMs, none of my own data. That allows me to snapshot the OS independently from most of the data on the system, and it makes changing distros easier - although that’s always been a pain. (A rough sketch of the commands is below the list.)
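Roughly, in commands - device, pool, and dataset names are examples, in reality the pool is a mirror/raidz across all six disks, and this is the kind of thing you run from a live environment:

    # one disk's partition table (partition 3 starts at the 10G mark)
    parted -s /dev/sda mklabel gpt
    parted -s /dev/sda mkpart esp fat32 1MiB 257MiB
    parted -s /dev/sda set 1 esp on
    parted -s /dev/sda mkpart boot ext4 257MiB 2305MiB
    parted -s /dev/sda mkpart zfs 10GiB 100%

    # a pool with a dedicated dataset for the OS root; data lives in other datasets
    zpool create -o ashift=12 -R /mnt tank /dev/sda3
    zfs create -o mountpoint=none tank/ROOT
    zfs create -o mountpoint=/    tank/ROOT/debian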
The first partitions (EFI and boot) are mdraid mirrored across all drives (yes, I have 6 copies of busybox and the kernels on the Debian box, oh well). That means the motherboard can pick any drive at startup. The motherboard starts the bootloader, which loads and starts the kernel and its ZFS-containing initramfs from ext4, which then mounts ZFS (or btrfs, same setup) and pivots root to continue the rest of the boot. GRUB is supposed to have ZFS support - in theory you can drop it into your EFI partition and read the kernel/initramfs from ZFS. I just don’t want to rely on it being able to boot from a broken pool, and I do want to be able to boot from a broken pool somehow.
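The mirroring of those small partitions is plain mdadm; a two-drive sketch (same idea with six). Metadata format 1.0 keeps the RAID superblock at the end of the partition, so the firmware still sees a normal FAT32 ESP:

    mdadm --create /dev/md0 --level=1 --metadata=1.0 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md1 --level=1 --metadata=1.0 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkfs.fat -F32 /dev/md0    # EFI
    mkfs.ext4    /dev/md1     # /boot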
on VM stuff
KVM doesn’t care about block devices; the mechanism for that is called virtio (the one that makes sense using, anyway). Your VM’s OS will see a PCI device with a descriptor telling it to use the virtio block driver. That driver issues reads over virtio to the host kernel’s virtio stack, where a driver/kernel module can intercept them for performance and redirect the read either to a block device on the host (e.g. a zvol), to some region of some file, or to a userspace process (e.g. for qcow2).
The host sees the underlying block device/zvol or a qcow2 file, not the individual files inside the guest.
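A minimal sketch of handing a zvol to a guest as a virtio block device - pool, volume, and ISO names are made up, and libvirt wraps the same thing in XML:

    # carve out a 20G zvol and boot an installer against it
    zfs create -V 20G tank/web1
    qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=/dev/zvol/tank/web1,if=virtio,format=raw,cache=none \
      -cdrom debian.iso -boot d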
Traditionally, if you wanted more than one machine to use the same files, you’d use some kind of network filesystem between the guests and the host.
If your guest is Linux, there’s a virtio-fs filesystem driver that’s supposed to work well, but I haven’t been able to get it working well yet (I’m trying to get it running with user namespaces and with shell script only - without writing Python or C or reaching carefully into syscalls). There’s also virtio-9p, which is much older but poorly maintained (e.g. ftruncate doesn’t exist in it, except for that mailing list patch from a few years ago; basically it’s trash, not maintained all that well). I’ve booted Alpine Linux over it, but it’s worrisome.
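For reference, the mount side of both options looks roughly like this - the tags, paths, and exact host-side flags are assumptions to check against your qemu/virtiofsd versions:

    # 9p share - host side is a single option appended to the qemu invocation:
    #   -virtfs local,path=/srv/share,mount_tag=host0,security_model=mapped-xattr,id=host0
    # guest side:
    mount -t 9p -o trans=virtio,version=9p2000.L host0 /mnt/share

    # virtio-fs - guest side, assuming the host runs virtiofsd and the
    # vhost-user-fs device was given the tag "myfs":
    mount -t virtiofs myfs /mnt/share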
You can expose old snapshots of a zvol to your home.php developer somehow (attach an additional read-only block device to their VM and have them mount it). Or, if a developer is storing things on NFS served by the host, you can give them access to snapshots in a subdirectory of their NFS share, over that same mount, or make another share with just the snapshots.
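The block-device route could look something like this (pool, volume, and libvirt domain names are made up):

    # snapshot the zvol, clone the snapshot read-only, attach the clone to the dev's VM
    zfs snapshot tank/web1@before-oops
    zfs clone -o readonly=on tank/web1@before-oops tank/web1-restore
    virsh attach-disk web1-vm /dev/zvol/tank/web1-restore vdb --mode readonly
    # the developer mounts /dev/vdb inside the guest and copies home.php back, then:
    virsh detach-disk web1-vm vdb
    zfs destroy tank/web1-restore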
In general, you don’t want to mount guest block devices on the host. It’s possible, but it’s like taking a hard drive from a machine you’ve decided not to trust and mounting it on a machine you care about - technically it’ll work in a pinch, but it gives me pause.
on the “oh-noes I deleted home.php :o” problem
This accidentally-deleting-home.php scenario gave me pause as well. I’m not entirely sure how the developer team you’re working with is used to doing things, but “I’ll give people VMs, have them admin them as root, write random files to random places, and be responsible for admin-ing a Linux system” is a fairly old way of thinking about deploying web apps - it’s not really suitable for a group setting where multiple people work on a thing together.
Most web developers I know these days would already have some kind of version control and would send each other pull requests, plus some kind of build system (read: Makefile) that compiles their JavaScript and CSS and also starts a local Docker container on their workstation, with a bunch of mounts into the developer’s home dir for experimentation and quick iteration on the code.
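The local-iteration part is usually just a bind mount into a container, something like this (image, paths, and port are made-up examples):

    # run the app container against the developer's working copy
    docker run --rm -it \
      -v "$HOME/src/website:/var/www/html" \
      -p 8080:80 \
      php:apache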
Once the change is checked into a central git repo, automation (read: post-commit hook, but sometimes it’d be a person) would build and push new Docker images. These could then be deployed to k8s, or docker pull-ed by anyone in the group who’s interested. Ideally, there’d be some automated testing and labeling of things in between, and some kind of gradual deployment, with instrumentation/metrics observed during the window when there are multiple versions running at the same time… but sadly this doesn’t always happen.
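The “automation” step doesn’t have to be fancy; a post-commit hook or CI job amounts to something like this (registry and repo path are placeholders):

    # build an image, tag it with the commit, push it
    set -eu
    sha="$(git rev-parse --short HEAD)"
    docker build -t "registry.example.com/web/site:${sha}" .
    docker push "registry.example.com/web/site:${sha}"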
In other words, there’d be no (w)holistic backups of machines (except, obviously, when this is offered as a service to others in the cloud, because they think they need it for whatever reason and want to pay for it, or because having the feature ticks a box in some consultant’s magic-quadrant analysis of offerings). It’s just not all that useful when you have an organization with multiple developers… not sure why.
Databases like MySQL or Redis or Cockroach, and filesystem-ish things like Ceph, Samba shares, and various S3-like storage services, would be where most of the persistent data lives. These would be managed in their own special ways, typically independent of the host’s kernel filesystem, and in fact independent of the hosts themselves. So machines go up and down, get rebooted or sometimes don’t come back, containers get restarted and restored from the most recent backups or builds all the time, and once in a while a new version of the software gets deployed. Usually GDPR compliance - ensuring data is really deleted and gone on demand, from all backups and logs - is the bigger problem.
Now, if you have <5 websites/VMs total, some “part-time, cowboy, and very set in their ways” developers maintaining them, and you’re rebuilding this infrastructure and their workflows over time, and you just need snapshot-ability as a transient mechanism for this oh-shit moment until they can wrap their minds around Git and using e.g. Makefiles and scripts to build/deploy code… then I’m sorry, but: great! Being able to snapshot a ZVOL is actually not a bad place to be. Inside their VMs they can have more ZFS if they want, or LVM, or rsync/rclone/duplicity, or whichever.
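On the host side, that transitional safety net can be as simple as periodic snapshots of each guest’s zvol, e.g. from cron (pool/volume names and the schedule are examples):

    zfs snapshot tank/web1@"$(date +%Y%m%d-%H%M)"
    zfs list -t snapshot tank/web1        # see what's there to clone or roll back to
    # rolling the whole volume back discards everything newer than the snapshot:
    zfs rollback tank/web1@20240101-0300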