More ipxe shenanigans.
I gave in and set up an http server for loading alpine’s modloop (I needed it so I could modprobe nfs). It turns out the modloop file contains the /lib/modules folder (now that I know that, the name makes a ton of sense).
I’ve installed darkhttpd (because I knew it was easy to configure: literally point it at a folder, give it the chroot flag and a user, and you’re off to the races), dead-slapped the modloop in there, and modified the alpine boot cmdline in the ipxe boot file to point the modloop at the http server. Alpine booted just fine and I could modprobe nfs.
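For the record, the whole http side boils down to one command (the folder, user and port here are made up for illustration):

```
# plug-and-play http server: serve /srv/netboot, chrooted, as an unprivileged user
darkhttpd /srv/netboot --chroot --uid nobody --gid nobody --port 80
```

After that, the ipxe kernel line just grows a modloop=http://10.0.0.10/boot/modloop-lts argument (address is a placeholder too).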
I haven’t yet configured it to properly save and load configs, but I have read a bit more into ipxe. Apparently there’s a variable for the dhcp hostname entry for a server (${net0.dhcp/hostname}), which I can use in the ipxe script to pass the hostname on the boot cmdline. That will make alpine see its proper hostname and read its own hostname.apkovl file, which fixes some worries I had (no more setting the hostname of a VM to its MAC address, thanks sweet Jesus, on God fr fr).
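A minimal sketch of what I mean (the url is a placeholder; ipxe expands the variable before the cmdline ever reaches the kernel):

```
# in the ipxe script: the dhcp hostname lands on alpine's boot cmdline
kernel http://10.0.0.10/boot/vmlinuz-lts hostname=${net0.dhcp/hostname} modloop=http://10.0.0.10/boot/modloop-lts
```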
I’m not sure how to tackle security on this. On one hand, it’d be very easy to make one nfs share for the whole subnet and let all alpine servers save their lbu backups there, but that would mean every server can modify the other servers’ saved data, which is not ideal. I’m not expecting anything sus in my lab, but I’m doing this for “research” (meaning others might want to pick up this config), so I can’t just make it insecure because I’m lazy.
That means I’d have to make an nfs share for each alpine VM, almost like root-on-nfs, but using the classic alpine diskless mode instead (root on ramdisk with config on nfs).
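On the nfs side, the per-VM split could be as simple as one export per VM, pinned to that VM’s IP (hostnames, paths and addresses invented for the example):

```
# /etc/exports on the nfs server: each VM can only touch its own lbu directory
/srv/lbu/web01   10.0.0.11(rw,sync,no_subtree_check)
/srv/lbu/vpn01   10.0.0.12(rw,sync,no_subtree_check)
```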
After reading a bit on the alpine wiki, it looks like an apkovl can be served via http (just like the modloop), so all I would technically need is a good ipxe script to load the full OS into RAM, completely diskless. Rebooting without an lbu commit will lose all unsaved data. If all you do with alpine is serve a static site, rebooting is fine. If your alpine VM is a VPN, rebooting is also fine, but you lose all logs (which might be a good thing in certain environments), unless you have an rsyslog server where the logs are sent.
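Putting it together, the fully diskless boot could look roughly like this (a sketch I haven’t tested yet; the repo/apkovl urls and the modules= list are assumptions on my part, the alpine wiki has the authoritative cmdline):

```
#!ipxe
dhcp net0
set srv http://10.0.0.10
kernel ${srv}/boot/vmlinuz-lts ip=dhcp modules=loop,squashfs hostname=${net0.dhcp/hostname} modloop=${srv}/boot/modloop-lts alpine_repo=${srv}/alpine/v3.19/main apkovl=${srv}/apkovl/${net0.dhcp/hostname}.apkovl.tar.gz
initrd ${srv}/boot/initramfs-lts
boot
```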
If you have a database, a reboot would screw you (well, you can’t lbu commit a running DB anyway, except maybe postgres and sqlite). And while running a DB from RAM would give you great performance (if the DB runs at all), I think you’d be better off mounting an nfs share for the DB and other volatile user data.
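That mount is one fstab line in the VM (server and paths made up; lbu commit it so it survives the reboot):

```
# /etc/fstab: keep the DB's data on nfs instead of the ramdisk
10.0.0.20:/srv/data/db01  /var/lib/postgresql  nfs  rw,hard,vers=4  0 0
```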
I can already think of a few uses like this. HA in particular would be really easy to implement. As long as your VM config file contains the MAC address for the interface, you don’t even need fencing, because the VM runs solely from RAM: if your host is taken offline, all its VMs get restarted on another host and run from memory with the same configuration and all, and if the host comes back online, its VMs should be put on hold / stopped, or have the services moved back onto that host (maybe live migrated?). Obviously, if your VMs have persistent data via nfs, then you’d need some fencing (like cutting the hypervisor off the nfs network at the switch port).
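For reference, pinning the MAC is just one line in the VM config; on proxmox it’s the net0 entry (the MAC here is an example):

```
# /etc/pve/qemu-server/101.conf (fragment): a fixed MAC means dhcp and ipxe
# hand this VM the same identity no matter which host it lands on
net0: virtio=52:54:00:12:34:56,bridge=vmbr0
```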
One thing that I’ve read on the alpine wiki which kinda stood out was that if you want to upgrade the kernel, the alpine VM performing the upgrade needs a minimum of 8GB of free RAM. That’s quite a lot, but it’s needed for the modloop creation. On the plus side, you don’t need all your VMs to have 8+ GB of RAM. With netbooting you can just start 1 VM, upgrade the kernel, generate the kernel image, initramfs and modloop files, then copy these to the tftp or http server that serves them to the rest of the VMs. Reboot your VMs (lbu commit first if necessary) and voilà, you’re running the latest kernel on 50+ VMs by upgrading the kernel inside just 1 VM.
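If I read the wiki right, the tool for this is update-kernel, so the workflow I have in mind goes roughly like this (untested sketch; the destination paths are my own assumptions):

```
# on the one beefy (8GB+) VM: build the new kernel, initramfs and modloop
update-kernel /tmp/boot

# push the artifacts to the server feeding the rest of the VMs
scp /tmp/boot/* netboot-server:/srv/netboot/boot/
```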
Given that ipxe supports http and that alpine can’t use tftp anyway, I think I’ll use http for the ipxe scripts and only deliver the ipxe.efi file via tftp. All that’s really left is to generate an ipxe.efi payload that has an embedded ipxe script that chainloads another script from the next-server’s http server.
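The embedded script itself is tiny (the chainloaded filename is a placeholder):

```
#!ipxe
# embedded script: grab an address, then chainload the real script over http
dhcp
chain http://${next-server}/boot.ipxe
```

If I remember right, it gets baked in at build time with something like make bin-x86_64-efi/ipxe.efi EMBED=chain.ipxe from the ipxe source tree.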
I’m not sure yet if I want the chainloaded ipxe script to be custom per VM or just one generic one. The DHCP server config needs to be edited either way, but I’d rather not pile more work on top with the ipxe scripts.
One thing’s for sure: most VMs can use a generic ipxe script just fine (based on the ${net0.dhcp/hostname} and ${net0/mac:hexhyp} entries), and if there’s a couple of VMs here and there that need a special boot config, it’ll probably be easier to set up dhcpd.conf to send them a different (custom-built) ipxe.efi payload with an embedded script that chainloads an ipxe script based on the dhcp hostname entry.
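In isc dhcpd terms, the odd ones out would just get their own host block (names and MACs invented):

```
# dhcpd.conf: everyone gets the generic ipxe.efi...
filename "ipxe.efi";

# ...except the special snowflake below
host db01 {
  hardware ethernet 52:54:00:aa:bb:01;
  option host-name "db01";
  filename "ipxe-db01.efi";
}
```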
Alternatively, I could go all-in and take the symlink route. The ipxe.efi can be the same for all VMs, and all VMs would chainload their own ipxe script based on the hostname, but those scripts would all point to the same config (’cuz they’d be symlinks); for the few that need a custom script, just copy the generic one and modify it under the hostname entry for the VMs that need it (or refine it further to make it somewhat generic for whatever purpose it’s needed).
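The symlink farm on the http server would look something like this (hypothetical layout):

```
# every VM chainloads <hostname>.ipxe; most of those are symlinks to the generic script
ln -s generic.ipxe web01.ipxe
ln -s generic.ipxe vpn01.ipxe
cp generic.ipxe db01.ipxe    # the special one gets a real, modified copy
```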
I’ll be honest, I don’t like symlinks, but for things like this, they’re absolutely fantastic! Why bother building custom efi payloads or writing a script for each VM when most VMs can just point to an existing script? That way you also kinda guarantee that you’re only loading 1 payload on all systems, which in SecureBoot environments might be a cool feature to save time signing efi files (with secureboot, even if you boot via tftp and there’s a malicious actor on the network, they can’t load their malware efi payloads without the secret key you used to sign the ipxe.efi file that all the VMs use, making even tftp secure).
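If someone does go the secureboot route, signing that single payload is a one-off step; with sbsigntools it’d be something like this (key and cert names are placeholders):

```
# sign the one ipxe.efi that every VM loads with your own db key
sbsign --key db.key --cert db.crt --output ipxe.efi ipxe-unsigned.efi
```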
I’m not gonna make a tutorial for secureboot (I’m not using it myself), so someone might have to pick up the slack after I finish the netboot ipxe wiki. Oh, and 1 more thing: while I’m testing this on proxmox, what I’m doing is not limited to it. You can run this stuff on libvirt (virt-manager), opennebula, probably bhyve too, and even on bare-metal. This makes for a very cross-platform solution, as long as you can set up dhcp, tftp and http servers (maybe nfs too, but with alpine that doesn’t seem all that necessary, except for certain workloads).