ThatGuyB's rants

I tried MX Linux and it booted on the first shot. I’m not too happy with the resource consumption (almost 1GB of RAM used with just the default xfce, an empty firefox tab and a terminal running htop). Upgrades seemed to run fast at first, but apt downloads still dip into the 550 KB/s range (which is still 10x higher than xbps on this particular buggy hardware); the average is around 2.2 MB/s, which I’m content with. IDK if the kernel did something (this is running the default debian 6.1 kernel). Aside from this hardware, xbps is always faster than apt (both downloading and extracting the packages).

The touchscreen in xfce works only in portrait mode (it doesn’t rotate input when the display output is rotated). I tried to play a youtube vid, but apt still had a dkms make script to run, which tanked the CPU, filled the page cache and sent 700M to swap. Closing FF reduced RAM consumption to 410M (with 600M still cached).

I’ll probably remove xfce and most other tools and install sway. This tablet is a glorified browser, it doesn’t need a DE, even 1 as light as xfce (besides, I’m not willing to re-learn how to rotate touch input in xorg and sway uses less resources).

The update took ages because of the make for the dkms module 8821cu.ko and update-initramfs (which is weird, because the only non-intel device in this thing is a broadcom 43241 wifi and bluetooth card).

Opening FF again and trying to play a yt video is basically impossible (every 20 seconds Network Activity is 0 KB and Connection Speed 7004 Kbps). I rebooted the system. Setting playback to 144p does the same. I tried to play a video on peertube (which is generally faster to buffer), but it’s still laggy (@ 480p). Sound doesn’t work OOTB (expected, I needed to bash my head real hard to get sound to work on the built-in stereo, I’ve used a USB audio jack dongle for a long time with this).

I’ll see another day if this tablet is salvageable. I said I don’t want to waste more time on it and here I am wasting more time on it (at least I saved how to fix the audio, totally worth the time spent documenting it).


Browsing the web is surprisingly CPU intensive and the poor GPU doesn’t help. A browser can easily use 2 GB on its own, and you’re trying to shoehorn all of that into 2 GB total - it’s a losing battle…

I’m only opening 1 tab at a time and basically never use more than a terminal side by side with it. If I was coming to this tablet after 8 or even 5 years of it being non-utilized, then I’d probably give up easier. But I know I’ve used it for the past 1.5 years or so for watching videos and it’s been working.

MX Linux is slightly bloated (and I’m not aware if they have a minimal installer), so I’ll need to remove a bit of stuff to make it leaner (well, I don’t really need to do much, just disabling lightdm and not using xfce will remove most of the things that get launched upon login, although I disabled conky for now).

I wish I could’ve used alpine, but I didn’t manage to make it boot into uefi (I couldn’t make it boot in uefi even in a VM, I’m too much of a brainlet to pull it off). I just now read that it’s possible to make an EFI bootable USB with it, which might work for VMs later, but still. Time to research.

That didn’t work, but it was worth a shot (I’ve read that this was supposed to not be necessary anymore, but given the old efi firmware, I thought it’d be worth trying).

But I went back to mx and disabled lightdm. On sway with a terminal with htop and a firefox with duckduckgo open, I got 670M of RAM utilized (all of 2GB cached and 256K swap). Closing FF makes it go back to 227M (and 800M still cached). That’s in-line with what I expected.

I need to disable the splash screen / plymouth so I can use tty1, then to fix the audio (pulsemixer doesn’t even work when I launch pipewire from my profile, something about permissions, I’ll look into it). I’m starting to be really happy with this setup (well, mainly because now it looks and feels no different than what I had before, it’s just plain sway and a working foot). The tearing when scrolling in the browser is also gone (I had compton enabled before, but it was still tearing - with sway, I never got tearing, even on the rpi2).

NFS on FreeBSD rant.

I just realized something and I need to quickly dump my thoughts into words. A few years back, when the rpi 2 with a 1TB USB HDD was my makeshift NAS, it ran linux and I used the linux nfs server package. Back then, root-on-nfs worked well with netboot.

Now I have a freebsd NAS. Root-on-nfs still works with netbooting (pretty sure I’ve booted the n2+ and hc4 relatively recently and they worked), but what didn’t work was launching LXC containers stored on NFS. And now that I’ve had this realization, I think around the same time I switched to FreeBSD (and its NFS server) I also couldn’t yt-dlp directly to NFS, couldn’t git clone onto NFS, nor run xbps-src on NFS.

That’s a massive hit on my previous workflow. I used to be able to at least yt-dlp, but not anymore since the change. I think there’s something fishy going on; if I can fix it on freebsd, I should get my old workflow back, and I might fix the lxc-on-nfs problem as well.

IDK even where to start tho’. I’ve looked through the mountd man page and it seems like I might need both “-n” and “-r” (I was only using “-n”). So I added both flags in rc.conf and restarted mountd, followed by nfsd. I mounted my nfs share, but yt-dlp still doesn’t work inside an nfs folder (it hangs so hard that ctrl+c or a plain kill won’t work - only sigkill, a.k.a. kill -9, does).
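For reference, this is roughly what that rc.conf change looks like (done with sysrc here; editing /etc/rc.conf by hand works just as well):

sysrc mountd_flags="-r -n"
service mountd restart
service nfsd restart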

I’ve looked through the nfsd man page, but I didn’t find any specific flag that would help me. I’ll need to remember (or rather find out) what options I was using for nfs on linux (all I remember is no_root_squash, but I might’ve used others too - I need to inspect my old pi 2’s dd image to check /etc/exports).
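For context, the Linux exports line I probably had back then would’ve looked roughly like this (path and subnet are placeholders; no_root_squash is the only option I actually remember):

/srv/nas  192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)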

It’s embarrassing that it took me 2 years (or more) to realize. The past couple of months I’d been toying with the idea of using iscsi to attach a LUN, mount a partition locally and use that for my remote storage needs (git, yt-dlp, lxc etc.). But if I can fix NFS on freebsd (which I’d prefer, though I’m not entirely opposed to iscsi), then I wouldn’t need that workaround (it’s easier to export nfs from zfs than to create a zvol and export it over iscsi, and since I’m planning on more nfs shares, it’d be a bit annoying to do iscsi for all the things I plan on running).

Do you have locking enabled on both sides?


You’re a mad genius, diizzy! I never thought of looking at the lock on the client-side. mount -t nfs -o local_lock=all fixed it (at least yt-dlp).
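For reference, in fstab form that option would go something like this (server name and paths are placeholders):

nas:/storage  /mnt/nas  nfs  defaults,local_lock=all  0 0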

I wonder what changed, I’m still using the default mount options from fstab (all I did was change the hostname and export path).
:man_shrugging:

I’ll test everything else I can later tonight, like git clone and xbps-src. Thanks a lot!


np! Regarding git clone, https://projects.pyret.net/files/public/freebsd/git-tools.htm :slight_smile:

  • xbps-src: check
  • git clone (rather pull rebase, but still git operations): check
  • nfs chroot: check
  • root-on-nfs: check
  • lxc on nfs: unverified yet

Special thanks to diizzy again. My hero :heart:

Back to business, I played with my proxmox lab the past couple of days and the nfs part was the last real remaining piece of the puzzle. VMs I have:

  • freebsd NAS
  • void router
  • openbsd router
  • void diskless install

The FreeBSD VM NAS is just using 2 vdisks (OS and data) for 2 zfs pools, nothing fancy like hardware passthrough (just testing inside the lab). I’m more comfy with freebsd, but I should write documentation on how to use proxmox itself as an nfs-server and iscsi target server ('cuz the idea is to use proxmox zfs with diskless installs - although you can use a separate NAS appliance).

The Void router is self-explanatory. I’m used to linux routing so I got up-to-speed quickly for testing. The OpenBSD router will replace it eventually (still need to read the setup steps, once I’m done I might switch my rockpro64 main router to openbsd).

The void diskless install is the whole point of today’s lab experiments. I made the VM diskless, booted an ISO, installed nfs-utils, mounted the share, bootstrapped the OS and chrooted into it for the final config (took a bit of trial-and-error to get right). The FreeBSD NAS is running NFS, iSCSI and TFTP (inetd) services (ignore iscsi for now). The vmlinuz and initramfs are copied from the nfs folder to the tftp folder (so they can be loaded).

The void router provides dhcp (and does routing and fw, duh!). The dhcp server is set to point the filename option to pxelinux.0 (obtained by installing the syslinux package). pxelinux.0 and ldlinux.c32 are present in the tftp root folder. When pxelinux is loaded on the VM as an iPXE payload (iPXE being the default network loader that proxmox ships with), it reads the dhcp conf-file and prefix options (actually, I’m not entirely sure about that, since pxelinux loads immediately once downloaded from tftp - and from reading the syslinux wiki, pxelinux automatically looks on the tftp server for a pxelinux.cfg folder and a 01-mac-address file in it, so I’m pretty sure these extra dhcp options don’t matter for ipxe - they matter for petitboot and I believe uboot too).

When the pxelinux.cfg/01-mac-address file is read, it provides the boot options for the VM: grab the linux kernel and initramfs from the same freebsd tftp server, plus a cmdline (boot options) with the nfs root. The initramfs then gets extracted into memory and the system starts booting, including mounting the nfs share as root (i.e. the “/” mount point). This was a bit tricky, because I needed dracut-network (and while at it, I also installed dracut-uefi for some later testing).
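For anyone following along, that per-MAC pxelinux config is only a few lines - a rough sketch like this (IPs, paths and the dracut-style root=nfs syntax are just examples, not my exact file):

# tftp-root/pxelinux.cfg/01-aa-bb-cc-dd-ee-ff
DEFAULT void-nfs
LABEL void-nfs
  KERNEL vmlinuz
  INITRD initramfs.img
  APPEND root=nfs:192.168.69.9:/exports/void-vm ip=dhcp rw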

The result is this.

Some of you with sharper eyes noticed something. I loaded “pxelinux.0.” That means the VM was booting into BIOS mode.

Unfortunately I couldn’t get syslinux.efi to boot (I switched from OVMF to SeaBIOS to at least test that things work). I only get a “BdsDxe: load error” when I try booting OVMF with iPXE. It downloads the syslinux.efi file (I can see the success message), but then shows that BdsDxe output.

It’s still a good start, I probably just need to read more on the syslinux wiki.

The plan here is to be able to run DHCP, NFS, TFTP (maybe HTTP) and iSCSI servers (between the router and NAS) and have (most of) the other servers and PCs around the house netboot and be diskless. That only leaves the router and the NAS to have local storage (actually, if you run the DHCP server on your NAS, you could literally have a diskless router / firewall, which is mad sus frfr no cap, as long as you don’t mess with that ethernet interface at all after booting).

For super-lightweight VMs, I’d love to set up a frugal alpine install with netboot, but with root-on-ram (the default alpine install, but with nfs for lbu backups). That will have to wait though. The NFS revelation really boosted my morale to larp as a sysadmin for a while.

With working git and xbps-src, I can return to writing the s6 documentation and improve (and add) some other s6 services. Once I have a decent workflow for keeping up with the upstream git repo, adding and modifying my own build templates and building the packages, I should be able to finally move my personal system to the s6 suite (and put my money where my mouth is). The only reason I’ve delayed it for so long was that I wasn’t sure whether updates could break something if I didn’t keep my own packages up-to-date.


I had some more resources open while testing that I didn’t get around to reading. Apparently I’m doing this wrong. Instead of being a brainlet and chainloading ipxe into syslinux / pxelinux, I should’ve set the filename to an ipxe script (what next, I’ll pipe cat into grep?).

Well, this would be useful if I’d be booting a minimal cd or usb with syslinux.efi or pxelinux.0. But by that point, why not just boot ipxe instead? Change of plans kings, we retooling this for ipxe.

NGL fam, I’ve been spoiled by petitboot and uboot. They’re awesome bootloaders and they take pxelinux.cfg type files (and kboot configs - I prefer the kboot config). But looking around the interwebz, it looks like ipxe is preferred for netbooting almost universally (ipxe page itself, netboot.xyz, iventoy etc.).


Just a funny (?) and short story today. I wanted to set up a local DNS in the lab (mostly for the lolz). I created an alpine DNS container where I installed blocky. Its IPv4 address (DHCP) happened to end with 53. So I kept that.

IDK why I find that funny. It’s a funny coincidence.


Back to PXE booting.

I’ve switched the test void VM from chainloading pxelinux.0 via ipxe to booting straight from ipxe. The ipxe scripts are somewhat straightforward and I like the variable names (although I could live without them).

What I like about ipxe is the documentation (to some degree - not as good as freebsd’s tho’). I easily found that there are built-in variables (like ${net0/mac:hexhyp}, or net1, net2 etc.) which you can use to download a file from tftp or http under that name.

I tried to boot alpine, but I was using the virt kernel, which apparently doesn’t come with a built-in dhcp client, so it kernel panicked lmao. But this forced me to learn how to use the ip= boot parameter (beyond just ip=dhcp). It’s easy, the parameter goes like this:

ip=<client-ip>:<server-ip>:<gateway-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>

https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt

The server-ip field is for root-on-nfs booting (which I don’t need to provide in the ip parameter anyway - I give it in a different place - and I’m not doing alpine with root-on-nfs, so it stays empty here). So the parameter ends up looking like this:

ip=192.168.69.69::192.168.69.1:255.255.255.0:thehostname:eth0:on:192.168.69.53:1.1.1.1

I managed to boot alpine, but it couldn’t load the modloop and it seems there’s no apparent way to make it do it with tftp (it supports http and ftp, but doesn’t seem to want to do tftp in the boot cmdline).

One thing I couldn’t fix was booting in uefi mode. If I try to boot with OVMF, ipxe craps out (that same BdsDxe error). I started to suspect that the ovmf pxe boot option isn’t actually ipxe (it doesn’t look anything alike)… aaand I proved myself right. I grabbed an ipxe.efi-x86_64 file and set it as the filename in my dhcp server.

Whatever OVMF boots in PXE mode, it just tries to load an EFI file from next-server. With that, ipxe got loaded… but then ipxe kept chainloading itself over and over (because it was reading the filename from dhcp), so it got stuck in a boot loop until I pressed ctrl + b to get into the ipxe shell.

Whenever I try to set up ipxe in the ovmf settings, it fails to do it (I can’t blame it on proxmox because this is just ovmf, it could happen on any other kvm-based systems).

So I guess my only chance is to bite the bullet and build a custom ipxe.efi file that has a bundled script that chainloads the OS (maybe based on the MAC of the system).

I managed to netboot OVMF UEFI by:

  • using ipxe.efi-x86_64 as the filename in dhcpd.conf and downloading it via tftp on the proxmox VM
  • breaking out of the loop by pressing ctrl + b
  • using these commands:
dhcp net0
chain tftp://192.168.69.9/alpine.ipxe

(modloop obviously still WIP, but I’m getting closer)

To prove that this method works, I did the same with the test void vm: switched it to ovmf, changed its filename in dhcpd to the ipxe.efi file, broke out of the loop, dhcp’ed and chained the void.ipxe script. And it worked.

I guess I’ll have to follow this and build an ipxe.efi file with a script that loads an ipxe script based on the mac of the VM.


Now if I’m planning to build a netbooting infrastructure, I’d like as little maintenance (and friction) as possible. So the ipxe config for void only really works for 1 VM (unless I boot with read-only root, which is doable, but I never tested that).

I’m not looking at managing boot files for each VM I’d deploy, that’s insane. For alpine it might actually work better if I set the hostnames to the mac addresses and load the ma-ca-dd-re-ss-es as apkovl files.

So I could technically make a single ipxe.efi file that boots a script based on the mac name (à la syslinux / pxelinux) and maintain only a single set of kernel, initramfs and modloop files. When an update comes, I only update those once and keep each server’s conf as ma-ca-dd-re-ss.apkovl, which then gets loaded automatically. For void (or any other root-on-nfs linux VM) I could name the NFS export after the mac address and then I wouldn’t need to manage each boot file individually (I’d just use the built-in ipxe variables to boot each VM correctly).
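A generic root-on-nfs script along those lines could look roughly like this (just a sketch - it assumes the NFS server is the same box as next-server and that the exports are named after the MAC):

#!ipxe
set NEXT_SRV ${net0.dhcp/next-server}
set MAC ${net0/mac:hexhyp}
kernel tftp://${NEXT_SRV}/vmlinuz
initrd tftp://${NEXT_SRV}/initramfs.img
imgargs vmlinuz initrd=initramfs.img ip=dhcp root=nfs:${NEXT_SRV}:/exports/${MAC} rw || read void
boot || read void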

More ipxe shenanigans.

I gave in and set up an http server for loading alpine’s modloop (I needed it for modprobing nfs). It seems the modloop file contains the /lib/modules folder (now that I know that, the name makes a ton of sense).

I installed darkhttpd (because I knew it was easy to configure: literally just point it at a folder, give it the chroot flag and a user and you’re off to the races), dead-slapped the modloop in there, modified the alpine boot cmdline in the ipxe boot file and pointed the modloop at the http server. Alpine booted just fine and I could modprobe nfs.
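For reference, the whole darkhttpd setup is basically one command, something like this (user / group and web root will differ per system):

darkhttpd /srv/http --port 80 --chroot --uid www --gid www --daemon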

I haven’t yet configured it to properly save and load configs, but I have read a bit more into ipxe. Apparently there’s a variable for a server’s dhcp hostname entry (${net0.dhcp/hostname}), which I can use in the ipxe script to pass along as the hostname boot cmdline. That will make alpine see its proper hostname and read its own hostname.apkovl file, which fixes some worries I had (no more setting the hostname of a VM to its MAC address, thank sweet Jesus, on God fr fr).

I’m not sure how to tackle security on this. On one hand it’d be very easy to make one nfs share for the whole subnet and allow all alpine servers to save their lbu backups there, but that would mean every server can modify every other server’s saved data, which is not ideal. I’m not expecting anything sus in my lab, but I’m doing this for “research” (meaning others might want to pick up this config), so I can’t just make it insecure because I’m lazy.

That means I’d have to make a nfs share for each alpine VM, almost like a root-on-nfs, but instead use the classic alpine diskless mode (root on ramdisk with config on nfs).


After reading a bit more of the alpine wiki, it looks like an apkovl can be served via http (just like the modloop), so all I would technically need is a good ipxe script to load the full OS into RAM, completely diskless. Rebooting without an lbu commit results in the loss of all unsaved data. If all your alpine VM does is serve a static site, rebooting is fine. If your alpine VM is a VPN, rebooting is also fine, but you lose all logs (which might be a good thing in certain environments), unless you have an rsyslog server the logs are shipped to.

If you have a database, you’d be screwed by a reboot (well, you can’t lbu commit a DB while it’s running anyway, except maybe postgres and sqlite). And while running a DB from RAM would give you great performance (if the DB runs at all), I think you’d be better off mounting an nfs share for the DB or any other volatile user data.

I can already think of a few uses like this. HA in particular would be really easy to implement. As long as your VM config file contains the MAC address for the interface, you don’t even need fencing, because the VM runs solely from RAM: if your host is taken offline, all its VMs get restarted on another host and run from memory with the same configuration and all; and if the host comes back online, its VMs should be put on hold / stopped, or have the services moved back to that host (maybe live migrated?). Obviously, if your VMs have persistent data via NFS, then you do need some fencing (like cutting the hypervisor off the nfs network at the switch port).

One thing I read on the alpine wiki that kinda stood out: if you want to upgrade the kernel, you need a minimum of 8GB of free RAM in the alpine VM where you’re performing the upgrade. That’s quite a lot, but it’s needed for the modloop creation. On the plus side, you don’t need all your VMs to have 8+ GB of RAM. With netbooting you can start 1 VM once, upgrade the kernel, generate the image, initramfs and modloop files, then copy these to the tftp or http server that serves them to the rest of the VMs. Reboot your VMs (lbu commit if necessary) and voilà, you’re running the latest kernel on 50+ VMs by upgrading the kernel inside just 1 VM.

Given that ipxe supports http and that alpine can’t use tftp anyway, I think I’ll only use http for the ipxe scripts and deliver the ipxe.efi file via tftp. All that’s really left is to generate an ipxe.efi payload that has an embedded ipxe script that chainloads another script from next-server’s http server.

I’m not sure yet if I want the chainloaded ipxe script to be custom to all VMs or if I want a generic one. The DHCP server config needs to be edited anyway, but I’d rather not add more work with the ipxe script.

One thing’s for sure, most VMs can use a generic ipxe script just fine (based on ${net0.dhcp/hostname} and ${net0/mac:hexhyp} entries) and if there’s a couple of VMs here and there that need a special boot config, it’ll probably be easier to set up the dhcpd.conf to send a different (custom built) ipxe.efi payload with an embedded script that chainloads an ipxe script based on the dhcp hostname entry.

Alternatively, I could go full retard and go the symlink route. The ipxe.efi can be the same for all VMs and all VMs would chainload their own ipxe script based on the hostname, but they all point to the same config ('cuz they’ll be symlinks) and for the only few that need a custom script, just copy the generic one and modify it under the hostname entry for the VMs that need it (or further refine it to make it somewhat generic for whatever purpose it’s needed).

I’ll be honest, I don’t like symlinks, but for things like this they’re absolutely fantastic! Why bother building custom efi payloads or writing a script for each VM when most VMs can just point to an existing script? That way you also kinda guarantee that you’re loading a single payload on all systems, which in SecureBoot environments might be a cool feature to save time signing efi files (with secureboot, even if you boot via tftp and there’s a malicious actor on the network, they can’t load their malware efi payloads without the secret key you used to sign the ipxe.efi file that all the VMs use - making even tftp reasonably secure).

I’m not gonna make a tutorial for secureboot (I’m not using it myself), so someone might have to pick up the slack after I finish the netboot ipxe wiki. Oh, and 1 more thing: while I’m testing this on proxmox, nothing I’m doing is limited to it. You can run this stuff on libvirt (virt-manager), opennebula, probably bhyve too, and even on bare metal. This makes for a very cross-platform solution, as long as you can set up dhcp, tftp and http servers (maybe nfs too, but with alpine that doesn’t seem all that necessary, except for certain workloads).

Proxmox rant (making me pull out my hair again).

Proxmox VM startup (and shutdown) ordering is absolutely infuriating. Why on earth are VMs with the same start order serialized by VMID, instead of all being started in parallel? It makes absolutely no friggin’ sense!

Let’s say I have these VMs:

  • router (dhcpd, routing, firewall) - id 300, start 1
  • NAS (tftpd, internal httpd, nfsd) - id 200, start 1
  • database for httpd - id 730, start 10
  • database for monitoring - id 731, start 10
  • (external) httpd1 - id 420, start 20
  • (external) httpd2 - id 422, start 20
  • service monitor - id 555, start 99

So in theory, the router and NAS would be started in parallel, then the database, then the 2x httpd in parallel, then the service monitor. If you thought that’d be the case, then you’re wrong, think again! This is proxmox after all!

The start order is sequential and would be like this:

  1. NAS - start 1, id 200
  2. router - start 1, id 300
  3. database for httpd - start 10, id 730
  4. database for monitoring - start 10, id 731
  5. httpd1 - start 20, id 420
  6. httpd2 - start 20, id 422
  7. service monitor - start 99, id 555

Notice the pattern? It’s sequential, first based on the start order (i.e. the number) then subsequently ordered again by the VM ID.
https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_startup_and_shutdown

Even systemd does a better job at handling parallel starting. I’m not here to argue that systemd should be implemented to handle VM startup sequence… but systemd would 100% absolutely definitely on God fr fr no cap be better at managing VMs startup!

So now think of the scenario where I have a large DB server and it takes maybe 5 minutes for the DB to come up, which the http servers require. So I add a startup delay (which means “wait for n seconds until starting the next VM”).

Logically you’d think that adding 300 sec to VM 730 makes sense, but you’d be wrong again! Because 730 and 731 have the same start number (10), you have to put the 300 seconds on VM 731. So, ok, all fine and well, we have a working setup: the NAS starts, followed by the router, followed by the httpd db, followed by the monitoring db, then a 5 minute wait (for the 2 previous VMs to come up), followed by httpd1 and 2, then by monitoring.

Guess what happens if you add more VMs and you need a new DB server… take your time, think about it.

Let’s add VM 732, a database VM for, idk, something. Have you guessed what the order is? NAS → router → httpd db → monitoring db → wait 5 minutes → new DB VM […].

So now you have a DB VM that waits 5 minutes for nothing, instead of being immediately started along the other 2 DB VMs. And if you don’t have any subsequent waiting period, you might have the other server (which uses this new DB) start up before the DB. Great!

And if you add another 2 or 5 min startup delay after VM 732, you’re waiting 7 or 10 minutes for the next VMs to start up.

IDK if there’s even a way to analyze the VM startup order in proxmox. On the proxmox forums some (retired) proxmox staff recommended recursively grepping the qemu-server folder (grep -R startup /etc/pve/qemu-server), which is ridiculous (and gives you the output in VMID order, not startup order). Piping straight into sort won’t do either; you’d have to sort by the startup field, probably something silly like:

grep -R startup /etc/pve/qemu-server | rev | sort | rev

(reverse the lines, sort by the last entry, which will be the start id, then reverse back - highly inefficient)

Actually, scratch that. If you add any startup delay, the output becomes highly misleading: with up=300 set on vmid 732, VM 732 shows up as the first one to start (the reversed lines get sorted by the delay, and 003… sorts before 1, so 732 ends up listed before the NAS VM with order 1), which has nothing to do with the actual startup order.
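If I really wanted the actual startup order, something like this rough sketch gets closer (it keys on the order= value and ignores the delays; entries without an explicit order= still won’t sort properly):

grep -R startup /etc/pve/qemu-server | sed 's/.*order=\([0-9]*\).*/\1 &/' | sort -n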

Who even designed this thing? There’s also no cluster-wide support for VM startup sequence (thank sweet mother of God that I’m not using pve clusters anymore).

I get the excuse that if you start many VMs at the same time, you’ll hit your storage server really hard with IOPS from all the VMs booting up. But 1) by default proxmox doesn’t start VMs in batches (AFAIK) and 2) even if you use the startup order and delay feature, if you have 30 VMs with order 30, those 30 VMs will still start one after another in close succession (so you’re hitting the storage anyway).

I guess if someone requires a specific VM startup sequence, they’d design something external that not only connects to the hypervisor to start the VMs, but also checks the service status (to make sure a service is actually up and running before starting the next VM). And if a VM crashes (or a host crashes and HA kicks in), that VM sequencer would restart the VMs (or rather just the services that depend on the other VMs that got started elsewhere).

Yeah, I’d say don’t use this crap, but for some scenarios where you don’t need a lot of smart startup, then I guess this does alright. I keep my own proxmox host off most of the time and when I want to test stuff, I start it up and I set my router and NAS VMs to start automatically (to allow other VMs to boot via ipxe).

More ipxe shenanigans.

So I’ve managed to build ipxe with an integrated script after maybe 15 minutes of debugging why the built-in script wasn’t working (typo: EMBER instead of EMBED in the make command).

The built-in script is simple (it can be http as well):

#!ipxe
dhcp
chain tftp://${net0.dhcp/next-server}/${net0.dhcp/hostname}.ipxe

Straightforward, right? It can be written more legibly too, though I wanted to keep it small:

#!ipxe
dhcp
set NEXT_SRV ${net0.dhcp/next-server}
set HSTNM ${net0.dhcp/hostname}
chain tftp://${NEXT_SRV}/${HSTNM}.ipxe

This just boots the ipxe script from the tftp server. No more boot loop of ipxe loading the filename ad nauseam. Tested with both alpine-diskless.ipxe and void-root-on-nfs.ipxe; both got their file based on their DHCP hostname. I also tested symlinks - they definitely work.

I tested the hostname= and the ip=::::hostname:eth0:dhcp boot cmdline options, but to no avail (at least on alpine generic boot w/o apkovl), so idk if I can force VMs to take a certain hostname on boot. BUT because ipxe is so awesome, we don’t really need to bother (just keep ip=dhcp).

In the boot cmdline for alpine, we can give the parameter apkovl= and set it to the server’s hostname, i.e. (can only be http or ftp, not tftp):

apkovl=http://${NEXT_SRV}/${HSTNM}.apkovl.tar.gz

So alpine would just grab its own dhcp hostname and its configuration file. Just how cool is that, m8! All you need to do is keep your dhcp server updated properly (and preferably dns too, based on the dhcp entries), symlink your server hostname in the tftp or http server (for the ipxe file) and that’s it.

Everything can be centralized to a NAS, backups become easier and you become more hypervisor independent (it literally doesn’t matter if you’re running qemu or virtualbox: if your VM can boot off of pxe and run the ipxe.efi file, it can boot anywhere without needing a local disk).

HA for the NAS can also be implemented, just zfs replicate to another NAS and keep a HA stack around, like corosync + pacemaker to switch the virtual IP and start the services if the other NAS goes down. Or if you don’t want to use zfs and want something like ceph, you can have a ceph backend and 2 or 3 VMs that connect to ceph and run your services (you’d still need corosync + pacemaker to decide which VM gets the virt-IP and serves the hosts in your lab).

My alpine ipxe script (that gets chainloaded from the ipxe.efi binary sent via tftp boot) looks like this:

#!ipxe
set NEXT_SRV ${net0.dhcp/next-server}
set HSTNM ${net0.dhcp/hostname}
set ALP_REPO http://dl-cdn.alpinelinux.org/alpine/v3.20/main
### AFAIK the repo var is optional if you have an apkovl already
kernel tftp://${NEXT_SRV}/vmlinuz-virt
initrd tftp://${NEXT_SRV}/initramfs-virt
imgargs vmlinuz-virt modloop=http://${NEXT_SRV}/modloop-virt initrd=initramfs-virt ip=dhcp alpine_repo=${ALP_REPO} -- || read void
boot || read void

Like I mentioned, this script can be switched to full http and skip tftp, just like the script embedded in the ipxe.efi file. If your http server is not your next-server (in my case it is - my freebsd NAS test VM serves http, tftp and nfs), then obviously embed a different variable in your script, preferably a DNS hostname whose IP you can change on demand (assuming you have a local DNS; otherwise you can go static IP, but eh).


To get ipxe to compile, I needed to install lzma-devel and gcc (I’m on a musl system). To build it, I just:

cd ipxe/src
### make sure your boot.ipxe file is here, the one that chainloads the other ipxe script from tftp or http
make clean
NO_WERROR=1 make bin-x86_64-efi/ipxe.efi EMBED=boot.ipxe
cp bin-x86_64-efi/ipxe.efi /tmp/ipxe64.efi

Grab the file from /tmp, send it to your tftp server to be net-booted into and you’re off to the races. Make sure your dhcp server is updated with filename "ipxe.efi"; or the full path if you use subfolders in the tftp server.
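The relevant chunk of dhcpd.conf ends up looking something like this (ISC dhcpd syntax; the addresses are from my lab and the range is just an example):

subnet 192.168.69.0 netmask 255.255.255.0 {
        range 192.168.69.100 192.168.69.199;
        option routers 192.168.69.1;
        next-server 192.168.69.9;
        filename "ipxe.efi";
}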

I’ll probably make a full wiki entry when I feel like it. For now I want to larp as a sysadmin some more and figure out netbooted apkovl files and completely diskless VMs.

With this I believe I can even transition to bhyve and keep basically the same boot infrastructure. If I can, then I can finally put my odroid h3+ to good use (currently it’s just an nfs server for my PC, but ever since my main PC became the h4, the plan has been to have it run a few homeprod services).

Even more! But this post will also mostly focus on alpine netboot.

I managed to make alpine boot with a persistent config using the POC from my last reply (the apkovl boot cmdline). The only problem I see with this is that, without an alpine installation media present, alpine will always want to connect to the repo you selected during setup-alpine to download all the packages it needs to run the system (like nfs-utils, if you opted to install it).

That’s insane. If you have 100 alpine VMs, they’d all be downloading the same basic packages and a couple of customized ones for some workloads. That means you have a couple of options:

  • mount an iso media on the VM and make your VM not entirely diskless
  • rsync your own alpine mirror repo locally
  • make a proxy cache and cache the most major packages
  • just keep downloading from the mirror repo if your lab is small enough

The 1st option is kinda against the spirit of this project (diskless) and idk if multiple VMs can read the same iso at the same time (never properly tried it - I know you can assign the same iso to multiple VMs with a warning, and it worked as long as the other VMs weren’t actually accessing it, but with multiple VMs trying to read it at once, idk if it would).

The 2nd option is not easy. A local alpine mirror repo might be 1 or 2 TB in size if you include the community one, at least I guess so. And you have to keep it in sync (thankfully rsync is incremental, but still).

The 3rd option is a good contender for a larger setup. With squid proxy you can cache the most frequent packages, but you’ll need to find a way to force alpine during boot to configure the proxy env, so it doesn’t go straight to the internet (I think when you setup-alpine you have the option to configure a proxy, which should be saved in lbu commit and read on boot). If you set a large retention period, like a day or 2, you should be in a good spot with the packages (assuming you reboot VMs often or deploy new ones).

Otherwise if you know you’re not going to reboot the VMs very frequently, then it’s perfectly acceptable to leave your VMs up and have them go straight to the alpine mirror you configured (just make sure you don’t overdo it).

There’s something more to say about a local alpine mirror (or local mirroring for any other distro). Mirroring is generally done from an official rsync server. If you don’t have thousands or 100k+ hits to the repo, you’re going to waste alpine’s bandwidth and cause more harm by hosting your local mirror than if you just kept netbooting against their repo.

That’s because, in the grand scheme of things, you’re only downloading the basic OS packages on boot. Even if you do that 50 times every month or so (assuming there’s a kernel update or any other update that requires a reboot), you’re still downloading a very small amount of data compared to downloading the whole repo and keeping it in sync (reminder that a repo’s packages get rebuilt constantly - whenever there’s an update, and if a library that most packages use gets updated, all the packages that link against it get rebuilt to pick up the new library). So you can’t just rsync a repo once and forget about it.

A mirror with a squid proxy seems like the best option (or simply use the mirrors without a proxy if your infrastructure isn’t too massive). Squid wouldn’t use much storage or CPU and it can be helpful if you have a couple dozen to a couple hundred VMs: the first 3 or 4 might download slower, because they hit the mirror, but all the rest will grab their files from the squid cache and boot as fast as the network interface allows (assuming there’s no congestion from all the boot-up traffic).
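A minimal squid.conf sketch for that (cache size and retention values are just examples to tune):

# cache apk packages for a couple of days
cache_dir ufs /var/cache/squid 10240 16 256
maximum_object_size 512 MB
refresh_pattern -i \.apk$ 2880 100% 4320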

Did I mention how much I love ipxe? Man, ipxe is great, fr fr no cap!

Diskless Alpine and some homeprod hardware rant

So I underestimated the RAM requirements for diskless alpine installs (ones that actually do stuff). Where a normal alpine VM (sys install) would use less than 100MB of RAM, a diskless alpine will use at least 450MB.

I used to make alpine VMs with 128MB of RAM and they worked flawlessly (because they wouldn’t load the whole rootfs into memory). When I configured an alpine dns server, I originally gave it 512MB of RAM. Well, before even installing nsd (just the basics, e.g. neovim and nfs-utils for saving the apkovl), the rootfs had already used up 100% of it.

I powered the VM off, increased to 1GB of RAM and it seems to be ok-ish for now (still 283.5M from “/” and 184.9M from modloop, which makes for about 468.4 MB, with only 183.9M being available for everything else, with nsd and htop installed).

Well, I do have 64GB of RAM on my thinkpenguin 4-bay NAS, so memory is plenty for what I’m doing with it. I’m still a bit salty though, but won’t be crying too much over this. It’s nice to know that I’m not wasting RAM and it’s actually put to good use ('cuz otherwise, I’d be giving alpine VMs only 128MB of RAM and be left with something like 56GB of RAM available, after all the non-alpine VMs that maybe get 2GB).

I’m planning for an alpine VM with lots of RAM, from which I can save apkovl configs for other systems, so I don’t have to constantly reconfigure the ipxe files and switch symlinks. Once a configuration is set, I can lbu commit, copy the apkovl to the http server and boot a separate VM with it. This packaging VM can then reboot and get completely cleaned up, ready to prepare the next needed VM.
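Collecting the config from that packaging VM should be as simple as something like this (assuming lbu is set up; the destination is just wherever the http root happens to live):

lbu package /tmp
scp /tmp/$(hostname).apkovl.tar.gz root@192.168.69.9:/srv/http/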

For now I only have one with nsd (which seems to be working alright, but I’ll need an unbound one soon).

I’m still thinking if I want to install nfs-utils and load rpcbind and rpc.statd on-boot. It’s not too heavy, but I kinda think of it as a waste of RAM. I should be able to script the apkovl collection via ssh / scp easily. I could even set the VMs without an sshd running, but I don’t really want to give that control up (opening a vnc console is annoying and I can’t copy and paste from my own notes to it). And sshd isn’t really too big of a deal, compared to nfs-utils and RPC.

The only problem I see without a mounted NFS path is that the apkovl might fill up the in-memory rootfs, which would force my hand to increase the VM’s RAM again (which I’m still very reluctant to do, but I have to accept it’s for the sake of running things fast).


The thing about my lab is that I always try to run things as efficiently as possible (which is why I used to do the 128MB alpine VMs to begin with). There might come a time where I might want to set up alpine in sys mode and run it with root-on-nfs, like I do with void. For some workloads, at least I’ll set alpine in data mode.

But I always configure my lab with more resources than it needs and, when I go live, I set it up with as few resources as I can get away with. I have an odroid n2+ around that’s not doing anything rn. It was supposed to become an incus homeprod box, but I haven’t delved too deep into it (because I had some issues with NFS before, which might be fixed now - and if not, I can revert to iscsi).

Obviously I can’t run VMs from RAM on it (let alone containers) and even if I could, I’m dealing with 4GB of RAM there. My odroid h3+ has 32GB of RAM, which should be able to handle quite a few (diskless) bhyve VMs, but I’m not planning to overload my NAS. So although the lab part is mostly comprised of alpine diskless VMs, my homeprod is going to look a bit different (well, the h3+ might still run some things straight from RAM anyway). But I want the bulk of my services to end up on the n2+.

I want my hardware to consume as little power as possible. For that reason I’m sticking with SBCs. And the services I’ll be running will mostly be CLI-only, with maybe a couple of exceptions. Other than dns, dhcp, tftp, http and nfs, I’m not even sure what I want to host. I’ve survived for a long time without many of the services I found cool to implement.

I don’t have a personal movie collection, so I don’t need jellyfin or kodi. I also don’t really find nextcloud that attractive as a platform (it might be useful for some collaboration, but in general most of it can be replaced by other, smaller projects). Web servers and VPN routers are things I’ll definitely be running, maybe gemini too. I’m thinking a wayback / webarchive server might be needed. A pastebin and a URL shortener like 0x0.st would be nice to have. Probably XMPP as well, with either jitsi or galène for video. SearxNG? And a wiki.

As you can tell, most of these should run just fine on low-end hardware. I just need to figure out how to make some of these work on alpine or void.


Adventures in ipxe netbooting DNS…

So in my infinite wisdom I decided to start my diskless alpine installs with DNS. Before that, I had already set up my first alpine DNS as a blocky ad-blocker in LXC. That one has no issues. It forwards to the next one, a diskless alpine VM running unbound (recursive DNS). Unbound then talks to 2x alpine VMs running NSD (authoritative).

All my VM lab is getting DNS from blocky, including the unbound and nsd VMs. So what’s the problem? Well, netbooting alpine involves connecting to an alpine mirror to download packages for the rootfs… I think you see where this is going.

If everything is shut down and then gets powered on, we have an infinite loop. Ignore NSD for now - those are just internal DNS - it’s unbound and blocky that are the focus. So unbound starts up, grabs an IP from DHCP (a static mapping) and its DNS server points to blocky. The blocky lxc itself is running fine.

But now the unbound VM wants to connect to the http alpine mirror. To do that, it queries blocky for the domain name… which in turn tries to resolve through unbound, which is still booting up (so no dns yet)… aaaaaand we’ve got a loop. And the unbound VM drops into an emergency shell, because it couldn’t load init, because it couldn’t download it from the alpine mirror.

I could just convert unbound to an lxc container, it’d be pretty easy. But to fix this properly, I can set just the unbound server itself to use a public DNS server and skip local resolution - that’s alpine’s /etc/resolv.conf, which gets filled in from the dhcp server, not unbound’s own config (which uses the root servers anyway). So that’s what I did: the unbound VM’s dhcp entry now hands out a normal dns server and it boots fine.

IMO if the router doesn’t already serve dhcp and DNS, then the ideal setup for these services would be to have their OS set up with persistent storage (or at least local media loading, like through the ISO).

This could’ve also been solved with /etc/hosts entries, but nobody should be using these anyway. Besides, the mirror might not have a static IP and it’s possible it’s using dyndns, not to mention that if you’re planning for ipv6 when the time comes, you never know if the ipv6 prefix or suffix ever change, so /etc/hosts would be really unreliable.

Moral of the story: it’s always DNS.


Netbooted infrastructure:

Given my previous gripes with DNS, and not really liking the idea of pointing the OS resolver of my main DNS VM at a public one (although I can live with that), I’ve been trying to set up alpine with root-on-nfs. And after banging my head quite a bit and trying all kinds of stuff, I looked it up online and it seems like alpine might in fact not support root-on-nfs (I wish to be proven wrong, but I couldn’t get past the error “can’t mount /dev/nfs as sysroot”, even after making sure the nfs-utils pkg was installed and rpcbind and rpc.statd were enabled).

That’s strange. You’d think a “bare basic” like root-on-nfs (that distros like debian, gentoo and void support) would be supported by alpine. But it seems their focus has been on their version of diskless setup, which loads the OS into RAM. But to some degree, that’s ok, because alpine really shines in diskless and data config modes (I was forcing my hand with sys mode on nfs).

Furthermore, with the way alpine does it (apkovl files), all the OSes run the same version of the basic OS utilities (and kernel and drivers) - unless you don’t reboot or update for ages - so it really saves some storage (in exchange for running everything from RAM). The only thing I don’t like about this approach is that on boot you have to connect to a repo. Hosting my own repo wouldn’t be too much of a problem, but I’d rather put those resources to better use.

So my current setup is still going to be mostly alpine diskless VMs netbooting. But I’ve been playing with void and root-on-nfs (made a template) and I’m planning to run that as my main DNS, only because it can be jump-started locally without DNS (just my working DHCP server). There’s points to be made about running Unbound straight on my router that also serves DHCP (which I might do on my homeprod side). There’s also running alpine in lxc (blocky works just fine like that, unbound would just as well).

With that template I’m going to set up more VMs like this for other purposes, but I’ve hit one brick wall: managing the software stack of multiple VMs (bins and libs, but particularly kernels, initramfs and kernel modules). Alpine gets away with it by unloading everything into RAM on every boot and by shipping a modloop file with the modules built for the specific kernel and initramfs you’re booting. Other distros that run straight off nfs aren’t so fortunate.

Gentoo’s root-on-nfs wiki seems to point to multiple systems sharing unified /usr (read-only), /opt (ro) and /home (rw) folders. This might be something to consider, as it would tremendously help with keeping OS versions in line. With independent rootfs, if I were to update one system from the 6.6.69 kernel to 6.12.9, I’d need to maintain a separate vmlinuz and initramfs in my tftp folder to match that system’s /usr/lib/modules folder (if I were to netboot the default 6.6.69 kernel on a system that only has the 6.12.9 modules, it would be unbootable). Consequently, I’d have to maintain a separate ipxe boot config file for every system (instead of symlinking every new VM’s hostname to a generic ipxe script, like I do now).

So sharing /usr as ro makes sense. Only 1 system gets updated, I only need to transfer the vmlinuz and initramfs to the tftp server once, and I can just reboot all the other systems to pick up the (kernel) update. It makes real efficient use of storage space and uses less internet bandwidth than the alpine diskless mode would. In fact, the more alpine VMs one spins up, the more bandwidth gets used (which is why a squid proxy might make sense), while with this shared /usr one could literally keep hundreds of VMs up to date with the bandwidth (and storage utilization) of just 1 system’s update (assuming the NFS server can serve 100s of VMs without bogging them all, and itself, down).
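On the FreeBSD side, that shared read-only /usr would just be a plain exports(5) line, roughly like this (dataset path and subnet are placeholders):

/tank/netboot/usr  -ro -alldirs -network 192.168.69.0 -mask 255.255.255.0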

But the massive elephant in the room is… you’re sharing a ro /usr folder. That means whatever you install on one system will be available on all your systems. You have unbound on one server, you’ll have unbound on all servers. You have iperf3 on one server, you’ll have iperf3 on all servers. This could give an attacker more tools to play with (if they manage to get root privileges on one, there’s a real chance more systems get affected).

On the plus side, you’re sharing a (ro) /usr folder. That means no more installing utilities on a system and forgetting to install them on another. And being read-only, if 1 system gets compromised, it won’t affect the others (your master system that mounts that usr share as rw would have to get compromised and that’s a tough call when you’re not running any services on that system - or your NFS, TFTP and HTTP combo server, which can wreak some havoc).

That doesn’t mean your entire infrastructure’s safe, just that an attacker couldn’t install tools through your package manager (if you have wget, curl, git and / or make, it’d be fairly trivial to use those to download a binary that can further compromise the system - not to mention that python is basically present everywhere).

So I’m thinking that I might want to build a custom install with most tools stripped out and then install only what I need. Thankfully with xbps it’s easy to uninstall even parts of the base-system (you just add an ignorepkg entry in an xbps.d conf file), so it should be possible to strip out even runit and coreutils and install busybox instead. I just don’t think anyone has even tried to go that far. If I do that, I’ll basically be creating my own flavor of the distro (not a hard fork, or even a soft fork, since all the tools and whatnot still point to the same repos).
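The xbps.d side of that is just a conf file with one ignorepkg line per package you want to be able to remove, e.g. (the package names here are only illustrative):

# /etc/xbps.d/ignore.conf
ignorepkg=coreutils
ignorepkg=runit-void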

Am I willing to go that far? Maybe… I’ve installed s6-linux-init (outside of the repos) on void and I am planning to do the same for the nfs template (i.e. the one that all the VMs would be using). It wouldn’t be too out of character for me to do something insane like that just for stroking my mental illness.

Now it will all be a matter of motivation. I’ve felt pretty good lately, so maybe I’ll do it, who knows. The only worrying thing about it is that I’ll have to actually roll up my sleeves and start maintaining the s6 packages properly this time. Also, if I am to rebuild the OS around busybox, I’d need a separate stack of s6 services (most would still be shared, but for busybox I’d need to invoke things with other flags, e.g. for the ttys). I’ll probably need to look really deep into alpine to take inspiration on how it does things (which I’m sure will be a time sink).


I want to rant a bit on the nanokvm.

I’m thoroughly disappointed with how the OS side of things turned out. I love a busybox approach and I think it’s better than a full-blown distro. I don’t mind the web server being http-only (you can just put an https reverse proxy in front of it anyway). I don’t even mind the piss-poor security, like not hashing and salting passwords, as I always use random passwords anyway (and it doesn’t matter much if this thing sits in a completely segregated network, only accessible through a VPN running on a different, trusted device - maybe the same place the reverse proxy runs).

But the biggest disappointment is absolutely the proprietary serial number-based, custom compiled .so library. What the hell were Sipeed devs / management thinking? If it’s an open source product, they shouldn’t care if the software’s running on different hardware than what they sell. And if people buy the hardware from them, they make money anyway, so they shouldn’t care about bundling the serial number in the software stack.

The custom DNS is egregious, but not uncommon (unfortunately). They probably got that idea from somewhere else on the internet. Connecting to a server in China to download updates, while sketchy, isn’t entirely a bad thing. They can later change the repos to a cloudflare CDN, so you’re connecting to a server in Europe (the “source of truth” as far as the software goes is still going to come from China, so that’s what you sign up for anyway).

But the fact that they haven’t updated the github releases, and that they ship new software only on the higher-end model, without a download link for the updated software for the base model, is stupid.

I wasn’t sure if I wanted to buy a nanoKVM or a jetKVM (or any network KVM at all, for that matter), but unless something changes in the next few months, I won’t be buying or recommending the nanoKVM. I really want to be proven wrong and for Sipeed to show it’s committed to FOSS, but I won’t hold my breath.

However, if people manage to port Pi-KVM to the nanoKVM, I’d be using that (even if the stack is a bit more bloated - it does have quite a bit more features). But I find that unlikely, given what was said in the video: v4l2 isn’t supported by the RISC-V SoC, so they have to use that proprietary .so (which probably contains vendor-specific proprietary junk, to get access to the hdmi input).

I don’t like this situation. There are some scenarios where I want remote access to my home to manage things that have poor software remote control support, so using hardware to get around that would be nice - stuff like plugging a phone into a computer and using KDE Connect to control it. Yes, I could use XFCE or LXQt with XRDP or VNC, but they aren’t that great. Alternatively, I could use an rpi 4 with android and manage it through a network KVM directly (but the state of running android straight on a computer like that ain’t great).
