Before I dive into that rabbit hole and get myself cancelled from this forum for my ZFS heresies, I need to vent my frustration with the Xen hypervisor.
The beginnings were really nice. Initial setup, ability to run dom0 directly or virtualized. EFI integrations. All seemed very nice.
Some problems started when I tried installing OPNsense on an EFI HVM. I did everything by the book, could get into UEFI and even into the bootloader. But no matter how hard I tried, I couldn’t get it to boot properly; it would always stop at the same spot. After some time troubleshooting, I could get it to boot a bit further, but for some reason with no inputs. The same would happen after swapping the ISO to “archiso”: inputs in EFI are OK, but not in the booted Linux. Well, sucks.
It turned out, I could get it to boot properly by changing machine type from EFI to default (BIOS). So for my first “look around” I just went with it and installed it in a BIOS HVM. I needed another VM for tests which ended up being a PV arch instance. Cool, I had brought up an HVM, a PV, and had some experience with debugging the setup.
Now, yesterday I started testing my recorded procedure of (re)setting up the whole software stack on the host. Everything was running smoothly, until at one point qemu-xen started throwing unknown opcode errors out of the blue. It took me a long time to realize what was going on: my “host” (dom0 in Xen speak)… doesn’t list AVX as supported when running under Xen. That was obviously not the case when running without the hypervisor, so I started digging. After a fair number of useless leads I finally stumbled upon the cause and the solution at the same time:
https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#spec-ctrl-arm
Specifically this fragment:
On all hardware, the gds-mit= option can be used to force or prevent Xen from mitigating the GDS (Gather Data Sampling) vulnerability. By default, Xen will mitigate GDS on hardware believed to be vulnerable. On hardware supporting GDS_CTRL (requires the August 2023 microcode), and where firmware has elected not to lock the configuration, Xen will use GDS_CTRL to mitigate GDS with. Otherwise, Xen will mitigate by disabling AVX, which blocks the use of the AVX2 Gather instructions.
“Well, sucks,” I thought, adding the proper disable option to the command line. I can accept some performance hit in the name of security, but not in the form of disabling AVX outright.
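For anyone chasing the same symptom, a quick way to confirm it is to compare the CPU flags dom0 sees on bare metal and under Xen. Below is a minimal sketch in Python, assuming an x86 Linux dom0 where /proc/cpuinfo has a "flags" line; the function names are made up for illustration and are not part of any Xen tooling.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. not an x86 Linux cpuinfo dump).
    return set()


def missing_avx(flags: set[str]) -> list[str]:
    """List which of avx/avx2 are absent from the given flag set."""
    return [name for name in ("avx", "avx2") if name not in flags]
```

Feeding it the contents of /proc/cpuinfo once when booted directly and once when booted under Xen should show whether the hypervisor is hiding AVX; an empty list from missing_avx means both flags are visible.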
Right after fixing that I stumbled upon another issue, where trying to activate an SR-IOV Virtual Function on a NIC would fail with a very generic error. This time it turned out to plague both direct Linux and the Xen virtualized dom0, so the fix was not necessarily Xen-centric. It turned out I had to add a specific Linux kernel option that I found in one thread, on one forum. Not only did it take me a fair amount of time, but the euphoria from solving the riddle was rather short-lived, because it turned out I couldn’t even pass through the virtual function NIC to the target VM, which was supposed to be… the new version of the HVM OPNsense.
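For context, the usual way to enable Virtual Functions on Linux is through the sriov_totalvfs and sriov_numvfs files in sysfs. The following is a rough sketch assuming that interface; the helper names and the "eth0" interface in the usage note are hypothetical, and it says nothing about the kernel option mentioned above.

```python
from pathlib import Path


def sriov_status(iface: str, sysfs_root: str = "/sys/class/net") -> dict[str, int]:
    """Read how many VFs the NIC supports and how many are currently enabled."""
    device = Path(sysfs_root) / iface / "device"
    return {
        "total": int((device / "sriov_totalvfs").read_text()),
        "enabled": int((device / "sriov_numvfs").read_text()),
    }


def enable_vfs(iface: str, count: int, sysfs_root: str = "/sys/class/net") -> None:
    """Request `count` VFs, resetting to 0 first if some are already enabled."""
    status = sriov_status(iface, sysfs_root)
    if count > status["total"]:
        raise ValueError(f"{iface} supports at most {status['total']} VFs")
    numvfs = Path(sysfs_root) / iface / "device" / "sriov_numvfs"
    # The kernel refuses to go from one non-zero count to another directly,
    # so drop back to 0 before writing the new value.
    if status["enabled"] != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(count))
```

Something like enable_vfs("eth0", 4) needs root, and on a misconfigured host the write is where the generic error shows up, so checking sriov_status first at least rules out a NIC that reports zero supported VFs.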
I spent another hour or two trying to figure out why I couldn’t pass the card, only to realize I couldn’t actually pass any PCI device to the HVM. I spun up a Linux PV and verified that passthrough was working there. OK, progress. I finished setting up the Linux VM and went back to OPNsense (which is FreeBSD under the hood, BTW). I started working towards running it as a PV, but for some reason I couldn’t get pygrub to recognize the root and find the kernel. After wasting another unspecified amount of time I settled for running it as a PVH with an extracted kernel (n.b. I can currently only extract it because apparently my host kernel has been compiled with read-only support for UFS…). Which worked!
…
But then I got an error that PCIe passthrough is not supported on PVHs… (╯°□°)╯︵ ┻━┻
…
The PV kernel panicked on me twice, I think; I don’t remember anymore. After reading the FreeBSD docs, I’m no longer sure whether it’s supposed to work at all.
I need to sleep on it, it’s past 4AM again and I tried really hard to be done by 2. Writing this rant for another hour doesn’t help.