Marandil's Homelab evolution

Got it mostly populated tonight:


Even the rad “fits” with cover popped open :laughing:

I’m leaving the top slot (x16) unused for now, all the other lanes are occupied.

I’m debating whether to put the HBA or GPU in the middle or bottom slot. For now I went with the GPU in the middle (PCH) and the HBA in the bottom (CPU). With M.2s in use both slots are x4, but I think the HBA will benefit more from lower latency, while the GPU is there just because the motherboard complains if it’s run without any output :frowning:. I might have to do something about it.

Once again it’s 4AM and I’m not sleeping yet. But it’s time. I’ll deal with the rad placement tomorrow.

1 Like

Knowing the angle’s there will haunt me forever :laughing:


Now I need to 3D-print some supports to keep it in place.

I still need to have a long and noisy session with UEFI to configure the fans. I put all of them in “Silent” mode and the two at the rear (above IO) still run at over 3600 RPM with no load, and they get very noisy every time the CPU picks up any task: they spike to ~5000 RPM for a second and then drop back to 3400-3600 immediately afterwards.

I also need to monitor temps as this may be caused by bad mount and the CPU getting thermal spikes.

In other news, having the original fan wall running at 100% PWM directly onto the radiator was the first time I saw the CPU (reminder: i9-7900X) running at 22ºC idle (mobo panicked because I didn’t plug anything into the CPU fan header and put all the fans at full speed). It was literally at room temp, or 1-2ºC above (I’m not sure what the actual room temp was at that time).

Got the rad mounted with 3D-printed supports:

I also had to put a 120mm fan on top of the AICs because the NICs were getting hot - my guess is the original 80mm fans above the CPU were pulling too much air, so there was almost no airflow over the PCIe area. This is something I need to look into. I may also need to replace the Eiswind fans with something with a bit more oomph, or tweak RPMs - I read somewhere that some “noisy” higher-airflow fans worked better and quieter at lower RPM than Noctuas at full speed, so that’s worth investigating too.

Here’s the setup without the RGB puke from the motherboard:

1 Like

I had this experience recently with a passively cooled CPU in an ITX NAS build. The quiet fan that came with the case (Arctic F8) running at max speed (2000 RPM) could only keep the CPU around 80C. I put a ‘louder’ fan in (Arctic P8 Max, 5000 RPM max) but it’s only running at about half its rated speed and keeping the CPU in the mid-60s now. I can’t say I’ve noticed a difference in the noise, either (it’s sitting on a shelf in my office). Has the headroom if it needs it once summer rolls around, too.

1 Like

Thanks for sharing! The P8 Max would actually fit perfectly as the rear exhaust, but I would need to compare it to the San Ace 80s I have from the case. I may actually go for P12 Maxes at the rad as well.
Both options added straight to the wishlist :smiley:

I kinda do these blogs for myself, to have a known location for the stuff I’d otherwise write in a text file saved in some “known location” that I then forget when I need to recover it.

So, today I’m gonna reinstall the experimental configuration (more on that later), as I’m still deciding between a virtualized and a monolithic approach to the system, but for now, the long-awaited:

Flash storage inventory (2024-01-19)

For now I’m only gonna list flash-based storage, as that’s somewhat constant. As for rust, yesterday I went through a batch of 2nd-hand drives and found 3 of them more or less damaged, so I’m gonna have a chat with the seller once I finalize my findings. Meanwhile, here it goes:

M.2 NVMe

  • 3x Intel “Optane” H10 512G+32G; I can’t get the board to reliably recognize the 32G optane devices so I’m gonna stick to the 512G bits. In terms of GiBs that’s 476.9GiB.
    Additional note: I can only fit 2 in the system at the same time, because I need to use the PCH M.2 slots.
  • 2x Samsung SSD 970 EVO Plus 250G; lightly used. 232.9GiB.
  • 1x Samsung OEM 256G; harvested from a laptop that decided to incinerate itself [a sad story for another day]. 238.5GiB. Under the hood it appears to be the same controller as the 970 EVO+, just provisioned for 256G instead of 250G and configured slightly differently.
nvme id-ctrl diff
$ sudo nvme id-ctrl -H /dev/nvme5 > samsung-oem
$ sudo nvme id-ctrl -H /dev/nvme2 > samsung-evo
$ diff samsung-oem samsung-evo
4,6c4,6
< sn        : S4DXN*********
< mn        : SAMSUNG MZVLB256HBHQ-000L2
< fr        : 3L1QEXH7
---
> sn        : S4EUN*********
> mn        : Samsung SSD 970 EVO Plus 250GB
> fr        : 2B2QEXM7
112,113c112,113
< wctemp    : 357
<  [15:0] : 84 °C (357 K)       Warning Composite Temperature Threshold (WCTEMP)
---
> wctemp    : 358
>  [15:0] : 85 °C (358 K)       Warning Composite Temperature Threshold (WCTEMP)
121,122c121,122
< tnvmcap   : 256,060,514,304
< [127:0] : 256,060,514,304
---
> tnvmcap   : 250,059,350,016
> [127:0] : 250,059,350,016
142,143c142,143
< mntmt     : 321
<  [15:0] : 48 °C (321 K)       Minimum Thermal Management Temperature (MNTMT)
---
> mntmt     : 356
>  [15:0] : 83 °C (356 K)       Minimum Thermal Management Temperature (MNTMT)
148c148
< sanicap   : 0x2
---
> sanicap   : 0
152c152
<     [1:1] : 0x1       Block Erase Sanitize Operation Supported
---
>     [1:1] : 0 Block Erase Sanitize Operation Not Supported
200c200
< fna       : 0
---
> fna       : 0x5
202c202
<   [2:2] : 0   Crypto Erase Not Supported as part of Secure Erase
---
>   [2:2] : 0x1 Crypto Erase Supported as part of Secure Erase
204c204
<   [0:0] : 0   Format Applies to Single Namespace(s)
---
>   [0:0] : 0x1 Format Applies to All Namespace(s)
246c246
< ps      0 : mp:8.00W operational enlat:0 exlat:0 rrt:0 rrl:0
---
> ps      0 : mp:7.80W operational enlat:0 exlat:0 rrt:0 rrl:0
249c249
< ps      1 : mp:6.30W operational enlat:0 exlat:0 rrt:1 rrl:1
---
> ps      1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
252c252
< ps      2 : mp:3.50W operational enlat:0 exlat:0 rrt:2 rrl:2
---
> ps      2 : mp:3.40W operational enlat:0 exlat:0 rrt:2 rrl:2
255c255
< ps      3 : mp:0.0760W non-operational enlat:210 exlat:1200 rrt:3 rrl:3
---
> ps      3 : mp:0.0700W non-operational enlat:210 exlat:1200 rrt:3 rrl:3
258c258
< ps      4 : mp:0.0050W non-operational enlat:2000 exlat:8000 rrt:4 rrl:4
---
> ps      4 : mp:0.0100W non-operational enlat:2000 exlat:8000 rrt:4 rrl:4
  • 1x Samsung SSD 980 1TB; not a PRO, unfortunately. I got 2 PROs, but they are currently in use in other systems. Lightly used for write-once data. 931.5GiB.

SATA SSD

  • 2x Intel DC S4600 240G; different wear levels. 223.6GiB.
  • 1x Samsung SSD 860 QVO 2TB; not tortured. Also lived in my laptop. 1863GiB.
  • 1x SSDPR-CL100-960-G3; or the trusty old GOODRAM. 894.3GiB.

NVMe formatting

Unsurprisingly, neither of the drives supports >1 namespace, but I should still be able to underprovision with n1. For benchmarks:

$ sudo blkdiscard /dev/nvmeXn1

should suffice.
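To underprovision without namespaces, the simplest approach I know of is to trim the whole drive and then only ever allocate part of it, leaving the untouched tail as extra spare area. A sketch - device name and sizes are illustrative, and sgdisk could just as well be an interactive fdisk session like the one later in this thread:

$ sudo blkdiscard /dev/nvmeXn1            # trim the whole namespace first
$ sudo sgdisk -n 1:0:+190G /dev/nvmeXn1   # then partition only ~80% of it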

Assignments

I’m not yet sure what to do with all the drives. I’ll likely keep one H10 as a spare, unless I find a reliable way to have it enumerate, e.g. this time it decided to pop up:

nvme0n1             259:0    0 476.9G  0 disk                                INTEL HBRPEKNX0202AL   PHxxxx-1
nvme1n1             259:1    0 476.9G  0 disk                                INTEL HBRPEKNX0202AL   PHxxxx-1
nvme3n1             259:2    0  27.3G  0 disk              isw_raid_member   INTEL HBRPEKNX0202ALO  PHxxxx-2
└─md126               9:126  0     0B  0 md
nvme2n1             259:3    0 232.9G  0 disk                                Samsung SSD 970 EVO Pl 
nvme6n1             259:4    0 232.9G  0 disk                                Samsung SSD 970 EVO Pl 
nvme5n1             259:5    0 238.5G  0 disk                                SAMSUNG MZVLB256HBHQ-0 
nvme4n1             259:10   0 931.5G  0 disk                                Samsung SSD 980 1TB    

At some point I wanted to use the S4600s for a ZFS special vdev (metadata, ZIL or L2ARC), but I found a better use for them as the boot & VM drives in an MD RAID mirror. For now it works remarkably well (in testing).

The current setup has a total of 8 M.2 slots, with an x16 bifurcation card occupying the first x16 slot, so:

  • 6x CPU (x4 lanes, limit: x24)
  • 2x PCH (x4 lanes, limit: x4)

The PCH slots I occupy with the H10s, so they are not even limited by the slot width currently (each H10 half is x2), excluding other traffic through the PCH (e.g. SATA, VGA).
This leaves me with 6 CPU slots and 4-5 sticks to occupy them with. So for now I just populated all the slots.

Next time maybe: ZFS Sacrilege

1 Like

Before I dive into that rabbit hole and get myself cancelled from this forum for my ZFS heresies, I need to vent my frustration at the Xen hypervisor.

The beginnings were really nice. Initial setup, the ability to run dom0 directly or virtualized, EFI integrations. All seemed very promising.

Some problems started when I tried installing OPNsense in an EFI HVM. I did everything by the book, could get into UEFI and even into the bootloader. But no matter how hard I tried, I couldn’t get it to boot properly; it would always stop at the same spot. After some time troubleshooting, I could get it to boot a bit further, but for some reason with no inputs. The same would happen after swapping the ISO to “archiso” - inputs in EFI were OK, but not in the booted Linux. Well, sucks.

It turned out I could get it to boot properly by changing the machine type from EFI to the default (BIOS). So for my first “look around” I just went with it and installed it in a BIOS HVM. I needed another VM for tests, which ended up being a PV Arch instance. Cool - I had brought up an HVM and a PV, and had some experience with debugging the setup.

Now, yesterday I started testing my recorded procedure of (re)setting up the whole software stack on the host. Everything was running smoothly, until at one point qemu-xen started throwing unknown opcode errors out of the blue. It took me a long time to figure out what was going on, until I finally realized that my “host” (dom0 in Xen speak)… doesn’t list AVX as supported when running under Xen. That was obviously not the case when running without the hypervisor, so I started digging. After a fair amount of useless leads I finally stumbled upon the cause and the solution at the same time:
https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#spec-ctrl-arm

Specifically this fragment:

On all hardware, the gds-mit= option can be used to force or prevent Xen from mitigating the GDS (Gather Data Sampling) vulnerability. By default, Xen will mitigate GDS on hardware believed to be vulnerable. On hardware supporting GDS_CTRL (requires the August 2023 microcode), and where firmware has elected not to lock the configuration, Xen will use GDS_CTRL to mitigate GDS with. Otherwise, Xen will mitigate by disabling AVX, which blocks the use of the AVX2 Gather instructions.

Well, sucks, I thought, adding the proper disable option to the command line. I can accept some performance hit in the name of security, but not in the form of outright disabling AVX.
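For reference, this is the fragment that ended up on my Xen options line (the full /boot/xen.cfg is in a later post) - only use it if you accept running without the GDS mitigation:

    options=... spec-ctrl=gds-mit=no ...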

Right after fixing that I stumbled upon another issue, where trying to activate an SR-IOV Virtual Function on a NIC would fail with a very generic error. This time it turned out to plague both direct Linux and Xen-virtualized dom0, so the fix was not necessarily Xen-centric. It turned out I had to add a specific Linux kernel option that I found in one thread, on one forum. Not only did it take a fair amount of time, but the euphoria from solving the riddle was rather short-lived when it turned out I can’t even pass the virtual-function NIC through to the target VM, which was supposed to be… the new version of the HVM OPNsense.
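For the record, the VF activation itself is just the usual sysfs dance - a sketch, assuming the kernel option in question was pci=realloc (mentioned again in a later post); the port name is from my system and the VF count is illustrative:

$ echo 4 | sudo tee /sys/class/net/enp23s0f4/device/sriov_numvfs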

I spent another hour or two trying to figure out why I can’t pass the card through, only to realize I can’t actually pass any PCI device to the HVM. I spun up a Linux PV and verified that passthrough was working there. OK, progress. I finished setting up the Linux VM and went back to OPNsense (which is FreeBSD under the hood, BTW). I started working towards running it as a PV, but for some reason I couldn’t get pygrub to recognize root and find the kernel. After wasting another unspecified amount of time I settled for running it as a PVH with an extracted kernel (n.b. I can currently only extract it because apparently my host kernel has been compiled with read-only support for UFS…), which worked!

But then I got an error that PCIe passthrough is not supported on PVHs… (⁠╯⁠°⁠□⁠°⁠)⁠╯⁠︵⁠ ⁠┻⁠━⁠┻

The PV kernel panicked on me twice, I think - I don’t remember anymore. Going by the FreeBSD docs, I’m no longer sure whether it’s supposed to work or not.

I need to sleep on it, it’s past 4AM again and I tried really hard to be done by 2. Writing this rant for another hour doesn’t help.

4 Likes

A new set of fans arrived this week:

Thanks @Molly for recommending the Pn MAX series, really like the blade design on those.

I ended up going with their server fans for some reason instead of the P8s, though. For now they seem to be better at being quiet than the San Aces, but in an A/B test at “normal” RPM I wasn’t sure which was which.

Didn’t test them too much yet, barely finished replacing 3x Eiswind with 5x P12 MAX:

The P14 Slim is going to help cool the AICs, I still need to print a mounting bracket though:


On the software side I’m still in a bind, because the original plan (detailed below) is not going to work, sadly. I think. I didn’t manage to get PCIe passthrough to an HVM working, and I couldn’t get OPNsense to run as a PV, even after installing the Xen “additions”. Tough luck.

The original plan was to get a Xen hypervisor and a lean Arch distribution (or something different, but highly customizable, none of that Ubuntu crap) to serve as a minimal dom0 (for those unfamiliar with Xen speak, the privileged VM), plugged only into a “management LAN” with dedicated physical connections, and with only on-demand internet access. Working hostname for the dom0: “vmserver”.

Alongside the dom0 there were supposed to be at least 3 different virtual machines (domUs) running on the server:

  • A dedicated router OS, most likely OPNsense, to manage the 10G NICs and switching on them; working hostname: “opnsense”
  • A storage server, likely another Arch installation but I wasn’t hell-bent on that. ZFS management, storage passthrough (HBA + NVMes), NAS servers (SMB, NFS, iSCSI), this kind of stuff; working hostname “zfserver”
  • A services server. All the other junk that I want/need to run, like the internal certbot, pihole, lancache, etc. The only VM allowed to run somewhat unvetted software in containers (like the pihole or lancache); working hostname: “svcshost”

As you might have noticed, all the hostnames are 8-letters long. The word “hostname” also has 8 letters. Coincidence? ( ͡° ͜ʖ ͡° )

Initially I wanted the domUs to have direct access to the router VM bypassing dom0 for additional separation, but that seems to be impossible, at least for now. The closest thing I have achieved was VF passthrough from the NICs, as the dom0 can stay disconnected from those, and likely this is going to be the way forward.

Ideal network separation diagram

+------------------------------------------------------------+
| Xen Hypervisor                                             |
|                                                            |
|+----------------+                                          |
|| opnsense [nic0]+-------------------------------------[nic0]
||          [nic1]+-------------------------------------[nic1]
||          [mgmt]+---------+                                |
||          [vif0]+----+    |                                |
||          [vif1]+--+ |    |                                |
|+----------------+  | |    |                                |
|                    | |    |                                |
|+----------------+  | |    |                                |
|| zfserver [vif0]+--+ |    |                                |
||          [mgmt]+----)--+ |                                |
|+----------------+    |  | |                                |
|                      |  | |+------------------------------+|
|+----------------+    |  | ++[vif0]-----+        vmserver  ||
|| svcshost [vif0]+----+  +--+[vif1]-----+-[mgmt-lan]       ||
||          [mgmt]+----------+[vif2]-----+--------------[eth0]
|+----------------+          +------------------------------+|
+------------------------------------------------------------+

Non-ideal (SR-IOV based) network separation

+------------------------------------------------------------+
| Xen Hypervisor                                             |
|                                                            |
|+------------------+                                        |
|| opnsense [n0p0v0]+--------[passthrough]------------[n0p0v0]
||          [n0p1v0]+--------[passthrough]------------[n0p1v0]
||          [n0p2v0]+--------[passthrough]------------[n0p2v0]
||          [n0p3v0]+--------[passthrough]------------[n0p3v0]
||          [n1p0v0]+--------[passthrough]------------[n1p0v0]
||          [n1p1v0]+--------[passthrough]------------[n1p1v0]
||          [n1p2v0]+--------[passthrough]------------[n1p2v0]
||          [n1p3v0]+--------[passthrough]------------[n1p3v0]
||            [mgmt]+-------+                                |
|+------------------+       |                                |
|                           |                                |
|+----------------+         |                                |
|| zfserver [vif0]+---------)----------[passthrough]--[n1p3v1]
||          [mgmt]+-------+ |                                |
|+----------------+       | |                                |
|                         | |+------------------------------+|
|+----------------+       | ++[vif0]-----+        vmserver  ||
|| svcshost [vif0]+--+    +--+[vif1]-----+-[mgmt-lan]       ||
||          [mgmt]+--)-------+[vif2]-----+--------------[eth0]
|+----------------+  |       +------------------------------+|
|                    +-----------------[passthrough]--[n1p3v2]
+------------------------------------------------------------+

The ideal layout assumes a full NIC can be passed; I’m not 100% sure that’s the case, but I didn’t try passing all 7 functions at once (4x PFs for the ports, 1x general NIC at .4, and 2 storage offloading functions). The non-ideal layout assumes SR-IOV passthrough of all the ports to the router + inter-domain connection via the last Virtual Port - all n1p3v* act as if they were connected to the same network.

Except for OPNsense, all the other VMs could have been running as PVs, as they are Linux. In theory I could just go with some Linux-based router OS like VyOS - the list is long. Which prompts the…

Solution #1 - Linux-based Router OS in PV

Since PVs work so far, this solution seems like the obvious choice. I do, however, have some reservations. What if that’s not the only broken thing I’m about to encounter in Xen? What if it’s another piece of the puzzle that’s broken? We’ve seen pfSense running in xcp-ng, and that’s also Xen, right?

If I go that route I’ll probably choose VyOS. Going with Linux has the additional benefit of drivers - while the kernel is a mess at times, the community and corporate support here seems to be above what FreeBSD can offer.

Solution #2 - Dedicated hypervisor distribution

See previous point - we’ve seen pfSense running as an xcp-ng VM, at least on the Son of the Forbidden Router IIRC. With passthrough. Which means it should be possible. I should probably at least try to set it up and check whether similar problems persist.
I’m not a huge fan of those dedicated hypervisor distributions, because they all hide the details and do things “their way” as opposed to “my way”. Even something as dumb-simple as virt-manager can be a pain to work with the moment you have to do something outside the box.

Just so we’re clear, I have never used xcp-ng and I’m not dissing it; I just have a feeling it’s not going to be my cup of tea. And if you think about suggesting Proxmox…

Solution #3 - Just go KVM

… I’d rather just stay with the original plan, but switch to KVM and libvirt, as I already have a ton of experience with them. I just really wanted to go with Xen initially for the added separation of dom0, but the more I work with it, the fewer differences from KVM I see. For instance, I assumed I wouldn’t even have to deal with “passthrough” and would be able to just assign hardware to VMs, sort of like I do vCPUs or memory; but at least from the tutorials and documentation I went through, I see no way to do it properly.

Side note: It appears to still be possible though, see e.g. this slideshow.

Solution #4 - Ah, screw it! (go monolithic)

If all else fails, I could just go monolithic instead. All the separation and security is gone, but at least it works, right? Right?


For now I’m pretty undecided, so I’m ready to receive feedback.

Edit: sorry for the typos, my eyes hurt already :disappointed:. Just re-read the post on my phone and corrected some, but there may still be more left.

4 Likes

Perfectly fine reservations. Maybe Alpine with Xen has better support. You’d also get a more minimal and more stable distro.

https://wiki.alpinelinux.org/wiki/Xen_Dom0

I find that the FreeBSD and OpenBSD communities can offer decent support. Normally for routers (and firewalls) I’d go openbsd, unless you really need all the juice you can squeeze out, so you go freebsd (or there are really no drivers for openbsd).

You just edit the XML. You can do so much with virt-manager, as it’s using bare basic qemu and libvirt, like basically all other major systems. OpenStack, OpenNebula, oVirt, virt-manager, probably more - all use libvirt. Proxmox uses qemu, but not libvirt (they have their own utilities). People got passthrough working on it, but I have no clue how you’d do it (the qemu-server conf for Proxmox is not the same XML stuff as libvirt’s).
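For illustration (my own minimal example, not taken from any of the systems above), passing a PCI device in a libvirt domain XML is just a hostdev entry; the address values are placeholders:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0xb3' slot='0x00' function='0x0'/>
      </source>
    </hostdev>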

Yeah… The XML we got from virt-manager was so bad we basically had to rewrite it from scratch. I don’t remember all the details, but the PCIe tree barely had any slots left.

I remember having “editing HTML in word in early 2000s” flashbacks :smiley:

Wow, fun to see someone struggling with similar things that I am because of a similar design!

But no matter how hard I tried, I couldn’t get it to boot properly, it would always stop at the same spot

Yo! This was so frustrating - I would xl create and see xl top showing 100% CPU usage with no output. And yeah, BIOS mode seemed to solve it, if I didn’t also…

while realizing I can’t actually pass any PCI device to the HVM

do that too. I got the same 100% CPU usage there too. I was wondering what flags or settings we are missing. Did you use a kernel with pciback compiled in? I haven’t tried that yet, just as a module.

I started working towards running out as a PV, but for some reason I couldn’t get pygrub to recognize root and find the Kernel. After wasting another unspecified amount of time I settled for running it as a PVH with an extracted Kernel. Which worked!

Based on my reading, PV is very much not supported for 64-bit FreeBSD, and I couldn’t find much information on PVH. How did you extract the kernel boot arguments for OPNsense? I couldn’t make much sense of the magic loader.conf, and my attempts to chainload via the loader also failed.

Solution #3 - Just go KVM

I tried this, and was able to go EFI! But I got a lot of core dumps starting up, and weird 500 errors when managing OPNsense via the web interface. Those required reboots and service restarts, respectively, so I’ve temporarily given up on OPNsense until I can fix either KVM or Xen.

Solution #2 - Dedicated hypervisor distribution

XCP-ng gave me 100% CPU usage when I passed through my PCIe NICs too

Solution #1 - Linux-based Router OS in PV

I have been tempted in this direction because of all of this, but I haven’t yet found a reasonable web- or GUI-manageable solution. If you find something, I’d love to know too

I just really wanted to go with Xen initially for the added separation of dom0, but the more I work with it, the less differences from KVM I see.

Exactly why I tried to use xen too!

Just started building the kernel; we’ll see.
I ticked all the boxes listed here + marked pciback as * instead of M.

I suspect this may be an issue of some kernel flags, because I initially couldn’t even turn on VFs on the NICs until I added pci=realloc to kernel command line. I think.

In 2010 their documentation listed this as a bug, so I assumed something might have changed in the last (checks notes) 13 years since Dec. 17, 2010. The “current” version (from 2015) doesn’t have this annotation anymore, it just says:

As of this release, Xen PV DomU support is not heavily tested; instability has been reported during VM migration of PV kernels.

As for PVH support and also kernel params, I used this: FreeBSD PVH - Xen

Some more reading on PVH and the spectrum:
https://xenbits.xen.org/docs/4.6-testing/misc/pvh.html
https://wiki.xenproject.org/wiki/Understanding_the_Virtualization_Spectrum

I want to test VyOS, but idk if they have a “nice” GUI interface. CLI is fine for me personally, but I understand it might not be fine for you.

In 2010 their documentation listed this as a bug, so I assumed something might have changed in the last (checks notes) 13 years since Dec. 17, 2010. The “current” version (from 2015) doesn’t have this annotation anymore, it just says:

As of this release, Xen PV DomU support is not heavily tested; instability has been reported during VM migration of PV kernels.

As for PVH support and also kernel params, I used this: FreeBSD PVH - Xen

Interesting, I saw that DomU Support for Xen - Xen linked to [base] Revision 282274 where PV was removed in 2015

It’s very frustrating how out of date the documentation is considering how active Xen is as a project!

It works!


Still no interface though, but at least the passthrough worked.
Custom kernel. Will post instructions later.
Top looks reasonable:

2 Likes

I think in the meantime I have finally figured out how to create direct links between domains. The keywords seem to be “driver domain” and “backend” in the “vif” specification, cf:

https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Other-Options
https://xenbits.xen.org/docs/unstable/man/xl-network-configuration.5.html

I have not tested that yet, but the setup would require setting up a “driver domain” VM, say “dom1”, and then specifying vif=...,backend=dom1 for dom2, which should create a link between dom1 and dom2 with the backend driver in dom1. This is purely theoretical and I have not seen any guide for it yet.
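A sketch of what I imagine the dom2 config fragment would look like (untested, based only on the xl-network-configuration docs above; the MAC and bridge name are placeholders, and dom1 would have to be up and running the netback driver first):

    # in dom2's xl config; dom1 acts as the network driver domain
    vif = [ 'mac=00:16:3e:00:00:02,bridge=br0,backend=dom1' ]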


Meanwhile I’m actually seriously considering having all the switching done either in a completely separate VM (even from opnsense) or in dom0. The reason being, I can’t find reliable info on whether the FreeBSD Chelsio driver can handle switching at the hardware level, or whether it does all of it in software. The Linux manuals for the drivers are much more complete on that front, including OVS offloading etc.
I also noticed something I didn’t previously: OPNsense actually doesn’t really like having more than one LAN interface, and you have to set up the bridge manually. So given I currently have 2 NICs, each with switching offloading, I could probably be better off configuring the switching on dom0 - either bridging the two at dom0 or at the router domain - and passing a single VF from either card (so 2 total) to each VM. Why? Because that way I can have them use the offloading features on the corresponding interface.

P.S. It would be ideal if the driver could do switching by DMA between the cards, but I have no idea if that’s even possible.
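If I go the dom0-bridge route, the dom0 side could be as simple as a systemd-networkd bridge enslaving one port from each card - a sketch under that assumption (interface names are from my earlier ip addr output, purely illustrative):

    # /etc/systemd/network/25-xenbr0.netdev
    [NetDev]
    Name=xenbr0
    Kind=bridge

    # /etc/systemd/network/25-xenbr0-uplinks.network
    [Match]
    Name=enp23s0f4 enp24s0f4

    [Network]
    Bridge=xenbr0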

I’m about to rebuild the system once again to make sure my instructions are more or less complete and I’m not missing anything important. Like I just realized I forgot to add a peculiar module_blacklist to the ArchISO command line.

1 Like

(My) Arch Xen setup (as of 2024-02-01):

Let me know if I should post some parts of this “guide” edited somewhere else.
Comments and suggestions are welcome.

Start & identification

  1. Run Arch ISO, connect mobo non-IPMI Ethernet
  • I have this weird kernel panic caused by csiostor so I blacklist it with module_blacklist=csiostor in GRUB parameters. This has to be applied on every Live ISO boot (until fixed).
  2. ip addr
ip addr
root@archiso ~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    altname enp0s31f6
3: enp23s0f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.1.15/24 metric 100 brd 192.168.1.255 scope global dynamic enp23s0f4
       valid_lft 86122sec preferred_lft 86122sec
4: enp23s0f4d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
5: enp23s0f4d2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
6: enp23s0f4d3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
7: enp24s0f4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
8: enp24s0f4d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
9: enp24s0f4d2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
10: enp24s0f4d3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
11: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
  3. For SSH setup instead of local, s.t. commands can be easily copied:
    1. passwd; systemctl status sshd
    2. ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@[archiso ip addr]
  4. lspci -vvt, identify hardware, check if anything’s missing
lspci -vvt
root@archiso ~ # lspci -vvt
-+-[0000:00]-+-00.0  Intel Corporation Sky Lake-E DMI3 Registers
 |           +-04.0  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.1  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.2  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.3  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.4  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.5  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.6  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-04.7  Intel Corporation Sky Lake-E CBDMA Registers
 |           +-05.0  Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers
 |           +-05.2  Intel Corporation Sky Lake-E RAS
 |           +-05.4  Intel Corporation Sky Lake-E IOAPIC
 |           +-08.0  Intel Corporation Sky Lake-E Ubox Registers
 |           +-08.1  Intel Corporation Sky Lake-E Ubox Registers
 |           +-08.2  Intel Corporation Sky Lake-E Ubox Registers
 |           +-14.0  Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
 |           +-14.2  Intel Corporation 200 Series PCH Thermal Subsystem
 |           +-16.0  Intel Corporation 200 Series PCH CSME HECI #1
 |           +-17.0  Intel Corporation 200 Series PCH SATA controller [AHCI mode]
 |           +-1b.0-[01]----00.0  Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier]
 |           +-1b.4-[02]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM]
 |           |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6450 / 7450/8450/8490 OEM / R5 230/235/235X OEM]
 |           +-1c.0-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8822BE 802.11a/b/g/n/ac WiFi adapter
 |           +-1c.1-[04]----00.0  ASMedia Technology Inc. ASM1062 Serial ATA Controller
 |           +-1c.4-[05]----00.0  ASMedia Technology Inc. ASM2142/ASM3142 USB 3.1 Host Controller
 |           +-1c.6-[06]----00.0  ASMedia Technology Inc. ASM2142/ASM3142 USB 3.1 Host Controller
 |           +-1d.0-[07]----00.0  Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier]
 |           +-1f.0  Intel Corporation X299 Chipset LPC/eSPI Controller
 |           +-1f.2  Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
 |           +-1f.3  Intel Corporation 200 Series PCH HD Audio
 |           +-1f.4  Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
 |           \-1f.6  Intel Corporation Ethernet Connection (2) I219-V
 +-[0000:16]-+-00.0-[17]--+-00.0  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.1  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.2  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.3  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.4  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.5  Chelsio Communications Inc T540-BT Unified Wire Storage Controller
 |           |            \-00.6  Chelsio Communications Inc T540-BT Unified Wire Storage Controller
 |           +-02.0-[18]--+-00.0  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.1  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.2  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.3  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.4  Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller
 |           |            +-00.5  Chelsio Communications Inc T540-BT Unified Wire Storage Controller
 |           |            \-00.6  Chelsio Communications Inc T540-BT Unified Wire Storage Controller
 |           +-05.0  Intel Corporation Sky Lake-E VT-d
 |           +-05.2  Intel Corporation Sky Lake-E RAS Configuration Registers
 |           +-05.4  Intel Corporation Sky Lake-E IOxAPIC Configuration Registers
 |           +-08.0  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.1  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.2  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.3  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.4  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.5  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.6  Intel Corporation Sky Lake-E CHA Registers
 |           +-08.7  Intel Corporation Sky Lake-E CHA Registers
 |           +-09.0  Intel Corporation Sky Lake-E CHA Registers
 |           +-09.1  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.0  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.1  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.2  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.3  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.4  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.5  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.6  Intel Corporation Sky Lake-E CHA Registers
 |           +-0e.7  Intel Corporation Sky Lake-E CHA Registers
 |           +-0f.0  Intel Corporation Sky Lake-E CHA Registers
 |           +-0f.1  Intel Corporation Sky Lake-E CHA Registers
 |           +-1d.0  Intel Corporation Sky Lake-E CHA Registers
 |           +-1d.1  Intel Corporation Sky Lake-E CHA Registers
 |           +-1d.2  Intel Corporation Sky Lake-E CHA Registers
 |           +-1d.3  Intel Corporation Sky Lake-E CHA Registers
 |           +-1e.0  Intel Corporation Sky Lake-E PCU Registers
 |           +-1e.1  Intel Corporation Sky Lake-E PCU Registers
 |           +-1e.2  Intel Corporation Sky Lake-E PCU Registers
 |           +-1e.3  Intel Corporation Sky Lake-E PCU Registers
 |           +-1e.4  Intel Corporation Sky Lake-E PCU Registers
 |           +-1e.5  Intel Corporation Sky Lake-E PCU Registers
 |           \-1e.6  Intel Corporation Sky Lake-E PCU Registers
 +-[0000:64]-+-01.0-[65]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
 |           +-02.0-[66]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
 |           +-03.0-[67]----00.0  Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier]
 |           +-05.0  Intel Corporation Sky Lake-E VT-d
 |           +-05.2  Intel Corporation Sky Lake-E RAS Configuration Registers
 |           +-05.4  Intel Corporation Sky Lake-E IOxAPIC Configuration Registers
 |           +-08.0  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-09.0  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0a.0  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0a.1  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0a.2  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0a.3  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0a.4  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0a.5  Intel Corporation Sky Lake-E LM Channel 1
 |           +-0a.6  Intel Corporation Sky Lake-E LMS Channel 1
 |           +-0a.7  Intel Corporation Sky Lake-E LMDP Channel 1
 |           +-0b.0  Intel Corporation Sky Lake-E DECS Channel 2
 |           +-0b.1  Intel Corporation Sky Lake-E LM Channel 2
 |           +-0b.2  Intel Corporation Sky Lake-E LMS Channel 2
 |           +-0b.3  Intel Corporation Sky Lake-E LMDP Channel 2
 |           +-0c.0  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0c.1  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0c.2  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0c.3  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0c.4  Intel Corporation Sky Lake-E Integrated Memory Controller
 |           +-0c.5  Intel Corporation Sky Lake-E LM Channel 1
 |           +-0c.6  Intel Corporation Sky Lake-E LMS Channel 1
 |           +-0c.7  Intel Corporation Sky Lake-E LMDP Channel 1
 |           +-0d.0  Intel Corporation Sky Lake-E DECS Channel 2
 |           +-0d.1  Intel Corporation Sky Lake-E LM Channel 2
 |           +-0d.2  Intel Corporation Sky Lake-E LMS Channel 2
 |           \-0d.3  Intel Corporation Sky Lake-E LMDP Channel 2
 \-[0000:b2]-+-00.0-[b3]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller 980 (DRAM-less)
             +-01.0-[b4]----00.0  Hewlett-Packard Company Smart Array Gen8 Controllers
             +-03.0-[b5]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
             +-05.0  Intel Corporation Sky Lake-E VT-d
             +-05.2  Intel Corporation Sky Lake-E RAS Configuration Registers
             +-05.4  Intel Corporation Sky Lake-E IOxAPIC Configuration Registers
             +-12.0  Intel Corporation Sky Lake-E M3KTI Registers
             +-12.1  Intel Corporation Sky Lake-E M3KTI Registers
             +-12.2  Intel Corporation Sky Lake-E M3KTI Registers
             +-15.0  Intel Corporation Sky Lake-E M2PCI Registers
             +-16.0  Intel Corporation Sky Lake-E M2PCI Registers
             +-16.4  Intel Corporation Sky Lake-E M2PCI Registers
             \-17.0  Intel Corporation Sky Lake-E M2PCI Registers
  5. timedatectl
  6. lsblk -o+FSTYPE

Root on RAID1

Setting up root on a mirrored pair of DC S4600s (240GB each).
All default VMs will have their root hosted on a mirrored SATA SSD with good write endurance for logging.
In my setup, all VMs use thick-provisioned LVM volumes as their drives.
This also allows me to boot into them directly if needed.
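“Boot into them directly” just means adding a bootloader entry pointing at the VM’s LV. A hypothetical example, following the same pattern as the vmserver entry later in this guide and assuming the VM uses the stock hardened kernel:

    title    ZFServer Arch Linux - Direct
    linux    /vmlinuz-linux-hardened
    initrd   /intel-ucode.img
    initrd   /initramfs-linux-hardened.img
    options  root=/dev/vgroot/zfserver-root rw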

  1. Identify disks using lsblk or ls -l /dev/disk/by-id:
    root@archiso ~ # ls -l /dev/disk/by-id
    total 0
    lrwxrwxrwx 1 root root  9 Jan  4 17:43 ata-MK000240GWKVK_BTYM73830F0Q240AGN -> ../../sdb
    lrwxrwxrwx 1 root root  9 Jan  4 17:43 ata-MK000240GWKVK_BTYM7384027Z240AGN -> ../../sda
    
    If there is an md array / LVM volume group active on the drives, remove them first:
    • vgremove /dev/vgX (answer yes to all)
    • mdadm --stop /dev/mdX
  2. blkdiscard -f /dev/sdX for both drives
  3. fdisk /dev/sdX for both mirrors
    : g
    : n
      : 1
      : [default 2048]
      : +8G
      : (optional) Y (if asked for previous signature)
    : n
      : 2
      : [default]
      : [default]
      : (optional) Y (if asked for previous signature)
    : t
      : 1
      : 1 [EFI System]
    : t
      : 2
      : 43 [Linux RAID]
    : w
    
  4. mkfs.fat -F32 /dev/sdX1 for both mirrors
  5. mdadm --homehost=any --create /dev/md0 --verbose --level=1 --metadata=1.2 --raid-devices=2 --name=rootraid /dev/sda2 /dev/sdb2
  6. mdadm --detail /dev/md0
    mdadm --detail /dev/md0
    root@archiso ~ # mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Thu Feb  1 22:01:56 2024
            Raid Level : raid1
            Array Size : 225908736 (215.44 GiB 231.33 GB)
         Used Dev Size : 225908736 (215.44 GiB 231.33 GB)
          Raid Devices : 2
         Total Devices : 2
           Persistence : Superblock is persistent
    
         Intent Bitmap : Internal
    
           Update Time : Thu Feb  1 22:02:21 2024
                 State : clean, resyncing
        Active Devices : 2
       Working Devices : 2
        Failed Devices : 0
         Spare Devices : 0
    
    Consistency Policy : bitmap
    
         Resync Status : 2% complete
    
                  Name : any:rootraid
                  UUID : 6c2409d0:0d12ba02:5705e0cf:2c69beb6
                Events : 5
    
        Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       8       18        1      active sync   /dev/sdb2
    
  7. pvcreate /dev/md0
  8. vgcreate vgroot /dev/md0
  9. lvcreate -n vmserver-root -L 20G vgroot
  10. mkfs.ext4 -vL "vmserver-root" -b 4096 /dev/vgroot/vmserver-root

Arch system install

C.f. Installation guide - ArchWiki, more or less from the “1.11 Mount the file systems” step.

  1. mount /dev/vgroot/vmserver-root /mnt
  2. mount --mkdir /dev/sda1 /mnt/boot - sdb will be mirrored manually (see the sync sketch after this list); OPTIONAL: installation - Can the EFI system partition be RAIDed? - Ask Ubuntu
  3. pacstrap -K /mnt base linux-hardened linux-firmware
  4. genfstab -U /mnt >> /mnt/etc/fstab
  5. arch-chroot /mnt
  6. pacman -Sy nano vim zsh less sudo wget tmux htop iotop mdadm lvm2 efivar edk2-shell memtest86+-efi openssh bash-completion man-db
    If there are any other “essential” packages for you, add them here.
  7. nano /etc/makepkg.conf change:
    • CFLAGS="-march=native -mtune=native ..."
    • RUSTFLAGS="-C opt-level=2 -C target-cpu=native"
    • MAKEFLAGS="-j16" or w/e the core count
    • PACKAGER="Marcin Slowik <[email protected]>" or whoever you are :wink:
  8. systemctl enable sshd systemd-networkd systemd-resolved
  9. nano /etc/systemd/network/20-wired.network
    [Match]
    Name=en*
    
    [Network]
    DHCP=yes
    
  10. echo "vmserver" > /etc/hostname
  11. rm /etc/resolv.conf; ln -s /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
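The manual ESP mirroring mentioned in step 2 can be as simple as re-syncing the second partition after every kernel or bootloader update. A sketch, assuming sda1/sdb1 were partitioned identically earlier (note rsync is not part of the base install):

    mount --mkdir /dev/sdb1 /mnt/esp2
    rsync -rt --delete /boot/ /mnt/esp2/
    umount /mnt/esp2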

Locale & timezone

  1. ln -sf /usr/share/zoneinfo/CET /etc/localtime; hwclock --systohc - change to your timezone if needed, I prefer CET to e.g. Europe/Warsaw
  2. sed -r 's/#(en_US.UTF-8 UTF-8)/\1/' -i /etc/locale.gen
  3. echo "LANG=en_US.UTF-8" > /etc/locale.conf
  4. locale-gen

Admin users

We need a dedicated admin user for building Xen and the Linux kernel from source, because makepkg really doesn’t like it when you build something as root.

  1. passwd (in chroot)
  2. EDITOR=nano visudo, uncomment %sudo ALL=(ALL:ALL) ALL, save & exit
  3. groupadd sudo -g 32
  4. useradd -m admin -s /bin/bash -G adm,sudo,wheel,power,users, although I want to learn zsh one day, today’s still not the day :wink: . Set w/e shell and groups you like though.
  5. passwd admin

Install and configure bootloader

I’m using systemd-boot instead of GRUB; steps for GRUB will be slightly different. I consider systemd-boot to be good enough.

  1. Install microcode updates
    • sudo pacman -Sy intel-ucode or amd-ucode or both.
  2. nano /etc/mkinitcpio.conf
    • Add mdadm_udev and lvm2 in HOOKS=(... block mdadm_udev lvm2 filesystems ...)
  3. nano /etc/fstab, in the /boot entry change to fmask=0077,dmask=0077
  4. umount /boot; chmod 700 /boot; mount -a
  5. mkdir -p /boot/loader/entries
  6. nano /boot/loader/entries/20-vmserver-direct.conf
    title    VMServer Arch Linux - Direct
    linux    /vmlinuz-linux-hardened
    initrd   /intel-ucode.img
    initrd   /initramfs-linux-hardened.img
    options  root=/dev/vgroot/vmserver-root rw module_blacklist=csiostor add_efi_memmap intel_iommu=on iommu=pt pci=realloc
    
  7. nano /boot/loader/entries/30-memtest86+.conf
    title    Memtest86+ - EFI
    efi      /memtest86+/memtest.efi
    
  8. cp /usr/share/edk2-shell/x64/Shell.efi /boot/shellx64.efi
  9. nano /boot/loader/loader.conf
    default  @saved
    timeout  3
    
  10. bootctl install
  11. mkinitcpio -P
  12. exit; reboot → Try booting into arch
  13. ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null admin@[...] should work, if not check ip addr and then troubleshoot

Build & Install additional software

THIS IS THE MOST PECULIAR ISSUE I HAVE ENCOUNTERED SO FAR!

I tried building everything inside a chroot to avoid polluting the base system with devel packages, so I started with building paru (using makechrootpkg), then configured it with Chroot and used it to download, build and install xen - NO PROBLEM.

But when I try to build the Linux kernel, I encounter the following issue: neither of the interactive configuration tools works properly! And (my guess is) that’s because the terminal properties are not being properly forwarded down the chain into the chroot environment. The closest thing I could find was this:

How can I export an env var so makechrootpkg uses color when building / Creating & Modifying Packages / Arch Linux Forums.

At first, I thought the issue was the default .bashrc, as someone pointed out here. But no. I switched to the .bashrc I planned to download at a later stage anyway, reset it, and nothing!

I finally validated that in a fresh terminal (the broken nconfig/menuconfig can severely cripple the terminal) makepkg -s works after installing base-devel, but in the same terminal makechrootpkg is borked again. The most surprising thing for me is that NOBODY IS COMPLAINING ABOUT THIS, as if I were the only one affected by this issue. I believe there has to be a way to remedy it, but for the time being I’m just going to build the kernel outside a clean chroot, or modify the config file manually.

Last time I simply forgot about the whole makechrootpkg business and just built linux without the chroot.

Nevertheless, I need to redo some steps now and will try again tomorrow.

2 Likes

Optional: bashrc config

I use a global /etc/bash.bashrc config to have consistent base config between accounts, you can configure it however you like. Mine is a modified Debian/Ubuntu default from a few years ago.

  1. wget https://gist.githubusercontent.com/Marandil/2054fbc797b4613a19c22b22d769bdc2/raw/etc-bash.bashrc
  2. nano ~/.bashrc
    Comment out conflicting/duplicate entries, e.g.:
    • alias ls='ls --color=auto'
    • alias grep='grep --color=auto'
    • PS1='[\u@\h \W]\$ '
  3. sudo mv etc-bash.bashrc /etc/bash.bashrc
  4. sudo chown root:root /etc/bash.bashrc
  5. Relog or . /etc/bash.bashrc

Build & Install additional software (finally)

  1. Prepare temporary build environment:
    In my experience the current linux build (6.7.3) needs about 32G of storage. vmserver-root only has 20G total. The Optane part of an H10 is just under 32GiB (27.3GiB). With enough RAM we could build in a RAM-disk, but NVMe storage is also good enough.
    1. mkdir ~/build
    2. If using NVMe backing:
      1. sudo mkfs.ext4 /dev/nvme5n1 where nvme5 is a spare NVMe. Yes, no partitions, raw FS on full namespace is OK.
      2. sudo mount /dev/nvme5n1 ~/build
      If using RAM backing:
      • sudo mount -o size=40G -t tmpfs none ~/build
    3. cd ~/build
    4. mkdir chroot; mkarchroot chroot/root base-devel
  2. Install paru (AUR package manager):
    1. sudo pacman -Sy git devtools
    2. git clone https://aur.archlinux.org/paru.git; cd paru
    3. makechrootpkg -r ../chroot
    4. sudo pacman -U paru-*.pkg.tar.zst
    5. sudo nano /etc/paru.conf uncomment Chroot and LocalRepo
    6. sudo nano /etc/pacman.conf set CacheDir = /var/lib/repo/aur and append:
    [aur]
    SigLevel = PackageOptional DatabaseOptional
    Server = file:///var/lib/repo/aur
    
    7. paru -Sy paru should regenerate the repository s.t. pacman does not complain.
      When needed, the paru pkg is at /var/lib/repo/aur/paru-*-x86_64.pkg.tar.zst
  3. Build custom Linux kernel:
    Kernel/Arch build system - ArchWiki
    Xen flags based on Xen - Gentoo wiki + pciback built into kernel.
    In this setup, the image is only used with Xen and direct boot should use the default.
    1. cd ~/build; pkgctl repo clone --protocol=https linux; cd linux
    2. nano PKGBUILD:
      • Set pkgbase=linux-custom or something similar, like linux-xen. linux-custom is used below.
      • Comment out make htmldocs in build(), "$pkgbase-docs" in pkgname, and the htmldocs-related packages in makedepends.
    3. Import gpg keys (the keys are listed in the PKGBUILD’s validpgpkeys array):
    4. Depending on the strategy chosen, we either need to build outside of the chroot (nconfig/menuconfig, due to the terminal issue above) or can build in or outside of it (config diff). Building outside requires pacman -Sy base-devel.
      Changing config using nconfig/menuconfig:
      1. nano PKGBUILD, change make olddefconfig to make nconfig or make menuconfig in prepare().
      2. makepkg -s
        Change config (based on Xen - Gentoo wiki):
      Kernel Config
      Processor type and features  --->
        [*] Linux guest support  --->
            [*]   Enable paravirtualization code
            [*]   Paravirtualization layer for spinlocks
            [*]   Xen guest support
            [*]     Xen PV guest support
            [*]       Limit Xen pv-domain memory to 512GB
            [*]     Xen PVHVM guest support
            [*]     Enable Xen debug and tuning parameters in debugfs
            [*]     Xen PVH guest support
            [*]   Xen Dom0 support
      Device Drivers  --->
        Character devices  --->
            [*] Xen Hypervisor Console support
            [*]   Xen Hypervisor Multiple Consoles support
        [*] Block devices  --->
            <*>   Xen virtual block device support
            <*>   Xen block-device backend driver
        [*] Network device support  --->
            <*>   Xen network device frontend driver
            <*>   Xen backend network device
        [*] PCI support  --->
          <*> Xen PCI Frontend
        Input device support  --->
            [*] Miscellaneous devices  --->
              <*>   Xen virtual keyboard and mouse support
        Graphics support  --->
              Frame buffer Devices  --->
                  <*> Xen virtual frame buffer support
        Network device support --->
            <M> Universal TUN/TAP device driver support
        Xen driver support  --->
              [*] Xen memory balloon driver
              [*]   Memory hotplug support for Xen balloon driver
              [*] Scrub pages before returning them to system by default
              <*> Xen /dev/xen/evtchn device
              [*] Backend driver support
              <*> Xen filesystem
              [*]   Create compatibility mount point /proc/xen
              [*] Create xen entries under /sys/hypervisor
              <*> userspace grant access device driver
              [*]   Add support for dma-buf grant access device driver extension
              <*> User-space grant reference allocator driver
              [*] Allow allocating DMA capable buffers with grant reference module
              <*> Xen PCI-device backend driver
              <*> XEN PV Calls frontend driver
              <*> XEN PV Calls backend driver
              <M> XEN SCSI backend driver
              -*- Xen hypercall passthrough driver
              [*]   Xen Ioeventfd and irqfd support
              <*> Xen ACPI processor
              [*] Xen platform mcelog
              [*] Xen symbols
              [*] Use unpopulated memory ranges for guest mappings
              [*] Xen virtio support
      Power management and ACPI options  --->
        [*] ACPI (Advanced Configuration and Power Interface) Support  --->
      [*] Networking support --->
        Networking options  --->
              <*> 802.1d Ethernet Bridging
          [*] Network packet filtering framework (Netfilter) --->
                    [*] Advanced netfilter configuration
                    [*]   Bridged IP/ARP packets filtering
      
      Alternatively, config diff (for linux-6.7.2/3):
      1. Apply config patch (either manually or with patch -p1):
      config.patch
      --- a/config
      +++ b/config
      @@ -388,7 +388,7 @@
       CONFIG_XEN_PVHVM_SMP=y
       CONFIG_XEN_PVHVM_GUEST=y
       CONFIG_XEN_SAVE_RESTORE=y
      -# CONFIG_XEN_DEBUG_FS is not set
      +CONFIG_XEN_DEBUG_FS=y
       CONFIG_XEN_PVH=y
       CONFIG_XEN_DOM0=y
       CONFIG_XEN_PV_MSR_SAFE=y
      @@ -1351,7 +1351,7 @@
       CONFIG_NETWORK_PHY_TIMESTAMPING=y
       CONFIG_NETFILTER=y
       CONFIG_NETFILTER_ADVANCED=y
      -CONFIG_BRIDGE_NETFILTER=m
      +CONFIG_BRIDGE_NETFILTER=y
      
       #
       # Core Netfilter Configuration
      @@ -1735,10 +1735,10 @@
       CONFIG_L2TP_V3=y
       CONFIG_L2TP_IP=m
       CONFIG_L2TP_ETH=m
      -CONFIG_STP=m
      +CONFIG_STP=y
       CONFIG_GARP=m
       CONFIG_MRP=m
      -CONFIG_BRIDGE=m
      +CONFIG_BRIDGE=y
       CONFIG_BRIDGE_IGMP_SNOOPING=y
       CONFIG_BRIDGE_VLAN_FILTERING=y
       CONFIG_BRIDGE_MRP=y
      @@ -1770,7 +1770,7 @@
       CONFIG_VLAN_8021Q=m
       CONFIG_VLAN_8021Q_GVRP=y
       CONFIG_VLAN_8021Q_MVRP=y
      -CONFIG_LLC=m
      +CONFIG_LLC=y
       CONFIG_LLC2=m
       CONFIG_ATALK=m
       # CONFIG_X25 is not set
      @@ -2047,7 +2047,7 @@
      
       CONFIG_AF_RXRPC=m
       CONFIG_AF_RXRPC_IPV6=y
      -# CONFIG_AF_RXRPC_INJECT_LOSS is not set
      +CONFIG_AF_RXRPC_INJECT_LOSS=y
       # CONFIG_AF_RXRPC_INJECT_RX_DELAY is not set
       CONFIG_AF_RXRPC_DEBUG=y
       CONFIG_RXKAD=y
      @@ -2193,7 +2193,7 @@
       # CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
       CONFIG_PCI_STUB=y
       CONFIG_PCI_PF_STUB=m
      -CONFIG_XEN_PCIDEV_FRONTEND=m
      +CONFIG_XEN_PCIDEV_FRONTEND=y
       CONFIG_PCI_ATS=y
       CONFIG_PCI_DOE=y
       CONFIG_PCI_LOCKLESS_CONFIG=y
      @@ -2607,8 +2607,8 @@
       CONFIG_CDROM_PKTCDVD_BUFFERS=8
       # CONFIG_CDROM_PKTCDVD_WCACHE is not set
       CONFIG_ATA_OVER_ETH=m
      -CONFIG_XEN_BLKDEV_FRONTEND=m
      -CONFIG_XEN_BLKDEV_BACKEND=m
      +CONFIG_XEN_BLKDEV_FRONTEND=y
      +CONFIG_XEN_BLKDEV_BACKEND=y
       CONFIG_VIRTIO_BLK=m
       CONFIG_BLK_DEV_RBD=m
       CONFIG_BLK_DEV_UBLK=m
      @@ -4167,8 +4167,8 @@
       CONFIG_MTK_T7XX=m
       # end of Wireless WAN
      
      -CONFIG_XEN_NETDEV_FRONTEND=m
      -CONFIG_XEN_NETDEV_BACKEND=m
      +CONFIG_XEN_NETDEV_FRONTEND=y
      +CONFIG_XEN_NETDEV_BACKEND=y
       CONFIG_VMXNET3=m
       CONFIG_FUJITSU_ES=m
       CONFIG_USB4_NET=m
      @@ -4496,7 +4496,7 @@
       CONFIG_INPUT_IQS7222=m
       CONFIG_INPUT_CMA3000=m
       CONFIG_INPUT_CMA3000_I2C=m
      -CONFIG_INPUT_XEN_KBDDEV_FRONTEND=m
      +CONFIG_INPUT_XEN_KBDDEV_FRONTEND=y
       CONFIG_INPUT_IDEAPAD_SLIDEBAR=m
       CONFIG_INPUT_SOC_BUTTON_ARRAY=m
       CONFIG_INPUT_DRV260X_HAPTICS=m
      @@ -6939,7 +6939,7 @@
       # CONFIG_FB_UDL is not set
       # CONFIG_FB_IBM_GXT4500 is not set
       # CONFIG_FB_VIRTUAL is not set
      -CONFIG_XEN_FBDEV_FRONTEND=m
      +CONFIG_XEN_FBDEV_FRONTEND=y
       # CONFIG_FB_METRONOME is not set
       # CONFIG_FB_MB862XX is not set
       # CONFIG_FB_HYPERV is not set
      @@ -8952,25 +8952,25 @@
       CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
       CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
       CONFIG_XEN_SCRUB_PAGES_DEFAULT=y
      -CONFIG_XEN_DEV_EVTCHN=m
      +CONFIG_XEN_DEV_EVTCHN=y
       CONFIG_XEN_BACKEND=y
      -CONFIG_XENFS=m
      +CONFIG_XENFS=y
       CONFIG_XEN_COMPAT_XENFS=y
       CONFIG_XEN_SYS_HYPERVISOR=y
       CONFIG_XEN_XENBUS_FRONTEND=y
      -CONFIG_XEN_GNTDEV=m
      +CONFIG_XEN_GNTDEV=y
       CONFIG_XEN_GNTDEV_DMABUF=y
      -CONFIG_XEN_GRANT_DEV_ALLOC=m
      +CONFIG_XEN_GRANT_DEV_ALLOC=y
       CONFIG_XEN_GRANT_DMA_ALLOC=y
       CONFIG_SWIOTLB_XEN=y
       CONFIG_XEN_PCI_STUB=y
      -CONFIG_XEN_PCIDEV_BACKEND=m
      -CONFIG_XEN_PVCALLS_FRONTEND=m
      +CONFIG_XEN_PCIDEV_BACKEND=y
      +CONFIG_XEN_PVCALLS_FRONTEND=y
       CONFIG_XEN_PVCALLS_BACKEND=y
       CONFIG_XEN_SCSI_BACKEND=m
      -CONFIG_XEN_PRIVCMD=m
      +CONFIG_XEN_PRIVCMD=y
       CONFIG_XEN_PRIVCMD_EVENTFD=y
      -CONFIG_XEN_ACPI_PROCESSOR=m
      +CONFIG_XEN_ACPI_PROCESSOR=y
       CONFIG_XEN_MCE_LOG=y
       CONFIG_XEN_HAVE_PVMMU=y
       CONFIG_XEN_EFI=y
      
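      To apply the diff non-manually, run patch from the directory containing the config file (assuming it sits next to the PKGBUILD, as in the stock Arch linux package layout - my assumption):
      cd linux-custom           # the package directory; the name here is a placeholder
      patch -p1 < config.patch  # strips the a/ prefix, patching ./config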
      2. makepkg -g >> PKGBUILD ← update config hash
      3. makechrootpkg -r ../chroot
    5. sudo cp linux-custom-*.tar.zst /var/lib/repo/aur/
    6. sudo pacman -U linux-custom-*.tar.zst
  4. Install xen:
    1. paru -Sy seabios edk2-ovmf
    2. paru -S xen xen-qemu xen-pvhgrub xen-docs
      • If it fails with gpg: keyserver receive failed: Server indicated a failure, add nameserver 9.9.9.9 to /etc/resolv.conf temporarily, or check whether /etc/resolv.conf points to the systemd-resolved stub.
      • If the process fails in general, try installing packages one-by-one
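      A quick one-liner for the temporary DNS workaround (remember to revert it afterwards):
      echo 'nameserver 9.9.9.9' | sudo tee -a /etc/resolv.conf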
  5. Configure xen:
    1. sudo nano /boot/xen.cfg:
    [global]
    default=xen
    
    [xen]
    options=console=vga dom0_mem=32768M,max:32768M dom0_max_vcpus=16 loglvl=all noreboot ucode=scan spec-ctrl=gds-mit=no iommu=force,verbose,qinval=yes
    kernel=vmlinuz-linux-custom root=/dev/vgroot/vmserver-root rw module_blacklist=csiostor add_efi_memmap intel_iommu=on iommu=pt pci=realloc
    ramdisk=initramfs-linux-custom.img
    
     dom0_mem and dom0_max_vcpus should be adjusted to what's actually available and trimmed down as VMs are added; eventually something small like 4G RAM and 2 vcpus should suffice.
     spec-ctrl=gds-mit=no is required to keep AVX usable if the host has no hardware GDS mitigation (the software fallback disables AVX).
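     For illustration, the end state might look something like this (the numbers are placeholders I haven't settled on, not what's running now):
     options=console=vga dom0_mem=4096M,max:4096M dom0_max_vcpus=2 loglvl=all noreboot ucode=scan spec-ctrl=gds-mit=no iommu=force,verbose,qinval=yes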
    2. sudo nano /boot/loader/entries/10-xen.conf
    title   Xen Hypervisor
    efi     /xen.efi
    
    sudo systemctl enable xenstored
    sudo systemctl enable xenconsoled
    sudo systemctl enable xendomains
    sudo systemctl enable xen-init-dom0
    
    3. sudo reboot; make sure to select Xen Hypervisor in the bootloader
    4. sudo xl list
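    With only dom0 up it should print a single Domain-0 row, along these lines (column layout from memory, so treat it as approximate):
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0 32768    16     r-----      30.5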
    5. Test PCIe passthrough with a dummy HVM domain and a random PCIe appliance:
      Using 0:b3:00.0 for the random appliance. As root run:
      1. mkdir -p /opt/xen/isos; cd /opt/xen/isos
      2. wget https://dl-cdn.alpinelinux.org/alpine/v3.19/releases/x86_64/alpine-virt-3.19.1-x86_64.iso
      3. nano /opt/xen/test.cfg
      name = "Test"
      type = "hvm"
      
      memory = 2048
      maxmem = 2048
      vcpus = 2
      
      disk = [ 
        "file:/opt/xen/isos/alpine-virt-3.19.1-x86_64.iso,hdc:cdrom,r",
      ]
      pci = [
        "0:b3:00.0"
      ]
      
      4. xl pci-assignable-add 0:b3:00.0; xl pci-assignable-list
      5. xl create /opt/xen/test.cfg
      6. xl top; xl dmesg; dmesg, check for errors/crashes, wait a minute, then recheck
      7. xl shutdown Test, check for errors/crashes, wait a minute, then recheck
      8. rm /opt/xen/test.cfg
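      If the test device should go back to dom0 afterwards, xl can return it, and with -r also rebind it to its original driver (same placeholder BDF as above):
      xl pci-assignable-remove -r 0:b3:00.0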

PCIe passthrough should now work as expected.

@avicks512 What distro are you using? I can try and see if it works with my kernel (or kernel config diff).

I’m dumb. I just spent 2 hours trying to debug why I suddenly couldn’t pass through SR-IOV VFs, suspecting IOMMU issues. Along the way I managed to break my UEFI configuration (I can’t boot into the BIOS setup, but the system itself boots - fun, huh? I’ll need to clear CMOS and redo some of the configuration, because I likely didn’t save all of it. Why did I think disabling CSM was such a good idea?)

Nevertheless, I learned that Xen doesn’t have the concept of iommu groups…

xen grep iommu
root@vmserver:/home/admin # dmesg | grep -i iommu
[ 1.922035] Kernel command line: root=/dev/vgroot/vmserver-root rw module_blacklist=csiostor add_efi_memmap intel_iommu=on iommu=pt pci=realloc
[ 1.922083] DMAR: IOMMU enabled
[ 2.472496] iommu: Default domain type: Passthrough (set via kernel command line)
root@vmserver:/home/admin # xl dmesg | grep -i iommu
(XEN) Command line: console=vga dom0_mem=32768M,max:32768M dom0_max_vcpus=16 loglvl=all noreboot ucode=scan spec-ctrl=gds-mit=no iommu=force,verbose,qinval=yes
(XEN) [VT-D]drhd->address = b5ffc000 iommu->reg = ffff82c000966000
(XEN) [VT-D]drhd->address = d8ffc000 iommu->reg = ffff82c000968000
(XEN) [VT-D]drhd->address = fbffc000 iommu->reg = ffff82c00096a000
(XEN) [VT-D]drhd->address = 92ffc000 iommu->reg = ffff82c00096c000
(XEN) Intel VT-d iommu 2 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d iommu 3 supported page sizes: 4kB, 2MB, 1GB
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00096a000
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c000968000
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c000966000
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00096c000

… compared to direct Linux boot…

linux grep iommu
[    0.000000] Command line: initrd=\intel-ucode.img initrd=\initramfs-linux-hardened.img root=/dev/vgroot/vmserver-root rw module_blacklist=csiostor add_efi_memmap intel_iommu=on iommu=pt pci=realloc
[    0.382432] Kernel command line: pti=on page_alloc.shuffle=1 initrd=\intel-ucode.img initrd=\initramfs-linux-hardened.img root=/dev/vgroot/vmserver-root rw module_blacklist=csiostor add_efi_memmap intel_iommu=on iommu=pt pci=realloc
[    0.382516] DMAR: IOMMU enabled
[   12.240348] DMAR-IR: IOAPIC id 12 under DRHD base  0xfbffc000 IOMMU 2
[   12.240349] DMAR-IR: IOAPIC id 11 under DRHD base  0xd8ffc000 IOMMU 1
[   12.240350] DMAR-IR: IOAPIC id 10 under DRHD base  0xb5ffc000 IOMMU 0
[   12.240351] DMAR-IR: IOAPIC id 8 under DRHD base  0x92ffc000 IOMMU 3
[   12.240352] DMAR-IR: IOAPIC id 9 under DRHD base  0x92ffc000 IOMMU 3
[   12.477052] iommu: Default domain type: Passthrough (set via kernel command line)
[   12.499875] pci 0000:b2:00.0: Adding to iommu group 0
[   12.499904] pci 0000:b2:01.0: Adding to iommu group 1
[   12.499930] pci 0000:b2:03.0: Adding to iommu group 2
[   12.499960] pci 0000:b3:00.0: Adding to iommu group 3
[   12.499988] pci 0000:b4:00.0: Adding to iommu group 4
[   12.500015] pci 0000:b5:00.0: Adding to iommu group 5
[   12.500092] pci 0000:64:01.0: Adding to iommu group 6
[   12.500119] pci 0000:64:02.0: Adding to iommu group 7
[   12.500145] pci 0000:64:03.0: Adding to iommu group 8
[   12.500177] pci 0000:65:00.0: Adding to iommu group 9
[   12.500206] pci 0000:66:00.0: Adding to iommu group 10
[   12.500233] pci 0000:67:00.0: Adding to iommu group 11
[   12.500292] pci 0000:16:00.0: Adding to iommu group 12
[   12.500319] pci 0000:16:02.0: Adding to iommu group 13
[   12.500492] pci 0000:17:00.0: Adding to iommu group 14
[   12.500529] pci 0000:17:00.1: Adding to iommu group 14
[   12.500565] pci 0000:17:00.2: Adding to iommu group 14
[   12.500601] pci 0000:17:00.3: Adding to iommu group 14
[   12.500636] pci 0000:17:00.4: Adding to iommu group 14
[   12.500673] pci 0000:17:00.5: Adding to iommu group 14
[   12.500709] pci 0000:17:00.6: Adding to iommu group 14
[   12.500873] pci 0000:18:00.0: Adding to iommu group 15
[   12.500964] pci 0000:18:00.1: Adding to iommu group 15
[   12.501001] pci 0000:18:00.2: Adding to iommu group 15
[   12.501037] pci 0000:18:00.3: Adding to iommu group 15
[   12.501073] pci 0000:18:00.4: Adding to iommu group 15
[   12.501109] pci 0000:18:00.5: Adding to iommu group 15
[   12.501146] pci 0000:18:00.6: Adding to iommu group 15
[   12.501209] pci 0000:00:00.0: Adding to iommu group 16
[   12.501237] pci 0000:00:04.0: Adding to iommu group 17
[   12.501264] pci 0000:00:04.1: Adding to iommu group 18
[   12.501292] pci 0000:00:04.2: Adding to iommu group 19
[   12.501318] pci 0000:00:04.3: Adding to iommu group 20
[   12.501344] pci 0000:00:04.4: Adding to iommu group 21
[   12.501372] pci 0000:00:04.5: Adding to iommu group 22
[   12.501399] pci 0000:00:04.6: Adding to iommu group 23
[   12.501425] pci 0000:00:04.7: Adding to iommu group 24
[   12.501452] pci 0000:00:05.0: Adding to iommu group 25
[   12.501479] pci 0000:00:05.2: Adding to iommu group 26
[   12.501506] pci 0000:00:05.4: Adding to iommu group 27
[   12.501534] pci 0000:00:08.0: Adding to iommu group 28
[   12.501584] pci 0000:00:08.1: Adding to iommu group 29
[   12.501610] pci 0000:00:08.2: Adding to iommu group 30
[   12.501677] pci 0000:00:14.0: Adding to iommu group 31
[   12.501704] pci 0000:00:14.2: Adding to iommu group 31
[   12.501755] pci 0000:00:16.0: Adding to iommu group 32
[   12.501782] pci 0000:00:17.0: Adding to iommu group 33
[   12.501809] pci 0000:00:1b.0: Adding to iommu group 34
[   12.501837] pci 0000:00:1b.4: Adding to iommu group 35
[   12.501865] pci 0000:00:1c.0: Adding to iommu group 36
[   12.501893] pci 0000:00:1c.1: Adding to iommu group 37
[   12.501920] pci 0000:00:1c.4: Adding to iommu group 38
[   12.501948] pci 0000:00:1c.6: Adding to iommu group 39
[   12.501976] pci 0000:00:1d.0: Adding to iommu group 40
[   12.502082] pci 0000:00:1f.0: Adding to iommu group 41
[   12.502110] pci 0000:00:1f.2: Adding to iommu group 41
[   12.502138] pci 0000:00:1f.3: Adding to iommu group 41
[   12.502166] pci 0000:00:1f.4: Adding to iommu group 41
[   12.502194] pci 0000:00:1f.6: Adding to iommu group 42
[   12.502222] pci 0000:01:00.0: Adding to iommu group 43
[   12.502289] pci 0000:02:00.0: Adding to iommu group 44
[   12.502319] pci 0000:02:00.1: Adding to iommu group 44
[   12.502347] pci 0000:03:00.0: Adding to iommu group 45
[   12.502374] pci 0000:04:00.0: Adding to iommu group 46
[   12.502404] pci 0000:05:00.0: Adding to iommu group 47
[   12.502430] pci 0000:06:00.0: Adding to iommu group 48
[   12.502458] pci 0000:07:00.0: Adding to iommu group 49
[   12.502485] pci 0000:16:05.0: Adding to iommu group 50
[   12.502512] pci 0000:16:05.2: Adding to iommu group 51
[   12.502539] pci 0000:16:05.4: Adding to iommu group 52
[   12.502727] pci 0000:16:08.0: Adding to iommu group 53
[   12.502758] pci 0000:16:08.1: Adding to iommu group 53
[   12.502789] pci 0000:16:08.2: Adding to iommu group 53
[   12.502820] pci 0000:16:08.3: Adding to iommu group 53
[   12.502851] pci 0000:16:08.4: Adding to iommu group 53
[   12.502882] pci 0000:16:08.5: Adding to iommu group 53
[   12.502913] pci 0000:16:08.6: Adding to iommu group 53
[   12.502946] pci 0000:16:08.7: Adding to iommu group 53
[   12.503013] pci 0000:16:09.0: Adding to iommu group 54
[   12.503045] pci 0000:16:09.1: Adding to iommu group 54
[   12.503232] pci 0000:16:0e.0: Adding to iommu group 55
[   12.503265] pci 0000:16:0e.1: Adding to iommu group 55
[   12.503298] pci 0000:16:0e.2: Adding to iommu group 55
[   12.503331] pci 0000:16:0e.3: Adding to iommu group 55
[   12.503363] pci 0000:16:0e.4: Adding to iommu group 55
[   12.503395] pci 0000:16:0e.5: Adding to iommu group 55
[   12.503427] pci 0000:16:0e.6: Adding to iommu group 55
[   12.503459] pci 0000:16:0e.7: Adding to iommu group 55
[   12.503525] pci 0000:16:0f.0: Adding to iommu group 56
[   12.503561] pci 0000:16:0f.1: Adding to iommu group 56
[   12.503666] pci 0000:16:1d.0: Adding to iommu group 57
[   12.503701] pci 0000:16:1d.1: Adding to iommu group 57
[   12.503735] pci 0000:16:1d.2: Adding to iommu group 57
[   12.503768] pci 0000:16:1d.3: Adding to iommu group 57
[   12.503935] pci 0000:16:1e.0: Adding to iommu group 58
[   12.503969] pci 0000:16:1e.1: Adding to iommu group 58
[   12.504003] pci 0000:16:1e.2: Adding to iommu group 58
[   12.504037] pci 0000:16:1e.3: Adding to iommu group 58
[   12.504071] pci 0000:16:1e.4: Adding to iommu group 58
[   12.504106] pci 0000:16:1e.5: Adding to iommu group 58
[   12.504141] pci 0000:16:1e.6: Adding to iommu group 58
[   12.504169] pci 0000:64:05.0: Adding to iommu group 59
[   12.504196] pci 0000:64:05.2: Adding to iommu group 60
[   12.504226] pci 0000:64:05.4: Adding to iommu group 61
[   12.504252] pci 0000:64:08.0: Adding to iommu group 62
[   12.504281] pci 0000:64:09.0: Adding to iommu group 63
[   12.504310] pci 0000:64:0a.0: Adding to iommu group 64
[   12.504337] pci 0000:64:0a.1: Adding to iommu group 65
[   12.504363] pci 0000:64:0a.2: Adding to iommu group 66
[   12.504391] pci 0000:64:0a.3: Adding to iommu group 67
[   12.504417] pci 0000:64:0a.4: Adding to iommu group 68
[   12.504444] pci 0000:64:0a.5: Adding to iommu group 69
[   12.504472] pci 0000:64:0a.6: Adding to iommu group 70
[   12.504499] pci 0000:64:0a.7: Adding to iommu group 71
[   12.504525] pci 0000:64:0b.0: Adding to iommu group 72
[   12.504551] pci 0000:64:0b.1: Adding to iommu group 73
[   12.504580] pci 0000:64:0b.2: Adding to iommu group 74
[   12.504607] pci 0000:64:0b.3: Adding to iommu group 75
[   12.504633] pci 0000:64:0c.0: Adding to iommu group 76
[   12.504660] pci 0000:64:0c.1: Adding to iommu group 77
[   12.504687] pci 0000:64:0c.2: Adding to iommu group 78
[   12.504714] pci 0000:64:0c.3: Adding to iommu group 79
[   12.504740] pci 0000:64:0c.4: Adding to iommu group 80
[   12.504768] pci 0000:64:0c.5: Adding to iommu group 81
[   12.504795] pci 0000:64:0c.6: Adding to iommu group 82
[   12.504822] pci 0000:64:0c.7: Adding to iommu group 83
[   12.504848] pci 0000:64:0d.0: Adding to iommu group 84
[   12.504883] pci 0000:64:0d.1: Adding to iommu group 85
[   12.504912] pci 0000:64:0d.2: Adding to iommu group 86
[   12.504939] pci 0000:64:0d.3: Adding to iommu group 87
[   12.504965] pci 0000:b2:05.0: Adding to iommu group 88
[   12.504992] pci 0000:b2:05.2: Adding to iommu group 89
[   12.505019] pci 0000:b2:05.4: Adding to iommu group 90
[   12.505045] pci 0000:b2:12.0: Adding to iommu group 91
[   12.505110] pci 0000:b2:12.1: Adding to iommu group 92
[   12.505155] pci 0000:b2:12.2: Adding to iommu group 92
[   12.505203] pci 0000:b2:15.0: Adding to iommu group 93
[   12.505269] pci 0000:b2:16.0: Adding to iommu group 94
[   12.505312] pci 0000:b2:16.4: Adding to iommu group 94
[   12.505358] pci 0000:b2:17.0: Adding to iommu group 95
[   24.171527] pci 0000:18:01.3: Adding to iommu group 96
[   24.171790] pci 0000:17:01.3: Adding to iommu group 97
[   24.172850] pci 0000:18:01.7: Adding to iommu group 98
[   24.172963] pci 0000:17:01.7: Adding to iommu group 99
[   24.173405] pci 0000:17:02.3: Adding to iommu group 100
[   24.173526] pci 0000:18:02.3: Adding to iommu group 101
[   24.174105] pci 0000:17:02.7: Adding to iommu group 102
[   24.176354] pci 0000:18:02.7: Adding to iommu group 103
[   24.176468] pci 0000:17:01.0: Adding to iommu group 104

(and that’s without any override patch)
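Side note for future me: under a plain Linux boot the groups can also be read straight from sysfs instead of grepping dmesg (standard kernel interface, nothing Xen-specific):

# print each IOMMU group number and the PCI devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "group ${g##*/}:"
    ls "$g/devices"
done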
Meanwhile, I keep forgetting to replace the placeholder PCIe addresses everywhere they appear :rofl:

root@vmserver:/home/admin # lspci | grep '\[VF]'
17:01.0 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
17:01.3 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
17:01.7 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
17:02.3 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
17:02.7 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
18:01.3 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
18:01.7 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
18:02.3 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
18:02.7 Ethernet controller: Chelsio Communications Inc T540-BT Unified Wire Ethernet Controller [VF]
root@vmserver:/home/admin # xl create /opt/xen/opnsense.cfg
Parsing config from /opt/xen/opnsense.cfg
libxl: error: libxl_pci.c:1658:libxl__device_pci_add: Domain 1:PCI device 0000:17:02.0 already assigned to a different guest?
libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain 1:libxl__device_pci_add failed for PCI device 0:17:2.0 (rc -1)
libxl: error: libxl_pci.c:1658:libxl__device_pci_add: Domain 1:PCI device 0000:18:02.0 already assigned to a different guest?
libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain 1:libxl__device_pci_add failed for PCI device 0:18:2.0 (rc -1)
libxl: error: libxl_create.c:1939:domcreate_attach_devices: Domain 1:unable to add pci devices
libxl: warning: libxl_pci.c:2156:pci_remove_timeout: Domain 1:timed out waiting for DM to remove pci-pt-17_01.0
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM
libxl: error: libxl_domain.c:1612:domain_destroy_domid_cb: Domain 1:xc_domain_destroy failed: No such process
libxl: error: libxl_domain.c:1133:domain_destroy_callback: Domain 1:Unable to destroy guest
libxl: error: libxl_domain.c:1060:domain_destroy_cb: Domain 1:Destruction of domain failed

TBF I did change them after checking, just not in the config file…
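For the record, the fix is just pointing opnsense.cfg at actual VF BDFs from the lspci output above instead of the stale placeholders - something along these lines (which VFs end up in which guest is still my call to make, so the exact BDFs here are examples):

pci = [
  "0:17:01.0",
  "0:17:02.3",
  "0:18:02.3",
]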