Connecting two Xen VMs directly

Assume the following setup: Xen hypervisor, Linux in dom0, and two or more domUs with potentially different operating systems (PV guests, or HVM guests with PV drivers).
Is it possible to somehow directly connect vifs between the domUs, bypassing dom0, as if the domUs were directly connected with a cable?

Illustration:

+-----------------------------------------------------------+
| Xen Hypervisor                                            |
|                                                           |
|+---------------+          +------------------------------+|
|| dom1    [eth0]+----------+[vif1.0]-+               dom0 ||
||         [eth1]+----+     |         |                    ||
||         [eth2]+--+ |     |         |                    ||
|+---------------+  | |     |         |                    ||
|                   | |     |         |                    ||
|+---------------+  | |     |         +-[xenbr0]           ||
|| dom2    [eth0]+--+ |     |         |                    ||
|+---------------+    |     |         |                    ||
|                     |     |         |                    ||
|+---------------+    |     |         |                    ||
|| dom3    [eth0]+----+     |         +--------------[eth0]||
|+---------------+          +------------------------------+|
+-----------------------------------------------------------+

I know I can create bridges in dom0 and connect the domains there (a rough config sketch follows the list of drawbacks below), but this approach seems to have some drawbacks:

+-----------------------------------------------------------+
| Xen Hypervisor                                            |
|                                                           |
|+---------------+     +-----------------------------------+|
|| dom1    [eth0]+-----+[vif1.0]---------------+      dom0 ||
||         [eth1]+-----+[vif1.1]---+           |           ||
||         [eth2]+-----+[vif1.2]-+ |           |           ||
|+---------------+     |         | |           |           ||
|                      |         |-)-[dom12br] |           ||
|+---------------+     |         | |           +-[xenbr0]  ||
|| dom2    [eth0]+-----+[vif2.0]-+ +-[dom13br] |           ||
|+---------------+     |           |           |           ||
|                      |           |           |           ||
|+---------------+     |           |           |           ||
|| dom3    [eth0]+-----+[vif3.0]---+           +-----[eth0]||
|+---------------+     +-----------------------------------+|
+-----------------------------------------------------------+
  1. The traffic between dom1 and dom2 has to be explicitly handled in dom0; it can’t just be “passed through”.
    I believe this in turn costs dom0 CPU cores and interrupt time.
  2. dom12br and dom13br are MAC-addressable from dom2 and dom3; that’s not the case in my desired configuration.
    Even without DHCP they would receive broadcast frames, I believe.
  3. Reduced throughput (?)
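
For reference, a rough sketch of that bridge-based setup; the bridge name, MAC addresses and vif entries below are illustrative, not taken from a working configuration:

# In dom0: one dedicated bridge per point-to-point link
ip link add name dom12br type bridge
ip link set dom12br up

# In dom1's xl config: one extra vif attached to that bridge
vif = [
    "mac=00:16:3e:aa:bb:01,bridge=xenbr0",
    "mac=00:16:3e:aa:bb:02,bridge=dom12br",
]

# In dom2's xl config: the peer vif on the same bridge
vif = [
    "mac=00:16:3e:aa:bb:03,bridge=dom12br",
]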

There must be a way to solve this “more intelligently” than by passing through physical ports (when the NICs permit it) and connecting them with a physical cable.
Even a dumb pipe in dom0 should be better (would it be?) than creating an unnecessary bridge.
I assume I’m just missing an important keyword here. For instance, the links between the domU vifs and their veths in dom0 look just like the thing I want between domUs, only without involving dom0 in the connection.

I can assume that domUs start in a predefined order, i.e. dom1 first, then dom2 and dom3 afterwards.

What I checked


I want to do the exact same networking setup, but I have yet to work out how. In my case dom1 would be acting as a firewall.

Do post a follow-up with what you come up with; it seems an obvious thing to want to do.

Paradoxically, the closest solution I have found so far is offloading via NIC SR-IOV. If you pass the VFs through to the VMs, they can talk to each other (and to anything else connected to the associated PF port) without involving dom0. The drawback is that inter-VM traffic uses PCIe bandwidth, unless the VF driver can handle it better. I have not benchmarked it yet, as I’m having problems getting it set up with Xen and an HVM OPNsense.
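
In case it helps, this is roughly what the SR-IOV approach looks like; the interface name, PCI address and VF count below are made up, and the exact steps depend on the NIC and its driver:

# In dom0: create VFs on the physical function (interface name is an example)
echo 4 > /sys/class/net/eth2/device/sriov_numvfs
# Make one VF assignable to guests (PCI address is an example)
xl pci-assignable-add 0000:03:10.0

# In the guest's xl config: pass that VF through
pci = [ "0000:03:10.0" ]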

Regarding a direct connection, I found one presentation suggesting that it’s possible and implemented in Xen, but that’s it so far. I think it was something from these guys (definitely associated with Xilinx), but no concrete solutions yet. dom0less seems to be the keyword.


Some more (more or less) related presentations:

This one basically says “not yet” in 2019:

I think this is the one I remember:

I’m not too optimistic anymore. It seems it’s still a work in progress, and high performance is not the highest priority.

OK, so I actually managed to find out how to create such a link; the keywords were driver_domain and backend in the vif specification.

https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Other-Options
https://xenbits.xen.org/docs/unstable/man/xl-network-configuration.5.html

As a proof of concept I made the following two PV domains:

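# Test1: driver_domain=1 marks this domain as a driver domain, so it can host
# backends (here: the network backend) for other guests' devices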
name = "Test1"
type = "pv"
driver_domain = 1

memory = 2048
maxmem = 2048
vcpus = 2

kernel = "/mnt/arch/boot/x86_64/vmlinuz-linux"
ramdisk = "/mnt/arch/boot/x86_64/initramfs-linux.img"
extra = "archisobasedir=arch archisodevice=UUID=2024-01-01-16-44-54-00"

disk = [ 
    "file:/opt/xen/isos/archlinux-2024.01.01-x86_64.iso,hdc:cdrom,r",
]
vif = [
    "mac=00:16:3e:11:22:33,bridge=mgmt-lan-br",
]
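
# Test2: plain frontend domain; its second vif uses Test1, not dom0, as its network backend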
name = "Test2"
type = "pv"

memory = 2048
maxmem = 2048
vcpus = 2

kernel = "/mnt/arch/boot/x86_64/vmlinuz-linux"
ramdisk = "/mnt/arch/boot/x86_64/initramfs-linux.img"
extra = "archisobasedir=arch archisodevice=UUID=2024-01-01-16-44-54-00"

disk = [ 
    "file:/opt/xen/isos/archlinux-2024.01.01-x86_64.iso,hdc:cdrom,r",
]
vif = [
    "mac=00:16:3e:22:33:44,bridge=mgmt-lan-br",
    "mac=00:16:3e:33:44:55,backend=Test1",
]

Directly after booting you can see the backend device for Test2’s second vif (vif10.1 in my case) show up inside Test1 rather than in dom0.
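
A quick way to check this (a sketch; the domain names and the device numbering depend on your setup):

# In dom0: list Test2's vifs and which domain hosts each backend
xl network-list Test2

# Inside Test1: the backend netdev (vif10.1 here) should be listed
ip link show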

After bringing up vif10.1 inside Test1 and assigning IP addresses on both ends, you can do a ping:
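
Something along these lines (the addresses are examples, and the second vif is assumed to show up as eth1 inside Test2):

# Inside Test1 (backend side)
ip link set vif10.1 up
ip addr add 10.0.0.1/24 dev vif10.1

# Inside Test2 (frontend side)
ip link set eth1 up
ip addr add 10.0.0.2/24 dev eth1
ping 10.0.0.1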


Perf is… not terrible, but honestly could be better: about 7 Gb/s in a basic iperf3 run. Increasing the MTU to 9000 did not change anything.
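
For reference, the kind of test I mean (addresses as in the sketch above):

# Inside Test1
iperf3 -s

# Inside Test2
iperf3 -c 10.0.0.1

# Jumbo-frame attempt, applied on both ends (no effect here)
ip link set dev vif10.1 mtu 9000   # in Test1
ip link set dev eth1 mtu 9000      # in Test2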

With HVMs it’s also possible, but the performance is noticeably worse at ~3.3 Gb/s:

For comparison, these are the same HVM domains, but communicating through SR-IOV VFs on a 10G Chelsio NIC (T540-BT):

In this test the associated physical port is unused. With a port that is in use, the speed is limited by the physical link speed: e.g. if I use a port connected to a 1G appliance, the VM-to-VM speed is also limited to 1G:

Although there might be some way to tune it.

And one more thing: a dumb bridge in dom0 appears to be the fastest option overall:

I have also checked with the VM size increased to 4 vCPUs and 4 GB of memory; same results.

The core usage in the “dumb bridge” scenario was up to 3 cores in dom0, 1 core in test1 and 0.5 core in test2.

The core usage in the “direct link” scenario was up to 2.4 in test1 and 0.4 in test2.

The core usage in the VF passthrough scenario was up to 0.8 in test1 and 0.24 in test2.


Just to throw even more shade on Xen, this is an identical setup using QEMU/KVM + libvirt, the virtio driver and a host bridge:


The top CPU usage I noticed was about 2.8 cores in the server and 4 in the client, but in this scenario I had pinned CPU cores, with SMT siblings grouped on the same VM.

Notice how the Xen dom0 bridge setup had over 1000 retransmissions on average; nothing like that with KVM.