Will this work? KVM + SR-IOV Mellanox

Hello, I am in the process of building my Threadripper workstation.

I want to run a Linux host with a Windows VM for starters.

I have a Mellanox ConnectX-3 single-port 10Gb card.

I want to connect both the host and the guest to my LAN with said card, without NAT or virtual networks in the middle. I want the guest to operate as if I had passed a dedicated NIC through to it.

Will SR-IOV work in my case? I am planning on getting a Mikrotik CRS326-24G-2S+RM switch that has two SFP+ ports to connect my workstation and NAS. Is a single cable and switch port going to present any problems? Does the switch need any special configuration or features?

Upgrading to a dual-port card on the host would mean buying a much more expensive switch with more than two SFP+ ports, and I would like to avoid that.

Hi there … sorry to dig up your year-old thread. Did you get this going? I’m experimenting with SR-IOV networking on my Threadripper server and, well, it’s kinda working?

Hello!

I did actually get it working and I am actively using it, but with an Intel card instead of a Mellanox.

I am running a Threadripper 2920X on an ASRock X399 Taichi board.

If you have any specific questions or problems let me know and I will help out as much as I can!

I am finding my way, with a few bumps along the road.

Keeping the host OS from trying to initialize all the virtual functions is the headache I’ve run into so far. I’m not sure if there’s a more elegant solution, but I just blacklisted the ixgbevf driver using the kernel cmd line…

Supposedly this is the proper way to do it?

# echo 0 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_drivers_autoprobe

I keep setting it to 0, but it keeps reverting back to 1 at every reboot, not sure how to make it persistent.

Thanks for the reply!

Your sriov_drivers_autoprobe method is per the Intel documentation, but it will not survive reboots.

I am running Arch Linux, so my setup is as follows:

I have a file under /etc/modprobe.d named ixgbevf.conf:

install ixgbevf /bin/false

That makes sure the ixgbevf driver does not take control of the Intel VFs, so they are free to be assigned and passed through to any VM.
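
For context, a rough sketch of how the VFs get created on the PF in the first place, plus a quick check that nothing bound to them afterwards (the interface name and VF count are placeholders, adjust them for your card):

# create 4 VFs on the physical function (run as root)
echo 4 > /sys/class/net/enp10s0f0/device/sriov_numvfs

# each VF is its own PCI device; with ixgbevf blocked, no kernel driver should be listed for them
lspci -nnk | grep -A 3 "Virtual Function"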

Also, for the sake of sanity, I have a systemd unit enabled called ixgbevfmac.service:

[Unit]
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/bin/sleep 30
ExecStart=/home/.scripts/ixgbevfmac.sh

[Install]
WantedBy=multi-user.target

That calls /home/.scripts/ixgbevfmac.sh:

#!/bin/bash
# Pin each VF to a fixed MAC so guests keep the same address (and DHCP lease) across host reboots.

ip link set enp10s0f0 vf 0 mac 00:19:99:c1:3e:bf
ip link set enp10s0f0 vf 1 mac 00:19:99:c1:3e:be
ip link set enp10s0f0 vf 2 mac 00:19:99:c1:3e:b0
ip link set enp10s0f0 vf 3 mac 00:19:99:c1:3e:b1
ip link set enp10s0f1 vf 0 mac 00:19:99:c1:3e:b2
ip link set enp10s0f1 vf 1 mac 00:19:99:c1:3e:ba
ip link set enp10s0f1 vf 2 mac 00:19:99:c1:3e:bc
ip link set enp10s0f1 vf 3 mac 00:19:99:c1:3e:bd

You have to edit enp10s0f0 and enp10s0f1 to the names of your physical Intel network interfaces and set the MAC addresses to valid, unique values. My Intel card is a two-port card, hence both enp10s0f0 and enp10s0f1 as network interfaces.

That pins the MAC address of each VF to the same value on every reboot so that VMs do not get new IPs each time the host reboots. By default, the Intel driver assigns new MAC addresses to the VFs, which leads to fresh DHCP leases piling up and exhausting the pool. In my case, where I have set my DHCP server to keep leases for 3 months, that is a really bad thing to have happen.
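
If you want to double check that the MACs stuck after a reboot, the PF lists its VFs and their current MACs (again, swap in your own interface name):

ip link show enp10s0f0

Each VF shows up in that output as its own "vf N MAC xx:xx:xx:xx:xx:xx" line.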

That is how I am handling it on Arch (and similarly on CentOS for my NAS). The best part is that you can use this method with docker and pipework to get docker containers on your LAN with super fast networking, without putting extra load on the docker host.
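
To give a rough idea of what pipework is doing under the hood, here is a manual sketch of handing a VF to a running container. The container name (sonarr), VF netdev name (enp10s0f0v0) and address are made up, and it assumes the VF driver (ixgbevf) is loaded on the docker host so the VF actually shows up as a netdev there:

# expose the container's network namespace to the ip tool
PID=$(docker inspect -f '{{.State.Pid}}' sonarr)
mkdir -p /var/run/netns
ln -sf /proc/$PID/ns/net /var/run/netns/sonarr

# move the VF into the container, rename it and bring it up on the LAN
ip link set enp10s0f0v0 netns sonarr
ip netns exec sonarr ip link set enp10s0f0v0 name eth1
ip netns exec sonarr ip addr add 192.168.1.50/24 dev eth1
ip netns exec sonarr ip link set eth1 up

pipework essentially automates this for you, so I would still use it rather than doing it by hand.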

What I have so far failed to get working is VF-to-VF or VF-to-PF communication through the “onboard bridge”. Everything seems to go out to the network (to my switch) and back in again. That appears to be a limitation of how the driver was compiled; at least with the Arch inbox driver, the “onboard bridge” seems to be disabled.

Again, I will be glad to answer any more questions as best I can.


Great idea with the script… will give that a shot.

I don’t run anything intensive on my guests, or the host for that matter. Just wanted to try out something “new”.

I’ll have to give that pipework / docker thing more study. I’m finishing up moving most of my containers to a dedicated CentOS docker guest to make reining in resources a bit easier.


Great!

I moved my NAS setup from FreeNAS to CentOS 8 for the sake of docker and FusionIO SSDs.

I am running more than 20 SR-IOV VFs on the CentOS NAS, one for each of my docker containers. Some containers even have multiple VFs assigned to them on different VLANs. Having a central place to manage IPs through my pfSense host, instead of relying on host port mappings, is really useful. Every container gets its own hostname no matter the port, so I can call e.g. sonarr.lan:8989 instead of centos.lan:8989, which makes troubleshooting and general management so much easier. It also lets you avoid port conflicts, or run any container service (that is configurable) on port 80/443, so you are not hunting for ports when trying to find a service.
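
For the VLAN part, one way to do it is to tag the VF from the PF side before handing it to the container or VM; just as an illustration (interface, VF index and VLAN ID are placeholders):

# force all traffic on VF 2 of this PF onto VLAN 30; the NIC adds and strips the tag
ip link set enp10s0f0 vf 2 vlan 30

# setting vlan 0 removes the tag again
ip link set enp10s0f0 vf 2 vlan 0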

I might eventually get around to making a guide on moving from FreeNAS to ordinary Linux but it will be huge and no trivial task.

Anyway, if you are interested in pipework, the container that can do all of this, look no further than: https://github.com/dreamcat4/docker-images/tree/master/pipework

Just make sure the entrypoint.sh inside the container is longer than 600 lines. I worked with dreamcat4 on the SR-IOV stuff, and they had some trouble building new Docker Hub images with the new entrypoint.sh script, so the Docker Hub image might be a bit older.

Once more, I have looked into this quite a bit, so if you need any assistance I will be glad to help to the best of my knowledge and abilities.

Best of Luck
