[SOLVED] Linux VM bridged network: ARP OK, no IP traffic

I’m trying out Firecracker to launch KVM-based VMs on an Arch Linux homelab host. The host has a dual-SFP+ NIC, with both ports connected to an FS S3900 switch using 802.3ad mode in a bond interface bond1, which is attached to a bridge br1. The SFP+, bond1, and br1 interfaces are all configured using systemd-networkd. A tap interface tap0 was added to br1 and is passed to Firecracker to appear as eth0 inside the VM. My goal is to have the VM appear on my LAN, without NAT or custom routes.

The bridge br1 shows promiscuity 1 and nf_call_iptables shows 0. brctl showmacs br1 lists the tap0 MAC (twice) and the VM eth0 MAC (once).

I am able to ssh and ping between the host and the VM. From a laptop on the network, arping against the VM’s static IP works OK, showing the made-up MAC address assigned to the VM’s eth0 interface. My AT&T BGW320-500 router even seems aware of the VM and shows its MAC and static IP. However, neither ping nor ssh works between the VM and other machines on my network, in either direction.

What might I be missing?

IP forwarding missing?

sysctl -w net.ipv4.ip_forward=1

Try to tcpdump the ping packets to find out whether it’s the requests or replies getting lost.

You can also look at TC filters:
https://manpages.debian.org/testing/iproute2/tc-simple.8.en.html

There are also macvlans and macvtaps (you’re not using them, are you?). There’s also a chance that bonding isn’t working right; you should be able to yank a cable or down a link to test.

Are you using vhostnet acceleration in the kernel?

That seems to be in place.

$ sudo sysctl -a | grep net.ipv4.ip_forward
net.ipv4.ip_forward = 1

Here are some tcpdumps collected on the host. That was a great idea! I can see that, from an outside host, ping packets enter on one of the SFP+ ports, pass to the bond1 interface, and then do not get sent onward to the tap0 interface. In contrast, when the ping is sent from an outside host to the host, the packets go from the SFP+ port to the bond1 interface to the br1 interface.

$ sudo tcpdump -nv -i any icmp # pinging VM from another host on network: not working
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
00:09:18.338181 ens4f1np1 P   IP (tos 0x0, ttl 63, id 18488, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.233 > 192.168.1.17: ICMP echo request, id 1006, seq 1, length 64
00:09:18.338181 bond1 P   IP (tos 0x0, ttl 63, id 18488, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.233 > 192.168.1.17: ICMP echo request, id 1006, seq 1, length 64
^C
2 packets captured
5 packets received by filter
0 packets dropped by kernel
$ sudo tcpdump -nv -i any icmp # pinging VM from the host: OK
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
00:10:10.165886 br1   Out IP (tos 0x0, ttl 64, id 19344, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.15 > 192.168.1.17: ICMP echo request, id 12, seq 1, length 64
00:10:10.165892 tap0  Out IP (tos 0x0, ttl 64, id 19344, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.15 > 192.168.1.17: ICMP echo request, id 12, seq 1, length 64
00:10:10.166205 tap0  P   IP (tos 0x0, ttl 64, id 64865, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.17 > 192.168.1.15: ICMP echo reply, id 12, seq 1, length 64
00:10:10.166205 br1   In  IP (tos 0x0, ttl 64, id 64865, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.17 > 192.168.1.15: ICMP echo reply, id 12, seq 1, length 64
^C
4 packets captured
7 packets received by filter
0 packets dropped by kernel
$ sudo tcpdump -nv -i any icmp # pinging host from another host on network: OK
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
00:12:27.613923 ens4f1np1 P   IP (tos 0x0, ttl 63, id 20161, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.233 > 192.168.1.15: ICMP echo request, id 1006, seq 1, length 64
00:12:27.613923 bond1 P   IP (tos 0x0, ttl 63, id 20161, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.233 > 192.168.1.15: ICMP echo request, id 1006, seq 1, length 64
00:12:27.613923 br1   In  IP (tos 0x0, ttl 63, id 20161, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.233 > 192.168.1.15: ICMP echo request, id 1006, seq 1, length 64
00:12:27.613967 br1   Out IP (tos 0x0, ttl 64, id 33892, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.15 > 192.168.1.233: ICMP echo reply, id 1006, seq 1, length 64
00:12:27.613969 bond1 Out IP (tos 0x0, ttl 64, id 33892, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.15 > 192.168.1.233: ICMP echo reply, id 1006, seq 1, length 64
00:12:27.613975 ens4f0np0 Out IP (tos 0x0, ttl 64, id 33892, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.15 > 192.168.1.233: ICMP echo reply, id 1006, seq 1, length 64
^C
6 packets captured
9 packets received by filter
0 packets dropped by kernel
$ sudo tcpdump -nv -i any icmp # pinging another host on network from VM: not working
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
00:41:25.446761 tap0  P   IP (tos 0x0, ttl 64, id 39047, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.17 > 192.168.1.233: ICMP echo request, id 21201, seq 1, length 64
^C
1 packet captured
4 packets received by filter
0 packets dropped by kernel

The tap0 interface shows as connected to br1. The 7a:50 MAC address belongs to tap0; I’m not sure why it’s listed twice, but the bond1 MAC address is also listed twice. The aa:fc MAC address is set by Firecracker for the VM’s eth0.

$ brctl show br1
bridge name     bridge id               STP enabled     interfaces
br1             8000.4e7f990e7a72       no              bond1
                                                        tap0
$ brctl showmacs br1
port no mac addr                is local?       ageing timer
[trimmed...]
  1     2a:81:c6:d2:7b:73       yes                0.00
  1     2a:81:c6:d2:7b:73       yes                0.00
  2     7a:50:27:f0:64:35       yes                0.00
  2     7a:50:27:f0:64:35       yes                0.00
  2     aa:fc:00:00:00:01       no                 4.38

I’m not using macvlans or macvtaps, as far as I know. The tap0 interface was created with ip tuntap add tap0 mode tap user john. From my understanding of the Firecracker docs, I’m not using vhostnet.

For comparison, I booted a spare PC with a single Ethernet interface in a live environment, configured it to have a bridge, added a tap0 interface to it, and booted a Firecracker VM (using a different aa:fc MAC) on it. That VM sees the network OK and is accessible from outside the machine. I haven’t spotted anything different yet, other than not having a bonded interface.
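
A sketch of an equivalent bridge-plus-tap setup with plain iproute2, not necessarily the exact commands used. The interface names in the usage line (`enp1s0`, `br0`) are placeholders, and `run` and `setup_bridge` are helper names made up for illustration. By default it only prints the commands instead of running them.

```shell
# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0, needs root).
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Attach a physical NIC and a tap device to a new bridge.
# Any IP address on the NIC has to be moved to the bridge separately (not shown).
setup_bridge() {
  nic=$1 br=$2 tap=$3 user=$4
  run ip link add name "$br" type bridge
  run ip link set "$nic" master "$br"
  run ip tuntap add "$tap" mode tap user "$user"
  run ip link set "$tap" master "$br"
  run ip link set "$br" up
  run ip link set "$tap" up
}

setup_bridge enp1s0 br0 tap0 john
```

The tap device is then handed to Firecracker as the VM’s network interface, the same way as on the main host.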

From memory, the FORWARD chain policy is DROP by default? Something to try:

iptables -P FORWARD ACCEPT

I’m using nftables instead of iptables on this machine. There’s a setting for the bridge which turns off calling iptables, but I should probably try setting my nftables rules to log-drop instead of drop everywhere. Maybe that will reveal something.

$ grep -r '' /sys/class/net/br1/bridge/nf_call_*
/sys/class/net/br1/bridge/nf_call_arptables:0
/sys/class/net/br1/bridge/nf_call_ip6tables:0
/sys/class/net/br1/bridge/nf_call_iptables:0

If ARP works, then my instinct says it’s a firewall issue; at least layer 2 works fine.

One other observation—if I connect the tap0 interface to my docker0 bridge instead of br1, then the VM can connect out to the network. It’s just behind NAT, which isn’t my goal.

Proxy ARP?

If I understand it correctly, proxy ARP is meant for cases where a host should answer for other hosts which aren’t on the network.

Another machine on the network seems to be able to arping the VM OK.

$ arping -c1 192.168.1.17 # from another machine on the network to VM
ARPING 192.168.1.17 from 192.168.1.110 br1
Unicast reply from 192.168.1.17 [AA:FC:00:00:00:01]  0.942ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

Inside the VM, arping in the reverse direction is also looking OK.

$ arping -c1 192.168.1.110 # from inside the VM to the above machine
ARPING 192.168.1.110 from 192.168.1.17 eth0
Unicast reply from 192.168.1.110 [46:00:00:00:00:00]  0.892ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

Whoa, solved by running this:

sudo sysctl -w net.bridge.bridge-nf-call-iptables=0

I’m surprised, given that /sys/class/net/br1/bridge/nf_call_iptables was 0.
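
For reference, a minimal sketch that prints both knobs side by side. The two paths are the usual procfs and sysfs locations; `bridge_nf_report` is a hypothetical helper name. My reading of the kernel's br_netfilter code is that bridged IPv4 frames are passed to the iptables hooks when either value is non-zero, which would explain the result above, but treat that as an assumption rather than documented behaviour.

```shell
# Show the global and the per-bridge nf_call_iptables settings.
# "n/a" for the global one usually means the br_netfilter module is not loaded.
bridge_nf_report() {
  br=${1:-br1}
  global=/proc/sys/net/bridge/bridge-nf-call-iptables
  per_bridge=/sys/class/net/$br/bridge/nf_call_iptables
  for f in "$global" "$per_bridge"; do
    if [ -r "$f" ]; then
      printf '%s = %s\n' "$f" "$(cat "$f")"
    else
      printf '%s = n/a\n' "$f"
    fi
  done
}

bridge_nf_report br1
```

If the first line shows 1 while the second shows 0, the per-bridge value alone was not enough to keep bridged traffic away from the firewall rules.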

Awesome!

I’m gonna claim I was pseudo-right even though I just blurted out every solution under the sun.

Weird!!!

There are some code pointers in this old OpenWrt patch on Patchwork: “[OpenWrt-Devel] generic: Fix per interface nf_call_iptables setting”.

The kernel probably needs a documentation patch, if nothing else.