Two LXC containers cannot ping each other but can ping everything else on the same network?

Vitalius · May 1, 2017, 10:31pm

So, I have two Linux Containers on my PC running Fedora. Both are Arch containers.

Both have this configuration file for their containers:

# Template used to create this container: /usr/share/lxc/templates/lxc-download
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Uncomment the following line to support nesting containers:
#lxc.include = /usr/share/lxc/config/nesting.conf
# (Be aware this has security implications)

# Distribution configuration
lxc.include = /usr/share/lxc/config/archlinux.common.conf
lxc.arch = x86_64

# Container specific configuration
lxc.rootfs = /var/lib/lxc/-lxc_name-/rootfs
lxc.rootfs.backend = btrfs
lxc.utsname = -lxc_name-

# Network configuration
lxc.network.type = veth
lxc.network.link = br0
lxc.network.veth.pair = -lxc_name-.veth0
lxc.network.name = veth0
lxc.network.flags = up
lxc.network.hwaddr = -MAC Address-

I'm not using the default lxc bridge because it forces NAT'd networking and I need these containers to be visible on the LAN directly.

One has IP address 10.0.1.99 and the other has IP address 10.0.1.101, both with a /22 subnet mask. Each is configured using either iproute2 or netctl; I've tried both methods on both containers.

Both containers can ping the host's IP address (10.0.1.97). Both can access the internet through the LAN gateway. But neither can ping the other.
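As a quick sanity check on the addressing, here is a minimal Python sketch using only the standard-library ipaddress module. It takes the three addresses quoted above and checks that they all fall in the same /22, which would mean container-to-container traffic stays on the bridge and never needs the gateway. The helper name same_subnet is just for illustration.

```python
import ipaddress

# Addresses and prefix length as reported in the post above.
host = ipaddress.ip_interface("10.0.1.97/22")
container1 = ipaddress.ip_interface("10.0.1.99/22")
container2 = ipaddress.ip_interface("10.0.1.101/22")


def same_subnet(a, b):
    """Return True if both interfaces belong to the same IP network."""
    return a.network == b.network


print(container1.network)                   # network that the /22 mask selects
print(same_subnet(container1, container2))  # container to container
print(same_subnet(host, container2))        # host to container
```

If all three comparisons come out true, the addressing itself is unlikely to be the reason the containers cannot reach each other.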

This is confusing to me because it was working just fine before I restarted the containers. I restarted the containers because I destroyed one and built it from scratch.

I need them to be able to communicate for various reasons, but I can't figure out what could've possibly changed.

cburn11 · May 1, 2017, 11:51pm

Would you post the routing table on the host (ip route show)? Can the host ping the containers? And do you have any iptables rules on the containers, or on the host, dropping ICMP traffic?

Vitalius · May 2, 2017, 1:42pm

Here is ip route show on the host:

default via 10.0.1.5 dev br0 proto static metric 425
10.0.1.0/22 dev br0 proto kernel scope link src 10.0.1.97 metric 425
10.0.1.5 dev br0 proto static scope link metric 425

Here it is on the containers (they both return the same, except the source IP is the container's static IP):

default via 10.0.1.5 dev veth0
10.0.1.0/22 dev veth0 proto kernel scope link src 10.0.1.101

The process I have to take to get this setup to work is a bit involved. If I run nmcli c on the host once it's done, I get:

NAME - UUID - TYPE - DEVICE
br0 - e37726ec-e18a-4785-ba29-d7545108962e - bridge - br0
br0-ether - f898c107-ac7f-4bd4-8068-f39ade069fee - 802-3-ethernet - enp7s0
br0-httpd.veth0 - 3418e958-f735-4143-b8af-d49e788be0f0 - 802-3-ethernet - httpd.veth0
br0-mysqld.veth0 - 54a6a2e3-2066-4294-aae0-4d8bc8df7e33 - 802-3-ethernet - mysqld.veth0

This is after I've rebooted the machine and completed the following steps:

1. Start both containers.
2. Enable both veth connections for br0 on the host (the connections have to exist for this to work, so the containers have to be started already).
3. Set the veth0 device in both containers to the "down" state with ip link (netctl can't start a profile for an interface that is already up).
4. Start the netctl profile in both containers.
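The steps above could be scripted roughly as follows. This is only a sketch under several assumptions: the container names httpd and mysqld are inferred from the veth names in the nmcli output, the netctl profile name static-veth0 is hypothetical (the post does not name it), and the script defaults to a dry run that only prints the commands.

```python
import subprocess


def bringup_plan(containers, profile):
    """Build the command sequence for the bring-up steps, one phase at a time.

    `profile` is a hypothetical netctl profile name inside each container.
    """
    plan = []
    for name in containers:  # start both containers
        plan.append(["lxc-start", "-n", name, "-d"])
    for name in containers:  # enable the host-side connections on br0
        plan.append(["nmcli", "connection", "up", f"br0-{name}.veth0"])
    for name in containers:  # netctl needs the interface to be down first
        plan.append(["lxc-attach", "-n", name, "--",
                     "ip", "link", "set", "veth0", "down"])
    for name in containers:  # start the netctl profile
        plan.append(["lxc-attach", "-n", name, "--",
                     "netctl", "start", profile])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command, or execute it (as root) when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Container names inferred from the nmcli listing; profile name is made up.
run_plan(bringup_plan(["httpd", "mysqld"], "static-veth0"))
```

Building the plan separately from running it makes it easy to review the exact commands before letting anything touch the network configuration.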

This will give me a static IP address on a device that's bridged on the host for the LAN. Can't use DHCP. Can't use NAT, and I can't figure out another way to make this work.

Anyway, if I try to ping the containers' IP addresses from the host:

[user@host ~]$ ping 10.0.1.97 (host pinging itself)

PING 10.0.1.97 (10.0.1.97) 56(84) bytes of data.
64 bytes from 10.0.1.97: icmp_seq=1 ttl=64 time=0.063 ms
64 bytes from 10.0.1.97: icmp_seq=2 ttl=64 time=0.050 ms

--- 10.0.1.97 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1055ms
rtt min/avg/max/mdev = 0.050/0.056/0.063/0.009 ms

[user@host ~]$ ping 10.0.1.99 (container 1)

PING 10.0.1.99 (10.0.1.99) 56(84) bytes of data.
64 bytes from 10.0.1.99: icmp_seq=1 ttl=64 time=0.115 ms
64 bytes from 10.0.1.99: icmp_seq=2 ttl=64 time=0.045 ms

--- 10.0.1.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1051ms
rtt min/avg/max/mdev = 0.045/0.080/0.115/0.035 ms

[user@host ~]$ ping 10.0.1.101 (container 2)

PING 10.0.1.101 (10.0.1.101) 56(84) bytes of data.

--- 10.0.1.101 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8172ms

Hmm, I can't ping one of the containers, but I can ping the other. That's weird, because from the container with IP address 10.0.1.101 I can ping the host, but not the container with IP address 10.0.1.99.

[root@container2 /]# ping 10.0.1.97 (host)

PING 10.0.1.97 (10.0.1.97) 56(84) bytes of data.
64 bytes from 10.0.1.97: icmp_seq=1 ttl=64 time=0.163 ms
64 bytes from 10.0.1.97: icmp_seq=2 ttl=64 time=0.068 ms
^C
--- 10.0.1.97 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1016ms
rtt min/avg/max/mdev = 0.068/0.115/0.163/0.048 ms

[root@container2 /]# ping 10.0.1.99 (container 1)

PING 10.0.1.99 (10.0.1.99) 56(84) bytes of data.
From 10.0.1.101 icmp_seq=1 Destination Host Unreachable
^C
--- 10.0.1.99 ping statistics ---
4 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3096ms
pipe 3

[root@container2 /]# ping 10.0.1.101 (container 2, itself)

PING 10.0.1.101 (10.0.1.101) 56(84) bytes of data.
64 bytes from 10.0.1.101: icmp_seq=1 ttl=64 time=0.059 ms
64 bytes from 10.0.1.101: icmp_seq=2 ttl=64 time=0.040 ms
^C
--- 10.0.1.101 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1048ms
rtt min/avg/max/mdev = 0.040/0.049/0.059/0.011 ms

For completeness, here is container 1's ping results:

[root@container1 /]# ping 10.0.1.97 (host)

PING 10.0.1.97 (10.0.1.97) 56(84) bytes of data.
64 bytes from 10.0.1.97: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 10.0.1.97: icmp_seq=2 ttl=64 time=0.067 ms
^C
--- 10.0.1.97 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1052ms
rtt min/avg/max/mdev = 0.067/0.090/0.113/0.023 ms

[root@container1 /]# ping 10.0.1.99 (container 1, itself)

PING 10.0.1.99 (10.0.1.99) 56(84) bytes of data.
64 bytes from 10.0.1.99: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from 10.0.1.99: icmp_seq=2 ttl=64 time=0.041 ms
^C
--- 10.0.1.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.041/0.046/0.051/0.005 ms

[root@container1 /]# ping 10.0.1.101 (container 2)

PING 10.0.1.101 (10.0.1.101) 56(84) bytes of data.
^C
--- 10.0.1.101 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3087ms

So the notable findings seem to be:

The host cannot ping Container 2.
Container 2 cannot ping Container 1.
Container 1 cannot ping Container 2.
The host can ping Container 1.
Both Container 1 & 2 can ping the host.
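The list above was read off ping's summary lines by eye. If the same checks need repeating after each restart, a small parser for that summary line may help. This is a sketch assuming the iputils ping format shown in the output above; the function names are made up for illustration.

```python
import re

# Matches lines such as:
#   2 packets transmitted, 2 received, 0% packet loss, time 1055ms
#   4 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3096ms
SUMMARY = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) received,"
    r"(?: \+(?P<errors>\d+) errors,)? (?P<loss>\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(text):
    """Extract the counters from ping's summary line as a dict."""
    match = SUMMARY.search(text)
    if match is None:
        raise ValueError("no ping summary line found")
    return {
        "sent": int(match["sent"]),
        "received": int(match["received"]),
        "errors": int(match["errors"] or 0),  # the errors field is optional
        "loss_percent": float(match["loss"]),
    }


def reachable(text):
    """Treat a target as reachable if at least one reply came back."""
    return parse_ping_summary(text)["received"] > 0
```

Feeding it the captured output of each ping run gives a yes/no answer per source and target pair, which makes asymmetries like the ones listed above easier to spot.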

I didn't manually configure any iptables rules, except for SSH, and I destroyed and recreated the container after adding that rule. It also worked before, so I don't see why that would be the cause.

This issue started when I recreated Container 1 to restart my installation process for the services I'm working on. It was working 100% fine before I did that. After recreating the container, I restarted the machine.

This is bizarre.

Vitalius · May 3, 2017, 1:45pm

So I restarted my machine and set up both containers. Now the host can ping either, but the containers can't ping each other.

I might guess LXC is doing something? Or SELinux? It'd be weird to me though, just because it was working fine before.

cburn11 · May 3, 2017, 4:00pm

Nothing jumps out at me as obviously wrong with the routing table. Is it just ICMP traffic that can't travel between the containers, or is TCP or UDP traffic between them not going through either?

The only other thing I can think of would be to watch the traffic on the host and container interfaces, either by logging it through the iptables raw table (iptables -t raw -A PREROUTING -p icmp -j TRACE) or with tcpdump (for example tcpdump -n -i br0 icmp on the host), and see where the traffic gets dropped; maybe that would give you a clue. Another option might be to temporarily revert to the lxc default bridge; if that allows the containers to ping each other, that might hint that the problem is with the custom bridge settings.

Vitalius · May 3, 2017, 7:41pm

phpMyAdmin in Container 1 can't reach the MariaDB (MySQL) database in Container 2.

That is the PHPMyAdmin error it's giving.

... I found my problem.

I had copied one LXC config file to the other container to speed up setup. I must've forgotten to change the MAC address.

At least I figured it out. Thanks for the help.
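For anyone hitting the same thing after copying a config: a duplicated lxc.network.hwaddr is easy to miss by eye. Below is a minimal sketch of a check, assuming LXC 2.x-style config keys as in the file quoted at the top of the thread. The MAC addresses in the example are placeholders, and the container names are inferred from the veth names.

```python
from collections import defaultdict


def hwaddr_of(config_text):
    """Return the lxc.network.hwaddr value from a config, or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and anything that is not key = value
        key, _, value = line.partition("=")
        if key.strip() == "lxc.network.hwaddr":
            return value.strip().lower()  # normalise case for comparison
    return None


def duplicate_hwaddrs(configs):
    """Map each MAC used by more than one container to those container names."""
    by_mac = defaultdict(list)
    for name, text in configs.items():
        mac = hwaddr_of(text)
        if mac is not None:
            by_mac[mac].append(name)
    return {mac: sorted(names) for mac, names in by_mac.items() if len(names) > 1}


# Placeholder configs; real ones would be read from each container's config file.
configs = {
    "httpd": "lxc.network.type = veth\nlxc.network.hwaddr = 00:16:3e:00:00:01\n",
    "mysqld": "lxc.network.type = veth\nlxc.network.hwaddr = 00:16:3E:00:00:01\n",
}
print(duplicate_hwaddrs(configs))
```

An empty result means every container has its own MAC. A non-empty one would fit the symptoms in this thread, plausibly because the bridge can only associate a given MAC address with one port at a time, so replies meant for one container can end up at the other.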