Configuring IPv6 for LXC containers

Hello everybody,

I (an absolute IPv6 beginner) am having trouble setting up IPv6 addresses for my LXC containers on my VPS.
I think I must have routes configured wrong or something…

I have this VPS, with a single IPv4 address and a /64 IPv6 network.
I’ve got LXC (classic LXC, not LXD!) running, using the default lxc-net bridge for IPv4 NAT, and that works great.

Now I’ve been trying to set up IPv6 for the containers that might need it.

I’ve added a section like this to my /etc/network/interfaces to create a new bridge device called lxcbr0inet6 (it has nothing to do with the default lxcbr0) and assign it a /80 sub-network, :cccc (mnemonic for container).
I’ve also set up ens3 (the WAN interface) on a different /80 sub-network, :aaaa, so my host can use IPv6 normally:

[pr:ef:ix:xx] is obviously the /64 prefix assigned to me by my VPS provider.

iface ens3 inet6 static
        address [pr:ef:ix:xx]:aaaa::1
        netmask 80
        gateway fe80::1

auto lxcbr0inet6
iface lxcbr0inet6 inet6 static
        address [pr:ef:ix:xx]:cccc::1
        netmask 80
        gateway [pr:ef:ix:xx]:aaaa::1
        bridge_ports none

For the container I want to add IPv6 to, I’ve added a config section like this:

lxc.net.1.type = veth
lxc.net.1.link = lxcbr0inet6
lxc.net.1.flags = up
lxc.net.1.hwaddr = 00:16:3e:xx:xx:x6
lxc.net.1.ipv6.address = [pr:ef:ix:xx]:cccc::2/80
lxc.net.1.ipv6.gateway = [pr:ef:ix:xx]:cccc::1

This adds a new veth network interface in my container, connected to the newly created lxcbr0inet6, and should automatically configure it for the specified IP/GW.

I’ve also set up the sysctls like this:

net.ipv6.conf.default.accept_ra=0
net.ipv6.conf.default.autoconf=0
net.ipv6.conf.default.forwarding=1

net.ipv6.conf.all.accept_ra=0
net.ipv6.conf.all.autoconf=0
net.ipv6.conf.all.forwarding=1
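
Reading them back (per-interface values included) seems to confirm they actually took effect - assuming these are even the right knobs to check:

sysctl net.ipv6.conf.all.forwarding net.ipv6.conf.all.accept_ra net.ipv6.conf.all.autoconf
cat /proc/sys/net/ipv6/conf/ens3/forwarding /proc/sys/net/ipv6/conf/lxcbr0inet6/forwarding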

If I now start the container, I get a sensible(?) network configuration (inside the container, cut for brevity):

ip -6 a

7: eth1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 [pr:ef:ix:xx]:cccc::2/80 scope global 
       valid_lft forever preferred_lft forever

ip -6 r

[pr:ef:ix:xx]:cccc::/80 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via [pr:ef:ix:xx]:cccc::1 dev eth1 metric 1024 pref medium

ip -6 n

[pr:ef:ix:xx]:cccc::1 dev eth1 lladdr fe:74:6e:56:a4:57 router REACHABLE
fe80::7c7c:e0ff:fec8:8fe8 dev eth1 lladdr fe:74:6e:56:a4:57 router STALE

I can even ping the various host IPs - I just can’t get outside of my own VPS.
Hosts that are reachable via ping:
[pr:ef:ix:xx]:cccc::2 (veth interface) → OK
[pr:ef:ix:xx]:cccc::1 (lxcbr0inet6) → OK
[pr:ef:ix:xx]:aaaa::1 (ens3, WAN) → OK

But if I try e.g. ping -6 google.de I get no replies :confused:
It’s not just ICMP, I’ve tried wget -6 https://google.com -O /dev/zero as well, to no avail :frowning:

It works perfectly fine on the host, just not in the containers.

It seems to me that I’m missing some routing information? I’m out of my depth…

Is this such an uncommon scenario? I thought it would be more common; does anybody know of a good tutorial or have hints? My google-fu turned up some similar, but not identical, scenarios, and trying a lot of parts from those led to no success.

I’ve tried a lot of things already, but I’m basically just guessing:
I’ve set up a radvd server to advertise routes(?), but I’m actually fairly certain that I don’t need one, and it didn’t work anyway.
I’ve played with using the entire /64 on the bridge, but that also did not work.
Also, the provider-given gateway of fe80::1: does that make sense?
Am I missing firewall rules? Currently it’s just:

sudo ip6tables -L
Chain INPUT (policy ACCEPT)
Chain FORWARD (policy ACCEPT)
Chain OUTPUT (policy ACCEPT)

On the host, after starting a container, I get this from ip -6 a:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 [pr:ef:ix:xx]:aaaa::1/80 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::c89f:9eff:fe33:4b7c/64 scope link 
       valid_lft forever preferred_lft forever
3: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fe80::216:3eff:fe00:0/64 scope link 
       valid_lft forever preferred_lft forever
4: lxcbr0inet6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 [pr:ef:ix:xx]:cccc::1/80 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::dcaa:56ff:feab:c2d7/64 scope link 
       valid_lft forever preferred_lft forever
8: vethTG6U7D@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fe80::fc79:a9ff:fe95:f308/64 scope link 
       valid_lft forever preferred_lft forever
10: veth8PLSXL@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fe80::fc45:e6ff:fe6f:a140/64 scope link 
       valid_lft forever preferred_lft forever

And this from ip -6 r:

::1 dev lo proto kernel metric 256 pref medium
[pr:ef:ix:xx]:aaaa::/80 dev ens3 proto kernel metric 256 pref medium
[pr:ef:ix:xx]:cccc::/80 dev lxcbr0inet6 proto kernel metric 256 pref medium
fe80::/64 dev ens3 proto kernel metric 256 pref medium
fe80::/64 dev lxcbr0inet6 proto kernel metric 256 pref medium
fe80::/64 dev lxcbr0 proto kernel metric 256 pref medium
fe80::/64 dev veth8PLSXL proto kernel metric 256 pref medium
fe80::/64 dev vethTG6U7D proto kernel metric 256 pref medium
default via fe80::1 dev ens3 metric 1024 onlink pref medium

Any help would be appreciated; keep in mind that I’m very new to IPv6. Also, I don’t have IPv6 at home.

  • use tcpdump / tshark to dissect traffic - you can even tunnel the capture over SSH into wireshark so you can browse/investigate at leisure (a combined example of these checks follows after this list).

  • you can test routing tables with ip route get.

  • check that all virtual interfaces are up (both sides of veth, lxc should be sensible, but check)

  • every network namespace has its own iptables; if you’re not seeing packets coming out of an interface (by running tcpdump or tshark on a veth/bridge), it’s possible that they’re blocked by / lost in the firewall - you can add -j LOG rules and look at dmesg to see if you can find the packets.

  • check reverse path filtering as well (I don’t think you need it since you’re splitting a /64 subnet into multiple /80s for the bridge and the containers, and it’s easy to enable / test / disable if there are issues) (<- actually I think this might be it … check this first)

  • theoretically, there are various other places where filtering might be going on (tc comes to mind, and bridges can do filtering as well), but it’s usually not a problem; IMHO you’d do well to bisect the ICMP request/reply paths with tcpdump and ip route get first, before looking at iptables and other stuff.

  • check for any ND packets going over the network that mention your container addresses anywhere.
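
A rough combined sketch of those checks (untested, adjust the interface names; "my-vps" is just a placeholder for however you reach the machine):

# capture on the bridge and pipe it over SSH into a local wireshark
ssh root@my-vps 'tcpdump -nni lxcbr0inet6 -U -w - icmp6' | wireshark -k -i -

# ask the kernel which route it would pick for a given destination
ip -6 route get 2a00:1450:4001:803::2003

# reverse path filtering knobs (the sysctls are IPv4-only; on the IPv6 side an
# explicit ip6tables rpfilter rule would be the usual place to look)
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter

# log everything traversing the FORWARD chain, then look for it in dmesg
sudo ip6tables -I FORWARD -j LOG --log-prefix "v6-fwd: "
dmesg | grep v6-fwd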

1 Like

To ensure it’s not a DNS issue, ping a literal address such as 2001:4860:4860::8888 (Google’s public DNS) instead of google.com, since it’s the same service you’re pointing at anyway :wink:

Actually DNS works fine - I can even ping google.com or whatever on the host. And I can see that the containers also resolve the correct IPv6 address.

I’m sorry, I only now have time for a proper response.

ip r get yields nothing out of the ordinary, IMHO:

ip r get 2a00:1450:4001:803::2003 in the container (a Google IPv6 address):

2a00:1450:4001:803::2003 from :: via [pr:ef:ix:xx]:cccc::1 dev eth1 src [pr:ef:ix:xx]:cccc::2 metric 1024 pref medium

ip r get [pr:ef:ix:xx]:aaaa::1 in the container (the host IPv6 address) yields basically the same, but that address is pingable, so some sort of routing is taking place(?):

[pr:ef:ix:xx]:aaaa::1 from :: via [pr:ef:ix:xx]:cccc::1 dev eth1 src [pr:ef:ix:xx]:cccc::2 metric 1024 pref medium

ip r get 2a00:1450:4001:803::2003 on the host:

2a00:1450:4001:803::2003 from :: via fe80::1 dev ens3 src [pr:ef:ix:xx]:aaaa::1 metric 1024 pref medium

sudo ip neigh in the container:

10.0.3.1 dev eth0 lladdr 00:16:3e:00:00:00 REACHABLE
[pr:ef:ix:xx]:cccc::1 dev eth1 lladdr fe:70:bb:51:34:94 router REACHABLE
fe80::80d3:7eff:fe66:4a14 dev eth1 lladdr fe:70:bb:51:34:94 router REACHABLE

On the host, running sudo tcpdump -nni lxcbr0inet6 icmp6:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lxcbr0inet6, link-type EN10MB (Ethernet), capture size 262144 bytes


# in container start pinging google.de

21:13:53.155751 IP6 [pr:ef:ix:xx]:cccc::2 > 2a00:1450:4001:827::2003: ICMP6, echo request, seq 1, length 64
21:13:54.179843 IP6 [pr:ef:ix:xx]:cccc::2 > 2a00:1450:4001:827::2003: ICMP6, echo request, seq 2, length 64
21:13:58.275807 IP6 fe80::216:3eff:fe66:8211 > [pr:ef:ix:xx]:cccc::1: ICMP6, neighbor solicitation, who has [pr:ef:ix:xx]:cccc::1, length 32
21:13:58.275864 IP6 [pr:ef:ix:xx]:cccc::1 > fe80::216:3eff:fe66:8211: ICMP6, neighbor advertisement, tgt is [pr:ef:ix:xx]:cccc::1, length 24
21:14:03.395788 IP6 fe80::80d3:7eff:fe66:4a14 > fe80::216:3eff:fe66:8211: ICMP6, neighbor solicitation, who has fe80::216:3eff:fe66:8211, length 32
21:14:03.395844 IP6 fe80::216:3eff:fe66:8211 > fe80::80d3:7eff:fe66:4a14: ICMP6, neighbor advertisement, tgt is fe80::216:3eff:fe66:8211, length 24
21:14:08.515777 IP6 fe80::216:3eff:fe66:8211 > fe80::80d3:7eff:fe66:4a14: ICMP6, neighbor solicitation, who has fe80::80d3:7eff:fe66:4a14, length 32
21:14:08.515814 IP6 fe80::80d3:7eff:fe66:4a14 > fe80::216:3eff:fe66:8211: ICMP6, neighbor advertisement, tgt is fe80::80d3:7eff:fe66:4a14, length 24




# in container start pinging [pr:ef:ix:xx]:aaaa::1

21:14:59.410211 IP6 [pr:ef:ix:xx]:cccc::2 > [pr:ef:ix:xx]:aaaa::1: ICMP6, echo request, seq 1, length 64
21:14:59.410236 IP6 [pr:ef:ix:xx]:aaaa::1 > [pr:ef:ix:xx]:cccc::2: ICMP6, echo reply, seq 1, length 64
21:15:00.419837 IP6 [pr:ef:ix:xx]:cccc::2 > [pr:ef:ix:xx]:aaaa::1: ICMP6, echo request, seq 2, length 64
21:15:00.419868 IP6 [pr:ef:ix:xx]:aaaa::1 > [pr:ef:ix:xx]:cccc::2: ICMP6, echo reply, seq 2, length 64
21:15:04.579773 IP6 fe80::80d3:7eff:fe66:4a14 > [pr:ef:ix:xx]:cccc::2: ICMP6, neighbor solicitation, who has [pr:ef:ix:xx]:cccc::2, length 32
21:15:04.579823 IP6 [pr:ef:ix:xx]:cccc::2 > fe80::80d3:7eff:fe66:4a14: ICMP6, neighbor advertisement, tgt is [pr:ef:ix:xx]:cccc::2, length 24
21:15:08.163772 IP6 fe80::216:3eff:fe66:8211 > ff02::2: ICMP6, router solicitation, length 16
21:15:09.699802 IP6 fe80::216:3eff:fe66:8211 > fe80::80d3:7eff:fe66:4a14: ICMP6, neighbor solicitation, who has fe80::80d3:7eff:fe66:4a14, length 32
21:15:09.699864 IP6 fe80::80d3:7eff:fe66:4a14 > fe80::216:3eff:fe66:8211: ICMP6, neighbor advertisement, tgt is fe80::80d3:7eff:fe66:4a14, length 24
^C
17 packets captured
17 packets received by filter

Never mind, I think I figured it out:
I don’t have a routed IPv6 /64 subnet.
I can now ping without problems after running these commands:
sudo sysctl -w net.ipv6.conf.all.proxy_ndp=1
sudo ip -6 neigh add proxy [pr:ef:ix:xx]:cccc::2 dev ens3
As far as I understand it, this should make the host advertise (i.e. answer neighbor solicitations for) the address [pr:ef:ix:xx]:cccc::2 on the interface ens3, right?
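
For my own sanity I also read the proxy table back, which seems to confirm it (assuming ip -6 neigh show proxy is the right way to look at this):

ip -6 neigh show proxy dev ens3
# should list [pr:ef:ix:xx]:cccc::2 as a proxy entry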

This basically solved my problem, but feel free to explain more details - I feel like I have only a surface-level grasp of the topic.

For example, do I need to re-run these commands every so often?
They’re probably not reboot-proof.
What’s the best way to make this permanent? Do I need to make the route more permanent for the provider’s routers? Will they “lose” my route after a while without activity?
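
My current (untested) plan to make it reboot-proof is a sysctl drop-in plus a post-up hook in the existing ens3 stanza, something like this (the drop-in file name is arbitrary):

# /etc/sysctl.d/90-proxy-ndp.conf
net.ipv6.conf.all.proxy_ndp = 1

# /etc/network/interfaces - added to the existing ens3 stanza
iface ens3 inet6 static
        address [pr:ef:ix:xx]:aaaa::1
        netmask 80
        gateway fe80::1
        post-up ip -6 neigh add proxy [pr:ef:ix:xx]:cccc::2 dev ens3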

To sort of automatically make container addresses available, I’ve installed ndppd and configured it like this:

route-ttl 30000
proxy ens3 {
    router no
    timeout 500
    ttl 30000
    rule [pr:ef:ix:xx]:cccc::/80 {
        auto
    }
}

This should (on the host’s WAN interface) answer NDP solicitations for the LXC network and make it reachable without adding the proxy entries manually.
I’d like someone to verify that this is at least somewhat right.
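
The only sanity check I’ve come up with so far is watching the WAN side for solicitations while pinging the container address from outside (the ip6[40] offset trick assumes there are no IPv6 extension headers in the way):

# neighbor solicitations (ICMPv6 type 135) and advertisements (type 136) on ens3
sudo tcpdump -nni ens3 'icmp6 and (ip6[40] == 135 or ip6[40] == 136)'
# then ping -6 [pr:ef:ix:xx]:cccc::2 from outside the VPS and check that the
# "who has [pr:ef:ix:xx]:cccc::2" solicitation gets answered from ens3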

EDIT: Also LXC 3.2.1 added support for the lxc.net.[i].l2proxy option, making this less relevant. Debian 10 buster unfortunately does not ship that version :confused:

1 Like

So you had “echo request” going out, and saw the “ND who has” coming back being ignored by your host’s physical interface… Well… ok.

Happy to see this working for you now.

1 Like

My problem was that I didn’t know enough to interpret that tcpdump successfully. I still don’t - here’s my best guess as to what happens now on the working setup. Feel free to correct me, this is full of assumptions. I’m here to learn :stuck_out_tongue:.
The parts marked with (?) I’m especially uncertain about.


# this is a real tcpdump for the interface lxcbr0inet6,
# just indented and reformatted and with addresses replaced with symbolic names.

# start pinging google on one of the containers

# first, we need to know how to contact the [bridge ip], as it is set as the default route("gateway")

# the container sends a packet asking
# "is [bridge ip] my neighbor(on this network), tell [container ip]"()
# to the corresponding multicast address, calculated as a prefix + the last 24 bit of [bridge ip]
# the bridge simply answers "Yes, I'm here"

# first for the link-local address("Do I have the neighbor [bridge ip]?")
[container link-local ip]	 > [bridge Solicited-node multicast ip]: 	neighbor solicitation, who has [bridge ip]

# the bridge answers to our link local IP that it's reachable here("I, [bridge ip], am your neighbor")
[bridge ip]					 > [container link-local ip]: 				neighbor advertisement, tgt is [bridge ip]

# repeat with the global IP("Do I have the neighbor [bridge ip]?")
[container ip]				 > [bridge Solicited-node multicast ip]: 	neighbor solicitation, who has [bridge ip]

# the bridge answers again that it's reachable here("I, [bridge ip], am your neighbor")
[bridge ip]					 > [container ip]: 							neighbor advertisement, tgt is [bridge ip]

# our container and host now know how to communicate.
# now we're sending echo requests:
[container ip]				 > 	[google ip]: 	echo request, seq 1
[container ip]				 > 	[google ip]: 	echo request, seq 2

# the VM host forwarded those echo requests to the google IP,
# and an echo reply came back to the provider's upstream router, intended for the container IP.
# The router sends out NDP packets to see if we will claim that IP and, if we do, sets up the route.
# Because we set up ndppd to listen on ens3 for NDP packets in our bridge IP range
# (to which the container IP belongs), we accept these packets and pass them on to this interface.

# (?) somebody from the outside wanted to know if [container ip] is there,
# and ndppd now wants to know if it has that IP.
[bridge link-local ip]		 > [container ip]: 							neighbor solicitation, who has [container ip]
[bridge link-local ip]		 > [container link-local ip]: 				neighbor solicitation, who has [container link-local ip]

# (?) this should tell ndppd that we're using this IP
[container ip]				 > [bridge link-local ip]: 					neighbor advertisement, tgt is [container ip]
[container link-local ip]	 > [bridge link-local ip]: 					neighbor advertisement, tgt is [container link-local ip]

# echo request(no response)
[container ip]				 > [google ip]: 	echo request, seq 1
[container ip]				 > [google ip]: 	echo request, seq 2

# (?) the container wants to know if it can reach bridge link-local ip
[container link-local ip]	 > [bridge link-local ip]: 					neighbor solicitation, who has [bridge link-local ip]
[bridge link-local ip]		 > [container link-local ip]: 				neighbor advertisement, tgt is [bridge link-local ip]

# echo request(no response)
[container ip]				 > [google ip]: 	echo request, seq 7
[container ip]				 > [google ip]: 	echo request, seq 8

# (?) we get a forwarded NDP request from the WAN interface, and ndppd knows it's
# reachable, so it forwards it to our network.
[bridge link-local ip]		 > [container Solicited-node multicast ip]: neighbor solicitation, who has [container ip]
# (?) we respond to the proxied request, and the response gets sent out via ens3 (WAN).
[container ip]				 > [bridge link-local ip]: 					neighbor advertisement, tgt is [container ip]
# (?) The provider router has now successfully received the NDP response and
# routes this IP to our host's ens3 (WAN) interface.

# finally we're getting echo reply packets
[google ip]					 > [container ip]: 	echo reply, seq 8
[container ip]				 > [google ip]: 	echo request, seq 9
[google ip]					 > [container ip]: 	echo reply, seq 9

So the thing is - your provider is running a flat L2 network for all the host-like VPS thingies. (Maybe it’s per physical host, maybe it’s flat in a rack or a row of racks; I’m not sure about the granularity, but you get exposed to something that looks like a flat L2 you’re connected to.) That means they need neighbor discovery to map specific individual IPs to the MAC addresses of the VPS VMs that are using them.

Normally with IPv6 the idea is that each host / machine / network stack gets its own /64 … so that you can run multiple HTTP servers/services all on port 80 and so on. (IPv6 was designed in the early days of Apache, when SSL/TLS and virtual hosts embedded into protocols didn’t exist, so a /64 is severe overkill today when we have these things.)

So, you have ndppd listening on your VPS’s outside interface, and whenever an ND request for an IP from that subnet shows up, it checks whether it can find that IP on the bridge. If yes, it tells the kernel to advertise that IP on the public interface: “me, me, my MAC address has this IP as well”. And the network outside your VPS is happy with it.

I wonder if you could get more than a single /64. What would happen if you used macvlans for your containers tied to the outside interface, or if you added the outside interface to the bridge? There’s a possibility of breaking your VPS networking that way… I guess it depends on your VPS provider’s setup… Maybe you’d break their network that way.
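
Purely as a hypothetical sketch of the macvlan idea (untested; ":dddd::2" is an arbitrary address out of your /64, and the container would show up with its own MAC directly on ens3 - whether the provider tolerates extra MACs is the open question, plus the host can't talk to its own macvlan children over the parent interface):

lxc.net.1.type = macvlan
lxc.net.1.macvlan.mode = bridge
lxc.net.1.link = ens3
lxc.net.1.flags = up
lxc.net.1.ipv6.address = [pr:ef:ix:xx]:dddd::2/64
lxc.net.1.ipv6.gateway = fe80::1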

1 Like

Thank you for confirming my cobbled-together IP knowledge on this topic, you’ve been a great help :wink:

I actually tried to use the bridge with the ens3 interface simply attached as a port. I didn’t really do an investigation, but I noticed that IPv4 didn’t work anymore, and that I was getting a lot of log output on the console (I don’t remember what exactly, it was a VNC connection; possibly “martian” packets logged for the bridge because of net.ipv4.conf.default.log_martians=1).

I bet it would be possible (I think I did a setup like this before…), but it would require some iptables filtering/mangling.

Nevertheless, my current setup (the default lxc-net bridge for IPv4 with dnsmasq as a pseudo-DHCP server, plus an optional second manual interface for IPv6) works great now.
I’m currently aggregating my setup scripts to make the setup repeatable, and when I’m (mostly) done, I’ll publish them here with a nice explanation (including a menu-based LXC admin script and a bunch of scripts for setting up an LXC container as a web server, mail server, etc.).

1 Like