PROXMOX LXC Containers, Possible to Link Aggregate 10Gig Interfaces?

We have an HPE DL380 G9 with sixteen 2.5" storage bays. This is going to be used as a Windows Image Deployment server, imaging up to around 30 computers at a time. We also have a plethora of 1.92TB HPE-branded SSDs. If I can, I’d like to aggregate two or potentially four 10Gig NIC ports and run them to a pair of CISCO switches, but I’m running into some issues during preliminary planning.

The first issue I see when checking the link speed of a Linux Bridge is that even with a 40Gig NIC (Mellanox CX314A), an LXC container only recognizes the 40Gig interface as 10Gig:

user@server:~$ cat /sys/class/net/eth1/speed
10000

It is recognized as 40Gig on the host. The NIC itself is not going in the planned server, but it worries me that a true 40Gig NIC reports as 10Gig; if I bond two or four 10Gig ports, are they going to report or function any better? Or is it more complicated than that?

An additional concern came from reading the PROXMOX documentation on creating a Linux Bond. I looked at all the various modes: balance-rr, active-backup, balance-xor, broadcast, LACP (802.3ad), balance-tlb, and balance-alb. Some of them I understand; some of them I don’t. The desire would be for the two or four ports to behave like a switch so I can retain one network configuration on the Windows Imaging Server, but I would also like the option to plug the cables into different switches or VLAN groups.
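
From the docs, the bond itself apparently ends up as a few lines in /etc/network/interfaces, something like this (the port names and hash policy here are just placeholders I’d have to adjust):

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4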

Ideally I’d like to have one network with two 10Gig cables going to each of two switches (four cables total), providing up to 20Gig to each switch. I don’t know if that is possible with this software, or at all.

If worst comes to worst, it looks like the “broadcast” mode would make the ports behave like a hub. I’d still only have 10Gig, but I could break the two switches into two or four VLAN groups and give each a 10Gig cable. That may be the route I end up going if I have to, for fault tolerance at the very least.

Check card capabilities with ethtool <device>.

Example

Settings for enp4s0:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseKX4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                56000baseCR4/Full
                                56000baseSR4/Full
                                1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10000baseKX4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 56000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        netlink error: Operation not permitted
        Current message level: 0x00000014 (20)
                               link ifdown
        Link detected: yes

It should list supported link modes and advertised link modes.
If 40Gb is listed under supported but not under advertised, you can manually advertise it using ethtool. Check the man page for the codes specific to your transceiver/cable.
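
For example, forcing the link is one way to test it (illustrative only; run as root and adjust the device name and speed for your hardware, or use the advertise bitmask from the man page instead):

ethtool -s enp4s0 speed 40000 duplex full autoneg off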

This is from the container, not the hypervisor, but it looks to me like I’m missing both.

Settings for eth1:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        MDI-X: Unknown
        Link detected: yes

This is a VMBR interface (virtual bridge) connected to the physical interface, so I’d expect to see minimal details, unlike reading the actual physical interface. ethtool isn’t available on my PROXMOX host, but I can read the current port speed using cat.

proxmox@host:~# cat /sys/class/net/enp24s0/speed
40000

I guess the question is more a PROXMOX hypervisor issue. Does it emulate all VMBR interfaces as 10Gig? Can I tell the hypervisor to emulate this bridge faster?


Mellanox cards are kind of random in Linux. The general first question is: have you updated the firmware on the Mellanox card? It is kind of trial and error to find the firmware, kernel, and driver combination that gets all the features out of a Mellanox card.

A NIC bond only makes multiple cards function as one card. So to do what you want, you would make a multi-device bridge only and enable VLAN support on the bridge (think trunk ports). Also, I dislike Cisco for making this harder than it should be. A Linux bridge can have multiple NICs added to it and can be made VLAN aware.
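
Rough sketch of what I mean in /etc/network/interfaces (NIC names are made up, and I have not tested this exact config):

auto vmbr1
iface vmbr1 inet manual
        # two physical ports in one VLAN-aware bridge (think trunk);
        # watch for loops if both uplinks land in the same layer 2 segment
        bridge-ports enp4s0f0 enp4s0f1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094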

Take the ‘speed’ line out of your bridge; it is more of an ‘oops, Linux didn’t detect it’ setting and should only be there if the link was coming up at something like 1Mb. Also, the bridge NIC will report to a VM what it thinks it physically is, regardless of how fast it can actually go. So just because a bridge reports 1Gb (or whatever), that does not mean it is throttling anything.

Also, for example, if the VM’s NIC is set to e1000, it will always report 1Gb and will be driver-limited to 1Gb.
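
For comparison, flipping a VM’s NIC model to VirtIO is a one-liner (VM ID 100 and the bridge name are just examples):

qm set 100 --net0 virtio,bridge=vmbr0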


A little bit of miscommunication, but still informative. The Mellanox card here is just for planning and testing; it’s what I have on hand that’s fast. A Linux Bond is a PROXMOX feature I’ve never used, but it looks to be the answer for what I’m trying to do. I’m just trying to understand exactly how it works before I invest time deploying it.

The server I’m looking to bond multiple interfaces on would be using one or two generic BCM57810S NICs. So Broadcom, which I take it may not be any better, and possibly worse. :sweat_smile:

Really, the company can live with just 10Gig broadcast, but the server is so over-the-top for what it’s being used for that I might as well kit out the networking if I can figure it out.

This is one of the things I’m wondering about in this scenario: whether it’s not being throttled, just misreported. As I understand it, 40Gig interfaces are really 4x10Gig lanes somehow, so I’d figure a virtual bridge probably doesn’t have a setting that displays higher than 10Gig. I suppose the only real way to find out for sure would be to deploy it, flood the network with data transfers, and see if the bond goes over the 10Gig threshold.
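
Something like iperf3 from a few clients at once is probably how I’d test it (the address and options here are just placeholders):

# on the bonded imaging server
iperf3 -s
# from each test client, in parallel
iperf3 -c 192.0.2.10 -P 4 -t 60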

This is another thing I may run into as an issue. If I can, I’d like to avoid having to configure proper LACP (802.3ad) on the switch. I don’t have a model number for you, but it’s CISCO and it is definitely managed.

I’ve worked in CISCO switch environments before, but I’ve never set up LACP, so that’s another thing I’d have to research.
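
From a quick look, the switch side of 802.3ad seems to boil down to something like this (hypothetical CISCO IOS, interface names and numbers made up; I’d still have to verify it against the actual model):

interface range TenGigabitEthernet1/0/1 - 2
 channel-group 1 mode active
interface Port-channel1
 switchport mode trunk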

A ‘bond’ and a ‘bridge’ in Linux (ProxMox = Linux) are two different things. From your initial description, I am recommending you use a bridge.

This again makes me think a bridge (with or without VLAN support) is what you want. I will list a couple of examples:

A bridge WITHOUT VLAN support: on the switch you configure access ports on the VLANs you need to talk to and plug stuff in there (the server can have multiple bridge ports).

A bridge WITH VLAN support, with all the VLANs you need tagged: you configure a TRUNK port on the switch and cable it.
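
On the switch side those two options look roughly like this (made-up CISCO port names and VLAN IDs, adjust for your model):

! option 1: access port per VLAN (bridge without VLAN awareness)
interface TenGigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 10
! option 2: trunk port carrying the tagged VLANs (VLAN-aware bridge)
interface TenGigabitEthernet1/0/2
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30,40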

Likely, and your thought on testing is correct. I use ProxMox a ton, but honestly I mostly use VMs and not LXC containers. I have never had to do any mods to an LXC container NIC, so for that one item I cannot say how it functions with a 40Gb physical device on the host.

Am I able to aggregate multiple NICs with a bridge for higher throughput and redundancy? I’m not using the 40Gig NIC; I’m looking to aggregate multiple 10Gig NICs.

For this application I am contemplating that route if I have to; it just needs more figuring out on the software side of things, because the applications will need to determine on their own which network they’re attached to in order to run scripts. The payoff would be worth it though, as that would offer the full bandwidth of every NIC in the group.

I mainly used VMs until I learned of the overhead that still exists. It’s not bad in low-load applications, but it becomes very apparent under high load. I was running BOINC in a VM, then tried it in an LXC: measurable performance gain. For this application, though, I could probably get away with it just fine. I do have to virtualize a pfSense router on this server though. :roll_eyes:

Not exactly. In theory it is possible to build a bridge and use a bonded NIC as the interface for the bridge. That is actually something I have thought about but never tried.
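
In theory it would just be the bond listed as the bridge port in /etc/network/interfaces, something like this (untested by me, names are made up):

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0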

How many different networks do you have? If you have enough NICs that you could put one on each network, then you could just pass through the NICs and not bother with any bridge config or any real custom networking.
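
For an LXC container, the way I understand that passthrough works is raw lxc keys appended to /etc/pve/lxc/&lt;container id&gt;.conf (interface names and the index here are hypothetical, and I have not run this myself):

lxc.net.1.type: phys
lxc.net.1.link: enp24s0
lxc.net.1.name: eth1
lxc.net.1.flags: up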

Yes, true. I have just always run out of disk or RAM long before running out of compute, and an LXC container makes minimal difference in those areas. Also, containers have some significant drawbacks for me, as I tend to be behind on version updates of my core environment, and some apps need a newer kernel and other features than what a container can provide.

It’s going to be at least two networks, but if the ZFS cache works I plan to set up four. I was hoping to keep everything on one network for the sake of simplicity, but if I can’t aggregate the NICs the way I’d like to, then there’s no way I’m aware of to get around multiple networks.

Yeah, the LXCs still have their drawbacks. I mean, I know that if I use full virtualization and paravirtualize the NIC, Windows in that VM thinks the interface is 100Gbit, so... pick my poison.

It will probably come down to some experimentation and trial and error. We have the space to test things in an isolated environment before making them live. Regardless, I wanted to ask the community if they had any knowledge on the topic before I run into anything unexpected.
