TL;DR - I’m clearly missing something in my switch config: I can’t get basic L2 forwarding working correctly. Ping and ssh work, but any attempt to move traffic beyond a few KB/s kills the socket.
[solution]
I set the MTU to 9000 everywhere and it still failed. I then set the switch to 12000 (its internally stated max) and the clients to 9000, and now it works:
Dell(conf)#interface fortyGigE 0/0
Dell(conf-if-fo-0/0)#mtu 12000
Dell(conf-if-fo-0/0)#interface fortyGigE 0/8
Dell(conf-if-fo-0/8)#mtu 12000
Dell(conf-if-fo-0/8)#interface fortyGigE 0/4
Dell(conf-if-fo-0/4)#mtu 12000
Dell(conf-if-fo-0/4)#interface fortyGigE 0/12
Dell(conf-if-fo-0/12)#mtu 12000
I’m guessing that 9000 plus roughly 100-300 bytes would have worked as well, since, as I recall, that is about what the Linux kernel/driver bug leaves out of its header computation…
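To put rough numbers on that guess, here is a small sketch of the frame-size arithmetic. It assumes the switch's `mtu` value counts the whole Ethernet frame (header, optional VLAN tag, and FCS) while the Linux client's MTU counts only the IP packet. That is an assumption about this switch, not something confirmed above, and the function names are made up for illustration.

```python
# Standard Ethernet L2 overheads. Whether the switch's "mtu" setting
# includes them is an ASSUMPTION made for this sketch.
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4     # optional 802.1Q tag
FCS = 4          # frame check sequence


def frame_size(ip_mtu, vlan_tagged=False):
    """Size of a full-sized frame (without preamble) for a given IP MTU."""
    return ip_mtu + ETH_HEADER + (VLAN_TAG if vlan_tagged else 0) + FCS


def fits(ip_mtu, link_mtu, vlan_tagged=False):
    """True if a full-sized packet from the client fits the switch port MTU."""
    return frame_size(ip_mtu, vlan_tagged) <= link_mtu


if __name__ == "__main__":
    # Compare the client MTU of 9000 against a few candidate switch values.
    for link_mtu in (9000, 9022, 12000):
        print(link_mtu, fits(9000, link_mtu, vlan_tagged=True))
```

Under these assumptions a 9000-byte IP packet needs 9018 bytes untagged or 9022 bytes tagged, so a port MTU of 9000 would reject it while small packets such as ping and interactive ssh still pass.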
…
I’ve managed to factory reset my ebay’d S6000 switch by interrupting the boot:
(esc)
BOOT_USER# ignore enable-password
BOOT_USER# reload
(normal boot)
Dell>enable
Dell#restore factory-defaults stack-unit all clear-all
I then connected exactly two Linux machines to ports 0 and 8. Both machines are running Mellanox VPI adapters set to “eth” mode. For reference, here is a single-threaded iperf3 run with those two machines connected point-to-point with a copper QSFP cable:
# iperf3 -c 10.6.9.20
Connecting to host 10.6.9.20, port 5201
[ 4] local 10.6.9.22 port 39030 connected to 10.6.9.20 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 2.47 GBytes 21.2 Gbits/sec 0 866 KBytes
[ 4] 1.00-2.00 sec 2.75 GBytes 23.6 Gbits/sec 0 866 KBytes
[ 4] 2.00-3.00 sec 2.75 GBytes 23.6 Gbits/sec 0 866 KBytes
[ 4] 3.00-4.00 sec 2.75 GBytes 23.6 Gbits/sec 0 866 KBytes
Here are the setup steps I’ve taken with the switch:
Dell>enable
Dell#conf
Dell(conf)#stack-unit 0 provision S6000
Dell(conf)#interface fortyGigE 0/0
Dell(conf-if-fo-0/0)#no shutdown
Dell(conf-if-fo-0/0)#switchport
Dell(conf-if-fo-0/0)#interface fortyGigE 0/8
Dell(conf-if-fo-0/8)#no shutdown
Dell(conf-if-fo-0/8)#switchport
I can ping:
PING 10.6.9.22 (10.6.9.22) 56(84) bytes of data.
64 bytes from 10.6.9.22: icmp_seq=1 ttl=64 time=0.117 ms
64 bytes from 10.6.9.22: icmp_seq=2 ttl=64 time=0.087 ms
64 bytes from 10.6.9.22: icmp_seq=3 ttl=64 time=0.099 ms
64 bytes from 10.6.9.22: icmp_seq=4 ttl=64 time=0.120 ms
64 bytes from 10.6.9.22: icmp_seq=5 ttl=64 time=0.114 ms
I can ssh and do simple commands without issue:
ssh [email protected]
# cd /bin
# ls
abrt-action-analyze-backtrace
abrt-action-analyze-c
abrt-action-analyze-ccpp-local
abrt-action-analyze-core
abrt-action-analyze-oops
abrt-action-analyze-python
abrt-action-analyze-vmcore
....
HOWEVER, any higher-bandwidth activity hangs the link:
Connecting to host 10.6.9.20, port 5201
[ 4] local 10.6.9.22 port 39002 connected to 10.6.9.20 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 6.25 MBytes 52.4 Mbits/sec 2 8.75 KBytes
[ 4] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 1 8.75 KBytes
[ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 8.75 KBytes
[ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 1 8.75 KBytes
[ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 8.75 KBytes
The “hang” appears to be at the socket level: pressing ^C in iperf lets ping work again afterward.
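Since small packets get through and bulk transfers stall, one way to narrow this down is to send pings with the don't-fragment bit set and find the largest payload that still gets a reply. The sketch below is only an illustration: it assumes the Linux iputils `ping` (its `-M do`, `-s`, `-c` and `-W` options), and the helper names are hypothetical.

```python
import subprocess

IP_HEADER = 20   # IPv4 header without options
ICMP_HEADER = 8  # ICMP echo header


def icmp_payload_for(ip_mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IP_HEADER - ICMP_HEADER


def ping_command(host, payload, count=1):
    """Build an iputils ping command with DF set (-M do) and a fixed payload."""
    return ["ping", "-M", "do", "-s", str(payload),
            "-c", str(count), "-W", "1", host]


def ping_probe(host):
    """Return a probe(payload) function that reports whether a ping got a reply."""
    def probe(payload):
        result = subprocess.run(ping_command(host, payload),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0
    return probe


def largest_passing_payload(probe, low, high):
    """Binary-search the largest payload in [low, high] that passes.

    Assumes every payload up to some limit passes and everything above
    it fails. Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # 10.6.9.20 is the iperf3 server from the post; 56 is ping's default payload.
    limit = largest_passing_payload(ping_probe("10.6.9.20"),
                                    56, icmp_payload_for(9000))
    print("largest payload that passed:", limit)
```

If the result comes out below 8972 (9000 minus the 28 bytes of IPv4 and ICMP headers) while the clients are set to MTU 9000, something on the path is dropping full-sized frames.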
....
stack-unit 0 quad-port-profile 0,8,16,24,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,100,108,116,124
!
stack-unit 0 provision S6000
!
interface fortyGigE 0/0
no ip address
switchport
no shutdown
!
....
interface fortyGigE 0/8
no ip address
switchport
no shutdown
!
EDIT: I’ve tried ports 8 & 12 as well, with an identical result.
I also see no errors/drops reported on the client interfaces:
ens1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.6.9.22 netmask 255.255.255.0 broadcast 10.6.9.255
RX packets 147973 bytes 8946946 (8.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2312667 bytes 20721437985 (19.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0