Connection issues: Intel 10G NIC and 10GTek direct attached copper

So this past weekend, a friend of mine helped me run a ton of cable, including an OM3 fiber cable that connects two parts of my house. I have two QNAP switches with 10GTek transceivers. They get a full link, and I can get 1Gb connections through all the standard copper ports.

(Excuse the mess. I’m waiting on some wall plates to come in before doing all the cable management.)

I bought two Intel 82599ES 10-Gigabit SFI/SFP+ NICs, one for my NAS and the other for my desktop. I’m using shorter 10GTek SFP+ direct attach copper (DAC) cables to go from the QNAP switches to the two machines. When I first plug in the cables, I get a link, but it immediately goes down:

[927094.813216] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[927095.029300] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
[927305.720160] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[927305.968216] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
[927310.921036] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[927311.025036] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
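To put "immediately goes down" in numbers, here is a small Python sketch that pairs each Up line with the next Down line from the ixgbe messages above and reports how long the link stayed up. It assumes the `[seconds] ... NIC Link is Up/Down` format shown; the function name is just illustrative.

```python
import re

# Kernel log lines copied from the ixgbe output above.
LOG = """\
[927094.813216] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[927095.029300] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
[927305.720160] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[927305.968216] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
[927310.921036] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[927311.025036] ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
"""

# Captures the timestamp and the Up/Down state of an ixgbe link message.
LINK_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*NIC Link is (Up|Down)")


def link_up_durations(log: str) -> list[float]:
    """Return how many seconds each Up lasted before the following Down."""
    durations = []
    up_since = None
    for line in log.splitlines():
        m = LINK_RE.match(line)
        if not m:
            continue
        ts, state = float(m.group(1)), m.group(2)
        if state == "Up":
            up_since = ts
        elif up_since is not None:
            durations.append(round(ts - up_since, 3))
            up_since = None
    return durations


for d in link_up_durations(LOG):
    print(f"link stayed up for {d:.3f} s")
```

For this log, every Up period lasts about a quarter of a second or less before the driver reports the link down again.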

This happens on both my Linux (desktop) and FreeBSD (file server) boxes:

# ifconfig
ix0: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether 90:e2:ba:86:79:78
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ix1: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether 90:e2:ba:86:79:79
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
# service dhclient restart ix1
dhclient not running? (check /var/run/dhclient/dhclient.ix1.pid).
Starting dhclient.
ix1: no link .............. giving up
/etc/rc.d/dhclient: WARNING: failed to start dhclient

The 10G light on the QNAP side is green. On the PC side, the lights are solid green for a second and then flicker. Here is the information from ethtool on the Linux side:

ethtool --module-info enp1s0f0
	Identifier                                : 0x03 (SFP)
	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
	Connector                                 : 0x21 (Copper pigtail)
	Transceiver codes                         : 0x00 0x00 0x00 0x00 0x00 0x04 0x00 0x00 0x00
	Transceiver type                          : Passive Cable
	Encoding                                  : 0x00 (unspecified)
	BR, Nominal                               : 10300MBd
	Rate identifier                           : 0x00 (unspecified)
	Length (SMF,km)                           : 0km
	Length (SMF)                              : 0m
	Length (50um)                             : 0m
	Length (62.5um)                           : 0m
	Length (Copper)                           : 3m
	Length (OM3)                              : 0m
	Passive Cu cmplnce.                       : 0x01 (SFF-8431 appendix E) [SFF-8472 rev10.4 only]
	Vendor name                               : OEM
	Vendor OUI                                : 00:40:20
	Vendor PN                                 : SFP-H10GB-CU3M
	Vendor rev                                : R
	Option values                             : 0x00 0x00
	BR margin, max                            : 0%
	BR margin, min                            : 0%
	Vendor SN                                 : CSC210601561102
	Date code                                 : 210610
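When comparing a module that fails against one that links, it can help to turn the `ethtool --module-info` dump into key/value pairs and diff the fields. A minimal Python sketch, assuming each line has the `Key : value` shape shown above (the sample below is trimmed from that output, and the helper name is hypothetical):

```python
# Trimmed copy of the `ethtool --module-info enp1s0f0` output above.
SAMPLE = """\
    Identifier                                : 0x03 (SFP)
    Connector                                 : 0x21 (Copper pigtail)
    Transceiver type                          : Passive Cable
    BR, Nominal                               : 10300MBd
    Length (Copper)                           : 3m
    Vendor name                               : OEM
    Vendor OUI                                : 00:40:20
    Vendor PN                                 : SFP-H10GB-CU3M
"""


def parse_module_info(text: str) -> dict[str, str]:
    """Split each 'Key : value' line on its first colon (values like the OUI keep theirs)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a key/value separator
            info[key.strip()] = value.strip()
    return info


info = parse_module_info(SAMPLE)
for field in ("Transceiver type", "Vendor name", "Vendor OUI", "Vendor PN"):
    print(f"{field}: {info[field]}")
```

Running the same parser over the output for a module that does link would make differences in the vendor fields easy to spot.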

On Linux, I added options ixgbe allow_unsupported_sfp=1 to /etc/modprobe.d/ixgbe.conf, then did an rmmod and modprobe of the ixgbe module, and I still get the same issue.

So what’s the problem here? Are 10GTek direct attach copper cables not compatible with Intel NICs?

Have you tried the cables between the two switches, and directly between the two servers? That would help narrow down where the issue is…

If I plug the transceiver that’s in the QNAP switch directly into one of the NICs, I see the same flapping connected/disconnected issue (at least on Linux). Plugging the directly attached fiber into two ports on the same switch gives me solid green lights on both ports.

The NAS is on the other side of the house, and I haven’t tried moving it to attach it directly. Both the LC fiber transceivers and the SFP+ direct attach copper cables are made by 10Gtek. The NICs were bought used off eBay. My guess is that the Intel NICs are either bad or incompatible with 10Gtek.

It looks like 10Gtek has a specific DAC cable for Intel gear: XDACBL2M.
That may be your issue, given that the cable works everywhere else…

Alright, some updates. On the Linux box, I purchased a Myricom Myri-10G with a matching transceiver, and I get a full 10G link to the switch and can get an IP address:

ethtool enp4s0
Settings for enp4s0:
	Supported ports: [ FIBRE ]
	Supported link modes:   Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: FIBRE
	PHYAD: 0
	Transceiver: internal

This card doesn’t work in FreeBSD, unfortunately. So I purchased two Intel FTLX8571D3BCV-IT transceivers off eBay. Per the information I could find, these should be compatible with the Intel 82599ES. Plugging in and unplugging the transceiver with fiber attached does change the link state (FreeBSD kernel log):

ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP

But I still can’t get an IP address from my DHCP server. If I try to manually assign an address on my network using ifconfig ix1 10.10.10.88 255.255.255.0, I can’t ping any other devices on my network via that adapter.
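As a quick sanity check that the manual address is at least on the same subnet as the DHCP server (which answers from 10.10.10.1 in the dhclient log), Python’s standard ipaddress module can do the mask arithmetic. One hedge on the command itself: as far as I know, FreeBSD’s ifconfig expects the mask via the netmask keyword (ifconfig ix1 inet 10.10.10.88 netmask 255.255.255.0) or in CIDR form (10.10.10.88/24), and a second bare address is read as a point-to-point destination, so it may be worth confirming which form was used.

```python
import ipaddress

# Manually assigned address and mask from the ifconfig attempt.
iface = ipaddress.ip_interface("10.10.10.88/255.255.255.0")
# Address the DHCPOFFER came from in the dhclient log.
dhcp_server = ipaddress.ip_address("10.10.10.1")

print(iface.network)                 # 10.10.10.0/24
print(dhcp_server in iface.network)  # True
```

Since both addresses fall in 10.10.10.0/24, a failed ping points at the link or driver rather than at the addressing.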

If I run service dhclient restart ix1 on FreeBSD and unplug/replug the transceiver, the logs do indicate the device goes down and comes back up. For a brief moment I get a reply (a DHCPOFFER) to the DHCP request. The light on the card also turns bright green for about a second before going dim. It feels like the same thing that happened with the 10GTek transceiver, where it initially gets a network connection and then immediately stops working, except now the link status says it’s active:

Starting dhclient.
dhclient not running? (check /var/run/dhclient/dhclient.ix1.pid).
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 6
ix1 link state up -> down
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 19
ix1 link state down -> up
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 10.10.10.1
DHCPREQUEST on ix1 to 255.255.255.255 port 67
DHCPREQUEST on ix1 to 255.255.255.255 port 67
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 12
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 13
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 16
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 9
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 4
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
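Tallying the messages in that dhclient run makes the pattern explicit: one DHCPOFFER arrives right after the link comes back up, the two DHCPREQUESTs never get a DHCPACK, and discovery starts over. A minimal Python sketch over the log above (the function name is illustrative):

```python
from collections import Counter

# dhclient output copied from the FreeBSD run above.
LOG = """\
Starting dhclient.
dhclient not running? (check /var/run/dhclient/dhclient.ix1.pid).
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 6
ix1 link state up -> down
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 19
ix1 link state down -> up
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 10.10.10.1
DHCPREQUEST on ix1 to 255.255.255.255 port 67
DHCPREQUEST on ix1 to 255.255.255.255 port 67
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 12
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 13
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 16
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 9
DHCPDISCOVER on ix1 to 255.255.255.255 port 67 interval 4
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
"""


def summarize(log: str) -> Counter:
    """Count DHCP message types and link-state transitions in dhclient output."""
    counts = Counter()
    for line in log.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0].startswith("DHCP"):
            counts[words[0]] += 1            # e.g. DHCPDISCOVER, DHCPOFFER
        elif "link state" in line:
            counts["link " + words[-1]] += 1  # new state after the arrow
    return counts


counts = summarize(LOG)
for key in ("DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK", "link up", "link down"):
    print(key, counts[key])
```

One reading, consistent with what is described above, is that traffic flows for a moment after the link comes up and then stops, so the ACK never makes it back.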

If I run ifconfig, ix1 shows as active and ix0 shows no carrier. I have tried the Intel SFP+ transceiver in both cages on the card, and tried both transceivers. I’ve also tried both the Intel and 10GTek transceivers on the QNAP switch side. Every combination gives the same result: the link status says active, but I can’t get a DHCP lease and can’t communicate on the network.

Is this transceiver also not compatible with this card, or do I just have a bad card/transceiver?

EDIT: here is the additional ifconfig -v output on FreeBSD, with more transceiver information:

ifconfig -v ix1
ix1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether 90:e2:ba:86:79:79
	inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
	media: Ethernet autoselect (10Gbase-SR <full-duplex,rxpause,txpause>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
	vendor: Intel Corp PN: AFBR-709DMZ-IN2 SN: AA172430CXB DATE: 2017-06-19
	module temperature: 34.00 C voltage: 3.34 Volts
	lane 1: RX power: 0.60 mW (-2.20 dBm) TX bias: 6.82 mA
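Two numbers in that output are easy to decode by hand. Optical power in dBm is 10·log10(P / 1 mW), and the hex netmask on the placeholder 0.0.0.0 address is an ordinary 32-bit mask. A short Python sketch of both conversions (helper names are illustrative):

```python
import ipaddress
import math


def mw_to_dbm(mw: float) -> float:
    """Convert optical power from milliwatts to dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(mw)


def hex_mask_to_dotted(mask: int) -> str:
    """Render a 32-bit netmask such as 0xff000000 in dotted-quad form."""
    return str(ipaddress.IPv4Address(mask))


print(f"{mw_to_dbm(0.60):.2f} dBm")   # -2.22 dBm
print(hex_mask_to_dotted(0xff000000))  # 255.0.0.0
```

The small mismatch with ifconfig’s -2.20 dBm comes from the 0.60 mW figure being rounded; about 0.6026 mW gives -2.20 dBm. Either way, the receive side is seeing light, so the optics are at least receiving a signal.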

I just put one of the Intel NICs and Intel SFP+ transceivers back in my Linux box, and the link kept flapping up/down. I think either these NICs are bad or Intel is extremely finicky about SFP+ modules. I’m going to try an old Chelsio for the FreeBSD-based NAS.

I had similar issues with my two Mellanox cards and fiber optics. It cost me quite a few nerves, but it’s all working now. So don’t give up hope.

  1. The first issue was between my NIC (BiDi 25Gb transceiver) and the ISP. The solution there was to disable FEC via ethtool --set-fec enp65s0f0 encoding off. This setting was lost after every reboot, so I had to create a udev rule to set it automatically, since you can’t set it in NetworkManager’s settings (at least I didn’t find the setting).

root@zephir:~# cat /etc/udev/rules.d/99-enp65s0f0-ethtool.rules

ACTION=="add", SUBSYSTEM=="net", NAME=="enp65s0f0", RUN+="/usr/sbin/ethtool --set-fec enp65s0f0 encoding off"
root@zephir:~#
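Forum software often turns straight double quotes into curly ones, and a rule pasted that way will not parse as quoting in udev. Here is a small Python sketch that builds the same rule text with plain ASCII quotes for a given interface name, plus a check for curly quotes in a pasted rule. The helper names are hypothetical; the rule format is copied from the post above.

```python
# Curly double and single quotes that commonly sneak in via copy/paste.
SMART_QUOTES = "\u201c\u201d\u2018\u2019"


def fec_off_rule(ifname: str, ethtool: str = "/usr/sbin/ethtool") -> str:
    """Build the udev rule above for `ifname`, using ASCII quotes only."""
    return (
        f'ACTION=="add", SUBSYSTEM=="net", NAME=="{ifname}", '
        f'RUN+="{ethtool} --set-fec {ifname} encoding off"'
    )


def has_smart_quotes(rule: str) -> bool:
    """True if the rule contains curly quotes that udev would not accept."""
    return any(ch in rule for ch in SMART_QUOTES)


rule = fec_off_rule("enp65s0f0")
print(rule)
print(has_smart_quotes(rule))  # False
```

The printed line can be written to a file under /etc/udev/rules.d/ in the same way as the rule shown above.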

  2. The second problem was between the Mellanox NICs (25Gb multimode transceivers).
  • The first problem was that I had to disassemble and reassemble the fiber cable to cross the connections, so that each card’s transmitter feeds the other card’s receiver.

  • Mellanox has its own tool to set the link speed (mlxlink). There are multiple ways to set the speed: NetworkManager, ethtool, and mlxlink. It looked like they somehow interfered with each other, and the link was never successfully established; it just constantly tried to negotiate a connection. So I set the speed via mlxlink to 25G only (removed the 10G and 1G options), disabled auto-negotiation in NetworkManager for the connection, and ignored ethtool completely (no FEC settings). Now the two NICs/transceivers negotiate 25Gb and which FEC to use.

root@zephir:~# mlxlink -d 41:00.1

Operational Info

State : Active
Physical state : LinkUp
Speed : 25GbE
Width : 1x
FEC : Standard RS-FEC - RS(528,514)
Loopback Mode : No Loopback
Auto Negotiation : ON

root@zephir:~# nmcli connection show FiberNet | grep 802-3-ethernet.auto-negotiate
802-3-ethernet.auto-negotiate: no
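For context on the "RS(528,514)" in the mlxlink output: a Reed-Solomon RS(n, k) code sends n symbols per codeword, k of them data, and can correct up to (n - k) / 2 symbol errors per codeword. For this 25GbE RS-FEC the symbols are 10 bits wide. A small Python sketch of that arithmetic (the helper name is illustrative):

```python
from fractions import Fraction


def rs_params(n: int, k: int) -> dict:
    """Basic properties of a Reed-Solomon RS(n, k) code."""
    parity = n - k
    return {
        "parity_symbols": parity,            # check symbols per codeword
        "correctable_symbols": parity // 2,  # symbol errors it can fix per codeword
        "code_rate": Fraction(k, n),         # share of each codeword that is data
    }


p = rs_params(528, 514)
print(p["parity_symbols"])             # 14
print(p["correctable_symbols"])        # 7
print(f"{float(p['code_rate']):.4f}")  # 0.9735
```

Both ends have to agree on the FEC mode, which is why a mismatch (or one side forced off) can keep a link from coming up.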

Maybe it’s a similar problem on Intel NICs, or Intel has a similar tool for setting connection parameters that interferes with other settings.

Finally have 10G links and speeds!

On the FreeBSD side, I replaced the Intel NIC with a Chelsio S310 I got off eBay, with transceiver, for under $20.

One note: FreeBSD did have to be booted with the card connected to the switch via fiber in order to install the firmware and get a carrier signal:

dmesg | grep cx
cxgbc0: <Chelsio T310, 1 port> mem 0xf6081000-0xf6081fff,0xf5800000-0xf5ffffff,0xf6080000-0xf6080fff irq 24 at device 0.0 on pci1
cxgbc0: using MSI-X interrupts (5 vectors)
cxgbc0: firmware needs to be updated to version 7.11.0
cxgb0: <Port 0 10GBASE-R> on cxgbc0
cxgb0: Using defaults for TSO: 65518/35/2048
cxgb0: Ethernet address: 00:14:5e:99:2d:64
cxgbc0: Firmware Version 7.4.0
cxgbc0: installing firmware on card
cxgb0: link state changed to UP
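The dmesg lines show the card arriving with firmware 7.4.0 while the driver wanted 7.11.0, after which it installed the newer image. When checking versions like these in a script, compare them numerically: as plain strings "7.4.0" sorts after "7.11.0". A minimal Python sketch over the two relevant lines (the function names are illustrative):

```python
import re

# The two firmware lines from the cxgb dmesg output above.
DMESG = """\
cxgbc0: firmware needs to be updated to version 7.11.0
cxgbc0: Firmware Version 7.4.0
"""


def parse_version(s: str) -> tuple[int, ...]:
    """Turn '7.11.0' into (7, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in s.split("."))


def firmware_versions(log: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (current, wanted) firmware versions found in the log."""
    wanted = re.search(r"updated to version (\d+(?:\.\d+)+)", log)
    current = re.search(r"Firmware Version (\d+(?:\.\d+)+)", log)
    return parse_version(current.group(1)), parse_version(wanted.group(1))


current, wanted = firmware_versions(DMESG)
print(current, wanted, current < wanted)  # (7, 4, 0) (7, 11, 0) True
print("7.4.0" > "7.11.0")                 # True: string comparison gets it wrong
```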

For the Linux side I used a Myricom that also came with its transceiver:

[    1.737250] myri10ge: Version 1.5.3-1.534
[    1.737523] Loading firmware: myri10ge_eth_z8e.dat
[    1.737553] myri10ge 0000:03:00.0: Direct firmware load for myri10ge_eth_z8e.dat failed with error -2
[    1.737571] myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
[    1.737585] myri10ge 0000:03:00.0: hotplug firmware loading failed
[    1.737631] myri10ge 0000:03:00.0: Successfully adopted running firmware
[    1.737641] myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
[    1.737653] myri10ge 0000:03:00.0: performance consider loading optimized firmware
[    1.737665] myri10ge 0000:03:00.0: via hotplug
[    1.746332] myri10ge 0000:03:00.0: Not enabling ECRC on non-root port 0000:02:02.0
[    1.776373] myri10ge 0000:03:00.0: Direct firmware load for adopted failed with error -2
[    1.776391] myri10ge 0000:03:00.0: Unable to load adopted firmware image via hotplug
[    1.776405] myri10ge 0000:03:00.0: hotplug firmware loading failed
[    1.776452] myri10ge 0000:03:00.0: Successfully adopted running firmware
[    1.845770] myri10ge 0000:03:00.0: MSI IRQ 24, tx bndry 2048, fw adopted, MTRR Disabled, WC Enabled
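The "failed with error -2" lines are less alarming than they look. Kernel functions report failures as negative errno values, and errno 2 is ENOENT, meaning the firmware file (myri10ge_eth_z8e.dat) was not found in the firmware search path (commonly /lib/firmware). The driver then fell back to the firmware already running on the NIC, as the "Successfully adopted running firmware" lines say. Python’s standard errno module decodes the number:

```python
import errno
import os

code = 2  # the driver logs "-2"; the kernel returns negative errno values
print(errno.errorcode[code])  # ENOENT
print(os.strerror(code))      # No such file or directory
```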

It was able to get an IP immediately without issue as well.

Typically, Intel has made the best NICs when it comes to Linux and FreeBSD compatibility, but that’s where I ran into issues with fiber. I don’t know if the issue was with Intel hardware in general or just my Intel hardware. This is the first time I’ve ever done 10G networking. I guess the moral of the story is: try to buy cards bundled with their transceivers to ensure compatibility.