iPXE Not Fetching IP via DHCP

Hello, I’ve recently set up a new server consisting of an AMD EPYC 7601, an ASRock Rack EPYCD8, and 8x16GB 2Rx8 ECC RDIMM NEMIX/Micron memory modules.

I’ve run into an issue with the 10Gig NIC I added (Mellanox ConnectX-2 MNPA19-XTR). It is supposed to query a Debian-based DHCP server, which I configured using isc-dhcp-server, for both IP and iSCSI information.

I’ve tested:

  • The NICs
  • The optics
  • The fiber patch cables

Everything checks out. I’ve verified that the same NICs collect an IP just fine when installed in another server, but for some reason, on this one, the request times out.

[Screenshot from 2021-04-09 20-20-45]

I tried chain-loading the latest version of iPXE from the ipxe.org website, but this changed nothing.

The most peculiar thing is that if I attempt the same thing with the motherboard’s built-in PXE, it collects an IP with no problem. If I load an OS from USB, it also collects an IP with no problem.

Something between iPXE, FlexBoot, and this platform isn’t compatible, and I can’t figure out how to make it work, or whether it’s possible at all at this point.

Any pointers are appreciated.

Those Mellanox cards seem to be cursed to not work with iPXE. One of my colleagues had to deal with a similar issue. I don’t know exactly which server it was, but he couldn’t get iPXE working with a ConnectX-3 card (similar sort of stuff).

It was failing to bring the card up, though I don’t remember whether it was at the DHCP step. Unfortunately, in the end he had to patch iPXE and fix it himself. It was a week or two of work, and I’m not entirely familiar with the details.

All I know is that it was a multi-step process with lots of printk debugging. It did not end up being a giant change, and he could then build his own iPXE image for a bunch of servers with that same configuration.

I’ll interrogate him after the weekend, I guess, and might get some more details on what was done. As hard as I try, I can’t remember anything specific that he explained about it.

Connection timed out

There’s your problem. How to solve it? No idea; I’m not familiar with these cards.

If you think he may have insight into my issue, I’d appreciate hearing it.

It’s the strangest thing to me. It works; I know it works, and I’ve tested and validated it on other systems. It just doesn’t want to work on this EPYC system, and that’s what’s throwing me for a loop. My only guess is some sort of legacy incompatibility, but that doesn’t explain why booting an updated iPXE from USB gave exactly the same results…

Helpfulness: 10/10

Not super helpful, but are you connecting straight to a NIC on the Debian server? Did you try looking at all the traffic from that NIC (Wireshark/tshark/tcpdump)? And have you checked whether the card firmware does anything else after your DHCP server replies (e.g. is it reaching for TFTP?), or is it behaving as if it’s ignoring the DHCP offer?

You could also try dnsmasq for your troubleshooting. It has a built-in TFTP server and is fully configurable via command-line flags, so it might be easier to set up for testing and re-testing.

This Debian instance is actually hosted on a VM, and the NIC it uses is shared via paravirtualization. It connects to a Ubiquiti US-16-XG switch, which then connects to the EPYC server. I’ve proven this setup works with other servers, but this EPYC platform just doesn’t want anything to do with it.

I lack the skill required to read network packets. I would have no idea what I’m looking at or for, or what action to take if I found where the signal stops. As far as I can tell, it behaves as if it’s not getting a reply. The indicator lights show both a link and activity on the line, but it’s almost as if it’s rejecting the DHCP reply or just not sending the request to begin with…
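The server’s own log can narrow this down without reading packets. isc-dhcp-server normally writes a DHCPDISCOVER line to syslog for each request it receives and a DHCPOFFER line for each reply it sends. If no DHCPDISCOVER appears for this card’s MAC, the request never reaches the VM; if DISCOVER and OFFER appear but no DHCPREQUEST follows, the client is not receiving or not accepting the offer. The sketch below is an assumption about what might help, not a tested fix: a global `log` statement in dhcpd.conf that also records the vendor class and user class each client sends, so requests from an iPXE-based client (iPXE sends the user class "iPXE") are easy to spot.

```
# Sketch: log the MAC, vendor class and user class of every DHCP request.
# pick-first-value() substitutes "none" when the client omits an option,
# because concat() would otherwise return null for the whole message.
log(info, concat("client ",
    binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
    " vendor-class=", pick-first-value(option vendor-class-identifier, "none"),
    " user-class=", pick-first-value(option user-class, "none")));
```

Note that binary-to-ascii does not zero-pad, so a byte like 0a is logged as "a".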

I’ve disabled TFTP, as I had unrelated issues with chain-loading and it isn’t needed for my application anyhow. I could explore dnsmasq, but if that does end up being a workaround, I’d have to either rework my entire DHCP setup for the other servers or somehow use VLANs to make dnsmasq serve only this server, and that’d be tedious.

I’d just buy an Intel NIC. It costs way less than a week or two of your time is worth.

I’m not opposed to this if it works with my existing isc-dhcp-server setup. If I have to modify the DHCP server’s config to suit how the Intel NIC wants to receive information, then I’ll have to figure out how to make the server hand out DHCP information based on vendor ID, the first three octets of the MAC address, or something similar…
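For reference, isc-dhcp-server can do this kind of matching with `class` declarations. A minimal sketch, assuming the class names, the pool ranges and the OUI are hypothetical (00:02:c9 is used as an example Mellanox OUI; check the MACs on your actual cards). Because a `host` block already matches on an exact MAC address, this is probably only worth it if you want rules for a whole family of cards rather than one machine.

```
# Match by the first three octets (OUI) of the MAC address.
# substring(hardware, 1, 3) skips the leading hardware-type byte.
class "mellanox-nics" {
    match if substring(hardware, 1, 3) = 00:02:c9;
}

# Match by vendor class identifier; PXE firmware usually sends "PXEClient...".
# Options placed inside a class apply only to its members.
class "pxe-clients" {
    match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
}

subnet 10.0.0.0 netmask 255.255.255.0 {
    # Hypothetical split of the dynamic range by class membership.
    pool {
        allow members of "mellanox-nics";
        range 10.0.0.200 10.0.0.220;
    }
    pool {
        deny members of "mellanox-nics";
        range 10.0.0.12 10.0.0.199;
    }
}
```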

As it stands right now, this is how DHCP is configured:

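# Dynamic address pool for the 10.0.0.0/24 network
subnet 10.0.0.0 netmask 255.255.255.0 {
    range 10.0.0.12 10.0.0.254;
}

# Per-machine reservation, matched on the NIC's MAC address
host test-client {
    hardware ethernet d0:50:99:db:a0:e3;
    fixed-address 10.0.0.50;
    # iSCSI target: iscsi:<server>:<protocol>:<port>:<LUN>:<target IQN>
    option root-path "iscsi:10.0.0.11::::iqn.2020-12.server.com:server-1";
}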

This works with the MNPA19-XTRs. I don’t know whether it would work with Intel NICs. I’m open to suggestions, though, if you can name any inexpensive single-port SFP+ variants, preferably ones that work with optics from Fiberstore.
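On the dhcpd side, the `host` declaration in the configuration above matches on the NIC’s MAC address, not on the card’s vendor, so swapping in a different NIC should, as far as dhcpd is concerned, only require updating the `hardware ethernet` line. A sketch, where aa:bb:cc:dd:ee:ff is a placeholder for the new card’s MAC, assuming its boot firmware (or iPXE loaded on top of it) reads `root-path` the same way:

```
# Same reservation, pointed at the new card's MAC address.
# aa:bb:cc:dd:ee:ff is a placeholder, not a real Intel MAC.
host test-client {
    hardware ethernet aa:bb:cc:dd:ee:ff;
    fixed-address 10.0.0.50;
    option root-path "iscsi:10.0.0.11::::iqn.2020-12.server.com:server-1";
}
```

If both cards stay in service at the same time, each needs its own `host` block with a unique name and its own `fixed-address`.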