I’ve been struggling for a few days now to get RDMA working (software-emulated in this case, before I buy more hardware). I have a server at 192.168.2.14 and a client at 192.168.2.17. Using a simple rping
test, I can talk over the RDMA interface from .14 to .17, but going from .17 to .14 fails with an address resolution error.
Setup:
- Both machines running Fedora 35, each with a 10g network card, connected through a basic L2 switch (Mikrotik)
- Both machines are using Soft-iWARP (siw, supplied with the kernel)
- Software firewall disabled on both machines
- Regular tcp/ip communication works fine between machines
- Both machines can rping themselves, running client and server in separate ssh sessions
Commands:
create siw interface on .17 machine
# rdma link add siw0 type siw netdev ens2f0
confirm device exists
# rdma link
link siw0/1 state ACTIVE physical_state LINK_UP netdev ens2f0
# ls /dev/infiniband/
rdma_cm uverbs0 uverbs1
create siw interface on .14 machine
# rdma link add siw0 type siw netdev enp10s0
confirm device exists
# rdma link
link siw0/1 state ACTIVE physical_state LINK_UP netdev enp10s0
# ls /dev/infiniband/
rdma_cm uverbs0 uverbs1
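A quick way to double-check on each machine that siw0 is attached to the netdev that actually carries the 192.168.2.x address is to pull the `netdev` field out of the `rdma link` output. This is only a sketch: `siw_netdev` is a helper name made up for this post, and it assumes the output format shown above.

```shell
# Print the netdev that a given RDMA link is attached to.
# Reads `rdma link` output on stdin; $1 is the RDMA device name (e.g. siw0).
# siw_netdev is a hypothetical helper, not part of rdma-core or iproute2.
siw_netdev() {
    awk -v dev="$1" '
        $1 == "link" && index($2, dev "/") == 1 {
            for (i = 3; i < NF; i++)
                if ($i == "netdev") { print $(i + 1); exit }
        }'
}

# Example (run on either machine):
#   rdma link | siw_netdev siw0
```

The printed name can then be compared against `ip -br addr` to see whether it is the interface holding the address used with rping.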
test connection - set up rping server on .17
# rping -s -a 192.168.2.17 -d
created cm_id 0x55e6b39176d0
rdma_bind_addr successful
rdma_listen
run rping client on .14 (successful)
# rping -c -a 192.168.2.17 -C 3 -v
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
client DISCONNECT EVENT...
Try it the other way around - set up rping server on .14
# rping -s -a 192.168.2.14 -d
created cm_id 0x56552e7fe280
rdma_bind_addr successful
rdma_listen
run rping client on .17 (failure here!)
# rping -c -a 192.168.2.14 -C 3 -v -d
created cm_id 0x563c038b3280
cma_event type RDMA_CM_EVENT_ADDR_ERROR cma_id 0x563c038b3280 (parent)
cma event RDMA_CM_EVENT_ADDR_ERROR, error -19
waiting for addr/route resolution state 1
destroy cm_id 0x563c038b3280
From rdma_resolve_addr(3) — Linux manual page
The IP to RDMA address mapping is done using the local routing tables, or via ARP. If a source address is given, the rdma_cm_id is bound to that address, the same as if rdma_bind_addr were called. If no source address is given, and the rdma_cm_id has not yet been bound to a device, then the rdma_cm_id will be bound to a source address based on the local routing tables.
Since IPv4 TCP/IP is working correctly, I wouldn’t think it could be an ARP issue. So maybe it’s just a binding problem(?). Have you tried specifying the source address for rping?
rping -c -I 192.168.2.17 -a 192.168.2.14 -C 3 -v -d
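Since the man page says an unbound rdma_cm_id gets its source address from the local routing tables, it may help to see what the routing table picks for the destination. Error -19 corresponds to ENODEV on Linux, which would fit a route that goes out an interface with no RDMA device attached, though that reading is a guess on my part. A minimal sketch, where `route_field` is a made-up helper that extracts one field from `ip route get` output:

```shell
# Extract the value following a keyword (e.g. "dev" or "src") from
# `ip route get` output read on stdin.
# route_field is a hypothetical helper, not part of iproute2.
route_field() {
    awk -v key="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

# Example on the .17 client:
#   ip route get 192.168.2.14 | route_field dev
#   ip route get 192.168.2.14 | route_field src
```

If `dev` is not the netdev that siw0 was created on, or `src` is not 192.168.2.17, that would explain why passing -I makes a difference.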
This works, but there’s a long delay (roughly 10 seconds) before the pings go through. I’ll read the man page you referenced and see if I can learn anything.
time rping -c -I 192.168.2.17 -a 192.168.2.14 -v -C 3 -d
created cm_id 0x55ae0fe78270
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x55ae0fe78270 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x55ae0fe78270 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x55ae0fe6e020
created channel 0x55ae0fe6dfe0
created cq 0x55ae0fe78a10
created qp 0x55ae0fe78ad0
rping_setup_buffers called on cb 0x55ae0fe6d830
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x55ae0fe78270 (parent)
ESTABLISHED
rdma_connect successful
RDMA addr 55ae0fe6e390 rkey 7b5aed00 len 64
send completion
recv completion
RDMA addr 55ae0fe6e600 rkey 6b3f5300 len 64
send completion
recv completion
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
RDMA addr 55ae0fe6e390 rkey 7b5aed00 len 64
send completion
recv completion
RDMA addr 55ae0fe6e600 rkey 6b3f5300 len 64
send completion
recv completion
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
RDMA addr 55ae0fe6e390 rkey 7b5aed00 len 64
send completion
recv completion
RDMA addr 55ae0fe6e600 rkey 6b3f5300 len 64
send completion
recv completion
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
cma_event type RDMA_CM_EVENT_DISCONNECTED cma_id 0x55ae0fe78270 (parent)
client DISCONNECT EVENT...
rping_free_buffers called on cb 0x55ae0fe6d830
destroy cm_id 0x55ae0fe78270
real 0m10.289s
user 0m0.003s
sys 0m0.006s
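To narrow down where the roughly 10 seconds go, one option is to timestamp each line of the debug output as it arrives. The sketch below assumes GNU `date` (for `%N`) and uses `stdbuf -oL` on the assumption that rping would otherwise buffer its stdout when piped; `stamp_lines` is a made-up helper name.

```shell
# Prefix every line read from stdin with the current time in
# seconds.nanoseconds (GNU date). stamp_lines is a hypothetical helper.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s.%N)" "$line"
    done
}

# Example on the .17 client:
#   stdbuf -oL rping -c -I 192.168.2.17 -a 192.168.2.14 -v -C 3 -d 2>&1 | stamp_lines
```

A large gap between two consecutive timestamps would show whether the wait happens before address resolution, during connection setup, or at disconnect.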