iWarp or SoftiWarp in Linux?

I’ve been struggling for a few days now to figure out RDMA (software-emulated for now, before I buy more hardware). I have a server at 192.168.2.14 and a client at 192.168.2.17. Using the simple rping command, I can connect over the RDMA interface from .14 to .17, but connecting from .17 to .14 fails with an address resolution error.

Setup:

  • Both machines run Fedora 35, each with a 10 GbE network card, connected through a basic L2 switch (MikroTik)
  • Both machines use SoftiWarp (siw, supplied with the kernel)
  • Software firewall disabled on both machines
  • Regular TCP/IP communication works fine between the machines
  • Each machine can rping itself, running client and server in separate SSH sessions

Commands:

create siw interface on .17 machine
# rdma link add siw0 type siw netdev ens2f0

confirm device exists
# rdma link
link siw0/1 state ACTIVE physical_state LINK_UP netdev ens2f0

# ls /dev/infiniband/
rdma_cm  uverbs0  uverbs1

create siw interface on .14 machine
# rdma link add siw0 type siw netdev enp10s0

confirm device exists
# rdma link
link siw0/1 state ACTIVE physical_state LINK_UP netdev enp10s0

# ls /dev/infiniband/
rdma_cm  uverbs0  uverbs1
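Because rdma_cm has to find an RDMA device attached to whichever netdev an address resolves to, it can help to list exactly which netdevs carry an RDMA link on each machine. This is a minimal shell sketch, assuming the `rdma link` output format shown above; the function name rdma_netdevs is hypothetical and not part of iproute2.

```shell
#!/usr/bin/env bash
# rdma_netdevs (hypothetical helper): read `rdma link` output on stdin and
# print the netdev attached to each RDMA link, one per line.
# Assumes lines shaped like:
#   link siw0/1 state ACTIVE physical_state LINK_UP netdev ens2f0
rdma_netdevs() {
  awk '$1 == "link" {
    # scan for the "netdev" keyword and print the field after it
    for (i = 2; i < NF; i++)
      if ($i == "netdev") print $(i + 1)
  }'
}

# Usage on each machine:
#   rdma link | rdma_netdevs
```

On the setup above this would be expected to print ens2f0 on .17 and enp10s0 on .14.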

test connection: set up rping server on .17
# rping -s -a 192.168.2.17 -d
created cm_id 0x55e6b39176d0
rdma_bind_addr successful
rdma_listen

run rping client on .14 (successful)
# rping -c -a 192.168.2.17 -C 3 -v
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
client DISCONNECT EVENT...

Try it the other way around: rping server on .14
# rping -s -a 192.168.2.14 -d
created cm_id 0x56552e7fe280
rdma_bind_addr successful
rdma_listen

run rping client on .17 (failure here!)
# rping -c -a 192.168.2.14 -C 3 -v -d
created cm_id 0x563c038b3280
cma_event type RDMA_CM_EVENT_ADDR_ERROR cma_id 0x563c038b3280 (parent)
cma event RDMA_CM_EVENT_ADDR_ERROR, error -19
waiting for addr/route resolution state 1
destroy cm_id 0x563c038b3280

From the rdma_resolve_addr(3) Linux manual page:

The IP to RDMA address mapping is done using the local routing tables, or via ARP. If a source address is given, the rdma_cm_id is bound to that address, the same as if rdma_bind_addr were called. If no source address is given, and the rdma_cm_id has not yet been bound to a device, then the rdma_cm_id will be bound to a source address based on the local routing tables.

Since IPv4 TCP/IP is working correctly, I doubt it’s an ARP issue, so maybe it’s a source-address binding problem. Have you tried giving rping the source address explicitly with -I?

rping -c -I 192.168.2.17 -a 192.168.2.14 -C 3 -v -d
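Error -19 in the failing run is -ENODEV ("No such device") on Linux. One way to test the routing-table explanation from the man page is to ask the kernel which interface and source address it would pick for the server's address when no -I is given, and check whether that interface has an RDMA link. A minimal sketch, assuming the iproute2 `ip route get` and `rdma link` output formats; the function names route_dev_src and check_rdma_route are hypothetical. If the route to .14 goes out through an interface other than the siw netdev (for example, a second interface on 192.168.2.0/24), that would be consistent with both the ENODEV error and the fact that -I fixes it, but this is a guess, not a confirmed diagnosis.

```shell
#!/usr/bin/env bash
# route_dev_src (hypothetical helper): read `ip route get <addr>` output on
# stdin and print "<dev> <src>" for the route the kernel would use.
route_dev_src() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "dev") dev = $(i + 1)
      if ($i == "src") src = $(i + 1)
    }
  } END { print dev, src }'
}

# check_rdma_route (hypothetical helper): report whether the interface the
# kernel routes <dest> through also has an RDMA (e.g. siw) link on it.
check_rdma_route() {
  local dest out dev src
  dest=$1
  out=$(ip route get "$dest" | route_dev_src)
  dev=${out%% *}
  src=${out#* }
  if [ -z "$dev" ]; then
    echo "no route found to $dest"
    return 2
  fi
  if rdma link | grep -qw "netdev $dev"; then
    echo "route to $dest uses $dev (src $src), which has an RDMA link"
  else
    echo "route to $dest uses $dev (src $src), which has NO RDMA link"
    return 1
  fi
}

# Usage on the .17 machine (the side whose client failed):
#   check_rdma_route 192.168.2.14
```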

This works, but there’s a long delay (roughly 10 seconds) before the pings go through. I’ll read the man page you referenced and see if I can learn anything.

time rping -c -I 192.168.2.17 -a 192.168.2.14 -v -C 3 -d
created cm_id 0x55ae0fe78270
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x55ae0fe78270 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x55ae0fe78270 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x55ae0fe6e020
created channel 0x55ae0fe6dfe0
created cq 0x55ae0fe78a10
created qp 0x55ae0fe78ad0
rping_setup_buffers called on cb 0x55ae0fe6d830
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_ESTABLISHED cma_id 0x55ae0fe78270 (parent)
ESTABLISHED
rdma_connect successful
RDMA addr 55ae0fe6e390 rkey 7b5aed00 len 64
send completion
recv completion
RDMA addr 55ae0fe6e600 rkey 6b3f5300 len 64
send completion
recv completion
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
RDMA addr 55ae0fe6e390 rkey 7b5aed00 len 64
send completion
recv completion
RDMA addr 55ae0fe6e600 rkey 6b3f5300 len 64
send completion
recv completion
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
RDMA addr 55ae0fe6e390 rkey 7b5aed00 len 64
send completion
recv completion
RDMA addr 55ae0fe6e600 rkey 6b3f5300 len 64
send completion
recv completion
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
cma_event type RDMA_CM_EVENT_DISCONNECTED cma_id 0x55ae0fe78270 (parent)
client DISCONNECT EVENT...
rping_free_buffers called on cb 0x55ae0fe6d830
destroy cm_id 0x55ae0fe78270

real    0m10.289s
user    0m0.003s
sys     0m0.006s