So…that depends, a little bit, on what you are willing to spend.
The Infiniband cards, even older/used ones (for example the 100 Gbps Mellanox ConnectX-4 dual port VPI cards that I am using), still run at a minimum of about $350-400 USD per card.
The pricing of cables varies, mostly by length, then by type. For example, direct attach copper (DAC) cables can, I think, be had for as little as $50-100. (Again, it varies depending on when you're looking to buy them, supply, etc.)
If you need cables that are longer than a few metres, then you'll need fibre optic cables, which come in both passive and active flavours. I THINK that the last one I bought was a 100 m active optical cable (AOC) (QSFP28 to QSFP28), and it ran about $160 USD per cable.
So, if you're only trying to link two systems together, you can buy two NICs and an up-to-$100 1 m long DAC cable, and that will be enough to get you up and running.
But that’s also if you’re willing to spend $850 for 100 Gbps.
If, say, you know that you aren't EVER going to come REMOTELY CLOSE to hitting or needing that kind of bandwidth, then whilst you can get it at a significant discount compared to retail pricing, there is an argument to be made that you are spending money on bandwidth that you won't ever be able to use.
At that point, you might be better off with 10 Gbps (e.g. Mellanox ConnectX-3), 25 Gbps, 40 Gbps, or 56 Gbps. Again, each "tier" has its own pricing levels for cards, cables, etc.
If you want to connect more than two computers to each other (e.g. three), you can do it where computer A talks to B, and computer B talks to both A and computer C, but neither computer A nor computer C would be able to talk directly to each other (i.e. computer B MUST act as the messenger-in-the-middle to pass data between computer A and computer C).
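(If you do go the B-in-the-middle route over IPoIB, the plumbing is just plain IP forwarding. A rough sketch with made-up addresses, assuming B has 10.0.1.1 on the port facing A and 10.0.2.1 on the port facing C; note this forwards IP traffic only, not RDMA operations themselves:)

```
# on computer B (the one in the middle, with a port towards A and a port towards C):
sysctl -w net.ipv4.ip_forward=1              # let B route between its two IB ports

# on computer A (its link to B is on 10.0.1.0/24):
ip route add 10.0.2.0/24 via 10.0.1.1        # reach C's subnet through B

# on computer C (its link to B is on 10.0.2.0/24):
ip route add 10.0.1.0/24 via 10.0.2.1        # reach A's subnet through B
```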
More than that, and you pretty much NEED to have a switch. I only paid $2950 CAD for my 36-port Mellanox MSB7890 externally managed 100 Gbps Infiniband switch, so again, it depends on what you are willing to pay.
Bottom line:
To run NFSoRDMA, you need at least two NICs, both of which support RDMA.
And a cable that will connect the two cards together.
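Once the cards and cable are in, a quick sanity check that the link is actually up looks something like this (the tools come from the infiniband-diags and libibverbs-utils packages):

```
ibstat          # port State should be "Active" and Physical state "LinkUp"
ibv_devinfo     # lists the RDMA devices that the verbs stack can see
ibhosts         # shows the hosts visible on the IB fabric (needs a subnet manager running)
```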
Stupid question - what’s “ecn”?
If it is about explicit congestion notification - I am not sure if you need ECN for RoCE.
I mean, you can implement it (cf. https://support.mellanox.com/s/article/how-to-configure-roce-over-a-lossless-fabric--pfc---ecn--end-to-end-using-connectx-4-and-spectrum--trust-l2-x), but I’m not sure that’s a requirement for RoCE.
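(If you do want to go down that road, my loose recollection of what that article does on a ConnectX-4 with MLNX_OFED is roughly the following; the interface name and priority are placeholders, so treat it as a sketch and defer to the article/driver manual for your setup:)

```
mlnx_qos -i eth0 --trust dscp                       # trust DSCP markings on the port
# enable ECN for RoCE on priority 3 (reaction point + notification point):
echo 1 > /sys/class/net/eth0/ecn/roce_rp/enable/3
echo 1 > /sys/class/net/eth0/ecn/roce_np/enable/3
```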
You can consult the Mellanox driver manual that's appropriate for your target OS for details on configuring RDMA and/or RoCE.
(RoCE assumes that you are running over the ethernet protocol, instead of, for example, over the Infiniband protocol.)
(I use the Infiniband protocol rather than the Ethernet protocol because I can assign a static IPv4 address via IPoIB, and IPv4 is easier for software to work with than the IB GUID or QP or something along those lines.)
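(For what it's worth, assigning the static IPv4 address to the IPoIB interface works just like any other interface; on a NetworkManager-based distro like CentOS it can be as simple as this, with made-up names/addresses:)

```
nmcli connection add type infiniband ifname ib0 con-name ib0 \
      ipv4.method manual ipv4.addresses 10.0.0.10/24
nmcli connection up ib0
```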
RDMA over converged ethernet (RoCE) would need a card that supports RoCE, yes.
But if you just want to say, run NFS over RDMA (NFSoRDMA), you do not “need” to have ethernet (at all).
(Again, varies by implementation.)
For example:
If you have clients that you want to access your servers via NFSoRDMA, then those clients need to have that capability as well. Sometimes you might have that; other times, you might not want to spend that kind of money on it.
For example, my Windows systems don't really support NFSoRDMA with the default Windows Mellanox ConnectX-4 driver anyway. (I think that the Windows NFS "feature" can only mount the NFS export the "normal" way (i.e. NOT over RDMA), so enabling it would be a bit of a moot point.)
I forget if the Mellanox ConnectX-4 driver for Windows allowed for NFSoRDMA on Windows. (It’s been a LONGGGG time since I’ve used it/tested it in Windows.)
So, none of my Windows systems has it. I still access the server over conventional gigabit ethernet.
Conversely though, my HPC clients (compute nodes), along with the system that runs my LTO-8 tape backup, all run Linux, and since they can all make use of NFSoRDMA, all of those systems have that feature and functionality enabled, because they can actually make use of it.
I am trying to be deliberately clear that the whole "converged ethernet" part is NOT needed to deploy RDMA, especially NFSoRDMA. For that, you can run entirely on the Infiniband protocol and it works fine. I've had pretty much no problems with it, so long as I am using CentOS. I vaguely recall having a bit of an issue with Ubuntu not wanting to start up the subnet manager that's needed for Infiniband to connect (with ethernet, you don't need the subnet manager). But with CentOS, the OpenSM subnet manager runs just fine, so I have my micro HPC cluster headnode also act as the subnet manager, and that brings my entire IB network online.
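For reference, getting OpenSM up on a CentOS-type headnode is about this much work (package/service names as CentOS ships them; adjust for your distro):

```
yum install -y opensm infiniband-diags
systemctl enable --now opensm    # start the subnet manager now and on every boot
sminfo                           # should report the running SM; ibstat ports go "Active"
```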
(In theory, if CentOS is NOT the main OS that you want to use, you MIGHT be able to pass through one of the IB ports to a VM that runs CentOS, or maybe Mellanox has fixed that in newer versions of the driver. I don't really know; I haven't tried it in over 2.5 years. Once I got CentOS working, I just stuck with it.)
So the requirement on the hardware side is this: if you want to run NFSoRDMA, you just need a NIC that supports RDMA. RDMA over converged ethernet isn't even 100% necessary, unless you want to skip running the OpenSM subnet manager altogether. (OpenSM doesn't take much to run at all.)
In my initial testing on my dual-port VPI cards, I configured the ports to run in ethernet mode instead of IB mode, and that resulted in about 1-3% additional overhead.
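(Switching a VPI port between IB and ethernet mode is done with mlxconfig from the Mellanox Firmware Tools (MFT); the device path below is an example for a ConnectX-4, so yours may differ:)

```
mst start                                                    # load the Mellanox Software Tools modules
mlxconfig -d /dev/mst/mt4115_pciconf0 query | grep LINK_TYPE
mlxconfig -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2   # 1 = IB, 2 = ETH
# reboot (or reload the driver) for the new port type to take effect
```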
If you want to just run NFSoRDMA, you can get a couple of "normal" (or original flavour) Mellanox ConnectX-3 cards (i.e. not the EN nor the LX versions) and a cable, and then install the OS of your choice. Follow the instructions for enabling RDMA for NFS on your specific OS of choice, whether that's with the "inbox" drivers that ship with the OS or with the MLNX OFED Linux driver, and then follow your OS-specific instructions for deploying NFSoRDMA.
(Again, if you're using CentOS, the instructions for how to deploy NFSoRDMA are provided above; they come from my cluster deployment notes.)
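For anyone skimming, the general shape of it on a CentOS-type system is roughly this. This is a sketch of the usual steps rather than a copy of my notes, and the export path/server name are just examples:

```
# --- on the server (with the nfs-server service already running) ---
modprobe svcrdma                               # server-side RDMA transport for nfsd
echo "rdma 20049" > /proc/fs/nfsd/portlist     # have nfsd listen for RDMA on port 20049
exportfs -o rw,insecure "*:/export"            # export a test directory

# --- on the client ---
modprobe xprtrdma                              # client-side RDMA transport
mount -o rdma,port=20049 server:/export /mnt   # mount over RDMA instead of TCP
mount | grep /mnt                              # should show proto=rdma
```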
It doesn’t really take a whole lot if you want to test it out.
To be able to take full advantage of it though, that also depends a LOT on the kind of hardware that you are connecting those NICs TO:
CPU, RAM, motherboard, availability of free PCIe lanes, whether you are using NVMe SSDs, SATA SSDs, or HDDs, etc.
For example, the "bulk" storage in my micro HPC cluster head node is eight 10 TB SAS 12 Gbps spinners in RAID5 (controlled, I think, by an LSI/Broadcom MegaRAID SAS 12 Gbps HW RAID HBA 9341-8i), so the most that I can pull from that is about 800 MB/s, or about 6.4 Gbps, which means that even 10 GbE would have sufficed.
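(That's just 800 MB/s × 8 bits ≈ 6.4 Gbps. If you want to find your own bottleneck, comparing a raw link benchmark against a raw read off the array tells you pretty quickly; ib_send_bw comes from the perftest package, and the device/address below are examples:)

```
# raw RDMA link speed between two nodes (start the server side first):
ib_send_bw                       # on the server node
ib_send_bw 10.0.0.10             # on the client node, pointed at the server's IPoIB address

# raw sequential read off the array, bypassing the page cache:
dd if=/dev/sdX of=/dev/null bs=1M count=16384 iflag=direct
```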
My SATA 6 Gbps SSDs (four in RAID5, I think) don't really fare much better.
NVMe SSDs would be faster, but then I’d also just wear them out that much faster as well.
So, at the end of the day, it can be really cool to have NFSoRDMA, but if you don't have hardware that can really make use of it, it's something nice to play with, but you might not realise the kind of benefits you might be expecting over, for example, "conventional" 10 GbE.
But I leave that up to you.
I needed my 100 Gbps because of the type of HPC problems that I was solving. The NFSoRDMA was just the cherry on top.